Machine Learning From Scratch

How to write a machine learning program in C++ from scratch

Machine learning is one of those topics where a Google search turns up either a bare-bones explanation of what it is or an academic paper written by someone with five PhDs in statistics and computer science, with nothing in between. The world we live in today loves this buzzword, but what does it take to build something from scratch in a language that is not Python? I set out to write a C++ project: a handwritten digit classifier built on the K-Nearest-Neighbor algorithm, which I found pretty intuitive and easy to understand.

Diagram by Dr Mark van Rijmenam on Medium

You will notice that both the training and the test dataset come with a second file that looks similar to the image file. These are the so-called label files: each one simply holds values from 0 to 9, one label for the corresponding image in the image file. If you tried to open these files, you will have noticed that they contain what appears to be gibberish, but what is really cool is that you are looking at raw binary data! The files are stored in the IDX file format, a simple binary format for vectors and multidimensional matrices of numerical data. We will need to write code to parse these files in order to get some meaningful data out of them. That brings us to the next step in the machine learning pipeline: data preparation.

Now we can finally start programming. I wrote the project in C++14, as I feel it gives an understanding at a low enough level while keeping enough abstraction that everything still makes sense.

The two types of files we need to parse are the image data file and the label file for those images. We will write the following functions:

Now comes the tricky part. The website shows how the data is represented in the MNIST image file.

This might seem strange, but it is actually very simple and intuitive. Looking at the first row in the table, we see an offset of 0000, indicating that we are at the start of the file. The data we expect to find at this point is a 32-bit integer with a hexadecimal value of 0x00000803, or 2051 in decimal. This is the so-called magic number, which can be seen as a signature indicating the file type. The second row has an offset of 0004, indicating that we have moved 4 bytes further into the file, which makes sense because the magic number was a 4-byte chunk of data. Once again we expect a 32-bit integer, this time with the value 60000, which is the number of images in the file. In the next two rows we expect to find the dimensions of the images we are going to read. After this metadata we can move on to the pixel values, which are stored as unsigned bytes. The images are grayscale, so each pixel has a value between 0 and 255, which is exactly the range that an unsigned 8-bit (1-byte) value can represent.

Now, in order to read a certain number of bytes at a time from a file, we are going to use the read method of the ifstream object. The read method takes two parameters: a char pointer to the memory in which the data will be stored, and the number of bytes to read from the file. We are going to start by reading the label file.

Now that we have the file metadata we can start reading the actual label data:

Reading the metadata from the image file is similar:

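You might have noticed the use of a second function, reverseInt. The integers in the file header are stored most significant byte first (big-endian), so on a little-endian machine, such as a typical x86 PC, the four bytes of each value read from the file have to be reversed before the value can be used. That is all reverseInt does, and it is implemented like this: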
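A byte-order reversal along these lines would do the job. The function name comes from the text above, but the signature, using std::uint32_t rather than int to keep the bit shifts well defined, is an assumption of this sketch.

```cpp
#include <cstdint>

// Reverses the order of the four bytes in a 32-bit value, turning a
// big-endian value read on a little-endian machine into the intended number.
std::uint32_t reverseInt(std::uint32_t value) {
    return ((value & 0x000000FFu) << 24) |
           ((value & 0x0000FF00u) << 8) |
           ((value & 0x00FF0000u) >> 8) |
           ((value & 0xFF000000u) >> 24);
}
```

For example, the header bytes 00 00 08 03 land in memory as 0x03080000 on a little-endian machine, and reverseInt turns that into 0x00000803, the magic number 2051.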

Before we read the pixel data, we are going to create an Image class to simplify things. We are going to use Eigen matrices to store the pixel data as integer values, which allows for easier mathematical operations. You can use any data structure you like; this one just simplifies things when we implement the ML algorithm.
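To give an idea of the shape such a class could take, here is a dependency-free sketch. Note the swap: it stores the pixels in a flat std::vector<int> instead of an Eigen matrix, and all member names are assumptions made for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical Image class: row-major integer pixels plus the digit label.
class Image {
public:
    Image(std::size_t rows, std::size_t cols, std::uint8_t label)
        : rows_(rows), cols_(cols), label_(label), pixels_(rows * cols, 0) {}

    // Access to the pixel at row r, column c (no bounds checking).
    int& at(std::size_t r, std::size_t c) { return pixels_[r * cols_ + c]; }
    int at(std::size_t r, std::size_t c) const { return pixels_[r * cols_ + c]; }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }
    std::uint8_t label() const { return label_; }
    const std::vector<int>& pixels() const { return pixels_; }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::uint8_t label_;
    std::vector<int> pixels_;  // rows_ * cols_ values in the range 0..255
};
```

Storing the pixels as int rather than unsigned char means that subtracting two images later on cannot wrap around.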

Now we can start to read in the pixel data from the training dataset files.

We are using unsigned chars here because they are equivalent to unsigned bytes which is what we are trying to read from the file.

As mentioned before, we are going to use the K-Nearest-Neighbor algorithm. It is a supervised machine learning algorithm that can be used for classification or regression. Supervised learning refers to the fact that we are training our algorithm with labeled example data. This is perfect for our case, as we want it to classify handwritten digits. The underlying assumption is that similar things exist in close proximity to each other. In our case, to state it bluntly, this means that images of the same digit will look similar. We are going to declare the following function:

So the first step in the algorithm after data preparation is distance calculation. We are going to use the Euclidean distance formula, which looks something like this:
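For two images p and q, each flattened into n pixel values (the symbols p, q and n are chosen here for illustration), the Euclidean distance is:

```latex
d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
```

Because the square root does not change which distances are smaller than others, the squared sum alone is enough when the values are only used for ranking.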

We have matrices representing these pixel values so we are going to be calculating the distance between those values.

The above snippet is a bit convoluted, so let's walk through it. First we define a vector of pairs, each holding a scalar distance value and the label for that image. We then iterate through all of the training images, taking the difference between the pixel values of the training image and the pixel values of the query image. We convert this difference matrix to a one-dimensional array, which allows us to easily square the values and sum them into a single scalar, and we pair that scalar with the corresponding label. This uses some Eigen magic to reduce the operation to one line.
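The same walk-through can be written without Eigen, as in this sketch. LabeledImage and ComputeDistances are hypothetical names, the pixels are assumed to be stored as flat vectors of equal length, and the loop spells out what the Eigen one-liner does in a single expression.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a training image: flat pixels plus its label.
struct LabeledImage {
    std::vector<int> pixels;
    std::uint8_t label;
};

// Returns one (distance, label) pair per training image.
// Assumes every training image has as many pixels as the query.
std::vector<std::pair<double, std::uint8_t>> ComputeDistances(
    const std::vector<LabeledImage>& training, const std::vector<int>& query) {
    std::vector<std::pair<double, std::uint8_t>> distances;
    distances.reserve(training.size());
    for (const LabeledImage& image : training) {
        double sum = 0.0;
        for (std::size_t i = 0; i < query.size(); ++i) {
            const double diff = image.pixels[i] - query[i];
            sum += diff * diff;  // squared difference of one pixel
        }
        distances.emplace_back(std::sqrt(sum), image.label);
    }
    return distances;
}
```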

After constructing our vector of pairs, we need to sort it in ascending order of distance, so that the shortest distances, which correspond to the images closest to the query image, come first.

We are using the sort algorithm and passing it a lambda function to sort the vector of pairs according to their distances.

Now we store the labels of the first k entries of the sorted vector (remember, k was given to us as a parameter).
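The sorting and selection steps together might look like this sketch. Neighbor and NearestLabels are names chosen for illustration, and the vector is taken by value so that the caller's copy stays unsorted.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Neighbor = std::pair<double, std::uint8_t>;  // (distance, label)

// Sorts by distance, smallest first, and returns the labels of the k nearest.
std::vector<std::uint8_t> NearestLabels(std::vector<Neighbor> neighbors,
                                        std::size_t k) {
    std::sort(neighbors.begin(), neighbors.end(),
              [](const Neighbor& a, const Neighbor& b) {
                  return a.first < b.first;  // compare distances only
              });
    k = std::min(k, neighbors.size());  // guard against k > number of images
    std::vector<std::uint8_t> labels;
    labels.reserve(k);
    for (std::size_t i = 0; i < k; ++i) {
        labels.push_back(neighbors[i].second);
    }
    return labels;
}
```

Since only the first k entries are needed, std::partial_sort would be a cheaper alternative to sorting the whole vector.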

Before we evaluate whether our guess is right, we need to define a choice function that determines which of the k labels is most likely to be correct. So we define a function that takes the mode of the labels.

The above function simply looks for the label that is the most common in the k number of labels that we chose.
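A mode function for digit labels can be sketched like this. The name Mode is an assumption, the labels are assumed to lie in the range 0 to 9, and ties are resolved in favor of the smaller digit, which is a choice made for this example only.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <vector>

// Returns the most frequent label; on a tie the smaller digit wins.
// An empty input yields 0, and a label above 9 throws std::out_of_range.
std::uint8_t Mode(const std::vector<std::uint8_t>& labels) {
    std::array<int, 10> counts{};  // one zero-initialized counter per digit
    for (std::uint8_t label : labels) {
        ++counts.at(label);
    }
    // max_element returns the first maximum, so index order breaks ties.
    return static_cast<std::uint8_t>(
        std::max_element(counts.begin(), counts.end()) - counts.begin());
}
```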

Now we can add the following expression to the bottom of the Classify function.

It evaluates to true if our guess matches the label given to the function.

And that is about it. Now we can see how accurate our algorithm is.
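Measuring accuracy boils down to counting how often the classifier returns true. Here is a sketch, where Accuracy is a hypothetical helper and is_correct stands in for a call to Classify on one test sample.

```cpp
#include <cstddef>
#include <vector>

// Returns the fraction of samples for which `is_correct` returns true.
template <typename Sample, typename Predicate>
double Accuracy(const std::vector<Sample>& samples, Predicate is_correct) {
    if (samples.empty()) {
        return 0.0;  // avoid dividing by zero
    }
    std::size_t hits = 0;
    for (const Sample& sample : samples) {
        if (is_correct(sample)) {
            ++hits;
        }
    }
    return static_cast<double>(hits) / static_cast<double>(samples.size());
}
```

In the project, the predicate would be a lambda that forwards the test image, its label, and k to Classify.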

One thing that still needs to be added is reading the test data, but that uses exactly the same functions we used for reading the training data, so I leave the implementation up to the reader. Also note that execution will take a while, because we iterate through all 60000 training images each time the Classify function is called.
