Using AI to detect cat and dog pictures with TensorFlow & Keras (Part 1)

Cat or Dog?

Hey everyone!

I am going to write a few articles showing the power of Convolutional Neural Networks for image recognition. To begin with, we will try to recognize whether an image shows a cat or a dog using a vanilla (fully connected) neural network.

Google Colab:

First, I recommend using Google Colab: you don't need to install any pesky packages, so you can start your AI journey right away. All you need is a Google (Gmail) account. Follow this link and click File -> New notebook.

Once you have created a new notebook, click Edit -> Notebook settings and choose GPU from the Hardware accelerator drop-down menu. This will speed up training with TensorFlow.

A basic Google Colab notebook.
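To confirm that the GPU runtime is active, you can ask TensorFlow which GPUs it can see. This is a small sketch that is not part of the original walkthrough: an empty list means the notebook is still running on the CPU.

```python
import tensorflow as tf

# List the GPUs visible to TensorFlow. On a Colab GPU runtime this
# should contain one entry; on a CPU-only runtime it is empty.
gpus = tf.config.list_physical_devices("GPU")
print("GPUs available:", gpus)
```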

Developing the Dataset:

First, we need to import several packages:

NumPy will be used for data manipulation, tensorflow_datasets to import our dataset, and Keras layers to resize and rescale our images.

Next, we import the cats_vs_dogs dataset:

We want each image to be 64 by 64 pixels, so we will use Keras to resize every image to this size. We also want to normalize the data by rescaling pixel values from the range [0, 255] to [0, 1], which typically helps the network train faster.
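One way to do the resizing and normalization with Keras preprocessing layers is sketched below. The function name preprocess is hypothetical, and it is written so it can be passed to tf.data's map. The cast to float32 is an extra safety step, since the raw images are stored as 8-bit integers.

```python
import tensorflow as tf
from tensorflow.keras import layers

IMG_SIZE = 64

resize = layers.Resizing(IMG_SIZE, IMG_SIZE)  # any H x W -> 64 x 64
rescale = layers.Rescaling(1.0 / 255)         # [0, 255] -> [0, 1]

def preprocess(image, label):
    """Resize one image to 64 x 64 and scale its pixels into [0, 1]."""
    image = tf.cast(image, tf.float32)
    return rescale(resize(image)), label

# Usage with a tf.data pipeline of (image, label) pairs:
# ds = ds.map(preprocess)
```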

Next, we separate the dataset into image data and the corresponding labels. The labels are 0 for cats and 1 for dogs.

Then we convert X and Y into NumPy arrays and check their shapes.
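The splitting and conversion steps might look like the sketch below. It assumes every image has already been resized to the same shape (otherwise NumPy cannot stack them into one array). The helper name to_numpy_arrays is hypothetical, and the five blank images are stand-in data, not the real dataset.

```python
import numpy as np
import tensorflow as tf

def to_numpy_arrays(ds):
    """Split a dataset of (image, label) pairs into an image array X and a label array Y."""
    images, labels = [], []
    for image, label in ds.as_numpy_iterator():
        images.append(image)
        labels.append(label)
    return np.array(images), np.array(labels)

# Tiny stand-in dataset: five blank 64x64 RGB images with made-up labels.
demo = tf.data.Dataset.from_tensor_slices(
    (tf.zeros([5, 64, 64, 3]), tf.constant([0, 1, 0, 1, 1]))
)
X, Y = to_numpy_arrays(demo)
print(X.shape, Y.shape)  # (5, 64, 64, 3) (5,)
```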

Your output should be:

The shapes of the X and Y arrays.

As you can see, we have 20,000 photos and 20,000 corresponding labels. In the X array, each image is 64 by 64 pixels, and the final dimension of 3 holds the red, green, and blue values of each pixel.

Next, we convert our data from RGB to grayscale. This is mostly a matter of preference: color is not our focus for now, and grayscale also shrinks each image to a third of its size.

For the RGB-to-grayscale conversion, we replace each pixel's (R, G, B) vector with a single weighted sum:

Gray = 0.2126*R + 0.7152*G + 0.0722*B

This gives a final array of shape (20000, 64, 64).
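The weighted sum above (these are the ITU-R BT.709 luma weights) can be applied to every pixel at once with NumPy. In this sketch, rgb_to_gray is a hypothetical helper name; it takes a matrix product against the three weights, which collapses the last (color) axis.

```python
import numpy as np

# Weights for R, G and B, as given in the text above (they sum to 1).
RGB_WEIGHTS = np.array([0.2126, 0.7152, 0.0722], dtype=np.float32)

def rgb_to_gray(images):
    """Convert an (N, H, W, 3) RGB array into an (N, H, W) grayscale array."""
    return images @ RGB_WEIGHTS

# A single pure-red pixel keeps only the red weight.
red_pixel = np.array([[[[1.0, 0.0, 0.0]]]], dtype=np.float32)
print(rgb_to_gray(red_pixel).shape)  # (1, 1, 1)
```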

We also need to reshape this array into the shape (20000, 64*64) so that each image becomes a single flat vector that can be fed into our neural network. The network therefore trains on 20,000 images, each with 64*64 = 4096 inputs.
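Flattening is a single reshape. The sketch below (flatten_images is a hypothetical name) keeps the first axis and lets NumPy infer the second, which for 64 x 64 images is 4096.

```python
import numpy as np

def flatten_images(gray):
    """Reshape (N, 64, 64) grayscale images into (N, 4096) row vectors."""
    return gray.reshape(gray.shape[0], -1)

example = np.zeros((5, 64, 64), dtype=np.float32)
print(flatten_images(example).shape)  # (5, 4096)
```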

Developing the Model:

Now we build a model to classify our images. First, we import a few more packages:
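The exact imports are not shown in the original, so this is an assumed minimal set: the Sequential container plus the Input and Dense layer classes.

```python
# Assumed imports for the model section.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Input
```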

Next, we set up a Sequential model whose dense layers have the following numbers of nodes:

512 → 256 → 128 → 64 → 10 → 1

Each hidden layer uses the softplus activation function, which transforms each value x as follows:

Softplus function: softplus(x) = ln(1 + e^x)


Graph of the softplus function.

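As the graph shows, softplus is a smooth version of the ReLU function: negative inputs are squashed toward zero (without ever reaching it), while large positive inputs pass through almost unchanged.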
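To see this numerically, here is a small NumPy version of softplus. It uses np.logaddexp(0, x), which computes ln(e^0 + e^x) = ln(1 + e^x) without overflowing for large x.

```python
import numpy as np

def softplus(x):
    """softplus(x) = ln(1 + e^x), computed in a numerically stable way."""
    return np.logaddexp(0.0, x)

xs = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(np.round(softplus(xs), 4))
```

Negative inputs come out small but positive (about 0.0067 for x = -5), softplus(0) is ln 2, and x = 5 gives roughly 5.0067.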

The code for our network is:
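Since the original code cell is not reproduced, here is a sketch that matches the description: dense layers of 512, 256, 128, 64 and 10 nodes with softplus activations, followed by a single sigmoid output node. The 4096-value input comes from the flattened 64 x 64 grayscale images; the function name build_model is hypothetical.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_model(input_dim=64 * 64):
    """Fully connected 512 -> 256 -> 128 -> 64 -> 10 -> 1 network."""
    return tf.keras.Sequential([
        layers.Input(shape=(input_dim,)),          # one flattened grayscale image
        layers.Dense(512, activation="softplus"),
        layers.Dense(256, activation="softplus"),
        layers.Dense(128, activation="softplus"),
        layers.Dense(64, activation="softplus"),
        layers.Dense(10, activation="softplus"),
        layers.Dense(1, activation="sigmoid"),     # probability that the image is a dog
    ])

model = build_model()
model.summary()
```

With 4096 inputs this network has 2,270,805 trainable parameters, most of them in the first layer (4096 x 512 weights plus 512 biases).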

Notice that our last layer uses a sigmoid activation function instead. Sigmoid squashes the output into the range between 0 and 1, which is what we need since our labels are either 0 (cat) or 1 (dog). The output can therefore be read as the probability that the image is a dog: 0.2 means most likely a cat, while 0.8 means most likely a dog.

Sigmoid function: sigmoid(x) = 1 / (1 + e^(-x))
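The sigmoid, and the step of turning its output into a cat/dog answer, can be sketched in NumPy. The 0.5 cut-off and the helper name to_label are assumptions for illustration; the article itself only describes reading the output as a probability.

```python
import numpy as np

def sigmoid(x):
    """sigmoid(x) = 1 / (1 + e^(-x)); maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def to_label(p, threshold=0.5):
    """Hypothetical helper: read a predicted dog-probability as 'cat' or 'dog'."""
    return np.where(p >= threshold, "dog", "cat")

print(to_label(np.array([0.2, 0.8])))
```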

We compile our model with the rmsprop optimizer. Since our output is binary (0 or 1), we use the binary_crossentropy loss, and we track accuracy as our metric.
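In Keras the compile step is a single call. The sketch wraps it in a hypothetical helper and compiles a tiny stand-in model only to show that the call works; in practice you would pass the 512 -> 256 -> 128 -> 64 -> 10 -> 1 network described earlier.

```python
import tensorflow as tf
from tensorflow.keras import layers

def compile_model(model):
    """Compile with RMSprop, binary cross-entropy loss, and accuracy as the metric."""
    model.compile(optimizer="rmsprop",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Tiny stand-in model (one sigmoid unit), used here only to demonstrate the call.
demo = compile_model(tf.keras.Sequential([
    layers.Input(shape=(64 * 64,)),
    layers.Dense(1, activation="sigmoid"),
]))
```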

Finally, we split our data into 19,000 training images and 1,000 validation images. The validation set is used to measure how well the network performs on images it was not trained on.

We train the network in batches of 128 for 100 epochs.
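The split and training loop could look like the sketch below. It assumes the last 1,000 examples are held out for validation (the original does not say which 1,000 are used), and train is a hypothetical helper. The demo at the bottom uses small random stand-in data and only 2 epochs so it runs in seconds; with the real X and Y you would keep the defaults of batch size 128 and 100 epochs.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

def train(model, X, Y, n_val=1000, batch_size=128, epochs=100):
    """Hold out the last n_val examples for validation and fit on the rest."""
    X_train, Y_train = X[:-n_val], Y[:-n_val]
    X_val, Y_val = X[-n_val:], Y[-n_val:]
    return model.fit(X_train, Y_train,
                     batch_size=batch_size,
                     epochs=epochs,
                     validation_data=(X_val, Y_val),
                     verbose=0)

# Demo on random stand-in data (not the real images).
rng = np.random.default_rng(0)
X_demo = rng.random((60, 64 * 64), dtype=np.float32)
Y_demo = rng.integers(0, 2, size=60).astype(np.float32)
model = tf.keras.Sequential([
    layers.Input(shape=(64 * 64,)),
    layers.Dense(8, activation="softplus"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])
history = train(model, X_demo, Y_demo, n_val=10, batch_size=16, epochs=2)
```

With 20,000 examples and n_val=1000, this leaves exactly 19,000 for training. Keras shuffles the training set each epoch by default.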

Measuring Accuracy:

The history variable stores both the training accuracy and the validation accuracy for every epoch. The validation accuracy, measured on non-training data, is the one we care about, so we print the maximum validation accuracy reached by our network and plot accuracy against epochs.
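A possible version of that reporting step is below, assuming matplotlib for the plot. It takes history.history (the dictionary of per-epoch metrics that Keras returns from fit) and returns the best validation accuracy; the function name report is hypothetical.

```python
import matplotlib.pyplot as plt

def report(metrics):
    """Print the best validation accuracy and plot accuracy against epochs.

    `metrics` is the history.history dictionary returned by model.fit.
    """
    acc = metrics["accuracy"]
    val_acc = metrics["val_accuracy"]
    best = max(val_acc)
    print(f"Best validation accuracy: {best:.3f}")

    epochs = range(1, len(acc) + 1)
    plt.figure()
    plt.plot(epochs, acc, label="training accuracy")
    plt.plot(epochs, val_acc, label="validation accuracy")
    plt.xlabel("epoch")
    plt.ylabel("accuracy")
    plt.legend()
    plt.show()
    return best

# Usage after training: report(history.history)
```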

Our network produces the following output:

Accuracy vs Epochs (1)

We can see that the validation accuracy peaks at around 65%. In other words, our network correctly tells cats from dogs in about 65% of images it has not seen: better than the roughly 50% we would get by guessing, but not great!

Next time we will see if we can get higher accuracy rates using a Convolutional Neural Network.


Part 2:


Written by

A Somali physicist, electrical engineer, Software enthusiast, and political enthusiast.
