Using AI to detect Cat and Dog pictures, with TensorFlow & Keras (Part 3)

If you're just joining us, please start with part 1 of this four-part series.


We are finally here. Over the last two parts, we went from a 65% accuracy peak to an 88% accuracy peak through the use of data augmentation and convolutional neural networks.


Let's see if we can do better; we want to push accuracy as high as it will go.

Pre-Trained convnet:

The number one reason our model isn't reaching higher accuracy is the small amount of data we have to train it with. If deep learning is the new electricity, then data is its fuel.

Thus, to help us in our endeavor, we will break our system into two parts: the convolutional block and the classifier block. The convolutional block will contain all the neural network components before the “Flatten” layer in our code.

We will be using a pre-trained convolutional base: the InceptionV3 architecture. The model was trained on roughly 1.4 million ImageNet images and thus has no shortage of the proverbial fuel.

[Image: Process of switching classifiers.]

Analyzing the model:

Create a new code block anywhere in our previous notebook, and within it write:

We first import the InceptionV3 convolutional base and assign it to conv_base. We reconfigure the model to take our input shape of (64, 64, 3). We also freeze the conv_base's trainability, since we want to keep the information stored within it and train only the classifier portion.
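A sketch of what that block might look like, assuming the standard Keras applications API. One caveat: current Keras builds of InceptionV3 require inputs of at least 75×75, so this sketch uses (75, 75, 3); if your Keras version accepts it, substitute the (64, 64, 3) shape from part 2.

```python
from tensorflow.keras.applications import InceptionV3

# Note: Keras's InceptionV3 requires inputs of at least 75x75, so this
# sketch uses (75, 75, 3) rather than the article's (64, 64, 3).
conv_base = InceptionV3(
    weights='imagenet',    # reuse the ImageNet-trained weights
    include_top=False,     # drop the original 1000-class classifier head
    input_shape=(75, 75, 3),
)

# Freeze the base so only our classifier's weights are updated during training.
conv_base.trainable = False
```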

Next, under the above code, type:
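The missing line here is presumably Keras's built-in model summary (the first two lines repeat the conv_base setup so this snippet stands alone):

```python
from tensorflow.keras.applications import InceptionV3

conv_base = InceptionV3(weights='imagenet', include_top=False,
                        input_shape=(75, 75, 3))

# Print each layer, its output shape, and its parameter count.
conv_base.summary()
```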

to get a view of the convolutional base. You should get the following output:

As you can see, the architecture is made up of Conv2D and MaxPooling2D blocks, no different from our own code in part 2. The main difference is that this base was trained on far more data and thus required more layers.

Developing our model:

We will now change our previous model architecture to:

The rest of the model block will stay the same. Notice that we add our conv_base just like any other layer.
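A sketch of what that architecture might look like, assuming the Sequential-style model and sigmoid classifier head from part 2 (the 256-unit Dense layer is illustrative, not the article's exact sizing):

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import InceptionV3

conv_base = InceptionV3(weights='imagenet', include_top=False,
                        input_shape=(75, 75, 3))
conv_base.trainable = False

model = models.Sequential([
    conv_base,               # the frozen pre-trained base, added like any other layer
    layers.Flatten(),        # everything below here is our trainable classifier
    layers.Dense(256, activation='relu'),
    layers.Dense(1, activation='sigmoid'),  # cat vs. dog
])
```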

Now, before we run the block, I must warn you that it will take a substantial amount of time due to the large size of the conv_base.

Now, if you’re willing to wait, go ahead and run the model!

Graphical analysis of our model:

Finally, we can run our plotting block from part 2 to see the accuracy this method achieves:
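That plotting block is, roughly, the usual Matplotlib pass over the History object returned by model.fit; written here as a function so it stands alone (a sketch; the 'accuracy' / 'val_accuracy' key names assume the TF 2 spelling, and older Keras versions use 'acc' / 'val_acc'):

```python
import matplotlib.pyplot as plt

def plot_accuracy(history):
    """Plot training vs. validation accuracy from a Keras History object."""
    acc = history.history['accuracy']
    val_acc = history.history['val_accuracy']
    epochs = range(1, len(acc) + 1)

    plt.plot(epochs, acc, 'bo', label='Training accuracy')
    plt.plot(epochs, val_acc, 'b', label='Validation accuracy')
    plt.title('Training and validation accuracy')
    plt.legend()
    plt.show()
```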

[Image: A sad fail.]

Unfortunately, it seems using a pre-trained model doesn't help in our case. This is most likely due to the lack of data used to optimize our classifier. It can be fixed by using more images, as well as larger ones.

Interestingly, removing the following piece of code:

grants an increase of accuracy to 96%.

[Image: 96% accuracy rate.]

This implies that the data used to train InceptionV3 does not quite “coincide” with our data. Furthermore, running it for 500 epochs, which takes about 3 hours, raises the accuracy to 98%.

Next Time:

Next time, we will use our model to create a visual-based password cracker!


A Somali physicist, electrical engineer, Software enthusiast, and political enthusiast.
