Using AI to detect Cat and Dog pictures, with Tensorflow & Keras. (3)

Nov 25, 2020

If you don’t know where you are, please start with part 1 of this four-part series.

We are finally here. We went from a 65% accuracy peak to an 88% accuracy peak through the use of data augmentation and convolutional neural networks.

But...

Let’s see if we can do better. We want to hit the roof in terms of accuracy.

Pre-Trained convnet:

The number one reason our model is not reaching the heights of accuracy is the lack of data we have to train it with. If Deep Learning is the new electricity, then data is its fuel.

To help us in our endeavor, we will break our system into two parts: the convolutional block and the classifier block. The convolutional block will contain all our neural network components before the “Flatten” portion of our code.

We will be using a pre-trained convolutional base, the VGG16 architecture. (The layer summary printed later in this part, with block1_conv1 through block5_pool, is VGG16’s; InceptionV3 would not accept our 64x64 images, because Keras requires inputs of at least 75x75 for it.) The model was trained on the 1.4 million images of ImageNet and thus has no shortage of the proverbial fuel.

Analyzing the model:

Create a new block of code anywhere in our previous notebook. Within the block, write:

from tensorflow.keras.applications import VGG16

# Load the convolutional base pre-trained on ImageNet, without its classifier head.
# VGG16 matches the summary printed below; InceptionV3 rejects inputs smaller than 75x75.
conv_base = VGG16(weights='imagenet', include_top=False, input_shape=(64, 64, 3))

# Freeze every layer so the pre-trained weights are not updated during training.
for layer in conv_base.layers:
    layer.trainable = False

We first import the pre-trained convolutional base and assign it to conv_base, configured for our input shape of (64, 64, 3). Passing include_top=False drops the network’s original classifier so that we can attach our own. We also freeze every layer of conv_base, because we want to keep the information stored in it and train only the classifier portion.
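The effect of freezing can be checked by counting trainable parameters before and after the loop. The sketch below is only an illustration under a few assumptions: it uses VGG16 (whose layer names match the summary printed further down), it passes weights=None so that the architecture is built with random weights and nothing is downloaded, and count_trainable is a hypothetical helper written for this example.

```python
import math

from tensorflow.keras.applications import VGG16

# Assumption: weights=None builds the architecture with random weights,
# which is enough for counting parameters and avoids the ImageNet download.
conv_base = VGG16(weights=None, include_top=False, input_shape=(64, 64, 3))


def count_trainable(model):
    """Hypothetical helper: total number of scalars in the trainable weights."""
    return sum(math.prod(tuple(w.shape)) for w in model.trainable_weights)


trainable_before = count_trainable(conv_base)

# Freeze every layer, exactly as in the notebook block.
for layer in conv_base.layers:
    layer.trainable = False

trainable_after = count_trainable(conv_base)

print(trainable_before, trainable_after)  # 14714688 0
```

After the loop, an optimizer has nothing to update inside conv_base, so only the layers added on top of it will learn.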

Next, under the above code, type:

print(conv_base.summary())

to get a view of the convolutional base. You should get the following output:

Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 64, 64, 3)]       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 64, 64, 64)        1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 64, 64, 64)        36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 32, 32, 64)        0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 32, 32, 128)       73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 32, 32, 128)       147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 16, 16, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 16, 16, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 16, 16, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 16, 16, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 8, 8, 256)         0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 8, 8, 512)         1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 8, 8, 512)         2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 8, 8, 512)         2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 4, 4, 512)         0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 4, 4, 512)         2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 4, 4, 512)         2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 4, 4, 512)         2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 2, 2, 512)         0         
=================================================================
Total params: 14,714,688
Trainable params: 0
Non-trainable params: 14,714,688
_________________________________________________________________
None

As you can see, the architecture is made up of Conv2D and MaxPooling2D blocks, which is no different from our own code in part 2. The main difference is that the convolutional base was trained on far more data and thus uses more layers.
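The Param # column of the summary can be reproduced by hand. The sketch below assumes 3x3 kernels with one bias per filter (the summary does not print the kernel size, so that is an assumption here), and reads the input and output channel counts off the Output Shape column. The helper conv2d_params is a hypothetical name used only for this example.

```python
def conv2d_params(in_channels, out_channels, kernel=3):
    """Weights of all kernel x kernel filters plus one bias per output channel."""
    return kernel * kernel * in_channels * out_channels + out_channels


# (in_channels, out_channels) for each Conv2D row of the summary, in order.
# MaxPooling2D rows are omitted because pooling has no parameters.
conv_layers = [
    (3, 64), (64, 64),                       # block1
    (64, 128), (128, 128),                   # block2
    (128, 256), (256, 256), (256, 256),      # block3
    (256, 512), (512, 512), (512, 512),      # block4
    (512, 512), (512, 512), (512, 512),      # block5
]

per_layer = [conv2d_params(i, o) for i, o in conv_layers]
total_params = sum(per_layer)

print(per_layer[:2])  # [1792, 36928]
print(total_params)   # 14714688
```

The total matches the "Total params" line, and because every layer is frozen, all of it is reported as non-trainable.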

Developing our model:

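We will now change our previous model architecture to:

# Imports repeated here so the block runs on its own.
from tensorflow.keras import layers, models, regularizers

network = models.Sequential()
network.add(conv_base)  # pre-trained convolutional block
network.add(layers.Flatten())
network.add(layers.Dense(256, kernel_regularizer=regularizers.l2(0.001)))
network.add(layers.LeakyReLU())
network.add(layers.Dense(1, activation='sigmoid'))  # cat vs. dog probability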

The rest of the model block will stay the same. Notice, we added our conv_base block just like any other layer.
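To get a feel for how small the trainable part is, we can work out the classifier’s size by hand. This is a back-of-the-envelope sketch rather than output from the notebook: it takes the (2, 2, 512) output shape of block5_pool from the summary and applies the usual Dense parameter formula (inputs x units weights plus one bias per unit). The helper dense_params is a hypothetical name.

```python
# Each of the five MaxPooling2D layers halves the 64x64 spatial size.
size = 64
for _ in range(5):
    size //= 2
print(size)  # 2

# Flatten turns the (2, 2, 512) output of the convolutional base into a vector.
flatten_units = size * size * 512


def dense_params(inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return inputs * units + units


hidden_params = dense_params(flatten_units, 256)  # Dense(256)
output_params = dense_params(256, 1)              # Dense(1); LeakyReLU adds none
classifier_params = hidden_params + output_params

print(flatten_units)      # 2048
print(hidden_params)      # 524544
print(classifier_params)  # 524801
```

With the convolutional base frozen, these roughly half a million parameters are the only ones updated during training, compared with more than 14 million held fixed in conv_base.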

Before we run the block, I must warn you that it will take a substantial amount of time because conv_base is so large.

Now, if you’re willing to wait, go ahead and run the model!

Graphical analysis of our model:

Finally, we can run our image block from part two to see the accuracy we achieved with this method:

Unfortunately, it seems that using a pre-trained model doesn’t help in our case. This is most likely due to the lack of data used to optimize our classifier. It can be fixed by using more images as well as larger images.

Interestingly, removing the following piece of code:

for layer in conv_base.layers:
       layer.trainable = False

grants an increase of accuracy to 96%.

This implies that the data used to train the convolutional base does not “coincide” with our data, so the base has to adapt its own weights to our images. Furthermore, running it for 500 epochs, which would take about 3 hours, will raise the accuracy to 98%.

Next Time:

Next time, we will use our model to create a visual-based password cracker!

***GitHubCode***:
https://github.com/MoKillem/CatDogVanillaNeuralNetwork/blob/main/CNN_CATS_96%25.ipynb

Written by UnknownKnowns