Convolutional networks: Cats and Dogs

We will train a CNN that distinguishes between images of dogs and cats (binary classification). The dataset is available on Kaggle as a competition.

Download the dataset

The download runs on the Colab machine rather than over your local WiFi connection, so it should be fast.

import os
import tensorflow as tf
# The dataset is hosted online
origin = 'https://storage.googleapis.com/mledu-datasets/cats_and_dogs_filtered.zip'
path_to_zip = tf.keras.utils.get_file('cats_and_dogs.zip', origin=origin, extract=True)
path_to_folder = os.path.join(os.path.dirname(path_to_zip), 'cats_and_dogs_filtered')

Contents of the extracted zip:

cats_and_dogs_filtered
|__ train
    |______ cats: [cat.0.jpg, cat.1.jpg, cat.2.jpg ....]
    |______ dogs: [dog.0.jpg, dog.1.jpg, dog.2.jpg ...]
|__ validation
    |______ cats: [cat.2000.jpg, cat.2001.jpg, cat.2002.jpg ....]
    |______ dogs: [dog.2000.jpg, dog.2001.jpg, dog.2002.jpg ...]

The dataset is split into train and validation sets. We create variables that point to those directories.

train_dir = os.path.join(path_to_folder, 'train')
validation_dir = os.path.join(path_to_folder, 'validation')
train_cats_dir = os.path.join(train_dir, 'cats')
train_dogs_dir = os.path.join(train_dir, 'dogs')
validation_cats_dir = os.path.join(validation_dir, 'cats')
validation_dogs_dir = os.path.join(validation_dir, 'dogs')

We count the number of images.

num_cats_tr = len(os.listdir(train_cats_dir))
num_dogs_tr = len(os.listdir(train_dogs_dir))

num_cats_val = len(os.listdir(validation_cats_dir))
num_dogs_val = len(os.listdir(validation_dogs_dir))

total_train = num_cats_tr + num_dogs_tr
total_val = num_cats_val + num_dogs_val

print('Total training cat images:', num_cats_tr)
print('Total training dog images:', num_dogs_tr)
print('Total validation cat images:', num_cats_val)
print('Total validation dog images:', num_dogs_val)
print('---')
print("Total training images:", total_train)
print("Total validation images:", total_val)

There are 3000 images (2000 for training and 1000 for validation), and the dataset is balanced (the same number of dog and cat images).

Note: shell commands can be run in Colab (for example, !ls $train_cats_dir).

!ls $train_cats_dir 

We display some images.

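import matplotlib.pyplot as plt
# Draw each image on its own axes; two imshow calls on the same axes would leave only the last one visible
fig, axes = plt.subplots(1, 2)
axes[0].imshow(plt.imread(os.path.join(train_cats_dir, "cat.0.jpg")))
axes[1].imshow(plt.imread(os.path.join(train_cats_dir, "cat.1.jpg")))
plt.show()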

The images have different sizes. They must be resized to a common size before being fed into the neural network.

Data preprocessing

To preprocess the images, we will:

  • Read the images from disk.
  • Decode their contents and convert them to RGB.
  • Convert the values from integers to floating point (float).
  • Rescale the values to the range 0 to 1 (better suited to neural networks; this prevents possible overflows when multiplying by the weights).

All of the operations above are performed by the ImageDataGenerator class from the tf.keras package. It reads the images and stores them in arrays.

from tensorflow.keras.preprocessing.image import ImageDataGenerator
# Let's resize images to this size
IMG_HEIGHT = 150
IMG_WIDTH = 150
# Rescale the pixel values to range between 0 and 1
train_generator = ImageDataGenerator(rescale=1./255)
val_generator = ImageDataGenerator(rescale=1./255)
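
As a rough illustration of the rescaling step listed above, the sketch below applies the same 1/255 factor to a tiny made-up image stored as nested lists. The pixel values are hypothetical, and plain Python is used only to show the arithmetic; it is not how ImageDataGenerator is implemented internally.

```python
# Hypothetical 2x2 single-channel "image" with 8-bit integer pixel values
pixels = [[0, 64], [128, 255]]

# Same factor that is passed as rescale=1./255 above
rescale = 1.0 / 255

# Multiplying converts the integers to floats and maps 0..255 onto 0..1
scaled = [[value * rescale for value in row] for row in pixels]

for row in scaled:
    print([round(value, 3) for value in row])
```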

After defining the generators for training and validation images, the flow_from_directory method loads images from the disk, applies rescaling, and resizes the images to the required dimensions.

batch_size = 32 # Read a batch of 32 images at each step
train_data_gen = train_generator.flow_from_directory(batch_size=batch_size,
                                                     directory=train_dir,
                                                     shuffle=True,
                                                     target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                     class_mode='binary')
val_data_gen = val_generator.flow_from_directory(batch_size=batch_size,
                                                 directory=validation_dir,
                                                 target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                 class_mode='binary')
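
To get a feel for what the batch size implies, the following sketch works out how many batches make up one pass over the 2000 training images. It assumes the generator yields a smaller final batch when the dataset size is not a multiple of the batch size.

```python
import math

total_train = 2000   # training images, as counted earlier
batch_size = 32      # same value passed to flow_from_directory

# Number of batches needed to see every training image once
steps_per_epoch = math.ceil(total_train / batch_size)

# Assumption: the final batch holds whatever images are left over
last_batch_size = total_train - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch)   # 63
print(last_batch_size)   # 16
```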

We use the generators to display some images and their labels.

Next, we will extract a batch of images from the training generator, then plot several of them with matplotlib. The next function returns a batch from the dataset. Its return value has the form (x_train, y_train), where x_train holds the pixel values and y_train holds the labels.

image_batch, labels_batch = next(train_data_gen)
# The shape will be (32, 150, 150, 3)
# This means a list of 32 images, each of which is 150x150x3.
# The 3 at the end refers to the R,G,B color channels.
# A grayscale image would be (for example) 150x150x1
print(image_batch.shape)
# The shape (32,) means a list of 32 numbers
# each of these will either be 0 or 1
print(labels_batch.shape)
# This function will plot images returned by the generator
# in a grid with 1 row and 5 columns
def plot_images(images):
  fig, axes = plt.subplots(1, 5, figsize=(10,10))
  axes = axes.flatten()
  for img, ax in zip(images, axes):
      ax.imshow(img)
      ax.axis('off')
  plt.tight_layout()
  plt.show() 
plot_images(image_batch[:5])

Next, let's retrieve the labels. All images will be labeled either 0 or 1, since this is a binary classification problem.

# Here are the first 5 labels from the dataset
# that correspond to the images above
print(labels_batch[:5])
# Here, we can see that "0" maps to cat,
# and "1" maps to dog
print(train_data_gen.class_indices)
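
The mapping printed above follows from the subdirectory names. As a minimal sketch, assuming the class indices are assigned in alphanumeric order of the directory names, the same dictionary can be rebuilt by hand:

```python
# Subdirectory names found under train/, listed in arbitrary order
class_dirs = ['dogs', 'cats']

# Assumed rule: indices follow the alphanumerically sorted names
class_indices = {name: index for index, name in enumerate(sorted(class_dirs))}

print(class_indices)  # {'cats': 0, 'dogs': 1}
```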

Create the model

The model has 3 convolutional layers, each followed by max pooling. At the end there is a fully connected layer with 256 units. The output is a single value between 0 and 1 produced by a sigmoid activation function: if it is close to 1 the image is a dog, otherwise it is a cat.

from tensorflow.keras.layers import Conv2D, Dense, Flatten, MaxPooling2D
from tensorflow.keras.models import Sequential
model = Sequential([
    Conv2D(32, 3, padding='same', activation='relu', 
           input_shape=(IMG_HEIGHT, IMG_WIDTH ,3)),
    MaxPooling2D(),
    Conv2D(32, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Conv2D(64, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Flatten(),
    Dense(256, activation='relu'),
    Dense(1, activation='sigmoid')
])
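
To illustrate how the single sigmoid output turns into a class, here is a small sketch in plain Python. The input values z are hypothetical pre-activation values, and the 0.5 threshold is an assumption chosen for illustration (class 1 is dog, as in the class mapping shown earlier).

```python
import math

def sigmoid(z):
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_label(z, threshold=0.5):
    """Return 'dog' when the sigmoid output reaches the threshold, else 'cat'."""
    return 'dog' if sigmoid(z) >= threshold else 'cat'

for z in (-3.0, 0.2, 4.0):   # hypothetical pre-activation values
    print(z, round(sigmoid(z), 3), predict_label(z))
```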

We compile the model, selecting the Adam optimizer for gradient descent and binary cross entropy as the loss function (cross entropy roughly measures the distance between the network's prediction and the prediction we would like it to make).

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
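
As a rough sketch of what binary cross entropy measures, the function below computes the loss for a single example in plain Python. The clipping constant eps is an assumption added to avoid taking the logarithm of zero; the probabilities are made up for illustration.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Loss for one example; y_true is 0 or 1, y_pred is a probability."""
    p = min(max(y_pred, eps), 1.0 - eps)   # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

# The true label is 1 (dog) in all three cases
print(binary_crossentropy(1, 0.9))    # confident and right: small loss
print(binary_crossentropy(1, 0.4))    # slightly wrong: moderate loss
print(binary_crossentropy(1, 0.01))   # confident and wrong: large loss
```

The loss grows quickly as a wrong prediction becomes more confident, which is why it is a more sensitive signal than accuracy alone.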

We view a summary with the summary method:

model.summary()

Note that this model has about 5M parameters (or weights). The model is ready to train, using the outputs of the ImageDataGenerator instances created earlier.
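
The parameter count can be checked by hand. The sketch below assumes 3x3 kernels with 'same' padding, the default 2x2 max pooling (which halves each side, rounding down), and one bias per filter or unit, matching the model defined above. The helper names are introduced here for illustration only.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return (inputs + 1) * units

size = 150                        # IMG_HEIGHT == IMG_WIDTH
in_channels = 3                   # RGB input
total = 0
for filters in (32, 32, 64):      # the three Conv2D layers
    total += conv2d_params(3, in_channels, filters)
    in_channels = filters
    size //= 2                    # 2x2 max pooling: 150 -> 75 -> 37 -> 18

flat = size * size * in_channels  # length of the Flatten output
total += dense_params(flat, 256) + dense_params(256, 1)

print(flat, total)                # 20736 5337569
```

Almost all of the weights sit in the first Dense layer, which connects the 20736 flattened values to 256 units.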

Train the model

Use the fit method to train the network. You will train the model for 15 epochs (an epoch is one "sweep" over the training set, where each image is used once to perform a round of gradient descent and update the model's parameters). This will take one to two minutes, so let's start it now:

epochs = 15
history = model.fit(
    train_data_gen,
    epochs=epochs,
    validation_data=val_data_gen,
)

Inside model.fit, TensorFlow uses gradient descent to find useful values for all the weights in the model. When you create the model, the weights are initialized randomly, then gradually improved over time. The data generator is used to load batches of data off disk. Then, for each batch:

  • The model performs a forward pass (the images are classified by the network).
  • Then, the model performs a backward pass (the error is computed, then each weight is slightly adjusted using gradient descent to improve the accuracy on the next iteration).
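
The two steps above can be sketched on a toy problem. The example below is not the CNN from this tutorial: it uses a hypothetical one-weight logistic model, a made-up batch, and plain gradient descent (rather than Adam) so that one forward pass, one backward pass, and one update fit in a few lines.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical one-feature batch: (input x, binary label y)
batch = [(0.5, 1), (-1.0, 0), (2.0, 1), (-0.3, 0)]
w, b = 0.0, 0.0          # parameters (a real model starts from random values)
learning_rate = 0.5

def mean_loss(w, b):
    """Mean binary cross entropy over the batch."""
    return -sum(y * math.log(sigmoid(w * x + b)) +
                (1 - y) * math.log(1 - sigmoid(w * x + b))
                for x, y in batch) / len(batch)

loss_before = mean_loss(w, b)

# Forward pass: predictions for every example in the batch
preds = [sigmoid(w * x + b) for x, _ in batch]

# Backward pass: gradient of the mean loss with respect to w and b
grad_w = sum((p - y) * x for p, (x, y) in zip(preds, batch)) / len(batch)
grad_b = sum(p - y for p, (_, y) in zip(preds, batch)) / len(batch)

# Update: move each parameter a small step against its gradient
w -= learning_rate * grad_w
b -= learning_rate * grad_b

loss_after = mean_loss(w, b)
print(loss_before, loss_after)
```

After a single update the loss on this batch is lower than before, which is the behavior model.fit repeats for every batch and every epoch.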

Gradient descent is an iterative process. The longer you train the model, the more accurate it will become on the training set. But the more likely it is to overfit! That is, the model will begin to memorize the training images rather than learn patterns that enable it to generalize to new images not included in the training set.

  • We can see whether overfitting is present by comparing the accuracy on the training and validation data.

If you look at the accuracy figures reported above, you should see that training accuracy is over 90%, while validation accuracy is only around 70%.

Check for overfitting

Accuracy on the validation set is important: it helps you estimate how well the model is likely to work on new, unseen data in the future. To see how much overfitting is present (and when it occurs), we will create two plots, one for accuracy and another for loss. Roughly, loss (or error) is the inverse of accuracy (lower is better). Unlike accuracy, loss takes the confidence of a prediction into account (a confidently wrong prediction has a higher loss than one that is only slightly wrong).

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs_range = range(epochs)

plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()

Overfitting occurs when the validation loss stops decreasing. In this case, that occurs around epoch 5 (give or take). Your results may be slightly different each time you run this code (since the weights are initialized randomly).
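
The point where validation loss stops decreasing can also be located programmatically. The sketch below uses illustrative numbers that are not taken from a real training run; in the notebook, the list would come from history.history['val_loss'].

```python
# Illustrative values only, not taken from a real training run
val_loss = [0.69, 0.62, 0.58, 0.56, 0.55, 0.57, 0.61, 0.66]

# Epoch (0-based) with the lowest validation loss
best_epoch = min(range(len(val_loss)), key=lambda i: val_loss[i])

print(best_epoch, val_loss[best_epoch])
```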

Why does overfitting happen? When there are only a "small" number of training examples, the model sometimes learns from noise or unwanted details, to an extent that it negatively impacts the performance of the model on new examples. It means that the model will have a difficult time "generalizing" to a new dataset (making accurate predictions on images that weren't included in the training set).

Optional: reduce overfitting

Instructions

In this exercise, you will use data augmentation and dropout to improve your model. Follow along by reading and running the code below. There are two TODOs for you to complete, and a solution is given below.

Data augmentation

Overfitting occurs when there are a "small" number of training examples. One way to fix this problem is to increase the size of the training set, by gathering more data (the larger and more diverse the dataset, the better!)

We can also use a technique called "data augmentation" to increase the size of the training set, by generating new examples from existing ones by applying random transformations (for example, rotation) that yield believable-looking images.

This is especially effective when working with images. For example, our training set may only contain images of cats that are right side up. If our validation set contains images of cats that are upside down, our model may have trouble classifying them correctly. To help teach it that cats can appear in any orientation, we will randomly rotate images from our training set during training. This helps expose the model to more aspects of the data, and can lead to better generalization.

Data augmentation is built into the ImageDataGenerator. You can specify different transformations, and it will take care of applying them during training.

# Let's create new data generators, this time with 
# data augmentation enabled
train_generator = ImageDataGenerator(
                    rescale=1./255,
                    rotation_range=45,
                    width_shift_range=.15,
                    height_shift_range=.15,
                    horizontal_flip=True,
                    zoom_range=0.5
                    )
train_data_gen = train_generator.flow_from_directory(batch_size=32,
                                                     directory=train_dir,
                                                     shuffle=True,
                                                     target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                     class_mode='binary')

The next cell shows how the same training image appears under five different random augmentations.

augmented_images = [train_data_gen[0][0][0] for i in range(5)]
plot_images(augmented_images)
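
To make the idea of a random transformation concrete, here is a minimal sketch of one of them, a random horizontal flip, applied to a tiny made-up image stored as nested lists. The function name and the 0.5 flip probability are assumptions for illustration; this is not the ImageDataGenerator implementation.

```python
import random

def random_horizontal_flip(image, rng):
    """Mirror each row left-to-right with probability 0.5; return a new image."""
    if rng.random() < 0.5:
        return [row[::-1] for row in image]
    return [row[:] for row in image]

# Hypothetical 2x3 single-channel image
image = [[1, 2, 3],
         [4, 5, 6]]

rng = random.Random(0)   # seeded so the sketch is repeatable
for _ in range(3):
    print(random_horizontal_flip(image, rng))
```

Each call may return either the original or the mirrored image, so the model sees slightly different inputs from the same file across epochs.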

We only apply data augmentation to the training examples, so our validation generator looks the same as before.

val_generator = ImageDataGenerator(rescale=1./255)
val_data_gen = val_generator.flow_from_directory(batch_size=32,
                                                 directory=validation_dir,
                                                 target_size=(IMG_HEIGHT, IMG_WIDTH),
                                                 class_mode='binary')

Dropout

Another technique to reduce overfitting is to introduce dropout to the network. Dropout is a form of regularization that makes it more difficult for the network to memorize rare details (instead, it is forced to learn more general patterns).

When you apply dropout to a layer, it randomly drops out (sets to zero) a number of activations during training. Dropout takes a fractional number as its input value, such as 0.1, 0.2, or 0.4. This means dropping out 10%, 20%, or 40% of the output units randomly from the applied layer.

When applying 0.1 dropout to a certain layer, it randomly deactivates 10% of the output units on each training step (a new random mask is drawn for every batch, not once per epoch).
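
As a rough sketch of the mechanism, the function below applies dropout to a list of activations in plain Python. It assumes the common "inverted dropout" convention, in which the surviving activations are scaled by 1 / (1 - rate) so that their expected sum is unchanged; the function name and inputs are hypothetical.

```python
import random

def dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; scale the rest by 1 / (1 - rate)."""
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * keep_scale for a in activations]

activations = [1.0] * 10          # hypothetical layer output
rng = random.Random(0)            # seeded so the sketch is repeatable

# A fresh random mask is drawn on every call
print(dropout(activations, 0.2, rng))
print(dropout(activations, 0.2, rng))
```

Dropout is only active during training; at inference time the layer passes its input through unchanged.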

Create a new model using Dropout. You'll reuse the model definition from above, and add a Dropout layer.

from tensorflow.keras.layers import Dropout
# TODO: Your code here
# Create a new CNN that takes advantage of Dropout.
# 1) Reuse the model declared in tutorial above.
# 2) Add a new line that says "Dropout(0.2)," immediately
# before the line that says "Flatten()".

Solution

model = Sequential([
    Conv2D(32, 3, padding='same', activation='relu', 
           input_shape=(IMG_HEIGHT, IMG_WIDTH ,3)),
    MaxPooling2D(),
    Conv2D(32, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Conv2D(64, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Dropout(0.2),
    Flatten(),
    Dense(256, activation='relu'),
    Dense(1, activation='sigmoid')
])

After introducing dropout to the network, compile your model and view the layers summary. You should see a Dropout layer right before flatten.

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.summary()

Train your new model

Add code to train your new model. Previously, we trained for 15 epochs. You will need to train this new model for more epochs, as data augmentation and dropout make it more difficult for a CNN to memorize the training data (this is what we want!).

Here, you'll train this model for 25 epochs. This may take a few minutes, and you may need to train it for longer to reach peak accuracy. If you like, you can continue experimenting with that at home.

epochs = 25
# TODO: your code here
# Add code to call model.fit, using your new
# data generators with image augmentation
# For reference, see the "Train the model"
# section above

Solution

history = model.fit(
    train_data_gen,
    epochs=epochs,
    validation_data=val_data_gen,
)

Evaluate your new model

Finally, let's again create plots of accuracy and loss (we use these plots often in practice!). Now compare the loss and accuracy curves for the training and validation data. Were you able to achieve a higher validation accuracy than before? Note that even this model will eventually overfit. To prevent that, we use a technique called early stopping (we stop training when the validation loss is no longer decreasing).

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs_range = range(epochs)

plt.figure(figsize=(8, 8))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
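
The early stopping rule mentioned above can be sketched in plain Python. The function name, the patience value, and the loss values are all assumptions for illustration; in Keras, the tf.keras.callbacks.EarlyStopping callback (with its monitor and patience arguments) provides this behavior during model.fit.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the 0-based epoch at which training would stop.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs; otherwise it runs to the end.
    """
    best = float('inf')
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1

# Illustrative values only, not taken from a real training run
val_loss = [0.69, 0.62, 0.58, 0.56, 0.55, 0.57, 0.61, 0.66]
print(early_stopping_epoch(val_loss, patience=2))
```

A larger patience tolerates noisy validation curves at the cost of a few extra epochs of training.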