In this exercise, we’ll build a convolutional neural network (CNN) for image classification. This is one of the original and more straightforward uses of CNNs; more complex applications, such as object detection and image segmentation, build on the same foundations. But all of these methods are based around two basic operations: convolution, which applies small filters to the image to detect local patterns such as edges and curves, and pooling, which reduces the resolution so that later layers respond to larger structures.
In general, these steps are repeated several times. As this progresses, the small shapes identified in the first set of convolutions are progressively combined into larger structures. For example, a series of small curves or lines could be aggregated into a cat’s eye.
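To make these two operations concrete, here is a minimal sketch in base R (the toy image and kernel values are made up purely for illustration, and are not part of the lab code): a 3x3 kernel is slid over a small 6x6 image, and the result is then max-pooled with a 2x2 window.
## Toy 6x6 "image" and a 3x3 kernel (values chosen purely for illustration)
set.seed(1)
img    <- matrix(runif(36), nrow = 6, ncol = 6)
kernel <- matrix(c(-1, -1, -1,
                   -1,  8, -1,
                   -1, -1, -1), nrow = 3, byrow = TRUE)

## Convolution: multiply each 3x3 patch by the kernel and sum (4x4 output)
conv_out <- matrix(0, 4, 4)
for (i in 1:4) {
  for (j in 1:4) {
    conv_out[i, j] <- sum(img[i:(i + 2), j:(j + 2)] * kernel)
  }
}

## Max pooling: keep the maximum of each non-overlapping 2x2 block (2x2 output)
pool_out <- matrix(0, 2, 2)
for (i in 1:2) {
  for (j in 1:2) {
    pool_out[i, j] <- max(conv_out[(2 * i - 1):(2 * i), (2 * j - 1):(2 * j)])
  }
}
pool_out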
The basic idea behind image classification is to link features of an image to a single label or class. For example, we might have a photograph of a cat with the label Cat, and one of a dog with the label Dog. The goal of the model is to identify which shapes and colors, and which combinations of these, can help differentiate between the two classes.
We’ll use a dataset from Kaggle (https://www.kaggle.com/moltean/fruits) containing over 90,000 images of fruits and vegetables. We’ll just be using a subset of these data: you’ll need to download the zip file fruits.zip from GitHub. I’d suggest making a new folder for this lab and moving the zip file there once it is downloaded.
Once you have unzipped the data, take a look in the fruits folder. This is already set up in the standard way for image classification. There are two high-level folders, Training and Test, each of which contains a subset of the images. Not too surprisingly, the first will be used to train the model, and the second to test it. Within each of these, there is a set of folders, one per class of fruit. The names of these folders are used as the labels for the images, and are what TensorFlow will use.
Note that this is a specific folder format for image classification. Other layouts are possible, for example if you are working with continuous outcomes.
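For reference, here is a sketch of the folder layout that Keras expects (only a few of the 16 class folders are shown; the exact file names will depend on the zip you downloaded):
fruits/
  Training/
    Banana/
      0_100.jpg
      ...
    Kiwi/
    ...
  Test/
    Banana/
    ...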
The TensorFlow and Keras libraries were first designed to be used in Python. There has been quite a lot of work recently to allow access from R, mainly using the reticulate package, which allows exchanges between R and Python. You do still need a version of Python for this to work, but this can be installed directly from R.
First, install the R keras package:
install.packages('keras')
Once this step is complete, load the library and run the install_keras() function. This will install a local version of Python, as well as download all the necessary add-ons to build deep learning models:
library(keras)
install_keras()
This can take a few minutes, but should all load without problems. If you do get error messages, please let us know so we can try to fix them.
Let’s start, as usual, by loading the libraries we’ll need for the lab:
library(dplyr)
library(tidyr)
library(keras)
library(ggplot2)
Now set the path to the folder containing the training images you downloaded. If you’ve copied these to your datafiles folder, this will look something like this:
train_image_files_path = "./datafiles/fruits/Training/"
If you have any questions about setting this path, please ask.
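If you want to double-check the path before going further, a quick (optional) sanity check is to confirm that the folder exists and list a few of the class folders it contains:
dir.exists(train_image_files_path)
head(list.files(train_image_files_path))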
You can visualize any of the images using the imager package (you’ll need to install this):
library(imager)
im = load.image(paste0(train_image_files_path, "Banana/0_100.jpg"))
plot(im)
You can try other images by changing the folder name and filename. Note that these are somewhat idealized images, with a blank white background.
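As a quick check of the resolution, you can also look at the dimensions of the loaded image. For imager, dim() returns the width, height, depth (number of frames) and number of color channels:
dim(im)  ## should be 100 100 1 3: a 100x100 image with 3 color channels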
Next, we’ll define the classes that we are going to process. There are 16 different types of fruit in the subset we are using, making this a multi-class classification problem.
fruit_list = c("Kiwi", "Banana", "Apricot", "Avocado",
"Cocos", "Clementine", "Mandarine", "Orange",
"Limes", "Lemon", "Peach", "Plum",
"Raspberry", "Strawberry", "Pineapple", "Pomegranate")
# store the number of classes
output_n = length(fruit_list)
The original images are 100x100 pixels. Ideally we’d use these at their full resolution, but as this is an example, we’ll reduce the resolution to 20x20 to make this a bit faster to run, so we set the target size here. We’ll use these values (stored in target_size) to define the input tensor shape in the network. (A good follow-up test would be to increase this and see how much it impacts the predictions.)
img_width = 20
img_height = 20
target_size = c(img_width, img_height)
The other dimension we need for image processing is the image depth. These are RGB images with three color channels. The input tensors then will be rank 4, with shape (\(n\), 20, 20, 3), where \(n\) is the number of images.
channels = 3
The last parameter we’ll set here is the batch size. This is the same parameter that we have used before to control the rate at which the network weights are updated. We’ll also use this to control the number of images that are loaded into memory at any step. This is very useful if you’re working on a computer with limited memory (like my old laptop).
batch_size = 32
Keras has several functions to facilitate working with images. We’ll start by creating an image generator. This acts a bit like a pipeline and will carry out various pre-processing steps. These include data augmentation: simple transformations of the images to supplement the originals. We’re not going to use that here, but some example code is given in the appendix to illustrate how you might use this.
We’ll create a generator for the training images. This will rescale each channel to a 0-1 range (the RGB channels have values between 0 and 255), and it will hold aside 30% of the training images for validation. We’ll use this to check for overfitting during the training process.
train_data_gen = image_data_generator(
rescale = 1/255,
validation_split = 0.3)
The next function we’ll use is a flow function. This function controls how Keras will read in the images for any training step (i.e. any update of the network weights). There are several arguments here:
- train_image_files_path: The path to the top-level folder containing the training images
- train_data_gen: The image data generator
- subset: The subset of images to use from the generator for training. As we set validation_split to 0.3, this will be 1 - 0.3 = 0.7, or 70% of the images
- target_size: The size for rescaling each image
- class_mode: The type of label used (this will one-hot encode the labels of the images)
- classes: The set of categories to use. This is the list we defined earlier and needs to match the subfolder names. If this is not included, all subfolders will be used to create the list of labels
- batch_size: The number of images to import for any update step
- seed: A value to initialize the random number generator (this is only there to ensure consistent results)

train_image_array_gen = flow_images_from_directory(train_image_files_path,
train_data_gen,
subset = 'training',
target_size = target_size,
class_mode = "categorical",
classes = fruit_list,
batch_size = batch_size,
seed = 42)
## Found 5401 images belonging to 16 classes.
The function will tell you how many images (and classes) it found in the folder you defined. If this is 0, go back and check the folder path you defined earlier. We’ll also create a flow for the validation images. The only difference here is in the definition of the subset.
valid_image_array_gen = flow_images_from_directory(train_image_files_path,
train_data_gen,
subset = 'validation',
target_size = target_size,
class_mode = "categorical",
classes = fruit_list,
batch_size = batch_size,
seed = 42)
## Found 2308 images belonging to 16 classes.
Note that these flow generators contain various useful bits of information. For example, to check the number of images (we’ll also use this number when training the model):
train_samples = train_image_array_gen$n
valid_samples = valid_image_array_gen$n
print(paste(train_samples, valid_samples))
## [1] "5401 2308"
Or you can get the number of images per class (type of fruit):
cat("Number of images per class:")
## Number of images per class:
table(factor(train_image_array_gen$classes))
##
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
## 327 343 345 299 343 343 343 336 343 345 345 313 343 345 343 345
This suggests that the dataset is relatively well balanced. The generator also contains various other information about the files: resolution, channels, etc.
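For example, the generator stores the mapping between these integer codes and the fruit labels (note that Keras numbers the classes from 0). This attribute should be available directly on the generator object:
train_image_array_gen$class_indices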
Let’s now set up the model. As this is quite a complex model, we’ll do this as a series of steps rather than in one go.
First, create a template sequential model
model = keras_model_sequential()
Next we add the first hidden layer. This is a convolutional layer, where we’ll create 16 filters (or convolutions) based on the original images, with a 3x3 kernel. We’ll pad the output of this layer so that it has the same size as the input (padding = "same"). Note that we also need to define the size of the input tensors (width, height and channels).
model %>%
layer_conv_2d(filter = 16, kernel_size = c(3,3), padding = "same", input_shape = c(img_width, img_height, channels))
We’ll then take the output of this layer and pass it through a ReLU activation function (this could have been included directly in the convolutional layer, but keeping it separate allows a little more control over the process):
model %>%
layer_activation("relu")
Now, we add a max-pooling layer. As a reminder, this reduces the resolution of the output from the previous layer by a simple filter, forcing the next layer of the network to focus on larger image features. We’ll also add a dropout layer. This is a form of regularization: during training it randomly sets a fraction of the layer’s outputs to 0 (i.e. they make no contribution to the model), which can reduce overfitting.
model %>%
layer_max_pooling_2d(pool_size = c(2,2)) %>%
layer_dropout(0.25)
Let’s add another convolutional layer, this time with 32 filters, and pass this through a different activation function (a leaky ReLU):
model %>%
layer_conv_2d(filter = 32, kernel_size = c(3,3), padding = "same") %>%
layer_activation_leaky_relu(0.5)
We’ll take the output of this function and normalize it. Batch normalization adjusts the outputs for each batch so that their mean is close to zero and their variation is reduced to roughly unit scale. This helps avoid gradient problems caused by very small or very large values:
model %>%
layer_batch_normalization()
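Conceptually, for each batch the layer standardizes its inputs before passing them on (it also learns a scale and shift for each filter, which is where the extra parameters we’ll see in the model summary come from). A minimal sketch of the core calculation, using some made-up activation values:
x <- c(0.2, 1.5, -0.7, 3.1)           ## made-up activations from one batch
(x - mean(x)) / sqrt(var(x) + 1e-5)   ## standardized to roughly zero mean, unit variance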
And we’ll run the output of this through a max-pooling function with dropout:
model %>%
layer_max_pooling_2d(pool_size = c(2,2)) %>%
layer_dropout(0.25)
Now we’ll add layers to connect the output of this last max-pooling step to the output (the fruit classes). The first thing we need to do is to flatten the output. The output of the max-pooling is a tensor of shape (5, 5, 32). The size of 5 is a result of the two max-pooling operations, and the 32 is the number of filters from the second convolution. The layer_flatten() function will flatten this into a rank 1 tensor of shape (800).
model %>%
layer_flatten()
Next, we’ll pass this flattened layer through a dense layer, with a ReLU activation and dropout:
model %>%
layer_dense(100) %>%
layer_activation("relu") %>%
layer_dropout(0.5)
Finally, we need to output predictions. As this is a multiclass task, the final layer needs to have the same number of nodes as classes (16). This is passed through a softmax activation function. This transforms the predictions for all classes into probabilities (i.e. they have to sum to 1).
model %>%
layer_dense(output_n) %>%
layer_activation("softmax")
More practically, we’ll create a function that will build the model in one go. This will allow us to easily create new versions for testing.
create_model = function() {
model <- keras_model_sequential() %>%
# add convolution layer
layer_conv_2d(filter = 16, kernel_size = c(3,3), padding = "same",
input_shape = c(img_width, img_height, channels)) %>%
layer_activation("relu") %>%
# Use max pooling
layer_max_pooling_2d(pool_size = c(2,2)) %>%
layer_dropout(0.25) %>%
# Second hidden layer
layer_conv_2d(filter = 32, kernel_size = c(3,3), padding = "same") %>%
layer_activation_leaky_relu(0.5) %>%
layer_batch_normalization() %>%
# Use max pooling
layer_max_pooling_2d(pool_size = c(2,2)) %>%
layer_dropout(0.25) %>%
# Flatten and feed into dense layer
layer_flatten() %>%
layer_dense(100) %>%
layer_activation("relu") %>%
layer_dropout(0.5) %>%
# Outputs
layer_dense(output_n) %>%
layer_activation("softmax")
return(model)
}
Let’s take a look at the whole thing:
model = create_model()
summary(model)
## Model: "sequential_1"
## ________________________________________________________________________________
## Layer (type) Output Shape Param # Trainable
## ================================================================================
## conv2d_3 (Conv2D) (None, 20, 20, 16) 448 Y
## activation_5 (Activation) (None, 20, 20, 16) 0 Y
## max_pooling2d_3 (MaxPooling2 (None, 10, 10, 16) 0 Y
## D)
## dropout_5 (Dropout) (None, 10, 10, 16) 0 Y
## conv2d_2 (Conv2D) (None, 10, 10, 32) 4640 Y
## leaky_re_lu_1 (LeakyReLU) (None, 10, 10, 32) 0 Y
## batch_normalization_1 (Batch (None, 10, 10, 32) 128 Y
## Normalization)
## max_pooling2d_2 (MaxPooling2 (None, 5, 5, 32) 0 Y
## D)
## dropout_4 (Dropout) (None, 5, 5, 32) 0 Y
## flatten_1 (Flatten) (None, 800) 0 Y
## dense_3 (Dense) (None, 100) 80100 Y
## activation_4 (Activation) (None, 100) 0 Y
## dropout_3 (Dropout) (None, 100) 0 Y
## dense_2 (Dense) (None, 16) 1616 Y
## activation_3 (Activation) (None, 16) 0 Y
## ================================================================================
## Total params: 86932 (339.58 KB)
## Trainable params: 86868 (339.33 KB)
## Non-trainable params: 64 (256.00 Byte)
## ________________________________________________________________________________
Our model has a little under 87,000 parameters or weights to train (hence the need for a lot of images). Note that there is a small set of non-trainable parameters from the batch normalization layer.
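You can check these counts by hand: a convolutional layer has (kernel height x kernel width x input channels + 1 bias) weights per filter, a dense layer has (number of inputs + 1 bias) weights per node, and batch normalization adds 4 values per filter (2 trainable, 2 not). A quick sketch of the arithmetic:
(3 * 3 * 3 + 1) * 16              ## first convolutional layer: 448
(3 * 3 * 16 + 1) * 32             ## second convolutional layer: 4640
4 * 32                            ## batch normalization: 128 (64 trainable)
(5 * 5 * 32 + 1) * 100            ## dense layer: 80100
(100 + 1) * 16                    ## output layer: 1616
448 + 4640 + 128 + 80100 + 1616   ## total: 86932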
The next step is to compile the model. This adds an optimization function, a loss function and a performance metric. A good option for the loss function is categorical_crossentropy, which measures the mismatch between the predicted probability distribution over the categories and the observed labels, and we’ll use the RMSprop optimizer, which updates the weights through backpropagation. The metric is the error between the predicted class and observed class aggregated across all samples. Accuracy is the simplest of these for classification exercises and gives the proportion of all images that are correctly classified.
model %>% compile(
loss = "categorical_crossentropy",
optimizer = optimizer_rmsprop(),
metrics = "accuracy"
)
We’ll now train the model for 20 epochs. As we are using a data generator function to supply the images and labels to the model, we pass the generator directly to the fit() function we have previously used (older versions of Keras required a separate fit_generator() function for this). We specify:

- train_image_array_gen: The generator of the training samples
- steps_per_epoch: The number of update steps in each epoch (usually just number of samples / batch size)
- epochs: The number of full training iterations
- validation_data: The generator of the validation samples
- validation_steps: The number of validation steps per epoch

This takes a few minutes to train (on my laptop). It’s worth remembering what is going on here: the algorithm is reading in batches of 32 images, rescaling them, updating the model weights through backpropagation, and then repeating the whole thing 20 times. As we previously defined a separate validation set (and image generator), this routine will calculate two losses:

- The training loss: how accurately the model can predict the images that are being used to update the weights
- The validation loss: how accurately the model can predict a set of training images that are not used in updating the weights
As the model continues to train, you should see the loss (the cross-entropy) decrease for both of these, although it will likely stabilize at a certain point. The accuracy should (hopefully) increase over time.
epochs = 20
hist <- model %>% fit(
train_image_array_gen,
steps_per_epoch = as.integer(train_samples / batch_size),
epochs = epochs,
validation_data = valid_image_array_gen,
validation_steps = as.integer(valid_samples / batch_size)
)
Now plot the evolution of the loss function and performance metric:
plot(hist)
The plot shows a steep decline in both loss values, but no real improvement beyond epoch 3. It’s quite likely that this model has overfit: it has become too closely tuned to the training data to predict new images well. There are a variety of ways we can avoid this, but a simple one is to slow the rate at which the model learns. We’ll refit the model, but will add a learning_rate parameter to the compilation step.
model = create_model()
model %>% compile(
loss = "categorical_crossentropy",
optimizer = optimizer_rmsprop(learning_rate = 0.0001),
metrics = "accuracy"
)
hist <- model %>% fit(
train_image_array_gen,
steps_per_epoch = as.integer(train_samples / batch_size),
epochs = epochs,
validation_data = valid_image_array_gen,
validation_steps = as.integer(valid_samples / batch_size)
)
plot(hist)
The plot shows a good evolution of both the loss and performance metrics.
We’ll now evaluate this model on the set of test images. Normally, we’d retrain the model using the full training set (including the validation set), but to save time, we’ll just proceed with the existing model. In order to do this, we first need to create a new image generator for the test images:
test_image_files_path <- "./datafiles/fruits/Test/"
test_data_gen <- image_data_generator(rescale = 1/255)
test_generator <- flow_images_from_directory(
test_image_files_path,
test_data_gen,
target_size = target_size,
class_mode = "categorical",
classes = fruit_list,
batch_size = 1,
shuffle = FALSE,
seed = 42)
## Found 2592 images belonging to 16 classes.
The only differences here from the previous code are that we use the test image folder, and we no longer specify a validation parameter. Now we can use this to evaluate the model using evaluate():
results <- model %>%
evaluate(test_generator,
steps = as.integer(test_generator$n))
## 2592/2592 - 17s - loss: 0.2086 - accuracy: 0.9325 - 17s/epoch - 7ms/step
print(results)
## loss accuracy
## 0.2086322 0.9324846
This gives us an accuracy of about 0.93, which makes this a very good classifier. It is worth noting that this is partly because the images have all been cleaned and prepared; achieving this level of accuracy with images taken ‘in the wild’ would require much more work in setting up and training the model.
We’ll finish by obtaining predictions for the set of test images, and building a confusion matrix based on these. Predictions for new samples can be obtained using the predict() function (older versions of Keras used a separate predict_generator() function for generators). Note that if you were predicting for completely new images, you would need to make a new generator for these. We’ll start by resetting the image generator (this just ensures that everything will align in the output), then obtain the predictions:
test_generator$reset()
predictions <- model %>%
predict(
test_generator,
steps = as.integer(test_generator$n)
)
## 2592/2592 - 3s - 3s/epoch - 1ms/step
For each image, there is the predicted probability of each class:
predictions[1,]
## [1] 9.984205e-01 1.300145e-07 1.360483e-10 8.630380e-05 1.362863e-03
## [6] 7.293721e-14 5.064761e-08 1.394456e-08 2.414268e-05 1.028145e-04
## [11] 1.220951e-06 1.428428e-09 5.182119e-12 5.544225e-09 1.891201e-06
## [16] 4.449551e-09
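Each row is a probability distribution over the 16 classes, so it should sum to (approximately) 1, and the most likely class for this first image is the column with the largest value (here column 1, i.e. Kiwi):
sum(predictions[1, ])        ## approximately 1
which.max(predictions[1, ])  ## column with the highest probability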
To get the predicted labels, we simply need to find the column with the highest probability. We can use R’s max.col() function for this:
pred_class <- max.col(predictions)
The test generator stores the observed labels for the test set, so let’s extract these and store them as obs_class:
obs_class <- test_generator$classes
We can now make a confusion matrix between the observed and predicted classes:
pred_table <- table(obs_class, pred_class)
Note that the row and column labels are different: R’s max.col() numbers the predicted classes from 1, while Keras numbers the observed classes from 0. We can simply replace both with the fruit labels:
rownames(pred_table) <- colnames(pred_table) <- fruit_list
pred_table
| | Kiwi | Banana | Apricot | Avocado | Cocos | Clementine | Mandarine | Orange | Limes | Lemon | Peach | Plum | Raspberry | Strawberry | Pineapple | Pomegranate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Kiwi | 156 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Banana | 0 | 166 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Apricot | 0 | 0 | 68 | 0 | 0 | 0 | 0 | 96 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Avocado | 0 | 0 | 0 | 143 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Cocos | 0 | 0 | 0 | 0 | 166 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Clementine | 0 | 0 | 0 | 0 | 0 | 166 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Mandarine | 0 | 0 | 0 | 0 | 0 | 0 | 166 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Orange | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 160 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Limes | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 166 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Lemon | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 164 | 0 | 0 | 0 | 0 | 0 | 0 |
Peach | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 37 | 0 | 0 | 119 | 0 | 0 | 0 | 0 | 8 |
Plum | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 151 | 0 | 0 | 0 | 0 |
Raspberry | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 166 | 0 | 0 | 0 |
Strawberry | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 164 | 0 | 0 |
Pineapple | 0 | 0 | 0 | 0 | 34 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 132 | 0 |
Pomegranate | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 164 |
This shows why we got such a high accuracy: nearly all the images are correctly classified. Let’s finish by plotting this result. Here, we convert the confusion matrix into a long data frame, and calculate the number of images in each observed class. We then use this to calculate the percentage correctly identified, and use ggplot2’s geom_tile() to plot this out. Note that you will need the reshape2 package here:
library(reshape2)
##
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
##
## smiths
pred_df <- melt(pred_table, value.name = "count") %>%
group_by(obs_class) %>%
mutate(n = sum(count)) %>%
ungroup()
p <- pred_df %>%
filter(count > 0) %>%
mutate(percentage_pred = count / n * 100) %>%
ggplot(aes(x = obs_class, y = pred_class,
fill = percentage_pred,
label = round(percentage_pred, 2))) +
geom_tile() +
#scale_fill_continuous() +
scale_fill_gradient(low = "blue", high = "red") +
geom_text(color = "white") +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
labs(x = "True class",
y = "Predicted class",
fill = "Percentage\nof predictions",
title = "True v. predicted class labels",
subtitle = "Percentage of test images predicted for each label")
print(p)
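As a final sanity check, the overall accuracy can be recovered directly from the confusion matrix: the correct predictions sit on the diagonal, so dividing their sum by the total number of test images should reproduce the value reported by evaluate() (about 0.93).
sum(diag(pred_table)) / sum(pred_table)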
As mentioned earlier, this appendix gives an example of an image generator that will perform data augmentation. In each epoch, each image is transformed according to a set of random modifications. The parameters here set limits on the amount of transformation that any method will carry out. For example, images will be randomly rotated by an amount between -40 and +40 degrees.
train_data_gen <- image_data_generator(
rescale = 1/255,
rotation_range=40, ## Random rotation (+/- 40 degrees)
width_shift_range=0.2, ## Random horizontal shift (+/- proportion of image size)
height_shift_range=0.2, ## Random vertical shift (+/- proportion of image size)
shear_range=0.2, ## Shear angle in counter-clockwise direction in degrees
zoom_range=0.2, ## Zoom range. Zooms by 1 +/- 0.2 from original
horizontal_flip=TRUE, ## Randomly flips 50% of images
fill_mode='nearest', ## How to fill newly created pixels (nearest neighbor)
validation_split = 0.3)