Vgg16 bottleneck features

Problem statement

A convolutional neural network (CNN) is a powerful tool for image classification. However, training a CNN from scratch to high accuracy takes many images (on the order of hundreds of thousands). For example, vgg16 is a convolutional neural network model that is explained in detail here. The model is trained on 14 million images to recognize an image as one of 1000 categories with an accuracy of 92.5%.

In the MemoTrek project, a CNN is trained to classify an image as a particular landscape category (such as desert, lake, skyscraper, etc.). There are 24 categories in total, but the number of training images is small (about 100 images per category). To achieve reasonably high accuracy with so few training images, a technique called "bottleneck features" was applied.

Strategy

We would like to take the generic image features already learned by vgg16 and repurpose them for our own CNN. Since vgg16 was trained on millions of images and is rather sophisticated, our CNN is in effect "standing on the shoulders of giants".

Bottleneck features are the last activation maps before the fully-connected layers in the vgg16 model. If we use the vgg16 model only up to (but not including) the fully-connected layers, it converts an input X (an image of size 224 x 224 x 3, for example) into an output Y of size 512 x 7 x 7. We then train a simple network of fully-connected layers with Y as input and the category labels Z as output.
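
To see where the 512 x 7 x 7 size comes from, here is a minimal sketch of the arithmetic. It relies on two facts about the standard vgg16 architecture: the convolutional part contains five 2 x 2 max-pooling layers, each halving the height and width, and the last convolutional block has 512 channels. The function name is only illustrative and is not part of the project code.

```python
def vgg16_bottleneck_shape(height, width):
    """Shape (channels, rows, cols) of the vgg16 output before the dense layers."""
    n_pools = 5      # vgg16 has five 2 x 2 max-pooling layers
    channels = 512   # number of filters in the last convolutional block
    scale = 2 ** n_pools  # each pooling layer halves height and width
    return (channels, height // scale, width // scale)


shape = vgg16_bottleneck_shape(224, 224)
print(shape)
```

So a 224 x 224 image is reduced by a factor of 32 along each side, which gives the 7 x 7 maps used as input to the small classifier.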

Code sample

I created a CNN named "model_load" that computes bottleneck features based on vgg16, and fed it the images in "train_data_dir". The input is an image of size 224 x 224 x 3, and the output is of size 512 x 7 x 7.

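# class_mode=None yields images only (no labels); shuffle=False keeps them in directory order
generator = datagen.flow_from_directory(train_data_dir, target_size=(img_width, img_height), batch_size=32, class_mode=None, shuffle=False)
# Run every training image through the truncated vgg16 to get its bottleneck features
bottleneck_features_train = model_load.predict_generator(generator, nb_train_samples)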
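
Because the generator above returns no labels, the labels for the bottleneck features have to be rebuilt separately. The sketch below shows one way this might be done. It assumes the features come out one class folder at a time in sorted folder order (which is how I understand flow_from_directory to behave with shuffle=False), and the helper name and per-class counts are hypothetical rather than taken from the project code.

```python
def one_hot_labels(samples_per_class):
    """Build one-hot labels for samples stored class by class, in order."""
    n_classes = len(samples_per_class)
    labels = []
    for class_index, count in enumerate(samples_per_class):
        row = [0] * n_classes
        row[class_index] = 1  # mark the class this run of samples belongs to
        labels.extend(list(row) for _ in range(count))
    return labels


# Toy example: 3 classes with 2, 1 and 3 images respectively
labels = one_hot_labels([2, 1, 3])
print(labels)
```

For the setup described here, the call would look more like one_hot_labels([100] * 24), giving one row of length 24 per training image.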

I then created a simple model named "model_train" that converts an input of size 512 x 7 x 7 into a categorical value using fully-connected layers.

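from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten

model_train = Sequential()
# Flatten each 512 x 7 x 7 bottleneck feature map into a single vector
model_train.add(Flatten(input_shape=train_data.shape[1:]))
model_train.add(Dense(256, activation='relu'))
model_train.add(Dropout(0.5))
# One softmax output per landscape category
model_train.add(Dense(nb_category, activation='softmax'))
# categorical_crossentropy is the loss that matches a softmax output;
# this assumes train_labels and validation_labels are one-hot encoded
model_train.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model_train.fit(train_data, train_labels,
                nb_epoch=nb_epoch, batch_size=32,
                validation_data=(validation_data, validation_labels))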
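
As a rough check on how small this classifier is, the sketch below counts its trainable weights. It assumes the 512 x 7 x 7 input and the 24 categories mentioned earlier; the helper name is illustrative only.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully-connected layer."""
    return n_in * n_out + n_out


flat = 512 * 7 * 7                       # size of the flattened bottleneck features
nb_category = 24                         # number of landscape categories
hidden = dense_params(flat, 256)         # Dense(256) layer
output = dense_params(256, nb_category)  # final softmax layer
total = hidden + output
print(flat, hidden, output, total)
```

Flatten and Dropout add no weights, so nearly all of the parameters sit in the first dense layer, and only these layers are trained while the vgg16 features stay fixed.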

The full code can be found here in the form of a Jupyter notebook.