Image Classification Based on Image Feature Descriptor and Bag of Visual Words
Abstract
In this project, we implemented an image classification method that uses the bag-of-visual-words (BOVW) model to turn scale-invariant feature transform (SIFT) descriptors into low-dimensional feature vectors. This allows high-resolution images to be classified with feature-vector-based machine learning methods such as support vector machines (SVMs). We evaluated the method by measuring its classification accuracy on a small high-resolution food image dataset. We then compared it with AlexNet, a deep convolutional neural network that takes the high-dimensional image directly as input and performs end-to-end classification. The best 5-class classification accuracy obtained with the SIFT-based method is close to that of AlexNet trained on the same dataset, supporting the validity of the approach, but substantially lower than that of a pre-trained AlexNet fine-tuned on the dataset. These results demonstrate the potential of image descriptors for low-cost, efficient image classification on small datasets.
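The core BOVW step converts a variable number of 128-dimensional SIFT descriptors per image into one fixed-length vector that a classifier such as an SVM can consume. It can be sketched roughly as follows. This is a minimal NumPy illustration, not the project's implementation. The codebook is learned with a plain k-means loop, and the function names (build_codebook, assign_words, bovw_histogram) and the codebook size k are hypothetical. Random arrays stand in for real SIFT descriptors, which in practice would come from a feature extractor such as OpenCV's SIFT. The SVM stage is omitted.

```python
import numpy as np


def assign_words(descriptors: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Return the index of the nearest visual word (centroid) for each descriptor."""
    # Squared Euclidean distances between every descriptor and every centroid: shape (N, k).
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def build_codebook(descriptors: np.ndarray, k: int, n_iter: int = 20, seed: int = 0) -> np.ndarray:
    """Cluster pooled local descriptors (N x D) into k visual words with plain k-means."""
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct descriptors chosen at random.
    centroids = descriptors[rng.choice(len(descriptors), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign_words(descriptors, centroids)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = members.mean(axis=0)
    return centroids


def bovw_histogram(descriptors: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Encode one image's descriptors as an L1-normalized histogram of visual words."""
    labels = assign_words(descriptors, centroids)
    hist = np.bincount(labels, minlength=len(centroids)).astype(float)
    return hist / hist.sum()


# Hypothetical stand-in data: four "images" with different numbers of 128-D keypoint descriptors.
rng = np.random.default_rng(1)
images = [rng.random((n, 128)) for n in (30, 50, 70, 40)]

# Learn the visual vocabulary from all descriptors pooled together.
codebook = build_codebook(np.vstack(images), k=8)

# Each image becomes a k-dimensional vector, regardless of how many keypoints it had.
features = np.array([bovw_histogram(d, codebook) for d in images])
print(features.shape)  # (4, 8)
```

The rows of features are the low-dimensional vectors that would be passed, together with class labels, to an SVM or another vector-based classifier. The codebook size k trades off vocabulary expressiveness against feature dimensionality.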