Bird Classification

Ryan Mathew and Kevin Khuu


Problem Description

The problem we were trying to solve was, given an image of a bird, identifying which species it belongs to. We needed to distinguish between 555 different bird species, including hawks, eagles, owls, ducks, geese, and many more, and to reach a reasonable accuracy (~80%) in our predictions. We trained and tested our model through a Kaggle competition.

Previous Work, Approach

Our approach was to utilize Convolutional Neural Networks (CNNs), transfer learning, data augmentation, and parameter fine-tuning. This involved initializing our network with a pretrained model as a feature extractor and then training it on our transformed/augmented birds dataset. Our initial code layout was from Professor Joseph Redmon’s tutorial on transfer learning applied to bird classification, which can be found here.

  1. We experimented with multiple pretrained models, including ResNet18, ResNet152 (limited epochs), MobileNetv2, EfficientNetb3, and densenet161, to find the best feature-extracting network for our problem. Because these models were trained on datasets like ImageNet, which contains over a million images spanning a thousand object categories, they already give us a strong feature extractor as a starting point. However, our problem is more specific than general object recognition. Since we were distinguishing between a large number of bird species, we believed the best pretrained models would be the ones that extract more features (a larger feature layer feeding the classifier). A fair number of the bird species in our dataset are similar in shape and body structure (especially the smaller birds, which are harder to tell apart), so it made sense that more features would let the model be more confident (return a higher likelihood) when predicting among similar-looking but distinct species. For example, if many features relate to the overall shape or structure of a bird (which can be shared across many species), additional features might capture finer characteristics such as beak length, color and color variation, or where parts like the eye sit on the body. We chose and compared models based on this assumption (model with more features => better results), which is also a good rule of thumb for many image classification tasks; a sketch of how we swap in a backbone and replace its classifier follows this list. Note: we were somewhat limited by the GPU usage policy of Google Colab (the environment we used) and could not use models with very large layers/feature counts.

  2. In addition to finding the best pretrained model, we updated the preexisting image-transform code to match the pretrained models' input specifications and to produce better results. We first resized the images to 256x256 and then, since the birds were typically near the center of the frame, cropped them to 224x224, the input size these models were trained to expect. Training images were also randomly flipped horizontally to reduce overfitting, and for some models we normalized the images. (The transform pipeline is sketched after this list.)

  3. Lastly, we fine-tuned our best models by adjusting hyperparameters. Specifically, we added learning-rate decay (halving the rate) starting at certain epochs, which meant implementing a scheduler; this accelerates training and reduces overfitting. We saw this in action in the model described by this bird classification article here. We didn’t want to decrease the learning rate too much, though, because that can hurt performance (see this article here). We also increased the number of epochs minimally and progressively as we switched to better models, giving them more time to train. (A scheduler sketch also follows this list.) Note: we were limited to about 10 epochs per model because of the GPU constraints mentioned above, and progressive iteration was necessary because training at higher epoch counts took a long time (~3 hours).
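
To make step 1 concrete, here is a minimal sketch (using torchvision; the exact code in our notebook differs slightly) of loading a pretrained backbone and replacing its final layer with a 555-way classifier:

```python
import torch.nn as nn
import torchvision.models as models

NUM_CLASSES = 555  # number of bird species in the Kaggle competition

# Load a backbone pretrained on ImageNet so it already extracts useful image features.
model = models.densenet161(pretrained=True)

# Swap the final classifier for a fresh 555-way layer; the pretrained layers
# are kept and fine-tuned on the birds dataset.
model.classifier = nn.Linear(model.classifier.in_features, NUM_CLASSES)

# For a ResNet variant the last layer is named `fc` instead:
# model = models.resnet18(pretrained=True)
# model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)
```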
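
The transforms from step 2 roughly follow the standard torchvision recipe. A sketch (the normalization constants shown are the usual ImageNet values, which only some of our runs used):

```python
import torchvision.transforms as transforms

IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

# Training transforms: resize to 256x256, crop to the 224x224 input the
# pretrained models expect, and randomly flip horizontally to reduce overfitting.
train_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.CenterCrop(224),      # birds are usually near the center of the frame
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
])

# Test transforms: same resize and crop, but no random flipping.
test_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
])
```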
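
The learning-rate decay from step 3 can be expressed with a standard PyTorch scheduler. A sketch, assuming SGD and a halving schedule (the exact cut-off epochs and decay factors varied by model, as listed under Results, and `train_one_epoch` is a hypothetical training helper):

```python
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR

# SGD starting at lr = 0.01; halve the learning rate every 5 epochs.
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = StepLR(optimizer, step_size=5, gamma=0.5)

# The EfficientNetb3 run instead dropped the rate by 10x at fixed epochs,
# which could be written as MultiStepLR(optimizer, milestones=[5, 8], gamma=0.1).

for epoch in range(10):                  # roughly our Colab-imposed epoch budget
    train_one_epoch(model, optimizer)    # hypothetical per-epoch training loop
    scheduler.step()                     # decay kicks in starting at the 5th epoch
```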

Results


ResNet18

  • Trained on the birds dataset for 5 epochs
  • Learning rate was set at a constant .01
  • Accuracy was the lowest of the models we tried
  • Served as a good baseline for the rest of our models
Losses:
ResNet18 Losses Graph

Test accuracy: 67.9%

MobileNetv2

  • Trained for 10 epochs
  • Scheduled learning rate to decrease by a factor of 2 from .01 starting at the 5th epoch
  • Outperformed ResNet18 with accuracy of 77.6%
Losses:
MobileNetv2 Losses Graph

Test accuracy: 77.6%


EfficientNetb3

  • Trained for 10 epochs
  • Scheduled learning rate to decrease from .01 to .001 after 5 epochs,
    then to .0001 after 3 more
  • Outperformed ResNet18 with an accuracy of 74.9%, but fell short of MobileNetv2
Losses:
EfficientNetb3 Losses Graph

Test accuracy: 74.9%

densenet161

  • Trained for 10 epochs
  • Scheduled learning rate to decrease by a factor of 2 from .01 starting at the 5th epoch
  • Achieved our best accuracy of 85.2%
Losses:
densenet161 Losses Graph

Test accuracy: 85.2% (Top 5 on leaderboards)

Datasets

The datasets we used were the ones provided by the Kaggle competition and can be found here.

Discussion

What problems did you encounter?

  1. Google Colab had a GPU usage limit, so we could not train for long stretches without saving a checkpoint and resuming once the GPU availability timer reset (a checkpointing sketch follows this list). This limited how long we could train and how many models we could test.

  2. We had some trouble figuring out how to adjust certain parameters (e.g., learning rate, number of epochs) to best fit the data without overfitting.
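
To work around the session limits, we relied on a checkpointing pattern along these lines (a sketch; the file path and saved fields are illustrative):

```python
import torch

CHECKPOINT_PATH = 'checkpoint.pth'  # illustrative path, e.g. on mounted Google Drive

def save_checkpoint(model, optimizer, scheduler, epoch):
    # Save everything needed to resume training after a Colab reset.
    torch.save({
        'epoch': epoch,
        'model': model.state_dict(),
        'optimizer': optimizer.state_dict(),
        'scheduler': scheduler.state_dict(),
    }, CHECKPOINT_PATH)

def load_checkpoint(model, optimizer, scheduler):
    # Restore model, optimizer, and scheduler state, then resume at the next epoch.
    state = torch.load(CHECKPOINT_PATH)
    model.load_state_dict(state['model'])
    optimizer.load_state_dict(state['optimizer'])
    scheduler.load_state_dict(state['scheduler'])
    return state['epoch'] + 1
```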

Are there next steps you would take if you kept working on the project?

  1. Since we were limited on time, we would like to take some of the models we tested further (more experimentation with parameters, more epochs). We would also like to try pretrained models with more features, such as RegNet and later versions of EfficientNet, to see whether our assumption still holds; Google Colab’s limits prevented us from doing this.

How does your approach differ from others? Was that beneficial?

  1. Our approach was to spread our time and effort across several different models rather than over-invest in one. We think this method worked out well, and our data supports it. For example, at around 5 epochs on some models, we noticed that the loss was already plateauing; this is visible in all of the loss plots, where the values level off near the end. This makes us believe that even 20 or 30 epochs per model would not have changed the accuracy and outcomes drastically. Additionally, looking across all of our models, the differences in accuracy were quite significant, with a range of 17.3%. It was therefore much more important to prioritize finding the model that best fit our problem than to optimize any single model.

Demo