
Dog Breed Classification: Transfer Learning with ResNet152 + Inception v3 + Xception Three-Model Ensemble

[Figure: Kaggle competition ranking]

As part of a University of Oxford machine learning course, we entered the Kaggle dog breed classification competition: given a photo of a dog, the model must identify its breed from among 120 categories. We used transfer learning with a three-model ensemble (ResNet152 + Inception v3 + Xception), ultimately reaching 91% classification accuracy on the test set and a Kaggle score of 0.30508 (the competition's multi-class log loss, where lower is better).


The data distribution is as follows:

[Figure: Dog breed data distribution]
Data distribution: sample counts are roughly evenly distributed across the 120 dog breeds, so no class-imbalance handling is required.
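To check the balance claim on one's own copy of the data, per-class counts are enough. The sketch below assumes the breed labels have already been loaded into a Python list (for example from the breed column of the competition's labels.csv); the helper name class_balance_summary is hypothetical and the toy labels are made up.

```python
from collections import Counter


def class_balance_summary(labels):
    """Summarize per-class sample counts for a list of breed labels."""
    counts = Counter(labels)
    smallest = min(counts.values())
    largest = max(counts.values())
    return {
        "num_classes": len(counts),
        "min_count": smallest,
        "max_count": largest,
        # Largest class divided by smallest class; values near 1 mean balanced
        "imbalance_ratio": largest / smallest,
    }


if __name__ == "__main__":
    # Toy labels; real ones would come from the competition's label file
    toy_labels = ["beagle"] * 80 + ["pug"] * 100 + ["collie"] * 90
    print(class_balance_summary(toy_labels))
```
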

1. The Iterative Path from Single Model to Three-Model Ensemble

The most instructive part of this project is not the final solution itself but the process of approaching it step by step over four rounds of iteration:

Round 1: Starting with a Single Model (Failure)

We directly used a pretrained ResNet50 / EfficientNet-B0, replaced the final fully connected layer, and started training. Validation accuracy came out far lower than training accuracy, a classic sign of overfitting. A single backbone's feature vector (2048-dimensional for ResNet50, 1280-dimensional for EfficientNet-B0) was insufficient to capture the fine-grained differences across 120 breeds.

Round 2: Freezing Parameters + Data Augmentation (Partial Improvement)

We froze the ResNet50 convolutional weights and trained only the final FC layer, combined with RandomCrop, ColorJitter, and RandomHorizontalFlip data augmentation. Overfitting was somewhat alleviated, but classification of complex images remained unstable.

Round 3: Dual-Model Feature Fusion (Continuous Progress)

We introduced Inception v3 and ResNet152 to extract features in parallel, concatenating the two models' 2048-dimensional feature vectors into a single 4096-dimensional vector. Inception's multi-scale features complemented ResNet's deep features and accuracy improved markedly, though training time doubled and some redundant features led to renewed overfitting.
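The fusion step itself is just a concatenation along the feature dimension followed by a linear layer. In this sketch random tensors stand in for the two backbones' pooled outputs, and the DualFeatureHead class name is hypothetical.

```python
import torch
import torch.nn as nn


class DualFeatureHead(nn.Module):
    """Classifier over concatenated features from two frozen backbones."""

    def __init__(self, dim_a=2048, dim_b=2048, num_classes=120):
        super().__init__()
        self.fc = nn.Linear(dim_a + dim_b, num_classes)  # 4096 -> 120

    def forward(self, feat_a, feat_b):
        # feat_a: (N, 2048) from ResNet152, feat_b: (N, 2048) from Inception v3
        fused = torch.cat((feat_a, feat_b), dim=1)  # (N, 4096)
        return self.fc(fused)


if __name__ == "__main__":
    head = DualFeatureHead()
    resnet_feats = torch.randn(4, 2048)     # stand-in for ResNet152 output
    inception_feats = torch.randn(4, 2048)  # stand-in for Inception v3 output
    print(head(resnet_feats, inception_feats).shape)  # torch.Size([4, 120])
```

Concatenation keeps each backbone's features intact and lets the linear layer learn how much weight to give each one; the cost is a head whose input is twice as wide.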

[Figure: Xception network architecture]
Xception architecture: Entry Flow → Middle Flow (repeated ×8) → Exit Flow, producing a final 2048-dimensional feature vector

Round 4: Three-Model Ensemble + Dynamic Learning Rate (Best)

Final solution: ResNet152, Inception v3, and Xception run in parallel, each with frozen pretrained weights, and their features are concatenated into a 6144-dimensional vector (3 × 2048) that feeds the final fully connected layer:

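import torch
import torch.nn as nn
import timm
from torchvision import models


class CombinedResNetInception(nn.Module):
    def __init__(self, n_class):
        super().__init__()  # must run before any submodule is assigned

        # Load three ImageNet-pretrained models
        self.resnet = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
        self.inception = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1)
        # num_classes=0 removes timm's classifier, so the model returns pooled 2048-d features
        # (newer timm releases may list this model as 'legacy_xception')
        self.xception = timm.create_model('xception', pretrained=True, num_classes=0)

        # Remove torchvision classification heads, retain feature extractors
        self.resnet.fc = nn.Identity()
        self.inception.fc = nn.Identity()
        # Drop Inception's auxiliary head so it returns a plain tensor in train mode too
        self.inception.aux_logits = False
        self.inception.AuxLogits = None

        # Freeze all backbone weights (BatchNorm running stats still update in train mode)
        for backbone in (self.resnet, self.inception, self.xception):
            for param in backbone.parameters():
                param.requires_grad = False

        # Concatenate 2048×3 = 6144 dims → n_class (120) logits; the only trainable layer
        self.fc = nn.Linear(2048 * 3, n_class)

    def forward(self, x):
        # x: (N, 3, 299, 299)
        x_resnet = self.resnet(x)          # (N, 2048)
        x_inception = self.inception(x)    # (N, 2048)
        x_xception = self.xception(x)      # (N, 2048)
        x = torch.cat((x_resnet, x_inception, x_xception), dim=1)  # (N, 6144)
        return self.fc(x)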

Training used the Adam optimizer (lr=0.0001) with ReduceLROnPlateau (patience=2, factor=0.5) for dynamic learning rate adjustment, plus early stopping to end training automatically once validation accuracy stopped improving.
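The optimizer and scheduler wiring can be sketched as follows, assuming a recent PyTorch. Two details matter: only parameters with requires_grad=True are passed to Adam, and mode="max" is needed because the monitored metric is accuracy (the default mode="min" suits a loss). The helper name and the accuracy sequence are made up for illustration, and a bare nn.Linear stands in for the model's trainable head.

```python
import torch
import torch.nn as nn


def make_optimizer_and_scheduler(model):
    # Only the still-trainable parameters (the final FC head) go to Adam
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(params, lr=1e-4)
    # mode="max": a higher validation accuracy counts as an improvement
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="max", factor=0.5, patience=2
    )
    return optimizer, scheduler


if __name__ == "__main__":
    head = nn.Linear(6144, 120)  # stand-in for the trainable part of the ensemble
    optimizer, scheduler = make_optimizer_and_scheduler(head)
    # Made-up per-epoch validation accuracies: one improvement, then a plateau
    for val_acc in [0.50, 0.60, 0.60, 0.60, 0.60]:
        scheduler.step(val_acc)  # called once per epoch, after validation
        print(val_acc, optimizer.param_groups[0]["lr"])
```

With patience=2, the learning rate in this run stays at 1e-4 through two non-improving epochs and is halved to 5e-5 on the third.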


2. Training Strategy

  • Transfer Learning: All three models pre-trained on ImageNet, feature extraction layers frozen, only the final FC classification head trained
  • Data Augmentation: RandomResizedCrop(299) + RandomHorizontalFlip + Normalize (Inception v3 and Xception expect a fixed 299×299 input)
  • Loss Function: CrossEntropyLoss
  • Learning Rate Scheduling: ReduceLROnPlateau (patience=2, factor=0.5) halves the learning rate once validation accuracy has failed to improve for more than 2 consecutive epochs, i.e. on the third epoch without improvement
  • Early Stopping: Training stops when validation accuracy fails to improve for multiple consecutive epochs
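The early-stopping rule is easy to express in plain Python. This sketch assumes validation accuracy is the monitored metric; the class name, the patience value, and the accuracy sequence are hypothetical, since the original only says "multiple consecutive epochs".

```python
class EarlyStopping:
    """Signal a stop when a higher-is-better metric stops improving."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience      # hypothetical value; not given in the original
        self.min_delta = min_delta    # minimum gain that counts as an improvement
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, metric):
        """Record one epoch's metric; return True when training should stop."""
        if metric > self.best + self.min_delta:
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


if __name__ == "__main__":
    stopper = EarlyStopping(patience=3)
    val_accs = [0.80, 0.85, 0.86, 0.86, 0.85, 0.86, 0.90]  # made-up values
    for epoch, acc in enumerate(val_accs, start=1):
        if stopper.step(acc):
            print(f"Stopping after epoch {epoch}; best accuracy {stopper.best:.2f}")
            break
```

In this made-up run training stops after epoch 6 with a best accuracy of 0.86. Choosing an early-stopping patience larger than the scheduler's patience=2 gives the halved learning rate a chance to help before training ends; a common companion, not described in the original, is saving the weights from the best epoch.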

3. Results and Takeaways

The final model achieved the following on the 120-class classification task:

Metric            | Value
------------------|--------
Test set accuracy | 91%
Average loss      | 0.009
Kaggle score      | 0.30508
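The two headline numbers measure different things. Kaggle scores this competition with multi-class log loss (lower is better), which rewards well-calibrated probabilities, whereas accuracy only checks the top-1 prediction. Below is a minimal sketch of the metric, assuming the common Kaggle convention of clipping probabilities to [1e-15, 1 - 1e-15] and renormalizing each row; the function name is hypothetical.

```python
import math


def multiclass_log_loss(probs, labels, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    probs:  list of per-image probability lists (one entry per breed)
    labels: list of true class indices
    """
    total = 0.0
    for row, label in zip(probs, labels):
        # Clip to avoid log(0), then renormalize the row to sum to 1
        clipped = [min(max(p, eps), 1 - eps) for p in row]
        row_sum = sum(clipped)
        total += -math.log(clipped[label] / row_sum)
    return total / len(labels)


if __name__ == "__main__":
    probs = [[0.9, 0.1], [0.2, 0.8]]  # toy two-class predictions
    labels = [0, 1]
    print(round(multiclass_log_loss(probs, labels), 4))
```

The toy example gives about 0.164. A model can be right most of the time and still lose points here when it assigns low probability to the true breed on hard images or is overconfident when wrong.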

Starting from almost no background in neural networks, we ended up building a three-model ensemble pipeline and achieving a respectable Kaggle result. The core takeaway of this project is a method: keep iterating in the face of failure. Each failure shows what the model still lacks, and each improvement moves one step closer to the best solution. This debugging mindset applies not only to deep learning but to engineering problem-solving in general.

Tags: Portfolio
Author: 月儿
Date: August 31, 2024
