PyTorch Transfer Learning in Practice: Classifying MNIST with VGG11


When classifying a new dataset, a common practice is to start from a pre-trained model such as AlexNet, VGG, or ResNet and modify the last few fully connected (fc) layers of the network so that the output matches the number of classes in the new dataset.

This works because a deep neural network learns a hierarchy of features during pre-training. The shallower layers learn low-level features such as edges, colors, and textures, which are largely invariant across different classification tasks; the last few layers learn high-level features, and this is where datasets differ. It is therefore common to modify the last few layers of VGG or ResNet and fine-tune the pre-trained model to classify the new dataset.

This post uses the VGG11 model that ships with PyTorch's torchvision for transfer learning, adapting the model to classify the MNIST dataset. First, it is worth printing the vgg11 model to inspect its structure.
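The summary below can be reproduced with a couple of lines (a minimal sketch; pretrained=True downloads the ImageNet weights on first call):

from torchvision import models

vgg11 = models.vgg11(pretrained=True)
print(vgg11)

The model summary is as follows: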

VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (11): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (12): ReLU(inplace=True)
    (13): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (14): ReLU(inplace=True)
    (15): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (16): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (17): ReLU(inplace=True)
    (18): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (19): ReLU(inplace=True)
    (20): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)

As the summary shows, vgg11 ends in a 1000-dimensional output. For MNIST, this needs to become a 10-dimensional output. There are two ways I know of to modify the model: the first is to modify a given VGG layer in place, such as changing the dimensions of a Linear layer or the channel count of a Conv2d; the second is to add layers on top of VGG, such as inserting an extra Conv+ReLU+MaxPool block before the fully connected layers.
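As a minimal preview of the first approach (a sketch, assuming the torchvision vgg11 shown above), replacing just the final Linear layer is a one-line change; the full code in section 1 below additionally resizes the other fc layers and the input conv:

import torch.nn as nn
from torchvision import models

vgg11 = models.vgg11(pretrained=True)
# swap the 1000-way ImageNet head for a 10-way MNIST head
vgg11.classifier[6] = nn.Linear(4096, 10)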

1. Modifying existing layers

This approach is relatively simple. If we only rework existing layers, for instance changing the number of neurons in the last fully connected layers so that the final output is 10-dimensional, the training code is as follows (the code below resizes the three Linear layers (0), (3), and (6) in the classifier, and also changes the input channels of the first convolutional layer):

import torch
from torchvision import models
from torchvision import datasets
from torchvision import transforms
import torch.nn as nn
from torch.optim import Adam

epochs = 1
lr = 0.0001
batch_size = 64

def build_model():
    vgg11 = models.vgg11(pretrained=True)
    # MNIST is single-channel while the original model expects 3-channel input,
    # so only the input dimension of the first conv layer is changed here
    vgg11.features[0] = nn.Conv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    # resize the last three fc layers: 25088 -> 4096 -> 1024 -> 10
    in_channel = 25088
    out_channel = [4096, 1024, 10]
    for i in range(7):
        if i % 3 == 0:  # classifier indices 0, 3, 6 are the Linear layers
            vgg11.classifier[i] = nn.Linear(in_channel, out_channel[i // 3])
            in_channel = out_channel[i // 3]
    return vgg11

model = build_model()

transform = transforms.Compose([
    transforms.Resize(224),  # upsample 28x28 MNIST digits to VGG's 224x224 input
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5])
])
train_data = datasets.MNIST(root='./data/', train=True, transform=transform, download=True)
test_data = datasets.MNIST(root='./data/', train=False, transform=transform, download=True)
train_loader = torch.utils.data.DataLoader(dataset=train_data, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_data, batch_size=1, shuffle=True)

optim = Adam(model.parameters(), lr=lr, betas=(0.5, 0.99))
criterion = nn.CrossEntropyLoss()

for epoch in range(epochs):
    for i, (img, y) in enumerate(train_loader):
        pred_y = model(img)
        loss = criterion(pred_y, y)
        optim.zero_grad()
        loss.backward()
        optim.step()
        if i % 1 == 0:  # logs every iteration; raise the modulus to log less often
            print('epoch:[{}/{}] loss:{}'.format(epoch, epochs, loss))

torch.save(model.state_dict(), './vgg_mnist.pth')

Training for just one epoch, the resulting model reached 97.55% classification accuracy on the test set.
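The evaluation loop is not shown above; a minimal sketch that computes this test-set accuracy, using the test_loader and test_data defined in the code above, might look like:

model.eval()
correct = 0
with torch.no_grad():
    for img, y in test_loader:
        pred = model(img).argmax(dim=1)
        correct += (pred == y).sum().item()
print('test accuracy: {:.2%}'.format(correct / len(test_data)))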

In the training code above, gradient descent updates the parameters of every layer: the pre-trained layers as well as the newly replaced fully connected layers and input conv layer. To train only the newly modified layers and freeze the pre-trained ones, add the following right after vgg11 is created inside build_model, before the layers are replaced (the replacement layers are created afterwards, so they keep requires_grad=True):

for param in vgg11.parameters():
    param.requires_grad = False

This way, the parameters of the untouched layers are not updated and keep their pre-trained values; during training, only the parameters of the modified layers are updated, i.e. the last three fc layers and the input conv layer.
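As an optional extra step (not in the original code), the optimizer can also be restricted to the trainable parameters, so Adam does not keep moment buffers for the frozen weights:

optim = Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=lr, betas=(0.5, 0.99))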

2. Changing the model structure

This approach is more involved. First, define a class identical to the VGG class in the torchvision source code, then modify the model's structure (its forward method) inside that class.

Moreover, the pre-trained parameters can no longer be loaded directly. In the first method, we started from a model identical to the stock vgg11, and the pre-trained parameters were loaded the moment vgg11 was instantiated; the layers were modified only after the parameters had been loaded, so there was no risk of a dimension mismatch (a modified layer) or a nonexistent layer (a deleted layer). In this second method, because a new model class is defined, the model structure has already changed before the pre-trained parameters are loaded, so we have to filter out the parameters that can still be loaded.

How do we decide which pre-trained parameters are still usable? First, obtain the original vgg11 state dict from the model file and call it pretrained_dict; after instantiating the newly defined model model, take its state_dict and call it model_dict. Then check, for each key in pretrained_dict, whether it also appears in model_dict. If it does not, the layer that key belongs to no longer exists in the new model, so its weights are not loaded. If the key exists but the shape of its value differs from the corresponding layer in the new model, that layer has been modified, so its pre-trained weights cannot be loaded either.
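In code, the whole filter boils down to one dict comprehension (this is exactly what transfer_state_dict below implements). Note that model.load_state_dict(pretrained_dict, strict=False) alone would tolerate missing and unexpected keys, but would still raise on shape mismatches, which is why the explicit shape check is needed:

filtered = {k: v for k, v in pretrained_dict.items()
            if k in model_dict and v.shape == model_dict[k].shape}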

The training code is as follows (admittedly somewhat messy, so please bear with me):

import torch
from torchvision import datasets
from torchvision import transforms
import torch.nn as nn
from torch.optim import Adam
from torch.hub import load_state_dict_from_url

epochs = 1
lr = 0.0001
batch_size = 64

model_urls = {
    'vgg11': 'https://download.pytorch.org/models/vgg11-bbd30ac9.pth',
}

class VGG(nn.Module):
    def __init__(self, features, extra, num_classes=10, init_weights=True):
        super(VGG, self).__init__()
        self.features = features
        self.extra = extra
        # the extra block halves 7x7 to 3x3; adaptive pooling restores 7x7
        # so the classifier input stays 512 * 7 * 7 = 25088
        self.avgpool = nn.AdaptiveAvgPool2d((7, 7))
        self.classifier = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Dropout(),
            nn.Linear(4096, num_classes),
        )
        if init_weights:
            self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        # wire the newly added layers into the computation graph
        x = self.extra(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                nn.init.constant_(m.bias, 0)

def make_layers(cfg, batch_norm=False):
    layers = []
    in_channels = 1  # MNIST is single-channel
    for v in cfg:
        if v == 'M':
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            if batch_norm:
                layers += [conv2d, nn.BatchNorm2d(v), nn.ReLU(inplace=True)]
            else:
                layers += [conv2d, nn.ReLU(inplace=True)]
            in_channels = v
    return nn.Sequential(*layers)

# add two Conv layers, two ReLUs, and one MaxPool before the fc layers of the original VGG11
def extra_layers(cfg):
    layer = []
    in_channel = 512
    extra1 = nn.Conv2d(in_channel, cfg[0], kernel_size=3, padding=1)
    extra2 = nn.Conv2d(cfg[0], cfg[1], kernel_size=3, padding=1)
    extra3 = nn.MaxPool2d(kernel_size=2, stride=2)
    layer += [extra1, nn.ReLU(inplace=True)]
    layer += [extra2, nn.ReLU(inplace=True)]
    layer += [extra3]
    return nn.Sequential(*layer)

cfgs = {
    'A': [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M'],
}
cfgs_extra = {
    'A': [512, 512],
}

def _vgg(arch, cfg, pretrained, progress, **kwargs):
    if pretrained:
        kwargs['init_weights'] = False
    model = VGG(make_layers(cfgs[cfg]), extra_layers(cfgs_extra[cfg]), **kwargs)
    return model

# load parameters for the unchanged layers, filtering out pre-trained
# parameters that belong to modified layers
def transfer_model(model, arch, progress=True):
    pretrained_dict = load_state_dict_from_url(model_urls[arch], progress=progress)
    model_dict = model.state_dict()
    pretrained_dict = transfer_state_dict(pretrained_dict, model_dict)
    model_dict.update(pretrained_dict)
    model.load_state_dict(model_dict)
    return model

# drop pre-trained parameters whose layers have changed; keep only entries
# whose key and shape still match the current model
def transfer_state_dict(pretrained_dict, model_dict):
    state_dict = {}
    for k, v in pretrained_dict.items():
        if k in model_dict.keys() and v.shape == model_dict[k].shape:
            state_dict[k] = v
        else:
            print("Missing key(s) in state_dict :{}".format(k))
    return state_dict

def vgg11(pretrained=False, progress=True, **kwargs):
    r"""VGG 11-layer model (configuration "A") from
    `"Very Deep Convolutional Networks For Large-Scale Image Recognition" <https://arxiv.org/pdf/1409.1556.pdf>`_

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _vgg('vgg11', 'A', pretrained, progress, **kwargs)

def build_model():
    vgg = vgg11(pretrained=True)
    vgg = transfer_model(vgg, 'vgg11')
    return vgg

model = build_model()

transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5])
])
train_data = datasets.MNIST(root='./data/', train=True, transform=transform, download=True)
test_data = datasets.MNIST(root='./data/', train=False, transform=transform, download=True)
train_loader = torch.utils.data.DataLoader(dataset=train_data, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_data, batch_size=1, shuffle=True)

optim = Adam(model.parameters(), lr=lr, betas=(0.5, 0.99))
criterion = nn.CrossEntropyLoss()

for epoch in range(epochs):
    for i, (img, y) in enumerate(train_loader):
        pred_y = model(img)
        loss = criterion(pred_y, y)
        optim.zero_grad()
        loss.backward()
        optim.step()
        if i % 1 == 0:
            print('epoch:[{}/{}] loss:{}'.format(epoch, epochs, loss))

torch.save(model.state_dict(), './vgg_mnist.pth')
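With the definitions above, only three pre-trained tensors should fail the filter's shape check (features.0.weight changes from 3 to 1 input channels, and classifier.6 from 1000 to 10 outputs; the new extra.* layers are simply absent from the pre-trained dict and keep their random initialization), so when build_model() runs the filter should print something like:

Missing key(s) in state_dict :features.0.weight
Missing key(s) in state_dict :classifier.6.weight
Missing key(s) in state_dict :classifier.6.bias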

The results on the test set were dismal. This perhaps confirms why transfer learning usually only touches the final fc layers, but it still serves as one way to change the structure of a pre-trained model.


Author: Moon_0911


