`model.eval()` still trains the model! `model.eval()` merely switches Batch Normalization and Dropout into their inference behaviour (for example, BN uses its fixed `running_mean` at test time instead of the batch statistics); every other module keeps training exactly as it would otherwise. (Reference: 【pytorch系列】model.eval()用法详解)
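Before the experiments, here is a minimal sketch (a toy `nn.Sequential`, not one of the models used in the experiments below) of what `.eval()` actually changes: it flips each submodule's `training` flag and nothing more, so `requires_grad` on the parameters is left untouched:

```python
import torch.nn as nn

net = nn.Sequential(nn.Linear(8, 4), nn.BatchNorm1d(4), nn.ReLU(), nn.Dropout(0.5))

net.eval()  # only flips every submodule's `training` flag to False
for name, m in net.named_modules():
    if name:  # skip the top-level container itself
        print(name, type(m).__name__, 'training =', m.training)

# The parameters themselves are untouched: they still require grad,
# so an optimizer step after backward() will still update them.
for name, p in net.named_parameters():
    print(name, 'requires_grad =', p.requires_grad)
```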
To check whether this conclusion is correct, I ran two experiments.
Experiment 1: with the model in `.eval()` mode, do its weights still change during training?
```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

model = resnet50(pretrained=True)
model = nn.DataParallel(model.cuda())

# `dataloader` and `loss` are defined elsewhere in the training script
for imgs, tgts in dataloader:
    model.eval()
    print('# BN.weight:\n', model.module.bn1.weight[0])
    print('# BN.runing_mean:\n', model.module.bn1.running_mean[0])
    print('# BN.requires_grad:\n', model.module.bn1.weight.requires_grad)
    try:
        print('# BN.grad:\n', model.module.bn1.weight.grad[0])
    except:  # grad is None before the first backward()
        print('')
    print('\n')

    out = model(imgs)
    l = loss(out, tgts)
    l.backward()
    ....
```
Result:

```
# BN.weight:
 tensor(0.2482, device='cuda:0', grad_fn=<SelectBackward>)
# BN.runing_mean:
 tensor(0.0002, device='cuda:0')
# BN.requires_grad:
 True
# BN.grad:
 tensor(0.6201, device='cuda:0')

# BN.weight:         <-- [changed]
 tensor(0.2479, device='cuda:0', grad_fn=<SelectBackward>)
# BN.runing_mean:    <-- [unchanged]
 tensor(0.0002, device='cuda:0')
# BN.requires_grad:  <-- [see, the gradient is there]
 True
# BN.grad:           <-- [changed]
 tensor(0.1732, device='cuda:0')
```
Experiment 2: in an actual training run, how does training the model with `.eval()` affect the final recognition accuracy?
Base code: train on the MNIST data normally with `.train()`. Reference code: zhengyima/mnist-classification
```python
import os
import torch
import torch.nn as nn
import torch.nn.functional as functional
import torch.utils.data as Data
import torchvision

# torch.manual_seed(1)    # reproducible

# Hyper Parameters
EPOCH = 1                 # train the training data n times; to save time, we just train 1 epoch
BATCH_SIZE = 256
LR = 0.001                # learning rate
DOWNLOAD_MNIST = False

# Mnist digits dataset
if not(os.path.exists('./mnist/')) or not os.listdir('./mnist/'):
    # not mnist dir or mnist is empty dir
    DOWNLOAD_MNIST = True

train_data = torchvision.datasets.MNIST(
    root='./mnist/',
    train=True,                                     # this is training data
    transform=torchvision.transforms.ToTensor(),    # converts a PIL.Image or numpy.ndarray to
                                                    # torch.FloatTensor of shape (C x H x W) in range [0.0, 1.0]
    download=DOWNLOAD_MNIST,
)

# plot one example
print(train_data.train_data.size())      # (60000, 28, 28)
print(train_data.train_labels.size())    # (60000)

# Data Loader for easy mini-batch return in training; the image batch shape will be (256, 1, 28, 28)
train_loader = Data.DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)

# pick 2000 samples to speed up testing
test_data = torchvision.datasets.MNIST(root='./mnist/', train=False)
test_x = torch.unsqueeze(test_data.test_data, dim=1).type(torch.FloatTensor)[:2000]/255.   # shape from (2000, 28, 28) to (2000, 1, 28, 28), value in range (0, 1)
test_y = test_data.test_labels[:2000]


class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(         # input shape (1, 28, 28)
            nn.Conv2d(
                in_channels=1,              # input height
                out_channels=16,            # n_filters
                kernel_size=5,              # filter size
                stride=1,                   # filter movement/step
                padding=2,                  # keep the same width and height after Conv2d: padding=(kernel_size-1)/2 if stride=1
            ),                              # output shape (16, 28, 28)
            nn.ReLU(),                      # activation
            nn.MaxPool2d(kernel_size=2),    # choose max value in 2x2 area, output shape (16, 14, 14)
        )
        self.conv2 = nn.Sequential(         # input shape (16, 14, 14)
            nn.Conv2d(16, 32, 5, 1, 2),     # output shape (32, 14, 14)
            nn.ReLU(),                      # activation
            nn.MaxPool2d(2),                # output shape (32, 7, 7)
        )
        ###########################################
        # .eval() only affects BN or Dropout
        ###########################################
        self.bn = nn.BatchNorm1d(32)
        ###########################################
        self.out = nn.Linear(32, 10)        # fully connected layer, output 10 classes

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        f = functional.adaptive_avg_pool2d(x, (1, 1))
        x = f.view(f.size(0), -1)           # flatten the pooled features to (batch_size, 32)
        ###########################################
        # .eval() only affects BN or Dropout
        ###########################################
        x = self.bn(x)
        ###########################################
        output = self.out(x)
        return output, x                    # return x for visualization


cnn = CNN().cuda()
print(cnn)    # net architecture

optimizer = torch.optim.Adam(cnn.parameters(), lr=LR)    # optimize all cnn parameters
loss_func = nn.CrossEntropyLoss()                        # the target label is not one-hotted

# training and testing
for epoch in range(EPOCH):
    cnn.train()
    # cnn.eval()
    for step, (b_x, b_y) in enumerate(train_loader):    # gives batch data, normalize x when iterating train_loader
        b_x = b_x.cuda()
        b_y = b_y.cuda()
        output = cnn(b_x)[0]                 # cnn output
        loss = loss_func(output, b_y)        # cross entropy loss
        optimizer.zero_grad()                # clear gradients for this training step
        loss.backward()                      # backpropagation, compute gradients
        optimizer.step()                     # apply gradients

        if step % 50 == 0:
            test_output, last_layer = cnn(test_x.cuda())
            pred_y = torch.max(test_output, 1)[1].data.cpu().numpy()
            accuracy = float((pred_y == test_y.data.cpu().numpy()).astype(int).sum()) / float(test_y.size(0))
            print('Epoch: ', epoch, '| train loss: %.4f' % loss.data.cpu().numpy(), '| test accuracy: %.2f' % accuracy)

# print 10 predictions from test data
test_output, _ = cnn(test_x[:10].cuda())
pred_y = torch.max(test_output, 1)[1].data.cpu().numpy()
print(pred_y, 'prediction number')
print(test_y[:10].numpy(), 'real number')
```
All of the experiments below are modifications of this base code.
First comparison: `.train()` vs `.eval()` results.

| Method | Accuracy |
|---|---|
| With BN + `.train()` | 96% |
| With BN + `.eval()` | 63% |
Observation: with BN in the network, training under `.eval()` gives a much worse result than training under `.train()`, presumably because in eval mode BN normalizes with its running statistics, which are never updated, instead of the batch statistics.
With BN + `.train()`
```python
# training and testing
....
for epoch in range(EPOCH):
    cnn.train()
    # cnn.eval()
    ......
```
```
Epoch: 0 | train loss: 2.3722 | test accuracy: 0.24
Epoch: 0 | train loss: 0.6760 | test accuracy: 0.89
Epoch: 0 | train loss: 0.3923 | test accuracy: 0.94
Epoch: 0 | train loss: 0.2812 | test accuracy: 0.95
Epoch: 0 | train loss: 0.2167 | test accuracy: 0.96   <-- [ good ]
[7 2 1 0 4 1 4 9 5 9] prediction number
[7 2 1 0 4 1 4 9 5 9] real number
```
With BN + `.eval()`
```python
......
for epoch in range(EPOCH):
    # cnn.train()
    cnn.eval()
    ......
```
```
Epoch: 0 | train loss: 2.3212 | test accuracy: 0.09
Epoch: 0 | train loss: 2.1845 | test accuracy: 0.23
Epoch: 0 | train loss: 1.8448 | test accuracy: 0.33
Epoch: 0 | train loss: 1.4609 | test accuracy: 0.48
Epoch: 0 | train loss: 1.3265 | test accuracy: 0.63   <-- [ worse ]
[7 2 1 0 4 1 4 8 2 8] prediction number
[7 2 1 0 4 1 4 9 5 9] real number
```
Second comparison: without Batch Normalization in the model, compare `.train()` vs `.eval()`.

| Method | Accuracy |
|---|---|
| No BN + `.train()` | 57% |
| No BN + `.eval()` | 58% |
Observation: without BN, training under `.eval()` gives almost the same result as training under `.train()`. This shows that `.eval()` still trains the model!
No BN + `.train()`
```python
def forward(self, x):
    x = self.conv1(x)
    x = self.conv2(x)
    f = functional.adaptive_avg_pool2d(x, (1, 1))
    x = f.view(f.size(0), -1)   # flatten the pooled features to (batch_size, 32)
    ###########################################
    # .eval() only affects BN or Dropout
    ###########################################
    # x = self.bn(x)   # after commenting out .bn(), there is no BN layer in the network
    ###########################################
    output = self.out(x)
    return output, x            # return x for visualization
```

```python
......
for epoch in range(EPOCH):
    cnn.train()
    # cnn.eval()
    ......
```
```
Epoch: 0 | train loss: 2.3226 | test accuracy: 0.11
Epoch: 0 | train loss: 2.1843 | test accuracy: 0.26
Epoch: 0 | train loss: 1.8668 | test accuracy: 0.39
Epoch: 0 | train loss: 1.4856 | test accuracy: 0.49
Epoch: 0 | train loss: 1.2870 | test accuracy: 0.57
[7 2 1 0 4 1 4 9 2 7] prediction number
[7 2 1 0 4 1 4 9 5 9] real number
```
No BN + `.eval()`
```python
def forward(self, x):
    x = self.conv1(x)
    x = self.conv2(x)
    f = functional.adaptive_avg_pool2d(x, (1, 1))
    x = f.view(f.size(0), -1)   # flatten the pooled features to (batch_size, 32)
    ###########################################
    # .eval() only affects BN or Dropout
    ###########################################
    # x = self.bn(x)   # after commenting out .bn(), there is no BN layer in the network
    ###########################################
    output = self.out(x)
    return output, x            # return x for visualization
```

```python
......
for epoch in range(EPOCH):
    # cnn.train()
    cnn.eval()
    ......
```
```
Epoch: 0 | train loss: 2.3099 | test accuracy: 0.11
Epoch: 0 | train loss: 2.2231 | test accuracy: 0.18
Epoch: 0 | train loss: 1.9195 | test accuracy: 0.35
Epoch: 0 | train loss: 1.6008 | test accuracy: 0.48
Epoch: 0 | train loss: 1.4178 | test accuracy: 0.58
[7 2 1 0 4 1 7 4 2 9] prediction number
[7 2 1 0 4 1 4 9 5 9] real number
```
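Taken together, the two experiments show that `.eval()` by itself never stops learning. If what you actually want is the usual train/evaluate split, the conventional pattern (a sketch built on the base code above, not something tested in the experiments) is to call `.train()` before each training pass and `.eval()` together with `torch.no_grad()` only while evaluating:

```python
for epoch in range(EPOCH):
    cnn.train()                        # BN/Dropout behave in training mode, weights get updated
    for b_x, b_y in train_loader:
        b_x, b_y = b_x.cuda(), b_y.cuda()
        loss = loss_func(cnn(b_x)[0], b_y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    cnn.eval()                         # BN uses running stats, Dropout is disabled
    with torch.no_grad():              # additionally stops gradient tracking during evaluation
        test_output, _ = cnn(test_x.cuda())
        pred_y = torch.max(test_output, 1)[1].cpu().numpy()
```

Note that `.eval()` alone only changes the forward behaviour of BN and Dropout; it is `torch.no_grad()` that prevents gradients from being built during evaluation.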
Setting the learning rate to 0 keeps the model from being updated, although the gradients are still computed as usual.
```python
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision.models import resnet50

model = resnet50(pretrained=True)
model = nn.DataParallel(model.cuda())

### Set LR to 0
optimizer = optim.Adam(model.parameters(), lr=0)

# `dataloader` and `loss` are defined elsewhere in the training script
for imgs, tgts in dataloader:
    model.eval()
    print('# BN.weight:\n', model.module.bn1.weight[0])
    print('# BN.runing_mean:\n', model.module.bn1.running_mean[0])
    print('# BN.requires_grad:\n', model.module.bn1.weight.requires_grad)
    try:
        print('# BN.grad:\n', model.module.bn1.weight.grad[0])
    except:  # grad is None before the first backward()
        print('')
    print('\n')

    out = model(imgs)
    l = loss(out, tgts)
    l.backward()
    ....
```
```
# BN.weight:
 tensor(0.2486, device='cuda:0', grad_fn=<SelectBackward>)
# BN.runing_mean:
 tensor(0.0002, device='cuda:0')
# BN.requires_grad:
 True
# BN.grad:
 tensor(0.5255, device='cuda:0')

# BN.weight:         <-- [unchanged]
 tensor(0.2486, device='cuda:0', grad_fn=<SelectBackward>)
# BN.runing_mean:    <-- [unchanged]
 tensor(0.0002, device='cuda:0')
# BN.requires_grad:
 True
# BN.grad:           <-- [changed]
 tensor(0.3794, device='cuda:0')
```
Setting only the BN modules to `.eval()` mode

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

def set_bn_to_train(m):
    classname = m.__class__.__name__
    if classname.find('BatchNorm2d') != -1:
        m.train()

def set_bn_to_eval(m):
    classname = m.__class__.__name__
    if classname.find('BatchNorm2d') != -1:
        m.eval()

model = resnet50(pretrained=True)
model = nn.DataParallel(model.cuda())

m = model.module._modules['layer4']
for name, param in m.named_parameters():
    param.requires_grad = False
m.apply(set_bn_to_eval)   # remember to set the BN modules to eval mode
```
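A quick way to confirm the trick worked (a small illustrative check, not part of the original post) is to walk over `layer4` after `apply()` and print each BatchNorm's `training` flag:

```python
model.train()              # the rest of the network stays in training mode
m.apply(set_bn_to_eval)    # freeze only the BN statistics inside layer4
for name, sub in m.named_modules():
    if isinstance(sub, nn.BatchNorm2d):
        print(name, 'training =', sub.training)   # expected: False for every BN in layer4
```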