[ pytorch ] Basic Usage | 12. Training Tips |


Table of Contents

    • Some findings
      • 1. `model.eval()` still trains the model!
      • 2. Setting the learning rate to 0 keeps the model from training
      • 3. When freezing modules, remember to set `BN` modules to `.eval()` mode





Some findings

1. model.eval() still trains the model!

model.eval() actually only switches the mode of Batch Normalization and Dropout layers (for example, BN must use its fixed running_mean at test time); every other module keeps training exactly as before! Reference: 【pytorch系列】model.eval()用法详解
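
A minimal sketch (my own addition, not from the original post) that makes this concrete: `.eval()` only flips each submodule's `training` flag; it does not touch `requires_grad`, so an optimizer step will still update every parameter.

import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.BatchNorm1d(4), nn.Dropout(0.5))
model.eval()

# .eval() recursively sets module.training = False ...
print([m.training for m in model.modules()])          # all False
# ... but every parameter still requires gradients, so an
# optimizer step will still change the weights.
print([p.requires_grad for p in model.parameters()])  # all True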

To check whether this conclusion is correct, I ran two experiments.

Experiment 1: check whether the model's weights change during training when it is in .eval() mode

import torch
import torch.nn as nn
from torchvision.models import resnet50

model = resnet50(pretrained=True)
model = nn.DataParallel(model.cuda())

# `dataloader` and `loss` are assumed to be defined as usual
for imgs, tgts in dataloader:
    model.eval()
    print('# BN.weight:\n',        model.module.bn1.weight[0])
    print('# BN.running_mean:\n',  model.module.bn1.running_mean[0])
    print('# BN.requires_grad:\n', model.module.bn1.weight.requires_grad)
    try:
        print('# BN.grad:\n',      model.module.bn1.weight.grad[0])
    except:
        print('')
    print('\n')
    out = model(imgs)
    l = loss(out, tgts)
    l.backward()
    ...

Result:

# BN.weight:
tensor(0.2482, device='cuda:0', grad_fn=<SelectBackward>)
# BN.running_mean:
tensor(0.0002, device='cuda:0')
# BN.requires_grad:
True
# BN.grad:
tensor(0.6201, device='cuda:0')

# BN.weight:  <-- [changed]
tensor(0.2479, device='cuda:0', grad_fn=<SelectBackward>)
# BN.running_mean:  <-- [unchanged]
tensor(0.0002, device='cuda:0')
# BN.requires_grad:  <-- [see, the gradient is there]
True
# BN.grad:  <-- [changed]
tensor(0.1732, device='cuda:0')

So even in .eval() mode the BN weight still receives gradients and gets updated by the optimizer step; only the running statistics stay fixed.



Experiment 2: in a real training run, see how training with .eval() affects the final accuracy

Base code: train normally on the MNIST data with .train(). Reference code: zhengyima/mnist-classification

import os
import torch
import torch.nn as nn
import torch.utils.data as Data
import torchvision

# torch.manual_seed(1)    # reproducible

# Hyper Parameters
EPOCH = 1               # train the training data n times; to save time, we just train 1 epoch
BATCH_SIZE = 256
LR = 0.001              # learning rate
DOWNLOAD_MNIST = False

# Mnist digits dataset
if not(os.path.exists('./mnist/')) or not os.listdir('./mnist/'):
    # not mnist dir or mnist is empty dir
    DOWNLOAD_MNIST = True

train_data = torchvision.datasets.MNIST(
    root='./mnist/',
    train=True,                                     # this is training data
    transform=torchvision.transforms.ToTensor(),    # Converts a PIL.Image or numpy.ndarray to
                                                    # torch.FloatTensor of shape (C x H x W) and normalizes to the range [0.0, 1.0]
    download=DOWNLOAD_MNIST,
)

# plot one example
print(train_data.train_data.size())                 # (60000, 28, 28)
print(train_data.train_labels.size())               # (60000)

# Data Loader for easy mini-batch return in training; the image batch shape will be (256, 1, 28, 28)
train_loader = Data.DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)

# pick 2000 samples to speed up testing
test_data = torchvision.datasets.MNIST(root='./mnist/', train=False)
test_x = torch.unsqueeze(test_data.test_data, dim=1).type(torch.FloatTensor)[:2000]/255.   # shape from (2000, 28, 28) to (2000, 1, 28, 28), value in range(0,1)
test_y = test_data.test_labels[:2000]

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(         # input shape (1, 28, 28)
            nn.Conv2d(
                in_channels=1,              # input height
                out_channels=16,            # n_filters
                kernel_size=5,              # filter size
                stride=1,                   # filter movement/step
                padding=2,                  # if want same width and length of this image after Conv2d, padding=(kernel_size-1)/2 if stride=1
            ),                              # output shape (16, 28, 28)
            nn.ReLU(),                      # activation
            nn.MaxPool2d(kernel_size=2),    # choose max value in 2x2 area, output shape (16, 14, 14)
        )
        self.conv2 = nn.Sequential(         # input shape (16, 14, 14)
            nn.Conv2d(16, 32, 5, 1, 2),     # output shape (32, 14, 14)
            nn.ReLU(),                      # activation
            nn.MaxPool2d(2),                # output shape (32, 7, 7)
        )
        ###########################################
        # .eval() only affects BN or Dropout
        ###########################################
        self.bn = nn.BatchNorm1d(32)
        ###########################################
        self.out = nn.Linear(32, 10)        # fully connected layer, output 10 classes

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        f = nn.functional.adaptive_avg_pool2d(x, (1, 1))
        x = f.view(f.size(0), -1)           # flatten the pooled features to (batch_size, 32)
        ###########################################
        # .eval() only affects BN or Dropout
        ###########################################
        x = self.bn(x)
        ###########################################
        output = self.out(x)
        return output, x    # return x for visualization

cnn = CNN().cuda()
print(cnn)  # net architecture

optimizer = torch.optim.Adam(cnn.parameters(), lr=LR)   # optimize all cnn parameters
loss_func = nn.CrossEntropyLoss()                       # the target label is not one-hotted

# training and testing
for epoch in range(EPOCH):
    cnn.train()
    # cnn.eval()
    for step, (b_x, b_y) in enumerate(train_loader):   # gives batch data
        b_x = b_x.cuda()
        b_y = b_y.cuda()
        output = cnn(b_x)[0]            # cnn output
        loss = loss_func(output, b_y)   # cross entropy loss
        optimizer.zero_grad()           # clear gradients for this training step
        loss.backward()                 # backpropagation, compute gradients
        optimizer.step()                # apply gradients

        if step % 50 == 0:
            test_output, last_layer = cnn(test_x.cuda())
            pred_y = torch.max(test_output, 1)[1].data.cpu().numpy()
            accuracy = float((pred_y == test_y.data.cpu().numpy()).astype(int).sum()) / float(test_y.size(0))
            print('Epoch: ', epoch, '| train loss: %.4f' % loss.data.cpu().numpy(), '| test accuracy: %.2f' % accuracy)

# print 10 predictions from test data
test_output, _ = cnn(test_x[:10].cuda())
pred_y = torch.max(test_output, 1)[1].data.cpu().numpy()
print(pred_y, 'prediction number')
print(test_y[:10].numpy(), 'real number')

The experiments below are all modifications made on top of the base code above.


First comparison: compare the results of .train() and .eval().

Method               | Accuracy
---------------------|---------
with BN + .train()   | 96%
with BN + .eval()    | 63%

Observation: with BN in the model, training under .eval() gives much worse results than training under .train() (detailed logs in the bullets below).
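
A plausible explanation, illustrated with a minimal sketch (my own addition, assuming PyTorch's default BN initialization): in eval mode a freshly initialized BN layer normalizes with its running statistics (mean 0, variance 1) instead of the batch statistics, and eval mode never updates them, so throughout training the features are barely normalized at all.

import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(1)
x = torch.randn(128, 1) * 5 + 3        # batch with mean ~3, std ~5

bn.eval()                              # uses running stats (init: mean=0, var=1)
y = bn(x)
print(y.mean().item(), y.std().item()) # ~3, ~5 -> effectively unnormalized

bn.train()                             # uses the batch's own statistics
y = bn(x)
print(y.mean().item(), y.std().item()) # ~0, ~1 -> properly normalized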

  • with BN + .train()

    # training and testing
    ....
    for epoch in range(EPOCH):
        cnn.train()
        # cnn.eval()
    ......
    
    Epoch:  0 | train loss: 2.3722 | test accuracy: 0.24
    Epoch:  0 | train loss: 0.6760 | test accuracy: 0.89
    Epoch:  0 | train loss: 0.3923 | test accuracy: 0.94
    Epoch:  0 | train loss: 0.2812 | test accuracy: 0.95
    Epoch:  0 | train loss: 0.2167 | test accuracy: 0.96  <-- [ good ]
    [7 2 1 0 4 1 4 9 5 9] prediction number
    [7 2 1 0 4 1 4 9 5 9] real number
    
  • with BN + .eval()

    ......
    for epoch in range(EPOCH):
        # cnn.train()
        cnn.eval()
    ......
    
    Epoch:  0 | train loss: 2.3212 | test accuracy: 0.09
    Epoch:  0 | train loss: 2.1845 | test accuracy: 0.23
    Epoch:  0 | train loss: 1.8448 | test accuracy: 0.33
    Epoch:  0 | train loss: 1.4609 | test accuracy: 0.48
    Epoch:  0 | train loss: 1.3265 | test accuracy: 0.63   <-- [ worse ]
    [7 2 1 0 4 1 4 8 2 8] prediction number
    [7 2 1 0 4 1 4 9 5 9] real number
    

Second comparison: without Batch Normalization in the model, compare the results of .train() and .eval()

Method                  | Accuracy
------------------------|---------
without BN + .train()   | 57%
without BN + .eval()    | 58%

Observation: without BN in the model, the result of training under .eval() is almost identical to .train(). This shows that .eval() still trains the model!

  • without BN + .train()

        def forward(self, x):
            x = self.conv1(x)
            x = self.conv2(x)
            f = nn.functional.adaptive_avg_pool2d(x, (1, 1))
            x = f.view(f.size(0), -1)       # flatten the pooled features to (batch_size, 32)
            ###########################################
            # .eval() only affects BN or Dropout
            ###########################################
            # x = self.bn(x)   # with .bn() commented out, there is no BN layer in the network
            ###########################################
            output = self.out(x)
            return output, x    # return x for visualization
    ......
    for epoch in range(EPOCH):
        cnn.train()
        # cnn.eval()
    ......
    
    Epoch:  0 | train loss: 2.3226 | test accuracy: 0.11
    Epoch:  0 | train loss: 2.1843 | test accuracy: 0.26
    Epoch:  0 | train loss: 1.8668 | test accuracy: 0.39
    Epoch:  0 | train loss: 1.4856 | test accuracy: 0.49
    Epoch:  0 | train loss: 1.2870 | test accuracy: 0.57
    [7 2 1 0 4 1 4 9 2 7] prediction number
    [7 2 1 0 4 1 4 9 5 9] real number
    
  • without BN + .eval()

        def forward(self, x):
            x = self.conv1(x)
            x = self.conv2(x)
            f = nn.functional.adaptive_avg_pool2d(x, (1, 1))
            x = f.view(f.size(0), -1)       # flatten the pooled features to (batch_size, 32)
            ###########################################
            # .eval() only affects BN or Dropout
            ###########################################
            # x = self.bn(x)   # with .bn() commented out, there is no BN layer in the network
            ###########################################
            output = self.out(x)
            return output, x    # return x for visualization
    ......
    for epoch in range(EPOCH):
        # cnn.train()
        cnn.eval()
    ......
    
    Epoch:  0 | train loss: 2.3099 | test accuracy: 0.11
    Epoch:  0 | train loss: 2.2231 | test accuracy: 0.18
    Epoch:  0 | train loss: 1.9195 | test accuracy: 0.35
    Epoch:  0 | train loss: 1.6008 | test accuracy: 0.48
    Epoch:  0 | train loss: 1.4178 | test accuracy: 0.58
    [7 2 1 0 4 1 7 4 2 9] prediction number
    [7 2 1 0 4 1 4 9 5 9] real number
    


2. Setting the learning rate to 0 keeps the model from training

Setting the learning rate to 0 does keep the model from training, but the gradients are still computed as before.

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision.models import resnet50

model = resnet50(pretrained=True)
model = nn.DataParallel(model.cuda())

### Set LR to 0
optimizer = optim.Adam(model.parameters(), lr=0)

# `dataloader` and `loss` are assumed to be defined as usual
for imgs, tgts in dataloader:
    model.eval()
    print('# BN.weight:\n',        model.module.bn1.weight[0])
    print('# BN.running_mean:\n',  model.module.bn1.running_mean[0])
    print('# BN.requires_grad:\n', model.module.bn1.weight.requires_grad)
    try:
        print('# BN.grad:\n',      model.module.bn1.weight.grad[0])
    except:
        print('')
    print('\n')
    out = model(imgs)
    l = loss(out, tgts)
    l.backward()
    ...
# BN.weight:
tensor(0.2486, device='cuda:0', grad_fn=<SelectBackward>)
# BN.running_mean:
tensor(0.0002, device='cuda:0')
# BN.requires_grad:
True
# BN.grad:
tensor(0.5255, device='cuda:0')

# BN.weight:  <-- [unchanged]
tensor(0.2486, device='cuda:0', grad_fn=<SelectBackward>)
# BN.running_mean:  <-- [unchanged]
tensor(0.0002, device='cuda:0')
# BN.requires_grad:
True
# BN.grad:  <-- [changed]
tensor(0.3794, device='cuda:0')
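
If the goal is simply to evaluate without training, a cheaper alternative to lr=0 (a sketch of my own, reusing the `model` and `dataloader` names from above) is to disable gradient computation entirely with torch.no_grad(), which skips building the autograd graph and leaves no stale gradients behind:

import torch

model.eval()                  # fix BN/Dropout behaviour
with torch.no_grad():         # no autograd graph, no gradients computed
    for imgs, tgts in dataloader:
        out = model(imgs)
        # compute metrics only; no graph was built, so there is
        # nothing to backpropagate here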

3. When freezing modules, remember to set BN modules to .eval() mode

import torch
import torch.nn as nn
from torchvision.models import resnet50

def set_bn_to_train(m):
    classname = m.__class__.__name__
    if classname.find('BatchNorm2d') != -1:
        m.train()

def set_bn_to_eval(m):
    classname = m.__class__.__name__
    if classname.find('BatchNorm2d') != -1:
        m.eval()

model = resnet50(pretrained=True)
model = nn.DataParallel(model.cuda())

m = model.module._modules['layer4']
for name, param in m.named_parameters():
    param.requires_grad = False     # freeze the weights
m.apply(set_bn_to_eval)             # remember to set BN to .eval() mode
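
Why is this necessary? A minimal sketch (my own addition) verifying the pitfall: even with requires_grad=False, a BN layer left in train mode still updates its running statistics on every forward pass, so a "frozen" module keeps drifting unless you also switch it to .eval():

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(8)
for p in bn.parameters():
    p.requires_grad = False            # weights are "frozen"

bn.train()
before = bn.running_mean.clone()
bn(torch.randn(4, 8, 16, 16))          # forward pass in train mode
print(torch.equal(before, bn.running_mean))   # False: stats moved anyway

bn.eval()
before = bn.running_mean.clone()
bn(torch.randn(4, 8, 16, 16))          # forward pass in eval mode
print(torch.equal(before, bn.running_mean))   # True: stats are fixed now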
