`model.eval()` still trains the model! `model.eval()` merely switches Batch Normalization and Dropout into their inference behaviour (for example, BN uses its fixed `running_mean` at test time instead of the batch statistics); every other module keeps training exactly as it would otherwise. (Reference: 【pytorch系列】model.eval()用法详解)
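Before the experiments, here is a minimal sketch (a toy `nn.Sequential`, not one of the models used in the experiments below) of what `.eval()` actually changes: it flips each submodule's `training` flag and nothing more, so `requires_grad` on the parameters is left untouched:

```python
import torch.nn as nn

net = nn.Sequential(nn.Linear(8, 4), nn.BatchNorm1d(4), nn.ReLU(), nn.Dropout(0.5))

net.eval()  # only flips every submodule's `training` flag to False
for name, m in net.named_modules():
    if name:  # skip the top-level container itself
        print(name, type(m).__name__, 'training =', m.training)

# The parameters themselves are untouched: they still require grad,
# so an optimizer step after backward() will still update them.
for name, p in net.named_parameters():
    print(name, 'requires_grad =', p.requires_grad)
```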
To check whether this conclusion is correct, I ran two experiments.
Experiment 1: with the model in `.eval()` mode, do its weights still change during training?
```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

model = resnet50(pretrained=True)
model = nn.DataParallel(model.cuda())

# `dataloader` and `loss` are defined elsewhere in the training script
for imgs, tgts in dataloader:
    model.eval()
    print('# BN.weight:\n', model.module.bn1.weight[0])
    print('# BN.runing_mean:\n', model.module.bn1.running_mean[0])
    print('# BN.requires_grad:\n', model.module.bn1.weight.requires_grad)
    try:
        print('# BN.grad:\n', model.module.bn1.weight.grad[0])
    except:  # grad is None before the first backward()
        print('')
    print('\n')

    out = model(imgs)
    l = loss(out, tgts)
    l.backward()
    ....
```
Result:

```
# BN.weight:
 tensor(0.2482, device='cuda:0', grad_fn=<SelectBackward>)
# BN.runing_mean:
 tensor(0.0002, device='cuda:0')
# BN.requires_grad:
 True
# BN.grad:
 tensor(0.6201, device='cuda:0')

# BN.weight:         <-- [changed]
 tensor(0.2479, device='cuda:0', grad_fn=<SelectBackward>)
# BN.runing_mean:    <-- [unchanged]
 tensor(0.0002, device='cuda:0')
# BN.requires_grad:  <-- [see, the gradient is there]
 True
# BN.grad:           <-- [changed]
 tensor(0.1732, device='cuda:0')
```
Experiment 2: in an actual training run, how does training the model with `.eval()` affect the final recognition accuracy?
Base code: train on the MNIST data normally with `.train()`. Reference code: zhengyima/mnist-classification
```python
import os
import torch
import torch.nn as nn
import torch.nn.functional as functional
import torch.utils.data as Data
import torchvision

# torch.manual_seed(1)    # reproducible

# Hyper Parameters
EPOCH = 1                 # train the training data n times; to save time, we just train 1 epoch
BATCH_SIZE = 256
LR = 0.001                # learning rate
DOWNLOAD_MNIST = False

# Mnist digits dataset
if not(os.path.exists('./mnist/')) or not os.listdir('./mnist/'):
    # not mnist dir or mnist is empty dir
    DOWNLOAD_MNIST = True

train_data = torchvision.datasets.MNIST(
    root='./mnist/',
    train=True,                                     # this is training data
    transform=torchvision.transforms.ToTensor(),    # converts a PIL.Image or numpy.ndarray to
                                                    # torch.FloatTensor of shape (C x H x W) in range [0.0, 1.0]
    download=DOWNLOAD_MNIST,
)

# plot one example
print(train_data.train_data.size())      # (60000, 28, 28)
print(train_data.train_labels.size())    # (60000)

# Data Loader for easy mini-batch return in training; the image batch shape will be (256, 1, 28, 28)
train_loader = Data.DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)

# pick 2000 samples to speed up testing
test_data = torchvision.datasets.MNIST(root='./mnist/', train=False)
test_x = torch.unsqueeze(test_data.test_data, dim=1).type(torch.FloatTensor)[:2000]/255.   # shape from (2000, 28, 28) to (2000, 1, 28, 28), value in range (0, 1)
test_y = test_data.test_labels[:2000]


class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(         # input shape (1, 28, 28)
            nn.Conv2d(
                in_channels=1,              # input height
                out_channels=16,            # n_filters
                kernel_size=5,              # filter size
                stride=1,                   # filter movement/step
                padding=2,                  # keep the same width and height after Conv2d: padding=(kernel_size-1)/2 if stride=1
            ),                              # output shape (16, 28, 28)
            nn.ReLU(),                      # activation
            nn.MaxPool2d(kernel_size=2),    # choose max value in 2x2 area, output shape (16, 14, 14)
        )
        self.conv2 = nn.Sequential(         # input shape (16, 14, 14)
            nn.Conv2d(16, 32, 5, 1, 2),     # output shape (32, 14, 14)
            nn.ReLU(),                      # activation
            nn.MaxPool2d(2),                # output shape (32, 7, 7)
        )
        ###########################################
        # .eval() only affects BN or Dropout
        ###########################################
        self.bn = nn.BatchNorm1d(32)
        ###########################################
        self.out = nn.Linear(32, 10)        # fully connected layer, output 10 classes

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        f = functional.adaptive_avg_pool2d(x, (1, 1))
        x = f.view(f.size(0), -1)           # flatten the pooled features to (batch_size, 32)
        ###########################################
        # .eval() only affects BN or Dropout
        ###########################################
        x = self.bn(x)
        ###########################################
        output = self.out(x)
        return output, x                    # return x for visualization


cnn = CNN().cuda()
print(cnn)    # net architecture

optimizer = torch.optim.Adam(cnn.parameters(), lr=LR)    # optimize all cnn parameters
loss_func = nn.CrossEntropyLoss()                        # the target label is not one-hotted

# training and testing
for epoch in range(EPOCH):
    cnn.train()
    # cnn.eval()
    for step, (b_x, b_y) in enumerate(train_loader):    # gives batch data, normalize x when iterating train_loader
        b_x = b_x.cuda()
        b_y = b_y.cuda()
        output = cnn(b_x)[0]                 # cnn output
        loss = loss_func(output, b_y)        # cross entropy loss
        optimizer.zero_grad()                # clear gradients for this training step
        loss.backward()                      # backpropagation, compute gradients
        optimizer.step()                     # apply gradients

        if step % 50 == 0:
            test_output, last_layer = cnn(test_x.cuda())
            pred_y = torch.max(test_output, 1)[1].data.cpu().numpy()
            accuracy = float((pred_y == test_y.data.cpu().numpy()).astype(int).sum()) / float(test_y.size(0))
            print('Epoch: ', epoch, '| train loss: %.4f' % loss.data.cpu().numpy(), '| test accuracy: %.2f' % accuracy)

# print 10 predictions from test data
test_output, _ = cnn(test_x[:10].cuda())
pred_y = torch.max(test_output, 1)[1].data.cpu().numpy()
print(pred_y, 'prediction number')
print(test_y[:10].numpy(), 'real number')
```
All of the experiments below are modifications of this base code.
First comparison: `.train()` vs `.eval()` results.

| Method | Accuracy |
|---|---|
| With BN + `.train()` | 96% |
| With BN + `.eval()` | 63% |
Observation: with BN in the network, training under `.eval()` gives a much worse result than training under `.train()`, presumably because in eval mode BN normalizes with its running statistics, which are never updated, instead of the batch statistics.
With BN + `.train()`
```python
# training and testing
....
for epoch in range(EPOCH):
    cnn.train()
    # cnn.eval()
    ......
```
```
Epoch: 0 | train loss: 2.3722 | test accuracy: 0.24
Epoch: 0 | train loss: 0.6760 | test accuracy: 0.89
Epoch: 0 | train loss: 0.3923 | test accuracy: 0.94
Epoch: 0 | train loss: 0.2812 | test accuracy: 0.95
Epoch: 0 | train loss: 0.2167 | test accuracy: 0.96   <-- [ good ]
[7 2 1 0 4 1 4 9 5 9] prediction number
[7 2 1 0 4 1 4 9 5 9] real number
```
With BN + `.eval()`
```python
......
for epoch in range(EPOCH):
    # cnn.train()
    cnn.eval()
    ......
```
```
Epoch: 0 | train loss: 2.3212 | test accuracy: 0.09
Epoch: 0 | train loss: 2.1845 | test accuracy: 0.23
Epoch: 0 | train loss: 1.8448 | test accuracy: 0.33
Epoch: 0 | train loss: 1.4609 | test accuracy: 0.48
Epoch: 0 | train loss: 1.3265 | test accuracy: 0.63   <-- [ worse ]
[7 2 1 0 4 1 4 8 2 8] prediction number
[7 2 1 0 4 1 4 9 5 9] real number
```
Second comparison: without Batch Normalization in the model, compare `.train()` vs `.eval()`.

| Method | Accuracy |
|---|---|
| No BN + `.train()` | 57% |
| No BN + `.eval()` | 58% |
Observation: without BN, training under `.eval()` gives almost the same result as training under `.train()`. This shows that `.eval()` still trains the model!
No BN + `.train()`
```python
def forward(self, x):
    x = self.conv1(x)
    x = self.conv2(x)
    f = functional.adaptive_avg_pool2d(x, (1, 1))
    x = f.view(f.size(0), -1)   # flatten the pooled features to (batch_size, 32)
    ###########################################
    # .eval() only affects BN or Dropout
    ###########################################
    # x = self.bn(x)   # after commenting out .bn(), there is no BN layer in the network
    ###########################################
    output = self.out(x)
    return output, x            # return x for visualization
```

```python
......
for epoch in range(EPOCH):
    cnn.train()
    # cnn.eval()
    ......
```
```
Epoch: 0 | train loss: 2.3226 | test accuracy: 0.11
Epoch: 0 | train loss: 2.1843 | test accuracy: 0.26
Epoch: 0 | train loss: 1.8668 | test accuracy: 0.39
Epoch: 0 | train loss: 1.4856 | test accuracy: 0.49
Epoch: 0 | train loss: 1.2870 | test accuracy: 0.57
[7 2 1 0 4 1 4 9 2 7] prediction number
[7 2 1 0 4 1 4 9 5 9] real number
```
No BN + `.eval()`
```python
def forward(self, x):
    x = self.conv1(x)
    x = self.conv2(x)
    f = functional.adaptive_avg_pool2d(x, (1, 1))
    x = f.view(f.size(0), -1)   # flatten the pooled features to (batch_size, 32)
    ###########################################
    # .eval() only affects BN or Dropout
    ###########################################
    # x = self.bn(x)   # after commenting out .bn(), there is no BN layer in the network
    ###########################################
    output = self.out(x)
    return output, x            # return x for visualization
```

```python
......
for epoch in range(EPOCH):
    # cnn.train()
    cnn.eval()
    ......
```
```
Epoch: 0 | train loss: 2.3099 | test accuracy: 0.11
Epoch: 0 | train loss: 2.2231 | test accuracy: 0.18
Epoch: 0 | train loss: 1.9195 | test accuracy: 0.35
Epoch: 0 | train loss: 1.6008 | test accuracy: 0.48
Epoch: 0 | train loss: 1.4178 | test accuracy: 0.58
[7 2 1 0 4 1 7 4 2 9] prediction number
[7 2 1 0 4 1 4 9 5 9] real number
```
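Taken together, the two experiments show that `.eval()` by itself never stops learning. If what you actually want is the usual train/evaluate split, the conventional pattern (a sketch built on the base code above, not something tested in the experiments) is to call `.train()` before each training pass and `.eval()` together with `torch.no_grad()` only while evaluating:

```python
for epoch in range(EPOCH):
    cnn.train()                        # BN/Dropout behave in training mode, weights get updated
    for b_x, b_y in train_loader:
        b_x, b_y = b_x.cuda(), b_y.cuda()
        loss = loss_func(cnn(b_x)[0], b_y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    cnn.eval()                         # BN uses running stats, Dropout is disabled
    with torch.no_grad():              # additionally stops gradient tracking during evaluation
        test_output, _ = cnn(test_x.cuda())
        pred_y = torch.max(test_output, 1)[1].cpu().numpy()
```

Note that `.eval()` alone only changes the forward behaviour of BN and Dropout; it is `torch.no_grad()` that prevents gradients from being built during evaluation.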
Setting the learning rate to 0 keeps the model from being updated, although the gradients are still computed as usual.
```python
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision.models import resnet50

model = resnet50(pretrained=True)
model = nn.DataParallel(model.cuda())

### Set LR to 0
optimizer = optim.Adam(model.parameters(), lr=0)

# `dataloader` and `loss` are defined elsewhere in the training script
for imgs, tgts in dataloader:
    model.eval()
    print('# BN.weight:\n', model.module.bn1.weight[0])
    print('# BN.runing_mean:\n', model.module.bn1.running_mean[0])
    print('# BN.requires_grad:\n', model.module.bn1.weight.requires_grad)
    try:
        print('# BN.grad:\n', model.module.bn1.weight.grad[0])
    except:  # grad is None before the first backward()
        print('')
    print('\n')

    out = model(imgs)
    l = loss(out, tgts)
    l.backward()
    ....
```
```
# BN.weight:
 tensor(0.2486, device='cuda:0', grad_fn=<SelectBackward>)
# BN.runing_mean:
 tensor(0.0002, device='cuda:0')
# BN.requires_grad:
 True
# BN.grad:
 tensor(0.5255, device='cuda:0')

# BN.weight:         <-- [unchanged]
 tensor(0.2486, device='cuda:0', grad_fn=<SelectBackward>)
# BN.runing_mean:    <-- [unchanged]
 tensor(0.0002, device='cuda:0')
# BN.requires_grad:
 True
# BN.grad:           <-- [changed]
 tensor(0.3794, device='cuda:0')
```
Setting only the BN modules to `.eval()` mode

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

def set_bn_to_train(m):
    classname = m.__class__.__name__
    if classname.find('BatchNorm2d') != -1:
        m.train()

def set_bn_to_eval(m):
    classname = m.__class__.__name__
    if classname.find('BatchNorm2d') != -1:
        m.eval()

model = resnet50(pretrained=True)
model = nn.DataParallel(model.cuda())

m = model.module._modules['layer4']
for name, param in m.named_parameters():
    param.requires_grad = False
m.apply(set_bn_to_eval)   # remember to set the BN modules to eval mode
```
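A quick way to confirm the trick worked (a small illustrative check, not part of the original post) is to walk over `layer4` after `apply()` and print each BatchNorm's `training` flag:

```python
model.train()              # the rest of the network stays in training mode
m.apply(set_bn_to_eval)    # freeze only the BN statistics inside layer4
for name, sub in m.named_modules():
    if isinstance(sub, nn.BatchNorm2d):
        print(name, 'training =', sub.training)   # expected: False for every BN in layer4
```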