This is day three of my attempt to run the MNIST handwritten-digit dataset with the PyTorch framework; the main topic today is training the network. This blog mainly records my learning path and collects the learning resources I used.
Note: the code below is written for Python 2.7.
Day 1 (building the LeNet network):
Day 2 (loading the MNIST dataset):
Day 3 (training the model):
Day 4 (testing on single samples):
Thanks to 凯神 for providing the code and for the patient guidance!
from lenet import Net
import torch
import torch.optim as optim
import torch.nn.functional as F
import matplotlib.pyplot as plt
from mnist_load import testset_loader, trainset_loader

LEARNING_RATE = 0.001
MOMENTUM = 0.9
EPOCH = 5

if torch.cuda.is_available():
    device = torch.device('cuda')
    print 'cuda'
else:
    device = torch.device('cpu')
    print 'cpu'

mnist_model = Net().to(device)
optimizer = optim.SGD(mnist_model.parameters(),
                      lr=LEARNING_RATE,
                      momentum=MOMENTUM)

# save_model
def save_checkpoint(checkpoint_path, model, optimizer):
    # state_dict: a Python dictionary object that:
    #   - for a model, maps each layer to its parameter tensor;
    #   - for an optimizer, contains info about the optimizer's states and hyperparameters used.
    state = {'model': model.state_dict(),
             'optimizer': optimizer.state_dict()}
    torch.save(state, checkpoint_path)
    print 'model saved to ', checkpoint_path

# train
def mnist_train(epoch, save_interval):
    mnist_model.train()  # set training mode
    iteration = 0
    loss_plt = []
    for ep in range(epoch):
        for batch_idx, batch_data in enumerate(trainset_loader):
            images, labels = batch_data
            images = images.to(device)
            labels = labels.to(device)
            optimizer.zero_grad()  # clear the gradients accumulated in the previous iteration
            output = mnist_model(images)
            loss = F.cross_entropy(output, labels)
            loss_plt.append(loss.item())  # store the scalar value, not the tensor
            loss.backward()
            optimizer.step()
            print 'Train Epoch:', ep + 1, '\tBatch: ', batch_idx + 1, '/', len(trainset_loader), '\tLoss: ', loss.item()
            # different from before: saving model checkpoints
            if iteration % save_interval == 0 and iteration > 0:
                save_checkpoint('module/pytorch-mnist-batchsize-128-%i.pth' % iteration, mnist_model, optimizer)
            iteration += 1
        mnist_test()
        mnist_model.train()  # mnist_test() switched the model to eval mode; switch back
    # save the final model
    save_checkpoint('module/pytorch-mnist-batch-128-%i.pth' % iteration, mnist_model, optimizer)
    plt.plot(loss_plt, label='loss')
    plt.legend()
    plt.show()

# test
def mnist_test():
    mnist_model.eval()  # set evaluation mode
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for images, labels in testset_loader:
            images = images.to(device)
            labels = labels.to(device)
            output = mnist_model(images)
            test_loss += F.cross_entropy(output, labels).item()
            pred = output.max(1, keepdim=True)[1]  # get the index of the max log-probability
            correct += pred.eq(labels.view_as(pred)).sum().item()
    test_loss /= len(testset_loader.dataset)
    print '\nTest set: Average loss:', test_loss, '\tAccuracy:', (100. * correct / len(testset_loader.dataset)), '%\n'

if __name__ == '__main__':
    mnist_train(EPOCH, save_interval=1000)
At first I thought my machine was simply too low-spec (not enough memory), so each batch couldn't load too many images, and I kept tweaking the BATCH_SIZE hyperparameter. When lowering BATCH_SIZE over and over still threw errors, I realized it probably wasn't a memory-capacity problem.
After looking it up, it turned out to be an issue with the number of worker threads used to load the data (batches); see the sketch below.
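The relevant knob is the num_workers argument of the DataLoader. A minimal sketch, assuming the loaders are built in mnist_load.py from torchvision's built-in MNIST dataset (the 'data' root path and BATCH_SIZE = 128 are my assumptions, the 128 guessed from the checkpoint file names):

import torch
from torchvision import datasets, transforms

BATCH_SIZE = 128  # assumed value, taken from the checkpoint file names above

trainset = datasets.MNIST('data', train=True, download=True,
                          transform=transforms.ToTensor())
trainset_loader = torch.utils.data.DataLoader(
    trainset,
    batch_size=BATCH_SIZE,
    shuffle=True,
    num_workers=0)  # 0 = load batches in the main process

With num_workers=0 everything runs in the main process, which sidesteps the worker-related errors; larger values spawn that many loader processes.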
OK, so it turns out that when Python writes a file, it does not automatically create folders that are missing from the path. Noted!
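In other words, the 'module/' folder that save_checkpoint() writes into has to exist beforehand. A small sketch of the workaround, using only the standard library:

import os

# create the checkpoint folder if it does not exist yet;
# otherwise torch.save('module/...') fails because the folder is missing
if not os.path.isdir('module'):
    os.makedirs('module')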
凯神's explanation: MOMENTUM (momentum) is a parameter used in stochastic gradient descent when updating the model weights.
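Roughly speaking, with momentum the optimizer keeps a running "velocity" built from past gradients instead of using only the current one. A simplified, hand-written sketch of the update optim.SGD performs for a single weight (ignoring dampening, weight decay and Nesterov; the numbers are made up):

LEARNING_RATE = 0.001
MOMENTUM = 0.9

w = 1.0                        # a single weight
v = 0.0                        # momentum buffer ("velocity")
gradients = [0.5, 0.4, 0.45]   # made-up gradients from three iterations

for g in gradients:
    v = MOMENTUM * v + g           # accumulate a running mix of past gradients
    w = w - LEARNING_RATE * v      # step along the smoothed direction
    print 'w =', w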
This refers to images.to(device) / labels.to(device): it copies the tensors produced when the data is first read onto the specified device, so that all subsequent computation happens on that device.
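One detail about the "copy": for a tensor, .to(device) returns a new tensor on the target device and leaves the original unchanged, which is why the result must be assigned back; for a module such as Net(), .to(device) moves the parameters in place. A small standalone check:

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.zeros(2, 3)   # created on the CPU
y = x.to(device)        # y is a copy on `device`; x itself is unchanged
print x.device, y.device

# this is why the training loop writes: images = images.to(device)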
if __name__ == "__main__"
A checkpoint is a way to save the current state of your experiment so that you can pick up from where you left off.
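The code above only writes checkpoints; here is a minimal sketch of the matching load side, assuming the same {'model', 'optimizer'} dictionary layout used in save_checkpoint() (the name load_checkpoint is mine, not from the original code):

import torch

def load_checkpoint(checkpoint_path, model, optimizer):
    # restore the state saved by save_checkpoint() so training can resume
    state = torch.load(checkpoint_path)
    model.load_state_dict(state['model'])
    optimizer.load_state_dict(state['optimizer'])
    print 'model loaded from ', checkpoint_path

# usage (the file name depends on which checkpoint was saved):
# load_checkpoint('module/pytorch-mnist-batchsize-128-1000.pth', mnist_model, optimizer)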
This is about optimizer.zero_grad(): the gradients are computed automatically during the later backward pass and are accumulated, so they have to be cleared first, otherwise the gradients from the previous iteration would contaminate the gradients of the current iteration.
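A tiny standalone illustration (not part of the MNIST code): backward() adds into .grad, so without zeroing, the second call sees the sum of both iterations:

import torch

w = torch.ones(1, requires_grad=True)

loss = (2 * w).sum()
loss.backward()
print w.grad        # tensor([2.])

loss = (2 * w).sum()
loss.backward()
print w.grad        # tensor([4.])  <- accumulated on top of the old gradient

w.grad.zero_()      # this is what optimizer.zero_grad() does for every parameter
print w.grad        # tensor([0.])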
At first I couldn't quite figure out what these two functions, loss.backward() and optimizer.step(), each do. Later I watched a video that explained it with an analogy, and then it clicked.
In linear regression, the weight update formula is: w_new = w_old - lr * gradient
loss.backward() is the part that computes the gradient.
optimizer.step() is the part that uses the gradient to compute w_new = w_old - lr * gradient; see the toy example below.
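A toy example (mine, not from the original post) that makes the division of labour concrete: backward() fills w.grad, and step() then applies the update rule above:

import torch
import torch.optim as optim

# one weight, one data point: prediction = w * x, target t
w = torch.tensor([1.0], requires_grad=True)
x, t = torch.tensor([2.0]), torch.tensor([6.0])
opt = optim.SGD([w], lr=0.1)

opt.zero_grad()
loss = ((w * x - t) ** 2).mean()
loss.backward()      # computes the gradient: w.grad = 2 * x * (w*x - t) = -16
print w.grad         # tensor([-16.])
opt.step()           # applies w_new = w_old - lr * gradient = 1 - 0.1 * (-16)
print w              # tensor([2.6000], requires_grad=True)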
Use both (model.eval() and torch.no_grad()). They do different things, and have different scopes.
with torch.no_grad(): disables tracking of gradients in autograd.
model.eval(): changes the forward() behaviour of the module it is called upon, e.g. it disables dropout and has batch norm use the entire population statistics.
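A small standalone illustration of the two scopes (the Dropout layer is only there for demonstration; the LeNet used in this series may not have one):

import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(1, 4)

net.eval()                   # changes forward(): dropout becomes a no-op
y = net(x)
print y.requires_grad        # True  -> autograd still tracks the operations

with torch.no_grad():        # disables gradient tracking only
    y = net(x)
    print y.requires_grad    # False -> no computation graph is built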