[PyTorch Series-50]: Convolutional Neural Networks - A Unified Fine-Tuning Workflow and Software Architecture - PyTorch Implementation


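  • Train on the training set while validating on the validation set.
  • Keep, as the final result, the model parameters and optimizer parameters that reach the highest accuracy on the entire validation set, not on a single validation batch.
  • Why the entire validation set rather than one batch: to improve generalization on the test set.
  • Why the highest validation accuracy: to guard against overfitting to the training set.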
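
The selection rule above can be sketched roughly as follows. This is a minimal illustration rather than the article's training loop: the tiny linear model, the random tensors, and the helper name `evaluate` are hypothetical stand-ins chosen so the snippet runs on its own.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins: a tiny classifier and random data instead of ResNet/CIFAR100.
model = nn.Linear(4, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
train_x, train_y = torch.randn(64, 4), torch.randint(0, 3, (64,))
val_batches = [(torch.randn(16, 4), torch.randint(0, 3, (16,))) for _ in range(4)]

def evaluate(model, batches):
    """Accuracy accumulated over ALL validation batches, not just one."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in batches:
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.size(0)
    return correct / total

best_acc = -1.0
best_model_state, best_optim_state = None, None
for epoch in range(5):
    # One (full-batch) training step per epoch, for brevity.
    model.train()
    optimizer.zero_grad()
    loss_fn(model(train_x), train_y).backward()
    optimizer.step()

    # Validate on the whole validation set and keep the best snapshot.
    acc = evaluate(model, val_batches)
    if acc > best_acc:
        best_acc = acc
        best_model_state = copy.deepcopy(model.state_dict())
        best_optim_state = copy.deepcopy(optimizer.state_dict())

# Restore the best snapshot as the final result.
model.load_state_dict(best_model_state)
optimizer.load_state_dict(best_optim_state)
print("best validation accuracy:", best_acc)
```

The `copy.deepcopy` calls matter here: `state_dict()` returns references to the live tensors, so without a copy the saved "best" snapshot would keep changing as training continues.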

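(2) Step 2: fine-tuning the whole network

Compared with Step 1, Step 2 mainly does the following:

  • Load the model trained in Step 1 and continue training from it.
  • Unfreeze the whole network: both the feature-extraction layers and the fully connected layer.
  • Reduce the learning rate by a factor of 100 so that training adjusts the weights at a finer granularity.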
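
As a rough sketch of these three actions (assumptions: a tiny two-layer stand-in network, an in-memory buffer in place of a checkpoint file, an illustrative Step-1 learning rate of 1e-2, and Adam as the optimizer; none of these come from the article):

```python
import io
import torch
import torch.nn as nn

def build_model():
    # Hypothetical stand-in: layer 0 plays the "feature extractor", layer 2 the "fc" head.
    return nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

# --- Step 1 (assumed setup): feature layer frozen, result saved as a checkpoint ---
step1_model = build_model()
for p in step1_model[0].parameters():
    p.requires_grad = False
step1_lr = 1e-2                      # assumed Step-1 learning rate
buffer = io.BytesIO()                # in-memory stand-in for a checkpoint file
torch.save(step1_model.state_dict(), buffer)

# --- Step 2 ---
# 1) Load the Step-1 parameters into a fresh network.
buffer.seek(0)
model = build_model()
model.load_state_dict(torch.load(buffer))

# 2) Make every layer trainable. A state_dict stores values only, not
#    requires_grad flags, so the flags are set explicitly here.
for p in model.parameters():
    p.requires_grad = True

# 3) Continue training with a learning rate 100 times smaller.
step2_lr = step1_lr / 100
optimizer = torch.optim.Adam(model.parameters(), lr=step2_lr)
print(step2_lr, all(p.requires_grad for p in model.parameters()))
```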

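1.3 Overview of this article

This article covers the PyTorch implementation of Step 2.

About 90% of the code is shared between Step 2 and Step 1; roughly 10% differs.

The main differences are:

  • How the network parameter values are initialized
  • How the trainable attribute of the network parameters is set

1.4 Training environment

This article uses ResNet + CIFAR100 + GPU as its example.

If your environment has no GPU, you can change the example to AlexNet + CIFAR10 + CPU. This takes only a few lines of code and does not affect the software workflow or architecture.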
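
One common way to keep the same code runnable with or without a GPU is to pick the device at run time. The snippet below is a minimal sketch of that pattern; the small linear model and the random batch are placeholders, not the article's network or data.

```python
import torch
import torch.nn as nn

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2).to(device)      # placeholder model
batch = torch.randn(3, 4).to(device)    # placeholder input batch
output = model(batch)

print(device, output.shape)
```

The model and every input batch have to live on the same device, which is why both are moved with `.to(device)`.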

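Chapter 2 Input Datasets

2.1 Define the data-format transforms applied when loading the datasets

# 2-1 Prepare the datasets
# Imports assumed by the snippets in this article
# (the aliases are inferred from the names the code uses)
import numpy as np
import matplotlib.pyplot as plt
import torch.nn as nn
import torch.utils.data as data_utils
import torchvision.datasets as dataset
from torchvision import models, transforms, utils

# Dataset format conversion
transform_train = transforms.Compose([
    transforms.Resize(256),      # formerly transforms.Scale(256)
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

transform_test = transforms.Compose([
    transforms.Resize(256),      # formerly transforms.Scale(256)
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])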

2.2 Load the datasets

Load the datasets from local files; if they are not present, they are downloaded automatically from the official source.

# Training dataset
train_data = dataset.CIFAR100(root = "../datasets/cifar100",
                              train = True,
                              transform = transform_train,
                              download = True)

# Test dataset
test_data = dataset.CIFAR100(root = "../datasets/cifar100",
                             train = False,
                             transform = transform_test,
                             download = True)

print(train_data)
print("size=", len(train_data))
print("")
print(test_data)
print("size=", len(test_data))
Files already downloaded and verified
Files already downloaded and verified
Dataset CIFAR100
Number of datapoints: 50000
Root location: ../datasets/cifar100
Split: Train
StandardTransform
Transform: Compose(
Resize(size=256, interpolation=bilinear, max_size=None, antialias=None)
CenterCrop(size=(224, 224))
ToTensor()
Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
)
size= 50000

Dataset CIFAR100
Number of datapoints: 10000
Root location: ../datasets/cifar100
Split: Test
StandardTransform
Transform: Compose(
Resize(size=256, interpolation=bilinear, max_size=None, antialias=None)
CenterCrop(size=(224, 224))
ToTensor()
Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
)
size= 10000

2.3 Define the batched data_loader for the datasets

# Read the data in batches
batch_size = 32

train_loader = data_utils.DataLoader(dataset = train_data,     # training data
                                     batch_size = batch_size,  # number of images read per batch
                                     shuffle = True)           # whether to shuffle the data

test_loader = data_utils.DataLoader(dataset = test_data,       # test dataset
                                    batch_size = batch_size,
                                    shuffle = True)

print(train_loader)
print(test_loader)
print(len(train_data), len(train_data)/batch_size)
print(len(test_data), len(test_data)/batch_size)
<torch.utils.data.dataloader.DataLoader object at 0x0000015158CCF4C0>
<torch.utils.data.dataloader.DataLoader object at 0x000001516F052640>
50000 1562.5
10000 312.5

Note:

The batch size depends on the GPU memory and on the image size. On a GPU with 8 GB of memory, a batch size of 32 is a reasonable choice.
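
The fractional values printed above (1562.5 and 312.5) are plain divisions. With the `DataLoader` default of `drop_last=False`, the actual number of batches is rounded up and the last batch is smaller. A small worked calculation, using a hypothetical helper name `batch_stats`:

```python
import math

def batch_stats(num_samples, batch_size):
    """Return (number of batches, size of the last batch) when the last partial batch is kept."""
    num_batches = math.ceil(num_samples / batch_size)
    last_batch = num_samples - (num_batches - 1) * batch_size
    return num_batches, last_batch

train_stats = batch_stats(50000, 32)
test_stats = batch_stats(10000, 32)
print(train_stats)  # (1563, 16)
print(test_stats)   # (313, 16)
```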

2.4 Display one batch of images

(1) Define the display functions

def img_show_from_torch(img_data, title = None, debug_flag = False):
    # Restore the channel order: (C, H, W) -> (H, W, C)
    img_data = img_data.numpy()
    img_data = img_data.transpose(1, 2, 0)

    # Undo the normalization
    mean = [0.485, 0.456, 0.406]
    std = [0.229, 0.224, 0.225]
    img_data = std * img_data + mean

    # Clip pixel values to [0, 1]
    img_data = np.clip(img_data, 0, 1)

    if debug_flag:
        print("PIL Image data")
        #print("image_shape: ", img_data.shape)
        #print("image_dtype: ", img_data.dtype)
        print("image_type: ", type(img_data))
        print(img_data)

    # Show the image
    fig, ax = plt.subplots()
    ax.imshow(img_data)
    ax.set_title(title)


def img_show_from_torch_batch(img_data, title = None, debug_flag = False):
    # Merge several images into a single image
    img_data = utils.make_grid(img_data)

    # Show the merged image
    img_show_from_torch(img_data, title = title, debug_flag = debug_flag)
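
The restoration step in the display function inverts the per-channel normalization `(x - mean) / std`. The round trip can be checked with NumPy alone; the random array below is a hypothetical stand-in for an image:

```python
import numpy as np

mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

# Stand-in "image" with values in [0, 1), laid out as (C, H, W) like a torch tensor.
rng = np.random.default_rng(0)
original = rng.random((3, 4, 4))

# Per-channel normalization: (x - mean) / std
normalized = (original - mean[:, None, None]) / std[:, None, None]

# Display path: (C, H, W) -> (H, W, C), then x * std + mean, then clip to [0, 1].
hwc = normalized.transpose(1, 2, 0)
restored = np.clip(std * hwc + mean, 0, 1)

print(restored.shape, np.allclose(restored, original.transpose(1, 2, 0)))
```

After the transpose the channel axis is last, so the length-3 `mean` and `std` broadcast across it without any reshaping.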

(2) Fetch one batch of images

# Fetch one batch of images
print("Fetch one batch of images")
imgs, labels = next(iter(train_loader))
print(imgs.shape)
print(labels.shape)

(3) Display a single image

img_show_from_torch(img_data = imgs[0], debug_flag = False)

(4) Display the batch of images

img_show_from_torch_batch(imgs)



Chapter 3 Define the Forward-Computation Network

3.1 Define functions that control whether the network parameters are trainable

# Set the trainable attribute of the network parameters, i.e. enable or disable gradient updates
def set_model_grad_state(model, trainable_state):
    for param in model.parameters():
        param.requires_grad = trainable_state

# Show the parameters that are trainable, i.e. those with gradient updates enabled
def show_model_grad_state_enabled(model):
    print("params to be trained:")
    for name, parameters in model.named_parameters():
        if parameters.requires_grad:
            print(name, ':', parameters.requires_grad)

3.2 Define the function that creates the network

The main tasks of this function are:

  • Create the specified predefined network from the official library (1000 classes by default). Several networks are supported, and more can be added.
  • Lock (freeze) the feature-extraction layers.
  • Replace the fully connected layer as needed so that it matches your own number of image classes (for example 100 or 10).
  • When use_pretrained = True, download the pretrained parameters automatically and use them (trained mainly on ImageNet) to initialize the network.
# model_name: name of the model ("resnet", "alexnet", or "vgg")
# num_classes: number of output classes
# use_pretrained: whether to initialize the network with pretrained parameters
# feature_extact_trainable: whether the feature-extraction layers can be trained; False locks them
def initialize_model(model_name, num_classes, use_pretrained = False, feature_extact_trainable = True):
    model = None
    input_size = 0

    if model_name == "resnet":
        if use_pretrained:
            # Use pretrained parameters
            model = models.resnet101(pretrained = True)

            # Set the trainable state of the existing (feature-extraction) layers
            set_model_grad_state(model, feature_extact_trainable)

            # Replace the fully connected layer (new layers are trainable by default)
            num_in_features = model.fc.in_features
            model.fc = nn.Sequential(nn.Linear(num_in_features, num_classes))
        else:
            model = models.resnet101(pretrained = False, num_classes = num_classes)
        input_size = 224
    elif model_name == "alexnet":
        if use_pretrained:
            # Use pretrained parameters
            model = models.alexnet(pretrained = True)

            # Set the trainable state of the existing (feature-extraction) layers
            set_model_grad_state(model, feature_extact_trainable)

            # Replace the last fully connected layer
            num_in_features = model.classifier[6].in_features
            model.classifier[6] = nn.Sequential(nn.Linear(num_in_features, num_classes))
        else:
            model = models.alexnet(pretrained = False, num_classes = num_classes)
        input_size = 224
    elif model_name == "vgg":
        if use_pretrained:
            # Use pretrained parameters
            model = models.vgg16(pretrained = True)

            # Set the trainable state of the existing (feature-extraction) layers
            set_model_grad_state(model, feature_extact_trainable)

            # Replace the last fully connected layer
            num_in_features = model.classifier[6].in_features
            model.classifier[6] = nn.Sequential(nn.Linear(num_in_features, num_classes))
        else:
            model = models.vgg16(pretrained = False, num_classes = num_classes)
        input_size = 224
    return model, input_size

Note:

As the code above shows, the official API makes it easy to instantiate a complex predefined network.
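
The freeze-then-replace pattern used above can be tried without downloading any pretrained weights. The sketch below applies it to a hypothetical `TinyNet` (not part of torchvision) that mimics the "features + fc" layout, then counts the trainable parameters:

```python
import torch.nn as nn

class TinyNet(nn.Module):
    """Hypothetical stand-in for a pretrained network with a 1000-class head."""
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.fc = nn.Linear(8, num_classes)

    def forward(self, x):
        return self.fc(self.features(x))

model = TinyNet()

# 1) Freeze every existing parameter.
for p in model.parameters():
    p.requires_grad = False

# 2) Replace the head; a newly created layer is trainable by default.
model.fc = nn.Linear(model.fc.in_features, 100)

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
n_total = sum(p.numel() for p in model.parameters())
print(trainable)             # ['fc.weight', 'fc.bias']
print(n_trainable, n_total)  # 900 1124
```

The order matters: freezing first and replacing the head afterwards leaves only the new head trainable, which is why the same order appears in each branch of `initialize_model`.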

3.3 Create and display the network

# Create the network instance
model, input_size = initialize_model(model_name = "resnet", num_classes = 100, use_pretrained = True, feature_extact_trainable=False)

print(input_size)
print(model)
224
ResNet(
(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
(layer1): Sequential(
(0): Bottleneck(
(conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
)
(layer2): Sequential(
(0): Bottleneck(
(conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(3): Bottleneck(
(conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
)
(layer3): Sequential(
(0): Bottleneck(
(conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(3): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(4): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(5): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(6): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(7): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(8): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(9): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(10): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(11): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(12): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(13): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(14): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(15): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(16): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(17): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(18): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(19): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(20): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(21): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(22): Bottleneck(
(conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
)
(layer4): Sequential(
(0): Bottleneck(
(conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
(downsample): Sequential(
(0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): Bottleneck(
(conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
(2): Bottleneck(
(conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)
(bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu): ReLU(inplace=True)
)
)
(avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
(fc): Sequential(
(0): Linear(in_features=2048, out_features=100, bias=True)
)
)

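Note:

(fc): Sequential( (0): Linear(in_features=2048, out_features=100, bias=True)

  • This shows that the fully connected layer has been replaced: the 1000-class output has become a 100-class output. In other words, the pretrained parameters that were loaded cover both the feature-extraction layers and the fully connected layer, but because the fully connected layer has been replaced, it must be trained again.

3.4 Show the network parameters that need training

# Check which parameters will be trained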
show_model_grad_state_enabled(model)
params to be trained:
fc.0.weight : True
fc.0.bias : True
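
The names `fc.0.weight` and `fc.0.bias` carry a `0` because the new head is wrapped in `nn.Sequential`, whose children are indexed from zero. A short check of the names and of the head's size, building only the head rather than the full ResNet-101:

```python
import torch.nn as nn

# Same construction as the replaced head: Sequential(Linear(2048, 100)).
head = nn.Sequential(nn.Linear(2048, 100))

names = [name for name, _ in head.named_parameters()]
count = sum(p.numel() for p in head.parameters())
print(names)  # ['0.weight', '0.bias']
print(count)  # 204900 = 2048 * 100 weights + 100 biases
```

With the feature-extraction layers frozen, these 204,900 values are the only ones the optimizer updates.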

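3.5 Reset the trainable parameters => convolutional layers + fully connected layer (new)

# Reset the network attribute: enable training for all layers
set_model_grad_state(model, True)

# Check which parameters will be trained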
show_model_grad_state_enabled(model)
params to be trained:
conv1.weight : True
bn1.weight : True
bn1.bias : True
layer1.0.conv1.weight : True
layer1.0.bn1.weight : True
layer1.0.bn1.bias : True
layer1.0.conv2.weight : True
layer1.0.bn2.weight : True
layer1.0.bn2.bias : True
layer1.0.conv3.weight : True
layer1.0.bn3.weight : True
layer1.0.bn3.bias : True
layer1.0.downsample.0.weight : True
layer1.0.downsample.1.weight : True
layer1.0.downsample.1.bias : True
layer1.1.conv1.weight : True
layer1.1.bn1.weight : True
layer1.1.bn1.bias : True
layer1.1.conv2.weight : True
layer1.1.bn2.weight : True
layer1.1.bn2.bias : True
layer1.1.conv3.weight : True
layer1.1.bn3.weight : True
layer1.1.bn3.bias : True
layer1.2.conv1.weight : True
layer1.2.bn1.weight : True
layer1.2.bn1.bias : True
layer1.2.conv2.weight : True
layer1.2.bn2.weight : True
layer1.2.bn2.bias : True
layer1.2.conv3.weight : True
layer1.2.bn3.weight : True
layer1.2.bn3.bias : True
layer2.0.conv1.weight : True
layer2.0.bn1.weight : True
layer2.0.bn1.bias : True
layer2.0.conv2.weight : True
layer2.0.bn2.weight : True
layer2.0.bn2.bias : True
layer2.0.conv3.weight : True
layer2.0.bn3.weight : True
layer2.0.bn3.bias : True
layer2.0.downsample.0.weight : True
layer2.0.downsample.1.weight : True
layer2.0.downsample.1.bias : True
layer2.1.conv1.weight : True
layer2.1.bn1.weight : True
layer2.1.bn1.bias : True
layer2.1.conv2.weight : True
layer2.1.bn2.weight : True
layer2.1.bn2.bias : True
layer2.1.conv3.weight : True
layer2.1.bn3.weight : True
layer2.1.bn3.bias : True
layer2.2.conv1.weight : True
layer2.2.bn1.weight : True
layer2.2.bn1.bias : True
layer2.2.conv2.weight : True
layer2.2.bn2.weight : True
layer2.2.bn2.bias : True
layer2.2.conv3.weight : True
layer2.2.bn3.weight : True
layer2.2.bn3.bias : True
layer2.3.conv1.weight : True
layer2.3.bn1.weight : True
layer2.3.bn1.bias : True
layer2.3.conv2.weight : True
layer2.3.bn2.weight : True
layer2.3.bn2.bias : True
layer2.3.conv3.weight : True
layer2.3.bn3.weight : True
layer2.3.bn3.bias : True
layer3.0.conv1.weight : True
layer3.0.bn1.weight : True
layer3.0.bn1.bias : True
layer3.0.conv2.weight : True
layer3.0.bn2.weight : True
layer3.0.bn2.bias : True
layer3.0.conv3.weight : True
layer3.0.bn3.weight : True
layer3.0.bn3.bias : True
layer3.0.downsample.0.weight : True
layer3.0.downsample.1.weight : True
layer3.0.downsample.1.bias : True
layer3.1.conv1.weight : True
layer3.1.bn1.weight : True
layer3.1.bn1.bias : True
layer3.1.conv2.weight : True
layer3.1.bn2.weight : True
layer3.1.bn2.bias : True
layer3.1.conv3.weight : True
layer3.1.bn3.weight : True
layer3.1.bn3.bias : True
layer3.2.conv1.weight : True
layer3.2.bn1.weight : True
layer3.2.bn1.bias : True
layer3.2.conv2.weight : True
layer3.2.bn2.weight : True
layer3.2.bn2.bias : True
layer3.2.conv3.weight : True
layer3.2.bn3.weight : True
layer3.2.bn3.bias : True
layer3.3.conv1.weight : True
layer3.3.bn1.weight : True
layer3.3.bn1.bias : True
layer3.3.conv2.weight : True
layer3.3.bn2.weight : True
layer3.3.bn2.bias : True
layer3.3.conv3.weight : True
layer3.3.bn3.weight : True
layer3.3.bn3.bias : True
layer3.4.conv1.weight : True
layer3.4.bn1.weight : True
layer3.4.bn1.bias : True
layer3.4.conv2.weight : True
layer3.4.bn2.weight : True
layer3.4.bn2.bias : True
layer3.4.conv3.weight : True
layer3.4.bn3.weight : True
layer3.4.bn3.bias : True
layer3.5.conv1.weight : True
layer3.5.bn1.weight : True
layer3.5.bn1.bias : True
layer3.5.conv2.weight : True
layer3.5.bn2.weight : True
layer3.5.bn2.bias : True
layer3.5.conv3.weight : True
layer3.5.bn3.weight : True
layer3.5.bn3.bias : True
layer3.6.conv1.weight : True
layer3.6.bn1.weight : True
layer3.6.bn1.bias : True
layer3.6.conv2.weight : True
layer3.6.bn2.weight : True
layer3.6.bn2.bias : True
layer3.6.conv3.weight : True
layer3.6.bn3.weight : True
layer3.6.bn3.bias : True
layer3.7.conv1.weight : True
layer3.7.bn1.weight : True
layer3.7.bn1.bias : True
layer3.7.conv2.weight : True
layer3.7.bn2.weight : True
layer3.7.bn2.bias : True
layer3.7.conv3.weight : True
layer3.7.bn3.weight : True
layer3.7.bn3.bias : True
layer3.8.conv1.weight : True
layer3.8.bn1.weight : True
layer3.8.bn1.bias : True
layer3.8.conv2.weight : True
layer3.8.bn2.weight : True
layer3.8.bn2.bias : True
layer3.8.conv3.weight : True
layer3.8.bn3.weight : True
layer3.8.bn3.bias : True
layer3.9.conv1.weight : True
layer3.9.bn1.weight : True
layer3.9.bn1.bias : True
layer3.9.conv2.weight : True
layer3.9.bn2.weight : True
layer3.9.bn2.bias : True
layer3.9.conv3.weight : True
layer3.9.bn3.weight : True
layer3.9.bn3.bias : True
layer3.10.conv1.weight : True
layer3.10.bn1.weight : True
layer3.10.bn1.bias : True
layer3.10.conv2.weight : True
layer3.10.bn2.weight : True
layer3.10.bn2.bias : True
layer3.10.conv3.weight : True
layer3.10.bn3.weight : True
layer3.10.bn3.bias : True
layer3.11.conv1.weight : True
layer3.11.bn1.weight : True
layer3.11.bn1.bias : True
layer3.11.conv2.weight : True
layer3.11.bn2.weight : True
layer3.11.bn2.bias : True
layer3.11.conv3.weight : True
layer3.11.bn3.weight : True
layer3.11.bn3.bias : True
layer3.12.conv1.weight : True
layer3.12.bn1.weight : True
layer3.12.bn1.bias : True
layer3.12.conv2.weight : True
layer3.12.bn2.weight : True
layer3.12.bn2.bias : True
layer3.12.conv3.weight : True
layer3.12.bn3.weight : True
layer3.12.bn3.bias : True
layer3.13.conv1.weight : True
layer3.13.bn1.weight : True
layer3.13.bn1.bias : True
layer3.13.conv2.weight : True
layer3.13.bn2.weight : True
layer3.13.bn2.bias : True
layer3.13.conv3.weight : True
layer3.13.bn3.weight : True
layer3.13.bn3.bias : True
layer3.14.conv1.weight : True
layer3.14.bn1.weight : True
layer3.14.bn1.bias : True
layer3.14.conv2.weight : True
layer3.14.bn2.weight : True
layer3.14.bn2.bias : True
layer3.14.conv3.weight : True
layer3.14.bn3.weight : True
layer3.14.bn3.bias : True
layer3.15.conv1.weight : True
layer3.15.bn1.weight : True
layer3.15.bn1.bias : True
layer3.15.conv2.weight : True
layer3.15.bn2.weight : True
layer3.15.bn2.bias : True
layer3.15.conv3.weight : True
layer3.15.bn3.weight : True
layer3.15.bn3.bias : True
layer3.16.conv1.weight : True
layer3.16.bn1.weight : True
layer3.16.bn1.bias : True
layer3.16.conv2.weight : True
layer3.16.bn2.weight : True
layer3.16.bn2.bias : True
layer3.16.conv3.weight : True
layer3.16.bn3.weight : True
layer3.16.bn3.bias : True
layer3.17.conv1.weight : True
layer3.17.bn1.weight : True
layer3.17.bn1.bias : True
layer3.17.conv2.weight : True
layer3.17.bn2.weight : True
layer3.17.bn2.bias : True
layer3.17.conv3.weight : True
layer3.17.bn3.weight : True
layer3.17.bn3.bias : True
layer3.18.conv1.weight : True
layer3.18.bn1.weight : True
layer3.18.bn1.bias : True
layer3.18.conv2.weight : True
layer3.18.bn2.weight : True
layer3.18.bn2.bias : True
layer3.18.conv3.weight : True
layer3.18.bn3.weight : True
layer3.18.bn3.bias : True
layer3.19.conv1.weight : True
layer3.19.bn1.weight : True
layer3.19.bn1.bias : True
layer3.19.conv2.weight : True
layer3.19.bn2.weight : True
layer3.19.bn2.bias : True
layer3.19.conv3.weight : True
layer3.19.bn3.weight : True
layer3.19.bn3.bias : True
layer3.20.conv1.weight : True
layer3.20.bn1.weight : True
layer3.20.bn1.bias : True
layer3.20.conv2.weight : True
layer3.20.bn2.weight : True
layer3.20.bn2.bias : True
layer3.20.conv3.weight : True
layer3.20.bn3.weight : True
layer3.20.bn3.bias : True
layer3.21.conv1.weight : True
layer3.21.bn1.weight : True
layer3.21.bn1.bias : True
layer3.21.conv2.weight : True
layer3.21.bn2.weight : True
layer3.21.bn2.bias : True
layer3.21.conv3.weight : True
layer3.21.bn3.weight : True
layer3.21.bn3.bias : True
layer3.22.conv1.weight : True
layer3.22.bn1.weight : True
layer3.22.bn1.bias : True
layer3.22.conv2.weight : True
layer3.22.bn2.weight : True
layer3.22.bn2.bias : True
layer3.22.conv3.weight : True
layer3.22.bn3.weight : True
layer3.22.bn3.bias : True
layer4.0.conv1.weight : True
layer4.0.bn1.weight : True
layer4.0.bn1.bias : True
layer4.0.conv2.weight : True
layer4.0.bn2.weight : True
layer4.0.bn2.bias : True
layer4.0.conv3.weight : True
layer4.0.bn3.weight : True
layer4.0.bn3.bias : True
layer4.0.downsample.0.weight : True
layer4.0.downsample.1.weight : True
layer4.0.downsample.1.bias : True
layer4.1.conv1.weight : True
layer4.1.bn1.weight : True
layer4.1.bn1.bias : True
layer4.1.conv2.weight : True
layer4.1.bn2.weight : True
layer4.1.bn2.bias : True
layer4.1.conv3.weight : True
layer4.1.bn3.weight : True
layer4.1.bn3.bias : True
layer4.2.conv1.weight : True
layer4.2.bn1.weight : True
layer4.2.bn1.bias : True
layer4.2.conv2.weight : True
layer4.2.bn2.weight : True
layer4.2.bn2.bias : True
layer4.2.conv3.weight : True
layer4.2.bn3.weight : True
layer4.2.bn3.bias : True
fc.0.weight : True
fc.0.bias : True

Note: all network parameters take part in training.
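
A listing like the one above can be produced, and the unfreezing step checked, with a few lines. The following is only a minimal sketch: it assumes a tiny stand-in network (`toy_model`, a hypothetical name) instead of the article's ResNet, freezes everything except the head as in Step 1, reopens the whole network as in Step 2, and prints each parameter's `requires_grad` flag in the same `name : flag` form.

```python
import torch.nn as nn

# Hypothetical stand-in for the real network: a small "feature extractor"
# followed by a classifier head named fc, loosely mirroring the ResNet layout.
toy_model = nn.Sequential()
toy_model.add_module("features", nn.Linear(8, 4))
toy_model.add_module("fc", nn.Linear(4, 2))

# Step 1 style: freeze everything except the head.
for name, param in toy_model.named_parameters():
    param.requires_grad = name.startswith("fc")
frozen_step1 = [n for n, p in toy_model.named_parameters() if not p.requires_grad]

# Step 2 style: open the whole network for training.
for param in toy_model.parameters():
    param.requires_grad = True

# Print the flags in the same form as the listing.
for name, param in toy_model.named_parameters():
    print(name, ":", param.requires_grad)

trainable = sum(p.numel() for p in toy_model.parameters() if p.requires_grad)
total = sum(p.numel() for p in toy_model.parameters())
print("trainable parameters:", trainable, "of", total)
```

Counting trainable parameters is a quick sanity check that nothing was left frozen by accident before a long training run starts.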

3.6 Load the previous checkpoint (new)

# Load the checkpoint produced by the earlier training run (Step 1)
checkpoint_file = "../models/checkpoints/resnet101_cifar100_checkpoint.pth"

checkpoint = torch.load(checkpoint_file)
model.load_state_dict(checkpoint["state_dict"])

# Read the accuracy stored with the checkpoint
best_accuarcy = checkpoint["best_accuracy"]
print("best_accuarcy of check point =", best_accuarcy)
best_accuarcy of check point = 60.83
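
The loading step can be wrapped in a small helper. This is a hedged sketch, not the article's code: `load_checkpoint`, the toy `nn.Linear` models and the temporary file are all hypothetical, and the checkpoint is assumed to be a dict with the keys `state_dict`, `best_accuracy` and `optimizer`, as written by the training loop in 4.1. The value 60.83 is reused from the output above purely as sample data.

```python
import os
import tempfile

import torch
import torch.nn as nn


def load_checkpoint(path, model, optimizer=None, device="cpu"):
    """Restore model weights (and optionally optimizer state) from a checkpoint dict."""
    # map_location lets a checkpoint saved on a GPU be opened on a CPU-only machine.
    checkpoint = torch.load(path, map_location=device)
    model.load_state_dict(checkpoint["state_dict"])
    if optimizer is not None:
        optimizer.load_state_dict(checkpoint["optimizer"])
    return checkpoint["best_accuracy"]


# Round trip with a hypothetical toy model and a temporary file.
source = nn.Linear(4, 2)
source_opt = torch.optim.SGD(source.parameters(), lr=1e-2, momentum=0.9)
path = os.path.join(tempfile.mkdtemp(), "toy_checkpoint.pth")
torch.save({"state_dict": source.state_dict(),
            "best_accuracy": 60.83,
            "optimizer": source_opt.state_dict()}, path)

target = nn.Linear(4, 2)
restored_accuracy = load_checkpoint(path, target)
print("best accuracy stored in checkpoint =", restored_accuracy)
```

Passing an optimizer also restores its saved hyperparameters, including the learning rate, because they are stored in the optimizer's `param_groups`. Step 2 deliberately uses a smaller learning rate, which is one reason to load only the model weights here and build a fresh optimizer.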

Chapter 4 Model training

4.1 Define the model training flow and strategy (key point, key point, key point)

# Transfer-learning training routine:
# train on the training set and validate on the validation set in the same loop.
# Strategy:
# keep the model and optimizer parameters that reach the highest accuracy on the
# whole validation set (not on a single validation batch) as the final parameters.
# Why the whole validation set rather than one batch: better generalization on the test set.
# Why the highest validation accuracy: it guards against overfitting the training set.
def model_train(model, train_loader, test_loader, criterion, optimizer, device, num_epoches = 1, check_point_filename=""):
    # Record the training start time
    time_train_start = time.time()
    print('+ Train start: num_epoches = {}'.format(num_epoches))

    # History data, used for plotting
    batch_loss_history = []
    batch_accuracy_history = []
    best_accuracy_history = []

    # Track the best accuracy so the model is saved at that point,
    # rather than once per epoch or only at the very end
    best_accuracy = 0
    best_epoch = 0

    # Use the current model parameters as the initial best model
    best_model_state = copy.deepcopy(model.state_dict())

    # Move the model to the target device (the GPU if one is available)
    model.to(device)

    # Epoch level
    for epoch in range(num_epoches):
        time_epoch_start = time.time()
        print('++ Epoch start: {}/{}'.format(epoch, num_epoches-1))

        epoch_size = 0
        epoch_loss_sum = 0
        epoch_corrects = 0

        # Dataset level
        # Each epoch runs one full pass over the training samples and one over the validation samples
        for dataset in ["train", "valid"]:
            time_dataset_start = time.time()
            print('+++ dataset start: epoch = {}, dataset = {}'.format(epoch, dataset))

            if dataset == "train":
                model.train() # training mode
                data_loader = train_loader
            else:
                model.eval() # evaluation mode
                data_loader = test_loader

            dataset_size = len(data_loader.dataset)
            dataset_loss_sum = 0
            dataset_corrects = 0

            # Batch level
            # begin to operate in mode
            for batch, (inputs, labels) in enumerate(data_loader):
                # (0) batch size
                batch_size = inputs.size(0)

                # (1) Move the data to the device that does the computation
                inputs = inputs.to(device)
                labels = labels.to(device)

                # (2) Reset the optimizer gradients
                optimizer.zero_grad()

                # Session level: gradients are tracked only during training
                with torch.set_grad_enabled(dataset == "train"):
                    # (3) Forward pass
                    outputs = model(inputs)

                    # (4) Compute the loss
                    loss = criterion(outputs, labels)

                    if dataset == "train":
                        # (5) Backward pass
                        loss.backward()

                        # (6) Update the parameters
                        optimizer.step()

                # (7-1) Loss of the current batch (training and validation alike)
                batch_loss = loss.item()

                # (7-2) Correct samples and accuracy of the current batch (training and validation alike)
                # The index with the highest score is the predicted class
                _, predicteds = torch.max(outputs, 1)
                batch_corrects = (predicteds == labels.data).sum().item()
                batch_accuracy = 100*batch_corrects/batch_size

                # (8-1) Accumulate the total loss of the current dataset
                dataset_loss_sum += batch_loss * batch_size

                # (8-2) Accumulate the number of correct samples of the current dataset
                dataset_corrects += batch_corrects

                # Append the results to the history log for later plotting
                batch_loss_history.append(batch_loss)
                batch_accuracy_history.append(batch_accuracy)

                if batch % 100 == 0:
                    print('++++ batch done: epoch = {}, dataset = {}, batch = {}/{}, loss = {:.4f}, accuracy = {:.4f}%'.format(epoch, dataset, batch, dataset_size//batch_size, batch_loss, batch_accuracy))

            # Average loss over the dataset
            dataset_loss_average = dataset_loss_sum/dataset_size

            # Average accuracy over the dataset
            dataset_accuracy_average = 100*dataset_corrects/dataset_size

            # Accumulate the total loss of the current epoch
            epoch_loss_sum += dataset_loss_sum

            # Accumulate the number of correct samples of the current epoch
            epoch_corrects += dataset_corrects

            # epoch_size
            epoch_size += dataset_size

            # Saving policy: save the parameters every time the accuracy on the
            # validation set improves, which guards against overfitting
            if (dataset == "valid") and (dataset_accuracy_average > best_accuracy):
                # Remember the best accuracy so far
                best_accuracy = dataset_accuracy_average
                # Remember the best epoch (useful for spotting overfitting)
                best_epoch = epoch

                print('+++ model save with new best_accuracy = {}'.format(best_accuracy))
                # Snapshot the current model parameters
                best_model_state = copy.deepcopy(model.state_dict())
                state = {
                    "state_dict": model.state_dict(),
                    "best_accuracy": best_accuracy,
                    "optimizer": optimizer.state_dict(),
                }
                if (check_point_filename != ""):
                    torch.save(state, check_point_filename)

            best_accuracy_history.append(best_accuracy)

            time_dataset_done = time.time()
            time_dataset_elapsed = time_dataset_done - time_dataset_start
            print('+++ dataset done:epoch = {}, dataset = {}, loss = {:.4f}, accuracy = {:.4f}%, elapsed time = {:0f}m {:.0f}s'.format(epoch, dataset, dataset_loss_average, dataset_accuracy_average, time_dataset_elapsed//60, time_dataset_elapsed %60))

        # Average loss over the epoch (training and validation passes combined)
        epoch_loss_average = epoch_loss_sum/epoch_size

        # Average accuracy over the epoch (training and validation passes combined)
        epoch_accuarcy_average = 100*epoch_corrects/epoch_size

        time_epoch_done = time.time()
        time_epoch_elapsed = time_epoch_done - time_epoch_start

        print('++ epoch done: epoch = {}, loss = {:.4f}, accuracy = {:.4f}%, elapsed time = {:0f}m {:.0f}s'.format(epoch, epoch_loss_average, epoch_accuarcy_average, time_epoch_elapsed//60, time_epoch_elapsed %60))

    # Restore the best model
    model.load_state_dict(best_model_state)

    # Record the training end time
    time_train_done = time.time()
    time_train_elapsed = time_train_done - time_train_start
    print('+ Train Finished: best_epoch = {}, best_accuracy = {}, elapsed time = {:0f}m {:.0f}s'.format(best_epoch, best_accuracy, time_train_elapsed//60, time_train_elapsed %60))

    return (model, batch_loss_history, batch_accuracy_history, best_accuracy_history)
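
The selection strategy used by `model_train` can be isolated from the training mechanics. The sketch below is a simplified illustration in plain Python, not the article's code: `track_best` is a hypothetical helper and the accuracy values are made up. It shows how a strict "greater than" test on the full-validation-set accuracy decides which epoch's parameters are kept.

```python
def track_best(valid_accuracies):
    """Return (best_epoch, best_accuracy, history) for per-epoch validation accuracies.

    A new best is recorded only on a strict improvement, mirroring the
    `dataset_accuracy_average > best_accuracy` test in model_train.
    """
    best_epoch, best_accuracy = 0, 0
    history = []
    for epoch, accuracy in enumerate(valid_accuracies):
        if accuracy > best_accuracy:
            # This is the point where a checkpoint would be written.
            best_epoch, best_accuracy = epoch, accuracy
        history.append(best_accuracy)
    return best_epoch, best_accuracy, history


# Hypothetical full-validation-set accuracies for five epochs.
valid_accuracies = [71.2, 76.5, 75.9, 79.8, 78.4]
best_epoch, best_accuracy, history = track_best(valid_accuracies)
print("best_epoch =", best_epoch, "best_accuracy =", best_accuracy)
print("history =", history)
```

In this made-up run the last epoch is worse than epoch 3, so restoring the best state at the end would return the epoch-3 weights rather than the final ones.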

4.2 Specify the loss function for backpropagation

# Specify the loss function
loss_fn = nn.CrossEntropyLoss()
#loss_fn = nn.NLLLoss()
print(loss_fn)
CrossEntropyLoss()
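
`nn.CrossEntropyLoss` takes raw logits and combines a log-softmax with a negative log-likelihood, which is why the commented-out `nn.NLLLoss` alternative would need log-probabilities as input instead. The following plain-Python sketch works through that decomposition for a single sample; the helper names and the logits are hypothetical and chosen only for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw scores."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_sum for x in logits]


def nll(log_probs, target):
    """Negative log-likelihood of the target class."""
    return -log_probs[target]


def cross_entropy(logits, target):
    """Cross-entropy on raw logits: log-softmax followed by NLL."""
    return nll(log_softmax(logits), target)


# Hypothetical logits for one sample with three classes; the true class is 0.
logits = [2.0, 1.0, 0.1]
loss = cross_entropy(logits, target=0)
print("loss = {:.4f}".format(loss))
```

With equal logits over N classes the loss is ln(N), about 4.6 for the 100 classes of CIFAR100, which gives a reference point for how far training has moved from chance level.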

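4.3 Specify the optimizer/algorithm for backpropagation

# Specify the optimizer
Learning_rate = 1e-4 # learning rate (100x smaller than in Step 1)

# optimizer = SGD: basic gradient descent (used here with momentum)
# parameters: the list of parameters to optimize
# lr: the learning rate
#optimizer = torch.optim.Adam(model.parameters(), lr = Learning_rate)
optimizer = torch.optim.SGD(model.parameters(), lr = Learning_rate, momentum=0.9)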
print(optimizer)
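
To make the roles of `lr` and `momentum` concrete, here is a small plain-Python sketch of the SGD-with-momentum update on a single scalar. It is an illustration under stated assumptions: the objective f(w) = w**2 is a toy, `sgd_momentum_step` is a hypothetical helper, and the learning rate 0.1 is chosen so the effect is visible in three steps rather than taken from the article.

```python
def sgd_momentum_step(param, grad, velocity, lr, momentum):
    """One SGD-with-momentum update on a scalar parameter.

    The velocity accumulates past gradients; the parameter moves against it.
    """
    velocity = momentum * velocity + grad
    param = param - lr * velocity
    return param, velocity


# Toy objective f(w) = w**2 with gradient 2*w (an assumption for illustration).
w, v = 1.0, 0.0
for step in range(3):
    w, v = sgd_momentum_step(w, grad=2.0 * w, velocity=v, lr=0.1, momentum=0.9)
    print("step {}: w = {:.4f}".format(step, w))
```

Because the velocity carries over between steps, each update is larger than the plain gradient step would be. A small learning rate keeps those accumulated steps gentle, which suits fine-tuning weights that are already well trained.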

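4.4 Preparation before training

# Preparation before training
# Check whether a GPU is available: train on the GPU if so, otherwise on the CPU
if torch.cuda.is_available():
    device_name = "cuda:0"
else:
    device_name = "cpu"

# Create the torch device object
device = torch.device(device_name)
print(device)

# Move the model to the selected device
model = model.to(device)

# Move the loss computation to the selected device
loss_fn = loss_fn.to(device) # adapts to whichever device was selected
#loss_fn.cuda() # forces CUDA

# Path for saving the trained model
model_trained_path = "../models/checkpoint.pth"

# Number of training epochs
epochs = 5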

4.5 Start training

checkpoint_file = "../checkpoints/alexnet_checkpoint.pth"
model, batch_loss_history, batch_accuracy_history, best_accuracy_history = model_train(
    model = model,
    train_loader = train_loader,
    test_loader = test_loader,
    criterion = loss_fn,
    optimizer = optimizer,
    device = device,
    num_epoches = epochs,
    check_point_filename = checkpoint_file)
..............................................
++ Epoch start: 4/4
+++ dataset start: epoch = 4, dataset = train
++++ batch done: epoch = 4, dataset = train, batch = 0/1562, loss = 0.1395, accuracy = 96.8750%
++++ batch done: epoch = 4, dataset = train, batch = 200/1562, loss = 0.2769, accuracy = 84.3750%
++++ batch done: epoch = 4, dataset = train, batch = 400/1562, loss = 0.3304, accuracy = 90.6250%
++++ batch done: epoch = 4, dataset = train, batch = 600/1562, loss = 0.0935, accuracy = 100.0000%
++++ batch done: epoch = 4, dataset = train, batch = 800/1562, loss = 0.1541, accuracy = 93.7500%
++++ batch done: epoch = 4, dataset = train, batch = 1000/1562, loss = 0.0708, accuracy = 100.0000%
++++ batch done: epoch = 4, dataset = train, batch = 1200/1562, loss = 0.3096, accuracy = 90.6250%
++++ batch done: epoch = 4, dataset = train, batch = 1400/1562, loss = 0.2055, accuracy = 93.7500%
+++ dataset done:epoch = 4, dataset = train, loss = 0.1884, accuracy = 95.0060%, elapsed time = 9.000000m 15s
+++ dataset start: epoch = 4, dataset = valid
++++ batch done: epoch = 4, dataset = valid, batch = 0/312, loss = 0.5524, accuracy = 81.2500%
++++ batch done: epoch = 4, dataset = valid, batch = 200/312, loss = 0.2919, accuracy = 90.6250%
+++ model save with new best_accuracy = 80.69
+++ dataset done:epoch = 4, dataset = valid, loss = 0.6712, accuracy = 80.6900%, elapsed time = 0.000000m 41s
++ epoch done: epoch = 4, loss = 0.2688, accuracy = 92.6200%, elapsed time = 9.000000m 56s
+ Train Finished: best_epoch = 4, best_accuracy = 80.69, elapsed time = 50.000000m 3s

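Notes:

In epoch 4, the last epoch, the model reached 80.69% accuracy on the validation set. Because the best result came in the final epoch, further training could still improve it.

The training-set accuracy (95.01%) is clearly higher than the validation-set accuracy.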
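
The `++ epoch done` line in the log mixes both passes, because `epoch_corrects` and `epoch_size` accumulate over the training and the validation dataset. The sketch below redoes that arithmetic with the epoch-4 figures. The correct-sample counts are back-computed from the printed percentages, so treat them as an assumption rather than logged values.

```python
# Figures from the epoch-4 log; the correct counts are back-computed from the
# printed percentages (95.0060% of 50000 and 80.6900% of 10000).
train_size, train_correct = 50000, 47503
valid_size, valid_correct = 10000, 8069

train_acc = 100 * train_correct / train_size
valid_acc = 100 * valid_correct / valid_size

# model_train sums corrects and sizes over both passes before dividing.
epoch_acc = 100 * (train_correct + valid_correct) / (train_size + valid_size)

# Gap between training and validation accuracy, a rough overfitting signal.
gap = train_acc - valid_acc

print("train = {:.4f}%, valid = {:.4f}%".format(train_acc, valid_acc))
print("epoch = {:.4f}%, gap = {:.2f} points".format(epoch_acc, gap))
```

The combined figure reproduces the 92.62% shown in the log. Since the training set is five times larger, the epoch-level number is dominated by training accuracy, so the validation line is the one to watch when judging generalization.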

Chapter 5 Model evaluation

5.1 Visualize the loss curve

# Plot the batch loss history
plt.grid()
plt.xlabel("iters")
plt.ylabel("")
plt.title("Batch loss", fontsize = 12)
plt.plot(batch_loss_history, "r")
plt.show()


[Figure: batch loss versus training iterations]

The figure above shows that the loss has converged.

5.2 Visualize the accuracy curve

# Plot the batch accuracy history
plt.grid()
plt.xlabel("iters")
plt.ylabel("%")
plt.title("Batch accuracy", fontsize = 12)
plt.plot(batch_accuracy_history, "b+")
plt.show()


[Figure: batch accuracy (%) versus training iterations]

The figure above shows that the accuracy has converged.

5.3 Visualize the best-accuracy curve

# Plot the best-accuracy history
plt.grid()
plt.xlabel("iters")
plt.ylabel("%")
plt.title("best accuracy", fontsize = 12)
plt.plot(best_accuracy_history, "b+")
plt.show()

5.4 Define the model evaluation function

# Define the model evaluation function
def model_eval(model, data_loader, device):
    print("model_eval start")
    # No gradients are computed during evaluation
    with torch.no_grad():

        # Move the model to the given device
        model = model.to(device)

        # Put the model in evaluation mode
        model = model.eval()

        dataset_len = len(data_loader.dataset)
        dataset_corrects = 0
        dataset_size = 0

        # Evaluate batch by batch
        for batch, (inputs, labels) in enumerate(data_loader):
            inputs = inputs.to(device)
            labels = labels.to(device)

            # Size of the current batch
            batch_size = inputs.size(0)

            # Predict for every sample in the batch
            outputs = model(inputs)

            # For each sample, pick the most likely class
            _, predicted = torch.max(outputs.data, 1)

            # Compare the predictions with the labels
            bool_results = (predicted == labels)

            # Number of correctly predicted samples
            batch_corrects = bool_results.sum().item()

            # Accuracy of the current batch
            batch_accuracy = 100 * batch_corrects/batch_size

            if batch % 100 == 0:
                print('batch {} In {} accuracy = {:.4f}'.format(batch, dataset_len/batch_size, batch_accuracy))

            # Accumulate the counts over all batches
            dataset_corrects += batch_corrects
            dataset_size += batch_size

        dataset_len_accuracy = 100 * dataset_corrects/dataset_len
        dataset_size_accuracy = 100 * dataset_corrects/dataset_size

        print('model_eval done: Final accuracy = {}/{} = {:.4f}'.format(dataset_corrects, dataset_len, dataset_len_accuracy))
        print('model_eval done: Final accuracy = {}/{} = {:.4f}'.format(dataset_corrects, dataset_size, dataset_size_accuracy))
        return (dataset_len_accuracy, dataset_size_accuracy)
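
The core of the evaluation is a top-1 comparison: `torch.max(outputs, 1)` returns the index of the highest score per sample, and that index is compared with the label. The plain-Python sketch below mirrors the same logic without tensors; `argmax`, `top1_accuracy` and the scores are hypothetical and exist only to make the computation easy to follow.

```python
def argmax(scores):
    """Index of the largest score (the first one wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def top1_accuracy(batch_outputs, labels):
    """Return (accuracy in percent, number of correct samples) for one batch."""
    predicted = [argmax(row) for row in batch_outputs]
    corrects = sum(int(p == y) for p, y in zip(predicted, labels))
    return 100 * corrects / len(labels), corrects


# Hypothetical scores for a batch of four samples and three classes.
outputs = [[0.1, 2.3, -1.0],
           [1.5, 0.2, 0.3],
           [0.0, 0.1, 0.9],
           [0.4, 0.6, 0.5]]
labels = [1, 0, 1, 1]

accuracy, corrects = top1_accuracy(outputs, labels)
print("corrects = {}, accuracy = {:.4f}".format(corrects, accuracy))
```

Only the ranking of the scores matters here, so the raw logits can be compared directly without applying a softmax first.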

5.5 Evaluate on the training set

# Evaluate on the training set
model_eval(model = model, data_loader = train_loader, device = device)
model_eval start
batch 0 In 1562.5 accuracy = 81.2500
batch 100 In 1562.5 accuracy = 71.8750
batch 200 In 1562.5 accuracy = 78.1250
batch 300 In 1562.5 accuracy = 78.1250
batch 400 In 1562.5 accuracy = 65.6250
batch 500 In 1562.5 accuracy = 68.7500
batch 600 In 1562.5 accuracy = 78.1250
batch 700 In 1562.5 accuracy = 59.3750
batch 800 In 1562.5 accuracy = 81.2500
batch 900 In 1562.5 accuracy = 75.0000
batch 1000 In 1562.5 accuracy = 68.7500
batch 1100 In 1562.5 accuracy = 87.5000
batch 1200 In 1562.5 accuracy = 65.6250
batch 1300 In 1562.5 accuracy = 78.1250
batch 1400 In 1562.5 accuracy = 78.1250
batch 1500 In 1562.5 accuracy = 84.3750
model_eval done: Final accuracy = 36901/50000 = 73.8020
model_eval done: Final accuracy = 36901/50000 = 73.8020

Out[66]:

(73.802, 73.802)
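
The log prints `In 1562.5` because `dataset_len/batch_size` is a true division. A fractional result means the last batch is only partly filled, which is why the code counts samples per batch instead of assuming equal batches. The sketch below works out the layout; the batch size of 32 is inferred from the log (50000 / 1562.5), not stated in this section, and `batch_layout` is a hypothetical helper that assumes no samples are dropped.

```python
import math


def batch_layout(dataset_len, batch_size):
    """Return (number of batches, size of the last batch) when no samples are dropped."""
    num_batches = math.ceil(dataset_len / batch_size)
    last_batch = dataset_len - (num_batches - 1) * batch_size
    return num_batches, last_batch


# Batch size 32 is inferred from the log (50000 / 1562.5), not stated in this section.
for name, size in [("train", 50000), ("test", 10000)]:
    num_batches, last_batch = batch_layout(size, 32)
    print("{}: {} / 32 = {} -> {} batches, last batch has {} samples".format(
        name, size, size / 32, num_batches, last_batch))
```

Under that assumption both datasets end with a half-sized batch of 16 samples. Weighting each batch by its own size, as `dataset_loss_sum += batch_loss * batch_size` does in the training loop, keeps the averages exact in that case.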

5.6 Evaluate on the test set

# Evaluate on the test set
model_eval(model = model, data_loader = test_loader, device = device)
model_eval start
batch 0 In 312.5 accuracy = 50.0000
batch 100 In 312.5 accuracy = 65.6250
batch 200 In 312.5 accuracy = 71.8750
batch 300 In 312.5 accuracy = 53.1250
model_eval done: Final accuracy = 6725/10000 = 67.2500
model_eval done: Final accuracy = 6725/10000 = 67.2500

Out[67]:

(67.25, 67.25)


Chapter 6 Saving the model

# Save the whole model
torch.save(model, "../models/after-trained/resetnet_model_cifar100.pkl")

# Save only the parameters
torch.save(model.state_dict() , "../models/after-trained/resetnet_model_cifar100_param.pkl")

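Chapter 7 Author's reflections

After further training, overall performance improved by about 10 percentage points over the baseline.

There is still room for further improvement.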