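Model Training
In model training, one complete training step consists of three stages:
Forward computation: the model produces predictions (logits), and the loss is computed between the logits and the correct labels (label).
Backpropagation: the automatic differentiation mechanism computes the gradients of the loss with respect to the model parameters (parameters).
Parameter optimization: the gradients are used to update the parameters.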
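The three stages can be illustrated without any framework. This is a minimal sketch, not MindSpore code. It assumes a hypothetical one-parameter model y = w·x with a mean-squared-error loss. Where MindSpore would use automatic differentiation, the gradient here is derived by hand.

```python
# Hypothetical toy data following y = 3x, so the optimal weight is 3.0
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]


def train_step(w, lr=0.01):
    n = len(xs)
    # 1. Forward computation: predictions and mean squared-error loss
    preds = [w * x for x in xs]
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n
    # 2. Backpropagation (derived by hand here): dL/dw = 2/n * sum((p - y) * x)
    grad = 2.0 / n * sum((p - y) * x for p, y, x in zip(preds, ys, xs))
    # 3. Parameter optimization: move w against the gradient
    w = w - lr * grad
    return w, loss


w = 0.0
for _ in range(200):
    w, loss = train_step(w)
print(f"w = {w:.4f}, loss = {loss:.6f}")
```

Each call to `train_step` performs exactly one forward pass, one gradient computation and one update. This matches the structure of the MindSpore training step that follows.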
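MindSpore uses a functional automatic differentiation mechanism, so the steps above are implemented as follows:
Define a forward computation function.
Use value_and_grad to transform it into a function that also computes gradients.
Define a training function that switches the model to training mode with set_train and performs the forward computation, backpropagation, and parameter optimization.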
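The following is a hypothetical, framework-free stand-in for value_and_grad, intended only to show what "function transformation" means. Several details differ from MindSpore and are assumptions of this sketch:

- The helper name `numeric_value_and_grad` is ours.
- It estimates gradients with central finite differences rather than automatic differentiation.
- It differentiates a flat list of float parameters instead of taking `grad_position` and `weights` arguments.

It does mirror the `has_aux=True` contract. The wrapped function returns `(loss, aux)`, only `loss` is differentiated, and the call yields `((loss, aux), grads)`.

```python
def numeric_value_and_grad(fn, has_aux=False, eps=1e-6):
    """Hypothetical helper: turns fn(params) into g(params) -> (output, grads).

    Gradients are estimated with central finite differences,
    which is NOT how MindSpore computes them.
    """
    def scalar(params):
        out = fn(params)
        # With has_aux=True only the first output is differentiated
        return out[0] if has_aux else out

    def wrapped(params):
        out = fn(params)
        grads = []
        for i in range(len(params)):
            plus = list(params)
            minus = list(params)
            plus[i] += eps
            minus[i] -= eps
            grads.append((scalar(plus) - scalar(minus)) / (2 * eps))
        return out, grads

    return wrapped


# Hypothetical forward function with parameters [w, b]; returns (loss, logits)
def forward_fn(params):
    w, b = params
    logits = [w * x + b for x in (1.0, 2.0)]
    loss = sum((z - t) ** 2 for z, t in zip(logits, (2.0, 3.0)))
    return loss, logits  # logits is the auxiliary output


grad_fn = numeric_value_and_grad(forward_fn, has_aux=True)
(loss, logits), grads = grad_fn([0.0, 0.0])
print(loss, logits, grads)
```

The unpacking `(loss, logits), grads = grad_fn(...)` has the same shape as `(loss, _), grads = grad_fn(data, label)` in the training step below.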
```python
import mindspore
from mindspore import nn

# `model` is the network defined earlier in this tutorial.
# Instantiate loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = nn.SGD(model.trainable_params(), 1e-2)

# 1. Define forward function
def forward_fn(data, label):
    logits = model(data)
    loss = loss_fn(logits, label)
    return loss, logits

# 2. Get gradient function
# grad_position=None: no gradients w.r.t. the inputs; weights: differentiate w.r.t. the parameters;
# has_aux=True: only the first output (loss) is differentiated, logits are passed through
grad_fn = mindspore.value_and_grad(forward_fn, None, optimizer.parameters, has_aux=True)

# 3. Define function of one-step training
def train_step(data, label):
    (loss, _), grads = grad_fn(data, label)
    optimizer(grads)
    return loss

def train(model, dataset):
    size = dataset.get_dataset_size()  # number of batches per epoch
    model.set_train()
    for batch, (data, label) in enumerate(dataset.create_tuple_iterator()):
        loss = train_step(data, label)

        if batch % 100 == 0:
            loss, current = loss.asnumpy(), batch
            print(f"loss: {loss:>7f} [{current:>3d}/{size:>3d}]")
```
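In train_step, `optimizer(grads)` writes the gradients back into the parameters. This sketch assumes nn.SGD keeps its defaults for everything except the learning rate, i.e. no momentum and no weight decay. Under that assumption, each parameter p is updated as p ← p − lr · g with lr = 1e-2.

The plain-Python code below applies that rule to a list of floats. Two points differ from MindSpore:

- `sgd_update` is a hypothetical helper, not part of the library.
- It returns new values instead of updating parameters in place.

```python
def sgd_update(params, grads, lr=1e-2):
    """Hypothetical helper: vanilla SGD step without momentum or weight decay."""
    return [p - lr * g for p, g in zip(params, grads)]


# Hypothetical parameter values and gradients
params = [0.5, -1.0]
grads = [2.0, -4.0]
params = sgd_update(params, grads)
print(params)
```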
In addition to the training function, we define a test function to evaluate the model's performance.
```python
def test(model, dataset, loss_fn):
    num_batches = dataset.get_dataset_size()
    model.set_train(False)  # switch to inference mode
    total, test_loss, correct = 0, 0, 0
    for data, label in dataset.create_tuple_iterator():
        pred = model(data)
        total += len(data)
        test_loss += loss_fn(pred, label).asnumpy()  # mean loss of this batch
        correct += (pred.argmax(1) == label).asnumpy().sum()  # correctly classified samples
    test_loss /= num_batches  # average over batches
    correct /= total  # fraction of correct predictions
    print(f"Test: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
```
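The test function treats accuracy and loss differently:

- **Accuracy** is computed per sample: correct predictions and total samples are accumulated across batches, then divided.
- **Loss** is computed per batch: assuming nn.CrossEntropyLoss keeps its default mean reduction, each batch contributes its mean loss, and the sum is divided by the number of batches.

When the last batch is smaller, the reported loss is therefore a batch-level average rather than an exact per-sample average. The pure-Python sketch below shows both computations. It uses hypothetical logits, labels and per-sample losses, and its `argmax` helper stands in for `pred.argmax(1)`.

```python
def argmax(row):
    """Index of the largest logit, like pred.argmax(1) for one sample."""
    return max(range(len(row)), key=lambda i: row[i])


# Hypothetical data: two batches of unequal size (4 samples, then 2).
# Each entry: (logits per sample, labels, per-sample losses)
batches = [
    ([[2.0, 0.1], [0.2, 1.5], [0.9, 0.3], [1.0, 1.2]], [0, 1, 1, 1], [0.2, 0.3, 1.1, 0.4]),
    ([[0.1, 3.0], [0.5, 0.4]], [1, 0], [0.1, 0.5]),
]

total, correct = 0, 0
sum_of_batch_means, sum_of_sample_losses = 0.0, 0.0
for logits, labels, losses in batches:
    preds = [argmax(row) for row in logits]
    correct += sum(p == t for p, t in zip(preds, labels))
    total += len(labels)
    sum_of_batch_means += sum(losses) / len(losses)  # one mean loss per batch
    sum_of_sample_losses += sum(losses)

accuracy = correct / total
avg_loss_per_batch = sum_of_batch_means / len(batches)  # what test() reports
avg_loss_per_sample = sum_of_sample_losses / total      # exact per-sample mean
print(accuracy, avg_loss_per_batch, avg_loss_per_sample)
```

With equal-sized batches the two loss averages coincide. The gap only comes from the smaller final batch.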
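Training iterates over the dataset several times, and one complete pass is called an epoch. In each epoch, the model is first trained on the whole training set and then evaluated on the test set. The output below prints the training loss at regular intervals, followed by the test accuracy (Accuracy) and average loss for that epoch. Across the three epochs the test loss falls (0.53 → 0.33 → 0.28) and the accuracy rises (83.8% → 90.2% → 91.9%).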
```python
epochs = 3
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(model, train_dataset)
    test(model, test_dataset, loss_fn)
print("Done!")
```
```
Epoch 1
-------------------------------
loss: 2.302088 [ 0/938]
loss: 2.290692 [100/938]
loss: 2.266338 [200/938]
loss: 2.205240 [300/938]
loss: 1.907198 [400/938]
loss: 1.455603 [500/938]
loss: 0.861103 [600/938]
loss: 0.767219 [700/938]
loss: 0.422253 [800/938]
loss: 0.513922 [900/938]
Test:
Accuracy: 83.8%, Avg loss: 0.529534
Epoch 2
-------------------------------
loss: 0.580867 [ 0/938]
loss: 0.479347 [100/938]
loss: 0.677991 [200/938]
loss: 0.550141 [300/938]
loss: 0.226565 [400/938]
loss: 0.314738 [500/938]
loss: 0.298739 [600/938]
loss: 0.459540 [700/938]
loss: 0.332978 [800/938]
loss: 0.406709 [900/938]
Test:
Accuracy: 90.2%, Avg loss: 0.334828
Epoch 3
-------------------------------
loss: 0.461890 [ 0/938]
loss: 0.242303 [100/938]
loss: 0.281414 [200/938]
loss: 0.207835 [300/938]
loss: 0.206000 [400/938]
loss: 0.409646 [500/938]
loss: 0.193608 [600/938]
loss: 0.217575 [700/938]
loss: 0.212817 [800/938]
loss: 0.202862 [900/938]
Test:
Accuracy: 91.9%, Avg loss: 0.280962
Done!
```
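The 938 in the log is the value returned by get_dataset_size(), which counts batches, not samples, for a batched dataset. This section does not show the dataset setup, so the following values are assumptions:

- The training set holds 60,000 images, as in MNIST.
- The batch size is 64.
- The final partial batch is kept.

Under those assumptions the count follows from a ceiling division. The condition `batch % 100 == 0` then yields ten log lines per epoch.

```python
import math

num_samples = 60_000  # assumed training-set size (MNIST)
batch_size = 64       # assumed batch size set when the dataset was batched

# The last, partial batch still counts as one batch
num_batches = math.ceil(num_samples / batch_size)
last_batch_size = num_samples - (num_batches - 1) * batch_size

# Steps at which train() prints the loss
logged_steps = [b for b in range(num_batches) if b % 100 == 0]
print(num_batches, last_batch_size, logged_steps)
```

Under these assumptions the final batch holds only 32 samples (60,000 − 937 × 64). This is the kind of smaller last batch that makes the batch-averaged test loss differ slightly from a per-sample average.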
For more details, see Model Training.