Model Training

In model training, one complete training step consists of three stages:

Forward computation: the model predicts a result (logits) and computes the prediction loss (loss) against the correct label (label).

Backpropagation: the automatic differentiation mechanism computes the gradients of the loss with respect to the model parameters (parameters).

Parameter optimization: the gradients are applied to update the parameters.

MindSpore uses a functional automatic differentiation mechanism, so the steps above are implemented as follows:

Define the forward computation function.

Use value_and_grad to obtain the gradient computation function via function transformation.

Define the training function, call set_train to switch to training mode, and perform the forward computation, backpropagation, and parameter optimization.
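Conceptually, value_and_grad transforms a function into a new function that returns both the original output and the gradients. As a rough pure-Python analogue (a numerical gradient on a toy scalar function — not the MindSpore API, which differentiates whole parameter tensors via autodiff):

```python
def value_and_grad_sketch(fn, eps=1e-6):
    """Return a function computing (fn(w), d fn / d w) by central differences.

    A toy stand-in for mindspore.value_and_grad on a single scalar
    parameter, for intuition only.
    """
    def wrapped(w):
        value = fn(w)
        # Central-difference approximation of the derivative at w.
        grad = (fn(w + eps) - fn(w - eps)) / (2 * eps)
        return value, grad
    return wrapped

loss_fn_toy = lambda w: (w - 3.0) ** 2   # toy loss with minimum at w = 3
grad_fn_toy = value_and_grad_sketch(loss_fn_toy)
value, grad = grad_fn_toy(1.0)           # loss value and its slope at w = 1
```

Here `value` is the loss at w = 1 and `grad` is negative, indicating the parameter should move toward the minimum — exactly the information the real grad_fn provides for every trainable tensor.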

[11]:

# Instantiate loss function and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = nn.SGD(model.trainable_params(), 1e-2)

# 1. Define forward function
def forward_fn(data, label):
    logits = model(data)
    loss = loss_fn(logits, label)
    return loss, logits

# 2. Get gradient function
grad_fn = mindspore.value_and_grad(forward_fn, None, optimizer.parameters, has_aux=True)

# 3. Define function of one-step training
def train_step(data, label):
    (loss, _), grads = grad_fn(data, label)
    optimizer(grads)
    return loss

def train(model, dataset):
    size = dataset.get_dataset_size()
    model.set_train()
    for batch, (data, label) in enumerate(dataset.create_tuple_iterator()):
        loss = train_step(data, label)

        if batch % 100 == 0:
            loss, current = loss.asnumpy(), batch
            print(f"loss: {loss:>7f} [{current:>3d}/{size:>3d}]")
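The optimizer(grads) call in train_step applies the SGD update rule w ← w − lr·g to every trainable parameter. As a minimal pure-Python sketch with made-up numbers (not the MindSpore API):

```python
# SGD update sketch: w <- w - lr * g, applied element-wise.
# The parameter and gradient values below are hypothetical.
lr = 1e-2                      # same learning rate as nn.SGD above
params = [0.5, -1.0]           # toy parameter values
grads = [0.2, -0.4]            # toy gradients
params = [w - lr * g for w, g in zip(params, grads)]
```

Each parameter moves a small step against its gradient; nn.SGD performs the same update on the model's parameter tensors in place.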

In addition to training, we define a test function to evaluate the model's performance.

[12]:

def test(model, dataset, loss_fn):
    num_batches = dataset.get_dataset_size()
    model.set_train(False)
    total, test_loss, correct = 0, 0, 0
    for data, label in dataset.create_tuple_iterator():
        pred = model(data)
        total += len(data)
        test_loss += loss_fn(pred, label).asnumpy()
        correct += (pred.argmax(1) == label).asnumpy().sum()
    test_loss /= num_batches
    correct /= total
    print(f"Test: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
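The test function aggregates per-batch statistics: the average loss is the summed batch loss divided by the number of batches, while accuracy is the total number of correct predictions divided by the total number of samples. In miniature, with hypothetical numbers:

```python
# Hypothetical per-batch statistics from a 3-batch evaluation run.
batch_losses = [0.6, 0.5, 0.4]    # loss of each batch
batch_correct = [54, 58, 60]      # correct predictions per batch
batch_sizes = [64, 64, 64]        # samples per batch

avg_loss = sum(batch_losses) / len(batch_losses)       # mean over batches
accuracy = sum(batch_correct) / sum(batch_sizes)       # mean over samples
print(f"Accuracy: {100*accuracy:>0.1f}%, Avg loss: {avg_loss:>8f}")
```

Note the two denominators differ: loss is averaged per batch, accuracy per sample, matching the `num_batches` and `total` divisors in test().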

Training requires iterating over the dataset multiple times; one complete pass is called an epoch. In each epoch, we iterate over the training set to train the model, then run predictions on the test set. Printing the loss and prediction accuracy (Accuracy) of each epoch shows the loss steadily decreasing and the accuracy steadily rising.

[13]:

epochs = 3

for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train(model, train_dataset)
    test(model, test_dataset, loss_fn)
print("Done!")

Epoch 1

-------------------------------

loss: 2.302088 [ 0/938]

loss: 2.290692 [100/938]

loss: 2.266338 [200/938]

loss: 2.205240 [300/938]

loss: 1.907198 [400/938]

loss: 1.455603 [500/938]

loss: 0.861103 [600/938]

loss: 0.767219 [700/938]

loss: 0.422253 [800/938]

loss: 0.513922 [900/938]

Test:

Accuracy: 83.8%, Avg loss: 0.529534

Epoch 2

-------------------------------

loss: 0.580867 [ 0/938]

loss: 0.479347 [100/938]

loss: 0.677991 [200/938]

loss: 0.550141 [300/938]

loss: 0.226565 [400/938]

loss: 0.314738 [500/938]

loss: 0.298739 [600/938]

loss: 0.459540 [700/938]

loss: 0.332978 [800/938]

loss: 0.406709 [900/938]

Test:

Accuracy: 90.2%, Avg loss: 0.334828

Epoch 3

-------------------------------

loss: 0.461890 [ 0/938]

loss: 0.242303 [100/938]

loss: 0.281414 [200/938]

loss: 0.207835 [300/938]

loss: 0.206000 [400/938]

loss: 0.409646 [500/938]

loss: 0.193608 [600/938]

loss: 0.217575 [700/938]

loss: 0.212817 [800/938]

loss: 0.202862 [900/938]

Test:

Accuracy: 91.9%, Avg loss: 0.280962

Done!

For more details, see Model Training.