Weight decay in deep learning

What is weight decay?

Weight decay is a regularization technique that adds a small penalty, usually the L2 norm of the weights (all the weights of the model), to the loss function.

loss = loss + weight decay parameter * L2 norm of the weights
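To illustrate the formula above, the penalty can also be added to the loss by hand. The sketch below is a minimal, hand-rolled example rather than PyTorch's built-in mechanism: the model, loss function, and data are placeholders, and the penalty uses the squared L2 norm, which is the usual form of L2 regularization.

import torch

# Placeholder model, loss function, and data for illustration only.
model = torch.nn.Linear(10, 1)
criterion = torch.nn.MSELoss()
x, y = torch.randn(32, 10), torch.randn(32, 1)

weight_decay = 1e-4
loss = criterion(model(x), y)

# loss = loss + weight decay parameter * L2 norm of the weights
l2_penalty = sum(p.pow(2).sum() for p in model.parameters())
loss = loss + weight_decay * l2_penalty
loss.backward()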

Some people prefer to apply weight decay only to the weights and not to the biases. PyTorch's weight_decay argument applies to every parameter passed to the optimizer, weights and biases alike.
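If you want the first behaviour, excluding biases, one common approach is to pass parameter groups with different weight_decay values. The sketch below assumes a placeholder model and splits parameters purely by name; real projects often also exclude normalization-layer parameters.

import torch

model = torch.nn.Linear(10, 1)  # placeholder model for illustration

decay, no_decay = [], []
for name, param in model.named_parameters():
    # Put biases in the group that receives no weight decay.
    (no_decay if name.endswith("bias") else decay).append(param)

optimizer = torch.optim.SGD(
    [{"params": decay, "weight_decay": 1e-4},
     {"params": no_decay, "weight_decay": 0.0}],
    lr=1e-3,
)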

Why do we use weight decay?

  • To prevent overfitting.

  • To keep the weights small and avoid exploding gradients. Because the L2 norm of the weights is added to the loss, each training iteration minimizes the magnitude of the model weights in addition to the original loss. This keeps the weights as small as possible and prevents them from growing out of control, which helps avoid exploding gradients (see the sketch after this list).
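To make the second point concrete: with plain SGD, the gradient of the L2 penalty is proportional to the weight itself, so every update also shrinks the weights slightly. The lines below are a hand-written sketch of that update with hypothetical tensors, not an actual optimizer call.

import torch

lr, weight_decay = 1e-3, 1e-4
w = torch.randn(5)
grad = torch.randn(5)  # stand-in for the gradient of the data loss

# SGD with weight decay adds weight_decay * w to the gradient,
# so each step moves w toward zero in addition to following grad.
w = w - lr * (grad + weight_decay * w)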

How do we use weight decay?

To use weight decay, we can simply set the weight_decay parameter of the torch.optim.SGD optimizer or the torch.optim.Adam optimizer. Here we use 1e-4 as a default for weight_decay.

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, weight_decay=1e-4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
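For completeness, here is a minimal training step using one of these optimizers; the model, loss function, and data are placeholders. Note that with Adam the penalty interacts with the adaptive learning rates, which is why the decoupled variant torch.optim.AdamW is often preferred in practice.

import torch

model = torch.nn.Linear(10, 1)   # placeholder model
criterion = torch.nn.MSELoss()   # placeholder loss
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, weight_decay=1e-4)

x, y = torch.randn(32, 10), torch.randn(32, 1)

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()   # the weight decay term is applied inside this update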