Weight decay in deep learning

What is weight decay?

Weight decay is a regularization technique that adds a small penalty, usually the L2 norm of the weights (all the weights of the model), to the loss function.

loss = loss + weight decay parameter * L2 norm of the weights
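To illustrate the formula above, the penalty can also be added to the loss by hand. The sketch below is a minimal, hand-rolled example rather than PyTorch's built-in mechanism: the model, loss function, and data are placeholders, and the penalty uses the squared L2 norm, which is the usual form of L2 regularization.

import torch

# Placeholder model, loss function, and data for illustration only.
model = torch.nn.Linear(10, 1)
criterion = torch.nn.MSELoss()
x, y = torch.randn(32, 10), torch.randn(32, 1)

weight_decay = 1e-4
loss = criterion(model(x), y)

# loss = loss + weight decay parameter * L2 norm of the weights
l2_penalty = sum(p.pow(2).sum() for p in model.parameters())
loss = loss + weight_decay * l2_penalty
loss.backward()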

Some people prefer to apply weight decay only to the weights and not to the biases. PyTorch's weight_decay argument applies to every parameter passed to the optimizer, weights and biases alike.
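If you want the first behaviour, excluding biases, one common approach is to pass parameter groups with different weight_decay values. The sketch below assumes a placeholder model and splits parameters purely by name; real projects often also exclude normalization-layer parameters.

import torch

model = torch.nn.Linear(10, 1)  # placeholder model for illustration

decay, no_decay = [], []
for name, param in model.named_parameters():
    # Put biases in the group that receives no weight decay.
    (no_decay if name.endswith("bias") else decay).append(param)

optimizer = torch.optim.SGD(
    [{"params": decay, "weight_decay": 1e-4},
     {"params": no_decay, "weight_decay": 0.0}],
    lr=1e-3,
)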

Why do we use weight decay?

  • To prevent overfitting.

  • To keep the weights small and avoid exploding gradients. Because the L2 norm of the weights is added to the loss, each training iteration minimizes the magnitude of the model weights in addition to the original loss. This keeps the weights as small as possible and prevents them from growing out of control, which helps avoid exploding gradients (see the sketch after this list).
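To make the second point concrete: with plain SGD, the gradient of the L2 penalty is proportional to the weight itself, so every update also shrinks the weights slightly. The lines below are a hand-written sketch of that update with hypothetical tensors, not an actual optimizer call.

import torch

lr, weight_decay = 1e-3, 1e-4
w = torch.randn(5)
grad = torch.randn(5)  # stand-in for the gradient of the data loss

# SGD with weight decay adds weight_decay * w to the gradient,
# so each step moves w toward zero in addition to following grad.
w = w - lr * (grad + weight_decay * w)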

How do we use weight decay?

To use weight decay, we can simply set the weight_decay parameter of the torch.optim.SGD optimizer or the torch.optim.Adam optimizer. Here we use 1e-4 as a default for weight_decay.

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, weight_decay=1e-4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
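For completeness, here is a minimal training step using one of these optimizers; the model, loss function, and data are placeholders. Note that with Adam the penalty interacts with the adaptive learning rates, which is why the decoupled variant torch.optim.AdamW is often preferred in practice.

import torch

model = torch.nn.Linear(10, 1)   # placeholder model
criterion = torch.nn.MSELoss()   # placeholder loss
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, weight_decay=1e-4)

x, y = torch.randn(32, 10), torch.randn(32, 1)

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()   # the weight decay term is applied inside this update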