Loss Functions in PyTorch

Loss Function

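In both deep learning and classical machine learning, the loss function plays a central role. A loss function (also called a cost function) measures the gap between a model's predictions and the true values; in general, the lower the loss, the better the model fits the data it is evaluated on. A loss function returns a single scalar, and that scalar guides learning: training tries to minimize it.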


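Deep-learning-based image classification folds the traditional pipeline (preprocessing, feature extraction, classifier) into a stack of layers: convolutional, pooling, and fully connected. The pipeline is shown in the figure below.

[Figure: image classification pipeline]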

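Training mainly means solving for the model's parameters. An input image passes through several convolution and pooling layers, which extract feature maps. The feature maps are flattened into a one-dimensional feature vector and fed to fully connected layers, which map the features to labels (classes) and yield a probability for each class.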

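The class with the highest probability is taken as the prediction. A loss function quantifies the gap between the model's output and the image's true class, and gradient descent then updates the model parameters to reduce that gap. Once the parameters are fixed, the model is fixed and can run inference on new images from the test set.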
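The loop described above (forward pass, loss, gradient descent, prediction) can be sketched in a few lines. This is a minimal illustration, not the article's pipeline: a single linear layer stands in for the convolutional network, and the random `features` and `labels` tensors are hypothetical toy data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-ins (assumptions): 8 samples with 4 features each, 3 classes.
features = torch.randn(8, 4)
labels = torch.randint(0, 3, (8,))

model = nn.Linear(4, 3)            # maps features to one raw score per class
criterion = nn.CrossEntropyLoss()  # measures the gap between scores and true labels
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

initial_loss = criterion(model(features), labels).item()
for _ in range(20):
    optimizer.zero_grad()                      # clear gradients from the previous step
    loss = criterion(model(features), labels)  # forward pass and loss
    loss.backward()                            # gradients of the loss w.r.t. the parameters
    optimizer.step()                           # gradient-descent update
final_loss = criterion(model(features), labels).item()

predicted = model(features).argmax(dim=1)      # class with the highest score
print(initial_loss, final_loss, predicted)
```

On this toy problem the loss after the updates is lower than before them, which is all that "training" means here.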

Common regression loss functions: L1 Loss, L2 Loss, Smooth L1 Loss

Common classification loss functions: 0-1 Loss, Cross-Entropy, Negative Log-Likelihood Loss, Weighted Cross Entropy Loss, Focal Loss

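Most of these loss functions are available from torch.nn and torch.nn.functional. The two modules are similar and both cover the usual neural-network operations; they differ in usage: nn provides classes, while nn.functional provides plain functions.

nn.xxx must be instantiated first (configuration is passed to the constructor); the resulting object is then called like a function on the input data.

nn.functional.xxx needs no instantiation and can be called directly.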
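As a small illustration of the two calling styles, the sketch below computes the same L1 loss both ways. The tensors `pred` and `true` are made-up example values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

pred = torch.tensor([0.0, 0.1, 0.2])
true = torch.tensor([0.0, 0.2, 0.25])

# Class style: instantiate first (configuration goes to the constructor), then call.
criterion = nn.L1Loss(reduction="mean")
loss_from_class = criterion(pred, true)

# Functional style: one call, configuration passed as keyword arguments.
loss_from_function = F.l1_loss(pred, true, reduction="mean")

print(loss_from_class, loss_from_function)
```

The class style is convenient when the loss is configured once and reused in a training loop; the functional style is handy for one-off computations.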

L1 Loss (Mean Absolute Error, MAE)

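The L1 loss is the mean absolute error between the predicted tensor and the true values. It takes the absolute difference between each predicted value and the corresponding true value, sums these absolute differences, and divides by the number of elements to obtain the mean absolute error (MAE). Because each error contributes only linearly, L1 loss is fairly robust to noise and outliers.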

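$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

where $y_i$ is the true value and $\hat{y}_i$ the predicted value.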

NumPy implementation:


        
          
import numpy as np
y_pred = np.array([0.000, 0.100, 0.200])
y_true = np.array([0.000, 0.200, 0.250])
# Define the mean absolute error (MAE) loss function
def mae(pred, true):
    # Element-wise absolute difference
    differences = pred - true
    absolute_differences = np.absolute(differences)
    # Mean of the absolute differences
    mean_absolute_error = absolute_differences.mean()
    return mean_absolute_error
mae_value = mae(y_pred, y_true)
print("MAE error is: " + str(mae_value))
# MAE error is: 0.049999999999999996

PyTorch implementation:

torch.nn.L1Loss(size_average=None, reduce=None, reduction='mean')


        
          
import torch
# y_pred and y_true are the NumPy arrays defined in the example above
MAE_Loss = torch.nn.L1Loss()  # instantiate the loss
input = torch.tensor(y_pred)
target = torch.tensor(y_true)
output = MAE_Loss(input, target)
print(output)
# tensor(0.0500, dtype=torch.float64)

torch.nn.functional.l1_loss(input, target, size_average=None, reduce=None, reduction='mean')


        
          
input = torch.tensor(y_pred)
target = torch.tensor(y_true)
output = torch.nn.functional.l1_loss(input, target)
print(output)
# tensor(0.0500, dtype=torch.float64)

L2 Loss (Mean Squared Error, MSE)

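Mean squared error is closely related to mean absolute error. Instead of the absolute difference between the prediction and the true value, it uses the squared difference. As a result, large errors are penalized more heavily and small errors (below 1 in magnitude) more lightly. For the same reason, MSE is generally considered less robust than MAE to outliers and noise.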

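$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

where $y_i$ is the true value and $\hat{y}_i$ the predicted value.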

PyTorch provides the MSE loss as follows.

torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')

torch.nn.functional.mse_loss(input, target, size_average=None, reduce=None, reduction='mean')
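The MSE section has no worked example, so here is a minimal sketch in the same style as the MAE one. It reuses the MAE example's values for `y_pred` and `y_true`; the `errors` array with one outlier is an invented illustration of why MSE is more sensitive to outliers than MAE.

```python
import numpy as np
import torch

y_pred = np.array([0.000, 0.100, 0.200])
y_true = np.array([0.000, 0.200, 0.250])

# NumPy: mean of the squared differences.
mse_numpy = np.mean((y_pred - y_true) ** 2)

# PyTorch: class form and functional form.
mse_class = torch.nn.MSELoss()(torch.tensor(y_pred), torch.tensor(y_true))
mse_functional = torch.nn.functional.mse_loss(torch.tensor(y_pred), torch.tensor(y_true))
print(mse_numpy, mse_class.item(), mse_functional.item())

# Outlier sensitivity: three small errors and one large one (hypothetical values).
errors = np.array([0.1, 0.1, 0.1, 5.0])
mae_with_outlier = np.mean(np.abs(errors))  # the outlier contributes linearly
mse_with_outlier = np.mean(errors ** 2)     # the outlier contributes quadratically
print(mae_with_outlier, mse_with_outlier)
```

With the single outlier, the MSE is several times larger than the MAE, even though three of the four errors are identical and small.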

Smooth L1 Loss

Smooth L1 loss combines the advantages of MSE and MAE. It was introduced in the Fast R-CNN paper.

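$$\mathrm{SmoothL1}(y, \hat{y}) = \begin{cases} \dfrac{0.5\,(y - \hat{y})^2}{\beta}, & |y - \hat{y}| < \beta \\ |y - \hat{y}| - 0.5\,\beta, & \text{otherwise} \end{cases}$$

where $y$ is the true value, $\hat{y}$ the predicted value, and $\beta$ the threshold between the two branches ($\beta = 1$ in the Fast R-CNN paper).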

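When the absolute difference between the true and predicted values is below a threshold β (1 in the Fast R-CNN paper; the beta argument, default 1.0, in PyTorch), the quadratic (MSE-like) branch is used. That branch is a smooth curve: its gradient shrinks with the error, and the loss is differentiable everywhere, including at zero. For large errors, however, the MSE gradient grows with the error and can cause exploding gradients. So once the absolute difference exceeds the threshold, the linear (MAE-like) branch takes over; its gradient has constant magnitude, which removes that risk.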

PyTorch usage:

torch.nn.SmoothL1Loss(size_average=None, reduce=None, reduction='mean', beta=1.0)

torch.nn.functional.smooth_l1_loss(input, target, size_average=None, reduce=None, reduction='mean', beta=1.0)


        
          
import torch

loss = torch.nn.SmoothL1Loss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.randn(3, 5)
output1 = loss(input, target)
output2 = torch.nn.functional.smooth_l1_loss(input, target)

print('output1: ',output1)
print('output2: ',output2)

# Sample output (values vary because the inputs are random):
# output1:  tensor(0.7812, grad_fn=<SmoothL1LossBackward0>)
# output2:  tensor(0.7812, grad_fn=<SmoothL1LossBackward0>)
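To make the two branches concrete, the sketch below re-implements the piecewise definition by hand and compares it with the built-in function. `smooth_l1_manual` is a hypothetical helper written for this comparison, and the input values are chosen so that one error falls in the quadratic branch and one in the linear branch.

```python
import torch
import torch.nn.functional as F

def smooth_l1_manual(pred, target, beta=1.0):
    # Hypothetical helper: the piecewise definition, written out for comparison.
    diff = (pred - target).abs()
    quadratic = 0.5 * diff ** 2 / beta   # used where |diff| < beta
    linear = diff - 0.5 * beta           # used where |diff| >= beta
    return torch.where(diff < beta, quadratic, linear).mean()

pred = torch.tensor([0.0, 0.5, 3.0])
target = torch.tensor([0.0, 0.0, 0.0])

manual = smooth_l1_manual(pred, target)
builtin = F.smooth_l1_loss(pred, target, beta=1.0)
print(manual, builtin)
```

The per-element losses here are 0, 0.125 (quadratic branch), and 2.5 (linear branch), so both computations give a mean of 0.875.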

0-1 Loss

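0-1 loss directly checks whether the prediction matches the true value: it is 1 when they differ and 0 when they agree.

$$L(y, \hat{y}) = \begin{cases} 1, & y \neq \hat{y} \\ 0, & y = \hat{y} \end{cases}$$

Here $y$ is the true value and $\hat{y}$ is the predicted value. Summed over a dataset, 0-1 loss simply counts misclassifications. It is piecewise constant, so its gradient is zero wherever it is defined and it is not differentiable at the jumps; it therefore cannot be used directly in learning tasks that rely on backpropagation.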
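PyTorch has no built-in 0-1 loss, but it is easy to compute as an evaluation metric. The sketch below uses made-up `logits` and `target` values and also shows that no gradient flows back through it.

```python
import torch

logits = torch.tensor([[2.0, 0.5, 0.1],
                       [0.2, 0.3, 1.5],
                       [1.0, 3.0, 0.2]], requires_grad=True)
target = torch.tensor([0, 1, 1])

predicted = logits.argmax(dim=1)          # tensor([0, 2, 1])
zero_one = (predicted != target).float()  # 1 where the prediction is wrong, 0 where it is right
error_rate = zero_one.mean()              # fraction of misclassified samples

print(zero_one, error_rate)
print(zero_one.requires_grad)             # False: argmax cuts the gradient path
```

This is why 0-1 loss is reported as accuracy or error rate after training, while a differentiable surrogate such as cross-entropy is used for the optimization itself.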

Cross-Entropy Loss

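Cross-entropy is the basis for understanding classification losses. Given two discrete distributions $p$ and $q$, the cross-entropy is:

$$H(p, q) = -\sum_{x} p(x)\,\log q(x)$$

Cross-entropy can be loosely read as a measure of how far apart two distributions are (it is not a true distance, since it is not symmetric). Let $p$ be the true distribution and $q$ the predicted one. Optimizing the model parameters lowers $H(p, q)$; for one-hot labels its minimum is 0, so as it approaches 0 the predictions approach the true values.

Cross-entropy loss is normally applied to the output of a softmax layer, which turns the network's outputs into probabilities (values between 0 and 1 that sum to 1):

$$S_i = \frac{e^{z_i}}{\sum_{j} e^{z_j}}$$

Here $z_i$ is the $i$-th element of the fully connected layer's output vector and $S_i$ is the $i$-th element of the probability distribution produced by softmax. Note that PyTorch's nn.CrossEntropyLoss applies log-softmax internally, so it should be given the raw outputs $z$ rather than softmax probabilities.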

PyTorch usage:

torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean', label_smoothing=0.0)

torch.nn.functional.cross_entropy(input, target, weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean', label_smoothing=0.0)


        
          
import torch

loss = torch.nn.CrossEntropyLoss()
inputs = torch.randn(3, 5, requires_grad=True)  # raw scores for 3 samples, 5 classes
target = torch.empty(3, dtype=torch.long).random_(5)  # random class indices in [0, 5)
output1 = loss(inputs, target)
output2 = torch.nn.functional.cross_entropy(inputs, target)

print('output1: ',output1)
print('output2: ',output2)
# Sample output (values vary because the inputs are random):
# output1:  tensor(1.4797, grad_fn=<NllLossBackward0>)
# output2:  tensor(1.4797, grad_fn=<NllLossBackward0>)

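Note that the gradient function in the printed output, grad_fn=<NllLossBackward0>, belongs to the negative log-likelihood (NLL) loss. This reveals that cross-entropy loss is implemented as a log-softmax layer followed by NLL loss.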
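The relationship can be checked numerically. The sketch below, using arbitrary random scores, computes the loss three ways: with `cross_entropy`, with `nll_loss` on top of `log_softmax`, and directly from the definition.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(3, 5)        # raw scores, no softmax applied
target = torch.tensor([1, 0, 4])

probs = F.softmax(logits, dim=1)  # each row sums to 1
ce = F.cross_entropy(logits, target)
nll_of_log_softmax = F.nll_loss(F.log_softmax(logits, dim=1), target)

# From the definition: minus the log-probability of the true class, averaged over the batch.
manual = -torch.log(probs[torch.arange(3), target]).mean()

print(ce, nll_of_log_softmax, manual)
```

All three values agree up to floating-point rounding.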

Negative Log-Likelihood Loss

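The negative log-likelihood (NLL) loss works much like the cross-entropy loss. For a sample whose true class is $y$, with $S_y$ the predicted (softmax) probability of that class:

$$\mathrm{NLL} = -\log S_y$$

Note: NLL relies on softmax to map the network outputs to per-class probabilities. In PyTorch specifically, nn.NLLLoss expects log-probabilities as input, so the last layer should be log-softmax rather than plain softmax.

As described in the cross-entropy section, cross-entropy loss combines a log-softmax layer with NLL loss. Consequently, applying NLL loss on top of a log-softmax final layer gives the same value as cross-entropy loss on the raw outputs.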

PyTorch usage:

torch.nn.NLLLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')

torch.nn.functional.nll_loss(input, target, weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')


        
          
import torch
import torch.nn as nn
import torch.nn.functional as F

m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()

# input is of size N x C = 3 x 5
inputs = torch.randn(3, 5, requires_grad=True)

# each element in target has to have 0 <= value < C
target = torch.tensor([1, 0, 4])

# NLLLoss expects log-probabilities, so apply log-softmax first
output1 = loss(m(inputs), target)
output2 = F.nll_loss(m(inputs), target)

print('output1: ',output1)
print('output2: ',output2)
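A common pitfall is to feed plain softmax probabilities to NLL loss. The sketch below, on arbitrary random scores, contrasts the correct input (log-probabilities) with that mistake.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(3, 5)
target = torch.tensor([1, 0, 4])

correct = F.nll_loss(F.log_softmax(logits, dim=1), target)  # log-probabilities in
wrong = F.nll_loss(F.softmax(logits, dim=1), target)        # plain probabilities in

print(correct)  # matches F.cross_entropy(logits, target)
print(wrong)    # minus the mean true-class probability, so it is negative
```

No error is raised in the second case, so a negative "loss" is a useful warning sign that the log is missing.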

Weighted Cross Entropy Loss

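Weighted cross-entropy loss assigns larger weights to under-represented classes:

$$L = -\sum_{c} w_c\, y_c \log p_c$$

Here $w_c$ is the weight of class $c$, $y_c$ is 1 for the true class and 0 otherwise, and $p_c$ is the predicted probability of class $c$. Class weights are usually set in inverse proportion to class frequency. For example, if class a has 20 samples and class b has 80, then a gets the larger weight; with inverse-frequency weights normalized to sum to 1, a's weight is 0.8 and b's is 0.2.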
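In PyTorch, class weights go into the `weight` argument of `nn.CrossEntropyLoss`. The sketch below assumes inverse-frequency weights normalized to sum to 1 for the 20/80 example; the `logits` and `target` values are made up.

```python
import torch
import torch.nn as nn

# Class a (index 0) has 20 samples, class b (index 1) has 80.
counts = torch.tensor([20.0, 80.0])
inverse = 1.0 / counts
weights = inverse / inverse.sum()  # approximately [0.8, 0.2]

logits = torch.tensor([[1.0, 2.0],
                       [0.5, 0.1],
                       [2.0, 0.3]])
target = torch.tensor([0, 1, 1])

weighted = nn.CrossEntropyLoss(weight=weights)(logits, target)

# With reduction='mean', PyTorch divides by the sum of the weights of the target classes.
per_sample = nn.CrossEntropyLoss(reduction="none")(logits, target)
manual = (weights[target] * per_sample).sum() / weights[target].sum()

print(weights, weighted, manual)
```

Note the normalization in the last step: the weighted mean is not a plain average over the batch.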

Focal Loss

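Focal Loss comes from the paper "Focal Loss for Dense Object Detection" (Lin et al., with Kaiming He among the co-authors). It addresses both class imbalance and the difference between easy and hard samples. Take lesion recognition in medical images: on the one hand, images with lesions are scarce while images without lesions are plentiful; on the other hand, a lesion usually occupies only a small part of the image, so its features are hard to learn and lesion images are hard to recognize.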

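Focal Loss builds on the standard cross-entropy loss, so first recall binary cross-entropy, where $p$ is the predicted probability of the positive class and $y$ is the true label:

$$\mathrm{CE}(p, y) = \begin{cases} -\log(p), & y = 1 \\ -\log(1 - p), & \text{otherwise} \end{cases}$$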

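To handle class imbalance, Focal Loss adds a weight $\alpha$:

$$\mathrm{CE}(p, y) = \begin{cases} -\alpha \log(p), & y = 1 \\ -(1 - \alpha)\log(1 - p), & \text{otherwise} \end{cases}$$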

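Focal Loss also gives easy samples a small weight so that training concentrates on hard ones. It does so by adding the modulating factor $(1 - p)^{\gamma}$ for positive samples (and $p^{\gamma}$ for negative ones):

$$\mathrm{FL}(p, y) = \begin{cases} -(1 - p)^{\gamma}\log(p), & y = 1 \\ -p^{\gamma}\log(1 - p), & \text{otherwise} \end{cases}$$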

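The binary Focal Loss is then:

$$\mathrm{FL}(p, y) = \begin{cases} -\alpha\,(1 - p)^{\gamma}\log(p), & y = 1 \\ -(1 - \alpha)\,p^{\gamma}\log(1 - p), & \text{otherwise} \end{cases}$$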

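The multi-class Focal Loss is:

$$\mathrm{FL} = -\sum_{c} \alpha_c\, y_c\,(1 - p_c)^{\gamma}\log(p_c)$$

where $y_c$ is 1 for the true class and 0 otherwise, $p_c$ is the predicted probability of class $c$, and $\alpha_c$ is the weight of class $c$.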

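The parameters $\alpha$ and $(1 - \alpha)$ control the relative weight of positive and negative samples; $\alpha$ lies in [0, 1] and is usually chosen by cross-validation.
The parameter $\gamma$ is called the focusing parameter and lies in $[0, +\infty)$. It reduces the weight of easily classified samples so that the model focuses on hard samples during training. When $\gamma = 0$, Focal Loss reduces to cross-entropy loss; the larger $\gamma$ is, the more strongly easy samples are down-weighted.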
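The effect of the focusing parameter can be checked numerically. The sketch below is an illustration under stated assumptions: `focal_loss` is a hypothetical helper that omits the α term and averages over the batch, and the scores are arbitrary random values.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, target, gamma=2.0):
    # Hypothetical helper (no alpha term): -(1 - p_t)^gamma * log(p_t), averaged.
    log_p = F.log_softmax(logits, dim=-1)
    log_pt = log_p.gather(1, target.unsqueeze(1)).squeeze(1)  # log-probability of the true class
    pt = log_pt.exp()
    return (-(1 - pt) ** gamma * log_pt).mean()

torch.manual_seed(0)
logits = torch.randn(4, 3)
target = torch.tensor([0, 2, 1, 1])

ce = F.cross_entropy(logits, target)
fl_gamma0 = focal_loss(logits, target, gamma=0.0)  # reduces to cross-entropy
fl_gamma2 = focal_loss(logits, target, gamma=2.0)  # down-weighted, so no larger than ce

# Modulating factor for an easy sample (p_t = 0.9) and a hard one (p_t = 0.1).
easy_factor = (1 - 0.9) ** 2  # about 0.01
hard_factor = (1 - 0.1) ** 2  # about 0.81

print(ce, fl_gamma0, fl_gamma2, easy_factor, hard_factor)
```

With γ = 2, a confidently correct sample keeps only about 1% of its cross-entropy loss, while a badly misclassified one keeps about 81%.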

A custom Focal Loss in PyTorch can be written as follows:


        
          
import torch
import torch.nn as nn
import torch.nn.functional as F


class FocalLoss(nn.Module):
    def __init__(self, weight=None, gamma=2., reduction='none'):
        super().__init__()
        self.weight = weight        # optional per-class weights
        self.gamma = gamma          # focusing parameter
        self.reduction = reduction

    def forward(self, input_tensor, target_tensor):
        log_prob = F.log_softmax(input_tensor, dim=-1)
        prob = torch.exp(log_prob)
        # Scale each log-probability by (1 - p)^gamma, then let nll_loss
        # pick out (and negate) the entry of the true class.
        return F.nll_loss(
            ((1 - prob) ** self.gamma) * log_prob,
            target_tensor,
            weight=self.weight,
            reduction=self.reduction
        )

Using the custom loss function:


        
          
weights = torch.ones(7)
loss = FocalLoss(gamma=2, weight=weights)
inputs = torch.randn(3, 7, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(7)
print('inputs:', inputs)
print('target:', target)
output = loss(inputs, target)
print('output:', output)

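Result (the inputs are random, so the values differ from run to run; with the default reduction='none', one loss value is returned per sample):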


        
          
inputs: tensor([[ 0.5688, -1.1567,  1.8231, -0.2724, -1.2335,  0.9968,  0.9643],
        [-0.1824,  0.3010,  1.7070,  0.8743,  0.4528,  1.4306, -2.3726],
        [-2.5052, -0.3744,  0.3718, -1.5129, -2.0459,  1.0374, -0.5433]],
       requires_grad=True)
target: tensor([6, 5, 1])
output: tensor([1.1599, 0.7283, 1.6924], grad_fn=<NllLossBackward0>)