The Deployment Architecture Is Fixed: How Do You Gain Accuracy Painlessly?

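Hi everyone, I'm Canshi. Yesterday a good friend of mine published an article titled "How to Train Object Detection Models Better".

I recommend clicking through to read his summary. The article runs experiments from several angles: data augmentation on the input, network architecture, the detection head, and more. It also runs comparison experiments on input resolution and on the number of proposals in the RPN.

In-depth reading | Google and Waymo teach you how to train object detection models better!!!

That reminded me of a question I was once asked in a company interview: "Take a classification network. Once its deployment architecture is fixed, how do we improve the network's performance and gain accuracy?"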

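With the deployment architecture fixed, how do we strengthen the network's feature extraction?

Why would anyone have this requirement?

Because deployment has to be taken into account!

Take fine-grained classification, which separates subclasses within a broader class. Telling cats from dogs is ordinary classification; telling Huskies from Corgis among the dogs is fine-grained classification. The agricultural classification work I am doing now is a fine-grained task.

Some current approaches first locate the informative regions and then apply all sorts of tricks on top of them. The problem is that these region-finding operators are computationally expensive, and once a model contains many of them, deployment becomes very hard.

The same goes for techniques such as model ensembling: they either add a lot of deployment work or cannot be deployed at all. So for now I mostly work within a single fixed network, which saves a lot of trouble at deployment time.

Here are some suggestions of my own to make life a little easier. After all, not having to touch the deployment code is quite comfortable.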

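General-purpose tricks

The main reference here is the paper co-authored by Mu Li, "Bag of Tricks for Image Classification with Convolutional Neural Networks".

Train with low-precision floating point (such as FP16) and a suitably larger batch size.

Training on batches of several images improves computational parallelism and reduces the overhead of data communication. A batch size that is too large is not necessarily better, though: for convex optimization problems, the rate of convergence (not the result it converges to!) drops as the batch size grows. In other words, for the same number of epochs, a model trained with a large batch size reaches lower validation accuracy than one trained with a small batch size.

A large batch size therefore calls for matching changes to the hyperparameters: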

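Linear scaling of the initial learning rate
As the batch size grows, the learning rate can grow linearly with it. The paper gives a reference formula for the initial learning rate: if batch size 256 uses an initial learning rate of 0.1, then batch size $b$ uses $0.1 \times b / 256$.
Learning-rate warmup to the initial rate
At the start of training the gradients that update the weights are large, and a large learning rate right away can easily make training numerically unstable. Training should therefore begin with a small learning rate and switch to the initial learning rate once it has stabilized. Suppose the first $m$ batches are used for warmup and the initial learning rate is $\eta$; then at batch $i$ ($1 \le i \le m$) the learning rate is set to $i\eta / m$, so that it rises gradually to the initial learning rate.
Zero-initialize $\gamma$ in BN layers
In a ResNet residual block, the last layer of the non-identity branch may be a BN layer. Under the usual initialization strategy, that BN layer's $\gamma$ and $\beta$ are initialized to 1 and 0 respectively. If $\gamma$ is initialized to 0 instead, the residual block initially just returns its input, so the network behaves as if it had fewer layers, which makes the early stage of training easier.
No bias decay
Apply weight decay (L2 regularization) only to the weights of convolutional and fully connected layers. Do not regularize the other parameters, such as the biases and the $\gamma$ and $\beta$ of BN layers.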
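
The first two adjustments above can be sketched in a few lines of plain Python. This is only an illustration: the function names `scaled_initial_lr` and `warmup_lr` are made up here, and the reference point (learning rate 0.1 at batch size 256) is the paper's example value rather than something you must use.

```python
def scaled_initial_lr(batch_size, base_lr=0.1, base_batch_size=256):
    """Linear scaling: the initial learning rate grows in proportion to the batch size."""
    return base_lr * batch_size / base_batch_size


def warmup_lr(batch_index, warmup_batches, initial_lr):
    """Linear warmup: batch i (1-based) of the first m batches uses i * lr / m."""
    if batch_index >= warmup_batches:
        return initial_lr
    return batch_index * initial_lr / warmup_batches


# Hypothetical setup: batch size 1024, warmup over the first 5 batches.
initial_lr = scaled_initial_lr(batch_size=1024)
schedule = [warmup_lr(i, warmup_batches=5, initial_lr=initial_lr) for i in range(1, 8)]
print(initial_lr)
print(schedule)
```

In a real run the warmup usually spans a few epochs rather than 5 batches; the short window here just keeps the printed schedule readable.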

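Cosine learning-rate decay or step decay
How the learning rate is adjusted during training is crucial. The most common strategy, step decay, shrinks the learning rate by a fixed ratio after a fixed number of epochs. With cosine decay, if training has $T$ batches in total (ignoring warmup), the learning rate at batch $t$ is $\eta_t = \frac{1}{2}\left(1 + \cos\frac{t\pi}{T}\right)\eta$, where $\eta$ is the initial learning rate.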
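
To make the two schedules concrete, here is a small sketch in plain Python. The helper names are hypothetical, and the step-decay defaults (multiply by 0.1 every 30 epochs) are only illustrative values, not a recommendation from this article.

```python
import math


def cosine_lr(t, total_batches, initial_lr):
    """Cosine decay: eta_t = 0.5 * (1 + cos(t * pi / T)) * eta."""
    return 0.5 * (1.0 + math.cos(t * math.pi / total_batches)) * initial_lr


def step_lr(epoch, initial_lr, step_size=30, gamma=0.1):
    """Step decay: multiply the learning rate by gamma every step_size epochs."""
    return initial_lr * gamma ** (epoch // step_size)


# Cosine decay starts at the initial rate, passes half of it midway, and ends at zero.
for t in (0, 50, 100):
    print(t, cosine_lr(t, total_batches=100, initial_lr=0.4))

# Step decay stays flat and then drops abruptly.
for epoch in (0, 29, 30, 60):
    print(epoch, step_lr(epoch, initial_lr=0.1))
```
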
Label smoothing
See our earlier article:

[Theory and examples showing that label smoothing works!](http://mp.weixin.qq.com/s?__biz=MzkzNDIxMzE1NQ==&mid=2247487607&idx=1&sn=2eb425cfee69db57053fe1e8f10e617a&chksm=c241f33bf5367a2df33558af537bdd2330b4d085cfcb55d178b08c4e85608df9336e274969f5&scene=21#wechat_redirect)
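
As a quick reminder of what label smoothing does, here is a minimal sketch in plain Python. It assumes the variant used in the Bag of Tricks paper (probability $1 - \varepsilon$ on the true class and $\varepsilon / (K - 1)$ on each other class); other variants spread $\varepsilon$ over all $K$ classes instead. The function names and the example logits are made up for illustration.

```python
import math


def smooth_labels(true_class, num_classes, epsilon=0.1):
    """Soft target: 1 - eps on the true class, eps / (K - 1) on every other class."""
    off_value = epsilon / (num_classes - 1)
    return [1.0 - epsilon if k == true_class else off_value for k in range(num_classes)]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, logits):
    """Cross-entropy between a target distribution and softmax(logits)."""
    probs = softmax(logits)
    return -sum(q * math.log(p) for q, p in zip(target_probs, probs))


logits = [0.5, -1.0, 3.0, 0.0, 1.2]  # hypothetical model output, class 2 is the label
hard_target = [0.0, 0.0, 1.0, 0.0, 0.0]
soft_target = smooth_labels(true_class=2, num_classes=5, epsilon=0.1)

print(soft_target)
print(cross_entropy(hard_target, logits), cross_entropy(soft_target, logits))
```

With the smoothed target, a confident correct prediction is still penalized a little, which discourages the logits from growing without bound.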

L1 and L2 regularization
See our earlier article:

[Do you really know L1 and L2 regularization thoroughly?](http://mp.weixin.qq.com/s?__biz=MzkzNDIxMzE1NQ==&mid=2247486609&idx=1&sn=76fc19df55a2d7f605b8203ccd5f101c&chksm=c241efddf53666cb70ebb3b44a40154778c94c2a2c2fda74418ce994805b99bd2ee644becf68&scene=21#wechat_redirect)

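Various schemes

Likewise, see our earlier articles, where we have compiled the most complete collection of schemes so far. Here is a question for you, one that a junior labmate was asked in a Tencent interview: can two of these techniques be used together, and why?

  **The answer is in the two articles below:**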

[Whoa! A must-ask for algorithm roles! Bookmark this!](http://mp.weixin.qq.com/s?__biz=MzkzNDIxMzE1NQ==&mid=2247486110&idx=1&sn=d2a9c6a4c80fb6f9894618440e03aff9&chksm=c241e9d2f53660c41550815280236435dbce523bb04188511e03ddd203333b3c10013fef65dc&scene=21#wechat_redirect)

[Whoa again! A must-ask algorithm question!](http://mp.weixin.qq.com/s?__biz=MzkzNDIxMzE1NQ==&mid=2247486168&idx=1&sn=ce1920cf5ff9a78d2f24c4ad65632b06&chksm=c241e994f5366082788b9fe906e93fec7e92da55a66aaab342d98eaa7c39ef599114dc0740b5&scene=21#wechat_redirect)

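From the data-input angle

1. Traditional data augmentation

This is basically standard equipment for training a network: operations such as random horizontal flips, random vertical flips, contrast enhancement, and rotation by some angle.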


        
          
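from torchvision import transforms
trans = transforms.Compose([
    transforms.CenterCrop(10),
    transforms.ToTensor(),
])

'''
Other commonly used augmentation transforms:
Resize: resize the given image to the given size
Normalize: normalize a tensor image with mean and standard deviation
ToTensor: convert a PIL image (H*W*C) in range [0,255] to a torch.Tensor (C*H*W) in range [0.0,1.0]
ToPILImage: convert a tensor to a PIL image
Scale: deprecated, use Resize instead
CenterCrop: crop the central region of the image
RandomCrop: crop at a random location
RandomHorizontalFlip: flip the given PIL image horizontally with probability 0.5
RandomVerticalFlip: flip the given PIL image vertically with probability 0.5
RandomResizedCrop: crop a region of random size and aspect ratio, then resize it to the given size
Grayscale: convert the image to grayscale
RandomGrayscale: convert the image to grayscale with a given probability
FiveCrop: crop the image into four corners and one center
Pad: pad the image
ColorJitter: randomly change the brightness, contrast, saturation and hue of the image
'''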

2. Augmentations such as mixup/cutmix/Mosaic/Cutout/Random Erase

mixup


Code first. I have been asked to write this code in interviews:


        
          
import numpy as np
import torch

def mixup_data(x, y, alpha=1.0, use_cuda=True):

    '''Compute the mixup data. Return mixed inputs, pairs of targets, and lambda'''
    if alpha > 0.:  
        lam = np.random.beta(alpha, alpha)  
    else:  
        lam = 1.  
    batch_size = x.size()[0]  
    if use_cuda:  
        index = torch.randperm(batch_size).cuda()  
    else:  
        index = torch.randperm(batch_size)  
  
    mixed_x = lam * x + (1 - lam) * x[index,:]  
    y_a, y_b = y, y[index]  
    return mixed_x, y_a, y_b, lam

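As the code shows, mixup_data does not draw two batches at once. It takes a single batch, shuffles the order of the samples within that batch (torch.randperm), and then forms the weighted sum of the original and the shuffled batch.

The steps are:

1. For an input batch of images x, blend it with the randomly permuted images x[index] using the mixing ratio lam to obtain the mixed tensor mixed_x.
2. The ratio lam in step 1 is a random real number in [0, 1] drawn from a Beta(alpha, alpha) distribution. The two images are added pixel by pixel, i.e. mixed_x = lam * x + (1 - lam) * x[index]. The mixed tensor from step 1 is then passed to the network to obtain the output tensor pred.
3. When computing the loss, evaluate the loss function against the labels of each of the two images and take the weighted sum with the same ratio, i.e. loss = lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b).
4. Backpropagate and update the parameters.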
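
One way to see why the weighted loss is the right target: because cross-entropy is linear in the target distribution, weighting the two per-label losses by lam and 1 - lam gives the same value as a single cross-entropy against the mixed soft label. The plain-Python sketch below checks this numerically; the helper names and the example logits are made up for illustration.

```python
import math


def log_softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def cross_entropy(logits, label):
    """Cross-entropy against a hard class index."""
    return -log_softmax(logits)[label]


def soft_cross_entropy(logits, target_probs):
    """Cross-entropy against a full target distribution."""
    return -sum(q * lp for q, lp in zip(target_probs, log_softmax(logits)))


logits = [2.0, 0.5, -1.0]      # hypothetical prediction for one mixed image
y_a, y_b, lam = 0, 2, 0.7      # labels of the two source images and the mixing ratio

# Loss as computed in the mixup recipe: a weighted sum of two hard-label losses.
mixed_loss = lam * cross_entropy(logits, y_a) + (1 - lam) * cross_entropy(logits, y_b)

# The same loss written as one cross-entropy against the mixed soft label.
soft_target = [0.0] * len(logits)
soft_target[y_a] += lam
soft_target[y_b] += 1 - lam
soft_loss = soft_cross_entropy(logits, soft_target)

print(mixed_loss, soft_loss)
```

This is why mixup implementations can keep integer class labels and a standard criterion instead of building one-hot vectors.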

Reference code:


        
          
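import torch.nn as nn
import torch.optim as optim

# net, trainloader, args, base_learning_rate and use_cuda are assumed to be defined elsewhere.
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=base_learning_rate, momentum=0.9, weight_decay=args.decay)

def mixup_criterion(y_a, y_b, lam):
    return lambda criterion, pred: lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)

# Training
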
  
def train(epoch):  
    print('\nEpoch: %d' % epoch)  
    net.train()  
    train_loss = 0  
    correct = 0  
    total = 0  
    for batch_idx, (inputs, targets) in enumerate(trainloader):  
        if use_cuda:  
            inputs, targets = inputs.cuda(), targets.cuda()  
        """ generate mixed inputs, two one-hot label vectors and mixing coefficient """  
        inputs, targets_a, targets_b, lam = mixup_data(inputs, targets, args.alpha, use_cuda)         
        outputs = net(inputs)  # plain tensors work; the old Variable wrapper is not needed
        # Compute the loss
        loss_func = mixup_criterion(targets_a, targets_b, lam)
        loss = loss_func(criterion, outputs)
        # Update the parameters
          
        optimizer.zero_grad()  
        loss.backward()  
        optimizer.step()

cutmix
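First, a look at the effect:

(figure: CutMix example)

The procedure operates on a pair of images, A and B. Put simply, a crop box is generated at random, the corresponding region of image A is cut out, and the region at the same position in image B is pasted into the hole in A to form a new sample. The loss is again computed as a weighted sum.

The combination of the two images is defined as:

$$\tilde{x} = M \odot x_A + (\mathbf{1} - M) \odot x_B, \qquad \tilde{y} = \lambda y_A + (1 - \lambda) y_B$$

Here $M$ is a binary mask that marks which positions are removed and filled in from the other image. It simply labels the cropped region and the kept region: the cropped region is 0 and everything else is 1. $\mathbf{1}$ is a matrix of all ones with the same dimensions as $M$. Images A and B are combined into the new sample $\tilde{x}$, and the labels of the two images are combined with the corresponding weights. As in mixup, the weight $\lambda$ is sampled from a Beta distribution; the paper sets $\alpha$ to 1, so $\lambda$ follows Beta(1, 1), which is the uniform distribution on (0, 1). CutMix essentially replaces a region of the image with a patch from another training image, and the results look more locally natural than the images produced by mixup.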
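
A small worked example may help with one detail: the box is clipped at the image border, so the pasted area can be smaller than the sampled $\lambda$ implies, and $\lambda$ has to be recomputed from the actual box. The plain-Python sketch below assumes the box side is the image side times $\sqrt{1 - \lambda}$, as in the reference implementation; the function names are hypothetical, and the box center is passed in instead of being sampled so the numbers are reproducible.

```python
import math


def cutmix_box(width, height, lam, cx, cy):
    """Box centered at (cx, cy) with side ratio sqrt(1 - lam), clipped to the image."""
    cut_ratio = math.sqrt(1.0 - lam)
    cut_w = int(width * cut_ratio)
    cut_h = int(height * cut_ratio)
    x1 = min(max(cx - cut_w // 2, 0), width)
    y1 = min(max(cy - cut_h // 2, 0), height)
    x2 = min(max(cx + cut_w // 2, 0), width)
    y2 = min(max(cy + cut_h // 2, 0), height)
    return x1, y1, x2, y2


def adjusted_lambda(box, width, height):
    """Recompute lambda as the fraction of pixels that still come from image A."""
    x1, y1, x2, y2 = box
    return 1.0 - (x2 - x1) * (y2 - y1) / (width * height)


# Box fully inside a 32x32 image: the adjusted lambda equals the sampled one.
center_box = cutmix_box(32, 32, lam=0.75, cx=16, cy=16)
print(center_box, adjusted_lambda(center_box, 32, 32))

# Box near the corner: clipping shrinks it, so lambda moves toward 1.
corner_box = cutmix_box(32, 32, lam=0.75, cx=4, cy=4)
print(corner_box, adjusted_lambda(corner_box, 32, 32))
```
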


        
          
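import time

import numpy as np
import torch

def rand_bbox(size, lam):
    # Inputs: the size of the batch tensor and the sampled lambda value
    # Naming follows the reference code: W and H are simply dims 2 and 3 of the batch
    W = size[2]
    H = size[3]
    # Eq. (2) in the paper: width r_w and height r_h of box B
    cut_rat = np.sqrt(1. - lam)
    cut_w = int(W * cut_rat)  # np.int was removed in NumPy 1.24; the builtin int works
    cut_h = int(H * cut_rat)

    # uniform
    # Eq. (2) in the paper: r_x, r_y of box B (the center of the bbox)
    cx = np.random.randint(W)
    cy = np.random.randint(H)

    # np.clip keeps the coordinates of B inside the sample
    bbx1 = np.clip(cx - cut_w // 2, 0, W)
    bby1 = np.clip(cy - cut_h // 2, 0, H)
    bbx2 = np.clip(cx + cut_w // 2, 0, W)
    bby2 = np.clip(cy + cut_h // 2, 0, H)

    return bbx1, bby1, bbx2, bby2
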
      
# Excerpt from a training loop: train_loader, model, criterion, args, data_time and end
# are assumed to come from the surrounding training script.
for i, (input, target) in enumerate(train_loader):
    # measure data loading time
    data_time.update(time.time() - end)

    input = input.cuda()
    target = target.cuda()

    r = np.random.rand(1)
    if args.beta > 0 and r < args.cutmix_prob:
        # generate mixed sample
        # Sample lambda from a Beta distribution
        lam = np.random.beta(args.beta, args.beta)
        rand_index = torch.randperm(input.size()[0]).cuda()
        # Labels of the original batch (A) and of the shuffled batch (B)
        target_a = target
        target_b = target[rand_index]
        # Coordinates of the crop box
        bbx1, bby1, bbx2, bby2 = rand_bbox(input.size(), lam)
        # Replace the box region of sample A with the same region of sample B
        input[:, :, bbx1:bbx2, bby1:bby2] = input[rand_index, :, bbx1:bbx2, bby1:bby2]
        # adjust lambda to exactly match pixel ratio (the box may have been clipped)
        lam = 1 - ((bbx2 - bbx1) * (bby2 - bby1) / (input.size()[-1] * input.size()[-2]))
        # compute output
        output = model(input)
        # Weighted sum of the losses against both sets of labels
        loss = criterion(output, target_a) * lam + criterion(output, target_b) * (1. - lam)
    else:
        # compute output
        output = model(input)
        loss = criterion(output, target)

Mosaic

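The main idea of Mosaic augmentation is to randomly crop four images and stitch them into a single image that is used as training data. This enriches the backgrounds the model sees, and stitching four images together effectively raises the batch size: batch normalization computes its statistics over four images at once, so training depends less on the batch size itself.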
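
The stitching is mostly corner arithmetic: pick a mosaic center on a canvas of side 2s, then work out which rectangle of each image lands in which rectangle of the canvas. The plain-Python sketch below mirrors the formulas of the load_mosaic function in the listing in this section, under the simplifying assumption that all four images have the same width w and height h; the function name `mosaic_regions` is hypothetical.

```python
def mosaic_regions(xc, yc, w, h, s):
    """Return four (destination, source) rectangle pairs, each as (x1, y1, x2, y2).

    Destination rectangles live on the 2s x 2s canvas, source rectangles in the image.
    """
    regions = []
    # Top left: the image's bottom-right corner touches the mosaic center.
    x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc
    regions.append(((x1a, y1a, x2a, y2a), (w - (x2a - x1a), h - (y2a - y1a), w, h)))
    # Top right: the image's bottom-left corner touches the center.
    x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, 2 * s), yc
    regions.append(((x1a, y1a, x2a, y2a), (0, h - (y2a - y1a), min(w, x2a - x1a), h)))
    # Bottom left: the image's top-right corner touches the center.
    x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(2 * s, yc + h)
    regions.append(((x1a, y1a, x2a, y2a), (w - (x2a - x1a), 0, w, min(y2a - y1a, h))))
    # Bottom right: the image's top-left corner touches the center.
    x1a, y1a, x2a, y2a = xc, yc, min(xc + w, 2 * s), min(2 * s, yc + h)
    regions.append(((x1a, y1a, x2a, y2a), (0, 0, min(w, x2a - x1a), min(y2a - y1a, h))))
    return regions


# Hypothetical example: 640x480 images on a 1280x1280 canvas, center at (500, 700).
regions = mosaic_regions(xc=500, yc=700, w=640, h=480, s=640)
for dest, src in regions:
    print("canvas", dest, "<- image", src)
```

Each source rectangle has the same width and height as its destination rectangle, which is what makes the slice assignment in the full implementation valid.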

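The figure shows one case of Mosaic; depending on where the boundaries fall, the code has to be adjusted.

(figure: one Mosaic layout)

Here, once again, is code from the experts: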


        
          
  
import os  
import numpy as np  
import cv2  
import random  
import math  
   
def random_affine(img, targets=(), degrees=10, translate=.1, scale=.1, shear=10, border=0):
    # torchvision.transforms.RandomAffine(degrees=(-10, 10), translate=(.1, .1), scale=(.9, 1.1), shear=(-10, 10))  
    # https://medium.com/uruvideo/dataset-augmentation-with-random-homographies-a8f4b44830d4  
    # targets = [cls, xyxy]  
   
    height = img.shape[0] + border * 2  
    width = img.shape[1] + border * 2  
   
    # Rotation and Scale  
    R = np.eye(3)  
    a = random.uniform(-degrees, degrees)  
    # a += random.choice([-180, -90, 0, 90])  # add 90deg rotations to small rotations  
    s = random.uniform(1 - scale, 1 + scale)  
    # s = 2 ** random.uniform(-scale, scale)  
    R[:2] = cv2.getRotationMatrix2D(angle=a, center=(img.shape[1] / 2, img.shape[0] / 2), scale=s)  
   
    # Translation  
    T = np.eye(3)  
    T[0, 2] = random.uniform(-translate, translate) * img.shape[1] + border  # x translation (pixels), scaled by width
    T[1, 2] = random.uniform(-translate, translate) * img.shape[0] + border  # y translation (pixels), scaled by height
   
    # Shear  
    S = np.eye(3)  
    S[0, 1] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # x shear (deg)  
    S[1, 0] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # y shear (deg)  
   
    # Combined rotation matrix  
    M = S @ T @ R  # ORDER IS IMPORTANT HERE!!  
    if (border != 0) or (M != np.eye(3)).any():  # image changed  
        img = cv2.warpAffine(img, M[:2], dsize=(width, height), flags=cv2.INTER_LINEAR, borderValue=(114, 114, 114))  
   
    # Transform label coordinates  
    n = len(targets)  
    if n:  
        # warp points  
        xy = np.ones((n * 4, 3))  
        xy[:, :2] = targets[:, [1, 2, 3, 4, 1, 4, 3, 2]].reshape(n * 4, 2)  # x1y1, x2y2, x1y2, x2y1  
        xy = (xy @ M.T)[:, :2].reshape(n, 8)  
        # create new boxes  
        x = xy[:, [0, 2, 4, 6]]  
        y = xy[:, [1, 3, 5, 7]]  
        xy = np.concatenate((x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T  
   
        # # apply angle-based reduction of bounding boxes  
        # radians = a * math.pi / 180  
        # reduction = max(abs(math.sin(radians)), abs(math.cos(radians))) ** 0.5  
        # x = (xy[:, 2] + xy[:, 0]) / 2  
        # y = (xy[:, 3] + xy[:, 1]) / 2  
        # w = (xy[:, 2] - xy[:, 0]) * reduction  
        # h = (xy[:, 3] - xy[:, 1]) * reduction  
        # xy = np.concatenate((x - w / 2, y - h / 2, x + w / 2, y + h / 2)).reshape(4, n).T  
   
        # reject warped points outside of image  
        xy[:, [0, 2]] = xy[:, [0, 2]].clip(0, width)  
        xy[:, [1, 3]] = xy[:, [1, 3]].clip(0, height)  
        w = xy[:, 2] - xy[:, 0]  
        h = xy[:, 3] - xy[:, 1]  
        area = w * h  
        area0 = (targets[:, 3] - targets[:, 1]) * (targets[:, 4] - targets[:, 2])  
        ar = np.maximum(w / (h + 1e-16), h / (w + 1e-16))  # aspect ratio  
        i = (w > 4) & (h > 4) & (area / (area0 * s + 1e-16) > 0.2) & (ar < 10)  
   
        targets = targets[i]  
        targets[:, 1:5] = xy[i]  
   
    return img, targets  
   
def load_image(img_files, index, img_size=640):
    # loads 1 image from dataset, returns img, original hw, resized hw
    path = img_files[index]
    img = cv2.imread(path)  # BGR
    assert img is not None, 'Image Not Found ' + path
    h0, w0 = img.shape[:2]  # orig hw
    r = img_size / max(h0, w0)  # ratio that scales the longer side to img_size
    if r != 1:  # resize whenever the longer side differs from img_size
        img = cv2.resize(img, (int(w0 * r), int(h0 * r)), interpolation=cv2.INTER_LINEAR)
    return img, (h0, w0), img.shape[:2]  # img, hw_original, hw_resized
   
def load_mosaic(img_files, index, img_size, labels):
    # loads images in a mosaic
    labels4 = []
    s = img_size
    xc, yc = [int(random.uniform(s * 0.5, s * 1.5)) for _ in range(2)]  # mosaic center x, y
    indices = [index] + [random.randint(0, len(labels) - 1) for _ in range(3)]  # 3 additional image indices
    for i, index in enumerate(indices):
        # Load image, resized so that its longer side equals the tile size s
        img, _, (h, w) = load_image(img_files, index, s)
        # place img in img4  
        if i == 0:  # top left  
            img4 = np.full((s * 2, s * 2, img.shape[2]), 114, dtype=np.uint8)  # base image with 4 tiles  
            x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc  # xmin, ymin, xmax, ymax (large image)  
            x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h  # xmin, ymin, xmax, ymax (small image)  
        elif i == 1:  # top right  
            x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc  
            x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h  
        elif i == 2:  # bottom left  
            x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)  
            x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)  # right edge of the source crop is the image width
        elif i == 3:  # bottom right  
            x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)  
            x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)  
        img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b]  # img4[ymin:ymax, xmin:xmax]  
        padw = x1a - x1b  
        padh = y1a - y1b  
   
        # Labels  
        x = labels[index]  
        labels_ = x.copy()  
        if x.size > 0:  # Normalized xywh to pixel xyxy format  
            labels_[:, 1] = w * (x[:, 1] - x[:, 3] / 2) + padw  
            labels_[:, 2] = h * (x[:, 2] - x[:, 4] / 2) + padh  
            labels_[:, 3] = w * (x[:, 1] + x[:, 3] / 2) + padw  
            labels_[:, 4] = h * (x[:, 2] + x[:, 4] / 2) + padh  
        labels4.append(labels_)  
   
    # Concat/clip labels  
    if len(labels4):  
        labels4 = np.concatenate(labels4, 0)  
        # np.clip(labels4[:, 1:] - s / 2, 0, s, out=labels4[:, 1:])  # use with center crop  
        np.clip(labels4[:, 1:], 0, 2 * s, out=labels4[:, 1:])  # use with random_affine
   
    # Augment  
    # img4 = img4[s // 2: int(s * 1.5), s // 2:int(s * 1.5)]  # center crop (WARNING, requires box pruning)  
    img4, labels4 = random_affine(img4, labels4,  
                                  degrees=0.0,  
                                  translate=0.0,  
                                  scale=0.5,  
                                  shear=0.0,  
                                  border=-s // 2)  # border to remove  
    return img4, labels4  
   
img_files = []  # list of image paths
labelss = []  # per image: [class, normalized center x, normalized center y, normalized w, normalized h]
label_dir = r'G:\dirsfirst\data\dataset\labels'
image_dir = r'G:\dirsfirst\data\dataset\images'
for label_name in os.listdir(label_dir):
    name = label_name.replace('txt', 'jpg')
    path = os.path.join(image_dir, name)
    img_files.append(path)
    with open(os.path.join(label_dir, label_name)) as f:
        l = [0]  # the class id is forced to 0; this parsing assumes one box per label file
        data = f.read().strip().split(' ')
        for d in data[1:]:
            l.append(float(d))
        labelss.append(np.array([l]))
for index in range(len(labelss)):  
    img, labels =load_mosaic(img_files=img_files,index=index,img_size=640,labels=labelss)  
    for label in labels:  
        x1 = int(label[1])  
        y1 = int(label[2])  
        x2 = int(label[3])  
        y2 = int(label[4])  
        cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)  
    cv2.imshow('',img)  
    cv2.waitKey(0)

Cutout

The main effect is shown in the figure:

(figure: Cutout example)

Reference code:


        
          
import numpy as np
import torch

class Cutout(object):
    """Randomly mask out one or more patches from an image.
    Args:
        n_holes (int): Number of patches to cut out of each image.
        length (int): The length (in pixels) of each square patch.
    """
    def __init__(self, n_holes, length):
        self.n_holes = n_holes
        self.length = length

    def __call__(self, img):
        """
        Args:
            img (Tensor): Tensor image of size (C, H, W).
        Returns:
            Tensor: Image with n_holes of dimension length x length cut out of it.
        """
        h = img.size(1)  
        w = img.size(2)  
  
        mask = np.ones((h, w), np.float32)  
  
        for n in range(self.n_holes):  
            # (x, y) is the center of the square patch
            y = np.random.randint(h)  
            x = np.random.randint(w)  
  
            y1 = np.clip(y - self.length // 2, 0, h)  
            y2 = np.clip(y + self.length // 2, 0, h)  
            x1 = np.clip(x - self.length // 2, 0, w)  
            x2 = np.clip(x + self.length // 2, 0, w)  
  
            mask[y1: y2, x1: x2] = 0.  
  
        mask = torch.from_numpy(mask)  
        mask = mask.expand_as(img)  
        img = img * mask  
  
        return img