Target detection of plateau pika based on data augmentation

CHEN Haiyan, ZHEN Xiajun, ZHAO Taotao

(School of Computer and Communication, Lanzhou University of Technology, Lanzhou, Gansu 730050, China)

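To address the lack of training data for plateau pika detection models based on convolutional neural networks (CNNs) in practical applications, a data augmentation method based on the fusion of foreground and background is proposed. First, the foreground and background of the training images are separated: the separated foregrounds undergo random image transformations, and the separated backgrounds are randomly covered with background pixels, yielding a foreground set and a background set. Foregrounds and backgrounds are then randomly selected from these two sets and fused by pixel addition. Next, samples are randomly selected from the training set, and their annotated bounding-box regions are fused into random positions of the training images by cut-and-paste, producing the augmented data set. The model is trained with two-stage weakly supervised transfer learning: in the first stage, the model is pre-trained on the augmented data set; in the second stage, the pre-trained model is fine-tuned on the original training set to obtain the detection model. Results of plateau pika detection in natural scenes show that, under the same experimental conditions, the average precision (AP) of the detection model using foreground-background fusion augmentation is higher than that of the models trained without data augmentation or with Mosaic or Cutout augmentation. Its best AP is 78.4%, higher than the 72.60% of Mosaic, 75.86% of Cutout, and 77.4% of Random Erasing.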

plateau pika; lack of samples; data augmentation; transfer learning; sample balance

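Detecting plateau pikas is the basis for estimating their population size and studying their population dynamics [1–2]. CNN-based object detection models require large amounts of training data [3–10]. However, plateau pikas in natural scenes live mostly in high mountain areas where images are difficult to collect, so CNN-based plateau pika detection models lack training data [11].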

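In image classification and object detection, it is generally held that model performance grows in proportion to the logarithm of the number of training samples [12]. Privacy, security, and other factors make samples hard to obtain, so models often lack training data and overfit [13]. Overfitting caused by insufficient training data is usually addressed with early stopping, regularization, dropout, and data augmentation [14]. Data augmentation enlarges the training set and thereby reduces overfitting [15–16]. Image transformation is a common form of data augmentation that enlarges a data set through affine transforms, warping, random cropping, and similar operations [15]. Common transformation-based augmentation methods include Cutout [17], Random Erasing [18], MixUp [19], and Mosaic [3]. The images these methods generate carry the same semantic information as the original images, so their contribution to the diversity of the training data and to generalization is limited [20]. In their study of ship detection with few samples, SHIN et al. [21] generated new samples by extracting foregrounds and pasting them into images, which enriched the position distribution of foreground objects and increased the diversity of the enlarged ship data set. Although this method performed well on a specific data set, it did not consider the class imbalance between foreground and background. When foreground and background are imbalanced, anchor-based detectors suffer from an imbalance between positive and negative anchors during training. Reference [20] points out that changing the positions of foreground objects in images enriches their position distribution and increases data set diversity. Reference [22] points out that increasing the number of small objects in an image increases the number of prior boxes (anchors) that intersect foreground objects, which balances positive and negative samples during training.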
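The anchor-imbalance argument can be made concrete with a toy count. The sketch below assumes a single-scale grid of square anchors and the IoU > 0.7 positive rule of the Faster R-CNN region proposal network [9] (ignoring its extra best-match rule), and counts positive anchors before and after a second small object is added to the image. The image size, stride, anchor size, and box coordinates are illustrative assumptions, not values from this paper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def make_anchors(img_size: int, stride: int, size: int) -> List[Box]:
    """One square anchor of side `size` centred on every grid cell (toy, single scale)."""
    half = size / 2
    return [(cx - half, cy - half, cx + half, cy + half)
            for cy in range(stride // 2, img_size, stride)
            for cx in range(stride // 2, img_size, stride)]


def count_positive(anchors: List[Box], gt_boxes: List[Box], thr: float = 0.7) -> int:
    """Number of anchors whose best IoU with any ground-truth box exceeds thr."""
    if not gt_boxes:
        return 0
    return sum(1 for a in anchors if max(iou(a, g) for g in gt_boxes) > thr)


anchors = make_anchors(img_size=64, stride=8, size=16)   # 8 x 8 = 64 anchors
one_pika = [(4.0, 4.0, 20.0, 20.0)]
two_pikas = one_pika + [(36.0, 36.0, 52.0, 52.0)]       # a second, pasted object
print(len(anchors), count_positive(anchors, one_pika), count_positive(anchors, two_pikas))
# prints: 64 1 2
```

The total number of anchors stays at 64, so adding the second object raises the positive fraction from 1/64 to 2/64, which is the balancing effect reference [22] describes.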

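Inspired by the foreground extraction and pasting augmentation of references [21] and [22], and to address the lack of training data, we propose a data augmentation method based on foreground and background fusion, denoted FBFAP. Building on this method, the model is trained with two-stage weakly supervised transfer learning: in stage 1, the model is pre-trained on the augmented data set; in stage 2, the pre-trained model is fine-tuned on the original data set to obtain the detection model, which is then used to detect plateau pikas.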
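The separation step of FBFAP (cut out the annotated foreground, transform it randomly, and cover the vacated region of the background with background pixels) can be sketched as below. This is a minimal sketch under stated assumptions: images are grayscale nested lists, the foreground is the axis-aligned annotated box, the random transform is only a horizontal flip, and the hole is filled with pixels sampled uniformly from outside the box. The paper does not specify its transform set or filling scheme, and all function names are hypothetical.

```python
import random
from typing import List, Tuple

Image = List[List[int]]          # H x W grayscale image, row-major
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels, x2 and y2 exclusive


def crop(img: Image, box: Box) -> Image:
    """Copy the pixels inside the annotated box (the foreground)."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in img[y1:y2]]


def random_transform(patch: Image, rng: random.Random) -> Image:
    """Assumed transform: horizontal flip with probability 0.5."""
    if rng.random() < 0.5:
        return [row[::-1] for row in patch]
    return [row[:] for row in patch]


def fill_with_background(img: Image, box: Box, rng: random.Random) -> Image:
    """Return a copy of img whose box region is covered with pixels
    drawn at random from outside the box (assumed filling scheme)."""
    x1, y1, x2, y2 = box
    outside = [img[y][x] for y in range(len(img)) for x in range(len(img[0]))
               if not (x1 <= x < x2 and y1 <= y < y2)]
    if not outside:
        raise ValueError("box covers the whole image; no background to sample")
    out = [row[:] for row in img]
    for y in range(y1, y2):
        for x in range(x1, x2):
            out[y][x] = rng.choice(outside)
    return out


def separate(img: Image, box: Box, rng: random.Random) -> Tuple[Image, Image]:
    """Split one annotated image into (transformed foreground, filled background)."""
    foreground = random_transform(crop(img, box), rng)
    background = fill_with_background(img, box, rng)
    return foreground, background


# Toy example: a 6 x 6 "grass" image (value 10) containing a 2 x 2 "pika" (value 200).
image = [[10] * 6 for _ in range(6)]
for y in range(2, 4):
    for x in range(2, 4):
        image[y][x] = 200
fg, bg = separate(image, (2, 2, 4, 4), random.Random(0))
```

Running `separate` over every annotated training image would yield the foreground set and background set described in the abstract.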

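Study [22] shows that, in foreground-background fusion, model accuracy actually decreases as the number of foregrounds fused into a single image increases, and the model performs best when 2 foregrounds are fused. Accordingly, the number of fused foreground objects is set to 2.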
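With the number of fused foregrounds fixed at 2, the paste step can be sketched as follows: two foreground patches are placed at random positions that do not overlap existing or already pasted boxes, and their boxes are appended to the annotations. This is an illustrative sketch, not the authors' code. It models fusion as plain pixel replacement (the paper's pixel-addition fusion may combine pixels differently), rejecting overlapping positions is an assumption, and the names are hypothetical.

```python
import random
from typing import List, Tuple

Image = List[List[int]]          # H x W grayscale image, row-major
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), x2 and y2 exclusive


def overlaps(a: Box, b: Box) -> bool:
    """True if two half-open boxes share at least one pixel."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def paste_foregrounds(background: Image, boxes: List[Box], patches: List[Image],
                      rng: random.Random, n_paste: int = 2,
                      max_tries: int = 50) -> Tuple[Image, List[Box]]:
    """Paste up to n_paste randomly chosen patches at random, non-overlapping
    positions; return the new image and the updated box list.
    Fusion is modelled as pixel replacement (an assumption)."""
    h, w = len(background), len(background[0])
    out = [row[:] for row in background]
    out_boxes = list(boxes)
    for patch in rng.sample(patches, k=min(n_paste, len(patches))):
        ph, pw = len(patch), len(patch[0])
        if ph > h or pw > w:
            continue  # patch larger than the image: skip it
        for _ in range(max_tries):
            x, y = rng.randint(0, w - pw), rng.randint(0, h - ph)
            cand = (x, y, x + pw, y + ph)
            if any(overlaps(cand, b) for b in out_boxes):
                continue  # would cover an existing object: try another position
            for dy in range(ph):
                out[y + dy][x:x + pw] = patch[dy]
            out_boxes.append(cand)
            break
    return out, out_boxes


background = [[0] * 20 for _ in range(20)]
patches = [[[1] * 3 for _ in range(3)], [[2] * 3 for _ in range(3)]]
image, boxes = paste_foregrounds(background, [], patches, random.Random(1))
print(boxes)
```

Because every pasted box is appended to the annotation list, the augmented image carries correct labels for the new objects without any manual annotation.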

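The Faster R-CNN of reference [9] is used as the base model (structure shown in Fig. 1) and is trained with the two-stage weakly supervised transfer learning method. In stage 1, the model is pre-trained on the FBFAP-augmented data set with the Adam optimizer (beta1 = 0.9, beta2 = 0.999) and an initial learning rate of 0.001, which is multiplied by 0.1 every 5 epochs; training runs for 30 epochs with a batch size of 4. In stage 2, the pre-trained model is fine-tuned on the original data set with the same settings as in stage 1. Experiments were run on a graphics workstation with an NVIDIA Titan V GPU, CUDA 10.1.168, cuDNN 7.6.1, and Ubuntu 16.04 LTS. The model was implemented with PyTorch 1.1 and Torchvision 0.3 in Python 3.5.2.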
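The training settings shared by both stages can be collected into a small configuration together with the step-decay rule they imply. This is a sketch, assuming that the decay is applied once per epoch in the manner of PyTorch's `torch.optim.lr_scheduler.StepLR` and that the learning rate restarts at 0.001 in stage 2 (the text only says stage 2 uses the same settings). The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StageConfig:
    """Hyperparameters of one training stage (values from the text)."""
    name: str
    dataset: str                               # training data for this stage
    lr: float = 1e-3                           # initial Adam learning rate
    betas: Tuple[float, float] = (0.9, 0.999)  # Adam beta1, beta2
    step_size: int = 5                         # decay every 5 epochs
    gamma: float = 0.1                         # multiply lr by 0.1 at each decay
    epochs: int = 30
    batch_size: int = 4


def lr_at(cfg: StageConfig, epoch: int) -> float:
    """Learning rate during 0-based `epoch` under step decay.
    In PyTorch this corresponds to
        optimizer = torch.optim.Adam(model.parameters(), lr=cfg.lr, betas=cfg.betas)
        scheduler = torch.optim.lr_scheduler.StepLR(
            optimizer, step_size=cfg.step_size, gamma=cfg.gamma)
    with scheduler.step() called once at the end of each epoch."""
    return cfg.lr * cfg.gamma ** (epoch // cfg.step_size)


STAGES = (
    StageConfig("stage 1: pre-train", dataset="FBFAP-augmented training set"),
    StageConfig("stage 2: fine-tune", dataset="original training set"),
)

for stage in STAGES:
    schedule = [lr_at(stage, e) for e in range(stage.epochs)]
    print(stage.name, "|", stage.dataset, "| last lr = %.0e" % schedule[-1])
```

Under these settings the learning rate is decayed five times within 30 epochs, so the last five epochs of each stage run at 0.001 × 0.1^5 = 1e-8.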

To evaluate FBFAP, precision, recall, and average precision (AP) are used as evaluation metrics.
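For reference, the three metrics can be computed as below. The paper does not state its matching threshold or AP interpolation, so this sketch assumes the common PASCAL VOC conventions: a detection counts as a true positive if it matches a not-yet-matched ground-truth box with IoU ≥ 0.5, and AP is the all-point interpolated area under the precision-recall curve. The true-positive flags are taken as already computed.

```python
from typing import List, Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(scores: List[float], is_tp: List[bool], n_gt: int) -> float:
    """All-point interpolated AP (PASCAL VOC 2010+ style).
    scores: confidence of each detection; is_tp: whether that detection
    was matched to a ground-truth box; n_gt: number of ground-truth boxes."""
    if n_gt <= 0:
        raise ValueError("n_gt must be positive")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:  # walk detections from most to least confident
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / n_gt)
        precisions.append(tp / (tp + fp))
    # Interpolate: precision at each point becomes the max precision to its right.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum rectangle areas wherever recall increases.
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap


p, r = precision_recall(tp=8, fp=2, fn=2)
ap = average_precision(scores=[0.9, 0.8, 0.7], is_tp=[True, False, True], n_gt=2)
print("P=%.2f R=%.2f AP=%.4f" % (p, r, ap))  # prints: P=0.80 R=0.80 AP=0.8333
```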

Fig. 1 Structure of the detection model

3.1 Data collection

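The plateau pika data set consists of 1100 images collected on the Gannan grassland in the northeastern Qinghai-Tibet Plateau (101°35′36″–102°58′15″E, 33°58′21″–34°48′48″N). Of these, 900 images form the training set and 200 form the test set, denoted D–train and D–test, respectively. The data set follows the Pascal VOC [23] format. For the study, the training set is randomly divided into training sets of 100, 200, and 600 images, denoted DI–train, DII–train, and DIII–train, respectively. A data set obtained with a data augmentation method is called an augmented data set.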
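The split can be reproduced along the following lines. This sketch assumes that the image identifiers are available as a list, that the 900/200 train/test split is also random, and that DI–train, DII–train, and DIII–train form a disjoint partition of D–train (their sizes, 100 + 200 + 600, sum to 900). The seed and the identifiers are hypothetical, and the dictionary keys use plain hyphens in place of the en dashes in the set names.

```python
import random
from typing import Dict, List


def split_dataset(image_ids: List[str], seed: int = 0) -> Dict[str, List[str]]:
    """Randomly split 1100 images into D-train/D-test and partition D-train
    into disjoint subsets of 100, 200 and 600 images."""
    if len(image_ids) != 1100:
        raise ValueError("expected 1100 images, got %d" % len(image_ids))
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # fixed seed makes the split reproducible
    train, test = ids[:900], ids[900:]
    return {
        "D-train": train,
        "D-test": test,
        "DI-train": train[:100],
        "DII-train": train[100:300],
        "DIII-train": train[300:],
    }


splits = split_dataset(["pika_%04d" % i for i in range(1100)])
print({name: len(ids) for name, ids in splits.items()})
```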

3.2 Detection results on the original data sets

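Table 1 lists the precision, recall, and AP of the Faster R-CNN based detection models trained on DI–train, DII–train, DIII–train, and D–train. The larger the training set, the better the model performs, which further confirms that large data sets are necessary for training deep CNNs. Because the models trained on DI–train and DII–train perform poorly, these two small sets are the most representative of the data-scarce setting; augmentation is therefore studied mainly on them and finally applied to D–train.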

Table 1 Precision, recall, and average precision on the original data sets

3.3 Detection results after augmenting DI–train, DII–train, and D–train

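FBFAP is applied to DI–train and DII–train separately, and Faster R-CNN based detection models are trained with two-stage weakly supervised transfer learning. Table 2 shows how the size of the augmented DI–train data set affects detection performance. The results show that FBFAP effectively improves the model's detection performance, including AP. The size of the augmented DII–train data set has a similar effect; for both DI–train and DII–train, the best AP is reached when the augmented data set contains 600 images, at 66.74% and 78.40%, respectively.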

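When FBFAP is applied to D–train with an augmented data set of 600 images, the precision is 49.62%, the recall is 93.81%, and the AP is 83.34%, showing that FBFAP improves the model's recall and average precision.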

Table 2 Precision, recall, and average precision with the augmented DI–train data

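To further verify the effectiveness of FBFAP, Mosaic, Cutout, and Random Erasing are each used to augment DI–train and DII–train; detection models are trained with the same number of augmented samples under the same experimental conditions, and their detection results are compared with those obtained with FBFAP.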
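Two of the baselines can be sketched on the same kind of grayscale nested-list image. Cutout [17] zeroes a fixed-size square centred at a random pixel, clipped at the image border; Random Erasing [18] replaces, with probability p, a rectangle of random area and aspect ratio with random values. The Random Erasing defaults below (p = 0.5, area fraction in [0.02, 0.4], aspect ratio in [0.3, 1/0.3]) are the defaults reported in [18]. The Cutout square size, the simplified placement (positions are drawn only where the rectangle fits), and the function names are assumptions; the settings actually used in this comparison are not reported in the text. Mosaic, which tiles four images, is not sketched.

```python
import math
import random
from typing import List

Image = List[List[int]]  # H x W grayscale image, row-major


def cutout(img: Image, size: int, rng: random.Random) -> Image:
    """Cutout: zero a size x size square centred at a uniformly random pixel,
    clipped at the image border."""
    h, w = len(img), len(img[0])
    cy, cx = rng.randrange(h), rng.randrange(w)
    y1, y2 = max(0, cy - size // 2), min(h, cy + size // 2)
    x1, x2 = max(0, cx - size // 2), min(w, cx + size // 2)
    out = [row[:] for row in img]
    for y in range(y1, y2):
        for x in range(x1, x2):
            out[y][x] = 0
    return out


def random_erasing(img: Image, rng: random.Random, p: float = 0.5,
                   sl: float = 0.02, sh: float = 0.4, r1: float = 0.3,
                   max_attempts: int = 100) -> Image:
    """Random Erasing: with probability p, fill a rectangle whose area fraction
    lies in [sl, sh] and aspect ratio in [r1, 1/r1] with random values 0..255."""
    out = [row[:] for row in img]
    if rng.random() >= p:
        return out  # leave this image unchanged
    h, w = len(img), len(img[0])
    for _ in range(max_attempts):
        area = rng.uniform(sl, sh) * h * w
        ratio = rng.uniform(r1, 1 / r1)
        eh = int(round(math.sqrt(area * ratio)))
        ew = int(round(math.sqrt(area / ratio)))
        if 0 < eh < h and 0 < ew < w:
            y, x = rng.randint(0, h - eh), rng.randint(0, w - ew)
            for yy in range(y, y + eh):
                for xx in range(x, x + ew):
                    out[yy][xx] = rng.randint(0, 255)
            return out
    return out  # no rectangle fitted within max_attempts


img = [[128] * 32 for _ in range(32)]
cut = cutout(img, size=8, rng=random.Random(0))
erased = random_erasing(img, rng=random.Random(0), p=1.0)
```

Unlike the FBFAP sketches, neither baseline adds objects or changes the box annotations; both only occlude parts of existing images, which is consistent with the paper's point that such transformations keep the original semantic content.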

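Fig. 2 shows examples of plateau pika detection with different data augmentation methods; red rectangles mark the detection results. Compared with detection without data augmentation and with Mosaic, Cutout, or Random Erasing augmentation, FBFAP-based plateau pika detection is more accurate.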

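[1] CHEN H Y, CHEN G Q. Target detection of plateau pika based on semantic segmentation[J]. Journal of Huazhong University of Science and Technology (Natural Science Edition), 2020, 48(7): 7–12.

[2] CHEN H Y, CHEN G Q, ZHANG H Q. Image segmentation of plateau pika based on SegNet model[J]. Journal of Hunan Agricultural University (Natural Sciences), 2020, 46(6): 749–752.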

[3] BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. [2021–03–29]. https://arxiv.org/pdf/2004.10934v1.pdf.

[4] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus. New York: IEEE, 2014: 580–587.

[5] HE K M, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]//2017 IEEE International Conference on Computer Vision. New York: IEEE, 2017: 2980–2988.

[6] LIU W, ANGUELOV D, ERHAN D, et al. SSD: single shot multibox detector[C]//Proceedings of the 14th European Conference on Computer Vision. Berlin: Springer, 2016: 21–37.

[7] SERMANET P, EIGEN D, ZHANG X, et al. OverFeat: integrated recognition, localization and detection using convolutional networks[EB/OL]. [2021–03–29]. https://arxiv.org/pdf/1312.6229.pdf.

[8] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection[C]//Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. New York: IEEE, 2016: 779–788.

[9] REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137–1149.

[10] TIAN Z, SHEN C H, CHEN H, et al. FCOS: fully convolutional one-stage object detection[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). New York: IEEE, 2019: 9626–9635.

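[11] ZHANG A H, WANG F, CHEN H Y. Multi-color image segmentation of targets based on an improved CV model[J]. Journal of Huazhong University of Science and Technology (Natural Science Edition), 2018, 46(1): 63–66.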

[12] SUN C, SHRIVASTAVA A, SINGH S, et al. Revisiting unreasonable effectiveness of data in deep learning era[C]//2017 IEEE International Conference on Computer Vision (ICCV). New York: IEEE, 2017: 843–852.

[13] LU J, GONG P, YE J, et al. Learning from very few samples: a survey[EB/OL]. [2021–03–29]. https://arxiv.org/pdf/2009.02653.

[14] WU Q F, CHEN Y P, MENG J. DCGAN-based data augmentation for tomato leaf disease identification[J]. IEEE Access, 2020, 8: 98716–98728.

[15] SHORTEN C, KHOSHGOFTAAR T M. A survey on image data augmentation for deep learning[J]. Journal of Big Data, 2019, 6: 60.

[16] TAKAHASHI R, MATSUBARA T, UEHARA K. Data augmentation using random image cropping and patching for deep CNNs[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 30(9): 2917–2931.

[17] DEVRIES T, TAYLOR G W. Improved regularization of convolutional neural networks with cutout[EB/OL]. [2021–03–29]. https://arxiv.org/pdf/1708.04552.

[18] ZHONG Z, ZHENG L, KANG G, et al. Random erasing data augmentation[EB/OL]. [2021–03–29]. https://arxiv.org/pdf/1708.04896.

[19] ZHANG H Y, CISSE M, DAUPHIN Y N, et al. MixUp: beyond empirical risk minimization[EB/OL]. [2021–03–29]. https://arxiv.org/pdf/1710.09412.

[20] BANG S, BAEK F, PARK S, et al. Image augmentation to improve construction resource detection using generative adversarial networks, cut-and-paste, and image transformation techniques[J]. Automation in Construction, 2020, 115: 103198.

[21] SHIN H C, LEE K I, LEE C E. Data augmentation method of object detection for deep learning in maritime image[C]//2020 IEEE International Conference on Big Data and Smart Computing (BigComp). New York: IEEE, 2020: 463–466.

[22] KISANTAL M, WOJNA Z, MURAWSKI J, et al. Augmentation for small object detection[EB/OL]. [2021–03–29]. https://arxiv.org/pdf/1902.07296.

[23] EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The PASCAL visual object classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303–338.

Target detection of plateau pika based on data augmentation

CHEN Haiyan，ZHEN Xiajun，ZHAO Taotao

(School of Computer and Communication, Lanzhou University of Technology, Lanzhou Gansu 730050, China)

To address the lack of training data for plateau pika detection models based on convolutional neural networks in practical applications, a data augmentation method based on the fusion of foreground and background is proposed. First, the foreground and background of the training data are separated; the separated foregrounds are randomly transformed and the separated backgrounds are covered with background pixels, yielding a foreground set and a background set, respectively. Foregrounds and backgrounds are randomly selected from the two sets and fused by pixel addition. Then samples are randomly selected from the training set, and the labeled bounding-box regions of the selected samples are fused into random positions of the training images by cut-and-paste to obtain an augmented data set. The model is trained with two-stage weakly supervised transfer learning: the first stage pre-trains the model on the augmented data set, and the second stage fine-tunes the pre-trained model on the original training set to obtain the detection model. Experiments on plateau pika detection in natural scenes show that, under the same experimental conditions, the average precision of the detection model based on this method is higher than that of the models trained without data augmentation or with Mosaic or Cutout augmentation. The best AP of the detection model based on foreground-background fusion augmentation is 78.4%, higher than the 72.6% of Mosaic, 75.86% of Cutout, and 77.4% of Random Erasing.

plateau pika; lack of samples; data augmentation; transfer learning; sample balance

TP319

1007-1032(2022)04-0496-05

CHEN H Y, ZHEN X J, ZHAO T T. Target detection of plateau pika based on data augmentation[J]. Journal of Hunan Agricultural University (Natural Sciences), 2022, 48(4): 496–500.

http://xb.hunau.edu.cn

2021–05–07

2022–03–25

Supported by the National Natural Science Foundation of China (62161019, 62061024)

CHEN Haiyan (1978–), female, born in Longxi, Gansu; Ph.D., associate professor; research interest: image processing; chenhaiyan@sina.com

Responsible editor: LUO Huimin

English editor: WU Zhili
