Face detection algorithm based on improved S3FD network

LI Yuhao; LÜ Xiaoqi; GU Yu; ZHANG Ming; LI Jing

doi:10.7510/jgjs.issn.1001-3806.2021.06.008


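Inner Mongolia Key Laboratory of Pattern Recognition and Intelligent Image Processing, College of Information Engineering, Inner Mongolia University of Science and Technology, Baotou 014010, China

Institute of Information Engineering, Inner Mongolia University of Technology, Hohhot 010051, China

School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China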

About the author: LI Yuhao (born 1996), male, master's student, mainly engaged in research on digital image processing.

Corresponding author: LÜ Xiaoqi, lxiaoqi@imut.edu.cn

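Funding:

Ministry of Education "Chunhui Plan" Cooperative Scientific Research Project (教外司留1383号)

National Natural Science Foundation of China 62001255

National Natural Science Foundation of China 61841204

Scientific Research Project of Higher Education Institutions of Inner Mongolia Autonomous Region NJZY145

Natural Science Foundation of Inner Mongolia Autonomous Region 2015MS0604

Natural Science Foundation of Inner Mongolia Autonomous Region 2019MS06003

National Natural Science Foundation of China 61771266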

CLC number: TP391.41


Abstract: Small faces carry little feature information and are relatively blurred, which makes them difficult to detect. To address this problem, a network that combines the single shot scale-invariant face detector (S3FD) with channel and spatial attention mechanisms was used as the backbone. The attention mechanisms establish weight relationships among features along the channel and spatial dimensions and strengthen feature extraction. The receptive field of each feature map output by the original S3FD was then enlarged and the result was up-sampled, so that the output of each layer's feature map also contains the features of the next layer's feature map. The results show that the average precision (AP) on the three difficulty levels of the WIDER FACE validation set is 95.0%, 93.7%, and 86.4%, respectively, which is 1.3, 1.2, and 0.5 percentage points higher than the original S3FD. The proposed algorithm therefore detects faces well.


| Methods | AP/% (easy) | AP/% (medium) | AP/% (hard) |
|---|---|---|---|
| S3FD | 93.7 | 92.5 | 85.9 |
| S3FD-CBAM | 93.8 | 92.7 | 86.4 |
| S3FD-double-stage | 92.9 | 91.3 | 82.1 |
| S3FD-double-stage-max | 92.8 | 91.1 | 82.6 |
| S3FD-CBAM-RFBNet | 95.0 | 93.7 | 86.4 |
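As a quick consistency check, the AP values listed above for the ablation variants can be compared against the S3FD baseline. The following is only an illustrative sketch: the dictionary layout and the helper name `gains_over_baseline` are assumptions introduced here, and the numbers are copied from the table, not recomputed from any model.

```python
# AP (%) on the easy / medium / hard subsets, copied from the table above.
ABLATION_AP = {
    "S3FD": (93.7, 92.5, 85.9),
    "S3FD-CBAM": (93.8, 92.7, 86.4),
    "S3FD-double-stage": (92.9, 91.3, 82.1),
    "S3FD-double-stage-max": (92.8, 91.1, 82.6),
    "S3FD-CBAM-RFBNet": (95.0, 93.7, 86.4),
}


def gains_over_baseline(table, baseline="S3FD"):
    """Return the AP difference (percentage points) of every method vs. the baseline."""
    base = table[baseline]
    return {
        name: tuple(round(ap - b, 1) for ap, b in zip(aps, base))
        for name, aps in table.items()
        if name != baseline
    }


gains = gains_over_baseline(ABLATION_AP)
for name, (easy, medium, hard) in gains.items():
    print(f"{name:24s} easy {easy:+.1f}  medium {medium:+.1f}  hard {hard:+.1f}")
```

For S3FD-CBAM-RFBNet the differences come out as 1.3, 1.2, and 0.5 percentage points, which agrees with the improvement stated in the abstract.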

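| Methods | AP/% (easy) | AP/% (medium) | AP/% (hard) |
|---|---|---|---|
| S3FD^[18] | 93.7 | 92.5 | 85.9 |
| ZHU^[20] | 94.9 | 93.3 | 86.1 |
| IS3FD^[31] | 94.4 | 93.5 | 87.9 |
| IS3FD-fast^[31] | 93.1 | 92.4 | 86.4 |
| WANG^[32] | 93.0 | 87.3 | 58.3 |
| S3FD-CBAM-RFB (proposed method) | 95.0 | 93.7 | 86.4 |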


Received: 2020-12-22

Accepted: 2021-01-23

Published online: 2021-11-25


Introduction

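Face detection is a major focus of computer vision and is widely used in face alignment, face analysis, face recognition, and face tracking. Given an image, the goal of face detection is to determine whether any face is present and, if so, to return the bounding box of each face. Early face detection relied on template matching and hand-crafted features; the representative work is the method proposed by ROWLEY et al.^[1-2]. ROWLEY's method achieves good accuracy, but its classifier design and dense sliding-window sampling make it too slow. In 2001, VIOLA and JONES^[3] designed the Viola-Jones (VJ) framework. Deformable part models (DPM) were later used in several works^[4-6] for face detection.

As machine learning and deep learning advanced in computer vision, and convolutional neural networks (CNN) made progress on the ImageNet classification task^[7], neural-network-based object detection gradually became the mainstream approach^[8-9]. The cascaded network (CascadeCNN)^[10] continued the idea of VIOLA and JONES and achieved good detection results. QIN et al.^[11] proposed training CascadeCNN jointly as a whole in order to optimize it. Faceness uses convolutional neural networks for facial attribute classification to detect partially occluded faces^[12]. Multi-task convolutional neural networks (MTCNN)^[13] further extended the cascaded CNN idea and also borrowed ideas from object detection, dividing the overall network into P-Net, R-Net, and O-Net. JIANG et al.^[14] applied faster R-CNN^[15] to face detection with good results. The contextual multi-scale region-based CNN (CMS-R-CNN)^[16] uses contextual information to assist face detection and thereby improves performance; the network is divided into an upper part that detects with a region proposal network (RPN) and a lower part that detects using body-related information such as the width and height of the face. WAN et al.^[17] combined faster R-CNN with hard negative mining and obtained good results.

Accuracy is already high on images that contain a small number of large faces, but it remains low on images that contain a large number of small faces. For multi-scale face detection, in 2017 ZHANG et al.^[18] proposed the single shot scale-invariant face detector (S3FD), which combines the RPN of faster R-CNN with the anchor mechanism of the single shot multibox detector (SSD)^[19]. In 2018, ZHU et al.^[20] and LI et al.^[21] further improved face detection accuracy, and in the same year GU et al.^[22] achieved good results in 3-D with multi-scale object detection. In 2019, LI, TANG, et al.^[23] proposed PyramidBox++.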

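To address the difficulty of detecting faces that carry little feature information, this paper proposes a face detection algorithm based on S3FD. S3FD^[18], with the visual geometry group (VGG) network VGG16^[24] as its backbone, is used for feature extraction. A convolutional block attention module (CBAM)^[25] is inserted in the middle of the overall network; it uses spatial and channel attention to compute weight coefficients for the different feature channels and feature-map positions, which strengthens the features passed to the subsequent network structure. Finally, a structure in the form of a feature pyramid network (FPN)^[26] is adopted, in which the convolutional part of the FPN is replaced by the receptive field block (RFB)^[27]. The receptive-field enlargement of RFB is applied to the feature maps at different scales, which reduces the loss of target feature information and completes face detection without adding a large number of extra parameters.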
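To make the attention step more concrete, the following NumPy sketch follows the general CBAM recipe of WOO et al.^[25]: channel attention from a shared two-layer MLP over average-pooled and max-pooled descriptors, followed by spatial attention from a 7×7 convolution over channel-wise average and max maps. It is an illustration under stated assumptions, not the authors' implementation: the weights are random instead of trained, the tensor sizes and the reduction ratio `r` are arbitrary, and the position of the module inside the S3FD network is not modelled.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """feat: (C, H, W); w1: (C//r, C); w2: (C, C//r). Returns one weight per channel."""
    avg = feat.mean(axis=(1, 2))  # global average pooling -> (C,)
    mx = feat.max(axis=(1, 2))    # global max pooling -> (C,)

    def mlp(v):
        # Shared two-layer MLP with ReLU, applied to both descriptors.
        return w2 @ np.maximum(w1 @ v, 0.0)

    return sigmoid(mlp(avg) + mlp(mx))


def spatial_attention(feat, kernel):
    """feat: (C, H, W); kernel: (2, k, k). Returns one weight per spatial position."""
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    h, w = feat.shape[1:]
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(out)


def cbam(feat, w1, w2, kernel):
    """Apply channel attention first, then spatial attention."""
    feat = feat * channel_attention(feat, w1, w2)[:, None, None]
    return feat * spatial_attention(feat, kernel)[None, :, :]


# Hypothetical sizes and random (untrained) weights, for illustration only.
rng = np.random.default_rng(0)
C, H, W, r = 8, 6, 6, 4
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1
out = cbam(feat, w1, w2, kernel)
print(out.shape)
```

Because both attention maps pass through a sigmoid, every weight lies between 0 and 1, so the module can only rescale features; it keeps the shape of the feature map unchanged and therefore can be placed between existing layers.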

3. Conclusion

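To address the relatively poor detection accuracy for small faces, this paper studies a detection algorithm for small faces with the S3FD network as the backbone. To counter the loss of target features during convolution and insufficient image clarity, the method combines S3FD, channel and spatial attention mechanisms, the RFB receptive-field enlargement module, and a multi-scale feature pyramid, which reduces the loss of target features during convolution and raises the overall accuracy of the network. The number of model parameters is not increased substantially, so the network retains good detection speed and, to some extent, meets the need for fast object detection. Detection speed and detection accuracy pull against each other in face detection, and improving accuracy while also increasing speed has long been a central concern of the field. As network architectures continue to be optimized, hardware continues to improve, and the underlying theory matures, faster and more accurate face detection can be expected in the near future. The proposed model maintains good accuracy while keeping a reasonable speed and may contribute to the development of faster and more accurate face detection.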
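The combination of receptive-field enlargement and a multi-scale pyramid summarized above can be sketched in NumPy as follows. This is a heavily simplified stand-in and not the authors' network: a real RFB^[27] uses multi-channel branches with different kernel sizes and concatenation, whereas this sketch averages three single-channel 3×3 convolutions; the dilation rates (1, 3, 5), the top-down fusion by nearest-neighbor up-sampling and addition (the usual FPN convention), and all function names are assumptions made for illustration.

```python
import numpy as np


def dilated_conv3x3(x, kernel, dilation):
    """Single-channel 'same' 3x3 convolution with zero padding. x: (H, W).

    With dilation d, the 3x3 kernel covers a (2d + 1) x (2d + 1) window,
    so a larger d enlarges the receptive field without adding weights.
    """
    h, w = x.shape
    padded = np.pad(x, dilation)
    out = np.zeros((h, w))
    for a in range(3):
        for b in range(3):
            out += kernel[a, b] * padded[a * dilation:a * dilation + h,
                                         b * dilation:b * dilation + w]
    return out


def rfb_like(x, kernels, dilations=(1, 3, 5)):
    """Average parallel branches with growing dilation (simplified stand-in for RFB)."""
    branches = (dilated_conv3x3(x, k, d) for k, d in zip(kernels, dilations))
    return sum(branches) / len(dilations)


def upsample2x(x):
    """Nearest-neighbor up-sampling by a factor of 2."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def top_down_fuse(features, kernels):
    """features: (H, W) maps ordered shallow -> deep, each half the size of the previous one."""
    enlarged = [rfb_like(f, kernels) for f in features]
    fused = [enlarged[-1]]  # the deepest map has nothing to receive
    for lateral in reversed(enlarged[:-1]):
        fused.append(lateral + upsample2x(fused[-1]))
    return fused[::-1]


# Hypothetical three-level pyramid with random values and random kernels.
rng = np.random.default_rng(0)
features = [rng.standard_normal((s, s)) for s in (16, 8, 4)]
kernels = [rng.standard_normal((3, 3)) * 0.1 for _ in range(3)]
fused = top_down_fuse(features, kernels)
print([f.shape for f in fused])
```

After fusion, each map keeps its own resolution but also carries information from every coarser level, which is the property the text relies on for small faces.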

References (32)

[1]	ROWLEY H A, BALUJA S, KANADE T. Neural network-based face detection[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 1998, 20(1): 23-38.
[2]	ROWLEY H A, BALUJA S, KANADE T. Rotation invariant neural network-based face detection[C]// Computer Vision and Pattern Recognition. New York, USA: IEEE, 1998: 38-44.
[3]	VIOLA P, JONES M J. Robust real-time face detection[J]. International Journal of Computer Vision, 2004, 57(2): 137-154. doi: 10.1023/B:VISI.0000013087.49260.fb
[4]	MATHIAS M, BENENSON R, PEDERSOLI M, et al. Face detection without bells and whistles[C]// European Conference on Computer Vision. Zurich, Switzerland: ECCV, 2014: 720-735.
[5]	YAN J, LEI Z, WEN L, et al. The fastest deformable part model for object detection[C]//Computer Vision and Pattern Recognition. New York, USA: IEEE, 2014: 2497-2504.
[6]	ZHU X, RAMANAN D. Face detection, pose estimation, and landmark localization in the wild[C]//Computer Vision and Pattern Recognition. New York, USA: IEEE, 2012: 2879-2886.
[7]	KRIZHEVSKY A, SUTSKEVER I, HINTON G. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90. doi: 10.1145/3065386
[8]	LEI H L, ZHANG B H. Crowd count algorithm based on multi-model deep convolution network integration[J]. Laser Technology, 2019, 43(4): 476-481(in Chinese).
[9]	CHEN Q X, WU W Ch, ASKAR H. Detection algorithm based on multi-scale spotted target modeling[J]. Laser Technology, 2020, 44(4): 520-524(in Chinese).
[10]	LI H, LIN Z, SHEN X, et al. A convolutional neural network cascade for face detection[C]// Computer Vision and Pattern Recognition. New York, USA: IEEE, 2015: 5325-5334.
[11]	QIN H, YAN J, LI X, et al. Joint training of cascaded CNN for face detection[C]// Computer Vision and Pattern Recognition. New York, USA: IEEE, 2016: 3456-3465.
[12]	YANG S, LUO P, LOY C C, et al. From facial parts responses to face detection: A deep learning approach[C]//International Conference on Computer Vision. New York, USA: IEEE, 2015: 3676-3684.
[13]	ZHANG K, ZHANG Z, LI Z, et al. Joint face detection and alignment using multitask cascaded convolutional networks[J]. IEEE Signal Processing Letters, 2016, 23(10): 1499-1503. doi: 10.1109/LSP.2016.2603342
[14]	JIANG H, LEARNED-MILLER E. Face detection with the faster R-CNN[C]//Automatic Face and Gesture Recognition. New York, USA: IEEE, 2017: 650-657.
[15]	REN S, HE K, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, 39(6): 1137-1149.
[16]	ZHU C, ZHENG Y, LUU K, et al. CMS-RCNN: Contextual multi-scale region-based CNN for unconstrained face detection[EB/OL]. (2016-06-16)[2020-12-22]. https://arxiv.org/pdf/1606.05413.pdf.
[17]	WAN S, CHEN Z, ZHANG T, et al. Bootstrapping face detection with hard negative examples[EB/OL]. (2016-08-07)[2020-12-22]. https://arxiv.org/pdf/1608.02236.pdf.
[18]	ZHANG S, ZHU X, LEI Z, et al. S3FD: Single shot scale-invariant face detector[C]//International Conference on Computer Vision. New York, USA: IEEE, 2017: 192-201.
[19]	LIU W, ANGUELOV D, ERHAN D, et al. SSD: Single shot multibox detector[C]// European Conference on Computer Vision. Amsterdam, The Netherlands: Springer International Publishing, 2016: 21-37.
[20]	ZHU C, TAO R, LUU K, et al. Seeing small faces from robust anchor's perspective[C]// Computer Vision and Pattern Recognition. New York, USA: IEEE, 2018: 5127-5136.
[21]	LI J, WANG Y, WANG C, et al. DSFD: Dual shot face detector[C]// Computer Vision and Pattern Recognition. New York, USA: IEEE, 2019: 5055-5064.
[22]	GU Y, LU X Q, YANG L D, et al. Automatic lung nodule detection using a 3D deep convolutional neural network combined with a multi-scale prediction strategy in chest CTs[J]. Computers in Biology and Medicine, 2018, 103: 220-231. doi: 10.1016/j.compbiomed.2018.10.011
[23]	LI Z, TANG X, HAN J, et al. PyramidBox++: High performance detector for finding tiny face[EB/OL]. (2019-08-07)[2020-12-22]. https://arxiv.org/pdf/1904.00386.pdf.
[24]	SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. (2015-04-10)[2020-12-22]. https://arxiv.org/pdf/1409.1556.pdf.
[25]	WOO S, PARK J, LEE J Y, et al. CBAM: Convolutional block attention module[C]// European Conference on Computer Vision. Munich, Germany: ECCV, 2018: 3-19.
[26]	LIN T Y, DOLLAR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]//Computer Vision and Pattern Recognition. New York, USA: IEEE, 2017: 936-944.
[27]	LIU S, HUANG D, WANG Y. Receptive field block net for accurate and fast object detection[J]. Lecture Notes in Computer Science, 2018, 11215: 404-419.
[28]	SZEGEDY C, LIU W, JIA Y, et al. Going deeper with convolutions[C]//Computer Vision and Pattern Recognition. New York, USA: IEEE, 2014: 1-9.
[29]	BA J, MNIH V, KAVUKCUOGLU K. Multiple object recognition with visual attention[EB/OL]. (2015-04-23)[2020-12-22]. https://arxiv.org/pdf/1412.7755.pdf.
[30]	YANG S, LUO P, LOY C C, et al. Wider face: A face detection benchmark[C]// Computer Vision and Pattern Recognition. New York, USA: IEEE, 2016: 5525-5533.
[31]	ZHANG H Sh. Research on key algorithms of face detection and face recognition in video surveillance[D]. Chengdu: University of Electronic Science and Technology of China, 2019: 11-48(in Chinese).
[32]	WANG M, SU H S, LIU G H, et al. Classroom face detection algorithm based on convolutional neural network[J]. Laser & Optoelectronics Progress, 2019, 56(21): 211501(in Chinese).
