Application of Facial Recognition using Convolutional Neural Networks for Entry Access Control

Paper title

Application of Facial Recognition using Convolutional Neural Networks for Entry Access Control

Authors

Ankile, Lars Lien; Heggland, Morgan Feet; Krange, Kjartan

Abstract

The purpose of this paper is to design a solution to the problem of facial recognition using convolutional neural networks, with the intention of applying the solution in a camera-based home-entry access control system. More specifically, the paper focuses on the supervised classification problem of taking images of people as input and classifying the person in the image as one of the authors or not. Two approaches are proposed: (1) building and training a neural network called WoodNet from scratch, and (2) leveraging transfer learning by taking a network pre-trained on the ImageNet database and adapting it to this project's data and classes. To train the models to recognize the authors, a dataset containing more than 150,000 images was created, balanced over the authors and others. Image extraction from videos and image augmentation techniques were instrumental in creating the dataset. The results are two models that classify the individuals in the dataset with high accuracy, achieving over 99% accuracy on held-out test data. The pre-trained model fit significantly faster than WoodNet and seems to generalize better. However, these results come with a few caveats. Because of the way the dataset was compiled, as well as the high accuracy, there is reason to believe the models overfitted to the data to some degree. A further consequence of the data compilation method is that the test dataset may not be sufficiently different from the training data, limiting its ability to validate the models' generalization. Nevertheless, using the models in a webcam-based system that classifies faces in real time shows promising results and indicates that the models generalized fairly well for at least some of the classes (see the accompanying video).
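
The caveat about the test set can be illustrated with a small sketch. When images are extracted from videos, frames from the same video are often nearly identical, so a frame-level random split can place near-duplicates in both training and test data. One common mitigation is to split by source video instead. The code below is a minimal sketch under stated assumptions and is not the authors' procedure: the abstract does not describe how their split was made, and the record layout `(video_id, label, frame_path)`, the function name `split_by_video`, and the class names in the usage note are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_video(frames, test_fraction=0.2, seed=0):
    """Split (video_id, label, frame_path) records so no video spans both sets.

    Hypothetical helper: assumes every frame record names its source video.
    """
    # Group source videos by class label so each class is split separately.
    videos_by_label = defaultdict(set)
    for video_id, label, _ in frames:
        videos_by_label[label].add(video_id)

    rng = random.Random(seed)
    test_videos = set()
    for label in sorted(videos_by_label):
        videos = sorted(videos_by_label[label])
        rng.shuffle(videos)
        # Hold out at least one video per class, but never all of them;
        # a class with a single video therefore stays entirely in training.
        n_test = max(1, round(len(videos) * test_fraction))
        n_test = min(n_test, len(videos) - 1)
        test_videos.update(videos[:n_test])

    # Assign every frame to the side its source video landed on.
    train = [f for f in frames if f[0] not in test_videos]
    test = [f for f in frames if f[0] in test_videos]
    return train, test
```

As a usage example, with four hypothetical classes (`author_a`, `author_b`, `author_c`, `other`), five videos per class, and ten frames per video, the default settings hold out one whole video per class. Test accuracy measured this way reflects performance on unseen recordings rather than on frames adjacent to training frames.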
