Kai Wang's FIT Research Blog | 论文集注: [Machine Learning | Computer Vision | DeepID] Deep Learning Face Representation from Predicting 10,000 Classes

Overview

The main task of this paper is human face verification, a sub-problem of verification. Unlike classification, which assigns a single instance to a class, verification takes two instances and decides whether they belong to the same class.

At the end of the paper, the authors report 97.45% verification accuracy on the LFW face data set, just below the human accuracy of 97.53%.

Unlike other verification algorithms, the whole process is separated into two parts: high-level feature generation and face verification. The paper focuses on how to use deep learning (ConvNets, DeepID) to generate high-level features rather than on the verification algorithm itself.
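To make the verification setting concrete, here is a minimal sketch of the second stage only: it takes two feature vectors that are assumed to have been extracted already and answers "same identity or not". Cosine similarity with a fixed threshold is a simple stand-in chosen for illustration; it is not the verifier used in the paper, and the function names and the threshold value are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def verify(features_a, features_b, threshold=0.5):
    """Return True if the two feature vectors are judged to share an identity.

    The threshold is a hypothetical value; in practice it would be tuned
    on held-out pairs.
    """
    return cosine_similarity(features_a, features_b) >= threshold

print(verify([1.0, 0.0], [2.0, 0.0]))  # same direction
print(verify([1.0, 0.0], [0.0, 1.0]))  # orthogonal directions
```

A classifier would instead map one feature vector to one of n identities; the verifier never needs to know which identity it is looking at.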

DeepID (High-Level Feature Generation)

DeepID is essentially a set of high-level feature vectors. Each vector is generated by a deep model (a ConvNet) and is taken from the network's last hidden layer. The structure is shown in the accompanying figure.

To steer the deep models toward more effective features, each one is trained as a multi-class classifier that identifies every instance (which class/face it belongs to) rather than as a binary classifier that verifies two instances (same class/face or not). The authors want to make full use of the networks' learning capacity: predicting many identities at once "adds a strong regularization to ConvNets" and helps form "shared hidden representations that can classify all the identities well", so the high-level features generalize well and do not over-fit to a small subset.

The deep models are trained with different parameters (network structures) and on different parts of the original data. The paper uses 60 ConvNets, and each face image is cut into 60 different face patches (ten regions, three scales, and RGB or gray channels).
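The count of 60 follows from combining the three factors: 10 regions x 3 scales x 2 channel types. The sketch below just enumerates those combinations, one per ConvNet. The region, scale, and channel labels are hypothetical placeholders; the post only gives the counts.

```python
from itertools import product

# Hypothetical labels: only the counts (10, 3, 2) come from the text.
REGIONS = [f"region_{i}" for i in range(10)]
SCALES = ["small", "medium", "large"]
CHANNELS = ["rgb", "gray"]

def enumerate_patches():
    """Return one (region, scale, channels) tuple per face patch."""
    return list(product(REGIONS, SCALES, CHANNELS))

patches = enumerate_patches()
print(len(patches))  # 10 * 3 * 2 = 60
```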

Deep ConvNet

Because of the way a ConvNet is built, each hidden layer produces a new set of features. In this paper, the number of neurons decreases layer by layer, which forces the ConvNet to summarize information and yields more global, high-level features at the top layers. The paper uses four convolutional layers (with max-pooling) to extract the high-level features.
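The narrowing can be illustrated by tracing feature-map shapes through a stack of unpadded, stride-1 convolutions and 2 x 2 max-pooling layers. This is only a sketch: the kernel sizes and map counts in `LAYERS` are assumptions chosen for illustration (this post does not list them), and the 39 x 31 gray input is one of the patch sizes the paper quotes.

```python
def conv_valid(shape, kernel, maps):
    """Output shape of a stride-1 convolution without padding."""
    h, w, _ = shape
    return (h - kernel + 1, w - kernel + 1, maps)

def max_pool(shape, size=2):
    """Output shape of non-overlapping max-pooling."""
    h, w, c = shape
    return (h // size, w // size, c)

def trace_shapes(input_shape, layers):
    """Return the shape after every layer, starting with the input shape."""
    shapes = [input_shape]
    for kind, *args in layers:
        fn = conv_valid if kind == "conv" else max_pool
        shapes.append(fn(shapes[-1], *args))
    return shapes

# Assumed settings: ("conv", kernel size, number of maps) or ("pool", size).
LAYERS = [
    ("conv", 4, 20), ("pool", 2),
    ("conv", 3, 40), ("pool", 2),
    ("conv", 3, 60), ("pool", 2),
    ("conv", 2, 80),
]

for h, w, c in trace_shapes((39, 31, 1), LAYERS):
    print((h, w, c), "->", h * w * c, "units")
```

Under these assumed settings the pooled outputs shrink from 5040 to 1920 to 360 units, and the fourth convolutional layer is left with 160.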

Input Layer: "39 x 31 x k for rectangle patches, and 31 x 31 x k for square patches and k = 3 for color patches and k = 1 for gray patches"

DeepID Layer: "The dimension of DeepID layer is fixed to 160" and "DeepID layer is fully connected to both the third and fourth convolutional layers (after max-pooling)"
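Being "fully connected to both the third and fourth convolutional layers" means every DeepID unit sums weighted inputs from two flattened feature vectors before the nonlinearity. The sketch below shows that wiring in plain Python. The input sizes `N3` and `N4`, the random weights, and all function names are assumptions for illustration; only the output size of 160 comes from the text.

```python
import random

def relu(x):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)

def deepid_layer(pool3, conv4, w3, w4, bias):
    """Fully connected ReLU layer fed by two flattened feature vectors.

    w3[i][j] links pool3[i] to output j; w4[i][j] links conv4[i] to output j.
    """
    features = []
    for j, b in enumerate(bias):
        total = b
        total += sum(x * w3[i][j] for i, x in enumerate(pool3))
        total += sum(x * w4[i][j] for i, x in enumerate(conv4))
        features.append(relu(total))
    return features

# N3 and N4 are assumed input sizes; DIM = 160 is the DeepID size quoted above.
N3, N4, DIM = 360, 160, 160
rng = random.Random(0)  # fixed seed so the sketch is repeatable

def rand_matrix(rows, cols):
    """Small random weights, standing in for trained parameters."""
    return [[rng.uniform(-0.05, 0.05) for _ in range(cols)] for _ in range(rows)]

pool3 = [rng.random() for _ in range(N3)]
conv4 = [rng.random() for _ in range(N4)]
deepid = deepid_layer(pool3, conv4, rand_matrix(N3, DIM), rand_matrix(N4, DIM), [0.0] * DIM)
print(len(deepid))  # 160
```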

Output Layer: "The dimension of output layer varies according to the number of classes it predicts". The authors use an n-way softmax to predict the probability distribution over n classes.

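$$y_{i} = \frac{\exp\left ( y_{i}^{'} \right )}{\sum_{j=1}^{n}\exp\left ( y_{j}^{'} \right )}$$

$$y_{j}^{'} = \sum_{i=1}^{160}x_{i}\cdot w_{i,j} + b_{j}$$

Here $y_{j}^{'}$ linearly combines the 160 DeepID features $x_{i}$ as the input of output neuron $j$, and $y_{i}$ is the predicted probability of class $i$.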

"The ConvNet is learned by minimizing: $-\log{y_{t}}$ with the $t$-th target class"
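The output layer and its loss can be sketched in a few lines of plain Python: a linear map from the features to n logits, a softmax over the logits, and the negative log-probability of the target class. The function names and the toy numbers are illustrative assumptions; subtracting the maximum logit before exponentiating is a standard numerical-stability step and does not change the result.

```python
import math

def logits(x, w, b):
    """Linear combination of features: one value per class j."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]

def softmax(z):
    """n-way softmax over the logits z."""
    m = max(z)  # shift for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def nll(probs, target):
    """Negative log-probability of the target class."""
    return -math.log(probs[target])

# Toy example: 2 features and 3 classes instead of 160 features and n identities.
x = [1.0, 2.0]
w = [[0.5, -0.5, 0.0],
     [0.25, 0.25, 0.0]]
b = [0.0, 0.0, 0.0]

probs = softmax(logits(x, w, b))
print(probs, nll(probs, 0))
```

Minimizing this loss pushes the probability of the correct identity toward 1, which is what trains the features in the layers below.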

Hidden Neurons: the ReLU function ($y = \max\left ( 0, x \right )$) is used for the hidden neurons of this ConvNet.

Face Regions (Input Data)

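There are 60 face patches in total, covering ten regions, three scales, and RGB or gray channels. Faces are aligned using the two eye centers and the mid-point of the two mouth corners.

Top: ten face regions at medium scale. The regions on the left are global regions; those on the right are local regions centered on the five facial landmarks (two eye centers, nose tip, and two mouth corners).

Bottom: three scales, shown for two patches.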

Face Verification

Two different techniques (algorithms) are applied to the face verification task. The first is Joint Bayesian; according to the paper, "Joint Bayesian has been highly successful for face verification". The second is a neural network, used as a comparison "to see if other models can also learn from the extracted features and how much the features and a good face verification model contribute to the performance, respectively".

Joint Bayesian

Neural Network

Reference

[1] Y. Sun, X. Wang and X. Tang, "Deep Learning Face Representation from Predicting 10,000 Classes", 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1891 - 1898, 2014.


Tuesday, January 12, 2016
