Monday, September 26, 2016

[Machine Learning | Incremental Learning] Learn++.NC: Combining Ensemble of Classifiers With Dynamically Weighted Consult-and-Vote for Efficient Incremental Learning of New Classes

Overview


This paper introduces an incremental learning algorithm called Learn++.NC (New Classes), an upgraded version of Learn++. Learn++ uses a two-layer structure to handle the incremental learning problem. The first layer consists of weak learners: each time a new batch of data arrives, Learn++ trains a new set of weak learners. The second layer is a linear voting layer with uniform weights.


But Learn++ has a weakness known as "outvoting": ensemble members trained before a new class was introduced have never seen that class, yet their (necessarily wrong) votes on its instances can outnumber the votes of the newer classifiers that have learned it.
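To make the two-layer structure concrete, here is a minimal Python sketch (my own illustration, not the paper's pseudocode; the real Learn++ also reweights training instances AdaBoost-style, which is omitted here). scikit-learn's DecisionTreeClassifier stands in for an arbitrary weak learner, and labels are assumed to be non-negative integers:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

class LearnPlusPlusSketch:
    # First layer: a growing pool of weak learners, one fresh set per batch.
    # Second layer: a uniformly weighted majority vote over the whole pool.
    def __init__(self, learners_per_batch=5):
        self.learners_per_batch = learners_per_batch
        self.ensemble = []

    def partial_fit(self, X, y):
        # Train new weak learners on the incoming batch only; earlier
        # learners are left untouched (this is the incremental part).
        for _ in range(self.learners_per_batch):
            idx = np.random.choice(len(X), size=len(X), replace=True)
            self.ensemble.append(
                DecisionTreeClassifier(max_depth=3).fit(X[idx], y[idx]))

    def predict(self, X):
        # Uniform-weight voting: every learner counts the same, which is
        # exactly what allows the "outvoting" problem described above.
        votes = np.stack([clf.predict(X) for clf in self.ensemble])
        return np.array([np.bincount(col.astype(int)).argmax()
                         for col in votes.T])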

[1] M. Muhlbaier, A. Topalis and R. Polikar, "Learn++.NC: Combining Ensemble of Classifiers With Dynamically Weighted Consult-and-Vote for Efficient Incremental Learning of New Classes", IEEE Trans. Neural Netw., vol. 20, no. 1, pp. 152-168, 2009.

Monday, May 2, 2016

[Project] Ray-Tracing Engine

Basic Ray-tracing


Basic Ray-tracing (Detail Image)

Basic Ray-tracing with Linear Regression AI

Saturday, February 6, 2016

Summary of Some Class-Incremental Learning Papers

Abstract


Incremental learning is a sub-area of machine learning research. Most existing work focuses on instance-incremental learning (growing training data), where the number of classes is fixed but the training data increases over time. Class-incremental learning means the number of classes may also increase over time, and the learning algorithm should classify all new and old classes without retraining the entire model. This setting has not been well studied in previous work.

Algorithms


In this summary, we introduce 12 papers published between 2001 and 2014. By analyzing their approaches, we can separate these class-incremental learning papers into three categories: single classifier [6, 7, 8], two-layer multi-classifier [1, 2, 3, 4, 5], and tree-structured multi-classifier [9, 10, 11, 12].

Single Classifier


Two-Layer Multi-Classifier


Learn++ [1] is a typical two-layer class-incremental learning algorithm. The first layer (a pool of weak learners) learns how to classify each instance; then, much like the AdaBoost algorithm, a voting strategy makes the final decision.

Unlike AdaBoost, the weak classifiers are not all trained at the same time on the same set of classes. Each time a new set of data comes in, a new set of weak classifiers is generated and trained on the current data set with the current number of classes, so the latest set of weak classifiers can classify all classes seen so far.

But as time goes on, the number of weak classifiers keeps growing, and ever more new classifiers are needed to outvote the wrong decisions generated by the earlier ones.

Learn++.NC [2] is a new version of the Learn++ algorithm that fixes this problem. It introduces a dynamically weighted consult-and-vote strategy that stops the earlier classifiers from making (and winning with) wrong decisions on instances of classes they have never learned.
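As a rough sketch of the consult-and-vote idea (my own heavily simplified reading; the paper's DW-CAV mechanism uses continuous per-instance dynamic weights, and known_classes is hypothetical bookkeeping I added):

import numpy as np

def consult_and_vote(preds, known_classes, n_classes):
    # preds: (n_classifiers, n_samples) array of predicted integer labels.
    # known_classes[k]: set of labels that classifier k was trained on.
    n_clf, n_samples = preds.shape
    knows = np.array([[c in known_classes[k] for c in range(n_classes)]
                      for k in range(n_clf)])        # (n_clf, n_classes)
    out = np.empty(n_samples, dtype=int)
    for i in range(n_samples):
        # "Consult": for each class c, how strongly do the classifiers
        # that actually know c believe this instance belongs to c?
        belief = np.zeros(n_classes)
        for c in range(n_classes):
            if knows[:, c].any():
                belief[c] = np.mean(preds[knows[:, c], i] == c)
        # Dynamic weighting: a classifier is discounted when a class it
        # has never seen is strongly believed by those who have seen it,
        # so old members cannot outvote new ones on a brand-new class.
        weights = np.array([np.prod(1.0 - belief[~knows[k]])
                            for k in range(n_clf)])
        scores = np.zeros(n_classes)
        for k in range(n_clf):
            scores[preds[k, i]] += weights[k]
        out[i] = int(scores.argmax())
    return out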

Papers [3, 4, 5] give further versions of the Learn++ algorithm to handle unbalanced data, class changes, and concept drift ($P\left ( C \right )$ changes, $P\left ( F \right )$ changes, and $P\left ( C \mid F \right )$ changes, where $C$ denotes the class space distribution and $F$ denotes the feature space distribution).
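For intuition (this one-line decomposition is mine, not quoted from [3, 4, 5]), the drift types correspond to the factors of the joint distribution:

$$P\left ( C, F \right ) = P\left ( C \mid F \right )P\left ( F \right ) = P\left ( F \mid C \right )P\left ( C \right )$$

so a nonstationary environment can change the feature distribution $P\left ( F \right )$, the class prior $P\left ( C \right )$, or the posterior $P\left ( C \mid F \right )$ independently, and each Learn++ variant targets a different combination of these.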

Tree-Structured Multi-Classifier


Unlike the two-layer learning algorithms, [9, 10, 11, 12] give a new family of algorithms with a tree structure. A tree can potentially give a hierarchical analysis of the classes (e.g., cat and dog are sub-classes of animal), and it can also confine the update for a new class to a small area (a sub-tree), which speeds up training.

Paper [9] uses several binary support vector machines (SVMs) to build a binary decision tree. Each binary SVM learns one specific class (a one-vs-all classifier); instances marked as "other" are sent down to the next level. When a new class comes, a new binary SVM is created and trained as the new root node:

["C" or other]
/               \
 ["A or B"]       "C"    
/        \              
"A"        "B"            

Unlike paper [9], papers [10, 11] give a more general binary tree structure. They use the random forest algorithm to train multiple decision trees. In each decision tree, an unsupervised split function (different trees use different parameters) is applied at the branch nodes to separate the instances, and a supervised multi-class classifier at the leaf nodes makes the final decision.

The NCMF (nearest class mean forest) algorithm [10] uses the NCM rule both to split instances at branch nodes and to classify them at leaf nodes. Paper [11] gives a more detailed analysis and applies the same structure to SVMs (SVMFs).
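The NCM building block itself is tiny; a minimal sketch (mine) shows why adding a class is cheap: it is just one more mean, echoing the near-zero-cost idea of [7]:

import numpy as np

class NearestClassMean:
    def __init__(self):
        self.means = {}                    # class label -> mean vector

    def add_class(self, label, X):
        # Incremental class addition: compute one mean, touch nothing else.
        self.means[label] = X.mean(axis=0)

    def predict(self, X):
        labels = list(self.means)
        centers = np.stack([self.means[c] for c in labels])
        # Euclidean distance from every sample to every class mean.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        return np.array([labels[j] for j in d.argmin(axis=1)])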

When new classes come, they have four different strategies to extend the tree: ULS (only update the leaf nodes), IGT (grow a new level with the old split function), RTST (a new split function and a new level), and RUST (update part of the split function and grow a new level).

Paper [12] also uses split functions at branch nodes and classifies at leaf nodes, but instead of a binary split, they compute a similarity matrix and use it to route each instance to one of multiple branches. Each node in this tree is a convolutional neural network (with a softmax output layer), and all the CNN hidden networks (excluding the output layer) share the same structure (the same number of layers and nodes per layer).

When a new class comes, they build two new candidate classifiers: the first simply extends the current leaf node, while the second grows a new level, turning the current leaf node (classification) into a branch node (split). After training both candidates, they let them compete and keep the better strategy.

To accelerate training, they copy the previous CNN weights (without the output layer) into the new CNN and create a new softmax output layer of the required size (the number of branches for a branch node, the number of classes for a leaf node) with random weights. The new CNN is therefore trained on top of previous knowledge rather than from zero.
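A sketch of that warm-start trick (assuming PyTorch, which is my choice, and an nn.Sequential network whose last layer is an nn.Linear head; the paper used its own CNN implementation):

import copy
import torch.nn as nn

def clone_with_new_head(old_net: nn.Sequential, n_outputs: int) -> nn.Sequential:
    # Copy every layer except the old output head, so the new network
    # starts from the previous weights instead of from scratch.
    hidden = [copy.deepcopy(m) for m in list(old_net.children())[:-1]]
    in_features = old_net[-1].in_features      # assumes an nn.Linear head
    head = nn.Linear(in_features, n_outputs)   # fresh, randomly initialized
    return nn.Sequential(*hidden, head)        # softmax is applied in the loss

For a branch node, n_outputs would be the number of branches, and for a leaf node the number of classes, matching the description above.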

Annotate


Modifying an existing multi-class algorithm to solve the new-class problem may not be the optimal solution. The reason we want to handle new classes is that training a classifier from scratch is time consuming; we want to accelerate this process and let the algorithm learn new knowledge on top of previous experience. The final goal is an algorithm that can keep learning while its accuracy stays above some threshold.

For a general (non-class-incremental) learning algorithm, the model capacity is fixed at initialization. So adapting the original algorithm to fit new classes without changing its structure or parameters will not work very well: classification accuracy will decrease as the number of classes increases, and once the number of classes or the complexity of the decision surface exceeds the algorithm's capacity, accuracy will drop dramatically (our conjecture). If instead we set the capacity very high at the beginning, overall training becomes slow (and structurally redundant for some algorithms), and the significant advantage over non-incremental learning algorithms is lost.


Reference


[1] R. Polikar, L. Upda, S. Upda and V. Honavar, "Learn++: an incremental learning algorithm for supervised neural networks", IEEE Trans. Syst., Man, Cybern. C, vol. 31, no. 4, pp. 497-508, 2001.

[2] M. Muhlbaier, A. Topalis and R. Polikar, "Learn++.NC: Combining Ensemble of Classifiers With Dynamically Weighted Consult-and-Vote for Efficient Incremental Learning of New Classes", IEEE Trans. Neural Netw., vol. 20, no. 1, pp. 152-168, 2009.

[3] G. Ditzler, M. Muhlbaier and R. Polikar, "Incremental Learning of New Classes in Unbalanced Datasets: Learn++.UDNC", Multiple Classifier Systems, pp. 33-42, 2010.

[4] G. Ditzler, G. Rosen and R. Polikar, "Incremental learning of new classes from unbalanced data", The 2013 International Joint Conference on Neural Networks (IJCNN), 2013.

[5] R. Elwell and R. Polikar, "Incremental Learning of Concept Drift in Nonstationary Environments", IEEE Trans. Neural Netw., vol. 22, no. 10, pp. 1517-1531, 2011.

[6] B. Zhang, J. Su and X. Xu, "A Class-Incremental Learning Method for Multi-Class Support Vector Machines in Text Classification", 2006 International Conference on Machine Learning and Cybernetics, 2006.

[7] T. Mensink, J. Verbeek, F. Perronnin and G. Csurka, "Distance-Based Image Classification: Generalizing to New Classes at Near-Zero Cost", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 11, pp. 2624-2637, 2013.

[8] I. Kuzborskij, F. Orabona and B. Caputo, "From N to N+1: Multiclass Transfer Incremental Learning", 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013.

[9] B. Zhang, J. Su and X. Xu, "A Class-Incremental Learning Method for Multi-Class Support Vector Machines in Text Classification", 2006 International Conference on Machine Learning and Cybernetics, 2006.

[10] M. Ristin, M. Guillaumin, J. Gall and L. Van Gool, "Incremental Learning of NCM Forests for Large-Scale Image Classification", 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014.

[11] M. Ristin, M. Guillaumin, J. Gall and L. Van Gool, "Incremental Learning of Random Forests for Large-Scale Image Classification", IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1-1, 2015.

[12] T. Xiao, J. Zhang, K. Yang, Y. Peng and Z. Zhang, "Error-Driven Incremental Learning in Deep Convolutional Neural Network for Large-Scale Image Classification", Proceedings of the ACM International Conference on Multimedia - MM '14, 2014.



Sunday, January 31, 2016

[Project] 3D Game Engine (Unfinished)

Overview


Language :


C/C++11, OpenGL, DirectX

Platform :


Windows / Linux

System Layer :


Math Lib, Container Lib, Memory Pool
Resource Manager, Message Manager, System Console, Log Manager
Uniform File IO Interface, Uniform Device Access Interface


User Layer :


Uniform Render Interface (OpenGL, DirectX)
Shader Manager
GUI Manager
Object Manager (2D & 3D Object)
Physics (Collision Detection)


System Running Snapshot





Tuesday, January 12, 2016

[Machine Learning | Computer Vision | DeepID] Deep Learning Face Representation from Predicting 10,000 Classes

Overview


The main task of this paper is human face verification, a sub-problem of verification. Unlike classification, a verification problem is given two instances and must decide whether the two instances come from the same class or not.

At the end of the paper, the authors report that the algorithm reaches 97.45% verification accuracy on the LFW face data set (human accuracy is 97.53%).

Unlike other verification algorithms, the whole process is separated into two parts: high-level feature generation and face verification. The paper focuses on how to use a deep learning model (ConvNet, DeepID) to generate the high-level features rather than on the verification algorithm itself.


DeepID (High-Level Feature Generation)


DeepID is basically a set of high-level feature vectors. Each high-level feature vector is generated by a deep model (a ConvNet; the vector is taken from its last hidden layer). The structure is shown in this figure:



To lead the deep models to generate more effective features, each deep model is trained as a multi-class classifier that identifies each instance (which class/face) rather than as a binary classifier that verifies two instances (same class/face or not). The reason is that the authors want to make full use of the network's learning capacity ("adds a strong regularization to ConvNets", "shared hidden representations that can classify all the identities well") to obtain high-level features with good generalization ability and to avoid over-fitting to a small subset.

Each deep model is trained with different parameters (network structure) and on a different part of the original data. In this paper there are 60 ConvNets, and each face image is turned into 60 different face patches (ten regions x three scales x two channel types (RGB or gray) = 60).


Deep ConvNet



Owing to the ConvNet's layered nature, each hidden layer is a new set of features. In this paper, the number of neurons decreases layer by layer, forcing the ConvNet to summarize the information and form more global, higher-level features in the top layers. The paper uses four convolutional layers (with max-pooling) to extract the high-level features.


Input Layer: "39 x 31 x k for rectangle patches, and 31 x 31 x k for square patches", with k = 3 for color patches and k = 1 for gray patches


DeepID Layer: "The dimension of the DeepID layer is fixed to 160", and the "DeepID layer is fully connected to both the third and fourth convolutional layers (after max-pooling)"


Output Layer: "The dimension of the output layer varies according to the number of classes it predicts". An n-way softmax predicts the probability distribution over the n classes:


$$y_{j} = \frac{\exp\left ( y_{j}' \right )}{\sum_{k=1}^{n}\exp\left ( y_{k}' \right )}$$

$$y_{j}' = \sum_{i=1}^{160}x_{i}\cdot w_{i,j} + b_{j}$$

"The ConvNet is leaned by minimizing: $\log{y_{t}}$ with the $t$-th target class"


Hidden Neurons: the ReLU function ($y = \max\left ( 0, x \right )$) is used throughout this ConvNet
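Putting the pieces above together, here is a rough PyTorch sketch of one such ConvNet (my reconstruction; the feature-map counts 20/40/60/80 follow my reading of the paper, and other details are simplified):

import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepIDSketch(nn.Module):
    def __init__(self, in_channels=3, n_identities=10000):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, 20, 4)
        self.conv2 = nn.Conv2d(20, 40, 3)
        self.conv3 = nn.Conv2d(40, 60, 3)
        self.conv4 = nn.Conv2d(60, 80, 2)   # no pooling after conv4
        self.pool = nn.MaxPool2d(2)
        self.deepid = nn.LazyLinear(160)    # 160-d DeepID layer
        self.head = nn.Linear(160, n_identities)  # n-way softmax (in loss)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        h3 = self.pool(F.relu(self.conv3(x)))
        h4 = F.relu(self.conv4(h3))
        # The DeepID layer sees BOTH the pooled third conv layer and the
        # fourth conv layer, as quoted above.
        flat = torch.cat([h3.flatten(1), h4.flatten(1)], dim=1)
        feat = F.relu(self.deepid(flat))    # the DeepID feature vector
        return feat, self.head(feat)        # logits; softmax is in the loss

During training the logits feed a cross-entropy loss (the $-\log{y_{t}}$ above); at test time only the 160-d feat vectors from the 60 ConvNets are kept and concatenated for verification.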



Face Regions (Input Data)



In total there are 60 face patches, formed from ten regions and three scales, in RGB or gray; the global regions are weakly aligned using the eye centers and the mid-point of the two mouth corners.

Top: ten regions at the medium scale; the left ones are global regions, the right ones are local regions centered around the five facial landmarks (eye centers, nose tip, and mouth corners)


Bottom: the three scales (two patches shown)



Face Verification


Two different techniques (algorithms) are applied to the face verification task. The first is Joint Bayesian; in the paper's words, "Joint Bayesian has been highly successful for face verification". The second is a neural network, used as a comparison group "to see if other models can also learn from the extracted features and how much the features and a good face verification model contribute to the performance, respectively".
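For illustration only (this is neither Joint Bayesian nor the paper's neural network, just a naive baseline I wrote), the concatenated DeepID vectors of two faces could be compared directly with a cosine-similarity threshold:

import numpy as np

def verify(patch_feats_a, patch_feats_b, threshold=0.5):
    # Concatenate the per-patch 160-d DeepID vectors of each face, then
    # call the pair "same person" if the features are similar enough.
    a = np.concatenate(patch_feats_a)
    b = np.concatenate(patch_feats_b)
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return cos > threshold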

Joint Bayesian



Neural Network


Reference