Metric Learning

Definition of a Metric

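In mathematics, a metric (or distance function) is a function that defines the distance between elements of a set. For all elements x, y, z it satisfies non-negativity (d(x, y) ≥ 0), identity of indiscernibles (d(x, y) = 0 if and only if x = y), symmetry (d(x, y) = d(y, x)), and the triangle inequality (d(x, z) ≤ d(x, y) + d(y, z)). A set equipped with a metric is called a metric space.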
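To make the four metric axioms (non-negativity, identity of indiscernibles, symmetry, triangle inequality) concrete, here is a minimal sketch, assuming NumPy, that checks them on a handful of sample points. A finite check like this can only refute the axioms, never prove them; the helper names `euclidean` and `satisfies_metric_axioms` are illustrative, not from any library.

```python
import itertools
import numpy as np

def euclidean(a, b):
    """Euclidean distance between two vectors."""
    return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))

def satisfies_metric_axioms(dist, points, tol=1e-12):
    """Check the metric axioms on all pairs/triples drawn from `points`."""
    for x, y in itertools.product(points, repeat=2):
        d = dist(x, y)
        if d < -tol:                                   # non-negativity
            return False
        if abs(d - dist(y, x)) > tol:                  # symmetry
            return False
        if np.allclose(x, y) != (d <= tol):            # identity of indiscernibles
            return False
    for x, y, z in itertools.product(points, repeat=3):
        if dist(x, z) > dist(x, y) + dist(y, z) + tol:  # triangle inequality
            return False
    return True

pts = [np.array(p, float) for p in [(0, 0), (3, 4), (1, 1), (-2, 5)]]
print(satisfies_metric_axioms(euclidean, pts))                          # True
print(satisfies_metric_axioms(lambda a, b: euclidean(a, b) ** 2, pts))  # False
```

The squared Euclidean distance fails because it breaks the triangle inequality (for example 0 to (1, 1) to (3, 4) gives 2 + 13 < 25), so it is not itself a metric.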

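Metric learning is often also called similarity learning. Distance metric learning aims to measure how close samples are to one another, which is one of the core problems in pattern recognition. The performance of many machine learning methods, including classifiers such as K-nearest neighbors, support vector machines, and radial basis function networks, the K-means clustering method, and a number of graph-based methods, depends mainly on the choice of similarity measure between samples. If we need to compute the similarity between two images, the goal of metric learning is to measure it so that images from different classes have low similarity and images from the same class have high similarity (maximize the inter-class variations and minimize the intra-class variations).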
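For example, if our goal is to recognize faces, we need to construct a distance function that emphasizes suitable features (such as hair color and face shape); if our goal is to recognize poses, we need a distance function that captures pose similarity. To handle such diverse notions of similarity, we could select suitable features and hand-construct a distance function for each specific task. However, this requires a great deal of manual effort and may be very brittle to changes in the data. Metric learning is an attractive alternative: it automatically learns a distance function tailored to a specific task.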

Origin

Proposed by Eric Xing et al. at NIPS 2002 [1].

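Advantages

The usual goal of metric learning is to make the distance between samples of the same class as small as possible and the distance between samples of different classes as large as possible.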
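A rough way to see this goal numerically is to compare the mean distance over same-class pairs with the mean over different-class pairs. The sketch below assumes NumPy; the toy data and the hand-picked matrix `W` are made up for illustration (metric learning would learn such a matrix from data). Plain Euclidean distance gets the ordering wrong, the reweighted distance gets it right.

```python
import numpy as np

def intra_inter_means(X, y, dist):
    """Mean distance over same-class pairs and over different-class pairs."""
    intra, inter = [], []
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            (intra if y[i] == y[j] else inter).append(dist(X[i], X[j]))
    return float(np.mean(intra)), float(np.mean(inter))

# Toy data: feature 0 is large-scale noise, feature 1 carries the class.
X = np.array([[0.0, 0.0], [10.0, 0.2], [0.0, 1.0], [10.0, 1.2]])
y = np.array([0, 0, 1, 1])
W = np.diag([0.01, 1.0])  # hypothetical "learned" linear map that shrinks feature 0

def euclid(a, b):
    return np.linalg.norm(a - b)

def learned(a, b):
    return np.linalg.norm(W @ a - W @ b)

print(intra_inter_means(X, y, euclid))   # intra ~10.00 > inter ~5.53
print(intra_inter_means(X, y, learned))  # intra ~0.22 < inter ~1.00
```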

Disadvantages

TODO

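Application areas

Face recognition, object recognition, music similarity, human pose estimation, information retrieval, speech recognition, handwriting recognition, and other fields.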

1 Why use metric learning?

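Many algorithms increasingly depend on being given a good metric over the input space. K-means, K-nearest neighbors, SVMs, and similar algorithms all need a good metric that reflects the important relationships in the data, and the problem is especially pronounced in unsupervised methods such as clustering. As a concrete example, consider the problem in Figure 1: suppose we need to compute the similarity (or distance; likewise below) between these images, for example for clustering or nearest-neighbor classification. The basic question is how to define similarity between images so that it suits the task: recognizing faces calls for a distance function that emphasizes the relevant features (such as hair color and face shape), while recognizing poses calls for one that captures pose similarity. Hand-selecting features and constructing a distance function for each task takes considerable manual effort and can be brittle when the data change; metric learning instead learns a task-specific distance function automatically.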

Figure 1
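To illustrate why the choice of metric matters for nearest-neighbor methods, the following sketch (NumPy assumed; data and matrices made up) classifies one query point with 1-NN under two Mahalanobis matrices `M`: the identity, which gives plain squared Euclidean distance, and a diagonal matrix that down-weights the first feature. The prediction flips although the data are unchanged.

```python
import numpy as np

def nearest_neighbor_label(X_train, y_train, q, M):
    """1-NN prediction under the squared Mahalanobis distance (x - q)^T M (x - q)."""
    diffs = X_train - q
    d2 = np.einsum("ij,jk,ik->i", diffs, M, diffs)  # one quadratic form per training row
    return y_train[int(np.argmin(d2))]

X_train = np.array([[0.0, 0.0], [10.0, 1.0]])
y_train = np.array(["A", "B"])
q = np.array([9.0, 0.1])

print(nearest_neighbor_label(X_train, y_train, q, np.eye(2)))             # B
print(nearest_neighbor_label(X_train, y_train, q, np.diag([1e-4, 1.0])))  # A
```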

2 Metric learning methods

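According to the literature [2, 3, 4], metric learning methods can be divided into metric learning via linear transformations and nonlinear models of metric learning.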

2.1 Metric learning via linear transformations

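Because linear metric learning is simple and scalable (and can be extended to nonlinear metrics via kernel methods), current research focuses on the linear metric learning problem. Linear metric learning is also called Mahalanobis metric learning, and its algorithms can be divided into supervised and unsupervised ones.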

Supervised Mahalanobis metric learning falls into two basic types:

I Supervised global metric learning: algorithms of this type make full use of the label information in the data. Examples:

Information-Theoretic Metric Learning (ITML)
Mahalanobis Metric Learning for Clustering (the metric learning method of [1], sometimes called MMC)
Maximally Collapsing Metric Learning (MCML)
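For the MMC method of [1], the problem is usually stated as: minimize the sum of squared distances $\|x_i - x_j\|_M^2$ over pairs known to be similar, subject to the sum of unsquared distances $\|x_i - x_j\|_M$ over dissimilar pairs being at least 1, with $M$ positive semidefinite. The sketch below (NumPy assumed; helper names and pairs are hypothetical) does not solve this problem. It only scores a fixed candidate $M$ after rescaling it so the constraint holds with equality, which is enough to compare two candidates.

```python
import numpy as np

def sq_dist(a, b, M):
    d = a - b
    return float(d @ M @ d)

def mmc_score(M, similar, dissimilar):
    """MMC objective for a fixed PSD matrix M, after rescaling M so the constraint
    sum ||xi - xj||_M >= 1 over dissimilar pairs is tight. Scaling M by c scales the
    objective by c and the constraint by sqrt(c), so the right scale is c = 1 / s**2."""
    objective = sum(sq_dist(a, b, M) for a, b in similar)
    s = sum(np.sqrt(sq_dist(a, b, M)) for a, b in dissimilar)
    return objective / s ** 2

similar = [(np.array([0.0, 0.0]), np.array([5.0, 0.1]))]     # same class, differ on feature 0
dissimilar = [(np.array([0.0, 0.0]), np.array([0.5, 3.0]))]  # different class, differ on feature 1

print(mmc_score(np.eye(2), similar, dissimilar))             # ~2.70
print(mmc_score(np.diag([0.01, 1.0]), similar, dissimilar))  # ~0.029 (lower is better)
```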

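II Supervised local metric learning: algorithms of this type take into account both the label information and the geometric relationships between data points. Examples: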

Neighbourhood Components Analysis (NCA)
Large-Margin Nearest Neighbors (LMNN)
Relevant Component Analysis (RCA)
Local Linear Discriminative Analysis (Local LDA)
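As one concrete example from this list, NCA scores a linear map $A$ by the expected number of points that a stochastic nearest-neighbor rule classifies correctly: point $i$ picks a neighbor $j \ne i$ with probability $p_{ij} \propto \exp(-\|Ax_i - Ax_j\|^2)$, and the score sums $p_{ij}$ over same-class pairs. The NumPy sketch below only evaluates this objective (training would maximize it, for example by gradient ascent); the toy data and candidate maps are made up.

```python
import numpy as np

def nca_objective(A, X, y):
    """Expected number of correctly classified points under NCA's stochastic
    nearest-neighbour rule, for the linear map x -> A x."""
    Z = X @ A.T
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)                 # p_ii = 0: a point never picks itself
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)  # stabilise the softmax
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)            # row i holds p_ij
    same_class = y[:, None] == y[None, :]
    return float((P * same_class).sum())

X = np.array([[0.0, 0.0], [10.0, 0.2], [0.0, 1.0], [10.0, 1.2]])
y = np.array([0, 0, 1, 1])
print(nca_objective(np.eye(2), X, y))             # close to 0 (maximum possible is 4)
print(nca_objective(np.diag([0.01, 1.0]), X, y))  # ~2.26
```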

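In addition, some classic unsupervised linear dimensionality reduction algorithms can be regarded as unsupervised Mahalanobis metric learning. Examples: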

Principal Component Analysis (PCA)
Multidimensional Scaling (MDS)
Non-negative Matrix Factorization (NMF)
Independent Component Analysis (ICA)
Neighborhood Preserving Embedding (NPE)
Locality Preserving Projections (LPP)
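To see how PCA fits the Mahalanobis view, the sketch below (NumPy assumed; random toy data) takes the top principal direction as the row of a projection $W$ and uses $M = W^T W$ as an unsupervised Mahalanobis matrix. Because PCA keeps only high-variance directions, a displacement along a low-variance axis ends up at almost zero distance.

```python
import numpy as np

def pca_projection(X, p):
    """Return W (p x d) whose rows are the top-p principal directions of X."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt: principal directions
    return Vt[:p]

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) * np.array([5.0, 1.0, 0.1])  # variance mostly along axis 0
W = pca_projection(X, p=1)
M = W.T @ W                                                # rank-1 PSD matrix

def d_M(u, v):
    diff = u - v
    return float(np.sqrt(diff @ M @ diff))

origin = np.zeros(3)
print(d_M(origin, np.array([0.0, 0.0, 1.0])))  # small: axis 2 is almost discarded
print(d_M(origin, np.array([1.0, 0.0, 0.0])))  # close to 1: axis 0 is kept
```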

2.2 Nonlinear models of metric learning

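Nonlinear metric learning is more general, and nonlinear dimensionality reduction algorithms can be regarded as nonlinear metric learning. Classic algorithms include Isometric Mapping (ISOMAP), Locally Linear Embedding (LLE), and Laplacian Eigenmaps (LE). Another effective way to learn a nonlinear mapping is to extend a linear mapping with kernel methods. Other directions include: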

Non-Mahalanobis Local Distance Functions
Mahalanobis Local Distance Functions
Metric Learning with Neural Networks
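The kernel route mentioned above rests on one identity: for a kernel $k(x, y) = \langle\phi(x), \phi(y)\rangle$, the squared distance in feature space is $\|\phi(x) - \phi(y)\|^2 = k(x, x) + k(y, y) - 2k(x, y)$, so it can be computed without ever forming $\phi$. The sketch below (NumPy assumed) checks this for the degree-2 polynomial kernel on 2-D inputs, whose explicit feature map is known.

```python
import numpy as np

def kernel_distance(x, y, k):
    """Distance between phi(x) and phi(y), computed only through the kernel k."""
    return float(np.sqrt(max(k(x, x) + k(y, y) - 2.0 * k(x, y), 0.0)))

def poly2(x, y):
    """Homogeneous degree-2 polynomial kernel (x . y)^2."""
    return float(np.dot(x, y)) ** 2

def phi(x):
    """Explicit feature map of poly2 for 2-D inputs."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(round(kernel_distance(x, y, poly2), 4))          # 11.0905 (sqrt(123))
print(round(float(np.linalg.norm(phi(x) - phi(y))), 4))  # 11.0905, same value
```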

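Conventional Mahalanobis distance metric learning

Conventional Mahalanobis distance metric learning seeks, from the training set $X$, a matrix $M \in \mathbb{R}^{d \times d}$ and computes the Mahalanobis distance between two samples $x_i$ and $x_j$ as:

$$d_M(x_i, x_j) = \sqrt{(x_i - x_j)^T M (x_i - x_j)}$$

Since $M$ is symmetric positive semidefinite, it can be factored as:

$$M = W^T W$$

where $W \in \mathbb{R}^{p \times d}$ and $p < d$ (choosing $p < d$ restricts $M$ to rank at most $p$). Then:

$$d_M(x_i, x_j) = \sqrt{(x_i - x_j)^T W^T W (x_i - x_j)} = \lVert W x_i - W x_j \rVert_2$$

From these formulas, conventional Mahalanobis metric learning amounts to finding a linear transformation that projects each sample $x_i$ into a lower-dimensional subspace (because $p < d$); the Euclidean distance between projected samples equals the Mahalanobis distance in the original space.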
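The identity $d_M(x_i, x_j) = \lVert W x_i - W x_j \rVert_2$ is easy to check numerically. In the sketch below (NumPy assumed), `W` is a random stand-in for a learned projection, not the output of any particular method; it builds $M = W^T W$ and compares the Mahalanobis distance with the Euclidean distance after projection.

```python
import numpy as np

rng = np.random.default_rng(42)
d, p = 5, 2
W = rng.normal(size=(p, d))        # hypothetical learned projection, p < d
M = W.T @ W                        # symmetric PSD d x d matrix of rank p
x_i, x_j = rng.normal(size=d), rng.normal(size=d)

diff = x_i - x_j
d_M = np.sqrt(diff @ M @ diff)                 # Mahalanobis distance with M
d_proj = np.linalg.norm(W @ x_i - W @ x_j)     # Euclidean distance after projection
print(np.isclose(d_M, d_proj))                 # True
print(np.linalg.matrix_rank(M))                # 2
```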

Discriminative Deep Metric Learning (DDML)

The linear transformation used by conventional methods cannot capture the nonlinear manifold on which face images lie.

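Linear manifold
Lines and planes in geometric space have the property that the line through any two points of the set lies entirely in the set; in this sense lines and planes are "flat and straight". Generalizing this notion of flatness and straightness to higher dimensions gives the concept of a linear manifold.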

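To overcome this limitation of conventional methods, the paper proposes projecting the samples into a high-dimensional feature space and measuring distances in that space.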
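As a rough illustration of replacing the linear map $W$ with a nonlinear one, the sketch below (NumPy assumed) embeds samples with a small two-layer tanh network and scores a pair with a smoothed hinge on $1 - \ell(\tau - d^2)$, where $\ell = +1$ for a same-class pair and $-1$ otherwise. The architecture, the loss form, and the values of $\tau$ and $\beta$ are assumptions for illustration, not the exact setup of the DDML paper, and no training loop is shown.

```python
import numpy as np

def embed(x, params):
    """Hypothetical two-layer tanh network f(x), a nonlinear stand-in for W x."""
    W1, b1, W2, b2 = params
    return np.tanh(W2 @ np.tanh(W1 @ x + b1) + b2)

def pair_loss(x_i, x_j, label, params, tau=2.0, beta=10.0):
    """Smoothed hinge on 1 - label * (tau - d^2), with d^2 = ||f(x_i) - f(x_j)||^2.
    label=+1 (same class) is mostly penalised unless d^2 < tau - 1;
    label=-1 (different class) is mostly penalised unless d^2 > tau + 1."""
    diff = embed(x_i, params) - embed(x_j, params)
    z = 1.0 - label * (tau - float(diff @ diff))
    return float(np.logaddexp(0.0, beta * z) / beta)  # (1/beta) * log(1 + exp(beta * z))

rng = np.random.default_rng(0)
params = (rng.normal(scale=0.5, size=(8, 4)), np.zeros(8),
          rng.normal(scale=0.5, size=(2, 8)), np.zeros(2))
x = rng.normal(size=4)
print(pair_loss(x, x, +1, params))  # near 0: identical same-class pair, d^2 = 0
print(pair_loss(x, x, -1, params))  # about 3: identical pair labelled as different
```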

3 Applications

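Metric learning has been applied in computer vision to image retrieval and classification, face recognition, human activity recognition, and pose estimation, as well as to text analysis and other areas such as music analysis, automated project debugging, and microarray data analysis [4].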

Algorithms

"An Overview of Distance Metric Learning" and "Distance Metric Learning: A Comprehensive Survey" can be found in Reference 2.

Supervised Distance Metric Learning

Methods	Locality	Linearity	Learning Strategies	Code Download
Probabilistic Global Distance Metric Learning (PGDM)	global	linear	constrained convex programming	by Eric P. Xing
Relevant Components Analysis (RCA)	global	linear	capture global structure; use equivalence constraints	by Aharon Bar-Hillel and Tomer Hertz
Discriminative Component Analysis (DCA)	global	linear	improve RCA by exploring negative constraints	by Steven C.H. Hoi
Local Fisher Discriminant Analysis (LFDA)	local	linear	extend LDA by assigning greater weights to closer connecting examples	[by Masashi Sugiyama]
Neighborhood Component Analysis (NCA)	local	linear	extend the nearest neighbor classifier toward metric learning	[by Charless C. Fowlkes]
Large Margin NN Classifier (LMNN)	local	linear	extend NCA through a maximum margin framework	[by Kilian Q. Weinberger]
Localized Distance Metric Learning (LDM)	local	linear	optimize local compactness and local separability in a probabilistic framework	[by Liu Yang]
DistBoost	global	linear	learn distance functions by training binary classifiers with margins in a boosting framework	by Tomer Hertz and Aharon Bar-Hillel; notes on calling its kernel version
Active Distance Metric Learning (BAYES+VAR)	global	linear	select example pairs with the greatest uncertainty, posterior estimation with a full Bayesian treatment	[by Liu Yang]
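Of the methods in this table, RCA has an especially compact core step: group points known to share a class into "chunklets", estimate the covariance $C$ of the points around their chunklet means, and use $M = C^{-1}$ as the Mahalanobis matrix, which down-weights directions of within-class variation. The NumPy sketch below shows only that step, as I understand the method (the full algorithm may add a dimensionality-reduction stage), on made-up chunklets.

```python
import numpy as np

def rca_metric(chunklets):
    """chunklets: list of (n_j x d) arrays, each a group of points known to share a class.
    Returns M = C^{-1}, with C the covariance of points around their chunklet means."""
    centred = np.vstack([c - c.mean(axis=0) for c in chunklets])
    C = centred.T @ centred / len(centred)
    return np.linalg.inv(C)

# Within each chunklet the points vary a lot on axis 0 and little on axis 1.
chunklets = [np.array([[0.0, 0.0], [4.0, 0.2]]),
             np.array([[1.0, 5.0], [-3.0, 5.2]])]
M = rca_metric(chunklets)
print(np.round(M, 6))  # approximately [[0.25, 0], [0, 100]]: axis 1 is weighted heavily
```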

Unsupervised Distance Metric Learning

Methods	Locality	Linearity	Learning Strategies	Code Download
Principal Component Analysis (PCA)	global structure preserved	linear	best preserve the variance of the data	[by Deng Cai]
Multidimensional Scaling (MDS)	global structure preserved	linear	best preserve inter-point distance in low-rank	[included in Matlab Toolbox for Dimensionality Reduction]
ISOMAP	global structure preserved	nonlinear	preserve the geodesic distances	[by J. B. Tenenbaum, V. de Silva and J. C. Langford]
Laplacian Eigenmap (LE)	local structure preserved	nonlinear	preserve local neighborhoods	[by Mikhail Belkin]
Locality Preserving Projections (LPP)	local structure preserved	linear	linear approximation to LE	[LPP by Deng Cai]; [Kernel LPP by Deng Cai]
Locally Linear Embedding (LLE)	local structure preserved	nonlinear	nonlinearly preserve local neighborhoods	[by Sam T. Roweis and Lawrence K. Saul]; Hessian LLE can be found at [MANIfold Learning Matlab Demo, by Todd Wittman]
Neighborhood Preserving Embedding (NPE)	local structure preserved	linear	linear approximation to LLE	[by Deng Cai]
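MDS in the table above has a short closed-form classical variant: double-centre the squared distance matrix to get a Gram matrix, then keep its top eigenvectors scaled by the square roots of their eigenvalues. The NumPy sketch below (toy points assumed) recovers a 2-D configuration whose pairwise distances match the input distances, up to rotation and reflection.

```python
import numpy as np

def classical_mds(D, p):
    """Classical (Torgerson) MDS: embed n points in R^p from an n x n distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                # Gram matrix of the centred points
    vals, vecs = np.linalg.eigh(B)             # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:p]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

def pairwise(P):
    return np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)

P = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
D = pairwise(P)
Y = classical_mds(D, p=2)
print(np.allclose(pairwise(Y), D))  # True: inter-point distances preserved
```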

Implementations

Python

metric-learn
https://pypi.python.org/pypi/metric-learn/
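- LMNN
  ```python
  # Written against the metric-learn API of the version the post used; newer
  # releases may have changed parameter names or where `verbose` is passed,
  # so check the docs of the installed version.
  from metric_learn import LMNN
  import numpy as np

  X = np.array([[0., 0., 1.], [0., 0., 2.], [1., 0., 0.],
                [2., 0., 0.], [2., 2., 2.], [2., 5., 4.]])
  Y = np.array([1, 1, 2, 2, 0, 0])          # class labels
  lmnn = LMNN(k=2, learn_rate=1e-6)         # k target neighbours; gradient step size
  lmnn.fit(X, Y, verbose=False)             # learn the Mahalanobis metric from labels
  Y_c = lmnn.transform(X)                   # map X into the learned metric space
  ```
  - output
    ```text
    >>> Y_c
    array([[ 0.        , -0.07987306,  0.11081795],
           [ 0.        , -0.15974612,  0.22163591],
           [ 0.07113444,  0.        ,  0.        ],
           [ 0.14226889,  0.        ,  0.        ],
           [ 0.14226889, -0.04460763,  0.06188978],
           [ 0.14226889, -0.03164602,  0.04390651]])
    ```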
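To show what the `LMNN` example above optimizes, here is a NumPy sketch of the LMNN objective for a fixed linear map $L$: a pull term on each point's $k$ target neighbors (same-class nearest neighbors, fixed in the original space) and a hinge push term on differently labeled points that come within a unit margin of a target neighbor. This is an independent illustration, not metric-learn's implementation; $\mu = 0.5$ and the toy data are assumptions.

```python
import numpy as np

def lmnn_loss(L, X, y, k=1, mu=0.5):
    """LMNN objective (1 - mu) * pull + mu * push for a linear map L."""
    n = len(X)
    orig = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # input-space distances
    Z = X @ L.T
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)    # distances after L
    pull = push = 0.0
    for i in range(n):
        same = np.flatnonzero((y == y[i]) & (np.arange(n) != i))
        targets = same[np.argsort(orig[i, same])[:k]]           # k target neighbours
        others = np.flatnonzero(y != y[i])
        for j in targets:
            pull += d2[i, j]
            # hinge: differently labelled points inside the margin around target j
            push += np.maximum(0.0, 1.0 + d2[i, j] - d2[i, others]).sum()
    return float((1 - mu) * pull + mu * push)

X = np.array([[0.0, 0.0], [10.0, 0.2], [0.0, 1.0], [10.0, 1.2]])
y = np.array([0, 0, 1, 1])
print(lmnn_loss(np.eye(2), X, y))             # ~400.56
print(lmnn_loss(np.diag([0.01, 1.0]), X, y))  # ~0.6
```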

Matlab

DistLearnKit
http://www.cs.cmu.edu/~liuy/distlearn.htm

R

Supervised Distance Metric Learning
https://github.com/road2stat/sdml

Applications

TODO

A list of metric learning papers at top conferences: http://blog.csdn.net/lzt1983/article/details/7831524

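The sheer number of metric learning papers at top conferences over the past two years is striking. No doubt some of them are filler that hypes new concepts; DML seems to be gaining the kind of momentum sparse coding had a few years ago.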

ICML 2012

Maximum Margin Output Coding

Information-theoretic Semi-supervised Metric Learning via Entropy Regularization

A Hybrid Algorithm for Convex Semidefinite Optimization

Information-Theoretical Learning of Discriminative Clusters for Unsupervised Domain Adaptation

Similarity Learning for Provably Accurate Sparse Linear Classification

ICML 2011

Learning Discriminative Fisher Kernels

Learning Multi-View Neighborhood Preserving Projections

CVPR 2012

Order Determination and Sparsity-Regularized Metric Learning for Adaptive Visual Tracking

Non-sparse Linear Representations for Visual Tracking with Online Reservoir Metric Learning

Unsupervised Metric Fusion by Cross Diffusion

Learning Hierarchical Similarity Metrics

Large Scale Metric Learning from Equivalence Constraints

Neighborhood Repulsed Metric Learning for Kinship Verification

Learning Robust and Discriminative Multi-Instance Distance for Cost Effective Video Classification

PCCA: a new approach for distance learning from sparse pairwise constraints

Group Action Induced Distances for Averaging and Clustering Linear Dynamical Systems with Applications to the Analysis of Dynamic Visual Scenes

CVPR 2011

A Scalable Dual Approach to Semidefinite Metric Learning

AdaBoost on Low-Rank PSD Matrices for Metric Learning with Applications in Computer Aided Diagnosis

Adaptive Metric Differential Tracking (HUST)

Tracking Low Resolution Objects by Metric Preservation (HUST)

ACM MM 2012

Optimal Semi-Supervised Metric Learning for Image Retrieval

Low Rank Metric Learning for Social Image Retrieval

Activity-Based Person Identification Using Sparse Coding and Discriminative Metric Learning

Deep Nonlinear Metric Learning with Independent Subspace Analysis for Face Verification

ACM MM 2011

Biased Metric Learning for Person-Independent Head Pose Estimation

ICCV 2011

Learning Mixtures of Sparse Distance Metrics for Classification and Dimensionality Reduction

Unsupervised Metric Learning for Face Identification in TV Video

Random Ensemble Metrics for Object Recognition

Learning Nonlinear Distance Functions using Neural Network for Regression with Application to Robust Human Age Estimation

Learning parameterized histogram kernels on the simplex manifold for image and action classification

ECCV 2012

Metric Learning for Large Scale Image Classification: Generalizing to New Classes at Near-Zero Cost

Dual-force Metric Learning for Robust Distractor Resistant Tracker

Learning to Match Appearances by Correlations in a Covariance Metric Space

Image Annotation Using Metric Learning in Semantic Neighbourhoods

Measuring Image Distances via Embedding in a Semantic Manifold

Supervised Earth Mover’s Distance Learning and Its Computer Vision Applications

Learning Class-to-Image Distance via Large Margin and L1-norm Regularization

Labeling Images by Integrating Sparse Multiple Distance Learning and Semantic Context Modeling

IJCAI 2011

Distance Metric Learning Under Covariate Shift

Learning a Distance Metric by Empirical Loss Minimization

AAAI 2011

Efficiently Learning a Distance Metric for Large Margin Nearest Neighbor Classification

NIPS 2011

Learning a Distance Metric from a Network

Learning a Tree of Metrics with Disjoint Visual Features

Metric Learning with Multiple Kernels

KDD 2012

Random Forests for Metric Learning with Implicit Pairwise Position Dependence

WSDM 2011

Mining Social Images with Distance Metric Learning for Automated Image Tagging

Machine learning datasets

UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/

Some DML reference resources; I will discuss them in more detail when I have time.

1. Wikipedia

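2. A survey page on DML compiled by Liu Yang at CMU. It classifies and summarizes the classic DML algorithms; the papers she collected are very valuable and were my own introductory reading.

3. The ECCV 2010 tutorial.

4. Weinberger's page, which has the paper, slides, and code for LMNN (Distance Metric Learning for Large Margin Nearest Neighbor Classification).

5. ITML (Information Theoretic Metric Learning). ITML is a classic DML algorithm and won the ICML 2007 best paper award. Slides.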

References

[1] Xing E P, Jordan M I, Russell S, et al. Distance metric learning with application to clustering with side-information[C]//Advances in neural information processing systems. 2002: 505-512.

[2] Kulis B. Metric learning: A survey[J]. Foundations and Trends in Machine Learning, 2012, 5(4): 287-364.

[3] Yang L, Jin R. Distance metric learning: A comprehensive survey[J]. Michigan State University, 2006, 2.

[4] Wang W. Research on Metric Learning Methods Fusing Global and Local Information[D]. University of Science and Technology of China, 2014.