Least-Squares Conditional Density Estimation

 SUGIYAMA Masashi
 Tokyo Institute of Technology

 TAKEUCHI Ichiro
 Nagoya Institute of Technology

 SUZUKI Taiji
 The University of Tokyo

 KANAMORI Takafumi
 Nagoya University

 HACHIYA Hirotaka
 Tokyo Institute of Technology

 OKANOHARA Daisuke
 The University of Tokyo
Abstract
Estimating the conditional mean of an input-output relation is the goal of regression. However, regression analysis is not sufficiently informative if the conditional distribution is multimodal, is highly asymmetric, or contains heteroscedastic noise. In such scenarios, estimating the conditional distribution itself would be more useful. In this paper, we propose a novel method of conditional density estimation that is suitable for multi-dimensional continuous variables. The basic idea of the proposed method is to express the conditional density in terms of the density ratio and to estimate this ratio directly, without going through density estimation. Experiments using benchmark and robot transition datasets illustrate the usefulness of the proposed approach.
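The density-ratio idea in the abstract can be illustrated with a minimal sketch. It rests on the identity p(y|x) = p(x, y) / p(x) and fits the ratio by regularized least squares. The abstract does not give the model or the optimization details, so everything below is an assumption made for illustration: the Gaussian basis functions, the ridge penalty `lam`, the bandwidth `sigma`, the clipping of negative coefficients, and the function names `gauss`, `fit_ratio` and `conditional_density`. It handles one-dimensional x and y only and should not be read as the authors' implementation.

```python
import numpy as np


def gauss(a, c, sigma):
    """Gaussian bump exp(-(a - c)^2 / (2 sigma^2)) for every pair (a_i, c_l)."""
    return np.exp(-(a[:, None] - c[None, :]) ** 2 / (2.0 * sigma ** 2))


def fit_ratio(x, y, cx, cy, sigma=0.3, lam=0.1):
    """Least-squares fit of r(x, y) = sum_l alpha_l * phi_l(x, y) to p(x, y) / p(x).

    Assumed model: phi_l(x, y) = gauss(x, cx_l) * gauss(y, cy_l), 1-D x and y.
    """
    kx, ky = gauss(x, cx, sigma), gauss(y, cy, sigma)      # each (n, b)
    h = (kx * ky).mean(axis=0)                             # mean of phi at the samples
    # Closed-form integral over y of gauss(y, cy_l) * gauss(y, cy_m).
    dv = cy[:, None] - cy[None, :]
    gy = np.sqrt(np.pi) * sigma * np.exp(-dv ** 2 / (4.0 * sigma ** 2))
    H = gy * (kx.T @ kx) / len(x)
    alpha = np.linalg.solve(H + lam * np.eye(len(cx)), h)  # ridge-regularized solve
    return np.maximum(alpha, 0.0)                          # keep the ratio non-negative


def conditional_density(x0, y_grid, alpha, cx, cy, sigma=0.3):
    """Evaluate the normalized estimate of p(y | x = x0) on y_grid."""
    wx = alpha * gauss(np.array([x0]), cx, sigma)[0]       # (b,)
    numer = gauss(y_grid, cy, sigma) @ wx
    denom = np.sqrt(2.0 * np.pi) * sigma * wx.sum()        # integral of numer over y
    return numer / denom


# Toy data (hypothetical): y = sin(pi * x) plus Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 200)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(200)
idx = rng.choice(200, size=50, replace=False)              # kernel centers from the data
alpha = fit_ratio(x, y, x[idx], y[idx])
y_grid = np.linspace(-5.0, 5.0, 2001)
dens = conditional_density(0.5, y_grid, alpha, x[idx], y[idx])
```

Because the basis functions are Gaussian in y, both the matrix `H` and the normalizing constant have closed forms, so no numerical integration is needed. In practice `sigma` and `lam` would have to be chosen by some model selection procedure such as cross-validation; the fixed values here are placeholders.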
Journal

 IEICE Transactions on Information and Systems

IEICE Transactions on Information and Systems 93(3), 583-594, 2010-03-01
The Institute of Electronics, Information and Communication Engineers