Minimum covariance determinant
Corresponding Author
Mia Hubert
Department of Mathematics-LStat, Katholieke Universiteit Leuven, Celestijnenlaan 200B, B-3001 Leuven, Belgium
Michiel Debruyne
Department of Mathematics and Computer Science, University of Antwerp, Middelheimlaan 1, B-2020 Antwerp, Belgium
Abstract
The minimum covariance determinant (MCD) estimator is a highly robust estimator of multivariate location and scatter. It can be computed efficiently with the FAST-MCD algorithm of Rousseeuw and Van Driessen. Since estimating the covariance matrix is the cornerstone of many multivariate statistical methods, the MCD has also been used to develop robust and computationally efficient multivariate techniques.
In this paper, we review the MCD estimator, along with its main properties such as affine equivariance, breakdown value, and influence function. We discuss its computation, and list applications and extensions of the MCD in theoretical and applied multivariate statistics. Copyright © 2009 John Wiley & Sons, Inc.
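As a rough illustration of the estimator described in the abstract, the sketch below computes a raw MCD by exhaustive search: among all subsets of h observations it keeps the one whose classical covariance matrix has the smallest determinant, and reports that subset's mean and covariance as location and scatter. This is not the FAST-MCD algorithm; brute force is only feasible for very small data sets. The function names, the toy data, and the choice h = 7 are assumptions made for this example, and the consistency factor and reweighting step usually applied to the raw estimate are omitted.

```python
from itertools import combinations


def mean_and_cov(points):
    """Sample mean and 2x2 sample covariance of bivariate points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def det2(cov):
    """Determinant of a 2x2 matrix given as nested tuples."""
    return cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]


def brute_force_mcd(points, h):
    """Raw MCD by exhaustive search over all h-subsets (hypothetical helper).

    Returns the indices of the best subset, its mean (location),
    its covariance (scatter, without consistency factor), and the determinant.
    """
    best = None
    for idx in combinations(range(len(points)), h):
        subset = [points[i] for i in idx]
        center, cov = mean_and_cov(subset)
        d = det2(cov)
        if best is None or d < best[0]:
            best = (d, idx, center, cov)
    d, idx, center, cov = best
    return {"support": idx, "location": center, "scatter": cov, "det": d}


# Toy data (assumed for illustration): seven clustered points, two outliers.
data = [(1.0, 1.1), (1.2, 0.9), (0.8, 1.0), (1.1, 1.3), (0.9, 0.8),
        (1.3, 1.2), (1.0, 0.7), (9.0, -8.0), (10.0, -9.5)]

result = brute_force_mcd(data, h=7)
classical_mean, _ = mean_and_cov(data)

print("MCD support:", result["support"])
print("MCD location:", result["location"])
print("classical mean:", classical_mean)
```

On this toy data the two distant points pull the classical mean away from the cluster, while the MCD location is computed from the seven clustered points only.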
This article is categorized under:
- Statistical and Graphical Methods of Data Analysis > Robust Methods
REFERENCES
- 1. Rousseeuw PJ. Least median of squares regression. J Am Stat Assoc 1984, 79: 871–880.
- 2. Rousseeuw PJ. Multivariate estimation with high breakdown point. In: W Grossmann, G Pflug, I Vincze, W Wertz, eds. Mathematical Statistics and Applications, Vol. B. Dordrecht: Reidel Publishing Company; 1985; 283–297. doi:10.1007/978-94-009-5438-0_20
- 3. Rousseeuw PJ, Van Driessen K. A fast algorithm for the Minimum Covariance Determinant estimator. Technometrics 1999, 41: 212–223.
- 4. Hettich S, Bay SD. The UCI KDD Archive. Irvine, CA: University of California, Department of Information and Computer Science; 1999.
- 5. Maronna RA, Martin RD, Yohai VJ. Robust Statistics: Theory and Methods. New York: Wiley; 2006.
- 6. Croux C, Haesbroeck G. Influence function and efficiency of the Minimum Covariance Determinant scatter matrix estimator. J Multivariate Anal 1999, 71: 161–190.
- 7. Pison G, Van Aelst S, Willems G. Small sample corrections for LTS and MCD. Metrika 2002, 55: 111–123.
- 8. Butler RW, Davies PL, Jhun M. Asymptotics for the Minimum Covariance Determinant estimator. Ann Stat 1993, 21: 1385–1400.
- 9. Lopuhaä HP, Rousseeuw PJ. Breakdown points of affine equivariant estimators of multivariate location and covariance matrices. Ann Stat 1991, 19: 229–248.
- 10. Lopuhaä HP. Asymptotics of reweighted estimators of multivariate location and scatter. Ann Stat 1999, 27: 1638–1665.
- 11. Rousseeuw PJ, van Zomeren BC. Unmasking multivariate outliers and leverage points. J Am Stat Assoc 1990, 85: 633–651.
- 12. Hardin J, Rocke DM. The distribution of robust distance. J Comput Graph Stat 2005, 14: 928–946.
- 13. Stahel WA. Robuste Schätzungen: infinitesimale Optimalität und Schätzungen von Kovarianzmatrizen. PhD thesis, ETH Zürich, 1981.
- 14. Donoho DL, Gasko M. Breakdown properties of location estimates based on halfspace depth and projected outlyingness. Ann Stat 1992, 20: 1803–1827.
- 15. Roelant E, Van Aelst S, Willems G. The minimum weighted covariance determinant estimator. Metrika 2009, 70: 177–204.
- 16. Davies L. Asymptotic behavior of S-estimators of multivariate location parameters and dispersion matrices. Ann Stat 1987, 15: 1269–1292.
- 17. Rousseeuw PJ. Discussion on ‘Breakdown and groups’. Ann Stat 2005, 33: 1004–1009.
- 18. Hampel FR, Ronchetti EM, Rousseeuw PJ, Stahel WA. Robust Statistics: The Approach Based on Influence Functions. New York: Wiley; 1986.
- 19. Rousseeuw PJ, Leroy AM. Robust Regression and Outlier Detection. New York: Wiley-Interscience; 1987. doi:10.1002/0471725382
- 20. Butler RW. Nonparametric interval and point prediction using data trimmed by a Grubbs-type outlier rule. Ann Stat 1982, 10: 197–204.
- 21. Croux C, Rousseeuw PJ. A class of high-breakdown scale estimators based on subranges. Commun Stat Theory Methods 1992, 21: 1935–1951.
- 22. Croux C, Haesbroeck G. Maxbias curves of robust scale estimators based on subranges. Metrika 2001, 53: 101–122.
- 23. Croux C, Haesbroeck G. Maxbias curves of location estimators based on subranges. J Nonparametr Stat 2002, 14: 295–306.
- 24. Hawkins D, Olive D. Inconsistency of resampling algorithms for high breakdown regression estimators and a new algorithm. J Am Stat Assoc 2002, 97: 136–148.
- 25. Verboven S, Hubert M. LIBRA: a Matlab library for robust analysis. Chemometr Intell Lab Syst 2005, 75: 127–136.
- 26. Rousseeuw PJ, Van Driessen K. Computing LTS regression for large data sets. Data Min Knowl Discov 2006, 12: 29–45.
- 27. Simpson DG, Ruppert D, Carroll RJ. On one-step GM-estimates and stability of inferences in linear regression. J Am Stat Assoc 1992, 87: 439–450.
- 28. Coakley CW, Hettmansperger TP. A bounded influence, high breakdown, efficient regression estimator. J Am Stat Assoc 1993, 88: 872–880.
- 29. Hubert M, Rousseeuw PJ. Robust regression with both continuous and binary regressors. J Stat Plann Infer 1996, 57: 153–163.
- 30. Rousseeuw PJ, Christmann A. Robustness against separation and outliers in logistic regression. Comput Stat Data Anal 2003, 43: 315–332.
- 31. Croux C, Haesbroeck G. Implementing the Bianco and Yohai estimator for logistic regression. Comput Stat Data Anal 2003, 44: 273–295.
- 32. Rousseeuw PJ, Van Aelst S, Van Driessen K, Agulló J. Robust multivariate regression. Technometrics 2004, 46: 293–305.
- 33. Agulló J, Croux C, Van Aelst S. The multivariate least trimmed squares estimator. J Multivariate Anal 2008, 99: 311–318.
- 34. Croux C, Haesbroeck G. Principal components analysis based on robust estimators of the covariance or correlation matrix: influence functions and efficiencies. Biometrika 2000, 87: 603–618.
- 35. Pison G, Rousseeuw PJ, Filzmoser P, Croux C. Robust factor analysis. J Multivariate Anal 2003, 84: 145–172.
- 36. Croux C, Dehon C. Analyse canonique basée sur des estimateurs robustes de la matrice de covariance. Rev Stat Appl 2002, 2: 5–26.
- 37. Hubert M, Rousseeuw PJ, Vanden Branden K. ROBPCA: a new approach to robust principal components analysis. Technometrics 2005, 47: 64–79.
- 38. Debruyne M, Hubert M. The influence function of the Stahel-Donoho covariance estimator of smallest outlyingness. Stat Probab Lett 2009, 79: 275–282.
- 39. Hubert M, Verboven S. A robust PCR method for high-dimensional regressors. J Chemometr 2003, 17: 438–452.
- 40. Hubert M, Vanden Branden K. Robust methods for Partial Least Squares Regression. J Chemometr 2003, 17: 537–549.
- 41. Vanden Branden K, Hubert M. Robustness properties of a robust PLS regression method. Anal Chim Acta 2004, 515: 229–241.
- 42. Maronna RA. Principal components and orthogonal regression based on robust scales. Technometrics 2005, 47: 264–273.
- 43. Willems G, Pison G, Rousseeuw PJ, Van Aelst S. A robust Hotelling test. Metrika 2002, 55: 125–138.
- 44. Willems G, Van Aelst S. A fast bootstrap method for the MCD estimator. In: J Antoch, ed. Proceedings in Computational Statistics. Heidelberg: Springer-Verlag; 2004; 1979–1986.
- 45. Cheng T-C, Victoria-Feser M-P. High breakdown estimation of multivariate location and scale with missing observations. Br J Math Stat Psychol 2002, 55: 317–335.
- 46. Copt S, Victoria-Feser M-P. Fast algorithms for computing high breakdown covariance matrices with missing data. In: M Hubert, G Pison, A Struyf, S Van Aelst, eds. Theory and Applications of Recent Robust Methods. Statistics for Industry and Technology. Basel: Birkhäuser; 2004; 71–82. doi:10.1007/978-3-0348-7958-3_7
- 47. Serneels S, Verdonck T. Principal component analysis for data containing outliers and missing elements. Comput Stat Data Anal 2008, 52: 1712–1727.
- 48. Hawkins DM, McLachlan GJ. High-breakdown linear discriminant analysis. J Am Stat Assoc 1997, 92: 136–143.
- 49. Hubert M, Van Driessen K. Fast and robust discriminant analysis. Comput Stat Data Anal 2004, 45: 301–320.
- 50. Vanden Branden K, Hubert M. Robust classification in high dimensions based on the SIMCA method. Chemometr Intell Lab Syst 2005, 79: 10–21.
- 51. Rocke DM, Woodruff DL. A synthesis of outlier detection and cluster identification. Technical report, 1999.
- 52. Hardin J, Rocke DM. Outlier detection in the multiple cluster setting using the minimum covariance determinant estimator. Comput Stat Data Anal 2004, 44: 625–638.
- 53. Gallegos MT, Ritter G. A robust method for cluster analysis. Ann Stat 2005, 33: 347–380.
- 54. Vandev DL, Neykov NM. About regression estimators with high breakdown point. Statistics 1998, 32: 111–129.
- 55. Hadi AS, Luceño A. Maximum trimmed likelihood estimators: a unified approach, examples and algorithms. Comput Stat Data Anal 1997, 25: 251–272.
- 56. Müller CH, Neykov N. Breakdown points of trimmed likelihood estimators and related estimators in generalized linear models. J Stat Plann Infer 2003, 116: 503–519.
- 57. Čížek P. Robust and efficient adaptive estimation of binary-choice regression models. J Am Stat Assoc 2008, 103: 687–696.
- 58. Cuesta-Albertos JA, Gordaliza A, Matrán C. Trimmed k-means: an attempt to robustify quantizers. Ann Stat 1997, 25: 553–576.
- 59. Cuesta-Albertos JA, Matrán C, Mayo-Iscar A. Robust estimation in the normal mixture model based on robust clustering. J R Stat Soc Ser B 2008, 70: 779–802.
- 60. García-Escudero LA, Gordaliza A, San Martín R, Van Aelst S, Zamar RH. Robust linear clustering. J R Stat Soc Ser B 2009, 71: 1–18.
- 61. Víšek JÁ. The least weighted squares I. The asymptotic linearity of normal equations. Bull Czech Econ Soc 2002, 9: 31–58.
- 62. Salibian-Barrera M, Yohai VJ. A fast algorithm for S-regression estimates. J Comput Graph Stat 2006, 15: 414–427.
- 63. Zaman A, Rousseeuw PJ, Orhan M. Econometric applications of high-breakdown robust regression techniques. Econ Lett 2001, 71: 1–8.
- 64. Welsch R, Zhou X. Application of robust statistics to asset allocation models. Revstat 2007, 5: 97–114.
- 65. Prastawa M, Bullitt E, Ho S, Gerig G. A brain tumor segmentation framework based on outlier detection. Med Image Anal 2004, 8: 275–283.
- 66. Jensen WA, Birch JB, Woodall WH. High breakdown estimation methods for phase I multivariate control charts. Qual Reliab Eng Int 2007, 23: 615–629.
- 67. Neykov NM, Neytchev PN, Van Gelder PHAJM, Todorov VK. Robust detection of discordant sites in regional frequency analysis. Water Resour Res 2007, 43.
- 68. Vogler C, Goldenstein S, Stolfi J, Pavlovic V, Metaxas D. Outlier rejection in high-dimensional deformable models. Image Vis Comput 2007, 25: 274–284.
- 69. Lu Y, Wang J, Kong J, Zhang B, Zhang J. An integrated algorithm for MRI brain images segmentation. Comput Vis Approaches Med Image Anal 2006, 4241: 132–1342.
- 70. van Helvoort PJ, Filzmoser P, van Gaans PFM. Sequential Factor Analysis as a new approach to multivariate analysis of heterogeneous geochemical datasets: An application to a bulk chemical characterization of fluvial deposits (Rhine-Meuse delta, The Netherlands). Appl Geochem 2005, 20: 2233–2251.