Principal component analysis
Hervé Abdi (Corresponding Author)
School of Behavioral and Brain Sciences, The University of Texas at Dallas, MS: GR4.1, Richardson, TX 75080-3021, USA

Lynne J. Williams
Department of Psychology, University of Toronto Scarborough, Ontario, Canada

Abstract
Principal component analysis (PCA) is a multivariate technique that analyzes a data table in which observations are described by several inter-correlated quantitative dependent variables. Its goal is to extract the important information from the table, to represent it as a set of new orthogonal variables called principal components, and to display the pattern of similarity of the observations and of the variables as points in maps. The quality of the PCA model can be evaluated using cross-validation techniques such as the bootstrap and the jackknife. PCA can be generalized as correspondence analysis (CA) in order to handle qualitative variables and as multiple factor analysis (MFA) in order to handle heterogeneous sets of variables. Mathematically, PCA depends upon the eigen-decomposition of positive semi-definite matrices and upon the singular value decomposition (SVD) of rectangular matrices. Copyright © 2010 John Wiley & Sons, Inc.
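The decomposition described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the article's own worked example: the small data table is made up, and the variable names (`F` for factor scores, `explained` for the proportion of inertia per component) are chosen here for clarity. The key point is that the SVD of the column-centered table yields the principal components directly: the squared singular values are the eigenvalues, and the factor scores are the left singular vectors scaled by the singular values.

```python
import numpy as np

# Hypothetical data table: 5 observations described by 3 quantitative variables
X = np.array([
    [2.5, 2.4, 0.5],
    [0.5, 0.7, 1.9],
    [2.2, 2.9, 0.8],
    [1.9, 2.2, 0.9],
    [3.1, 3.0, 0.4],
])

# PCA operates on the column-centered table
Xc = X - X.mean(axis=0)

# Singular value decomposition: Xc = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Factor scores: coordinates of the observations on the components
# (equivalently, the projection Xc @ Vt.T)
F = U * s

# Squared singular values are the eigenvalues; normalizing them gives
# the proportion of the total inertia explained by each component
eigvals = s**2
explained = eigvals / eigvals.sum()
```

The rows of `Vt` are the loadings of the variables; plotting the first two columns of `F` gives the familiar map of the observations on the first two principal components.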
This article is categorized under:
- Statistical and Graphical Methods of Data Analysis > Multivariate Analysis
- Statistical and Graphical Methods of Data Analysis > Dimension Reduction
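The abstract also mentions assessing the quality of the PCA model with resampling techniques such as the bootstrap. A minimal sketch of that idea, under assumed synthetic data (not the article's example), is to resample observations with replacement, recompute the eigenvalues each time, and read off percentile intervals for each component's inertia:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic table: 50 observations, 4 variables forming 2 correlated pairs
n = 50
Z = rng.normal(size=(n, 2))
X = np.column_stack([
    Z[:, 0], Z[:, 0] + 0.1 * rng.normal(size=n),
    Z[:, 1], Z[:, 1] + 0.1 * rng.normal(size=n),
])

def pca_eigenvalues(X):
    """Eigenvalues of PCA: squared singular values of the centered table."""
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    return s**2

# Bootstrap: resample the observations with replacement B times
B = 200
boot = np.array([
    pca_eigenvalues(X[rng.integers(0, n, size=n)]) for _ in range(B)
])

# 95% percentile intervals for each eigenvalue
lo, hi = np.percentile(boot, [2.5, 97.5], axis=0)
```

Components whose bootstrap interval clearly separates from the noise floor are candidates to retain; the jackknife follows the same pattern but leaves out one observation at a time instead of resampling.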