Principal component analysis (PCA) is today one of the most popular multivariate statistical techniques. Statistically, PCA finds lines, planes and hyperplanes in the K-dimensional space that approximate the data as well as possible in the least squares sense. The goal is to extract the important information from the data and to express this information as a set of summary indices called principal components. PCA goes back to Cauchy but was first formulated in statistics by Pearson, who described the analysis as finding “lines and planes of closest fit to systems of points in space”. The most important use of PCA is to represent a multivariate data table as a smaller set of variables (summary indices) in order to observe trends, jumps, clusters and outliers. This overview may uncover the relationships between observations and variables, and among the variables themselves. For example, PCA can help identify correlations between variables, such as whether consumption of frozen fish and consumption of crisp bread go together across the Nordic countries. PCA is a very flexible tool and allows analysis of datasets that may contain, for example, multicollinearity, missing values, categorical data, and imprecise measurements. PCA forms the basis of multivariate data analysis based on projection methods. It has been widely used in the areas of pattern recognition and signal processing, and is often grouped under the broad title of factor analysis.
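To make the idea of "summary indices" concrete, here is a minimal sketch of PCA in Python using NumPy's singular value decomposition. It assumes a complete numeric data table (no missing values or categorical columns, which would need extra handling), and the function name `pca` and the toy numbers are purely illustrative, not taken from any real dataset.

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto the first n_components principal components."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                          # centre each variable (column)
    # Rows of Vt are orthonormal directions, ordered by variance captured
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]         # directions of closest fit
    scores = Xc @ components.T             # summary indices per observation
    explained = s**2 / np.sum(s**2)        # share of total variance per component
    return scores, components, explained[:n_components], mean

# Hypothetical toy table: 6 observations (rows), 3 variables (columns)
X = np.array([
    [2.0,  4.1, 1.0],
    [3.0,  6.2, 0.5],
    [4.0,  7.9, 1.5],
    [5.0, 10.1, 0.8],
    [6.0, 12.0, 1.2],
    [7.0, 13.8, 0.9],
])

scores, components, explained, mean = pca(X, n_components=1)

# Least squares approximation of the table using a single component
X_hat = mean + scores @ components
```

Here the first two variables rise together, so a single component captures most of the variation, and `scores` gives one summary index per observation. Keeping more components only requires changing `n_components`.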