**Principal component analysis (PCA):** PCA is used to decompose a multivariate dataset into a set of successive orthogonal components that explain a maximum amount of the variance. In scikit-learn, `PCA` is implemented as a *transformer* object that learns n components in its `fit` method, and can be used on new data to project it onto these components.

PCA centers but does not scale the input data for each feature before applying the SVD. The optional parameter `whiten=True` makes it possible to project the data onto the singular space while scaling each component to unit variance.
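A small illustration on synthetic data with features of very different scales: after whitening, each projected component has unit sample variance.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# synthetic features with very different scales
X = rng.randn(200, 3) * np.array([1.0, 5.0, 0.2])

X_white = PCA(n_components=2, whiten=True).fit_transform(X)
print(X_white.std(axis=0, ddof=1))    # each component scaled to unit variance
```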

The `PCA` object also provides a probabilistic interpretation of PCA that can give a likelihood of data based on the amount of variance it explains. As such it implements a `score` method that can be used in cross-validation.
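For instance, a minimal sketch of cross-validating a PCA model via its average log-likelihood (synthetic data; the choice of `cv=5` is illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X = rng.randn(100, 10)

# PCA.score returns the average log-likelihood of the samples under the
# probabilistic PCA model, so it plugs into cross_val_score directly
scores = cross_val_score(PCA(n_components=3), X, cv=5)
print(scores.mean())
```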

The `PCA` object is very useful, but has certain limitations for large datasets. The biggest limitation is that `PCA` only supports batch processing, which means all of the data to be processed must fit in main memory. The `IncrementalPCA` object uses a different form of processing and allows for partial computations which almost exactly match the results of `PCA` while processing the data in a minibatch fashion. `IncrementalPCA` makes it possible to implement out-of-core Principal Component Analysis either by:

- Using its `partial_fit` method on chunks of data fetched sequentially from the local hard drive or a network database.
- Calling its `fit` method on a memory-mapped file using `numpy.memmap`.
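A minimal sketch of the first approach, with an in-memory array standing in for chunks streamed from disk:

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.RandomState(0)
X = rng.randn(1000, 20)               # stand-in for data too large for RAM

ipca = IncrementalPCA(n_components=5)
for chunk in np.array_split(X, 10):   # feed the data 100 samples at a time
    ipca.partial_fit(chunk)

X_reduced = ipca.transform(X)
print(X_reduced.shape)                # (1000, 5)
```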

`IncrementalPCA` only stores estimates of component and noise variances, in order to update `explained_variance_ratio_` incrementally. This is why memory usage depends on the number of samples per batch, rather than the number of samples to be processed in the dataset.

It is often interesting to project data to a lower-dimensional space that preserves most of the variance, by dropping the singular vectors of components associated with lower singular values.

For instance, if we work with 64×64 pixel gray-level pictures for face recognition, the dimensionality of the data is 4096 and it is slow to train an RBF support vector machine on such wide data. Furthermore we know that the intrinsic dimensionality of the data is much lower than 4096, since all pictures of human faces look somewhat alike. The samples lie on a manifold of much lower dimension (say around 200, for instance). The PCA algorithm can be used to linearly transform the data while both reducing the dimensionality and preserving most of the explained variance at the same time.
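One way to sketch this idea is with a float `n_components`, which selects the smallest number of components needed to explain the requested fraction of the variance (random data stands in for real images here, and the 95% threshold is an arbitrary choice):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.randn(200, 50)                # stand-in for wide, high-dimensional data

# keep just enough components to explain 95% of the variance
pca = PCA(n_components=0.95, svd_solver='full')
X_reduced = pca.fit_transform(X)
print(X_reduced.shape[1], pca.explained_variance_ratio_.sum())
```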

The class `PCA` used with the optional parameter `svd_solver='randomized'` is very useful in that case: since we are going to drop most of the singular vectors, it is much more efficient to limit the computation to an approximate estimate of the singular vectors we will keep to actually perform the transform.

For instance, the following shows 16 sample portraits (centered around 0.0) from the Olivetti dataset. On the right-hand side are the first 16 singular vectors reshaped as portraits. Since we only require the top 16 singular vectors of a dataset with size n_samples = 400 and n_features = 64 × 64 = 4096, the computation time is less than 1 s.
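A sketch of that computation, with random data standing in for the Olivetti faces (the real example would load the images, e.g. via `sklearn.datasets.fetch_olivetti_faces`):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.randn(400, 4096)              # synthetic stand-in: 400 faces of 64x64 pixels

# only the top 16 approximate singular vectors are computed
rpca = PCA(n_components=16, svd_solver='randomized', random_state=0)
X_reduced = rpca.fit_transform(X)
print(rpca.components_.shape)         # (16, 4096): 16 "eigenface"-like vectors
print(X_reduced.shape)                # (400, 16)
```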

If we note n_max = max(n_samples, n_features) and n_min = min(n_samples, n_features), the time complexity of the randomized `PCA` is O(n_max² ⋅ n_components) instead of O(n_max² ⋅ n_min) for the exact method implemented in `PCA`.

The memory footprint of randomized `PCA` is also proportional to 2 ⋅ n_max ⋅ n_components instead of n_max ⋅ n_min for the exact method.

`KernelPCA` is an extension of PCA which achieves non-linear dimensionality reduction through the use of kernels (see Pairwise metrics, Affinities and Kernels). It has many applications including denoising, compression and structured prediction (kernel dependency estimation). `KernelPCA` supports both `transform` and `inverse_transform`.
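A minimal sketch of the round trip on synthetic data; note that `fit_inverse_transform=True` must be set at construction time for `inverse_transform` to be available (it learns an approximate pre-image mapping):

```python
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.RandomState(0)
X = rng.randn(100, 4)

kpca = KernelPCA(n_components=2, kernel='rbf', fit_inverse_transform=True)
X_kpca = kpca.fit_transform(X)           # non-linear projection
X_back = kpca.inverse_transform(X_kpca)  # approximate pre-image in input space
print(X_kpca.shape, X_back.shape)        # (100, 2) (100, 4)
```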