
Class NIPALSNode


Perform Principal Component Analysis using the NIPALS algorithm.

This algorithm is particularly useful if you have more variables than observations, or in general when the number of variables is huge and calculating a full covariance matrix may be infeasible. It is also more efficient than the standard PCANode if you expect the number of significant principal components to be small. In this case, setting output_dim to a certain fraction of the total variance, say 90%, may help.
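The idea of extracting components one at a time, without ever forming the covariance matrix, can be sketched as follows. This is a minimal illustration in plain Python (lists only, no dependencies) and is not MDP's implementation. The function name `nipals_pca` and its return values are hypothetical; `conv` and `max_it` merely mirror the constructor parameters of the same name, and the convergence test used here (change in the score vector) is an assumption.

```python
import math


def nipals_pca(rows, n_components, conv=1e-8, max_it=100000):
    """Return (components, variances) for a list of data rows.

    Illustrative NIPALS with deflation; hypothetical helper, not MDP code.
    """
    n, dim = len(rows), len(rows[0])
    # Center the data. The residual matrix is deflated after each component.
    mean = [sum(r[j] for r in rows) / n for j in range(dim)]
    resid = [[r[j] - mean[j] for j in range(dim)] for r in rows]

    components, variances = [], []
    for _ in range(n_components):
        # Start the scores from the residual column with the largest energy.
        col = max(range(dim), key=lambda j: sum(r[j] ** 2 for r in resid))
        t = [r[col] for r in resid]
        for _it in range(max_it):
            # Loadings: p = X^T t, normalized to unit length.
            p = [sum(resid[i][j] * t[i] for i in range(n)) for j in range(dim)]
            norm = math.sqrt(sum(v * v for v in p))
            if norm == 0.0:
                break  # nothing left to explain
            p = [v / norm for v in p]
            # Scores: t_new = X p.
            t_new = [sum(r[j] * p[j] for j in range(dim)) for r in resid]
            diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(t_new, t)))
            t = t_new
            if diff < conv:
                break
        # Deflate: subtract the component just found from the residual.
        resid = [[resid[i][j] - t[i] * p[j] for j in range(dim)]
                 for i in range(n)]
        components.append(p)
        variances.append(sum(v * v for v in t) / (n - 1))
    return components, variances


# Points lying exactly on the line y = 2x.
data = [[1, 2], [2, 4], [3, 6], [-1, -2], [-2, -4], [-3, -6]]
components, variances = nipals_pca(data, 1)
print(components[0])  # direction proportional to (1, 2)
```

Each pass only needs matrix-vector products with the data, which is why the cost stays manageable when the number of variables is large and only a few components are wanted.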


Reference

Reference for NIPALS (Nonlinear Iterative Partial Least Squares): Wold, H. Nonlinear estimation by iterative least squares procedures. In David, F. (Editor), Research Papers in Statistics, Wiley, New York, pp 411-444 (1966).

More information about Principal Component Analysis, a.k.a. the discrete Karhunen-Loeve transform, can be found, among others, in I.T. Jolliffe, Principal Component Analysis, Springer-Verlag (1986).

Original code contributed by: Michael Schmuker, Susanne Lezius, and Farzad Farkhooi (2008).

Instance Methods
 
__init__(self, input_dim=None, output_dim=None, dtype=None, conv=1e-08, max_it=100000)
Initializes an object of type 'NIPALSNode'.
 
_stop_training(self, debug=False)
Concatenate the collected data in a single array.
 
_train(self, x)
Collect all input data in a list.
 
stop_training(self, debug=False)
Concatenate the collected data in a single array.
 
train(self, x)
Collect all input data in a list.

Inherited from unreachable.newobject: __long__, __native__, __nonzero__, __unicode__, next

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __sizeof__, __subclasshook__

    Inherited from PCANode
tuple
_adjust_output_dim(self)
This function is used if the output dimension is smaller than the input dimension (so only the eigenvectors with the largest eigenvalues have to be kept). If required, it sets the output dimension.
 
_check_output(self, y)
numpy.ndarray
_execute(self, x, n=None)
Project the input on the first 'n' principal components.
numpy.ndarray
_inverse(self, y, n=None)
Project data from the output to the input space using the first 'n' components.
 
_set_output_dim(self, n)
numpy.ndarray
execute(self, x, n=None)
Project the input on the first 'n' principal components.
float
get_explained_variance(self)
The explained variance is the fraction of the original variance that can be explained by self._output_dim PCA components. If for example output_dim has been set to 0.95, the explained variance could be something like 0.958...
numpy.ndarray
get_projmatrix(self, transposed=1)
Returns the projection matrix.
numpy.ndarray
get_recmatrix(self, transposed=1)
Returns the back-projection matrix (i.e. the reconstruction matrix).
numpy.ndarray
inverse(self, y, n=None)
Project data from the output to the input space using the first 'n' components.
    Inherited from Node
 
__add__(self, other)
 
__call__(self, x, *args, **kwargs)
Calling an instance of Node is equivalent to calling its execute method.
 
__repr__(self)
repr(x)
 
__str__(self)
str(x)
 
_check_input(self, x)
 
_check_train_args(self, x, *args, **kwargs)
 
_get_supported_dtypes(self)
Return the list of dtypes supported by this node.
 
_get_train_seq(self)
 
_if_training_stop_training(self)
 
_pre_execution_checks(self, x)
This method contains all pre-execution checks.
 
_pre_inversion_checks(self, y)
This method contains all pre-inversion checks.
 
_refcast(self, x)
Helper function to cast arrays to the internal dtype.
 
_set_dtype(self, t)
 
_set_input_dim(self, n)
 
copy(self, protocol=None)
Return a deep copy of the node.
 
get_current_train_phase(self)
Return the index of the current training phase.
 
get_dtype(self)
Return dtype.
 
get_input_dim(self)
Return input dimensions.
 
get_output_dim(self)
Return output dimensions.
 
get_remaining_train_phase(self)
Return the number of training phases still to accomplish.
 
get_supported_dtypes(self)
Return dtypes supported by the node as a list of numpy.dtype objects.
 
has_multiple_training_phases(self)
Return True if the node has multiple training phases.
 
is_training(self)
Return True if the node is in the training phase, False otherwise.
 
save(self, filename, protocol=-1)
Save a pickled serialization of the node to filename. If filename is None, return a string.
 
set_dtype(self, t)
Set internal structures' dtype.
 
set_input_dim(self, n)
Set input dimensions.
 
set_output_dim(self, n)
Set output dimensions.
Static Methods
    Inherited from Node
 
is_invertible()
Return True if the node can be inverted, False otherwise.
 
is_trainable()
Return True if the node can be trained, False otherwise.
Instance Variables
  avg
Mean of the input data (available after training).
  d
Variance corresponding to the PCA components (eigenvalues of the covariance matrix).
  explained_variance
When output_dim has been specified as a fraction of the total variance, this is the fraction of the total variance that is actually explained.
  v
Transpose of the projection matrix (available after training).
Properties

Inherited from object: __class__

    Inherited from Node
  _train_seq
List of tuples:
  dtype
dtype
  input_dim
Input dimensions
  output_dim
Output dimensions
  supported_dtypes
Supported dtypes
Method Details

__init__(self, input_dim=None, output_dim=None, dtype=None, conv=1e-08, max_it=100000)
(Constructor)

 
Initializes an object of type 'NIPALSNode'.
Parameters:
  • input_dim (int) - The input dimensionality.
  • output_dim (int or float) - The number of principal components to be kept can be specified as 'output_dim' directly (e.g. 'output_dim=10' means 10 components are kept) or by the fraction of variance to be explained (e.g. 'output_dim=0.95' means that as many components as necessary will be kept in order to explain 95% of the input variance).
  • dtype (numpy.dtype or str) - The datatype.
  • conv (float) - Convergence threshold for the residual error.
  • max_it (int) - Maximum number of iterations.
Overrides: object.__init__
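The two ways of specifying output_dim can be illustrated with a small sketch. The helper `resolve_output_dim` below is hypothetical and is not part of MDP; it assumes that an integer is read as a component count, that a float is read as a fraction of the total variance, and that the variances are given in decreasing order.

```python
def resolve_output_dim(variances, output_dim):
    """Return (number of components kept, fraction of variance explained).

    Hypothetical helper for illustration; `variances` must be sorted in
    decreasing order, as the eigenvalues of a covariance matrix would be.
    """
    total = sum(variances)
    if isinstance(output_dim, int):
        # An integer is taken as the number of components to keep.
        kept = output_dim
    else:
        # A float is taken as the fraction of variance to explain:
        # keep adding components until that fraction is reached.
        kept, running = len(variances), 0.0
        for k, var in enumerate(variances, start=1):
            running += var
            if running / total >= output_dim:
                kept = k
                break
    return kept, sum(variances[:kept]) / total


variances = [6.0, 3.0, 1.0]
print(resolve_output_dim(variances, 2))     # component count given directly
print(resolve_output_dim(variances, 0.85))  # fraction of variance requested
```

In the second call, two components are kept and the fraction actually explained (0.9) is larger than the 0.85 requested, because components can only be added whole.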

_stop_training(self, debug=False)

 
Concatenate the collected data in a single array.
Parameters:
  • debug - If True, the singular matrices themselves are stored in self.cov_mtx and self.dcov_mtx so that they can be examined when stop_training fails because of singular covariance matrices. Default is False.
Raises:
  • mdp.NodeException - If negative eigenvalues occur, the covariance matrix may be singular or no component amounts to variation exceeding var_abs.
Overrides: Node._stop_training

_train(self, x)

 
Collect all input data in a list.
Parameters:
  • x - The training data.
Overrides: Node._train

stop_training(self, debug=False)

 
Concatenate the collected data in a single array.
Parameters:
  • debug - If True, the singular matrices themselves are stored in self.cov_mtx and self.dcov_mtx so that they can be examined when stop_training fails because of singular covariance matrices. Default is False.
Raises:
  • mdp.NodeException - If negative eigenvalues occur, the covariance matrix may be singular or no component amounts to variation exceeding var_abs.
Overrides: Node.stop_training

train(self, x)

 
Collect all input data in a list.
Parameters:
  • x - The training data.
Overrides: Node.train

Instance Variable Details

avg

Mean of the input data (available after training).

d

Variance corresponding to the PCA components (eigenvalues of the covariance matrix).

explained_variance

When output_dim has been specified as a fraction of the total variance, this is the fraction of the total variance that is actually explained.

v

Transpose of the projection matrix (available after training).
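How avg and v relate to projection and back-projection can be sketched in plain Python. The functions `project` and `reconstruct` are hypothetical stand-ins for what execute and inverse are described as doing; the sketch assumes that v holds one unit-length component per column, so that a sample is centered with avg and then multiplied by the first n columns of v.

```python
def project(x, avg, v, n=None):
    """Project one sample onto the first n components (columns of v)."""
    n = len(v[0]) if n is None else n
    centered = [xi - ai for xi, ai in zip(x, avg)]
    return [sum(centered[i] * v[i][k] for i in range(len(v)))
            for k in range(n)]


def reconstruct(y, avg, v, n=None):
    """Map an output vector back to input space using the first n components."""
    n = len(y) if n is None else n
    return [avg[i] + sum(y[k] * v[i][k] for k in range(n))
            for i in range(len(v))]


# Assumed toy values: a 2-D input with a single component along (0.6, 0.8).
avg = [1.0, 1.0]
v = [[0.6], [0.8]]
x = [4.0, 5.0]

y = project(x, avg, v)
x_back = reconstruct(y, avg, v)
print(y, x_back)
```

Here the sample lies exactly along the retained component, so the reconstruction recovers it up to rounding; in general, the part of the sample orthogonal to the kept components is lost.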