Package mdp :: Package nodes :: Class PCANode

Class PCANode


Filter the input data through its most significant principal components.


Reference

More information about Principal Component Analysis (a.k.a. the discrete Karhunen-Loeve transform) can be found in, among other sources, I.T. Jolliffe, Principal Component Analysis, Springer-Verlag (1986).

Instance Methods
 
__init__(self, input_dim=None, output_dim=None, dtype=None, svd=False, reduce=False, var_rel=1e-12, var_abs=1e-15, var_part=None)
Initializes an object of type 'PCANode'.
tuple
_adjust_output_dim(self)
This function is used if the output dimension is smaller than the input dimension (so only the eigenvectors with the largest eigenvalues have to be kept). If required, it sets the output dimension.
 
_check_output(self, y)
numpy.ndarray
_execute(self, x, n=None)
Project the input on the first 'n' principal components.
numpy.ndarray
_inverse(self, y, n=None)
Project data from the output to the input space using the first 'n' components.
 
_set_output_dim(self, n)
 
_stop_training(self, debug=False)
Stop the training phase.
 
_train(self, x)
Update the covariance matrix.
numpy.ndarray
execute(self, x, n=None)
Project the input on the first 'n' principal components.
float
get_explained_variance(self)
The explained variance is the fraction of the original variance that can be explained by self._output_dim PCA components. If, for example, output_dim has been set to 0.95, the explained variance could be something like 0.958...
numpy.ndarray
get_projmatrix(self, transposed=1)
Returns the projection matrix.
numpy.ndarray
get_recmatrix(self, transposed=1)
Returns the back-projection matrix (i.e. the reconstruction matrix).
numpy.ndarray
inverse(self, y, n=None)
Project data from the output to the input space using the first 'n' components.
 
stop_training(self, debug=False)
Stop the training phase.
 
train(self, x)
Update the covariance matrix.

Inherited from unreachable.newobject: __long__, __native__, __nonzero__, __unicode__, next

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __sizeof__, __subclasshook__

    Inherited from Node
 
__add__(self, other)
 
__call__(self, x, *args, **kwargs)
Calling an instance of Node is equivalent to calling its execute method.
 
__repr__(self)
repr(x)
 
__str__(self)
str(x)
 
_check_input(self, x)
 
_check_train_args(self, x, *args, **kwargs)
 
_get_supported_dtypes(self)
Return the list of dtypes supported by this node.
 
_get_train_seq(self)
 
_if_training_stop_training(self)
 
_pre_execution_checks(self, x)
This method contains all pre-execution checks.
 
_pre_inversion_checks(self, y)
This method contains all pre-inversion checks.
 
_refcast(self, x)
Helper function to cast arrays to the internal dtype.
 
_set_dtype(self, t)
 
_set_input_dim(self, n)
 
copy(self, protocol=None)
Return a deep copy of the node.
 
get_current_train_phase(self)
Return the index of the current training phase.
 
get_dtype(self)
Return dtype.
 
get_input_dim(self)
Return input dimensions.
 
get_output_dim(self)
Return output dimensions.
 
get_remaining_train_phase(self)
Return the number of training phases still to accomplish.
 
get_supported_dtypes(self)
Return dtypes supported by the node as a list of numpy.dtype objects.
 
has_multiple_training_phases(self)
Return True if the node has multiple training phases.
 
is_training(self)
Return True if the node is in the training phase, False otherwise.
 
save(self, filename, protocol=-1)
Save a pickled serialization of the node to filename. If filename is None, return a string.
 
set_dtype(self, t)
Set internal structures' dtype.
 
set_input_dim(self, n)
Set input dimensions.
 
set_output_dim(self, n)
Set output dimensions.
Static Methods
    Inherited from Node
 
is_invertible()
Return True if the node can be inverted, False otherwise.
 
is_trainable()
Return True if the node can be trained, False otherwise.
Instance Variables
  avg
Mean of the input data (available after training).
  d
Variance corresponding to the PCA components (eigenvalues of the covariance matrix).
  explained_variance
When output_dim has been specified as a fraction of the total variance, this is the fraction of the total variance that is actually explained.
  v
Transpose of the projection matrix (available after training).
Properties

Inherited from object: __class__

    Inherited from Node
  _train_seq
List of tuples:
  dtype
dtype
  input_dim
Input dimensions
  output_dim
Output dimensions
  supported_dtypes
Supported dtypes
Method Details

__init__(self, input_dim=None, output_dim=None, dtype=None, svd=False, reduce=False, var_rel=1e-12, var_abs=1e-15, var_part=None)
(Constructor)

 

Initializes an object of type 'PCANode'.

The number of principal components to be kept can be specified as 'output_dim' directly (e.g. 'output_dim=10' means 10 components are kept) or by the fraction of variance to be explained (e.g. 'output_dim=0.95' means that as many components as necessary will be kept in order to explain 95% of the input variance).
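The fractional form of output_dim can be illustrated with a short sketch. It assumes the eigenvalues of the covariance matrix (the variances of the principal components) are already sorted in descending order, and keeps the smallest number of components whose cumulative variance reaches the requested fraction. The helper name `components_for_fraction` is hypothetical and not part of MDP.

```python
def components_for_fraction(eigenvalues, fraction):
    """Smallest k such that the first k eigenvalues explain >= fraction of the variance.

    Hypothetical helper; assumes eigenvalues are sorted in descending order.
    """
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, value in enumerate(eigenvalues, start=1):
        cumulative += value
        if cumulative / total >= fraction:
            return k
    return len(eigenvalues)


# Variances of four principal components (descending); their total is 10.0.
eigenvalues = [6.0, 3.0, 0.75, 0.25]
print(components_for_fraction(eigenvalues, 0.95))  # 3 (explains 97.5%)
```

With output_dim=0.95, two components (90%) are not enough, so three are kept; this is also why the explained variance typically overshoots the requested fraction slightly.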

Note: When the *reduce* switch is enabled, the actual number of principal components (self.output_dim) may differ from the number set when the instance was created.
Parameters:
  • input_dim (int) - Dimensionality of the input. Default is None.
  • output_dim (int or float) - Dimensionality of the output, or the fraction of the input variance to be explained (see above). Default is None.
  • dtype (numpy.dtype, str) - Datatype of the input. Default is None.
  • svd (bool) - If True, use Singular Value Decomposition instead of the standard eigenvalue problem solver. Use it when PCANode complains about singular covariance matrices. Default is False.
  • reduce (bool) - Keep only those principal components which have a variance larger than 'var_abs' and a variance relative to the first principal component larger than 'var_rel' and a variance relative to total variance larger than 'var_part' (set var_part to None or 0 for no filtering). Default is False.
  • var_rel (float) - Variance relative to first principal component threshold. Default is 1E-12.
  • var_abs (float) - Absolute variance threshold. Default is 1E-15.
  • var_part (float) - Variance relative to total variance threshold. Default is None.
Overrides: object.__init__
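The three thresholds applied by the reduce switch can be sketched as a filter over the component variances. This is a plain-Python illustration, not MDP's implementation; the helper name `reduce_components` is hypothetical, and it assumes the eigenvalues are sorted in descending order so that the first entry is the variance of the first principal component. As described for var_part, None or 0 disables the relative-to-total filter.

```python
def reduce_components(eigenvalues, var_rel=1e-12, var_abs=1e-15, var_part=None):
    """Return the eigenvalues that survive the 'reduce' thresholds.

    Hypothetical helper; assumes eigenvalues are sorted in descending order,
    so eigenvalues[0] is the variance of the first principal component.
    """
    first = eigenvalues[0]
    total = sum(eigenvalues)
    kept = []
    for value in eigenvalues:
        if value <= var_abs:
            continue  # absolute variance too small
        if value / first <= var_rel:
            continue  # negligible relative to the first component
        if var_part and value / total <= var_part:
            continue  # negligible fraction of the total variance (None/0 skips this)
        kept.append(value)
    return kept


eigenvalues = [5.0, 4.0, 1.0, 1e-14]
print(reduce_components(eigenvalues))                # [5.0, 4.0, 1.0]
print(reduce_components(eigenvalues, var_part=0.2))  # [5.0, 4.0]
```

The last component passes var_abs but fails var_rel (1e-14 / 5.0 is below 1e-12), which is the typical case of a numerically degenerate direction being dropped.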

_adjust_output_dim(self)

 
This function is used if the output dimension is smaller than the input dimension (so only the eigenvectors with the largest eigenvalues have to be kept). If required, it sets the output dimension.
Returns: tuple
The eigenvector range.
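Some symmetric eigensolvers can compute only a subset of the eigenvalues. The sketch below shows one way such an "eigenvector range" could be derived; it assumes (this page does not say so) a 1-based, inclusive index range over eigenvalues in ascending order, so the largest ones sit at the end. The helper name `eigenvector_range` is hypothetical.

```python
def eigenvector_range(input_dim, output_dim):
    """1-based inclusive (lo, hi) indices of the output_dim largest eigenvalues.

    Hypothetical helper; assumes eigenvalues are indexed 1..input_dim in
    ascending order. Returns None when the full spectrum is needed.
    """
    if output_dim is None or output_dim < 1 or output_dim >= input_dim:
        # No count given, a variance fraction given (all eigenvalues are
        # needed to compute the total), or all components kept anyway.
        return None
    return (input_dim - output_dim + 1, input_dim)


print(eigenvector_range(10, 3))     # (8, 10)
print(eigenvector_range(10, 0.95))  # None
```

Restricting the range only pays off when output_dim is a fixed count; a fractional output_dim needs every eigenvalue to know the total variance.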

_check_output(self, y)

 
Overrides: Node._check_output

_execute(self, x, n=None)

 

Project the input on the first 'n' principal components.

If 'n' is not set, use all available components.

Parameters:
  • x (numpy.ndarray) - Input with at least 'n' principal components.
  • n (int) - Number of first principal components.
Returns: numpy.ndarray
The projected input.
Overrides: Node._execute
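A plain-Python sketch of the projection, assuming (consistent with the descriptions of the instance variables avg and v, but not confirmed by this page) that the input is centred with avg and multiplied by the first n columns of v, i.e. y = (x - avg) · v[:, :n]. The function name `project` is hypothetical; lists of rows stand in for numpy arrays.

```python
def project(x, avg, v, n=None):
    """Project rows of x onto the first n principal components.

    Sketch of y = (x - avg) @ v[:, :n], where the columns of v are the
    principal directions (v is the transposed projection matrix).
    """
    n = len(v[0]) if n is None else n  # default: all available components
    result = []
    for row in x:
        centred = [xi - mi for xi, mi in zip(row, avg)]
        result.append([sum(c * v[j][k] for j, c in enumerate(centred))
                       for k in range(n)])
    return result


avg = [1.0, 1.0]
v = [[0.0, 1.0],   # column 0 (first component) points along the second axis,
     [1.0, 0.0]]   # column 1 (second component) along the first axis
print(project([[3.0, 5.0]], avg, v))       # [[4.0, 2.0]]
print(project([[3.0, 5.0]], avg, v, n=1))  # [[4.0]]
```

Passing n=1 simply truncates the output to the first component, which matches the "first 'n' principal components" wording above.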

_inverse(self, y, n=None)

 

Project data from the output to the input space using the first 'n' components.

If 'n' is not set, use all available components.

Parameters:
  • y (numpy.ndarray) - Data to be projected to the input space.
  • n (int) - Number of first principal components.
Returns: numpy.ndarray
The projected data.
Overrides: Node._inverse
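The inverse mapping can be sketched the same way, assuming it undoes the projection as x = y[:, :n] · v[:, :n]ᵀ + avg (an assumption based on the avg and v descriptions, not a quote of MDP's code). The function name `reconstruct` is hypothetical.

```python
def reconstruct(y, avg, v, n=None):
    """Map rows of y back to the input space using the first n components.

    Sketch of x = y[:, :n] @ v[:, :n].T + avg.
    """
    n = len(y[0]) if n is None else n  # default: all available components
    dim = len(v)
    return [[sum(row[k] * v[j][k] for k in range(n)) + avg[j]
             for j in range(dim)]
            for row in y]


avg = [1.0, 1.0]
v = [[0.0, 1.0],
     [1.0, 0.0]]
print(reconstruct([[4.0, 2.0]], avg, v))       # [[3.0, 5.0]]
print(reconstruct([[4.0, 2.0]], avg, v, n=1))  # [[1.0, 5.0]]
```

With all components the original point is recovered exactly; with n=1 the information carried by the second component is lost and that coordinate falls back to the mean.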

_set_output_dim(self, n)

 
Overrides: Node._set_output_dim

_stop_training(self, debug=False)

 
Stop the training phase.
Parameters:
  • debug (bool) - If True and stop_training fails because of singular covariance matrices, the matrices are kept in self.cov_mtx and self.dcov_mtx so that they can be examined. Default is False.
Raises:
  • mdp.NodeException - If negative eigenvalues occur (the covariance matrix may be singular) or if no component has a variance exceeding var_abs.
Overrides: Node._stop_training
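The two failure modes listed under Raises can be sketched as a check on the eigenvalues of the covariance matrix. `NodeError` and `check_eigenvalues` are hypothetical stand-ins used only for illustration; the real node raises mdp.NodeException, and its exact checks may differ.

```python
class NodeError(Exception):
    """Hypothetical stand-in for mdp.NodeException in this sketch."""


def check_eigenvalues(eigenvalues, var_abs=1e-15):
    """Raise NodeError if the covariance eigenvalues look unusable."""
    if min(eigenvalues) < 0:
        # A covariance matrix is positive semi-definite, so negative
        # eigenvalues point to a (numerically) singular matrix.
        raise NodeError("negative eigenvalues: the covariance matrix "
                        "may be singular (try svd=True)")
    if not any(value > var_abs for value in eigenvalues):
        raise NodeError("no component has a variance exceeding var_abs")


check_eigenvalues([2.0, 0.5])  # passes silently
try:
    check_eigenvalues([1.0, -1e-9])
except NodeError as err:
    print(err)  # prints the negative-eigenvalue message
```

The svd hint in the message follows the svd parameter description: Singular Value Decomposition is the suggested fallback when the covariance matrix is singular.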

_train(self, x)

 
Update the covariance matrix.
Parameters:
  • x (numpy.ndarray) - The training data.
Overrides: Node._train
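Because train can be called repeatedly on successive chunks of data, the covariance matrix has to be accumulated incrementally. The sketch below shows a standard scheme (sample count, sum of samples, sum of outer products, normalised only when training stops); it is an illustration, not necessarily MDP's exact code, and the class name `RunningCovariance` is hypothetical.

```python
class RunningCovariance:
    """Accumulate the covariance matrix over successive chunks of data.

    Hypothetical sketch: keep the sample count, the sum of the samples and
    the sum of their outer products; normalise only when training stops.
    """

    def __init__(self, dim):
        self.n = 0
        self.sum = [0.0] * dim
        self.outer = [[0.0] * dim for _ in range(dim)]

    def update(self, x):
        """Add a chunk x, given as a list of rows (one row per sample)."""
        for row in x:
            self.n += 1
            for i, xi in enumerate(row):
                self.sum[i] += xi
                for j, xj in enumerate(row):
                    self.outer[i][j] += xi * xj

    def covariance(self):
        """Return (mean, unbiased covariance) of all samples seen so far."""
        mean = [s / self.n for s in self.sum]
        dim = len(mean)
        cov = [[(self.outer[i][j] - self.n * mean[i] * mean[j]) / (self.n - 1)
                for j in range(dim)] for i in range(dim)]
        return mean, cov


acc = RunningCovariance(2)
acc.update([[1.0, 2.0], [3.0, 6.0]])  # first chunk
acc.update([[5.0, 10.0]])             # second chunk
mean, cov = acc.covariance()
print(mean)  # [3.0, 6.0]
print(cov)   # [[4.0, 8.0], [8.0, 16.0]]
```

Only the running sums depend on the data, so the result is the same however the samples are split into chunks.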

execute(self, x, n=None)

 

Project the input on the first 'n' principal components.

If 'n' is not set, use all available components.

Parameters:
  • x (numpy.ndarray) - Input with at least 'n' principal components.
  • n (int) - Number of first principal components.
Returns: numpy.ndarray
The projected input.
Overrides: Node.execute

get_explained_variance(self)

 

The explained variance is the fraction of the original variance that can be explained by self._output_dim PCA components. If, for example, output_dim has been set to 0.95, the explained variance could be something like 0.958...

Note: If output_dim was explicitly set to be a fixed number of components, there is no way to calculate the explained variance.
Returns: float
The explained variance.
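The explained variance is the sum of the kept eigenvalues divided by the sum of all eigenvalues. A minimal sketch, assuming the full list of eigenvalues is available; the helper name `explained_variance` is hypothetical.

```python
def explained_variance(eigenvalues, k):
    """Fraction of the total variance captured by the k largest components.

    Hypothetical helper; assumes all eigenvalues are available.
    """
    ranked = sorted(eigenvalues, reverse=True)
    return sum(ranked[:k]) / sum(ranked)


print(explained_variance([0.25, 6.0, 0.75, 3.0], 3))  # 0.975
```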

get_projmatrix(self, transposed=1)

 
Returns the projection matrix.
Parameters:
  • transposed (bool) - Determines whether the transposed projection matrix is returned. Default is True.
Returns: numpy.ndarray
The projection matrix.

get_recmatrix(self, transposed=1)

 
Returns the back-projection matrix (i.e. the reconstruction matrix).
Parameters:
  • transposed (bool) - Determines whether the transposed back-projection matrix (i.e. the reconstruction matrix) is returned. Default is True.
Returns: numpy.ndarray
The back-projection matrix (i.e. the reconstruction matrix).
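Because the principal directions are orthonormal eigenvectors of a symmetric matrix, the reconstruction matrix is simply the transpose of the projection matrix. The sketch assumes, as the description of the instance variable v suggests, that get_projmatrix() returns v and get_recmatrix() returns its transpose; the 2-D rotation used for v is made-up example data.

```python
import math


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


# Columns of v: two orthonormal principal directions (a 45-degree rotation).
s = 1 / math.sqrt(2)
v = [[s, -s],
     [s, s]]
proj = v                      # assumed analogue of get_projmatrix(transposed=True)
rec = transpose(v)            # assumed analogue of get_recmatrix(transposed=True)
identity = matmul(rec, proj)  # v.T @ v
print([[round(x, 12) for x in row] for row in identity])  # [[1.0, 0.0], [0.0, 1.0]]
```

Since v.T @ v is the identity, projecting with all components and then reconstructing returns the centred input unchanged.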

inverse(self, y, n=None)

 

Project data from the output to the input space using the first 'n' components.

If 'n' is not set, use all available components.

Parameters:
  • y (numpy.ndarray) - Data to be projected to the input space.
  • n (int) - Number of first principal components.
Returns: numpy.ndarray
The projected data.
Overrides: Node.inverse

stop_training(self, debug=False)

 
Stop the training phase.
Parameters:
  • debug (bool) - If True and stop_training fails because of singular covariance matrices, the matrices are kept in self.cov_mtx and self.dcov_mtx so that they can be examined. Default is False.
Raises:
  • mdp.NodeException - If negative eigenvalues occur (the covariance matrix may be singular) or if no component has a variance exceeding var_abs.
Overrides: Node.stop_training

train(self, x)

 
Update the covariance matrix.
Parameters:
  • x (numpy.ndarray) - The training data.
Overrides: Node.train

Instance Variable Details

avg

Mean of the input data (available after training).

d

Variance corresponding to the PCA components (eigenvalues of the covariance matrix).

explained_variance

When output_dim has been specified as a fraction of the total variance, this is the fraction of the total variance that is actually explained.

v

Transpose of the projection matrix (available after training).