Additional utilities
MDP offers some additional utilities of general interest in the mdp.utils module. Refer to the API reference for the full documentation and interface description.
- mdp.utils.CovarianceMatrix
This class stores an empirical covariance matrix that can be updated incrementally. A call to the fix method returns the current state of the covariance matrix, the average and the number of observations, and resets the internal data.
Note that the internal sum is a standard __add__ operation: we do not use any of the more sophisticated summation algorithms that avoid round-off errors when adding many numbers. If you want to contribute a CovarianceMatrix class that uses such algorithms, we would be happy to include it in MDP. For a start, see the Python recipe by Raymond Hettinger. For a review of floating-point arithmetic and its pitfalls, see What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg, ACM Computing Surveys, Vol. 23, No. 1, March 1991.
- mdp.utils.VartimeCovarianceMatrix
This class stores an empirical covariance matrix that can be updated incrementally. A call to the fix method returns the current state of the covariance matrix, the average and the number of observations, and resets the internal data.
Compared to the CovarianceMatrix class, this class accepts sampled input together with a non-constant time increment between samples. The covariance matrix is then computed as a (centered) scalar product between unevenly sampled functions, using the trapezoid rule.
- mdp.utils.DelayCovarianceMatrix
This class stores an empirical covariance matrix between a signal and its time-delayed version, which can be updated incrementally.
- mdp.utils.MultipleCovarianceMatrices
Container class for multiple covariance matrices that makes it easy to execute operations on all of them at the same time.
- mdp.utils.dig_node (node)
Recursively crawl an MDP Node looking for arrays. Return (dictionary, string), where the dictionary is {attribute_name: (size_in_bytes, array_reference)} and the string is a readable representation of it.
- mdp.utils.get_node_size (node)
Get the total byte size of node using cPickle with protocol=2. (The byte size is related to the memory needed by the node.)
- mdp.utils.progressinfo (sequence, length, style, custom)
A fully configurable text-mode progress info box tailored to command-line die-hards. To get a progress info box for your loops, use it like this:
>>> for i in progressinfo(sequence):
...     do_something(i)
You can also use it with generators, files or any other iterable object, but in this case you have to specify the total length of the sequence:
>>> for line in progressinfo(open_file, nlines):
...     do_something(line)
A few examples of the available layouts:
[===================================73%==============>...................]
Progress: 67%[======================================> ]
23% [02:01:28] - [00:12:37]
- mdp.utils.QuadraticForm
Define an inhomogeneous quadratic form as 1/2 x'Hx + f'x + c. This class implements the quadratic form analysis methods presented in: Berkes, P. and Wiskott, L. On the analysis and interpretation of inhomogeneous quadratic forms as receptive fields. Neural Computation, 18(8): 1868-1895 (2006).
- mdp.utils.refcast (array, dtype)
Cast the array to dtype only if necessary; otherwise return a reference.
- mdp.utils.rotate (mat, angle, columns, units)
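Rotate in place an NxM data matrix, with observations stored on rows, in the plane defined by the two data dimensions given in columns. Observations are rotated counterclockwise. For each data point, this corresponds to the following matrix multiplication, where i and j are the indices given in columns (unchanged elements omitted):
$$\begin{bmatrix} x_i' \\ x_j' \end{bmatrix} = \begin{bmatrix} \cos(\mathrm{angle}) & -\sin(\mathrm{angle}) \\ \sin(\mathrm{angle}) & \cos(\mathrm{angle}) \end{bmatrix} \begin{bmatrix} x_i \\ x_j \end{bmatrix}$$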
- mdp.utils.random_rot (dim, dtype)
Return a random rotation matrix, drawn from the Haar distribution (the only uniform distribution on SO(n)). The algorithm is described in the paper Stewart, G.W., The efficient generation of random orthogonal matrices with an application to condition estimators, SIAM Journal on Numerical Analysis, 17(3), pp. 403-409, 1980. For more information, see this Wikipedia entry.
- mdp.utils.symeig_semidefinite_ldl (A, B, eigenvectors, turbo, rng, type, overwrite, rank_threshold, dfc_out)
LDL-based routine to solve generalized symmetric positive semidefinite eigenvalue problems. This can be used when the normal symeig() call in _stop_training() throws SymeigException('Covariance matrices may be singular'). This solver uses SciPy's raw LAPACK interface to access the LDL decomposition.
It is roughly as efficient as ordinary eigenvalue solving and can exploit the range parameter for performance just as well as the backend for ordinary symmetric eigenvalue solving allows. This is the recommended and most efficient approach, but it requires SciPy 1.0 or newer.
- mdp.utils.symeig_semidefinite_pca (A, B, eigenvectors, turbo, range, type, overwrite, rank_threshold, dfc_out)
PCA-based routine to solve generalized symmetric positive semidefinite eigenvalue problems. This can be used when the normal symeig() call in _stop_training() throws SymeigException('Covariance matrices may be singular'). It applies PCA to B and filters out the rank deficit before it applies symeig() to A.
This is one of the most stable and accurate approaches. It is roughly twice as expensive as ordinary symmetric eigenvalue solving, because it solves two symmetric eigenvalue problems; only the second one can exploit the range parameter for performance.
- mdp.utils.symeig_semidefinite_reg (A, B, eigenvectors, turbo, range, type, overwrite, rank_threshold, dfc_out)
Regularization-based routine to solve generalized symmetric positive semidefinite eigenvalue problems. This can be used when the normal symeig() call in _stop_training() throws SymeigException('Covariance matrices may be singular'). This solver applies a moderate regularization to B before applying eigh/symeig. Afterwards, it detects the rank deficit and filters out malformed features.
If no range is given, this procedure is approximately as efficient as the ordinary eigh implementation, because all additional steps are computationally cheap. If a range is given, this method can be much slower than an ordinary symmetric eigenvalue solver that can exploit the range for performance (depending on that backend); in this case the LDL method should be preferred.
- mdp.utils.symeig_semidefinite_svd (A, B, eigenvectors, turbo, range, type, overwrite, rank_threshold, dfc_out)
SVD-based routine to solve generalized symmetric positive semidefinite eigenvalue problems. This can be used when the normal symeig() call in _stop_training() throws SymeigException('Covariance matrices may be singular'). This is one of the most stable and accurate approaches. It involves solving two SVD problems, so its computational cost can vary greatly depending on the backends used; e.g. the SVD from SciPy appears to be much faster than the SVD from NumPy. Depending on the backend, it can therefore be faster or slower than the PCA-based approach.
- mdp.utils.symrand (dim_or_eigv, dtype)
Return a random symmetric (Hermitian) matrix with eigenvalues uniformly distributed on (0,1].
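The update-and-fix pattern described for mdp.utils.CovarianceMatrix can be sketched in plain Python as follows. This is a minimal illustration, not MDP's implementation: the class name RunningCovariance is hypothetical, and normalizing by n - 1 is an assumption of the sketch. Like the class described above, it uses plain sums, so the one-pass formula can lose precision when the mean is large compared to the spread of the data.

```python
class RunningCovariance:
    """Hypothetical sketch of an incrementally updated covariance matrix.

    Mirrors the update/fix idea of mdp.utils.CovarianceMatrix but is not
    MDP's code. Uses plain (uncompensated) sums; normalizing by n - 1
    is an assumption of this sketch.
    """

    def __init__(self, dim):
        self.dim = dim
        self._reset()

    def _reset(self):
        self.n = 0
        self.sum_x = [0.0] * self.dim
        self.sum_xx = [[0.0] * self.dim for _ in range(self.dim)]

    def update(self, rows):
        """Accumulate a chunk of observations (one sequence of length dim per row)."""
        for row in rows:
            self.n += 1
            for i in range(self.dim):
                self.sum_x[i] += row[i]
                for j in range(self.dim):
                    self.sum_xx[i][j] += row[i] * row[j]

    def fix(self):
        """Return (covariance, average, n) and reset the accumulated data."""
        n = self.n
        if n < 2:
            raise ValueError("need at least two observations")
        avg = [s / n for s in self.sum_x]
        # One-pass formula: (sum of x x' - n * avg avg') / (n - 1).
        cov = [[(self.sum_xx[i][j] - n * avg[i] * avg[j]) / (n - 1)
                for j in range(self.dim)]
               for i in range(self.dim)]
        self._reset()
        return cov, avg, n


acc = RunningCovariance(2)
acc.update([[1.0, 2.0], [3.0, 4.0]])  # first chunk
acc.update([[5.0, 7.0]])              # second chunk
cov, avg, n = acc.fix()               # acc is empty again afterwards
```

Because the data arrive in chunks, only the running sums are kept in memory; the result is the same as computing the covariance of all rows at once.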
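The round-off concern mentioned for CovarianceMatrix is easy to demonstrate. Python's standard library ships an accurate summation routine, math.fsum, which avoids loss of precision by tracking multiple intermediate partial sums. The snippet below compares it with a plain sum of ten copies of 0.1.

```python
import math

values = [0.1] * 10

naive = sum(values)           # ten plain floating-point additions
accurate = math.fsum(values)  # tracks multiple intermediate partial sums

print(naive)     # 0.9999999999999999
print(accurate)  # 1.0
```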
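For mdp.utils.VartimeCovarianceMatrix, the following sketch shows what a centered scalar product of an unevenly sampled signal computed with the trapezoid rule can look like. It is a hypothetical illustration, not MDP's code: the function name trapezoid_covariance is made up, and normalizing the integrals by the total duration is an assumption.

```python
def trapezoid_covariance(times, samples):
    """Hypothetical sketch: time-weighted mean and covariance of an
    unevenly sampled signal, integrated with the trapezoid rule.

    times   -- strictly increasing sample times t_0 < ... < t_{N-1}
    samples -- one observation (sequence of length dim) per time
    Dividing by the total duration is an assumption of this sketch.
    """
    if len(times) != len(samples) or len(times) < 2:
        raise ValueError("need matching times/samples, at least two of each")
    dim = len(samples[0])
    duration = times[-1] - times[0]

    def integrate(values):
        # Trapezoid rule over the (possibly uneven) time grid.
        return sum((times[k + 1] - times[k]) * (values[k] + values[k + 1]) / 2
                   for k in range(len(times) - 1))

    mean = [integrate([x[i] for x in samples]) / duration for i in range(dim)]
    cov = [[integrate([(x[i] - mean[i]) * (x[j] - mean[j]) for x in samples])
            / duration
            for j in range(dim)]
           for i in range(dim)]
    return cov, mean


# x(t) = t sampled at the uneven times 0, 1 and 3.
cov, mean = trapezoid_covariance([0.0, 1.0, 3.0], [[0.0], [1.0], [3.0]])
```

Each interval contributes in proportion to its length, which is what distinguishes this from averaging the samples with equal weights.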
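The basic mechanism behind a text-mode progress display such as mdp.utils.progressinfo is a generator that yields each item and redraws a bar on the same terminal line. The sketch below is hypothetical: simple_progress and its width and stream parameters are not part of MDP, and it supports none of the configurable layouts. Like progressinfo, it needs an explicit length for iterables without len().

```python
import sys


def simple_progress(iterable, length=None, width=30, stream=None):
    """Hypothetical sketch of a text-mode progress bar around a loop.

    Not MDP's progressinfo. Iterables without len() (generators, files)
    need an explicit length; the check happens when iteration starts.
    """
    stream = sys.stderr if stream is None else stream
    if length is None:
        try:
            length = len(iterable)
        except TypeError:
            raise TypeError("iterable has no len(); pass length explicitly") from None
    for count, item in enumerate(iterable, start=1):
        yield item
        # Runs after the loop body has processed item.
        # Fraction done, clamped in case length was underestimated.
        fraction = min(count / length, 1.0) if length else 1.0
        filled = int(round(width * fraction))
        bar = "=" * filled + "." * (width - filled)
        # "\r" returns to the start of the line so the bar is redrawn in place.
        stream.write("\r[%s] %3d%%" % (bar, round(100 * fraction)))
        stream.flush()
    stream.write("\n")


for i in simple_progress(range(5)):
    pass  # do_something(i)
```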
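As a concrete reading of the formula 1/2 x'Hx + f'x + c used by mdp.utils.QuadraticForm, the following plain-Python function evaluates the form and its gradient Hx + f at a point. It is a hypothetical helper, not MDP's API, and the gradient formula assumes that H is symmetric.

```python
def quadratic_form(H, f, c, x):
    """Evaluate q(x) = 1/2 x'Hx + f'x + c and its gradient Hx + f.

    Hypothetical helper for illustration, not MDP's QuadraticForm API.
    The gradient formula assumes H is symmetric.
    """
    n = len(x)
    Hx = [sum(H[i][k] * x[k] for k in range(n)) for i in range(n)]
    value = (0.5 * sum(x[i] * Hx[i] for i in range(n))
             + sum(f[i] * x[i] for i in range(n))
             + c)
    gradient = [Hx[i] + f[i] for i in range(n)]
    return value, gradient


H = [[2.0, 0.0], [0.0, 4.0]]
f = [1.0, -1.0]
print(quadratic_form(H, f, 3.0, [1.0, 2.0]))  # (11.0, [3.0, 7.0])
```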
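The in-place rotation performed by mdp.utils.rotate can be sketched on a list-of-lists data matrix. This is a hypothetical pure-Python illustration, not MDP's implementation; the function name rotate_rows and the accepted units strings "radians" and "degrees" are assumptions of the sketch.

```python
import math


def rotate_rows(mat, angle, columns=(0, 1), units="radians"):
    """Hypothetical sketch: rotate, in place, every observation (row) of mat
    counterclockwise in the plane spanned by the two given columns."""
    if units == "degrees":
        angle = math.radians(angle)
    elif units != "radians":
        raise ValueError("units must be 'radians' or 'degrees'")
    i, j = columns
    c, s = math.cos(angle), math.sin(angle)
    for row in mat:
        x_i, x_j = row[i], row[j]
        # [x_i', x_j'] = [[c, -s], [s, c]] @ [x_i, x_j]; other entries unchanged.
        row[i] = c * x_i - s * x_j
        row[j] = s * x_i + c * x_j


data = [[1.0, 0.0, 5.0],
        [0.0, 2.0, 5.0]]
rotate_rows(data, 90, columns=(0, 1), units="degrees")
# data is now approximately [[0.0, 1.0, 5.0], [-2.0, 0.0, 5.0]]
```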
HTML Slideshows
The mdp.utils module contains some classes and helper functions to display animated results in a web browser. This works by creating an HTML file with embedded JavaScript code, which dynamically loads image files (the images contain the content that you want to animate and can, for example, be created with matplotlib).
MDP internally uses the open-source Templet templating library, written by David Bau.
The easiest way to create a slideshow is to use one of these two helper functions:
- mdp.utils.show_image_slideshow (filenames, image_size, filename=None, title=None, **kwargs)
Write the slideshow into an HTML file, open it in the browser and return the file name. filenames is a list of the image files that you want to display in the slideshow. image_size is a 2-tuple containing the width and height at which the images should be displayed. There are also a couple of additional arguments, which are documented in the docstring.
- mdp.utils.image_slideshow (filenames, image_size, title=None, **kwargs)
This function is similar to show_image_slideshow, but it simply returns the slideshow HTML code (including the JavaScript code), which you can then embed into your own HTML file. Note that the default slideshow CSS code is not included, but it can be accessed in mdp.utils.IMAGE_SLIDESHOW_STYLE.
Note that there are also two demos in the Slideshow entry of the Examples section.
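To illustrate the mechanism described above (an HTML file with embedded JavaScript that dynamically loads image files), here is a minimal hypothetical sketch. It does not use MDP's own templates, and the function name minimal_slideshow_html and its delay_ms parameter are made up for illustration.

```python
import html
import json


def minimal_slideshow_html(filenames, image_size, title="Slideshow", delay_ms=500):
    """Hypothetical sketch: build an HTML page whose embedded JavaScript
    cycles through image files. Not MDP's template; names are illustrative."""
    width, height = image_size
    files_js = json.dumps(list(filenames))  # JSON list doubles as a JS array literal
    return f"""<!DOCTYPE html>
<html>
<head><title>{html.escape(title)}</title></head>
<body>
<h1>{html.escape(title)}</h1>
<img id="slide" src="" width="{width}" height="{height}">
<script>
var files = {files_js};
var index = 0;
function show() {{
  document.getElementById("slide").src = files[index];
  index = (index + 1) % files.length;
}}
show();
setInterval(show, {delay_ms});
</script>
</body>
</html>
"""


page = minimal_slideshow_html(["frame0.png", "frame1.png"], (400, 300), title="Demo")
```

To mimic show_image_slideshow, you could write the returned string to a file and open it with the standard library's webbrowser.open; that step is left out here.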
Graph module
MDP contains mdp.graph, a lightweight package to handle directed graphs.
- mdp.graph.Graph
Represent a directed graph. This class contains several methods to create graph structures and manipulate them, among which:
add_tree: Add a tree to the graph. The tree is specified with a nested list of tuples, in a LISP-like notation. The values specified in the list become the values of the single nodes. Return an equivalent nested list with the nodes instead of the values.
Example:
>>> import mdp
>>> g = mdp.graph.Graph()
>>> a = b = c = d = e = None
>>> nodes = g.add_tree((a, b, (c, d, e)))
Graph g corresponds to this tree, with all node values being None:
    a
   / \
  b   c
     / \
    d   e
topological_sort: Perform a topological sort of the nodes.
dfs, undirected_dfs: Return the nodes in depth-first order.
bfs, undirected_bfs: Return the nodes in breadth-first order.
connected_components: Return a list of lists containing the nodes of all connected components of the graph.
is_weakly_connected: Return True if the graph is weakly connected.
- mdp.graph.GraphEdge
Represent a graph edge and all information attached to it.
- mdp.graph.GraphNode
Represent a graph node and all information attached to it.
- mdp.graph.recursive_map (fun, seq)
Apply a function recursively on a sequence and all subsequences.
- mdp.graph.recursive_reduce (func, seq, *argv)
Apply reduce(func, seq) recursively to a sequence and all its subsequences.
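The LISP-like notation accepted by Graph.add_tree can be made concrete with a small hypothetical parser that turns a nested tuple into parent-child edges. In each tuple, the first element is a node and the remaining elements are its children; a nested tuple is a child with children of its own. This is not MDP's code. Unlike add_tree, which creates a separate node for each value (the example above uses None for every node), this sketch identifies nodes by their values, so the values must be distinct.

```python
def tree_to_edges(tree):
    """Hypothetical sketch: LISP-like nested tuple -> list of (parent, child) edges.

    In each tuple the first element is a node and the remaining elements are
    its children; a nested tuple is a child that has children of its own.
    Nodes are identified by their values, so values must be distinct.
    """
    edges = []

    def visit(subtree):
        # Return the node at the root of subtree, recording the edges below it.
        if isinstance(subtree, tuple):
            root = subtree[0]
            for child in subtree[1:]:
                edges.append((root, visit(child)))
            return root
        return subtree

    visit(tree)
    return edges


# Same shape as the add_tree example above, with distinct labels.
print(tree_to_edges(("a", "b", ("c", "d", "e"))))
# [('a', 'b'), ('c', 'd'), ('c', 'e'), ('a', 'c')]
```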
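The traversal methods listed for mdp.graph.Graph (topological_sort, dfs, bfs) correspond to standard graph algorithms. The following self-contained sketch implements them on an adjacency dictionary, using Kahn's algorithm for the topological sort. It is a hypothetical illustration; build_adjacency and the function signatures are made up and do not reproduce MDP's methods or return types.

```python
from collections import deque


def build_adjacency(edges):
    """Map each node to the list of its successors, in insertion order."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, [])
    return adj


def topological_sort(adj):
    """Kahn's algorithm: repeatedly emit nodes with no remaining incoming edges."""
    indegree = {node: 0 for node in adj}
    for successors in adj.values():
        for v in successors:
            indegree[v] += 1
    ready = deque(node for node in adj if indegree[node] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for v in adj[node]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    if len(order) != len(adj):
        raise ValueError("graph contains a cycle")
    return order


def dfs(adj, root):
    """Nodes reachable from root, in depth-first (preorder) order."""
    order, seen, stack = [], set(), [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push successors reversed so the first successor is visited first.
        stack.extend(reversed(adj[node]))
    return order


def bfs(adj, root):
    """Nodes reachable from root, in breadth-first order."""
    order, seen, queue = [], {root}, deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for v in adj[node]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return order


adj = build_adjacency([("a", "b"), ("a", "c"), ("b", "d"), ("c", "e")])
print(dfs(adj, "a"))  # ['a', 'b', 'd', 'c', 'e']
print(bfs(adj, "a"))  # ['a', 'b', 'c', 'd', 'e']
```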
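One plausible reading of recursive_map and recursive_reduce is sketched below: nested lists and tuples are treated as subsequences and processed recursively. This is a hypothetical re-implementation for illustration, with the made-up names nested_map and nested_reduce; the exact behavior and return types of MDP's functions (for example, whether recursive_map returns a list) may differ.

```python
import operator
from functools import reduce


def nested_map(func, seq):
    """Hypothetical sketch: apply func to every leaf of a nested list/tuple,
    keeping the nesting (and the list/tuple type) of each subsequence."""
    result = [nested_map(func, x) if isinstance(x, (list, tuple)) else func(x)
              for x in seq]
    return tuple(result) if isinstance(seq, tuple) else result


def nested_reduce(func, seq, *initial):
    """Hypothetical sketch: reduce(func, seq), where each nested list/tuple is
    first reduced recursively to a single value. initial applies at the top level."""
    values = [nested_reduce(func, x) if isinstance(x, (list, tuple)) else x
              for x in seq]
    return reduce(func, values, *initial)


print(nested_map(lambda v: v * 10, [1, [2, 3], (4,)]))   # [10, [20, 30], (40,)]
print(nested_reduce(operator.add, [1, [2, 3], (4, [5])]))  # 15
```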