Package mdp :: Class OnlineFlow

Class OnlineFlow


An 'OnlineFlow' is a sequence of nodes that are trained online and executed
together to form a more complex algorithm.  Input data is sent to the
first node and is successively processed by the subsequent nodes along
the sequence.

Using an online flow as opposed to handling a set of nodes manually has a
clear advantage: the general online flow implementation automates the
training (including supervised training and multiple training phases),
execution, and inverse execution (if defined) of the whole sequence.

To understand the compatible node sequences for an OnlineFlow, the following terminology is useful:
   A "trainable" node: node.is_trainable() returns True, node.is_training() returns True.
   A "trained" node: node.is_trainable() returns True, node.is_training() returns False.
   A "non-trainable" node: node.is_trainable() returns False, node.is_training() returns False.

An OnlineFlow node sequence can contain
(a) only OnlineNodes
    (e.g. [OnlineCenteringNode(), IncSFANode()]),
or
(b) a mix of OnlineNodes and trained/non-trainable Nodes
    (e.g. [a fully trained PCANode, IncSFANode()] or [QuadraticExpansionNode(), IncSFANode()]),
or
(c) a mix of OnlineNodes/trained/non-trainable Nodes and a terminal trainable Node (but not an
    OnlineNode) whose training has not finished
    (e.g. [IncSFANode(), QuadraticExpansionNode(), a partially trained or untrained SFANode]).
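
For illustration, here is a minimal constructor sketch covering the three cases. The toy data,
the random seed, and output_dim are assumptions; the node choices mirror the examples above:

    import numpy as np
    import mdp

    x = np.random.RandomState(0).rand(500, 10)    # toy data: 500 samples, 10 dimensions

    # (a) only OnlineNodes
    flow_a = mdp.OnlineFlow([mdp.nodes.OnlineCenteringNode(), mdp.nodes.IncSFANode()])

    # (b) a fully trained Node (or a non-trainable one) followed by an OnlineNode
    pca = mdp.nodes.PCANode(output_dim=5)
    pca.train(x)
    pca.stop_training()
    flow_b = mdp.OnlineFlow([pca, mdp.nodes.IncSFANode()])

    # (c) OnlineNodes/non-trainable Nodes plus a terminal trainable (non-online) Node
    flow_c = mdp.OnlineFlow([mdp.nodes.IncSFANode(),
                             mdp.nodes.QuadraticExpansionNode(),
                             mdp.nodes.SFANode()])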

Differences between a Flow and an OnlineFlow:
a) A Flow processes data sequentially, training one node at a time: the second node's
   training starts only after the first node has finished training. An OnlineFlow, in
   contrast, processes the data in a single pass, training all the nodes at the same time.
   Eg:

   flow = Flow([node1, node2]), onlineflow = OnlineFlow([node1, node2])

   Let the input be x = [x_0, x_1, ...., x_n], where x_t is a sample or a mini-batch of samples.

   Flow training:
        node1 trains on the entire x. While node1 is training, node2 is inactive.
        node1 training completes. node2 training then begins on node1(x).

        Therefore, Flow goes through all the data twice, once for each node.

   OnlineFlow training:
        node1 trains on x_0. node2 trains on the output of node1 (node1(x_0))
        node1 trains on x_1. node2 trains on the output of node1 (node1(x_1))
        ....
        node1 trains on x_n. node2 trains on the output of node1 (node1(x_n))

        OnlineFlow goes through all the data only once.
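
   A runnable sketch of this difference is shown below. The toy data, the random seed, and
   output_dim are assumptions for illustration; the node choices follow the examples used in
   this docstring:

      import numpy as np
      import mdp

      x = np.random.RandomState(1).rand(1000, 10)      # rows play the role of x_0 ... x_n

      # Flow: node1 (PCANode) trains on all of x first, then node2 (SFANode) on node1(x)
      flow = mdp.Flow([mdp.nodes.PCANode(output_dim=5), mdp.nodes.SFANode()])
      flow.train(x)                                     # a single array is used for every node

      # OnlineFlow: a single pass over x; every node is updated on each sample
      onlineflow = mdp.OnlineFlow([mdp.nodes.OnlineCenteringNode(), mdp.nodes.IncSFANode()])
      onlineflow.train(x)                               # 2D array -> incremental per-sample updates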

b) A Flow requires a list of dataiterables (one per node) or a single numpy array. An
   OnlineFlow requires only one input dataiterable, since all the nodes are trained
   simultaneously.

c) In a Flow, additional train args (supervised labels etc.) are passed to each node through
   its node-specific dataiterable. An OnlineFlow instead requires its single dataiterable to
   return a list that contains x followed by a tuple of args for each node:
   [x, (node0 args), (node1 args), ...]. See the train docstring.
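
As an illustration, here is a sketch of such a dataiterable for supervised training of a terminal
trainable node. The FDANode choice, the toy data, and the labels are assumptions; only the
[x, (node-0 args), (node-1 args), ...] layout comes from this docstring and the train docstring:

    import numpy as np
    import mdp

    data = np.random.RandomState(2).rand(200, 10)           # toy inputs (assumption)
    labels = np.random.RandomState(3).randint(0, 2, 200)    # toy class labels (assumption)

    # OnlineCenteringNode needs no extra train args; the terminal FDANode needs class labels.
    onlineflow = mdp.OnlineFlow([mdp.nodes.OnlineCenteringNode(), mdp.nodes.FDANode()])

    # Use a list (not a generator), because FDANode has multiple training phases
    # (see the train docstring).
    iterable = [[x_t[np.newaxis, :], None, (label_t,)]      # [x, (node-0 args), (node-1 args)]
                for x_t, label_t in zip(data, labels)]

    onlineflow.train(iterable)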

Crash recovery is optionally available: in case of failure the current
state of the flow is saved for later inspection.

OnlineFlow objects are Python containers. Most of the builtin 'list'
methods are available. An 'OnlineFlow' can be saved or copied using the
corresponding 'save' and 'copy' methods.
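
A brief usage sketch of the container and persistence interface (the filename and node choices
are arbitrary; appending assumes the extended sequence stays compatible, see the cases above):

    import mdp

    onlineflow = mdp.OnlineFlow([mdp.nodes.OnlineCenteringNode(), mdp.nodes.IncSFANode()])

    first_node = onlineflow[0]                              # list-style indexing
    onlineflow.append(mdp.nodes.QuadraticExpansionNode())   # append a non-trainable node

    flow_copy = onlineflow.copy()                           # deep copy of the flow
    onlineflow.save('onlineflow.pkl')                       # pickled serialization to a file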

Instance Methods
 
__add__(self, other)
 
__delitem__(self, key)
 
__iadd__(self, other)
 
__init__(self, flow, crash_recovery=False, verbose=False)
Keyword arguments:
 
__setitem__(self, key, value)
 
_check_compatibility(self, flow)
 
_check_value_type_is_online_or_nontrainable_node(self, value)
 
_get_required_train_args_from_flow(self, flow)
 
_train_check_iterables(self, data_iterables)
Return the data iterable after some checks and sanitizing.
 
_train_node(self, data_iterable, nodenr)
Train a single node in the flow.
 
_train_nodes(self, data_iterables)
 
append(flow, node)
append node to flow end
 
extend(flow, iterable)
extend flow by appending elements from the iterable
 
insert(flow, index, node)
insert node before index
 
train(self, data_iterables)
Train all trainable nodes in the flow.

Inherited from unreachable.newobject: __long__, __native__, __nonzero__, __unicode__, next

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __sizeof__, __subclasshook__

    Inherited from Flow
 
__call__(self, iterable, nodenr=None)
Calling an instance is equivalent to calling its 'execute' method.
 
__contains__(self, item)
 
__getitem__(self, key)
 
__iter__(self)
 
__len__(self)
 
__repr__(self)
repr(x)
 
__str__(self)
str(x)
 
_check_dimension_consistency(self, out, inp)
Raise ValueError when both dimensions are set and different.
 
_check_nodes_consistency(self, flow=None)
Check the dimension consistency of a list of nodes.
 
_check_value_type_isnode(self, value)
 
_close_last_node(self)
 
_execute_seq(self, x, nodenr=None)
 
_inverse_seq(self, x)
 
_propagate_exception(self, except_, nodenr)
 
_stop_training_hook(self)
Hook method that is called before stop_training is called.
 
copy(self, protocol=None)
Return a deep copy of the flow.
 
execute(self, iterable, nodenr=None)
Process the data through all nodes in the flow.
 
inverse(self, iterable)
Process the data through all nodes in the flow backwards (starting from the last node up to the first node) by calling the inverse function of each node. Of course, all nodes in the flow must be invertible.
pop(flow, index=...)
remove and return node at index (default last)
 
save(self, filename, protocol=-1)
Save a pickled serialization of the flow to 'filename'. If 'filename' is None, return a string.
 
set_crash_recovery(self, state=True)
Set crash recovery capabilities.
Static Methods
    Inherited from Flow
 
_get_required_train_args(node)
Return arguments in addition to self and x for node.train.
Properties

Inherited from object: __class__

Method Details

__add__(self, other)
(Addition operator)

 
Overrides: Flow.__add__

__delitem__(self, key)
(Index deletion operator)

 
Overrides: Flow.__delitem__

__iadd__(self, other)

 
Overrides: Flow.__iadd__

__init__(self, flow, crash_recovery=False, verbose=False)
(Constructor)

 

Keyword arguments:

flow -- a list of Nodes
crash_recovery -- set (or not) Crash Recovery Mode (save the node
                  in case of a failure)
verbose -- if True, print some basic progress information

Overrides: object.__init__
(inherited documentation)
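
For instance, a minimal construction sketch (the node choices follow the class docstring
examples; enabling both options here is arbitrary):

    import mdp

    onlineflow = mdp.OnlineFlow([mdp.nodes.OnlineCenteringNode(), mdp.nodes.IncSFANode()],
                                crash_recovery=True, verbose=True)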

__setitem__(self, key, value)
(Index assignment operator)

 
Overrides: Flow.__setitem__

_check_compatibility(self, flow)

 

_check_value_type_is_online_or_nontrainable_node(self, value)

 

_get_required_train_args_from_flow(self, flow)

 

_train_check_iterables(self, data_iterables)

 
Return the data iterable after some checks and sanitizing.

Note that this method does not distinguish between iterables and
iterators, so this must be taken care of later.

Overrides: Flow._train_check_iterables

_train_node(self, data_iterable, nodenr)

 
Train a single node in the flow.

nodenr -- index of the node in the flow

Overrides: Flow._train_node
(inherited documentation)

_train_nodes(self, data_iterables)

 

append(flow, node)

 
append node to flow end
Overrides: Flow.append

extend(flow, iterable)

 
extend flow by appending elements from the iterable
Overrides: Flow.extend

insert(flow, index, node)

 
insert node before index
Overrides: Flow.insert

train(self, data_iterables)

 

Train all trainable nodes in the flow.

'data_iterables' is a single iterable (a generator-type iterator is also allowed if the last node does not have multiple training phases) that must return data arrays to train the nodes (the data arrays are the 'x' for the nodes). Note that the data arrays are processed by the nodes preceding the node that is being trained, so the data dimension must match the input dimension of the first node.

'data_iterables' can also be a 2D or a 3D numpy array. A 2D array trains all the nodes incrementally (one sample at a time), while a 3D array supports online training in mini-batches of size shape[1].

The iterable can also return a list or a tuple, where the first entry is 'x' and the rest are the required args for training all the nodes in the flow (e.g. for supervised training).

(x, (node-0 args), (node-1 args), ..., (node-n args)) - args for n nodes

If node-i does not require any args, the provided (node-i args) are ignored, so one can simply pass None for the nodes that do not require args.

(x, (node-0 args), ..., None, ..., (node-n args)) - No args for the ith node.
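
For example, a sketch of the 2D and 3D numpy array forms (the node choices, shapes, and random
data are assumptions for illustration):

    import numpy as np
    import mdp

    onlineflow = mdp.OnlineFlow([mdp.nodes.OnlineCenteringNode(), mdp.nodes.IncSFANode()])

    # 2D array: 1000 samples of dimension 10, trained incrementally (one sample at a time)
    x2d = np.random.RandomState(4).rand(1000, 10)
    onlineflow.train(x2d)

    # 3D array: 100 mini-batches of 10 samples each (batch size = shape[1]), dimension 10
    x3d = np.random.RandomState(5).rand(100, 10, 10)
    onlineflow.train(x3d)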

Overrides: Flow.train