Node List¶
Full API documentation: nodes

class
mdp.nodes.
PCANode
¶ Filter the input data through the most significant of its principal components.
Variables:  avg – Mean of the input data (available after training).
 v – Transpose of the projection matrix (available after training).
 d – Variance corresponding to the PCA components (eigenvalues of the covariance matrix).
 explained_variance – When output_dim has been specified as a fraction of the total variance, this is the fraction of the total variance that is actually explained.
Reference
More information about Principal Component Analysis, a.k.a. the discrete Karhunen-Loeve transform, can be found, among others, in I.T. Jolliffe, Principal Component Analysis, Springer-Verlag (1986).
Full API documentation: PCANode

class
mdp.nodes.
WhiteningNode
¶ Whiten the input data by filtering it through the most significant of its principal components.
All output signals have zero mean, unit variance and are decorrelated.
Variables:  avg – Mean of the input data (available after training).
 v – Transpose of the projection matrix (available after training).
 d – Variance corresponding to the PCA components (eigenvalues of the covariance matrix).
 explained_variance – When output_dim has been specified as a fraction of the total variance, this is the fraction of the total variance that is actually explained.
Full API documentation: WhiteningNode

class
mdp.nodes.
NIPALSNode
¶ Perform Principal Component Analysis using the NIPALS algorithm.
This algorithm is particularly useful if you have more variables than observations, or in general when the number of variables is huge and calculating a full covariance matrix may be infeasible. It is also more efficient than the standard PCANode if you expect the number of significant principal components to be small. In this case, setting output_dim to a certain fraction of the total variance, say 90%, may be of some help.
Variables:  avg – Mean of the input data (available after training).
 d – Variance corresponding to the PCA components.
 v – Transpose of the projection matrix (available after training).
 explained_variance – When output_dim has been specified as a fraction of the total variance, this is the fraction of the total variance that is actually explained.
Reference
Reference for NIPALS (Nonlinear Iterative Partial Least Squares): Wold, H. Nonlinear estimation by iterative least squares procedures. In David, F. (Editor), Research Papers in Statistics, Wiley, New York, pp. 411-444 (1966).
More information about Principal Component Analysis, a.k.a. the discrete Karhunen-Loeve transform, can be found, among others, in I.T. Jolliffe, Principal Component Analysis, Springer-Verlag (1986).
Original code contributed by: Michael Schmuker, Susanne Lezius, and Farzad Farkhooi (2008).
Full API documentation: NIPALSNode
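The iteration behind NIPALS can be sketched in a few lines of plain Python. This is a minimal illustration of the textbook algorithm for the first principal component only, not the code used by NIPALSNode; the function name `nipals_first_component`, its parameters, and the choice of the first data column as starting vector are assumptions made for this example.

```python
import math

def nipals_first_component(rows, tol=1e-12, max_iter=1000):
    """Return (direction, variance, explained_fraction) of the first component."""
    n, dim = len(rows), len(rows[0])
    # Center the data; the mean plays the role of the 'avg' attribute.
    avg = [sum(r[j] for r in rows) / n for j in range(dim)]
    X = [[r[j] - avg[j] for j in range(dim)] for r in rows]
    total = sum(v * v for r in X for v in r)
    # Assumption: start the scores from the first column (must not be all zeros).
    t = [r[0] for r in X]
    eig_old = 0.0
    for _ in range(max_iter):
        tt = sum(v * v for v in t)
        # Loadings p = X^T t / (t^T t), normalized to unit length.
        p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(dim)]
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
        # New scores t = X p.
        t = [sum(X[i][j] * p[j] for j in range(dim)) for i in range(n)]
        eig = sum(v * v for v in t)
        if abs(eig - eig_old) < tol * eig:
            break
        eig_old = eig
    # Variance of the component and the fraction of total variance it explains.
    return p, eig / (n - 1), eig / total

direction, variance, explained = nipals_first_component(
    [[2.0, 1.0], [-2.0, -1.0], [1.0, 2.0], [-1.0, -2.0]])
```

No covariance matrix is ever formed: each iteration only needs products of the data with a vector, which is why the approach suits data with very many variables. In this toy data set the first component explains 90% of the total variance.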

class
mdp.nodes.
FastICANode
¶ Perform Independent Component Analysis using the FastICA algorithm.
Note that FastICA is a batch algorithm. This means that it needs all input data before it can start and compute the ICs. The algorithm is here given as a Node for convenience, but it actually accumulates all inputs it receives. Keep this in mind to avoid running out of memory when you have many components and many time samples.
FastICA does not support the telescope mode (the convergence criterion is not robust in telescope mode).
History:
 1.4.1998 created for Matlab by Jarmo Hurri, Hugo Gavert, Jaakko Sarela, and Aapo Hyvarinen
 7.3.2003 modified for Python by Thomas Wendler
 3.6.2004 rewritten and adapted for scipy and MDP by MDP’s authors
 25.5.2005 now independent from scipy. Requires Numeric or numarray
 26.6.2006 converted to numpy
 14.9.2007 updated to Matlab version 2.5
Variables:  white – The whitening node used for preprocessing.
 filters – The ICA filters matrix (this is the transpose of the projection matrix after whitening).
 convergence – The value of the convergence threshold.
Reference
Aapo Hyvarinen (1999). Fast and Robust Fixed-Point Algorithms for Independent Component Analysis. IEEE Transactions on Neural Networks, 10(3):626-634.
Full API documentation: FastICANode

class
mdp.nodes.
CuBICANode
¶ Perform Independent Component Analysis using the CuBICA algorithm.
Note that CuBICA is a batch algorithm, which means that it needs all input data before it can start and compute the ICs. The algorithm is here given as a Node for convenience, but it actually accumulates all inputs it receives. Keep this in mind to avoid running out of memory when you have many components and many time samples.
As an alternative to this batch mode you might consider the telescope mode (see the docs of the __init__ method).
Variables:  white – The whitening node used for preprocessing.
 filters – The ICA filters matrix (this is the transpose of the projection matrix after whitening).
 convergence – The value of the convergence threshold.
Reference
Blaschke, T. and Wiskott, L. (2003). CuBICA: Independent Component Analysis by Simultaneous Third- and Fourth-Order Cumulant Diagonalization. IEEE Transactions on Signal Processing, 52(5), pp. 1250-1256.
Full API documentation: CuBICANode

class
mdp.nodes.
TDSEPNode
¶ Perform Independent Component Analysis using the TDSEP algorithm.
Note
TDSEP, as implemented in this node, is an online algorithm, i.e. it is suited to be trained on huge data sets, provided that the training is done by sending small chunks of data each time.
Variables:  white – The whitening node used for preprocessing.
 filters – The ICA filters matrix (this is the transpose of the projection matrix after whitening).
 convergence – The value of the convergence threshold.
Reference
Ziehe, Andreas and Muller, Klaus-Robert (1998). TDSEP - an efficient algorithm for blind separation using time structure. In Niklasson, L., Boden, M., and Ziemke, T. (Editors), Proc. 8th Int. Conf. Artificial Neural Networks (ICANN 1998).
Full API documentation: TDSEPNode

class
mdp.nodes.
JADENode
¶ Perform Independent Component Analysis using the JADE algorithm.
Note that JADE is a batch algorithm. This means that it needs all input data before it can start and compute the ICs. The algorithm is here given as a Node for convenience, but it actually accumulates all inputs it receives. Keep this in mind to avoid running out of memory when you have many components and many time samples.
JADE does not support the telescope mode.
Reference
Cardoso, Jean-Francois and Souloumiac, Antoine (1993). Blind beamforming for non Gaussian signals. Radar and Signal Processing, IEE Proceedings F, 140(6): 362-370.
Cardoso, Jean-Francois (1999). High-order contrasts for independent component analysis. Neural Computation, 11(1): 157-192.
Original code contributed by: Gabriel Beckers (2008).
History
 May 2005 version 1.8 for MATLAB released by Jean-Francois Cardoso
 Dec 2007 MATLAB version 1.8 ported to Python/NumPy by Gabriel Beckers
 Feb 15 2008 Python/NumPy version adapted for MDP by Gabriel Beckers
Full API documentation: JADENode

class
mdp.nodes.
SFANode
¶ Extract the slowly varying components from the input data.
Variables:  avg – Mean of the input data (available after training)
 sf – Matrix of the SFA filters (available after training)
 d – Delta values corresponding to the SFA components (generalized eigenvalues). See the docs of the get_eta_values method for more information.
Reference
More information about Slow Feature Analysis can be found in Wiskott, L. and Sejnowski, T.J., Slow Feature Analysis: Unsupervised Learning of Invariances, Neural Computation, 14(4):715-770 (2002).
Full API documentation: SFANode

class
mdp.nodes.
SFA2Node
¶ Get an input signal, expand it in the space of inhomogeneous polynomials of degree 2 and extract its slowly varying components.
The get_quadratic_form method returns the input-output function of one of the learned units as a QuadraticForm object. See the documentation of mdp.utils.QuadraticForm for additional information.
Reference
More information about Slow Feature Analysis can be found in Wiskott, L. and Sejnowski, T.J., Slow Feature Analysis: Unsupervised Learning of Invariances, Neural Computation, 14(4):715-770 (2002).
Full API documentation: SFA2Node

class
mdp.nodes.
VartimeSFANode
¶ Extract the slowly varying components from the input data. This node can be understood as a generalization of the SFANode that allows nonconstant time increments between samples.
In particular, this node numerically computes the integrals involved in the SFA problem formulation by applying the trapezoid rule.
Variables:  avg – Mean of the input data (available after training)
 sf – Matrix of the SFA filters (available after training)
 d – Delta values corresponding to the SFA components (generalized eigenvalues). See the docs of the get_eta_values method for more information.
Reference
More information about Slow Feature Analysis can be found in Wiskott, L. and Sejnowski, T.J., Slow Feature Analysis: Unsupervised Learning of Invariances, Neural Computation, 14(4):715-770 (2002).
Full API documentation: VartimeSFANode

class
mdp.nodes.
ISFANode
¶ Perform Independent Slow Feature Analysis on the input data.
Variables:  RP – The global rotation-permutation matrix. This is the filter applied on input_data to get output_data.
 RPC – The complete global rotation-permutation matrix. This is a matrix of dimension input_dim x input_dim (the ‘outer space’ is retained).
 covs – A mdp.utils.MultipleCovarianceMatrices instance holding the time-delayed covariance matrices of input_data. After convergence the uppermost output_dim x output_dim submatrices should be almost diagonal. self.covs[n-1] is the covariance matrix relative to the n-th time-lag.
Note
They are not cleared after convergence. If you need to free some memory, you can safely delete them with:
>>> del self.covs
Variables:  initial_contrast – A dictionary with the starting contrast and the SFA and ICA parts of it.
 final_contrast – Like the above but after convergence.
Note
If you intend to use this node for large datasets please have a look at the stop_training method documentation for speeding things up.
Reference
Blaschke, T., Zito, T., and Wiskott, L. (2007). Independent Slow Feature Analysis and Nonlinear Blind Source Separation. Neural Computation 19(4):994-1021. http://itb.biologie.huberlin.de/~wiskott/Publications/BlasZitoWisk2007ISFANeurComp.pdf
Full API documentation: ISFANode

class
mdp.nodes.
XSFANode
¶ Perform Nonlinear Blind Source Separation using Slow Feature Analysis. This node is designed to iteratively extract statistically independent sources from (in principle) arbitrary invertible nonlinear mixtures. The method relies on temporal correlations in the sources and consists of a combination of nonlinear SFA and a projection algorithm. More details can be found in the reference given below (once it’s published).
The node has multiple training phases. The number of training phases depends on the number of sources that must be extracted. The recommended way of training this node is through a container flow:
>>> flow = mdp.Flow([XSFANode()])
>>> flow.train(x)
Doing so will automatically train all training phases. The argument x to the Flow.train method can be an array or a list of iterables (see the section about Iterators in the MDP tutorial for more info). If the number of training samples is large, you may run into memory problems: use data iterators and chunk training to reduce memory usage.
If you need to debug training and/or execution of this node, the suggested approach is to use the capabilities of BiMDP. For example:
>>> flow = mdp.Flow([XSFANode()])
>>> tr_filename = bimdp.show_training(flow=flow, data_iterators=x)
>>> ex_filename, out = bimdp.show_execution(flow, x=x)
This will run training and execution with bimdp inspection. Snapshots of the internal flow state for each training phase and execution step will be opened in a web browser and presented as a slideshow.
Reference
Sprekeler, H., Zito, T., and Wiskott, L. (2009). An Extension of Slow Feature Analysis for Nonlinear Blind Source Separation. Journal of Machine Learning Research. http://cogprints.org/7056/1/SprekelerZitoWiskottCogprints2010.pdf
Full API documentation: XSFANode

class
mdp.nodes.
GSFANode
¶ This node implements “Graph-Based SFA (GSFA)”, which is the main component of hierarchical GSFA (HGSFA).
For further information, see: Escalante-B., A.N. and Wiskott, L., “How to solve classification and regression problems on high-dimensional data with a supervised extension of Slow Feature Analysis”. Journal of Machine Learning Research 14:3683-3719, 2013.
Full API documentation: GSFANode

class
mdp.nodes.
iGSFANode
¶ This node implements “information-preserving graph-based SFA (iGSFA)”, which is the main component of hierarchical iGSFA (HiGSFA).
For further information, see: Escalante-B., A.N. and Wiskott, L., “Improved graph-based SFA: Information preservation complements the slowness principle”, e-print arXiv:1601.03945, http://arxiv.org/abs/1601.03945, 2017.
Full API documentation: iGSFANode

class
mdp.nodes.
FDANode
¶ Perform a (generalized) Fisher Discriminant Analysis of its input. It is a supervised node that implements FDA using a generalized eigenvalue approach.
Note
FDANode has two training phases and is supervised, so make sure to pay attention to the following points when you train it:
 call the train method with two arguments: the input data and the labels (see the docstring of the train method for details).
 if you are training the node by hand, call the train method twice.
 if you are training the node using a flow (recommended), the only argument to Flow.train must be a list of (data_point, label) tuples or an iterator returning lists of such tuples, not a generator. The Flow.train function can be called just once as usual, since it takes care of rewinding the iterator to perform the second training step.
Variables:  avg – Mean of the input data (available after training).
 v – Transpose of the projection matrix, so that output = dot(input-self.avg, self.v) (available after training).
Reference
More information on Fisher Discriminant Analysis can be found for example in C. Bishop, Neural Networks for Pattern Recognition, Oxford Press, pp. 105-112.
Full API documentation: FDANode

class
mdp.nodes.
FANode
¶ Perform Factor Analysis.
The current implementation should be most efficient for long data sets: the sufficient statistics are collected in the training phase, and all EM-cycles are performed at its end.
The execute method returns the Maximum A Posteriori estimate of the latent variables. The generate_input method generates observations from the prior distribution.
Variables:  mu – Mean of the input data (available after training).
 A – Generating weights (available after training).
 E_y_mtx – Weights for Maximum A Posteriori inference.
 sigma – Vector of estimated variance of the noise for all input components.
Reference
More information about Factor Analysis can be found in Max Welling’s classnotes: http://www.ics.uci.edu/~welling/classnotes/classnotes.html , in the chapter ‘Linear Models’.
Full API documentation: FANode

class
mdp.nodes.
RBMNode
¶ Restricted Boltzmann Machine node. An RBM is an undirected probabilistic network with binary variables. The graph is bipartite into observed (visible) and hidden (latent) variables.
By default, the execute method returns the probability of one of the hidden variables being equal to 1 given the input. Use the sample_v method to sample from the observed variables given a setting of the hidden variables, and sample_h to do the opposite. The energy method can be used to compute the energy of a given setting of all variables.
Reference
For more information on RBMs, see Geoffrey E. Hinton (2007) Boltzmann machine. Scholarpedia, 2(5):1668.
The network is trained by Contrastive Divergence, as described in Hinton, G. E. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1711-1800.
Variables:  w – Generative weights between hidden and observed variables.
 bv – Bias vector of the observed variables.
 bh – Bias vector of the hidden variables.
Full API documentation: RBMNode
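The two quantities mentioned for RBMNode, the hidden-unit probabilities and the energy, follow the standard textbook formulas for a binary RBM. The sketch below uses those formulas in plain Python; the function names and the weight layout (`w[i][j]` links visible unit i to hidden unit j) are assumptions for illustration and may differ from the node's internals.

```python
import math

def sigmoid(a):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-a))

def hidden_probabilities(v, w, bh):
    """P(h_j = 1 | v) for every hidden unit j."""
    return [sigmoid(bh[j] + sum(v[i] * w[i][j] for i in range(len(v))))
            for j in range(len(bh))]

def energy(v, h, w, bv, bh):
    """Energy of a joint configuration: -bv.v - bh.h - v^T w h."""
    visible_term = sum(b * x for b, x in zip(bv, v))
    hidden_term = sum(b * x for b, x in zip(bh, h))
    coupling = sum(v[i] * w[i][j] * h[j]
                   for i in range(len(v)) for j in range(len(h)))
    return -visible_term - hidden_term - coupling

# Hypothetical toy model: two visible units, one hidden unit.
w = [[2.0], [-1.0]]
bv = [0.5, 0.5]
bh = [-1.0]
probs = hidden_probabilities([1, 0], w, bh)
e = energy([1, 0], [1], w, bv, bh)
```

With all weights and biases set to zero every hidden unit is on with probability 0.5, which is a quick sanity check for any implementation.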

class
mdp.nodes.
RBMWithLabelsNode
¶ Restricted Boltzmann Machine with softmax labels. An RBM is an undirected probabilistic network with binary variables. In this case, the node is partitioned into a set of observed (visible) variables, a set of hidden (latent) variables, and a set of label variables (also observed), only one of which is active at any time. The node is able to learn associations between the visible variables and the labels.
By default, the execute method returns the probability of one of the hidden variables being equal to 1 given the input. Use the sample_v method to sample from the observed variables (visible and labels) given a setting of the hidden variables, and sample_h to do the opposite. The energy method can be used to compute the energy of a given setting of all variables.
Reference
The network is trained by Contrastive Divergence, as described in Hinton, G. E. (2002). Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8):1711-1800.
For more information on RBMs with labels, see:
 Geoffrey E. Hinton (2007) Boltzmann machine. Scholarpedia, 2(5):1668.
 Hinton, G. E., Osindero, S., and Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18:1527-1554.
Variables:  w – Generative weights between hidden and observed variables.
 bv – Bias vector of the observed variables.
 bh – Bias vector of the hidden variables.
Full API documentation: RBMWithLabelsNode

class
mdp.nodes.
GrowingNeuralGasNode
¶ Learn the topological structure of the input data by building a corresponding graph approximation.
The algorithm expands on the original Neural Gas algorithm (see mdp.nodes.NeuralGasNode) in that new nodes are added to the graph as more data becomes available. In this way, if the growth rate is appropriate, one can avoid overfitting or underfitting the data.
Variables:  graph – The corresponding mdp.graph.Graph object.
Reference
More information about the Growing Neural Gas algorithm can be found in B. Fritzke, A Growing Neural Gas Network Learns Topologies, in G. Tesauro, D. S. Touretzky, and T. K. Leen (editors), Advances in Neural Information Processing Systems 7, pages 625-632. MIT Press, Cambridge MA, 1995.
Full API documentation: GrowingNeuralGasNode

class
mdp.nodes.
LLENode
¶ Perform a Locally Linear Embedding analysis on the data.
Variables:  training_projection – The LLE projection of the training data (defined when training finishes).
 desired_variance – Variance limit used to compute intrinsic dimensionality.
Based on the algorithm outlined in An Introduction to Locally Linear Embedding by L. Saul and S. Roweis, using improvements suggested in Locally Linear Embedding for Classification by D. de Ridder and R.P.W. Duin.
Reference
Roweis, S. and Saul, L., Nonlinear dimensionality reduction by locally linear embedding, Science 290 (5500), pp. 2323-2326, 2000.
Original code contributed by: Jake VanderPlas, University of Washington.
Full API documentation: LLENode

class
mdp.nodes.
HLLENode
¶ Perform a Hessian Locally Linear Embedding analysis on the data.
Variables:  training_projection – The HLLE projection of the training data (defined when training finishes).
 desired_variance – Variance limit used to compute intrinsic dimensionality.
Note
Many methods are inherited from LLENode, including _execute(), _adjust_output_dim(), etc. The main advantage of the Hessian estimator is to limit distortions of the input manifold. Once the model has been trained, it is sufficient (and much less computationally intensive) to determine projections for new points using the LLE framework.
Reference
Implementation based on the algorithm outlined in Donoho, D. L., and Grimes, C., Hessian Eigenmaps: new locally linear embedding techniques for high-dimensional data, Proceedings of the National Academy of Sciences 100(10): 5591-5596, 2003.
Original code contributed by: Jake VanderPlas, University of Washington.
Full API documentation: HLLENode

class
mdp.nodes.
LinearRegressionNode
¶ Compute least-square, multivariate linear regression on the input data, i.e., learn coefficients b_j so that the linear combination y_i = b_0 + b_1 x_1 + ... + b_N x_N, for i = 1 ... M, minimizes the sum of squared error given the training x’s and y’s.
This is a supervised learning node, and requires input data x and target data y to be supplied during training (see the train docstring).
Variables:  beta – The coefficients of the linear regression.
Full API documentation: LinearRegressionNode
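For a single input variable (N = 1) the least-squares coefficients have a simple closed form, which makes the idea behind LinearRegressionNode easy to check by hand. The helper `fit_line` below is a hypothetical name used only for this sketch; the node itself handles the multivariate case.

```python
def fit_line(xs, ys):
    """Return (b_0, b_1) minimizing sum_i (y_i - b_0 - b_1 * x_i)**2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Data generated from y = 1 + 2 * x, so the fit recovers those coefficients.
b0, b1 = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```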

class
mdp.nodes.
QuadraticExpansionNode
¶ Perform expansion in the space formed by all linear and quadratic monomials.
QuadraticExpansionNode() is equivalent to a PolynomialExpansionNode(2).
Full API documentation: QuadraticExpansionNode
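As an illustration of what "all linear and quadratic monomials" means, the following sketch expands a single sample in plain Python. The function name and the ordering of the output terms are assumptions for this example; the ordering used by QuadraticExpansionNode may differ.

```python
def quadratic_expansion(x):
    """Return all linear terms x_i followed by all products x_i * x_j with i <= j."""
    linear = list(x)
    quadratic = [x[i] * x[j] for i in range(len(x)) for j in range(i, len(x))]
    return linear + quadratic

# For x = (x1, x2) the monomials are x1, x2, x1^2, x1*x2, x2^2.
expanded = quadratic_expansion([2.0, 3.0])
```

An input of dimension n yields n + n(n+1)/2 output components, so the expanded dimension grows quadratically with the input dimension.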

class
mdp.nodes.
PolynomialExpansionNode
¶ Perform expansion in a polynomial space.
Full API documentation: PolynomialExpansionNode

class
mdp.nodes.
RBFExpansionNode
¶ Expand input space with Gaussian Radial Basis Functions (RBFs).
The input data is filtered through a set of unnormalized Gaussian filters, i.e.:
y_j = exp(-0.5/s_j * ||x - c_j||^2)
for isotropic RBFs, or more in general:
y_j = exp(-0.5 * (x-c_j)^T S^-1 (x-c_j))
for anisotropic RBFs.
Full API documentation: RBFExpansionNode
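The isotropic case can be sketched in plain Python as follows, assuming the exponent is -0.5/s_j times the squared Euclidean distance between x and the center c_j. The function name and argument layout are hypothetical and do not mirror the constructor of RBFExpansionNode.

```python
import math

def isotropic_rbf(x, centers, sizes):
    """Evaluate y_j = exp(-0.5 / s_j * ||x - c_j||^2) for every center c_j."""
    out = []
    for c, s in zip(centers, sizes):
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        out.append(math.exp(-0.5 / s * sq_dist))
    return out

# One sample, two centers: the sample coincides with the first center.
y = isotropic_rbf([1.0, 1.0], [[1.0, 1.0], [0.0, 0.0]], [1.0, 1.0])
```

Because the filters are unnormalized, the response is exactly 1 at a center and decays towards 0 with distance.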

class
mdp.nodes.
GeneralExpansionNode
¶ Expands the input samples by applying to them one or more functions provided.
The functions to be applied are specified by a list [f_0, …, f_k], where f_i, for 0 <= i <= k, denotes a particular function. The input data given to these functions is a twodimensional array and the output is another twodimensional array. The dimensionality of the output should depend only on the dimensionality of the input. Given a twodimensional input array x, the output of the node is then [f_0(x), …, f_k(x)], that is, the concatenation of each one of the computed arrays f_i(x).
This node has been designed to facilitate nonlinear, fixed but arbitrary transformations of the data samples within MDP flows.
Original code contributed by Alberto Escalante.
Example
>>> import mdp
>>> from mdp import numx
>>> def identity(x): return x
>>> def u3(x): return numx.absolute(x)**3  # A simple nonlinear transformation
>>> def norm2(x):  # Computes the norm of each sample, returning an Nx1 array
...     return ((x**2).sum(axis=1)**0.5).reshape((-1, 1))
>>> x = numx.array([[2., 2.], [0.2, 0.3], [0.6, 1.2]])
>>> gen = mdp.nodes.GeneralExpansionNode(funcs=[identity, u3, norm2])
>>> print(gen.execute(x))
[[ 2.     2.     8.     8.     2.82842712]
 [ 0.2    0.3    0.008  0.027  0.36055513]
 [ 0.6    1.2    0.216  1.728  1.34164079]]
Full API documentation: GeneralExpansionNode

class
mdp.nodes.
GrowingNeuralGasExpansionNode
¶ Perform a trainable radial basis expansion, where the centers and sizes of the basis functions are learned through a growing neural gas.
The positions of the RBFs correspond to the positions of the nodes of the neural gas. The sizes of the RBFs correspond to the mean distance to the neighbouring nodes.
Note
Adjust the maximum number of nodes to control the dimension of the expansion.
Reference
More information on this expansion type can be found in: B. Fritzke. Growing cell structures - a self-organizing network for unsupervised and supervised learning. Neural Networks 7, p. 1441–1460 (1994).
Full API documentation: GrowingNeuralGasExpansionNode

class
mdp.nodes.
NeuralGasNode
¶ Learn the topological structure of the input data by building a corresponding graph approximation (original Neural Gas algorithm).
Variables:  graph – The corresponding mdp.graph.Graph object.
 max_epochs – Maximum number of epochs until which to train.
Reference
The Neural Gas algorithm was originally published in Martinetz, T. and Schulten, K.: A “Neural-Gas” Network Learns Topologies. In Kohonen, T., Maekisara, K., Simula, O., and Kangas, J. (eds.), Artificial Neural Networks. Elsevier, North-Holland, 1991.
Full API documentation: NeuralGasNode

class
mdp.nodes.
SignumClassifier
¶ This classifier node classifies a data point as 1 if the sum of its components is positive and as -1 if that sum is negative.
Full API documentation: SignumClassifier

class
mdp.nodes.
PerceptronClassifier
¶ A simple perceptron with input_dim input nodes.
Full API documentation: PerceptronClassifier

class
mdp.nodes.
SimpleMarkovClassifier
¶ A simple version of a Markov classifier.
It can be trained on a vector of tuples, the label being the next element in the testing data.
Full API documentation: SimpleMarkovClassifier

class
mdp.nodes.
DiscreteHopfieldClassifier
¶ Node for simulating a simple discrete Hopfield model
Full API documentation: DiscreteHopfieldClassifier

class
mdp.nodes.
KMeansClassifier
¶ Employs K-Means Clustering for a given number of centroids.
Full API documentation: KMeansClassifier

class
mdp.nodes.
NormalizeNode
¶ Make input signal mean-free and unit variance.
Full API documentation: NormalizeNode

class
mdp.nodes.
GaussianClassifier
¶ Perform a supervised Gaussian classification.
Given a set of labelled data, the node fits a gaussian distribution to each class.
Full API documentation: GaussianClassifier

class
mdp.nodes.
NearestMeanClassifier
¶ Nearest-Mean classifier.
Full API documentation: NearestMeanClassifier

class
mdp.nodes.
KNNClassifier
¶ K-Nearest-Neighbour Classifier.
Full API documentation: KNNClassifier

class
mdp.nodes.
EtaComputerNode
¶ Compute the eta values of the normalized training data.
The delta value of a signal is a measure of its temporal variation, and is defined as the mean of the derivative squared, i.e.
delta(x) = mean(dx/dt(t)^2)
delta(x) is zero if x is a constant signal, and increases if the temporal variation of the signal is bigger.
The eta value is a more intuitive measure of temporal variation, defined as:
eta(x) = T/(2*pi) * sqrt(delta(x))
If x is a signal of length T which consists of a sine function that accomplishes exactly N oscillations, then eta(x)=N.
EtaComputerNode normalizes the training data to have unit variance, such that it is possible to compare the temporal variation of two signals independently from their scaling.
Note
 If a data chunk is tlen data points long, this node is going to consider only the first tlen-1 points together with their derivatives. This means in particular that the variance of the signal is not computed on all data points. This behavior is compatible with that of SFANode.
 This is an analysis node, i.e. the data is analyzed during training and the results are stored internally. Use the method get_eta to access them.
Reference
Wiskott, L. and Sejnowski, T.J. (2002). Slow Feature Analysis: Unsupervised Learning of Invariances, Neural Computation, 14(4):715-770.
Full API documentation: EtaComputerNode
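The relation eta(x) = N for a sine with N oscillations can be checked numerically. The sketch below is a simplified plain-Python version of the definitions given for EtaComputerNode: it normalizes with the variance over all samples and approximates the derivative by first differences, so its result is only approximately N. The function name `eta_value` is an assumption for this example.

```python
import math

def eta_value(x):
    """Approximate eta(x) = T/(2*pi) * sqrt(delta(x)) for a 1-D signal."""
    T = len(x)
    mean = sum(x) / T
    var = sum((v - mean) ** 2 for v in x) / T
    # Normalize to zero mean and unit variance so that scaling does not matter.
    z = [(v - mean) / math.sqrt(var) for v in x]
    # delta: mean of the squared discrete time derivative.
    diffs = [z[t + 1] - z[t] for t in range(T - 1)]
    delta = sum(d * d for d in diffs) / len(diffs)
    return T / (2 * math.pi) * math.sqrt(delta)

T, N = 1000, 5
signal = [math.sin(2 * math.pi * N * t / T) for t in range(T)]
eta = eta_value(signal)  # close to N = 5
```

Multiplying the signal by a constant leaves the result unchanged, which is the point of the unit-variance normalization.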

class
mdp.nodes.
HitParadeNode
¶ Collect the first n local maxima and minima of the training signal which are separated by a minimum gap d.
This is an analysis node, i.e. the data is analyzed during training and the results are stored internally. Use the get_maxima and get_minima methods to access them.
Full API documentation: HitParadeNode

class
mdp.nodes.
NoiseNode
¶ Inject multiplicative or additive noise into the input data.
Original code contributed by Mathias Franzius.
Full API documentation: NoiseNode

class
mdp.nodes.
NormalNoiseNode
¶ Special version of NoiseNode for Gaussian additive noise.
Unlike NoiseNode it does not store a noise function reference but simply uses numx_rand.normal.
Full API documentation: NormalNoiseNode

class
mdp.nodes.
TimeFramesNode
¶ Copy delayed version of the input signal on the space dimensions.
For example, for time_frames=3 and gap=2:
[ X(1) Y(1)        [ X(1) Y(1) X(3) Y(3) X(5) Y(5)
  X(2) Y(2)          X(2) Y(2) X(4) Y(4) X(6) Y(6)
  X(3) Y(3)   -->    X(3) Y(3) X(5) Y(5) X(7) Y(7)
  X(4) Y(4)          X(4) Y(4) X(6) Y(6) X(8) Y(8)
  X(5) Y(5)          ...  ...  ...  ...  ...  ... ]
  X(6) Y(6)
  X(7) Y(7)
  X(8) Y(8)
  ...  ...  ]
It is not always possible to invert this transformation (the transformation is not surjective). However, the pseudo_inverse method does the correct thing when it is indeed possible.
Full API documentation: TimeFramesNode
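The embedding performed by TimeFramesNode for time_frames=3 and gap=2 can be reproduced with a short plain-Python sketch. The helper name `embed_future` and the list-of-rows data layout are assumptions for illustration; they are not part of the MDP API.

```python
def embed_future(rows, time_frames, gap=1):
    """Concatenate each row with the rows gap, 2*gap, ... steps ahead of it."""
    n_out = len(rows) - (time_frames - 1) * gap
    return [[value
             for k in range(time_frames)
             for value in rows[t + k * gap]]
            for t in range(n_out)]

# Eight samples of a two-dimensional signal: X(t) = t, Y(t) = 10 * t.
signal = [[t, 10 * t] for t in range(1, 9)]
framed = embed_future(signal, time_frames=3, gap=2)
```

The output has (time_frames - 1) * gap fewer rows than the input, because the last samples have no future partners to be combined with.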

class
mdp.nodes.
TimeDelayNode
¶ Copy delayed version of the input signal on the space dimensions.
For example, for time_frames=3 and gap=2:
[ X(1) Y(1)        [ X(1) Y(1)   0    0    0    0
  X(2) Y(2)          X(2) Y(2)   0    0    0    0
  X(3) Y(3)   -->    X(3) Y(3) X(1) Y(1)   0    0
  X(4) Y(4)          X(4) Y(4) X(2) Y(2)   0    0
  X(5) Y(5)          X(5) Y(5) X(3) Y(3) X(1) Y(1)
  X(6) Y(6)          ...  ...  ...  ...  ...  ... ]
  X(7) Y(7)
  X(8) Y(8)
  ...  ...  ]
This node provides similar functionality as the TimeFramesNode, only that it performs a time embedding into the past rather than into the future.
See TimeDelaySlidingWindowNode for a sliding window delay node for application in a non-batch manner.
Original code contributed by Sebastian Hoefer. Dec 31, 2010
Full API documentation: TimeDelayNode
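The embedding into the past performed by TimeDelayNode can be sketched in the same way. The helper name `embed_past` and the list-of-rows layout are assumptions for illustration only; samples that lie before the start of the signal are filled with zeros, as in the example for time_frames=3 and gap=2.

```python
def embed_past(rows, time_frames, gap=1):
    """Concatenate each row with the rows gap, 2*gap, ... steps before it."""
    dim = len(rows[0])
    out = []
    for t in range(len(rows)):
        row = []
        for k in range(time_frames):
            src = t - k * gap
            # Zero padding where the delayed sample does not exist yet.
            row.extend(rows[src] if src >= 0 else [0] * dim)
        out.append(row)
    return out

# Eight samples of a two-dimensional signal: X(t) = t, Y(t) = 10 * t.
signal = [[t, 10 * t] for t in range(1, 9)]
delayed = embed_past(signal, time_frames=3, gap=2)
```

Unlike the embedding into the future, the number of output rows equals the number of input rows, which is what makes a row-by-row sliding window variant possible.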

class
mdp.nodes.
TimeDelaySlidingWindowNode
¶ TimeDelaySlidingWindowNode is an alternative to TimeDelayNode which should be used for online learning/execution. Whereas the TimeDelayNode works in a batch manner, for online application a sliding window is necessary which yields only one row per call.
Applied to the same data the collection of all returned rows of the TimeDelaySlidingWindowNode is equivalent to the result of the TimeDelayNode.
Original code contributed by Sebastian Hoefer. Dec 31, 2010
Full API documentation: TimeDelaySlidingWindowNode

class
mdp.nodes.
CutoffNode
¶ Node to cut off values at specified bounds.
Works similarly to numpy.clip, but also works when only a lower or upper bound is specified.
Full API documentation: CutoffNode
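The behaviour described for CutoffNode, clipping with an optional lower bound, an optional upper bound, or both, can be illustrated with a small plain-Python function. The name `cutoff` and its keyword arguments are hypothetical and are not the node's constructor signature.

```python
def cutoff(values, lower=None, upper=None):
    """Clip values to [lower, upper]; a bound set to None is not applied."""
    out = []
    for v in values:
        if lower is not None and v < lower:
            v = lower
        if upper is not None and v > upper:
            v = upper
        out.append(v)
    return out

data = [-2.0, 0.5, 3.0]
only_lower = cutoff(data, lower=0.0)
only_upper = cutoff(data, upper=1.0)
both = cutoff(data, lower=0.0, upper=1.0)
```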

class
mdp.nodes.
AdaptiveCutoffNode
¶ Node which uses the data history during training to learn cutoff values.
As opposed to the simple CutoffNode, a different cutoff value is learned for each data coordinate. For example if an upper cutoff fraction of 0.05 is specified, then the upper cutoff bound is set so that the upper 5% of the training data would have been clipped (in each dimension). The cutoff bounds are then applied during execution. This node also works as a HistogramNode, so the histogram data is stored.
When stop_training is called the cutoff values for each coordinate are calculated based on the collected histogram data.
Full API documentation: AdaptiveCutoffNode
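How an upper cutoff bound could be derived from a cutoff fraction is sketched below for a single data coordinate. This is one plausible reading of the description, not the node's actual code: the function name `learn_upper_bound` and the exact index rule are assumptions.

```python
def learn_upper_bound(column, upper_fraction):
    """Pick a bound such that roughly upper_fraction of the values lie above it."""
    ordered = sorted(column)
    # Number of training values that should end up being clipped.
    n_clipped = min(int(upper_fraction * len(ordered)), len(ordered) - 1)
    return ordered[-n_clipped - 1]

# Hypothetical training data for one coordinate: the integers 1..100.
training = list(range(1, 101))
bound = learn_upper_bound(training, 0.05)
clipped = [min(v, bound) for v in training]
n_changed = sum(1 for a, b in zip(training, clipped) if a != b)
```

With 100 distinct training values and a fraction of 0.05, exactly five values exceed the learned bound and would be clipped. For multi-dimensional data the same rule would be applied to each coordinate separately.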

class
mdp.nodes.
HistogramNode
¶ Node which stores a history of the data during its training phase.
The data history is stored in self.data_hist and can also be deleted to free memory. Alternatively it can be automatically pickled to disk.
Note that data is only stored during training.
Full API documentation: HistogramNode

class
mdp.nodes.
IdentityNode
¶ Execute returns the input data and the node is not trainable.
This node can be instantiated and is for example useful in complex network layouts.
Full API documentation: IdentityNode

class
mdp.nodes.
OnlineCenteringNode
¶ OnlineCenteringNode centers the input data, that is, subtracts the arithmetic mean (average) from the input data. This is an online learnable node.
Note
The node’s train method updates the average (avg) according to the update rule:
avg <- (1 / n) * x + (1 - 1/n) * avg, where n is the total number of samples observed while training.
The node’s execute method subtracts the updated average from the input and returns it.
This node also supports centering via an exponentially weighted moving average that resembles a leaky integrator:
avg <- alpha * x + (1 - alpha) * avg, where alpha = 2. / (avg_n + 1).
avg_n intuitively denotes a “window size”. For a large avg_n, the most recent avg_n samples represent about 86% of the total weight.
Variables:  avg – The updated average of the input data.
Full API documentation: OnlineCenteringNode
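Both update rules of OnlineCenteringNode are easy to verify with scalar samples. The sketch below is plain Python with hypothetical function names; starting the moving average from the first sample is an assumption, since the initial value is not stated here.

```python
def running_mean(samples):
    """avg <- (1/n) * x + (1 - 1/n) * avg, with n the number of samples seen."""
    avg = 0.0
    for n, x in enumerate(samples, start=1):
        avg = (1.0 / n) * x + (1.0 - 1.0 / n) * avg
    return avg

def moving_average(samples, avg_n):
    """avg <- alpha * x + (1 - alpha) * avg, with alpha = 2 / (avg_n + 1)."""
    alpha = 2.0 / (avg_n + 1)
    avg = samples[0]  # assumption: initialize with the first sample
    for x in samples[1:]:
        avg = alpha * x + (1.0 - alpha) * avg
    return avg

def recent_weight(avg_n):
    """Total weight carried by the most recent avg_n samples."""
    alpha = 2.0 / (avg_n + 1)
    return 1.0 - (1.0 - alpha) ** avg_n

mean = running_mean([1.0, 2.0, 3.0, 4.0])
ewma = moving_average([0.0, 3.0], avg_n=2)
weight = recent_weight(1000)
```

The first rule reproduces the ordinary arithmetic mean. For the second rule the weight of the most recent avg_n samples tends to 1 - exp(-2), roughly 0.86, as avg_n grows, which is where the 86% figure comes from.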

class
mdp.nodes.
OnlineTimeDiffNode
¶ Compute the discrete time derivative of the input using backward difference approximation:
dx(n) = x(n) - x(n-1), where n is the total number of input samples observed during training.
This is an online learnable node that uses a buffer to store the previous input sample x(n-1). The node’s train method updates the buffer. The node’s execute method returns the time difference using the stored buffer as its previous input sample x(n-1).
This node supports both “incremental” and “batch” training types.
Example
 If the training and execute methods are called sample by sample incrementally:
 train(x[1]), y[1]=execute(x[1]), train(x[2]), y[2]=execute(x[2]), …,
 then:
 y[1] = x[1]
 y[2] = x[2] - x[1]
 y[3] = x[3] - x[2]
 …
 If training and execute methods are called block by block:
 train([x[1], x[2], x[3]]), [y[3], y[4], y[5]] = execute([x[3], x[4], x[5]])
 then:
 y[3] = x[3] - x[2]
 y[4] = x[4] - x[3]
 y[5] = x[5] - x[4]
Note that the stored buffer is still x[2]. Only the train() method changes the state of the node. The input data passed to execute is always assumed to start at the get_current_train_iteration() time step.
Full API documentation: OnlineTimeDiffNode
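The buffer behaviour in the example can be reproduced with a toy scalar class. `ToyTimeDiff` is hypothetical and is not MDP's implementation; it assumes the buffer starts at zero and always holds the sample that precedes the most recently trained one, which is what the example above implies.

```python
class ToyTimeDiff:
    """Scalar stand-in for the behaviour described above (assumption-based)."""

    def __init__(self):
        self.last = 0.0     # most recent training sample
        self.buffer = 0.0   # the sample before it, used as x(n-1) in execute()

    def train(self, block):
        # Only training changes the stored state.
        for x in block:
            self.buffer, self.last = self.last, x

    def execute(self, block):
        # The first difference uses the stored buffer, the rest use the block.
        prev, out = self.buffer, []
        for x in block:
            out.append(x - prev)
            prev = x
        return out


x = {1: 1.0, 2: 4.0, 3: 9.0, 4: 16.0, 5: 25.0}

# Sample by sample: train(x[k]) followed by execute(x[k]).
node = ToyTimeDiff()
incremental = []
for k in (1, 2, 3):
    node.train([x[k]])
    incremental += node.execute([x[k]])

# Block by block: train on x[1..3], then execute on x[3..5].
block_node = ToyTimeDiff()
block_node.train([x[1], x[2], x[3]])
block_out = block_node.execute([x[3], x[4], x[5]])
```

After the block-wise call the buffer of `block_node` still holds x[2], because `execute` does not modify the state.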

class
mdp.nodes.
CCIPCANode
¶ Candid-Covariance free Incremental Principal Component Analysis (CCIPCA) extracts the principal components from the input data incrementally.
Variables:  v – Eigenvectors
 d – Eigenvalues
Reference
More information about Candid-Covariance free Incremental Principal Component Analysis can be found in Weng J., Zhang Y. and Hwang W., Candid covariance-free incremental principal component analysis, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, 1034–1040, 2003.
Full API documentation: CCIPCANode

class
mdp.nodes.
CCIPCAWhiteningNode
¶  Incrementally updates whitening vectors for the input data using CCIPCA.
Candid-Covariance free Incremental Principal Component Analysis (CCIPCA) extracts the principal components from the input data incrementally.
Variables:  v – Eigenvectors
 d – Eigenvalues
Reference
More information about Candid-Covariance free Incremental Principal Component Analysis can be found in Weng J., Zhang Y. and Hwang W., Candid covariance-free incremental principal component analysis, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, 1034–1040, 2003.
Full API documentation: CCIPCAWhiteningNode

class
mdp.nodes.
MCANode
¶ Minor Component Analysis (MCA) extracts minor components (dual of principal components) from the input data incrementally.
Variables:  v – Eigenvectors
 d – Eigenvalues
Reference
More information about MCA can be found in Peng, D. and Yi, Z, A new algorithm for sequential minor component analysis, International Journal of Computational Intelligence Research, 2(2):207–215, 2006.
Full API documentation: MCANode

class
mdp.nodes.
IncSFANode
¶ Incremental Slow Feature Analysis (IncSFA) extracts the slowly varying components from the input data incrementally.
Variables:  sf – Slow feature vectors
 wv – Whitening vectors
 sf_change – Difference in slow features after update
Reference
More information about IncSFA can be found in Kompella V.R., Luciw M. and Schmidhuber J., Incremental Slow Feature Analysis: Adaptive Low-Complexity Slow Feature Updating from High-Dimensional Input Streams, Neural Computation, 2012.
Full API documentation: IncSFANode

class
mdp.nodes.
RecursiveExpansionNode
¶ Recursively computable (orthogonal) expansions.
Variables:  lower – The lower bound of the domain on which the recursion function is defined or orthogonal.
 upper – The upper bound of the domain on which the recursion function is defined or orthogonal.
Full API documentation: RecursiveExpansionNode
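As an illustration of what a recursively computable orthogonal expansion looks like, the sketch below uses the Chebyshev polynomials of the first kind, which are orthogonal on the domain [-1, 1] and satisfy T_{k+1}(x) = 2 x T_k(x) - T_{k-1}(x). Choosing Chebyshev polynomials is an assumption made for this example, and `chebyshev_expansion` is a hypothetical helper, not part of the node's API.

```python
def chebyshev_expansion(x, degree):
    """Return [T_0(x), ..., T_degree(x)] via T_{k+1} = 2 x T_k - T_{k-1}."""
    terms = [1.0, x]  # T_0 and T_1 seed the recursion
    for _ in range(2, degree + 1):
        terms.append(2.0 * x * terms[-1] - terms[-2])
    return terms[: degree + 1]


# Domain bounds on which these polynomials are orthogonal.
lower, upper = -1.0, 1.0
features = chebyshev_expansion(0.5, 3)
```

Each new term only needs the two previous ones, which is what makes the expansion cheap to compute for high degrees.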

class
mdp.nodes.
NormalizingRecursiveExpansionNode
¶ Recursively computable (orthogonal) expansions and a trainable transformation to the domain of the expansions.
Variables:  lower – The lower bound of the domain on which the recursion function is defined or orthogonal.
 upper – The upper bound of the domain on which the recursion function is defined or orthogonal.
Full API documentation: NormalizingRecursiveExpansionNode

class
mdp.nodes.
Convolution2DNode
¶ Convolve input data with filter banks.
Convolution can be selected to be executed by linear filtering of the data, or in the frequency domain using a Discrete Fourier Transform.
Input data can be given as 3D data, each row being a 2D array to be convolved with the filters, or as 2D data, in which case the input_shape argument must be specified.
This node depends on scipy.
Variables: filters – Specifies a set of 2D filters that are convolved with the input data during execution.
Full API documentation: Convolution2DNode

class
mdp.nodes.
SGDRegressorScikitsLearnNode
¶ Linear model fitted by minimizing a regularized empirical loss with SGD.
This node has been automatically generated by wrapping the sklearn.linear_model.stochastic_gradient.SGDRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
SGD stands for Stochastic Gradient Descent: the gradient of the loss is estimated one sample at a time and the model is updated along the way with a decreasing strength schedule (aka learning rate).
The regularizer is a penalty added to the loss function that shrinks model parameters towards the zero vector using either the squared euclidean norm L2 or the absolute norm L1 or a combination of both (Elastic Net). If the parameter update crosses the 0.0 value because of the regularizer, the update is truncated to 0.0 to allow for learning sparse models and achieve online feature selection.
This implementation works with data represented as dense numpy arrays of floating point values for the features.
Read more in the User Guide.
Parameters
 loss : str, default: ‘squared_loss’
The loss function to be used. The possible values are ‘squared_loss’, ‘huber’, ‘epsilon_insensitive’, or ‘squared_epsilon_insensitive’
The ‘squared_loss’ refers to the ordinary least squares fit. ‘huber’ modifies ‘squared_loss’ to focus less on getting outliers correct by switching from squared to linear loss past a distance of epsilon. ‘epsilon_insensitive’ ignores errors less than epsilon and is linear past that; this is the loss function used in SVR. ‘squared_epsilon_insensitive’ is the same but becomes squared loss past a tolerance of epsilon.
 penalty : str, ‘none’, ‘l2’, ‘l1’, or ‘elasticnet’
 The penalty (aka regularization term) to be used. Defaults to ‘l2’ which is the standard regularizer for linear SVM models. ‘l1’ and ‘elasticnet’ might bring sparsity to the model (feature selection) not achievable with ‘l2’.
 alpha : float
 Constant that multiplies the regularization term. Defaults to 0.0001. Also used to compute learning_rate when set to ‘optimal’.
 l1_ratio : float
 The Elastic Net mixing parameter, with 0 <= l1_ratio <= 1. l1_ratio=0 corresponds to L2 penalty, l1_ratio=1 to L1. Defaults to 0.15.
 fit_intercept : bool
 Whether the intercept should be estimated or not. If False, the data is assumed to be already centered. Defaults to True.
 max_iter : int, optional
The maximum number of passes over the training data (aka epochs). It only impacts the behavior in the fit method, and not the partial_fit. Defaults to 5. Defaults to 1000 from 0.21, or if tol is not None.
New in version 0.19.
 tol : float or None, optional
The stopping criterion. If it is not None, the iterations will stop when (loss > previous_loss - tol). Defaults to None. Defaults to 1e-3 from 0.21.
New in version 0.19.
 shuffle : bool, optional
 Whether or not the training data should be shuffled after each epoch. Defaults to True.
 verbose : integer, optional
 The verbosity level.
 epsilon : float
 Epsilon in the epsilon-insensitive loss functions; only if loss is ‘huber’, ‘epsilon_insensitive’, or ‘squared_epsilon_insensitive’. For ‘huber’, determines the threshold at which it becomes less important to get the prediction exactly right. For epsilon-insensitive, any differences between the current prediction and the correct label are ignored if they are less than this threshold.
 random_state : int, RandomState instance or None, optional (default=None)
 The seed of the pseudo random number generator to use when shuffling the data. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
 learning_rate : string, optional
The learning rate schedule:
 ‘constant’:
 eta = eta0
 ‘optimal’:
 eta = 1.0 / (alpha * (t + t0)), where t0 is chosen by a heuristic proposed by Leon Bottou.
 ‘invscaling’: [default]
 eta = eta0 / pow(t, power_t)
 ‘adaptive’:
 eta = eta0, as long as the training loss keeps decreasing. Each time n_iter_no_change consecutive epochs fail to decrease the training loss by tol or fail to increase validation score by tol if early_stopping is True, the current learning rate is divided by 5.
 eta0 : double
 The initial learning rate for the ‘constant’, ‘invscaling’ or ‘adaptive’ schedules. The default value is 0.01.
 power_t : double
 The exponent for inverse scaling learning rate [default 0.25].
 early_stopping : bool, default=False
Whether to use early stopping to terminate training when validation score is not improving. If set to True, it will automatically set aside a fraction of training data as validation and terminate training when validation score is not improving by at least tol for n_iter_no_change consecutive epochs.
New in version 0.20.
 validation_fraction : float, default=0.1
The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used if early_stopping is True.
New in version 0.20.
 n_iter_no_change : int, default=5
Number of iterations with no improvement to wait before early stopping.
New in version 0.20.
 warm_start : bool, optional
When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. See the Glossary.
Repeatedly calling fit or partial_fit when warm_start is True can result in a different solution than when calling fit a single time because of the way the data is shuffled. If a dynamic learning rate is used, the learning rate is adapted depending on the number of samples already seen. Calling fit resets this counter, while partial_fit will result in increasing the existing counter.
 average : bool or int, optional
 When set to True, computes the averaged SGD weights and stores the result in the coef_ attribute. If set to an int greater than 1, averaging will begin once the total number of samples seen reaches average. So average=10 will begin averaging after seeing 10 samples.
 n_iter : int, optional
The number of passes over the training data (aka epochs). Defaults to None. Deprecated, will be removed in 0.21.
Changed in version 0.19: Deprecated
Attributes
coef_ : array, shape (n_features,) Weights assigned to the features.
intercept_ : array, shape (1,) The intercept term.
average_coef_ : array, shape (n_features,) Averaged weights assigned to the features.
average_intercept_ : array, shape (1,) The averaged intercept term.
n_iter_ : int The actual number of iterations to reach the stopping criterion.
Examples
>>> import numpy as np
>>> from sklearn import linear_model
>>> n_samples, n_features = 10, 5
>>> np.random.seed(0)
>>> y = np.random.randn(n_samples)
>>> X = np.random.randn(n_samples, n_features)
>>> clf = linear_model.SGDRegressor(max_iter=1000, tol=1e-3)
>>> clf.fit(X, y)
SGDRegressor(alpha=0.0001, average=False, early_stopping=False,
       epsilon=0.1, eta0=0.01, fit_intercept=True, l1_ratio=0.15,
       learning_rate='invscaling', loss='squared_loss', max_iter=1000,
       n_iter=None, n_iter_no_change=5, penalty='l2', power_t=0.25,
       random_state=None, shuffle=True, tol=0.001, validation_fraction=0.1,
       verbose=0, warm_start=False)
See also
Ridge, ElasticNet, Lasso, sklearn.svm.SVR
Full API documentation: SGDRegressorScikitsLearnNode
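The learning rate schedules listed under learning_rate can be written out as a small function. This is a sketch, not the wrapped library's code: the function name is hypothetical, the defaults for eta0, power_t and alpha are copied from the estimator representation in the Examples above, and t0 is left as a plain argument because the heuristic that chooses it is not described here. The ‘adaptive’ schedule is omitted since it depends on the training history.

```python
def learning_rate(schedule, t, eta0=0.01, power_t=0.25, alpha=0.0001, t0=1.0):
    """Return eta at update step t for the schedules described above."""
    if schedule == "constant":
        return eta0
    if schedule == "optimal":
        # t0 is a placeholder value here; the real heuristic is not shown.
        return 1.0 / (alpha * (t + t0))
    if schedule == "invscaling":
        return eta0 / pow(t, power_t)
    raise ValueError("unknown schedule: %r" % schedule)


eta_constant = learning_rate("constant", t=16)
eta_invscaling = learning_rate("invscaling", t=16)   # 0.01 / 16 ** 0.25
eta_optimal = learning_rate("optimal", t=9999)       # 1 / (0.0001 * 10000)
```

Under ‘invscaling’ the step size shrinks slowly with t, whereas ‘constant’ keeps it fixed at eta0.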

class
mdp.nodes.
PatchExtractorScikitsLearnNode
¶ Extracts patches from a collection of images.
This node has been automatically generated by wrapping the sklearn.feature_extraction.image.PatchExtractor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Read more in the User Guide.
Parameters
 patch_size : tuple of ints (patch_height, patch_width)
 the dimensions of one patch
 max_patches : integer or float, optional default is None
 The maximum number of patches per image to extract. If max_patches is a float in (0, 1), it is taken to mean a proportion of the total number of patches.
 random_state : int, RandomState instance or None, optional (default=None)
 If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
Full API documentation: PatchExtractorScikitsLearnNode

class
mdp.nodes.
TheilSenRegressorScikitsLearnNode
¶ Theil-Sen Estimator: robust multivariate regression model.
This node has been automatically generated by wrapping the sklearn.linear_model.theil_sen.TheilSenRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The algorithm calculates least square solutions on subsets with size n_subsamples of the samples in X. Any value of n_subsamples between the number of features and samples leads to an estimator with a compromise between robustness and efficiency. Since the number of least square solutions is “n_samples choose n_subsamples”, it can be extremely large and can therefore be limited with max_subpopulation. If this limit is reached, the subsets are chosen randomly. In a final step, the spatial median (or L1 median) of all least square solutions is calculated.
Read more in the User Guide.
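To get a feeling for how quickly “n_samples choose n_subsamples” grows, and when the max_subpopulation limit takes effect, the count can be computed directly. The helper below is hypothetical and only mirrors the description above; it does not claim to reproduce how the wrapped estimator derives its n_subpopulation_ attribute.

```python
from math import comb


def subset_count(n_samples, n_subsamples, max_subpopulation=10000):
    """Return (number of subsets considered, whether the limit was reached)."""
    total = comb(n_samples, n_subsamples)  # "n_samples choose n_subsamples"
    return min(total, max_subpopulation), total > max_subpopulation


# 200 samples, 2 features plus an intercept -> subsets of size 3.
considered_large, limited_large = subset_count(200, 3)

# A small problem stays below the limit, so every subset can be used.
considered_small, limited_small = subset_count(10, 3)
```

For the larger problem there are 1313400 possible subsets, so only a random subpopulation of the maximal size would be considered.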
Parameters
 fit_intercept : boolean, optional, default True
 Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations.
 copy_X : boolean, optional, default True
 If True, X will be copied; else, it may be overwritten.
 max_subpopulation : int, optional, default 1e4
 Instead of computing with a set of cardinality ‘n choose k’, where n is the number of samples and k is the number of subsamples (at least number of features), consider only a stochastic subpopulation of a given maximal size if ‘n choose k’ is larger than max_subpopulation. For other than small problem sizes this parameter will determine memory usage and runtime if n_subsamples is not changed.
 n_subsamples : int, optional, default None
 Number of samples to calculate the parameters. This is at least the number of features (plus 1 if fit_intercept=True) and the number of samples as a maximum. A lower number leads to a higher breakdown point and a low efficiency while a high number leads to a low breakdown point and a high efficiency. If None, take the minimum number of subsamples leading to maximal robustness. If n_subsamples is set to n_samples, Theil-Sen is identical to least squares.
 max_iter : int, optional, default 300
 Maximum number of iterations for the calculation of spatial median.
 tol : float, optional, default 1.e-3
 Tolerance when calculating spatial median.
 random_state : int, RandomState instance or None, optional, default None
 A random number generator instance to define the state of the random permutations generator. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
 n_jobs : int or None, optional (default=None)
 Number of CPUs to use during the cross validation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
 verbose : boolean, optional, default False
 Verbose mode when fitting the model.
Attributes
coef_ : array, shape = (n_features) Coefficients of the regression model (median of distribution).
intercept_ : float Estimated intercept of regression model.
breakdown_ : float Approximated breakdown point.
n_iter_ : int Number of iterations needed for the spatial median.
n_subpopulation_ : int Number of combinations taken into account from ‘n choose k’, where n is the number of samples and k is the number of subsamples.
Examples
>>> from sklearn.linear_model import TheilSenRegressor
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(
...     n_samples=200, n_features=2, noise=4.0, random_state=0)
>>> reg = TheilSenRegressor(random_state=0).fit(X, y)
>>> reg.score(X, y)
0.9884...
>>> reg.predict(X[:1,])
array([-31.5871...])
References
 Theil-Sen Estimators in a Multiple Linear Regression Model, 2009, Xin Dang, Hanxiang Peng, Xueqin Wang and Heping Zhang. http://home.olemiss.edu/~xdang/papers/MTSE.pdf
Full API documentation: TheilSenRegressorScikitsLearnNode

class
mdp.nodes.
SparseRandomProjectionScikitsLearnNode
¶ Reduce dimensionality through sparse random projection.
This node has been automatically generated by wrapping the sklearn.random_projection.SparseRandomProjection class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Sparse random matrix is an alternative to dense random projection matrix that guarantees similar embedding quality while being much more memory efficient and allowing faster computation of the projected data.
If we note s = 1 / density, the components of the random matrix are drawn from:
 -sqrt(s) / sqrt(n_components) with probability 1 / 2s
 0 with probability 1 - 1 / s
 +sqrt(s) / sqrt(n_components) with probability 1 / 2s
Read more in the User Guide.
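The three-valued distribution above can be sampled with the standard library alone. The function below is a hypothetical illustration of that distribution, not the wrapped class: it builds a dense list of lists instead of a sparse matrix and uses its own seeded random generator.

```python
import math
import random


def sparse_random_matrix(n_components, n_features, density, seed=0):
    """Draw entries from {-c, 0, +c} with c = sqrt(s) / sqrt(n_components)."""
    rng = random.Random(seed)
    s = 1.0 / density
    c = math.sqrt(s) / math.sqrt(n_components)
    rows = []
    for _ in range(n_components):
        row = []
        for _ in range(n_features):
            u = rng.random()
            if u < 1.0 / (2.0 * s):      # probability 1 / 2s
                row.append(-c)
            elif u < 1.0 / s:            # another 1 / 2s
                row.append(c)
            else:                        # remaining 1 - 1 / s
                row.append(0.0)
        rows.append(row)
    return rows


matrix = sparse_random_matrix(n_components=20, n_features=500, density=0.1)
flat = [v for row in matrix for v in row]
nonzero_fraction = sum(v != 0.0 for v in flat) / len(flat)
```

The fraction of non-zero entries should be close to the requested density, since each entry is non-zero with probability 1 / s.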
Parameters
 n_components : int or ‘auto’, optional (default = ‘auto’)
Dimensionality of the target projection space.
n_components can be automatically adjusted according to the number of samples in the dataset and the bound given by the Johnson-Lindenstrauss lemma. In that case the quality of the embedding is controlled by the eps parameter.
It should be noted that the Johnson-Lindenstrauss lemma can yield very conservative estimates of the required number of components as it makes no assumption on the structure of the dataset.
 density : float in range ]0, 1], optional (default=’auto’)
Ratio of non-zero components in the random projection matrix.
If density = ‘auto’, the value is set to the minimum density as recommended by Ping Li et al.: 1 / sqrt(n_features).
Use density = 1 / 3.0 if you want to reproduce the results from Achlioptas, 2001.
 eps : strictly positive float, optional, (default=0.1)
Parameter to control the quality of the embedding according to the Johnson-Lindenstrauss lemma when n_components is set to ‘auto’.
Smaller values lead to better embedding and higher number of dimensions (n_components) in the target projection space.
 dense_output : boolean, optional (default=False)
If True, ensure that the output of the random projection is a dense numpy array even if the input and random projection matrix are both sparse. In practice, if the number of components is small the number of zero components in the projected data will be very small and it will be more CPU and memory efficient to use a dense representation.
If False, the projected data uses a sparse representation if the input is sparse.
 random_state : int, RandomState instance or None, optional (default=None)
 Control the pseudo random number generator used to generate the matrix at fit time. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
Attributes
n_components_ : int Concrete number of components computed when n_components=”auto”.
components_ : CSR matrix with shape [n_components, n_features] Random matrix used for the projection.
density_ : float in range 0.0 - 1.0 Concrete density computed from when density = “auto”.
Examples
>>> import numpy as np
>>> from sklearn.random_projection import SparseRandomProjection
>>> np.random.seed(42)
>>> X = np.random.rand(100, 10000)
>>> transformer = SparseRandomProjection()
>>> X_new = transformer.fit_transform(X)
>>> X_new.shape
(100, 3947)
>>> # very few components are non-zero
>>> np.mean(transformer.components_ != 0)
0.0100...
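The output dimensionality of 3947 in the example above is consistent with a commonly quoted form of the Johnson-Lindenstrauss bound, n_components >= 4 * ln(n_samples) / (eps^2 / 2 - eps^3 / 3), evaluated for 100 samples and the default eps of 0.1. The sketch below assumes that form of the bound and truncation to an integer; `jl_min_dim` is a hypothetical helper written for this illustration.

```python
import math


def jl_min_dim(n_samples, eps):
    """Minimum number of components suggested by the bound assumed above."""
    denominator = eps ** 2 / 2.0 - eps ** 3 / 3.0
    return int(4.0 * math.log(n_samples) / denominator)


n_components = jl_min_dim(n_samples=100, eps=0.1)

# A smaller eps asks for a better embedding and therefore more dimensions.
n_components_tighter = jl_min_dim(n_samples=100, eps=0.05)
```

Note that the bound depends only on the number of samples and on eps, not on the original number of features.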
See Also
GaussianRandomProjection
References
[1] Ping Li, T. Hastie and K. W. Church, 2006, “Very Sparse Random Projections”. http://web.stanford.edu/~hastie/Papers/Ping/KDD06_rp.pdf
[2] D. Achlioptas, 2001, “Database-friendly random projections”, https://users.soe.ucsc.edu/~optas/papers/jl.pdf
Full API documentation: SparseRandomProjectionScikitsLearnNode

class
mdp.nodes.
LinearModelCVScikitsLearnNode
¶ This node has been automatically generated by wrapping the sklearn.linear_model.coordinate_descent.LinearModelCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Full API documentation: LinearModelCVScikitsLearnNode

class
mdp.nodes.
DictionaryLearningScikitsLearnNode
¶ Dictionary learning.
This node has been automatically generated by wrapping the sklearn.decomposition.dict_learning.DictionaryLearning class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Finds a dictionary (a set of atoms) that can best be used to represent data using a sparse code.
Solves the optimization problem:
(U^*, V^*) = argmin_{(U,V)} 0.5 * || Y - U V ||_2^2 + alpha * || U ||_1, with || V_k ||_2 = 1 for all 0 <= k < n_components
Read more in the User Guide.
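The objective above trades reconstruction error against the L1 norm of the code U. The sketch below evaluates it for tiny hand-written matrices stored as lists of lists; the function name and the example values are hypothetical and no solver is involved.

```python
def dictionary_objective(Y, U, V, alpha):
    """0.5 * ||Y - U V||_2^2 + alpha * ||U||_1 for lists of lists."""
    n_components, n_features = len(V), len(V[0])
    squared_error = 0.0
    for y_row, u_row in zip(Y, U):
        for j in range(n_features):
            reconstruction = sum(u_row[k] * V[k][j] for k in range(n_components))
            squared_error += (y_row[j] - reconstruction) ** 2
    l1_norm = sum(abs(c) for u_row in U for c in u_row)
    return 0.5 * squared_error + alpha * l1_norm


V = [[1.0, 0.0], [0.0, 1.0]]          # two unit-norm atoms
Y = [[2.0, 0.0], [0.0, 3.0]]

exact_code = [[2.0, 0.0], [0.0, 3.0]]     # perfect reconstruction
shrunk_code = [[1.0, 0.0], [0.0, 3.0]]    # smaller L1 norm, some error

objective_exact = dictionary_objective(Y, exact_code, V, alpha=1.0)
objective_shrunk = dictionary_objective(Y, shrunk_code, V, alpha=1.0)
```

With alpha = 1 the shrunk code has the lower objective, which illustrates how the penalty favours smaller, sparser codes over exact reconstruction.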
Parameters
 n_components : int,
 number of dictionary elements to extract
 alpha : float,
 sparsity controlling parameter
 max_iter : int,
 maximum number of iterations to perform
 tol : float,
 tolerance for numerical error
 fit_algorithm : {‘lars’, ‘cd’}
lars: uses the least angle regression method to solve the lasso problem (linear_model.lars_path)
cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). Lars will be faster if the estimated components are sparse.
New in version 0.17: cd coordinate descent method to improve speed.
 transform_algorithm : {‘lasso_lars’, ‘lasso_cd’, ‘lars’, ‘omp’, ‘threshold’}
Algorithm used to transform the data
lars: uses the least angle regression method (linear_model.lars_path)
lasso_lars: uses Lars to compute the Lasso solution
lasso_cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). lasso_lars will be faster if the estimated components are sparse.
omp: uses orthogonal matching pursuit to estimate the sparse solution
threshold: squashes to zero all coefficients less than alpha from the projection dictionary * X'
New in version 0.17: lasso_cd coordinate descent method to improve speed.
 transform_n_nonzero_coefs : int, 0.1 * n_features by default
 Number of nonzero coefficients to target in each column of the solution. This is only used by algorithm=’lars’ and algorithm=’omp’ and is overridden by alpha in the omp case.
 transform_alpha : float, 1. by default
 If algorithm=’lasso_lars’ or algorithm=’lasso_cd’, alpha is the penalty applied to the L1 norm. If algorithm=’threshold’, alpha is the absolute value of the threshold below which coefficients will be squashed to zero. If algorithm=’omp’, alpha is the tolerance parameter: the value of the reconstruction error targeted. In this case, it overrides n_nonzero_coefs.
 n_jobs : int or None, optional (default=None)
 Number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
 code_init : array of shape (n_samples, n_components),
 initial value for the code, for warm restart
 dict_init : array of shape (n_components, n_features),
 initial values for the dictionary, for warm restart
 verbose : bool, optional (default: False)
 To control the verbosity of the procedure.
 split_sign : bool, False by default
 Whether to split the sparse feature vector into the concatenation of its negative part and its positive part. This can improve the performance of downstream classifiers.
 random_state : int, RandomState instance or None, optional (default=None)
 If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
 positive_code : bool
Whether to enforce positivity when finding the code.
New in version 0.20.
 positive_dict : bool
Whether to enforce positivity when finding the dictionary
New in version 0.20.
Attributes
components_ : array, [n_components, n_features] dictionary atoms extracted from the data
error_ : array vector of errors at each iteration
n_iter_ : int Number of iterations run.
Notes
References:
J. Mairal, F. Bach, J. Ponce, G. Sapiro, 2009: Online dictionary learning for sparse coding (http://www.di.ens.fr/sierra/pdfs/icml09.pdf)
See also
SparseCoder, MiniBatchDictionaryLearning, SparsePCA, MiniBatchSparsePCA
Full API documentation: DictionaryLearningScikitsLearnNode

class
mdp.nodes.
MinMaxScalerScikitsLearnNode
¶ Transforms features by scaling each feature to a given range.
This node has been automatically generated by wrapping the sklearn.preprocessing.data.MinMaxScaler class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
This estimator scales and translates each feature individually such that it is in the given range on the training set, e.g. between zero and one.
The transformation is given by:
X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
X_scaled = X_std * (max - min) + min
where min, max = feature_range.
The transformation is calculated as:
X_scaled = scale * X + min - X.min(axis=0) * scale
where scale = (max - min) / (X.max(axis=0) - X.min(axis=0))
This transformation is often used as an alternative to zero mean, unit variance scaling.
Read more in the User Guide.
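The scaling formulas above can be followed step by step on a single feature column. The two helpers below are hypothetical and written only for this illustration; they assume the column is not constant, since a zero data range would lead to a division by zero.

```python
def min_max_fit(column, feature_range=(0.0, 1.0)):
    """Return (scale, min_) for one feature, following the formulas above."""
    lo, hi = feature_range
    data_min, data_max = min(column), max(column)
    scale = (hi - lo) / (data_max - data_min)   # assumes data_max > data_min
    min_ = lo - data_min * scale
    return scale, min_


def min_max_transform(column, scale, min_):
    """X_scaled = scale * X + min_, applied to every value of the feature."""
    return [scale * x + min_ for x in column]


column = [-1.0, -0.5, 0.0, 1.0]
scale, min_ = min_max_fit(column)
scaled = min_max_transform(column, scale, min_)

# Values outside the training range map outside the feature range.
unseen = min_max_transform([2.0], scale, min_)
```

Because the scale and offset are learned on the training data only, unseen values larger than the training maximum end up above the upper end of feature_range.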
Parameters
 feature_range : tuple (min, max), default=(0, 1)
 Desired range of transformed data.
 copy : boolean, optional, default True
 Set to False to perform inplace row normalization and avoid a copy (if the input is already a numpy array).
Attributes
min_ : ndarray, shape (n_features,) Per feature adjustment for minimum. Equivalent to min - X.min(axis=0) * self.scale_
scale_ : ndarray, shape (n_features,) Per feature relative scaling of the data. Equivalent to (max - min) / (X.max(axis=0) - X.min(axis=0))
New in version 0.17: scale_ attribute.
data_min_ : ndarray, shape (n_features,) Per feature minimum seen in the data
New in version 0.17: data_min_
data_max_ : ndarray, shape (n_features,) Per feature maximum seen in the data
New in version 0.17: data_max_
data_range_ : ndarray, shape (n_features,) Per feature range (data_max_ - data_min_) seen in the data
New in version 0.17: data_range_
Examples
>>> from sklearn.preprocessing import MinMaxScaler
>>> data = [[-1, 2], [-0.5, 6], [0, 10], [1, 18]]
>>> scaler = MinMaxScaler()
>>> print(scaler.fit(data))
MinMaxScaler(copy=True, feature_range=(0, 1))
>>> print(scaler.data_max_)
[ 1. 18.]
>>> print(scaler.transform(data))
[[0.   0.  ]
 [0.25 0.25]
 [0.5  0.5 ]
 [1.   1.  ]]
>>> print(scaler.transform([[2, 2]]))
[[1.5 0. ]]
See also
minmax_scale: Equivalent function without the estimator API.
Notes
NaNs are treated as missing values: disregarded in fit, and maintained in transform.
For a comparison of the different scalers, transformers, and normalizers, see examples/preprocessing/plot_all_scaling.py.
Full API documentation: MinMaxScalerScikitsLearnNode

class
mdp.nodes.
ElasticNetCVScikitsLearnNode
¶ Elastic Net model with iterative fitting along a regularization path.
This node has been automatically generated by wrapping the sklearn.linear_model.coordinate_descent.ElasticNetCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
See glossary entry for cross-validation estimator.
Read more in the User Guide.
Parameters
 l1_ratio : float or array of floats, optional
 float between 0 and 1 passed to ElasticNet (scaling between l1 and l2 penalties). For l1_ratio = 0 the penalty is an L2 penalty. For l1_ratio = 1 it is an L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2. This parameter can be a list, in which case the different values are tested by cross-validation and the one giving the best prediction score is used. Note that a good choice of list of values for l1_ratio is often to put more values close to 1 (i.e. Lasso) and less close to 0 (i.e. Ridge), as in [.1, .5, .7, .9, .95, .99, 1]
 eps : float, optional
 Length of the path. eps=1e-3 means that alpha_min / alpha_max = 1e-3.
 n_alphas : int, optional
 Number of alphas along the regularization path, used for each l1_ratio.
 alphas : numpy array, optional
 List of alphas where to compute the models. If None alphas are set automatically
 fit_intercept : boolean
 whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
 normalize : boolean, optional, default False
 This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.
 precompute : True | False | ‘auto’ | array-like
 Whether to use a precomputed Gram matrix to speed up calculations. If set to 'auto' let us decide. The Gram matrix can also be passed as argument.
 max_iter : int, optional
 The maximum number of iterations
 tol : float, optional
 The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.
 cv : int, cross-validation generator or an iterable, optional
Determines the cross-validation splitting strategy. Possible inputs for cv are:
 None, to use the default 3-fold cross-validation,
 integer, to specify the number of folds.
 CV splitter,
 An iterable yielding (train, test) splits as arrays of indices.
For integer/None inputs, KFold is used.
Refer to the User Guide for the various cross-validation strategies that can be used here.
Changed in version 0.20: cv default value if None will change from 3-fold to 5-fold in v0.22.
 copy_X : boolean, optional, default True
 If True, X will be copied; else, it may be overwritten.
 verbose : bool or integer
 Amount of verbosity.
 n_jobs : int or None, optional (default=None)
 Number of CPUs to use during the cross validation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
 positive : bool, optional
 When set to True, forces the coefficients to be positive.
 random_state : int, RandomState instance or None, optional, default None
 The seed of the pseudo random number generator that selects a random feature to update. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random. Used when selection == ‘random’.
 selection : str, default ‘cyclic’
 If set to ‘random’, a random coefficient is updated every iteration rather than looping over features sequentially by default. This (setting to ‘random’) often leads to significantly faster convergence especially when tol is higher than 1e-4.
Attributes
alpha_
: float The amount of penalization chosen by cross validation
l1_ratio_
: float The compromise between l1 and l2 penalization chosen by cross validation
coef_
: array, shape (n_features,)  (n_targets, n_features) Parameter vector (w in the cost function formula),
intercept_
: float  array, shape (n_targets, n_features) Independent term in the decision function.
mse_path_
: array, shape (n_l1_ratio, n_alpha, n_folds) Mean square error for the test set on each fold, varying l1_ratio and alpha.
alphas_
: numpy array, shape (n_alphas,) or (n_l1_ratio, n_alphas) The grid of alphas used for fitting, for each l1_ratio.
n_iter_
: int Number of iterations run by the coordinate descent solver to reach the specified tolerance for the optimal alpha.
Examples
>>> from sklearn.linear_model import ElasticNetCV
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_features=2, random_state=0)
>>> regr = ElasticNetCV(cv=5, random_state=0)
>>> regr.fit(X, y)
ElasticNetCV(alphas=None, copy_X=True, cv=5, eps=0.001, fit_intercept=True,
       l1_ratio=0.5, max_iter=1000, n_alphas=100, n_jobs=None,
       normalize=False, positive=False, precompute='auto', random_state=0,
       selection='cyclic', tol=0.0001, verbose=0)
>>> print(regr.alpha_)
0.1994727942696716
>>> print(regr.intercept_)
0.398...
>>> print(regr.predict([[0, 0]]))
[0.398...]
Notes
For an example, see examples/linear_model/plot_lasso_model_selection.py.
To avoid unnecessary memory duplication, the X argument of the fit method should be passed directly as a Fortran-contiguous numpy array.
The parameter l1_ratio corresponds to alpha in the glmnet R package while alpha corresponds to the lambda parameter in glmnet. More specifically, the optimization objective is:
1 / (2 * n_samples) * ||y - Xw||^2_2 + alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||^2_2
If you are interested in controlling the L1 and L2 penalty separately, keep in mind that this is equivalent to:
a * L1 + b * L2
for:
alpha = a + b and l1_ratio = a / (a + b).
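The mapping above can be checked with a few lines of Python. This is a small sketch; the helper name enet_params_from_penalties is hypothetical. Under the objective stated above, a multiplies the L1 norm of w and b multiplies 0.5 times its squared L2 norm.

```python
def enet_params_from_penalties(a, b):
    """Map separate L1 weight `a` and L2 weight `b` to (alpha, l1_ratio)."""
    if a < 0 or b < 0 or a + b == 0:
        raise ValueError("a and b must be non-negative and not both zero")
    alpha = a + b
    l1_ratio = a / (a + b)
    return alpha, l1_ratio


alpha, l1_ratio = enet_params_from_penalties(a=0.75, b=0.25)

# Recover the two penalty weights from (alpha, l1_ratio).
l1_weight = alpha * l1_ratio          # multiplies ||w||_1
l2_weight = alpha * (1 - l1_ratio)    # multiplies 0.5 * ||w||^2_2
```

Setting b = 0 gives l1_ratio = 1, and a = 0 gives l1_ratio = 0, the two ends of the l1_ratio range.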
See also
enet_path ElasticNet
Full API documentation: ElasticNetCVScikitsLearnNode

class
mdp.nodes.
RBFSamplerScikitsLearnNode
¶ Approximates the feature map of an RBF kernel by Monte Carlo approximation of its Fourier transform. This node has been automatically generated by wrapping the sklearn.kernel_approximation.RBFSampler class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
It implements a variant of Random Kitchen Sinks [1].
Read more in the User Guide.
Parameters
 gamma : float
 Parameter of RBF kernel: exp(-gamma * x^2)
 n_components : int
 Number of Monte Carlo samples per original feature. Equals the dimensionality of the computed feature space.
 random_state : int, RandomState instance or None, optional (default=None)
 If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
Examples
>>> from sklearn.kernel_approximation import RBFSampler
>>> from sklearn.linear_model import SGDClassifier
>>> X = [[0, 0], [1, 1], [1, 0], [0, 1]]
>>> y = [0, 0, 1, 1]
>>> rbf_feature = RBFSampler(gamma=1, random_state=1)
>>> X_features = rbf_feature.fit_transform(X)
>>> clf = SGDClassifier(max_iter=5, tol=1e-3)
>>> clf.fit(X_features, y)
SGDClassifier(alpha=0.0001, average=False, class_weight=None,
       early_stopping=False, epsilon=0.1, eta0=0.0, fit_intercept=True,
       l1_ratio=0.15, learning_rate='optimal', loss='hinge', max_iter=5,
       n_iter=None, n_iter_no_change=5, n_jobs=None, penalty='l2',
       power_t=0.5, random_state=None, shuffle=True, tol=0.001,
       validation_fraction=0.1, verbose=0, warm_start=False)
>>> clf.score(X_features, y)
1.0
Notes
See “Random Features for Large-Scale Kernel Machines” by A. Rahimi and Benjamin Recht.
[1] “Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning” by A. Rahimi and Benjamin Recht. (http://people.eecs.berkeley.edu/~brecht/papers/08.rah.rec.nips.pdf)
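To make the idea of a Monte Carlo approximation of the kernel's Fourier transform concrete, here is a standard-library sketch of random Fourier features for the kernel exp(-gamma * ||x - y||^2). It illustrates the general technique only and is not the wrapped class's implementation; the names make_random_fourier_map and rbf_kernel are hypothetical.

```python
import math
import random


def rbf_kernel(x, y, gamma):
    """Exact RBF kernel exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)


def make_random_fourier_map(n_features, n_components, gamma, seed=0):
    """Return a function mapping a point to n_components cosine features."""
    rng = random.Random(seed)
    # Frequencies come from N(0, 2 * gamma), offsets from U(0, 2 * pi).
    scale = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_features)]
               for _ in range(n_components)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_components)]
    norm = math.sqrt(2.0 / n_components)

    def feature_map(x):
        return [norm * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return feature_map


phi = make_random_fourier_map(n_features=2, n_components=2000, gamma=1.0)
x, y = [0.0, 0.0], [0.5, 0.25]
# The inner product of the random features approximates the exact kernel.
approx = sum(a * b for a, b in zip(phi(x), phi(y)))
exact = rbf_kernel(x, y, gamma=1.0)
```

The approximation error shrinks roughly as one over the square root of n_components, which is why that parameter trades accuracy against the size of the feature space.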
Full API documentation: RBFSamplerScikitsLearnNode

class
mdp.nodes.
OrthogonalMatchingPursuitCVScikitsLearnNode
¶ Cross-validated Orthogonal Matching Pursuit model (OMP). This node has been automatically generated by wrapping the sklearn.linear_model.omp.OrthogonalMatchingPursuitCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
See glossary entry for cross-validation estimator.
Read more in the User Guide.
Parameters
 copy : bool, optional
 Whether the design matrix X must be copied by the algorithm. A false value is only helpful if X is already Fortran-ordered, otherwise a copy is made anyway.
 fit_intercept : boolean, optional
 Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
 normalize : boolean, optional, default True
 This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.
 max_iter : integer, optional
 Maximum number of iterations to perform, and therefore the maximum number of features to include. Defaults to 10% of n_features, but at least 5 if available.
 cv : int, cross-validation generator or an iterable, optional
Determines the cross-validation splitting strategy. Possible inputs for cv are:
 None, to use the default 3-fold cross-validation,
 integer, to specify the number of folds,
 CV splitter,
 an iterable yielding (train, test) splits as arrays of indices.
For integer/None inputs, KFold is used.
Refer to the User Guide for the various cross-validation strategies that can be used here.
Changed in version 0.20: the default value of cv, if None, will change from 3-fold to 5-fold in v0.22.
 n_jobs : int or None, optional (default=None)
 Number of CPUs to use during the cross validation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
 verbose : boolean or integer, optional
 Sets the verbosity amount.
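The default for max_iter described above ("10% of n_features but at least 5 if available") can be read as the following rule. This is only a sketch of one reading of that sentence: the helper name default_omp_max_iter is hypothetical and the integer rounding is an assumption.

```python
def default_omp_max_iter(n_features):
    """10% of n_features, raised to 5 when possible, never above n_features."""
    if n_features < 1:
        raise ValueError("n_features must be positive")
    ten_percent = n_features // 10  # assumption: round down
    return min(max(ten_percent, 5), n_features)
```

With 100 features this gives 10 iterations, with 30 features it gives 5, and with only 3 features it is capped at 3.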
Attributes
intercept_
: float or array, shape (n_targets,) Independent term in decision function.
coef_
: array, shape (n_features,) or (n_targets, n_features) Parameter vector (w in the problem formulation).
n_nonzero_coefs_
: int Estimated number of non-zero coefficients giving the best mean squared error over the cross-validation folds.
n_iter_
: int or array-like Number of active features across every target for the model refit with the best hyperparameters obtained by cross-validating across all folds.
Examples
>>> from sklearn.linear_model import OrthogonalMatchingPursuitCV
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_features=100, n_informative=10,
...                        noise=4, random_state=0)
>>> reg = OrthogonalMatchingPursuitCV(cv=5).fit(X, y)
>>> reg.score(X, y)
0.9991...
>>> reg.n_nonzero_coefs_
10
>>> reg.predict(X[:1,])
array([78.3854...])
See also
orthogonal_mp, orthogonal_mp_gram, lars_path, Lars, LassoLars, OrthogonalMatchingPursuit, LarsCV, LassoLarsCV, decomposition.sparse_encode
Full API documentation: OrthogonalMatchingPursuitCVScikitsLearnNode

class
mdp.nodes.
SkewedChi2SamplerScikitsLearnNode
¶ Approximates the feature map of the “skewed chi-squared” kernel by Monte Carlo approximation of its Fourier transform. This node has been automatically generated by wrapping the sklearn.kernel_approximation.SkewedChi2Sampler class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Read more in the User Guide.
Parameters
 skewedness : float
 “skewedness” parameter of the kernel. Needs to be cross-validated.
 n_components : int
 Number of Monte Carlo samples per original feature. Equals the dimensionality of the computed feature space.
 random_state : int, RandomState instance or None, optional (default=None)
 If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
Examples
>>> from sklearn.kernel_approximation import SkewedChi2Sampler
>>> from sklearn.linear_model import SGDClassifier
>>> X = [[0, 0], [1, 1], [1, 0], [0, 1]]
>>> y = [0, 0, 1, 1]
>>> chi2_feature = SkewedChi2Sampler(skewedness=.01,
...                                  n_components=10,
...                                  random_state=0)
>>> X_features = chi2_feature.fit_transform(X, y)
>>> clf = SGDClassifier(max_iter=10, tol=1e-3)
>>> clf.fit(X_features, y)
SGDClassifier(alpha=0.0001, average=False, class_weight=None,
       early_stopping=False, epsilon=0.1, eta0=0.0, fit_intercept=True,
       l1_ratio=0.15, learning_rate='optimal', loss='hinge', max_iter=10,
       n_iter=None, n_iter_no_change=5, n_jobs=None, penalty='l2',
       power_t=0.5, random_state=None, shuffle=True, tol=0.001,
       validation_fraction=0.1, verbose=0, warm_start=False)
>>> clf.score(X_features, y)
1.0
References
See “Random Fourier Approximations for Skewed Multiplicative Histogram Kernels” by Fuxin Li, Catalin Ionescu and Cristian Sminchisescu.
See also
 AdditiveChi2Sampler : A different approach for approximating an additive variant of the chi squared kernel.
 sklearn.metrics.pairwise.chi2_kernel : The exact chi squared kernel.
Full API documentation: SkewedChi2SamplerScikitsLearnNode

class
mdp.nodes.
RandomTreesEmbeddingScikitsLearnNode
¶ An ensemble of totally random trees. This node has been automatically generated by wrapping the sklearn.ensemble.forest.RandomTreesEmbedding class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
An unsupervised transformation of a dataset to a high-dimensional sparse representation. A datapoint is coded according to which leaf of each tree it is sorted into. Using a one-hot encoding of the leaves, this leads to a binary coding with as many ones as there are trees in the forest.
The dimensionality of the resulting representation is n_out <= n_estimators * max_leaf_nodes. If max_leaf_nodes == None, the number of leaf nodes is at most n_estimators * 2 ** max_depth.
Read more in the User Guide.
Parameters
 n_estimators : integer, optional (default=10)
Number of trees in the forest.
Changed in version 0.20: The default value of n_estimators will change from 10 in version 0.20 to 100 in version 0.22.
 max_depth : integer, optional (default=5)
 The maximum depth of each tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
 min_samples_split : int, float, optional (default=2)
The minimum number of samples required to split an internal node:
 If int, then consider min_samples_split as the minimum number.
 If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.
Changed in version 0.18: Added float values for fractions.
 min_samples_leaf : int, float, optional (default=1)
The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.
 If int, then consider min_samples_leaf as the minimum number.
 If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.
Changed in version 0.18: Added float values for fractions.
 min_weight_fraction_leaf : float, optional (default=0.)
 The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.
 max_leaf_nodes : int or None, optional (default=None)
 Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.
 min_impurity_decrease : float, optional (default=0.)
A node will be split if this split induces a decrease of the impurity greater than or equal to this value.
The weighted impurity decrease equation is the following:
N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.
N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.
New in version 0.19.
 min_impurity_split : float, (default=1e-7)
Threshold for early stopping in tree growth. A node will split if its impurity is above the threshold, otherwise it is a leaf.
Deprecated since version 0.19: min_impurity_split has been deprecated in favor of min_impurity_decrease in 0.19. The default value of min_impurity_split will change from 1e-7 to 0 in 0.23 and it will be removed in 0.25. Use min_impurity_decrease instead.
 sparse_output : bool, optional (default=True)
 Whether or not to return a sparse CSR matrix, as default behavior, or to return a dense array compatible with dense pipeline operators.
 n_jobs : int or None, optional (default=None)
 The number of jobs to run in parallel for both fit and predict. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
 random_state : int, RandomState instance or None, optional (default=None)
 If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
 verbose : int, optional (default=0)
 Controls the verbosity when fitting and predicting.
 warm_start : bool, optional (default=False)
 When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new forest. See the Glossary.
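The weighted impurity decrease used by min_impurity_decrease (see that parameter above) translates directly into code. The sketch below uses a hypothetical function name and hypothetical sample counts and impurities purely to illustrate the arithmetic.

```python
def weighted_impurity_decrease(n_total, n_node, n_left, n_right,
                               impurity, left_impurity, right_impurity):
    """N_t / N * (impurity - N_t_R / N_t * right - N_t_L / N_t * left)."""
    return (n_node / n_total) * (
        impurity
        - (n_right / n_node) * right_impurity
        - (n_left / n_node) * left_impurity
    )


# Hypothetical node: 40 of 100 samples, split into 10 (left) and 30 (right).
decrease = weighted_impurity_decrease(
    n_total=100, n_node=40, n_left=10, n_right=30,
    impurity=0.5, left_impurity=0.0, right_impurity=0.4,
)

# The node is split only if the decrease reaches the threshold.
min_impurity_decrease = 0.05
should_split = decrease >= min_impurity_decrease
```

Here the decrease is about 0.08, so the split would be accepted with a threshold of 0.05 and rejected with a threshold of 0.1.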
Attributes
estimators_
: list of DecisionTreeClassifier The collection of fitted sub-estimators.
References
[1] P. Geurts, D. Ernst, and L. Wehenkel, “Extremely randomized trees”, Machine Learning, 63(1), 3-42, 2006.
[2] Moosmann, F. and Triggs, B. and Jurie, F. “Fast discriminative visual codebooks using randomized clustering forests”, NIPS 2007.
Full API documentation: RandomTreesEmbeddingScikitsLearnNode

class
mdp.nodes.
PerceptronScikitsLearnNode
¶ Perceptron. This node has been automatically generated by wrapping the sklearn.linear_model.perceptron.Perceptron class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Read more in the User Guide.
Parameters
 penalty : None, ‘l2’ or ‘l1’ or ‘elasticnet’
 The penalty (aka regularization term) to be used. Defaults to None.
 alpha : float
 Constant that multiplies the regularization term if regularization is used. Defaults to 0.0001
 fit_intercept : bool
 Whether the intercept should be estimated or not. If False, the data is assumed to be already centered. Defaults to True.
 max_iter : int, optional
The maximum number of passes over the training data (aka epochs). It only impacts the behavior in the fit method, and not the partial_fit. Defaults to 5. Defaults to 1000 from 0.21, or if tol is not None.
New in version 0.19.
 tol : float or None, optional
The stopping criterion. If it is not None, the iterations will stop when (loss > previous_loss - tol). Defaults to None. Defaults to 1e-3 from 0.21.
New in version 0.19.
 shuffle : bool, optional, default True
 Whether or not the training data should be shuffled after each epoch.
 verbose : integer, optional
 The verbosity level
 eta0 : double
 Constant by which the updates are multiplied. Defaults to 1.
 n_jobs : int or None, optional (default=None)
 The number of CPUs to use to do the OVA (One Versus All, for multi-class problems) computation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
 random_state : int, RandomState instance or None, optional, default None
 The seed of the pseudo random number generator to use when shuffling the data. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
 early_stopping : bool, default=False
Whether to use early stopping to terminate training when validation score is not improving. If set to True, it will automatically set aside a fraction of training data as validation and terminate training when validation score is not improving by at least tol for n_iter_no_change consecutive epochs.
New in version 0.20.
 validation_fraction : float, default=0.1
The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used if early_stopping is True.
New in version 0.20.
 n_iter_no_change : int, default=5
Number of iterations with no improvement to wait before early stopping.
New in version 0.20.
 class_weight : dict, {class_label: weight} or “balanced” or None, optional
Preset for the class_weight fit parameter.
Weights associated with classes. If not given, all classes are supposed to have weight one.
The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as
n_samples / (n_classes * np.bincount(y))
 warm_start : bool, optional
 When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. See the Glossary.
 n_iter : int, optional
The number of passes over the training data (aka epochs). Defaults to None. Deprecated, will be removed in 0.21.
Changed in version 0.19: Deprecated
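The “balanced” mode of class_weight described above uses n_samples / (n_classes * np.bincount(y)). A dependency-free sketch of the same arithmetic is shown below; collections.Counter stands in for np.bincount so that arbitrary hashable labels work, and the function name balanced_class_weights is hypothetical.

```python
from collections import Counter


def balanced_class_weights(y):
    """Return {class_label: n_samples / (n_classes * count(class_label))}."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Three samples of class 0 and one of class 1: the rare class weighs more.
weights = balanced_class_weights([0, 0, 0, 1])
```

With these weights every class contributes the same total weight (n_samples / n_classes), regardless of how often it occurs.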
Attributes
coef_
: array, shape = [1, n_features] if n_classes == 2 else [n_classes, n_features] Weights assigned to the features.
intercept_
: array, shape = [1] if n_classes == 2 else [n_classes] Constants in decision function.
n_iter_
: int The actual number of iterations to reach the stopping criterion. For multiclass fits, it is the maximum over every binary fit.
Notes
Perceptron is a classification algorithm which shares the same underlying implementation with SGDClassifier. In fact, Perceptron() is equivalent to SGDClassifier(loss="perceptron", eta0=1, learning_rate="constant", penalty=None).
Examples
>>> from sklearn.datasets import load_digits
>>> from sklearn.linear_model import Perceptron
>>> X, y = load_digits(return_X_y=True)
>>> clf = Perceptron(tol=1e-3, random_state=0)
>>> clf.fit(X, y)
Perceptron(alpha=0.0001, class_weight=None, early_stopping=False, eta0=1.0,
      fit_intercept=True, max_iter=None, n_iter=None, n_iter_no_change=5,
      n_jobs=None, penalty=None, random_state=0, shuffle=True, tol=0.001,
      validation_fraction=0.1, verbose=0, warm_start=False)
>>> clf.score(X, y)
0.946...
See also
SGDClassifier
References
https://en.wikipedia.org/wiki/Perceptron and references therein.
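For readers unfamiliar with the algorithm, the classic mistake-driven perceptron update with a constant learning rate eta0 and no penalty can be sketched in a few lines. This is a textbook illustration for binary labels in {-1, +1}, not the wrapped implementation (which shares its code with SGDClassifier); the function names are hypothetical.

```python
def train_perceptron(X, y, eta0=1.0, max_iter=100):
    """Fit weights and bias with the classic update; labels must be -1 or +1."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(max_iter):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            activation = sum(w * v for w, v in zip(weights, x_i)) + bias
            if y_i * activation <= 0:
                # Misclassified (or on the boundary): move toward the sample.
                weights = [w + eta0 * y_i * v for w, v in zip(weights, x_i)]
                bias += eta0 * y_i
                mistakes += 1
        if mistakes == 0:
            break  # a full pass without errors: training data is separated
    return weights, bias


def predict(weights, bias, x):
    """Return +1 or -1 depending on the side of the decision boundary."""
    return 1 if sum(w * v for w, v in zip(weights, x)) + bias > 0 else -1


X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [-1, -1, -1, 1]  # logical AND, which is linearly separable
weights, bias = train_perceptron(X, y)
predictions = [predict(weights, bias, x) for x in X]
```

On linearly separable data such as this the loop stops after a finite number of updates; on non-separable data it simply runs for max_iter passes.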
Full API documentation: PerceptronScikitsLearnNode

class
mdp.nodes.
RidgeClassifierScikitsLearnNode
¶ Classifier using Ridge regression. This node has been automatically generated by wrapping the sklearn.linear_model.ridge.RidgeClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Read more in the User Guide.
Parameters
 alpha : float
 Regularization strength; must be a positive float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization. Alpha corresponds to C^-1 in other linear models such as LogisticRegression or LinearSVC.
 fit_intercept : boolean
 Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
 normalize : boolean, optional, default False
 This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.
 copy_X : boolean, optional, default True
 If True, X will be copied; else, it may be overwritten.
 max_iter : int, optional
 Maximum number of iterations for conjugate gradient solver. The default value is determined by scipy.sparse.linalg.
 tol : float
 Precision of the solution.
 class_weight : dict or ‘balanced’, optional
Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one.
The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as
n_samples / (n_classes * np.bincount(y))
 solver : {‘auto’, ‘svd’, ‘cholesky’, ‘lsqr’, ‘sparse_cg’, ‘sag’, ‘saga’}
Solver to use in the computational routines:
‘auto’ chooses the solver automatically based on the type of data.
‘svd’ uses a Singular Value Decomposition of X to compute the Ridge coefficients. More stable for singular matrices than ‘cholesky’.
‘cholesky’ uses the standard scipy.linalg.solve function to obtain a closed-form solution.
‘sparse_cg’ uses the conjugate gradient solver as found in scipy.sparse.linalg.cg. As an iterative algorithm, this solver is more appropriate than ‘cholesky’ for large-scale data (possibility to set tol and max_iter).
‘lsqr’ uses the dedicated regularized least-squares routine scipy.sparse.linalg.lsqr. It is the fastest and uses an iterative procedure.
‘sag’ uses a Stochastic Average Gradient descent, and ‘saga’ uses its unbiased and more flexible version named SAGA. Both methods use an iterative procedure, and are often faster than other solvers when both n_samples and n_features are large. Note that ‘sag’ and ‘saga’ fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.
New in version 0.17: Stochastic Average Gradient descent solver.
New in version 0.19: SAGA solver.
 random_state : int, RandomState instance or None, optional, default None
 The seed of the pseudo random number generator to use when shuffling the data. If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random. Used when solver == ‘sag’.
Attributes
coef_
: array, shape (n_features,) or (n_classes, n_features) Weight vector(s).
intercept_
: float | array, shape = (n_targets,) Independent term in decision function. Set to 0.0 if fit_intercept = False.
n_iter_
: array or None, shape (n_targets,) Actual number of iterations for each target. Available only for sag and lsqr solvers. Other solvers will return None.
Examples
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.linear_model import RidgeClassifier
>>> X, y = load_breast_cancer(return_X_y=True)
>>> clf = RidgeClassifier().fit(X, y)
>>> clf.score(X, y)
0.9595...
See also
Ridge : Ridge regression
RidgeClassifierCV : Ridge classifier with built-in cross validation
Notes
For multi-class classification, n_class classifiers are trained in a one-versus-all approach. Concretely, this is implemented by taking advantage of the multi-variate response support in Ridge.
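The one-versus-all reduction to a multi-output regression mentioned in the note can be pictured as follows: each class gets its own target column, and the predicted class is the column with the largest regression score. The sketch below only shows the label encoding and decoding. The +1/-1 coding and the helper names are assumptions made for illustration, and the regression step itself is omitted.

```python
def one_versus_all_targets(y):
    """Encode labels as one +1/-1 target column per class (sorted order)."""
    classes = sorted(set(y))
    targets = [[1.0 if label == c else -1.0 for c in classes] for label in y]
    return classes, targets


def decode(classes, scores):
    """Pick the class whose regression score is largest."""
    best = max(range(len(classes)), key=lambda k: scores[k])
    return classes[best]


classes, Y = one_versus_all_targets(["cat", "dog", "bird", "dog"])
# Hypothetical per-class scores for one sample, as a regressor might return.
label = decode(classes, [0.1, -0.3, 0.7])
```

A multi-output regressor fitted on Y then yields one score per class for each sample, which decode turns back into a label.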
Full API documentation: RidgeClassifierScikitsLearnNode

class
mdp.nodes.
LinearSVRScikitsLearnNode
¶ Linear Support Vector Regression. This node has been automatically generated by wrapping the sklearn.svm.classes.LinearSVR class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Similar to SVR with parameter kernel=’linear’, but implemented in terms of liblinear rather than libsvm, so it has more flexibility in the choice of penalties and loss functions and should scale better to large numbers of samples.
This class supports both dense and sparse input.
Read more in the User Guide.
Parameters
 epsilon : float, optional (default=0.1)
 Epsilon parameter in the epsilon-insensitive loss function. Note that the value of this parameter depends on the scale of the target variable y. If unsure, set epsilon=0.
 tol : float, optional (default=1e-4)
 Tolerance for stopping criteria.
 C : float, optional (default=1.0)
 Penalty parameter C of the error term. The penalty is a squared l2 penalty. The bigger this parameter, the less regularization is used.
 loss : string, optional (default=’epsilon_insensitive’)
 Specifies the loss function. The epsilon-insensitive loss (standard SVR) is the L1 loss, while the squared epsilon-insensitive loss (‘squared_epsilon_insensitive’) is the L2 loss.
 fit_intercept : boolean, optional (default=True)
 Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (i.e. data is expected to be already centered).
 intercept_scaling : float, optional (default=1)
 When self.fit_intercept is True, instance vector x becomes [x, self.intercept_scaling], i.e. a “synthetic” feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic feature weight. Note that the synthetic feature weight is subject to l1/l2 regularization like all other features. To lessen the effect of regularization on the synthetic feature weight (and therefore on the intercept), intercept_scaling has to be increased.
 dual : bool, (default=True)
 Select the algorithm to either solve the dual or primal optimization problem. Prefer dual=False when n_samples > n_features.
 verbose : int, (default=0)
 Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in liblinear that, if enabled, may not work properly in a multithreaded context.
 random_state : int, RandomState instance or None, optional (default=None)
 The seed of the pseudo random number generator to use when shuffling the data. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
 max_iter : int, (default=1000)
 The maximum number of iterations to be run.
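The two choices for the loss parameter above differ only in how residuals outside the epsilon tube are penalized. A minimal sketch of both per-sample losses follows; the function names mirror the option strings but are otherwise hypothetical.

```python
def epsilon_insensitive(y_true, y_pred, epsilon):
    """L1 variant: errors inside the epsilon tube cost nothing."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def squared_epsilon_insensitive(y_true, y_pred, epsilon):
    """L2 variant: the same margin, but the excess error is squared."""
    return epsilon_insensitive(y_true, y_pred, epsilon) ** 2


inside_tube = epsilon_insensitive(1.0, 1.05, epsilon=0.1)
outside_l1 = epsilon_insensitive(3.0, 1.0, epsilon=0.5)
outside_l2 = squared_epsilon_insensitive(3.0, 1.0, epsilon=0.5)
```

Because epsilon is compared directly with the residual, a sensible value depends on the scale of y, which is the point made in the description of the epsilon parameter.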
Attributes
coef_
: array, shape = [n_features] if n_classes == 2 else [n_classes, n_features] Weights assigned to the features (coefficients in the primal problem). This is only available in the case of a linear kernel.
coef_ is a read-only property derived from raw_coef_ that follows the internal memory layout of liblinear.
intercept_
: array, shape = [1] if n_classes == 2 else [n_classes] Constants in decision function.
Examples
>>> from sklearn.svm import LinearSVR
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_features=4, random_state=0)
>>> regr = LinearSVR(random_state=0, tol=1e-5)
>>> regr.fit(X, y)
LinearSVR(C=1.0, dual=True, epsilon=0.0, fit_intercept=True,
     intercept_scaling=1.0, loss='epsilon_insensitive', max_iter=1000,
     random_state=0, tol=1e-05, verbose=0)
>>> print(regr.coef_)
[16.35... 26.91... 42.30... 60.47...]
>>> print(regr.intercept_)
[4.29...]
>>> print(regr.predict([[0, 0, 0, 0]]))
[4.29...]
See also
 LinearSVC
 Implementation of Support Vector Machine classifier using the same library as this class (liblinear).
 SVR
 Implementation of Support Vector Machine regression using libsvm: the kernel can be non-linear but its SMO algorithm does not scale to large number of samples as LinearSVC does.
 sklearn.linear_model.SGDRegressor
 SGDRegressor can optimize the same cost function as LinearSVR by adjusting the penalty and loss parameters. In addition it requires less memory, allows incremental (online) learning, and implements various loss functions and regularization regimes.
Full API documentation: LinearSVRScikitsLearnNode

class
mdp.nodes.
OrdinalEncoderScikitsLearnNode
¶ Encode categorical features as an integer array. This node has been automatically generated by wrapping the sklearn.preprocessing._encoders.OrdinalEncoder class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The input to this transformer should be an array-like of integers or strings, denoting the values taken on by categorical (discrete) features. The features are converted to ordinal integers. This results in a single column of integers (0 to n_categories - 1) per feature.
Read more in the User Guide.
Parameters
 categories : ‘auto’ or a list of lists/arrays of values.
Categories (unique values) per feature:
 ‘auto’ : Determine categories automatically from the training data.
 list : categories[i] holds the categories expected in the ith column. The passed categories should not mix strings and numeric values, and should be sorted in case of numeric values.
The used categories can be found in the categories_ attribute.
 dtype : number type, default np.float64
 Desired dtype of output.
Attributes
categories_
: list of arrays The categories of each feature determined during fitting (in order of the features in X and corresponding with the output of transform).
Examples
Given a dataset with two features, we let the encoder find the unique values per feature and transform the data to an ordinal encoding.
>>> from sklearn.preprocessing import OrdinalEncoder
>>> enc = OrdinalEncoder()
>>> X = [['Male', 1], ['Female', 3], ['Female', 2]]
>>> enc.fit(X)
OrdinalEncoder(categories='auto', dtype=<... 'numpy.float64'>)
>>> enc.categories_
[array(['Female', 'Male'], dtype=object), array([1, 2, 3], dtype=object)]
>>> enc.transform([['Female', 3], ['Male', 1]])
array([[0., 2.],
       [1., 0.]])
>>> enc.inverse_transform([[1, 0], [0, 1]])
array([['Male', 1],
       ['Female', 2]], dtype=object)
See also
 sklearn.preprocessing.OneHotEncoder : performs a one-hot encoding of categorical features.
 sklearn.preprocessing.LabelEncoder : encodes target labels with values between 0 and n_classes-1.
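The encoding itself is simple enough to sketch without the library: collect the sorted unique values of each column, then replace every value by its position in that list. The helper names below are hypothetical, and the sketch assumes each column holds mutually comparable values (it does not handle unknown categories).

```python
def fit_categories(X):
    """Return the sorted unique values of each column."""
    n_columns = len(X[0])
    return [sorted({row[j] for row in X}) for j in range(n_columns)]


def transform(X, categories):
    """Replace each value by its index in the column's category list."""
    lookups = [{value: float(code) for code, value in enumerate(cats)}
               for cats in categories]
    return [[lookups[j][value] for j, value in enumerate(row)] for row in X]


def inverse_transform(codes, categories):
    """Map ordinal codes back to the original category values."""
    return [[categories[j][int(code)] for j, code in enumerate(row)]
            for row in codes]


X = [["Male", 1], ["Female", 3], ["Female", 2]]
categories = fit_categories(X)
encoded = transform([["Female", 3], ["Male", 1]], categories)
decoded = inverse_transform([[1, 0], [0, 1]], categories)
```

The values reproduce the transform and inverse_transform results of the example above, with floats as the output type to match the default dtype.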
Full API documentation: OrdinalEncoderScikitsLearnNode

class
mdp.nodes.
QuadraticDiscriminantAnalysisScikitsLearnNode
¶ Quadratic Discriminant Analysis. This node has been automatically generated by wrapping the sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
A classifier with a quadratic decision boundary, generated by fitting class conditional densities to the data and using Bayes’ rule.
The model fits a Gaussian density to each class.
New in version 0.17: QuadraticDiscriminantAnalysis
Read more in the User Guide.
Parameters
 priors : array, optional, shape = [n_classes]
 Priors on classes
 reg_param : float, optional
 Regularizes the covariance estimate as
(1-reg_param)*Sigma + reg_param*np.eye(n_features)
 store_covariance : boolean
If True the covariance matrices are computed and stored in the self.covariance_ attribute.
New in version 0.17.
 tol : float, optional, default 1.0e-4
Threshold used for rank estimation.
New in version 0.17.
 store_covariances : boolean
 Deprecated, use store_covariance.
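The effect of reg_param (described above) is to blend each class covariance with the identity matrix. A small list-based sketch of that formula is given below, written without numpy so it runs anywhere; the function name regularize_covariance is hypothetical.

```python
def regularize_covariance(sigma, reg_param):
    """Return (1 - reg_param) * sigma + reg_param * I for a square matrix."""
    if not 0.0 <= reg_param <= 1.0:
        raise ValueError("reg_param must lie in [0, 1]")
    n = len(sigma)
    return [[(1.0 - reg_param) * sigma[i][j] + (reg_param if i == j else 0.0)
             for j in range(n)]
            for i in range(n)]


sigma = [[2.0, 1.0], [1.0, 2.0]]
shrunk = regularize_covariance(sigma, reg_param=0.5)
```

With reg_param=0 the estimate is unchanged, and with reg_param=1 it becomes the identity, which removes all correlation between features.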
Attributes
covariance_
: list of array-like, shape = [n_features, n_features] Covariance matrices of each class.
means_
: array-like, shape = [n_classes, n_features] Class means.
priors_
: array-like, shape = [n_classes] Class priors (sum to 1).
rotations_
: list of arrays For each class k an array of shape [n_features, n_k], with n_k = min(n_features, number of elements in class k). It is the rotation of the Gaussian distribution, i.e. its principal axis.
scalings_
: list of arrays For each class k an array of shape [n_k]. It contains the scaling of the Gaussian distributions along its principal axes, i.e. the variance in the rotated coordinate system.
Examples
>>> from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> y = np.array([1, 1, 1, 2, 2, 2])
>>> clf = QuadraticDiscriminantAnalysis()
>>> clf.fit(X, y)
QuadraticDiscriminantAnalysis(priors=None, reg_param=0.0,
                              store_covariance=False,
                              store_covariances=None, tol=0.0001)
>>> print(clf.predict([[-0.8, -1]]))
[1]
See also
 sklearn.discriminant_analysis.LinearDiscriminantAnalysis: Linear Discriminant Analysis
Full API documentation: QuadraticDiscriminantAnalysisScikitsLearnNode

class
mdp.nodes.
MLPClassifierScikitsLearnNode
¶ Multilayer Perceptron classifier. This node has been automatically generated by wrapping the
sklearn.neural_network.multilayer_perceptron.MLPClassifier
class from the sklearn
library. The wrapped instance can be accessed through the scikits_alg
attribute. This model optimizes the log-loss function using LBFGS or stochastic gradient descent.
New in version 0.18.
Parameters
 hidden_layer_sizes : tuple, length = n_layers - 2, default (100,)
 The ith element represents the number of neurons in the ith hidden layer.
 activation : {‘identity’, ‘logistic’, ‘tanh’, ‘relu’}, default ‘relu’
Activation function for the hidden layer.
 ‘identity’, no-op activation, useful to implement linear bottleneck, returns f(x) = x
 ‘logistic’, the logistic sigmoid function, returns f(x) = 1 / (1 + exp(-x)).
 ‘tanh’, the hyperbolic tan function, returns f(x) = tanh(x).
 ‘relu’, the rectified linear unit function, returns f(x) = max(0, x)
 solver : {‘lbfgs’, ‘sgd’, ‘adam’}, default ‘adam’
The solver for weight optimization.
 ‘lbfgs’ is an optimizer in the family of quasi-Newton methods.
 ‘sgd’ refers to stochastic gradient descent.
 ‘adam’ refers to a stochastic gradient-based optimizer proposed by Kingma, Diederik, and Jimmy Ba
Note: The default solver ‘adam’ works pretty well on relatively large datasets (with thousands of training samples or more) in terms of both training time and validation score. For small datasets, however, ‘lbfgs’ can converge faster and perform better.
 alpha : float, optional, default 0.0001
 L2 penalty (regularization term) parameter.
 batch_size : int, optional, default ‘auto’
 Size of minibatches for stochastic optimizers. If the solver is ‘lbfgs’, the classifier will not use minibatch. When set to “auto”, batch_size=min(200, n_samples)
 learning_rate : {‘constant’, ‘invscaling’, ‘adaptive’}, default ‘constant’
Learning rate schedule for weight updates.
 ‘constant’ is a constant learning rate given by ‘learning_rate_init’.
 ‘invscaling’ gradually decreases the learning rate at each time step ‘t’ using an inverse scaling exponent of ‘power_t’. effective_learning_rate = learning_rate_init / pow(t, power_t)
 ‘adaptive’ keeps the learning rate constant to ‘learning_rate_init’ as long as training loss keeps decreasing. Each time two consecutive epochs fail to decrease training loss by at least tol, or fail to increase validation score by at least tol if ‘early_stopping’ is on, the current learning rate is divided by 5.
Only used when solver='sgd'.
 learning_rate_init : double, optional, default 0.001
 The initial learning rate used. It controls the step-size in updating the weights. Only used when solver=’sgd’ or ‘adam’.
 power_t : double, optional, default 0.5
 The exponent for inverse scaling learning rate. It is used in updating effective learning rate when the learning_rate is set to ‘invscaling’. Only used when solver=’sgd’.
 max_iter : int, optional, default 200
 Maximum number of iterations. The solver iterates until convergence (determined by ‘tol’) or this number of iterations. For stochastic solvers (‘sgd’, ‘adam’), note that this determines the number of epochs (how many times each data point will be used), not the number of gradient steps.
 shuffle : bool, optional, default True
 Whether to shuffle samples in each iteration. Only used when solver=’sgd’ or ‘adam’.
 random_state : int, RandomState instance or None, optional, default None
 If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
 tol : float, optional, default 1e-4
 Tolerance for the optimization. When the loss or score is not improving by at least tol for n_iter_no_change consecutive iterations, unless learning_rate is set to ‘adaptive’, convergence is considered to be reached and training stops.
 verbose : bool, optional, default False
 Whether to print progress messages to stdout.
 warm_start : bool, optional, default False
 When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. See the Glossary.
 momentum : float, default 0.9
 Momentum for gradient descent update. Should be between 0 and 1. Only used when solver=’sgd’.
 nesterovs_momentum : boolean, default True
 Whether to use Nesterov’s momentum. Only used when solver=’sgd’ and momentum > 0.
 early_stopping : bool, default False
 Whether to use early stopping to terminate training when validation score is not improving. If set to true, it will automatically set aside 10% of training data as validation and terminate training when validation score is not improving by at least tol for n_iter_no_change consecutive epochs. Only effective when solver=’sgd’ or ‘adam’.
 validation_fraction : float, optional, default 0.1
 The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used if early_stopping is True
 beta_1 : float, optional, default 0.9
 Exponential decay rate for estimates of first moment vector in adam, should be in [0, 1). Only used when solver=’adam’
 beta_2 : float, optional, default 0.999
 Exponential decay rate for estimates of second moment vector in adam, should be in [0, 1). Only used when solver=’adam’
 epsilon : float, optional, default 1e-8
 Value for numerical stability in adam. Only used when solver=’adam’
 n_iter_no_change : int, optional, default 10
 Maximum number of epochs to not meet tol improvement. Only effective when solver=’sgd’ or ‘adam’.
New in version 0.20.
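The activation functions, the ‘invscaling’ learning rate schedule and the ‘auto’ batch size described in the parameter list above can be written out as a small sketch. It only restates the formulas from the parameter descriptions in plain Python; the function names are hypothetical and nothing here calls into scikit-learn.

```python
import math


def identity(x):
    """'identity': f(x) = x."""
    return x


def logistic(x):
    """'logistic': f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """'tanh': f(x) = tanh(x)."""
    return math.tanh(x)


def relu(x):
    """'relu': f(x) = max(0, x)."""
    return max(0.0, x)


def invscaling_rate(learning_rate_init, t, power_t=0.5):
    """'invscaling': learning_rate_init / pow(t, power_t).

    Assumes the time step t starts at 1, so the first rate equals
    learning_rate_init.
    """
    return learning_rate_init / pow(t, power_t)


def auto_batch_size(n_samples):
    """batch_size='auto' resolves to min(200, n_samples)."""
    return min(200, n_samples)


# Effective learning rate at a few time steps with the documented defaults.
rates = [invscaling_rate(0.001, t) for t in (1, 4, 100)]
print(rates)
```

With power_t=0.5 the rate decays with the square root of the time step, so it falls quickly at first and then flattens out.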
Attributes
classes_
: array or list of array of shape (n_classes,) Class labels for each output.
loss_
: float The current loss computed with the loss function.
coefs_
: list, length n_layers - 1 The ith element in the list represents the weight matrix corresponding to layer i.
intercepts_
: list, length n_layers - 1 The ith element in the list represents the bias vector corresponding to layer i + 1.
n_iter_
: int, The number of iterations the solver has run.
n_layers_
: int Number of layers.
n_outputs_
: int Number of outputs.
out_activation_
: string Name of the output activation function.
Notes
MLPClassifier trains iteratively since at each time step the partial derivatives of the loss function with respect to the model parameters are computed to update the parameters.
It can also have a regularization term added to the loss function that shrinks model parameters to prevent overfitting.
This implementation works with data represented as dense numpy arrays or sparse scipy arrays of floating point values.
References
 Hinton, Geoffrey E. “Connectionist learning procedures.” Artificial intelligence 40.1 (1989): 185-234.
 Glorot, Xavier, and Yoshua Bengio. “Understanding the difficulty of training deep feedforward neural networks.” International Conference on Artificial Intelligence and Statistics. 2010.
 He, Kaiming, et al. “Delving deep into rectifiers: Surpassing human-level performance on imagenet classification.” arXiv preprint arXiv:1502.01852 (2015).
 Kingma, Diederik, and Jimmy Ba. “Adam: A method for stochastic optimization.” arXiv preprint arXiv:1412.6980 (2014).
Full API documentation: MLPClassifierScikitsLearnNode

class
mdp.nodes.
KNeighborsClassifierScikitsLearnNode
¶ Classifier implementing the k-nearest neighbors vote. This node has been automatically generated by wrapping the
sklearn.neighbors.classification.KNeighborsClassifier
class from the sklearn
library. The wrapped instance can be accessed through the scikits_alg
attribute. Read more in the User Guide.
Parameters
 n_neighbors : int, optional (default = 5)
 Number of neighbors to use by default for kneighbors() queries.
 weights : str or callable, optional (default = ‘uniform’)
Weight function used in prediction. Possible values:
 ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
 ‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
 [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.
 algorithm : {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, optional
Algorithm used to compute the nearest neighbors:
 ‘ball_tree’ will use BallTree
 ‘kd_tree’ will use KDTree
 ‘brute’ will use a brute-force search.
 ‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to fit() method.
Note: fitting on sparse input will override the setting of this parameter, using brute force.
 leaf_size : int, optional (default = 30)
 Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.
 p : integer, optional (default = 2)
 Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.
 metric : string or callable, default ‘minkowski’
 the distance metric to use for the tree. The default metric is minkowski, and with p=2 is equivalent to the standard Euclidean metric. See the documentation of the DistanceMetric class for a list of available metrics.
 metric_params : dict, optional (default = None)
 Additional keyword arguments for the metric function.
 n_jobs : int or None, optional (default=None)
 The number of parallel jobs to run for neighbors search. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details. Doesn’t affect fit() method.
Examples
>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import KNeighborsClassifier
>>> neigh = KNeighborsClassifier(n_neighbors=3)
>>> neigh.fit(X, y)
KNeighborsClassifier(...)
>>> print(neigh.predict([[1.1]]))
[0]
>>> print(neigh.predict_proba([[0.9]]))
[[0.66666667 0.33333333]]
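To make the voting rule concrete, the following is a minimal brute-force sketch of a k-nearest neighbors vote in plain Python, using the Minkowski distance and the ‘uniform’ and ‘distance’ weighting described under Parameters. The function names are hypothetical and the code ignores the tree-based algorithms; it is meant as an illustration, not as a description of scikit-learn's implementation.

```python
from collections import defaultdict


def minkowski_distance(a, b, p=2):
    """Minkowski distance: p=1 is Manhattan (l1), p=2 is Euclidean (l2)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict_proba(X, y, query, n_neighbors=5, weights="uniform", p=2):
    """Return {label: probability} from a brute-force neighbor vote."""
    scored = [(minkowski_distance(x, query, p), label) for x, label in zip(X, y)]
    # Stable sort: ties in distance are resolved by training-data order.
    scored.sort(key=lambda pair: pair[0])
    neighbors = scored[:n_neighbors]
    has_exact_match = any(dist == 0.0 for dist, _ in neighbors)

    votes = defaultdict(float)
    for dist, label in neighbors:
        if weights == "uniform":
            votes[label] += 1.0
        elif weights == "distance":
            if has_exact_match:
                # Simplification: only zero-distance neighbors get a vote.
                votes[label] += 1.0 if dist == 0.0 else 0.0
            else:
                votes[label] += 1.0 / dist
        else:
            raise ValueError("weights must be 'uniform' or 'distance'")

    total = sum(votes.values())
    return {label: votes[label] / total for label in sorted(set(y))}


def knn_predict(X, y, query, **kwargs):
    """Return the label with the highest vote share."""
    proba = knn_predict_proba(X, y, query, **kwargs)
    return max(proba, key=proba.get)


X_train = [[0], [1], [2], [3]]
y_train = [0, 0, 1, 1]
print(knn_predict(X_train, y_train, [1.1], n_neighbors=3))
print(knn_predict_proba(X_train, y_train, [0.9], n_neighbors=3))
```

For the query 0.9 the three nearest training points are 1, 0 and 2, so two of the three uniform votes go to class 0. With weights="distance" the very close point at 1 dominates and the share of class 0 rises further.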
See also
RadiusNeighborsClassifier KNeighborsRegressor RadiusNeighborsRegressor NearestNeighbors
Notes
See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.
Warning
Regarding the Nearest Neighbors algorithms, if it is found that two neighbors, neighbor k+1 and k, have identical distances but different labels, the results will depend on the ordering of the training data.
https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
Full API documentation: KNeighborsClassifierScikitsLearnNode

class
mdp.nodes.
PowerTransformerScikitsLearnNode
¶ Apply a power transform feature-wise to make data more Gaussian-like. This node has been automatically generated by wrapping the
sklearn.preprocessing.data.PowerTransformer
class from the sklearn
library. The wrapped instance can be accessed through the scikits_alg
attribute. Power transforms are a family of parametric, monotonic transformations that are applied to make data more Gaussian-like. This is useful for modeling issues related to heteroscedasticity (non-constant variance), or other situations where normality is desired.
Currently, PowerTransformer supports the Box-Cox transform and the Yeo-Johnson transform. The optimal parameter for stabilizing variance and minimizing skewness is estimated through maximum likelihood.
Box-Cox requires input data to be strictly positive, while Yeo-Johnson supports both positive and negative data.
By default, zero-mean, unit-variance normalization is applied to the transformed data.
Read more in the User Guide.
Parameters
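 method : str, (default=’yeo-johnson’)
The power transform method. Available methods are:
 ‘yeo-johnson’, works with positive and negative values
 ‘box-cox’, only works with strictly positive values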
 standardize : boolean, default=True
 Set to True to apply zero-mean, unit-variance normalization to the transformed output.
 copy : boolean, optional, default=True
 Set to False to perform in-place computation during transformation.
Attributes
lambdas_
: array of float, shape (n_features,) The parameters of the power transformation for the selected features.
Examples
>>> import numpy as np
>>> from sklearn.preprocessing import PowerTransformer
>>> pt = PowerTransformer()
>>> data = [[1, 2], [3, 2], [4, 5]]
>>> print(pt.fit(data))
PowerTransformer(copy=True, method='yeo-johnson', standardize=True)
>>> print(pt.lambdas_)
[ 1.386... -3.100...]
>>> print(pt.transform(data))
[[-1.316... -0.707...]
 [ 0.209... -0.707...]
 [ 1.106...  1.414...]]
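For readers who want to see what the two transforms compute for a single value, here is a minimal sketch of the standard Box-Cox and Yeo-Johnson formulas in plain Python. The parameter lmbda is supplied by hand here, whereas PowerTransformer estimates it by maximum likelihood, and the sketch skips the optional standardization step. The function names are hypothetical.

```python
import math


def box_cox(x, lmbda):
    """Box-Cox transform of a strictly positive value x."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive input")
    if abs(lmbda) < 1e-12:
        return math.log(x)
    return (x ** lmbda - 1.0) / lmbda


def yeo_johnson(x, lmbda):
    """Yeo-Johnson transform, defined for positive and negative x."""
    if x >= 0:
        if abs(lmbda) < 1e-12:
            return math.log1p(x)
        return ((x + 1.0) ** lmbda - 1.0) / lmbda
    if abs(lmbda - 2.0) < 1e-12:
        return -math.log1p(-x)
    return -((1.0 - x) ** (2.0 - lmbda) - 1.0) / (2.0 - lmbda)


# lmbda = 1 leaves Yeo-Johnson input unchanged; other values bend the scale.
values = [-2.0, -0.5, 0.0, 0.5, 2.0]
print([yeo_johnson(v, 1.0) for v in values])
print([yeo_johnson(v, 0.0) for v in values])
```

Both functions are monotonic in x for a fixed lmbda, which is why they can reshape a skewed distribution without changing the ordering of the samples.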
See also
power_transform : Equivalent function without the estimator API.
 QuantileTransformer : Maps data to a standard normal distribution with the parameter output_distribution=’normal’.
Notes
NaNs are treated as missing values: disregarded in fit, and maintained in transform.
For a comparison of the different scalers, transformers, and normalizers, see examples/preprocessing/plot_all_scaling.py.
References
[1] I.K. Yeo and R.A. Johnson, “A new family of power transformations to improve normality or symmetry.” Biometrika, 87(4), pp.954-959, (2000).
[2] G.E.P. Box and D.R. Cox, “An Analysis of Transformations”, Journal of the Royal Statistical Society B, 26, 211-252 (1964).
Full API documentation: PowerTransformerScikitsLearnNode

class
mdp.nodes.
SparsePCAScikitsLearnNode
¶ Sparse Principal Components Analysis (SparsePCA) This node has been automatically generated by wrapping the
sklearn.decomposition.sparse_pca.SparsePCA
class from the sklearn
library. The wrapped instance can be accessed through the scikits_alg
attribute. Finds the set of sparse components that can optimally reconstruct the data. The amount of sparseness is controllable by the coefficient of the L1 penalty, given by the parameter alpha.
Read more in the User Guide.
Parameters
 n_components : int,
 Number of sparse atoms to extract.
 alpha : float,
 Sparsity controlling parameter. Higher values lead to sparser components.
 ridge_alpha : float,
 Amount of ridge shrinkage to apply in order to improve conditioning when calling the transform method.
 max_iter : int,
 Maximum number of iterations to perform.
 tol : float,
 Tolerance for the stopping condition.
 method : {‘lars’, ‘cd’}
 lars: uses the least angle regression method to solve the lasso problem (linear_model.lars_path) cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). Lars will be faster if the estimated components are sparse.
 n_jobs : int or None, optional (default=None)
 Number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
 U_init : array of shape (n_samples, n_components),
 Initial values for the loadings for warm restart scenarios.
 V_init : array of shape (n_components, n_features),
 Initial values for the components for warm restart scenarios.
 verbose : int
 Controls the verbosity; the higher, the more messages. Defaults to 0.
 random_state : int, RandomState instance or None, optional (default=None)
 If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
 normalize_components : boolean, optional (default=False)
 if False, use a version of Sparse PCA without components normalization and without data centering. This is likely a bug and even though it’s the default for backward compatibility, this should not be used.
 if True, use a version of Sparse PCA with components normalization and data centering.
New in version 0.20.
Deprecated since version 0.22: normalize_components was added and set to False for backward compatibility. It would be set to True from 0.22 onwards.
Attributes
components_
: array, [n_components, n_features] Sparse components extracted from the data.
error_
: array Vector of errors at each iteration.
n_iter_
: int Number of iterations run.
mean_
: array, shape (n_features,) Per-feature empirical mean, estimated from the training set. Equal to X.mean(axis=0).
Examples
>>> import numpy as np
>>> from sklearn.datasets import make_friedman1
>>> from sklearn.decomposition import SparsePCA
>>> X, _ = make_friedman1(n_samples=200, n_features=30, random_state=0)
>>> transformer = SparsePCA(n_components=5,
...                         normalize_components=True,
...                         random_state=0)
>>> transformer.fit(X)
SparsePCA(...)
>>> X_transformed = transformer.transform(X)
>>> X_transformed.shape
(200, 5)
>>> # most values in the ``components_`` are zero (sparsity)
>>> np.mean(transformer.components_ == 0)
0.9666...
See also
PCA MiniBatchSparsePCA DictionaryLearning
Full API documentation: SparsePCAScikitsLearnNode

class
mdp.nodes.
ExtraTreeRegressorScikitsLearnNode
¶ An extremely randomized tree regressor. This node has been automatically generated by wrapping the
sklearn.tree.tree.ExtraTreeRegressor
class from the sklearn
library. The wrapped instance can be accessed through the scikits_alg
attribute. Extra-trees differ from classic decision trees in the way they are built. When looking for the best split to separate the samples of a node into two groups, random splits are drawn for each of the max_features randomly selected features and the best split among those is chosen. When max_features is set to 1, this amounts to building a totally random decision tree.
Warning: Extra-trees should only be used within ensemble methods.
Read more in the User Guide.
Parameters
 criterion : string, optional (default=”mse”)
The function to measure the quality of a split. Supported criteria are “mse” for the mean squared error, which is equal to variance reduction as feature selection criterion, and “mae” for the mean absolute error.
New in version 0.18: Mean Absolute Error (MAE) criterion.
 splitter : string, optional (default=”random”)
 The strategy used to choose the split at each node. Supported strategies are “best” to choose the best split and “random” to choose the best random split.
 max_depth : int or None, optional (default=None)
 The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
 min_samples_split : int, float, optional (default=2)
The minimum number of samples required to split an internal node:
 If int, then consider min_samples_split as the minimum number.
 If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
Changed in version 0.18: Added float values for fractions.
 min_samples_leaf : int, float, optional (default=1)
The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.
 If int, then consider min_samples_leaf as the minimum number.
 If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.
Changed in version 0.18: Added float values for fractions.
 min_weight_fraction_leaf : float, optional (default=0.)
 The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.
 max_features : int, float, string or None, optional (default=”auto”)
The number of features to consider when looking for the best split:
 If int, then consider max_features features at each split.
 If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
 If “auto”, then max_features=n_features.
 If “sqrt”, then max_features=sqrt(n_features).
 If “log2”, then max_features=log2(n_features).
 If None, then max_features=n_features.
Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features.
 random_state : int, RandomState instance or None, optional (default=None)
 If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
 min_impurity_decrease : float, optional (default=0.)
A node will be split if this split induces a decrease of the impurity greater than or equal to this value.
The weighted impurity decrease equation is the following:
N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.
N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.
New in version 0.19.
 min_impurity_split : float, (default=1e-7)
Threshold for early stopping in tree growth. A node will split if its impurity is above the threshold, otherwise it is a leaf.
Deprecated since version 0.19: min_impurity_split has been deprecated in favor of min_impurity_decrease in 0.19. The default value of min_impurity_split will change from 1e-7 to 0 in 0.23 and it will be removed in 0.25. Use min_impurity_decrease instead.
 max_leaf_nodes : int or None, optional (default=None)
 Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.
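The weighted impurity decrease used by min_impurity_decrease can be checked by hand with a short sketch. The function below only restates the equation from the parameter description; its name, the helper should_split and the sample numbers are hypothetical.

```python
def weighted_impurity_decrease(N, N_t, N_t_L, N_t_R,
                               impurity, left_impurity, right_impurity):
    """N_t / N * (impurity - N_t_R / N_t * right_impurity
                           - N_t_L / N_t * left_impurity)"""
    return N_t / N * (impurity
                      - N_t_R / N_t * right_impurity
                      - N_t_L / N_t * left_impurity)


def should_split(decrease, min_impurity_decrease=0.0):
    """A node is split if the decrease is >= min_impurity_decrease."""
    return decrease >= min_impurity_decrease


# Made-up node: 40 of 100 samples reach it, 10 go left and 30 go right.
decrease = weighted_impurity_decrease(
    N=100, N_t=40, N_t_L=10, N_t_R=30,
    impurity=0.5, left_impurity=0.2, right_impurity=0.4,
)
print(decrease)
print(should_split(decrease, min_impurity_decrease=0.05))
```

The leading factor N_t / N scales the gain by the share of samples that reach the node, so the same local improvement counts for less deep in the tree than near the root.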
See also
ExtraTreeClassifier, sklearn.ensemble.ExtraTreesClassifier, sklearn.ensemble.ExtraTreesRegressor
Notes
The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.
References
[1] P. Geurts, D. Ernst., and L. Wehenkel, “Extremely randomized trees”, Machine Learning, 63(1), 3-42, 2006.
Full API documentation: ExtraTreeRegressorScikitsLearnNode

class
mdp.nodes.
ExtraTreesClassifierScikitsLearnNode
¶ An extra-trees classifier. This node has been automatically generated by wrapping the
sklearn.ensemble.forest.ExtraTreesClassifier
class from the sklearn
library. The wrapped instance can be accessed through the scikits_alg
attribute. This class implements a meta estimator that fits a number of randomized decision trees (a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
Read more in the User Guide.
Parameters
 n_estimators : integer, optional (default=10)
The number of trees in the forest.
Changed in version 0.20: The default value of n_estimators will change from 10 in version 0.20 to 100 in version 0.22.
 criterion : string, optional (default=”gini”)
 The function to measure the quality of a split. Supported criteria are “gini” for the Gini impurity and “entropy” for the information gain.
 max_depth : integer or None, optional (default=None)
 The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
 min_samples_split : int, float, optional (default=2)
The minimum number of samples required to split an internal node:
 If int, then consider min_samples_split as the minimum number.
 If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
Changed in version 0.18: Added float values for fractions.
 min_samples_leaf : int, float, optional (default=1)
The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.
 If int, then consider min_samples_leaf as the minimum number.
 If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.
Changed in version 0.18: Added float values for fractions.
 min_weight_fraction_leaf : float, optional (default=0.)
 The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.
 max_features : int, float, string or None, optional (default=”auto”)
The number of features to consider when looking for the best split:
 If int, then consider max_features features at each split.
 If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
 If “auto”, then max_features=sqrt(n_features).
 If “sqrt”, then max_features=sqrt(n_features).
 If “log2”, then max_features=log2(n_features).
 If None, then max_features=n_features.
Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features.
 max_leaf_nodes : int or None, optional (default=None)
 Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.
 min_impurity_decrease : float, optional (default=0.)
A node will be split if this split induces a decrease of the impurity greater than or equal to this value.
The weighted impurity decrease equation is the following:
N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.
N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.
New in version 0.19.
 min_impurity_split : float, (default=1e-7)
Threshold for early stopping in tree growth. A node will split if its impurity is above the threshold, otherwise it is a leaf.
Deprecated since version 0.19: min_impurity_split has been deprecated in favor of min_impurity_decrease in 0.19. The default value of min_impurity_split will change from 1e-7 to 0 in 0.23 and it will be removed in 0.25. Use min_impurity_decrease instead.
 bootstrap : boolean, optional (default=False)
 Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.
 oob_score : bool, optional (default=False)
 Whether to use out-of-bag samples to estimate the generalization accuracy.
 n_jobs : int or None, optional (default=None)
 The number of jobs to run in parallel for both fit and predict. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
 random_state : int, RandomState instance or None, optional (default=None)
 If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
 verbose : int, optional (default=0)
 Controls the verbosity when fitting and predicting.
 warm_start : bool, optional (default=False)
 When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new forest. See the Glossary.
 class_weight : dict, list of dicts, “balanced”, “balanced_subsample” or None, optional (default=None)
Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.
Note that for multioutput (including multilabel) weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}].
The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as
n_samples / (n_classes * np.bincount(y))
The “balanced_subsample” mode is the same as “balanced” except that weights are computed based on the bootstrap sample for every tree grown.
For multi-output, the weights of each column of y will be multiplied.
Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
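The “balanced” formula above can be reproduced without NumPy. The sketch below counts class frequencies with collections.Counter instead of np.bincount; the function name and the label list are hypothetical, and only the single-output case is covered.

```python
from collections import Counter


def balanced_class_weights(y):
    """Return {label: n_samples / (n_classes * count_of_label)}."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Made-up, imbalanced labels: six samples of class 0, two of class 1.
y = [0, 0, 0, 0, 0, 0, 1, 1]
weights = balanced_class_weights(y)
print(weights)
```

The rarer class receives the larger weight, and the weights are scaled so that summing them over all samples gives n_samples again.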
Attributes
estimators_
: list of DecisionTreeClassifier The collection of fitted sub-estimators.
classes_
: array of shape = [n_classes] or a list of such arrays The classes labels (single output problem), or a list of arrays of class labels (multi-output problem).
n_classes_
: int or list The number of classes (single output problem), or a list containing the number of classes for each output (multi-output problem).
feature_importances_
: array of shape = [n_features] The feature importances (the higher, the more important the feature).
n_features_
: int The number of features when fit is performed.
n_outputs_
: int The number of outputs when fit is performed.
oob_score_
: float Score of the training dataset obtained using an out-of-bag estimate.
oob_decision_function_
: array of shape = [n_samples, n_classes] Decision function computed with out-of-bag estimate on the training set. If n_estimators is small it might be possible that a data point was never left out during the bootstrap. In this case, oob_decision_function_ might contain NaN.
Notes
The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.
References
[1] P. Geurts, D. Ernst., and L. Wehenkel, “Extremely randomized trees”, Machine Learning, 63(1), 3-42, 2006.
See also
sklearn.tree.ExtraTreeClassifier : Base classifier for this ensemble.
RandomForestClassifier : Ensemble Classifier based on trees with optimal splits.
Full API documentation: ExtraTreesClassifierScikitsLearnNode

class
mdp.nodes.
GridSearchCVScikitsLearnNode
¶ Exhaustive search over specified parameter values for an estimator. This node has been automatically generated by wrapping the
sklearn.model_selection._search.GridSearchCV
class from the sklearn
library. The wrapped instance can be accessed through the scikits_alg
attribute. Important members are fit, predict.
GridSearchCV implements a “fit” and a “score” method. It also implements “predict”, “predict_proba”, “decision_function”, “transform” and “inverse_transform” if they are implemented in the estimator used.
The parameters of the estimator used to apply these methods are optimized by cross-validated grid-search over a parameter grid.
Read more in the User Guide.
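What “exhaustive search over a parameter grid” enumerates can be sketched with itertools.product. The generator below expands a param_grid, given either as a dict or as a list of dicts, into the individual parameter settings that would each be cross-validated. The function name and the example grid are hypothetical, and the sketch performs no fitting or scoring; scikit-learn offers sklearn.model_selection.ParameterGrid for the same enumeration.

```python
from itertools import product


def expand_param_grid(param_grid):
    """Yield one {parameter: value} dict per candidate setting.

    A dict spans the Cartesian product of its value lists; a list of
    dicts is expanded grid by grid, one after the other.
    """
    grids = [param_grid] if isinstance(param_grid, dict) else list(param_grid)
    for grid in grids:
        keys = sorted(grid)
        for values in product(*(grid[key] for key in keys)):
            yield dict(zip(keys, values))


# Made-up grid: two sub-grids, as in a list-of-dictionaries param_grid.
param_grid = [
    {"kernel": ["linear"], "C": [1, 10]},
    {"kernel": ["rbf"], "C": [1, 10], "gamma": [0.001, 0.0001]},
]
candidates = list(expand_param_grid(param_grid))
for candidate in candidates:
    print(candidate)
```

The number of candidates is the product of the list lengths within each dict, summed over the dicts, so every additional parameter multiplies the cost of the search.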
Parameters
 estimator : estimator object.
 This is assumed to implement the scikit-learn estimator interface. Either estimator needs to provide a score function, or scoring must be passed.
 param_grid : dict or list of dictionaries
 Dictionary with parameters names (string) as keys and lists of parameter settings to try as values, or a list of such dictionaries, in which case the grids spanned by each dictionary in the list are explored. This enables searching over any sequence of parameter settings.
 scoring : string, callable, list/tuple, dict or None, default: None
A single string (see scoring_parameter) or a callable (see scoring) to evaluate the predictions on the test set.
For evaluating multiple metrics, either give a list of (unique) strings or a dict with names as keys and callables as values.
NOTE that when using custom scorers, each scorer should return a single value. Metric functions returning a list/array of values can be wrapped into multiple scorers that return one value each.
See multimetric_grid_search for an example.
If None, the estimator’s default scorer (if available) is used.
 fit_params : dict, optional
 Parameters to pass to the fit method.
 Deprecated since version 0.19: fit_params as a constructor argument was deprecated in version 0.19 and will be removed in version 0.21. Pass fit parameters to the fit method instead.
 n_jobs : int or None, optional (default=None)
 Number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
 pre_dispatch : int, or string, optional
 Controls the number of jobs that get dispatched during parallel execution. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be:
 None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs
 An int, giving the exact number of total jobs that are spawned
 A string, giving an expression as a function of n_jobs, as in ‘2*n_jobs’
 iid : boolean, default=’warn’
 If True, return the average score across folds, weighted by the number of samples in each test set. In this case, the data is assumed to be identically distributed across the folds, and the loss minimized is the total loss per sample, and not the mean loss across the folds. If False, return the unweighted average score across folds. Default is True, but will change to False in version 0.22, to correspond to the standard definition of cross-validation.
 Changed in version 0.20: Parameter iid will change from True to False by default in version 0.22, and will be removed in 0.24.
 cv : int, cross-validation generator or an iterable, optional
 Determines the cross-validation splitting strategy. Possible inputs for cv are:
 None, to use the default 3-fold cross-validation,
 integer, to specify the number of folds in a (Stratified)KFold,
 CV splitter,
 An iterable yielding (train, test) splits as arrays of indices.
 For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold is used. In all other cases, KFold is used.
 Refer to the User Guide for the various cross-validation strategies that can be used here.
 Changed in version 0.20: cv default value if None will change from 3-fold to 5-fold in v0.22.
 refit : boolean, or string, default=True
 Refit an estimator using the best found parameters on the whole dataset.
 For multiple metric evaluation, this needs to be a string denoting the scorer that is used to find the best parameters for refitting the estimator at the end.
 The refitted estimator is made available at the best_estimator_ attribute and permits using predict directly on this GridSearchCV instance.
 Also for multiple metric evaluation, the attributes best_index_, best_score_ and best_params_ will only be available if refit is set, and all of them will be determined w.r.t. this specific scorer.
 See the scoring parameter to know more about multiple metric evaluation.
 verbose : integer
 Controls the verbosity: the higher, the more messages.
 error_score : ‘raise’ or numeric
 Value to assign to the score if an error occurs in estimator fitting. If set to ‘raise’, the error is raised. If a numeric value is given, FitFailedWarning is raised. This parameter does not affect the refit step, which will always raise the error. Default is ‘raise’ but from version 0.22 it will change to np.nan.
 return_train_score : boolean, optional
 If False, the cv_results_ attribute will not include training scores.
 Current default is 'warn', which behaves as True in addition to raising a warning when a training score is looked up. That default will be changed to False in 0.21. Computing training scores is used to get insights on how different parameter settings impact the overfitting/underfitting trade-off. However, computing the scores on the training set can be computationally expensive and is not strictly required to select the parameters that yield the best generalization performance.
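As a rough illustration of how a param_grid (a dict, or a list of dicts) spans the candidate settings, the sketch below enumerates the combinations with the standard library only. The helper name expand_grid is hypothetical; sklearn provides ParameterGrid for this purpose, and its enumeration order may differ from the one shown here.

```python
from itertools import product


def expand_grid(param_grid):
    """Yield one dict per candidate parameter setting (illustrative helper)."""
    # Accept either a single dict or a list of dicts.
    grids = [param_grid] if isinstance(param_grid, dict) else list(param_grid)
    for grid in grids:
        # Sort the keys so the enumeration order is reproducible.
        keys = sorted(grid)
        for values in product(*(grid[k] for k in keys)):
            yield dict(zip(keys, values))


# A single dict: every combination of kernel and C is explored.
candidates = list(expand_grid({'kernel': ('linear', 'rbf'), 'C': [1, 10]}))

# A list of dicts: each dict spans its own sub-grid.
split_candidates = list(expand_grid([
    {'kernel': ['poly'], 'degree': [2, 3]},
    {'kernel': ['rbf'], 'gamma': [0.1, 0.2]},
]))

print(len(candidates), len(split_candidates))
```

With a list of dicts, parameters that do not apply to a given kernel are simply absent from that candidate, which is why cv_results_ stores them in masked arrays.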
Examples
>>> from sklearn import svm, datasets
>>> from sklearn.model_selection import GridSearchCV
>>> iris = datasets.load_iris()
>>> parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}
>>> svc = svm.SVC(gamma="scale")
>>> clf = GridSearchCV(svc, parameters, cv=5)
>>> clf.fit(iris.data, iris.target)
...
GridSearchCV(cv=5, error_score=...,
       estimator=SVC(C=1.0, cache_size=..., class_weight=..., coef0=...,
                     decision_function_shape='ovr', degree=..., gamma=...,
                     kernel='rbf', max_iter=-1, probability=False,
                     random_state=None, shrinking=True, tol=...,
                     verbose=False),
       fit_params=None, iid=..., n_jobs=None,
       param_grid=..., pre_dispatch=..., refit=..., return_train_score=...,
       scoring=..., verbose=...)
>>> sorted(clf.cv_results_.keys())
...
['mean_fit_time', 'mean_score_time', 'mean_test_score',...
 'mean_train_score', 'param_C', 'param_kernel', 'params',...
 'rank_test_score', 'split0_test_score',...
 'split0_train_score', 'split1_test_score', 'split1_train_score',...
 'split2_test_score', 'split2_train_score',...
 'std_fit_time', 'std_score_time', 'std_test_score', 'std_train_score'...]
Attributes
cv_results_ : dict of numpy (masked) ndarrays
 A dict with keys as column headers and values as columns, that can be imported into a pandas DataFrame.
 For instance, the table below

 param_kernel  param_gamma  param_degree  split0_test_score  …  rank_t…
 ‘poly’        –            2             0.80               …  2
 ‘poly’        –            3             0.70               …  4
 ‘rbf’         0.1          –             0.80               …  3
 ‘rbf’         0.2          –             0.93               …  1

 will be represented by a cv_results_ dict of:

 {
 'param_kernel': masked_array(data = ['poly', 'poly', 'rbf', 'rbf'],
                              mask = [False False False False]...)
 'param_gamma': masked_array(data = [-- -- 0.1 0.2],
                             mask = [ True  True False False]...),
 'param_degree': masked_array(data = [2.0 3.0 -- --],
                              mask = [False False  True  True]...),
 'split0_test_score'  : [0.80, 0.70, 0.80, 0.93],
 'split1_test_score'  : [0.82, 0.50, 0.70, 0.78],
 'mean_test_score'    : [0.81, 0.60, 0.75, 0.85],
 'std_test_score'     : [0.01, 0.10, 0.05, 0.08],
 'rank_test_score'    : [2, 4, 3, 1],
 'split0_train_score' : [0.80, 0.92, 0.70, 0.93],
 'split1_train_score' : [0.82, 0.55, 0.70, 0.87],
 'mean_train_score'   : [0.81, 0.74, 0.70, 0.90],
 'std_train_score'    : [0.01, 0.19, 0.00, 0.03],
 'mean_fit_time'      : [0.73, 0.63, 0.43, 0.49],
 'std_fit_time'       : [0.01, 0.02, 0.01, 0.01],
 'mean_score_time'    : [0.01, 0.06, 0.04, 0.04],
 'std_score_time'     : [0.00, 0.00, 0.00, 0.01],
 'params'             : [{'kernel': 'poly', 'degree': 2}, ...],
 }
 NOTE
 The key 'params' is used to store a list of parameter settings dicts for all the parameter candidates.
 The mean_fit_time, std_fit_time, mean_score_time and std_score_time are all in seconds.
 For multi-metric evaluation, the scores for all the scorers are available in the cv_results_ dict at the keys ending with that scorer’s name ('_<scorer_name>') instead of '_score' shown above (‘split0_test_precision’, ‘mean_train_precision’ etc.).
best_estimator_ : estimator or dict
 Estimator that was chosen by the search, i.e. the estimator which gave the highest score (or smallest loss if specified) on the left out data. Not available if refit=False.
 See the refit parameter for more information on allowed values.
best_score_ : float
 Mean cross-validated score of the best_estimator.
 For multi-metric evaluation, this is present only if refit is specified.
best_params_ : dict
 Parameter setting that gave the best results on the hold out data.
 For multi-metric evaluation, this is present only if refit is specified.
best_index_ : int
 The index (of the cv_results_ arrays) which corresponds to the best candidate parameter setting.
 The dict at search.cv_results_['params'][search.best_index_] gives the parameter setting for the best model, that gives the highest mean score (search.best_score_).
 For multi-metric evaluation, this is present only if refit is specified.
scorer_ : function or a dict
 Scorer function used on the held out data to choose the best parameters for the model.
 For multi-metric evaluation, this attribute holds the validated scoring dict which maps the scorer key to the scorer callable.
n_splits_ : int
 The number of cross-validation splits (folds/iterations).
refit_time_ : float
 Seconds used for refitting the best model on the whole dataset.
 This is present only if refit is not False.
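To make the relation between mean_test_score, rank_test_score, best_index_ and best_score_ concrete, here is a small sketch that recomputes them from the mean_test_score column of the example table. It is plain Python, ignores ties, and is not the library's own code.

```python
# Mean test scores of the four candidates from the example cv_results_ dict.
mean_test_score = [0.81, 0.60, 0.75, 0.85]

# best_index_: position of the candidate with the highest mean test score.
best_index = max(range(len(mean_test_score)), key=mean_test_score.__getitem__)
best_score = mean_test_score[best_index]

# rank_test_score: rank 1 is the best candidate (ties are not handled here).
order = sorted(range(len(mean_test_score)), key=lambda i: -mean_test_score[i])
rank_test_score = [0] * len(mean_test_score)
for rank, index in enumerate(order, start=1):
    rank_test_score[index] = rank

print(best_index, best_score, rank_test_score)
```

The ranks reproduce the rank_test_score column of the example, and the candidate at best_index is the one whose parameters would be reported in best_params_.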
Notes
The parameters selected are those that maximize the score of the left out data, unless an explicit score is passed in which case it is used instead.
If n_jobs was set to a value higher than one, the data is copied for each point in the grid (and not n_jobs times). This is done for efficiency reasons if individual jobs take very little time, but may raise errors if the dataset is large and not enough memory is available. A workaround in this case is to set pre_dispatch. Then, the memory is copied only pre_dispatch many times. A reasonable value for pre_dispatch is 2 * n_jobs.
See Also
ParameterGrid : generates all the combinations of a hyperparameter grid.
sklearn.model_selection.train_test_split() : utility function to split the data into a development set usable for fitting a GridSearchCV instance and an evaluation set for its final evaluation.
sklearn.metrics.make_scorer() : Make a scorer from a performance metric or loss function.
Full API documentation: GridSearchCVScikitsLearnNode

class
mdp.nodes.
LassoCVScikitsLearnNode
¶ Lasso linear model with iterative fitting along a regularization path.
This node has been automatically generated by wrapping the sklearn.linear_model.coordinate_descent.LassoCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
See glossary entry for cross-validation estimator.
The best model is selected by cross-validation.
The optimization objective for Lasso is:
(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1
Read more in the User Guide.
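The objective above can be evaluated directly for a given weight vector. The following sketch does so in plain Python to show how the squared-error term and the L1 penalty combine; the function name lasso_objective and the toy data are made up for illustration, and no intercept is modelled.

```python
def lasso_objective(X, y, w, alpha):
    """(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1."""
    n_samples = len(y)
    # Residuals y - Xw, one per sample.
    residuals = [
        yi - sum(xij * wj for xij, wj in zip(xi, w))
        for xi, yi in zip(X, y)
    ]
    squared_error = sum(r * r for r in residuals)
    l1_penalty = sum(abs(wj) for wj in w)
    return squared_error / (2 * n_samples) + alpha * l1_penalty


# Toy data that w = [1, 2] fits exactly, so only the penalty term remains.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
exact_fit = lasso_objective(X, y, [1.0, 2.0], alpha=0.1)
zero_weights = lasso_objective(X, y, [0.0, 0.0], alpha=0.1)
print(exact_fit, zero_weights)
```

A larger alpha makes the penalty term dominate, which is what pushes coefficients toward zero along the regularization path.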
Parameters
 eps : float, optional
 Length of the path. eps=1e-3 means that alpha_min / alpha_max = 1e-3.
 n_alphas : int, optional
 Number of alphas along the regularization path.
 alphas : numpy array, optional
 List of alphas where to compute the models. If None, alphas are set automatically.
 fit_intercept : boolean, default True
 Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
 normalize : boolean, optional, default False
 This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.
 precompute : True | False | ‘auto’ | array-like
 Whether to use a precomputed Gram matrix to speed up calculations. If set to 'auto', the estimator decides. The Gram matrix can also be passed as argument.
 max_iter : int, optional
 The maximum number of iterations.
 tol : float, optional
 The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.
 copy_X : boolean, optional, default True
 If True, X will be copied; else, it may be overwritten.
 cv : int, cross-validation generator or an iterable, optional
 Determines the cross-validation splitting strategy. Possible inputs for cv are:
 None, to use the default 3-fold cross-validation,
 integer, to specify the number of folds.
 CV splitter,
 An iterable yielding (train, test) splits as arrays of indices.
 For integer/None inputs, KFold is used.
 Refer to the User Guide for the various cross-validation strategies that can be used here.
 Changed in version 0.20: cv default value if None will change from 3-fold to 5-fold in v0.22.
 verbose : bool or integer
 Amount of verbosity.
 n_jobs : int or None, optional (default=None)
 Number of CPUs to use during the cross validation. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
 positive : bool, optional
 If positive, restrict regression coefficients to be positive.
 random_state : int, RandomState instance or None, optional, default None
 The seed of the pseudo random number generator that selects a random feature to update. If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random. Used when selection == ‘random’.
 selection : str, default ‘cyclic’
 If set to ‘random’, a random coefficient is updated every iteration rather than looping over features sequentially by default. This (setting to ‘random’) often leads to significantly faster convergence especially when tol is higher than 1e-4.
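To illustrate how eps and n_alphas together describe the regularization path, the sketch below builds a geometrically spaced grid running from alpha_max down to eps * alpha_max. The function name alpha_grid is hypothetical, and alpha_max is passed in as an assumed value here, whereas the library derives it from the training data.

```python
def alpha_grid(alpha_max, eps=1e-3, n_alphas=100):
    """Geometric grid from alpha_max down to eps * alpha_max (n_alphas >= 2)."""
    # Constant ratio between consecutive alphas, so that the last value
    # satisfies alpha_min / alpha_max = eps.
    ratio = eps ** (1.0 / (n_alphas - 1))
    return [alpha_max * ratio ** i for i in range(n_alphas)]


alphas = alpha_grid(alpha_max=1.0, eps=1e-3, n_alphas=4)
print(alphas)
```

A smaller eps stretches the path toward weaker regularization, while a larger n_alphas samples the same range more finely at a higher computational cost.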
Attributes
alpha_ : float
 The amount of penalization chosen by cross validation.
coef_ : array, shape (n_features,) | (n_targets, n_features)
 Parameter vector (w in the cost function formula).
intercept_ : float | array, shape (n_targets,)
 Independent term in decision function.
mse_path_ : array, shape (n_alphas, n_folds)
 Mean square error for the test set on each fold, varying alpha.
alphas_ : numpy array, shape (n_alphas,)
 The grid of alphas used for fitting.
dual_gap_ : ndarray, shape ()
 The dual gap at the end of the optimization for the optimal alpha (alpha_).
n_iter_ : int
 Number of iterations run by the coordinate descent solver to reach the specified tolerance for the optimal alpha.
Examples
>>> from sklearn.linear_model import LassoCV
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(noise=4, random_state=0)
>>> reg = LassoCV(cv=5, random_state=0).fit(X, y)
>>> reg.score(X, y)
0.9993...
>>> reg.predict(X[:1,])
array([-78.4951...])
Notes
For an example, see examples/linear_model/plot_lasso_model_selection.py.
To avoid unnecessary memory duplication the X argument of the fit method should be directly passed as a Fortran-contiguous numpy array.
See also
lars_path, lasso_path, LassoLars, Lasso, LassoLarsCV
Full API documentation: LassoCVScikitsLearnNode

class
mdp.nodes.
OneClassSVMScikitsLearnNode
¶ Unsupervised Outlier Detection.
This node has been automatically generated by wrapping the sklearn.svm.classes.OneClassSVM class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Estimate the support of a high-dimensional distribution.
The implementation is based on libsvm.
Read more in the User Guide.
Parameters
 kernel : string, optional (default=’rbf’)
 Specifies the kernel type to be used in the algorithm. It must be one of ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’ or a callable. If none is given, ‘rbf’ will be used. If a callable is given it is used to precompute the kernel matrix.
 degree : int, optional (default=3)
 Degree of the polynomial kernel function (‘poly’). Ignored by all other kernels.
 gamma : float, optional (default=’auto’)
 Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’.
 Current default is ‘auto’, which uses 1 / n_features. If gamma='scale' is passed, then 1 / (n_features * X.var()) is used as the value of gamma. The current default of gamma, ‘auto’, will change to ‘scale’ in version 0.22. ‘auto_deprecated’, a deprecated version of ‘auto’, is used as a default indicating that no explicit value of gamma was passed.
 coef0 : float, optional (default=0.0)
 Independent term in kernel function. It is only significant in ‘poly’ and ‘sigmoid’.
 tol : float, optional
 Tolerance for stopping criterion.
 nu : float, optional
 An upper bound on the fraction of training errors and a lower bound of the fraction of support vectors. Should be in the interval (0, 1]. By default 0.5 will be taken.
 shrinking : boolean, optional
 Whether to use the shrinking heuristic.
 cache_size : float, optional
 Specify the size of the kernel cache (in MB).
 verbose : bool, default: False
 Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in libsvm that, if enabled, may not work properly in a multithreaded context.
 max_iter : int, optional (default=-1)
 Hard limit on iterations within solver, or -1 for no limit.
 random_state : int, RandomState instance or None, optional (default=None)
 Ignored.
 Deprecated since version 0.20: random_state has been deprecated in 0.20 and will be removed in 0.22.
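The two gamma settings described above differ only in whether the spread of the data enters the formula. The sketch below computes both values in plain Python from the formulas quoted in the gamma entry; the helper name gamma_value is hypothetical, and X.var() is taken to mean the population variance over all entries of X.

```python
def gamma_value(X, gamma='auto'):
    """Resolve 'auto' / 'scale' to a number, following the formulas above."""
    n_features = len(X[0])
    if gamma == 'auto':
        return 1.0 / n_features
    if gamma == 'scale':
        # Population variance over every entry of X.
        values = [v for row in X for v in row]
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        return 1.0 / (n_features * variance)
    # An explicit numeric value is used as given.
    return float(gamma)


X = [[0.0, 2.0], [4.0, 2.0]]
gamma_auto = gamma_value(X, 'auto')
gamma_scale = gamma_value(X, 'scale')
print(gamma_auto, gamma_scale)
```

With 'scale', rescaling the features changes gamma accordingly, whereas 'auto' depends only on the number of features.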
Attributes
support_ : array-like, shape = [n_SV]
 Indices of support vectors.
support_vectors_ : array-like, shape = [n_SV, n_features]
 Support vectors.
dual_coef_ : array, shape = [1, n_SV]
 Coefficients of the support vectors in the decision function.
coef_ : array, shape = [1, n_features]
 Weights assigned to the features (coefficients in the primal problem). This is only available in the case of a linear kernel.
 coef_ is a read-only property derived from dual_coef_ and support_vectors_.
intercept_ : array, shape = [1,]
 Constant in the decision function.
offset_ : float
 Offset used to define the decision function from the raw scores. We have the relation: decision_function = score_samples - offset_. The offset is the opposite of intercept_ and is provided for consistency with other outlier detection algorithms.
Full API documentation: OneClassSVMScikitsLearnNode

class
mdp.nodes.
RidgeCVScikitsLearnNode
¶ Ridge regression with built-in cross-validation.
This node has been automatically generated by wrapping the sklearn.linear_model.ridge.RidgeCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
See glossary entry for cross-validation estimator.
By default, it performs Generalized Cross-Validation, which is a form of efficient Leave-One-Out cross-validation.
Read more in the User Guide.
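The reason Leave-One-Out can be done efficiently for ridge regression is that each held-out residual can be obtained from a single fit on all the data. The sketch below checks this for the simplest case, one feature and no intercept, where the leave-one-out residual equals the ordinary residual divided by (1 - h_i), with h_i = x_i^2 / (sum of x^2 + alpha). The function names and toy data are illustrative, and the library's matrix-based implementation is more general.

```python
def ridge_weight(x, y, alpha):
    """Closed-form ridge solution for one feature without intercept."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + alpha)


def loo_errors_bruteforce(x, y, alpha):
    """Refit n times, each time leaving one sample out."""
    errors = []
    for i in range(len(x)):
        w = ridge_weight(x[:i] + x[i + 1:], y[:i] + y[i + 1:], alpha)
        errors.append(y[i] - w * x[i])
    return errors


def loo_errors_shortcut(x, y, alpha):
    """Same residuals from a single fit, using the leverage h_i."""
    w = ridge_weight(x, y, alpha)
    denom = sum(a * a for a in x) + alpha
    return [(yi - w * xi) / (1.0 - xi * xi / denom) for xi, yi in zip(x, y)]


def mean_squared(errors):
    return sum(e * e for e in errors) / len(errors)


x = [1.0, 2.0, 3.0, 4.0]
y = [1.2, 1.9, 3.2, 3.8]
alphas = [1e-3, 1e-2, 1e-1, 1]

# Pick the alpha with the smallest leave-one-out mean squared error.
best_alpha = min(alphas, key=lambda a: mean_squared(loo_errors_shortcut(x, y, a)))
print(best_alpha)
```

The shortcut needs one fit per alpha instead of one fit per alpha and per sample, which is what makes this form of cross-validation cheap.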
Parameters
 alphas : numpy array of shape [n_alphas]
 Array of alpha values to try. Regularization strength; must be a positive float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization. Alpha corresponds to C^-1 in other linear models such as LogisticRegression or LinearSVC.
 fit_intercept : boolean
 Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
 normalize : boolean, optional, default False
 This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.
 scoring : string, callable or None, optional, default: None
 A string (see model evaluation documentation) or a scorer callable object / function with signature scorer(estimator, X, y).
 cv : int, cross-validation generator or an iterable, optional
 Determines the cross-validation splitting strategy. Possible inputs for cv are:
 None, to use the efficient Leave-One-Out cross-validation
 integer, to specify the number of folds.
 CV splitter,
 An iterable yielding (train, test) splits as arrays of indices.
 For integer/None inputs, if y is binary or multiclass, sklearn.model_selection.StratifiedKFold is used, else sklearn.model_selection.KFold is used.
 Refer to the User Guide for the various cross-validation strategies that can be used here.
 gcv_mode : {None, ‘auto’, ‘svd’, ‘eigen’}, optional
 Flag indicating which strategy to use when performing Generalized Cross-Validation. Options are:
 'auto' : use svd if n_samples > n_features or when X is a sparse matrix, otherwise use eigen
 'svd' : force computation via singular value decomposition of X (does not work for sparse matrices)
 'eigen' : force computation via eigendecomposition of X^T X
 The ‘auto’ mode is the default and is intended to pick the cheaper option of the two depending upon the shape and format of the training data.
 store_cv_values : boolean, default=False
 Flag indicating if the cross-validation values corresponding to each alpha should be stored in the cv_values_ attribute (see below). This flag is only compatible with cv=None (i.e. using Generalized Cross-Validation).
Attributes
cv_values_ : array, shape = [n_samples, n_alphas] or shape = [n_samples, n_targets, n_alphas], optional
 Cross-validation values for each alpha (if store_cv_values=True and cv=None). After fit() has been called, this attribute will contain the mean squared errors (by default) or the values of the {loss,score}_func function (if provided in the constructor).
coef_ : array, shape = [n_features] or [n_targets, n_features]
 Weight vector(s).
intercept_ : float | array, shape = (n_targets,)
 Independent term in decision function. Set to 0.0 if fit_intercept = False.
alpha_ : float
 Estimated regularization parameter.
Examples
>>> from sklearn.datasets import load_diabetes
>>> from sklearn.linear_model import RidgeCV
>>> X, y = load_diabetes(return_X_y=True)
>>> clf = RidgeCV(alphas=[1e-3, 1e-2, 1e-1, 1]).fit(X, y)
>>> clf.score(X, y)
0.5166...
See also
Ridge : Ridge regression
RidgeClassifier : Ridge classifier
RidgeClassifierCV : Ridge classifier with built-in cross validation
Full API documentation: RidgeCVScikitsLearnNode

class
mdp.nodes.
LinearDiscriminantAnalysisScikitsLearnNode
¶ Linear Discriminant Analysis.
This node has been automatically generated by wrapping the sklearn.discriminant_analysis.LinearDiscriminantAnalysis class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
A classifier with a linear decision boundary, generated by fitting class conditional densities to the data and using Bayes’ rule.
The model fits a Gaussian density to each class, assuming that all classes share the same covariance matrix.
The fitted model can also be used to reduce the dimensionality of the input by projecting it to the most discriminative directions.
New in version 0.17: LinearDiscriminantAnalysis.
Read more in the User Guide.
Parameters
 solver : string, optional
 Solver to use, possible values:
 ‘svd’: Singular value decomposition (default). Does not compute the covariance matrix, therefore this solver is recommended for data with a large number of features.
 ‘lsqr’: Least squares solution, can be combined with shrinkage.
 ‘eigen’: Eigenvalue decomposition, can be combined with shrinkage.
 shrinkage : string or float, optional
 Shrinkage parameter, possible values:
 None: no shrinkage (default).
 ‘auto’: automatic shrinkage using the Ledoit-Wolf lemma.
 float between 0 and 1: fixed shrinkage parameter.
 Note that shrinkage works only with ‘lsqr’ and ‘eigen’ solvers.
 priors : array, optional, shape (n_classes,)
 Class priors.
 n_components : int, optional
 Number of components (< n_classes - 1) for dimensionality reduction.
 store_covariance : bool, optional
 Additionally compute class covariance matrix (default False), used only in ‘svd’ solver.
 New in version 0.17.
 tol : float, optional, (default 1.0e-4)
 Threshold used for rank estimation in SVD solver.
 New in version 0.17.
Attributes
coef_ : array, shape (n_features,) or (n_classes, n_features)
 Weight vector(s).
intercept_ : array, shape (n_features,)
 Intercept term.
covariance_ : array-like, shape (n_features, n_features)
 Covariance matrix (shared by all classes).
explained_variance_ratio_ : array, shape (n_components,)
 Percentage of variance explained by each of the selected components. If n_components is not set then all components are stored and the sum of explained variances is equal to 1.0. Only available when eigen or svd solver is used.
means_ : array-like, shape (n_classes, n_features)
 Class means.
priors_ : array-like, shape (n_classes,)
 Class priors (sum to 1).
scalings_ : array-like, shape (rank, n_classes - 1)
 Scaling of the features in the space spanned by the class centroids.
xbar_ : array-like, shape (n_features,)
 Overall mean.
classes_ : array-like, shape (n_classes,)
 Unique class labels.
See also
 sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis: Quadratic Discriminant Analysis
Notes
The default solver is ‘svd’. It can perform both classification and transform, and it does not rely on the calculation of the covariance matrix. This can be an advantage in situations where the number of features is large. However, the ‘svd’ solver cannot be used with shrinkage.
The ‘lsqr’ solver is an efficient algorithm that only works for classification. It supports shrinkage.
The ‘eigen’ solver is based on the optimization of the between class scatter to within class scatter ratio. It can be used for both classification and transform, and it supports shrinkage. However, the ‘eigen’ solver needs to compute the covariance matrix, so it might not be suitable for situations with a high number of features.
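To give a feel for what a fixed shrinkage value does to the covariance estimate used by the ‘lsqr’ and ‘eigen’ solvers, the sketch below blends an empirical covariance matrix with a scaled identity. The convex combination with mu = trace(cov) / n_features is the usual definition of a shrunk covariance and is an assumption about the details here; the function name and the 2x2 example matrix are made up.

```python
def shrunk_covariance(cov, shrinkage):
    """Return (1 - shrinkage) * cov + shrinkage * mu * I, mu = trace(cov) / p."""
    p = len(cov)
    mu = sum(cov[i][i] for i in range(p)) / p
    return [
        [
            (1.0 - shrinkage) * cov[i][j] + (shrinkage * mu if i == j else 0.0)
            for j in range(p)
        ]
        for i in range(p)
    ]


cov = [[2.0, 0.8], [0.8, 1.0]]
no_shrinkage = shrunk_covariance(cov, 0.0)    # unchanged
half_shrinkage = shrunk_covariance(cov, 0.5)  # halfway toward mu * I
full_shrinkage = shrunk_covariance(cov, 1.0)  # a scaled identity
print(half_shrinkage)
```

Shrinking damps the off-diagonal entries and evens out the variances, which tends to stabilise the estimate when there are few samples relative to the number of features.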
Examples
>>> import numpy as np
>>> from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> y = np.array([1, 1, 1, 2, 2, 2])
>>> clf = LinearDiscriminantAnalysis()
>>> clf.fit(X, y)
LinearDiscriminantAnalysis(n_components=None, priors=None, shrinkage=None,
              solver='svd', store_covariance=False, tol=0.0001)
>>> print(clf.predict([[-0.8, -1]]))
[1]
Full API documentation: LinearDiscriminantAnalysisScikitsLearnNode

class
mdp.nodes.
PriorProbabilityEstimatorScikitsLearnNode
¶ An estimator predicting the probability of each class in the training data.
This node has been automatically generated by wrapping the sklearn.ensemble.gradient_boosting.PriorProbabilityEstimator class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Full API documentation: PriorProbabilityEstimatorScikitsLearnNode

class
mdp.nodes.
ARDRegressionScikitsLearnNode
¶ Bayesian ARD regression.
This node has been automatically generated by wrapping the sklearn.linear_model.bayes.ARDRegression class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Fit the weights of a regression model, using an ARD prior. The weights of the regression model are assumed to follow Gaussian distributions. Also estimate the parameters lambda (precisions of the distributions of the weights) and alpha (precision of the distribution of the noise). The estimation is done by an iterative procedure (Evidence Maximization).
Read more in the User Guide.
Parameters
 n_iter : int, optional
 Maximum number of iterations. Default is 300.
 tol : float, optional
 Stop the algorithm if w has converged. Default is 1.e-3.
 alpha_1 : float, optional
 Hyper-parameter : shape parameter for the Gamma distribution prior over the alpha parameter. Default is 1.e-6.
 alpha_2 : float, optional
 Hyper-parameter : inverse scale parameter (rate parameter) for the Gamma distribution prior over the alpha parameter. Default is 1.e-6.
 lambda_1 : float, optional
 Hyper-parameter : shape parameter for the Gamma distribution prior over the lambda parameter. Default is 1.e-6.
 lambda_2 : float, optional
 Hyper-parameter : inverse scale parameter (rate parameter) for the Gamma distribution prior over the lambda parameter. Default is 1.e-6.
 compute_score : boolean, optional
 If True, compute the objective function at each step of the model. Default is False.
 threshold_lambda : float, optional
 Threshold for removing (pruning) weights with high precision from the computation. Default is 1.e+4.
 fit_intercept : boolean, optional
 Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered). Default is True.
 normalize : boolean, optional, default False
 This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.
 copy_X : boolean, optional, default True
 If True, X will be copied; else, it may be overwritten.
 verbose : boolean, optional, default False
 Verbose mode when fitting the model.
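The pruning controlled by threshold_lambda can be pictured as a simple mask over the features: a weight whose estimated precision lambda reaches the threshold is considered pinned at zero and dropped. The sketch below applies that rule to made-up values; the function name, the sample numbers and the choice to zero out pruned coefficients are assumptions made for illustration.

```python
def prune_weights(coef, lambda_, threshold_lambda=1.0e4):
    """Keep weights whose precision is below the threshold, zero the rest."""
    keep = [lam < threshold_lambda for lam in lambda_]
    pruned = [c if k else 0.0 for c, k in zip(coef, keep)]
    return keep, pruned


# Hypothetical estimates: the second weight has a very high precision,
# i.e. the model is confident that it is (close to) zero.
coef = [1.2, 1e-4, -0.7]
lambda_ = [0.5, 2.0e5, 30.0]
keep, pruned = prune_weights(coef, lambda_)
print(keep, pruned)
```

Lowering threshold_lambda prunes more aggressively and so yields sparser models.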
Attributes
coef_ : array, shape = (n_features)
 Coefficients of the regression model (mean of distribution).
alpha_ : float
 Estimated precision of the noise.
lambda_ : array, shape = (n_features)
 Estimated precisions of the weights.
sigma_ : array, shape = (n_features, n_features)
 Estimated variance-covariance matrix of the weights.
scores_ : float
 If computed, value of the objective function (to be maximized).
Examples
>>> from sklearn import linear_model
>>> clf = linear_model.ARDRegression()
>>> clf.fit([[0,0], [1, 1], [2, 2]], [0, 1, 2])
...
ARDRegression(alpha_1=1e-06, alpha_2=1e-06, compute_score=False,
        copy_X=True, fit_intercept=True, lambda_1=1e-06, lambda_2=1e-06,
        n_iter=300, normalize=False, threshold_lambda=10000.0, tol=0.001,
        verbose=False)
>>> clf.predict([[1, 1]])
array([1.])
Notes
For an example, see examples/linear_model/plot_ard.py.
References
D. J. C. MacKay, Bayesian nonlinear modeling for the prediction competition, ASHRAE Transactions, 1994.
R. Salakhutdinov, Lecture notes on Statistical Machine Learning, http://www.utstat.toronto.edu/~rsalakhu/sta4273/notes/Lecture2.pdf#page=15
Their beta is our self.alpha_. Their alpha is our self.lambda_. ARD is a little different than the slide: only dimensions/features for which self.lambda_ < self.threshold_lambda are kept and the rest are discarded.
Full API documentation: ARDRegressionScikitsLearnNode

class
mdp.nodes.
ImputerScikitsLearnNode
¶ Imputation transformer for completing missing values.
This node has been automatically generated by wrapping the sklearn.preprocessing.imputation.Imputer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Read more in the User Guide.
Parameters
 missing_values : integer or “NaN”, optional (default=”NaN”)
 The placeholder for the missing values. All occurrences of missing_values will be imputed. For missing values encoded as np.nan, use the string value “NaN”.
 strategy : string, optional (default=”mean”)
The imputation strategy.
 If “mean”, then replace missing values using the mean along the axis.
 If “median”, then replace missing values using the median along the axis.
 If “most_frequent”, then replace missing values using the most frequent value along the axis.
 axis : integer, optional (default=0)
The axis along which to impute.
 If axis=0, then impute along columns.
 If axis=1, then impute along rows.
 verbose : integer, optional (default=0)
 Controls the verbosity of the imputer.
 copy : boolean, optional (default=True)
If True, a copy of X will be created. If False, imputation will be done in-place whenever possible. Note that, in the following cases, a new copy will always be made, even if copy=False:
 If X is not an array of floating values;
 If X is sparse and missing_values=0;
 If axis=0 and X is encoded as a CSR matrix;
 If axis=1 and X is encoded as a CSC matrix.
Attributes
statistics_ : array of shape (n_features,)
 The imputation fill value for each feature if axis == 0.
Notes
 When axis=0, columns which only contained missing values at fit are discarded upon transform.
 When axis=1, an exception is raised if there are rows for which it is not possible to fill in the missing values (e.g., because they only contain missing values).
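The column-wise behaviour described in the notes (strategy “mean”, axis=0) can be sketched in a few lines of plain Python: each missing entry is replaced by the mean of the observed values in its column, and a column with no observed values is dropped. The function name and the sample matrix are illustrative only.

```python
import math


def impute_mean_columns(X):
    """Fill NaNs with column means; drop columns that are entirely NaN."""
    n_cols = len(X[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in X if not math.isnan(row[j])]
        # None marks a column without any observed value.
        means.append(sum(observed) / len(observed) if observed else None)
    kept = [j for j in range(n_cols) if means[j] is not None]
    return [
        [means[j] if math.isnan(row[j]) else row[j] for j in kept]
        for row in X
    ]


nan = float('nan')
X = [[1.0, nan, 2.0],
     [nan, nan, 4.0],
     [7.0, nan, 6.0]]
filled = impute_mean_columns(X)
print(filled)
```

Because an all-missing column is discarded, the output can have fewer columns than the input, which matters when later nodes expect a fixed input dimension.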
Full API documentation: ImputerScikitsLearnNode

class
mdp.nodes.
VarianceThresholdScikitsLearnNode
¶ Feature selector that removes all lowvariance features. This node has been automatically generated by wrapping the
sklearn.feature_selection.variance_threshold.VarianceThreshold
class from thesklearn
library. The wrapped instance can be accessed through thescikits_alg
attribute. This feature selection algorithm looks only at the features (X), not the desired outputs (y), and can thus be used for unsupervised learning.Read more in the User Guide.
Parameters
 threshold : float, optional
 Features with a training-set variance lower than this threshold will be removed. The default is to keep all features with non-zero variance, i.e. remove the features that have the same value in all samples.
Attributes
variances_
: array, shape (n_features,) Variances of individual features.
Examples
The following dataset has integer features, two of which are the same in every sample. These are removed with the default setting for threshold:
>>> X = [[0, 2, 0, 3], [0, 1, 4, 3], [0, 1, 1, 3]]
>>> selector = VarianceThreshold()
>>> selector.fit_transform(X)
array([[2, 0],
       [1, 4],
       [1, 1]])
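The selection rule can also be written out by hand. The following is a minimal plain-Python sketch under the assumption that the population variance is used and that a feature is kept when its variance is strictly greater than the threshold; the helper names are hypothetical and this is not the wrapped class.

```python
def column_variances(rows):
    """Population variance of each column."""
    n = len(rows)
    variances = []
    for col in zip(*rows):
        mean = sum(col) / n
        variances.append(sum((v - mean) ** 2 for v in col) / n)
    return variances

def select_by_variance(rows, threshold=0.0):
    """Keep only the columns whose variance exceeds the threshold."""
    variances = column_variances(rows)
    keep = [j for j, var in enumerate(variances) if var > threshold]
    return [[row[j] for j in keep] for row in rows]

X = [[0, 2, 0, 3], [0, 1, 4, 3], [0, 1, 1, 3]]
X_selected = select_by_variance(X)  # the two constant columns are removed
```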
Full API documentation: VarianceThresholdScikitsLearnNode

class
mdp.nodes.
GradientBoostingRegressorScikitsLearnNode
¶ Gradient Boosting for regression. This node has been automatically generated by wrapping the sklearn.ensemble.gradient_boosting.GradientBoostingRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
GB builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions. In each stage a regression tree is fit on the negative gradient of the given loss function.
Read more in the User Guide.
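To make the forward stage-wise idea concrete, here is a small sketch for the least-squares loss only, where the negative gradient is simply the residual. It assumes a single input feature and depth-one trees (stumps); fit_stump and boost are hypothetical names, and the wrapped estimator is considerably more general.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split on a 1-D feature under squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def boost(xs, ys, n_estimators=50, learning_rate=0.1):
    """Additive model: mean prediction plus shrunken stumps fit to residuals."""
    init = sum(ys) / len(ys)
    predictions = [init] * len(ys)
    stumps = []
    for _ in range(n_estimators):
        # For squared error the negative gradient is the residual y - F(x).
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x)
                       for p, x in zip(predictions, xs)]
    return lambda x: init + learning_rate * sum(s(x) for s in stumps)

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
model = boost(xs, ys)
mse = sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
baseline = sum((sum(ys) / len(ys) - y) ** 2 for y in ys) / len(ys)
```

Each stage shrinks its contribution by learning_rate, which is why a smaller learning_rate generally needs more stages.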
Parameters
 loss : {‘ls’, ‘lad’, ‘huber’, ‘quantile’}, optional (default=’ls’)
 loss function to be optimized. ‘ls’ refers to least squares regression. ‘lad’ (least absolute deviation) is a highly robust loss function solely based on order information of the input variables. ‘huber’ is a combination of the two. ‘quantile’ allows quantile regression (use alpha to specify the quantile).
 learning_rate : float, optional (default=0.1)
 learning rate shrinks the contribution of each tree by learning_rate. There is a trade-off between learning_rate and n_estimators.
 n_estimators : int (default=100)
 The number of boosting stages to perform. Gradient boosting is fairly robust to overfitting so a large number usually results in better performance.
 subsample : float, optional (default=1.0)
 The fraction of samples to be used for fitting the individual base learners. If smaller than 1.0 this results in Stochastic Gradient Boosting. subsample interacts with the parameter n_estimators. Choosing subsample < 1.0 leads to a reduction of variance and an increase in bias.
 criterion : string, optional (default=”friedman_mse”)
The function to measure the quality of a split. Supported criteria are “friedman_mse” for the mean squared error with improvement score by Friedman, “mse” for mean squared error, and “mae” for the mean absolute error. The default value of “friedman_mse” is generally the best as it can provide a better approximation in some cases.
New in version 0.18.
 min_samples_split : int, float, optional (default=2)
The minimum number of samples required to split an internal node:
 If int, then consider min_samples_split as the minimum number.
 If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
Changed in version 0.18: Added float values for fractions.
 min_samples_leaf : int, float, optional (default=1)
The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.
 If int, then consider min_samples_leaf as the minimum number.
 If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.
Changed in version 0.18: Added float values for fractions.
 min_weight_fraction_leaf : float, optional (default=0.)
 The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.
 max_depth : integer, optional (default=3)
 maximum depth of the individual regression estimators. The maximum depth limits the number of nodes in the tree. Tune this parameter for best performance; the best value depends on the interaction of the input variables.
 min_impurity_decrease : float, optional (default=0.)
A node will be split if this split induces a decrease of the impurity greater than or equal to this value.
The weighted impurity decrease equation is the following:
N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.
N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.
New in version 0.19.
 min_impurity_split : float, (default=1e-7)
Threshold for early stopping in tree growth. A node will split if its impurity is above the threshold, otherwise it is a leaf.
Deprecated since version 0.19: min_impurity_split has been deprecated in favor of min_impurity_decrease in 0.19. The default value of min_impurity_split will change from 1e-7 to 0 in 0.23 and it will be removed in 0.25. Use min_impurity_decrease instead.
 init : estimator, optional (default=None)
 An estimator object that is used to compute the initial predictions. init has to provide fit and predict. If None it uses loss.init_estimator.
 random_state : int, RandomState instance or None, optional (default=None)
 If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
 max_features : int, float, string or None, optional (default=None)
The number of features to consider when looking for the best split:
 If int, then consider max_features features at each split.
 If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
 If “auto”, then max_features=n_features.
 If “sqrt”, then max_features=sqrt(n_features).
 If “log2”, then max_features=log2(n_features).
 If None, then max_features=n_features.
Choosing max_features < n_features leads to a reduction of variance and an increase in bias.
Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires effectively inspecting more than max_features features.
 alpha : float (default=0.9)
 The alpha-quantile of the huber loss function and the quantile loss function. Only used if loss='huber' or loss='quantile'.
 verbose : int, default: 0
 Enable verbose output. If 1 then it prints progress and performance once in a while (the more trees the lower the frequency). If greater than 1 then it prints progress and performance for every tree.
 max_leaf_nodes : int or None, optional (default=None)
 Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.
 warm_start : bool, default: False
 When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just erase the previous solution. See the Glossary.
 presort : bool or ‘auto’, optional (default=’auto’)
Whether to presort the data to speed up the finding of best splits in fitting. Auto mode by default will use presorting on dense data and default to normal sorting on sparse data. Setting presort to true on sparse data will raise an error.
New in version 0.17: optional parameter presort.
 validation_fraction : float, optional, default 0.1
The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used if n_iter_no_change is set to an integer.
New in version 0.20.
 n_iter_no_change : int, default None
n_iter_no_change is used to decide if early stopping will be used to terminate training when validation score is not improving. By default it is set to None to disable early stopping. If set to a number, it will set aside validation_fraction size of the training data as validation and terminate training when validation score is not improving in all of the previous n_iter_no_change numbers of iterations.
New in version 0.20.
 tol : float, optional, default 1e-4
Tolerance for the early stopping. When the loss is not improving by at least tol for n_iter_no_change iterations (if set to a number), the training stops.
New in version 0.20.
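The weighted impurity decrease that the min_impurity_decrease parameter is compared against can be computed directly from the quantities named in its description. The function below is an illustrative sketch with a hypothetical name and made-up numbers, not code from the wrapped library.

```python
def weighted_impurity_decrease(n, n_t, n_t_l, n_t_r,
                               impurity, left_impurity, right_impurity):
    """N_t / N * (impurity - N_t_R / N_t * right - N_t_L / N_t * left)."""
    return n_t / n * (impurity
                      - n_t_r / n_t * right_impurity
                      - n_t_l / n_t * left_impurity)

# Hypothetical node: 40 of 100 samples, split into 10 (left) and 30 (right).
decrease = weighted_impurity_decrease(
    n=100, n_t=40, n_t_l=10, n_t_r=30,
    impurity=0.5, left_impurity=0.0, right_impurity=0.4,
)
# The split is allowed when the decrease reaches min_impurity_decrease.
would_split = decrease >= 0.05
```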
Attributes
feature_importances_
: array, shape (n_features,) The feature importances (the higher, the more important the feature).
oob_improvement_
: array, shape (n_estimators,) The improvement in loss (= deviance) on the out-of-bag samples relative to the previous iteration. oob_improvement_[0] is the improvement in loss of the first stage over the init estimator.
train_score_
: array, shape (n_estimators,) The i-th score train_score_[i] is the deviance (= loss) of the model at iteration i on the in-bag sample. If subsample == 1 this is the deviance on the training data.
loss_
: LossFunction The concrete LossFunction object.
init_
: estimator The estimator that provides the initial predictions. Set via the init argument or loss.init_estimator.
estimators_
: array of DecisionTreeRegressor, shape (n_estimators, 1) The collection of fitted sub-estimators.
Notes
The features are always randomly permuted at each split. Therefore, the best found split may vary, even with the same training data and max_features=n_features, if the improvement of the criterion is identical for several splits enumerated during the search of the best split. To obtain a deterministic behaviour during fitting, random_state has to be fixed.
See also
DecisionTreeRegressor, RandomForestRegressor
References
J. Friedman, Greedy Function Approximation: A Gradient Boosting Machine, The Annals of Statistics, Vol. 29, No. 5, 2001.
J. Friedman, Stochastic Gradient Boosting, 1999.
T. Hastie, R. Tibshirani and J. Friedman. Elements of Statistical Learning Ed. 2, Springer, 2009.
Full API documentation: GradientBoostingRegressorScikitsLearnNode

class
mdp.nodes.
OrthogonalMatchingPursuitScikitsLearnNode
¶ Orthogonal Matching Pursuit model (OMP). This node has been automatically generated by wrapping the sklearn.linear_model.omp.OrthogonalMatchingPursuit class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Read more in the User Guide.
Parameters
 n_nonzero_coefs : int, optional
 Desired number of non-zero entries in the solution. If None (by default) this value is set to 10% of n_features.
 tol : float, optional
 Maximum norm of the residual. If not None, overrides n_nonzero_coefs.
 fit_intercept : boolean, optional
 whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
 normalize : boolean, optional, default True
 This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.
 precompute : {True, False, ‘auto’}, default ‘auto’
 Whether to use a precomputed Gram and Xy matrix to speed up calculations. Improves performance when n_targets or n_samples is very large. Note that if you already have such matrices, you can pass them directly to the fit method.
Attributes
coef_
: array, shape (n_features,) or (n_targets, n_features) parameter vector (w in the formula)
intercept_
: float or array, shape (n_targets,) independent term in decision function.
n_iter_
: int or array-like Number of active features across every target.
Examples
>>> from sklearn.linear_model import OrthogonalMatchingPursuit
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(noise=4, random_state=0)
>>> reg = OrthogonalMatchingPursuit().fit(X, y)
>>> reg.score(X, y)
0.9991...
>>> reg.predict(X[:1,])
array([-78.3854...])
Notes
Orthogonal matching pursuit was introduced in G. Mallat, Z. Zhang, Matching pursuits with time-frequency dictionaries, IEEE Transactions on Signal Processing, Vol. 41, No. 12. (December 1993), pp. 3397-3415. (http://blanche.polytechnique.fr/~mallat/papiers/MallatPursuit93.pdf)
This implementation is based on Rubinstein, R., Zibulevsky, M. and Elad, M., Efficient Implementation of the K-SVD Algorithm using Batch Orthogonal Matching Pursuit, Technical Report - CS Technion, April 2008. http://www.cs.technion.ac.il/~ronrubin/Publications/KSVDOMPv2.pdf
See also
orthogonal_mp orthogonal_mp_gram lars_path Lars LassoLars decomposition.sparse_encode OrthogonalMatchingPursuitCV
Full API documentation: OrthogonalMatchingPursuitScikitsLearnNode

class
mdp.nodes.
PLSCanonicalScikitsLearnNode
¶ PLSCanonical implements the 2 blocks canonical PLS of the original Wold algorithm [Tenenhaus 1998] p.204, referred to as PLS-C2A in [Wegelin 2000]. This node has been automatically generated by wrapping the sklearn.cross_decomposition.pls_.PLSCanonical class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
This class inherits from PLS with mode=”A” and deflation_mode=”canonical”, norm_y_weights=True and algorithm=”nipals”, but svd should provide similar results up to numerical errors.
Read more in the User Guide.
Parameters
 n_components : int, (default 2).
 Number of components to keep
 scale : boolean, (default True)
 Option to scale data
 algorithm : string, “nipals” or “svd”
 The algorithm used to estimate the weights. It will be called n_components times, i.e. once for each iteration of the outer loop.
 max_iter : an integer, (default 500)
 the maximum number of iterations of the NIPALS inner loop (used only if algorithm=”nipals”)
 tol : non-negative real, default 1e-06
 the tolerance used in the iterative algorithm
 copy : boolean, default True
 Whether the deflation should be done on a copy. Leave the default value of True unless you don’t care about side effects.
Attributes
x_weights_
: array, shape = [p, n_components] X block weights vectors.
y_weights_
: array, shape = [q, n_components] Y block weights vectors.
x_loadings_
: array, shape = [p, n_components] X block loadings vectors.
y_loadings_
: array, shape = [q, n_components] Y block loadings vectors.
x_scores_
: array, shape = [n_samples, n_components] X scores.
y_scores_
: array, shape = [n_samples, n_components] Y scores.
x_rotations_
: array, shape = [p, n_components] X block to latents rotations.
y_rotations_
: array, shape = [q, n_components] Y block to latents rotations.
n_iter_
: array-like Number of iterations of the NIPALS inner loop for each component. Not useful if the algorithm provided is “svd”.
Notes
Matrices:
T: x_scores_
U: y_scores_
W: x_weights_
C: y_weights_
P: x_loadings_
Q: y_loadings_
Are computed such that:
X = T P.T + Err and Y = U Q.T + Err
T[:, k] = Xk W[:, k] for k in range(n_components)
U[:, k] = Yk C[:, k] for k in range(n_components)
x_rotations_ = W (P.T W)^(-1)
y_rotations_ = C (Q.T C)^(-1)
where Xk and Yk are residual matrices at iteration k.
For each component k, find weights u, v that optimize:
max corr(Xk u, Yk v) * std(Xk u) std(Yk v), such that |u| = |v| = 1
Note that it maximizes both the correlations between the scores and the intra-block variances.
The residual matrix of the X block (Xk+1) is obtained by deflation on the current X score: x_score.
The residual matrix of the Y block (Yk+1) is obtained by deflation on the current Y score. This performs a canonical symmetric version of the PLS regression, which is slightly different from CCA. This is mostly used for modeling.
This implementation provides the same results as the “plspm” package provided in the R language (R-project), using the function plsca(X, Y). Results are equal or collinear with the function pls(..., mode = "canonical") of the “mixOmics” package. The difference lies in the fact that the mixOmics implementation does not exactly implement the Wold algorithm since it does not normalize y_weights to one.
Examples
>>> from sklearn.cross_decomposition import PLSCanonical
>>> X = [[0., 0., 1.], [1.,0.,0.], [2.,2.,2.], [2.,5.,4.]]
>>> Y = [[0.1, -0.2], [0.9, 1.1], [6.2, 5.9], [11.9, 12.3]]
>>> plsca = PLSCanonical(n_components=2)
>>> plsca.fit(X, Y)
...
PLSCanonical(algorithm='nipals', copy=True, max_iter=500, n_components=2,
             scale=True, tol=1e-06)
>>> X_c, Y_c = plsca.transform(X, Y)
References
Jacob A. Wegelin. A survey of Partial Least Squares (PLS) methods, with emphasis on the two-block case. Technical Report 371, Department of Statistics, University of Washington, Seattle, 2000.
Tenenhaus, M. (1998). La regression PLS: theorie et pratique. Paris: Editions Technic.
See also
CCA PLSSVD
Full API documentation: PLSCanonicalScikitsLearnNode

class
mdp.nodes.
FeatureAgglomerationScikitsLearnNode
¶ Agglomerate features. This node has been automatically generated by wrapping the sklearn.cluster.hierarchical.FeatureAgglomeration class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Similar to AgglomerativeClustering, but recursively merges features instead of samples.
Read more in the User Guide.
Parameters
 n_clusters : int, default 2
 The number of clusters to find.
 affinity : string or callable, default “euclidean”
 Metric used to compute the linkage. Can be “euclidean”, “l1”, “l2”, “manhattan”, “cosine”, or ‘precomputed’. If linkage is “ward”, only “euclidean” is accepted.
 memory : None, str or object with the joblib.Memory interface, optional
 Used to cache the output of the computation of the tree. By default, no caching is done. If a string is given, it is the path to the caching directory.
 connectivity : array-like or callable, optional
 Connectivity matrix. Defines for each feature the neighboring features following a given structure of the data. This can be a connectivity matrix itself or a callable that transforms the data into a connectivity matrix, such as derived from kneighbors_graph. Default is None, i.e., the hierarchical clustering algorithm is unstructured.
 compute_full_tree : bool or ‘auto’, optional, default “auto”
 Stop early the construction of the tree at n_clusters. This is useful to decrease computation time if the number of clusters is not small compared to the number of features. This option is useful only when specifying a connectivity matrix. Note also that when varying the number of clusters and using caching, it may be advantageous to compute the full tree.
 linkage : {“ward”, “complete”, “average”, “single”}, optional (default=”ward”)
Which linkage criterion to use. The linkage criterion determines which distance to use between sets of features. The algorithm will merge the pairs of cluster that minimize this criterion.
 ward minimizes the variance of the clusters being merged.
 average uses the average of the distances of each feature of the two sets.
 complete or maximum linkage uses the maximum distances between all features of the two sets.
 single uses the minimum of the distances between all observations of the two sets.
 pooling_func : callable, default np.mean
 This combines the values of agglomerated features into a single value, and should accept an array of shape [M, N] and the keyword argument axis=1, and reduce it to an array of size [M].
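How the cluster labels and the pooling function turn the original features into one value per cluster can be sketched in plain Python. This is only an illustration: pool_features is a hypothetical helper, and its pooling function receives a flat list of values rather than an [M, N] array with axis=1 as the wrapped class expects.

```python
def pool_features(rows, labels, pooling_func=None):
    """Reduce each row to one pooled value per feature cluster."""
    if pooling_func is None:
        pooling_func = lambda values: sum(values) / len(values)  # mean
    clusters = sorted(set(labels))
    return [
        [pooling_func([v for v, lab in zip(row, labels) if lab == c])
         for c in clusters]
        for row in rows
    ]

X = [[1.0, 3.0, 10.0, 20.0], [2.0, 4.0, 30.0, 50.0]]
labels = [0, 0, 1, 1]   # features 0 and 1 form cluster 0, features 2 and 3 cluster 1
X_reduced = pool_features(X, labels)
```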
Attributes
labels_
: array-like, (n_features,) cluster labels for each feature.
n_leaves_
: int Number of leaves in the hierarchical tree.
n_components_
: int The estimated number of connected components in the graph.
children_
: array-like, shape (n_nodes-1, 2) The children of each non-leaf node. Values less than n_features correspond to leaves of the tree which are the original samples. A node i greater than or equal to n_features is a non-leaf node and has children children_[i - n_features]. Alternatively at the i-th iteration, children[i][0] and children[i][1] are merged to form node n_features + i.
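The indexing convention of children_ is easier to follow with a tiny example. The sketch below walks a hand-written merge table for four features; the table and the helper leaves_under are hypothetical and only illustrate the node numbering described above.

```python
def leaves_under(node, children, n_features):
    """Return the original feature indices merged into the given node."""
    if node < n_features:
        return [node]  # values below n_features are leaves
    left, right = children[node - n_features]
    return (leaves_under(left, children, n_features)
            + leaves_under(right, children, n_features))

n_features = 4
# Iteration 0 merges leaves 0 and 1 into node 4, iteration 1 merges
# leaves 2 and 3 into node 5, iteration 2 merges nodes 4 and 5 into node 6.
children = [(0, 1), (2, 3), (4, 5)]
first_merge = leaves_under(4, children, n_features)
root = leaves_under(6, children, n_features)
```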
Examples
>>> import numpy as np
>>> from sklearn import datasets, cluster
>>> digits = datasets.load_digits()
>>> images = digits.images
>>> X = np.reshape(images, (len(images), -1))
>>> agglo = cluster.FeatureAgglomeration(n_clusters=32)
>>> agglo.fit(X)
FeatureAgglomeration(affinity='euclidean', compute_full_tree='auto',
                     connectivity=None, linkage='ward', memory=None, n_clusters=32,
                     pooling_func=...)
>>> X_reduced = agglo.transform(X)
>>> X_reduced.shape
(1797, 32)
Full API documentation: FeatureAgglomerationScikitsLearnNode

class
mdp.nodes.
SelectPercentileScikitsLearnNode
¶ Select features according to a percentile of the highest scores. This node has been automatically generated by wrapping the sklearn.feature_selection.univariate_selection.SelectPercentile class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Read more in the User Guide.
Parameters
 score_func : callable
 Function taking two arrays X and y, and returning a pair of arrays (scores, pvalues) or a single array with scores. Default is f_classif (see below “See also”). The default function only works with classification tasks.
 percentile : int, optional, default=10
 Percent of features to keep.
Attributes
scores_
: array-like, shape=(n_features,) Scores of features.
pvalues_
: array-like, shape=(n_features,) p-values of feature scores, None if score_func returned only scores.
Examples
>>> from sklearn.datasets import load_digits
>>> from sklearn.feature_selection import SelectPercentile, chi2
>>> X, y = load_digits(return_X_y=True)
>>> X.shape
(1797, 64)
>>> X_new = SelectPercentile(chi2, percentile=10).fit_transform(X, y)
>>> X_new.shape
(1797, 7)
Notes
Ties between features with equal scores will be broken in an unspecified way.
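As a rough illustration of percentile-based selection, the sketch below keeps the indices of the highest-scoring features. It is a simplification under stated assumptions: it keeps ceil(n_features * percentile / 100) features and breaks ties by position, whereas the wrapped class applies its own threshold on the scores, so results may differ when scores tie. The name select_percentile and the scores are hypothetical.

```python
import math

def select_percentile(scores, percentile=10):
    """Return sorted indices of the top `percentile` percent of scores."""
    k = math.ceil(len(scores) * percentile / 100)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

scores = [0.2, 5.0, 1.5, 3.3, 0.1, 4.1, 2.2, 0.9, 0.4, 2.8]
kept = select_percentile(scores, percentile=30)  # 3 of 10 features
```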
See also
f_classif: ANOVA F-value between label/feature for classification tasks.
mutual_info_classif: Mutual information for a discrete target.
chi2: Chi-squared stats of non-negative features for classification tasks.
f_regression: F-value between label/feature for regression tasks.
mutual_info_regression: Mutual information for a continuous target.
SelectKBest: Select features based on the k highest scores.
SelectFpr: Select features based on a false positive rate test.
SelectFdr: Select features based on an estimated false discovery rate.
SelectFwe: Select features based on family-wise error rate.
GenericUnivariateSelect: Univariate feature selector with configurable mode.
Full API documentation: SelectPercentileScikitsLearnNode

class
mdp.nodes.
KernelRidgeScikitsLearnNode
¶ Kernel ridge regression. This node has been automatically generated by wrapping the sklearn.kernel_ridge.KernelRidge class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Kernel ridge regression (KRR) combines ridge regression (linear least squares with l2-norm regularization) with the kernel trick. It thus learns a linear function in the space induced by the respective kernel and the data. For non-linear kernels, this corresponds to a non-linear function in the original space.
The form of the model learned by KRR is identical to support vector regression (SVR). However, different loss functions are used: KRR uses squared error loss while support vector regression uses epsilon-insensitive loss, both combined with l2 regularization. In contrast to SVR, fitting a KRR model can be done in closed-form and is typically faster for medium-sized datasets. On the other hand, the learned model is non-sparse and thus slower than SVR, which learns a sparse model for epsilon > 0, at prediction-time.
This estimator has built-in support for multivariate regression (i.e., when y is a 2d-array of shape [n_samples, n_targets]).
Read more in the User Guide.
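For reference, the closed-form fit mentioned above is usually written as follows in textbook treatments. The symbols K (the kernel matrix on the training data), n (the number of training samples) and the hat notation are introduced here for illustration and do not come from this page; the vector of dual coefficients plays the role of the dual_coef_ attribute.

```latex
\hat{\mathbf{a}} = (K + \alpha I)^{-1}\,\mathbf{y},
\qquad K_{ij} = k(x_i, x_j),
\qquad \hat{f}(x) = \sum_{i=1}^{n} \hat{a}_i \, k(x, x_i)
```

Because every training sample contributes a term to the prediction, the model is non-sparse, which is the prediction-time cost noted in the description.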
Parameters
 alpha : {float, array-like}, shape = [n_targets]
 Small positive values of alpha improve the conditioning of the problem and reduce the variance of the estimates. Alpha corresponds to (2*C)^-1 in other linear models such as LogisticRegression or LinearSVC. If an array is passed, penalties are assumed to be specific to the targets. Hence they must correspond in number.
 kernel : string or callable, default=”linear”
 Kernel mapping used internally. A callable should accept two arguments and the keyword arguments passed to this object as kernel_params, and should return a floating point number. Set to “precomputed” in order to pass a precomputed kernel matrix to the estimator methods instead of samples.
 gamma : float, default=None
 Gamma parameter for the RBF, laplacian, polynomial, exponential chi2 and sigmoid kernels. Interpretation of the default value is left to the kernel; see the documentation for sklearn.metrics.pairwise. Ignored by other kernels.
 degree : float, default=3
 Degree of the polynomial kernel. Ignored by other kernels.
 coef0 : float, default=1
 Zero coefficient for polynomial and sigmoid kernels. Ignored by other kernels.
 kernel_params : mapping of string to any, optional
 Additional parameters (keyword arguments) for kernel function passed as callable object.
Attributes
dual_coef_
: array, shape = [n_samples] or [n_samples, n_targets] Representation of weight vector(s) in kernel space
X_fit_
: {array-like, sparse matrix}, shape = [n_samples, n_features] Training data, which is also required for prediction. If kernel == “precomputed” this is instead the precomputed training matrix, shape = [n_samples, n_samples].
References
 Kevin P. Murphy “Machine Learning: A Probabilistic Perspective”, The MIT Press chapter 14.4.3, pp. 492-493
See also
sklearn.linear_model.Ridge:
 Linear ridge regression.
sklearn.svm.SVR:
 Support Vector Regression implemented using libsvm.
Examples
>>> from sklearn.kernel_ridge import KernelRidge
>>> import numpy as np
>>> n_samples, n_features = 10, 5
>>> rng = np.random.RandomState(0)
>>> y = rng.randn(n_samples)
>>> X = rng.randn(n_samples, n_features)
>>> clf = KernelRidge(alpha=1.0)
>>> clf.fit(X, y)
KernelRidge(alpha=1.0, coef0=1, degree=3, gamma=None, kernel='linear',
            kernel_params=None)
Full API documentation: KernelRidgeScikitsLearnNode

class
mdp.nodes.
MultiTaskLassoCVScikitsLearnNode
¶ Multi-task Lasso model trained with L1/L2 mixed-norm as regularizer. This node has been automatically generated by wrapping the sklearn.linear_model.coordinate_descent.MultiTaskLassoCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
See glossary entry for cross-validation estimator.
The optimization objective for MultiTaskLasso is:
(1 / (2 * n_samples)) * ||Y - XW||^Fro_2 + alpha * ||W||_21
Where:
||W||_21 = \sum_i \sqrt{\sum_j w_{ij}^2}
i.e. the sum of norm of each row.
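The objective can be evaluated by hand for small matrices. The sketch below is plain Python with hypothetical helper names; it assumes W has shape (n_features, n_tasks) and that the first term is the squared Frobenius norm of the residual.

```python
import math

def l21_norm(W):
    """Sum over rows of the Euclidean norm of each row."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)

def matmul(A, B):
    """Plain nested-list matrix product A @ B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def objective(X, Y, W, alpha):
    """(1 / (2 * n_samples)) * ||Y - XW||_Fro^2 + alpha * ||W||_21."""
    n_samples = len(X)
    residual_sq = sum(
        (y - p) ** 2
        for y_row, p_row in zip(Y, matmul(X, W))
        for y, p in zip(y_row, p_row)
    )
    return residual_sq / (2 * n_samples) + alpha * l21_norm(W)

X = [[1.0, 0.0], [0.0, 1.0]]
Y = [[3.0, 4.0], [0.0, 0.0]]
W = [[3.0, 4.0], [0.0, 0.0]]   # fits Y exactly; only the penalty remains
value = objective(X, Y, W, alpha=0.5)
```

Because the penalty sums row norms, an entire row of W (one feature across all tasks) is driven to zero together.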
Read more in the User Guide.
Parameters
 eps : float, optional
 Length of the path. eps=1e-3 means that alpha_min / alpha_max = 1e-3.
 n_alphas : int, optional
 Number of alphas along the regularization path
 alphas : array-like, optional
 List of alphas where to compute the models. If not provided, set automatically.
 fit_intercept : boolean
 whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
 normalize : boolean, optional, default False
 This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.
 max_iter : int, optional
 The maximum number of iterations.
 tol : float, optional
 The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.
 copy_X : boolean, optional, default True
 If True, X will be copied; else, it may be overwritten.
 cv : int, cross-validation generator or an iterable, optional
Determines the cross-validation splitting strategy. Possible inputs for cv are:
 None, to use the default 3-fold cross-validation,
 integer, to specify the number of folds.
 CV splitter,
 An iterable yielding (train, test) splits as arrays of indices.
For integer/None inputs, KFold is used.
Refer to the User Guide for the various cross-validation strategies that can be used here.
Changed in version 0.20: cv default value if None will change from 3-fold to 5-fold in v0.22.
 verbose : bool or integer
 Amount of verbosity.
 n_jobs : int or None, optional (default=None)
 Number of CPUs to use during the cross validation. Note that this is used only if multiple values for l1_ratio are given. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
 random_state : int, RandomState instance or None, optional, default None
 The seed of the pseudo random number generator that selects a random feature to update. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random. Used when selection == ‘random’.
 selection : str, default ‘cyclic’
 If set to ‘random’, a random coefficient is updated every iteration rather than looping over features sequentially by default. This (setting to ‘random’) often leads to significantly faster convergence especially when tol is higher than 1e-4.
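The relationship between eps, n_alphas and the regularization path described in the parameters above can be sketched as a geometric grid. This is an illustration only: alpha_grid is a hypothetical helper, alpha_max is assumed to be given (the wrapped estimator derives it from the data), and the default values shown are assumptions of this sketch.

```python
def alpha_grid(alpha_max, eps=1e-3, n_alphas=100):
    """Geometric grid from alpha_max down to alpha_max * eps."""
    if n_alphas == 1:
        return [alpha_max]
    return [alpha_max * eps ** (i / (n_alphas - 1)) for i in range(n_alphas)]

alphas = alpha_grid(alpha_max=2.0, eps=1e-3, n_alphas=4)
ratio = alphas[-1] / alphas[0]   # alpha_min / alpha_max, equal to eps
```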
Attributes
intercept_
: array, shape (n_tasks,) Independent term in decision function.
coef_
: array, shape (n_tasks, n_features) Parameter vector (W in the cost function formula).
Note that coef_ stores the transpose of W, W.T.
alpha_
: float The amount of penalization chosen by cross validation
mse_path_
: array, shape (n_alphas, n_folds) mean square error for the test set on each fold, varying alpha
alphas_
: numpy array, shape (n_alphas,) The grid of alphas used for fitting.
n_iter_
: int number of iterations run by the coordinate descent solver to reach the specified tolerance for the optimal alpha.
Examples
>>> from sklearn.linear_model import MultiTaskLassoCV
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_targets=2, noise=4, random_state=0)
>>> reg = MultiTaskLassoCV(cv=5, random_state=0).fit(X, y)
>>> reg.score(X, y)
0.9994...
>>> reg.alpha_
0.5713...
>>> reg.predict(X[:1,])
array([[153.7971..., 94.9015...]])
See also
MultiTaskElasticNet ElasticNetCV MultiTaskElasticNetCV
Notes
The algorithm used to fit the model is coordinate descent.
To avoid unnecessary memory duplication the X argument of the fit method should be directly passed as a Fortran-contiguous numpy array.
Full API documentation: MultiTaskLassoCVScikitsLearnNode

class
mdp.nodes.
GaussianNBScikitsLearnNode
¶ Gaussian Naive Bayes (GaussianNB). This node has been automatically generated by wrapping the sklearn.naive_bayes.GaussianNB class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Can perform online updates to model parameters via the partial_fit method. For details on the algorithm used to update feature means and variance online, see Stanford CS tech report STAN-CS-79-773 by Chan, Golub, and LeVeque.
Read more in the User Guide.
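The kind of online update referred to above combines the mean and variance of an already-seen batch with those of a new batch without revisiting old data. The sketch below shows the pairwise combination formula for a single feature using population variance; merge_stats and batch_stats are hypothetical names, and this is not the wrapped implementation.

```python
def merge_stats(n_a, mean_a, var_a, n_b, mean_b, var_b):
    """Combine count, mean and population variance of two batches."""
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    # Sum of squared deviations of the union of both batches.
    m2 = var_a * n_a + var_b * n_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2 / n

def batch_stats(values):
    """Count, mean and population variance computed in one pass over a batch."""
    n = len(values)
    mean = sum(values) / n
    return n, mean, sum((v - mean) ** 2 for v in values) / n

first, second = [1.0, 2.0, 3.0], [10.0, 12.0]
merged = merge_stats(*batch_stats(first), *batch_stats(second))
full = batch_stats(first + second)   # reference: all data at once
```

The merged statistics agree with those computed on the concatenated data, which is what makes incremental fitting possible.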
Parameters
 priors : array-like, shape (n_classes,)
 Prior probabilities of the classes. If specified the priors are not adjusted according to the data.
 var_smoothing : float, optional (default=1e-9)
 Portion of the largest variance of all features that is added to variances for calculation stability.
Attributes
class_prior_
: array, shape (n_classes,) probability of each class.
class_count_
: array, shape (n_classes,) number of training samples observed in each class.
theta_
: array, shape (n_classes, n_features) mean of each feature per class
sigma_
: array, shape (n_classes, n_features) variance of each feature per class
epsilon_
: float absolute additive value to variances
Examples
>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> Y = np.array([1, 1, 1, 2, 2, 2])
>>> from sklearn.naive_bayes import GaussianNB
>>> clf = GaussianNB()
>>> clf.fit(X, Y)
GaussianNB(priors=None, var_smoothing=1e-09)
>>> print(clf.predict([[-0.8, -1]]))
[1]
>>> clf_pf = GaussianNB()
>>> clf_pf.partial_fit(X, Y, np.unique(Y))
GaussianNB(priors=None, var_smoothing=1e-09)
>>> print(clf_pf.predict([[-0.8, -1]]))
[1]
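The relation between var_smoothing and epsilon_ can be illustrated with plain NumPy. This is a sketch under the assumption that the additive term is var_smoothing times the largest per-feature variance of the training data, as the parameter description suggests; it is not the wrapped class's code.

```python
import numpy as np

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)
Y = np.array([1, 1, 1, 2, 2, 2])
var_smoothing = 1e-9

# Assumed rule: a portion of the largest feature variance is used as epsilon.
epsilon = var_smoothing * X.var(axis=0).max()

# Per-class feature variances with the stabilising term added.
smoothed_var = np.array([X[Y == c].var(axis=0) + epsilon for c in np.unique(Y)])

print(epsilon)
print(smoothed_var.shape)
```

Because epsilon scales with the data, the same var_smoothing value has a comparable stabilising effect regardless of the units of the features.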
Full API documentation: GaussianNBScikitsLearnNode

class
mdp.nodes.
LabelSpreadingScikitsLearnNode
¶ LabelSpreading model for semi-supervised learning. This node has been automatically generated by wrapping the sklearn.semi_supervised.label_propagation.LabelSpreading class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
This model is similar to the basic Label Propagation algorithm, but uses an affinity matrix based on the normalized graph Laplacian and soft clamping across the labels.
Read more in the User Guide.
Parameters
 kernel : {‘knn’, ‘rbf’, callable}
 String identifier for kernel function to use or the kernel function itself. Only ‘rbf’ and ‘knn’ strings are valid inputs. The function passed should take two inputs, each of shape [n_samples, n_features], and return a [n_samples, n_samples] shaped weight matrix
 gamma : float
 parameter for rbf kernel
 n_neighbors : integer > 0
 parameter for knn kernel
 alpha : float
 Clamping factor. A value in (0, 1) that specifies the relative amount that an instance should adopt the information from its neighbors as opposed to its initial label. alpha=0 means keeping the initial label information; alpha=1 means replacing all initial information.
 max_iter : integer
 maximum number of iterations allowed
 tol : float
 Convergence tolerance: threshold to consider the system at steady state
 n_jobs : int or None, optional (default=None)
 The number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
Attributes
X_
: array, shape = [n_samples, n_features] Input array.
classes_
: array, shape = [n_classes] The distinct labels used in classifying instances.
label_distributions_
: array, shape = [n_samples, n_classes] Categorical distribution for each item.
transduction_
: array, shape = [n_samples] Label assigned to each item via the transduction.
n_iter_
: int Number of iterations run.
Examples
>>> import numpy as np
>>> from sklearn import datasets
>>> from sklearn.semi_supervised import LabelSpreading
>>> label_prop_model = LabelSpreading()
>>> iris = datasets.load_iris()
>>> rng = np.random.RandomState(42)
>>> random_unlabeled_points = rng.rand(len(iris.target)) < 0.3
>>> labels = np.copy(iris.target)
>>> labels[random_unlabeled_points] = -1
>>> label_prop_model.fit(iris.data, labels)
...
LabelSpreading(...)
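The role of the clamping factor alpha can be sketched with the iteration from Zhou et al. (listed under References), F <- alpha * S F + (1 - alpha) * Y0, where S is the symmetrically normalized affinity matrix. The function name, the hand-written chain graph and the stopping rule below are illustrative assumptions, not the wrapped implementation.

```python
import numpy as np

def label_spreading_sketch(W, Y0, alpha=0.2, max_iter=30, tol=1e-3):
    """Spread one-hot labels Y0 over a graph with affinity matrix W."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # S = D^(-1/2) W D^(-1/2), the normalized affinity matrix.
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    F = Y0.copy()
    for n_iter in range(1, max_iter + 1):
        # Soft clamping: mix neighbour information with the initial labels.
        F_new = alpha * (S @ F) + (1.0 - alpha) * Y0
        converged = np.abs(F_new - F).sum() < tol
        F = F_new
        if converged:
            break
    return F.argmax(axis=1), n_iter

# Chain graph 0 - 1 - 2 - 3; only the two end points carry a label.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Y0 = np.array([[1, 0], [0, 0], [0, 0], [0, 1]], dtype=float)

transduction, n_iter = label_spreading_sketch(W, Y0)
print(transduction, n_iter)
```

With alpha close to 0 the labelled points keep almost all of their initial information; larger values let the neighbourhood structure dominate.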
References
Dengyong Zhou, Olivier Bousquet, Thomas Navin Lal, Jason Weston, Bernhard Schoelkopf. Learning with local and global consistency (2004) http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.115.3219
See Also
LabelPropagation : Unregularized graph-based semi-supervised learning
Full API documentation: LabelSpreadingScikitsLearnNode

class
mdp.nodes.
LatentDirichletAllocationScikitsLearnNode
¶ Latent Dirichlet Allocation with online variational Bayes algorithm. This node has been automatically generated by wrapping the sklearn.decomposition.online_lda.LatentDirichletAllocation class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
New in version 0.17.
Read more in the User Guide.
Parameters
 n_components : int, optional (default=10)
 Number of topics.
 doc_topic_prior : float, optional (default=None)
 Prior of document topic distribution theta. If the value is None, defaults to 1 / n_components. In [1]_, this is called alpha.
 topic_word_prior : float, optional (default=None)
 Prior of topic word distribution beta. If the value is None, defaults to 1 / n_components. In [1]_, this is called eta.
 learning_method : ‘batch’ | ‘online’, default=‘batch’
 Method used to update _component. Only used in fit method. In general, if the data size is large, the online update will be much faster than the batch update.
 Valid options:
 'batch': Batch variational Bayes method. Use all training data in each EM update. Old components_ will be overwritten in each iteration.
 'online': Online variational Bayes method. In each EM update, use a mini-batch of training data to update the components_ variable incrementally. The learning rate is controlled by the learning_decay and the learning_offset parameters.
 Changed in version 0.20: The default learning method is now "batch".
 learning_decay : float, optional (default=0.7)
 A parameter that controls the learning rate in the online learning method. The value should be set between (0.5, 1.0] to guarantee asymptotic convergence. When the value is 0.0 and batch_size is n_samples, the update method is the same as batch learning. In the literature, this is called kappa.
 learning_offset : float, optional (default=10.)
 A (positive) parameter that downweights early iterations in online learning. It should be greater than 1.0. In the literature, this is called tau_0.
 max_iter : integer, optional (default=10)
 The maximum number of iterations.
 batch_size : int, optional (default=128)
 Number of documents to use in each EM iteration. Only used in online learning.
 evaluate_every : int, optional (default=0)
 How often to evaluate perplexity. Only used in fit method. Set it to 0 or a negative number to not evaluate perplexity in training at all. Evaluating perplexity can help you check convergence in the training process, but it will also increase total training time. Evaluating perplexity in every iteration might increase training time up to twofold.
 total_samples : int, optional (default=1e6)
 Total number of documents. Only used in the partial_fit method.
 perp_tol : float, optional (default=1e-1)
 Perplexity tolerance in batch learning. Only used when evaluate_every is greater than 0.
 mean_change_tol : float, optional (default=1e-3)
 Stopping tolerance for updating document topic distribution in the E-step.
 max_doc_update_iter : int (default=100)
 Max number of iterations for updating document topic distribution in the E-step.
 n_jobs : int or None, optional (default=None)
 The number of jobs to use in the E-step. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
 verbose : int, optional (default=0)
 Verbosity level.
 random_state : int, RandomState instance or None, optional (default=None)
 If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
 n_topics : int, optional (default=None)
 Deprecated since version 0.19: This parameter has been renamed to n_components and will be removed in version 0.21.
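How learning_decay (kappa) and learning_offset (tau_0) interact can be sketched with the step-size schedule from Hoffman et al. [1], rho_t = (tau_0 + t) ** (-kappa). The helper name is hypothetical, and how the wrapped class counts the mini-batch index t is an assumption.

```python
def online_learning_rate(t, learning_decay=0.7, learning_offset=10.0):
    """Weight given to mini-batch number t: (tau_0 + t) ** (-kappa)."""
    return (learning_offset + t) ** (-learning_decay)

# Weights of the first five mini-batches with the documented defaults.
rates = [online_learning_rate(t) for t in range(5)]
for t, rate in enumerate(rates):
    print(t, round(rate, 4))
```

A larger learning_offset flattens the first few weights (early iterations count less), while a larger learning_decay makes the weights shrink faster over time.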
Attributes
components_
: array, [n_components, n_features] Variational parameters for topic word distribution. Since the complete conditional for topic word distribution is a Dirichlet, components_[i, j] can be viewed as a pseudocount that represents the number of times word j was assigned to topic i. It can also be viewed as a distribution over the words for each topic after normalization: model.components_ / model.components_.sum(axis=1)[:, np.newaxis].
n_batch_iter_
: int Number of iterations of the EM step.
n_iter_
: int Number of passes over the dataset.
Examples
>>> from sklearn.decomposition import LatentDirichletAllocation
>>> from sklearn.datasets import make_multilabel_classification
>>> # This produces a feature matrix of token counts, similar to what
>>> # CountVectorizer would produce on text.
>>> X, _ = make_multilabel_classification(random_state=0)
>>> lda = LatentDirichletAllocation(n_components=5,
...     random_state=0)
>>> lda.fit(X)
LatentDirichletAllocation(...)
>>> # get topics for some given samples:
>>> lda.transform(X[-2:])
array([[0.00360392, 0.25499205, 0.0036211 , 0.64236448, 0.09541846],
       [0.15297572, 0.00362644, 0.44412786, 0.39568399, 0.003586  ]])
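The normalization mentioned for components_ turns pseudocounts into one word distribution per topic. A minimal sketch with a made-up pseudocount matrix (the numbers are hypothetical, not the output of a fitted model):

```python
import numpy as np

# Hypothetical pseudocounts: 2 topics (rows) over a 4-word vocabulary (columns).
components = np.array([[8.0, 1.0, 0.5, 0.5],
                       [0.2, 0.8, 4.0, 5.0]])

# Divide each row by its sum so that every topic sums to one.
topic_word = components / components.sum(axis=1)[:, np.newaxis]

print(topic_word)
```

Each row of topic_word can then be read as the probability of every word under that topic, which is convenient for listing the top words of a topic.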
References
 [1] “Online Learning for Latent Dirichlet Allocation”, Matthew D. Hoffman, David M. Blei, Francis Bach, 2010
 [2] “Stochastic Variational Inference”, Matthew D. Hoffman, David M. Blei, Chong Wang, John Paisley, 2013
[3] Matthew D. Hoffman’s onlineldavb code. Link:
Full API documentation: LatentDirichletAllocationScikitsLearnNode

class
mdp.nodes.
NMFScikitsLearnNode
¶ Non-Negative Matrix Factorization (NMF). This node has been automatically generated by wrapping the sklearn.decomposition.nmf.NMF class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Find two non-negative matrices (W, H) whose product approximates the non-negative matrix X. This factorization can be used for example for dimensionality reduction, source separation or topic extraction.
The objective function is:
0.5 * ||X - WH||_Fro^2
+ alpha * l1_ratio * ||vec(W)||_1
+ alpha * l1_ratio * ||vec(H)||_1
+ 0.5 * alpha * (1 - l1_ratio) * ||W||_Fro^2
+ 0.5 * alpha * (1 - l1_ratio) * ||H||_Fro^2
Where:
||A||_Fro^2 = \sum_{i,j} A_{ij}^2 (Frobenius norm)
||vec(A)||_1 = \sum_{i,j} abs(A_{ij}) (Elementwise L1 norm)
For the multiplicative-update (‘mu’) solver, the Frobenius norm (0.5 * ||X - WH||_Fro^2) can be changed into another beta-divergence loss by changing the beta_loss parameter.
The objective function is minimized with an alternating minimization of W and H.
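To make the objective concrete, the sketch below evaluates the Frobenius data term plus the L1 and L2 penalties on W and H for a tiny hand-made factorization. The function name and the matrices are illustrative assumptions; this only evaluates the formula and does not perform the minimization.

```python
import numpy as np

def squared_frobenius(A):
    """Sum of squared entries of A."""
    return np.sum(A ** 2)

def l1_norm(A):
    """Sum of absolute entries of A."""
    return np.sum(np.abs(A))

def nmf_objective(X, W, H, alpha=0.0, l1_ratio=0.0):
    """Frobenius reconstruction error plus elastic-net style penalties."""
    data_term = 0.5 * squared_frobenius(X - W @ H)
    l1_term = alpha * l1_ratio * (l1_norm(W) + l1_norm(H))
    l2_term = 0.5 * alpha * (1.0 - l1_ratio) * (
        squared_frobenius(W) + squared_frobenius(H))
    return data_term + l1_term + l2_term

W = np.array([[1.0, 0.0], [0.0, 2.0]])
H = np.array([[1.0, 2.0], [3.0, 0.0]])
X = W @ H  # an exact factorization, so the data term vanishes

unregularized = nmf_objective(X, W, H)
regularized = nmf_objective(X, W, H, alpha=0.1, l1_ratio=0.5)
print(unregularized, regularized)
```

With alpha=0 only the reconstruction error remains; with alpha > 0 the l1_ratio decides how the penalty is split between the L1 and squared Frobenius terms.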
Read more in the User Guide.
Parameters
 n_components : int or None
 Number of components, if n_components is not set all features are kept.
 init : ‘random’ | ‘nndsvd’ | ‘nndsvda’ | ‘nndsvdar’ | ‘custom’
 Method used to initialize the procedure. Default: ‘nndsvd’ if n_components < n_features, otherwise ‘random’. Valid options:
 ‘random’: non-negative random matrices, scaled with sqrt(X.mean() / n_components)
 ‘nndsvd’: Nonnegative Double Singular Value Decomposition (NNDSVD) initialization (better for sparseness)
 ‘nndsvda’: NNDSVD with zeros filled with the average of X (better when sparsity is not desired)
 ‘nndsvdar’: NNDSVD with zeros filled with small random values (generally faster, less accurate alternative to NNDSVDa for when sparsity is not desired)
 ‘custom’: use custom matrices W and H
 solver : ‘cd’ | ‘mu’
Numerical solver to use:
 ‘cd’ is a Coordinate Descent solver.
 ‘mu’ is a Multiplicative Update solver.
New in version 0.17: Coordinate Descent solver.
New in version 0.19: Multiplicative Update solver.
 beta_loss : float or string, default ‘frobenius’
String must be in {‘frobenius’, ‘kullback-leibler’, ‘itakura-saito’}. Beta divergence to be minimized, measuring the distance between X and the dot product WH. Note that values different from ‘frobenius’ (or 2) and ‘kullback-leibler’ (or 1) lead to significantly slower fits. Note that for beta_loss <= 0 (or ‘itakura-saito’), the input matrix X cannot contain zeros. Used only in ‘mu’ solver.
New in version 0.19.
 tol : float, default: 1e-4
 Tolerance of the stopping condition.
 max_iter : integer, default: 200
 Maximum number of iterations before timing out.
 random_state : int, RandomState instance or None, optional, default: None
 If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
 alpha : double, default: 0.
Constant that multiplies the regularization terms. Set it to zero to have no regularization.
New in version 0.17: alpha used in the Coordinate Descent solver.
 l1_ratio : double, default: 0.
The regularization mixing parameter, with 0 <= l1_ratio <= 1. For l1_ratio = 0 the penalty is an elementwise L2 penalty (aka Frobenius Norm). For l1_ratio = 1 it is an elementwise L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.
New in version 0.17: Regularization parameter l1_ratio used in the Coordinate Descent solver.
 verbose : bool, default=False
 Whether to be verbose.
 shuffle : boolean, default: False
If true, randomize the order of coordinates in the CD solver.
New in version 0.17: shuffle parameter used in the Coordinate Descent solver.
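The ‘random’ initialization described for init can be sketched as follows. Drawing absolute values of standard normal samples is an assumption made for illustration; the documentation only states that the matrices are non-negative, random and scaled with sqrt(X.mean() / n_components). The helper name is hypothetical.

```python
import numpy as np

def random_nmf_init(X, n_components, seed=0):
    """Non-negative random W and H, scaled by sqrt(X.mean() / n_components)."""
    rng = np.random.RandomState(seed)
    scale = np.sqrt(X.mean() / n_components)
    n_samples, n_features = X.shape
    # Absolute values keep both factors non-negative (illustrative choice).
    W = scale * np.abs(rng.randn(n_samples, n_components))
    H = scale * np.abs(rng.randn(n_components, n_features))
    return W, H

X = np.array([[1, 1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])
W0, H0 = random_nmf_init(X, n_components=2)
print(W0.shape, H0.shape)
```

The scaling keeps the entries of W0 @ H0 on roughly the same order of magnitude as the entries of X, which gives the solver a reasonable starting point.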
Attributes
components_
: array, [n_components, n_features] Factorization matrix, sometimes called ‘dictionary’.
reconstruction_err_
: number Frobenius norm of the matrix difference, or beta-divergence, between the training data X and the reconstructed data WH from the fitted model.
n_iter_
: int Actual number of iterations.
Examples
>>> import numpy as np
>>> X = np.array([[1, 1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])
>>> from sklearn.decomposition import NMF
>>> model = NMF(n_components=2, init='random', random_state=0)
>>> W = model.fit_transform(X)
>>> H = model.components_
References
Cichocki, Andrzej, and P. H. A. N. Anh-Huy. “Fast local algorithms for large scale nonnegative matrix and tensor factorizations.” IEICE transactions on fundamentals of electronics, communications and computer sciences 92.3: 708-721, 2009.
Fevotte, C., & Idier, J. (2011). Algorithms for nonnegative matrix factorization with the beta-divergence. Neural Computation, 23(9).
Full API documentation: NMFScikitsLearnNode

class
mdp.nodes.
ScaledLogOddsEstimatorScikitsLearnNode
¶ This node has been automatically generated by wrapping the sklearn.ensemble.gradient_boosting.ScaledLogOddsEstimator class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Full API documentation: ScaledLogOddsEstimatorScikitsLearnNode

class
mdp.nodes.
MaxAbsScalerScikitsLearnNode
¶ Scale each feature by its maximum absolute value. This node has been automatically generated by wrapping the sklearn.preprocessing.data.MaxAbsScaler class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
This estimator scales each feature individually such that the maximal absolute value of each feature in the training set will be 1.0. It does not shift/center the data, and thus does not destroy any sparsity.
This scaler can also be applied to sparse CSR or CSC matrices.
New in version 0.17.
Parameters
 copy : boolean, optional, default is True
 Set to False to perform inplace scaling and avoid a copy (if the input is already a numpy array).
Attributes
scale_
: ndarray, shape (n_features,) Per feature relative scaling of the data.
New in version 0.17: scale_ attribute.
max_abs_
: ndarray, shape (n_features,) Per feature maximum absolute value.
n_samples_seen_
: int The number of samples processed by the estimator. Will be reset on new calls to fit, but increments across partial_fit calls.
Examples
>>> from sklearn.preprocessing import MaxAbsScaler
>>> X = [[ 1., -1.,  2.],
...      [ 2.,  0.,  0.],
...      [ 0.,  1., -1.]]
>>> transformer = MaxAbsScaler().fit(X)
>>> transformer
MaxAbsScaler(copy=True)
>>> transformer.transform(X)
array([[ 0.5, -1. ,  1. ],
       [ 1. ,  0. ,  0. ],
       [ 0. ,  1. , -0.5]])
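The same transformation can be written out by hand, which shows why sparsity is preserved: each column is only divided by a constant, so zeros stay zeros. This is a NumPy sketch; the guard for all-zero columns is an assumption added to avoid division by zero.

```python
import numpy as np

X = np.array([[1., -1., 2.],
              [2., 0., 0.],
              [0., 1., -1.]])

# Per-feature maximum absolute value (the role of max_abs_).
max_abs = np.abs(X).max(axis=0)

# Avoid dividing by zero for columns that are entirely zero (assumption).
scale = np.where(max_abs == 0.0, 1.0, max_abs)

X_scaled = X / scale
print(X_scaled)
```

Every column of X_scaled now lies in [-1, 1], and entries that were zero in X are still zero.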
See also
maxabs_scale: Equivalent function without the estimator API.
Notes
NaNs are treated as missing values: disregarded in fit, and maintained in transform.
For a comparison of the different scalers, transformers, and normalizers, see examples/preprocessing/plot_all_scaling.py.
Full API documentation: MaxAbsScalerScikitsLearnNode

class
mdp.nodes.
HashingVectorizerScikitsLearnNode
¶ Convert a collection of text documents to a matrix of token occurrences. This node has been automatically generated by wrapping the sklearn.feature_extraction.text.HashingVectorizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
It turns a collection of text documents into a scipy.sparse matrix holding token occurrence counts (or binary occurrence information), possibly normalized as token frequencies if norm=‘l1’ or projected on the euclidean unit sphere if norm=‘l2’.
This text vectorizer implementation uses the hashing trick to find the token string name to feature integer index mapping.
This strategy has several advantages:
 it is very low memory scalable to large datasets as there is no need to store a vocabulary dictionary in memory
 it is fast to pickle and unpickle as it holds no state besides the constructor parameters
 it can be used in a streaming (partial fit) or parallel pipeline as there is no state computed during fit.
There are also a couple of cons (vs using a CountVectorizer with an in-memory vocabulary):
 there is no way to compute the inverse transform (from feature indices to string feature names) which can be a problem when trying to introspect which features are most important to a model.
 there can be collisions: distinct tokens can be mapped to the same feature index. However in practice this is rarely an issue if n_features is large enough (e.g. 2 ** 18 for text classification problems).
 no IDF weighting as this would render the transformer stateful.
The hash function employed is the signed 32-bit version of Murmurhash3.
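The hashing trick itself fits in a few lines. The sketch below uses zlib.crc32 from the standard library as a stand-in for MurmurHash3 and omits the sign handling, so the resulting indices differ from those of the wrapped class; the function name is hypothetical.

```python
import zlib

def hash_counts(tokens, n_features=16):
    """Map tokens to a fixed-length count vector without storing a vocabulary."""
    row = [0] * n_features
    for token in tokens:
        # crc32 replaces MurmurHash3 here purely for illustration.
        index = zlib.crc32(token.encode("utf-8")) % n_features
        row[index] += 1
    return row

doc = "this document is the second document".split()
row = hash_counts(doc)
print(row)
```

No state is learned from the data: the column of a token depends only on the token and n_features, which is why the approach works in streaming or parallel settings and why distinct tokens can collide when n_features is small.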
Read more in the User Guide.
Parameters
 input : string {‘filename’, ‘file’, ‘content’}
If ‘filename’, the sequence passed as an argument to fit is expected to be a list of filenames that need reading to fetch the raw content to analyze.
If ‘file’, the sequence items must have a ‘read’ method (filelike object) that is called to fetch the bytes in memory.
Otherwise the input is expected to be a sequence of string or bytes items that are analyzed directly.
 encoding : string, default=‘utf-8’
 If bytes or files are given to analyze, this encoding is used to decode.
 decode_error : {‘strict’, ‘ignore’, ‘replace’}
 Instruction on what to do if a byte sequence is given to analyze that contains characters not of the given encoding. By default, it is ‘strict’, meaning that a UnicodeDecodeError will be raised. Other values are ‘ignore’ and ‘replace’.
 strip_accents : {‘ascii’, ‘unicode’, None}
Remove accents and perform other character normalization during the preprocessing step. ‘ascii’ is a fast method that only works on characters that have a direct ASCII mapping. ‘unicode’ is a slightly slower method that works on any characters. None (default) does nothing.
Both ‘ascii’ and ‘unicode’ use NFKD normalization from unicodedata.normalize().
 lowercase : boolean, default=True
 Convert all characters to lowercase before tokenizing.
 preprocessor : callable or None (default)
 Override the preprocessing (string transformation) stage while preserving the tokenizing and n-grams generation steps.
 tokenizer : callable or None (default)
 Override the string tokenization step while preserving the preprocessing and n-grams generation steps. Only applies if analyzer == 'word'.
 stop_words : string {‘english’}, list, or None (default)
 If ‘english’, a built-in stop word list for English is used. There are several known issues with ‘english’ and you should consider an alternative (see stop_words).
 If a list, that list is assumed to contain stop words, all of which will be removed from the resulting tokens. Only applies if analyzer == 'word'.
 token_pattern : string
 Regular expression denoting what constitutes a “token”, only used if analyzer == 'word'. The default regexp selects tokens of 2 or more alphanumeric characters (punctuation is completely ignored and always treated as a token separator).
 ngram_range : tuple (min_n, max_n), default=(1, 1)
 The lower and upper boundary of the range of n-values for different n-grams to be extracted. All values of n such that min_n <= n <= max_n will be used.
 analyzer : string, {‘word’, ‘char’, ‘char_wb’} or callable
 Whether the feature should be made of word or character n-grams. Option ‘char_wb’ creates character n-grams only from text inside word boundaries; n-grams at the edges of words are padded with space.
If a callable is passed it is used to extract the sequence of features out of the raw, unprocessed input.
 n_features : integer, default=(2 ** 20)
 The number of features (columns) in the output matrices. Small numbers of features are likely to cause hash collisions, but large numbers will cause larger coefficient dimensions in linear learners.
 binary : boolean, default=False
 If True, all non zero counts are set to 1. This is useful for discrete probabilistic models that model binary events rather than integer counts.
 norm : ‘l1’, ‘l2’ or None, optional
 Norm used to normalize term vectors. None for no normalization.
 alternate_sign : boolean, optional, default True
When True, an alternating sign is added to the features as to approximately conserve the inner product in the hashed space even for small n_features. This approach is similar to sparse random projection.
New in version 0.19.
 non_negative : boolean, optional, default False
When True, an absolute value is applied to the features matrix prior to returning it. When used in conjunction with alternate_sign=True, this significantly reduces the inner product preservation property.
Deprecated since version 0.19: This option will be removed in 0.21.
 dtype : type, optional
 Type of the matrix returned by fit_transform() or transform().
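What ngram_range means for analyzer == 'word' can be sketched with a small helper that emits every word n-gram with min_n <= n <= max_n. The helper is hypothetical and works on an already tokenized document; it does not reproduce the preprocessing or stop-word handling of the wrapped class.

```python
def word_ngrams(tokens, ngram_range=(1, 1)):
    """Return all word n-grams with min_n <= n <= max_n, joined by spaces."""
    min_n, max_n = ngram_range
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of n tokens over the document.
        for start in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[start:start + n]))
    return grams

unigrams = word_ngrams(["this", "is", "the"])
uni_and_bigrams = word_ngrams(["this", "is", "the"], ngram_range=(1, 2))
print(unigrams)
print(uni_and_bigrams)
```

Each of the resulting strings is then treated as one feature and hashed to a column index, so widening ngram_range increases the number of distinct features competing for the same n_features columns.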
Examples
>>> from sklearn.feature_extraction.text import HashingVectorizer
>>> corpus = [
...     'This is the first document.',
...     'This document is the second document.',
...     'And this is the third one.',
...     'Is this the first document?',
... ]
>>> vectorizer = HashingVectorizer(n_features=2**4)
>>> X = vectorizer.fit_transform(corpus)
>>> print(X.shape)
(4, 16)
See also
CountVectorizer, TfidfVectorizer
Full API documentation: HashingVectorizerScikitsLearnNode

class
mdp.nodes.
LogisticRegressionCVScikitsLearnNode
¶ Logistic Regression CV (aka logit, MaxEnt) classifier. This node has been automatically generated by wrapping the sklearn.linear_model.logistic.LogisticRegressionCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
See glossary entry for cross-validation estimator.
This class implements logistic regression using the liblinear, newton-cg, sag or lbfgs optimizer. The newton-cg, sag and lbfgs solvers support only L2 regularization with primal formulation. The liblinear solver supports both L1 and L2 regularization, with a dual formulation only for the L2 penalty.
For the grid of Cs values (set by default to ten values on a logarithmic scale between 1e-4 and 1e4), the best hyperparameter is selected by the cross-validator StratifiedKFold, but it can be changed using the cv parameter. In the case of the newton-cg and lbfgs solvers, we warm start along the path, i.e. the initial coefficients of the present fit are guessed to be the coefficients obtained after convergence in the previous fit, so it is supposed to be faster for high-dimensional dense data.
For a multiclass problem, the hyperparameters for each class are computed using the best scores obtained by doing a one-vs-rest in parallel across all folds and classes. Hence this is not the true multinomial loss.
Read more in the User Guide.
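The default grid of Cs values, ten values on a logarithmic scale between 1e-4 and 1e4, can be reproduced with NumPy. This sketch assumes the points are evenly spaced in log10, which is what np.logspace produces.

```python
import numpy as np

# Ten candidate values for C, evenly spaced in log10 between 1e-4 and 1e4.
Cs = np.logspace(-4, 4, 10)

# Consecutive candidates differ by a constant multiplicative factor.
ratios = Cs[1:] / Cs[:-1]

print(Cs)
print(ratios[0])
```

Since C is the inverse of the regularization strength, the left end of the grid corresponds to the most strongly regularized model and the right end to an almost unregularized one.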
Parameters
 Cs : list of floats | int
 Each of the values in Cs describes the inverse of regularization strength. If Cs is an int, then a grid of Cs values is chosen on a logarithmic scale between 1e-4 and 1e4. Like in support vector machines, smaller values specify stronger regularization.
 fit_intercept : bool, default: True
 Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.
 cv : integer or cross-validation generator, default: None
 The default cross-validation generator used is Stratified K-Folds. If an integer is provided, then it is the number of folds used. See the sklearn.model_selection module for the list of possible cross-validation objects.
 Changed in version 0.20: cv default value if None will change from 3-fold to 5-fold in v0.22.
 dual : bool
 Dual or primal formulation. Dual formulation is only implemented for l2 penalty with liblinear solver. Prefer dual=False when n_samples > n_features.
 penalty : str, ‘l1’ or ‘l2’
 Used to specify the norm used in the penalization. The ‘newton-cg’, ‘sag’ and ‘lbfgs’ solvers support only l2 penalties.
 scoring : string, callable, or None
 A string (see model evaluation documentation) or a scorer callable object / function with signature scorer(estimator, X, y). For a list of scoring functions that can be used, look at sklearn.metrics. The default scoring option used is ‘accuracy’.
 solver : str, {‘newton-cg’, ‘lbfgs’, ‘liblinear’, ‘sag’, ‘saga’}, default: ‘lbfgs’.
 Algorithm to use in the optimization problem.
 For small datasets, ‘liblinear’ is a good choice, whereas ‘sag’ and ‘saga’ are faster for large ones.
 For multiclass problems, only ‘newton-cg’, ‘sag’, ‘saga’ and ‘lbfgs’ handle multinomial loss; ‘liblinear’ is limited to one-versus-rest schemes.
 ‘newton-cg’, ‘lbfgs’ and ‘sag’ only handle L2 penalty, whereas ‘liblinear’ and ‘saga’ handle L1 penalty.
 ‘liblinear’ might be slower in LogisticRegressionCV because it does not handle warm-starting.
Note that ‘sag’ and ‘saga’ fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.
New in version 0.17: Stochastic Average Gradient descent solver.
New in version 0.19: SAGA solver.
 tol : float, optional
 Tolerance for stopping criteria.
 max_iter : int, optional
 Maximum number of iterations of the optimization algorithm.
 class_weight : dict or ‘balanced’, optional
Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one.
The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).
Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
New in version 0.17: class_weight == ‘balanced’
 n_jobs : int or None, optional (default=None)
 Number of CPU cores used during the cross-validation loop. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
 verbose : int
 For the ‘liblinear’, ‘sag’ and ‘lbfgs’ solvers set verbose to any positive number for verbosity.
 refit : bool
 If set to True, the scores are averaged across all folds, and the coefs and the C that corresponds to the best score is taken, and a final refit is done using these parameters. Otherwise the coefs, intercepts and C that correspond to the best scores across folds are averaged.
 intercept_scaling : float, default 1.
Useful only when the solver ‘liblinear’ is used and self.fit_intercept is set to True. In this case, x becomes [x, self.intercept_scaling], i.e. a “synthetic” feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic_feature_weight.
Note! The synthetic feature weight is subject to l1/l2 regularization as all other features. To lessen the effect of regularization on the synthetic feature weight (and therefore on the intercept) intercept_scaling has to be increased.
 multi_class : str, {‘ovr’, ‘multinomial’, ‘auto’}, default: ‘ovr’
If the option chosen is ‘ovr’, then a binary problem is fit for each label. For ‘multinomial’ the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary. ‘multinomial’ is unavailable when solver=’liblinear’. ‘auto’ selects ‘ovr’ if the data is binary, or if solver=’liblinear’, and otherwise selects ‘multinomial’.
New in version 0.18: Stochastic Average Gradient descent solver for ‘multinomial’ case.
Changed in version 0.20: Default will change from ‘ovr’ to ‘auto’ in 0.22.
 random_state : int, RandomState instance or None, optional, default None
 If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
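The “balanced” class_weight formula, n_samples / (n_classes * np.bincount(y)), can be evaluated directly. The labels below are illustrative and assumed to be the integers 0..n_classes-1, which is what np.bincount needs for the counts to line up with the classes.

```python
import numpy as np

# Illustrative, imbalanced labels: four samples of class 0, two of class 1.
y = np.array([0, 0, 0, 0, 1, 1])

n_samples = y.shape[0]
n_classes = np.unique(y).shape[0]

# Weight of each class is inversely proportional to its frequency.
balanced_weights = n_samples / (n_classes * np.bincount(y))

print(balanced_weights)
```

The rarer class receives the larger weight, and the weights are scaled so that the weighted sample count still adds up to n_samples.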
Attributes
classes_
: array, shape (n_classes, ) A list of class labels known to the classifier.
coef_
: array, shape (1, n_features) or (n_classes, n_features) Coefficient of the features in the decision function.
coef_ is of shape (1, n_features) when the given problem is binary.
intercept_
: array, shape (1,) or (n_classes,) Intercept (a.k.a. bias) added to the decision function.
If fit_intercept is set to False, the intercept is set to zero. intercept_ is of shape (1,) when the problem is binary.
Cs_
: array Array of C i.e. inverse of regularization parameter values used for cross-validation.
coefs_paths_
: array, shape (n_folds, len(Cs_), n_features) or (n_folds, len(Cs_), n_features + 1) dict with classes as the keys, and the path of coefficients obtained during cross-validating across each fold and then across each Cs after doing an OvR for the corresponding class as values. If the ‘multi_class’ option is set to ‘multinomial’, then the coefs_paths are the coefficients corresponding to each class. Each dict value has shape (n_folds, len(Cs_), n_features) or (n_folds, len(Cs_), n_features + 1) depending on whether the intercept is fit or not.
scores_
: dict dict with classes as the keys, and the values as the grid of scores obtained during cross-validating each fold, after doing an OvR for the corresponding class. If the ‘multi_class’ option given is ‘multinomial’ then the same scores are repeated across all classes, since this is the multinomial class. Each dict value has shape (n_folds, len(Cs))
C_
: array, shape (n_classes,) or (n_classes - 1,) Array of C that maps to the best scores across every class. If refit is set to False, then for each class, the best C is the average of the C’s that correspond to the best scores for each fold. C_ is of shape (n_classes,) when the problem is binary.
n_iter_
: array, shape (n_classes, n_folds, n_cs) or (1, n_folds, n_cs) Actual number of iterations for all classes, folds and Cs. In the binary or multinomial cases, the first dimension is equal to 1.
Examples
>>> from sklearn.datasets import load_iris
>>> from sklearn.linear_model import LogisticRegressionCV
>>> X, y = load_iris(return_X_y=True)
>>> clf = LogisticRegressionCV(cv=5, random_state=0,
...                            multi_class='multinomial').fit(X, y)
>>> clf.predict(X[:2, :])
array([0, 0])
>>> clf.predict_proba(X[:2, :]).shape
(2, 3)
>>> clf.score(X, y)
0.98...
See also
LogisticRegression
Full API documentation: LogisticRegressionCVScikitsLearnNode

class
mdp.nodes.
ZeroEstimatorScikitsLearnNode
¶ This node has been automatically generated by wrapping the sklearn.ensemble.gradient_boosting.ZeroEstimator class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Full API documentation: ZeroEstimatorScikitsLearnNode

class
mdp.nodes.
SVCScikitsLearnNode
¶ C-Support Vector Classification. This node has been automatically generated by wrapping the sklearn.svm.classes.SVC class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The implementation is based on libsvm. The fit time complexity is more than quadratic in the number of samples, which makes it hard to scale to datasets with more than a couple of 10000 samples.
The multiclass support is handled according to a one-vs-one scheme.
For details on the precise mathematical formulation of the provided kernel functions and how gamma, coef0 and degree affect each other, see the corresponding section in the narrative documentation: svm_kernels.
Read more in the User Guide.
Parameters
 C : float, optional (default=1.0)
 Penalty parameter C of the error term.
 kernel : string, optional (default=‘rbf’)
 Specifies the kernel type to be used in the algorithm. It must be one of ‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’ or a callable. If none is given, ‘rbf’ will be used. If a callable is given it is used to pre-compute the kernel matrix from data matrices; that matrix should be an array of shape (n_samples, n_samples).
 degree : int, optional (default=3)
 Degree of the polynomial kernel function (‘poly’). Ignored by all other kernels.
 gamma : float, optional (default=’auto’)
Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’.
The current default is ‘auto’, which uses 1 / n_features. If gamma='scale' is passed, then 1 / (n_features * X.var()) is used as the value of gamma. The current default of gamma, ‘auto’, will change to ‘scale’ in version 0.22. ‘auto_deprecated’, a deprecated version of ‘auto’, is used as a default indicating that no explicit value of gamma was passed.
 coef0 : float, optional (default=0.0)
 Independent term in kernel function. It is only significant in ‘poly’ and ‘sigmoid’.
 shrinking : boolean, optional (default=True)
 Whether to use the shrinking heuristic.
 probability : boolean, optional (default=False)
 Whether to enable probability estimates. This must be enabled prior to calling fit, and will slow down that method.
 tol : float, optional (default=1e-3)
 Tolerance for stopping criterion.
 cache_size : float, optional
 Specify the size of the kernel cache (in MB).
 class_weight : {dict, ‘balanced’}, optional
 Set the parameter C of class i to class_weight[i]*C for SVC. If not given, all classes are supposed to have weight one. The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y))
 verbose : bool, default: False
 Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in libsvm that, if enabled, may not work properly in a multithreaded context.
 max_iter : int, optional (default=-1)
 Hard limit on iterations within solver, or -1 for no limit.
 decision_function_shape : ‘ovo’, ‘ovr’, default=’ovr’
Whether to return a one-vs-rest (‘ovr’) decision function of shape (n_samples, n_classes) as all other classifiers, or the original one-vs-one (‘ovo’) decision function of libsvm which has shape (n_samples, n_classes * (n_classes - 1) / 2). However, one-vs-one (‘ovo’) is always used as multi-class strategy.
Changed in version 0.19: decision_function_shape is ‘ovr’ by default.
New in version 0.17: decision_function_shape=’ovr’ is recommended.
Changed in version 0.17: Deprecated decision_function_shape=’ovo’ and None.
 random_state : int, RandomState instance or None, optional (default=None)
 The seed of the pseudo random number generator used when shuffling the data for probability estimates. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
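The two formulas quoted in the parameter list above, the gamma rules and the “balanced” class weights, can be checked by hand. The following is a minimal standard-library sketch, not the scikit-learn implementation; the helper names gamma_value and balanced_class_weights are hypothetical, and X is assumed to be a plain list of rows.

```python
from collections import Counter
from statistics import pvariance


def gamma_value(X, mode):
    """Resolve gamma for a list-of-rows X under the 'auto' or 'scale' rule."""
    n_features = len(X[0])
    if mode == "auto":
        return 1.0 / n_features
    if mode == "scale":
        # X.var() in NumPy is the population variance over all entries.
        flat = [value for row in X for value in row]
        return 1.0 / (n_features * pvariance(flat))
    raise ValueError("mode must be 'auto' or 'scale'")


def balanced_class_weights(y):
    """Return {label: n_samples / (n_classes * count(label))}."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


X_demo = [[0.0, 2.0], [2.0, 4.0]]
gamma_auto = gamma_value(X_demo, "auto")
gamma_scale = gamma_value(X_demo, "scale")
weights = balanced_class_weights([0, 0, 0, 1])
```

In this toy input the rare class 1 receives a larger weight than the frequent class 0, which is the intended effect of the “balanced” mode.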
Attributes
support_
: array-like, shape = [n_SV] Indices of support vectors.
support_vectors_
: array-like, shape = [n_SV, n_features] Support vectors.
n_support_
: array-like, dtype=int32, shape = [n_class] Number of support vectors for each class.
dual_coef_
: array, shape = [n_class-1, n_SV] Coefficients of the support vector in the decision function. For multiclass, coefficient for all 1-vs-1 classifiers. The layout of the coefficients in the multiclass case is somewhat non-trivial. See the section about multi-class classification in the SVM section of the User Guide for details.
coef_
: array, shape = [n_class * (n_class-1) / 2, n_features] Weights assigned to the features (coefficients in the primal problem). This is only available in the case of a linear kernel.
coef_ is a read-only property derived from dual_coef_ and support_vectors_.
intercept_
: array, shape = [n_class * (n_class-1) / 2] Constants in decision function.
fit_status_
: int 0 if correctly fitted, 1 otherwise (will raise warning)
probA_
: array, shape = [n_class * (n_class-1) / 2]
probB_
: array, shape = [n_class * (n_class-1) / 2] If probability=True, the parameters learned in Platt scaling to produce probability estimates from decision values. If probability=False, an empty array. Platt scaling uses the logistic function 1 / (1 + exp(decision_value * probA_ + probB_)) where probA_ and probB_ are learned from the dataset [2]. For more information on the multiclass case and training procedure see section 8 of [1].
Examples
>>> import numpy as np
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
>>> y = np.array([1, 1, 2, 2])
>>> from sklearn.svm import SVC
>>> clf = SVC(gamma='auto')
>>> clf.fit(X, y)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)
>>> print(clf.predict([[-0.8, -1]]))
[1]
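The Platt scaling formula quoted for probA_ and probB_ can be written out directly. This is only a sketch of the logistic mapping for a single decision value, assuming scalar parameters; the function name platt_probability and the numeric values are illustrative and are not learned from any data.

```python
import math


def platt_probability(decision_value, prob_a, prob_b):
    """Map a decision value to a probability: 1 / (1 + exp(d * A + B))."""
    return 1.0 / (1.0 + math.exp(decision_value * prob_a + prob_b))


# Hypothetical parameters: with a negative prob_a, larger decision values
# map to larger probabilities.
p_boundary = platt_probability(0.0, -1.0, 0.0)
p_positive = platt_probability(2.0, -1.0, 0.0)
```

A decision value of zero with prob_b = 0 maps to exactly 0.5, so prob_b shifts the point at which the two classes are considered equally likely.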
See also
 SVR
 Support Vector Machine for Regression implemented using libsvm.
 LinearSVC
 Scalable Linear Support Vector Machine for classification implemented using liblinear. Check the See also section of LinearSVC for more comparison element.
References
[1] LIBSVM: A Library for Support Vector Machines
[2] Platt, John (1999). “Probabilistic outputs for support vector machines and comparison to regularized likelihood methods.”
Full API documentation: SVCScikitsLearnNode

class
mdp.nodes.
IsotonicRegressionScikitsLearnNode
¶ Isotonic regression model. This node has been automatically generated by wrapping the sklearn.isotonic.IsotonicRegression class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The isotonic regression optimization problem is defined by:
min sum w_i (y[i] - y_[i]) ** 2
subject to y_[i] <= y_[j] whenever X[i] <= X[j]
and min(y_) = y_min, max(y_) = y_max
where:
 y[i] are inputs (real numbers)
 y_[i] are fitted
 X specifies the order. If X is non-decreasing then y_ is non-decreasing.
 w[i] are optional strictly positive weights (default to 1.0)
Read more in the User Guide.
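The optimization problem above is commonly solved with the pool-adjacent-violators algorithm (PAVA), which the references below also discuss. The following is a minimal sketch of that technique, not the wrapped scikit-learn code: it assumes the inputs are already sorted by X, fits a non-decreasing sequence only, and ignores y_min and y_max. The function name pava is hypothetical.

```python
def pava(y, w=None):
    """Weighted least-squares fit that is non-decreasing in index order."""
    if w is None:
        w = [1.0] * len(y)
    # Each block holds [weighted mean, total weight, number of points].
    blocks = []
    for value, weight in zip(y, w):
        blocks.append([float(value), float(weight), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean2, weight2, count2 = blocks.pop()
            mean1, weight1, count1 = blocks.pop()
            total = weight1 + weight2
            merged_mean = (mean1 * weight1 + mean2 * weight2) / total
            blocks.append([merged_mean, total, count1 + count2])
    # Expand each block back to one fitted value per input point.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


fitted = pava([1.0, 3.0, 2.0, 4.0])
fitted_decreasing_input = pava([3.0, 2.0, 1.0])
```

In the first call the violating pair (3.0, 2.0) is pooled into its mean 2.5, which is why the fitted function is stepwise.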
Parameters
 y_min : optional, default: None
 If not None, set the lowest value of the fit to y_min.
 y_max : optional, default: None
 If not None, set the highest value of the fit to y_max.
 increasing : boolean or string, optional, default: True
If boolean, whether or not to fit the isotonic regression with y increasing or decreasing.
The string value “auto” determines whether y should increase or decrease based on the Spearman correlation estimate’s sign.
 out_of_bounds : string, optional, default: “nan”
 The out_of_bounds parameter handles how x-values outside of the training domain are handled. When set to “nan”, predicted y-values will be NaN. When set to “clip”, predicted y-values will be set to the value corresponding to the nearest train interval endpoint. When set to “raise”, allow interp1d to throw ValueError.
Attributes
X_min_
: float Minimum value of input array X_ for left bound.
X_max_
: float Maximum value of input array X_ for right bound.
f_
: function The stepwise interpolating function that covers the input domain X.
Notes
Ties are broken using the secondary method from Leeuw, 1977.
References
Isotonic Median Regression: A Linear Programming Approach, Nilotpal Chakravarti, Mathematics of Operations Research, Vol. 14, No. 2 (May, 1989), pp. 303-308
Isotone Optimization in R: Pool-Adjacent-Violators Algorithm (PAVA) and Active Set Methods, Leeuw, Hornik, Mair, Journal of Statistical Software, 2009
Correctness of Kruskal’s algorithms for monotone regression with ties, Leeuw, Psychometrika, 1977
Full API documentation: IsotonicRegressionScikitsLearnNode

class
mdp.nodes.
DictVectorizerScikitsLearnNode
¶ Transforms lists of feature-value mappings to vectors. This node has been automatically generated by wrapping the sklearn.feature_extraction.dict_vectorizer.DictVectorizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
This transformer turns lists of mappings (dict-like objects) of feature names to feature values into Numpy arrays or scipy.sparse matrices for use with scikit-learn estimators.
When feature values are strings, this transformer will do a binary one-hot (aka one-of-K) coding: one boolean-valued feature is constructed for each of the possible string values that the feature can take on. For instance, a feature “f” that can take on the values “ham” and “spam” will become two features in the output, one signifying “f=ham”, the other “f=spam”.
However, note that this transformer will only do a binary one-hot encoding when feature values are of type string. If categorical features are represented as numeric values such as int, the DictVectorizer can be followed by sklearn.preprocessing.OneHotEncoder to complete binary one-hot encoding.
Features that do not occur in a sample (mapping) will have a zero value in the resulting array/matrix.
Read more in the User Guide.
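The encoding rule described above (string values become one “name=value” indicator each, numeric values keep their own column, absent features are zero) can be sketched in plain Python. This is an illustration under those stated rules, not the DictVectorizer implementation; the function name vectorize is hypothetical and it returns nested lists rather than arrays.

```python
def vectorize(samples, separator="="):
    """Turn a list of dicts into (sorted feature names, dense rows)."""
    names = set()
    for sample in samples:
        for key, value in sample.items():
            # String values get one indicator feature per distinct value.
            if isinstance(value, str):
                names.add(f"{key}{separator}{value}")
            else:
                names.add(key)
    feature_names = sorted(names)
    index = {name: i for i, name in enumerate(feature_names)}

    rows = []
    for sample in samples:
        row = [0.0] * len(feature_names)  # absent features stay at zero
        for key, value in sample.items():
            if isinstance(value, str):
                row[index[f"{key}{separator}{value}"]] = 1.0
            else:
                row[index[key]] = float(value)
        rows.append(row)
    return feature_names, rows


feature_names, rows = vectorize([{"f": "ham", "n": 2}, {"f": "spam"}])
```

The second sample has no “n” entry, so its “n” column is zero, matching the statement about features that do not occur in a sample.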
Parameters
 dtype : callable, optional
 The type of feature values. Passed to Numpy array/scipy.sparse matrix constructors as the dtype argument.
 separator : string, optional
 Separator string used when constructing new features for one-hot coding.
 sparse : boolean, optional.
 Whether transform should produce scipy.sparse matrices. True by default.
 sort : boolean, optional.
 Whether feature_names_ and vocabulary_ should be sorted when fitting. True by default.
Attributes
vocabulary_
: dict A dictionary mapping feature names to feature indices.
feature_names_
: list A list of length n_features containing the feature names (e.g., “f=ham” and “f=spam”).
Examples
>>> from sklearn.feature_extraction import DictVectorizer
>>> v = DictVectorizer(sparse=False)
>>> D = [{'foo': 1, 'bar': 2}, {'foo': 3, 'baz': 1}]
>>> X = v.fit_transform(D)
>>> X
array([[2., 0., 1.],
       [0., 1., 3.]])
>>> v.inverse_transform(X) == [{'bar': 2.0, 'foo': 1.0}, {'baz': 1.0, 'foo': 3.0}]
True
>>> v.transform({'foo': 4, 'unseen_feature': 3})
array([[0., 0., 4.]])
See also
FeatureHasher : performs vectorization using only a hash function.
sklearn.preprocessing.OrdinalEncoder : handles nominal/categorical features encoded as columns of arbitrary data types.
Full API documentation: DictVectorizerScikitsLearnNode

class
mdp.nodes.
LinearSVCScikitsLearnNode
¶ Linear Support Vector Classification. This node has been automatically generated by wrapping the sklearn.svm.classes.LinearSVC class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Similar to SVC with parameter kernel=’linear’, but implemented in terms of liblinear rather than libsvm, so it has more flexibility in the choice of penalties and loss functions and should scale better to large numbers of samples.
This class supports both dense and sparse input and the multiclass support is handled according to a one-vs-the-rest scheme.
Read more in the User Guide.
Parameters
 penalty : string, ‘l1’ or ‘l2’ (default=’l2’)
 Specifies the norm used in the penalization. The ‘l2’ penalty is the standard used in SVC. The ‘l1’ leads to coef_ vectors that are sparse.
 loss : string, ‘hinge’ or ‘squared_hinge’ (default=’squared_hinge’)
 Specifies the loss function. ‘hinge’ is the standard SVM loss (used e.g. by the SVC class) while ‘squared_hinge’ is the square of the hinge loss.
 dual : bool, (default=True)
 Select the algorithm to either solve the dual or primal optimization problem. Prefer dual=False when n_samples > n_features.
 tol : float, optional (default=1e-4)
 Tolerance for stopping criteria.
 C : float, optional (default=1.0)
 Penalty parameter C of the error term.
 multi_class : string, ‘ovr’ or ‘crammer_singer’ (default=’ovr’)
 Determines the multi-class strategy if y contains more than two classes. "ovr" trains n_classes one-vs-rest classifiers, while "crammer_singer" optimizes a joint objective over all classes. While crammer_singer is interesting from a theoretical perspective as it is consistent, it is seldom used in practice as it rarely leads to better accuracy and is more expensive to compute. If "crammer_singer" is chosen, the options loss, penalty and dual will be ignored.
 fit_intercept : boolean, optional (default=True)
 Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (i.e. data is expected to be already centered).
 intercept_scaling : float, optional (default=1)
 When self.fit_intercept is True, the instance vector x becomes [x, self.intercept_scaling], i.e. a “synthetic” feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic feature weight. Note that the synthetic feature weight is subject to l1/l2 regularization like all other features. To lessen the effect of regularization on the synthetic feature weight (and therefore on the intercept), intercept_scaling has to be increased.
 class_weight : {dict, ‘balanced’}, optional
 Set the parameter C of class i to class_weight[i]*C for SVC. If not given, all classes are supposed to have weight one. The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y))
 verbose : int, (default=0)
 Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in liblinear that, if enabled, may not work properly in a multithreaded context.
 random_state : int, RandomState instance or None, optional (default=None)
 The seed of the pseudo random number generator to use when shuffling the data for the dual coordinate descent (if dual=True). When dual=False the underlying implementation of LinearSVC is not random and random_state has no effect on the results. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
 max_iter : int, (default=1000)
 The maximum number of iterations to be run.
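The intercept_scaling description above amounts to appending a constant feature and reading the intercept off its weight. The following sketch illustrates that bookkeeping only; the helper names augment and decision_value and the weight values are hypothetical, and no model is fitted here.

```python
def augment(x, intercept_scaling=1.0):
    """Append the constant 'synthetic' feature to an instance vector."""
    return list(x) + [intercept_scaling]


def decision_value(x, weights, intercept_scaling=1.0):
    """Dot product with weights; the last weight belongs to the synthetic feature."""
    return sum(w * v for w, v in zip(weights, augment(x, intercept_scaling)))


# Hypothetical weights: two real features plus the synthetic feature weight.
weights = [0.5, -1.0, 0.2]
scaling = 10.0
intercept = scaling * weights[-1]
value = decision_value([2.0, 1.0], weights, intercept_scaling=scaling)
```

Because the penalty acts on the synthetic weight itself, a larger intercept_scaling lets a small (cheaply regularized) weight still produce a sizeable intercept.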
Attributes
coef_
: array, shape = [n_features] if n_classes == 2 else [n_classes, n_features] Weights assigned to the features (coefficients in the primal problem). This is only available in the case of a linear kernel.
coef_ is a read-only property derived from raw_coef_ that follows the internal memory layout of liblinear.
intercept_
: array, shape = [1] if n_classes == 2 else [n_classes] Constants in decision function.
Examples
>>> from sklearn.svm import LinearSVC
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_features=4, random_state=0)
>>> clf = LinearSVC(random_state=0, tol=1e-5)
>>> clf.fit(X, y)
LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='squared_hinge', max_iter=1000,
     multi_class='ovr', penalty='l2', random_state=0, tol=1e-05, verbose=0)
>>> print(clf.coef_)
[[0.085... 0.394... 0.498... 0.375...]]
>>> print(clf.intercept_)
[0.284...]
>>> print(clf.predict([[0, 0, 0, 0]]))
[1]
Notes
The underlying C implementation uses a random number generator to select features when fitting the model. It is thus not uncommon to have slightly different results for the same input data. If that happens, try with a smaller tol parameter.
The underlying implementation, liblinear, uses a sparse internal representation for the data that will incur a memory copy.
Predict output may not match that of standalone liblinear in certain cases. See differences from liblinear in the narrative documentation.
References
LIBLINEAR: A Library for Large Linear Classification
See also
 SVC
Implementation of Support Vector Machine classifier using libsvm: the kernel can be non-linear but its SMO algorithm does not scale to large number of samples as LinearSVC does.
Furthermore SVC multi-class mode is implemented using one vs one scheme while LinearSVC uses one vs the rest. It is possible to implement one vs the rest with SVC by using the sklearn.multiclass.OneVsRestClassifier wrapper.
Finally SVC can fit dense data without memory copy if the input is C-contiguous. Sparse data will still incur memory copy though.
 sklearn.linear_model.SGDClassifier
 SGDClassifier can optimize the same cost function as LinearSVC by adjusting the penalty and loss parameters. In addition it requires less memory, allows incremental (online) learning, and implements various loss functions and regularization regimes.
Full API documentation: LinearSVCScikitsLearnNode

class
mdp.nodes.
RandomizedLassoScikitsLearnNode
¶ Randomized Lasso. This node has been automatically generated by wrapping the sklearn.linear_model.randomized_l1.RandomizedLasso class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Randomized Lasso works by subsampling the training data and computing a Lasso estimate where the penalty of a random subset of coefficients has been scaled. By performing this double randomization several times, the method assigns high scores to features that are repeatedly selected across randomizations. This is known as stability selection. In short, features selected more often are considered good features.
Parameters
 alpha : float, ‘aic’, or ‘bic’, optional
 The regularization parameter alpha of the Lasso. Warning: this is not the alpha parameter in the stability selection article, which corresponds to scaling here.
 scaling : float, optional
 The s parameter used to randomly scale the penalty of different features. Should be between 0 and 1.
 sample_fraction : float, optional
 The fraction of samples to be used in each randomized design. Should be between 0 and 1. If 1, all samples are used.
 n_resampling : int, optional
 Number of randomized models.
 selection_threshold : float, optional
 The score above which features should be selected.
 fit_intercept : boolean, optional
 whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
 verbose : boolean or integer, optional
 Sets the verbosity amount
 normalize : boolean, optional, default True
 If True, the regressors X will be normalized before regression. This parameter is ignored when fit_intercept is set to False. When the regressors are normalized, note that this makes the hyperparameters learned more robust and almost independent of the number of samples. The same property is not valid for standardized data. However, if you wish to standardize, please use preprocessing.StandardScaler before calling fit on an estimator with normalize=False.
 precompute : True | False | ‘auto’ | array-like
 Whether to use a precomputed Gram matrix to speed up calculations. If set to ‘auto’ let us decide. The Gram matrix can also be passed as argument, but it will be used only for the selection of parameter alpha, if alpha is ‘aic’ or ‘bic’.
 max_iter : integer, optional
 Maximum number of iterations to perform in the Lars algorithm.
 eps : float, optional
 The machine-precision regularization in the computation of the Cholesky diagonal factors. Increase this for very ill-conditioned systems. Unlike the ‘tol’ parameter in some iterative optimization-based algorithms, this parameter does not control the tolerance of the optimization.
 random_state : int, RandomState instance or None, optional (default=None)
 If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
 n_jobs : int or None, optional (default=None)
 Number of CPUs to use during the resampling.
None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
 pre_dispatch : int, or string, optional
Controls the number of jobs that get dispatched during parallel execution. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be:
 None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs
 An int, giving the exact number of total jobs that are spawned
 A string, giving an expression as a function of n_jobs, as in ‘2*n_jobs’
 memory : None, str or object with the joblib.Memory interface, optional (default=None)
 Used for internal caching. By default, no caching is done. If a string is given, it is the path to the caching directory.
Attributes
scores_
: array, shape = [n_features] Feature scores between 0 and 1.
all_scores_
: array, shape = [n_features, n_reg_parameter] Feature scores between 0 and 1 for all values of the regularization parameter. The reference article suggests scores_ is the max of all_scores_.
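The scoring idea behind the two attributes above can be sketched without fitting any Lasso model. This sketch assumes the per-resampling selection results are already available as boolean masks (hypothetical data below) and shows only how selection frequencies and the max over regularization values would be computed; the function names are illustrative, not part of the wrapped class.

```python
def selection_frequencies(masks):
    """masks[r][j] is True if feature j was selected in resampling run r."""
    n_runs = len(masks)
    n_features = len(masks[0])
    return [sum(1 for run in masks if run[j]) / n_runs
            for j in range(n_features)]


def max_over_parameters(all_scores):
    """all_scores[j][k]: score of feature j at the k-th regularization value."""
    return [max(row) for row in all_scores]


# Hypothetical selection masks from four randomized runs on three features.
masks = [
    [True, False, True],
    [True, False, False],
    [True, True, False],
    [True, False, True],
]
frequencies = selection_frequencies(masks)

# Hypothetical per-feature scores at two regularization values.
all_scores = [[1.0, 0.75], [0.25, 0.5], [0.5, 0.25]]
scores = max_over_parameters(all_scores)
```

A feature selected in every run gets frequency 1.0, and a selection_threshold would then be compared against such scores.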
Examples
>>> from sklearn.linear_model import RandomizedLasso
>>> randomized_lasso = RandomizedLasso()
References
Stability selection, Nicolai Meinshausen, Peter Buhlmann, Journal of the Royal Statistical Society: Series B, Volume 72, Issue 4, pages 417-473, September 2010. DOI: 10.1111/j.1467-9868.2010.00740.x
See also
RandomizedLogisticRegression, Lasso, ElasticNet
Full API documentation: RandomizedLassoScikitsLearnNode

class
mdp.nodes.
MultiLabelBinarizerScikitsLearnNode
¶ Transform between iterable of iterables and a multilabel format. This node has been automatically generated by wrapping the sklearn.preprocessing.label.MultiLabelBinarizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Although a list of sets or tuples is a very intuitive format for multilabel data, it is unwieldy to process. This transformer converts between this intuitive format and the supported multilabel format: a (samples x classes) binary matrix indicating the presence of a class label.
Parameters
 classes : array-like of shape [n_classes] (optional)
 Indicates an ordering for the class labels. All entries should be unique (cannot contain duplicate classes).
 sparse_output : boolean (default: False)
 Set to true if the output binary array is desired in CSR sparse format.
Attributes
classes_
: array of labels A copy of the classes parameter where provided, or otherwise, the sorted set of classes found when fitting.
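The conversion from label collections to a (samples x classes) indicator matrix can be sketched in a few lines. This is a plain-Python illustration of the format only, using nested lists instead of arrays; the function name binarize is hypothetical.

```python
def binarize(label_sets, classes=None):
    """Return (classes, matrix) with matrix[i][j] = 1 if sample i has class j."""
    if classes is None:
        # Without an explicit ordering, use the sorted set of seen labels.
        classes = sorted(set().union(*label_sets))
    index = {label: i for i, label in enumerate(classes)}
    matrix = []
    for labels in label_sets:
        row = [0] * len(classes)
        for label in labels:
            row[index[label]] = 1
        matrix.append(row)
    return classes, matrix


classes, matrix = binarize([(1, 2), (3,)])
ordered_classes, ordered_matrix = binarize([(1, 2), (3,)], classes=[3, 2, 1])
```

Passing classes explicitly fixes the column order, which mirrors the role of the classes parameter described above.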
Examples
>>> from sklearn.preprocessing import MultiLabelBinarizer
>>> mlb = MultiLabelBinarizer()
>>> mlb.fit_transform([(1, 2), (3,)])
array([[1, 1, 0],
       [0, 0, 1]])
>>> mlb.classes_
array([1, 2, 3])
>>> mlb.fit_transform([set(['sci-fi', 'thriller']), set(['comedy'])])
array([[0, 1, 1],
       [1, 0, 0]])
>>> list(mlb.classes_)
['comedy', 'sci-fi', 'thriller']
See also
 sklearn.preprocessing.OneHotEncoder : encode categorical features using a one-hot aka one-of-K scheme.
Full API documentation: MultiLabelBinarizerScikitsLearnNode

class
mdp.nodes.
FastICAScikitsLearnNode
¶ FastICA: a fast algorithm for Independent Component Analysis. This node has been automatically generated by wrapping the sklearn.decomposition.fastica_.FastICA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Read more in the User Guide.
Parameters
 n_components : int, optional
 Number of components to use. If none is passed, all are used.
 algorithm : {‘parallel’, ‘deflation’}
 Apply parallel or deflational algorithm for FastICA.
 whiten : boolean, optional
 If whiten is false, the data is already considered to be whitened, and no whitening is performed.
 fun : string or function, optional. Default: ‘logcosh’
The functional form of the G function used in the approximation to neg-entropy. Could be either ‘logcosh’, ‘exp’, or ‘cube’. You can also provide your own function. It should return a tuple containing the value of the function and of its derivative at the given point. Example:
def my_g(x):
    return x ** 3, (3 * x ** 2).mean(axis=-1)
 fun_args : dictionary, optional
 Arguments to send to the functional form. If empty and if fun=’logcosh’, fun_args will take value {‘alpha’ : 1.0}.
 max_iter : int, optional
 Maximum number of iterations during fit.
 tol : float, optional
 Tolerance on update at each iteration.
 w_init : None or an (n_components, n_components) ndarray
 The mixing matrix to be used to initialize the algorithm.
 random_state : int, RandomState instance or None, optional (default=None)
 If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
Attributes
components_
: 2D array, shape (n_components, n_features) The unmixing matrix.
mixing_
: array, shape (n_features, n_components) The mixing matrix.
n_iter_
: int If the algorithm is “deflation”, n_iter is the maximum number of iterations run across all components. Else they are just the number of iterations taken to converge.
Examples
>>> from sklearn.datasets import load_digits
>>> from sklearn.decomposition import FastICA
>>> X, _ = load_digits(return_X_y=True)
>>> transformer = FastICA(n_components=7,
...                       random_state=0)
>>> X_transformed = transformer.fit_transform(X)
>>> X_transformed.shape
(1797, 7)
Notes
Implementation based on A. Hyvarinen and E. Oja, Independent Component Analysis: Algorithms and Applications, Neural Networks, 13(4-5), 2000, pp. 411-430.
Full API documentation: FastICAScikitsLearnNode

class
mdp.nodes.
RandomForestRegressorScikitsLearnNode
¶ A random forest regressor. This node has been automatically generated by wrapping the sklearn.ensemble.forest.RandomForestRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
A random forest is a meta estimator that fits a number of classifying decision trees on various subsamples of the dataset and uses averaging to improve the predictive accuracy and control overfitting. The subsample size is always the same as the original input sample size but the samples are drawn with replacement if bootstrap=True (default).
Read more in the User Guide.
Parameters
 n_estimators : integer, optional (default=10)
The number of trees in the forest.
Changed in version 0.20: The default value of n_estimators will change from 10 in version 0.20 to 100 in version 0.22.
 criterion : string, optional (default=”mse”)
The function to measure the quality of a split. Supported criteria are “mse” for the mean squared error, which is equal to variance reduction as feature selection criterion, and “mae” for the mean absolute error.
New in version 0.18: Mean Absolute Error (MAE) criterion.
 max_depth : integer or None, optional (default=None)
 The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
 min_samples_split : int, float, optional (default=2)
The minimum number of samples required to split an internal node:
 If int, then consider min_samples_split as the minimum number.
 If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
Changed in version 0.18: Added float values for fractions.
 min_samples_leaf : int, float, optional (default=1)
The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.
 If int, then consider min_samples_leaf as the minimum number.
 If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.
Changed in version 0.18: Added float values for fractions.
 min_weight_fraction_leaf : float, optional (default=0.)
 The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.
 max_features : int, float, string or None, optional (default=”auto”)
The number of features to consider when looking for the best split:
 If int, then consider max_features features at each split.
 If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
 If “auto”, then max_features=n_features.
 If “sqrt”, then max_features=sqrt(n_features).
 If “log2”, then max_features=log2(n_features).
 If None, then max_features=n_features.
Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features.
 max_leaf_nodes : int or None, optional (default=None)
 Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.
 min_impurity_decrease : float, optional (default=0.)
A node will be split if this split induces a decrease of the impurity greater than or equal to this value.
The weighted impurity decrease equation is the following:
N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.
N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.
New in version 0.19.
 min_impurity_split : float, (default=1e-7)
Threshold for early stopping in tree growth. A node will split if its impurity is above the threshold, otherwise it is a leaf.
Deprecated since version 0.19: min_impurity_split has been deprecated in favor of min_impurity_decrease in 0.19. The default value of min_impurity_split will change from 1e-7 to 0 in 0.23 and it will be removed in 0.25. Use min_impurity_decrease instead.
 bootstrap : boolean, optional (default=True)
 Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.
 oob_score : bool, optional (default=False)
 Whether to use out-of-bag samples to estimate the R^2 on unseen data.
 n_jobs : int or None, optional (default=None)
 The number of jobs to run in parallel for both fit and predict.
None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
 random_state : int, RandomState instance or None, optional (default=None)
 If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
 verbose : int, optional (default=0)
 Controls the verbosity when fitting and predicting.
 warm_start : bool, optional (default=False)
 When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new forest. See the Glossary.
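Two of the numeric rules in the parameter list above lend themselves to a quick check: the conversion of a fractional min_samples_split into a sample count, and the weighted impurity decrease used by min_impurity_decrease. The sketch below transcribes those formulas directly; the function names and the example numbers are hypothetical, and it is not the tree-building code of the wrapped class.

```python
import math


def resolve_min_samples(value, n_samples):
    """An int is used as-is; a float is a fraction of n_samples, rounded up."""
    if isinstance(value, int):
        return value
    return math.ceil(value * n_samples)


def weighted_impurity_decrease(n, n_t, n_t_l, n_t_r,
                               impurity, left_impurity, right_impurity):
    """N_t / N * (impurity - N_t_R / N_t * right - N_t_L / N_t * left)."""
    return n_t / n * (impurity
                      - n_t_r / n_t * right_impurity
                      - n_t_l / n_t * left_impurity)


min_split_int = resolve_min_samples(2, 100)
min_split_fraction = resolve_min_samples(0.05, 101)

# Hypothetical node: 40 of 100 samples, split into 10 (left) and 30 (right).
decrease = weighted_impurity_decrease(100, 40, 10, 30, 0.5, 0.2, 0.4)
would_split = decrease >= 0.01  # compare against a chosen min_impurity_decrease
```

The leading factor N_t / N scales the decrease by how much of the data reaches the node, so the same local improvement counts for less deep in the tree.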
Attributes
estimators_
: list of DecisionTreeRegressor The collection of fitted sub-estimators.
feature_importances_
: array of shape = [n_features] The feature importances (the higher, the more important the feature).
n_features_
: int The number of features when fit is performed.
n_outputs_
: int The number of outputs when fit is performed.
oob_score_
: float Score of the training dataset obtained using an out-of-bag estimate.
oob_prediction_
: array of shape = [n_samples] Prediction computed with out-of-bag estimate on the training set.
Examples
>>> from sklearn.ensemble import RandomForestRegressor
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_features=4, n_informative=2,
...                        random_state=0, shuffle=False)
>>> regr = RandomForestRegressor(max_depth=2, random_state=0,
...                              n_estimators=100)
>>> regr.fit(X, y)
RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=2,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=None,
           oob_score=False, random_state=0, verbose=0, warm_start=False)
>>> print(regr.feature_importances_)
[0.18146984 0.81473937 0.00145312 0.00233767]
>>> print(regr.predict([[0, 0, 0, 0]]))
[-8.32987858]
Notes
The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.
The features are always randomly permuted at each split. Therefore, the best found split may vary, even with the same training data, max_features=n_features and bootstrap=False, if the improvement of the criterion is identical for several splits enumerated during the search of the best split. To obtain a deterministic behaviour during fitting, random_state has to be fixed.
The default value max_features="auto" uses n_features rather than n_features / 3. The latter was originally suggested in [1], whereas the former was more recently justified empirically in [2].
References
[1] L. Breiman, “Random Forests”, Machine Learning, 45(1), 5-32, 2001.
[2] P. Geurts, D. Ernst, and L. Wehenkel, “Extremely randomized trees”, Machine Learning, 63(1), 3-42, 2006.
See also
DecisionTreeRegressor, ExtraTreesRegressor
Full API documentation: RandomForestRegressorScikitsLearnNode

class
mdp.nodes.
MultinomialNBScikitsLearnNode
¶ Naive Bayes classifier for multinomial models.
This node has been automatically generated by wrapping the sklearn.naive_bayes.MultinomialNB class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The multinomial Naive Bayes classifier is suitable for classification with discrete features (e.g., word counts for text classification). The multinomial distribution normally requires integer feature counts. However, in practice, fractional counts such as tf-idf may also work.
Read more in the User Guide.
Parameters
 alpha : float, optional (default=1.0)
 Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing).
 fit_prior : boolean, optional (default=True)
 Whether to learn class prior probabilities or not. If false, a uniform prior will be used.
 class_prior : arraylike, size (n_classes,), optional (default=None)
 Prior probabilities of the classes. If specified the priors are not adjusted according to the data.
Attributes
class_log_prior_
: array, shape (n_classes, ) Smoothed empirical log probability for each class.
intercept_
: array, shape (n_classes, ) Mirrors class_log_prior_ for interpreting MultinomialNB as a linear model.
feature_log_prob_
: array, shape (n_classes, n_features) Empirical log probability of features given a class, P(x_i|y).
coef_
: array, shape (n_classes, n_features) Mirrors feature_log_prob_ for interpreting MultinomialNB as a linear model.
class_count_
: array, shape (n_classes,) Number of samples encountered for each class during fitting. This value is weighted by the sample weight when provided.
feature_count_
: array, shape (n_classes, n_features) Number of samples encountered for each (class, feature) during fitting. This value is weighted by the sample weight when provided.
Examples
>>> import numpy as np
>>> X = np.random.randint(5, size=(6, 100))
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> from sklearn.naive_bayes import MultinomialNB
>>> clf = MultinomialNB()
>>> clf.fit(X, y)
MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)
>>> print(clf.predict(X[2:3]))
[3]
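The role of the alpha smoothing parameter and of the class_log_prior_ and feature_log_prob_ attributes can be illustrated with a minimal pure-Python sketch. The helper names fit_multinomial_nb and predict are hypothetical, and the sketch only assumes the textbook multinomial model with additive smoothing; it is not the wrapped scikit-learn implementation.

```python
import math


def fit_multinomial_nb(X, y, alpha=1.0):
    """Estimate log priors and smoothed per-class feature log probabilities."""
    classes = sorted(set(y))
    n_features = len(X[0])
    class_count = {c: 0 for c in classes}
    feature_count = {c: [0.0] * n_features for c in classes}
    for row, label in zip(X, y):
        class_count[label] += 1
        for j, value in enumerate(row):
            feature_count[label][j] += value

    n_samples = len(y)
    class_log_prior = [math.log(class_count[c] / n_samples) for c in classes]
    feature_log_prob = []
    for c in classes:
        # Additive (Laplace/Lidstone) smoothing: add alpha to every count.
        total = sum(feature_count[c]) + alpha * n_features
        feature_log_prob.append(
            [math.log((count + alpha) / total) for count in feature_count[c]]
        )
    return classes, class_log_prior, feature_log_prob


def predict(x, classes, class_log_prior, feature_log_prob):
    """Pick the class with the highest linear score prior + x . log_prob."""
    scores = [
        prior + sum(v * lp for v, lp in zip(x, row))
        for prior, row in zip(class_log_prior, feature_log_prob)
    ]
    return classes[scores.index(max(scores))]


# Toy word-count data: two documents per class, three vocabulary terms.
X = [[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]]
y = ["a", "a", "b", "b"]
model = fit_multinomial_nb(X, y)
print(predict([1, 0, 0], *model))
```

The linear form of the score is why the attributes are mirrored as intercept_ and coef_.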
Notes
For the rationale behind the names coef_ and intercept_, i.e. naive Bayes as a linear classifier, see J. Rennie et al. (2003), Tackling the poor assumptions of naive Bayes text classifiers, ICML.
References
C.D. Manning, P. Raghavan and H. Schuetze (2008). Introduction to Information Retrieval. Cambridge University Press, pp. 234-265. http://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html
Full API documentation: MultinomialNBScikitsLearnNode

class
mdp.nodes.
LabelEncoderScikitsLearnNode
¶ Encode labels with value between 0 and n_classes-1.
This node has been automatically generated by wrapping the sklearn.preprocessing.label.LabelEncoder class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Read more in the User Guide.
Attributes
classes_
: array of shape (n_class,) Holds the label for each class.
Examples
LabelEncoder can be used to normalize labels.
>>> from sklearn import preprocessing
>>> le = preprocessing.LabelEncoder()
>>> le.fit([1, 2, 2, 6])
LabelEncoder()
>>> le.classes_
array([1, 2, 6])
>>> le.transform([1, 1, 2, 6])
array([0, 0, 1, 2]...)
>>> le.inverse_transform([0, 0, 1, 2])
array([1, 1, 2, 6])
It can also be used to transform non-numerical labels (as long as they are hashable and comparable) to numerical labels.
>>> le = preprocessing.LabelEncoder()
>>> le.fit(["paris", "paris", "tokyo", "amsterdam"])
LabelEncoder()
>>> list(le.classes_)
['amsterdam', 'paris', 'tokyo']
>>> le.transform(["tokyo", "tokyo", "paris"])
array([2, 2, 1]...)
>>> list(le.inverse_transform([2, 2, 1]))
['tokyo', 'tokyo', 'paris']
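The mapping shown in the examples above amounts to indexing into the sorted set of distinct labels. A minimal sketch without scikit-learn follows; the function names fit_labels, transform and inverse_transform are hypothetical stand-ins, not the wrapped API.

```python
def fit_labels(labels):
    """Return the sorted distinct labels; position i is the code for label i."""
    return sorted(set(labels))


def transform(classes, labels):
    """Map each label to its integer code (raises KeyError for unseen labels)."""
    index = {label: code for code, label in enumerate(classes)}
    return [index[label] for label in labels]


def inverse_transform(classes, codes):
    """Map integer codes back to the original labels."""
    return [classes[code] for code in codes]


classes = fit_labels(["paris", "paris", "tokyo", "amsterdam"])
codes = transform(classes, ["tokyo", "tokyo", "paris"])
print(classes, codes, inverse_transform(classes, codes))
```

Because the codes depend on sort order, the labels must be mutually comparable, which is the requirement stated above.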
See also
 sklearn.preprocessing.OrdinalEncoder : encode categorical features using a one-hot or ordinal encoding scheme.
Full API documentation: LabelEncoderScikitsLearnNode

class
mdp.nodes.
LocallyLinearEmbeddingScikitsLearnNode
¶ Locally Linear Embedding.
This node has been automatically generated by wrapping the sklearn.manifold.locally_linear.LocallyLinearEmbedding class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Read more in the User Guide.
Parameters
 n_neighbors : integer
 number of neighbors to consider for each point.
 n_components : integer
 number of coordinates for the manifold
 reg : float
 regularization constant, multiplies the trace of the local covariance matrix of the distances.
 eigen_solver : string, {‘auto’, ‘arpack’, ‘dense’}
 auto : algorithm will attempt to choose the best method for input data.
 arpack : use Arnoldi iteration in shift-invert mode. For this method, M may be a dense matrix, sparse matrix, or general linear operator. Warning: ARPACK can be unstable for some problems. It is best to try several random seeds in order to check results.
 dense : use standard dense matrix operations for the eigenvalue decomposition. For this method, M must be an array or matrix type. This method should be avoided for large problems.
 tol : float, optional
 Tolerance for ‘arpack’ method Not used if eigen_solver==’dense’.
 max_iter : integer
 maximum number of iterations for the arpack solver. Not used if eigen_solver==’dense’.
 method : string (‘standard’, ‘hessian’, ‘modified’ or ‘ltsa’)
 standard : use the standard locally linear embedding algorithm. See reference [1].
 hessian : use the Hessian eigenmap method. This method requires n_neighbors > n_components * (1 + (n_components + 1) / 2). See reference [2].
 modified : use the modified locally linear embedding algorithm. See reference [3].
 ltsa : use local tangent space alignment algorithm. See reference [4].
 hessian_tol : float, optional
 Tolerance for Hessian eigenmapping method.
Only used if
method == 'hessian'
 modified_tol : float, optional
 Tolerance for modified LLE method.
Only used if
method == 'modified'
 neighbors_algorithm : string [‘auto’|’brute’|’kd_tree’|’ball_tree’]
 algorithm to use for nearest neighbors search, passed to neighbors.NearestNeighbors instance
 random_state : int, RandomState instance or None, optional (default=None)
 If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by np.random. Used when eigen_solver == ‘arpack’.
 n_jobs : int or None, optional (default=None)
 The number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
Attributes
embedding_
: array-like, shape [n_samples, n_components] Stores the embedding vectors
reconstruction_error_
: float Reconstruction error associated with embedding_
nbrs_
: NearestNeighbors object Stores nearest neighbors instance, including BallTree or KDTree if applicable.
Examples
>>> from sklearn.datasets import load_digits
>>> from sklearn.manifold import LocallyLinearEmbedding
>>> X, _ = load_digits(return_X_y=True)
>>> X.shape
(1797, 64)
>>> embedding = LocallyLinearEmbedding(n_components=2)
>>> X_transformed = embedding.fit_transform(X[:100])
>>> X_transformed.shape
(100, 2)
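The neighbor-count constraint quoted for method ‘hessian’ simplifies to n_neighbors > n_components * (n_components + 3) / 2. The sketch below computes the smallest admissible n_neighbors for a given embedding dimension; the helper name min_neighbors_for_hessian is hypothetical and only restates the inequality from the parameter description.

```python
def min_neighbors_for_hessian(n_components):
    """Smallest n_neighbors satisfying the strict 'hessian' inequality.

    n_components * (1 + (n_components + 1) / 2) equals
    n_components * (n_components + 3) / 2, which is always an integer
    because n_components and n_components + 3 have opposite parity.
    """
    bound = n_components * (n_components + 3) // 2
    return bound + 1  # the inequality is strict


for d in (1, 2, 3):
    print(d, min_neighbors_for_hessian(d))
```

For the two-dimensional embedding used in the example above, this gives a minimum of 6 neighbors when the Hessian method is selected.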
References
[1] Roweis, S. & Saul, L. Nonlinear dimensionality reduction by locally linear embedding. Science 290:2323 (2000).
[2] Donoho, D. & Grimes, C. Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proc Natl Acad Sci U S A. 100:5591 (2003).
[3] Zhang, Z. & Wang, J. MLLE: Modified Locally Linear Embedding Using Multiple Weights. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.70.382
[4] Zhang, Z. & Zha, H. Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. Journal of Shanghai Univ. 8:406 (2004)
Full API documentation: LocallyLinearEmbeddingScikitsLearnNode

class
mdp.nodes.
AdaBoostClassifierScikitsLearnNode
¶ An AdaBoost classifier.
This node has been automatically generated by wrapping the sklearn.ensemble.weight_boosting.AdaBoostClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
An AdaBoost [1] classifier is a meta-estimator that begins by fitting a classifier on the original dataset and then fits additional copies of the classifier on the same dataset but where the weights of incorrectly classified instances are adjusted such that subsequent classifiers focus more on difficult cases.
This class implements the algorithm known as AdaBoost-SAMME [2].
Read more in the User Guide.
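The reweighting idea described above can be sketched for a single boosting round of the discrete SAMME variant from [2]. This is a simplified illustration under stated assumptions: the function name samme_round is hypothetical, the weak learner's mistakes are passed in as a list of booleans, and the wrapped class defaults to SAMME.R, which this sketch does not cover.

```python
import math


def samme_round(weights, incorrect, n_classes, learning_rate=1.0):
    """Run one discrete SAMME update.

    weights   : current sample weights
    incorrect : True where the weak learner misclassified the sample
    Returns (estimator_weight, estimator_error, new_weights).
    """
    total = sum(weights)
    error = sum(w for w, bad in zip(weights, incorrect) if bad) / total
    if error <= 0.0:
        # Perfect fit: nothing to reweight, boosting can stop early.
        return 1.0, 0.0, list(weights)

    # Estimator weight; the log(n_classes - 1) term is the multi-class part.
    alpha = learning_rate * (
        math.log((1.0 - error) / error) + math.log(n_classes - 1.0)
    )
    # Boost the weight of misclassified samples, then renormalize.
    boosted = [w * math.exp(alpha) if bad else w
               for w, bad in zip(weights, incorrect)]
    norm = sum(boosted)
    return alpha, error, [w / norm for w in boosted]


alpha, error, new_weights = samme_round(
    [0.25, 0.25, 0.25, 0.25], [False, False, False, True], n_classes=2
)
print(alpha, error, new_weights)
```

A smaller learning_rate shrinks alpha, so each estimator shifts the weights less and more estimators are typically needed.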
Parameters
 base_estimator : object, optional (default=None)
 The base estimator from which the boosted ensemble is built.
Support for sample weighting is required, as well as proper classes_ and n_classes_ attributes. If None, then the base estimator is DecisionTreeClassifier(max_depth=1).
 n_estimators : integer, optional (default=50)
 The maximum number of estimators at which boosting is terminated. In case of perfect fit, the learning procedure is stopped early.
 learning_rate : float, optional (default=1.)
 Learning rate shrinks the contribution of each classifier by learning_rate. There is a trade-off between learning_rate and n_estimators.
 algorithm : {‘SAMME’, ‘SAMME.R’}, optional (default=’SAMME.R’)
 If ‘SAMME.R’ then use the SAMME.R real boosting algorithm. base_estimator must support calculation of class probabilities. If ‘SAMME’ then use the SAMME discrete boosting algorithm. The SAMME.R algorithm typically converges faster than SAMME, achieving a lower test error with fewer boosting iterations.
 random_state : int, RandomState instance or None, optional (default=None)
 If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
Attributes
estimators_
: list of classifiers The collection of fitted sub-estimators.
classes_
: array of shape = [n_classes] The class labels.
n_classes_
: int The number of classes.
estimator_weights_
: array of floats Weights for each estimator in the boosted ensemble.
estimator_errors_
: array of floats Classification error for each estimator in the boosted ensemble.
feature_importances_
: array of shape = [n_features] The feature importances if supported by the base_estimator.
See also
AdaBoostRegressor, GradientBoostingClassifier, sklearn.tree.DecisionTreeClassifier
References
[1] Y. Freund, R. Schapire, “A Decision-Theoretic Generalization of on-Line Learning and an Application to Boosting”, 1995.
[2] J. Zhu, H. Zou, S. Rosset, T. Hastie, “Multi-class AdaBoost”, 2009.
Full API documentation: AdaBoostClassifierScikitsLearnNode

class
mdp.nodes.
GaussianProcessClassifierScikitsLearnNode
¶ Gaussian process classification (GPC) based on Laplace approximation.
This node has been automatically generated by wrapping the sklearn.gaussian_process.gpc.GaussianProcessClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The implementation is based on Algorithm 3.1, 3.2, and 5.1 of Gaussian Processes for Machine Learning (GPML) by Rasmussen and Williams.
Internally, the Laplace approximation is used for approximating the non-Gaussian posterior by a Gaussian.
Currently, the implementation is restricted to using the logistic link function. For multi-class classification, several binary one-versus-rest classifiers are fitted. Note that this class thus does not implement a true multi-class Laplace approximation.
Parameters
 kernel : kernel object
 The kernel specifying the covariance function of the GP. If None is passed, the kernel “1.0 * RBF(1.0)” is used as default. Note that the kernel’s hyperparameters are optimized during fitting.
 optimizer : string or callable, optional (default: “fmin_l_bfgs_b”)
Can either be one of the internally supported optimizers for optimizing the kernel’s parameters, specified by a string, or an externally defined optimizer passed as a callable. If a callable is passed, it must have the signature:
def optimizer(obj_func, initial_theta, bounds):
    # * 'obj_func' is the objective function to be maximized, which
    #   takes the hyperparameters theta as parameter and an
    #   optional flag eval_gradient, which determines if the
    #   gradient is returned additionally to the function value
    # * 'initial_theta': the initial value for theta, which can be
    #   used by local optimizers
    # * 'bounds': the bounds on the values of theta
    ....
    # Returned are the best found hyperparameters theta and
    # the corresponding value of the target function.
    return theta_opt, func_min
Per default, the ‘fmin_l_bfgs_b’ algorithm from scipy.optimize is used. If None is passed, the kernel’s parameters are kept fixed. Available internal optimizers are:
'fmin_l_bfgs_b'
 n_restarts_optimizer : int, optional (default: 0)
 The number of restarts of the optimizer for finding the kernel’s parameters which maximize the log-marginal likelihood. The first run of the optimizer is performed from the kernel’s initial parameters, the remaining ones (if any) from thetas sampled log-uniform randomly from the space of allowed theta-values. If greater than 0, all bounds must be finite. Note that n_restarts_optimizer=0 implies that one run is performed.
 max_iter_predict : int, optional (default: 100)
 The maximum number of iterations in Newton’s method for approximating the posterior during predict. Smaller values will reduce computation time at the cost of worse results.
 warm_start : bool, optional (default: False)
 If warm-starts are enabled, the solution of the last Newton iteration on the Laplace approximation of the posterior mode is used as initialization for the next call of _posterior_mode(). This can speed up convergence when _posterior_mode is called several times on similar problems as in hyperparameter optimization. See the Glossary.
 copy_X_train : bool, optional (default: True)
 If True, a persistent copy of the training data is stored in the object. Otherwise, just a reference to the training data is stored, which might cause predictions to change if the data is modified externally.
 random_state : int, RandomState instance or None, optional (default: None)
 The generator used to initialize the centers. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
 multi_class : string, default : “one_vs_rest”
 Specifies how multiclass classification problems are handled. Supported are “one_vs_rest” and “one_vs_one”. In “one_vs_rest”, one binary Gaussian process classifier is fitted for each class, which is trained to separate this class from the rest. In “one_vs_one”, one binary Gaussian process classifier is fitted for each pair of classes, which is trained to separate these two classes. The predictions of these binary predictors are combined into multiclass predictions. Note that “one_vs_one” does not support predicting probability estimates.
 n_jobs : int or None, optional (default=None)
 The number of jobs to use for the computation.
None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
Attributes
kernel_
: kernel object The kernel used for prediction. In case of binary classification, the structure of the kernel is the same as the one passed as parameter but with optimized hyperparameters. In case of multi-class classification, a CompoundKernel is returned which consists of the different kernels used in the one-versus-rest classifiers.
log_marginal_likelihood_value_
: float The log-marginal-likelihood of self.kernel_.theta
classes_
: array-like, shape = (n_classes,) Unique class labels.
n_classes_
: int The number of classes in the training data
Examples
>>> from sklearn.datasets import load_iris
>>> from sklearn.gaussian_process import GaussianProcessClassifier
>>> from sklearn.gaussian_process.kernels import RBF
>>> X, y = load_iris(return_X_y=True)
>>> kernel = 1.0 * RBF(1.0)
>>> gpc = GaussianProcessClassifier(kernel=kernel,
...         random_state=0).fit(X, y)
>>> gpc.score(X, y)
0.9866...
>>> gpc.predict_proba(X[:2,:])
array([[0.83548752, 0.03228706, 0.13222543],
       [0.79064206, 0.06525643, 0.14410151]])
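The callable form of the optimizer parameter is easiest to understand from a concrete function with the documented signature (obj_func, initial_theta, bounds) that returns (theta_opt, func_min). The sketch below is a plain random search. It rests on several assumptions: the name random_search_optimizer and the extra keyword arguments n_trials and seed are hypothetical, obj_func is treated as a value to minimize (matching the returned func_min), all bounds are finite, and toy_objective is a stand-in used only to exercise the function outside scikit-learn.

```python
import random


def random_search_optimizer(obj_func, initial_theta, bounds,
                            n_trials=200, seed=0):
    """Random search with the (obj_func, initial_theta, bounds) signature."""
    rng = random.Random(seed)
    best_theta = list(initial_theta)
    # Ask only for the function value, not the gradient.
    best_value = obj_func(best_theta, eval_gradient=False)
    for _ in range(n_trials):
        # Draw one candidate uniformly inside the (finite) bounds.
        candidate = [rng.uniform(low, high) for low, high in bounds]
        value = obj_func(candidate, eval_gradient=False)
        if value < best_value:
            best_theta, best_value = candidate, value
    return best_theta, best_value


def toy_objective(theta, eval_gradient=True):
    """Stand-in objective with its minimum at theta[0] == 1."""
    value = (theta[0] - 1.0) ** 2
    if eval_gradient:
        return value, [2.0 * (theta[0] - 1.0)]
    return value


theta_opt, func_min = random_search_optimizer(
    toy_objective, [3.0], [(-5.0, 5.0)]
)
print(theta_opt, func_min)
```

A gradient-free search like this ignores eval_gradient entirely, which is one reason a user might supply their own callable instead of the default ‘fmin_l_bfgs_b’.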
New in version 0.18.
Full API documentation: GaussianProcessClassifierScikitsLearnNode

class
mdp.nodes.
LarsCVScikitsLearnNode
¶ Cross-validated Least Angle Regression model.
This node has been automatically generated by wrapping the sklearn.linear_model.least_angle.LarsCV class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
See glossary entry for cross-validation estimator.
Read more in the User Guide.
Parameters
 fit_intercept : boolean
 whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
 verbose : boolean or integer, optional
 Sets the verbosity amount
 max_iter : integer, optional
 Maximum number of iterations to perform.
 normalize : boolean, optional, default True
 This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.
 precompute : True | False | ‘auto’ | array-like
 Whether to use a precomputed Gram matrix to speed up calculations. If set to 'auto' let us decide. The Gram matrix cannot be passed as argument since we will use only subsets of X.
 cv : int, cross-validation generator or an iterable, optional
Determines the cross-validation splitting strategy. Possible inputs for cv are:
 None, to use the default 3-fold cross-validation,
 integer, to specify the number of folds.
 CV splitter,
 An iterable yielding (train, test) splits as arrays of indices.
For integer/None inputs, KFold is used.
Refer to the User Guide for the various cross-validation strategies that can be used here.
Changed in version 0.20: cv default value if None will change from 3-fold to 5-fold in v0.22.
 max_n_alphas : integer, optional
 The maximum number of points on the path used to compute the residuals in the cross-validation
 n_jobs : int or None, optional (default=None)
 Number of CPUs to use during the cross validation.
None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
 eps : float, optional
 The machine-precision regularization in the computation of the Cholesky diagonal factors. Increase this for very ill-conditioned systems.
 copy_X : boolean, optional, default True
 If True, X will be copied; else, it may be overwritten.
 positive : boolean (default=False)
Restrict coefficients to be >= 0. Be aware that you might want to remove fit_intercept which is set True by default.
Deprecated since version 0.20: The option is broken and deprecated. It will be removed in v0.22.
Attributes
coef_
: array, shape (n_features,) parameter vector (w in the formulation formula)
intercept_
: float independent term in decision function
coef_path_
: array, shape (n_features, n_alphas) the varying values of the coefficients along the path
alpha_
: float the estimated regularization parameter alpha
alphas_
: array, shape (n_alphas,) the different values of alpha along the path
cv_alphas_
: array, shape (n_cv_alphas,) all the values of alpha along the path for the different folds
mse_path_
: array, shape (n_folds, n_cv_alphas) the mean square error on left-out for each fold along the path (alpha values given by cv_alphas)
n_iter_
: array-like or int the number of iterations run by Lars with the optimal alpha.
Examples
>>> from sklearn.linear_model import LarsCV
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_samples=200, noise=4.0, random_state=0)
>>> reg = LarsCV(cv=5).fit(X, y)
>>> reg.score(X, y)
0.9996...
>>> reg.alpha_
0.0254...
>>> reg.predict(X[:1,])
array([154.0842...])
See also
lars_path, LassoLars, LassoLarsCV
Full API documentation: LarsCVScikitsLearnNode

class
mdp.nodes.
AdditiveChi2SamplerScikitsLearnNode
¶ Approximate feature map for additive chi2 kernel.
This node has been automatically generated by wrapping the sklearn.kernel_approximation.AdditiveChi2Sampler class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Uses sampling the Fourier transform of the kernel characteristic at regular intervals.
Since the kernel that is to be approximated is additive, the components of the input vectors can be treated separately. Each entry in the original space is transformed into 2*sample_steps+1 features, where sample_steps is a parameter of the method. Typical values of sample_steps include 1, 2 and 3.
Optimal choices for the sampling interval for certain data ranges can be computed (see the reference). The default values should be reasonable.
Read more in the User Guide.
Parameters
 sample_steps : int, optional
 Gives the number of (complex) sampling points.
 sample_interval : float, optional
 Sampling interval. Must be specified when sample_steps not in {1,2,3}.
Examples
>>> from sklearn.datasets import load_digits
>>> from sklearn.linear_model import SGDClassifier
>>> from sklearn.kernel_approximation import AdditiveChi2Sampler
>>> X, y = load_digits(return_X_y=True)
>>> chi2sampler = AdditiveChi2Sampler(sample_steps=2)
>>> X_transformed = chi2sampler.fit_transform(X, y)
>>> clf = SGDClassifier(max_iter=5, random_state=0, tol=1e-3)
>>> clf.fit(X_transformed, y)
SGDClassifier(alpha=0.0001, average=False, class_weight=None,
       early_stopping=False, epsilon=0.1, eta0=0.0, fit_intercept=True,
       l1_ratio=0.15, learning_rate='optimal', loss='hinge', max_iter=5,
       n_iter=None, n_iter_no_change=5, n_jobs=None, penalty='l2',
       power_t=0.5, random_state=0, shuffle=True, tol=0.001,
       validation_fraction=0.1, verbose=0, warm_start=False)
>>> clf.score(X_transformed, y)
0.9543...
Notes
This estimator approximates a slightly different version of the additive chi squared kernel than metric.additive_chi2 computes.
See also
 SkewedChi2Sampler : A Fourier-approximation to a non-additive variant of the chi squared kernel.
 sklearn.metrics.pairwise.chi2_kernel : The exact chi squared kernel.
 sklearn.metrics.pairwise.additive_chi2_kernel : The exact additive chi squared kernel.
References
See “Efficient additive kernels via explicit feature maps” A. Vedaldi and A. Zisserman, Pattern Analysis and Machine Intelligence, 2011
Full API documentation: AdditiveChi2SamplerScikitsLearnNode

class
mdp.nodes.
QuantileEstimatorScikitsLearnNode
¶ An estimator predicting the alpha-quantile of the training targets.
This node has been automatically generated by wrapping the sklearn.ensemble.gradient_boosting.QuantileEstimator class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Parameters
 alpha : float
 The quantile
Full API documentation: QuantileEstimatorScikitsLearnNode

class
mdp.nodes.
BirchScikitsLearnNode
¶ Implements the Birch clustering algorithm.
This node has been automatically generated by wrapping the sklearn.cluster.birch.Birch class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
It is a memory-efficient, online-learning algorithm provided as an alternative to MiniBatchKMeans. It constructs a tree data structure with the cluster centroids being read off the leaf. These can be either the final cluster centroids or can be provided as input to another clustering algorithm such as AgglomerativeClustering.
Read more in the User Guide.
Parameters
 threshold : float, default 0.5
 The radius of the subcluster obtained by merging a new sample and the closest subcluster should be smaller than the threshold. Otherwise a new subcluster is started. Setting this value to be very low promotes splitting and vice-versa.
 branching_factor : int, default 50
 Maximum number of CF subclusters in each node. If a new sample enters such that the number of subclusters exceeds the branching_factor, then that node is split into two nodes with the subclusters redistributed in each. The parent subcluster of that node is removed and two new subclusters are added as parents of the 2 split nodes.
 n_clusters : int, instance of sklearn.cluster model, default 3
Number of clusters after the final clustering step, which treats the subclusters from the leaves as new samples.
 None : the final clustering step is not performed and the subclusters are returned as they are.
 sklearn.cluster Estimator : If a model is provided, the model is fit treating the subclusters as new samples and the initial data is mapped to the label of the closest subcluster.
 int : the model fit is AgglomerativeClustering with n_clusters set to be equal to the int.
 compute_labels : bool, default True
 Whether or not to compute labels for each fit.
 copy : bool, default True
 Whether or not to make a copy of the given data. If set to False, the initial data will be overwritten.
Attributes
root_
: _CFNode Root of the CFTree.
dummy_leaf_
: _CFNode Start pointer to all the leaves.
subcluster_centers_
: ndarray, Centroids of all subclusters read directly from the leaves.
subcluster_labels_
: ndarray, Labels assigned to the centroids of the subclusters after they are clustered globally.
labels_
: ndarray, shape (n_samples,) Array of labels assigned to the input data. If partial_fit is used instead of fit, they are assigned to the last batch of data.
Examples
>>> from sklearn.cluster import Birch
>>> X = [[0, 1], [0.3, 1], [-0.3, 1], [0, -1], [0.3, -1], [-0.3, -1]]
>>> brc = Birch(branching_factor=50, n_clusters=None, threshold=0.5,
... compute_labels=True)
>>> brc.fit(X)
Birch(branching_factor=50, compute_labels=True, copy=True, n_clusters=None,
   threshold=0.5)
>>> brc.predict(X)
array([0, 0, 0, 1, 1, 1])
References
 Tian Zhang, Raghu Ramakrishnan, Maron Livny. BIRCH: An efficient data clustering method for large databases. http://www.cs.sfu.ca/CourseCentral/459/han/papers/zhang96.pdf
 Roberto Perdisci. JBirch - Java implementation of BIRCH clustering algorithm. https://code.google.com/archive/p/jbirch
Notes
The tree data structure consists of nodes with each node consisting of a number of subclusters. The maximum number of subclusters in a node is determined by the branching factor. Each subcluster maintains a linear sum, squared sum and the number of samples in that subcluster. In addition, each subcluster can also have a node as its child, if the subcluster is not a member of a leaf node.
For a new point entering the root, it is merged with the subcluster closest to it and the linear sum, squared sum and the number of samples of that subcluster are updated. This is done recursively till the properties of the leaf node are updated.
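The bookkeeping described in these notes (number of samples, linear sum and squared sum per subcluster) is enough to recover the centroid and radius without storing the points themselves. The sketch below is a simplified illustration; the class name CFSubcluster and its methods are hypothetical and do not reproduce the tree logic of the wrapped implementation.

```python
import math


class CFSubcluster:
    """Clustering-feature summary of a set of points: (n, linear sum, squared sum)."""

    def __init__(self, point):
        self.n_samples = 1
        self.linear_sum = list(point)
        self.squared_sum = sum(v * v for v in point)

    def merge_point(self, point):
        """Absorb one more point by updating the three summary statistics."""
        self.n_samples += 1
        self.linear_sum = [s + v for s, v in zip(self.linear_sum, point)]
        self.squared_sum += sum(v * v for v in point)

    @property
    def centroid(self):
        return [s / self.n_samples for s in self.linear_sum]

    @property
    def radius(self):
        """Root mean squared distance of the absorbed points to the centroid."""
        centroid_sq_norm = sum(c * c for c in self.centroid)
        variance = self.squared_sum / self.n_samples - centroid_sq_norm
        return math.sqrt(max(0.0, variance))  # guard against rounding error


subcluster = CFSubcluster([0.0, 1.0])
subcluster.merge_point([0.3, 1.0])
print(subcluster.centroid, subcluster.radius)
```

Comparing the radius after a tentative merge against the threshold parameter is what decides whether a new sample joins an existing subcluster or starts a new one.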
Full API documentation: BirchScikitsLearnNode

class
mdp.nodes.
QuantileTransformerScikitsLearnNode
¶ Transform features using quantiles information.
This node has been automatically generated by wrapping the sklearn.preprocessing.data.QuantileTransformer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
This method transforms the features to follow a uniform or a normal distribution. Therefore, for a given feature, this transformation tends to spread out the most frequent values. It also reduces the impact of (marginal) outliers: this is therefore a robust preprocessing scheme.
The transformation is applied on each feature independently. The cumulative distribution function of a feature is used to project the original values. Feature values of new/unseen data that fall below or above the fitted range will be mapped to the bounds of the output distribution. Note that this transform is non-linear. It may distort linear correlations between variables measured at the same scale but renders variables measured at different scales more directly comparable.
Read more in the User Guide.
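The projection through the cumulative distribution function can be sketched for a single feature and a uniform output distribution. This is a minimal illustration under stated assumptions: the helpers fit_quantiles and to_uniform are hypothetical, and the sketch uses plain linear interpolation between landmarks rather than the wrapped implementation.

```python
from bisect import bisect_left


def fit_quantiles(values, n_quantiles):
    """Compute n_quantiles evenly spaced empirical quantiles (the landmarks)."""
    ordered = sorted(values)
    last = len(ordered) - 1
    quantiles = []
    for k in range(n_quantiles):
        position = k / (n_quantiles - 1) * last
        low = int(position)
        high = min(low + 1, last)
        fraction = position - low
        quantiles.append(ordered[low] * (1 - fraction) + ordered[high] * fraction)
    return quantiles


def to_uniform(x, quantiles):
    """Map x to [0, 1] by interpolating between the fitted landmarks."""
    # Values outside the fitted range are mapped to the output bounds.
    if x <= quantiles[0]:
        return 0.0
    if x >= quantiles[-1]:
        return 1.0
    i = bisect_left(quantiles, x)  # quantiles[i - 1] < x <= quantiles[i]
    low, high = quantiles[i - 1], quantiles[i]
    step = 1.0 / (len(quantiles) - 1)
    return (i - 1 + (x - low) / (high - low)) * step


landmarks = fit_quantiles([1.0, 2.0, 3.0, 4.0, 5.0], n_quantiles=5)
print([to_uniform(x, landmarks) for x in (-1.0, 1.5, 3.0, 10.0)])
```

The clipping at both ends is what makes extreme outliers in unseen data land on the bounds of the output distribution.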
Parameters
 n_quantiles : int, optional (default=1000)
 Number of quantiles to be computed. It corresponds to the number of landmarks used to discretize the cumulative distribution function.
 output_distribution : str, optional (default=’uniform’)
 Marginal distribution for the transformed data. The choices are ‘uniform’ (default) or ‘normal’.
 ignore_implicit_zeros : bool, optional (default=False)
 Only applies to sparse matrices. If True, the sparse entries of the matrix are discarded to compute the quantile statistics. If False, these entries are treated as zeros.
 subsample : int, optional (default=1e5)
 Maximum number of samples used to estimate the quantiles for computational efficiency. Note that the subsampling procedure may differ for value-identical sparse and dense matrices.
 random_state : int, RandomState instance or None, optional (default=None)
 If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random. Note that this is used by subsampling and smoothing noise.
 copy : boolean, optional, (default=True)
 Set to False to perform in-place transformation and avoid a copy (if the input is already a numpy array).
Attributes
quantiles_
: ndarray, shape (n_quantiles, n_features) The values corresponding to the quantiles of reference.
references_
: ndarray, shape(n_quantiles, ) Quantiles of references.
Examples
>>> import numpy as np
>>> from sklearn.preprocessing import QuantileTransformer
>>> rng = np.random.RandomState(0)
>>> X = np.sort(rng.normal(loc=0.5, scale=0.25, size=(25, 1)), axis=0)
>>> qt = QuantileTransformer(n_quantiles=10, random_state=0)
>>> qt.fit_transform(X)
array([...])
See also
 quantile_transform : Equivalent function without the estimator API.
 PowerTransformer : Perform mapping to a normal distribution using a power transform.
 StandardScaler : Perform standardization that is faster, but less robust to outliers.
 RobustScaler : Perform robust standardization that removes the influence of outliers but does not put outliers and inliers on the same scale.
Notes
NaNs are treated as missing values: disregarded in fit, and maintained in transform.
For a comparison of the different scalers, transformers, and normalizers, see examples/preprocessing/plot_all_scaling.py.
Full API documentation: QuantileTransformerScikitsLearnNode

class
mdp.nodes.
CountVectorizerScikitsLearnNode
¶ Convert a collection of text documents to a matrix of token counts.
This node has been automatically generated by wrapping the sklearn.feature_extraction.text.CountVectorizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
This implementation produces a sparse representation of the counts using scipy.sparse.csr_matrix.
If you do not provide an a-priori dictionary and you do not use an analyzer that does some kind of feature selection then the number of features will be equal to the vocabulary size found by analyzing the data.
Read more in the User Guide.
Parameters
 input : string {‘filename’, ‘file’, ‘content’}
If ‘filename’, the sequence passed as an argument to fit is expected to be a list of filenames that need reading to fetch the raw content to analyze.
If ‘file’, the sequence items must have a ‘read’ method (file-like object) that is called to fetch the bytes in memory.
Otherwise the input is expected to be a sequence of string or bytes items that are analyzed directly.
 encoding : string, ‘utf-8’ by default.
 If bytes or files are given to analyze, this encoding is used to decode.
 decode_error : {‘strict’, ‘ignore’, ‘replace’}
 Instruction on what to do if a byte sequence is given to analyze that contains characters not of the given encoding. By default, it is ‘strict’, meaning that a UnicodeDecodeError will be raised. Other values are ‘ignore’ and ‘replace’.
 strip_accents : {‘ascii’, ‘unicode’, None}
Remove accents and perform other character normalization during the preprocessing step. ‘ascii’ is a fast method that only works on characters that have a direct ASCII mapping. ‘unicode’ is a slightly slower method that works on any characters. None (default) does nothing.
Both ‘ascii’ and ‘unicode’ use NFKD normalization from unicodedata.normalize().
 lowercase : boolean, True by default
 Convert all characters to lowercase before tokenizing.
 preprocessor : callable or None (default)
 Override the preprocessing (string transformation) stage while preserving the tokenizing and n-grams generation steps.
 tokenizer : callable or None (default)
 Override the string tokenization step while preserving the preprocessing and n-grams generation steps. Only applies if analyzer == 'word'.
 stop_words : string {‘english’}, list, or None (default)
If ‘english’, a built-in stop word list for English is used. There are several known issues with ‘english’ and you should consider an alternative (see stop_words).
If a list, that list is assumed to contain stop words, all of which will be removed from the resulting tokens. Only applies if analyzer == 'word'.
If None, no stop words will be used. max_df can be set to a value in the range [0.7, 1.0) to automatically detect and filter stop words based on intra corpus document frequency of terms.
 token_pattern : string
 Regular expression denoting what constitutes a “token”, only used if analyzer == 'word'. The default regexp selects tokens of 2 or more alphanumeric characters (punctuation is completely ignored and always treated as a token separator).
 ngram_range : tuple (min_n, max_n)
 The lower and upper boundary of the range of n-values for different n-grams to be extracted. All values of n such that min_n <= n <= max_n will be used.
 analyzer : string, {‘word’, ‘char’, ‘char_wb’} or callable
Whether the feature should be made of word or character n-grams. Option ‘char_wb’ creates character n-grams only from text inside word boundaries; n-grams at the edges of words are padded with space.
If a callable is passed it is used to extract the sequence of features out of the raw, unprocessed input.
 max_df : float in range [0.0, 1.0] or int, default=1.0
 When building the vocabulary ignore terms that have a document frequency strictly higher than the given threshold (corpus-specific stop words). If float, the parameter represents a proportion of documents; if integer, absolute counts. This parameter is ignored if vocabulary is not None.
 min_df : float in range [0.0, 1.0] or int, default=1
 When building the vocabulary ignore terms that have a document frequency strictly lower than the given threshold. This value is also called cut-off in the literature. If float, the parameter represents a proportion of documents; if integer, absolute counts. This parameter is ignored if vocabulary is not None.
 max_features : int or None, default=None
If not None, build a vocabulary that only considers the top max_features ordered by term frequency across the corpus.
This parameter is ignored if vocabulary is not None.
 vocabulary : Mapping or iterable, optional
 Either a Mapping (e.g., a dict) where keys are terms and values are indices in the feature matrix, or an iterable over terms. If not given, a vocabulary is determined from the input documents. Indices in the mapping should not be repeated and should not have any gap between 0 and the largest index.
 binary : boolean, default=False
 If True, all non-zero counts are set to 1. This is useful for discrete probabilistic models that model binary events rather than integer counts.
 dtype : type, optional
 Type of the matrix returned by fit_transform() or transform().
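The interplay of lowercase, token_pattern and ngram_range described in the parameter list can be sketched in plain Python. This is only an illustration and not the scikit-learn implementation: the helper name word_ngrams is hypothetical, and the regular expression is an assumed pattern that matches tokens of two or more word characters.

```python
import re

# Assumed pattern: tokens of two or more word characters; punctuation separates tokens.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def word_ngrams(text, ngram_range=(1, 1), lowercase=True):
    """Hypothetical helper: tokenize `text` and return its word n-grams."""
    if lowercase:
        text = text.lower()
    tokens = TOKEN_PATTERN.findall(text)
    min_n, max_n = ngram_range
    ngrams = []
    # Every n with min_n <= n <= max_n is used.
    for n in range(min_n, max_n + 1):
        for start in range(len(tokens) - n + 1):
            ngrams.append(" ".join(tokens[start:start + n]))
    return ngrams


unigrams = word_ngrams("This is a test.")
bigrams_too = word_ngrams("This is a test.", ngram_range=(1, 2))
```

The single-character token "a" is dropped by the pattern, so the bigrams are built from the remaining tokens only.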
Attributes
vocabulary_
: dict A mapping of terms to feature indices.
stop_words_
: set Terms that were ignored because they either:
 occurred in too many documents (max_df)
 occurred in too few documents (min_df)
 were cut off by feature selection (max_features).
This is only available if no vocabulary was given.
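How min_df and max_df decide which terms end up in the vocabulary and which are reported as ignored can be sketched as follows. The function filter_vocabulary and the toy documents are hypothetical; the sketch covers only the document-frequency rules and leaves out max_features.

```python
def filter_vocabulary(tokenized_docs, min_df=1, max_df=1.0):
    """Hypothetical helper: split terms into (kept, ignored) by document frequency."""
    n_docs = len(tokenized_docs)
    # A float threshold is a proportion of documents, an int an absolute count.
    max_count = max_df * n_docs if isinstance(max_df, float) else max_df
    min_count = min_df * n_docs if isinstance(min_df, float) else min_df

    doc_freq = {}
    for tokens in tokenized_docs:
        for term in set(tokens):  # count each term once per document
            doc_freq[term] = doc_freq.get(term, 0) + 1

    # Terms strictly above max_count or strictly below min_count are ignored.
    kept = {t for t, df in doc_freq.items() if min_count <= df <= max_count}
    ignored = set(doc_freq) - kept
    return kept, ignored


docs = [
    ["the", "first", "document"],
    ["the", "second", "document"],
    ["the", "third", "one"],
    ["the", "first", "again"],
]
kept, ignored = filter_vocabulary(docs, min_df=2, max_df=0.75)
```

Here "the" occurs in all four documents and exceeds the 75% limit, while the terms seen in a single document fall below min_df=2.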
Examples
>>> from sklearn.feature_extraction.text import CountVectorizer
>>> corpus = [
...     'This is the first document.',
...     'This document is the second document.',
...     'And this is the third one.',
...     'Is this the first document?',
... ]
>>> vectorizer = CountVectorizer()
>>> X = vectorizer.fit_transform(corpus)
>>> print(vectorizer.get_feature_names())
['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']
>>> print(X.toarray())
[[0 1 1 1 0 0 1 0 1]
 [0 2 0 1 0 1 1 0 1]
 [1 0 0 1 1 0 1 1 1]
 [0 1 1 1 0 0 1 0 1]]
See also
HashingVectorizer, TfidfVectorizer
Notes
The stop_words_ attribute can get large and increase the model size when pickling. This attribute is provided only for introspection and can be safely removed using delattr or set to None before pickling.
Full API documentation: CountVectorizerScikitsLearnNode

class
mdp.nodes.
ExtraTreesRegressorScikitsLearnNode
¶ An extra-trees regressor.
This node has been automatically generated by wrapping the sklearn.ensemble.forest.ExtraTreesRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
This class implements a meta estimator that fits a number of randomized decision trees (a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
Read more in the User Guide.
Parameters
 n_estimators : integer, optional (default=10)
The number of trees in the forest.
Changed in version 0.20: The default value of n_estimators will change from 10 in version 0.20 to 100 in version 0.22.
 criterion : string, optional (default=”mse”)
The function to measure the quality of a split. Supported criteria are “mse” for the mean squared error, which is equal to variance reduction as feature selection criterion, and “mae” for the mean absolute error.
New in version 0.18: Mean Absolute Error (MAE) criterion.
 max_depth : integer or None, optional (default=None)
 The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
 min_samples_split : int, float, optional (default=2)
The minimum number of samples required to split an internal node:
 If int, then consider min_samples_split as the minimum number.
 If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
Changed in version 0.18: Added float values for fractions.
 min_samples_leaf : int, float, optional (default=1)
The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.
 If int, then consider min_samples_leaf as the minimum number.
 If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.
Changed in version 0.18: Added float values for fractions.
 min_weight_fraction_leaf : float, optional (default=0.)
 The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.
 max_features : int, float, string or None, optional (default=”auto”)
The number of features to consider when looking for the best split:
 If int, then consider max_features features at each split.
 If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
 If “auto”, then max_features=n_features.
 If “sqrt”, then max_features=sqrt(n_features).
 If “log2”, then max_features=log2(n_features).
 If None, then max_features=n_features.
Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features.
 max_leaf_nodes : int or None, optional (default=None)
 Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.
 min_impurity_decrease : float, optional (default=0.)
A node will be split if this split induces a decrease of the impurity greater than or equal to this value.
The weighted impurity decrease equation is the following:
N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.
N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.
New in version 0.19.
 min_impurity_split : float, (default=1e-7)
Threshold for early stopping in tree growth. A node will split if its impurity is above the threshold, otherwise it is a leaf.
Deprecated since version 0.19: min_impurity_split has been deprecated in favor of min_impurity_decrease in 0.19. The default value of min_impurity_split will change from 1e-7 to 0 in 0.23 and it will be removed in 0.25. Use min_impurity_decrease instead.
 bootstrap : boolean, optional (default=False)
 Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.
 oob_score : bool, optional (default=False)
 Whether to use out-of-bag samples to estimate the R^2 on unseen data.
 n_jobs : int or None, optional (default=None)
 The number of jobs to run in parallel for both fit and predict. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
 random_state : int, RandomState instance or None, optional (default=None)
 If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
 verbose : int, optional (default=0)
 Controls the verbosity when fitting and predicting.
 warm_start : bool, optional (default=False)
 When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new forest. See the Glossary.
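The weighted impurity decrease used by min_impurity_decrease can be evaluated by hand. The sketch below writes the expression out with both subtractions explicit; the function names and the sample counts are hypothetical, and the counts would be weighted sums if sample_weight were in use.

```python
def weighted_impurity_decrease(n_total, n_node, n_left, n_right,
                               impurity, left_impurity, right_impurity):
    """Evaluate the weighted impurity decrease of a candidate split."""
    return (n_node / n_total) * (
        impurity
        - (n_right / n_node) * right_impurity
        - (n_left / n_node) * left_impurity
    )


def should_split(decrease, min_impurity_decrease=0.0):
    """A node is split when the decrease is at least the threshold."""
    return decrease >= min_impurity_decrease


# Hypothetical numbers: a node holding 40 of 100 samples, split 10 / 30.
decrease = weighted_impurity_decrease(
    n_total=100, n_node=40, n_left=10, n_right=30,
    impurity=0.5, left_impurity=0.0, right_impurity=0.4,
)
```

With these numbers the decrease is 0.4 * (0.5 - 0.75 * 0.4 - 0.25 * 0.0), roughly 0.08, so the split would pass a threshold of 0.05 but not one of 0.1.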
Attributes
estimators_
: list of DecisionTreeRegressor The collection of fitted sub-estimators.
feature_importances_
: array of shape = [n_features] The feature importances (the higher, the more important the feature).
n_features_
: int The number of features.
n_outputs_
: int The number of outputs.
oob_score_
: float Score of the training dataset obtained using an out-of-bag estimate.
oob_prediction_
: array of shape = [n_samples] Prediction computed with out-of-bag estimate on the training set.
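The int-or-float convention of min_samples_split and the named options of max_features from the parameter list can be made concrete with two small helpers. Both function names are hypothetical, and truncating sqrt and log2 to an integer is an assumption of this sketch rather than something the text above states.

```python
import math


def resolve_min_samples_split(min_samples_split, n_samples):
    """Int: used as is. Float: ceil(fraction * n_samples)."""
    if isinstance(min_samples_split, int):
        return min_samples_split
    return math.ceil(min_samples_split * n_samples)


def resolve_max_features(max_features, n_features):
    """Map the documented max_features options to a feature count."""
    if max_features is None or max_features == "auto":
        return n_features
    if max_features == "sqrt":
        return int(math.sqrt(n_features))  # assumption: truncate to int
    if max_features == "log2":
        return int(math.log2(n_features))  # assumption: truncate to int
    if isinstance(max_features, int):
        return max_features
    return int(max_features * n_features)  # float: fraction of the features
```

For example, min_samples_split=0.25 on 10 samples resolves to ceil(2.5) = 3 samples per split.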
Notes
The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.
References
[1] P. Geurts, D. Ernst., and L. Wehenkel, “Extremely randomized trees”, Machine Learning, 63(1), 3-42, 2006.
See also
sklearn.tree.ExtraTreeRegressor: Base estimator for this ensemble.
RandomForestRegressor: Ensemble regressor using trees with optimal splits.
Full API documentation: ExtraTreesRegressorScikitsLearnNode

class
mdp.nodes.
LabelPropagationScikitsLearnNode
¶ Label Propagation classifier.
This node has been automatically generated by wrapping the sklearn.semi_supervised.label_propagation.LabelPropagation class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Read more in the User Guide.
Parameters
 kernel : {‘knn’, ‘rbf’, callable}
 String identifier for kernel function to use or the kernel function itself. Only ‘rbf’ and ‘knn’ strings are valid inputs. The function passed should take two inputs, each of shape [n_samples, n_features], and return a [n_samples, n_samples] shaped weight matrix.
 gamma : float
 Parameter for rbf kernel
 n_neighbors : integer > 0
 Parameter for knn kernel
 alpha : float
Clamping factor.
Deprecated since version 0.19: This parameter will be removed in 0.21. ‘alpha’ is fixed to zero in ‘LabelPropagation’.
 max_iter : integer
 Change maximum number of iterations allowed
 tol : float
 Convergence tolerance: threshold to consider the system at steady state
 n_jobs : int or None, optional (default=None)
 The number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
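The contract for a callable kernel (two inputs of shape [n_samples, n_features], one [n_samples, n_samples] weight matrix out) can be illustrated with a hand-written RBF kernel. This sketch uses plain nested lists to stay dependency-free, whereas the wrapped estimator works on arrays; the function name rbf_kernel, the sample points and the gamma value are all assumptions made for illustration.

```python
import math


def rbf_kernel(X, Y, gamma=20.0):
    """Return the weight matrix W with W[i][j] = exp(-gamma * ||X[i] - Y[j]||^2)."""
    weights = []
    for x in X:
        row = []
        for y in Y:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
            row.append(math.exp(-gamma * sq_dist))
        weights.append(row)
    return weights


points = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]]
W = rbf_kernel(points, points, gamma=20.0)
```

Nearby points receive weights close to 1 and distant points weights close to 0, which is what lets labels spread mainly between close neighbors.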
Attributes
X_
: array, shape = [n_samples, n_features] Input array.
classes_
: array, shape = [n_classes] The distinct labels used in classifying instances.
label_distributions_
: array, shape = [n_samples, n_classes] Categorical distribution for each item.
transduction_
: array, shape = [n_samples] Label assigned to each item via the transduction.
n_iter_
: int Number of iterations run.
Examples
>>> import numpy as np
>>> from sklearn import datasets
>>> from sklearn.semi_supervised import LabelPropagation
>>> label_prop_model = LabelPropagation()
>>> iris = datasets.load_iris()
>>> rng = np.random.RandomState(42)
>>> random_unlabeled_points = rng.rand(len(iris.target)) < 0.3
>>> labels = np.copy(iris.target)
>>> labels[random_unlabeled_points] = -1
>>> label_prop_model.fit(iris.data, labels)
LabelPropagation(...)
References
Xiaojin Zhu and Zoubin Ghahramani. Learning from labeled and unlabeled data with label propagation. Technical Report CMU-CALD-02-107, Carnegie Mellon University, 2002 http://pages.cs.wisc.edu/~jerryzhu/pub/CMU-CALD-02-107.pdf
See Also
LabelSpreading : Alternate label propagation strategy more robust to noise
Full API documentation: LabelPropagationScikitsLearnNode

class
mdp.nodes.
GaussianMixtureScikitsLearnNode
¶ Gaussian Mixture.
This node has been automatically generated by wrapping the sklearn.mixture.gaussian_mixture.GaussianMixture class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Representation of a Gaussian mixture model probability distribution. This class allows to estimate the parameters of a Gaussian mixture distribution.
Read more in the User Guide.
New in version 0.18.
Parameters
 n_components : int, defaults to 1.
 The number of mixture components.
 covariance_type : {‘full’ (default), ‘tied’, ‘diag’, ‘spherical’}
String describing the type of covariance parameters to use. Must be one of:
 ‘full’
 each component has its own general covariance matrix
 ‘tied’
 all components share the same general covariance matrix
 ‘diag’
 each component has its own diagonal covariance matrix
 ‘spherical’
 each component has its own single variance
 tol : float, defaults to 1e-3.
 The convergence threshold. EM iterations will stop when the lower bound average gain is below this threshold.
 reg_covar : float, defaults to 1e-6.
 Non-negative regularization added to the diagonal of covariance. Ensures that the covariance matrices are all positive.
 max_iter : int, defaults to 100.
 The number of EM iterations to perform.
 n_init : int, defaults to 1.
 The number of initializations to perform. The best results are kept.
 init_params : {‘kmeans’, ‘random’}, defaults to ‘kmeans’.
The method used to initialize the weights, the means and the precisions. Must be one of:
 'kmeans' : responsibilities are initialized using kmeans.
 'random' : responsibilities are initialized randomly.
 weights_init : array-like, shape (n_components, ), optional
 The user-provided initial weights, defaults to None. If None, weights are initialized using the init_params method.
 means_init : array-like, shape (n_components, n_features), optional
 The user-provided initial means, defaults to None. If None, means are initialized using the init_params method.
 precisions_init : array-like, optional.
The user-provided initial precisions (inverse of the covariance matrices), defaults to None. If None, precisions are initialized using the ‘init_params’ method. The shape depends on ‘covariance_type’:
 (n_components,) if 'spherical',
 (n_features, n_features) if 'tied',
 (n_components, n_features) if 'diag',
 (n_components, n_features, n_features) if 'full'
 random_state : int, RandomState instance or None, optional (default=None)
 If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
 warm_start : bool, defaults to False.
 If ‘warm_start’ is True, the solution of the last fitting is used as initialization for the next call of fit(). This can speed up convergence when fit is called several times on similar problems. In that case, ‘n_init’ is ignored and only a single initialization occurs upon the first call. See the Glossary.
 verbose : int, defaults to 0.
 Enable verbose output. If 1 then it prints the current initialization and each iteration step. If greater than 1 then it prints also the log probability and the time needed for each step.
 verbose_interval : int, defaults to 10.
 Number of iterations done before the next print.
Attributes
weights_
: array-like, shape (n_components,) The weights of each mixture component.
means_
: array-like, shape (n_components, n_features) The mean of each mixture component.
covariances_
: array-like The covariance of each mixture component. The shape depends on covariance_type:
 (n_components,) if 'spherical',
 (n_features, n_features) if 'tied',
 (n_components, n_features) if 'diag',
 (n_components, n_features, n_features) if 'full'
precisions_
: array-like The precision matrices for each component in the mixture. A precision matrix is the inverse of a covariance matrix. A covariance matrix is symmetric positive definite so the mixture of Gaussian can be equivalently parameterized by the precision matrices. Storing the precision matrices instead of the covariance matrices makes it more efficient to compute the log-likelihood of new samples at test time. The shape depends on covariance_type:
 (n_components,) if 'spherical',
 (n_features, n_features) if 'tied',
 (n_components, n_features) if 'diag',
 (n_components, n_features, n_features) if 'full'
precisions_cholesky_
: array-like The cholesky decomposition of the precision matrices of each mixture component. A precision matrix is the inverse of a covariance matrix. A covariance matrix is symmetric positive definite so the mixture of Gaussian can be equivalently parameterized by the precision matrices. Storing the precision matrices instead of the covariance matrices makes it more efficient to compute the log-likelihood of new samples at test time. The shape depends on covariance_type:
 (n_components,) if 'spherical',
 (n_features, n_features) if 'tied',
 (n_components, n_features) if 'diag',
 (n_components, n_features, n_features) if 'full'
converged_
: bool True when convergence was reached in fit(), False otherwise.
n_iter_
: int Number of steps used by the best fit of EM to reach the convergence.
lower_bound_
: float Lower bound value on the log-likelihood (of the training data with respect to the model) of the best fit of EM.
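The way covariance_type determines the shape of covariances_ (and likewise of precisions_ and precisions_cholesky_) can be summarized in a lookup. The helper names below are hypothetical; the second function simply counts how many values each layout stores, which shows why 'full' is the most expensive option.

```python
def covariance_shape(covariance_type, n_components, n_features):
    """Return the documented shape of covariances_ for a covariance type."""
    shapes = {
        "spherical": (n_components,),
        "tied": (n_features, n_features),
        "diag": (n_components, n_features),
        "full": (n_components, n_features, n_features),
    }
    if covariance_type not in shapes:
        raise ValueError("unknown covariance_type: %r" % (covariance_type,))
    return shapes[covariance_type]


def n_covariance_values(covariance_type, n_components, n_features):
    """Number of stored values, i.e. the product of the shape entries."""
    count = 1
    for dim in covariance_shape(covariance_type, n_components, n_features):
        count *= dim
    return count
```

For 3 components in 2 dimensions, 'full' stores 12 values, 'diag' 6, 'tied' 4 and 'spherical' 3.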
See Also
 BayesianGaussianMixture : Gaussian mixture model fit with a variational inference.
Full API documentation: GaussianMixtureScikitsLearnNode

class
mdp.nodes.
MeanEstimatorScikitsLearnNode
¶ This node has been automatically generated by wrapping the sklearn.ensemble.gradient_boosting.MeanEstimator class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Full API documentation: MeanEstimatorScikitsLearnNode

class
mdp.nodes.
SelectFromModelScikitsLearnNode
¶ Meta-transformer for selecting features based on importance weights.
This node has been automatically generated by wrapping the sklearn.feature_selection.from_model.SelectFromModel class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
New in version 0.17.
Parameters
 estimator : object
 The base estimator from which the transformer is built.
This can be both a fitted (if prefit is set to True) or a non-fitted estimator. The estimator must have either a feature_importances_ or coef_ attribute after fitting.
 threshold : string, float, optional default None
 The threshold value to use for feature selection. Features whose importance is greater or equal are kept while the others are discarded. If “median” (resp. “mean”), then the threshold value is the median (resp. the mean) of the feature importances. A scaling factor (e.g., “1.25*mean”) may also be used. If None and if the estimator has a parameter penalty set to l1, either explicitly or implicitly (e.g, Lasso), the threshold used is 1e-5. Otherwise, “mean” is used by default.
 prefit : bool, default False
 Whether a prefit model is expected to be passed into the constructor directly or not. If True, transform must be called directly and SelectFromModel cannot be used with cross_val_score, GridSearchCV and similar utilities that clone the estimator. Otherwise train the model using fit and then transform to do feature selection.
 norm_order : non-zero int, inf, -inf, default 1
 Order of the norm used to filter the vectors of coefficients below threshold in the case where the coef_ attribute of the estimator is of dimension 2.
 max_features : int or None, optional
The maximum number of features selected scoring above threshold. To disable threshold and only select based on max_features, set threshold=-np.inf.
New in version 0.20.
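The threshold rules (a float, "mean", "median", or a scaled form such as "1.25*mean") can be sketched with the standard library. The helpers resolve_threshold and select_features are hypothetical and operate on a plain list of importances; the penalty-dependent default and max_features are not modelled.

```python
import statistics


def resolve_threshold(importances, threshold="mean"):
    """Turn a float or a string such as "median" or "1.25*mean" into a float."""
    if not isinstance(threshold, str):
        return float(threshold)
    scale = 1.0
    reference = threshold
    if "*" in threshold:
        factor, reference = threshold.split("*")
        scale = float(factor.strip())
        reference = reference.strip()
    if reference == "mean":
        return scale * statistics.mean(importances)
    if reference == "median":
        return scale * statistics.median(importances)
    raise ValueError("expected 'mean' or 'median', got %r" % (reference,))


def select_features(importances, threshold="mean"):
    """Indices of the features whose importance is >= the threshold."""
    cutoff = resolve_threshold(importances, threshold)
    return [i for i, value in enumerate(importances) if value >= cutoff]


importances = [0.5, 0.125, 0.25, 0.125]
```

With these importances the mean is 0.25, so the default keeps features 0 and 2, while "1.25*mean" raises the cut-off to 0.3125 and keeps only feature 0.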
Attributes
estimator_
: an estimator The base estimator from which the transformer is built.
This is stored only when a non-fitted estimator is passed to the SelectFromModel, i.e when prefit is False.
threshold_
: float The threshold value used for feature selection.
Full API documentation: SelectFromModelScikitsLearnNode

class
mdp.nodes.
RadiusNeighborsRegressorScikitsLearnNode
¶ Regression based on neighbors within a fixed radius.
This node has been automatically generated by wrapping the sklearn.neighbors.regression.RadiusNeighborsRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The target is predicted by local interpolation of the targets associated with the nearest neighbors in the training set.
Read more in the User Guide.
Parameters
 radius : float, optional (default = 1.0)
 Range of parameter space to use by default for radius_neighbors() queries.
 weights : str or callable
Weight function used in prediction. Possible values:
 ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
 ‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
 [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.
Uniform weights are used by default.
 algorithm : {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, optional
Algorithm used to compute the nearest neighbors:
 ‘ball_tree’ will use BallTree
 ‘kd_tree’ will use KDTree
 ‘brute’ will use a brute-force search.
 ‘auto’ will attempt to decide the most appropriate algorithm based on the values passed to fit() method.
Note: fitting on sparse input will override the setting of this parameter, using brute force.
 leaf_size : int, optional (default = 30)
 Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.
 p : integer, optional (default = 2)
 Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.
 metric : string or callable, default ‘minkowski’
 the distance metric to use for the tree. The default metric is minkowski, and with p=2 is equivalent to the standard Euclidean metric. See the documentation of the DistanceMetric class for a list of available metrics.
 metric_params : dict, optional (default = None)
 Additional keyword arguments for the metric function.
 n_jobs : int or None, optional (default=None)
 The number of parallel jobs to run for neighbors search. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
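How radius, weights and p combine into a prediction can be sketched with a brute-force search in plain Python. The function names are hypothetical, and two details are assumptions of the sketch rather than statements from the text: points at exactly the radius count as neighbors, and a query with no neighbor in range returns None.

```python
def minkowski_distance(a, b, p=2):
    """Minkowski distance: p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def radius_predict(X_train, y_train, query, radius=1.0, weights="uniform", p=2):
    """Predict a target from the training points within `radius` of `query`."""
    neighbors = []
    for x, y in zip(X_train, y_train):
        dist = minkowski_distance(x, query, p)
        if dist <= radius:
            neighbors.append((dist, y))
    if not neighbors:
        return None  # assumption: no neighbor in range yields no prediction
    if weights == "uniform":
        w = [1.0 for _ in neighbors]
    elif weights == "distance":
        # If the query coincides with training points, only those points count.
        exact = [y for dist, y in neighbors if dist == 0.0]
        if exact:
            return sum(exact) / len(exact)
        w = [1.0 / dist for dist, _ in neighbors]
    else:
        w = weights([dist for dist, _ in neighbors])  # user-defined callable
    return sum(wi * y for wi, (_, y) in zip(w, neighbors)) / sum(w)


X_train = [[0], [1], [2], [3]]
y_train = [0, 0, 1, 1]
prediction = radius_predict(X_train, y_train, [1.5], radius=1.0)
```

For the query 1.5 only the points 1 and 2 lie within the radius, so the uniform prediction is the mean of their targets, 0.5.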
Examples
>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import RadiusNeighborsRegressor
>>> neigh = RadiusNeighborsRegressor(radius=1.0)
>>> neigh.fit(X, y)
RadiusNeighborsRegressor(...)
>>> print(neigh.predict([[1.5]]))
[0.5]
See also
NearestNeighbors, KNeighborsRegressor, KNeighborsClassifier, RadiusNeighborsClassifier
Notes
See Nearest Neighbors in the online documentation for a discussion of the choice of algorithm and leaf_size.
https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
Full API documentation: RadiusNeighborsRegressorScikitsLearnNode

class
mdp.nodes.
PLSSVDScikitsLearnNode
¶ Partial Least Square SVD.
This node has been automatically generated by wrapping the sklearn.cross_decomposition.pls_.PLSSVD class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Simply perform an SVD on the cross-covariance matrix: X’Y. There is no iterative deflation here.
Read more in the User Guide.
Parameters
 n_components : int, default 2
 Number of components to keep.
 scale : boolean, default True
 Whether to scale X and Y.
 copy : boolean, default True
 Whether to copy X and Y, or perform in-place computations.
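The matrix that the description refers to, the cross-covariance X’Y, can be formed by hand. The sketch below only centers the columns and builds X’Y from nested lists; the scaling applied when scale is True and the SVD step itself are left out, and the helper names and data are hypothetical.

```python
def center_columns(M):
    """Subtract the column mean from every column of a row-major matrix."""
    n_rows = len(M)
    means = [sum(col) / n_rows for col in zip(*M)]
    return [[value - mean for value, mean in zip(row, means)] for row in M]


def cross_covariance(X, Y):
    """Return X'Y (shape p x q) for centered X (n x p) and Y (n x q)."""
    Xc, Yc = center_columns(X), center_columns(Y)
    return [
        [sum(x_col[i] * y_col[i] for i in range(len(Xc))) for y_col in zip(*Yc)]
        for x_col in zip(*Xc)
    ]


# Hypothetical data: 4 samples, 3 X variables and 2 Y variables.
X = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [2.0, 2.0, 2.0], [2.0, 5.0, 4.0]]
Y = [[0.0, 1.0], [1.0, 1.0], [2.0, 3.0], [5.0, 3.0]]
C = cross_covariance(X, Y)
```

The result has one row per X variable and one column per Y variable, which matches the [p, n_components] and [q, n_components] shapes of the weight vectors obtained from its singular vectors.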
Attributes
x_weights_
: array, [p, n_components] X block weights vectors.
y_weights_
: array, [q, n_components] Y block weights vectors.
x_scores_
: array, [n_samples, n_components] X scores.
y_scores_
: array, [n_samples, n_components] Y scores.
Examples
>>> import numpy as np
>>> from sklearn.cross_decomposition import PLSSVD
>>> X = np.array([[0., 0., 1.],
...     [1.,0.,0.],
...     [2.,2.,2.],
...     [2.,5.,4.]])
>>> Y = np.array([[0.1, -0.2],
...     [0.9, 1.1],
...     [6.2, 5.9],
...     [11.9, 12.3]])
>>> plsca = PLSSVD(n_components=2)
>>> plsca.fit(X, Y)
PLSSVD(copy=True, n_components=2, scale=True)
>>> X_c, Y_c = plsca.transform(X, Y)
>>> X_c.shape, Y_c.shape
((4, 2), (4, 2))
See also
PLSCanonical, CCA
Full API documentation: PLSSVDScikitsLearnNode

class
mdp.nodes.
GaussianRandomProjectionScikitsLearnNode
¶ Reduce dimensionality through Gaussian random projection.
This node has been automatically generated by wrapping the sklearn.random_projection.GaussianRandomProjection class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The components of the random matrix are drawn from N(0, 1 / n_components).
Read more in the User Guide.
Parameters
 n_components : int or ‘auto’, optional (default = ‘auto’)
Dimensionality of the target projection space.
n_components can be automatically adjusted according to the number of samples in the dataset and the bound given by the Johnson-Lindenstrauss lemma. In that case the quality of the embedding is controlled by the eps parameter.
It should be noted that the Johnson-Lindenstrauss lemma can yield very conservative estimates of the required number of components as it makes no assumption on the structure of the dataset.
 eps : strictly positive float, optional (default=0.1)
Parameter to control the quality of the embedding according to the Johnson-Lindenstrauss lemma when n_components is set to ‘auto’.
Smaller values lead to better embedding and higher number of dimensions (n_components) in the target projection space.
 random_state : int, RandomState instance or None, optional (default=None)
 Control the pseudo random number generator used to generate the matrix at fit time. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
Attributes
n_components_
: int Concrete number of components computed when n_components=”auto”.
components_
: numpy array of shape [n_components, n_features] Random matrix used for the projection.
Examples
>>> import numpy as np
>>> from sklearn.random_projection import GaussianRandomProjection
>>> X = np.random.rand(100, 10000)
>>> transformer = GaussianRandomProjection()
>>> X_new = transformer.fit_transform(X)
>>> X_new.shape
(100, 3947)
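The 3947 columns in the example output follow from the Johnson-Lindenstrauss bound for 100 samples and eps=0.1. The sketch below evaluates the bound in its commonly stated form, 4 ln(n_samples) / (eps^2 / 2 - eps^3 / 3), and draws a small random matrix with entries from N(0, 1 / n_components). The function names are hypothetical and this is an illustration, not the wrapped implementation.

```python
import math
import random


def jl_min_components(n_samples, eps=0.1):
    """Johnson-Lindenstrauss bound on the number of components."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    denominator = eps ** 2 / 2 - eps ** 3 / 3
    return int(4 * math.log(n_samples) / denominator)


def gaussian_random_matrix(n_components, n_features, seed=0):
    """Draw an n_components x n_features matrix from N(0, 1 / n_components)."""
    rng = random.Random(seed)
    sigma = math.sqrt(1.0 / n_components)  # standard deviation, not variance
    return [[rng.gauss(0.0, sigma) for _ in range(n_features)]
            for _ in range(n_components)]


n_components = jl_min_components(n_samples=100, eps=0.1)
```

The bound depends only on the number of samples and eps, not on the 10000 input features, and a larger eps gives a smaller target dimension at the cost of a looser embedding.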
See Also
SparseRandomProjection
Full API documentation: GaussianRandomProjectionScikitsLearnNode

class
mdp.nodes.
OneHotEncoderScikitsLearnNode
¶ Encode categorical integer features as a one-hot numeric array.
This node has been automatically generated by wrapping the sklearn.preprocessing._encoders.OneHotEncoder class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The input to this transformer should be an array-like of integers or strings, denoting the values taken on by categorical (discrete) features. The features are encoded using a one-hot (aka ‘one-of-K’ or ‘dummy’) encoding scheme. This creates a binary column for each category and returns a sparse matrix or dense array.
By default, the encoder derives the categories based on the unique values in each feature. Alternatively, you can also specify the categories manually. The OneHotEncoder previously assumed that the input features take on values in the range [0, max(values)). This behaviour is deprecated.
This encoding is needed for feeding categorical data to many scikit-learn estimators, notably linear models and SVMs with the standard kernels.
Note: a one-hot encoding of y labels should use a LabelBinarizer instead.
Read more in the User Guide.
Parameters
 categories : ‘auto’ or a list of lists/arrays of values, default=’auto’.
Categories (unique values) per feature:
 ‘auto’ : Determine categories automatically from the training data.
 list : categories[i] holds the categories expected in the ith column. The passed categories should not mix strings and numeric values within a single feature, and should be sorted in case of numeric values.
The used categories can be found in the categories_ attribute.
 sparse : boolean, default=True
 Will return sparse matrix if set True else will return an array.
 dtype : number type, default=np.float
 Desired dtype of output.
 handle_unknown : ‘error’ or ‘ignore’, default=’error’.
 Whether to raise an error or ignore if an unknown categorical feature is present during transform (default is to raise). When this parameter is set to ‘ignore’ and an unknown category is encountered during transform, the resulting one-hot encoded columns for this feature will be all zeros. In the inverse transform, an unknown category will be denoted as None.
 n_values : ‘auto’, int or array of ints, default=’auto’
Number of values per feature.
 ‘auto’ : determine value range from training data.
 int : number of categorical values per feature. Each feature value should be in range(n_values).
 array : n_values[i] is the number of categorical values in X[:, i]. Each feature value should be in range(n_values[i]).
Deprecated since version 0.20: The n_values keyword was deprecated in version 0.20 and will be removed in 0.22. Use categories instead.
 categorical_features : ‘all’ or array of indices or mask, default=’all’
Specify what features are treated as categorical.
 ‘all’: All features are treated as categorical.
 array of indices: Array of categorical feature indices.
 mask: Array of length n_features and with dtype=bool.
Non-categorical features are always stacked to the right of the matrix.
Deprecated since version 0.20: The categorical_features keyword was deprecated in version 0.20 and will be removed in 0.22. You can use the
ColumnTransformer
instead.
Attributes
categories_
: list of arrays The categories of each feature determined during fitting (in order of the features in X and corresponding with the output of transform).
active_features_
: array Indices for active features, meaning values that actually occur in the training set. Only available when n_values is 'auto'.
Deprecated since version 0.20: The active_features_ attribute was deprecated in version 0.20 and will be removed in 0.22.
feature_indices_
: array of shape (n_features,) Indices to feature ranges. Feature i in the original data is mapped to features from feature_indices_[i] to feature_indices_[i+1] (and then potentially masked by active_features_ afterwards).
Deprecated since version 0.20: The feature_indices_ attribute was deprecated in version 0.20 and will be removed in 0.22.
n_values_
: array of shape (n_features,) Maximum number of values per feature.
Deprecated since version 0.20: The n_values_ attribute was deprecated in version 0.20 and will be removed in 0.22.
Examples
Given a dataset with two features, we let the encoder find the unique values per feature and transform the data to a binary onehot encoding.
>>> from sklearn.preprocessing import OneHotEncoder
>>> enc = OneHotEncoder(handle_unknown='ignore')
>>> X = [['Male', 1], ['Female', 3], ['Female', 2]]
>>> enc.fit(X)
OneHotEncoder(categorical_features=None, categories=None,
       dtype=<... 'numpy.float64'>, handle_unknown='ignore',
       n_values=None, sparse=True)
>>> enc.categories_
[array(['Female', 'Male'], dtype=object), array([1, 2, 3], dtype=object)]
>>> enc.transform([['Female', 1], ['Male', 4]]).toarray()
array([[1., 0., 1., 0., 0.],
       [0., 1., 0., 0., 0.]])
>>> enc.inverse_transform([[0, 1, 1, 0, 0], [0, 0, 0, 1, 0]])
array([['Male', 1],
       [None, 2]], dtype=object)
>>> enc.get_feature_names()
array(['x0_Female', 'x0_Male', 'x1_1', 'x1_2', 'x1_3'], dtype=object)
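To make the handle_unknown='ignore' rule concrete, the following is a minimal pure-Python sketch of the encoding described above. It is an illustration only, not scikit-learn's implementation: the helper name one_hot_rows is hypothetical, and the categories are passed in explicitly instead of being learned by fit.

```python
def one_hot_rows(rows, categories):
    """Encode each row as concatenated indicator columns, one per category.

    A value that is missing from its feature's category list sets no 1 in
    that feature's block, so the block stays all zeros ('ignore' behaviour).
    """
    encoded = []
    for row in rows:
        out = []
        for value, cats in zip(row, categories):
            out.extend(1.0 if value == cat else 0.0 for cat in cats)
        encoded.append(out)
    return encoded


# Categories listed in the same order as enc.categories_ in the example.
categories = [["Female", "Male"], [1, 2, 3]]
print(one_hot_rows([["Female", 1], ["Male", 4]], categories))
```

The value 4 in the second row is not among the categories of the second feature, so its three columns remain zero, which agrees with the transform output shown in the example.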
See also
 sklearn.preprocessing.OrdinalEncoder : performs an ordinal (integer)
 encoding of the categorical features.
 sklearn.feature_extraction.DictVectorizer : performs a one-hot encoding of
 dictionary items (also handles string-valued features).
 sklearn.feature_extraction.FeatureHasher : performs an approximate one-hot
 encoding of dictionary items or strings.
 sklearn.preprocessing.LabelBinarizer : binarizes labels in a one-vs-all
 fashion.
 sklearn.preprocessing.MultiLabelBinarizer : transforms between iterable of
 iterables and a multilabel format, e.g. a (samples x classes) binary matrix indicating the presence of a class label.
Full API documentation: OneHotEncoderScikitsLearnNode

class
mdp.nodes.
KNeighborsRegressorScikitsLearnNode
¶ Regression based on k-nearest neighbors. This node has been automatically generated by wrapping the
sklearn.neighbors.regression.KNeighborsRegressor
class from the sklearn
library. The wrapped instance can be accessed through the scikits_alg
attribute. The target is predicted by local interpolation of the targets associated with the nearest neighbors in the training set. Read more in the User Guide.
Parameters
 n_neighbors : int, optional (default = 5)
 Number of neighbors to use by default for
kneighbors()
queries.
 weights : str or callable
Weight function used in prediction. Possible values:
 ‘uniform’ : uniform weights. All points in each neighborhood are weighted equally.
 ‘distance’ : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.
 [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.
Uniform weights are used by default.
 algorithm : {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, optional
Algorithm used to compute the nearest neighbors:
 ‘ball_tree’ will use
BallTree
 ‘kd_tree’ will use
KDTree
 ‘brute’ will use a brute-force search.
 ‘auto’ will attempt to decide the most appropriate algorithm
based on the values passed to
fit()
method.
Note: fitting on sparse input will override the setting of this parameter, using brute force.
 leaf_size : int, optional (default = 30)
 Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.
 p : integer, optional (default = 2)
 Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.
 metric : string or callable, default ‘minkowski’
 the distance metric to use for the tree. The default metric is minkowski, and with p=2 is equivalent to the standard Euclidean metric. See the documentation of the DistanceMetric class for a list of available metrics.
 metric_params : dict, optional (default = None)
 Additional keyword arguments for the metric function.
 n_jobs : int or None, optional (default=None)
 The number of parallel jobs to run for neighbors search.
None
means 1 unless in a joblib.parallel_backend
context. -1
means using all processors. See Glossary for more details. Doesn’t affect fit()
method.
Examples
>>> X = [[0], [1], [2], [3]]
>>> y = [0, 0, 1, 1]
>>> from sklearn.neighbors import KNeighborsRegressor
>>> neigh = KNeighborsRegressor(n_neighbors=2)
>>> neigh.fit(X, y)
KNeighborsRegressor(...)
>>> print(neigh.predict([[1.5]]))
[0.5]
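The difference between the ‘uniform’ and ‘distance’ settings of the weights parameter can be sketched in a few lines of plain Python for one-dimensional inputs. This is a simplified illustration rather than the wrapped estimator's code: the function name knn_regress is hypothetical, and the handling of zero distances is an assumption of this sketch.

```python
def knn_regress(train_x, train_y, query, k=2, weights="uniform"):
    """Predict a target for a 1-D query from its k nearest training points."""
    # Pair each target with its distance to the query and keep the k closest.
    pairs = [(abs(x - query), y) for x, y in zip(train_x, train_y)]
    pairs.sort(key=lambda pair: pair[0])
    neighbors = pairs[:k]

    if weights == "uniform":
        return sum(y for _, y in neighbors) / len(neighbors)
    if weights == "distance":
        # Assumption of this sketch: exact matches decide the prediction,
        # which avoids dividing by a zero distance.
        exact = [y for d, y in neighbors if d == 0]
        if exact:
            return sum(exact) / len(exact)
        weighted = [(1.0 / d, y) for d, y in neighbors]
        return sum(w * y for w, y in weighted) / sum(w for w, _ in weighted)
    raise ValueError("weights must be 'uniform' or 'distance'")


X = [0, 1, 2, 3]
y = [0, 0, 1, 1]
print(knn_regress(X, y, 1.5))                       # midpoint between targets 0 and 1
print(knn_regress(X, y, 1.25, weights="distance"))  # pulled toward the closer neighbor
```

With uniform weights both neighbors count equally, while inverse-distance weighting moves the prediction toward the target of the nearer training point.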
See also
NearestNeighbors RadiusNeighborsRegressor KNeighborsClassifier RadiusNeighborsClassifier
Notes
See Nearest Neighbors in the online documentation for a discussion of the choice of
algorithm
and leaf_size.
Warning
Regarding the Nearest Neighbors algorithms, if it is found that two neighbors, neighbor k+1 and k, have identical distances but different labels, the results will depend on the ordering of the training data.
https://en.wikipedia.org/wiki/K-nearest_neighbor_algorithm
Full API documentation: KNeighborsRegressorScikitsLearnNode

class
mdp.nodes.
LocalOutlierFactorScikitsLearnNode
¶ Unsupervised Outlier Detection using Local Outlier Factor (LOF). This node has been automatically generated by wrapping the
sklearn.neighbors.lof.LocalOutlierFactor
class from the sklearn
library. The wrapped instance can be accessed through the scikits_alg
attribute. The anomaly score of each sample is called Local Outlier Factor. It measures the local deviation of density of a given sample with respect to its neighbors. It is local in that the anomaly score depends on how isolated the object is with respect to the surrounding neighborhood. More precisely, locality is given by k-nearest neighbors, whose distance is used to estimate the local density. By comparing the local density of a sample to the local densities of its neighbors, one can identify samples that have a substantially lower density than their neighbors. These are considered outliers.
Parameters
 n_neighbors : int, optional (default=20)
 Number of neighbors to use by default for
kneighbors()
queries. If n_neighbors is larger than the number of samples provided, all samples will be used.
 algorithm : {‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’}, optional
Algorithm used to compute the nearest neighbors:
 ‘ball_tree’ will use
BallTree
 ‘kd_tree’ will use
KDTree
 ‘brute’ will use a brute-force search.
 ‘auto’ will attempt to decide the most appropriate algorithm
based on the values passed to
fit()
method.
Note: fitting on sparse input will override the setting of this parameter, using brute force.
 leaf_size : int, optional (default=30)
 Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.
 metric : string or callable, default ‘minkowski’
Metric used for the distance computation. Any metric from scikit-learn or scipy.spatial.distance can be used.
If ‘precomputed’, the training input X is expected to be a distance matrix.
If metric is a callable function, it is called on each pair of instances (rows) and the resulting value recorded. The callable should take two arrays as input and return one value indicating the distance between them. This works for Scipy’s metrics, but is less efficient than passing the metric name as a string.
Valid values for metric are:
 from scikit-learn: [‘cityblock’, ‘cosine’, ‘euclidean’, ‘l1’, ‘l2’, ‘manhattan’]
 from scipy.spatial.distance: [‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘correlation’, ‘dice’, ‘hamming’, ‘jaccard’, ‘kulsinski’, ‘mahalanobis’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’]
See the documentation for scipy.spatial.distance for details on these metrics:
 p : integer, optional (default=2)
 Parameter for the Minkowski metric from sklearn.metrics.pairwise.pairwise_distances(). When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.
 metric_params : dict, optional (default=None)
 Additional keyword arguments for the metric function.
 contamination : float in (0., 0.5), optional (default=0.1)
The amount of contamination of the data set, i.e. the proportion of outliers in the data set. When fitting this is used to define the threshold on the decision function. If “auto”, the decision function threshold is determined as in the original paper.
Changed in version 0.20: The default value of
contamination
will change from 0.1 in 0.20 to 'auto'
in 0.22.
 novelty : boolean, default False
 By default, LocalOutlierFactor is only meant to be used for outlier detection (novelty=False). Set novelty to True if you want to use LocalOutlierFactor for novelty detection. In this case be aware that you should only use predict, decision_function and score_samples on new unseen data and not on the training set.
 n_jobs : int or None, optional (default=None)
 The number of parallel jobs to run for neighbors search.
None
means 1 unless in a joblib.parallel_backend
context. -1
means using all processors. See Glossary for more details. Affects only kneighbors()
and kneighbors_graph()
methods.
Attributes
negative_outlier_factor_
: numpy array, shape (n_samples,) The opposite LOF of the training samples. The higher, the more normal. Inliers tend to have a LOF score close to 1 (negative_outlier_factor_ close to -1), while outliers tend to have a larger LOF score.
The local outlier factor (LOF) of a sample captures its supposed ‘degree of abnormality’. It is the average of the ratio of the local reachability density of a sample and those of its k-nearest neighbors.
n_neighbors_
: integer The actual number of neighbors used for kneighbors() queries.
offset_
: float Offset used to obtain binary labels from the raw scores. Observations having a negative_outlier_factor smaller than offset_ are detected as abnormal. The offset is set to -1.5 (inliers score around -1), except when a contamination parameter different than “auto” is provided. In that case, the offset is defined in such a way we obtain the expected number of outliers in training.
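The role of offset_ can be illustrated with a short sketch that thresholds already-computed scores. The function name lof_labels and the sample scores are hypothetical, and the -1 (outlier) / 1 (inlier) label convention is an assumption of this sketch that is not stated in the text above.

```python
def lof_labels(negative_outlier_factor, offset=-1.5):
    """Label a sample -1 when its score falls below the offset, else 1."""
    return [-1 if score < offset else 1 for score in negative_outlier_factor]


# Made-up scores: inliers sit near -1, the third sample is far more negative.
scores = [-1.02, -0.98, -3.40, -1.45]
print(lof_labels(scores))
```

Only the sample whose score lies below the default offset of -1.5 is flagged; raising the offset toward -1 would flag more samples.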
References
[1] Breunig, M. M., Kriegel, H. P., Ng, R. T., & Sander, J. (2000, May). LOF: identifying density-based local outliers. In ACM sigmod record.
Full API documentation: LocalOutlierFactorScikitsLearnNode

class
mdp.nodes.
GaussianProcessRegressorScikitsLearnNode
¶ Gaussian process regression (GPR). This node has been automatically generated by wrapping the
sklearn.gaussian_process.gpr.GaussianProcessRegressor
class from the sklearn
library. The wrapped instance can be accessed through the scikits_alg
attribute. The implementation is based on Algorithm 2.1 of Gaussian Processes for Machine Learning (GPML) by Rasmussen and Williams.
In addition to standard scikit-learn estimator API, GaussianProcessRegressor:
 allows prediction without prior fitting (based on the GP prior)
 provides an additional method sample_y(X), which evaluates samples drawn from the GPR (prior or posterior) at given inputs
 exposes a method log_marginal_likelihood(theta), which can be used externally for other ways of selecting hyperparameters, e.g., via Markov chain Monte Carlo.
Read more in the User Guide.
New in version 0.18.
Parameters
 kernel : kernel object
 The kernel specifying the covariance function of the GP. If None is passed, the kernel “1.0 * RBF(1.0)” is used as default. Note that the kernel’s hyperparameters are optimized during fitting.
 alpha : float or array-like, optional (default: 1e-10)
 Value added to the diagonal of the kernel matrix during fitting. Larger values correspond to increased noise level in the observations. This can also prevent a potential numerical issue during fitting, by ensuring that the calculated values form a positive definite matrix. If an array is passed, it must have the same number of entries as the data used for fitting and is used as datapoint-dependent noise level. Note that this is equivalent to adding a WhiteKernel with c=alpha. Allowing to specify the noise level directly as a parameter is mainly for convenience and for consistency with Ridge.
 optimizer : string or callable, optional (default: “fmin_l_bfgs_b”)
Can either be one of the internally supported optimizers for optimizing the kernel’s parameters, specified by a string, or an externally defined optimizer passed as a callable. If a callable is passed, it must have the signature:
def optimizer(obj_func, initial_theta, bounds):
    # * 'obj_func' is the objective function to be minimized, which
    #   takes the hyperparameters theta as parameter and an
    #   optional flag eval_gradient, which determines if the
    #   gradient is returned additionally to the function value
    # * 'initial_theta': the initial value for theta, which can be
    #   used by local optimizers
    # * 'bounds': the bounds on the values of theta
    ....
    # Returned are the best found hyperparameters theta and
    # the corresponding value of the target function.
    return theta_opt, func_min
Per default, the ‘fmin_l_bfgs_b’ algorithm from scipy.optimize is used. If None is passed, the kernel’s parameters are kept fixed. Available internal optimizers are:
'fmin_l_bfgs_b'
 n_restarts_optimizer : int, optional (default: 0)
 The number of restarts of the optimizer for finding the kernel’s parameters which maximize the log-marginal likelihood. The first run of the optimizer is performed from the kernel’s initial parameters, the remaining ones (if any) from thetas sampled log-uniform randomly from the space of allowed theta-values. If greater than 0, all bounds must be finite. Note that n_restarts_optimizer == 0 implies that one run is performed.
 normalize_y : boolean, optional (default: False)
 Whether the target values y are normalized, i.e., the mean of the observed target values becomes zero. This parameter should be set to True if the target values’ mean is expected to differ considerably from zero. When enabled, the normalization effectively modifies the GP’s prior based on the data, which contradicts the likelihood principle; normalization is thus disabled per default.
 copy_X_train : bool, optional (default: True)
 If True, a persistent copy of the training data is stored in the object. Otherwise, just a reference to the training data is stored, which might cause predictions to change if the data is modified externally.
 random_state : int, RandomState instance or None, optional (default: None)
 The generator used to initialize the centers. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
Attributes
X_train_
: array-like, shape = (n_samples, n_features) Feature values in training data (also required for prediction)
y_train_
: array-like, shape = (n_samples, [n_output_dims]) Target values in training data (also required for prediction)
kernel_
: kernel object The kernel used for prediction. The structure of the kernel is the same as the one passed as parameter but with optimized hyperparameters
L_
: array-like, shape = (n_samples, n_samples) Lower-triangular Cholesky decomposition of the kernel in
X_train_
alpha_
: array-like, shape = (n_samples,) Dual coefficients of training data points in kernel space
log_marginal_likelihood_value_
: float The log-marginal-likelihood of
self.kernel_.theta
Examples
>>> from sklearn.datasets import make_friedman2
>>> from sklearn.gaussian_process import GaussianProcessRegressor
>>> from sklearn.gaussian_process.kernels import DotProduct, WhiteKernel
>>> X, y = make_friedman2(n_samples=500, noise=0, random_state=0)
>>> kernel = DotProduct() + WhiteKernel()
>>> gpr = GaussianProcessRegressor(kernel=kernel,
...         random_state=0).fit(X, y)
>>> gpr.score(X, y)
0.3680...
>>> gpr.predict(X[:2,:], return_std=True)
(array([653.0..., 592.1...]), array([316.6..., 316.6...]))
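The callable form of the optimizer parameter only has to accept (obj_func, initial_theta, bounds) and return the best theta with its objective value. The sketch below shows one way such a callable might look, using plain random search within the bounds. Everything here is hypothetical illustration: the names random_search_optimizer and toy_objective, the extra keyword arguments n_trials and seed, and the use of Python lists instead of arrays are assumptions, and the toy objective stands in for the function the estimator would supply.

```python
import random


def random_search_optimizer(obj_func, initial_theta, bounds, n_trials=200, seed=0):
    """Minimize obj_func by sampling candidates uniformly inside the bounds."""
    rng = random.Random(seed)
    # Start from the supplied initial point so the result is never worse than it.
    theta_opt = list(initial_theta)
    func_min = obj_func(theta_opt, eval_gradient=False)
    for _ in range(n_trials):
        candidate = [rng.uniform(low, high) for low, high in bounds]
        value = obj_func(candidate, eval_gradient=False)
        if value < func_min:
            theta_opt, func_min = candidate, value
    return theta_opt, func_min


def toy_objective(theta, eval_gradient=True):
    """Stand-in objective: a quadratic bowl with its minimum at theta = 1."""
    value = sum((t - 1.0) ** 2 for t in theta)
    if eval_gradient:
        return value, [2.0 * (t - 1.0) for t in theta]
    return value


theta, value = random_search_optimizer(
    toy_objective, [3.0, -2.0], [(-5.0, 5.0), (-5.0, 5.0)]
)
print(theta, value)
```

A gradient-free search like this ignores the eval_gradient option; an optimizer that uses gradients would call obj_func with eval_gradient=True and unpack the returned pair.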
Full API documentation: GaussianProcessRegressorScikitsLearnNode

class
mdp.nodes.
KMeansScikitsLearnNode
¶ K-Means clustering. This node has been automatically generated by wrapping the
sklearn.cluster.k_means_.KMeans
class from the sklearn
library. The wrapped instance can be accessed through the scikits_alg
attribute. Read more in the User Guide.
Parameters
 n_clusters : int, optional, default: 8
 The number of clusters to form as well as the number of centroids to generate.
 init : {‘k-means++’, ‘random’ or an ndarray}
Method for initialization, defaults to ‘k-means++’:
‘k-means++’ : selects initial cluster centers for k-means clustering in a smart way to speed up convergence. See section Notes in k_init for more details.
‘random’: choose k observations (rows) at random from data for the initial centroids.
If an ndarray is passed, it should be of shape (n_clusters, n_features) and gives the initial centers.
 n_init : int, default: 10
 Number of times the k-means algorithm will be run with different centroid seeds. The final results will be the best output of n_init consecutive runs in terms of inertia.
 max_iter : int, default: 300
 Maximum number of iterations of the k-means algorithm for a single run.
 tol : float, default: 1e-4
 Relative tolerance with regards to inertia to declare convergence
 precompute_distances : {‘auto’, True, False}
Precompute distances (faster but takes more memory).
‘auto’ : do not precompute distances if n_samples * n_clusters > 12 million. This corresponds to about 100MB overhead per job using double precision.
True : always precompute distances
False : never precompute distances
 verbose : int, default 0
 Verbosity mode.
 random_state : int, RandomState instance or None (default)
 Determines random number generation for centroid initialization. Use an int to make the randomness deterministic. See Glossary.
 copy_x : boolean, optional
 When precomputing distances it is more numerically accurate to center the data first. If copy_x is True (default), then the original data is not modified, ensuring X is C-contiguous. If False, the original data is modified, and put back before the function returns, but small numerical differences may be introduced by subtracting and then adding the data mean. In this case it will also not ensure that data is C-contiguous, which may cause a significant slowdown.
 n_jobs : int or None, optional (default=None)
The number of jobs to use for the computation. This works by computing each of the n_init runs in parallel.
None
means 1 unless in a joblib.parallel_backend
context. -1
means using all processors. See Glossary for more details.
 algorithm : “auto”, “full” or “elkan”, default=”auto”
 K-means algorithm to use. The classical EM-style algorithm is “full”. The “elkan” variation is more efficient by using the triangle inequality, but currently doesn’t support sparse data. “auto” chooses “elkan” for dense data and “full” for sparse data.
Attributes
cluster_centers_
: array, [n_clusters, n_features] Coordinates of cluster centers. If the algorithm stops before fully converging (see tol and max_iter), these will not be consistent with labels_.
labels_
: Labels of each point
inertia_
: float Sum of squared distances of samples to their closest cluster center.
n_iter_
: int Number of iterations run.
Examples
>>> from sklearn.cluster import KMeans
>>> import numpy as np
>>> X = np.array([[1, 2], [1, 4], [1, 0],
...               [10, 2], [10, 4], [10, 0]])
>>> kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
>>> kmeans.labels_
array([1, 1, 1, 0, 0, 0], dtype=int32)
>>> kmeans.predict([[0, 0], [12, 3]])
array([1, 0], dtype=int32)
>>> kmeans.cluster_centers_
array([[10.,  2.],
       [ 1.,  2.]])
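The quantities reported in labels_, inertia_ and cluster_centers_ can be reproduced by hand for the small data set in the example. The sketch below performs one assignment step and one center update in plain Python. It is a simplified illustration of a Lloyd-style iteration, not the wrapped implementation; the function names are hypothetical, and the update assumes every cluster has at least one member.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign_labels(points, centers):
    """Give each point the index of its closest center."""
    return [
        min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
        for p in points
    ]


def inertia(points, centers, labels):
    """Sum of squared distances of points to their assigned center."""
    return sum(squared_distance(p, centers[l]) for p, l in zip(points, labels))


def update_centers(points, labels, n_clusters):
    """Move each center to the mean of its members (assumes none is empty)."""
    centers = []
    for j in range(n_clusters):
        members = [p for p, l in zip(points, labels) if l == j]
        dim = len(members[0])
        centers.append([sum(m[d] for m in members) / len(members) for d in range(dim)])
    return centers


X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
centers = [[10.0, 2.0], [1.0, 2.0]]
labels = assign_labels(X, centers)
print(labels, inertia(X, centers, labels), update_centers(X, labels, 2))
```

Because the centers already equal the means of their assigned points, the update leaves them unchanged, which is the converged state in which labels and centers are consistent.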
See also
 MiniBatchKMeans
 Alternative online implementation that does incremental updates of the centers positions using mini-batches. For large scale learning (say n_samples > 10k) MiniBatchKMeans is probably much faster than the default batch implementation.
Notes
The k-means problem is solved using either Lloyd’s or Elkan’s algorithm.
The average complexity is given by O(k n T), where n is the number of samples and T is the number of iterations.
The worst case complexity is given by O(n^(k+2/p)) with n = n_samples, p = n_features. (D. Arthur and S. Vassilvitskii, ‘How slow is the k-means method?’ SoCG2006)
In practice, the k-means algorithm is very fast (one of the fastest clustering algorithms available), but it can get stuck in local minima. That’s why it can be useful to restart it several times.
If the algorithm stops before fully converging (because of tol or max_iter), labels_ and cluster_centers_ will not be consistent, i.e. the cluster_centers_ will not be the means of the points in each cluster. Also, the estimator will reassign labels_ after the last iteration to make labels_ consistent with predict on the training set.
Full API documentation: KMeansScikitsLearnNode

class
mdp.nodes.
OutputCodeClassifierScikitsLearnNode
¶ (Error-Correcting) Output-Code multiclass strategy. This node has been automatically generated by wrapping the
sklearn.multiclass.OutputCodeClassifier
class from the sklearn
library. The wrapped instance can be accessed through the scikits_alg
attribute. Output-code based strategies consist in representing each class with a binary code (an array of 0s and 1s). At fitting time, one binary classifier per bit in the code book is fitted. At prediction time, the classifiers are used to project new points in the class space and the class closest to the points is chosen. The main advantage of these strategies is that the number of classifiers used can be controlled by the user, either for compressing the model (0 < code_size < 1) or for making the model more robust to errors (code_size > 1). See the documentation for more details.
Read more in the User Guide.
Parameters
 estimator : estimator object
 An estimator object implementing fit and one of decision_function or predict_proba.
 code_size : float
 Percentage of the number of classes to be used to create the code book. A number between 0 and 1 will require fewer classifiers than one-vs-the-rest. A number greater than 1 will require more classifiers than one-vs-the-rest.
 random_state : int, RandomState instance or None, optional, default: None
 The generator used to initialize the codebook. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
 n_jobs : int or None, optional (default=None)
 The number of jobs to use for the computation.
None
means 1 unless in a joblib.parallel_backend
context. -1
means using all processors. See Glossary for more details.
Attributes
estimators_
: list of int(n_classes * code_size) estimators Estimators used for predictions.
classes_
: numpy array of shape [n_classes] Array containing labels.
code_book_
: numpy array of shape [n_classes, code_size] Binary array containing the code of each class.
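Two parts of the description lend themselves to a short sketch: the number of binary classifiers implied by code_size, and the choice of the class whose code is closest to the classifier outputs. The helper names, the example code book, and the use of squared Euclidean distance as the notion of "closest" are assumptions made for illustration only.

```python
def n_binary_classifiers(n_classes, code_size):
    """Number of bits (and binary classifiers) in the code book."""
    return int(n_classes * code_size)


def nearest_code_class(outputs, code_book, classes):
    """Return the class whose code word is closest to the classifier outputs."""
    def squared_distance(code):
        return sum((o - c) ** 2 for o, c in zip(outputs, code))

    best = min(range(len(classes)), key=lambda i: squared_distance(code_book[i]))
    return classes[best]


# Hypothetical code book: three classes, three bits each.
classes = ["a", "b", "c"]
code_book = [[0, 0, 1], [1, 0, 0], [0, 1, 1]]

print(n_binary_classifiers(10, 0.5))   # compression: fewer classifiers than classes
print(n_binary_classifiers(4, 2.0))    # redundancy: more classifiers than classes
print(nearest_code_class([0.9, 0.1, 0.2], code_book, classes))
```

With ten classes, a code_size of 0.5 needs only five binary classifiers, while a code_size above 1 adds redundant bits that can absorb individual classifier errors.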
References
[1] “Solving multiclass learning problems via error-correcting output codes”, Dietterich T., Bakiri G., Journal of Artificial Intelligence Research 2, 1995.
[2] “The error coding method and PICTs”, James G., Hastie T., Journal of Computational and Graphical statistics 7, 1998.
[3] “The Elements of Statistical Learning”, Hastie T., Tibshirani R., Friedman J., page 606 (second-edition) 2008.
Full API documentation: OutputCodeClassifierScikitsLearnNode

class
mdp.nodes.
PassiveAggressiveRegressorScikitsLearnNode
¶ Passive Aggressive Regressor. This node has been automatically generated by wrapping the
sklearn.linear_model.passive_aggressive.PassiveAggressiveRegressor
class from the sklearn
library. The wrapped instance can be accessed through the scikits_alg
attribute. Read more in the User Guide.
Parameters
 C : float
 Maximum step size (regularization). Defaults to 1.0.
 fit_intercept : bool
 Whether the intercept should be estimated or not. If False, the data is assumed to be already centered. Defaults to True.
 max_iter : int, optional
The maximum number of passes over the training data (aka epochs). It only impacts the behavior in the
fit
method, and not the partial_fit. Defaults to 5. Defaults to 1000 from 0.21, or if tol is not None.
New in version 0.19.
 tol : float or None, optional
The stopping criterion. If it is not None, the iterations will stop when (loss > previous_loss - tol). Defaults to None. Defaults to 1e-3 from 0.21.
New in version 0.19.
 early_stopping : bool, default=False
Whether to use early stopping to terminate training when validation score is not improving. If set to True, it will automatically set aside a fraction of training data as validation and terminate training when validation score is not improving by at least tol for n_iter_no_change consecutive epochs.
New in version 0.20.
 validation_fraction : float, default=0.1
The proportion of training data to set aside as validation set for early stopping. Must be between 0 and 1. Only used if early_stopping is True.
New in version 0.20.
 n_iter_no_change : int, default=5
Number of iterations with no improvement to wait before early stopping.
New in version 0.20.
 shuffle : bool, default=True
 Whether or not the training data should be shuffled after each epoch.
 verbose : integer, optional
 The verbosity level
 loss : string, optional
The loss function to be used:
 epsilon_insensitive: equivalent to PA-I in the reference paper.
 squared_epsilon_insensitive: equivalent to PA-II in the reference paper.
 epsilon : float
 If the difference between the current prediction and the correct label is below this threshold, the model is not updated.
 random_state : int, RandomState instance or None, optional, default=None
 The seed of the pseudo random number generator to use when shuffling the data. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
 warm_start : bool, optional
When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. See the Glossary.
Repeatedly calling fit or partial_fit when warm_start is True can result in a different solution than when calling fit a single time because of the way the data is shuffled.
 average : bool or int, optional
When set to True, computes the averaged SGD weights and stores the result in the
coef_
attribute. If set to an int greater than 1, averaging will begin once the total number of samples seen reaches average. So average=10 will begin averaging after seeing 10 samples.
New in version 0.19: parameter average to use weights averaging in SGD
 n_iter : int, optional
The number of passes over the training data (aka epochs). Defaults to None. Deprecated, will be removed in 0.21.
Changed in version 0.19: Deprecated
Attributes
coef_
: array, shape = [1, n_features] if n_classes == 2 else [n_classes, n_features] Weights assigned to the features.
intercept_
: array, shape = [1] if n_classes == 2 else [n_classes] Constants in decision function.
n_iter_
: int The actual number of iterations to reach the stopping criterion.
Examples
>>> from sklearn.linear_model import PassiveAggressiveRegressor
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_features=4, random_state=0)
>>> regr = PassiveAggressiveRegressor(max_iter=100, random_state=0,
...                                   tol=1e-3)
>>> regr.fit(X, y)
PassiveAggressiveRegressor(C=1.0, average=False, early_stopping=False,
              epsilon=0.1, fit_intercept=True, loss='epsilon_insensitive',
              max_iter=100, n_iter=None, n_iter_no_change=5,
              random_state=0, shuffle=True, tol=0.001,
              validation_fraction=0.1, verbose=0, warm_start=False)
>>> print(regr.coef_)
[20.48736655 34.18818427 67.59122734 87.94731329]
>>> print(regr.intercept_)
[-0.02306214]
>>> print(regr.predict([[0, 0, 0, 0]]))
[-0.02306214]
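The interplay of C (maximum step size) and epsilon (no update below this error) can be seen in a single PA-I style regression update. The sketch below follows the update rule of the reference paper for the epsilon-insensitive loss as I understand it; it omits the intercept, shuffling and averaging, and the function name pa1_regression_update is hypothetical.

```python
def pa1_regression_update(w, x, y, C=1.0, epsilon=0.1):
    """Return the weight vector after one PA-I update on the sample (x, y)."""
    prediction = sum(wi * xi for wi, xi in zip(w, x))
    error = y - prediction
    # Epsilon-insensitive loss: errors smaller than epsilon cost nothing.
    loss = max(0.0, abs(error) - epsilon)
    norm_sq = sum(xi * xi for xi in x)
    if loss == 0.0 or norm_sq == 0.0:
        return list(w)  # passive step: the model is left unchanged
    # Aggressive step, capped by C so a single sample cannot move w too far.
    tau = min(C, loss / norm_sq)
    direction = 1.0 if error > 0 else -1.0
    return [wi + direction * tau * xi for wi, xi in zip(w, x)]


print(pa1_regression_update([0.0, 0.0], [1.0, 2.0], 3.0))   # large error: weights move
print(pa1_regression_update([1.0, 1.0], [1.0, 2.0], 3.05))  # error below epsilon: no change
```

In the first call the step size is loss / ||x||^2 = 2.9 / 5, which is below C and therefore not capped; a smaller C would shorten the step.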
See also
SGDRegressor
References
Online Passive-Aggressive Algorithms <http://jmlr.csail.mit.edu/papers/volume7/crammer06a/crammer06a.pdf> K. Crammer, O. Dekel, J. Keshat, S. Shalev-Shwartz, Y. Singer - JMLR (2006)
Full API documentation: PassiveAggressiveRegressorScikitsLearnNode

class
mdp.nodes.
RandomForestClassifierScikitsLearnNode
¶ A random forest classifier. This node has been automatically generated by wrapping the
sklearn.ensemble.forest.RandomForestClassifier
class from the sklearn
library. The wrapped instance can be accessed through the scikits_alg
attribute. A random forest is a meta estimator that fits a number of decision tree classifiers on various subsamples of the dataset and uses averaging to improve the predictive accuracy and control overfitting. The subsample size is always the same as the original input sample size but the samples are drawn with replacement if bootstrap=True (default).
Read more in the User Guide.
Parameters
 n_estimators : integer, optional (default=10)
The number of trees in the forest.
Changed in version 0.20: The default value of
n_estimators
will change from 10 in version 0.20 to 100 in version 0.22.
 criterion : string, optional (default=”gini”)
 The function to measure the quality of a split. Supported criteria are “gini” for the Gini impurity and “entropy” for the information gain. Note: this parameter is tree-specific.
 max_depth : integer or None, optional (default=None)
 The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
 min_samples_split : int, float, optional (default=2)
The minimum number of samples required to split an internal node:
 If int, then consider min_samples_split as the minimum number.
 If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
Changed in version 0.18: Added float values for fractions.
 min_samples_leaf : int, float, optional (default=1)
The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.
 If int, then consider min_samples_leaf as the minimum number.
 If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.
Changed in version 0.18: Added float values for fractions.
 min_weight_fraction_leaf : float, optional (default=0.)
 The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.
 max_features : int, float, string or None, optional (default=”auto”)
The number of features to consider when looking for the best split:
 If int, then consider max_features features at each split.
 If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
 If “auto”, then max_features=sqrt(n_features).
 If “sqrt”, then max_features=sqrt(n_features) (same as “auto”).
 If “log2”, then max_features=log2(n_features).
 If None, then max_features=n_features.
Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires effectively inspecting more than max_features features.
 max_leaf_nodes : int or None, optional (default=None)
 Grow trees with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.
 min_impurity_decrease : float, optional (default=0.)
A node will be split if this split induces a decrease of the impurity greater than or equal to this value.
The weighted impurity decrease equation is the following:
N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.
N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.
New in version 0.19.
 min_impurity_split : float, (default=1e-7)
Threshold for early stopping in tree growth. A node will split if its impurity is above the threshold, otherwise it is a leaf.
Deprecated since version 0.19: min_impurity_split has been deprecated in favor of min_impurity_decrease in 0.19. The default value of min_impurity_split will change from 1e-7 to 0 in 0.23 and it will be removed in 0.25. Use min_impurity_decrease instead.
 bootstrap : boolean, optional (default=True)
 Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.
 oob_score : bool (default=False)
 Whether to use out-of-bag samples to estimate the generalization accuracy.
 n_jobs : int or None, optional (default=None)
 The number of jobs to run in parallel for both fit and predict. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
 random_state : int, RandomState instance or None, optional (default=None)
 If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
 verbose : int, optional (default=0)
 Controls the verbosity when fitting and predicting.
 warm_start : bool, optional (default=False)
 When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble, otherwise, just fit a whole new forest. See the Glossary.
 class_weight : dict, list of dicts, “balanced”, “balanced_subsample” or None, optional (default=None)
Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.
Note that for multioutput (including multilabel) weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}].
The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as
n_samples / (n_classes * np.bincount(y))
The “balanced_subsample” mode is the same as “balanced” except that weights are computed based on the bootstrap sample for every tree grown.
For multioutput, the weights of each column of y will be multiplied.
Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
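The “balanced” formula quoted above, n_samples / (n_classes * np.bincount(y)), can be illustrated with a minimal pure-Python sketch. The helper name balanced_class_weights is hypothetical, and collections.Counter stands in for np.bincount as an assumption so the example has no dependencies; it is not the library's implementation.

```python
from collections import Counter


def balanced_class_weights(y):
    """Return {class_label: weight} with weight = n_samples / (n_classes * count).

    Hypothetical helper: Counter replaces np.bincount so that labels need
    not be non-negative integers.
    """
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


# Three samples of class 0 and one sample of class 1:
# the rarer class receives the larger weight.
labels = [0, 0, 0, 1]
weights = balanced_class_weights(labels)
print(weights)
```

With this input the weights are inversely proportional to the class frequencies, which is the behaviour the “balanced” mode describes.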
Attributes
estimators_
: list of DecisionTreeClassifier The collection of fitted sub-estimators.
classes_
: array of shape = [n_classes] or a list of such arrays The classes labels (single output problem), or a list of arrays of class labels (multi-output problem).
n_classes_
: int or list The number of classes (single output problem), or a list containing the number of classes for each output (multi-output problem).
n_features_
: int The number of features when fit is performed.
n_outputs_
: int The number of outputs when fit is performed.
feature_importances_
: array of shape = [n_features] The feature importances (the higher, the more important the feature).
oob_score_
: float Score of the training dataset obtained using an out-of-bag estimate.
oob_decision_function_
: array of shape = [n_samples, n_classes] Decision function computed with out-of-bag estimate on the training set. If n_estimators is small it might be possible that a data point was never left out during the bootstrap. In this case, oob_decision_function_ might contain NaN.
Examples
>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_samples=1000, n_features=4,
...                            n_informative=2, n_redundant=0,
...                            random_state=0, shuffle=False)
>>> clf = RandomForestClassifier(n_estimators=100, max_depth=2,
...                              random_state=0)
>>> clf.fit(X, y)
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=2, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=None,
            oob_score=False, random_state=0, verbose=0, warm_start=False)
>>> print(clf.feature_importances_)
[0.14205973 0.76664038 0.0282433  0.06305659]
>>> print(clf.predict([[0, 0, 0, 0]]))
[1]
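The weighted impurity decrease used by min_impurity_decrease can be worked through by hand. The sketch below assumes unweighted samples and the Gini impurity; the function names gini and weighted_impurity_decrease are hypothetical and only illustrate the formula given in the parameter description.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def weighted_impurity_decrease(parent, left, right, n_total):
    """N_t / N * (impurity - N_t_R / N_t * right_impurity
                           - N_t_L / N_t * left_impurity)."""
    n_t, n_l, n_r = len(parent), len(left), len(right)
    return n_t / n_total * (
        gini(parent) - n_r / n_t * gini(right) - n_l / n_t * gini(left)
    )


# A node holding 4 of the 8 training samples is split into two pure children.
parent = [0, 0, 1, 1]
decrease = weighted_impurity_decrease(parent, [0, 0], [1, 1], n_total=8)
print(decrease)
```

Under these assumptions the split would be accepted whenever min_impurity_decrease is less than or equal to the printed value.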
Notes
The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.
The features are always randomly permuted at each split. Therefore, the best found split may vary, even with the same training data, max_features=n_features and bootstrap=False, if the improvement of the criterion is identical for several splits enumerated during the search of the best split. To obtain a deterministic behaviour during fitting, random_state has to be fixed.
References
[1] L. Breiman, “Random Forests”, Machine Learning, 45(1), 5-32, 2001.
See also
DecisionTreeClassifier, ExtraTreesClassifier
Full API documentation: RandomForestClassifierScikitsLearnNode

class
mdp.nodes.
ForestRegressorScikitsLearnNode
¶ Base class for forest of trees-based regressors. This node has been automatically generated by wrapping the sklearn.ensemble.forest.ForestRegressor class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Warning: This class should not be used directly. Use derived classes instead.
Full API documentation: ForestRegressorScikitsLearnNode

class
mdp.nodes.
RidgeScikitsLearnNode
¶ Linear least squares with l2 regularization. This node has been automatically generated by wrapping the sklearn.linear_model.ridge.Ridge class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Minimizes the objective function:
||y - Xw||^2_2 + alpha * ||w||^2_2
This model solves a regression model where the loss function is the linear least squares function and regularization is given by the l2-norm. Also known as Ridge Regression or Tikhonov regularization. This estimator has built-in support for multi-variate regression (i.e., when y is a 2d-array of shape [n_samples, n_targets]).
Read more in the User Guide.
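For a single feature and no intercept, the ridge objective ||y - Xw||^2_2 + alpha * ||w||^2_2 has the closed-form minimizer w = (x·y) / (x·x + alpha). The sketch below is a pure-Python illustration of that special case; ridge_1d and ridge_objective are hypothetical helper names, not part of the wrapped class.

```python
def ridge_1d(x, y, alpha):
    """Minimizer of sum((y_i - x_i * w)^2) + alpha * w^2 for scalar w."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + alpha)


def ridge_objective(x, y, w, alpha):
    """Value of the ridge objective for one feature and no intercept."""
    residual = sum((yi - xi * w) ** 2 for xi, yi in zip(x, y))
    return residual + alpha * w * w


x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
w_ols = ridge_1d(x, y, alpha=0.0)     # no regularization
w_ridge = ridge_1d(x, y, alpha=14.0)  # stronger penalty shrinks w toward 0
print(w_ols, w_ridge)
```

Larger values of alpha shrink the coefficient, which matches the statement that larger values specify stronger regularization.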
Parameters
 alpha : {float, array-like}, shape (n_targets)
 Regularization strength; must be a positive float. Regularization improves the conditioning of the problem and reduces the variance of the estimates. Larger values specify stronger regularization. Alpha corresponds to C^-1 in other linear models such as LogisticRegression or LinearSVC. If an array is passed, penalties are assumed to be specific to the targets. Hence they must correspond in number.
 fit_intercept : boolean
 Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
 normalize : boolean, optional, default False
 This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.
 copy_X : boolean, optional, default True
 If True, X will be copied; else, it may be overwritten.
 max_iter : int, optional
 Maximum number of iterations for conjugate gradient solver. For ‘sparse_cg’ and ‘lsqr’ solvers, the default value is determined by scipy.sparse.linalg. For ‘sag’ solver, the default value is 1000.
 tol : float
 Precision of the solution.
 solver : {‘auto’, ‘svd’, ‘cholesky’, ‘lsqr’, ‘sparse_cg’, ‘sag’, ‘saga’}
Solver to use in the computational routines:
 ‘auto’ chooses the solver automatically based on the type of data.
 ‘svd’ uses a Singular Value Decomposition of X to compute the Ridge coefficients. More stable for singular matrices than ‘cholesky’.
 ‘cholesky’ uses the standard scipy.linalg.solve function to obtain a closed-form solution.
 ‘sparse_cg’ uses the conjugate gradient solver as found in scipy.sparse.linalg.cg. As an iterative algorithm, this solver is more appropriate than ‘cholesky’ for large-scale data (possibility to set tol and max_iter).
 ‘lsqr’ uses the dedicated regularized least-squares routine scipy.sparse.linalg.lsqr. It is the fastest and uses an iterative procedure.
 ‘sag’ uses a Stochastic Average Gradient descent, and ‘saga’ uses its improved, unbiased version named SAGA. Both methods also use an iterative procedure, and are often faster than other solvers when both n_samples and n_features are large. Note that fast convergence of ‘sag’ and ‘saga’ is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.
The last five solvers all support both dense and sparse data. However, only ‘sag’ and ‘saga’ support sparse input when fit_intercept is True.
New in version 0.17: Stochastic Average Gradient descent solver.
New in version 0.19: SAGA solver.
 random_state : int, RandomState instance or None, optional, default None
The seed of the pseudo random number generator to use when shuffling the data. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random. Used when solver == ‘sag’.
New in version 0.17: random_state to support Stochastic Average Gradient.
Attributes
coef_
: array, shape (n_features,) or (n_targets, n_features) Weight vector(s).
intercept_
: float | array, shape = (n_targets,) Independent term in decision function. Set to 0.0 if fit_intercept = False.
n_iter_
: array or None, shape (n_targets,) Actual number of iterations for each target. Available only for sag and lsqr solvers. Other solvers will return None.
New in version 0.17.
See also
RidgeClassifier : Ridge classifier
RidgeCV : Ridge regression with built-in cross validation
sklearn.kernel_ridge.KernelRidge : Kernel ridge regression combines ridge regression with the kernel trick
Examples
>>> from sklearn.linear_model import Ridge
>>> import numpy as np
>>> n_samples, n_features = 10, 5
>>> np.random.seed(0)
>>> y = np.random.randn(n_samples)
>>> X = np.random.randn(n_samples, n_features)
>>> clf = Ridge(alpha=1.0)
>>> clf.fit(X, y)
Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,
      normalize=False, random_state=None, solver='auto', tol=0.001)
Full API documentation: RidgeScikitsLearnNode

class
mdp.nodes.
ElasticNetScikitsLearnNode
¶ Linear regression with combined L1 and L2 priors as regularizer. This node has been automatically generated by wrapping the sklearn.linear_model.coordinate_descent.ElasticNet class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Minimizes the objective function:
1 / (2 * n_samples) * ||y - Xw||^2_2 + alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||^2_2
If you are interested in controlling the L1 and L2 penalty separately, keep in mind that this is equivalent to:
a * L1 + b * L2
where:
alpha = a + b and l1_ratio = a / (a + b)
The parameter l1_ratio corresponds to alpha in the glmnet R package while alpha corresponds to the lambda parameter in glmnet. Specifically, l1_ratio = 1 is the lasso penalty. Currently, l1_ratio <= 0.01 is not reliable, unless you supply your own sequence of alpha.
Read more in the User Guide.
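The relation between separate penalty weights a and b and the (alpha, l1_ratio) pair stated above can be sketched as two small conversion helpers. The function names are hypothetical; they only restate alpha = a + b and l1_ratio = a / (a + b) and their inverse.

```python
def to_alpha_l1_ratio(a, b):
    """Convert separate L1 weight a and L2 weight b to (alpha, l1_ratio)."""
    if a + b == 0:
        raise ValueError("a + b must be non-zero to define l1_ratio")
    return a + b, a / (a + b)


def from_alpha_l1_ratio(alpha, l1_ratio):
    """Recover the separate weights (a, b) from (alpha, l1_ratio)."""
    return alpha * l1_ratio, alpha * (1.0 - l1_ratio)


alpha, l1_ratio = to_alpha_l1_ratio(a=0.75, b=0.25)
print(alpha, l1_ratio)
a, b = from_alpha_l1_ratio(alpha, l1_ratio)
print(a, b)
```

The round trip returns the original weights, so either parametrization can be used to reason about the penalty.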
Parameters
 alpha : float, optional
 Constant that multiplies the penalty terms. Defaults to 1.0. See the notes for the exact mathematical meaning of this parameter. alpha = 0 is equivalent to an ordinary least square, solved by the LinearRegression object. For numerical reasons, using alpha = 0 with the Lasso object is not advised. Given this, you should use the LinearRegression object.
 l1_ratio : float
 The ElasticNet mixing parameter, with 0 <= l1_ratio <= 1. For l1_ratio = 0 the penalty is an L2 penalty. For l1_ratio = 1 it is an L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.
 fit_intercept : bool
 Whether the intercept should be estimated or not. If False, the data is assumed to be already centered.
 normalize : boolean, optional, default False
 This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.
 precompute : True | False | array-like
 Whether to use a precomputed Gram matrix to speed up calculations. The Gram matrix can also be passed as argument. For sparse input this option is always True to preserve sparsity.
 max_iter : int, optional
 The maximum number of iterations
 copy_X : boolean, optional, default True
 If True, X will be copied; else, it may be overwritten.
 tol : float, optional
 The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.
 warm_start : bool, optional
 When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. See the Glossary.
 positive : bool, optional
 When set to True, forces the coefficients to be positive.
 random_state : int, RandomState instance or None, optional, default None
 The seed of the pseudo random number generator that selects a random feature to update. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random. Used when selection == ‘random’.
 selection : str, default ‘cyclic’
 If set to ‘random’, a random coefficient is updated every iteration rather than looping over features sequentially by default. This (setting to ‘random’) often leads to significantly faster convergence especially when tol is higher than 1e-4.
Attributes
coef_
: array, shape (n_features,) | (n_targets, n_features) Parameter vector (w in the cost function formula).
sparse_coef_
: scipy.sparse matrix, shape (n_features, 1) | (n_targets, n_features) sparse_coef_ is a read-only property derived from coef_.
intercept_
: float | array, shape (n_targets,) Independent term in decision function.
n_iter_
: array-like, shape (n_targets,) Number of iterations run by the coordinate descent solver to reach the specified tolerance.
Examples
>>> from sklearn.linear_model import ElasticNet
>>> from sklearn.datasets import make_regression
>>> X, y = make_regression(n_features=2, random_state=0)
>>> regr = ElasticNet(random_state=0)
>>> regr.fit(X, y)
ElasticNet(alpha=1.0, copy_X=True, fit_intercept=True, l1_ratio=0.5,
      max_iter=1000, normalize=False, positive=False, precompute=False,
      random_state=0, selection='cyclic', tol=0.0001, warm_start=False)
>>> print(regr.coef_)
[18.83816048 64.55968825]
>>> print(regr.intercept_)
1.451...
>>> print(regr.predict([[0, 0]]))
[1.451...]
Notes
To avoid unnecessary memory duplication the X argument of the fit method should be directly passed as a Fortran-contiguous numpy array.
See also
ElasticNetCV : Elastic net model with best model selection by cross-validation.
SGDRegressor: implements elastic net regression with incremental training.
SGDClassifier: implements logistic regression with elastic net penalty (SGDClassifier(loss="log", penalty="elasticnet")).
Full API documentation: ElasticNetScikitsLearnNode

class
mdp.nodes.
IsomapScikitsLearnNode
¶ Isomap Embedding. This node has been automatically generated by wrapping the sklearn.manifold.isomap.Isomap class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Non-linear dimensionality reduction through Isometric Mapping.
Read more in the User Guide.
Parameters
 n_neighbors : integer
 number of neighbors to consider for each point.
 n_components : integer
 number of coordinates for the manifold
 eigen_solver : [‘auto’|‘arpack’|‘dense’]
‘auto’ : Attempt to choose the most efficient solver for the given problem.
‘arpack’ : Use Arnoldi decomposition to find the eigenvalues and eigenvectors.
‘dense’ : Use a direct solver (i.e. LAPACK) for the eigenvalue decomposition.
 tol : float
 Convergence tolerance passed to arpack or lobpcg. Not used if eigen_solver == ‘dense’.
 max_iter : integer
 Maximum number of iterations for the arpack solver. Not used if eigen_solver == ‘dense’.
 path_method : string [‘auto’|‘FW’|‘D’]
Method to use in finding shortest path.
‘auto’ : attempt to choose the best algorithm automatically.
‘FW’ : Floyd-Warshall algorithm.
‘D’ : Dijkstra’s algorithm.
 neighbors_algorithm : string [‘auto’|‘brute’|‘kd_tree’|‘ball_tree’]
 Algorithm to use for nearest neighbors search, passed to neighbors.NearestNeighbors instance.
 n_jobs : int or None, optional (default=None)
 The number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
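The path_method option selects how shortest paths over the neighborhood graph are computed. As a rough illustration of the ‘FW’ choice, the sketch below runs the textbook Floyd-Warshall algorithm on a tiny hand-written graph; it is an assumption-level example, not the routine the wrapped class calls, and floyd_warshall is a hypothetical name.

```python
import math


def floyd_warshall(dist):
    """All-pairs shortest path lengths from a dense distance matrix.

    dist[i][j] is the edge length between i and j, or math.inf when the
    two points are not neighbors. The input matrix is not modified.
    """
    n = len(dist)
    d = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


INF = math.inf
# Points 0 and 2 are not direct neighbors; the path goes through point 1.
graph = [
    [0, 1, INF],
    [1, 0, 2],
    [INF, 2, 0],
]
geodesic = floyd_warshall(graph)
print(geodesic[0][2])
```

The resulting matrix plays the role of the geodesic distance matrix described under dist_matrix_.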
Attributes
embedding_
: arraylike, shape (n_samples, n_components) Stores the embedding vectors.
kernel_pca_
: object KernelPCA object used to implement the embedding.
training_data_
: arraylike, shape (n_samples, n_features) Stores the training data.
nbrs_
: sklearn.neighbors.NearestNeighbors instance Stores nearest neighbors instance, including BallTree or KDtree if applicable.
dist_matrix_
: arraylike, shape (n_samples, n_samples) Stores the geodesic distance matrix of training data.
Examples
>>> from sklearn.datasets import load_digits
>>> from sklearn.manifold import Isomap
>>> X, _ = load_digits(return_X_y=True)
>>> X.shape
(1797, 64)
>>> embedding = Isomap(n_components=2)
>>> X_transformed = embedding.fit_transform(X[:100])
>>> X_transformed.shape
(100, 2)
References
[1] Tenenbaum, J.B.; De Silva, V.; & Langford, J.C. A global geometric framework for nonlinear dimensionality reduction. Science 290 (5500)
Full API documentation: IsomapScikitsLearnNode

class
mdp.nodes.
BinarizerScikitsLearnNode
¶ Binarize data (set feature values to 0 or 1) according to a threshold. This node has been automatically generated by wrapping the sklearn.preprocessing.data.Binarizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Values greater than the threshold map to 1, while values less than or equal to the threshold map to 0. With the default threshold of 0, only positive values map to 1.
Binarization is a common operation on text count data where the analyst can decide to only consider the presence or absence of a feature rather than a quantified number of occurrences for instance.
It can also be used as a preprocessing step for estimators that consider boolean random variables (e.g. modelled using the Bernoulli distribution in a Bayesian setting).
Read more in the User Guide.
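The thresholding rule described above (values strictly greater than the threshold map to 1, all others to 0) can be sketched in a few lines of plain Python. The binarize helper below is a hypothetical stand-in written for dense nested lists only; it is not the wrapped class.

```python
def binarize(rows, threshold=0.0):
    """Map each value to 1.0 if it is strictly above threshold, else 0.0."""
    return [[1.0 if value > threshold else 0.0 for value in row]
            for row in rows]


X = [[1.0, -1.0, 2.0],
     [2.0, 0.0, 0.0],
     [0.0, 1.0, -1.0]]
result = binarize(X)
print(result)
```

Note that a value equal to the threshold maps to 0, so with the default threshold only strictly positive entries become 1.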
Parameters
 threshold : float, optional (0.0 by default)
 Feature values below or equal to this are replaced by 0, above it by 1. Threshold may not be less than 0 for operations on sparse matrices.
 copy : boolean, optional, default True
 Set to False to perform in-place binarization and avoid a copy (if the input is already a numpy array or a scipy.sparse CSR matrix).
Examples
>>> from sklearn.preprocessing import Binarizer
>>> X = [[ 1., -1.,  2.],
...      [ 2.,  0.,  0.],
...      [ 0.,  1., -1.]]
>>> transformer = Binarizer().fit(X)  # fit does nothing.
>>> transformer
Binarizer(copy=True, threshold=0.0)
>>> transformer.transform(X)
array([[1., 0., 1.],
       [1., 0., 0.],
       [0., 1., 0.]])
Notes
If the input is a sparse matrix, only the non-zero values are subject to update by the Binarizer class.
This estimator is stateless (besides constructor parameters); the fit method does nothing but is useful when used in a pipeline.
See also
binarize: Equivalent function without the estimator API.
Full API documentation: BinarizerScikitsLearnNode

class
mdp.nodes.
MiniBatchDictionaryLearningScikitsLearnNode
¶ Mini-batch dictionary learning. This node has been automatically generated by wrapping the sklearn.decomposition.dict_learning.MiniBatchDictionaryLearning class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Finds a dictionary (a set of atoms) that can best be used to represent data using a sparse code.
Solves the optimization problem:
(U^*, V^*) = argmin_(U,V) 0.5 || Y - U V ||_2^2 + alpha * || U ||_1
with || V_k ||_2 = 1 for all 0 <= k < n_components
Read more in the User Guide.
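To make the optimization problem above concrete, the following sketch evaluates the objective 0.5 * ||Y - U V||_2^2 + alpha * ||U||_1 for given matrices stored as nested lists. The function name dictionary_objective and the toy matrices are hypothetical; the code only evaluates the cost and does not perform the learning itself.

```python
def dictionary_objective(Y, U, V, alpha):
    """Evaluate 0.5 * ||Y - U V||^2 + alpha * ||U||_1.

    Y: n_samples x n_features data, U: n_samples x n_components sparse code,
    V: n_components x n_features dictionary (rows assumed to have unit norm).
    """
    n_features = len(V[0])
    residual = 0.0
    for y_row, u_row in zip(Y, U):
        for j in range(n_features):
            approx = sum(u * V[k][j] for k, u in enumerate(u_row))
            residual += (y_row[j] - approx) ** 2
    l1 = sum(abs(u) for row in U for u in row)
    return 0.5 * residual + alpha * l1


Y = [[1.0, 0.0], [0.0, 2.0]]
V = [[1.0, 0.0], [0.0, 1.0]]        # two unit-norm atoms
U_exact = [[1.0, 0.0], [0.0, 2.0]]  # reconstructs Y exactly
U_zero = [[0.0, 0.0], [0.0, 0.0]]   # sparsest possible code

cost_exact = dictionary_objective(Y, U_exact, V, alpha=0.5)
cost_zero = dictionary_objective(Y, U_zero, V, alpha=0.5)
print(cost_exact, cost_zero)
```

Comparing the two costs shows the trade-off that alpha controls between reconstruction error and sparsity of the code.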
Parameters
 n_components : int,
 number of dictionary elements to extract
 alpha : float,
 sparsity controlling parameter
 n_iter : int,
 total number of iterations to perform
 fit_algorithm : {‘lars’, ‘cd’}
 lars: uses the least angle regression method to solve the lasso problem (linear_model.lars_path)
 cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). Lars will be faster if the estimated components are sparse.
 n_jobs : int or None, optional (default=None)
 Number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
 batch_size : int,
 number of samples in each mini-batch
 shuffle : bool,
 whether to shuffle the samples before forming batches
 dict_init : array of shape (n_components, n_features),
 initial value of the dictionary for warm restart scenarios
 transform_algorithm : {‘lasso_lars’, ‘lasso_cd’, ‘lars’, ‘omp’, ‘threshold’}
 Algorithm used to transform the data.
 lars: uses the least angle regression method (linear_model.lars_path)
 lasso_lars: uses Lars to compute the Lasso solution
 lasso_cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). lasso_lars will be faster if the estimated components are sparse.
 omp: uses orthogonal matching pursuit to estimate the sparse solution
 threshold: squashes to zero all coefficients less than alpha from the projection dictionary * X’
 transform_n_nonzero_coefs : int, 0.1 * n_features by default
 Number of nonzero coefficients to target in each column of the solution. This is only used by algorithm=’lars’ and algorithm=’omp’ and is overridden by alpha in the omp case.
 transform_alpha : float, 1. by default
 If algorithm=’lasso_lars’ or algorithm=’lasso_cd’, alpha is the penalty applied to the L1 norm. If algorithm=’threshold’, alpha is the absolute value of the threshold below which coefficients will be squashed to zero. If algorithm=’omp’, alpha is the tolerance parameter: the value of the reconstruction error targeted. In this case, it overrides n_nonzero_coefs.
 verbose : bool, optional (default: False)
 To control the verbosity of the procedure.
 split_sign : bool, False by default
 Whether to split the sparse feature vector into the concatenation of its negative part and its positive part. This can improve the performance of downstream classifiers.
 random_state : int, RandomState instance or None, optional (default=None)
 If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
 positive_code : bool
Whether to enforce positivity when finding the code.
New in version 0.20.
 positive_dict : bool
Whether to enforce positivity when finding the dictionary.
New in version 0.20.
Attributes
components_
: array, [n_components, n_features] components extracted from the data
inner_stats_
: tuple of (A, B) ndarrays Internal sufficient statistics that are kept by the algorithm. Keeping them is useful in online settings, to avoid losing the history of the evolution, but they shouldn’t have any use for the end user. A (n_components, n_components) is the dictionary covariance matrix. B (n_features, n_components) is the data approximation matrix.
n_iter_
: int Number of iterations run.
Notes
References:
J. Mairal, F. Bach, J. Ponce, G. Sapiro, 2009: Online dictionary learning for sparse coding (http://www.di.ens.fr/sierra/pdfs/icml09.pdf)
See also
SparseCoder DictionaryLearning SparsePCA MiniBatchSparsePCA
Full API documentation: MiniBatchDictionaryLearningScikitsLearnNode

class
mdp.nodes.
TfidfVectorizerScikitsLearnNode
¶ Convert a collection of raw documents to a matrix of TF-IDF features. This node has been automatically generated by wrapping the sklearn.feature_extraction.text.TfidfVectorizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Equivalent to CountVectorizer followed by TfidfTransformer.
Read more in the User Guide.
Parameters
 input : string {‘filename’, ‘file’, ‘content’}
If ‘filename’, the sequence passed as an argument to fit is expected to be a list of filenames that need reading to fetch the raw content to analyze.
If ‘file’, the sequence items must have a ‘read’ method (filelike object) that is called to fetch the bytes in memory.
Otherwise the input is expected to be a sequence of items, of type string or bytes, that are analyzed directly.
 encoding : string, ‘utf-8’ by default.
 If bytes or files are given to analyze, this encoding is used to decode.
 decode_error : {‘strict’, ‘ignore’, ‘replace’} (default=’strict’)
 Instruction on what to do if a byte sequence is given to analyze that contains characters not of the given encoding. By default, it is ‘strict’, meaning that a UnicodeDecodeError will be raised. Other values are ‘ignore’ and ‘replace’.
 strip_accents : {‘ascii’, ‘unicode’, None} (default=None)
Remove accents and perform other character normalization during the preprocessing step. ‘ascii’ is a fast method that only works on characters that have a direct ASCII mapping. ‘unicode’ is a slightly slower method that works on any characters. None (default) does nothing.
Both ‘ascii’ and ‘unicode’ use NFKD normalization from unicodedata.normalize().
 lowercase : boolean (default=True)
 Convert all characters to lowercase before tokenizing.
 preprocessor : callable or None (default=None)
 Override the preprocessing (string transformation) stage while preserving the tokenizing and n-grams generation steps.
 tokenizer : callable or None (default=None)
 Override the string tokenization step while preserving the preprocessing and n-grams generation steps. Only applies if analyzer == 'word'.
 analyzer : string, {‘word’, ‘char’, ‘char_wb’} or callable
Whether the feature should be made of word or character n-grams. Option ‘char_wb’ creates character n-grams only from text inside word boundaries; n-grams at the edges of words are padded with space.
If a callable is passed it is used to extract the sequence of features out of the raw, unprocessed input.
 stop_words : string {‘english’}, list, or None (default=None)
If a string, it is passed to _check_stop_list and the appropriate stop list is returned. ‘english’ is currently the only supported string value. There are several known issues with ‘english’ and you should consider an alternative (see stop_words).
If a list, that list is assumed to contain stop words, all of which will be removed from the resulting tokens. Only applies if analyzer == 'word'.
If None, no stop words will be used. max_df can be set to a value in the range [0.7, 1.0) to automatically detect and filter stop words based on intra corpus document frequency of terms.
 token_pattern : string
 Regular expression denoting what constitutes a “token”, only used if analyzer == 'word'. The default regexp selects tokens of 2 or more alphanumeric characters (punctuation is completely ignored and always treated as a token separator).
 ngram_range : tuple (min_n, max_n) (default=(1, 1))
 The lower and upper boundary of the range of n-values for different n-grams to be extracted. All values of n such that min_n <= n <= max_n will be used.
 max_df : float in range [0.0, 1.0] or int (default=1.0)
 When building the vocabulary ignore terms that have a document frequency strictly higher than the given threshold (corpus-specific stop words). If float, the parameter represents a proportion of documents; if integer, absolute counts. This parameter is ignored if vocabulary is not None.
 min_df : float in range [0.0, 1.0] or int (default=1)
 When building the vocabulary ignore terms that have a document frequency strictly lower than the given threshold. This value is also called cut-off in the literature. If float, the parameter represents a proportion of documents; if integer, absolute counts. This parameter is ignored if vocabulary is not None.
 max_features : int or None (default=None)
If not None, build a vocabulary that only considers the top max_features ordered by term frequency across the corpus.
This parameter is ignored if vocabulary is not None.
 vocabulary : Mapping or iterable, optional (default=None)
 Either a Mapping (e.g., a dict) where keys are terms and values are indices in the feature matrix, or an iterable over terms. If not given, a vocabulary is determined from the input documents.
 binary : boolean (default=False)
 If True, all non-zero term counts are set to 1. This does not mean outputs will have only 0/1 values, only that the tf term in tf-idf is binary. (Set idf and normalization to False to get 0/1 outputs.)
 dtype : type, optional (default=float64)
 Type of the matrix returned by fit_transform() or transform().
 norm : ‘l1’, ‘l2’ or None, optional (default=‘l2’)
Each output row will have unit norm, either:
 ‘l2’: Sum of squares of vector elements is 1. The cosine similarity between two vectors is their dot product when l2 norm has been applied.
 ‘l1’: Sum of absolute values of vector elements is 1.
 See preprocessing.normalize()
 use_idf : boolean (default=True)
 Enable inverse-document-frequency reweighting.
 smooth_idf : boolean (default=True)
 Smooth idf weights by adding one to document frequencies, as if an extra document was seen containing every term in the collection exactly once. Prevents zero divisions.
 sublinear_tf : boolean (default=False)
 Apply sublinear tf scaling, i.e. replace tf with 1 + log(tf).
Attributes
vocabulary_
: dict A mapping of terms to feature indices.
idf_
: array, shape (n_features) The inverse document frequency (IDF) vector; only defined if use_idf is True.
stop_words_
: set Terms that were ignored because they either:
 occurred in too many documents (max_df)
 occurred in too few documents (min_df)
 were cut off by feature selection (max_features).
This is only available if no vocabulary was given.
Examples
>>> from sklearn.feature_extraction.text import TfidfVectorizer
>>> corpus = [
...     'This is the first document.',
...     'This document is the second document.',
...     'And this is the third one.',
...     'Is this the first document?',
... ]
>>> vectorizer = TfidfVectorizer()
>>> X = vectorizer.fit_transform(corpus)
>>> print(vectorizer.get_feature_names())
['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']
>>> print(X.shape)
(4, 9)
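To make the interplay of smooth_idf, sublinear_tf and the l2 norm concrete, the following sketch recomputes tf-idf weights in plain Python. It is not the wrapped implementation. The whitespace tokenizer and the helper name tfidf_rows are assumptions made for illustration, and the smoothed idf formula ln((1 + n) / (1 + df)) + 1 is our reading of the scikit-learn user guide.

```python
import math
from collections import Counter


def tfidf_rows(docs, smooth_idf=True, sublinear_tf=False):
    """Return (vocabulary, rows); each row is an l2-normalised tf-idf vector.

    Hypothetical helper: tokens are obtained with a plain whitespace split,
    which is simpler than the default token pattern described above.
    """
    tokenised = [doc.lower().split() for doc in docs]
    vocabulary = sorted({tok for toks in tokenised for tok in toks})
    n_docs = len(docs)

    # Document frequency: in how many documents does each term occur?
    df = Counter(tok for toks in tokenised for tok in set(toks))

    # smooth_idf adds one to numerator and denominator, as if one extra
    # document containing every term had been seen (assumed formula).
    extra = 1 if smooth_idf else 0
    idf = {t: math.log((n_docs + extra) / (df[t] + extra)) + 1.0
           for t in vocabulary}

    rows = []
    for toks in tokenised:
        counts = Counter(toks)
        row = []
        for term in vocabulary:
            tf = counts[term]
            if sublinear_tf and tf > 0:
                tf = 1.0 + math.log(tf)  # replace tf with 1 + log(tf)
            row.append(tf * idf[term])
        norm = math.sqrt(sum(v * v for v in row))
        rows.append([v / norm for v in row] if norm > 0 else row)
    return vocabulary, rows


vocabulary, rows = tfidf_rows(["a b", "a c"])
print(vocabulary)
print(rows[0])
```

In this toy corpus the term "a" occurs in every document, so its weight in each row is lower than that of the rarer terms "b" and "c".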
See also
 CountVectorizer : Transforms text into a sparse matrix of n-gram counts.
 TfidfTransformer : Performs the TF-IDF transformation from a provided matrix of counts.
Notes
The stop_words_ attribute can get large and increase the model size when pickling. This attribute is provided only for introspection and can be safely removed using delattr or set to None before pickling.
Full API documentation: TfidfVectorizerScikitsLearnNode

class
mdp.nodes.
KBinsDiscretizerScikitsLearnNode
¶ Bin continuous data into intervals.
This node has been automatically generated by wrapping the sklearn.preprocessing._discretization.KBinsDiscretizer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Read more in the User Guide.
Parameters
 n_bins : int or array-like, shape (n_features,) (default=5)
 The number of bins to produce. Raises ValueError if n_bins < 2.
 encode : {‘onehot’, ‘onehot-dense’, ‘ordinal’}, (default=‘onehot’)
Method used to encode the transformed result.
 onehot
 Encode the transformed result with one-hot encoding and return a sparse matrix. Ignored features are always stacked to the right.
 onehot-dense
 Encode the transformed result with one-hot encoding and return a dense array. Ignored features are always stacked to the right.
 ordinal
 Return the bin identifier encoded as an integer value.
 strategy : {‘uniform’, ‘quantile’, ‘kmeans’}, (default=’quantile’)
Strategy used to define the widths of the bins.
 uniform
 All bins in each feature have identical widths.
 quantile
 All bins in each feature have the same number of points.
 kmeans
 Values in each bin have the same nearest center of a 1D k-means cluster.
Attributes
n_bins_
: int array, shape (n_features,) Number of bins per feature. Bins whose widths are too small (i.e., <= 1e-8) are removed with a warning.
bin_edges_
: array of arrays, shape (n_features, ) The edges of each bin. Contains arrays of varying shapes (n_bins_, ). Ignored features will have empty arrays.
Examples
>>> from sklearn.preprocessing import KBinsDiscretizer
>>> X = [[-2, 1, -4,   -1],
...      [-1, 2, -3, -0.5],
...      [ 0, 3, -2,  0.5],
...      [ 1, 4, -1,    2]]
>>> est = KBinsDiscretizer(n_bins=3, encode='ordinal', strategy='uniform')
>>> est.fit(X)
KBinsDiscretizer(...)
>>> Xt = est.transform(X)
>>> Xt
array([[ 0., 0., 0., 0.],
       [ 1., 1., 1., 0.],
       [ 2., 2., 2., 1.],
       [ 2., 2., 2., 2.]])
Sometimes it may be useful to convert the data back into the original feature space. The inverse_transform function converts the binned data into the original feature space. Each value will be equal to the mean of the two bin edges.
>>> est.bin_edges_[0]
array([-2., -1.,  0.,  1.])
>>> est.inverse_transform(Xt)
array([[-1.5,  1.5, -3.5, -0.5],
       [-0.5,  2.5, -2.5, -0.5],
       [ 0.5,  3.5, -1.5,  0.5],
       [ 0.5,  3.5, -1.5,  1.5]])
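The ‘uniform’ strategy with ‘ordinal’ encoding, and the bin-centre rule used when mapping back, can be sketched for a single feature in plain Python. The function names are hypothetical and the tie-breaking at bin edges is an assumption; the wrapped class may treat values that fall exactly on an edge differently.

```python
import bisect


def uniform_bin_edges(column, n_bins):
    """Equal-width edges spanning the observed range of one feature."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def to_ordinal(value, edges):
    """Bin index of value.

    Only the interior edges are consulted, which is equivalent to
    extending the first and last edge to -inf and +inf.
    """
    return bisect.bisect_right(edges[1:-1], value)


def bin_centre(index, edges):
    """Mean of the two edges of a bin, used to map a bin index back."""
    return (edges[index] + edges[index + 1]) / 2


column = [-2, -1, 0, 1]
edges = uniform_bin_edges(column, n_bins=3)
codes = [to_ordinal(v, edges) for v in column]
print(edges)
print(codes)
print([bin_centre(c, edges) for c in codes])
```

Because only the interior edges decide the bin, a value far outside the training range still lands in the first or last bin rather than raising an error.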
Notes
In bin edges for feature i, the first and last values are used only for inverse_transform. During transform, bin edges are extended to:
np.concatenate([-np.inf, bin_edges_[i][1:-1], np.inf])
You can combine KBinsDiscretizer with sklearn.compose.ColumnTransformer if you only want to preprocess part of the features.
KBinsDiscretizer might produce constant features (e.g., when encode = 'onehot' and certain bins do not contain any data). These features can be removed with feature selection algorithms (e.g., sklearn.feature_selection.VarianceThreshold).
See also
 sklearn.preprocessing.Binarizer : class used to bin values as 0 or 1 based on a parameter threshold.
Full API documentation: KBinsDiscretizerScikitsLearnNode

class
mdp.nodes.
IncrementalPCAScikitsLearnNode
¶ Incremental principal components analysis (IPCA).
This node has been automatically generated by wrapping the sklearn.decomposition.incremental_pca.IncrementalPCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Linear dimensionality reduction using Singular Value Decomposition of centered data, keeping only the most significant singular vectors to project the data to a lower dimensional space.
Depending on the size of the input data, this algorithm can be much more memory efficient than a PCA.
This algorithm has constant memory complexity, on the order of batch_size, enabling use of np.memmap files without loading the entire file into memory.
The computational overhead of each SVD is O(batch_size * n_features ** 2), but only 2 * batch_size samples remain in memory at a time. There will be n_samples / batch_size SVD computations to get the principal components, versus 1 large SVD of complexity O(n_samples * n_features ** 2) for PCA.
Read more in the User Guide.
Parameters
 n_components : int or None, (default=None)
 Number of components to keep. If n_components is None, then n_components is set to min(n_samples, n_features).
 whiten : bool, optional
When True (False by default) the components_ vectors are divided by n_samples times components_ to ensure uncorrelated outputs with unit component-wise variances.
Whitening will remove some information from the transformed signal (the relative variance scales of the components) but can sometimes improve the predictive accuracy of the downstream estimators by making data respect some hard-wired assumptions.
 copy : bool, (default=True)
 If False, X will be overwritten. copy=False can be used to save memory but is unsafe for general use.
 batch_size : int or None, (default=None)
 The number of samples to use for each batch. Only used when calling fit. If batch_size is None, then batch_size is inferred from the data and set to 5 * n_features, to provide a balance between approximation accuracy and memory consumption.
Attributes
components_
: array, shape (n_components, n_features) Components with maximum variance.
explained_variance_
: array, shape (n_components,) Variance explained by each of the selected components.
explained_variance_ratio_
: array, shape (n_components,) Percentage of variance explained by each of the selected components. If all components are stored, the sum of explained variances is equal to 1.0.
singular_values_
: array, shape (n_components,) The singular values corresponding to each of the selected components.
The singular values are equal to the 2-norms of the n_components variables in the lower-dimensional space.
mean_
: array, shape (n_features,) Per-feature empirical mean, aggregate over calls to partial_fit.
var_
: array, shape (n_features,) Per-feature empirical variance, aggregate over calls to partial_fit.
noise_variance_
: float The estimated noise covariance following the Probabilistic PCA model from Tipping and Bishop 1999. See “Pattern Recognition and Machine Learning” by C. Bishop, 12.2.1 p. 574 or http://www.miketipping.com/papers/met-mppca.pdf.
n_components_
: int The estimated number of components. Relevant when n_components=None.
n_samples_seen_
: int The number of samples processed by the estimator. Will be reset on new calls to fit, but increments across partial_fit calls.
Examples
>>> from sklearn.datasets import load_digits
>>> from sklearn.decomposition import IncrementalPCA
>>> X, _ = load_digits(return_X_y=True)
>>> transformer = IncrementalPCA(n_components=7, batch_size=200)
>>> # either partially fit on smaller batches of data
>>> transformer.partial_fit(X[:100, :])
IncrementalPCA(batch_size=200, copy=True, n_components=7, whiten=False)
>>> # or let the fit function itself divide the data into batches
>>> X_transformed = transformer.fit_transform(X)
>>> X_transformed.shape
(1797, 7)
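The mean_, var_ and n_samples_seen_ attributes are described as aggregating over calls to partial_fit. One standard way to maintain such statistics without revisiting earlier batches is the pairwise update below, shown for a single feature. The class RunningMoments is a hypothetical illustration, not the wrapped code, and it leaves out the incremental SVD update entirely.

```python
class RunningMoments:
    """Hypothetical helper: running mean and variance of one feature."""

    def __init__(self):
        self.n_samples_seen = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def partial_fit(self, batch):
        """Fold one batch of values into the running statistics."""
        n_new = len(batch)
        if n_new == 0:
            return self
        batch_mean = sum(batch) / n_new
        batch_m2 = sum((x - batch_mean) ** 2 for x in batch)

        delta = batch_mean - self.mean
        n_total = self.n_samples_seen + n_new
        # Combine the two sums of squares, correcting for the mean shift.
        self._m2 += batch_m2 + delta ** 2 * self.n_samples_seen * n_new / n_total
        self.mean += delta * n_new / n_total
        self.n_samples_seen = n_total
        return self

    @property
    def var(self):
        """Population variance of everything seen so far."""
        return self._m2 / self.n_samples_seen


moments = RunningMoments()
moments.partial_fit([1, 2, 3]).partial_fit([4, 5, 6, 7])
print(moments.n_samples_seen, moments.mean, moments.var)
```

Only the three running quantities are stored between calls, which is why the memory footprint does not grow with the number of batches.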
Notes
Implements the incremental PCA model from:
D. Ross, J. Lim, R. Lin, M. Yang, Incremental Learning for Robust Visual Tracking, International Journal of Computer Vision, Volume 77, Issue 1-3, pp. 125-141, May 2008. See http://www.cs.toronto.edu/~dross/ivt/RossLimLinYang_ijcv.pdf
This model is an extension of the Sequential Karhunen-Loeve Transform from:
A. Levy and M. Lindenbaum, Sequential Karhunen-Loeve Basis Extraction and its Application to Images, IEEE Transactions on Image Processing, Volume 9, Number 8, pp. 1371-1374, August 2000. See http://www.cs.technion.ac.il/~mic/doc/skl-ip.pdf
We have specifically abstained from an optimization used by authors of both papers, a QR decomposition used in specific situations to reduce the algorithmic complexity of the SVD. The source for this technique is Matrix Computations, Third Edition, G. Golub and C. Van Loan, Chapter 5, Section 5.4.4, pp. 252-253. This technique has been omitted because it is advantageous only when decomposing a matrix with n_samples (rows) >= 5/3 * n_features (columns), and hurts the readability of the implemented algorithm. This would be a good opportunity for future optimization, if it is deemed necessary.
References
 D. Ross, J. Lim, R. Lin, M. Yang. Incremental Learning for Robust Visual Tracking, International Journal of Computer Vision, Volume 77, Issue 1-3, pp. 125-141, May 2008.
 G. Golub and C. Van Loan. Matrix Computations, Third Edition, Chapter 5, Section 5.4.4, pp. 252-253.
See also
PCA KernelPCA SparsePCA TruncatedSVD
Full API documentation: IncrementalPCAScikitsLearnNode

class
mdp.nodes.
MiniBatchSparsePCAScikitsLearnNode
¶ Mini-batch Sparse Principal Components Analysis.
This node has been automatically generated by wrapping the sklearn.decomposition.sparse_pca.MiniBatchSparsePCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Finds the set of sparse components that can optimally reconstruct the data. The amount of sparseness is controllable by the coefficient of the L1 penalty, given by the parameter alpha.
Read more in the User Guide.
Parameters
 n_components : int,
 number of sparse atoms to extract
 alpha : int,
 Sparsity controlling parameter. Higher values lead to sparser components.
 ridge_alpha : float,
 Amount of ridge shrinkage to apply in order to improve conditioning when calling the transform method.
 n_iter : int,
 number of iterations to perform for each mini batch
 callback : callable or None, optional (default: None)
 callable that gets invoked every five iterations
 batch_size : int,
 the number of features to take in each mini batch
 verbose : int
 Controls the verbosity; the higher, the more messages. Defaults to 0.
 shuffle : boolean,
 whether to shuffle the data before splitting it in batches
 n_jobs : int or None, optional (default=None)
 Number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
 method : {‘lars’, ‘cd’}
 lars: uses the least angle regression method to solve the lasso problem (linear_model.lars_path).
 cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). Lars will be faster if the estimated components are sparse.
 random_state : int, RandomState instance or None, optional (default=None)
 If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
 normalize_components : boolean, optional (default=False)
 If False, use a version of Sparse PCA without components normalization and without data centering. This is likely a bug and even though it’s the default for backward compatibility, this should not be used.
 If True, use a version of Sparse PCA with components normalization and data centering.
New in version 0.20.
Deprecated since version 0.22: normalize_components was added and set to False for backward compatibility. It would be set to True from 0.22 onwards.
Attributes
components_
: array, [n_components, n_features] Sparse components extracted from the data.
n_iter_
: int Number of iterations run.
mean_
: array, shape (n_features,) Per-feature empirical mean, estimated from the training set. Equal to X.mean(axis=0).
Examples
>>> import numpy as np
>>> from sklearn.datasets import make_friedman1
>>> from sklearn.decomposition import MiniBatchSparsePCA
>>> X, _ = make_friedman1(n_samples=200, n_features=30, random_state=0)
>>> transformer = MiniBatchSparsePCA(n_components=5,
...                                  batch_size=50,
...                                  normalize_components=True,
...                                  random_state=0)
>>> transformer.fit(X)
MiniBatchSparsePCA(...)
>>> X_transformed = transformer.transform(X)
>>> X_transformed.shape
(200, 5)
>>> # most values in the ``components_`` are zero (sparsity)
>>> np.mean(transformer.components_ == 0)
0.94
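Why a larger alpha yields sparser components can be illustrated with soft-thresholding, the shrinkage step that an L1 penalty induces in coordinate-wise lasso solvers. The sketch below is a simplified illustration on a hypothetical list of loadings, not the mini-batch algorithm itself.

```python
def soft_threshold(value, alpha):
    """Shrink value towards zero by alpha.

    Anything within [-alpha, alpha] becomes exactly zero, which is how
    an L1 penalty produces sparse solutions.
    """
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0


def zero_fraction(values, alpha):
    """Fraction of entries that are exactly zero after shrinkage."""
    shrunk = [soft_threshold(v, alpha) for v in values]
    return sum(1 for v in shrunk if v == 0.0) / len(shrunk)


# Hypothetical unpenalised loadings of one component.
loadings = [0.05, -0.4, 1.2, -0.02, 0.3, -0.9]
for alpha in (0.0, 0.1, 0.5, 1.0):
    print(alpha, zero_fraction(loadings, alpha))
```

As alpha grows, more loadings fall inside the threshold band and are set to zero, at the cost of shrinking the surviving ones.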
See also
PCA SparsePCA DictionaryLearning
Full API documentation: MiniBatchSparsePCAScikitsLearnNode

class
mdp.nodes.
FactorAnalysisScikitsLearnNode
¶ Factor Analysis (FA).
This node has been automatically generated by wrapping the sklearn.decomposition.factor_analysis.FactorAnalysis class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
A simple linear generative model with Gaussian latent variables.
The observations are assumed to be caused by a linear transformation of lower dimensional latent factors and added Gaussian noise. Without loss of generality the factors are distributed according to a Gaussian with zero mean and unit covariance. The noise is also zero mean and has an arbitrary diagonal covariance matrix.
If we restricted the model further, by assuming that the Gaussian noise is isotropic (all diagonal entries are the same), we would obtain PPCA.
FactorAnalysis performs a maximum likelihood estimate of the so-called loading matrix, the transformation of the latent variables to the observed ones, using expectation-maximization (EM).
Read more in the User Guide.
Parameters
 n_components : int | None
 Dimensionality of latent space, the number of components of X that are obtained after transform. If None, n_components is set to the number of features.
 tol : float
 Stopping tolerance for EM algorithm.
 copy : bool
 Whether to make a copy of X. If False, the input X gets overwritten during fitting.
 max_iter : int
 Maximum number of iterations.
 noise_variance_init : None | array, shape=(n_features,)
 The initial guess of the noise variance for each feature. If None, it defaults to np.ones(n_features).
 svd_method : {‘lapack’, ‘randomized’}
 Which SVD method to use. If ‘lapack’ use standard SVD from scipy.linalg, if ‘randomized’ use fast randomized_svd function. Defaults to ‘randomized’. For most applications ‘randomized’ will be sufficiently precise while providing significant speed gains. Accuracy can also be improved by setting higher values for iterated_power. If this is not sufficient, for maximum precision you should choose ‘lapack’.
 iterated_power : int, optional
 Number of iterations for the power method. 3 by default. Only used if svd_method equals ‘randomized’.
 random_state : int, RandomState instance or None, optional (default=0)
 If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random. Only used when svd_method equals ‘randomized’.
Attributes
components_
: array, [n_components, n_features] Components with maximum variance.
loglike_
: list, [n_iterations] The log likelihood at each iteration.
noise_variance_
: array, shape=(n_features,) The estimated noise variance for each feature.
n_iter_
: int Number of iterations run.
Examples
>>> from sklearn.datasets import load_digits
>>> from sklearn.decomposition import FactorAnalysis
>>> X, _ = load_digits(return_X_y=True)
>>> transformer = FactorAnalysis(n_components=7, random_state=0)
>>> X_transformed = transformer.fit_transform(X)
>>> X_transformed.shape
(1797, 7)
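Under the generative model described above (unit-covariance factors, a loading matrix and diagonal noise), the covariance implied for the observations is the loading part plus the noise variances on the diagonal. The sketch below computes it in plain Python from arrays shaped like components_ and noise_variance_. The function name model_covariance and the numbers are hypothetical.

```python
def model_covariance(loadings, noise_variance):
    """Covariance implied by a factor model.

    loadings has shape (n_components, n_features), noise_variance has
    shape (n_features,). The result is W^T W + diag(noise_variance).
    """
    n_features = len(noise_variance)
    cov = [[sum(row[i] * row[j] for row in loadings)
            for j in range(n_features)]
           for i in range(n_features)]
    for i in range(n_features):
        cov[i][i] += noise_variance[i]  # noise only affects the diagonal
    return cov


# One latent factor loading on the first two of three features.
loadings = [[1.0, 2.0, 0.0]]
arbitrary_noise = [0.5, 0.1, 0.3]   # factor analysis: one value per feature
isotropic_noise = [0.3, 0.3, 0.3]   # PPCA special case: a shared value

print(model_covariance(loadings, arbitrary_noise))
print(model_covariance(loadings, isotropic_noise))
```

The off-diagonal entries are identical in both cases; only the diagonal changes, which is the sense in which PPCA is the more restricted model.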
See also
 PCA: Principal component analysis is also a latent linear variable model which however assumes equal noise variance for each feature. This extra assumption makes probabilistic PCA faster as it can be computed in closed form.
 FastICA: Independent component analysis, a latent variable model with non-Gaussian latent variables.
Full API documentation: FactorAnalysisScikitsLearnNode

class
mdp.nodes.
FunctionTransformerScikitsLearnNode
¶ Constructs a transformer from an arbitrary callable.
This node has been automatically generated by wrapping the sklearn.preprocessing._function_transformer.FunctionTransformer class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
A FunctionTransformer forwards its X (and optionally y) arguments to a user-defined function or function object and returns the result of this function. This is useful for stateless transformations such as taking the log of frequencies, doing custom scaling, etc.
Note: If a lambda is used as the function, then the resulting transformer will not be pickleable.
New in version 0.17.
Read more in the User Guide.
Parameters
 func : callable, optional default=None
 The callable to use for the transformation. This will be passed the same arguments as transform, with args and kwargs forwarded. If func is None, then func will be the identity function.
 inverse_func : callable, optional default=None
 The callable to use for the inverse transformation. This will be passed the same arguments as inverse transform, with args and kwargs forwarded. If inverse_func is None, then inverse_func will be the identity function.
 validate : bool, optional default=True
Indicate that the input X array should be checked before calling func. The possibilities are:
 If False, there is no input validation.
 If True, then X will be converted to a 2-dimensional NumPy array or sparse matrix. If the conversion is not possible an exception is raised.
Deprecated since version 0.20: validate=True as default will be replaced by validate=False in 0.22.
 accept_sparse : boolean, optional
 Indicate that func accepts a sparse matrix as input. If validate is False, this has no effect. Otherwise, if accept_sparse is false, sparse matrix inputs will cause an exception to be raised.
 pass_y : bool, optional default=False
 Indicate that transform should forward the y argument to the inner callable.
 check_inverse : bool, default=True
Whether to check that func followed by inverse_func leads to the original inputs. It can be used for a sanity check, raising a warning when the condition is not fulfilled.
New in version 0.20.
 kw_args : dict, optional
 Dictionary of additional keyword arguments to pass to func.
 inv_kw_args : dict, optional
 Dictionary of additional keyword arguments to pass to inverse_func.
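How func, inverse_func, kw_args and inv_kw_args fit together can be sketched with a small stand-in class. SimpleFunctionTransformer, log_scale and exp_scale are hypothetical names introduced only for this illustration. The class mimics the forwarding behaviour described above and is not the scikit-learn implementation; it performs no input validation.

```python
import math


def _identity(X):
    """Used when no callable is supplied."""
    return X


class SimpleFunctionTransformer:
    """Hypothetical stand-in that forwards X and keyword arguments."""

    def __init__(self, func=None, inverse_func=None,
                 kw_args=None, inv_kw_args=None):
        self.func = func if func is not None else _identity
        self.inverse_func = inverse_func if inverse_func is not None else _identity
        self.kw_args = kw_args or {}
        self.inv_kw_args = inv_kw_args or {}

    def transform(self, X):
        return self.func(X, **self.kw_args)

    def inverse_transform(self, X):
        return self.inverse_func(X, **self.inv_kw_args)

    def check_inverse(self, X, tol=1e-9):
        """True if func followed by inverse_func reproduces X."""
        back = self.inverse_transform(self.transform(X))
        return all(abs(a - b) <= tol
                   for row_a, row_b in zip(X, back)
                   for a, b in zip(row_a, row_b))


# Named module-level functions (rather than lambdas) stay picklable.
def log_scale(X, base):
    return [[math.log(v, base) for v in row] for row in X]


def exp_scale(X, base):
    return [[base ** v for v in row] for row in X]


transformer = SimpleFunctionTransformer(
    func=log_scale, inverse_func=exp_scale,
    kw_args={"base": 10.0}, inv_kw_args={"base": 10.0})
X = [[1.0, 100.0], [10.0, 1000.0]]
print(transformer.transform(X))
print(transformer.check_inverse(X, tol=1e-6))
```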
Full API documentation: FunctionTransformerScikitsLearnNode

class
mdp.nodes.
LassoLarsICScikitsLearnNode
¶ Lasso model fit with Lars using BIC or AIC for model selection.
This node has been automatically generated by wrapping the sklearn.linear_model.least_angle.LassoLarsIC class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The optimization objective for Lasso is:
(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1
AIC is the Akaike information criterion and BIC is the Bayes Information criterion. Such criteria are useful to select the value of the regularization parameter by making a trade-off between the goodness of fit and the complexity of the model. A good model should explain the data well while being simple.
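The Lasso objective stated above can be evaluated directly, which makes the roles of the data-fit term and the L1 penalty visible. The sketch below is plain Python; lasso_objective is a hypothetical helper and the numbers are made up for illustration.

```python
def lasso_objective(X, y, w, alpha):
    """(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1."""
    n_samples = len(y)
    residuals = [yi - sum(xij * wj for xij, wj in zip(row, w))
                 for row, yi in zip(X, y)]
    squared_error = sum(r * r for r in residuals)
    l1_norm = sum(abs(wj) for wj in w)
    return squared_error / (2 * n_samples) + alpha * l1_norm


X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 2.0]

# A perfect fit pays only the penalty; the all-zero vector pays only
# the data-fit term.
print(lasso_objective(X, y, w=[1.0, 2.0], alpha=0.5))
print(lasso_objective(X, y, w=[0.0, 0.0], alpha=0.5))
```

With alpha = 0.5 the all-zero coefficient vector scores lower than the perfect fit here, which shows how the penalty can favour a sparser model.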
Read more in the User Guide.
Parameters
 criterion : ‘bic’ | ‘aic’
 The type of criterion to use.
 fit_intercept : boolean
 whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
 verbose : boolean or integer, optional
 Sets the verbosity amount
 normalize : boolean, optional, default True
 This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.
 precompute : True | False | ‘auto’ | array-like
 Whether to use a precomputed Gram matrix to speed up calculations. If set to 'auto' let us decide. The Gram matrix can also be passed as argument.
 max_iter : integer, optional
 Maximum number of iterations to perform. Can be used for early stopping.
 eps : float, optional
 The machine-precision regularization in the computation of the Cholesky diagonal factors. Increase this for very ill-conditioned systems. Unlike the tol parameter in some iterative optimization-based algorithms, this parameter does not control the tolerance of the optimization.
 copy_X : boolean, optional, default True
 If True, X will be copied; else, it may be overwritten.
 positive : boolean (default=False)
 Restrict coefficients to be >= 0. Be aware that you might want to remove fit_intercept which is set True by default. Under the positive restriction the model coefficients do not converge to the ordinary-least-squares solution for small values of alpha. Only coefficients up to the smallest alpha value (alphas_[alphas_ > 0.].min() when fit_path=True) reached by the stepwise Lars-Lasso algorithm are typically in congruence with the solution of the coordinate descent Lasso estimator. As a consequence using LassoLarsIC only makes sense for problems where a sparse solution is expected and/or reached.
Attributes
coef_
: array, shape (n_features,) parameter vector (w in the formulation formula)
intercept_
: float independent term in decision function.
alpha_
: float the alpha parameter chosen by the information criterion
n_iter_
: int number of iterations run by lars_path to find the grid of alphas.
criterion_
: array, shape (n_alphas,) The value of the information criteria (‘aic’, ‘bic’) across all alphas. The alpha which has the smallest information criterion is chosen. This value is larger by a factor of n_samples compared to Eqns. 2.15 and 2.16 in (Zou et al, 2007).
Examples
>>> from sklearn import linear_model
>>> reg = linear_model.LassoLarsIC(criterion='bic')
>>> reg.fit([[-1, 1], [0, 0], [1, 1]], [-1.1111, 0, -1.1111])
LassoLarsIC(copy_X=True, criterion='bic', eps=..., fit_intercept=True,
      max_iter=500, normalize=True, positive=False, precompute='auto',
      verbose=False)
>>> print(reg.coef_)
[ 0.  -1.11...]
Notes
The estimation of the number of degrees of freedom is given by:
“On the degrees of freedom of the lasso”, Hui Zou, Trevor Hastie, and Robert Tibshirani, Ann. Statist. Volume 35, Number 5 (2007), 2173-2192.
https://en.wikipedia.org/wiki/Akaike_information_criterion
https://en.wikipedia.org/wiki/Bayesian_information_criterion
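The trade-off between goodness of fit and model complexity can be sketched with the textbook forms of AIC and BIC for Gaussian noise of known variance, where minus twice the log-likelihood equals the residual sum of squares divided by the noise variance, up to a constant. The sketch makes several assumptions: the degrees of freedom are supplied by the caller (for example as the number of non-zero coefficients), the candidate values are invented, and the wrapped class may estimate the fit term differently.

```python
import math


def information_criterion(rss, n_samples, dof, noise_variance, criterion="bic"):
    """Fit term plus complexity penalty (textbook Gaussian form)."""
    if criterion == "aic":
        penalty = 2.0
    elif criterion == "bic":
        penalty = math.log(n_samples)
    else:
        raise ValueError("criterion must be 'aic' or 'bic'")
    return rss / noise_variance + penalty * dof


def select_alpha(candidates, n_samples, noise_variance, criterion="bic"):
    """Return the alpha whose (alpha, rss, dof) triple scores lowest."""
    best = min(
        candidates,
        key=lambda c: information_criterion(
            c[1], n_samples, c[2], noise_variance, criterion),
    )
    return best[0]


# Hypothetical path: smaller alpha fits better but uses more coefficients.
candidates = [(1.0, 40.0, 1), (0.1, 30.0, 3), (0.05, 25.0, 5)]
print(select_alpha(candidates, n_samples=100, noise_variance=1.0, criterion="aic"))
print(select_alpha(candidates, n_samples=100, noise_variance=1.0, criterion="bic"))
```

With 100 samples the BIC penalty per degree of freedom (about 4.6) exceeds the AIC penalty of 2, so BIC settles on the sparser candidate.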
See also
lars_path, LassoLars, LassoLarsCV
Full API documentation: LassoLarsICScikitsLearnNode

class
mdp.nodes.
RFEScikitsLearnNode
¶ Feature ranking with recursive feature elimination.
This node has been automatically generated by wrapping the sklearn.feature_selection.rfe.RFE class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Given an external estimator that assigns weights to features (e.g., the coefficients of a linear model), the goal of recursive feature elimination (RFE) is to select features by recursively considering smaller and smaller sets of features. First, the estimator is trained on the initial set of features and the importance of each feature is obtained either through a coef_ attribute or through a feature_importances_ attribute. Then, the least important features are pruned from the current set of features. That procedure is recursively repeated on the pruned set until the desired number of features to select is eventually reached.
Read more in the User Guide.
Parameters
 estimator : object
 A supervised learning estimator with a fit method that provides information about feature importance either through a coef_ attribute or through a feature_importances_ attribute.
 n_features_to_select : int or None (default=None)
 The number of features to select. If None, half of the features are selected.
 step : int or float, optional (default=1)
 If greater than or equal to 1, then step corresponds to the (integer) number of features to remove at each iteration. If within (0.0, 1.0), then step corresponds to the percentage (rounded down) of features to remove at each iteration.
 verbose : int, (default=0)
 Controls verbosity of output.
Attributes
n_features_
: int The number of selected features.
support_
: array of shape [n_features] The mask of selected features.
ranking_
: array of shape [n_features] The feature ranking, such that ranking_[i] corresponds to the ranking position of the i-th feature. Selected (i.e., estimated best) features are assigned rank 1.
estimator_
: object The external estimator fit on the reduced dataset.
Examples
The following example shows how to retrieve the 5 informative features in the Friedman #1 dataset.
>>> from sklearn.datasets import make_friedman1
>>> from sklearn.feature_selection import RFE
>>> from sklearn.svm import SVR
>>> X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
>>> estimator = SVR(kernel="linear")
>>> selector = RFE(estimator, 5, step=1)
>>> selector = selector.fit(X, y)
>>> selector.support_
array([ True,  True,  True,  True,  True, False, False, False, False,
       False])
>>> selector.ranking_
array([1, 1, 1, 1, 1, 6, 4, 3, 2, 5])
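The elimination loop itself is short. The sketch below reproduces the idea in plain Python with a caller-supplied importance function standing in for refitting an estimator. The function recursive_feature_elimination and the fixed weights are hypothetical, and only an integer step is handled.

```python
def recursive_feature_elimination(n_features, importance_fn,
                                  n_features_to_select=None, step=1):
    """Return (support, ranking) after pruning the weakest features.

    importance_fn receives the indices of the features still active and
    returns one importance per index, playing the role of refitting an
    estimator and reading its coefficients.
    """
    if n_features_to_select is None:
        n_features_to_select = n_features // 2  # keep half by default

    support = [True] * n_features
    ranking = [1] * n_features
    while sum(support) > n_features_to_select:
        active = [i for i, keep in enumerate(support) if keep]
        importances = importance_fn(active)
        n_remove = min(step, len(active) - n_features_to_select)
        weakest_first = sorted(range(len(active)), key=lambda k: importances[k])
        for k in weakest_first[:n_remove]:
            support[active[k]] = False
        # Every eliminated feature drops one rank per remaining round,
        # so features removed earliest end up with the largest rank.
        for i in range(n_features):
            if not support[i]:
                ranking[i] += 1
    return support, ranking


weights = [5.0, 1.0, 4.0, 2.0, 3.0, 0.5]
support, ranking = recursive_feature_elimination(
    n_features=6,
    importance_fn=lambda active: [weights[i] for i in active],
    n_features_to_select=3,
)
print(support)
print(ranking)
```

Selected features keep rank 1, and the feature pruned last receives rank 2.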
See also
 RFECV : Recursive feature elimination with built-in cross-validated selection of the best number of features
References
[1] Guyon, I., Weston, J., Barnhill, S., & Vapnik, V., “Gene selection for cancer classification using support vector machines”, Mach. Learn., 46(1-3), 389–422, 2002.
Full API documentation: RFEScikitsLearnNode

class
mdp.nodes.
PCAScikitsLearnNode
¶ Principal component analysis (PCA).
This node has been automatically generated by wrapping the sklearn.decomposition.pca.PCA class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Linear dimensionality reduction using Singular Value Decomposition of the data to project it to a lower dimensional space.
It uses the LAPACK implementation of the full SVD or a randomized truncated SVD by the method of Halko et al. 2009, depending on the shape of the input data and the number of components to extract.
It can also use the scipy.sparse.linalg ARPACK implementation of the truncated SVD.
Notice that this class does not support sparse input. See TruncatedSVD for an alternative with sparse data.
Read more in the User Guide.
Parameters
 n_components : int, float, None or string
Number of components to keep. If n_components is not set all components are kept:
n_components == min(n_samples, n_features)
If n_components == 'mle' and svd_solver == 'full', Minka’s MLE is used to guess the dimension. Use of n_components == 'mle' will interpret svd_solver == 'auto' as svd_solver == 'full'.
If 0 < n_components < 1 and svd_solver == 'full', select the number of components such that the amount of variance that needs to be explained is greater than the percentage specified by n_components.
If svd_solver == 'arpack', the number of components must be strictly less than the minimum of n_features and n_samples.
Hence, the None case results in:
n_components == min(n_samples, n_features) - 1
 copy : bool (default True)
 If False, data passed to fit are overwritten and running fit(X).transform(X) will not yield the expected results, use fit_transform(X) instead.
 whiten : bool, optional (default False)
When True (False by default) the components_ vectors are multiplied by the square root of n_samples and then divided by the singular values to ensure uncorrelated outputs with unit component-wise variances.
Whitening will remove some information from the transformed signal (the relative variance scales of the components) but can sometimes improve the predictive accuracy of the downstream estimators by making their data respect some hard-wired assumptions.
 svd_solver : string {‘auto’, ‘full’, ‘arpack’, ‘randomized’}
auto :
 the solver is selected by a default policy based on X.shape and
 n_components: if the input data is larger than 500x500 and the
 number of components to extract is lower than 80% of the smallest
 dimension of the data, then the more efficient ‘randomized’
 method is enabled. Otherwise the exact full SVD is computed and
 optionally truncated afterwards.
full :
 run exact full SVD calling the standard LAPACK solver via
 scipy.linalg.svd and select the components by post-processing
arpack :
 run SVD truncated to n_components calling ARPACK solver via
 scipy.sparse.linalg.svds. It requires strictly
 0 < n_components < min(X.shape)
randomized :
 run randomized SVD by the method of Halko et al.
New in version 0.18.0.
 tol : float >= 0, optional (default .0)
Tolerance for singular values computed by svd_solver == ‘arpack’.
New in version 0.18.0.
 iterated_power : int >= 0, or ‘auto’, (default ‘auto’)
Number of iterations for the power method computed by svd_solver == ‘randomized’.
New in version 0.18.0.
 random_state : int, RandomState instance or None, optional (default None)
If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random. Used when svd_solver == ‘arpack’ or ‘randomized’.
New in version 0.18.0.
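When n_components is given as a fraction between 0 and 1, the rule described above amounts to a cumulative sum over the explained variance ratios. The sketch below is a plain-Python reading of that rule; components_for_variance is a hypothetical helper and the ratios are made-up inputs, not output of the wrapped class.

```python
def components_for_variance(explained_variance_ratio, fraction):
    """Smallest number of leading components whose cumulative explained
    variance ratio is strictly greater than fraction."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    cumulative = 0.0
    for k, ratio in enumerate(explained_variance_ratio, start=1):
        cumulative += ratio
        if cumulative > fraction:
            return k
    # Rounding can keep the total marginally below the target.
    return len(explained_variance_ratio)


ratios = [0.5, 0.3, 0.2]  # assumed to be sorted in decreasing order
print(components_for_variance(ratios, 0.45))
print(components_for_variance(ratios, 0.75))
print(components_for_variance(ratios, 0.95))
```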
Attributes
components_
: array, shape (n_components, n_features) Principal axes in feature space, representing the directions of maximum variance in the data. The components are sorted by explained_variance_.
explained_variance_
: array, shape (n_components,) The amount of variance explained by each of the selected components.
Equal to the n_components largest eigenvalues of the covariance matrix of X.
New in version 0.18.
explained_variance_ratio_
: array, shape (n_components,) Percentage of variance explained by each of the selected components.
If n_components is not set then all components are stored and the sum of the ratios is equal to 1.0.
singular_values_
: array, shape (n_components,) The singular values corresponding to each of the selected components.
The singular values are equal to the 2-norms of the n_components variables in the lower-dimensional space.
mean_
: array, shape (n_features,) Per-feature empirical mean, estimated from the training set.
Equal to X.mean(axis=0).
n_components_
: int The estimated number of components. When n_components is set to ‘mle’ or a number between 0 and 1 (with svd_solver == ‘full’) this number is estimated from input data. Otherwise it equals the parameter n_components, or the lesser value of n_features and n_samples if n_components is None.
noise_variance_
: float The estimated noise covariance following the Probabilistic PCA model from Tipping and Bishop 1999. See “Pattern Recognition and Machine Learning” by C. Bishop, 12.2.1 p. 574 or http://www.miketipping.com/papers/met-mppca.pdf. It is required to compute the estimated data covariance and score samples.
Equal to the average of the (min(n_features, n_samples) - n_components) smallest eigenvalues of the covariance matrix of X.
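The relations between explained_variance_, explained_variance_ratio_ and noise_variance_ described above reduce to simple arithmetic on the covariance eigenvalues. The following is a minimal sketch, not the wrapped library's code: the helper name pca_summary and the eigenvalues are hypothetical, the eigenvalues are assumed to be already computed, and returning 0.0 when no eigenvalues are discarded is an assumption.

```python
def pca_summary(eigenvalues, n_components):
    """Derive PCA summary quantities from covariance eigenvalues.

    `eigenvalues` is assumed to hold all min(n_features, n_samples)
    eigenvalues of the covariance matrix of X (hypothetical input).
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    kept = ordered[:n_components]      # the n_components largest eigenvalues
    rest = ordered[n_components:]      # the discarded, smallest eigenvalues
    explained_variance = kept
    explained_variance_ratio = [v / total for v in kept]
    # Average of the discarded eigenvalues; 0.0 if none are discarded
    # (an assumption made for this sketch).
    noise_variance = sum(rest) / len(rest) if rest else 0.0
    return explained_variance, explained_variance_ratio, noise_variance


ev, ratio, noise = pca_summary([4.0, 2.0, 1.0, 1.0], n_components=2)
print(ev)     # [4.0, 2.0]
print(ratio)  # [0.5, 0.25]
print(noise)  # 1.0
```

When all components are kept, the ratios sum to 1.0, which matches the statement for explained_variance_ratio_.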
References
For n_components == ‘mle’, this class uses the method of Minka, T. P. “Automatic choice of dimensionality for PCA”. In NIPS, pp. 598-604.
Implements the probabilistic PCA model from:
Tipping, M. E., and Bishop, C. M. (1999). “Probabilistic principal component analysis”. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 61(3), 611-622, via the score and score_samples methods. See http://www.miketipping.com/papers/met-mppca.pdf
For svd_solver == ‘arpack’, refer to scipy.sparse.linalg.svds.
For svd_solver == ‘randomized’, see:
Halko, N., Martinsson, P. G., and Tropp, J. A. (2011). “Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions”. SIAM review, 53(2), 217-288.
Martinsson, P. G., Rokhlin, V., and Tygert, M. (2011). “A randomized algorithm for the decomposition of matrices”. Applied and Computational Harmonic Analysis, 30(1), 47-68.
Examples
>>> import numpy as np
>>> from sklearn.decomposition import PCA
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> pca = PCA(n_components=2)
>>> pca.fit(X)
PCA(copy=True, iterated_power='auto', n_components=2, random_state=None,
    svd_solver='auto', tol=0.0, whiten=False)
>>> print(pca.explained_variance_ratio_)
[0.9924... 0.0075...]
>>> print(pca.singular_values_)
[6.30061... 0.54980...]

>>> pca = PCA(n_components=2, svd_solver='full')
>>> pca.fit(X)
PCA(copy=True, iterated_power='auto', n_components=2, random_state=None,
    svd_solver='full', tol=0.0, whiten=False)
>>> print(pca.explained_variance_ratio_)
[0.9924... 0.00755...]
>>> print(pca.singular_values_)
[6.30061... 0.54980...]

>>> pca = PCA(n_components=1, svd_solver='arpack')
>>> pca.fit(X)
PCA(copy=True, iterated_power='auto', n_components=1, random_state=None,
    svd_solver='arpack', tol=0.0, whiten=False)
>>> print(pca.explained_variance_ratio_)
[0.99244...]
>>> print(pca.singular_values_)
[6.30061...]
See also
KernelPCA SparsePCA TruncatedSVD IncrementalPCA
Full API documentation: PCAScikitsLearnNode

class
mdp.nodes.
MultiTaskLassoScikitsLearnNode
¶ Multi-task Lasso model trained with L1/L2 mixed-norm as regularizer.
This node has been automatically generated by wrapping the sklearn.linear_model.coordinate_descent.MultiTaskLasso class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The optimization objective for Lasso is:
(1 / (2 * n_samples)) * ||Y - XW||^2_Fro + alpha * ||W||_21
Where:
||W||_21 = \sum_i \sqrt{\sum_j w_{ij}^2}
i.e. the sum of norm of each row.
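To make the objective concrete, here is a minimal, dependency-free sketch that evaluates it for small matrices given as lists of rows. The helper names (l21_norm, matmul, multitask_lasso_objective) and the data are hypothetical, W is taken with shape (n_features, n_tasks) as in the formula, and nothing here reflects how the wrapped solver is implemented.

```python
import math


def l21_norm(W):
    """Mixed L1/L2 norm: sum over rows of each row's Euclidean norm."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)


def matmul(A, B):
    """Plain-Python matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def multitask_lasso_objective(X, Y, W, alpha):
    """(1 / (2 * n_samples)) * ||Y - XW||^2_Fro + alpha * ||W||_21."""
    n_samples = len(X)
    XW = matmul(X, W)
    squared_fro = sum((y - p) ** 2
                      for y_row, p_row in zip(Y, XW)
                      for y, p in zip(y_row, p_row))
    return squared_fro / (2 * n_samples) + alpha * l21_norm(W)


X = [[1.0, 0.0], [0.0, 1.0]]
Y = [[3.0, 4.0], [0.0, 0.0]]
W = [[3.0, 4.0], [0.0, 0.0]]   # shape (n_features, n_tasks), illustrative
objective = multitask_lasso_objective(X, Y, W, alpha=0.1)
print(l21_norm(W))   # 5.0
print(objective)     # 0.5
```

Because the penalty sums row norms, a feature is either used by all tasks or dropped for all of them (its whole row of W becomes zero), which is what distinguishes this norm from an element-wise L1 penalty.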
Read more in the User Guide.
Parameters
 alpha : float, optional
 Constant that multiplies the L1/L2 term. Defaults to 1.0
 fit_intercept : boolean
 whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
 normalize : boolean, optional, default False
 This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.
 copy_X : boolean, optional, default True
 If True, X will be copied; else, it may be overwritten.
 max_iter : int, optional
 The maximum number of iterations.
 tol : float, optional
 The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.
 warm_start : bool, optional
 When set to True, reuse the solution of the previous call to fit as initialization; otherwise, just erase the previous solution. See the Glossary.
 random_state : int, RandomState instance or None, optional, default None
 The seed of the pseudo random number generator that selects a random feature to update. If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random. Used when selection == ‘random’.
 selection : str, default ‘cyclic’
 If set to ‘random’, a random coefficient is updated every iteration rather than looping over features sequentially by default. This (setting to ‘random’) often leads to significantly faster convergence, especially when tol is higher than 1e-4.
Attributes
coef_
: array, shape (n_tasks, n_features) Parameter vector (W in the cost function formula).
Note that coef_ stores the transpose of W, W.T.
intercept_
: array, shape (n_tasks,) Independent term in decision function.
n_iter_
: int Number of iterations run by the coordinate descent solver to reach the specified tolerance.
Examples
>>> from sklearn import linear_model
>>> clf = linear_model.MultiTaskLasso(alpha=0.1)
>>> clf.fit([[0,0], [1, 1], [2, 2]], [[0, 0], [1, 1], [2, 2]])
MultiTaskLasso(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=1000,
        normalize=False, random_state=None, selection='cyclic', tol=0.0001,
        warm_start=False)
>>> print(clf.coef_)
[[0.89393398 0.        ]
 [0.89393398 0.        ]]
>>> print(clf.intercept_)
[0.10606602 0.10606602]
See also
MultiTaskLassoCV : Multi-task L1/L2 Lasso with built-in cross-validation
Lasso
MultiTaskElasticNet
Notes
The algorithm used to fit the model is coordinate descent.
To avoid unnecessary memory duplication the X argument of the fit method should be directly passed as a Fortran-contiguous numpy array.
Full API documentation: MultiTaskLassoScikitsLearnNode

class
mdp.nodes.
RandomizedLogisticRegressionScikitsLearnNode
¶ Randomized Logistic Regression.
This node has been automatically generated by wrapping the sklearn.linear_model.randomized_l1.RandomizedLogisticRegression class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Randomized Logistic Regression works by subsampling the training data and fitting an L1-penalized LogisticRegression model where the penalty of a random subset of coefficients has been scaled. By performing this double randomization several times, the method assigns high scores to features that are repeatedly selected across randomizations. This is known as stability selection. In short, features selected more often are considered good features.
Parameters
 C : float or array-like of shape [n_reg_parameter], optional, default=1
 The regularization parameter C in the LogisticRegression. When C is an array, fit will take each regularization parameter in C one by one for LogisticRegression and store results for each one in all_scores_, where columns and rows represent corresponding reg_parameters and features.
 scaling : float, optional, default=0.5
 The s parameter used to randomly scale the penalty of different features. Should be between 0 and 1.
 sample_fraction : float, optional, default=0.75
 The fraction of samples to be used in each randomized design. Should be between 0 and 1. If 1, all samples are used.
 n_resampling : int, optional, default=200
 Number of randomized models.
 selection_threshold : float, optional, default=0.25
 The score above which features should be selected.
 tol : float, optional, default=1e-3
 Tolerance for stopping criteria of LogisticRegression.
 fit_intercept : boolean, optional, default=True
 whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
 verbose : boolean or integer, optional
 Sets the verbosity amount
 normalize : boolean, optional, default True
 If True, the regressors X will be normalized before regression. This parameter is ignored when fit_intercept is set to False. When the regressors are normalized, note that this makes the hyperparameters learnt more robust and almost independent of the number of samples. The same property is not valid for standardized data. However, if you wish to standardize, please use preprocessing.StandardScaler before calling fit on an estimator with normalize=False.
 random_state : int, RandomState instance or None, optional (default=None)
 If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
 n_jobs : int or None, optional (default=None)
 Number of CPUs to use during the resampling. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
 pre_dispatch : int, or string, optional
Controls the number of jobs that get dispatched during parallel execution. Reducing this number can be useful to avoid an explosion of memory consumption when more jobs get dispatched than CPUs can process. This parameter can be:
 None, in which case all the jobs are immediately created and spawned. Use this for lightweight and fast-running jobs, to avoid delays due to on-demand spawning of the jobs
 An int, giving the exact number of total jobs that are spawned
 A string, giving an expression as a function of n_jobs, as in ‘2*n_jobs’
 memory : None, str or object with the joblib.Memory interface, optional (default=None)
 Used for internal caching. By default, no caching is done. If a string is given, it is the path to the caching directory.
Attributes
scores_
: array, shape = [n_features] Feature scores between 0 and 1.
all_scores_
: array, shape = [n_features, n_reg_parameter] Feature scores between 0 and 1 for all values of the regularization parameter. The reference article suggests scores_ is the max of all_scores_.
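The relation between all_scores_, scores_ and selection_threshold can be sketched in a few lines. This is an illustration under stated assumptions, not the wrapped estimator's code: the score matrix is made up, the helper names are hypothetical, and a feature is treated as selected when its score is strictly above the threshold.

```python
def stability_scores(all_scores):
    """Per-feature maximum over the regularization parameters.

    `all_scores` has one row per feature and one column per
    regularization parameter, mirroring the shape of all_scores_.
    """
    return [max(row) for row in all_scores]


def selected_features(scores, selection_threshold=0.25):
    """Indices of features whose score is above the threshold."""
    return [i for i, s in enumerate(scores) if s > selection_threshold]


# Hypothetical selection frequencies for 3 features and 2 values of C.
all_scores = [
    [0.90, 0.75],
    [0.10, 0.20],
    [0.30, 0.05],
]
scores = stability_scores(all_scores)
print(scores)                     # [0.9, 0.2, 0.3]
print(selected_features(scores))  # [0, 2]
```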
Examples
>>> from sklearn.linear_model import RandomizedLogisticRegression
>>> randomized_logistic = RandomizedLogisticRegression()
References
Stability selection. Nicolai Meinshausen, Peter Buhlmann. Journal of the Royal Statistical Society: Series B, Volume 72, Issue 4, pages 417-473, September 2010. DOI: 10.1111/j.1467-9868.2010.00740.x
See also
RandomizedLasso, LogisticRegression
Full API documentation: RandomizedLogisticRegressionScikitsLearnNode

class
mdp.nodes.
SelectFweScikitsLearnNode
¶ Filter: Select the p-values corresponding to Family-wise error rate.
This node has been automatically generated by wrapping the sklearn.feature_selection.univariate_selection.SelectFwe class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Read more in the User Guide.
Parameters
 score_func : callable
 Function taking two arrays X and y, and returning a pair of arrays (scores, p-values). Default is f_classif (see “See also” below). The default function only works with classification tasks.
 alpha : float, optional
 The highest uncorrected p-value for features to keep.
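One common way to control the family-wise error rate is a Bonferroni correction, in which a feature is kept only if its p-value is below alpha divided by the number of features. The sketch below assumes that rule; the function name select_fwe and the p-values are hypothetical, and the wrapped class may apply the correction differently.

```python
def select_fwe(pvalues, alpha=0.05):
    """Boolean mask of features kept under a Bonferroni-style rule.

    Assumption: a feature is kept when p < alpha / n_features.
    """
    n_features = len(pvalues)
    return [p < alpha / n_features for p in pvalues]


pvalues = [0.001, 0.02, 0.04, 0.30]   # made-up p-values
mask = select_fwe(pvalues, alpha=0.05)
print(mask)  # [True, False, False, False]
```

With four features and alpha=0.05 the effective per-feature cutoff is 0.0125, so only the first feature survives even though three of the raw p-values are below 0.05.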
Examples
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.feature_selection import SelectFwe, chi2
>>> X, y = load_breast_cancer(return_X_y=True)
>>> X.shape
(569, 30)
>>> X_new = SelectFwe(chi2, alpha=0.01).fit_transform(X, y)
>>> X_new.shape
(569, 15)
Attributes
scores_
: array-like, shape=(n_features,) Scores of features.
pvalues_
: array-like, shape=(n_features,) p-values of feature scores.
See also
f_classif: ANOVA F-value between label/feature for classification tasks.
chi2: Chi-squared stats of non-negative features for classification tasks.
f_regression: F-value between label/feature for regression tasks.
SelectPercentile: Select features based on percentile of the highest scores.
SelectKBest: Select features based on the k highest scores.
SelectFpr: Select features based on a false positive rate test.
SelectFdr: Select features based on an estimated false discovery rate.
GenericUnivariateSelect: Univariate feature selector with configurable mode.
Full API documentation: SelectFweScikitsLearnNode

class
mdp.nodes.
MultiTaskElasticNetScikitsLearnNode
¶ Multi-task ElasticNet model trained with L1/L2 mixed-norm as regularizer.
This node has been automatically generated by wrapping the sklearn.linear_model.coordinate_descent.MultiTaskElasticNet class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The optimization objective for MultiTaskElasticNet is:
(1 / (2 * n_samples)) * ||Y - XW||_Fro^2 + alpha * l1_ratio * ||W||_21 + 0.5 * alpha * (1 - l1_ratio) * ||W||_Fro^2
Where:
||W||_21 = sum_i sqrt(sum_j w_ij ^ 2)
i.e. the sum of norm of each row.
Read more in the User Guide.
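The penalty part of this objective interpolates between the mixed L1/L2 norm and the squared Frobenius norm. A minimal sketch of just that penalty follows; the helper names and the matrix W are hypothetical, W is taken with shape (n_features, n_tasks), and l1_ratio = 0 is evaluated only to show the limiting case of the formula.

```python
import math


def l21_norm(W):
    """Sum over rows of W of each row's Euclidean norm."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)


def squared_frobenius(W):
    """Sum of squared entries of W."""
    return sum(w * w for row in W for w in row)


def multitask_enet_penalty(W, alpha, l1_ratio):
    """alpha * l1_ratio * ||W||_21 + 0.5 * alpha * (1 - l1_ratio) * ||W||_Fro^2."""
    return (alpha * l1_ratio * l21_norm(W)
            + 0.5 * alpha * (1 - l1_ratio) * squared_frobenius(W))


W = [[3.0, 4.0], [0.0, 0.0]]   # illustrative coefficients
pure_l21 = multitask_enet_penalty(W, alpha=1.0, l1_ratio=1.0)
pure_l2 = multitask_enet_penalty(W, alpha=1.0, l1_ratio=0.0)
mixed = multitask_enet_penalty(W, alpha=1.0, l1_ratio=0.5)
print(pure_l21)  # 5.0
print(pure_l2)   # 12.5
print(mixed)     # 8.75
```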
Parameters
 alpha : float, optional
 Constant that multiplies the L1/L2 term. Defaults to 1.0
 l1_ratio : float
 The ElasticNet mixing parameter, with 0 < l1_ratio <= 1. For l1_ratio = 1 the penalty is an L1/L2 penalty. For l1_ratio = 0 it is an L2 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1/L2 and L2.
 fit_intercept : boolean
 whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations (e.g. data is expected to be already centered).
 normalize : boolean, optional, default False
 This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use sklearn.preprocessing.StandardScaler before calling fit on an estimator with normalize=False.
 copy_X : boolean, optional, default True
 If True, X will be copied; else, it may be overwritten.
 max_iter : int, optional
 The maximum number of iterations.
 tol : float, optional
 The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.
 warm_start : bool, optional
 When set to True, reuse the solution of the previous call to fit as initialization; otherwise, just erase the previous solution. See the Glossary.
 random_state : int, RandomState instance or None, optional, default None
 The seed of the pseudo random number generator that selects a random feature to update. If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random. Used when selection == ‘random’.
 selection : str, default ‘cyclic’
 If set to ‘random’, a random coefficient is updated every iteration rather than looping over features sequentially by default. This (setting to ‘random’) often leads to significantly faster convergence, especially when tol is higher than 1e-4.
Attributes
intercept_
: array, shape (n_tasks,) Independent term in decision function.
coef_
: array, shape (n_tasks, n_features) Parameter vector (W in the cost function formula). If a 1D y is passed in at fit (non multi-task usage), coef_ is then a 1D array. Note that coef_ stores the transpose of W, W.T.
n_iter_
: int Number of iterations run by the coordinate descent solver to reach the specified tolerance.
Examples
>>> from sklearn import linear_model
>>> clf = linear_model.MultiTaskElasticNet(alpha=0.1)
>>> clf.fit([[0,0], [1, 1], [2, 2]], [[0, 0], [1, 1], [2, 2]])
MultiTaskElasticNet(alpha=0.1, copy_X=True, fit_intercept=True,
        l1_ratio=0.5, max_iter=1000, normalize=False, random_state=None,
        selection='cyclic', tol=0.0001, warm_start=False)
>>> print(clf.coef_)
[[0.45663524 0.45612256]
 [0.45663524 0.45612256]]
>>> print(clf.intercept_)
[0.0872422 0.0872422]
See also
MultiTaskElasticNetCV : Multi-task L1/L2 ElasticNet with built-in cross-validation.
ElasticNet
MultiTaskLasso
Notes
The algorithm used to fit the model is coordinate descent.
To avoid unnecessary memory duplication the X argument of the fit method should be directly passed as a Fortran-contiguous numpy array.
Full API documentation: MultiTaskElasticNetScikitsLearnNode

class
mdp.nodes.
SparseCoderScikitsLearnNode
¶ Sparse coding.
This node has been automatically generated by wrapping the sklearn.decomposition.dict_learning.SparseCoder class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Finds a sparse representation of data against a fixed, precomputed dictionary.
Each row of the result is the solution to a sparse coding problem. The goal is to find a sparse array code such that:
X ~= code * dictionary
Read more in the User Guide.
Parameters
 dictionary : array, [n_components, n_features]
 The dictionary atoms used for sparse coding. Lines are assumed to be normalized to unit norm.
 transform_algorithm : {‘lasso_lars’, ‘lasso_cd’, ‘lars’, ‘omp’, ‘threshold’}
Algorithm used to transform the data:
 lars: uses the least angle regression method (linear_model.lars_path)
 lasso_lars: uses Lars to compute the Lasso solution
 lasso_cd: uses the coordinate descent method to compute the Lasso solution (linear_model.Lasso). lasso_lars will be faster if the estimated components are sparse.
 omp: uses orthogonal matching pursuit to estimate the sparse solution
 threshold: squashes to zero all coefficients less than alpha from the projection dictionary * X'
 transform_n_nonzero_coefs : int, 0.1 * n_features by default
 Number of nonzero coefficients to target in each column of the solution. This is only used by algorithm=’lars’ and algorithm=’omp’ and is overridden by alpha in the omp case.
 transform_alpha : float, 1. by default
 If algorithm=’lasso_lars’ or algorithm=’lasso_cd’, alpha is the penalty applied to the L1 norm. If algorithm=’threshold’, alpha is the absolute value of the threshold below which coefficients will be squashed to zero. If algorithm=’omp’, alpha is the tolerance parameter: the value of the reconstruction error targeted. In this case, it overrides n_nonzero_coefs.
 split_sign : bool, False by default
 Whether to split the sparse feature vector into the concatenation of its negative part and its positive part. This can improve the performance of downstream classifiers.
 n_jobs : int or None, optional (default=None)
 Number of parallel jobs to run. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.
 positive_code : bool
 Whether to enforce positivity when finding the code.
New in version 0.20.
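As an illustration of the ‘threshold’ transform described above, the sketch below projects each sample onto the dictionary atoms and zeroes every coefficient whose absolute value is below alpha, then reconstructs X as code * dictionary. It is a simplified reading of the parameter description, not the wrapped implementation (which may, for instance, also shrink the surviving coefficients); the helper names and data are hypothetical.

```python
def threshold_code(X, dictionary, alpha):
    """Project each sample onto the atoms; zero coefficients with |c| < alpha."""
    code = []
    for x in X:
        row = []
        for atom in dictionary:
            c = sum(a * b for a, b in zip(atom, x))   # projection on one atom
            row.append(c if abs(c) >= alpha else 0.0)
        code.append(row)
    return code


def reconstruct(code, dictionary):
    """Approximate X as code * dictionary."""
    n_features = len(dictionary[0])
    return [[sum(c * atom[j] for c, atom in zip(row, dictionary))
             for j in range(n_features)]
            for row in code]


dictionary = [[1.0, 0.0], [0.0, 1.0]]   # two unit-norm atoms (rows)
X = [[2.0, 0.1], [-0.05, 3.0]]
code = threshold_code(X, dictionary, alpha=0.5)
approx = reconstruct(code, dictionary)
print(code)    # [[2.0, 0.0], [0.0, 3.0]]
print(approx)  # [[2.0, 0.0], [0.0, 3.0]]
```

The small entries 0.1 and -0.05 are dropped, so the code is sparse and the reconstruction only approximates X, which is the trade-off expressed by X ~= code * dictionary.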
Attributes
components_
: array, [n_components, n_features] The unchanged dictionary atoms
See also
DictionaryLearning MiniBatchDictionaryLearning SparsePCA MiniBatchSparsePCA sparse_encode
Full API documentation: SparseCoderScikitsLearnNode

class
mdp.nodes.
StandardScalerScikitsLearnNode
¶ Standardize features by removing the mean and scaling to unit variance.
This node has been automatically generated by wrapping the sklearn.preprocessing.data.StandardScaler class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
The standard score of a sample x is calculated as:
z = (x - u) / s
where u is the mean of the training samples or zero if with_mean=False, and s is the standard deviation of the training samples or one if with_std=False.
Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. Mean and standard deviation are then stored to be used on later data using the transform method.
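The fit-then-transform behaviour can be sketched without any library: statistics are computed per feature on the training data and then reused on later data. This is a minimal illustration with hypothetical helper names; it uses the biased (ddof=0) standard deviation and does not handle zero-variance features, sparse input or NaNs.

```python
import math


def fit_standard_scaler(data):
    """Return per-feature means and (biased, ddof=0) standard deviations."""
    n = len(data)
    columns = list(zip(*data))
    means = [sum(col) / n for col in columns]
    scales = [math.sqrt(sum((v - m) ** 2 for v in col) / n)
              for col, m in zip(columns, means)]
    return means, scales


def transform(data, means, scales):
    """Apply z = (x - u) / s feature by feature using stored statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, scales)]
            for row in data]


data = [[0, 0], [0, 0], [1, 1], [1, 1]]
means, scales = fit_standard_scaler(data)
scaled = transform(data, means, scales)
new_point = transform([[2, 2]], means, scales)
print(means)      # [0.5, 0.5]
print(scales)     # [0.5, 0.5]
print(scaled)     # [[-1.0, -1.0], [-1.0, -1.0], [1.0, 1.0], [1.0, 1.0]]
print(new_point)  # [[3.0, 3.0]]
```

Note that the new point [2, 2] is scaled with the training mean and standard deviation, not with statistics of its own.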
Standardization of a dataset is a common requirement for many machine learning estimators: they might behave badly if the individual features do not more or less look like standard normally distributed data (e.g. Gaussian with 0 mean and unit variance).
For instance, many elements used in the objective function of a learning algorithm (such as the RBF kernel of Support Vector Machines or the L1 and L2 regularizers of linear models) assume that all features are centered around 0 and have variance in the same order. If a feature has a variance that is orders of magnitude larger than others, it might dominate the objective function and make the estimator unable to learn from other features correctly as expected.
This scaler can also be applied to sparse CSR or CSC matrices by passing with_mean=False to avoid breaking the sparsity structure of the data.
Read more in the User Guide.
Parameters
 copy : boolean, optional, default True
 If False, try to avoid a copy and do in-place scaling instead. This is not guaranteed to always work in-place; e.g. if the data is not a NumPy array or scipy.sparse CSR matrix, a copy may still be returned.
 with_mean : boolean, True by default
 If True, center the data before scaling. This does not work (and will raise an exception) when attempted on sparse matrices, because centering them entails building a dense matrix which in common use cases is likely to be too large to fit in memory.
 with_std : boolean, True by default
 If True, scale the data to unit variance (or equivalently, unit standard deviation).
Attributes
scale_
: ndarray or None, shape (n_features,) Per feature relative scaling of the data. This is calculated using np.sqrt(var_). Equal to None when with_std=False.
New in version 0.17: scale_
mean_
: ndarray or None, shape (n_features,) The mean value for each feature in the training set. Equal to None when with_mean=False.
var_
: ndarray or None, shape (n_features,) The variance for each feature in the training set. Used to compute scale_. Equal to None when with_std=False.
n_samples_seen_
: int or array, shape (n_features,) The number of samples processed by the estimator for each feature. If there are no missing samples, n_samples_seen_ will be an integer, otherwise it will be an array. Will be reset on new calls to fit, but increments across partial_fit calls.
Examples
>>> from sklearn.preprocessing import StandardScaler
>>> data = [[0, 0], [0, 0], [1, 1], [1, 1]]
>>> scaler = StandardScaler()
>>> print(scaler.fit(data))
StandardScaler(copy=True, with_mean=True, with_std=True)
>>> print(scaler.mean_)
[0.5 0.5]
>>> print(scaler.transform(data))
[[-1. -1.]
 [-1. -1.]
 [ 1.  1.]
 [ 1.  1.]]
>>> print(scaler.transform([[2, 2]]))
[[3. 3.]]
See also
scale: Equivalent function without the estimator API.
sklearn.decomposition.PCA: Further removes the linear correlation across features with ‘whiten=True’.
Notes
NaNs are treated as missing values: disregarded in fit, and maintained in transform.
We use a biased estimator for the standard deviation, equivalent to numpy.std(x, ddof=0). Note that the choice of ddof is unlikely to affect model performance.
For a comparison of the different scalers, transformers, and normalizers, see examples/preprocessing/plot_all_scaling.py.
Full API documentation: StandardScalerScikitsLearnNode

class
mdp.nodes.
DecisionTreeClassifierScikitsLearnNode
¶ A decision tree classifier.
This node has been automatically generated by wrapping the sklearn.tree.tree.DecisionTreeClassifier class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Read more in the User Guide.
Parameters
 criterion : string, optional (default=”gini”)
 The function to measure the quality of a split. Supported criteria are “gini” for the Gini impurity and “entropy” for the information gain.
 splitter : string, optional (default=”best”)
 The strategy used to choose the split at each node. Supported strategies are “best” to choose the best split and “random” to choose the best random split.
 max_depth : int or None, optional (default=None)
 The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
 min_samples_split : int, float, optional (default=2)
The minimum number of samples required to split an internal node:
 If int, then consider min_samples_split as the minimum number.
 If float, then min_samples_split is a fraction and ceil(min_samples_split * n_samples) are the minimum number of samples for each split.
Changed in version 0.18: Added float values for fractions.
 min_samples_leaf : int, float, optional (default=1)
The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.
 If int, then consider min_samples_leaf as the minimum number.
 If float, then min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) are the minimum number of samples for each node.
Changed in version 0.18: Added float values for fractions.
 min_weight_fraction_leaf : float, optional (default=0.)
 The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. Samples have equal weight when sample_weight is not provided.
 max_features : int, float, string or None, optional (default=None)
The number of features to consider when looking for the best split:
 If int, then consider max_features features at each split.
 If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
 If “auto”, then max_features=sqrt(n_features).
 If “sqrt”, then max_features=sqrt(n_features).
 If “log2”, then max_features=log2(n_features).
 If None, then max_features=n_features.
Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if it requires to effectively inspect more than max_features features.
 random_state : int, RandomState instance or None, optional (default=None)
 If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
 max_leaf_nodes : int or None, optional (default=None)
 Grow a tree with max_leaf_nodes in best-first fashion. Best nodes are defined as relative reduction in impurity. If None then unlimited number of leaf nodes.
 min_impurity_decrease : float, optional (default=0.)
A node will be split if this split induces a decrease of the impurity greater than or equal to this value.
The weighted impurity decrease equation is the following:
N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)
where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child.
N, N_t, N_t_R and N_t_L all refer to the weighted sum, if sample_weight is passed.
New in version 0.19.
 min_impurity_split : float, (default=1e-7)
Threshold for early stopping in tree growth. A node will split if its impurity is above the threshold, otherwise it is a leaf.
Deprecated since version 0.19: min_impurity_split has been deprecated in favor of min_impurity_decrease in 0.19. The default value of min_impurity_split will change from 1e-7 to 0 in 0.23 and it will be removed in 0.25. Use min_impurity_decrease instead.
 class_weight : dict, list of dicts, “balanced” or None, default=None
Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one. For multi-output problems, a list of dicts can be provided in the same order as the columns of y.
Note that for multi-output (including multilabel) weights should be defined for each class of every column in its own dict. For example, for four-class multilabel classification weights should be [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1}, {2:5}, {3:1}, {4:1}].
The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).
For multi-output, the weights of each column of y will be multiplied.
Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.
 presort : bool, optional (default=False)
 Whether to presort the data to speed up the finding of best splits in fitting. For the default settings of a decision tree on large datasets, setting this to true may slow down the training process. When using either a smaller dataset or a restricted depth, this may speed up the training.
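The weighted impurity decrease used by min_impurity_decrease can be evaluated by hand for a candidate split. The sketch below uses the Gini impurity and unweighted samples, so every N is a plain count; the function names and class counts are hypothetical, and this is not the tree builder's actual code.

```python
def gini(counts):
    """Gini impurity of a node given its per-class sample counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def weighted_impurity_decrease(parent, left, right, n_total):
    """N_t / N * (impurity - N_t_R / N_t * right_impurity
                           - N_t_L / N_t * left_impurity)."""
    n_t, n_l, n_r = sum(parent), sum(left), sum(right)
    return (n_t / n_total) * (gini(parent)
                              - (n_r / n_t) * gini(right)
                              - (n_l / n_t) * gini(left))


# A node holding 10 of the 20 training samples, split perfectly in two.
parent, left, right = [5, 5], [5, 0], [0, 5]
decrease = weighted_impurity_decrease(parent, left, right, n_total=20)
print(gini(parent))  # 0.5
print(decrease)      # 0.25
```

With min_impurity_decrease set to 0.3, for example, this split would be rejected even though it separates the two classes perfectly, because the node holds only half of the samples.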
Attributes
classes_
: array of shape = [n_classes] or a list of such arrays The class labels (single output problem), or a list of arrays of class labels (multi-output problem).
feature_importances_
: array of shape = [n_features] The feature importances. The higher, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance [4].
max_features_
: int The inferred value of max_features.
n_classes_
: int or list The number of classes (for single output problems), or a list containing the number of classes for each output (for multi-output problems).
n_features_
: int The number of features when fit is performed.
n_outputs_
: int The number of outputs when fit is performed.
tree_
: Tree object The underlying Tree object. Please refer to help(sklearn.tree._tree.Tree) for attributes of Tree object and examples/tree/plot_unveil_tree_structure.py for basic usage of these attributes.
Notes
The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.
The features are always randomly permuted at each split. Therefore, the best found split may vary, even with the same training data and max_features=n_features, if the improvement of the criterion is identical for several splits enumerated during the search of the best split. To obtain a deterministic behaviour during fitting, random_state has to be fixed.
See also
DecisionTreeRegressor
References
[1] https://en.wikipedia.org/wiki/Decision_tree_learning
[2] L. Breiman, J. Friedman, R. Olshen, and C. Stone, “Classification and Regression Trees”, Wadsworth, Belmont, CA, 1984.
[3] T. Hastie, R. Tibshirani and J. Friedman. “Elements of Statistical Learning”, Springer, 2009.
[4] L. Breiman, and A. Cutler, “Random Forests”, https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm
Examples
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import cross_val_score
>>> from sklearn.tree import DecisionTreeClassifier
>>> clf = DecisionTreeClassifier(random_state=0)
>>> iris = load_iris()
>>> cross_val_score(clf, iris.data, iris.target, cv=10)
array([ 1. , 0.93..., 0.86..., 0.93..., 0.93..., 0.93..., 0.93..., 1. , 0.93..., 1. ])
Full API documentation: DecisionTreeClassifierScikitsLearnNode

class
mdp.nodes.
GenericUnivariateSelectScikitsLearnNode
¶ Univariate feature selector with configurable strategy.
This node has been automatically generated by wrapping the sklearn.feature_selection.univariate_selection.GenericUnivariateSelect class from the sklearn library. The wrapped instance can be accessed through the scikits_alg attribute.
Read more in the User Guide.
Parameters
 score_func : callable
 Function taking two arrays X and y, and returning a pair of arrays (scores, p-values). For modes ‘percentile’ or ‘k_best’ it can return a single array of scores.
 mode : {‘percentile’, ‘k_best’, ‘fpr’, ‘fdr’, ‘fwe’}
 Feature selection mode.
 param : float or int depending on the feature selection mode
 Parameter of the corresponding mode.
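How param is interpreted depends on mode: for ‘k_best’ it is a number of features, while for ‘fpr’ it is a p-value cutoff. The sketch below illustrates these two modes only, under stated assumptions: the function names, scores and p-values are hypothetical, ties in ‘k_best’ are broken in favour of the lower feature index, and the ‘fpr’ rule is taken to be a plain p < alpha test.

```python
def select_k_best(scores, k):
    """Boolean mask keeping the k features with the highest scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:k])
    return [i in keep for i in range(len(scores))]


def select_fpr(pvalues, alpha):
    """Boolean mask keeping features whose p-value is below alpha."""
    return [p < alpha for p in pvalues]


scores = [10.0, 0.5, 7.2, 3.1]        # made-up univariate scores
pvalues = [0.001, 0.60, 0.01, 0.08]   # made-up p-values
k_best_mask = select_k_best(scores, k=2)
fpr_mask = select_fpr(pvalues, alpha=0.05)
print(k_best_mask)  # [True, False, True, False]
print(fpr_mask)     # [True, False, True, False]
```

The ‘k_best’ rule needs only the scores, which is why a score function returning a single array is sufficient for that mode, whereas ‘fpr’, ‘fdr’ and ‘fwe’ need p-values.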
Attributes
scores_
: array-like, shape=(n_features,) Scores of features.
pvalues_
: array-like, shape=(n_features,) p-values of feature scores, None if score_func returned scores only.
Examples
>>> from sklearn.datasets import load_breast_cancer
>>> from sklearn.feature_selection import GenericUnivariateSelect, chi2
>>> X, y = load_breast_cancer(return_X_y=True)
>>> X.shape
(569, 30)
>>> transformer = GenericUnivariateSelect(chi2, 'k_best', param=20)
>>> X_new = transformer.fit_transform(X, y)
>>> X_new.shape
(569, 20)
See also
f_classif: ANOVA F-value between label/feature for classification tasks.
mutual_info_classif: Mutual information for a discrete target.
chi2: Chi-squared stats of non-negative features for classification tasks.
f_regression: F-value between label/feature for regression tasks.
mutual_info_regression: Mutual information for a continuous target.
SelectPercentile: Select features based on percentile of the highest scores.
SelectKBest: Select features based on the k highest scores.
SelectFpr: Select features based on a false positive rate test.
SelectFdr: Select features based on an estimated false discovery rate.
SelectFwe: Select features based on family-wise error rate.
Full API documentation: GenericUnivariateSelectScikitsLearnNode

class
mdp.nodes.
BernoulliNBScikitsLearnNode
¶ Naive Bayes classifier for multivariate Bernoulli models. This node has been automatically generated by wrapping the
sklearn.naive_bayes.BernoulliNB
class from the sklearn
library. The wrapped instance can be accessed through the scikits_alg
attribute. Like MultinomialNB, this classifier is suitable for discrete data. The difference is that while MultinomialNB works with occurrence counts, BernoulliNB is designed for binary/boolean features. Read more in the User Guide.
Parameters
 alpha : float, optional (default=1.0)
 Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing).
 binarize : float or None, optional (default=0.0)
 Threshold for binarizing (mapping to booleans) of sample features. If None, input is presumed to already consist of binary vectors.
 fit_prior : boolean, optional (default=True)
 Whether to learn class prior probabilities or not. If false, a uniform prior will be used.
 class_prior : array-like, size=[n_classes,], optional (default=None)
 Prior probabilities of the classes. If specified the priors are not adjusted according to the data.
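How fit_prior and class_prior interact can be seen on a small example. This is a minimal sketch using the wrapped sklearn class directly, with hypothetical toy data in which class 0 appears three times and class 1 once; it compares the priors that end up in class_log_prior_ under the three settings.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical toy data: three samples of class 0, one of class 1.
X = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])
y = np.array([0, 0, 0, 1])

# Default: priors are estimated from the class frequencies in y.
learned = BernoulliNB(fit_prior=True).fit(X, y)

# fit_prior=False: every class gets the same prior.
uniform = BernoulliNB(fit_prior=False).fit(X, y)

# class_prior given: the supplied values are used as they are.
fixed = BernoulliNB(class_prior=[0.9, 0.1]).fit(X, y)

# class_log_prior_ stores logarithms, so exponentiate to read probabilities.
learned_prior = np.exp(learned.class_log_prior_)
uniform_prior = np.exp(uniform.class_log_prior_)
fixed_prior = np.exp(fixed.class_log_prior_)
print(learned_prior, uniform_prior, fixed_prior)
```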
Attributes
class_log_prior_
: array, shape = [n_classes] Log probability of each class (smoothed).
feature_log_prob_
: array, shape = [n_classes, n_features] Empirical log probability of features given a class, P(x_i|y).
class_count_
: array, shape = [n_classes] Number of samples encountered for each class during fitting. This value is weighted by the sample weight when provided.
feature_count_
: array, shape = [n_classes, n_features] Number of samples encountered for each (class, feature) during fitting. This value is weighted by the sample weight when provided.
Examples
>>> import numpy as np
>>> X = np.random.randint(2, size=(6, 100))
>>> Y = np.array([1, 2, 3, 4, 4, 5])
>>> from sklearn.naive_bayes import BernoulliNB
>>> clf = BernoulliNB()
>>> clf.fit(X, Y)
BernoulliNB(alpha=1.0, binarize=0.0, class_prior=None, fit_prior=True)
>>> print(clf.predict(X[2:3]))
[3]
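The roles of alpha and binarize can be made concrete with a deterministic example. The sketch below uses hypothetical toy data and calls the wrapped sklearn class directly. It assumes the smoothed estimate P(x_i = 1 | y) = (N_yi + alpha) / (N_y + 2 * alpha), where N_yi counts the samples of class y with feature i set and N_y counts the samples of class y; the manual computation is compared against feature_log_prob_ rather than taken for granted. It also fits the same model on count-valued input to show the effect of the default threshold binarize=0.0.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical toy data: two samples per class, three binary features.
X = np.array([
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 1],
])
y = np.array([0, 0, 1, 1])
alpha = 1.0

clf = BernoulliNB(alpha=alpha).fit(X, y)

# Manual estimate under the assumed formula (see the lead-in).
feature_count = np.array([X[y == c].sum(axis=0) for c in clf.classes_])
class_count = np.array([(y == c).sum() for c in clf.classes_])
manual_prob = (feature_count + alpha) / (class_count[:, None] + 2 * alpha)

# Probabilities recovered from the fitted estimator.
fitted_prob = np.exp(clf.feature_log_prob_)

# With binarize=0.0 every value above 0 is mapped to 1, so feeding
# counts (here 0 or 3) should yield the same model as the binary input.
clf_counts = BernoulliNB(alpha=alpha).fit(3 * X, y)

prediction = clf.predict(np.array([[1, 0, 0]]))
print(manual_prob, fitted_prob, prediction)
```

With alpha=0 the estimate reduces to the raw frequency, which can produce zero probabilities for feature values never seen in a class.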
References
C.D. Manning, P. Raghavan and H. Schuetze (2008). Introduction to Information Retrieval. Cambridge University Press, pp. 234-265.