Convert a collection of raw documents to a matrix of token counts.
This node has been automatically generated by wrapping the scikits.learn.feature_extraction.text.CountVectorizer class
from the sklearn library. The wrapped instance can be accessed
through the scikits_alg attribute.
This implementation produces a sparse representation of the counts using
scipy.sparse.coo_matrix.
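A coordinate (COO) layout stores only the non-zero entries as (row, column, value) triplets. The following is a minimal sketch of how token counts could be laid out that way, using plain Python lists rather than scipy. The function name `count_tokens_coo`, the whitespace tokenization, and the example vocabulary are assumptions made for illustration, not the wrapped class's code.

```python
from collections import Counter


def count_tokens_coo(documents, vocabulary):
    """Return (rows, cols, data) triplets describing the token-count matrix.

    Hypothetical helper: `vocabulary` maps each known token to a column index.
    """
    rows, cols, data = [], [], []
    for row, doc in enumerate(documents):
        # Whitespace tokenization is a simplification for this sketch.
        counts = Counter(tok for tok in doc.lower().split() if tok in vocabulary)
        # Sort by token so the triplet order is deterministic.
        for tok, n in sorted(counts.items()):
            rows.append(row)
            cols.append(vocabulary[tok])
            data.append(n)
    return rows, cols, data


docs = ["the cat sat", "the cat saw the dog"]
vocab = {"cat": 0, "dog": 1, "sat": 2, "saw": 3, "the": 4}
rows, cols, data = count_tokens_coo(docs, vocab)
```

Only seven triplets are stored for this 2 x 5 matrix; the zero entries are never materialized.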
If you do not provide an a-priori dictionary and do not use
an analyzer that performs some kind of feature selection, the number of
features (the vocabulary size found by analysing the data) might be very
large, and the count vectors might not fit in memory.
In that case it is recommended to use either the sparse.CountVectorizer
variant of this class or a HashingVectorizer, which reduces the
dimensionality to an arbitrary number by using random projection.
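One common way to cap the number of features without storing a vocabulary is feature hashing, where each token is mapped to a column by a hash function modulo the desired width. The sketch below illustrates that idea only; it is not the HashingVectorizer implementation, and the name `hashed_counts`, the use of `zlib.crc32` as the hash, and the default width of 8 are assumptions chosen for illustration.

```python
import zlib


def hashed_counts(document, n_features=8):
    """Count tokens into a fixed-width vector via feature hashing.

    Hypothetical helper: different tokens may collide in the same column.
    """
    vec = [0] * n_features
    for tok in document.lower().split():
        # crc32 is a stand-in hash; it is stable across runs, unlike hash().
        idx = zlib.crc32(tok.encode("utf-8")) % n_features
        vec[idx] += 1
    return vec


vec = hashed_counts("the cat saw the dog", n_features=8)
```

The output width is fixed by `n_features` regardless of how many distinct tokens appear, at the cost of possible collisions.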
__init__(self, input_dim=None, output_dim=None, dtype=None, **kwargs)
Convert a collection of raw documents to a matrix of token counts.
_get_supported_dtypes(self)
Return the list of dtypes supported by this node.
The types can be specified in any format allowed by numpy.dtype.
_stop_training(self, **kwargs)
Concatenate the collected data in a single array.
execute(self, x)
Extract token counts out of raw text documents.
stop_training(self, **kwargs)
Learn a vocabulary dictionary of all tokens in the raw documents.
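Learning a vocabulary dictionary amounts to assigning each distinct token a column index. A minimal sketch follows, assuming whitespace tokenization, lowercasing, and alphabetical index assignment; the function name `learn_vocabulary` is hypothetical, and the wrapped class may tokenize and order tokens differently.

```python
def learn_vocabulary(documents):
    """Map every distinct token in `documents` to a column index.

    Hypothetical helper: indices follow alphabetical token order so the
    result is deterministic.
    """
    tokens = set()
    for doc in documents:
        # Whitespace tokenization is a simplification for this sketch.
        tokens.update(doc.lower().split())
    return {tok: idx for idx, tok in enumerate(sorted(tokens))}


vocabulary = learn_vocabulary(["the cat sat", "the cat saw the dog"])
```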
Inherited from unreachable.newobject: __long__, __native__, __nonzero__, __unicode__, next
Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __sizeof__, __subclasshook__
_train(self, *args)
Collect all input data in a list.
train(self, *args)
Collect all input data in a list.
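The collect-then-concatenate pattern described for train and _stop_training can be sketched as follows. The class `CollectingNode` is a hypothetical stand-in written for illustration, not the generated node; it assumes each training call receives a chunk of documents and that the fitting step would run once on the concatenated data.

```python
class CollectingNode:
    """Hypothetical stand-in for the training protocol described above."""

    def __init__(self):
        self._chunks = []      # one entry per train() call
        self._training = True  # flips to False in stop_training()
        self.data = None       # concatenated data, set by stop_training()

    def is_training(self):
        return self._training

    def train(self, chunk):
        if not self._training:
            raise RuntimeError("training phase is already over")
        # Collect the input; nothing is fitted yet.
        self._chunks.append(list(chunk))

    def stop_training(self):
        # Concatenate everything collected so far into a single sequence.
        self.data = [doc for chunk in self._chunks for doc in chunk]
        self._training = False


node = CollectingNode()
node.train(["the cat sat"])
node.train(["the cat saw the dog", "a dog"])
node.stop_training()
```

Deferring the work to stop_training lets the data arrive in several chunks while the wrapped estimator is still fitted only once.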
__call__(self, x, *args, **kwargs)
Calling an instance of Node is equivalent to calling
its execute method.
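The equivalence between calling a node and calling its execute method can be illustrated with a small delegation sketch. `CallableNode` and its toy execute body (counting whitespace-separated tokens) are assumptions for illustration only.

```python
class CallableNode:
    """Hypothetical node whose __call__ simply forwards to execute."""

    def execute(self, x):
        # Toy behaviour: number of whitespace-separated tokens per document.
        return [len(doc.split()) for doc in x]

    def __call__(self, x, *args, **kwargs):
        return self.execute(x, *args, **kwargs)


node = CallableNode()
docs = ["the cat sat", "a dog"]
same = node(docs) == node.execute(docs)
```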
_refcast(self, x)
Helper function to cast arrays to the internal dtype.
copy(self, protocol=None)
Return a deep copy of the node.
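A deep copy means that mutable state held by the copy is independent of the original. The sketch below shows that property with the standard library's `deepcopy`; the class `VocabularyNode` is hypothetical, and the sketch ignores the `protocol` argument, whose handling by the real method is not described here.

```python
from copy import deepcopy


class VocabularyNode:
    """Hypothetical node holding mutable state."""

    def __init__(self, vocabulary):
        self.vocabulary = vocabulary

    def copy(self):
        # Recursively duplicate the node, including its vocabulary dict.
        return deepcopy(self)


original = VocabularyNode({"cat": 0, "dog": 1})
clone = original.copy()
clone.vocabulary["sat"] = 2  # does not affect the original
```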
inverse(self, y, *args, **kwargs)
Invert y.
is_training(self)
Return True if the node is in the training phase,
False otherwise.
save(self, filename, protocol=-1)
Save a pickled serialization of the node to filename.
If filename is None, return a string.
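The two behaviours of save (write to a file, or return the serialization when filename is None) can be sketched with the standard pickle module. This is an illustration under assumptions: the free function `save`, the dictionary standing in for node state, and the file name `node.pkl` are all hypothetical. Under Python 3 the returned "string" is a bytes object.

```python
import os
import pickle
import tempfile


def save(obj, filename=None, protocol=-1):
    """Pickle `obj` to `filename`, or return the pickled bytes if it is None.

    Hypothetical helper; protocol=-1 selects the highest available protocol.
    """
    if filename is None:
        return pickle.dumps(obj, protocol)
    with open(filename, "wb") as fh:
        pickle.dump(obj, fh, protocol)
    return None


state = {"vocabulary": {"cat": 0, "dog": 1}}  # stand-in for a node's state

as_bytes = save(state)  # no filename: serialization is returned

path = os.path.join(tempfile.mkdtemp(), "node.pkl")
save(state, path)       # filename given: serialization is written to disk
with open(path, "rb") as fh:
    restored = pickle.load(fh)
```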
set_dtype(self, t)
Set internal structures' dtype.