SOMClassifier
- class susi.SOMClassifier(n_rows: int = 10, n_columns: int = 10, *, init_mode_unsupervised: str = 'random', init_mode_supervised: str = 'majority', n_iter_unsupervised: int = 1000, n_iter_supervised: int = 1000, train_mode_unsupervised: str = 'online', train_mode_supervised: str = 'online', neighborhood_mode_unsupervised: str = 'linear', neighborhood_mode_supervised: str = 'linear', learn_mode_unsupervised: str = 'min', learn_mode_supervised: str = 'min', distance_metric: str = 'euclidean', learning_rate_start: float = 0.5, learning_rate_end: float = 0.05, nbh_dist_weight_mode: str = 'pseudo-gaussian', missing_label_placeholder: int | str | None = None, do_class_weighting: bool = True, n_jobs: int | None = None, random_state=None, verbose: int | None = 0)
Bases: SOMEstimator, ClassifierMixin
Supervised SOM for estimating discrete variables (= classification).
- Parameters:
n_rows (int, optional (default=10)) – Number of rows for the SOM grid
n_columns (int, optional (default=10)) – Number of columns for the SOM grid
init_mode_unsupervised (str, optional (default="random")) – Initialization mode of the unsupervised SOM
init_mode_supervised (str, optional (default="majority")) – Initialization mode of the classification SOM
n_iter_unsupervised (int, optional (default=1000)) – Number of iterations for the unsupervised SOM
n_iter_supervised (int, optional (default=1000)) – Number of iterations for the classification SOM
train_mode_unsupervised (str, optional (default="online")) – Training mode of the unsupervised SOM
train_mode_supervised (str, optional (default="online")) – Training mode of the classification SOM
neighborhood_mode_unsupervised (str, optional (default="linear")) – Neighborhood mode of the unsupervised SOM
neighborhood_mode_supervised (str, optional (default="linear")) – Neighborhood mode of the classification SOM
learn_mode_unsupervised (str, optional (default="min")) – Learning mode of the unsupervised SOM
learn_mode_supervised (str, optional (default="min")) – Learning mode of the classification SOM
distance_metric (str, optional (default="euclidean")) – Distance metric for comparisons on the feature level (not on the SOM grid). Possible metrics: {"euclidean", "manhattan", "mahalanobis", "tanimoto", "spectralangle"}. Note that "tanimoto" tends to be slow.
New in version 1.1.1: Spectral angle metric.
learning_rate_start (float, optional (default=0.5)) – Learning rate start value
learning_rate_end (float, optional (default=0.05)) – Learning rate end value (only needed for some learning-rate definitions)
nbh_dist_weight_mode (str, optional (default="pseudo-gaussian")) – Formula of the neighborhood distance weight. Possible formulas are: {"pseudo-gaussian", "mexican-hat"}.
missing_label_placeholder (int or str or None, optional (default=None)) – Label placeholder for datapoints with no label. This is needed for semi-supervised learning.
do_class_weighting (bool, optional (default=True)) – If True, classes are weighted.
n_jobs (int or None, optional (default=None)) – The number of jobs to run in parallel.
random_state (int, RandomState instance or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
verbose (int, optional (default=0)) – Controls the verbosity.
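To make the feature-level metrics concrete, the following NumPy sketch implements three of them from their textbook definitions. The helpers euclidean, manhattan and spectral_angle are hypothetical and not part of susi's API; "mahalanobis" and "tanimoto" are left out for brevity.

```python
import numpy as np


def euclidean(a: np.ndarray, b: np.ndarray) -> float:
    """Straight-line (L2) distance between two feature vectors."""
    return float(np.linalg.norm(a - b))


def manhattan(a: np.ndarray, b: np.ndarray) -> float:
    """Sum of absolute coordinate differences (L1 distance)."""
    return float(np.sum(np.abs(a - b)))


def spectral_angle(a: np.ndarray, b: np.ndarray) -> float:
    """Angle in radians between two vectors; their magnitudes are ignored."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clip to guard against rounding slightly outside [-1, 1].
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


a = np.array([1.0, 2.0, 2.0])
b = np.array([2.0, 4.0, 4.0])
print(euclidean(a, b))       # 3.0
print(manhattan(a, b))       # 5.0
print(spectral_angle(a, b))  # 0.0
```

Because the spectral angle ignores vector length, b = 2a gives an angle of 0 even though the Euclidean distance between the two vectors is 3.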
- Variables:
node_list (np.ndarray of (int, int) tuples) – List of 2-dimensional coordinates of SOM nodes
radius_max (float, int) – Maximum radius of the neighborhood function
radius_min (float, int) – Minimum radius of the neighborhood function
unsuper_som (np.ndarray) – Weight vectors of the unsupervised SOM with shape = (self.n_rows, self.n_columns, X.shape[1])
X (np.ndarray) – Input data
fitted (bool) – Indicates whether the estimator is fitted to X
max_iterations (int) – Maximum number of iterations for the current training
bmus (list of (int, int) tuples) – List of best matching units (BMUs) of the dataset X.
placeholder_dict (dict) – Dict of placeholders for initializing nodes without a mapped class.
n_features_in (int) – Number of input features in X.
classes (np.ndarray) – Unique classes in the dataset labels y.
class_counts (np.ndarray) – Number of datapoints per unique class in y.
class_dtype (type) – Type of a label in y.
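Several of the parameters and variables above (learning_rate_start, learning_rate_end, radius_max, radius_min, max_iterations, nbh_dist_weight_mode) come together in a single online training step. The sketch below is a generic Kohonen update under stated assumptions, not susi's implementation: the learning rate and radius are assumed to decay linearly over max_iterations (susi's learn_mode and neighborhood_mode options may define other schedules), and the "pseudo-gaussian" and "mexican-hat" weights are assumed to take the common forms exp(-d²/(2r²)) and (1 - d²/r²)·exp(-d²/(2r²)). All function names are hypothetical.

```python
import numpy as np


def neighborhood_weight(grid_dist, radius, mode="pseudo-gaussian"):
    """Weight of each node given its grid distance to the BMU (assumed formulas)."""
    ratio = (grid_dist / radius) ** 2
    gauss = np.exp(-ratio / 2.0)
    if mode == "pseudo-gaussian":
        return gauss
    if mode == "mexican-hat":
        # Negative beyond grid_dist > radius: distant nodes are pushed away.
        return (1.0 - ratio) * gauss
    raise ValueError(f"unknown mode: {mode}")


def linear_decay(start, end, it, max_iterations):
    """Assumed schedule: interpolate linearly from start (it=0) to end (it=max_iterations)."""
    return start + (end - start) * it / max_iterations


def online_step(som, x, it, max_iterations, lr_start=0.5, lr_end=0.05,
                radius_max=5.0, radius_min=1.0, nbh_mode="pseudo-gaussian"):
    """Return a copy of som (n_rows x n_columns x n_features) updated with one sample x."""
    n_rows, n_columns, _ = som.shape
    # BMU: the node whose weight vector is closest to x in feature space.
    dists = np.linalg.norm(som - x, axis=2)
    bmu = np.unravel_index(np.argmin(dists), dists.shape)
    # Distance of every node to the BMU on the SOM grid (not in feature space).
    rows, cols = np.indices((n_rows, n_columns))
    grid_dist = np.hypot(rows - bmu[0], cols - bmu[1])
    lr = linear_decay(lr_start, lr_end, it, max_iterations)
    radius = linear_decay(radius_max, radius_min, it, max_iterations)
    h = neighborhood_weight(grid_dist, radius, nbh_mode)
    # Move each weight vector toward x, scaled by learning rate and neighborhood weight.
    return som + lr * h[:, :, np.newaxis] * (x - som)


rng = np.random.default_rng(0)
som = rng.random((10, 10, 3))
x = np.array([0.5, 0.5, 0.5])
updated = online_step(som, x, it=0, max_iterations=1000)
```

At the BMU the grid distance is 0, so the weight is 1 and, with a learning rate of 0.5 at the first iteration, the BMU's weight vector moves halfway toward x.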
- fit(X: Sequence, y: Sequence | None = None)
Fit the classification SOM to the input data.
- Parameters:
X (array-like matrix of shape = [n_samples, n_features]) – The training input samples.
y (array-like matrix of shape = [n_samples, 1], optional) – The labels (ground truth) of the input samples.
- Returns:
self
- Return type:
SOMClassifier
Examples
Load the SOM and fit it to your input data X and the labels y with:
>>> import susi
>>> som = susi.SOMClassifier()
>>> som.fit(X, y)
- fit_transform(X: Sequence, y: Sequence | None = None) → ndarray
Fit to the input data and transform it.
- Parameters:
X (array-like matrix of shape = [n_samples, n_features]) – The training and prediction input samples.
y (array-like matrix of shape = [n_samples, 1]) – The labels (ground truth) of the input samples
- Returns:
Predictions including the BMUs of each datapoint.
- Return type:
np.ndarray
Examples
Load the SOM, fit it to your input data X and transform your input data with:
>>> import susi
>>> som = susi.SOMClassifier()
>>> tuples = som.fit_transform(X, y)
- get_bmu(datapoint: ndarray, som_array: ndarray) → Tuple[int, int]
Get the best matching unit (BMU) for a datapoint.
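A minimal NumPy sketch of a BMU search for one datapoint, assuming the Euclidean metric: the node whose weight vector is closest to the datapoint wins. find_bmu is a hypothetical helper, not susi's get_bmu.

```python
from typing import Tuple

import numpy as np


def find_bmu(datapoint: np.ndarray, som_array: np.ndarray) -> Tuple[int, int]:
    """Return (row, column) of the node whose weight vector is closest to datapoint."""
    # Euclidean distance from the datapoint to each node's weight vector.
    dists = np.linalg.norm(som_array - datapoint, axis=2)
    row, col = np.unravel_index(np.argmin(dists), dists.shape)
    return int(row), int(col)


som_array = np.zeros((2, 3, 2))  # 2 x 3 grid, 2 features per node
som_array[1, 2] = [1.0, 1.0]
print(find_bmu(np.array([0.9, 1.1]), som_array))  # (1, 2)
```

On ties, np.argmin returns the first minimum in row-major order.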
- get_bmus(X: ndarray, som_array: ndarray | None = None) → List[Tuple[int, int]] | None
Get the best matching units (BMUs) for a large list of datapoints.
- Parameters:
X (np.ndarray) – List of datapoints
som_array (np.ndarray, optional (default=None)) – Weight vectors of the SOM with shape = (self.n_rows, self.n_columns, X.shape[1])
- Returns:
bmus – Position of best matching units (row, column) for each datapoint
- Return type:
list of (int, int) tuples
Examples
Fit the SOM to your input data X, compute the BMUs and plot their distribution on the grid with:
>>> import susi
>>> import matplotlib.pyplot as plt
>>> som = susi.SOMClustering()
>>> som.fit(X)
>>> bmu_list = som.get_bmus(X)
>>> plt.hist2d([x[0] for x in bmu_list], [x[1] for x in bmu_list])
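For many datapoints, the BMU search can be vectorized with NumPy broadcasting instead of looping. The find_bmus helper below is hypothetical and assumes squared Euclidean distances; the square root is skipped because it does not change which node is closest.

```python
from typing import List, Tuple

import numpy as np


def find_bmus(X: np.ndarray, som_array: np.ndarray) -> List[Tuple[int, int]]:
    """Vectorized BMU search: one (row, column) tuple per row of X."""
    n_rows, n_columns, n_features = som_array.shape
    nodes = som_array.reshape(n_rows * n_columns, n_features)
    # Pairwise squared distances, shape (n_samples, n_nodes).
    sq_dists = ((X[:, np.newaxis, :] - nodes[np.newaxis, :, :]) ** 2).sum(axis=2)
    flat_idx = np.argmin(sq_dists, axis=1)
    # Convert flat node indices back to (row, column) grid positions.
    rows, cols = np.unravel_index(flat_idx, (n_rows, n_columns))
    return [(int(r), int(c)) for r, c in zip(rows, cols)]


som_array = np.zeros((2, 2, 2))
som_array[0, 1] = [1.0, 0.0]
som_array[1, 0] = [0.0, 1.0]
som_array[1, 1] = [1.0, 1.0]
X = np.array([[0.1, 0.0], [0.9, 0.1], [0.0, 0.8], [1.0, 0.9]])
print(find_bmus(X, som_array))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The intermediate array holds n_samples × n_nodes × n_features values, so very large inputs would need to be processed in batches.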
- get_clusters(X: ndarray) → List[Tuple[int, int]] | None
Calculate the SOM nodes on the unsupervised SOM grid per datapoint.
- get_estimation_map() → ndarray
Return SOM grid with the estimated value on each node.
- Returns:
super_som_ – Supervised SOM grid with estimated value on each node.
- Return type:
np.ndarray
Examples
Fit the SOM to your data X, y and plot the estimation map with:
>>> import susi
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> som = susi.SOMClassifier()
>>> som.fit(X, y)
>>> estimation_map = som.get_estimation_map()
>>> plt.imshow(np.squeeze(estimation_map), cmap="viridis_r")
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
params – Parameter names mapped to their values.
- Return type:
dict
- get_quantization_error(X: Sequence | None = None) → float
Get the quantization error for X (or for the training data).
- Parameters:
X (array-like matrix, optional (default=None)) – Samples of shape = [n_samples, n_features]. If None, the training data is used for the calculation.
- Returns:
Mean quantization error over all datapoints.
- Return type:
float
- Raises:
RuntimeError – Raised if the SOM is not fitted yet.
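Assuming the usual definition (the mean Euclidean distance between each datapoint and the weight vector of its BMU), the quantization error can be sketched in NumPy as follows; quantization_error is a hypothetical helper, not susi's method.

```python
import numpy as np


def quantization_error(X: np.ndarray, som_array: np.ndarray) -> float:
    """Mean Euclidean distance between each datapoint and its BMU's weight vector."""
    n_features = som_array.shape[2]
    nodes = som_array.reshape(-1, n_features)
    # Distances of every datapoint to every node, shape (n_samples, n_nodes).
    dists = np.linalg.norm(X[:, np.newaxis, :] - nodes[np.newaxis, :, :], axis=2)
    # The BMU distance is the minimum over all nodes.
    return float(dists.min(axis=1).mean())


som_array = np.array([[[0.0, 0.0], [4.0, 0.0]]])  # 1 x 2 grid
X = np.array([[0.0, 1.0], [4.0, 3.0]])
print(quantization_error(X, som_array))  # 2.0
```

Here the first datapoint lies at distance 1 from its BMU and the second at distance 3, giving a mean of 2.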
- get_u_matrix(mode: str = 'mean') → ndarray
Calculate the unified distance matrix (u-matrix).
- Parameters:
mode (str, optional (default="mean")) – Choice of the averaging algorithm
- Returns:
u_matrix – U-matrix containing the distances between neighboring nodes of the unsupervised SOM. Shape = (n_rows*2-1, n_columns*2-1)
- Return type:
np.ndarray
Examples
Fit your SOM to the input data X and then calculate the u-matrix with get_u_matrix(). You can then plot the u-matrix with, e.g., pyplot.imshow().
>>> import susi
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> som = susi.SOMClustering()
>>> som.fit(X)
>>> umat = som.get_u_matrix()
>>> plt.imshow(np.squeeze(umat))
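The (n_rows*2-1, n_columns*2-1) shape comes from interleaving node cells with cells that hold the distances between neighboring nodes. The sketch below shows one common way to fill such a u-matrix with mean averaging. The conventions for the diagonal cells (mean of the two diagonal distances) and the node cells (mean of the adjacent between-node cells) are assumptions, and u_matrix_mean is a hypothetical helper rather than susi's get_u_matrix().

```python
import numpy as np


def u_matrix_mean(som_array: np.ndarray) -> np.ndarray:
    """U-matrix of shape (2*n_rows-1, 2*n_columns-1) with mean averaging (assumed convention)."""
    n_rows, n_columns, _ = som_array.shape
    umat = np.zeros((2 * n_rows - 1, 2 * n_columns - 1))

    def dist(a, b):
        return np.linalg.norm(a - b)

    # Cells between nodes hold feature-space distances of neighboring nodes.
    for i in range(n_rows):
        for j in range(n_columns):
            if j + 1 < n_columns:  # right neighbor
                umat[2 * i, 2 * j + 1] = dist(som_array[i, j], som_array[i, j + 1])
            if i + 1 < n_rows:  # lower neighbor
                umat[2 * i + 1, 2 * j] = dist(som_array[i, j], som_array[i + 1, j])
            if i + 1 < n_rows and j + 1 < n_columns:  # mean of both diagonals
                umat[2 * i + 1, 2 * j + 1] = 0.5 * (
                    dist(som_array[i, j], som_array[i + 1, j + 1])
                    + dist(som_array[i, j + 1], som_array[i + 1, j]))
    # Node cells: mean of the directly adjacent between-node cells.
    for i in range(n_rows):
        for j in range(n_columns):
            r, c = 2 * i, 2 * j
            neighbors = [umat[rr, cc]
                         for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                         if 0 <= rr < umat.shape[0] and 0 <= cc < umat.shape[1]]
            umat[r, c] = np.mean(neighbors) if neighbors else 0.0
    return umat


# 2 x 2 grid with one feature per node.
som_array = np.array([[[0.0], [1.0]], [[2.0], [4.0]]])
umat = u_matrix_mean(som_array)
print(umat.shape)  # (3, 3)
```

High values in the between-node cells mark large jumps in feature space, which is why u-matrix plots are often read as cluster borders.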
- predict(X: Sequence, y: Sequence | None = None) → ndarray
Predict the output for data X.
- Parameters:
X (array-like matrix of shape = [n_samples, n_features]) – The prediction input samples.
y (None, optional) – Ignored.
- Returns:
y_pred – List of predicted values.
- Return type:
np.ndarray
Examples
Fit the SOM to your data X, y and predict the labels with:
>>> import susi
>>> som = susi.SOMClassifier()
>>> som.fit(X, y)
>>> y_pred = som.predict(X)
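Conceptually, a classification SOM labels each datapoint with the class stored at its BMU on the supervised grid. The sketch below shows that lookup under this assumption, using a 2-D map like the squeezed output of get_estimation_map(); predict_from_map is a hypothetical helper, not susi's predict.

```python
import numpy as np


def predict_from_map(bmus, estimation_map):
    """Look up the class stored at each datapoint's BMU (assumed prediction rule)."""
    return np.array([estimation_map[row, col] for row, col in bmus])


estimation_map = np.array([["cat", "dog"], ["dog", "bird"]])  # 2 x 2 grid of classes
bmus = [(0, 0), (1, 1), (0, 1)]  # e.g. positions as returned by a BMU search
print(predict_from_map(bmus, estimation_map))  # ['cat' 'bird' 'dog']
```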
- predict_proba(X: Sequence, y: Sequence | None = None) → ndarray
Predict class probabilities for X.
New in version 1.1.3.
- Parameters:
X (array-like matrix of shape = [n_samples, n_features]) – The prediction input samples.
y (array-like matrix of shape = [n_samples, 1], optional) – The labels (ground truth) of the input samples
- Returns:
List of probabilities of shape (n_samples, n_classes)
- Return type:
np.ndarray
- score(X, y, sample_weight=None)
Return the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True labels for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
- Returns:
score – Mean accuracy of self.predict(X) w.r.t. y.
- Return type:
float
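score is inherited from scikit-learn's ClassifierMixin, which computes the accuracy of self.predict(X) against y, optionally weighted by sample_weight. With made-up labels, the arithmetic looks like this in NumPy:

```python
import numpy as np

y_true = np.array([0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 2, 2, 2])  # e.g. the output of som.predict(X)

# Plain accuracy: fraction of correctly predicted labels.
accuracy = np.mean(y_pred == y_true)
print(accuracy)  # 0.8

# Weighted accuracy: summed weights of correct samples divided by all weights.
sample_weight = np.array([1, 1, 3, 1, 1])
weighted = np.average(y_pred == y_true, weights=sample_weight)
```

Because the only misclassified sample carries weight 3, the weighted accuracy drops to 4/7.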
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → SOMClassifier
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for the sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
SOMClassifier
- transform(X: Sequence, y: Sequence | None = None) → ndarray
Transform input data.
- Parameters:
X (array-like matrix of shape = [n_samples, n_features]) – The prediction input samples.
y (None, optional) – Ignored.
- Returns:
Predictions including the BMUs of each datapoint.
- Return type:
np.ndarray
Examples
Load the SOM, fit it to your input data X and transform your input data with:
>>> import susi
>>> som = susi.SOMClustering()
>>> som.fit(X)
>>> X_transformed = som.transform(X)