SOMClassifier

class susi.SOMClassifier(n_rows: int = 10, n_columns: int = 10, *, init_mode_unsupervised: str = 'random', init_mode_supervised: str = 'majority', n_iter_unsupervised: int = 1000, n_iter_supervised: int = 1000, train_mode_unsupervised: str = 'online', train_mode_supervised: str = 'online', neighborhood_mode_unsupervised: str = 'linear', neighborhood_mode_supervised: str = 'linear', learn_mode_unsupervised: str = 'min', learn_mode_supervised: str = 'min', distance_metric: str = 'euclidean', learning_rate_start: float = 0.5, learning_rate_end: float = 0.05, nbh_dist_weight_mode: str = 'pseudo-gaussian', missing_label_placeholder: int | str | None = None, do_class_weighting: bool = True, n_jobs: int | None = None, random_state=None, verbose: int | None = 0)

Bases: SOMEstimator, ClassifierMixin

Supervised SOM for estimating discrete variables (= classification).

Parameters:
  • n_rows (int, optional (default=10)) – Number of rows for the SOM grid

  • n_columns (int, optional (default=10)) – Number of columns for the SOM grid

  • init_mode_unsupervised (str, optional (default=”random”)) – Initialization mode of the unsupervised SOM

  • init_mode_supervised (str, optional (default=”majority”)) – Initialization mode of the classification SOM

  • n_iter_unsupervised (int, optional (default=1000)) – Number of iterations for the unsupervised SOM

  • n_iter_supervised (int, optional (default=1000)) – Number of iterations for the classification SOM

  • train_mode_unsupervised (str, optional (default=”online”)) – Training mode of the unsupervised SOM

  • train_mode_supervised (str, optional (default=”online”)) – Training mode of the classification SOM

  • neighborhood_mode_unsupervised (str, optional (default=”linear”)) – Neighborhood mode of the unsupervised SOM

  • neighborhood_mode_supervised (str, optional (default=”linear”)) – Neighborhood mode of the classification SOM

  • learn_mode_unsupervised (str, optional (default=”min”)) – Learning mode of the unsupervised SOM

  • learn_mode_supervised (str, optional (default=”min”)) – Learning mode of the classification SOM

  • distance_metric (str, optional (default=”euclidean”)) – Distance metric to compare on feature level (not SOM grid). Possible metrics: {“euclidean”, “manhattan”, “mahalanobis”, “tanimoto”, “spectralangle”}. Note that “tanimoto” tends to be slow.

    New in version 1.1.1: Spectral angle metric.

  • learning_rate_start (float, optional (default=0.5)) – Learning rate start value

  • learning_rate_end (float, optional (default=0.05)) – Learning rate end value (only needed for some learning-rate definitions)

  • nbh_dist_weight_mode (str, optional (default=”pseudo-gaussian”)) – Formula of the neighborhood distance weight. Possible formulas are: {“pseudo-gaussian”, “mexican-hat”}.

  • missing_label_placeholder (int or str or None, optional (default=None)) – Label placeholder for datapoints with no label. This is needed for semi-supervised learning.

  • do_class_weighting (bool, optional (default=True)) – If true, classes are weighted.

  • n_jobs (int or None, optional (default=None)) – The number of jobs to run in parallel.

  • random_state (int, RandomState instance or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

  • verbose (int, optional (default=0)) – Controls the verbosity.
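To make the distance_metric options more concrete, here is a minimal pure-Python sketch of four of them using their common textbook definitions. It is an illustration, not susi's implementation: the function names are hypothetical, the Tanimoto variant assumed here is the continuous form 1 - x·y / (|x|² + |y|² - x·y), and Mahalanobis is left out because it additionally needs the inverse covariance matrix of X.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def _dot(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def euclidean(x: Vector, y: Vector) -> float:
    """Square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x: Vector, y: Vector) -> float:
    """Sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def tanimoto(x: Vector, y: Vector) -> float:
    """Assumed continuous form: 1 - x.y / (|x|^2 + |y|^2 - x.y).

    Undefined if both vectors are all zeros.
    """
    xy = _dot(x, y)
    return 1.0 - xy / (_dot(x, x) + _dot(y, y) - xy)


def spectral_angle(x: Vector, y: Vector) -> float:
    """Angle in radians between x and y; independent of vector length."""
    cos = _dot(x, y) / math.sqrt(_dot(x, x) * _dot(y, y))
    return math.acos(max(-1.0, min(1.0, cos)))  # clip floating-point noise


a, b = [1.0, 2.0], [2.0, 4.0]  # same direction, different magnitude
print(euclidean(a, b), manhattan(a, b), tanimoto(a, b), spectral_angle(a, b))
```

For two vectors that point in the same direction, the spectral angle is 0 while the Euclidean and Manhattan distances are not, which illustrates its insensitivity to scaling.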

Variables:
  • node_list (np.ndarray of (int, int) tuples) – List of 2-dimensional coordinates of SOM nodes

  • radius_max (float or int) – Maximum radius of the neighborhood function

  • radius_min (float or int) – Minimum radius of the neighborhood function

  • unsuper_som (np.ndarray) – Weight vectors of the unsupervised SOM, shape = (self.n_rows, self.n_columns, X.shape[1])

  • X (np.ndarray) – Input data

  • fitted (bool) – Whether the estimator has been fitted to X

  • max_iterations (int) – Maximum number of iterations for the current training

  • bmus (list of (int, int) tuples) – List of best matching units (BMUs) of the dataset X.

  • placeholder_dict (dict) – Dict of placeholders for initializing nodes without mapped class.

  • n_features_in (int) – Number of input features in X.

  • classes (np.ndarray) – Unique classes in the dataset labels y.

  • class_counts (np.ndarray) – Number of datapoints per unique class in y.

  • class_dtype (type) – Type of a label in y.
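During training the neighborhood radius shrinks between radius_max and radius_min, and nbh_dist_weight_mode turns the grid distance between a node and the BMU into a weight. The sketch below assumes a linear radius schedule and the usual Gaussian and Mexican-hat shapes; the helper names and exact formulas are assumptions for illustration and may differ from susi's internals.

```python
import math
from typing import List, Tuple


def make_node_list(n_rows: int, n_columns: int) -> List[Tuple[int, int]]:
    """All (row, column) grid coordinates, analogous to node_list."""
    return [(r, c) for r in range(n_rows) for c in range(n_columns)]


def linear_radius(it: int, max_it: int, radius_max: float, radius_min: float) -> float:
    """Assumed linear shrink from radius_max (it=0) to radius_min (it=max_it)."""
    return radius_max - (radius_max - radius_min) * it / max_it


def pseudo_gaussian(grid_dist: float, radius: float) -> float:
    """exp(-d^2 / (2 r^2)): 1.0 at the BMU, decaying smoothly with distance."""
    return math.exp(-(grid_dist ** 2) / (2 * radius ** 2))


def mexican_hat(grid_dist: float, radius: float) -> float:
    """(1 - d^2/r^2) * exp(-d^2 / (2 r^2)): zero at d = r, negative beyond."""
    ratio = grid_dist ** 2 / radius ** 2
    return (1 - ratio) * math.exp(-ratio / 2)


# Weights of every node of a 3x3 grid relative to the BMU (1, 1) at mid-training.
radius = linear_radius(it=500, max_it=1000, radius_max=2.0, radius_min=1.0)
weights = {
    node: pseudo_gaussian(math.dist(node, (1, 1)), radius)
    for node in make_node_list(3, 3)
}
```

The Mexican-hat shape pushes nodes beyond the radius slightly away from the datapoint, whereas the pseudo-Gaussian only ever pulls them closer.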

fit(X: Sequence, y: Sequence | None = None)

Fit classification SOM to the input data.

Parameters:
  • X (array-like matrix of shape = [n_samples, n_features]) – The training input samples.

  • y (array-like matrix of shape = [n_samples, 1], optional) – The labels (ground truth) of the input samples

Returns:

self

Return type:

object

Examples

Load the SOM and fit it to your input data X and the labels y with:

>>> import susi
>>> som = susi.SOMClassifier()
>>> som.fit(X, y)
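For semi-supervised training, unlabeled samples carry the missing_label_placeholder value in y. A plausible reading, assumed here rather than taken from susi's source, is that all samples feed the unsupervised SOM while only the labeled ones feed the supervised SOM. The hypothetical helper below shows that split for a placeholder of -1:

```python
from typing import Any, List, Sequence, Tuple


def split_by_label(
    y: Sequence[Any], missing_label_placeholder: Any
) -> Tuple[List[int], List[int]]:
    """Return (labeled_indices, unlabeled_indices) for the label vector y.

    Hypothetical helper: a sample counts as unlabeled when its label equals
    missing_label_placeholder.
    """
    labeled: List[int] = []
    unlabeled: List[int] = []
    for i, label in enumerate(y):
        if label == missing_label_placeholder:
            unlabeled.append(i)
        else:
            labeled.append(i)
    return labeled, unlabeled


y = ["cat", -1, "dog", -1, "cat"]
labeled, unlabeled = split_by_label(y, missing_label_placeholder=-1)
# labeled -> [0, 2, 4], unlabeled -> [1, 3]
```

With susi itself you would pass the same value to the constructor, e.g. susi.SOMClassifier(missing_label_placeholder=-1), and call fit(X, y) with the partially labeled y.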

fit_transform(X: Sequence, y: Sequence | None = None) → ndarray

Fit to the input data and transform it.

Parameters:
  • X (array-like matrix of shape = [n_samples, n_features]) – The training and prediction input samples.

  • y (array-like matrix of shape = [n_samples, 1]) – The labels (ground truth) of the input samples

Returns:

BMU position (row, column) of each datapoint

Return type:

np.array of tuples (int, int)

Examples

Load the SOM, fit it to your input data X and transform your input data with:

>>> import susi
>>> som = susi.SOMClassifier()
>>> tuples = som.fit_transform(X, y)

get_bmu(datapoint: ndarray, som_array: ndarray) → Tuple[int, int]

Get best matching unit (BMU) for datapoint.

Parameters:
  • datapoint (np.ndarray, shape = (X.shape[1],)) – One datapoint, i.e. one row of the dataset X

  • som_array (np.ndarray) – Weight vectors of the SOM, shape = (self.n_rows, self.n_columns, X.shape[1])

Returns:

Position of best matching unit (row, column)

Return type:

tuple, shape = (int, int)
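Conceptually, get_bmu searches the grid for the node whose weight vector is closest to the datapoint. The pure-Python sketch below assumes Euclidean distance and a nested-list grid indexed as som_array[row][column]; find_bmu is a hypothetical name, and the real method operates on NumPy arrays and may use the configured distance_metric.

```python
import math
from typing import List, Sequence, Tuple

# som_array[row][column] is the weight vector of one node.
Grid = List[List[List[float]]]


def find_bmu(datapoint: Sequence[float], som_array: Grid) -> Tuple[int, int]:
    """Return (row, column) of the closest node (Euclidean distance).

    Ties resolve to the first node in row-major order.
    """
    best: Tuple[int, int] = (0, 0)
    best_dist = math.inf
    for r, row in enumerate(som_array):
        for c, weights in enumerate(row):
            d = math.dist(datapoint, weights)
            if d < best_dist:
                best, best_dist = (r, c), d
    return best


som = [[[0.0, 0.0], [0.0, 1.0]],
       [[1.0, 0.0], [1.0, 1.0]]]
X = [[0.9, 0.2], [0.1, 0.8]]
bmus = [find_bmu(x, som) for x in X]  # one BMU per datapoint, as with get_bmus
# bmus -> [(1, 0), (0, 1)]
```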

get_bmus(X: ndarray, som_array: ndarray | None = None) → List[Tuple[int, int]] | None

Get the best matching units (BMUs) for a list of datapoints.

Parameters:
  • X (np.ndarray) – List of datapoints

  • som_array (np.ndarray, optional (default=`None`)) – Weight vectors of the SOM, shape = (self.n_rows, self.n_columns, X.shape[1])

Returns:

bmus – Position of best matching units (row, column) for each datapoint

Return type:

list of (int, int) tuples

Examples

Load the SOM, fit it to your input data X, and plot a 2D histogram of the BMUs with:

>>> import susi
>>> import matplotlib.pyplot as plt
>>> som = susi.SOMClustering()
>>> som.fit(X)
>>> bmu_list = som.get_bmus(X)
>>> plt.hist2d([x[0] for x in bmu_list], [x[1] for x in bmu_list])

get_clusters(X: ndarray) → List[Tuple[int, int]] | None

Calculate the SOM nodes on the unsupervised SOM grid per datapoint.

Parameters:

X (np.ndarray) – Input data

Returns:

List of SOM nodes, one for each input datapoint

Return type:

list of tuples (int, int)

get_datapoints_from_node(node: Tuple[int, int]) → List[int]

Get all datapoints of one node.

Parameters:

node (tuple, shape (int, int)) – Node for which the linked datapoints are calculated

Returns:

datapoints – List of indices of the datapoints that are linked to node

Return type:

list of int
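Assuming a datapoint is linked to a node when that node is its BMU, the lookup reduces to filtering the list of BMUs. The hypothetical helper below shows this inverse mapping:

```python
from typing import List, Sequence, Tuple


def datapoints_of_node(
    bmus: Sequence[Tuple[int, int]], node: Tuple[int, int]
) -> List[int]:
    """Indices i for which bmus[i] equals node."""
    return [i for i, bmu in enumerate(bmus) if tuple(bmu) == tuple(node)]


bmus = [(0, 0), (1, 2), (0, 0), (2, 2)]
indices = datapoints_of_node(bmus, (0, 0))  # -> [0, 2]
```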

get_estimation_map() → ndarray

Return SOM grid with the estimated value on each node.

Returns:

super_som_ – Supervised SOM grid with estimated value on each node.

Return type:

np.ndarray

Examples

Fit the SOM on your data X, y:

>>> import susi
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> som = susi.SOMClassifier()
>>> som.fit(X, y)
>>> estimation_map = som.get_estimation_map()
>>> plt.imshow(np.squeeze(estimation_map), cmap="viridis_r")
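With init_mode_supervised="majority", each node of the classification SOM is presumably seeded with the most frequent class among the datapoints mapped to it, while nodes without datapoints need a placeholder (compare placeholder_dict). The sketch below illustrates only that majority vote on a tiny grid; majority_map is hypothetical and does not reproduce susi's supervised training.

```python
from collections import Counter
from typing import Any, Dict, List, Sequence, Tuple


def majority_map(
    bmus: Sequence[Tuple[int, int]],
    y: Sequence[Any],
    n_rows: int,
    n_columns: int,
    placeholder: Any = None,
) -> List[List[Any]]:
    """Grid with the most frequent label per node; empty nodes get placeholder."""
    votes: Dict[Tuple[int, int], Counter] = {}
    for bmu, label in zip(bmus, y):
        votes.setdefault(tuple(bmu), Counter())[label] += 1
    grid = [[placeholder] * n_columns for _ in range(n_rows)]
    for (r, c), counter in votes.items():
        grid[r][c] = counter.most_common(1)[0][0]  # ties: first label seen wins
    return grid


grid = majority_map([(0, 0), (0, 0), (0, 0), (1, 1)], ["a", "b", "a", "c"], 2, 2)
# grid -> [["a", None], [None, "c"]]
```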

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

get_quantization_error(X: Sequence | None = None) → float

Get quantization error for X (or the training data).

Parameters:

X (array-like matrix, optional (default=None)) – Samples of shape = [n_samples, n_features]. If None, the training data is used for the calculation.

Returns:

Mean quantization error over all datapoints.

Return type:

float

Raises:

RuntimeError – Raised if the SOM is not fitted yet.
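The quantization error is commonly defined as the mean distance between each datapoint and the weight vector of its BMU. A minimal sketch of that definition, assuming Euclidean distance and a nested-list grid (quantization_error is a hypothetical stand-alone function, not the method itself):

```python
import math
from typing import List, Sequence


def quantization_error(
    X: Sequence[Sequence[float]], som_array: List[List[List[float]]]
) -> float:
    """Mean Euclidean distance from each datapoint to its closest node."""
    weights = [w for row in som_array for w in row]  # flatten the grid
    total = sum(min(math.dist(x, w) for w in weights) for x in X)
    return total / len(X)


som = [[[0.0, 0.0], [1.0, 1.0]]]  # 1 x 2 grid
error = quantization_error([[0.0, 0.0], [1.0, 2.0]], som)  # (0.0 + 1.0) / 2 -> 0.5
```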

get_u_matrix(mode: str = 'mean') → ndarray

Calculate unified distance matrix (u-matrix).

Parameters:

mode (str, optional (default=”mean”)) – Choice of the averaging algorithm

Returns:

u_matrix – U-matrix containing the distances between all nodes of the unsupervised SOM. Shape = (n_rows*2-1, n_columns*2-1)

Return type:

np.ndarray

Examples

Fit your SOM to input data X and then calculate the u-matrix with get_u_matrix(). You can then plot the u-matrix, e.g., with pyplot.imshow().

>>> import susi
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> som = susi.SOMClustering()
>>> som.fit(X)
>>> umat = som.get_u_matrix()
>>> plt.imshow(np.squeeze(umat))
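The u-matrix shape (n_rows*2-1, n_columns*2-1) comes from interleaving the nodes with the gaps between them. One common construction is sketched below under stated assumptions: Euclidean distance, 4-neighborhoods, a "median" alternative to "mean", and the helper name u_matrix_sketch are all illustrative choices, and susi's exact averaging may differ. Each gap cell stores the distance between two adjacent nodes, and every node or diagonal cell holds the mean of its neighboring gap cells.

```python
import math
from statistics import mean, median
from typing import List

Grid = List[List[List[float]]]  # som_array[row][column] -> weight vector


def u_matrix_sketch(som_array: Grid, mode: str = "mean") -> List[List[float]]:
    """Return a (2*n_rows - 1) x (2*n_columns - 1) u-matrix as nested lists."""
    n_rows, n_cols = len(som_array), len(som_array[0])
    h, w = 2 * n_rows - 1, 2 * n_cols - 1
    u = [[0.0] * w for _ in range(h)]
    # Gap cells: distance between horizontally or vertically adjacent nodes.
    for r in range(n_rows):
        for c in range(n_cols):
            if c + 1 < n_cols:
                u[2 * r][2 * c + 1] = math.dist(som_array[r][c], som_array[r][c + 1])
            if r + 1 < n_rows:
                u[2 * r + 1][2 * c] = math.dist(som_array[r][c], som_array[r + 1][c])
    aggregate = median if mode == "median" else mean  # other modes fall back to mean
    # Node cells (even, even) and diagonal cells (odd, odd) only border gap cells,
    # so they can be filled from the values computed above in any order.
    for i in range(h):
        for j in range(w):
            if (i + j) % 2 == 0:
                neighbors = [
                    u[a][b]
                    for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                    if 0 <= a < h and 0 <= b < w
                ]
                u[i][j] = aggregate(neighbors) if neighbors else 0.0
    return u


som = [[[0.0], [1.0]],
       [[0.0], [3.0]]]
umat = u_matrix_sketch(som)
# umat -> [[0.5, 1.0, 1.5], [0.0, 1.5, 2.0], [1.5, 3.0, 2.5]]
```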

predict(X: Sequence, y: Sequence | None = None) → ndarray

Predict output of data X.

Parameters:
  • X (array-like matrix of shape = [n_samples, n_features]) – The prediction input samples.

  • y (None, optional) – Ignored.

Returns:

y_pred – Predicted values; for SOMClassifier, these are the predicted class labels.

Return type:

np.ndarray

Examples

Fit the SOM on your data X, y:

>>> import susi
>>> som = susi.SOMClassifier()
>>> som.fit(X, y)
>>> y_pred = som.predict(X)

predict_proba(X: Sequence, y: Sequence | None = None) → ndarray

Predict class probabilities for X.

New in version 1.1.3.

Parameters:
  • X (array-like matrix of shape = [n_samples, n_features]) – The prediction input samples.

  • y (array-like matrix of shape = [n_samples, 1], optional) – The labels (ground truth) of the input samples

Returns:

Array of class probabilities of shape (n_samples, n_classes)

Return type:

np.ndarray

score(X, y, sample_weight=None)

Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Test samples.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True labels for X.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns:

score – Mean accuracy of self.predict(X) w.r.t. y.

Return type:

float
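For single-label targets, the score is the fraction of correctly predicted samples, weighted by sample_weight when given; scikit-learn's ClassifierMixin.score delegates this to sklearn.metrics.accuracy_score. A dependency-free sketch of the same arithmetic (mean_accuracy is a hypothetical name; multi-label subset accuracy is not covered):

```python
from typing import Any, Optional, Sequence


def mean_accuracy(
    y_true: Sequence[Any],
    y_pred: Sequence[Any],
    sample_weight: Optional[Sequence[float]] = None,
) -> float:
    """Weighted fraction of samples whose predicted label equals the true one."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    hits = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return hits / sum(sample_weight)


y_true = ["a", "b", "c", "a"]
y_pred = ["a", "b", "a", "a"]
print(mean_accuracy(y_true, y_pred))                # 0.75
print(mean_accuracy(y_true, y_pred, [1, 1, 2, 0]))  # 0.5
```

The weighted call shows how a heavily weighted misclassification (the third sample) and a zero-weight hit (the fourth) shift the score.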

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → SOMClassifier

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

transform(X: Sequence, y: Sequence | None = None) → ndarray

Transform input data.

Parameters:
  • X (array-like matrix of shape = [n_samples, n_features]) – The prediction input samples.

  • y (None, optional) – Ignored.

Returns:

BMU position (row, column) of each datapoint

Return type:

np.array of tuples (int, int)

Examples

Load the SOM, fit it to your input data X and transform your input data with:

>>> import susi
>>> som = susi.SOMClustering()
>>> som.fit(X)
>>> X_transformed = som.transform(X)