SOMRegressor

class susi.SOMRegressor(n_rows: int = 10, n_columns: int = 10, *, init_mode_unsupervised: str = 'random', init_mode_supervised: str = 'random', n_iter_unsupervised: int = 1000, n_iter_supervised: int = 1000, train_mode_unsupervised: str = 'online', train_mode_supervised: str = 'online', neighborhood_mode_unsupervised: str = 'linear', neighborhood_mode_supervised: str = 'linear', learn_mode_unsupervised: str = 'min', learn_mode_supervised: str = 'min', distance_metric: str = 'euclidean', learning_rate_start: float = 0.5, learning_rate_end: float = 0.05, nbh_dist_weight_mode: str = 'pseudo-gaussian', missing_label_placeholder: int | str | None = None, n_jobs: int | None = None, random_state=None, verbose: int | None = 0)[source]

Bases: SOMEstimator, RegressorMixin

Supervised SOM for estimating continuous variables (= regression).

Parameters:
  • n_rows (int, optional (default=10)) – Number of rows for the SOM grid

  • n_columns (int, optional (default=10)) – Number of columns for the SOM grid

  • init_mode_unsupervised (str, optional (default=”random”)) – Initialization mode of the unsupervised SOM

  • init_mode_supervised (str, optional (default=”random”)) – Initialization mode of the supervised SOM

  • n_iter_unsupervised (int, optional (default=1000)) – Number of iterations for the unsupervised SOM

  • n_iter_supervised (int, optional (default=1000)) – Number of iterations for the supervised SOM

  • train_mode_unsupervised (str, optional (default=”online”)) – Training mode of the unsupervised SOM

  • train_mode_supervised (str, optional (default=”online”)) – Training mode of the supervised SOM

  • neighborhood_mode_unsupervised (str, optional (default=”linear”)) – Neighborhood mode of the unsupervised SOM

  • neighborhood_mode_supervised (str, optional (default=”linear”)) – Neighborhood mode of the supervised SOM

  • learn_mode_unsupervised (str, optional (default=”min”)) – Learning mode of the unsupervised SOM

  • learn_mode_supervised (str, optional (default=”min”)) – Learning mode of the supervised SOM

  • distance_metric (str, optional (default=”euclidean”)) – Distance metric to compare on feature level (not SOM grid). Possible metrics: {“euclidean”, “manhattan”, “mahalanobis”, “tanimoto”, “spectralangle”}. Note that “tanimoto” tends to be slow.

    New in version 1.1.1: Spectral angle metric.

  • learning_rate_start (float, optional (default=0.5)) – Learning rate start value

  • learning_rate_end (float, optional (default=0.05)) – Learning rate end value (only needed for some lr definitions)

  • nbh_dist_weight_mode (str, optional (default=”pseudo-gaussian”)) – Formula of the neighborhood distance weight. Possible formulas are: {“pseudo-gaussian”, “mexican-hat”}.

  • missing_label_placeholder (int or str or None, optional (default=None)) – Label placeholder for datapoints with no label. This is needed for semi-supervised learning.

  • n_jobs (int or None, optional (default=None)) – The number of jobs to run in parallel.

  • random_state (int, RandomState instance or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

  • verbose (int, optional (default=0)) – Controls the verbosity.

Variables:
  • node_list (np.ndarray of (int, int) tuples) – List of 2-dimensional coordinates of SOM nodes

  • radius_max (float, int) – Maximum radius of the neighborhood function

  • radius_min (float, int) – Minimum radius of the neighborhood function

  • unsuper_som (np.ndarray) – Weight vectors of the unsupervised SOM shape = (self.n_rows, self.n_columns, X.shape[1])

  • X (np.ndarray) – Input data

  • fitted (bool) – States if estimator is fitted to X

  • max_iterations (int) – Maximum number of iterations for the current training

  • bmus (list of (int, int) tuples) – List of best matching units (BMUs) of the dataset X

  • sample_weights (np.ndarray) – Sample weights.

  • n_regression_vars (int) – Number of regression variables. In most examples, this equals one.

  • n_features_in (int) – Number of input features

fit(X: Sequence, y: Sequence | None = None)

Fit supervised SOM to the input data.

Parameters:
  • X (array-like matrix of shape = [n_samples, n_features]) – The prediction input samples.

  • y (array-like matrix of shape = [n_samples, 1]) – The labels (ground truth) of the input samples

Returns:

self

Return type:

object

Examples

Load the SOM and fit it to your input data X and the labels y with:

>>> import susi
>>> som = susi.SOMRegressor()
>>> som.fit(X, y)
fit_transform(X: Sequence, y: Sequence | None = None) ndarray

Fit to the input data and transform it.

Parameters:
  • X (array-like matrix of shape = [n_samples, n_features]) – The training and prediction input samples.

  • y (array-like matrix of shape = [n_samples, 1]) – The labels (ground truth) of the input samples

Returns:

Predictions including the BMUs of each datapoint

Return type:

np.array of tuples (int, int)

Examples

Load the SOM, fit it to your input data X and transform your input data with:

>>> import susi
>>> som = susi.SOMClassifier()
>>> tuples = som.fit_transform(X, y)
get_bmu(datapoint: ndarray, som_array: ndarray) Tuple[int, int]

Get best matching unit (BMU) for datapoint.

Parameters:
  • datapoint (np.ndarray, shape=shape[1]) – Datapoint = one row of the dataset X

  • som_array (np.ndarray) – Weight vectors of the SOM shape = (self.n_rows, self.n_columns, X.shape[1])

Returns:

Position of best matching unit (row, column)

Return type:

tuple, shape = (int, int)

get_bmus(X: ndarray, som_array: ndarray | None = None) List[Tuple[int, int]] | None

Get Best Matching Units for big datalist.

Parameters:
  • X (np.ndarray) – List of datapoints

  • som_array (np.ndarray, optional (default=`None`)) – Weight vectors of the SOM shape = (self.n_rows, self.n_columns, X.shape[1])

Returns:

bmus – Position of best matching units (row, column) for each datapoint

Return type:

list of (int, int) tuples

Examples

Load the SOM, fit it to your input data X and transform your input data with:

>>> import susi
>>> import matplotlib.pyplot as plt
>>> som = susi.SOMClustering()
>>> som.fit(X)
>>> bmu_list = som.get_bmus(X)
>>> plt.hist2d([x[0] for x in bmu_list], [x[1] for x in bmu_list]
get_clusters(X: ndarray) List[Tuple[int, int]] | None

Calculate the SOM nodes on the unsupervised SOM grid per datapoint.

Parameters:

X (np.ndarray) – Input data

Returns:

List of SOM nodes, one for each input datapoint

Return type:

list of tuples (int, int)

get_datapoints_from_node(node: Tuple[int, int]) List[int]

Get all datapoints of one node.

Parameters:

node (tuple, shape (int, int)) – Node for which the linked datapoints are calculated

Returns:

datapoints – List of indices of the datapoints that are linked to node

Return type:

list of int

get_estimation_map() ndarray

Return SOM grid with the estimated value on each node.

Returns:

super_som_ – Supervised SOM grid with estimated value on each node.

Return type:

np.ndarray

Examples

Fit the SOM on your data X, y:

>>> import susi
>>> import matplotlib.pyplot as plt
>>> som = susi.SOMClassifier()
>>> som.fit(X, y)
>>> estimation_map = som.get_estimation_map()
>>> plt.imshow(np.squeeze(estimation_map,) cmap="viridis_r")
get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns:

routing – A MetadataRequest encapsulating routing information.

Return type:

MetadataRequest

get_params(deep=True)

Get parameters for this estimator.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

get_quantization_error(X: Sequence | None = None) float

Get quantization error for X (or the training data).

Parameters:

X (array-like matrix, optional (default=True)) – Samples of shape = [n_samples, n_features]. If None, the training data is used for the calculation.

Returns:

Mean quantization error over all datapoints.

Return type:

float

Raises:

RuntimeError – Raised if the SOM is not fitted yet.

get_u_matrix(mode: str = 'mean') ndarray

Calculate unified distance matrix (u-matrix).

Parameters:

mode (str, optional (default=”mean)) – Choice of the averaging algorithm

Returns:

u_matrix – U-matrix containing the distances between all nodes of the unsupervised SOM. Shape = (n_rows*2-1, n_columns*2-1)

Return type:

np.ndarray

Examples

Fit your SOM to input data X and then calculate the u-matrix with get_u_matrix(). You can plot the u-matrix then with e.g. pyplot.imshow().

>>> import susi
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> som = susi.SOMClustering()
>>> som.fit(X)
>>> umat = som.get_u_matrix()
>>> plt.imshow(np.squeeze(umat))
predict(X: Sequence, y: Sequence | None = None) ndarray

Predict output of data X.

Parameters:
  • X (array-like matrix of shape = [n_samples, n_features]) – The prediction input samples.

  • y (None, optional) – Ignored.

Returns:

y_pred – List of predicted values.

Return type:

list of float

Examples

Fit the SOM on your data X, y:

>>> import susi
>>> som = susi.SOMClassifier()
>>> som.fit(X, y)
>>> y_pred = som.predict(X)
score(X, y, sample_weight=None)

Return the coefficient of determination of the prediction.

The coefficient of determination \(R^2\) is defined as \((1 - \frac{u}{v})\), where \(u\) is the residual sum of squares ((y_true - y_pred)** 2).sum() and \(v\) is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a \(R^2\) score of 0.0.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.

  • y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.

  • sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns:

score\(R^2\) of self.predict(X) w.r.t. y.

Return type:

float

Notes

The \(R^2\) score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self – Estimator instance.

Return type:

estimator instance

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') SOMRegressor

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

transform(X: Sequence, y: Sequence | None = None) ndarray

Transform input data.

Parameters:
  • X (array-like matrix of shape = [n_samples, n_features]) – The prediction input samples.

  • y (None, optional) – Ignored.

Returns:

Predictions including the BMUs of each datapoint

Return type:

np.array of tuples (int, int)

Examples

Load the SOM, fit it to your input data X and transform your input data with:

>>> import susi
>>> som = susi.SOMClustering()
>>> som.fit(X)
>>> X_transformed = som.transform(X)