SOMRegressor

class susi.SOMRegressor(n_rows: int = 10, n_columns: int = 10, *, init_mode_unsupervised: str = 'random', init_mode_supervised: str = 'random', n_iter_unsupervised: int = 1000, n_iter_supervised: int = 1000, train_mode_unsupervised: str = 'online', train_mode_supervised: str = 'online', neighborhood_mode_unsupervised: str = 'linear', neighborhood_mode_supervised: str = 'linear', learn_mode_unsupervised: str = 'min', learn_mode_supervised: str = 'min', distance_metric: str = 'euclidean', learning_rate_start: float = 0.5, learning_rate_end: float = 0.05, nbh_dist_weight_mode: str = 'pseudo-gaussian', missing_label_placeholder: int | str | None = None, n_jobs: int | None = None, random_state=None, verbose: int | None = 0)

Bases: RegressorMixin, SOMEstimator

Supervised SOM for estimating continuous variables (= regression).

Parameters:

n_rows (int, optional (default=10)) – Number of rows for the SOM grid
n_columns (int, optional (default=10)) – Number of columns for the SOM grid
init_mode_unsupervised (str, optional (default=“random”)) – Initialization mode of the unsupervised SOM
init_mode_supervised (str, optional (default=“random”)) – Initialization mode of the supervised SOM
n_iter_unsupervised (int, optional (default=1000)) – Number of iterations for the unsupervised SOM
n_iter_supervised (int, optional (default=1000)) – Number of iterations for the supervised SOM
train_mode_unsupervised (str, optional (default=“online”)) – Training mode of the unsupervised SOM
train_mode_supervised (str, optional (default=“online”)) – Training mode of the supervised SOM
neighborhood_mode_unsupervised (str, optional (default=“linear”)) – Neighborhood mode of the unsupervised SOM
neighborhood_mode_supervised (str, optional (default=“linear”)) – Neighborhood mode of the supervised SOM
learn_mode_unsupervised (str, optional (default=“min”)) – Learning mode of the unsupervised SOM
learn_mode_supervised (str, optional (default=“min”)) – Learning mode of the supervised SOM
distance_metric (str, optional (default=“euclidean”)) – Distance metric used to compare datapoints on the feature level (not on the SOM grid). Possible metrics: {“euclidean”, “manhattan”, “mahalanobis”, “tanimoto”, “spectralangle”}. Note that “tanimoto” tends to be slow.

Added in version 1.1.1: Spectral angle metric.
learning_rate_start (float, optional (default=0.5)) – Learning rate start value
learning_rate_end (float, optional (default=0.05)) – Learning rate end value (only needed for some learning rate definitions)
nbh_dist_weight_mode (str, optional (default=“pseudo-gaussian”)) – Formula of the neighborhood distance weight. Possible formulas are: {“pseudo-gaussian”, “mexican-hat”}.
missing_label_placeholder (int or str or None, optional (default=None)) – Label placeholder for datapoints with no label. This is needed for semi-supervised learning.
n_jobs (int or None, optional (default=None)) – The number of jobs to run in parallel.
random_state (int, RandomState instance or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
verbose (int, optional (default=0)) – Controls the verbosity.
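
learning_rate_start and learning_rate_end bound the learning rate over the course of training. As a rough illustration only, the sketch below interpolates geometrically between the two values. The function name learning_rate_at is hypothetical, and the formula is an assumption: the schedule actually used depends on learn_mode_unsupervised and learn_mode_supervised and is not stated in this reference.

```python
def learning_rate_at(curr_it, max_it, start=0.5, end=0.05):
    """Geometric interpolation from `start` (iteration 0) to `end` (iteration max_it).

    Assumed formula for illustration; not taken from the susi source.
    """
    return start * (end / start) ** (curr_it / max_it)


# Learning rate at the beginning, middle, and end of 1000 iterations.
rates = [learning_rate_at(it, 1000) for it in (0, 500, 1000)]
```

With the default values the rate starts at 0.5 and decays monotonically to 0.05 at the last iteration.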

Variables:

node_list (np.ndarray of (int, int) tuples) – List of 2-dimensional coordinates of SOM nodes
radius_max (float, int) – Maximum radius of the neighborhood function
radius_min (float, int) – Minimum radius of the neighborhood function
unsuper_som (np.ndarray) – Weight vectors of the unsupervised SOM, shape = (self.n_rows, self.n_columns, X.shape[1])
X (np.ndarray) – Input data
fitted (bool) – States if estimator is fitted to X
max_iterations (int) – Maximum number of iterations for the current training
bmus (list of (int, int) tuples) – List of best matching units (BMUs) of the dataset X
sample_weights (np.ndarray) – Sample weights.
n_regression_vars (int) – Number of regression variables. In most examples, this equals one.
n_features_in (int) – Number of input features

fit(X: Sequence, y: Sequence | None = None)

Fit supervised SOM to the input data.

Parameters:

X (array-like matrix of shape = [n_samples, n_features]) – The training input samples.
y (array-like matrix of shape = [n_samples, 1]) – The labels (ground truth) of the input samples

Returns:

self

Return type:

object

Examples

Load the SOM and fit it to your input data X and the labels y with:

>>> import susi
>>> som = susi.SOMRegressor()
>>> som.fit(X, y)

fit_transform(X: Sequence, y: Sequence | None = None) → ndarray

Fit to the input data and transform it.

Parameters:

X (array-like matrix of shape = [n_samples, n_features]) – The training and prediction input samples.
y (array-like matrix of shape = [n_samples, 1]) – The labels (ground truth) of the input samples

Returns:

Predictions including the BMUs of each datapoint

Return type:

np.array of tuples (int, int)

Examples

Load the SOM, fit it to your input data X and transform your input data with:

>>> import susi
>>> som = susi.SOMRegressor()
>>> tuples = som.fit_transform(X, y)

get_bmu(datapoint: ndarray, som_array: ndarray) → Tuple[int, int]

Get best matching unit (BMU) for datapoint.

Parameters:

datapoint (np.ndarray, shape = (X.shape[1],)) – Datapoint = one row of the dataset X
som_array (np.ndarray) – Weight vectors of the SOM, shape = (self.n_rows, self.n_columns, X.shape[1])

Returns:

Position of best matching unit (row, column)

Return type:

tuple, shape = (int, int)
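
The BMU search can be pictured as a nearest-neighbor lookup over all node weight vectors. The following is a minimal pure-Python sketch of that idea, not the library's implementation: find_bmu is a hypothetical helper, the SOM is a nested list instead of an np.ndarray, and Euclidean distance (the default distance_metric) is assumed.

```python
import math


def find_bmu(datapoint, som_array):
    """Return (row, column) of the node whose weight vector is closest to datapoint.

    som_array is a nested list of shape (n_rows, n_columns, n_features).
    Euclidean distance is assumed.
    """
    best_pos, best_dist = None, math.inf
    for row, nodes in enumerate(som_array):
        for col, weights in enumerate(nodes):
            dist = math.dist(datapoint, weights)
            if dist < best_dist:
                best_pos, best_dist = (row, col), dist
    return best_pos


# Toy 2x2 SOM with two features per node.
som_array = [
    [[0.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 1.0]],
]
bmu = find_bmu([0.9, 0.2], som_array)  # closest weight vector is [1.0, 0.0]
```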

get_bmus(X: ndarray, som_array: ndarray | None = None) → List[Tuple[int, int]] | None

Get Best Matching Units for big datalist.

Parameters:

X (np.ndarray) – List of datapoints
som_array (np.ndarray, optional (default=`None`)) – Weight vectors of the SOM, shape = (self.n_rows, self.n_columns, X.shape[1])

Returns:

bmus – Position of best matching units (row, column) for each datapoint

Return type:

list of (int, int) tuples

Examples

Load the SOM, fit it to your input data X and plot a 2D histogram of the BMUs with:

>>> import susi
>>> import matplotlib.pyplot as plt
>>> som = susi.SOMClustering()
>>> som.fit(X)
>>> bmu_list = som.get_bmus(X)
>>> plt.hist2d([x[0] for x in bmu_list], [x[1] for x in bmu_list])
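
If plotting is not available, the same hit counts can be inspected without matplotlib. This sketch uses a hand-written bmu_list (hypothetical data in the (row, column) format described above) instead of a fitted SOM.

```python
from collections import Counter

# Hypothetical BMU list in the (row, column) format returned by get_bmus.
bmu_list = [(0, 0), (0, 1), (0, 0), (2, 1), (0, 0)]

# Number of datapoints mapped to each node.
hits = Counter(bmu_list)

# Node that attracted the most datapoints, and how many.
busiest_node, n_hits = hits.most_common(1)[0]
```

Nodes that never appear in bmu_list simply have a count of zero in hits.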

get_clusters(X: ndarray) → List[Tuple[int, int]] | None

Calculate the SOM nodes on the unsupervised SOM grid per datapoint.

Parameters: X (np.ndarray) – Input data
Returns: List of SOM nodes, one for each input datapoint
Return type: list of tuples (int, int)

get_datapoints_from_node(node: Tuple[int, int]) → List[int]

Get all datapoints of one node.

Parameters: node (tuple, shape (int, int)) – Node for which the linked datapoints are calculated
Returns: datapoints – List of indices of the datapoints that are linked to node
Return type: list of int
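
Conceptually, this lookup selects the indices of all datapoints whose BMU equals the given node. A minimal sketch, with datapoints_from_node as a hypothetical stand-alone helper operating on a plain list of BMUs:

```python
def datapoints_from_node(bmus, node):
    """Indices of all datapoints whose BMU is `node`."""
    return [idx for idx, bmu in enumerate(bmus) if tuple(bmu) == tuple(node)]


# Hypothetical BMUs for four datapoints.
bmus = [(0, 0), (1, 2), (0, 0), (3, 1)]
linked = datapoints_from_node(bmus, (0, 0))
```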

get_estimation_map() → ndarray

Return SOM grid with the estimated value on each node.

Returns: super_som_ – Supervised SOM grid with estimated value on each node.
Return type: np.ndarray

Examples

Fit the SOM on your data X, y:

>>> import susi
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> som = susi.SOMRegressor()
>>> som.fit(X, y)
>>> estimation_map = som.get_estimation_map()
>>> plt.imshow(np.squeeze(estimation_map), cmap="viridis_r")

get_metadata_routing()

Get metadata routing of this object.

Please check User Guide on how the routing mechanism works.

Returns: routing – A MetadataRequest encapsulating routing information.
Return type: MetadataRequest

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

get_quantization_error(X: Sequence | None = None) → float

Get quantization error for X (or the training data).

Parameters: X (array-like matrix, optional (default=None)) – Samples of shape = [n_samples, n_features]. If None, the training data is used for the calculation.
Returns: Mean quantization error over all datapoints.
Return type: float
Raises: RuntimeError – Raised if the SOM is not fitted yet.
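
A common definition of the quantization error is the mean distance between each datapoint and the weight vector of its BMU. The sketch below assumes that definition and Euclidean distance; mean_quantization_error is a hypothetical helper working on nested lists, not the library routine.

```python
import math


def mean_quantization_error(X, som_array):
    """Mean Euclidean distance between each datapoint and its closest node."""
    # Flatten the (n_rows, n_columns, n_features) grid into a list of weight vectors.
    nodes = [weights for row in som_array for weights in row]
    # For each datapoint, the distance to the best matching unit.
    errors = [min(math.dist(x, w) for w in nodes) for x in X]
    return sum(errors) / len(errors)


som_array = [[[0.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 1.0]]]
X = [[0.0, 0.0], [1.0, 0.5], [0.0, 4.0]]
qe = mean_quantization_error(X, som_array)  # distances 0.0, 0.5 and 3.0
```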

get_u_matrix(mode: str = 'mean') → ndarray

Calculate unified distance matrix (u-matrix).

Parameters: mode (str, optional (default=“mean”)) – Choice of the averaging algorithm
Returns: u_matrix – U-matrix containing the distances between all nodes of the unsupervised SOM. Shape = (n_rows*2-1, n_columns*2-1)
Return type: np.ndarray

Examples

Fit your SOM to input data X and then calculate the u-matrix with get_u_matrix(). You can then plot the u-matrix with, e.g., pyplot.imshow().

>>> import susi
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> som = susi.SOMClustering()
>>> som.fit(X)
>>> umat = som.get_u_matrix()
>>> plt.imshow(np.squeeze(umat))
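
The shape (n_rows*2-1, n_columns*2-1) arises because one cell is placed between every pair of neighboring nodes. The sketch below illustrates one such layout under stated assumptions: node (r, c) sits at cell (2*r, 2*c), and the cell between two horizontally or vertically adjacent nodes holds their Euclidean distance. The helper u_matrix_skeleton is hypothetical, and the sketch leaves node cells and diagonal cells at 0.0; how get_u_matrix fills those cells according to mode is not covered here.

```python
import math


def u_matrix_skeleton(som_array):
    """Place distances between adjacent nodes on a (2*rows-1, 2*cols-1) grid."""
    n_rows, n_cols = len(som_array), len(som_array[0])
    umat = [[0.0] * (2 * n_cols - 1) for _ in range(2 * n_rows - 1)]
    for r in range(n_rows):
        for c in range(n_cols):
            if c + 1 < n_cols:  # distance to the right-hand neighbor
                umat[2 * r][2 * c + 1] = math.dist(som_array[r][c], som_array[r][c + 1])
            if r + 1 < n_rows:  # distance to the neighbor below
                umat[2 * r + 1][2 * c] = math.dist(som_array[r][c], som_array[r + 1][c])
    return umat


# Toy 2x2 SOM: the rows are far apart (3.0), the columns are close (1.0).
som_array = [[[0.0, 0.0], [0.0, 1.0]], [[3.0, 0.0], [3.0, 1.0]]]
umat = u_matrix_skeleton(som_array)
```

Large values in the resulting grid mark borders between dissimilar regions of the SOM, which is what the imshow plot above visualizes.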

predict(X: Sequence, y: Sequence | None = None) → ndarray

Predict output of data X.

Parameters:

X (array-like matrix of shape = [n_samples, n_features]) – The prediction input samples.
y (None, optional) – Ignored.

Returns:

y_pred – List of predicted values.

Return type:

list of float

Examples

Fit the SOM on your data X, y:

>>> import susi
>>> som = susi.SOMRegressor()
>>> som.fit(X, y)
>>> y_pred = som.predict(X)

score(X, y, sample_weight=None)

Return the coefficient of determination of the prediction.

The coefficient of determination $R^2$ is defined as $(1 - \frac{u}{v})$, where $u$ is the residual sum of squares ((y_true - y_pred) ** 2).sum() and $v$ is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an $R^2$ score of 0.0.

Parameters:

X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns:

score – $R^2$ of self.predict(X) w.r.t. y.

Return type:

float

Notes

The $R^2$ score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
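
The definition of $R^2$ given for score can be checked by hand on a small example. The helper r2_score_manual below is a hypothetical single-output illustration of the formula $1 - \frac{u}{v}$; it does not handle sample_weight or multiple outputs.

```python
def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - u / v."""
    mean_true = sum(y_true) / len(y_true)
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    v = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return 1 - u / v


y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r2_score_manual(y_true, y_true)               # predictions equal the truth
constant = r2_score_manual(y_true, [2.5] * 4)           # always predicts the mean of y
worse = r2_score_manual(y_true, [4.0, 3.0, 2.0, 1.0])   # reversed predictions
```

The three cases correspond to the statements above: a perfect model scores 1.0, the constant mean predictor scores 0.0, and a sufficiently bad model scores below zero.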

set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:

transform ({“default”, “pandas”, “polars”}, default=None) – Configure output of transform and fit_transform.

“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:

self – Estimator instance.

Return type:

estimator instance

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance
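
The <component>__<parameter> naming can be illustrated by splitting a key at its first double underscore. This is only a sketch of the naming convention: split_param_name is a hypothetical helper, and "som" stands for an assumed step name in a Pipeline.

```python
def split_param_name(name):
    """Split 'component__parameter' at the first double underscore."""
    component, delim, sub_name = name.partition("__")
    if not delim:
        return None, name  # plain parameter of the estimator itself
    return component, sub_name


nested = split_param_name("som__n_rows")  # parameter n_rows of the step named "som"
plain = split_param_name("n_rows")        # parameter of the estimator itself
```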

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → SOMRegressor

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns: self – The updated object.
Return type: object

transform(X: Sequence, y: Sequence | None = None) → ndarray

Transform input data.

Parameters:

X (array-like matrix of shape = [n_samples, n_features]) – The prediction input samples.
y (None, optional) – Ignored.

Returns:

Predictions including the BMUs of each datapoint

Return type:

np.array of tuples (int, int)

Examples

Load the SOM, fit it to your input data X and transform your input data with:

>>> import susi
>>> som = susi.SOMClustering()
>>> som.fit(X)
>>> X_transformed = som.transform(X)