SOMRegressor

class susi.SOMRegressor(n_rows: int = 10, n_columns: int = 10, *, init_mode_unsupervised: str = 'random', init_mode_supervised: str = 'random', n_iter_unsupervised: int = 1000, n_iter_supervised: int = 1000, train_mode_unsupervised: str = 'online', train_mode_supervised: str = 'online', neighborhood_mode_unsupervised: str = 'linear', neighborhood_mode_supervised: str = 'linear', learn_mode_unsupervised: str = 'min', learn_mode_supervised: str = 'min', distance_metric: str = 'euclidean', learning_rate_start: float = 0.5, learning_rate_end: float = 0.05, nbh_dist_weight_mode: str = 'pseudo-gaussian', missing_label_placeholder: int | str | None = None, n_jobs: int | None = None, random_state=None, verbose: int | None = 0)

Bases: RegressorMixin, SOMEstimator

Supervised SOM for estimating continuous variables (= regression).

Parameters:

n_rows (int, optional (default=10)) – Number of rows for the SOM grid
n_columns (int, optional (default=10)) – Number of columns for the SOM grid
init_mode_unsupervised (str, optional (default=“random”)) – Initialization mode of the unsupervised SOM
init_mode_supervised (str, optional (default=“random”)) – Initialization mode of the supervised SOM
n_iter_unsupervised (int, optional (default=1000)) – Number of iterations for the unsupervised SOM
n_iter_supervised (int, optional (default=1000)) – Number of iterations for the supervised SOM
train_mode_unsupervised (str, optional (default=“online”)) – Training mode of the unsupervised SOM
train_mode_supervised (str, optional (default=“online”)) – Training mode of the supervised SOM
neighborhood_mode_unsupervised (str, optional (default=“linear”)) – Neighborhood mode of the unsupervised SOM
neighborhood_mode_supervised (str, optional (default=“linear”)) – Neighborhood mode of the supervised SOM
learn_mode_unsupervised (str, optional (default=“min”)) – Learning mode of the unsupervised SOM
learn_mode_supervised (str, optional (default=“min”)) – Learning mode of the supervised SOM
distance_metric (str, optional (default=“euclidean”)) – Distance metric used to compare datapoints in feature space (not on the SOM grid). Possible metrics: {“euclidean”, “manhattan”, “mahalanobis”, “tanimoto”, “spectralangle”}. Note that “tanimoto” tends to be slow.

Added in version 1.1.1: Spectral angle metric.
learning_rate_start (float, optional (default=0.5)) – Learning rate start value
learning_rate_end (float, optional (default=0.05)) – Learning rate end value (only needed for some learning rate definitions)
nbh_dist_weight_mode (str, optional (default=“pseudo-gaussian”)) – Formula of the neighborhood distance weight. Possible formulas are: {“pseudo-gaussian”, “mexican-hat”}.
missing_label_placeholder (int or str or None, optional (default=None)) – Label placeholder for datapoints with no label. This is needed for semi-supervised learning.
n_jobs (int or None, optional (default=None)) – The number of jobs to run in parallel.
random_state (int, RandomState instance or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
verbose (int, optional (default=0)) – Controls the verbosity.

Variables:

node_list (np.ndarray of (int, int) tuples) – List of 2-dimensional coordinates of SOM nodes
radius_max (float, int) – Maximum radius of the neighborhood function
radius_min (float, int) – Minimum radius of the neighborhood function
unsuper_som (np.ndarray) – Weight vectors of the unsupervised SOM, shape = (self.n_rows, self.n_columns, X.shape[1])
X (np.ndarray) – Input data
fitted (bool) – States if estimator is fitted to X
max_iterations (int) – Maximum number of iterations for the current training
bmus (list of (int, int) tuples) – List of best matching units (BMUs) of the dataset X
sample_weights (np.ndarray) – Sample weights.
n_regression_vars (int) – Number of regression variables. In most examples, this equals one.
n_features_in (int) – Number of input features

fit(X: Sequence, y: Sequence | None = None)

Fit supervised SOM to the input data.

Parameters:

X (array-like matrix of shape = [n_samples, n_features]) – The training input samples.
y (array-like matrix of shape = [n_samples, 1]) – The labels (ground truth) of the input samples

Returns:

self

Return type:

object

Examples

Load the SOM and fit it to your input data X and the labels y with:

>>> import susi
>>> som = susi.SOMRegressor()
>>> som.fit(X, y)

fit_transform(X: Sequence, y: Sequence | None = None) → ndarray

Fit to the input data and transform it.

Parameters:

X (array-like matrix of shape = [n_samples, n_features]) – The training and prediction input samples.
y (array-like matrix of shape = [n_samples, 1]) – The labels (ground truth) of the input samples

Returns:

Predictions including the BMUs of each datapoint

Return type:

np.array of tuples (int, int)

Examples

Load the SOM, fit it to your input data X and transform your input data with:

>>> import susi
>>> som = susi.SOMRegressor()
>>> tuples = som.fit_transform(X, y)

get_bmu(datapoint: ndarray, som_array: ndarray) → Tuple[int, int]

Get best matching unit (BMU) for datapoint.

Parameters:

datapoint (np.ndarray, shape = (X.shape[1],)) – Datapoint = one row of the dataset X
som_array (np.ndarray) – Weight vectors of the SOM, shape = (self.n_rows, self.n_columns, X.shape[1])

Returns:

Position of best matching unit (row, column)

Return type:

tuple, shape = (int, int)
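
As an illustration of what a BMU lookup does, here is a minimal dependency-free sketch. It assumes the Euclidean metric and represents the weight grid as nested Python lists; the helper name get_bmu_sketch is hypothetical and this is not the library's implementation, which works on np.ndarray and supports the other values of distance_metric.

```python
import math


def get_bmu_sketch(datapoint, som_array):
    """Return (row, column) of the node whose weight vector is closest.

    Hypothetical helper for illustration only; assumes Euclidean distance.
    som_array is a nested list of shape (n_rows, n_columns, n_features).
    """
    best_pos, best_dist = (0, 0), math.inf
    for row, nodes in enumerate(som_array):
        for col, weights in enumerate(nodes):
            dist = math.dist(datapoint, weights)
            if dist < best_dist:
                best_pos, best_dist = (row, col), dist
    return best_pos


# A 2 x 2 grid with two features per node
som_array = [
    [[0.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [1.0, 1.0]],
]
bmu = get_bmu_sketch([0.9, 0.2], som_array)
print(bmu)  # (0, 1): the node with weights [1.0, 0.0] is nearest
```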

get_bmus(X: ndarray, som_array: ndarray | None = None) → List[Tuple[int, int]] | None

Get best matching units (BMUs) for a list of datapoints.

Parameters:

X (np.ndarray) – List of datapoints
som_array (np.ndarray, optional (default=`None`)) – Weight vectors of the SOM, shape = (self.n_rows, self.n_columns, X.shape[1])

Returns:

bmus – Position of best matching units (row, column) for each datapoint

Return type:

list of (int, int) tuples

Examples

Load the SOM, fit it to your input data X, get the BMU of each datapoint and plot how the datapoints are distributed over the grid with:

>>> import susi
>>> import matplotlib.pyplot as plt
>>> som = susi.SOMClustering()
>>> som.fit(X)
>>> bmu_list = som.get_bmus(X)
>>> plt.hist2d([x[0] for x in bmu_list], [x[1] for x in bmu_list])

get_clusters(X: ndarray) → List[Tuple[int, int]] | None

Calculate the SOM nodes on the unsupervised SOM grid per datapoint.

Parameters: X (np.ndarray) – Input data
Returns: List of SOM nodes, one for each input datapoint
Return type: list of tuples (int, int)

get_datapoints_from_node(node: Tuple[int, int]) → List[int]

Get all datapoints of one node.

Parameters: node (tuple, shape (int, int)) – Node for which the linked datapoints are calculated
Returns: datapoints – List of indices of the datapoints that are linked to node
Return type: list of int
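
Conceptually, the datapoints linked to a node are those whose BMU is that node. The sketch below shows this on a hand-written BMU list; the helper name datapoints_from_node_sketch and the example data are hypothetical, and the library's own bookkeeping may differ.

```python
def datapoints_from_node_sketch(bmus, node):
    """Return the indices of all datapoints whose BMU equals `node`.

    Hypothetical helper for illustration; `bmus` holds one (row, column)
    tuple per datapoint, in dataset order.
    """
    return [idx for idx, bmu in enumerate(bmus) if tuple(bmu) == tuple(node)]


# BMUs of four datapoints (made-up values)
bmus = [(0, 1), (2, 2), (0, 1), (1, 0)]
indices = datapoints_from_node_sketch(bmus, (0, 1))
print(indices)  # [0, 2]
```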

get_estimation_map() → ndarray

Return SOM grid with the estimated value on each node.

Returns: super_som_ – Supervised SOM grid with estimated value on each node.
Return type: np.ndarray

Examples

Fit the SOM on your data X, y:

>>> import susi
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> som = susi.SOMRegressor()
>>> som.fit(X, y)
>>> estimation_map = som.get_estimation_map()
>>> plt.imshow(np.squeeze(estimation_map), cmap="viridis_r")

get_metadata_routing()

Get metadata routing of this object.

Please check the User Guide on how the routing mechanism works.

Returns: routing – A MetadataRequest encapsulating routing information.
Return type: MetadataRequest

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict

get_quantization_error(X: Sequence | None = None) → float

Get quantization error for X (or the training data).

Parameters: X (array-like matrix, optional (default=None)) – Samples of shape = [n_samples, n_features]. If None, the training data is used for the calculation.
Returns: Mean quantization error over all datapoints.
Return type: float
Raises: RuntimeError – Raised if the SOM is not fitted yet.
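
To make the quantity concrete, the following sketch computes a mean quantization error by hand. It assumes that the error of one datapoint is its Euclidean distance to the weight vector of its BMU, which is the usual definition; the helper name quantization_error_sketch and the toy data are hypothetical, and the library may compute the value differently in detail.

```python
import math
from statistics import mean


def quantization_error_sketch(X, som_array):
    """Mean distance between each datapoint and its closest weight vector.

    Hypothetical helper for illustration; assumes Euclidean distance.
    som_array is a nested list of shape (n_rows, n_columns, n_features).
    """
    errors = [
        min(math.dist(x, weights) for row in som_array for weights in row)
        for x in X
    ]
    return mean(errors)


# A 1 x 2 grid and three datapoints with per-point errors 0, 1 and 2
som_array = [[[0.0, 0.0], [1.0, 1.0]]]
X = [[0.0, 0.0], [1.0, 2.0], [1.0, 3.0]]
qe = quantization_error_sketch(X, som_array)
print(qe)  # 1.0
```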

get_u_matrix(mode: str = 'mean') → ndarray

Calculate unified distance matrix (u-matrix).

Parameters: mode (str, optional (default=“mean”)) – Choice of the averaging algorithm
Returns: u_matrix – U-matrix containing the distances between all nodes of the unsupervised SOM. Shape = (n_rows*2-1, n_columns*2-1)
Return type: np.ndarray

Examples

Fit your SOM to input data X and then calculate the u-matrix with get_u_matrix(). You can plot the u-matrix then with e.g. pyplot.imshow().

>>> import susi
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> som = susi.SOMClustering()
>>> som.fit(X)
>>> umat = som.get_u_matrix()
>>> plt.imshow(np.squeeze(umat))
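
The shape (n_rows*2-1, n_columns*2-1) arises because a u-matrix has one cell per node plus one cell between each pair of adjacent nodes. The sketch below shows one common construction on nested Python lists. It is not the library's code: the helper name u_matrix_sketch is hypothetical, Euclidean distance is assumed, and filling the remaining cells with the mean of their four direct neighbours is an assumption made here to mirror mode='mean'.

```python
import math
from statistics import mean


def u_matrix_sketch(som):
    """Build a (2*n_rows-1) x (2*n_columns-1) u-matrix from a weight grid.

    Hypothetical helper for illustration; `som` is a nested list of shape
    (n_rows, n_columns, n_features).
    """
    n_rows, n_cols = len(som), len(som[0])
    u_rows, u_cols = 2 * n_rows - 1, 2 * n_cols - 1
    u = [[0.0] * u_cols for _ in range(u_rows)]

    # Pass 1: cells that sit between two adjacent nodes hold their distance
    for r in range(u_rows):
        for c in range(u_cols):
            if r % 2 == 0 and c % 2 == 1:  # between horizontal neighbours
                u[r][c] = math.dist(som[r // 2][c // 2], som[r // 2][c // 2 + 1])
            elif r % 2 == 1 and c % 2 == 0:  # between vertical neighbours
                u[r][c] = math.dist(som[r // 2][c // 2], som[r // 2 + 1][c // 2])

    # Pass 2 (assumption): node cells and diagonal gaps take the mean of
    # their direct neighbours, which were all filled in pass 1
    for r in range(u_rows):
        for c in range(u_cols):
            if (r + c) % 2 == 0:
                nbrs = [
                    u[rr][cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < u_rows and 0 <= cc < u_cols
                ]
                u[r][c] = mean(nbrs) if nbrs else 0.0
    return u


# 2 x 2 grid: the left and right columns are 5.0 apart, the rows are identical
som = [
    [[0.0, 0.0], [3.0, 4.0]],
    [[0.0, 0.0], [3.0, 4.0]],
]
umat = u_matrix_sketch(som)
print(len(umat), len(umat[0]))  # 3 3
```

Large values in such a matrix mark boundaries between regions of the grid whose weight vectors differ strongly.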

predict(X: Sequence, y: Sequence | None = None) → ndarray

Predict output of data X.

Parameters:

X (array-like matrix of shape = [n_samples, n_features]) – The prediction input samples.
y (None, optional) – Ignored.

Returns:

y_pred – List of predicted values.

Return type:

list of float

Examples

Fit the SOM on your data X, y:

>>> import susi
>>> som = susi.SOMRegressor()
>>> som.fit(X, y)
>>> y_pred = som.predict(X)

score(X, y, sample_weight=None)

Return coefficient of determination on test data.

The coefficient of determination, $R^2$, is defined as $(1 - \frac{u}{v})$, where $u$ is the residual sum of squares ((y_true - y_pred) ** 2).sum() and $v$ is the total sum of squares ((y_true - y_true.mean()) ** 2).sum(). The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an $R^2$ score of 0.0.

Parameters:

X (array-like of shape (n_samples, n_features)) – Test samples. For some estimators this may be a precomputed kernel matrix or a list of generic objects instead with shape (n_samples, n_samples_fitted), where n_samples_fitted is the number of samples used in the fitting for the estimator.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True values for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.

Returns:

score – $R^2$ of self.predict(X) w.r.t. y.

Return type:

float

Notes

The $R^2$ score used when calling score on a regressor uses multioutput='uniform_average' from version 0.23 to keep consistent with default value of r2_score(). This influences the score method of all the multioutput regressors (except for MultiOutputRegressor).
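
The definition of $R^2$ given above can be checked by hand with a few lines of plain Python. This is a sketch of the unweighted, single-output formula only; the helper name r2_score_sketch is hypothetical, and sample_weight and multioutput averaging are deliberately left out.

```python
def r2_score_sketch(y_true, y_pred):
    """Compute 1 - u/v for a single output without sample weights.

    Hypothetical helper for illustration; u is the residual sum of
    squares and v is the total sum of squares.
    """
    y_mean = sum(y_true) / len(y_true)
    u = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    v = sum((t - y_mean) ** 2 for t in y_true)
    return 1 - u / v


y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r2_score_sketch(y_true, y_true)                 # 1.0
constant = r2_score_sketch(y_true, [2.5] * 4)             # 0.0, always predicts the mean
worse = r2_score_sketch(y_true, [4.0, 3.0, 2.0, 1.0])     # -3.0, worse than the mean
print(perfect, constant, worse)
```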

set_output(*, transform=None)

Set output container.

See Introducing the set_output API for an example on how to use the API.

Parameters:

transform ({“default”, “pandas”, “polars”}, default=None) – Configure output of transform and fit_transform.

“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged

Added in version 1.4: “polars” option was added.

Returns:

self – Estimator instance.

Return type:

estimator instance

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → SOMRegressor

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns: self – The updated object.
Return type: object

transform(X: Sequence, y: Sequence | None = None) → ndarray

Transform input data.

Parameters:

X (array-like matrix of shape = [n_samples, n_features]) – The prediction input samples.
y (None, optional) – Ignored.

Returns:

Predictions including the BMUs of each datapoint

Return type:

np.array of tuples (int, int)

Examples

Load the SOM, fit it to your input data X and transform your input data with:

>>> import susi
>>> som = susi.SOMClustering()
>>> som.fit(X)
>>> X_transformed = som.transform(X)