SOMClassifier
- class susi.SOMClassifier(n_rows: int = 10, n_columns: int = 10, *, init_mode_unsupervised: str = 'random', init_mode_supervised: str = 'majority', n_iter_unsupervised: int = 1000, n_iter_supervised: int = 1000, train_mode_unsupervised: str = 'online', train_mode_supervised: str = 'online', neighborhood_mode_unsupervised: str = 'linear', neighborhood_mode_supervised: str = 'linear', learn_mode_unsupervised: str = 'min', learn_mode_supervised: str = 'min', distance_metric: str = 'euclidean', learning_rate_start: float = 0.5, learning_rate_end: float = 0.05, nbh_dist_weight_mode: str = 'pseudo-gaussian', missing_label_placeholder: int | str | None = None, do_class_weighting: bool = True, n_jobs: int | None = None, random_state=None, verbose: int | None = 0)
Bases: ClassifierMixin, SOMEstimator
Supervised SOM for estimating discrete variables (= classification).
- Parameters:
n_rows (int, optional (default=10)) – Number of rows for the SOM grid
n_columns (int, optional (default=10)) – Number of columns for the SOM grid
init_mode_unsupervised (str, optional (default=“random”)) – Initialization mode of the unsupervised SOM
init_mode_supervised (str, optional (default=“majority”)) – Initialization mode of the classification SOM
n_iter_unsupervised (int, optional (default=1000)) – Number of iterations for the unsupervised SOM
n_iter_supervised (int, optional (default=1000)) – Number of iterations for the classification SOM
train_mode_unsupervised (str, optional (default=“online”)) – Training mode of the unsupervised SOM
train_mode_supervised (str, optional (default=“online”)) – Training mode of the classification SOM
neighborhood_mode_unsupervised (str, optional (default=“linear”)) – Neighborhood mode of the unsupervised SOM
neighborhood_mode_supervised (str, optional (default=“linear”)) – Neighborhood mode of the classification SOM
learn_mode_unsupervised (str, optional (default=“min”)) – Learning mode of the unsupervised SOM
learn_mode_supervised (str, optional (default=“min”)) – Learning mode of the classification SOM
distance_metric (str, optional (default=“euclidean”)) – Distance metric used on feature level (not on the SOM grid). Possible metrics: {“euclidean”, “manhattan”, “mahalanobis”, “tanimoto”, “spectralangle”}. Note that “tanimoto” tends to be slow.
Added in version 1.1.1: Spectral angle metric.
learning_rate_start (float, optional (default=0.5)) – Learning rate start value
learning_rate_end (float, optional (default=0.05)) – Learning rate end value (only used by some learning rate definitions)
nbh_dist_weight_mode (str, optional (default=“pseudo-gaussian”)) – Formula of the neighborhood distance weight. Possible formulas are: {“pseudo-gaussian”, “mexican-hat”}.
missing_label_placeholder (int or str or None, optional (default=None)) – Label placeholder for datapoints with no label. This is needed for semi-supervised learning.
do_class_weighting (bool, optional (default=True)) – If True, classes are weighted.
n_jobs (int or None, optional (default=None)) – The number of jobs to run in parallel.
random_state (int, RandomState instance or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.
verbose (int, optional (default=0)) – Controls the verbosity.
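For intuition, the metrics accepted by distance_metric can be written in a few lines of NumPy. This is a sketch using textbook definitions, not susi's implementation; in particular, the real-valued form of the Tanimoto coefficient and the explicit covariance argument for Mahalanobis are assumptions made here for illustration.

```python
import numpy as np


def euclidean(a: np.ndarray, b: np.ndarray) -> float:
    # Straight-line (L2) distance between two feature vectors
    return float(np.linalg.norm(a - b))


def manhattan(a: np.ndarray, b: np.ndarray) -> float:
    # Sum of absolute per-feature differences (L1 distance)
    return float(np.sum(np.abs(a - b)))


def mahalanobis(a: np.ndarray, b: np.ndarray, cov: np.ndarray) -> float:
    # Distance after whitening with the inverse covariance matrix (assumed given)
    diff = a - b
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))


def tanimoto(a: np.ndarray, b: np.ndarray) -> float:
    # 1 - Tanimoto similarity, using the continuous-valued coefficient
    dot = float(a @ b)
    return 1.0 - dot / (float(a @ a) + float(b @ b) - dot)


def spectral_angle(a: np.ndarray, b: np.ndarray) -> float:
    # Angle in radians between the vectors; independent of their lengths
    cos = float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])
print(euclidean(a, b))       # ~3.742
print(manhattan(a, b))       # 6.0
print(spectral_angle(a, b))  # close to 0: b is a scaled copy of a
```

The spectral angle example shows why that metric is useful for spectra: two vectors that differ only in overall intensity are treated as (nearly) identical.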
- Variables:
node_list (np.ndarray of (int, int) tuples) – List of 2-dimensional coordinates of SOM nodes
radius_max (float, int) – Maximum radius of the neighborhood function
radius_min (float, int) – Minimum radius of the neighborhood function
unsuper_som (np.ndarray) – Weight vectors of the unsupervised SOM, shape = (self.n_rows, self.n_columns, X.shape[1])
X (np.ndarray) – Input data
fitted (bool) – Whether the estimator has been fitted to X
max_iterations (int) – Maximum number of iterations for the current training
bmus (list of (int, int) tuples) – List of best matching units (BMUs) of the dataset X.
placeholder_dict (dict) – Dict of placeholders for initializing nodes without mapped class.
n_features_in (int) – Number of input features in X.
classes (np.ndarray) – Unique classes in the dataset labels y.
class_counts (np.ndarray) – Number of datapoints per unique class in y.
class_dtype (type) – Type of a label in y.
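The grid-level quantities above (node_list, the neighborhood radius) and the nbh_dist_weight_mode formulas can be illustrated with the following sketch. It assumes the standard Gaussian and Mexican-hat (Ricker) shapes evaluated on grid distances; susi's exact scaling and its radius schedule between radius_max and radius_min may differ, and the helper names are hypothetical.

```python
import numpy as np

n_rows, n_columns = 4, 5

# Grid coordinates (row, column) of every node in row-major order,
# analogous to the node_list attribute
node_list = np.array([(r, c) for r in range(n_rows) for c in range(n_columns)])


def grid_distances(bmu: tuple, nodes: np.ndarray) -> np.ndarray:
    # Euclidean distance on the SOM grid (not in feature space) from the BMU to each node
    return np.linalg.norm(nodes - np.asarray(bmu), axis=1)


def pseudo_gaussian(dist: np.ndarray, radius: float) -> np.ndarray:
    # Bell-shaped weight: 1 at the BMU, decaying smoothly with grid distance
    return np.exp(-(dist ** 2) / (2 * radius ** 2))


def mexican_hat(dist: np.ndarray, radius: float) -> np.ndarray:
    # Gaussian times (1 - d^2 / r^2): positive near the BMU, zero at d = r,
    # negative (inhibitory) further away
    return (1 - dist ** 2 / radius ** 2) * np.exp(-(dist ** 2) / (2 * radius ** 2))


dist = grid_distances((1, 2), node_list)
w = pseudo_gaussian(dist, radius=1.5).reshape(n_rows, n_columns)
print(np.round(w, 2))
```

Because node_list is built in row-major order, the flat weight vector can be reshaped directly into the (n_rows, n_columns) grid.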
- fit(X: Sequence, y: Sequence | None = None)
Fit classification SOM to the input data.
- Parameters:
X (array-like matrix of shape = [n_samples, n_features]) – The training input samples.
y (array-like matrix of shape = [n_samples, 1], optional) – The labels (ground truth) of the input samples
- Returns:
self
- Return type:
SOMClassifier
Examples
Load the SOM and fit it to your input data X and the labels y with:
>>> import susi
>>> som = susi.SOMClassifier()
>>> som.fit(X, y)
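For semi-supervised training, unlabeled samples carry the value given as missing_label_placeholder. The sketch below prepares such a label vector with NumPy; the toy labels and the choice of -1 as placeholder are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical labels for 10 samples with two classes
y = np.array([0, 1, 1, 0, 1, 0, 0, 1, 1, 0])

# Hide roughly half of the labels by overwriting them with a placeholder;
# the same value would be passed as missing_label_placeholder
PLACEHOLDER = -1
y_semi = y.copy()
y_semi[rng.random(y.size) < 0.5] = PLACEHOLDER
print(y_semi)
```

The resulting labels could then be passed to the estimator:
>>> som = susi.SOMClassifier(missing_label_placeholder=-1)
>>> som.fit(X, y_semi)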
- fit_transform(X: Sequence, y: Sequence | None = None) → ndarray
Fit to the input data and transform it.
- Parameters:
X (array-like matrix of shape = [n_samples, n_features]) – The training and prediction input samples.
y (array-like matrix of shape = [n_samples, 1]) – The labels (ground truth) of the input samples
- Returns:
Predictions including the BMUs of each datapoint
- Return type:
np.ndarray
Examples
Load the SOM, fit it to your input data X and transform your input data with:
>>> import susi
>>> som = susi.SOMClassifier()
>>> tuples = som.fit_transform(X, y)
- get_bmu(datapoint: ndarray, som_array: ndarray) → Tuple[int, int]
Get best matching unit (BMU) for datapoint.
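Conceptually, the BMU is the node whose weight vector is closest to the datapoint. A minimal sketch, assuming Euclidean distance (susi uses the configured distance_metric) and a hypothetical helper name find_bmu:

```python
import numpy as np


def find_bmu(datapoint: np.ndarray, som_array: np.ndarray) -> tuple:
    # som_array has shape (n_rows, n_columns, n_features)
    # Euclidean distance from the datapoint to every node's weight vector
    dist = np.linalg.norm(som_array - datapoint, axis=2)
    # Flat index of the closest node, converted to (row, column)
    row, col = np.unravel_index(np.argmin(dist), dist.shape)
    return int(row), int(col)


rng = np.random.default_rng(42)
som_array = rng.random((3, 4, 2))
som_array[2, 1] = [0.5, 0.5]  # plant an exact match for the demo
print(find_bmu(np.array([0.5, 0.5]), som_array))  # (2, 1)
```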
- get_bmus(X: ndarray, som_array: ndarray | None = None) → List[Tuple[int, int]] | None
Get the best matching units (BMUs) for a large list of datapoints.
- Parameters:
X (np.ndarray) – List of datapoints
som_array (np.ndarray, optional (default=None)) – Weight vectors of the SOM, shape = (self.n_rows, self.n_columns, X.shape[1])
- Returns:
bmus – Position of best matching units (row, column) for each datapoint
- Return type:
List[Tuple[int, int]] | None
Examples
Load the SOM, fit it to your input data X and plot a 2D histogram of the BMUs with:
>>> import susi
>>> import matplotlib.pyplot as plt
>>> som = susi.SOMClustering()
>>> som.fit(X)
>>> bmu_list = som.get_bmus(X)
>>> plt.hist2d([x[0] for x in bmu_list], [x[1] for x in bmu_list])
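The same search can be vectorized over all datapoints at once, and the resulting BMUs can be counted per node. This is a sketch with hypothetical helper names (find_bmus, hit_map) and Euclidean distance; counting per node avoids the binning that plt.hist2d applies with its default of 10 bins, which need not match the grid size.

```python
import numpy as np


def find_bmus(X: np.ndarray, som_array: np.ndarray) -> list:
    n_rows, n_columns, n_features = som_array.shape
    weights = som_array.reshape(-1, n_features)  # (n_nodes, n_features)
    # Pairwise squared Euclidean distances, shape (n_samples, n_nodes)
    d2 = ((X[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    flat = d2.argmin(axis=1)
    rows, cols = np.unravel_index(flat, (n_rows, n_columns))
    return list(zip(rows.tolist(), cols.tolist()))


def hit_map(bmus: list, n_rows: int, n_columns: int) -> np.ndarray:
    # Count how many datapoints are mapped to each node
    hits = np.zeros((n_rows, n_columns), dtype=int)
    for r, c in bmus:
        hits[r, c] += 1
    return hits


rng = np.random.default_rng(0)
som_array = rng.random((3, 3, 4))
X = rng.random((50, 4))
bmus = find_bmus(X, som_array)
print(hit_map(bmus, 3, 3))
```

The resulting hit map can be shown directly with plt.imshow, one pixel per node.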
- get_clusters(X: ndarray) → List[Tuple[int, int]] | None
Calculate the SOM nodes on the unsupervised SOM grid per datapoint.
- get_estimation_map() → ndarray
Return SOM grid with the estimated value on each node.
- Returns:
super_som_ – Supervised SOM grid with estimated value on each node.
- Return type:
np.ndarray
Examples
Fit the SOM on your data X, y:
>>> import susi
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> som = susi.SOMClassifier()
>>> som.fit(X, y)
>>> estimation_map = som.get_estimation_map()
>>> plt.imshow(np.squeeze(estimation_map), cmap="viridis_r")
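To build intuition for what an estimation map holds, the sketch below labels each node by a naive majority vote over the datapoints mapped to it. This is not how SOMClassifier trains its supervised grid (that grid is trained iteratively, see n_iter_supervised); the helper majority_map and the toy BMUs and labels are hypothetical.

```python
from collections import Counter

import numpy as np


def majority_map(bmus, y, n_rows, n_columns, empty=None):
    # Collect the labels of all datapoints mapped to each node
    votes = {}
    for (r, c), label in zip(bmus, y):
        votes.setdefault((r, c), []).append(label)
    # Each node gets its most common label; nodes without hits keep `empty`
    est = np.full((n_rows, n_columns), empty, dtype=object)
    for (r, c), labels in votes.items():
        est[r, c] = Counter(labels).most_common(1)[0][0]
    return est


bmus = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 1)]
y = ["a", "a", "b", "b", "b", "a"]
print(majority_map(bmus, y, 2, 2))
```

Nodes that receive no datapoints are left empty here; the placeholder_dict attribute suggests that SOMClassifier handles such nodes with placeholders instead.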
- get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
- Returns:
routing – A MetadataRequest encapsulating routing information.
- Return type:
MetadataRequest
- get_params(deep=True)
Get parameters for this estimator.
- Parameters:
deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns:
params – Parameter names mapped to their values.
- Return type:
dict
- get_quantization_error(X: Sequence | None = None) → float
Get quantization error for X (or the training data).
- Parameters:
X (array-like matrix, optional (default=None)) – Samples of shape = [n_samples, n_features]. If None, the training data is used for the calculation.
- Returns:
Mean quantization error over all datapoints.
- Return type:
float
- Raises:
RuntimeError – Raised if the SOM is not fitted yet.
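The quantization error is commonly defined as the mean distance between each datapoint and the weight vector of its BMU. A sketch of that definition, assuming Euclidean distance and a hypothetical helper name:

```python
import numpy as np


def quantization_error(X: np.ndarray, som_array: np.ndarray) -> float:
    weights = som_array.reshape(-1, som_array.shape[-1])
    # Distance from every sample to every node weight vector
    dist = np.linalg.norm(X[:, None, :] - weights[None, :, :], axis=2)
    # The distance to the BMU is the row-wise minimum; average over samples
    return float(dist.min(axis=1).mean())


som_array = np.array([[[0.0, 0.0], [1.0, 0.0]]])  # 1 x 2 grid, 2 features
X = np.array([[0.0, 1.0], [1.0, 0.0]])
print(quantization_error(X, som_array))  # 0.5
```

The first sample is at distance 1 from its BMU and the second coincides with a node, so the mean is 0.5.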
- get_u_matrix(mode: str = 'mean') → ndarray
Calculate unified distance matrix (u-matrix).
- Parameters:
mode (str, optional (default=“mean”)) – Choice of the averaging algorithm
- Returns:
u_matrix – U-matrix containing the distances between all nodes of the unsupervised SOM. Shape = (n_rows*2-1, n_columns*2-1)
- Return type:
np.ndarray
Examples
Fit your SOM to input data X and then calculate the u-matrix with get_u_matrix(). You can plot the u-matrix then with e.g. pyplot.imshow().
>>> import susi
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> som = susi.SOMClustering()
>>> som.fit(X)
>>> umat = som.get_u_matrix()
>>> plt.imshow(np.squeeze(umat))
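The u-matrix interleaves node cells with cells holding distances between neighboring nodes, which is why its shape is (n_rows*2-1, n_columns*2-1). The sketch below assumes one common "mean" convention (node cells and diagonal cells get the mean of the adjacent distance cells) and Euclidean distance; susi's exact handling may differ, and u_matrix_mean is a hypothetical name.

```python
import numpy as np


def u_matrix_mean(som: np.ndarray) -> np.ndarray:
    n_rows, n_cols, _ = som.shape
    u = np.zeros((2 * n_rows - 1, 2 * n_cols - 1))
    # (even, odd) cells: distance between horizontally adjacent nodes
    for r in range(n_rows):
        for c in range(n_cols - 1):
            u[2 * r, 2 * c + 1] = np.linalg.norm(som[r, c] - som[r, c + 1])
    # (odd, even) cells: distance between vertically adjacent nodes
    for r in range(n_rows - 1):
        for c in range(n_cols):
            u[2 * r + 1, 2 * c] = np.linalg.norm(som[r, c] - som[r + 1, c])
    # (odd, odd) cells: mean of the four surrounding distance cells
    for r in range(n_rows - 1):
        for c in range(n_cols - 1):
            i, j = 2 * r + 1, 2 * c + 1
            u[i, j] = np.mean([u[i - 1, j], u[i + 1, j], u[i, j - 1], u[i, j + 1]])
    # (even, even) node cells: mean of the adjacent in-bounds distance cells
    for i in range(0, 2 * n_rows - 1, 2):
        for j in range(0, 2 * n_cols - 1, 2):
            neigh = [u[a, b] for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                     if 0 <= a < u.shape[0] and 0 <= b < u.shape[1]]
            u[i, j] = np.mean(neigh)
    return u


som = np.array([[[0.0], [1.0]], [[2.0], [4.0]]])  # 2 x 2 grid, one feature
print(u_matrix_mean(som))
```

Large values in the u-matrix mark borders between clusters of similar nodes.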
- predict(X: Sequence, y: Sequence | None = None) → ndarray
Predict output of data X.
- Parameters:
X (array-like matrix of shape = [n_samples, n_features]) – The prediction input samples.
y (None, optional) – Ignored.
- Returns:
y_pred – List of predicted values.
- Return type:
np.ndarray
Examples
Fit the SOM on your data X, y:
>>> import susi
>>> som = susi.SOMClassifier()
>>> som.fit(X, y)
>>> y_pred = som.predict(X)
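Prediction can be understood as finding each sample's BMU on the unsupervised grid and reading the value stored at the same position of the estimation map. A sketch under that assumption, with Euclidean distance, a hypothetical helper name, and toy inputs:

```python
import numpy as np


def predict_from_map(X, som_array, estimation_map):
    n_rows, n_cols, n_features = som_array.shape
    weights = som_array.reshape(-1, n_features)
    # Flat index of each sample's BMU (Euclidean distance)
    flat = np.linalg.norm(X[:, None, :] - weights[None, :, :], axis=2).argmin(axis=1)
    # Read the label stored on each BMU of the supervised grid
    return estimation_map.reshape(-1)[flat]


som_array = np.array([[[0.0], [10.0]]])  # 1 x 2 grid, one feature
estimation_map = np.array([["low", "high"]])
X = np.array([[1.0], [9.0], [4.0]])
print(predict_from_map(X, som_array, estimation_map))  # ['low' 'high' 'low']
```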
- predict_proba(X: Sequence, y: Sequence | None = None) → ndarray
Predict class probabilities for X.
Added in version 1.1.3.
- Parameters:
X (array-like matrix of shape = [n_samples, n_features]) – The prediction input samples.
y (array-like matrix of shape = [n_samples, 1], optional) – The labels (ground truth) of the input samples
- Returns:
List of probabilities of shape (n_samples, n_classes)
- Return type:
np.ndarray
- score(X, y, sample_weight=None)
Return accuracy on provided data and labels.
In multi-label classification, this is the subset accuracy which is a harsh metric since you require for each sample that each label set be correctly predicted.
- Parameters:
X (array-like of shape (n_samples, n_features)) – Test samples.
y (array-like of shape (n_samples,) or (n_samples, n_outputs)) – True labels for X.
sample_weight (array-like of shape (n_samples,), default=None) – Sample weights.
- Returns:
score – Mean accuracy of self.predict(X) w.r.t. y.
- Return type:
float
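Through ClassifierMixin, score is the (optionally weighted) fraction of correct predictions, as computed by sklearn.metrics.accuracy_score for single-label targets. An equivalent NumPy sketch with a hypothetical helper name:

```python
import numpy as np


def weighted_accuracy(y_true, y_pred, sample_weight=None):
    correct = (np.asarray(y_true) == np.asarray(y_pred)).astype(float)
    # np.average with weights=None is the plain mean accuracy
    return float(np.average(correct, weights=sample_weight))


y_true = [0, 1, 1, 2]
y_pred = [0, 1, 2, 2]
print(weighted_accuracy(y_true, y_pred))                # 0.75
print(weighted_accuracy(y_true, y_pred, [1, 1, 2, 0]))  # 0.5
```

With the weights, the misclassified third sample counts twice and the correct fourth sample not at all, which lowers the score from 0.75 to 0.5.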
- set_output(*, transform=None)
Set output container.
See Introducing the set_output API for an example on how to use the API.
- Parameters:
transform ({“default”, “pandas”, “polars”}, default=None) – Configure output of transform and fit_transform.
“default”: Default output format of a transformer
“pandas”: DataFrame output
“polars”: Polars output
None: Transform configuration is unchanged
Added in version 1.4: “polars” option was added.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self – Estimator instance.
- Return type:
estimator instance
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → SOMClassifier
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
- Returns:
self – The updated object.
- Return type:
SOMClassifier
- transform(X: Sequence, y: Sequence | None = None) → ndarray
Transform input data.
- Parameters:
X (array-like matrix of shape = [n_samples, n_features]) – The prediction input samples.
y (None, optional) – Ignored.
- Returns:
Predictions including the BMUs of each datapoint
- Return type:
np.ndarray
Examples
Load the SOM, fit it to your input data X and transform your input data with:
>>> import susi
>>> som = susi.SOMClustering()
>>> som.fit(X)
>>> X_transformed = som.transform(X)