SOMClustering
- class susi.SOMClustering(n_rows: int = 10, n_columns: int = 10, *, init_mode_unsupervised: str = 'random', n_iter_unsupervised: int = 1000, train_mode_unsupervised: str = 'online', neighborhood_mode_unsupervised: str = 'linear', learn_mode_unsupervised: str = 'min', distance_metric: str = 'euclidean', learning_rate_start: float = 0.5, learning_rate_end: float = 0.05, nbh_dist_weight_mode: str = 'pseudo-gaussian', n_jobs: Optional[int] = None, random_state=None, verbose: Optional[int] = 0)
Bases:
object
Unsupervised self-organizing map for clustering.
- Parameters
n_rows (int, optional (default=10)) – Number of rows for the SOM grid
n_columns (int, optional (default=10)) – Number of columns for the SOM grid
init_mode_unsupervised (str, optional (default=“random”)) – Initialization mode of the unsupervised SOM
n_iter_unsupervised (int, optional (default=1000)) – Number of iterations for the unsupervised SOM
train_mode_unsupervised (str, optional (default=“online”)) – Training mode of the unsupervised SOM
neighborhood_mode_unsupervised (str, optional (default=“linear”)) – Neighborhood mode of the unsupervised SOM
learn_mode_unsupervised (str, optional (default=“min”)) – Learning mode of the unsupervised SOM
distance_metric (str, optional (default=“euclidean”)) – Distance metric used to compare vectors in feature space (not on the SOM grid). Possible metrics: {“euclidean”, “manhattan”, “mahalanobis”, “tanimoto”, “spectralangle”}. Note that “tanimoto” tends to be slow.
New in version 1.1.1: Spectral angle metric.
learning_rate_start (float, optional (default=0.5)) – Learning rate start value
learning_rate_end (float, optional (default=0.05)) – Learning rate end value (only needed for some learning-rate definitions)
nbh_dist_weight_mode (str, optional (default=“pseudo-gaussian”)) – Formula of the neighborhood distance weight. Possible formulas are: {“pseudo-gaussian”, “mexican-hat”}.
n_jobs (int or None, optional (default=None)) – The number of jobs to run in parallel.
random_state (int, RandomState instance or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
verbose (int, optional (default=0)) – Controls the verbosity.
- Variables
node_list_ (np.ndarray of (int, int) tuples) – List of 2-dimensional coordinates of SOM nodes
radius_max_ (float, int) – Maximum radius of the neighborhood function
radius_min_ (float, int) – Minimum radius of the neighborhood function
unsuper_som_ (np.ndarray) – Weight vectors of the unsupervised SOM, shape = (self.n_rows, self.n_columns, X.shape[1])
X_ (np.ndarray) – Input data
fitted_ (boolean) – States if estimator is fitted to X
max_iterations_ (int) – Maximum number of iterations for the current training
bmus_ (list of (int, int) tuples) – List of best matching units (BMUs) of the dataset X
variances_ (array of float) – Standard deviations of every feature
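The distance_metric parameter compares vectors in feature space. As a rough illustration of what three of the listed metrics compute, here is a minimal, dependency-free sketch. The function names are hypothetical and are not part of susi; the library's own implementation may differ in detail (for example in vectorization or normalization).

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Straight-line (L2) distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan(u: Sequence[float], v: Sequence[float]) -> float:
    """Sum of absolute per-feature differences (L1 distance)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def spectral_angle(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle in radians between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)
```

The spectral angle ignores vector length: two vectors that differ only by a positive scale factor have an angle of zero, while their Euclidean and Manhattan distances can be large.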
- fit(X: Sequence, y: Optional[Sequence] = None)
Fit unsupervised SOM to input data.
- Parameters
X (array-like matrix of shape = [n_samples, n_features]) – The training input samples.
y (None) – Not used in this class.
- Returns
self
- Return type
Examples
Load the SOM and fit it to your input data X with:
>>> import susi
>>> som = susi.SOMClustering()
>>> som.fit(X)
- fit_transform(X: Sequence, y: Optional[Sequence] = None) → numpy.ndarray
Fit to the input data and transform it.
- Parameters
X (array-like matrix of shape = [n_samples, n_features]) – The training and prediction input samples.
y (None, optional) – Ignored.
- Returns
Predictions including the BMUs of each datapoint
- Return type
np.ndarray
Examples
Load the SOM, fit it to your input data X and transform your input data with:
>>> import susi
>>> som = susi.SOMClustering()
>>> X_transformed = som.fit_transform(X)
- get_bmu(datapoint: numpy.ndarray, som_array: numpy.ndarray) → Tuple[int, int]
Get best matching unit (BMU) for datapoint.
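The BMU of a datapoint is the grid node whose weight vector lies closest to it. The following is a minimal sketch of that search, assuming the Euclidean metric and a nested-list som_array of shape (n_rows, n_columns, n_features). The name find_bmu is hypothetical; it illustrates the idea and is not susi's implementation.

```python
import math
from typing import Sequence, Tuple


def find_bmu(
    datapoint: Sequence[float],
    som_array: Sequence[Sequence[Sequence[float]]],
) -> Tuple[int, int]:
    """Return the (row, column) of the node closest to datapoint."""
    best_pos = (0, 0)
    best_dist = math.inf
    for row, nodes in enumerate(som_array):
        for col, weights in enumerate(nodes):
            # Euclidean distance in feature space (an assumption here).
            dist = math.dist(datapoint, weights)
            if dist < best_dist:
                best_pos, best_dist = (row, col), dist
    return best_pos
```

On ties, this sketch keeps the first node found in row-major order; the library may resolve ties differently.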
- get_bmus(X: numpy.ndarray, som_array: Optional[numpy.array] = None) → Optional[List[Tuple[int, int]]]
Get the best matching units (BMUs) for a list of datapoints.
- Parameters
X (np.ndarray) – List of datapoints
som_array (np.ndarray, optional (default=None)) – Weight vectors of the SOM, shape = (self.n_rows, self.n_columns, X.shape[1])
- Returns
bmus – Position of best matching units (row, column) for each datapoint
- Return type
list of (int, int) tuples
Examples
Load the SOM, fit it to your input data X, get the BMU of each datapoint and plot their distribution over the grid as a 2D histogram with:
>>> import susi
>>> import matplotlib.pyplot as plt
>>> som = susi.SOMClustering()
>>> som.fit(X)
>>> bmu_list = som.get_bmus(X)
>>> plt.hist2d([x[0] for x in bmu_list], [x[1] for x in bmu_list])
- get_clusters(X: numpy.ndarray) → Optional[List[Tuple[int, int]]]
Calculate the SOM nodes on the unsupervised SOM grid per datapoint.
- get_datapoints_from_node(node: Tuple[int, int]) → List[int]
Get all datapoints of one node.
- Parameters
node (tuple, shape (int, int)) – Node for which the linked datapoints are calculated
- Returns
datapoints – List of indices of the datapoints that are linked to node
- Return type
list of int
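Conceptually, this lookup inverts the list of BMUs: it collects the indices of all datapoints whose BMU equals the given node. Here is a minimal sketch, assuming a list of (row, column) BMUs such as the one stored in bmus_. The function name is hypothetical and not part of susi.

```python
from typing import List, Sequence, Tuple


def datapoints_from_node(
    bmus: Sequence[Tuple[int, int]],
    node: Tuple[int, int],
) -> List[int]:
    """Return indices of all datapoints whose BMU is the given node."""
    # tuple(...) lets the comparison also work for list-like coordinates.
    return [i for i, bmu in enumerate(bmus) if tuple(bmu) == tuple(node)]
```

A node that no datapoint maps to yields an empty list.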
- get_quantization_error(X: Optional[Sequence] = None) → float
Get quantization error for X (or the training data).
- Parameters
X (array-like matrix, optional (default=None)) – Samples of shape = [n_samples, n_features]. If None, the training data is used for the calculation.
- Returns
Mean quantization error over all datapoints.
- Return type
float
- Raises
RuntimeError – Raised if the SOM is not fitted yet.
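The quantization error is commonly defined as the mean distance between each datapoint and the weight vector of its BMU. A minimal sketch under that definition follows, assuming Euclidean distance and a nested-list SOM of shape (n_rows, n_columns, n_features). The function name and the ValueError on empty input are illustrative choices, not susi's behavior.

```python
import math
from typing import Sequence


def quantization_error(
    X: Sequence[Sequence[float]],
    som_array: Sequence[Sequence[Sequence[float]]],
) -> float:
    """Mean distance between each datapoint and its closest node."""
    if not X:
        raise ValueError("X must contain at least one datapoint")
    total = 0.0
    for x in X:
        # The smallest distance over all nodes is the distance to the BMU.
        total += min(
            math.dist(x, weights)
            for nodes in som_array
            for weights in nodes
        )
    return total / len(X)
```

A lower value means the node weights represent the data more closely. It says nothing about how well the grid preserves topology.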
- get_u_matrix(mode: str = 'mean') → numpy.ndarray
Calculate unified distance matrix (u-matrix).
- Parameters
mode (str, optional (default=“mean”)) – Choice of the averaging algorithm
- Returns
u_matrix – U-matrix containing the distances between all nodes of the unsupervised SOM. Shape = (n_rows*2-1, n_columns*2-1)
- Return type
np.ndarray
Examples
Fit your SOM to input data X and then calculate the u-matrix with get_u_matrix(). You can then plot the u-matrix with, e.g., pyplot.imshow().
>>> import susi
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> som = susi.SOMClustering()
>>> som.fit(X)
>>> umat = som.get_u_matrix()
>>> plt.imshow(np.squeeze(umat))
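To see why the u-matrix has (n_rows*2-1, n_columns*2-1) cells, it can help to build a small one by hand. The sketch below is one plausible construction, not susi's code: cells between two nodes hold the Euclidean distance between their weight vectors, diagonal cells hold the mean of the two diagonal distances, and node cells hold the mean of their adjacent cells. The cell layout and the name u_matrix_sketch are assumptions made for illustration.

```python
import math
from statistics import mean
from typing import List, Sequence


def u_matrix_sketch(
    som: Sequence[Sequence[Sequence[float]]],
) -> List[List[float]]:
    """Build a (2*n_rows-1) x (2*n_cols-1) u-matrix from node weights."""
    n_rows, n_cols = len(som), len(som[0])
    height, width = 2 * n_rows - 1, 2 * n_cols - 1
    umat = [[0.0] * width for _ in range(height)]

    # Pass 1: fill the cells that lie between nodes.
    for u_row in range(height):
        for u_col in range(width):
            r, c = u_row // 2, u_col // 2
            if u_row % 2 == 0 and u_col % 2 == 1:
                # Between horizontal neighbours (r, c) and (r, c + 1).
                umat[u_row][u_col] = math.dist(som[r][c], som[r][c + 1])
            elif u_row % 2 == 1 and u_col % 2 == 0:
                # Between vertical neighbours (r, c) and (r + 1, c).
                umat[u_row][u_col] = math.dist(som[r][c], som[r + 1][c])
            elif u_row % 2 == 1 and u_col % 2 == 1:
                # Centre of four nodes: mean of both diagonal distances.
                umat[u_row][u_col] = mean([
                    math.dist(som[r][c], som[r + 1][c + 1]),
                    math.dist(som[r + 1][c], som[r][c + 1]),
                ])

    # Pass 2: node cells take the mean of their adjacent in-between cells.
    for u_row in range(0, height, 2):
        for u_col in range(0, width, 2):
            candidates = [
                (u_row - 1, u_col), (u_row + 1, u_col),
                (u_row, u_col - 1), (u_row, u_col + 1),
            ]
            values = [
                umat[rr][cc] for rr, cc in candidates
                if 0 <= rr < height and 0 <= cc < width
            ]
            umat[u_row][u_col] = mean(values) if values else 0.0
    return umat
```

High values mark boundaries between regions of dissimilar nodes, which is why the plotted u-matrix is often read as a map of cluster borders.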
- transform(X: Sequence, y: Optional[Sequence] = None) → numpy.ndarray
Transform input data.
- Parameters
X (array-like matrix of shape = [n_samples, n_features]) – The prediction input samples.
y (None, optional) – Ignored.
- Returns
Predictions including the BMUs of each datapoint
- Return type
np.ndarray
Examples
Load the SOM, fit it to your input data X and transform your input data with:
>>> import susi
>>> som = susi.SOMClustering()
>>> som.fit(X)
>>> X_transformed = som.transform(X)