SOMClustering

class susi.SOMClustering(n_rows: int = 10, n_columns: int = 10, init_mode_unsupervised: str = 'random', n_iter_unsupervised: int = 1000, train_mode_unsupervised: str = 'online', neighborhood_mode_unsupervised: str = 'linear', learn_mode_unsupervised: str = 'min', distance_metric: str = 'euclidean', learning_rate_start=0.5, learning_rate_end=0.05, nbh_dist_weight_mode: str = 'pseudo-gaussian', n_jobs=None, random_state=None, verbose=0)[source]

Unsupervised self-organizing map for clustering.

Parameters
  • n_rows (int, optional (default=10)) – Number of rows for the SOM grid

  • n_columns (int, optional (default=10)) – Number of columns for the SOM grid

  • init_mode_unsupervised (str, optional (default=“random”)) – Initialization mode of the unsupervised SOM

  • n_iter_unsupervised (int, optional (default=1000)) – Number of iterations for the unsupervised SOM

  • train_mode_unsupervised (str, optional (default=“online”)) – Training mode of the unsupervised SOM

  • neighborhood_mode_unsupervised (str, optional (default=“linear”)) – Neighborhood mode of the unsupervised SOM

  • learn_mode_unsupervised (str, optional (default=“min”)) – Learning mode of the unsupervised SOM

  • distance_metric (str, optional (default=“euclidean”)) – Distance metric used to compare datapoints at the feature level (not on the SOM grid). Possible metrics: {“euclidean”, “manhattan”, “mahalanobis”, “tanimoto”}. Note that “tanimoto” tends to be slow.

  • learning_rate_start (float, optional (default=0.5)) – Learning rate start value

  • learning_rate_end (float, optional (default=0.05)) – Learning rate end value (only needed for some learning rate modes)

  • nbh_dist_weight_mode (str, optional (default=“pseudo-gaussian”)) – Formula of the neighborhood distance weight. Possible formulas are: {“pseudo-gaussian”, “mexican-hat”}.

  • n_jobs (int or None, optional (default=None)) – The number of jobs to run in parallel.

  • random_state (int, RandomState instance or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

  • verbose (int, optional (default=0)) – Controls the verbosity.

Variables
  • node_list_ (np.array of (int, int) tuples) – List of 2-dimensional coordinates of SOM nodes

  • radius_max_ (float, int) – Maximum radius of the neighborhood function

  • radius_min_ (float, int) – Minimum radius of the neighborhood function

  • unsuper_som_ (np.array) – Weight vectors of the unsupervised SOM, shape = (self.n_rows, self.n_columns, X.shape[1])

  • X_ (np.array) – Input data

  • fitted_ (boolean) – States if estimator is fitted to X

  • max_iterations_ (int) – Maximum number of iterations for the current training

  • bmus_ (list of (int, int) tuples) – List of best matching units (BMUs) of the dataset X

  • variances_ (array of float) – Standard deviations of every feature

calc_learning_rate(curr_it, mode)[source]

Calculate learning rate alpha with 0 <= alpha <= 1.

Parameters
  • curr_it (int) – Current iteration count

  • mode (str, optional) – Mode of the learning rate (min, exp, expsquare)

Returns

Learning rate

Return type

float
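
The decay behind the mode names can be sketched in plain Python. The formulas below are assumptions chosen for illustration (a geometric interpolation for “min”, exponential decays with an arbitrary constant of 5 for “exp” and “expsquare”). The helper name sketch_learning_rate and its max_it argument are hypothetical and not part of susi, whose exact formulas may differ.

```python
import math


def sketch_learning_rate(curr_it, max_it, mode="min", lr_start=0.5, lr_end=0.05):
    """Illustrative learning-rate schedules (assumed formulas, not susi's code)."""
    progress = curr_it / max_it  # fraction of training completed, in [0, 1]
    if mode == "min":
        # Geometric interpolation from lr_start down to lr_end.
        return lr_start * (lr_end / lr_start) ** progress
    if mode == "exp":
        # Exponential decay; the constant 5 is an assumption.
        return lr_start * math.exp(-5.0 * progress)
    if mode == "expsquare":
        # Decays slowly at first because the progress is squared.
        return lr_start * math.exp(-5.0 * progress ** 2)
    raise ValueError(f"unknown mode: {mode!r}")


# Learning rate at the start, middle, and end of a 1000-iteration run.
rates = [sketch_learning_rate(it, 1000, mode="min") for it in (0, 500, 1000)]
```

In this sketch only the “min” schedule uses lr_end, which is one way a learning rate end value can matter for some modes and not for others.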

calc_neighborhood_func(curr_it, mode)[source]

Calculate neighborhood function (= radius).

Parameters
  • curr_it (int) – Current number of iterations

  • mode (str) – Mode of the decreasing rate

Returns

Neighborhood function (= radius)

Return type

float
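
A minimal sketch of a linearly shrinking radius, assuming a “linear” mode that interpolates from radius_max_ to radius_min_ over the course of training. The function sketch_radius and its arguments are hypothetical, and susi's own formula may differ.

```python
def sketch_radius(curr_it, max_it, radius_max, radius_min):
    """Linearly shrink the neighborhood radius (assumed formula, not susi's code)."""
    progress = curr_it / max_it  # fraction of training completed, in [0, 1]
    return radius_max + (radius_min - radius_max) * progress


# Radius at the start, middle, and end of a 1000-iteration run.
radii = [sketch_radius(it, 1000, radius_max=5.0, radius_min=1.0) for it in (0, 500, 1000)]
```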

calc_u_matrix_distances()[source]

Calculate the Euclidean distances between all neighboring SOM nodes.

calc_u_matrix_means()[source]

Calculate the missing parts of the u-matrix.

After calc_u_matrix_distances(), there are two kinds of entries missing: the entries at the positions of the actual SOM nodes and the entries in between the distance nodes. Both types of entries are calculated in this function.
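
The two-step construction described above can be sketched in plain Python. This is an illustration under assumptions, not susi's implementation: the helper sketch_u_matrix is hypothetical, only horizontal and vertical neighbors are considered, and every missing entry is filled with the plain mean of its adjacent distance entries.

```python
import math


def sketch_u_matrix(weights):
    """Build a u-matrix of shape (2*n_rows-1, 2*n_columns-1) from a weight grid."""
    n_rows, n_cols = len(weights), len(weights[0])
    height, width = 2 * n_rows - 1, 2 * n_cols - 1
    u = [[0.0] * width for _ in range(height)]

    # Step 1: distances between horizontally and vertically adjacent nodes.
    for i in range(n_rows):
        for j in range(n_cols):
            if j + 1 < n_cols:
                u[2 * i][2 * j + 1] = math.dist(weights[i][j], weights[i][j + 1])
            if i + 1 < n_rows:
                u[2 * i + 1][2 * j] = math.dist(weights[i][j], weights[i + 1][j])

    # Step 2: fill the node positions (even, even) and the in-between
    # positions (odd, odd) with the mean of the adjacent distance entries.
    for r in range(height):
        for c in range(width):
            if (r + c) % 2 == 0:
                adjacent = [
                    u[rr][cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < height and 0 <= cc < width
                ]
                u[r][c] = sum(adjacent) / len(adjacent) if adjacent else 0.0
    return u


# 2 x 2 grid of 2-dimensional weight vectors gives a 3 x 3 u-matrix.
grid = [[(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (3.0, 4.0)]]
umat = sketch_u_matrix(grid)
```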

fit(X, y=None)[source]

Fit unsupervised SOM to input data.

Parameters
  • X (array-like matrix of shape = [n_samples, n_features]) – The training input samples.

  • y (None) – Not used in this class.

Returns

self

Return type

object

Examples

Load the SOM and fit it to your input data X with:

>>> import susi
>>> som = susi.SOMClustering()
>>> som.fit(X)

fit_transform(X, y=None)[source]

Fit to the input data and transform it.

Parameters
  • X (array-like matrix of shape = [n_samples, n_features]) – The training and prediction input samples.

  • y (None, optional) – Ignored.

Returns

Predictions including the BMUs of each datapoint

Return type

np.array of tuples (int, int)

Examples

Load the SOM, fit it to your input data X and transform your input data with:

>>> import susi
>>> som = susi.SOMClustering()
>>> X_transformed = som.fit_transform(X)

get_bmu(datapoint, som_array)[source]

Get best matching unit (BMU) for datapoint.

Parameters
  • datapoint (np.array, shape=(X.shape[1])) – Datapoint = one row of the dataset X

  • som_array (np.array) – Weight vectors of the SOM, shape = (self.n_rows, self.n_columns, X.shape[1])

Returns

Position of best matching unit (row, column)

Return type

tuple, shape = (int, int)
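
Conceptually, the BMU is the grid node whose weight vector lies closest to the datapoint. The plain-Python sketch below assumes the Euclidean metric and a nested-list weight grid. The helper sketch_get_bmu is hypothetical and only mirrors the documented inputs and return value.

```python
import math


def sketch_get_bmu(datapoint, som_array):
    """Return the (row, column) of the node closest to datapoint."""
    best_pos, best_dist = None, math.inf
    for row, nodes in enumerate(som_array):
        for col, weight in enumerate(nodes):
            dist = math.dist(datapoint, weight)  # Euclidean distance (assumed metric)
            if dist < best_dist:
                best_pos, best_dist = (row, col), dist
    return best_pos


# 2 x 2 grid of 2-dimensional weight vectors.
som_array = [[(0.0, 0.0), (1.0, 1.0)], [(5.0, 5.0), (9.0, 9.0)]]
bmu = sketch_get_bmu((4.5, 5.2), som_array)
```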

get_bmus(X, som_array=None)[source]

Get the best matching units (BMUs) for a list of datapoints.

Parameters
  • X (np.array) – List of datapoints

  • som_array (np.array, optional (default=`None`)) – Weight vectors of the SOM, shape = (self.n_rows, self.n_columns, X.shape[1])

Returns

bmus – Position of best matching units (row, column) for each datapoint

Return type

list of (int, int) tuples

Examples

Load the SOM, fit it to your input data X, get the BMU of each datapoint, and plot the distribution of the BMUs over the grid with:

>>> import susi
>>> import matplotlib.pyplot as plt
>>> som = susi.SOMClustering()
>>> som.fit(X)
>>> bmu_list = som.get_bmus(X)
>>> plt.hist2d([x[0] for x in bmu_list], [x[1] for x in bmu_list])

get_clusters(X)[source]

Calculate the SOM nodes on the unsupervised SOM grid per datapoint.

Parameters

X (np.ndarray) – Input data

Returns

List of SOM nodes, one for each input datapoint

Return type

list of tuples (int, int)

get_datapoints_from_node(node)[source]

Get all datapoints of one node.

Parameters

node (tuple, shape (int, int)) – Node for which the linked datapoints are calculated

Returns

datapoints – List of indices of the datapoints that are linked to node

Return type

list of int
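
The lookup can be pictured as a filter over the stored BMU list. The sketch below assumes that list holds one (row, column) entry per datapoint, in dataset order. The helper sketch_datapoints_from_node and its explicit bmus argument are hypothetical.

```python
def sketch_datapoints_from_node(node, bmus):
    """Return the indices of all datapoints whose BMU equals node."""
    return [index for index, bmu in enumerate(bmus) if tuple(bmu) == tuple(node)]


# BMUs of four datapoints; datapoints 0 and 2 share the node (0, 1).
bmus = [(0, 1), (2, 2), (0, 1), (1, 0)]
indices = sketch_datapoints_from_node((0, 1), bmus)
```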

get_nbh_distance_weight_block(nbh_func, bmus)[source]

Calculate distance weight matrix for all datapoints.

The combination of several distance weight matrices is called “block” in the following.

Parameters
  • nbh_func (float) – Current neighborhood function

  • bmus (list of (int, int) tuples) – Positions of the calculated BMUs of the current datapoints

Returns

dist_weight_block – Neighborhood distance weight block between SOM and BMUs

Return type

np.array of float, shape=(n_rows, n_columns)

get_nbh_distance_weight_matrix(neighborhood_func, bmu_pos)[source]

Calculate neighborhood distance weight.

Parameters
  • neighborhood_func (float) – Current neighborhood function

  • bmu_pos (tuple, shape=(int, int)) – Position of calculated BMU of the current datapoint

Returns

Neighborhood distance weight matrix between SOM and BMU

Return type

np.array of float, shape=(n_rows, n_columns)
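
The weight matrix can be sketched as a function of each node's grid distance to the BMU. The formulas below are assumptions: a Gaussian-shaped weight exp(-0.5 * (d / radius)^2) for “pseudo-gaussian” and the same term multiplied by (1 - (d / radius)^2) for “mexican-hat”. The helper sketch_nbh_weight_matrix and its explicit n_rows, n_columns, and mode arguments are hypothetical, and susi's formulas may differ.

```python
import math


def sketch_nbh_weight_matrix(neighborhood_func, bmu_pos, n_rows, n_columns,
                             mode="pseudo-gaussian"):
    """Weight every grid node by its distance to the BMU (assumed formulas)."""
    if mode not in ("pseudo-gaussian", "mexican-hat"):
        raise ValueError(f"unknown mode: {mode!r}")
    matrix = []
    for row in range(n_rows):
        row_weights = []
        for col in range(n_columns):
            grid_dist = math.dist((row, col), bmu_pos)  # distance on the SOM grid
            scaled_sq = (grid_dist / neighborhood_func) ** 2
            weight = math.exp(-0.5 * scaled_sq)
            if mode == "mexican-hat":
                # Turns negative once the node is farther away than the radius.
                weight *= 1.0 - scaled_sq
            row_weights.append(weight)
        matrix.append(row_weights)
    return matrix


# 3 x 3 grid with the BMU in the center and a radius of 2.
weights = sketch_nbh_weight_matrix(2.0, (1, 1), n_rows=3, n_columns=3)
```

In this sketch the BMU itself always receives the weight 1, and the weights fall off symmetrically around it.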

get_node_distance_matrix(datapoint, som_array)[source]

Get the distance between a datapoint and every SOM node using the Euclidean distance.

Parameters
  • datapoint (np.array, shape=(X.shape[1])) – Datapoint = one row of the dataset X

  • som_array (np.array) – Weight vectors of the SOM, shape = (self.n_rows, self.n_columns, X.shape[1])

Returns

distmat – Distance between datapoint and each SOM node

Return type

np.array of float

get_u_matrix(mode='mean')[source]

Calculate unified distance matrix (u-matrix).

Parameters

mode (str, optional (default=“mean”)) – Choice of the averaging algorithm

Returns

U-matrix containing the distances between all nodes of the unsupervised SOM. Shape = (n_rows*2-1, n_columns*2-1)

Return type

np.array

Examples

Fit your SOM to input data X and then calculate the u-matrix with get_u_matrix(). You can plot the u-matrix then with e.g. pyplot.imshow().

>>> import susi
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> som = susi.SOMClustering()
>>> som.fit(X)
>>> umat = som.get_u_matrix()
>>> plt.imshow(np.squeeze(umat))

get_u_mean(nodelist)[source]

Calculate a mean value of the node entries in nodelist.

Parameters

nodelist (list of tuple (int, int)) – List of nodes on the u-matrix containing distance values

Returns

u_mean – Mean value

Return type

float

init_unsuper_som()[source]

Initialize map.

modify_weight_matrix_batch(som_array, dist_weight_matrix, data)[source]

Modify weight matrix of the SOM for the batch algorithm.

Parameters
  • som_array (np.array) – Weight vectors of the SOM, shape = (self.n_rows, self.n_columns, X.shape[1])

  • dist_weight_matrix (np.array of float) – Current distance weight of the SOM for the specific node

  • data (np.array, optional) – True vector(s)

Returns

Weight vector of the SOM after the modification

Return type

np.array

set_bmus(X, som_array=None)[source]

Set BMUs in the current SOM object.

Parameters
  • X (array-like matrix of shape = [n_samples, n_features]) – The input samples.

  • som_array (np.array) – Weight vectors of the SOM, shape = (self.n_rows, self.n_columns, X.shape[1])

train_unsupervised_som()[source]

Train unsupervised SOM.

transform(X, y=None)[source]

Transform input data.

Parameters
  • X (array-like matrix of shape = [n_samples, n_features]) – The prediction input samples.

  • y (None, optional) – Ignored.

Returns

Predictions including the BMUs of each datapoint

Return type

np.array of tuples (int, int)

Examples

Load the SOM, fit it to your input data X and transform your input data with:

>>> import susi
>>> som = susi.SOMClustering()
>>> som.fit(X)
>>> X_transformed = som.transform(X)