SOMClassifier

class susi.SOMClassifier(n_rows: int = 10, n_columns: int = 10, init_mode_unsupervised: str = 'random', init_mode_supervised: str = 'majority', n_iter_unsupervised: int = 1000, n_iter_supervised: int = 1000, train_mode_unsupervised: str = 'online', train_mode_supervised: str = 'online', neighborhood_mode_unsupervised: str = 'linear', neighborhood_mode_supervised: str = 'linear', learn_mode_unsupervised: str = 'min', learn_mode_supervised: str = 'min', distance_metric: str = 'euclidean', learning_rate_start=0.5, learning_rate_end=0.05, nbh_dist_weight_mode: str = 'pseudo-gaussian', missing_label_placeholder=None, do_class_weighting=True, n_jobs=None, random_state=None, verbose=0)

Supervised SOM for estimating discrete variables (= classification).

Parameters
  • n_rows (int, optional (default=10)) – Number of rows for the SOM grid

  • n_columns (int, optional (default=10)) – Number of columns for the SOM grid

  • init_mode_unsupervised (str, optional (default="random")) – Initialization mode of the unsupervised SOM

  • init_mode_supervised (str, optional (default="majority")) – Initialization mode of the classification SOM

  • n_iter_unsupervised (int, optional (default=1000)) – Number of iterations for the unsupervised SOM

  • n_iter_supervised (int, optional (default=1000)) – Number of iterations for the classification SOM

  • train_mode_unsupervised (str, optional (default="online")) – Training mode of the unsupervised SOM

  • train_mode_supervised (str, optional (default="online")) – Training mode of the classification SOM

  • neighborhood_mode_unsupervised (str, optional (default="linear")) – Neighborhood mode of the unsupervised SOM

  • neighborhood_mode_supervised (str, optional (default="linear")) – Neighborhood mode of the classification SOM

  • learn_mode_unsupervised (str, optional (default="min")) – Learning mode of the unsupervised SOM

  • learn_mode_supervised (str, optional (default="min")) – Learning mode of the classification SOM

  • distance_metric (str, optional (default="euclidean")) – Distance metric used to compare datapoints at the feature level (not on the SOM grid). Possible metrics: {"euclidean", "manhattan", "mahalanobis", "tanimoto"}. Note that "tanimoto" tends to be slow.

  • learning_rate_start (float, optional (default=0.5)) – Learning rate start value

  • learning_rate_end (float, optional (default=0.05)) – Learning rate end value (only needed for some learning-rate definitions)

  • nbh_dist_weight_mode (str, optional (default="pseudo-gaussian")) – Formula of the neighborhood distance weight. Possible formulas are: {"pseudo-gaussian", "mexican-hat"}.

  • missing_label_placeholder (int or str or None, optional (default=None)) – Label placeholder for datapoints with no label. This is needed for semi-supervised learning.

  • do_class_weighting (bool, optional (default=True)) – If true, classes are weighted.

  • n_jobs (int or None, optional (default=None)) – The number of jobs to run in parallel.

  • random_state (int, RandomState instance or None, optional (default=None)) – If int, random_state is the seed used by the random number generator; if RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.

  • verbose (int, optional (default=0)) – Controls the verbosity.
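
The feature-level metrics named under distance_metric can be sketched in plain Python as follows. This is a minimal illustration and not susi's internal code: the function names are hypothetical, and the Tanimoto form shown (one minus the continuous Tanimoto similarity) is an assumption about the definition, which susi may compute differently. Mahalanobis is left out because it additionally needs a covariance estimate of the data.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute per-feature differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def tanimoto(a, b):
    """Assumed form: 1 - (a.b) / (|a|^2 + |b|^2 - a.b).

    Undefined (division by zero) when both vectors are all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return 1.0 - dot / denom


# Compare one datapoint with the weight vector of one SOM node.
datapoint = [1.0, 0.0, 2.0]
node_weights = [0.0, 0.0, 2.0]

for metric in (euclidean, manhattan, tanimoto):
    print(metric.__name__, metric(datapoint, node_weights))
```

All three functions act on feature vectors only; positions on the SOM grid play no role here, which matches the "feature level (not SOM grid)" note in the parameter description.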

Variables
  • node_list_ (np.array of (int, int) tuples) – List of 2-dimensional coordinates of SOM nodes

  • radius_max_ (float, int) – Maximum radius of the neighborhood function

  • radius_min_ (float, int) – Minimum radius of the neighborhood function

  • unsuper_som_ (np.array) – Weight vectors of the unsupervised SOM, shape = (self.n_rows, self.n_columns, X.shape[1])

  • X_ (np.array) – Input data

  • fitted_ (bool) – States if estimator is fitted to X

  • max_iterations_ (int) – Maximum number of iterations for the current training

  • bmus_ (list of (int, int) tuples) – List of best matching units (BMUs) of the dataset X

  • placeholder_dict_ (dict) – Dict of placeholders for initializing nodes without mapped class.

  • n_features_in_ (int) – Number of input features
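
To make the relationship between node_list_, unsuper_som_ and bmus_ concrete, the sketch below builds a small grid by hand and looks up the best matching unit for each datapoint. It is an illustration under assumptions, not susi's implementation: the names node_list, weights and find_bmu are hypothetical, the weights are toy values, and Euclidean distance is assumed.

```python
import itertools
import math

n_rows, n_columns = 2, 3

# All (row, column) grid coordinates, analogous to node_list_.
node_list = list(itertools.product(range(n_rows), range(n_columns)))

# Toy weight grid of shape (n_rows, n_columns, n_features),
# analogous to unsuper_som_.
weights = [
    [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]],
]


def find_bmu(datapoint):
    """Return the (row, column) of the node whose weights are closest."""
    return min(
        node_list,
        key=lambda node: math.dist(weights[node[0]][node[1]], datapoint),
    )


# One BMU per datapoint, analogous to bmus_.
X = [[0.1, 0.1], [0.9, 1.1], [1.9, 2.2]]
bmus = [find_bmu(x) for x in X]
print(bmus)  # [(0, 0), (0, 2), (1, 2)]
```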

change_class_proba(learningrate, dist_weight_matrix, class_weight)

Calculate probability of changing class in a node.

Parameters
  • learningrate (float) – Current learning rate of the SOM

  • dist_weight_matrix (np.array of float) – Current distance weight of the SOM for the specific node

  • class_weight (float) – Weight of the class of the current datapoint

Returns

change_class_bool – Matrix with one boolean for each node of the SOM grid. If true, the value of the respective SOM node is changed; if false, it stays the same.

Return type

np.array, shape = (n_rows, n_columns)
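
One plausible way to turn the three inputs into a boolean matrix is sketched below. The formula is an assumption made for illustration (the probability is taken as the product of learning rate, distance weight and class weight, capped at 1, and compared with a uniform random draw); susi's actual computation may differ. The function name and the rng argument are hypothetical.

```python
import random


def change_class_proba_sketch(learningrate, dist_weight_matrix, class_weight, rng):
    """Return a boolean grid: True where a node's class should be replaced."""
    change_class_bool = []
    for row in dist_weight_matrix:
        bool_row = []
        for dist_weight in row:
            # Assumed form: the probability grows with the learning rate,
            # the neighborhood weight and the class weight, capped at 1.
            proba = min(1.0, learningrate * dist_weight * class_weight)
            # rng.random() lies in [0, 1), so proba == 1 always changes
            # the node and proba == 0 never does.
            bool_row.append(rng.random() < proba)
        change_class_bool.append(bool_row)
    return change_class_bool


rng = random.Random(0)  # seeded for reproducibility
dist_weight_matrix = [[1.0, 0.5], [0.5, 0.0]]
result = change_class_proba_sketch(0.5, dist_weight_matrix, 2.0, rng)
print(result[0][0], result[1][1])  # True False
```

Under this assumption, nodes close to the best matching unit change class almost surely early in training, while distant nodes, or any node late in training when the learning rate is small, rarely change.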

fit(X, y=None)

Fit classification SOM to the input data.

Parameters
  • X (array-like matrix of shape = [n_samples, n_features]) – The training input samples.

  • y (array-like matrix of shape = [n_samples, 1]) – The labels (ground truth) of the input samples

Returns

self

Return type

object

Examples

Load the SOM and fit it to your input data X and the labels y with:

>>> import susi
>>> som = susi.SOMClassifier()
>>> som.fit(X, y)

init_super_som()

Initialize map.
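
With the default init_mode_supervised="majority", a natural reading is that each node starts with the most frequent class among the datapoints mapped to it, and nodes without any mapped datapoint receive a placeholder (compare placeholder_dict_ above). The sketch below illustrates that reading only; init_majority and its arguments are hypothetical names, and the real initialization may handle ties and unlabeled datapoints differently.

```python
from collections import Counter, defaultdict


def init_majority(n_rows, n_columns, bmus, labels, placeholder):
    """Build a grid of class labels from BMU assignments by majority vote."""
    # Collect the labels of all datapoints mapped to each node.
    labels_per_node = defaultdict(list)
    for bmu, label in zip(bmus, labels):
        labels_per_node[bmu].append(label)

    # Nodes without any mapped datapoint keep the placeholder.
    grid = [[placeholder] * n_columns for _ in range(n_rows)]
    for (row, col), node_labels in labels_per_node.items():
        grid[row][col] = Counter(node_labels).most_common(1)[0][0]
    return grid


bmus = [(0, 0), (0, 0), (0, 0), (1, 1)]
labels = ["a", "b", "a", "b"]
class_map = init_majority(2, 2, bmus, labels, placeholder="PLACEHOLDER")
print(class_map)  # [['a', 'PLACEHOLDER'], ['PLACEHOLDER', 'b']]
```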

modify_weight_matrix_supervised(dist_weight_matrix, true_vector=None, learningrate=None)

Modify weight matrix of the SOM.

Parameters
  • dist_weight_matrix (np.array of float) – Current distance weight of the SOM for the specific node

  • true_vector (np.array, optional) – Datapoint (one row of the dataset X)

  • learningrate (float, optional) – Current learning rate of the SOM

Returns

new_matrix – Weight vector of the SOM after the modification

Return type

np.array
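
For a classification SOM, the modification can be pictured as overwriting the class of every node that a boolean matrix (such as the one returned by change_class_proba) marks for change. The sketch below shows that idea under assumptions: apply_class_update is a hypothetical helper, the class map is a plain nested list rather than an np.array, and susi's own update may include further checks.

```python
import copy


def apply_class_update(class_map, change_class_bool, true_label):
    """Return a new class map in which flagged nodes take the true label."""
    # Copy first so the caller's map is left untouched, mirroring a
    # method that returns new_matrix instead of modifying in place.
    new_map = copy.deepcopy(class_map)
    for r, bool_row in enumerate(change_class_bool):
        for c, change in enumerate(bool_row):
            if change:
                new_map[r][c] = true_label
    return new_map


class_map = [["a", "b"], ["b", "b"]]
change_class_bool = [[True, True], [False, False]]
new_map = apply_class_update(class_map, change_class_bool, "a")
print(new_map)  # [['a', 'a'], ['b', 'b']]
```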

set_placeholder()

Set placeholder depending on the class dtype.
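
The idea of a dtype-dependent placeholder can be illustrated as follows. Everything in this sketch is hypothetical: the placeholder values, the PLACEHOLDERS dict and the choose_placeholder helper are assumptions chosen for illustration, and the values stored in placeholder_dict_ may be different.

```python
# Hypothetical placeholder values; the actual values in susi may differ.
PLACEHOLDERS = {"str": "PLACEHOLDER", "int": -999999}


def choose_placeholder(labels):
    """Pick a placeholder whose type matches the class labels."""
    if all(isinstance(label, str) for label in labels):
        return PLACEHOLDERS["str"]
    if all(isinstance(label, int) for label in labels):
        return PLACEHOLDERS["int"]
    raise ValueError("Labels must be all strings or all integers.")


print(choose_placeholder(["cat", "dog"]))  # PLACEHOLDER
print(choose_placeholder([0, 1, 2]))       # -999999
```

Matching the placeholder type to the label type keeps the class map homogeneous, so nodes without a mapped class can still be stored in the same array as real labels.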