Hyperparameters

In the following, the most important hyperparameters of the SuSi package are described. The default hyperparameter settings are a good starting point, but they can usually be improved, either by manual tuning or by an automated hyperparameter optimization (see below). The commonly used hyperparameter settings are taken from [RieseEtAl2020].

Grid Size (n_rows, n_columns)

The grid size of a SOM is defined with the parameters n_rows and n_columns, the numbers of rows and columns. The choice of the grid size depends on several trade-offs.

Characteristics of a larger grid:

  • Better adaptation to complex problems (good!)

  • Better/smoother visualization capabilities (good!)

  • Higher risk of overtraining (possibly bad)

  • Longer training time (bad if resources are very limited)

Our Recommendation:

We suggest starting with a small grid, e.g. 5 x 5, and extending it while tracking the training and test error metrics. We consider SOM grids of about 100 x 100 nodes and more as “large”. Non-square SOM grids can also be helpful for specific problems. Commonly used grid sizes range from 50 x 50 to 80 x 80.

Number of iterations and training mode

The number of iterations (n_iter_unsupervised and n_iter_supervised) depends on the training mode (train_mode_unsupervised and train_mode_supervised).

Our Recommendation (Online Mode)

Use the online mode. If your dataset is small (< 1000 datapoints), use 10 000 iterations for the unsupervised SOM and 5000 iterations for the supervised SOM as start values. If your dataset is significantly larger, use significantly more iterations. Commonly used ranges in the online mode are 10 000 to 60 000 iterations for the unsupervised SOM and about 20 000 to 70 000 for the (semi-)supervised SOM.

Todo

Add recommendations for the batch mode.

Neighborhood Distance Weight, Neighborhood Function, and Learning Rate

The hyperparameters around the neighborhood mode (neighborhood_mode_unsupervised and neighborhood_mode_supervised) and the learning rate (learn_mode_unsupervised, learn_mode_supervised, learning_rate_start, and learning_rate_end) depend on the neighborhood distance weight formula nbh_dist_weight_mode. Two different modes are implemented so far: pseudo-gaussian and mexican-hat.

Our Recommendation (Pseudo-Gaussian):

Use the pseudo-gaussian neighborhood distance weight with the default formulas for the neighborhood mode and the learning rate. In our experience, the start and end values of the learning rate (learning_rate_start and learning_rate_end) have the largest influence; they should be optimized first. Commonly used settings are linear and min for the neighborhood mode, min and exp for the learning rate mode, start values from 0.3 to 0.8, and end values from 0.1 to 0.005.

Todo

Add recommendations for the mexican hat distance weight.

Distance Metric

In the following, we give recommendations for the distance metric.

Todo

Add recommendations for the distance metric.

Hyperparameter optimization

Possible ways to find optimal hyperparameters for a problem are a grid search or a randomized search. Because the SuSi package is developed according to several scikit-learn guidelines, it can be used with scikit-learn model selection tools such as sklearn.model_selection.GridSearchCV and sklearn.model_selection.RandomizedSearchCV.

For example, the randomized search can be applied as follows in Python3:

import susi
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV

iris = load_iris()

# Parameter values to sample from during the randomized search
param_distributions = {
    "n_rows": [5, 10, 20],
    "n_columns": [5, 20, 40],
    "learning_rate_start": [0.5, 0.7, 0.9],
    "learning_rate_end": [0.1, 0.05, 0.005],
}

som = susi.SOMRegressor()
search = RandomizedSearchCV(som, param_distributions, random_state=1)
search.fit(iris.data, iris.target)
print(search.best_params_)

References

RieseEtAl2020

F. M. Riese, S. Keller and S. Hinz, “Supervised and Semi-Supervised Self-Organizing Maps for Regression and Classification Focusing on Hyperspectral Data”, Remote Sensing, vol. 12, no. 1, 2020.