Defining the ModelConfig
The ModelConfig is created from a YAML file and defines the scoring layers of the RelevanceModel. Specifically, the model config defines the layers to convert the transformed features output by the InteractionModel to the scores for the model.
Currently, ml4ir supports a dense neural network architecture (similar to a multilayer perceptron) and a linear ranking model. Users select the scoring architecture with the architecture_key. The layers of the neural network are defined as a list of configurations under the layers attribute. For each layer, specify the type of tensorflow-keras layer, the arguments to be passed when the layer is instantiated and, if desired, a name using the name attribute.
Note: To train a simple linear ranking model, set the architecture_key to linear and define a single dense layer.
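As a rough illustration of how one entry in the layers list breaks down into a type, a name, and the remaining constructor arguments, here is a minimal Python sketch. The helpers `split_layer_config` and `is_linear_config` and the `LAYERS` list are hypothetical names introduced for this example; they are not part of ml4ir, and ml4ir's own parsing may differ.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical in-memory form of the `layers` list from a ModelConfig YAML.
LAYERS: List[Dict[str, Any]] = [
    {"type": "dense", "name": "first_dense", "units": 256, "activation": "relu"},
    {"type": "dropout", "name": "first_dropout", "rate": 0.0},
    {"type": "dense", "name": "final_dense", "units": 1, "activation": None},
]


def split_layer_config(layer: Dict[str, Any]) -> Tuple[str, str, Dict[str, Any]]:
    """Separate a layer entry into (type, name, constructor arguments).

    Illustrative only: this is not ml4ir's implementation.
    """
    args = {k: v for k, v in layer.items() if k not in ("type", "name")}
    return layer["type"], layer.get("name", ""), args


def is_linear_config(architecture_key: str, layers: List[Dict[str, Any]]) -> bool:
    """Check the shape described in the note: `linear` with one dense layer."""
    return (
        architecture_key == "linear"
        and len(layers) == 1
        and layers[0]["type"] == "dense"
    )


if __name__ == "__main__":
    for entry in LAYERS:
        print(split_layer_config(entry))
```

Everything other than type and name is treated here as a keyword argument for the layer, which matches the description above of passing arguments to the layer's instantiation.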
This file is also used to define the optimizer, the learning rate schedule and calibration with temperature scaling. The currently supported optimizers are: adam, adagrad, nadam, sgd, rms_prop. Each of these optimizers needs the following hyper-parameter to be set: gradient_clip_value. adam is the default optimizer if none is specified.
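To give a feel for what a gradient clip value does, the sketch below clips each gradient component to the range [-clip_value, clip_value]. This assumes gradient_clip_value is applied element-wise, in the way Keras optimizers treat their clipvalue argument; the text above does not confirm which clipping mode ml4ir uses, and `clip_by_value` is a hypothetical helper written for this example.

```python
from typing import List


def clip_by_value(gradients: List[float], clip_value: float) -> List[float]:
    """Clamp every gradient component to [-clip_value, clip_value].

    Assumption: element-wise clipping, as with Keras's `clipvalue`.
    """
    return [max(-clip_value, min(clip_value, g)) for g in gradients]


if __name__ == "__main__":
    # With gradient_clip_value: 5.0, large components are capped at +/-5.0.
    print(clip_by_value([-12.0, 0.3, 7.5], 5.0))
```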
The currently supported learning rate schedules are: exponential, cyclic, constant and reduce_lr_on_plateau. constant is the default schedule if none is specified, with a learning rate of 0.01.
The exponential learning rate schedule requires defining the following hyper-parameters: initial_learning_rate, decay_steps, decay_rate. For more information, see: https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/schedules/ExponentialDecay
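The decay rule behind this schedule can be written in a few lines. The sketch below follows the formula documented for tf.keras ExponentialDecay, lr = initial_learning_rate * decay_rate ^ (step / decay_steps). It is a plain-Python illustration rather than ml4ir code, and the staircase flag is included only because the linked Keras schedule offers it.

```python
def exponential_decay(
    step: int,
    initial_learning_rate: float = 0.01,
    decay_steps: int = 100000,
    decay_rate: float = 0.96,
    staircase: bool = False,
) -> float:
    """Learning rate at a given training step under exponential decay."""
    exponent = step / decay_steps
    if staircase:
        # Decay in discrete jumps: the exponent only changes every decay_steps.
        exponent = step // decay_steps
    return initial_learning_rate * decay_rate ** exponent


if __name__ == "__main__":
    for s in (0, 50000, 100000, 200000):
        print(s, exponential_decay(s))
```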
The cyclic learning rate schedule has three different types of policies: triangular, triangular2, exponential. All three types require defining the following hyper-parameters: initial_learning_rate, maximal_learning_rate, step_size. The exponential type requires an additional hyper-parameter: gamma.
For more information, see: https://www.tensorflow.org/addons/api_docs/python/tfa/optimizers/CyclicalLearningRate and https://arxiv.org/pdf/1506.01186.pdf.
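The three policies differ only in how the amplitude of each cycle is scaled. The following plain-Python sketch follows the formulation used by the TensorFlow Addons cyclical schedules and the paper linked above; it assumes ml4ir relies on that formulation and is not taken from ml4ir itself. The function name `cyclic_learning_rate` and the `policy` argument are hypothetical.

```python
import math


def cyclic_learning_rate(
    step: int,
    initial_learning_rate: float = 0.001,
    maximal_learning_rate: float = 0.01,
    step_size: int = 10,
    policy: str = "triangular",
    gamma: float = 1.0,
) -> float:
    """Learning rate at a given step for a cyclical schedule."""
    # A full cycle is 2 * step_size steps: step_size up, step_size down.
    cycle = math.floor(1 + step / (2 * step_size))
    # x runs from 1 down to 0 and back to 1 within each cycle.
    x = abs(step / step_size - 2 * cycle + 1)
    if policy == "triangular":
        scale = 1.0  # same amplitude every cycle
    elif policy == "triangular2":
        scale = 1.0 / (2 ** (cycle - 1))  # amplitude halves each cycle
    elif policy == "exponential":
        scale = gamma ** step  # amplitude decays with the step count
    else:
        raise ValueError(f"unknown policy: {policy}")
    amplitude = maximal_learning_rate - initial_learning_rate
    return initial_learning_rate + amplitude * max(0.0, 1 - x) * scale


if __name__ == "__main__":
    for s in (0, 10, 20, 30, 40):
        print(s, cyclic_learning_rate(s, policy="triangular2"))
```

With step_size set to 10, the rate peaks at steps 10, 30, 50 and so on, and returns to initial_learning_rate at steps 0, 20, 40.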
The reduce_lr_on_plateau schedule reduces the learning rate by a factor (where factor < 1) when the monitored metric has not improved for patience consecutive epochs.
Parameters that control the scheduler:
factor: factor by which the learning rate will be reduced
patience: number of epochs with no improvement for the monitor metric after which learning rate will be reduced
min_lr: the minimum value the learning rate is allowed to reach.
For more information, see: https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/ReduceLROnPlateau
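How factor, patience and min_lr interact can be simulated without any training. The sketch below is a simplified plain-Python model of the behaviour described above, assuming a metric where lower is better (such as a validation loss). It leaves out the min_delta and cooldown options of the Keras callback, and the function name and signature are hypothetical.

```python
from typing import List


def reduce_lr_on_plateau(
    losses: List[float],
    learning_rate: float = 1.0,
    factor: float = 0.5,
    patience: int = 1,
    min_lr: float = 0.01,
) -> List[float]:
    """Return the learning rate in effect after each epoch."""
    best = float("inf")
    wait = 0  # epochs since the monitored loss last improved
    history = []
    for loss in losses:
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # Shrink the rate, but never below the configured floor.
                learning_rate = max(learning_rate * factor, min_lr)
                wait = 0
        history.append(learning_rate)
    return history


if __name__ == "__main__":
    print(reduce_lr_on_plateau([1.0, 0.9, 0.9, 0.95, 0.8]))
```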
Calibration is done as a separate process after training or evaluating a (classification) model (currently, calibration is not supported for RankingModel). It implements the [temperature scaling](https://github.com/gpleiss/temperature_scaling) technique to calibrate the output probabilities of a classifier. It uses the validation set to train a temperature parameter, defined in the ModelConfig file. Then, it evaluates the calibrated model on the test set and stores the probability scores before and after applying calibration. After training the temperature scaling (TS) parameter, the calibrated model can be created from the original RelevanceModel using relevance_model.add_temperature_layer(temp_value) and saved using relevance_model.save(). Note that to apply calibration to the Functional API model of a RelevanceModel, the model is expected to have an Activation layer (e.g. SoftMax) as its last layer.
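The core of temperature scaling is dividing the logits by a single scalar before the final softmax. The sketch below shows that step in plain Python. It does not use ml4ir's API, the helper names are hypothetical, and the temperature is simply passed in rather than learned from a validation set.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def calibrated_probabilities(logits: List[float], temperature: float) -> List[float]:
    """Apply temperature scaling: softmax(logits / temperature)."""
    return softmax([z / temperature for z in logits])


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]
    print(calibrated_probabilities(logits, 1.0))  # uncalibrated
    print(calibrated_probabilities(logits, 1.5))  # softened by T = 1.5
```

A temperature above 1 softens the distribution and a temperature below 1 sharpens it. The ranking of the classes is unchanged in either case, so calibration of this kind affects confidence and not the predicted class.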
Below is an example model config YAML that uses a DNN architecture to stack several dense layers with ReLU activations, with dropout layers in between for regularization. A triangular2 cyclic learning rate schedule is used with the adam optimizer.
architecture_key: dnn
layers:
  - type: dense
    name: first_dense
    units: 256
    activation: relu
  - type: dropout
    name: first_dropout
    rate: 0.0
  - type: dense
    name: second_dense
    units: 64
    activation: relu
  - type: dropout
    name: second_dropout
    rate: 0.0
  - type: dense
    name: final_dense
    units: 1
    activation: null
optimizer:
  key: adam
  gradient_clip_value: 5.0
lr_schedule:
  key: cyclic
  type: triangular2
  initial_learning_rate: 0.001  # default value is 0.001
  maximal_learning_rate: 0.01  # default value is 0.01
  step_size: 10  # default value is 10
calibration:
  key: temperature_scaling
  temperature: 1.5
Examples for defining other learning rate schedules in the ModelConfig YAML
Cyclic Learning Rate Schedule
lr_schedule:
  key: cyclic
  type: triangular
  initial_learning_rate: 0.001  # default value is 0.001
  maximal_learning_rate: 0.01  # default value is 0.01
  step_size: 10  # default value is 10
Exponential Decay Learning Rate Schedule
lr_schedule:
  key: exponential
  learning_rate: 0.01  # default value is 0.01
  learning_rate_decay_steps: 100000  # default value is 100000
  learning_rate_decay: 0.96  # default value is 0.96
reduce_lr_on_plateau Learning Rate Schedule
lr_schedule:
  key: reduce_lr_on_plateau
  learning_rate: 1.0
  min_lr: 0.01
  patience: 1
  factor: 0.5