Defining the ModelConfig

The ModelConfig is created from a YAML file and defines the scoring layers of the RelevanceModel. Specifically, the model config defines the layers that convert the transformed features output by the InteractionModel into the scores of the model.

Currently, ml4ir supports a dense neural network architecture (multi-layer-perceptron-like) and a linear ranking model. Users can define the type of scoring architecture using the architecture_key. The layers of the neural network are defined as a list of configurations under the layers attribute. For each layer, specify the type of tensorflow-keras layer, the arguments to be passed when the layer is instantiated, and, optionally, a name using the name attribute.

Note: To train a simple linear ranking model, set the architecture_key to linear and define a single dense layer, as sketched below.
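As a minimal sketch, such a linear model could be configured as follows (the layer name and arguments are illustrative, not prescribed):

architecture_key: linear
layers:
  - type: dense
    name: linear_dense
    units: 1
    activation: null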

This file is also used to define the optimizer, the learning rate schedule, and calibration with temperature scaling. The currently supported optimizers are: adam, adagrad, nadam, sgd, rms_prop. Each of these optimizers needs the following hyper-parameter to be set: gradient_clip_value. adam is the default optimizer if none is specified. The currently supported learning rate schedules are: exponential, cyclic, constant and reduce_lr_on_plateau. constant is the default schedule if none is specified, with a learning rate of 0.01 (see the sketch below).
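For example, a non-default optimizer paired with an explicit constant schedule could be configured as in the following sketch (the values, and the learning_rate key for the constant schedule, are illustrative assumptions):

optimizer:
  key: sgd
  gradient_clip_value: 5.0
lr_schedule:
  key: constant
  learning_rate: 0.01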

The exponential learning rate schedule requires defining the following hyper-parameters: initial_learning_rate, decay_steps, decay_rate (specified in the ModelConfig YAML as learning_rate, learning_rate_decay_steps and learning_rate_decay, as shown in the example at the end of this section). For more information, see: https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/schedules/ExponentialDecay

The cyclic learning rate schedule has three different types of policies: triangular, triangular2, exponential. All three types require defining the following hyper-parameters: initial_learning_rate, maximal_learning_rate, step_size. The exponential type requires an additional hyper-parameter: gamma, as in the sketch below. For more information, see: https://www.tensorflow.org/addons/api_docs/python/tfa/optimizers/CyclicalLearningRate and https://arxiv.org/pdf/1506.01186.pdf.
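As a sketch, an exponential cyclic policy could be configured as follows (the values shown are illustrative):

lr_schedule:
  key: cyclic
  type: exponential
  initial_learning_rate: 0.001
  maximal_learning_rate: 0.01
  step_size: 10
  gamma: 0.96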

The reduce_lr_on_plateau schedule reduces the learning rate by a factor (where factor < 1) when the monitored metric does not improve from one epoch to the next. The following parameters control the scheduler:

- factor: factor by which the learning rate will be reduced
- patience: number of epochs with no improvement in the monitored metric after which the learning rate will be reduced
- min_lr: the minimum value the learning rate is allowed to reach

For more information, see: https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/ReduceLROnPlateau (an example config is given at the end of this section).

Calibration is run as a separate process, optionally after training or evaluating a (classification) model (currently, calibration is not supported for RankingModel). It implements the [temperature scaling](https://github.com/gpleiss/temperature_scaling) technique to calibrate the output probabilities of a classifier. It uses the validation set to train a temperature parameter, which is defined in the ModelConfig file. It then evaluates the calibrated model on the test set and stores the probability scores before and after applying calibration. After training the temperature parameter, the calibrated model can be created from the original RelevanceModel using relevance_model.add_temperature_layer(temp_value) and saved using relevance_model.save(). Note that to apply calibration to the Functional API model of a RelevanceModel, the model is expected to have an Activation layer (e.g. Softmax) as its last layer.
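As a minimal sketch in Python, assuming a trained classification RelevanceModel named relevance_model, the calibrated model could be created and saved as follows (the temperature value and the save argument are illustrative assumptions; the method names are those described above):

# Attach a temperature scaling layer to the original RelevanceModel
# (1.5 is a placeholder; use the temperature learned during calibration)
calibrated_model = relevance_model.add_temperature_layer(1.5)

# Persist the calibrated model; the argument name is an assumption,
# check your ml4ir version for the exact save() signature
calibrated_model.save(models_dir="models/calibrated")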

Below is an example model config YAML that uses the DNN architecture to stack dense layers with ReLU activations, with dropout layers in between for regularization. A triangular2 cyclic learning rate schedule is used with the adam optimizer.

architecture_key: dnn
layers:
  - type: dense
    name: first_dense
    units: 256
    activation: relu
  - type: dropout
    name: first_dropout
    rate: 0.0
  - type: dense
    name: second_dense
    units: 64
    activation: relu
  - type: dropout
    name: second_dropout
    rate: 0.0
  - type: dense
    name: final_dense
    units: 1
    activation: null
optimizer: 
  key: adam
  gradient_clip_value: 5.0
lr_schedule:
  key: cyclic
  type: triangular2
  initial_learning_rate: 0.001   #default value is 0.001
  maximal_learning_rate: 0.01    #default value is 0.01
  step_size: 10                  #default value is 10
calibration:
  key: temperature_scaling
  temperature: 1.5

Examples for defining other learning rate schedules in the ModelConfig YAML

Cyclic Learning Rate Schedule

lr_schedule:
  key: cyclic
  type: triangular
  initial_learning_rate: 0.001   #default value is 0.001
  maximal_learning_rate: 0.01    #default value is 0.01
  step_size: 10                  #default value is 10

Exponential Decay Learning Rate Schedule

lr_schedule:
  key: exponential
  learning_rate: 0.01                 #default value is 0.01
  learning_rate_decay_steps: 100000   #default value is 100000
  learning_rate_decay: 0.96           #default value is 0.96

reduce_lr_on_plateau Learning Rate Schedule

lr_schedule:
  key: reduce_lr_on_plateau
  learning_rate: 1.0
  min_lr: 0.01
  patience: 1
  factor: 0.5