Relevance Models

RelevanceModel

class ml4ir.base.model.relevance_model.RelevanceModel(feature_config: ml4ir.base.features.feature_config.FeatureConfig, tfrecord_type: str, file_io: ml4ir.base.io.file_io.FileIO, scorer: Optional[ml4ir.base.model.scoring.scoring_model.RelevanceScorer] = None, metrics: List[Union[keras.metrics.base_metric.Metric, str]] = [], optimizer: Optional[keras.optimizers.optimizer_v2.optimizer_v2.OptimizerV2] = None, model_file: Optional[str] = None, initialize_layers_dict: dict = {}, freeze_layers_list: list = [], compile_keras_model: bool = False, output_name: str = 'score', logger=None, eval_config: dict = {})

Bases: object

Constructor to instantiate a RelevanceModel that can be used for training and evaluating the search ML task

Parameters:
  • feature_config (FeatureConfig object) – FeatureConfig object that defines the features to be loaded in the dataset and the preprocessing functions to be applied to each of them
  • tfrecord_type ({"example", "sequence_example"}) – Type of the TFRecord protobuf message used for TFRecordDataset
  • file_io (FileIO object) – file I/O handler objects for reading and writing data
  • scorer (RelevanceScorer object) – Scorer object that wraps an InteractionModel and converts input features into scores
  • metrics (list) – List of keras Metric objects/strings that will be used for evaluating the trained model
  • optimizer (Optimizer) – Tensorflow keras optimizer to be used for training the model
  • model_file (str, optional) – Path to pretrained model file to be loaded for evaluation or retraining
  • initialize_layers_dict (dict, optional) – Dictionary of tensorflow layer names mapped to the path of pretrained weights. Use this for transfer learning with pretrained weights
  • freeze_layers_list (list, optional) – List of model layer names to be frozen. Use this for freezing pretrained weights from other ml4ir models
  • compile_keras_model (bool, optional) – Whether the keras model loaded from disk should be compiled with loss, metrics and an optimizer
  • output_name (str, optional) – Name of the output tensorflow node that captures the score
  • logger (Logger, optional) – logging handler for status messages
  • eval_config (dict) – A dictionary of Evaluation config parameters
is_compiled = None

Specify inputs to the model

Individual input nodes are defined for each feature. Each data point represents features for all records in a single query

classmethod from_relevance_scorer(feature_config: ml4ir.base.features.feature_config.FeatureConfig, interaction_model: ml4ir.base.model.scoring.interaction_model.InteractionModel, model_config: dict, loss: ml4ir.base.model.losses.loss_base.RelevanceLossBase, metrics: List[Union[keras.metrics.base_metric.Metric, str]], optimizer: keras.optimizers.optimizer_v2.optimizer_v2.OptimizerV2, tfrecord_type: str, file_io: ml4ir.base.io.file_io.FileIO, model_file: Optional[str] = None, initialize_layers_dict: dict = {}, freeze_layers_list: list = [], compile_keras_model: bool = False, output_name: str = 'score', logger=None)

Create a RelevanceModel with default Scorer function constructed from an InteractionModel

Parameters:
  • feature_config (FeatureConfig object) – FeatureConfig object that defines the features to be loaded in the dataset and the preprocessing functions to be applied to each of them
  • tfrecord_type ({"example", "sequence_example"}) – Type of the TFRecord protobuf message used for TFRecordDataset
  • file_io (FileIO object) – file I/O handler objects for reading and writing data
  • interaction_model (InteractionModel object) – InteractionModel object that converts input features into a dense feature representation
  • loss (RelevanceLossBase object) – Loss object defining the final activation layer and the loss function
  • metrics (list) – List of keras Metric classes that will be used for evaluating the trained model
  • optimizer (Optimizer) – Tensorflow keras optimizer to be used for training the model
  • model_file (str, optional) – Path to pretrained model file to be loaded for evaluation or retraining
  • initialize_layers_dict (dict, optional) – Dictionary of tensorflow layer names mapped to the path of pretrained weights. Use this for transfer learning with pretrained weights
  • freeze_layers_list (list, optional) – List of model layer names to be frozen. Use this for freezing pretrained weights from other ml4ir models
  • compile_keras_model (bool, optional) – Whether the keras model loaded from disk should be compiled with loss, metrics and an optimizer
  • output_name (str, optional) – Name of the output tensorflow node that captures the score
  • logger (Logger, optional) – logging handler for status messages
Returns:

RelevanceModel object with a default scorer built with a custom InteractionModel

Return type:

RelevanceModel

classmethod from_univariate_interaction_model(model_config, feature_config: ml4ir.base.features.feature_config.FeatureConfig, tfrecord_type: str, loss: ml4ir.base.model.losses.loss_base.RelevanceLossBase, metrics: List[Union[keras.metrics.base_metric.Metric, str]], optimizer: keras.optimizers.optimizer_v2.optimizer_v2.OptimizerV2, feature_layer_keys_to_fns: dict = {}, model_file: Optional[str] = None, initialize_layers_dict: dict = {}, freeze_layers_list: list = [], compile_keras_model: bool = False, output_name: str = 'score', max_sequence_size: int = 0, file_io: ml4ir.base.io.file_io.FileIO = None, logger=None)

Create a RelevanceModel with default UnivariateInteractionModel

Parameters:
  • feature_config (FeatureConfig object) – FeatureConfig object that defines the features to be loaded in the dataset and the preprocessing functions to be applied to each of them
  • model_config (dict) – dictionary defining the dense model architecture
  • tfrecord_type ({"example", "sequence_example"}) – Type of the TFRecord protobuf message used for TFRecordDataset
  • file_io (FileIO object) – file I/O handler objects for reading and writing data
  • loss (RelevanceLossBase object) – Loss object defining the final activation layer and the loss function
  • metrics (list) – List of keras Metric classes that will be used for evaluating the trained model
  • optimizer (Optimizer) – Tensorflow keras optimizer to be used for training the model
  • feature_layer_keys_to_fns (dict) – Dictionary of custom feature transformation functions to be applied on the input features as part of the InteractionModel
  • model_file (str, optional) – Path to pretrained model file to be loaded for evaluation or retraining
  • initialize_layers_dict (dict, optional) – Dictionary of tensorflow layer names mapped to the path of pretrained weights. Use this for transfer learning with pretrained weights
  • freeze_layers_list (list, optional) – List of model layer names to be frozen. Use this for freezing pretrained weights from other ml4ir models
  • compile_keras_model (bool, optional) – Whether the keras model loaded from disk should be compiled with loss, metrics and an optimizer
  • output_name (str, optional) – Name of the output tensorflow node that captures the score
  • max_sequence_size (int, optional) – Maximum length of the sequence to be used for SequenceExample protobuf objects
  • logger (Logger, optional) – logging handler for status messages
Returns:

RelevanceModel object with a UnivariateInteractionModel

Return type:

RelevanceModel

build(dataset: ml4ir.base.data.relevance_dataset.RelevanceDataset)

Build the model layers and connect them to form a network

Parameters:dataset (RelevanceDataset) – RelevanceDataset object used to initialize the weights and input/output spec for the network

Notes

Because we build the model using keras model subclassing API, it has no understanding of the actual inputs to expect. So we do one forward pass to initialize all the internal weights and connections
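
The note above can be illustrated with a toy layer in plain Python. This is only a sketch of the general idea (weights that cannot be allocated until the first input is seen); `LazyDense` is a hypothetical class for illustration and is not part of ml4ir or keras.

```python
class LazyDense:
    """Toy dense layer whose weights are created on the first call."""

    def __init__(self, units):
        self.units = units
        self.weights = None  # unknown until the input width is seen

    def __call__(self, inputs):
        if self.weights is None:
            # First forward pass: infer the input width and allocate weights.
            n_features = len(inputs[0])
            self.weights = [[0.0] * self.units for _ in range(n_features)]
        # Plain matrix multiply: one output row per input row.
        return [
            [sum(x * w[j] for x, w in zip(row, self.weights))
             for j in range(self.units)]
            for row in inputs
        ]


layer = LazyDense(units=2)
built_before = layer.weights is not None  # nothing allocated yet
outputs = layer([[1.0, 2.0, 3.0]])        # one batch triggers the build
built_after = layer.weights is not None   # weights now have shape (3, 2)
```

A single batch is enough to fix the weight shapes, which is presumably why build takes a RelevanceDataset as its argument.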

define_scheduler_as_callback(monitor_metric, model_config)

Add a reduce-learning-rate-on-plateau scheduler as a callback, if one is specified

Parameters:
  • monitor_metric (string) – The metric to be monitored by the callback
  • model_config (dict) – dictionary defining the dense model architecture
Returns:

The created scheduler callback object.

Return type:

reduce_lr
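
As a rough illustration of what a reduce-on-plateau scheduler does, the following plain Python sketch lowers the learning rate whenever the monitored loss stops improving. The `factor`, `patience` and `min_lr` names and defaults are assumptions for this example; the keys ml4ir actually reads from model_config are not shown in this reference.

```python
def reduce_lr_on_plateau(losses, initial_lr, factor=0.5, patience=2, min_lr=1e-6):
    """Return the learning rate in effect after each epoch."""
    lr = initial_lr
    best = float("inf")
    wait = 0  # epochs since the monitored loss last improved
    history = []
    for loss in losses:
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # Plateau detected: shrink the learning rate and start over.
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history


lr_history = reduce_lr_on_plateau([1.0, 0.9, 0.9, 0.9, 0.9, 0.9], initial_lr=0.1)
```

In this run the rate is halved after every two consecutive epochs without improvement.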

fit(dataset: ml4ir.base.data.relevance_dataset.RelevanceDataset, num_epochs: int, models_dir: str, logs_dir: Optional[str] = None, logging_frequency: int = 25, monitor_metric: str = '', monitor_mode: str = '', patience=2)

Trains model for defined number of epochs and returns the training and validation metrics as a dictionary

Parameters:
  • dataset (RelevanceDataset object) – RelevanceDataset object to be used for training and validation
  • num_epochs (int) – Value specifying number of epochs to train for
  • models_dir (str) – Directory to save model checkpoints
  • logs_dir (str, optional) – Directory to save model logs. If set to False, no progress logs will be written
  • logging_frequency (int, optional) – Number of batches between logging results
  • monitor_metric (str, optional) – Name of the metric to monitor for early stopping, checkpointing
  • monitor_mode ({"max", "min"}) – Whether to maximize or minimize the monitoring metric
  • patience (int) – Number of epochs to wait before early stopping
Returns:

train_metrics – Train and validation metrics in a single dictionary where key is metric name and value is floating point metric value. This dictionary will be used for experiment tracking for each ml4ir run

Return type:

dict
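
The interaction of monitor_metric, monitor_mode and patience can be sketched in plain Python. This assumes the usual early-stopping convention (stop once the metric has failed to improve for `patience` consecutive epochs); the helper name `early_stopping_epoch` is hypothetical and the exact behaviour inside ml4ir may differ.

```python
def early_stopping_epoch(metric_values, monitor_mode="max", patience=2):
    """Return the epoch index at which training would stop, or None."""
    if monitor_mode not in ("max", "min"):
        raise ValueError("monitor_mode must be 'max' or 'min'")

    def improved(new, best):
        return new > best if monitor_mode == "max" else new < best

    best = None
    wait = 0  # consecutive epochs without improvement
    for epoch, value in enumerate(metric_values):
        if best is None or improved(value, best):
            best = value
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# A metric to maximise that peaks at epoch 1 and then degrades.
stop_at = early_stopping_epoch([0.50, 0.60, 0.59, 0.58, 0.70], "max", patience=2)
# A loss to minimise that keeps improving, so training never stops early.
never_stops = early_stopping_epoch([1.0, 0.8, 0.7, 0.6], "min", patience=2)
```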

predict(test_dataset: tensorflow.python.data.ops.readers.TFRecordDatasetV2, inference_signature: str = 'serving_default', additional_features: dict = {}, logs_dir: Optional[str] = None, logging_frequency: int = 25)

Predict the scores on the test dataset using the trained model

Parameters:
  • test_dataset (Dataset object) – Dataset object for which predictions are to be made
  • inference_signature (str, optional) – If using a SavedModel for prediction, specify the inference signature to be used for computing scores
  • additional_features (dict, optional) – Dictionary containing new feature name and function definition to compute them. Use this to compute additional features from the scores. For example, converting ranking scores for each document into ranks for the query
  • logs_dir (str, optional) – Path to directory to save logs
  • logging_frequency (int) – Value representing how often (in batches) to log status
Returns:

pandas DataFrame containing the predictions on the test dataset made with the RelevanceModel

Return type:

pd.DataFrame
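
The additional_features example mentioned above (converting scores into ranks) can be sketched as follows. This version works on plain Python lists for readability; a function actually passed to predict would presumably need to operate on the library's own data structures, and the name `scores_to_ranks` is hypothetical.

```python
def scores_to_ranks(scores):
    """Map each score to its 1-based rank within the query (highest score = 1)."""
    # Indices ordered from best to worst score; ties keep their original order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


ranks = scores_to_ranks([0.2, 0.9, 0.5])  # → [3, 1, 2]
```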

evaluate(test_dataset: tensorflow.python.data.ops.readers.TFRecordDatasetV2, inference_signature: str = None, additional_features: dict = {}, group_metrics_min_queries: int = 50, logs_dir: Optional[str] = None, logging_frequency: int = 25, compute_intermediate_stats: bool = True)

Evaluate the RelevanceModel

Parameters:
  • test_dataset (an instance of tf.data.Dataset) – Dataset on which the model is evaluated
  • inference_signature (str, optional) – If using a SavedModel for prediction, specify the inference signature to be used for computing scores
  • additional_features (dict, optional) – Dictionary containing new feature name and function definition to compute them. Use this to compute additional features from the scores. For example, converting ranking scores for each document into ranks for the query
  • group_metrics_min_queries (int, optional) – Minimum count threshold per group to be considered for computing groupwise metrics
  • logs_dir (str, optional) – Path to directory to save logs
  • logging_frequency (int) – Value representing how often (in batches) to log status
  • compute_intermediate_stats (bool) – Determines if group metrics and other intermediate stats on the test set should be computed
Returns:

  • df_overall_metrics (pd.DataFrame object) – pd.DataFrame containing overall metrics
  • df_groupwise_metrics (pd.DataFrame object) – pd.DataFrame containing groupwise metrics if group_metric_keys are defined in the FeatureConfig
  • metrics_dict (dict) – metrics as a dictionary of metric names mapping to values

Notes

You can directly do a model.evaluate() only if the keras model is compiled

Override this method to implement your own evaluation metrics.
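
To make the role of group_metrics_min_queries concrete, here is a plain Python sketch that computes a groupwise mean reciprocal rank and drops groups with too few queries. The choice of MRR, the record layout and the function name `groupwise_mrr` are assumptions for illustration; they are not taken from the ml4ir implementation.

```python
from collections import defaultdict


def groupwise_mrr(query_records, min_queries=50):
    """query_records: iterable of (group, rank_of_relevant_record) per query."""
    reciprocal_ranks = defaultdict(list)
    for group, rank in query_records:
        reciprocal_ranks[group].append(1.0 / rank)
    # Only groups with enough queries are reported.
    return {
        group: sum(values) / len(values)
        for group, values in reciprocal_ranks.items()
        if len(values) >= min_queries
    }


records = [("en", 1), ("en", 2), ("en", 4), ("fr", 1)]
metrics = groupwise_mrr(records, min_queries=2)  # "fr" has one query and is dropped
```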

run_ttest(mean, variance, n, ttest_pvalue_threshold)

Compute the paired t-test statistic and its p-value given the mean, variance and sample count

Parameters:
  • mean (float) – The mean of the rank differences for the entire dataset
  • variance (float) – The variance of the rank differences for the entire dataset
  • n (int) – The number of samples in the entire dataset
  • ttest_pvalue_threshold (float) – P-value threshold for student t-test
  • metrics_dict (dict) – dictionary of metrics to keep track of
Returns:

t_test_metrics_dict – A dictionary with the t-test metrics recorded.

Return type:

dict

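
The statistic described above can be computed from the three summary values alone. The sketch below uses only the standard library, so the p-value relies on a normal approximation to the Student t distribution (reasonable for large n); the library itself may well use the exact t distribution. The function name and the keys of the returned dictionary are hypothetical.

```python
import math


def paired_ttest_from_moments(mean, variance, n, pvalue_threshold=0.05):
    """Paired t-test from the mean and variance of per-sample differences."""
    std_error = math.sqrt(variance / n)
    t_statistic = mean / std_error
    # Assumption: two-sided p-value from the normal approximation,
    # which is close to the t distribution when n is large.
    p_value = math.erfc(abs(t_statistic) / math.sqrt(2.0))
    return {
        "t_statistic": t_statistic,
        "p_value": p_value,
        "is_significant": p_value < pvalue_threshold,
    }


result = paired_ttest_from_moments(mean=0.5, variance=4.0, n=100)
```

With these inputs the standard error is 0.2, so the statistic is 2.5 and the difference is significant at the 0.05 level.
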
save(models_dir: str, preprocessing_keys_to_fns={}, postprocessing_fn=None, required_fields_only: bool = True, pad_sequence: bool = False, sub_dir: str = 'final', dataset: Optional[ml4ir.base.data.relevance_dataset.RelevanceDataset] = None, experiment_details: Optional[dict] = None)

Save the RelevanceModel as a tensorflow SavedModel to the models_dir

There are two different serving signatures currently used to save the model:

  • default: default keras model without any pre/post processing wrapper
  • tfrecord: serving signature that allows keras model to be served using TFRecord proto messages.
    Allows definition of custom pre/post processing logic

Additionally, each model layer is also saved as a separate numpy zipped array to enable transfer learning with other ml4ir models.

Parameters:
  • models_dir (str) – path to directory to save the model
  • preprocessing_keys_to_fns (dict) – dictionary mapping function names to tf.functions that should be saved in the preprocessing step of the tfrecord serving signature
  • postprocessing_fn (function) – custom tensorflow compatible postprocessing function to be used at serving time. Saved as part of the postprocessing layer of the tfrecord serving signature
  • required_fields_only (bool) – boolean value defining if only required fields need to be added to the tfrecord parsing function at serving time
  • pad_sequence (bool, optional) – Value defining if sequences should be padded for SequenceExample proto inputs at serving time. Set this to False if you do not want padded scores to be handled.
  • sub_dir (str, optional) – sub directory name to save the model into
  • dataset (RelevanceDataset object) – RelevanceDataset object that can optionally be passed to be used by downstream jobs that want to save the data along with the model. Note that this feature is currently unimplemented and is left to users to override and customize.
  • experiment_details (dict) – Dictionary containing metadata and results about the current experiment

Notes

All the functions passed under preprocessing_keys_to_fns here must be serializable tensor graph operations

load(model_file: str) → keras.engine.training.Model

Loads model from the SavedModel file specified

Parameters:model_file (str) – path to file with saved tf keras model
Returns:Tensorflow keras model loaded from file
Return type:tf.keras.Model

Notes

Retraining is currently not supported. It would require compiling the model with the right loss and optimizer states

load_weights(model_file: str)

Load saved model with compile=False

Parameters:model_file (str) – path to file with saved tf keras model

calibrate(relevance_dataset, logger, logs_dir_local, **kwargs) → Tuple[numpy.ndarray, ...]

Calibrate the model with temperature scaling

Parameters:
  • relevance_dataset (RelevanceDataset) – RelevanceDataset object to be used for training and evaluating temperature scaling
  • logger (Logger) – Logger object to log events
  • logs_dir_local – path to save the calibration results (zipped csv file containing original probabilities, calibrated probabilities, …)
Returns:

Optimizer output containing the temperature value learned during temperature scaling

Return type:

Union[np.ndarray, Tuple[np.ndarray, …]]

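
Temperature scaling fits a single scalar T that divides the logits so that the softmax probabilities better match observed accuracy. The sketch below picks T by a grid search over the negative log-likelihood; this is a swapped-in, simplified technique for illustration, since the "optimizer output" above suggests the library uses a numerical optimizer instead. All names here are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def negative_log_likelihood(all_logits, labels, temperature):
    losses = [
        -math.log(softmax(logits, temperature)[label])
        for logits, label in zip(all_logits, labels)
    ]
    return sum(losses) / len(losses)


def fit_temperature(all_logits, labels, candidates):
    """Return the candidate temperature with the lowest validation NLL."""
    return min(
        candidates,
        key=lambda t: negative_log_likelihood(all_logits, labels, t),
    )


# An overconfident toy model: large margins, but one of four predictions is wrong.
logits = [[4.0, 0.0], [4.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
labels = [0, 0, 1, 1]
candidates = [0.5 + 0.25 * i for i in range(19)]  # 0.5, 0.75, ..., 5.0
best_temperature = fit_temperature(logits, labels, candidates)
```

Because the toy model is overconfident, the selected temperature is greater than 1, which softens its probabilities.
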
add_temperature_layer(temperature: float = 1.0, layer_name: str = 'temperature_layer')

Add a temperature layer to the input of the last activation (softmax) layer. The inputs to the last layer of this RelevanceModel are divided by a temperature value

Parameters:
  • temperature (float) – a scalar value to scale the last activation layer inputs
  • layer_name (str) – name of the temperature scaling layer
Returns:

Updated RelevanceModel object with the temperature layer

Return type:

RelevanceModel
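
The effect of such a layer can be shown in plain Python: the logits are divided by the temperature before the softmax, which changes how peaked the probabilities are without changing their order. `TemperatureLayer` here is a hypothetical stand-in for illustration, not the layer class used by ml4ir.

```python
import math


class TemperatureLayer:
    """Divide incoming logits by a fixed temperature."""

    def __init__(self, temperature=1.0):
        if temperature <= 0:
            raise ValueError("temperature must be positive")
        self.temperature = temperature

    def __call__(self, logits):
        return [z / self.temperature for z in logits]


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
plain = softmax(logits)
softened = softmax(TemperatureLayer(temperature=2.0)(logits))
```

A temperature of 1.0 leaves the logits unchanged, while values above 1.0 flatten the distribution and values below 1.0 sharpen it.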

RankingModel

class ml4ir.applications.ranking.model.ranking_model.RankingModel(feature_config: ml4ir.base.features.feature_config.FeatureConfig, tfrecord_type: str, file_io: ml4ir.base.io.file_io.FileIO, scorer: Optional[ml4ir.base.model.scoring.scoring_model.RelevanceScorer] = None, metrics: List[Union[keras.metrics.base_metric.Metric, str]] = [], optimizer: Optional[keras.optimizers.optimizer_v2.optimizer_v2.OptimizerV2] = None, model_file: Optional[str] = None, initialize_layers_dict: dict = {}, freeze_layers_list: list = [], compile_keras_model: bool = False, output_name: str = 'score', logger=None, eval_config: dict = {})

Bases: ml4ir.base.model.relevance_model.RelevanceModel

Constructor to instantiate a RelevanceModel that can be used for training and evaluating the search ML task

Parameters:
  • feature_config (FeatureConfig object) – FeatureConfig object that defines the features to be loaded in the dataset and the preprocessing functions to be applied to each of them
  • tfrecord_type ({"example", "sequence_example"}) – Type of the TFRecord protobuf message used for TFRecordDataset
  • file_io (FileIO object) – file I/O handler objects for reading and writing data
  • scorer (RelevanceScorer object) – Scorer object that wraps an InteractionModel and converts input features into scores
  • metrics (list) – List of keras Metric objects/strings that will be used for evaluating the trained model
  • optimizer (Optimizer) – Tensorflow keras optimizer to be used for training the model
  • model_file (str, optional) – Path to pretrained model file to be loaded for evaluation or retraining
  • initialize_layers_dict (dict, optional) – Dictionary of tensorflow layer names mapped to the path of pretrained weights. Use this for transfer learning with pretrained weights
  • freeze_layers_list (list, optional) – List of model layer names to be frozen. Use this for freezing pretrained weights from other ml4ir models
  • compile_keras_model (bool, optional) – Whether the keras model loaded from disk should be compiled with loss, metrics and an optimizer
  • output_name (str, optional) – Name of the output tensorflow node that captures the score
  • logger (Logger, optional) – logging handler for status messages
  • eval_config (dict) – A dictionary of Evaluation config parameters

predict(test_dataset: tensorflow.python.data.ops.readers.TFRecordDatasetV2, inference_signature: str = 'serving_default', additional_features: dict = {}, logs_dir: Optional[str] = None, logging_frequency: int = 25)

Predict the scores on the test dataset using the trained model

Parameters:
  • test_dataset (Dataset object) – Dataset object for which predictions are to be made
  • inference_signature (str, optional) – If using a SavedModel for prediction, specify the inference signature to be used for computing scores
  • additional_features (dict, optional) – Dictionary containing new feature name and function definition to compute them. Use this to compute additional features from the scores. For example, converting ranking scores for each document into ranks for the query
  • logs_dir (str, optional) – Path to directory to save logs
  • logging_frequency (int) – Value representing how often (in batches) to log status
Returns:

pandas DataFrame containing the predictions on the test dataset made with the RelevanceModel

Return type:

pd.DataFrame

evaluate(test_dataset: tensorflow.python.data.ops.readers.TFRecordDatasetV2, inference_signature: str = None, additional_features: dict = {}, group_metrics_min_queries: int = 50, logs_dir: Optional[str] = None, logging_frequency: int = 25, compute_intermediate_stats: bool = True)

Evaluate the RelevanceModel

Parameters:
  • test_dataset (an instance of tf.data.Dataset) – Dataset on which the model is evaluated
  • inference_signature (str, optional) – If using a SavedModel for prediction, specify the inference signature to be used for computing scores
  • additional_features (dict, optional) – Dictionary containing new feature name and function definition to compute them. Use this to compute additional features from the scores. For example, converting ranking scores for each document into ranks for the query
  • group_metrics_min_queries (int, optional) – Minimum count threshold per group to be considered for computing groupwise metrics
  • logs_dir (str, optional) – Path to directory to save logs
  • logging_frequency (int) – Value representing how often (in batches) to log status
  • compute_intermediate_stats (bool) – [Currently ignored] Determines if group metrics and other intermediate stats on the test set should be computed
Returns:

  • df_overall_metrics (pd.DataFrame object) – pd.DataFrame containing overall metrics
  • df_groupwise_metrics (pd.DataFrame object) – pd.DataFrame containing groupwise metrics if group_metric_keys are defined in the FeatureConfig
  • metrics_dict (dict) – metrics as a dictionary of metric names mapping to values

Notes

You can directly do a model.evaluate() only if the keras model is compiled

Override this method to implement your own evaluation metrics.

save(models_dir: str, preprocessing_keys_to_fns={}, postprocessing_fn=None, required_fields_only: bool = True, pad_sequence: bool = False, dataset: Optional[ml4ir.base.data.relevance_dataset.RelevanceDataset] = None, experiment_details: Optional[dict] = None)

Save the RelevanceModel as a tensorflow SavedModel to the models_dir. Additionally, sets the score for the padded records to 0

There are two different serving signatures currently used to save the model:

  • default: default keras model without any pre/post processing wrapper
  • tfrecord: serving signature that allows keras model to be served using TFRecord proto messages.
    Allows definition of custom pre/post processing logic

Additionally, each model layer is also saved as a separate numpy zipped array to enable transfer learning with other ml4ir models.

Parameters:
  • models_dir (str) – path to directory to save the model
  • preprocessing_keys_to_fns (dict) – dictionary mapping function names to tf.functions that should be saved in the preprocessing step of the tfrecord serving signature
  • postprocessing_fn (function) – custom tensorflow compatible postprocessing function to be used at serving time. Saved as part of the postprocessing layer of the tfrecord serving signature
  • required_fields_only (bool) – boolean value defining if only required fields need to be added to the tfrecord parsing function at serving time
  • pad_sequence (bool, optional) – Value defining if sequences should be padded for SequenceExample proto inputs at serving time. Set this to False if you do not want padded scores to be handled.
  • dataset (RelevanceDataset object) – RelevanceDataset object that can optionally be passed to be used by downstream jobs that want to save the data along with the model. Note that this feature is currently unimplemented and is left to users to override and customize.
  • experiment_details (dict) – Dictionary containing metadata and results about the current experiment

Notes

All the functions passed under preprocessing_keys_to_fns here must be serializable tensor graph operations
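
The behaviour described for RankingModel.save, setting the score of padded records to 0, can be pictured with a simple mask. This is a plain Python sketch under the assumption that a 0/1 mask marks padded positions; the function name `zero_padded_scores` and the mask layout are hypothetical.

```python
def zero_padded_scores(scores, mask):
    """Keep scores where mask is 1 (real record) and emit 0.0 where it is 0 (padding)."""
    if len(scores) != len(mask):
        raise ValueError("scores and mask must have the same length")
    return [score if m == 1 else 0.0 for score, m in zip(scores, mask)]


# A query with two real records padded out to a sequence length of four.
masked = zero_padded_scores([0.7, 0.2, 0.4, 0.9], [1, 1, 0, 0])  # → [0.7, 0.2, 0.0, 0.0]
```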