Relevance Models

RelevanceModel

class ml4ir.base.model.relevance_model.RelevanceModel(feature_config: ml4ir.base.features.feature_config.FeatureConfig, tfrecord_type: str, file_io: ml4ir.base.io.file_io.FileIO, scorer: Optional[ml4ir.base.model.scoring.scoring_model.RelevanceScorer] = None, metrics: List[Union[keras.metrics.base_metric.Metric, str]] = [], optimizer: Optional[keras.optimizers.optimizer_v2.optimizer_v2.OptimizerV2] = None, model_file: Optional[str] = None, initialize_layers_dict: dict = {}, freeze_layers_list: list = [], compile_keras_model: bool = False, output_name: str = 'score', logger=None, eval_config: dict = {})

Bases: object

Constructor to instantiate a RelevanceModel that can be used for training and evaluating the search ML task

Parameters:

feature_config (FeatureConfig object) – FeatureConfig object that defines the features to be loaded in the dataset and the preprocessing functions to be applied to each of them
tfrecord_type ({"example", "sequence_example"}) – Type of the TFRecord protobuf message used for TFRecordDataset
file_io (FileIO object) – file I/O handler objects for reading and writing data
scorer (RelevanceScorer object) – Scorer object that wraps an InteractionModel and converts input features into scores
metrics (list) – List of keras Metric objects/strings that will be used for evaluating the trained model
optimizer (Optimizer) – Tensorflow keras optimizer to be used for training the model
model_file (str, optional) – Path to pretrained model file to be loaded for evaluation or retraining
initialize_layers_dict (dict, optional) – Dictionary of tensorflow layer names mapped to the path of pretrained weights. Use this for transfer learning with pretrained weights
freeze_layers_list (list, optional) – List of model layer names to be frozen. Use this for freezing pretrained weights from other ml4ir models
compile_keras_model (bool, optional) – Whether the keras model loaded from disk should be compiled with loss, metrics and an optimizer
output_name (str, optional) – Name of the output tensorflow node that captures the score
logger (Logger, optional) – logging handler for status messages
eval_config (dict) – A dictionary of Evaluation config parameters

is_compiled = None

Specify inputs to the model.

Individual input nodes are defined for each feature. Each data point represents features for all records in a single query.

classmethod from_relevance_scorer(feature_config: ml4ir.base.features.feature_config.FeatureConfig, interaction_model: ml4ir.base.model.scoring.interaction_model.InteractionModel, model_config: dict, loss: ml4ir.base.model.losses.loss_base.RelevanceLossBase, metrics: List[Union[keras.metrics.base_metric.Metric, str]], optimizer: keras.optimizers.optimizer_v2.optimizer_v2.OptimizerV2, tfrecord_type: str, file_io: ml4ir.base.io.file_io.FileIO, model_file: Optional[str] = None, initialize_layers_dict: dict = {}, freeze_layers_list: list = [], compile_keras_model: bool = False, output_name: str = 'score', logger=None)

Create a RelevanceModel with default Scorer function constructed from an InteractionModel

Parameters:

feature_config (FeatureConfig object) – FeatureConfig object that defines the features to be loaded in the dataset and the preprocessing functions to be applied to each of them
tfrecord_type ({"example", "sequence_example"}) – Type of the TFRecord protobuf message used for TFRecordDataset
file_io (FileIO object) – file I/O handler objects for reading and writing data
interaction_model (InteractionModel object) – InteractionModel object that converts input features into a dense feature representation
loss (RelevanceLossBase object) – Loss object defining the final activation layer and the loss function
metrics (list) – List of keras Metric classes that will be used for evaluating the trained model
optimizer (Optimizer) – Tensorflow keras optimizer to be used for training the model
model_file (str, optional) – Path to pretrained model file to be loaded for evaluation or retraining
initialize_layers_dict (dict, optional) – Dictionary of tensorflow layer names mapped to the path of pretrained weights. Use this for transfer learning with pretrained weights
freeze_layers_list (list, optional) – List of model layer names to be frozen. Use this for freezing pretrained weights from other ml4ir models
compile_keras_model (bool, optional) – Whether the keras model loaded from disk should be compiled with loss, metrics and an optimizer
output_name (str, optional) – Name of the output tensorflow node that captures the score
logger (Logger, optional) – logging handler for status messages

Returns:	RelevanceModel object with a default scorer built with a custom InteractionModel
Return type:	RelevanceModel

classmethod from_univariate_interaction_model(model_config, feature_config: ml4ir.base.features.feature_config.FeatureConfig, tfrecord_type: str, loss: ml4ir.base.model.losses.loss_base.RelevanceLossBase, metrics: List[Union[keras.metrics.base_metric.Metric, str]], optimizer: keras.optimizers.optimizer_v2.optimizer_v2.OptimizerV2, feature_layer_keys_to_fns: dict = {}, model_file: Optional[str] = None, initialize_layers_dict: dict = {}, freeze_layers_list: list = [], compile_keras_model: bool = False, output_name: str = 'score', max_sequence_size: int = 0, file_io: ml4ir.base.io.file_io.FileIO = None, logger=None)

Create a RelevanceModel with default UnivariateInteractionModel

Parameters:

feature_config (FeatureConfig object) – FeatureConfig object that defines the features to be loaded in the dataset and the preprocessing functions to be applied to each of them
model_config (dict) – dictionary defining the dense model architecture
tfrecord_type ({"example", "sequence_example"}) – Type of the TFRecord protobuf message used for TFRecordDataset
file_io (FileIO object) – file I/O handler objects for reading and writing data
loss (RelevanceLossBase object) – Loss object defining the final activation layer and the loss function
metrics (list) – List of keras Metric classes that will be used for evaluating the trained model
optimizer (Optimizer) – Tensorflow keras optimizer to be used for training the model
feature_layer_keys_to_fns (dict) – Dictionary of custom feature transformation functions to be applied on the input features as part of the InteractionModel
model_file (str, optional) – Path to pretrained model file to be loaded for evaluation or retraining
initialize_layers_dict (dict, optional) – Dictionary of tensorflow layer names mapped to the path of pretrained weights. Use this for transfer learning with pretrained weights
freeze_layers_list (list, optional) – List of model layer names to be frozen. Use this for freezing pretrained weights from other ml4ir models
compile_keras_model (bool, optional) – Whether the keras model loaded from disk should be compiled with loss, metrics and an optimizer
output_name (str, optional) – Name of the output tensorflow node that captures the score
max_sequence_size (int, optional) – Maximum length of the sequence to be used for SequenceExample protobuf objects
logger (Logger, optional) – logging handler for status messages

Returns:	RelevanceModel object with a UnivariateInteractionModel
Return type:	RelevanceModel

build(dataset: ml4ir.base.data.relevance_dataset.RelevanceDataset)

Build the model layers and connect them to form a network

Parameters:	dataset (RelevanceDataset) – RelevanceDataset object used to initialize the weights and input/output spec for the network

Notes

Because the model is built with the keras model subclassing API, it does not know which inputs to expect until it has been called. One forward pass is therefore run to initialize all the internal weights and connections.

define_scheduler_as_callback(monitor_metric, model_config)

Add a reduce-learning-rate-on-plateau scheduler as a callback, if specified

Parameters:

monitor_metric (string) – The metric to be monitored by the callback
model_config (dict) – dictionary defining the dense model architecture

Returns:	The created scheduler callback object.
Return type:	reduce_lr
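
The reduce-on-plateau idea named above can be sketched in plain Python. This is a generic illustration of the rule, not ml4ir's or keras's implementation; the function name and the initial_lr, factor, patience and min_lr arguments are assumptions made for the example.

```python
def reduce_lr_on_plateau(metric_history, initial_lr=0.01, factor=0.5,
                         patience=2, mode="min", min_lr=1e-6):
    """Return the learning rate in effect at each epoch under a plateau rule.

    All argument names and defaults are illustrative, not ml4ir settings.
    """
    lr = initial_lr
    best = None
    wait = 0
    lrs = []
    for value in metric_history:
        lrs.append(lr)  # learning rate used during this epoch
        if best is None:
            improved = True
        elif mode == "min":
            improved = value < best
        else:
            improved = value > best
        if improved:
            best = value
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # Plateau detected: shrink the learning rate and restart the count
                lr = max(lr * factor, min_lr)
                wait = 0
    return lrs


val_loss = [1.0, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7]
schedule = reduce_lr_on_plateau(val_loss)
```

With the hypothetical validation losses above, the rate stays at its initial value for four epochs and is halved once the monitored metric has failed to improve for two consecutive epochs.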

fit(dataset: ml4ir.base.data.relevance_dataset.RelevanceDataset, num_epochs: int, models_dir: str, logs_dir: Optional[str] = None, logging_frequency: int = 25, monitor_metric: str = '', monitor_mode: str = '', patience=2)

Trains model for defined number of epochs and returns the training and validation metrics as a dictionary

Parameters:

dataset (RelevanceDataset object) – RelevanceDataset object to be used for training and validation
num_epochs (int) – Value specifying number of epochs to train for
models_dir (str) – Directory to save model checkpoints
logs_dir (str, optional) – Directory to save model logs. If set to False, no progress logs will be written
logging_frequency (int, optional) – Number of batches between logging of results
monitor_metric (str, optional) – Name of the metric to monitor for early stopping and checkpointing
monitor_mode ({"max", "min"}) – Whether to maximize or minimize the monitoring metric
patience (int) – Number of epochs to wait before early stopping

Returns:	train_metrics – Train and validation metrics in a single dictionary where key is metric name and value is floating point metric value. This dictionary will be used for experiment tracking for each ml4ir run
Return type:	dict
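
How monitor_metric, monitor_mode and patience interact can be illustrated with a minimal sketch of a generic early-stopping rule. It is written under the assumption that training stops once the monitored metric has not improved for patience consecutive epochs; the function name and the sample metric values are hypothetical, and the actual callback used by fit may differ in detail.

```python
def epochs_before_early_stop(metric_history, monitor_mode="max", patience=2):
    """Return (last epoch run, best metric value) under a patience rule."""
    best = None
    epochs_without_improvement = 0
    for epoch, value in enumerate(metric_history, start=1):
        if best is None:
            improved = True
        elif monitor_mode == "max":
            improved = value > best
        else:
            improved = value < best
        if improved:
            best = value
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                # No improvement for `patience` epochs in a row: stop here
                return epoch, best
    return len(metric_history), best


# Hypothetical per-epoch validation metric where larger is better
stopped = epochs_before_early_stop([0.60, 0.65, 0.64, 0.63, 0.70],
                                   monitor_mode="max", patience=2)
```

In this example training would stop after the fourth epoch, so the improvement in the fifth epoch is never seen. A larger patience trades longer training for a lower risk of stopping on a temporary dip.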

predict(test_dataset: tensorflow.python.data.ops.readers.TFRecordDatasetV2, inference_signature: str = 'serving_default', additional_features: dict = {}, logs_dir: Optional[str] = None, logging_frequency: int = 25)

Predict the scores on the test dataset using the trained model

Parameters:

test_dataset (Dataset object) – Dataset object for which predictions are to be made
inference_signature (str, optional) – If using a SavedModel for prediction, specify the inference signature to be used for computing scores
additional_features (dict, optional) – Dictionary containing new feature name and function definition to compute them. Use this to compute additional features from the scores. For example, converting ranking scores for each document into ranks for the query
logs_dir (str, optional) – Path to directory to save logs
logging_frequency (int) – Value representing how often (in batches) to log status

Returns:	pandas DataFrame containing the predictions on the test dataset made with the RelevanceModel
Return type:	pd.DataFrame
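
The additional_features description mentions converting ranking scores into ranks for each query. The sketch below shows that conversion on plain Python lists; the function name and the dictionary layout are assumptions for illustration, and the functions actually passed through additional_features may take a different signature.

```python
def add_ranks(scores_by_query):
    """Map each query's list of scores to 1-based ranks (highest score = rank 1)."""
    ranks_by_query = {}
    for query_id, scores in scores_by_query.items():
        # Sort document positions by descending score; ties keep input order
        order = sorted(range(len(scores)), key=lambda i: -scores[i])
        ranks = [0] * len(scores)
        for rank, position in enumerate(order, start=1):
            ranks[position] = rank
        ranks_by_query[query_id] = ranks
    return ranks_by_query


# Hypothetical predicted scores for two queries
predicted = {"q1": [0.2, 0.9, 0.5], "q2": [1.5, 0.1]}
ranks = add_ranks(predicted)
```

Ranks are computed independently per query, so a score is only compared with the other documents returned for the same query.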

evaluate(test_dataset: tensorflow.python.data.ops.readers.TFRecordDatasetV2, inference_signature: str = None, additional_features: dict = {}, group_metrics_min_queries: int = 50, logs_dir: Optional[str] = None, logging_frequency: int = 25, compute_intermediate_stats: bool = True)

Evaluate the RelevanceModel

Parameters:

test_dataset (an instance of tf.data.Dataset) – Dataset object on which the model is to be evaluated
inference_signature (str, optional) – If using a SavedModel for prediction, specify the inference signature to be used for computing scores
additional_features (dict, optional) – Dictionary containing new feature name and function definition to compute them. Use this to compute additional features from the scores. For example, converting ranking scores for each document into ranks for the query
group_metrics_min_queries (int, optional) – Minimum count threshold per group to be considered for computing groupwise metrics
logs_dir (str, optional) – Path to directory to save logs
logging_frequency (int) – Value representing how often (in batches) to log status
compute_intermediate_stats (bool) – Determines if group metrics and other intermediate stats on the test set should be computed

Returns:

df_overall_metrics (pd.DataFrame object) – pd.DataFrame containing overall metrics
df_groupwise_metrics (pd.DataFrame object) – pd.DataFrame containing groupwise metrics if group_metric_keys are defined in the FeatureConfig
metrics_dict (dict) – metrics as a dictionary of metric names mapping to values

Notes

model.evaluate() can be called directly only if the keras model is compiled

Override this method to implement your own evaluation metrics.
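
The effect of group_metrics_min_queries can be pictured as a filter applied before groupwise metrics are reported. The following is a minimal sketch under the assumption that each record holds one query's metric value and its group; the function name, the record layout and the "locale" and "mrr" keys are hypothetical.

```python
from collections import defaultdict


def groupwise_mean(records, group_key, metric_key, min_queries=50):
    """Average a per-query metric per group, skipping groups that are too small."""
    values = defaultdict(list)
    for record in records:
        values[record[group_key]].append(record[metric_key])
    return {
        group: sum(vals) / len(vals)
        for group, vals in values.items()
        if len(vals) >= min_queries  # drop groups below the query-count threshold
    }


# Hypothetical per-query metric values: four "en" queries and one "fr" query
records = (
    [{"locale": "en", "mrr": 1.0}] * 2
    + [{"locale": "en", "mrr": 0.5}] * 2
    + [{"locale": "fr", "mrr": 0.25}]
)
result = groupwise_mean(records, "locale", "mrr", min_queries=3)
```

Groups with too few queries are left out because a metric averaged over a handful of queries is too noisy to compare across groups.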

run_ttest(mean, variance, n, ttest_pvalue_threshold)

Compute the paired t-test statistic and its p-value given the mean, variance and sample count

Parameters:

mean (float) – The mean of the rank differences for the entire dataset
variance (float) – The variance of the rank differences for the entire dataset
n (int) – The number of samples in the entire dataset
ttest_pvalue_threshold (float) – P-value threshold for student t-test
metrics_dict (dict) – dictionary of metrics to keep track of (listed in the docstring but not part of the signature above)

Returns:	t_test_metrics_dict – A dictionary with the t-test metrics recorded.
Return type:	dict
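
For reference, a paired t-test statistic can be computed from summary moments as t = mean / sqrt(variance / n). The sketch below follows that formula using only the standard library. It is not ml4ir's implementation: the function name and the returned keys are assumptions, and the p-value uses a normal approximation to the t distribution, which is only reasonable for large n.

```python
import math
from statistics import NormalDist


def paired_ttest_from_moments(mean, variance, n, pvalue_threshold=0.05):
    """Two-sided paired t-test from the mean and variance of the differences."""
    standard_error = math.sqrt(variance / n)
    t_statistic = mean / standard_error
    # Normal approximation to the t distribution (assumption; fine for large n)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_statistic)))
    return {
        "t_statistic": t_statistic,
        "p_value": p_value,
        "is_significant": p_value < pvalue_threshold,
    }


# Hypothetical summary of rank differences over 100 queries
stats = paired_ttest_from_moments(mean=0.5, variance=4.0, n=100)
```

Working from the mean, variance and count means the statistic can be computed without holding every per-query difference in memory at once.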

save(models_dir: str, preprocessing_keys_to_fns={}, postprocessing_fn=None, required_fields_only: bool = True, pad_sequence: bool = False, sub_dir: str = 'final', dataset: Optional[ml4ir.base.data.relevance_dataset.RelevanceDataset] = None, experiment_details: Optional[dict] = None)

Save the RelevanceModel as a tensorflow SavedModel to the models_dir

There are two different serving signatures currently used to save the model:

default: default keras model without any pre/post processing wrapper
tfrecord: serving signature that allows keras model to be served using TFRecord proto messages. Allows definition of custom pre/post processing logic

Additionally, each model layer is also saved as a separate numpy zipped array to enable transfer learning with other ml4ir models.

Parameters:

models_dir (str) – path to directory to save the model
preprocessing_keys_to_fns (dict) – dictionary mapping function names to tf.functions that should be saved in the preprocessing step of the tfrecord serving signature
postprocessing_fn (function) – custom tensorflow compatible postprocessing function to be used at serving time. Saved as part of the postprocessing layer of the tfrecord serving signature
required_fields_only (bool) – boolean value defining if only required fields need to be added to the tfrecord parsing function at serving time
pad_sequence (bool, optional) – Value defining if sequences should be padded for SequenceExample proto inputs at serving time. Set this to False if padded scores should not be handled.
sub_dir (str, optional) – sub directory name to save the model into
dataset (RelevanceDataset object) – RelevanceDataset object that can optionally be passed to be used by downstream jobs that want to save the data along with the model. Note that this feature is currently unimplemented and it is up to users to override and customize it.
experiment_details (dict) – Dictionary containing metadata and results about the current experiment

Notes

All the functions passed under preprocessing_keys_to_fns here must be serializable tensor graph operations

load(model_file: str) → keras.engine.training.Model

Loads model from the SavedModel file specified

Parameters:	model_file (str) – path to file with saved tf keras model
Returns:	Tensorflow keras model loaded from file
Return type:	tf.keras.Model

Notes

Retraining is currently not supported. It would require compiling the model with the right loss and optimizer states.

load_weights(model_file: str)

Load saved model with compile=False

Parameters:	model_file (str) – path to file with saved tf keras model

calibrate(relevance_dataset, logger, logs_dir_local, **kwargs) → Tuple[numpy.ndarray, ...]

Calibrate model with temperature scaling

Parameters:

relevance_dataset (RelevanceDataset) – RelevanceDataset object to be used for training and evaluating temperature scaling
logger (Logger) – Logger object to log events
logs_dir_local – path to save the calibration results (zipped csv file containing original probabilities, calibrated probabilities, …)

Returns:	optimizer output containing temperature value learned during temperature scaling
Return type:	Union[np.ndarray, Tuple[np.ndarray, …]]

add_temperature_layer(temperature: float = 1.0, layer_name: str = 'temperature_layer')

Add temperature layer to the input of last activation (softmax) layer

Parameters:

self – input RelevanceModel object whose last layer inputs will be divided by a temperature value
temperature (float) – a scalar value to scale the last activation layer inputs
layer_name (str) – name of the temperature scaling layer

Returns:	updated RelevanceModel object with temperature
Return type:	RelevanceModel
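
Dividing the inputs of a softmax layer by a temperature, as the temperature layer does, can be sketched in a few lines of plain Python. The function name and the sample logits are illustrative assumptions; the real layer operates on tensors inside the keras model.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits divided by a scalar temperature."""
    scaled = [x / temperature for x in logits]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical pre-softmax outputs
sharp = softmax_with_temperature(logits, temperature=1.0)
soft = softmax_with_temperature(logits, temperature=2.0)
```

A temperature above 1 flattens the probabilities and a temperature below 1 sharpens them. Because the scaling is monotonic, the order of the classes is unchanged, so calibration adjusts confidence without changing the top prediction.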

RankingModel

class ml4ir.applications.ranking.model.ranking_model.RankingModel(feature_config: ml4ir.base.features.feature_config.FeatureConfig, tfrecord_type: str, file_io: ml4ir.base.io.file_io.FileIO, scorer: Optional[ml4ir.base.model.scoring.scoring_model.RelevanceScorer] = None, metrics: List[Union[keras.metrics.base_metric.Metric, str]] = [], optimizer: Optional[keras.optimizers.optimizer_v2.optimizer_v2.OptimizerV2] = None, model_file: Optional[str] = None, initialize_layers_dict: dict = {}, freeze_layers_list: list = [], compile_keras_model: bool = False, output_name: str = 'score', logger=None, eval_config: dict = {})

Bases: ml4ir.base.model.relevance_model.RelevanceModel

Constructor to instantiate a RelevanceModel that can be used for training and evaluating the search ML task

Parameters:

feature_config (FeatureConfig object) – FeatureConfig object that defines the features to be loaded in the dataset and the preprocessing functions to be applied to each of them
tfrecord_type ({"example", "sequence_example"}) – Type of the TFRecord protobuf message used for TFRecordDataset
file_io (FileIO object) – file I/O handler objects for reading and writing data
scorer (RelevanceScorer object) – Scorer object that wraps an InteractionModel and converts input features into scores
metrics (list) – List of keras Metric objects/strings that will be used for evaluating the trained model
optimizer (Optimizer) – Tensorflow keras optimizer to be used for training the model
model_file (str, optional) – Path to pretrained model file to be loaded for evaluation or retraining
initialize_layers_dict (dict, optional) – Dictionary of tensorflow layer names mapped to the path of pretrained weights. Use this for transfer learning with pretrained weights
freeze_layers_list (list, optional) – List of model layer names to be frozen. Use this for freezing pretrained weights from other ml4ir models
compile_keras_model (bool, optional) – Whether the keras model loaded from disk should be compiled with loss, metrics and an optimizer
output_name (str, optional) – Name of the output tensorflow node that captures the score
logger (Logger, optional) – logging handler for status messages
eval_config (dict) – A dictionary of Evaluation config parameters

predict(test_dataset: tensorflow.python.data.ops.readers.TFRecordDatasetV2, inference_signature: str = 'serving_default', additional_features: dict = {}, logs_dir: Optional[str] = None, logging_frequency: int = 25)

Predict the scores on the test dataset using the trained model

Parameters:

test_dataset (Dataset object) – Dataset object for which predictions are to be made
inference_signature (str, optional) – If using a SavedModel for prediction, specify the inference signature to be used for computing scores
additional_features (dict, optional) – Dictionary containing new feature name and function definition to compute them. Use this to compute additional features from the scores. For example, converting ranking scores for each document into ranks for the query
logs_dir (str, optional) – Path to directory to save logs
logging_frequency (int) – Value representing how often (in batches) to log status

Returns:	pandas DataFrame containing the predictions on the test dataset made with the RelevanceModel
Return type:	pd.DataFrame

evaluate(test_dataset: tensorflow.python.data.ops.readers.TFRecordDatasetV2, inference_signature: str = None, additional_features: dict = {}, group_metrics_min_queries: int = 50, logs_dir: Optional[str] = None, logging_frequency: int = 25, compute_intermediate_stats: bool = True)

Evaluate the RelevanceModel

Parameters:

test_dataset (an instance of tf.data.Dataset) – Dataset object on which the model is to be evaluated
inference_signature (str, optional) – If using a SavedModel for prediction, specify the inference signature to be used for computing scores
additional_features (dict, optional) – Dictionary containing new feature name and function definition to compute them. Use this to compute additional features from the scores. For example, converting ranking scores for each document into ranks for the query
group_metrics_min_queries (int, optional) – Minimum count threshold per group to be considered for computing groupwise metrics
logs_dir (str, optional) – Path to directory to save logs
logging_frequency (int) – Value representing how often (in batches) to log status
compute_intermediate_stats (bool) – [Currently ignored] Determines if group metrics and other intermediate stats on the test set should be computed

Returns:

df_overall_metrics (pd.DataFrame object) – pd.DataFrame containing overall metrics
df_groupwise_metrics (pd.DataFrame object) – pd.DataFrame containing groupwise metrics if group_metric_keys are defined in the FeatureConfig
metrics_dict (dict) – metrics as a dictionary of metric names mapping to values

Notes

model.evaluate() can be called directly only if the keras model is compiled

Override this method to implement your own evaluation metrics.

save(models_dir: str, preprocessing_keys_to_fns={}, postprocessing_fn=None, required_fields_only: bool = True, pad_sequence: bool = False, dataset: Optional[ml4ir.base.data.relevance_dataset.RelevanceDataset] = None, experiment_details: Optional[dict] = None)

Save the RelevanceModel as a tensorflow SavedModel to the models_dir. Additionally, sets the score for the padded records to 0

There are two different serving signatures currently used to save the model:

default: default keras model without any pre/post processing wrapper
tfrecord: serving signature that allows keras model to be served using TFRecord proto messages. Allows definition of custom pre/post processing logic

Additionally, each model layer is also saved as a separate numpy zipped array to enable transfer learning with other ml4ir models.

Parameters:

models_dir (str) – path to directory to save the model
preprocessing_keys_to_fns (dict) – dictionary mapping function names to tf.functions that should be saved in the preprocessing step of the tfrecord serving signature
postprocessing_fn (function) – custom tensorflow compatible postprocessing function to be used at serving time. Saved as part of the postprocessing layer of the tfrecord serving signature
required_fields_only (bool) – boolean value defining if only required fields need to be added to the tfrecord parsing function at serving time
pad_sequence (bool, optional) – Value defining if sequences should be padded for SequenceExample proto inputs at serving time. Set this to False if padded scores should not be handled.
dataset (RelevanceDataset object) – RelevanceDataset object that can optionally be passed to be used by downstream jobs that want to save the data along with the model. Note that this feature is currently unimplemented and it is up to users to override and customize it.
experiment_details (dict) – Dictionary containing metadata and results about the current experiment

Notes

All the functions passed under preprocessing_keys_to_fns here must be serializable tensor graph operations
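
The RankingModel save method is described as setting the score for padded records to 0. A minimal sketch of that masking step on plain lists is shown here; the function name and the mask convention (1 for a real record, 0 for padding) are assumptions, and the saved model applies the equivalent operation on tensors.

```python
def zero_padded_scores(scores, mask):
    """Keep scores where mask is 1 and set them to 0.0 where mask is 0."""
    return [score if keep else 0.0 for score, keep in zip(scores, mask)]


# Hypothetical query with two real records followed by two padded ones
scores = [0.8, 0.3, 0.6, 0.1]
mask = [1, 1, 0, 0]
masked = zero_padded_scores(scores, mask)
```

Zeroing padded positions keeps whatever the network produced for padding from being mistaken for a real document score at serving time.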