Pipelines

RelevancePipeline

class ml4ir.base.pipeline.RelevancePipeline(args: argparse.Namespace)

Bases: object

Base class that defines a pipeline to train, evaluate and save a RelevanceModel using ml4ir

Constructor to create a RelevancePipeline object to train, evaluate and save a model on ml4ir. This method sets up the data, logs and models directories and the file handlers used. It also loads and sets up the FeatureConfig for the model training pipeline.

Parameters: args (argparse.Namespace) – arguments to be used with the pipeline, typically passed from the command line

setup_logging() → logging.Logger

Set up the logging utilities for the training pipeline. Additionally, removes pre-existing job status files.

set_seeds(reset_graph=True)

Set the random seeds for tensorflow and numpy in order to replicate results

Parameters: reset_graph (bool) – whether to reset the tensorflow graph and clear the keras session
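
As a rough illustration of what seeding for reproducibility involves, the sketch below seeds Python's built-in generator and, if they happen to be installed, NumPy and TensorFlow. The function name `set_seeds_sketch` and its `seed` argument are hypothetical; this is not the ml4ir implementation, only a minimal sketch of the idea.

```python
import random


def set_seeds_sketch(seed: int = 123, reset_graph: bool = True) -> None:
    """Hypothetical helper: seed the common random number generators."""
    random.seed(seed)  # Python's built-in generator

    try:
        import numpy as np

        np.random.seed(seed)  # NumPy's global generator
    except ImportError:
        pass  # NumPy is not installed, nothing to seed

    try:
        import tensorflow as tf

        if reset_graph:
            tf.keras.backend.clear_session()  # clear Keras global state
        tf.random.set_seed(seed)  # TensorFlow 2.x global seed
    except ImportError:
        pass  # TensorFlow is not installed, nothing to seed


# Seeding twice with the same value yields the same sequence of draws.
set_seeds_sketch(7)
first_draw = [random.random() for _ in range(3)]
set_seeds_sketch(7)
second_draw = [random.random() for _ in range(3)]
```

Comparing `first_draw` and `second_draw` shows that the two runs produce identical values, which is the property the method relies on to replicate results.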

validate_args()

Validate the arguments to be used with RelevancePipeline

get_relevance_dataset(preprocessing_keys_to_fns={}) → ml4ir.base.data.relevance_dataset.RelevanceDataset

Create RelevanceDataset object by loading train, test data as tensorflow datasets

Parameters: preprocessing_keys_to_fns (dict of (str, function)) – dictionary mapping function names to function definitions that can be used for preprocessing while loading the TFRecordDataset to create the RelevanceDataset object
Returns: RelevanceDataset object that can be used for training and evaluating the model
Return type: RelevanceDataset

Notes

Override this method to create custom dataset objects
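
To make the shape of `preprocessing_keys_to_fns` concrete, here is a minimal sketch of a name-to-function dictionary and a lookup by name. The functions `strip_punctuation`, `to_lowercase` and `apply_by_name` are hypothetical and operate on plain Python strings; functions used with a TFRecordDataset would presumably need to work on tensors instead.

```python
from typing import Callable, Dict, List


def strip_punctuation(text: str) -> str:
    """Hypothetical preprocessing step: keep letters, digits and spaces."""
    return "".join(ch for ch in text if ch.isalnum() or ch.isspace())


def to_lowercase(text: str) -> str:
    """Hypothetical preprocessing step: lowercase the text."""
    return text.lower()


# Function names mapped to function definitions, as the parameter describes.
preprocessing_keys_to_fns: Dict[str, Callable[[str], str]] = {
    "strip_punctuation": strip_punctuation,
    "to_lowercase": to_lowercase,
}


def apply_by_name(value: str, fn_names: List[str]) -> str:
    """Stand-in for a loader: look up each function by name and apply it."""
    for name in fn_names:
        value = preprocessing_keys_to_fns[name](value)
    return value


cleaned = apply_by_name("Hello, World!", ["strip_punctuation", "to_lowercase"])
```

Keying the functions by name lets a configuration refer to a preprocessing step as a string while the code supplies the actual callable.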

get_relevance_model(feature_layer_keys_to_fns={}) → ml4ir.base.model.relevance_model.RelevanceModel

Creates RelevanceModel that can be used for training and evaluating

Parameters: feature_layer_keys_to_fns (dict of (str, function)) – dictionary mapping function names to tensorflow compatible function definitions that can be used in the InteractionModel as feature functions to transform input features
Returns: RelevanceModel that can be used for training and evaluating
Return type: RelevanceModel

Notes

Override this method to create custom loss, scorer, model objects
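
The override pattern described in the note can be sketched without ml4ir itself. In the example below, `BasePipelineStandIn` and `CustomPipeline` are hypothetical stand-in classes, the "model" is just a dictionary, and the `halve` feature function is made up; only the method name `get_relevance_model` and its `feature_layer_keys_to_fns` parameter come from the documentation above.

```python
class BasePipelineStandIn:
    """Stand-in for a pipeline base class; NOT the ml4ir implementation."""

    def get_relevance_model(self, feature_layer_keys_to_fns=None):
        fns = feature_layer_keys_to_fns or {}
        # A real pipeline would build a model object here.
        return {"kind": "default", "feature_fns": sorted(fns)}

    def run(self):
        # The base workflow calls the hook, so subclasses can customize it.
        return self.get_relevance_model()


class CustomPipeline(BasePipelineStandIn):
    """Overrides the hook to register an extra feature function."""

    def get_relevance_model(self, feature_layer_keys_to_fns=None):
        fns = dict(feature_layer_keys_to_fns or {})
        fns["halve"] = lambda x: x / 2  # hypothetical feature function
        model = super().get_relevance_model(fns)
        model["kind"] = "custom"
        return model


model = CustomPipeline().run()
```

Because `run()` goes through the overridden method, the subclass changes what gets built without touching the rest of the workflow.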

run()

Run the pipeline to train, evaluate and save the model

Notes

Also populates an experiment tracking dictionary containing the metadata, model architecture and metrics generated by the model

finish(job_status, job_info)

Wrap up the model training pipeline. Performs the following actions:

  • save a job status file as _SUCCESS or _FAILURE to indicate job status.
  • delete temp data and models directories
  • if using spark IO, transfers models and logs directories to HDFS location from local directories
  • log overall run time of ml4ir job
Parameters:
  • job_status (str) – status of the job, either _SUCCESS or _FAILURE
  • job_info (str) – for _SUCCESS, the experiment tracking metrics and metadata; for _FAILURE, the stack trace of the failure
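
The status-file step can be sketched as follows. The helper `write_job_status`, the choice to store `job_info` as the file's contents, and the metric values are all assumptions made for illustration; the source only states that a file named _SUCCESS or _FAILURE is saved.

```python
import os
import tempfile
import traceback


def write_job_status(logs_dir: str, job_status: str, job_info: str) -> str:
    """Hypothetical helper: write job_info to a file named after job_status."""
    if job_status not in ("_SUCCESS", "_FAILURE"):
        raise ValueError(f"unexpected job status: {job_status}")
    path = os.path.join(logs_dir, job_status)
    with open(path, "w") as status_file:
        status_file.write(job_info)
    return path


with tempfile.TemporaryDirectory() as logs_dir:
    try:
        metrics = {"loss": 0.25}  # made-up metric, for illustration only
        status_path = write_job_status(logs_dir, "_SUCCESS", str(metrics))
    except Exception:
        # On failure, record the stack trace instead of the metrics.
        status_path = write_job_status(
            logs_dir, "_FAILURE", traceback.format_exc()
        )
    status_name = os.path.basename(status_path)
    with open(status_path) as status_file:
        status_body = status_file.read()
```

A marker file like this lets an external scheduler check the outcome of a job by looking for a file name rather than parsing logs.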

RankingPipeline

class ml4ir.applications.ranking.pipeline.RankingPipeline(args: argparse.Namespace)

Bases: ml4ir.base.pipeline.RelevancePipeline

Base class that defines a pipeline to train, evaluate and save a RankingModel using ml4ir

Constructor to create a RelevancePipeline object to train, evaluate and save a model on ml4ir. This method sets up the data, logs and models directories and the file handlers used. It also loads and sets up the FeatureConfig for the model training pipeline.

Parameters: args (argparse.Namespace) – arguments to be used with the pipeline, typically passed from the command line

get_relevance_model(feature_layer_keys_to_fns={}) → ml4ir.base.model.relevance_model.RelevanceModel

Creates a RankingModel that can be used for training and evaluating

Parameters: feature_layer_keys_to_fns (dict of (str, function)) – dictionary mapping function names to tensorflow compatible function definitions that can be used in the InteractionModel as feature functions to transform input features
Returns: RankingModel that can be used for training and evaluating a ranking model
Return type: RankingModel

Notes

Override this method to create custom loss, scorer, model objects

validate_args()

Validate the arguments to be used with RelevancePipeline

ClassificationPipeline

class ml4ir.applications.classification.pipeline.ClassificationPipeline(args: argparse.Namespace)

Bases: ml4ir.base.pipeline.RelevancePipeline

Base class that defines a pipeline to train, evaluate and save a RelevanceModel for classification using ml4ir

Constructor to create a RelevancePipeline object to train, evaluate and save a model on ml4ir. This method sets up the data, logs and models directories and the file handlers used. It also loads and sets up the FeatureConfig for the model training pipeline.

Parameters: args (argparse.Namespace) – arguments to be used with the pipeline, typically passed from the command line

get_relevance_model(feature_layer_keys_to_fns={}) → ml4ir.base.model.relevance_model.RelevanceModel

Creates a RelevanceModel that can be used for training and evaluating

Parameters: feature_layer_keys_to_fns (dict of (str, function)) – dictionary mapping function names to tensorflow compatible function definitions that can be used in the InteractionModel as feature functions to transform input features
Returns: RelevanceModel that can be used for training and evaluating a classification model
Return type: RelevanceModel

Notes

Override this method to create custom loss, scorer, model objects

get_relevance_dataset(parse_tfrecord=True, preprocessing_keys_to_fns={}) → ml4ir.base.data.relevance_dataset.RelevanceDataset

Create RelevanceDataset object by loading train, test data as tensorflow datasets. Defines a preprocessing feature function to one-hot vectorize classification labels.

Parameters: preprocessing_keys_to_fns (dict of (str, function)) – dictionary mapping function names to function definitions that can be used for preprocessing while loading the TFRecordDataset to create the RelevanceDataset object
Returns: RelevanceDataset object that can be used for training and evaluating the model
Return type: RelevanceDataset

Notes

Override this method to create custom dataset objects
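
For readers unfamiliar with one-hot vectorization of labels, the idea can be sketched in plain Python. The function `one_hot` and the label vocabulary below are hypothetical; the pipeline's own preprocessing function would presumably operate on tensors rather than lists.

```python
from typing import List, Sequence


def one_hot(label: str, vocabulary: Sequence[str]) -> List[float]:
    """Return a vector with 1.0 at the label's position and 0.0 elsewhere."""
    index = list(vocabulary).index(label)  # raises ValueError if unknown
    return [1.0 if i == index else 0.0 for i in range(len(vocabulary))]


vocabulary = ["cat", "dog", "bird"]  # made-up class labels
encoded = one_hot("dog", vocabulary)
```

Here `encoded` has one entry per class, with the single non-zero entry marking the position of "dog" in the vocabulary.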