Pipelines
RelevancePipeline
-
class ml4ir.base.pipeline.RelevancePipeline(args: argparse.Namespace)
Bases: object
Base class that defines a pipeline to train, evaluate and save a RelevanceModel using ml4ir.
Constructor to create a RelevancePipeline object to train, evaluate and save a model on ml4ir. It sets up the data, logs and models directories and the file handlers used, and also loads and sets up the FeatureConfig for the model training pipeline.
Parameters: args (argparse.Namespace) – arguments to be used with the pipeline, typically passed from the command line
-
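As a rough illustration of the constructor's `args` parameter, the sketch below builds an `argparse.Namespace` the way a command line entry point might. The flag names (`--data_dir`, `--feature_config`, `--run_id`) and the `build_args` helper are assumptions chosen for illustration, not a confirmed list of ml4ir arguments. The constructor call is left as a comment because it needs ml4ir and real data.

```python
import argparse


def build_args(argv):
    """Parse a list of command line tokens into an argparse.Namespace."""
    parser = argparse.ArgumentParser(description="Hypothetical pipeline arguments")
    # Flag names below are illustrative assumptions, not ml4ir's confirmed CLI.
    parser.add_argument("--data_dir", type=str, required=True)
    parser.add_argument("--feature_config", type=str, required=True)
    parser.add_argument("--run_id", type=str, default="example_run")
    return parser.parse_args(argv)


args = build_args(["--data_dir", "data/", "--feature_config", "feature_config.yaml"])
print(args.data_dir, args.feature_config, args.run_id)

# With ml4ir installed, the Namespace would then be handed to the pipeline:
# pipeline = RelevancePipeline(args)
```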
setup_logging() → logging.Logger
Set up the logging utilities for the training pipeline. Additionally, removes pre-existing job status files.
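The behaviour described for setup_logging can be sketched with the standard library alone. This is a minimal sketch, not ml4ir's implementation: the helper name `setup_logging_sketch`, the log file name `pipeline.log`, and the choice to keep status files in the logs directory are all assumptions.

```python
import logging
import os
import tempfile


def setup_logging_sketch(logs_dir, run_id):
    """Create a file logger and clear stale job status files (illustrative only)."""
    os.makedirs(logs_dir, exist_ok=True)

    # Remove status files left behind by a previous run.
    for status_file in ("_SUCCESS", "_FAILURE"):
        path = os.path.join(logs_dir, status_file)
        if os.path.exists(path):
            os.remove(path)

    logger = logging.getLogger(run_id)
    logger.setLevel(logging.INFO)
    # "pipeline.log" is a hypothetical file name.
    handler = logging.FileHandler(os.path.join(logs_dir, "pipeline.log"))
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


logs_dir = tempfile.mkdtemp()
# Simulate a status file left over from an earlier job.
open(os.path.join(logs_dir, "_FAILURE"), "w").close()

logger = setup_logging_sketch(logs_dir, run_id="example_run")
logger.info("Logging initialized")
```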
-
set_seeds(reset_graph=True)
Set the random seeds for tensorflow and numpy in order to replicate results.
Parameters: reset_graph (bool) – whether to reset the tensorflow graph and clear the keras session
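Seeding for reproducibility can be illustrated as follows. This is a sketch under assumptions rather than the body of set_seeds: the helper name and default seed are hypothetical, numpy is seeded only if it is installed, and the TensorFlow calls are shown as comments (assuming TensorFlow 2.x) so the block runs without it.

```python
import random


def set_seeds_sketch(seed=123):
    """Seed the available random number generators (illustrative only)."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass  # numpy is optional for this sketch

    # With TensorFlow 2.x installed, the corresponding calls would be:
    #   tf.keras.backend.clear_session()  # what reset_graph=True refers to
    #   tf.random.set_seed(seed)


set_seeds_sketch(42)
first = random.random()
set_seeds_sketch(42)
second = random.random()
print(first == second)  # re-seeding with the same value repeats the draw
```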
-
validate_args()
Validate the arguments to be used with RelevancePipeline.
-
get_relevance_dataset(preprocessing_keys_to_fns={}) → ml4ir.base.data.relevance_dataset.RelevanceDataset
Create a RelevanceDataset object by loading train and test data as tensorflow datasets.
Parameters: preprocessing_keys_to_fns (dict of (str, function)) – dictionary mapping function names to function definitions that can be used for preprocessing while loading the TFRecordDataset to create the RelevanceDataset object
Returns: RelevanceDataset object that can be used for training and evaluating the model
Return type: RelevanceDataset
Notes
Override this method to create custom dataset objects.
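The shape of `preprocessing_keys_to_fns` is a plain dictionary from names to callables. The sketch below shows that shape using ordinary Python strings. In ml4ir the functions are applied while the TFRecordDataset is loaded, so real ones would need to work on tensors. The function names and the `apply_preprocessing` helper are hypothetical.

```python
def lowercase(text):
    """Hypothetical preprocessing function: lowercase the input."""
    return text.lower()


def strip_punctuation(text):
    """Hypothetical preprocessing function: keep letters, digits and spaces."""
    return "".join(ch for ch in text if ch.isalnum() or ch.isspace())


# Dictionary of function names mapped to function definitions.
preprocessing_keys_to_fns = {
    "lowercase": lowercase,
    "strip_punctuation": strip_punctuation,
}


def apply_preprocessing(value, fn_names, keys_to_fns):
    """Look up each named function and apply it in order (illustrative helper)."""
    for name in fn_names:
        value = keys_to_fns[name](value)
    return value


cleaned = apply_preprocessing(
    "Hello, World!", ["lowercase", "strip_punctuation"], preprocessing_keys_to_fns
)
print(cleaned)

# With ml4ir installed, the same dictionary would be passed to the pipeline:
# dataset = pipeline.get_relevance_dataset(
#     preprocessing_keys_to_fns=preprocessing_keys_to_fns
# )
```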
-
get_relevance_model(feature_layer_keys_to_fns={}) → ml4ir.base.model.relevance_model.RelevanceModel
Creates a RelevanceModel that can be used for training and evaluating.
Parameters: feature_layer_keys_to_fns (dict of (str, function)) – dictionary mapping function names to tensorflow-compatible function definitions that can be used in the InteractionModel as feature functions to transform input features
Returns: RelevanceModel that can be used for training and evaluating
Return type: RelevanceModel
Notes
Override this method to create custom loss, scorer, model objects.
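The override pattern mentioned in the note can be sketched with a stand-in base class, so the example runs without ml4ir. `StubPipeline`, `CustomPipeline` and `scale_by_ten` are hypothetical names. A real subclass would extend RelevancePipeline and return a RelevanceModel, and its feature functions would need to be tensorflow compatible.

```python
import math


def scale_by_ten(value):
    """Hypothetical feature function; real ones would operate on tensors."""
    return value * 10.0


class StubPipeline:
    """Stand-in for RelevancePipeline so the sketch runs without ml4ir."""

    def get_relevance_model(self, feature_layer_keys_to_fns=None):
        # The stub returns the registered function names instead of a model.
        return {"feature_fns": sorted(feature_layer_keys_to_fns or {})}


class CustomPipeline(StubPipeline):
    """Overrides get_relevance_model to register an extra feature function."""

    def get_relevance_model(self, feature_layer_keys_to_fns=None):
        # Copy the caller's dictionary so it is not mutated.
        fns = dict(feature_layer_keys_to_fns or {})
        fns["scale_by_ten"] = scale_by_ten
        return super().get_relevance_model(feature_layer_keys_to_fns=fns)


model = CustomPipeline().get_relevance_model({"log1p": math.log1p})
print(model["feature_fns"])
```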
-
run()
Run the pipeline to train, evaluate and save the model.
Notes
Also populates an experiment tracking dictionary containing the metadata, model architecture and metrics generated by the model.
-
finish(job_status, job_info)
Wrap up the model training pipeline. Performs the following actions:
- save a job status file as _SUCCESS or _FAILURE to indicate job status.
- delete temp data and models directories
- if using spark IO, transfers models and logs directories to HDFS location from local directories
- log overall run time of ml4ir job
Parameters:
- job_status (str) – job status, either _SUCCESS or _FAILURE
- job_info (str) – for _SUCCESS, the experiment tracking metrics and metadata; for _FAILURE, the stack trace of the failure
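The status file and run time steps listed for finish can be sketched as below. This is a minimal sketch under assumptions, not ml4ir's code: `finish_sketch` and `run_job` are hypothetical helpers, the status file is written to a temporary directory, and the temp directory cleanup and HDFS transfer steps are omitted.

```python
import os
import tempfile
import time
import traceback


def finish_sketch(logs_dir, job_status, job_info, start_time):
    """Write a job status file and report the run time (illustrative only)."""
    if job_status not in ("_SUCCESS", "_FAILURE"):
        raise ValueError(f"Unexpected job status: {job_status}")

    # The file is named after the status and holds the job info.
    status_path = os.path.join(logs_dir, job_status)
    with open(status_path, "w") as status_file:
        status_file.write(job_info)

    elapsed = time.time() - start_time
    print(f"Job finished with {job_status} in {elapsed:.2f} seconds")
    return status_path


def run_job():
    """Hypothetical stand-in for the training and evaluation work."""
    return "placeholder experiment tracking metrics and metadata"


logs_dir = tempfile.mkdtemp()
start_time = time.time()
try:
    job_status, job_info = "_SUCCESS", run_job()
except Exception:
    # On failure, the stack trace becomes the job info.
    job_status, job_info = "_FAILURE", traceback.format_exc()

status_path = finish_sketch(logs_dir, job_status, job_info, start_time)
```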
-
RankingPipeline
-
class ml4ir.applications.ranking.pipeline.RankingPipeline(args: argparse.Namespace)
Bases: ml4ir.base.pipeline.RelevancePipeline
Base class that defines a pipeline to train, evaluate and save a RankingModel using ml4ir.
Constructor to create a RelevancePipeline object to train, evaluate and save a model on ml4ir. It sets up the data, logs and models directories and the file handlers used, and also loads and sets up the FeatureConfig for the model training pipeline.
Parameters: args (argparse.Namespace) – arguments to be used with the pipeline, typically passed from the command line
-
get_relevance_model(feature_layer_keys_to_fns={}) → ml4ir.base.model.relevance_model.RelevanceModel
Creates a RankingModel that can be used for training and evaluating.
Parameters: feature_layer_keys_to_fns (dict of (str, function)) – dictionary mapping function names to tensorflow-compatible function definitions that can be used in the InteractionModel as feature functions to transform input features
Returns: RankingModel that can be used for training and evaluating a ranking model
Return type: RankingModel
Notes
Override this method to create custom loss, scorer, model objects.
-
validate_args()
Validate the arguments to be used with RelevancePipeline.
-
ClassificationPipeline
-
class ml4ir.applications.classification.pipeline.ClassificationPipeline(args: argparse.Namespace)
Bases: ml4ir.base.pipeline.RelevancePipeline
Base class that defines a pipeline to train, evaluate and save a RelevanceModel for classification using ml4ir.
Constructor to create a RelevancePipeline object to train, evaluate and save a model on ml4ir. It sets up the data, logs and models directories and the file handlers used, and also loads and sets up the FeatureConfig for the model training pipeline.
Parameters: args (argparse.Namespace) – arguments to be used with the pipeline, typically passed from the command line
-
get_relevance_model(feature_layer_keys_to_fns={}) → ml4ir.base.model.relevance_model.RelevanceModel
Creates a RelevanceModel that can be used for training and evaluating.
Parameters: feature_layer_keys_to_fns (dict of (str, function)) – dictionary mapping function names to tensorflow-compatible function definitions that can be used in the InteractionModel as feature functions to transform input features
Returns: RelevanceModel that can be used for training and evaluating a classification model
Return type: RelevanceModel
Notes
Override this method to create custom loss, scorer, model objects.
-
get_relevance_dataset(parse_tfrecord=True, preprocessing_keys_to_fns={}) → ml4ir.base.data.relevance_dataset.RelevanceDataset
Create a RelevanceDataset object by loading train and test data as tensorflow datasets. Also defines a preprocessing feature function that one-hot encodes the classification labels.
Parameters: preprocessing_keys_to_fns (dict of (str, function)) – dictionary mapping function names to function definitions that can be used for preprocessing while loading the TFRecordDataset to create the RelevanceDataset object
Returns: RelevanceDataset object that can be used for training and evaluating the model
Return type: RelevanceDataset
Notes
Override this method to create custom dataset objects.
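To make the one-hot step concrete, the sketch below encodes a class label as a vector with a single 1.0 at the label's index. It uses plain Python lists for illustration, whereas the pipeline applies the transformation to tensors while loading the dataset. The `one_hot` helper and the label vocabulary are hypothetical.

```python
def one_hot(label_index, num_classes):
    """Return a list of length num_classes with 1.0 at label_index."""
    if not 0 <= label_index < num_classes:
        raise ValueError(f"label_index {label_index} is outside [0, {num_classes})")
    vector = [0.0] * num_classes
    vector[label_index] = 1.0
    return vector


# Hypothetical label vocabulary for a three-class problem.
label_vocabulary = ["negative", "neutral", "positive"]

encoded = one_hot(label_vocabulary.index("neutral"), len(label_vocabulary))
print(encoded)  # [0.0, 1.0, 0.0]
```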
-