Feature Configuration

FeatureConfig

class ml4ir.base.features.feature_config.FeatureConfig(features_dict, logger: Optional[logging.Logger] = None)

Bases: object

Class that defines the features and their configurations used for training, evaluating and serving a RelevanceModel on ml4ir.

features_dict

Dictionary of features containing the configuration for every feature in the model. This dictionary is used to define the FeatureConfig object.

Type:dict
logger

Logging handler to log progress messages

Type:Logging object
query_key

Dictionary containing the feature configuration for the unique data point ID, query key

Type:dict
label

Dictionary containing the feature configuration for the label field for training and evaluating the model

Type:dict
mask

Dictionary containing the feature configuration for the computed mask field which is used to identify padded values

Type:dict
features

List of dictionaries containing configurations for all the features excluding query_key and label

Type:list of dict
all_features

List of dictionaries containing configurations for all the features including query_key and label

Type:list of dict
train_features

List of dictionaries containing configurations for all the features which are used for training, identified by trainable=True

Type:list of dict
metadata_features

List of dictionaries containing configurations for all the features which are NOT used for training, identified by trainable=False. These can be used for computing custom losses and metrics.

Type:list of dict
features_to_log

List of dictionaries containing configurations for all the features which will be logged when running model.predict(), identified using log_at_inference=True

Type:list of dict
group_metrics_keys

List of dictionaries containing configurations for all the features which will be used to compute groupwise metrics

Type:list of dict
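
The derived attributes above are subsets of the configured features, selected by per-feature flags. The following is a minimal sketch of that selection using plain dictionaries, not the ml4ir implementation. The `name` key and the three example features are assumptions made for illustration; only `trainable` and `log_at_inference` are flags named in this reference.

```python
# Hypothetical feature entries; `name` and the feature names are assumed.
features = [
    {"name": "query_text", "trainable": True, "log_at_inference": False},
    {"name": "doc_popularity", "trainable": True, "log_at_inference": True},
    {"name": "user_segment", "trainable": False, "log_at_inference": True},
]

# Features used for training carry trainable=True.
train_features = [f for f in features if f["trainable"]]

# Features not used for training carry trainable=False; they remain
# available for custom losses and metrics.
metadata_features = [f for f in features if not f["trainable"]]

# Features logged at inference carry log_at_inference=True.
features_to_log = [f for f in features if f["log_at_inference"]]

print([f["name"] for f in train_features])
print([f["name"] for f in metadata_features])
print([f["name"] for f in features_to_log])
```

Note that a feature can appear in more than one list: in this sketch `user_segment` is both a metadata feature and a logged feature.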

Notes

Abstract class that is overridden by ExampleFeatureConfig and SequenceExampleFeatureConfig for the respective TFRecord types.

Constructor to instantiate a FeatureConfig object

Parameters:
  • features_dict (dict) – Dictionary containing the feature configuration for each of the model features
  • logger (Logging object, optional) – Logging object handler for logging progress messages
initialize_features()

Initialize the feature attributes with empty lists accordingly

static get_instance(feature_config_dict: dict, tfrecord_type: str, logger: logging.Logger)

Factory method to get FeatureConfig object from a dictionary of feature configurations based on the TFRecord type

Parameters:
  • feature_config_dict (dict) – Dictionary containing the feature definitions for all the features for the model
  • tfrecord_type ({"example", "sequence_example"}) – Type of the TFRecord message type used for the ml4ir RelevanceModel
  • logger (Logging object) – Logging object handler to log status and progress messages
Returns:

ExampleFeatureConfig or SequenceExampleFeatureConfig object computed from the feature configuration dictionary

Return type:

FeatureConfig object
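
The factory behaviour described for get_instance can be sketched as a dispatch on tfrecord_type. The classes below are hypothetical stand-ins defined only for this example, and raising ValueError on an unrecognized type is an assumption of the sketch, not documented ml4ir behaviour.

```python
import logging
from typing import Optional


class FeatureConfigSketch:
    """Stand-in base class; not the ml4ir FeatureConfig."""

    def __init__(self, features_dict: dict, logger: Optional[logging.Logger] = None):
        self.features_dict = features_dict
        self.logger = logger


class ExampleFeatureConfigSketch(FeatureConfigSketch):
    """Stand-in for the Example variant."""


class SequenceExampleFeatureConfigSketch(FeatureConfigSketch):
    """Stand-in for the SequenceExample variant."""


def get_instance_sketch(
    feature_config_dict: dict,
    tfrecord_type: str,
    logger: Optional[logging.Logger] = None,
) -> FeatureConfigSketch:
    # Pick the subclass that matches the TFRecord message type.
    if tfrecord_type == "example":
        return ExampleFeatureConfigSketch(feature_config_dict, logger)
    if tfrecord_type == "sequence_example":
        return SequenceExampleFeatureConfigSketch(feature_config_dict, logger)
    # Assumption: unknown types are rejected explicitly.
    raise ValueError(f"Unsupported tfrecord_type: {tfrecord_type!r}")


config = get_instance_sketch({"features": []}, "sequence_example")
print(type(config).__name__)
```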

extract_features()

Extract the features from the input feature config dictionary and assign to relevant FeatureConfig attributes

log_initialization()

Log initial state of FeatureConfig object after extracting all the attributes

get_query_key(key: str = None)

Getter method for query_key in the FeatureConfig object. Can additionally be used to fetch only a particular value from the dict.

Parameters:key (str) – Value from the query_key feature configuration to be fetched
Returns:Query key value, or the entire config dictionary if no key is passed
Return type:str or int or bool or dict
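
The optional key argument shared by get_query_key, get_label, get_mask and similar getters follows one pattern: return the whole configuration dictionary when no key is given, otherwise return a single value from it. A minimal sketch of that pattern is shown here; the helper name and the example configuration values are hypothetical, and raising KeyError for a missing key is a choice of the sketch rather than documented behaviour.

```python
from typing import Any, Optional


def get_config_or_value(feature_info: dict, key: Optional[str] = None) -> Any:
    """Return the full feature configuration, or one value from it."""
    if key is None:
        return feature_info
    # Assumption: a missing key raises KeyError in this sketch.
    return feature_info[key]


# Hypothetical query_key configuration.
query_key = {"name": "query_id", "dtype": "string", "trainable": False}

print(get_config_or_value(query_key))          # entire dictionary
print(get_config_or_value(query_key, "name"))  # single value
```

This also explains the documented return type of str or int or bool or dict: the result depends on which configuration value is requested.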
get_label(key: str = None)

Getter method for label in the FeatureConfig object. Can additionally be used to fetch only a particular value from the dict.

Parameters:key (str) – Value from the label feature configuration to be fetched
Returns:Label value, or the entire config dictionary if no key is passed
Return type:str or int or bool or dict
get_aux_label(key: str = None)

Getter method for aux_label in the FeatureConfig object. Can additionally be used to fetch only a particular value from the dict.

Parameters:key (str) – Value from the aux_label feature configuration to be fetched
Returns:Auxiliary label value, or the entire config dictionary if no key is passed
Return type:str or int or bool or dict
get_mask(key: str = None)

Getter method for mask in the FeatureConfig object. Can additionally be used to fetch only a particular value from the dict.

Parameters:key (str) – Value from the mask feature configuration to be fetched
Returns:Mask value, or the entire config dictionary if no key is passed
Return type:str or int or bool or dict
get_feature_by_node_name(name: str)

Getter method for feature by node name in FeatureConfig object

Parameters:name (str) – Node name of the feature to fetch
Returns:Feature config dictionary for the name of the feature passed
Return type:dict
get_feature(name: str)

Getter method for feature in FeatureConfig object

Parameters:name (str) – Name of the feature to fetch
Returns:Feature config dictionary for the name of the feature passed
Return type:dict
feature_exists(name: str, trainable=True)

Check if a feature exists in FeatureConfig object

Parameters:name (str) – Name of the feature to check
Returns:Whether the feature exists
Return type:Boolean
set_feature(name: str, new_feature_info: dict)

Setter method to set the feature_info of a feature in the FeatureConfig as specified by the name argument

Parameters:
  • name (str) – name of feature whose feature_info is to be updated
  • new_feature_info (dict) – dictionary used to set the feature_info for the feature with specified name
get_all_features(key: str = None, include_label: bool = True, include_mask: bool = True)

Getter method for all_features in the FeatureConfig object. Can additionally be used to fetch only a particular value from each feature's dict.

Parameters:
  • key (str, optional) – Name of the configuration key to be fetched. If None, then entire dictionary for the feature is returned
  • include_label (bool, optional) – Include label in list of features returned
  • include_mask (bool, optional) – Include mask in the list of features returned. Only applicable with SequenceExampleFeatureConfig currently
Returns:

List of feature configuration dictionaries or values for all features in FeatureConfig

Return type:

list
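
The list getters combine two ideas: optional filtering (here, dropping the label and mask) and an optional projection onto a single configuration key. The sketch below illustrates this with plain dictionaries. The function name, the `label_name` and `mask_name` arguments, and the example features are all hypothetical; the real method takes only key, include_label and include_mask.

```python
from typing import Any, List, Optional


def get_all_features_sketch(
    all_features: List[dict],
    label_name: str,
    mask_name: str,
    key: Optional[str] = None,
    include_label: bool = True,
    include_mask: bool = True,
) -> List[Any]:
    # Drop the label and/or mask entries when requested.
    selected = [
        f
        for f in all_features
        if (include_label or f["name"] != label_name)
        and (include_mask or f["name"] != mask_name)
    ]
    # With no key, return the full dictionaries; otherwise one value each.
    if key is None:
        return selected
    return [f[key] for f in selected]


# Hypothetical feature list.
all_features = [
    {"name": "query_id", "trainable": False},
    {"name": "clicked", "trainable": False},
    {"name": "mask", "trainable": False},
    {"name": "doc_popularity", "trainable": True},
]

names = get_all_features_sketch(
    all_features, "clicked", "mask", key="name",
    include_label=False, include_mask=False,
)
print(names)
```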

get_train_features(key: str = None)

Getter method for train_features in the FeatureConfig object. Can additionally be used to fetch only a particular value from each feature's dict.

Parameters:key (str, optional) – Name of the configuration key to be fetched. If None, then entire dictionary for the feature is returned
Returns:List of feature configuration dictionaries or values for trainable features in FeatureConfig
Return type:list
get_metadata_features(key: str = None)

Getter method for metadata_features in the FeatureConfig object. Can additionally be used to fetch only a particular value from each feature's dict.

Parameters:key (str, optional) – Name of the configuration key to be fetched. If None, then entire dictionary for the feature is returned
Returns:List of feature configuration dictionaries or values for metadata features in FeatureConfig
Return type:list
get_features_to_log(key: str = None)

Getter method for features_to_log in the FeatureConfig object. Can additionally be used to fetch only a particular value from each feature's dict.

Parameters:key (str, optional) – Name of the configuration key to be fetched. If None, then entire dictionary for the feature is returned
Returns:List of feature configuration dictionaries or values for features to be logged at inference
Return type:list
get_group_metrics_keys(key: str = None)

Getter method for group_metrics_keys in the FeatureConfig object. Can additionally be used to fetch only a particular value from each feature's dict.

Parameters:key (str, optional) – Name of the configuration key to be fetched. If None, then entire dictionary for the feature is returned
Returns:List of feature configuration dictionaries or values for features used to compute groupwise metrics
Return type:list
get_dtype(feature_info: dict)

Retrieve data type of a feature

Parameters:feature_info (dict) – Dictionary containing configuration for the feature
Returns:Data type of the feature
Return type:str
get_default_value(feature_info)

Retrieve default value of a feature

Parameters:feature_info (dict) – Dictionary containing configuration for the feature
Returns:Default value of the feature
Return type:str or int or float
create_dummy_protobuf(num_records=1, required_only=False)

Generate a dummy TFRecord protobuffer with dummy values

Parameters:
  • num_records (int) – Number of records or sequence features per TFRecord message to fetch
  • required_only (bool) – Whether to fetch only fields with required_only=True
Returns:

Example or SequenceExample object with dummy values generated from the FeatureConfig

Return type:

protobuffer object

get_hyperparameter_dict()

Create hyperparameter configs to track model metadata for best model selection. Unwraps the feature config of each feature to add preprocessing_info and feature_layer_info as key-value pairs that can be tracked across the experiment. This can be used to identify the values that were set for the different feature layers in a given experiment. Used during best model selection and hyperparameter optimization.

Returns:Flattened dictionary of important configuration keys and values that can be used for tracking the experiment run
Return type:dict
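
To illustrate what "flattened" means here, the sketch below unwraps nested configuration into dotted keys. It is not the ml4ir implementation: the `flatten` helper, the dotted key naming, the `name` key, and the shape and contents of the example preprocessing_info and feature_layer_info entries are all assumptions.

```python
from typing import Any, Dict


def flatten(nested: dict, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dictionaries into a single level with dotted keys."""
    flat: Dict[str, Any] = {}
    for k, v in nested.items():
        full_key = f"{prefix}.{k}" if prefix else k
        if isinstance(v, dict):
            flat.update(flatten(v, full_key))
        else:
            flat[full_key] = v
    return flat


# Hypothetical feature configuration.
features = [
    {
        "name": "query_text",
        "preprocessing_info": {"fn": "lowercase"},
        "feature_layer_info": {"type": "embedding", "args": {"embedding_size": 128}},
    }
]

hyperparameters: Dict[str, Any] = {}
for feature in features:
    for section in ("preprocessing_info", "feature_layer_info"):
        # Prefix each key with the feature name so entries stay unique.
        hyperparameters.update(
            flatten({section: feature.get(section, {})}, prefix=feature["name"])
        )

for k, v in hyperparameters.items():
    print(k, "=", v)
```

A flat dictionary of scalar values is convenient for experiment trackers, which typically compare runs key by key.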

ExampleFeatureConfig

class ml4ir.base.features.feature_config.ExampleFeatureConfig(features_dict, logger: Optional[logging.Logger] = None)

Bases: ml4ir.base.features.feature_config.FeatureConfig

Class that defines the features and their configurations used for training, evaluating and serving a RelevanceModel on ml4ir for Example data

features_dict

Dictionary of features containing the configuration for every feature in the model. This dictionary is used to define the FeatureConfig object.

Type:dict
logger

Logging handler to log progress messages

Type:Logging object
query_key

Dictionary containing the feature configuration for the unique data point ID, query key

Type:dict
label

Dictionary containing the feature configuration for the label field for training and evaluating the model

Type:dict
features

List of dictionaries containing configurations for all the features excluding query_key and label

Type:list of dict
all_features

List of dictionaries containing configurations for all the features including query_key and label

Type:list of dict
train_features

List of dictionaries containing configurations for all the features which are used for training, identified by trainable=True

Type:list of dict
metadata_features

List of dictionaries containing configurations for all the features which are NOT used for training, identified by trainable=False. These can be used for computing custom losses and metrics.

Type:list of dict
features_to_log

List of dictionaries containing configurations for all the features which will be logged when running model.predict(), identified using log_at_inference=True

Type:list of dict
group_metrics_keys

List of dictionaries containing configurations for all the features which will be used to compute groupwise metrics

Type:list of dict

Constructor to instantiate a FeatureConfig object

Parameters:
  • features_dict (dict) – Dictionary containing the feature configuration for each of the model features
  • logger (Logging object, optional) – Logging object handler for logging progress messages
create_dummy_protobuf(num_records=1, required_only=False)

Create an Example protobuffer with dummy values

SequenceExampleFeatureConfig

class ml4ir.base.features.feature_config.SequenceExampleFeatureConfig(features_dict, logger)

Bases: ml4ir.base.features.feature_config.FeatureConfig

Class that defines the features and their configurations used for training, evaluating and serving a RelevanceModel on ml4ir for SequenceExample data

features_dict

Dictionary of features containing the configuration for every feature in the model. This dictionary is used to define the FeatureConfig object.

Type:dict
logger

Logging handler to log progress messages

Type:Logging object
query_key

Dictionary containing the feature configuration for the unique data point ID, query key

Type:dict
label

Dictionary containing the feature configuration for the label field for training and evaluating the model

Type:dict
rank

Dictionary containing the feature configuration for the rank field for training and evaluating the model. rank is used to assign an ordering to the sequences in the SequenceExample

Type:dict
mask

Dictionary containing the feature configuration for the mask field for training and evaluating the model. mask is used to identify which sequence features are padded. A value of 1 represents an existing sequence feature and 0 represents a padded sequence feature.

Type:dict
features

List of dictionaries containing configurations for all the features excluding query_key and label

Type:list of dict
all_features

List of dictionaries containing configurations for all the features including query_key and label

Type:list of dict
context_features

List of dictionaries containing configurations for all the features which represent the features common to the entire sequence in a protobuf message

Type:list of dict
sequence_features

List of dictionaries containing configurations for all the features which represent the feature unique to a sequence

Type:list of dict
train_features

List of dictionaries containing configurations for all the features which are used for training, identified by trainable=True

Type:list of dict
metadata_features

List of dictionaries containing configurations for all the features which are NOT used for training, identified by trainable=False. These can be used for computing custom losses and metrics.

Type:list of dict
features_to_log

List of dictionaries containing configurations for all the features which will be logged when running model.predict(), identified using log_at_inference=True

Type:list of dict
group_metrics_keys

List of dictionaries containing configurations for all the features which will be used to compute groupwise metrics

Type:list of dict

Constructor to instantiate a FeatureConfig object

Parameters:
  • features_dict (dict) – Dictionary containing the feature configuration for each of the model features
  • logger (Logging object, optional) – Logging object handler for logging progress messages
initialize_features()

Initialize the feature attributes with empty lists accordingly

extract_features()

Extract the features from the input feature config dictionary and assign to relevant FeatureConfig attributes

get_context_features(key: str = None)

Getter method for context_features in the FeatureConfig object. Can additionally be used to fetch only a particular value from each feature's dict.

Parameters:key (str, optional) – Name of the configuration key to be fetched. If None, then entire dictionary for the feature is returned
Returns:List of feature configuration dictionaries or values for context features common to all sequences
Return type:list
get_sequence_features(key: str = None)

Getter method for sequence_features in the FeatureConfig object. Can additionally be used to fetch only a particular value from each feature's dict.

Parameters:key (str, optional) – Name of the configuration key to be fetched. If None, then entire dictionary for the feature is returned
Returns:List of feature configuration dictionaries or values for sequence features unique to each sequence
Return type:list
log_initialization()

Log initial state of FeatureConfig object after extracting all the attributes

generate_mask()

Add mask information used to flag padded records. In order to create a batch of sequence examples from n TFRecords, we need to make sure that they all have the same number of sequences. To do this, we pad sequence records to a fixed max_sequence_size. Now, we do not want to use these additional padded sequence records to compute metrics and losses. Hence we maintain a boolean mask to tell ml4ir the sequence records that were originally present.

In this method, we add the feature_info for the above mask feature as it is not implicitly present in the data.

Returns:Dictionary configuration for the mask field that captures which sequences have been masked in a SequenceExample message
Return type:dict
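
The padding and masking idea described for generate_mask can be illustrated without TensorFlow. The sketch below pads one list of per-record values to a fixed max_sequence_size and builds the matching mask, where 1 marks an original record and 0 marks padding, then uses the mask to ignore padding in a metric. The helper names, the pad value, and the choice to truncate sequences longer than max_sequence_size are assumptions of this example, not ml4ir behaviour.

```python
from typing import List, Tuple


def pad_with_mask(
    records: List[float], max_sequence_size: int, pad_value: float = 0.0
) -> Tuple[List[float], List[int]]:
    """Pad records to a fixed length and return (padded_records, mask)."""
    # Assumption: longer sequences are truncated to max_sequence_size.
    kept = records[:max_sequence_size]
    num_padded = max_sequence_size - len(kept)
    padded = kept + [pad_value] * num_padded
    mask = [1] * len(kept) + [0] * num_padded
    return padded, mask


def masked_mean(values: List[float], mask: List[int]) -> float:
    """Average only the positions that were originally present."""
    total = sum(v * m for v, m in zip(values, mask))
    return total / sum(mask)


scores, mask = pad_with_mask([0.9, 0.4, 0.7], max_sequence_size=5)
print(scores)
print(mask)
print(masked_mean(scores, mask))
```

Without the mask, the two padded zeros would be averaged in and pull the mean down, which is why padded records are excluded from metrics and losses.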
get_rank(key: str = None)

Getter method for rank in the FeatureConfig object. Can additionally be used to fetch only a particular value from the dict.

Parameters:key (str) – Value from the rank feature configuration to be fetched
Returns:Rank value or entire config dictionary based on if the key is passed
Return type:str or int or bool or dict
get_mask(key: str = None)

Getter method for mask in the FeatureConfig object. Can additionally be used to fetch only a particular value from the dict.

Parameters:key (str) – Value from the mask feature configuration to be fetched
Returns:Mask value or entire config dictionary based on if the key is passed
Return type:str or int or bool or dict
create_dummy_protobuf(num_records=1, required_only=False)

Generate a dummy TFRecord protobuffer with dummy values

Parameters:
  • num_records (int) – Number of records or sequence features per TFRecord message to fetch
  • required_only (bool) – Whether to fetch only fields with required_only=True
Returns:

Example or SequenceExample object with dummy values generated from the FeatureConfig

Return type:

protobuffer object