Feature Configuration¶
FeatureConfig¶
-
class
ml4ir.base.features.feature_config.
FeatureConfig
(features_dict, logger: Optional[logging.Logger] = None)¶ Bases:
object
Class that defines the features and their configurations used for training, evaluating and serving a RelevanceModel on ml4ir.
-
features_dict
¶ Dictionary of features containing the configuration for every feature in the model. This dictionary is used to define the FeatureConfig object.
Type: dict
-
logger
¶ Logging handler to log progress messages
Type: Logging object
-
query_key
¶ Dictionary containing the feature configuration for the unique data point ID, query key
Type: dict
-
label
¶ Dictionary containing the feature configuration for the label field for training and evaluating the model
Type: dict
-
mask
¶ Dictionary containing the feature configuration for the computed mask field which is used to identify padded values
Type: dict
-
features
¶ List of dictionaries containing configurations for all the features excluding query_key and label
Type: list of dict
-
all_features
¶ List of dictionaries containing configurations for all the features including query_key and label
Type: list of dict
-
train_features
¶ List of dictionaries containing configurations for all the features which are used for training, identified by trainable=True
Type: list of dict
-
metadata_features
¶ List of dictionaries containing configurations for all the features which are NOT used for training, identified by trainable=False. These can be used for computing custom losses and metrics.
Type: list of dict
-
features_to_log
¶ List of dictionaries containing configurations for all the features which will be logged when running model.predict(), identified using log_at_inference=True
Type: list of dict
-
group_metrics_keys
¶ List of dictionaries containing configurations for all the features which will be used to compute groupwise metrics
Type: list of dict
Notes
Abstract class that is overridden by ExampleFeatureConfig and SequenceExampleFeatureConfig for the respective TFRecord types
Constructor to instantiate a FeatureConfig object
Parameters: - features_dict (dict) – Dictionary containing the feature configuration for each of the model features
- logger (Logging object, optional) – Logging object handler for logging progress messages
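As a rough illustration of how the attributes above relate to features_dict, the sketch below partitions a small configuration by the trainable and log_at_inference flags. The dictionary layout (top-level query_key, label and features entries), the feature names, and the default of trainable=True when the flag is absent are all assumptions made for illustration. The code is plain Python and does not call ml4ir.

```python
# Hypothetical feature configuration; names and layout are illustrative assumptions.
features_dict = {
    "query_key": {"name": "query_id", "dtype": "string",
                  "trainable": False, "log_at_inference": True},
    "label": {"name": "clicked", "dtype": "int64", "trainable": False},
    "features": [
        {"name": "text_match_score", "dtype": "float", "trainable": True},
        {"name": "page_views", "dtype": "float", "trainable": True,
         "log_at_inference": True},
        {"name": "domain_id", "dtype": "int64", "trainable": False},
    ],
}

query_key = features_dict["query_key"]
label = features_dict["label"]
features = features_dict["features"]

# all_features includes query_key and label; features excludes them.
all_features = [query_key, label] + features

# Assumption: a feature without an explicit flag counts as trainable.
train_features = [f for f in all_features if f.get("trainable", True)]
metadata_features = [f for f in all_features if not f.get("trainable", True)]
features_to_log = [f for f in all_features if f.get("log_at_inference", False)]

print("train:", [f["name"] for f in train_features])
print("metadata:", [f["name"] for f in metadata_features])
print("logged at inference:", [f["name"] for f in features_to_log])
```

Every feature lands in exactly one of train_features or metadata_features, which is why the two attributes are described with opposite values of the same flag.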
-
initialize_features
()¶ Initialize the feature attributes with empty lists accordingly
-
static
get_instance
(feature_config_dict: dict, tfrecord_type: str, logger: logging.Logger)¶ Factory method to get FeatureConfig object from a dictionary of feature configurations based on the TFRecord type
Parameters: - feature_config_dict (dict) – Dictionary containing the feature definitions for all the features for the model
- tfrecord_type ({"example", "sequence_example"}) – Type of the TFRecord message type used for the ml4ir RelevanceModel
- logger (Logging object) – Logging object handler to log status and progress messages
Returns: ExampleFeatureConfig or SequenceExampleFeatureConfig object computed from the feature configuration dictionary
Return type: FeatureConfig object
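The factory pattern described for get_instance can be sketched as a dispatch on tfrecord_type. The classes below are stand-ins defined only for this example, not the ml4ir classes, and raising ValueError for an unrecognised type is an assumption about error handling rather than documented behaviour.

```python
import logging
from typing import Optional


class FeatureConfigSketch:
    """Stand-in base class for illustration; not the ml4ir implementation."""

    def __init__(self, features_dict: dict, logger: Optional[logging.Logger] = None):
        self.features_dict = features_dict
        self.logger = logger


class ExampleSketch(FeatureConfigSketch):
    """Stand-in for the Example flavour."""


class SequenceExampleSketch(FeatureConfigSketch):
    """Stand-in for the SequenceExample flavour."""


def get_instance_sketch(feature_config_dict: dict, tfrecord_type: str,
                        logger: Optional[logging.Logger] = None) -> FeatureConfigSketch:
    # Pick the subclass that matches the TFRecord message type.
    if tfrecord_type == "example":
        return ExampleSketch(feature_config_dict, logger)
    if tfrecord_type == "sequence_example":
        return SequenceExampleSketch(feature_config_dict, logger)
    # Assumption: unknown types are rejected explicitly.
    raise ValueError(f"Unsupported tfrecord_type: {tfrecord_type!r}")


config = get_instance_sketch({"features": []}, "sequence_example")
print(type(config).__name__)
```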
-
extract_features
()¶ Extract the features from the input feature config dictionary and assign to relevant FeatureConfig attributes
-
log_initialization
()¶ Log initial state of FeatureConfig object after extracting all the attributes
-
get_query_key
(key: str = None)¶ Getter method for query_key in FeatureConfig object. Can additionally be used to fetch only a particular value from the dict.
Parameters: key (str) – Key in the query_key feature configuration whose value is to be fetched
Returns: Query key value, or the entire config dictionary if no key is passed
Return type: str or int or bool or dict
-
get_label
(key: str = None)¶ Getter method for label in FeatureConfig object. Can additionally be used to fetch only a particular value from the dict.
Parameters: key (str) – Key in the label feature configuration whose value is to be fetched
Returns: Label value, or the entire config dictionary if no key is passed
Return type: str or int or bool or dict
-
get_aux_label
(key: str = None)¶ Getter method for the auxiliary label (aux_label) in FeatureConfig object. Can additionally be used to fetch only a particular value from the dict.
Parameters: key (str) – Key in the auxiliary label feature configuration whose value is to be fetched
Returns: Auxiliary label value, or the entire config dictionary if no key is passed
Return type: str or int or bool or dict
-
get_mask
(key: str = None)¶ Getter method for mask in FeatureConfig object. Can additionally be used to fetch only a particular value from the dict.
Parameters: key (str) – Key in the mask feature configuration whose value is to be fetched
Returns: Mask value, or the entire config dictionary if no key is passed
Return type: str or int or bool or dict
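The single-feature getters above (get_query_key, get_label, get_aux_label, get_mask) share one pattern: return the whole configuration dictionary when no key is given, otherwise return one value from it. A minimal sketch of that pattern follows. The helper name get_config_value and the sample label configuration are hypothetical, and returning None for a missing key is an assumption.

```python
from typing import Any, Optional


def get_config_value(feature_info: dict, key: Optional[str] = None) -> Any:
    """Return the full config dict, or a single value when a key is given."""
    if key is None:
        return feature_info
    # Assumption: a missing key yields None rather than raising KeyError.
    return feature_info.get(key)


# Hypothetical label configuration used only for this example.
label = {"name": "clicked", "dtype": "int64", "trainable": False}

print(get_config_value(label))          # the whole dictionary
print(get_config_value(label, "name"))  # a single value from it
```

The mixed return type (str, int, bool or dict) listed for these getters follows directly from this design: the type depends on whether a key is passed and on what that key holds.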
-
get_feature_by_node_name
(name: str)¶ Getter method for feature by node name in FeatureConfig object
Parameters: name (str) – Node name of the feature to fetch
Returns: Feature config dictionary for the node name passed
Return type: dict
-
get_feature
(name: str)¶ Getter method for feature in FeatureConfig object
Parameters: name (str) – Name of the feature to fetch
Returns: Feature config dictionary for the name of the feature passed
Return type: dict
-
feature_exists
(name: str, trainable=True)¶ Check if a feature exists in FeatureConfig object
Parameters: name (str) – Name of the feature to check
Returns: Whether the feature exists
Return type: bool
-
set_feature
(name: str, new_feature_info: dict)¶ Setter method to set the feature_info of a feature in the FeatureConfig as specified by the name argument
Parameters: - name (str) – name of feature whose feature_info is to be updated
- new_feature_info (dict) – dictionary used to set the feature_info for the feature with specified name
-
get_all_features
(key: str = None, include_label: bool = True, include_mask: bool = True)¶ Getter method for all_features in FeatureConfig object. Can additionally be used to fetch only a particular value from each feature's dict.
Parameters: - key (str, optional) – Name of the configuration key to be fetched. If None, then entire dictionary for the feature is returned
- include_label (bool, optional) – Include label in list of features returned
- include_mask (bool, optional) – Include mask in the list of features returned. Only applicable with SequenceExampleFeatureConfig currently
Returns: List of feature configuration dictionaries or values for all features in FeatureConfig
Return type: list
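The behaviour of the key, include_label and include_mask parameters can be sketched with plain dictionaries. The function below is a hypothetical stand-in, not the ml4ir method: identifying the label and mask entries by name and the sample feature list are assumptions made so the example is self-contained.

```python
from typing import Any, List, Optional


def get_all_features_sketch(all_features: List[dict], label_name: str,
                            mask_name: Optional[str] = None,
                            key: Optional[str] = None,
                            include_label: bool = True,
                            include_mask: bool = True) -> List[Any]:
    """Filter out label/mask entries on request, then project onto `key`."""
    selected = []
    for feature_info in all_features:
        if not include_label and feature_info["name"] == label_name:
            continue
        if not include_mask and mask_name is not None and feature_info["name"] == mask_name:
            continue
        selected.append(feature_info)
    if key is None:
        return selected  # full configuration dictionaries
    return [feature_info.get(key) for feature_info in selected]


# Hypothetical feature list used only for this example.
all_features = [
    {"name": "query_id", "dtype": "string"},
    {"name": "clicked", "dtype": "int64"},
    {"name": "mask", "dtype": "int64"},
    {"name": "text_match_score", "dtype": "float"},
]

names_without_label = get_all_features_sketch(
    all_features, "clicked", "mask", key="name", include_label=False)
print(names_without_label)
```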
-
get_train_features
(key: str = None)¶ Getter method for train_features in FeatureConfig object. Can additionally be used to fetch only a particular value from each feature's dict.
Parameters: key (str, optional) – Name of the configuration key to be fetched. If None, then the entire dictionary for each feature is returned
Returns: List of feature configuration dictionaries or values for trainable features in FeatureConfig
Return type: list
-
get_metadata_features
(key: str = None)¶ Getter method for metadata_features in FeatureConfig object. Can additionally be used to fetch only a particular value from each feature's dict.
Parameters: key (str, optional) – Name of the configuration key to be fetched. If None, then the entire dictionary for each feature is returned
Returns: List of feature configuration dictionaries or values for metadata features in FeatureConfig
Return type: list
-
get_features_to_log
(key: str = None)¶ Getter method for features_to_log in FeatureConfig object. Can additionally be used to fetch only a particular value from each feature's dict.
Parameters: key (str, optional) – Name of the configuration key to be fetched. If None, then the entire dictionary for each feature is returned
Returns: List of feature configuration dictionaries or values for features to be logged at inference
Return type: list
-
get_group_metrics_keys
(key: str = None)¶ Getter method for group_metrics_keys in FeatureConfig object. Can additionally be used to fetch only a particular value from each feature's dict.
Parameters: key (str, optional) – Name of the configuration key to be fetched. If None, then the entire dictionary for each feature is returned
Returns: List of feature configuration dictionaries or values for features used to compute groupwise metrics
Return type: list
-
get_dtype
(feature_info: dict)¶ Retrieve data type of a feature
Parameters: feature_info (dict) – Dictionary containing configuration for the feature
Returns: Data type of the feature
Return type: str
-
get_default_value
(feature_info)¶ Retrieve default value of a feature
Parameters: feature_info (dict) – Dictionary containing configuration for the feature
Returns: Default value of the feature
Return type: str or int or float
-
create_dummy_protobuf
(num_records=1, required_only=False)¶ Generate a dummy TFRecord protobuffer with dummy values
Parameters: - num_records (int) – Number of records or sequence features to generate per TFRecord message
- required_only (bool) – Whether to include only the fields marked as required
Returns: Example or SequenceExample object with dummy values generated from the FeatureConfig
Return type: protobuffer object
-
get_hyperparameter_dict
()¶ Create hyperparameter configs to track model metadata for best model selection. Unwraps the feature config for each of the features to add preprocessing_info and feature_layer_info as key-value pairs that can be tracked across the experiment. This can be used to identify the values that were set for the different feature layers in a given experiment, and is used during best model selection and hyperparameter optimization.
Returns: Flattened dictionary of important configuration keys and values that can be used for tracking the experiment run
Return type: dict
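To make the idea of "unwrapping" nested configuration into flat key-value pairs concrete, here is a small sketch. The dotted key scheme (feature name, section, nested keys), the helper names, and the sample values such as to_lower and embedding are all hypothetical; the keys that ml4ir actually produces may be named differently.

```python
from typing import Any, Dict, List


def flatten(prefix: str, value: Any, out: Dict[str, Any]) -> None:
    """Recursively flatten dicts and lists into dotted keys."""
    if isinstance(value, dict):
        for sub_key, sub_value in value.items():
            flatten(f"{prefix}.{sub_key}", sub_value, out)
    elif isinstance(value, list):
        for index, item in enumerate(value):
            flatten(f"{prefix}.{index}", item, out)
    else:
        out[prefix] = value


def hyperparameter_dict_sketch(features: List[dict]) -> Dict[str, Any]:
    """Collect preprocessing_info and feature_layer_info for every feature."""
    out: Dict[str, Any] = {}
    for feature_info in features:
        for section in ("preprocessing_info", "feature_layer_info"):
            if section in feature_info:
                flatten(f"{feature_info['name']}.{section}", feature_info[section], out)
    return out


# Hypothetical feature configuration used only for this example.
features = [{
    "name": "query_text",
    "preprocessing_info": [{"fn": "to_lower"}],
    "feature_layer_info": {"fn": "embedding", "args": {"size": 128}},
}]

for flat_key, flat_value in hyperparameter_dict_sketch(features).items():
    print(flat_key, "=", flat_value)
```

A flat dictionary of scalars is convenient here because experiment trackers and hyperparameter search tools generally compare runs key by key.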
-
ExampleFeatureConfig¶
-
class
ml4ir.base.features.feature_config.
ExampleFeatureConfig
(features_dict, logger: Optional[logging.Logger] = None)¶ Bases:
ml4ir.base.features.feature_config.FeatureConfig
Class that defines the features and their configurations used for training, evaluating and serving a RelevanceModel on ml4ir for Example data
-
features_dict
¶ Dictionary of features containing the configuration for every feature in the model. This dictionary is used to define the FeatureConfig object.
Type: dict
-
logger
¶ Logging handler to log progress messages
Type: Logging object
-
query_key
¶ Dictionary containing the feature configuration for the unique data point ID, query key
Type: dict
-
label
¶ Dictionary containing the feature configuration for the label field for training and evaluating the model
Type: dict
-
features
¶ List of dictionaries containing configurations for all the features excluding query_key and label
Type: list of dict
-
all_features
¶ List of dictionaries containing configurations for all the features including query_key and label
Type: list of dict
-
train_features
¶ List of dictionaries containing configurations for all the features which are used for training, identified by trainable=True
Type: list of dict
-
metadata_features
¶ List of dictionaries containing configurations for all the features which are NOT used for training, identified by trainable=False. These can be used for computing custom losses and metrics.
Type: list of dict
-
features_to_log
¶ List of dictionaries containing configurations for all the features which will be logged when running model.predict(), identified using log_at_inference=True
Type: list of dict
-
group_metrics_keys
¶ List of dictionaries containing configurations for all the features which will be used to compute groupwise metrics
Type: list of dict
Constructor to instantiate a FeatureConfig object
Parameters: - features_dict (dict) – Dictionary containing the feature configuration for each of the model features
- logger (Logging object, optional) – Logging object handler for logging progress messages
-
create_dummy_protobuf
(num_records=1, required_only=False)¶ Create an Example protobuffer with dummy values
-
SequenceExampleFeatureConfig¶
-
class
ml4ir.base.features.feature_config.
SequenceExampleFeatureConfig
(features_dict, logger)¶ Bases:
ml4ir.base.features.feature_config.FeatureConfig
Class that defines the features and their configurations used for training, evaluating and serving a RelevanceModel on ml4ir for SequenceExample data
-
features_dict
¶ Dictionary of features containing the configuration for every feature in the model. This dictionary is used to define the FeatureConfig object.
Type: dict
-
logger
¶ Logging handler to log progress messages
Type: Logging object
-
query_key
¶ Dictionary containing the feature configuration for the unique data point ID, query key
Type: dict
-
label
¶ Dictionary containing the feature configuration for the label field for training and evaluating the model
Type: dict
-
rank
¶ Dictionary containing the feature configuration for the rank field for training and evaluating the model. rank is used to assign an ordering to the sequences in the SequenceExample
Type: dict
-
mask
¶ Dictionary containing the feature configuration for the mask field for training and evaluating the model. mask is used to identify which sequence features are padded. A value of 1 represents an existing sequence feature and 0 represents a padded sequence feature.
Type: dict
-
features
¶ List of dictionaries containing configurations for all the features excluding query_key and label
Type: list of dict
-
all_features
¶ List of dictionaries containing configurations for all the features including query_key and label
Type: list of dict
-
context_features
¶ List of dictionaries containing configurations for all the features which represent the features common to the entire sequence in a protobuf message
Type: list of dict
-
sequence_features
¶ List of dictionaries containing configurations for all the features which are unique to each sequence record in a protobuf message
Type: list of dict
-
train_features
¶ List of dictionaries containing configurations for all the features which are used for training, identified by trainable=True
Type: list of dict
-
metadata_features
¶ List of dictionaries containing configurations for all the features which are NOT used for training, identified by trainable=False. These can be used for computing custom losses and metrics.
Type: list of dict
-
features_to_log
¶ List of dictionaries containing configurations for all the features which will be logged when running model.predict(), identified using log_at_inference=True
Type: list of dict
-
group_metrics_keys
¶ List of dictionaries containing configurations for all the features which will be used to compute groupwise metrics
Type: list of dict
Constructor to instantiate a FeatureConfig object
Parameters: - features_dict (dict) – Dictionary containing the feature configuration for each of the model features
- logger (Logging object, optional) – Logging object handler for logging progress messages
-
initialize_features
()¶ Initialize the feature attributes with empty lists accordingly
-
extract_features
()¶ Extract the features from the input feature config dictionary and assign to relevant FeatureConfig attributes
-
get_context_features
(key: str = None)¶ Getter method for context_features in FeatureConfig object. Can additionally be used to fetch only a particular value from each feature's dict.
Parameters: key (str, optional) – Name of the configuration key to be fetched. If None, then the entire dictionary for each feature is returned
Returns: List of feature configuration dictionaries or values for context features common to all sequences
Return type: list
-
get_sequence_features
(key: str = None)¶ Getter method for sequence_features in FeatureConfig object. Can additionally be used to fetch only a particular value from each feature's dict.
Parameters: key (str, optional) – Name of the configuration key to be fetched. If None, then the entire dictionary for each feature is returned
Returns: List of feature configuration dictionaries or values for sequence features unique to each sequence
Return type: list
-
log_initialization
()¶ Log initial state of FeatureConfig object after extracting all the attributes
-
generate_mask
()¶ Add mask information used to flag padded records. To create a batch of sequence examples from n TFRecords, all of them must have the same number of sequence records, so each one is padded to a fixed max_sequence_size. The padded sequence records should not contribute to metrics and losses, so a boolean mask is maintained to tell ml4ir which sequence records were originally present.
This method adds the feature_info for the mask feature, because the mask is not present in the data itself.
Returns: Dictionary configuration for the mask field that captures which sequence records have been masked in a SequenceExample message
Return type: dict
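The role of the mask can be illustrated without TensorFlow. The sketch below pads a list of per-record values to max_sequence_size, builds the matching mask (1 for an existing record, 0 for a padded one, as described for the mask attribute), and uses it to exclude padded positions from an average. The helper names, the pad value of 0.0, and the truncation of over-long sequences are assumptions made for this example.

```python
from typing import List, Tuple


def pad_with_mask(values: List[float], max_sequence_size: int,
                  pad_value: float = 0.0) -> Tuple[List[float], List[int]]:
    """Pad (or truncate) to a fixed length and return the matching mask."""
    kept = values[:max_sequence_size]  # assumption: extra records are dropped
    num_padded = max_sequence_size - len(kept)
    padded = kept + [pad_value] * num_padded
    mask = [1] * len(kept) + [0] * num_padded
    return padded, mask


def masked_mean(values: List[float], mask: List[int]) -> float:
    """Average only the positions that were originally present."""
    total = sum(v * m for v, m in zip(values, mask))
    count = sum(mask)
    return total / count if count else 0.0


scores, mask = pad_with_mask([0.2, 0.9, 0.4], max_sequence_size=5)
print(scores, mask)
print(masked_mean(scores, mask))
```

Without the mask, the two padded zeros would be averaged in and pull the result down, which is the distortion of metrics and losses that the mask is meant to prevent.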
-
get_rank
(key: str = None)¶ Getter method for rank in FeatureConfig object. Can additionally be used to fetch only a particular value from the dict.
Parameters: key (str) – Key in the rank feature configuration whose value is to be fetched
Returns: Rank value, or the entire config dictionary if no key is passed
Return type: str or int or bool or dict
-
get_mask
(key: str = None)¶ Getter method for mask in FeatureConfig object. Can additionally be used to fetch only a particular value from the dict.
Parameters: key (str) – Key in the mask feature configuration whose value is to be fetched
Returns: Mask value, or the entire config dictionary if no key is passed
Return type: str or int or bool or dict
-
create_dummy_protobuf
(num_records=1, required_only=False)¶ Generate a dummy TFRecord protobuffer with dummy values
Parameters: - num_records (int) – Number of records or sequence features to generate per TFRecord message
- required_only (bool) – Whether to include only the fields marked as required
Returns: Example or SequenceExample object with dummy values generated from the FeatureConfig
Return type: protobuffer object
-