Feature Configuration
FeatureConfig
-
class ml4ir.base.features.feature_config.FeatureConfig(features_dict, logger: Optional[logging.Logger] = None)
Bases: object
Class that defines the features and their configurations used for training, evaluating, and serving a RelevanceModel on ml4ir.
-
features_dict
Dictionary containing the configuration for every feature in the model. This dictionary is used to define the FeatureConfig object.
Type: dict
-
logger
Logging handler used to log progress messages
Type: Logging object
-
query_key
Dictionary containing the feature configuration for the query key, the unique ID of each data point
Type: dict
-
label
Dictionary containing the feature configuration for the label field used to train and evaluate the model
Type: dict
-
mask
Dictionary containing the feature configuration for the computed mask field, which is used to identify padded values
Type: dict
-
features
List of dictionaries containing the configurations of all features excluding query_key and label
Type: list of dict
-
all_features
List of dictionaries containing the configurations of all features including query_key and label
Type: list of dict
-
train_features
List of dictionaries containing the configurations of all features used for training, identified by trainable=True
Type: list of dict
-
metadata_features
List of dictionaries containing the configurations of all features that are NOT used for training, identified by trainable=False. These can be used to compute custom losses and metrics.
Type: list of dict
-
features_to_log
List of dictionaries containing the configurations of all features that will be logged when running model.predict(), identified by log_at_inference=True
Type: list of dict
-
group_metrics_keys
List of dictionaries containing the configurations of all features used to compute groupwise metrics
Type: list of dict
-
secondary_labels
List of dictionaries containing the configurations of all features used as secondary labels to compute secondary metrics. Implementing the secondary metrics and deciding how the secondary labels are used is left to ml4ir users.
Type: list of dict
Notes
Abstract class that is overridden by ExampleFeatureConfig and SequenceExampleFeatureConfig for the respective TFRecord types.
Constructor to instantiate a FeatureConfig object
Parameters: - features_dict (dict) – Dictionary containing the feature configuration for each of the model features
- logger (Logging object, optional) – Logging handler used to log progress messages
-
initialize_features()
Initialize the feature attributes as empty lists
-
static get_instance(feature_config_dict: dict, tfrecord_type: str, logger: logging.Logger)
Factory method that creates a FeatureConfig object from a dictionary of feature configurations, based on the TFRecord type
Parameters: - feature_config_dict (dict) – Dictionary containing the definitions of all features for the model
- tfrecord_type ({"example", "sequence_example"}) – TFRecord message type used for the ml4ir RelevanceModel
- logger (Logging object) – Logging handler used to log status and progress messages
Returns: ExampleFeatureConfig or SequenceExampleFeatureConfig object computed from the feature configuration dictionary
Return type: FeatureConfig object
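The factory dispatch described above can be sketched as follows. This is a minimal, self-contained illustration rather than ml4ir's implementation: the stand-in classes are simplified placeholders, and raising ValueError for an unknown tfrecord_type is an assumption.

```python
import logging
from typing import Optional


class FeatureConfig:
    """Simplified stand-in for the base class (illustration only)."""

    def __init__(self, features_dict: dict, logger: Optional[logging.Logger] = None):
        self.features_dict = features_dict
        self.logger = logger


class ExampleFeatureConfig(FeatureConfig):
    pass


class SequenceExampleFeatureConfig(FeatureConfig):
    pass


def get_instance(feature_config_dict: dict, tfrecord_type: str, logger: logging.Logger) -> FeatureConfig:
    """Choose the FeatureConfig subclass that matches the TFRecord message type."""
    subclasses = {
        "example": ExampleFeatureConfig,
        "sequence_example": SequenceExampleFeatureConfig,
    }
    if tfrecord_type not in subclasses:
        # Assumption: unsupported message types are rejected explicitly
        raise ValueError(f"Unsupported tfrecord_type: {tfrecord_type!r}")
    return subclasses[tfrecord_type](feature_config_dict, logger)


logger = logging.getLogger("feature_config")
config = get_instance({"features": []}, "sequence_example", logger)
print(type(config).__name__)  # SequenceExampleFeatureConfig
```

Callers depend only on the FeatureConfig interface, so the TFRecord type can be switched in configuration without changing the code that consumes the returned object.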
-
extract_features()
Extract the features from the input feature config dictionary and assign them to the relevant FeatureConfig attributes
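As a rough sketch of the extraction step, the function below splits a features dictionary into the attribute lists documented above. It assumes a simplified layout with top-level query_key, label and features entries. Of the per-feature flags, only trainable and log_at_inference are named in this reference. The names is_group_metric_key and is_secondary_label, along with all default values, are assumptions, not ml4ir's confirmed schema.

```python
def extract_features(features_dict: dict) -> dict:
    """Partition a feature configuration dict into FeatureConfig-style lists (simplified sketch)."""
    query_key = features_dict["query_key"]
    label = features_dict["label"]
    features = list(features_dict.get("features", []))
    all_features = [query_key, label] + features

    return {
        "query_key": query_key,
        "label": label,
        "features": features,
        "all_features": all_features,
        # Assumption: trainable defaults to True when the flag is absent
        "train_features": [f for f in features if f.get("trainable", True)],
        "metadata_features": [f for f in features if not f.get("trainable", True)],
        "features_to_log": [f for f in all_features if f.get("log_at_inference", False)],
        # Hypothetical flag names for the next two lists
        "group_metrics_keys": [f for f in all_features if f.get("is_group_metric_key", False)],
        "secondary_labels": [f for f in all_features if f.get("is_secondary_label", False)],
    }


config = {
    "query_key": {"name": "query_id", "trainable": False, "log_at_inference": True},
    "label": {"name": "clicked", "trainable": False},
    "features": [
        {"name": "query_text"},
        {"name": "domain", "trainable": False, "is_group_metric_key": True},
    ],
}
parts = extract_features(config)
print([f["name"] for f in parts["train_features"]])     # ['query_text']
print([f["name"] for f in parts["metadata_features"]])  # ['domain']
print([f["name"] for f in parts["features_to_log"]])    # ['query_id']
```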
-
log_initialization()
Log the initial state of the FeatureConfig object after all attributes have been extracted
-
get_query_key(key: str = None)
Getter method for query_key in the FeatureConfig object. Can also be used to fetch a single value from the dict.
Parameters: key (str, optional) – Key of the value to fetch from the query_key feature configuration
Returns: Query key value, or the entire config dictionary if key is not passed
Return type: str or int or bool or dict
-
get_label(key: str = None)
Getter method for label in the FeatureConfig object. Can also be used to fetch a single value from the dict.
Parameters: key (str, optional) – Key of the value to fetch from the label feature configuration
Returns: Label value, or the entire config dictionary if key is not passed
Return type: str or int or bool or dict
-
get_mask(key: str = None)
Getter method for mask in the FeatureConfig object. Can also be used to fetch a single value from the dict.
Parameters: key (str, optional) – Key of the value to fetch from the mask feature configuration
Returns: Mask value, or the entire config dictionary if key is not passed
Return type: str or int or bool or dict
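get_query_key, get_label and get_mask all return either the whole config dict or a single value from it. The helper below, get_config_value, is a hypothetical illustration of that pattern. Returning None for a missing key via dict.get is an assumption.

```python
from typing import Any, Optional


def get_config_value(feature_info: dict, key: Optional[str] = None) -> Any:
    """Return the whole feature config, or one value from it when key is given."""
    if key is None:
        return feature_info
    # Assumption: a missing key yields None instead of raising KeyError
    return feature_info.get(key)


label = {"name": "clicked", "dtype": "int64", "trainable": False}
print(get_config_value(label))          # {'name': 'clicked', 'dtype': 'int64', 'trainable': False}
print(get_config_value(label, "name"))  # clicked
```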
-
get_feature_by_node_name(name: str)
Getter method for a feature in the FeatureConfig object, looked up by its node name
Parameters: name (str) – Node name of the feature to fetch
Returns: Feature config dictionary for the feature with the given node name
Return type: dict
-
get_feature(name: str)
Getter method for a feature in the FeatureConfig object
Parameters: name (str) – Name of the feature to fetch
Returns: Feature config dictionary for the feature with the given name
Return type: dict
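get_feature and get_feature_by_node_name differ only in the field they match on. The sketch below assumes each feature config has a name and an optional node_name that falls back to name. It also assumes an unknown feature raises KeyError. Neither behaviour is confirmed by this reference.

```python
from typing import List


def get_feature(all_features: List[dict], name: str) -> dict:
    """Find a feature config by its name field."""
    for feature_info in all_features:
        if feature_info["name"] == name:
            return feature_info
    raise KeyError(f"No feature named {name!r}")


def get_feature_by_node_name(all_features: List[dict], node_name: str) -> dict:
    """Find a feature config by its graph node name (assumed to default to name)."""
    for feature_info in all_features:
        if feature_info.get("node_name", feature_info["name"]) == node_name:
            return feature_info
    raise KeyError(f"No feature with node name {node_name!r}")


features = [
    {"name": "query_text", "node_name": "query_text_tokens"},
    {"name": "domain"},
]
print(get_feature(features, "query_text")["node_name"])      # query_text_tokens
print(get_feature_by_node_name(features, "domain")["name"])  # domain
```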
-
feature_exists(name: str, trainable=True)
Check whether a feature exists in the FeatureConfig object
Parameters: name (str) – Name of the feature to check
Returns: Whether the feature exists
Return type: bool
-
set_feature(name: str, new_feature_info: dict)
Setter method that sets the feature_info of the feature in the FeatureConfig specified by the name argument
Parameters: - name (str) – Name of the feature whose feature_info is to be updated
- new_feature_info (dict) – Dictionary used to set the feature_info of the feature with the specified name
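A minimal sketch of this setter, assuming features are kept in a list and matched on their name field. The in-place replacement and the KeyError for unknown names are assumptions. A real FeatureConfig may also need to refresh derived lists such as train_features, which this sketch does not attempt.

```python
from typing import List


def set_feature(features: List[dict], name: str, new_feature_info: dict) -> None:
    """Replace, in place, the config of the feature whose name matches."""
    for index, feature_info in enumerate(features):
        if feature_info["name"] == name:
            features[index] = new_feature_info
            return
    raise KeyError(f"No feature named {name!r}")


features = [{"name": "domain", "trainable": False}]
set_feature(features, "domain", {"name": "domain", "trainable": True})
print(features)  # [{'name': 'domain', 'trainable': True}]
```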
-
get_all_features(key: str = None, include_label: bool = True, include_mask: bool = True)
Getter method for all_features in the FeatureConfig object. Can also be used to fetch a single value from each feature's dict.
Parameters: - key (str, optional) – Name of the configuration key to fetch. If None, the entire dictionary for each feature is returned
- include_label (bool, optional) – Whether to include the label in the list of features returned
- include_mask (bool, optional) – Whether to include the mask in the list of features returned. Currently only applicable to SequenceExampleFeatureConfig
Returns: List of feature configuration dictionaries, or values, for all features in the FeatureConfig
Return type: list
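The filtering done by get_all_features can be sketched on plain dicts as below. The sketch identifies the label and mask configs by object identity. When key is given, it projects each config onto that key with dict.get. Both mechanics are assumptions made for illustration.

```python
from typing import Any, List, Optional


def get_all_features(
    all_features: List[dict],
    label: dict,
    mask: Optional[dict] = None,
    key: Optional[str] = None,
    include_label: bool = True,
    include_mask: bool = True,
) -> List[Any]:
    """Return feature configs, or one value from each, with optional label/mask filtering."""
    selected = []
    for feature_info in all_features:
        if not include_label and feature_info is label:
            continue
        if not include_mask and mask is not None and feature_info is mask:
            continue
        selected.append(feature_info)
    if key is None:
        return selected
    return [feature_info.get(key) for feature_info in selected]


query_key = {"name": "query_id"}
label = {"name": "clicked"}
mask = {"name": "mask"}
features = [query_key, {"name": "query_text"}, label, mask]

print(get_all_features(features, label, mask, key="name"))
# ['query_id', 'query_text', 'clicked', 'mask']
print(get_all_features(features, label, mask, key="name", include_label=False, include_mask=False))
# ['query_id', 'query_text']
```

The other list getters (get_train_features, get_metadata_features, and so on) follow the same key projection, only over a different list.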
-
get_train_features(key: str = None)
Getter method for train_features in the FeatureConfig object. Can also be used to fetch a single value from each feature's dict.
Parameters: key (str, optional) – Name of the configuration key to fetch. If None, the entire dictionary for each feature is returned
Returns: List of feature configuration dictionaries, or values, for the trainable features in the FeatureConfig
Return type: list
-
get_metadata_features(key: str = None)
Getter method for metadata_features in the FeatureConfig object. Can also be used to fetch a single value from each feature's dict.
Parameters: key (str, optional) – Name of the configuration key to fetch. If None, the entire dictionary for each feature is returned
Returns: List of feature configuration dictionaries, or values, for the metadata features in the FeatureConfig
Return type: list
-
get_features_to_log(key: str = None)
Getter method for features_to_log in the FeatureConfig object. Can also be used to fetch a single value from each feature's dict.
Parameters: key (str, optional) – Name of the configuration key to fetch. If None, the entire dictionary for each feature is returned
Returns: List of feature configuration dictionaries, or values, for the features to be logged at inference
Return type: list
-
get_group_metrics_keys(key: str = None)
Getter method for group_metrics_keys in the FeatureConfig object. Can also be used to fetch a single value from each feature's dict.
Parameters: key (str, optional) – Name of the configuration key to fetch. If None, the entire dictionary for each feature is returned
Returns: List of feature configuration dictionaries, or values, for the features used to compute groupwise metrics
Return type: list
-
get_secondary_labels(key: str = None)
Getter method for secondary_labels in the FeatureConfig object. Can also be used to fetch a single value from each feature's dict.
Parameters: key (str, optional) – Name of the configuration key to fetch. If None, the entire dictionary for each feature is returned
Returns: List of feature configuration dictionaries, or values, for the features used as secondary labels
Return type: list
-
get_dtype(feature_info: dict)
Retrieve the data type of a feature
Parameters: feature_info (dict) – Dictionary containing the configuration for the feature
Returns: Data type of the feature
Return type: str
-
get_default_value(feature_info)
Retrieve the default value of a feature
Parameters: feature_info (dict) – Dictionary containing the configuration for the feature
Returns: Default value of the feature
Return type: str or int or float
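A possible shape for these two lookups is sketched below. The sketch assumes the data type is stored under a dtype key and that a feature may declare its own default_value. The per-dtype fallbacks in FALLBACK_DEFAULTS are hypothetical, because this reference does not say where ml4ir reads default values from.

```python
from typing import Union

# Hypothetical fallbacks used when a feature does not declare a default_value
FALLBACK_DEFAULTS = {"int64": 0, "float": 0.0, "string": ""}


def get_dtype(feature_info: dict) -> str:
    """Return the data type string stored in the feature config."""
    return feature_info["dtype"]


def get_default_value(feature_info: dict) -> Union[str, int, float]:
    """Return the feature's explicit default, else a fallback for its dtype."""
    if "default_value" in feature_info:
        return feature_info["default_value"]
    return FALLBACK_DEFAULTS[get_dtype(feature_info)]


print(get_default_value({"name": "pos", "dtype": "int64"}))                             # 0
print(get_default_value({"name": "text", "dtype": "string", "default_value": "<pad>"}))  # <pad>
```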
-
define_inputs() → Dict[str, tensorflow.python.keras.engine.input_layer.Input]
Define the Keras input placeholder tensors for the TensorFlow model
Returns: Dictionary of TensorFlow graph input nodes
Return type: dict
-
create_dummy_protobuf(num_records=1, required_only=False)
Generate a dummy TFRecord protobuf with dummy values
Parameters: - num_records (int) – Number of records or sequence features to generate per TFRecord message
- required_only (bool) – Whether to generate values only for the required fields
Returns: Example or SequenceExample object with dummy values generated from the FeatureConfig
Return type: protobuf object
-
get_hyperparameter_dict()
Create hyperparameter configs to track model metadata for best model selection. Unwraps the feature config of each feature to add preprocessing_info and feature_layer_info as key-value pairs that can be tracked across the experiment. This identifies the values set for the different feature layers in a given experiment and is used during best model selection and hyperparameter optimization.
Returns: Flattened dictionary of important configuration keys and values that can be used to track the experiment run
Return type: dict
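The flattening step can be sketched as follows. The dotted key format (feature name, section, then nested setting names) is an assumption chosen for illustration, and ml4ir's actual key naming may differ. Nested dicts are flattened recursively. Any other value, including lists, is stored as a single leaf.

```python
from typing import Any, Dict, List


def _flatten(prefix: str, value: Any, out: Dict[str, Any]) -> None:
    """Recursively flatten nested dicts into dotted keys; other values are leaves."""
    if isinstance(value, dict):
        for key, sub_value in value.items():
            _flatten(f"{prefix}.{key}", sub_value, out)
    else:
        out[prefix] = value


def get_hyperparameter_dict(all_features: List[dict]) -> Dict[str, Any]:
    """Collect preprocessing_info and feature_layer_info of every feature into one flat dict."""
    flat: Dict[str, Any] = {}
    for feature_info in all_features:
        for section in ("preprocessing_info", "feature_layer_info"):
            if section in feature_info:
                _flatten(f"{feature_info['name']}.{section}", feature_info[section], flat)
    return flat


features = [
    {
        "name": "query_text",
        "feature_layer_info": {"type": "numeric", "args": {"embedding_size": 32}},
    },
    {"name": "query_id"},  # no tracked sections, so it contributes nothing
]
print(get_hyperparameter_dict(features))
# {'query_text.feature_layer_info.type': 'numeric', 'query_text.feature_layer_info.args.embedding_size': 32}
```

A flat dictionary like this can be logged as run parameters by most experiment trackers, which is why nesting is collapsed.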
-
ExampleFeatureConfig
-
class ml4ir.base.features.feature_config.ExampleFeatureConfig(features_dict, logger: Optional[logging.Logger] = None)
Bases: ml4ir.base.features.feature_config.FeatureConfig
Class that defines the features and their configurations used for training, evaluating, and serving a RelevanceModel on ml4ir for Example data
-
features_dict
Dictionary containing the configuration for every feature in the model. This dictionary is used to define the FeatureConfig object.
Type: dict
-
logger
Logging handler used to log progress messages
Type: Logging object
-
query_key
Dictionary containing the feature configuration for the query key, the unique ID of each data point
Type: dict
-
label
Dictionary containing the feature configuration for the label field used to train and evaluate the model
Type: dict
-
features
List of dictionaries containing the configurations of all features excluding query_key and label
Type: list of dict
-
all_features
List of dictionaries containing the configurations of all features including query_key and label
Type: list of dict
-
train_features
List of dictionaries containing the configurations of all features used for training, identified by trainable=True
Type: list of dict
-
metadata_features
List of dictionaries containing the configurations of all features that are NOT used for training, identified by trainable=False. These can be used to compute custom losses and metrics.
Type: list of dict
-
features_to_log
List of dictionaries containing the configurations of all features that will be logged when running model.predict(), identified by log_at_inference=True
Type: list of dict
-
group_metrics_keys
List of dictionaries containing the configurations of all features used to compute groupwise metrics
Type: list of dict
-
secondary_labels
List of dictionaries containing the configurations of all features used as secondary labels to compute secondary metrics. Implementing the secondary metrics and deciding how the secondary labels are used is left to ml4ir users.
Type: list of dict
Constructor to instantiate an ExampleFeatureConfig object
Parameters: - features_dict (dict) – Dictionary containing the feature configuration for each of the model features
- logger (Logging object, optional) – Logging handler used to log progress messages
-
create_dummy_protobuf(num_records=1, required_only=False)
Create an Example protobuf with dummy values
-
SequenceExampleFeatureConfig
-
class ml4ir.base.features.feature_config.SequenceExampleFeatureConfig(features_dict, logger)
Bases: ml4ir.base.features.feature_config.FeatureConfig
Class that defines the features and their configurations used for training, evaluating, and serving a RelevanceModel on ml4ir for SequenceExample data
-
features_dict
Dictionary containing the configuration for every feature in the model. This dictionary is used to define the FeatureConfig object.
Type: dict
-
logger
Logging handler used to log progress messages
Type: Logging object
-
query_key
Dictionary containing the feature configuration for the query key, the unique ID of each data point
Type: dict
-
label
Dictionary containing the feature configuration for the label field used to train and evaluate the model
Type: dict
-
rank
Dictionary containing the feature configuration for the rank field used to train and evaluate the model. rank assigns an ordering to the sequence records in the SequenceExample.
Type: dict
-
mask
Dictionary containing the feature configuration for the mask field used to train and evaluate the model. mask identifies which sequence records are padded: a value of 1 marks an existing sequence record and 0 marks a padded one.
Type: dict
-
features
List of dictionaries containing the configurations of all features excluding query_key and label
Type: list of dict
-
all_features
List of dictionaries containing the configurations of all features including query_key and label
Type: list of dict
-
context_features
List of dictionaries containing the configurations of all features that are common to the entire sequence in a protobuf message
Type: list of dict
-
sequence_features
List of dictionaries containing the configurations of all features that are unique to each sequence record
Type: list of dict
-
train_features
List of dictionaries containing the configurations of all features used for training, identified by trainable=True
Type: list of dict
-
metadata_features
List of dictionaries containing the configurations of all features that are NOT used for training, identified by trainable=False. These can be used to compute custom losses and metrics.
Type: list of dict
-
features_to_log
List of dictionaries containing the configurations of all features that will be logged when running model.predict(), identified by log_at_inference=True
Type: list of dict
-
group_metrics_keys
List of dictionaries containing the configurations of all features used to compute groupwise metrics
Type: list of dict
-
secondary_labels
List of dictionaries containing the configurations of all features used as secondary labels to compute secondary metrics. Implementing the secondary metrics and deciding how the secondary labels are used is left to ml4ir users.
Type: list of dict
Constructor to instantiate a SequenceExampleFeatureConfig object
Parameters: - features_dict (dict) – Dictionary containing the feature configuration for each of the model features
- logger (Logging object) – Logging handler used to log progress messages
-
initialize_features()
Initialize the feature attributes as empty lists
-
extract_features()
Extract the features from the input feature config dictionary and assign them to the relevant FeatureConfig attributes
-
get_context_features(key: str = None)
Getter method for context_features in the FeatureConfig object. Can also be used to fetch a single value from each feature's dict.
Parameters: key (str, optional) – Name of the configuration key to fetch. If None, the entire dictionary for each feature is returned
Returns: List of feature configuration dictionaries, or values, for the context features common to all sequence records
Return type: list
-
get_sequence_features(key: str = None)
Getter method for sequence_features in the FeatureConfig object. Can also be used to fetch a single value from each feature's dict.
Parameters: key (str, optional) – Name of the configuration key to fetch. If None, the entire dictionary for each feature is returned
Returns: List of feature configuration dictionaries, or values, for the sequence features unique to each sequence record
Return type: list
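For SequenceExample data, each feature belongs either to the context, shared by the whole message, or to the sequence, with one value per record. The sketch below assumes each feature config carries a tfrecord_type field whose values are "context" and "sequence". This reference does not name that field or those values, so treat both as assumptions.

```python
from typing import List, Tuple


def split_context_and_sequence(features: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Separate features shared by the whole SequenceExample from per-record features."""
    context_features, sequence_features = [], []
    for feature_info in features:
        tfrecord_type = feature_info["tfrecord_type"]  # assumed field name
        if tfrecord_type == "context":
            context_features.append(feature_info)  # one value per message
        elif tfrecord_type == "sequence":
            sequence_features.append(feature_info)  # one value per sequence record
        else:
            raise ValueError(f"Unknown tfrecord_type: {tfrecord_type!r}")
    return context_features, sequence_features


features = [
    {"name": "query_text", "tfrecord_type": "context"},
    {"name": "doc_title", "tfrecord_type": "sequence"},
    {"name": "clicked", "tfrecord_type": "sequence"},
]
context, sequence = split_context_and_sequence(features)
print([f["name"] for f in context])   # ['query_text']
print([f["name"] for f in sequence])  # ['doc_title', 'clicked']
```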
-
log_initialization()
Log the initial state of the FeatureConfig object after all attributes have been extracted
-
generate_mask()
Add the mask information used to flag padded records. To create a batch of sequence examples from n TFRecords, all of them must have the same number of sequence records, so the sequence records are padded to a fixed max_sequence_size. These padded records should not be used to compute metrics and losses, so a boolean mask is maintained to tell ml4ir which sequence records were originally present.
This method adds the feature_info for the mask feature, because the mask is not present in the input data.
Returns: Dictionary configuration for the mask field that captures which sequence records are padded in a SequenceExample message
Return type: dict
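The padding-and-mask idea described above can be illustrated with plain Python lists. This is a conceptual sketch of why the mask is needed, not ml4ir's code. The function name pad_with_mask, the padding value, and the truncation of over-long sequences are all assumptions.

```python
from typing import List, Tuple


def pad_with_mask(
    records: List[float], max_sequence_size: int, pad_value: float = 0.0
) -> Tuple[List[float], List[int]]:
    """Pad (or truncate) a sequence to max_sequence_size and return a 1/0 mask."""
    records = records[:max_sequence_size]  # assumption: longer sequences are truncated
    num_padding = max_sequence_size - len(records)
    padded = records + [pad_value] * num_padding
    mask = [1] * len(records) + [0] * num_padding  # 1 = real record, 0 = padding
    return padded, mask


# Two queries with different numbers of records, padded to a common size of 4
batch = [[0.9, 0.4, 0.1], [0.7]]
for scores in batch:
    print(pad_with_mask(scores, max_sequence_size=4))
# ([0.9, 0.4, 0.1, 0.0], [1, 1, 1, 0])
# ([0.7, 0.0, 0.0, 0.0], [1, 0, 0, 0])

# The mask keeps padded records out of a metric, e.g. a mean over real records only
padded, mask = pad_with_mask([0.9, 0.4, 0.1], max_sequence_size=4)
masked_mean = sum(s * m for s, m in zip(padded, mask)) / sum(mask)
```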
-
get_rank(key: str = None)
Getter method for rank in the FeatureConfig object. Can also be used to fetch a single value from the dict.
Parameters: key (str, optional) – Key of the value to fetch from the rank feature configuration
Returns: Rank value, or the entire config dictionary if key is not passed
Return type: str or int or bool or dict
-
get_mask(key: str = None)
Getter method for mask in the FeatureConfig object. Can also be used to fetch a single value from the dict.
Parameters: key (str, optional) – Key of the value to fetch from the mask feature configuration
Returns: Mask value, or the entire config dictionary if key is not passed
Return type: str or int or bool or dict
-
define_inputs() → Dict[str, tensorflow.python.keras.engine.input_layer.Input]
Define the Keras input placeholder tensors for the TensorFlow model
Returns: Dictionary of TensorFlow graph input nodes
Return type: dict
-
create_dummy_protobuf(num_records=1, required_only=False)
Generate a dummy TFRecord protobuf with dummy values
Parameters: - num_records (int) – Number of sequence records to generate per TFRecord message
- required_only (bool) – Whether to generate values only for the required fields
Returns: SequenceExample object with dummy values generated from the FeatureConfig
Return type: protobuf object
-