rockfish.labs.steps
Classes
Recommender
Uses dataset properties and returns Rockfish actions for the specified steps.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dataset_properties | DatasetProperties | A DatasetProperties object that stores information about a dataset. | required |
steps | Optional[list[AbstractStep]] | List of steps for which recommended Rockfish actions will be returned. If unspecified, actions will be returned for the following steps: HandleMissingValues, HandleAssociatedFields, ModelSelection. | None |
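The fallback behaviour of the steps parameter can be pictured with a minimal sketch. The classes below are empty stand-ins defined only for this illustration, not the real classes from rockfish.labs.steps, and `resolve_steps` is a hypothetical helper. The order of the default steps is assumed to follow the order in which they are listed in the description.

```python
from typing import Optional


# Empty stand-in classes for illustration only (not the rockfish.labs.steps classes).
class AbstractStep: ...
class HandleMissingValues(AbstractStep): ...
class HandleAssociatedFields(AbstractStep): ...
class HandlePiiFields(AbstractStep): ...
class ModelSelection(AbstractStep): ...


def resolve_steps(steps: Optional[list[AbstractStep]] = None) -> list[AbstractStep]:
    """Hypothetical helper: fall back to the documented default steps when steps is None."""
    if steps is None:
        # Default steps named in the parameter description (order assumed).
        return [HandleMissingValues(), HandleAssociatedFields(), ModelSelection()]
    return steps


default_names = [type(step).__name__ for step in resolve_steps()]
custom_names = [type(step).__name__ for step in resolve_steps([HandlePiiFields()])]
print(default_names)
print(custom_names)
```

HandlePiiFields is not among the documented defaults, so under this reading it has to be passed explicitly in steps when PII handling is wanted.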
HandleMissingValues
HandleAssociatedFields
Handles merging and splitting associated fields by adding JoinFields (pre-processing) actions and SplitField (post-processing) actions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
separator | Optional[str] | The separator to be used while merging and splitting fields. If unspecified, the default JoinFields/SplitField separator ";" will be used. | None |
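The role of the separator can be illustrated with plain Python string operations. This is a sketch of the merge-then-split idea only, not the JoinFields or SplitField implementation; `join_fields` and `split_field` are hypothetical helpers, and the sample values are made up.

```python
from typing import Optional

DEFAULT_SEPARATOR = ";"  # default separator stated in the parameter description


def join_fields(values: list[str], separator: Optional[str] = None) -> str:
    """Hypothetical helper: merge associated field values into one string."""
    sep = DEFAULT_SEPARATOR if separator is None else separator
    return sep.join(values)


def split_field(value: str, separator: Optional[str] = None) -> list[str]:
    """Hypothetical helper: split a merged value back into its parts."""
    sep = DEFAULT_SEPARATOR if separator is None else separator
    return value.split(sep)


row = ["10.0.0.1", "443"]        # made-up values of two associated fields
merged = join_fields(row)        # merge before training
restored = split_field(merged)   # split after generation
print(merged)
print(restored)
```

The round trip only works when the separator does not occur inside the field values themselves, which is one reason to pass a custom separator.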
HandlePiiFields
Handles anonymizing fields with personally identifiable info (PII) by adding Transform (pre-processing) actions that remap PII.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pii_type_to_remap | Optional[dict] | Maps a pii_type to the remap type used to transform fields with that pii_type. The pii_type should be one of the supported PII types: ["DATE_TIME", "EMAIL_ADDRESS", "IP_ADDRESS", "PHONE_NUMBER", "US_SSN", "CREDIT_CARD", "PERSON", "GENERAL_PII"]. The remap type should be one of the supported remap types: ["date", "email", "ip", "phone_number", "ssn", "credit_card", "name", "redact"]. Note that only string fields can currently be anonymized using "redact". For any pii_type that is not specified, the default mapping is used. | None |
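A mapping can be checked against the two supported lists before it is passed in. The sketch below is a standalone check, not part of the Rockfish API; `validate_pii_type_to_remap` is a hypothetical helper and the example mapping is illustrative.

```python
# Supported values copied from the parameter description above.
SUPPORTED_PII_TYPES = [
    "DATE_TIME", "EMAIL_ADDRESS", "IP_ADDRESS", "PHONE_NUMBER",
    "US_SSN", "CREDIT_CARD", "PERSON", "GENERAL_PII",
]
SUPPORTED_REMAP_TYPES = [
    "date", "email", "ip", "phone_number",
    "ssn", "credit_card", "name", "redact",
]


def validate_pii_type_to_remap(mapping: dict) -> None:
    """Hypothetical helper: raise ValueError on an unsupported key or value."""
    for pii_type, remap_type in mapping.items():
        if pii_type not in SUPPORTED_PII_TYPES:
            raise ValueError(f"unsupported pii_type: {pii_type!r}")
        if remap_type not in SUPPORTED_REMAP_TYPES:
            raise ValueError(f"unsupported remap type: {remap_type!r}")


# Illustrative mapping: override two PII types and leave the rest to the defaults.
pii_type_to_remap = {"EMAIL_ADDRESS": "email", "PERSON": "redact"}
validate_pii_type_to_remap(pii_type_to_remap)
```

The check covers only the names. Whether a field that maps to "redact" is actually a string field still has to be confirmed against the dataset.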
ModelSelection
Selects a Rockfish model and returns the corresponding Train and Generate actions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_type | ModelType | Rockfish model for which the Train and Generate actions will be returned. | None |
keep_session_keys | bool | If true, the train dataset's session_key field values are preserved in the synthetic dataset. If false, the session_key field will have session numbers. | False |