rockfish.labs.steps

Classes

Recommender

Uses dataset properties and returns Rockfish actions for the specified steps.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| dataset_properties | DatasetProperties | A DatasetProperties object that stores information about a dataset. | required |
| steps | Optional[list[AbstractStep]] | List of steps for which recommended Rockfish actions will be returned. If unspecified, actions will be returned for the following steps: HandleMissingValues, HandleAssociatedFields, ModelSelection. | None |

HandleMissingValues

Handles filling missing values in fields by adding FillNull (pre-processing) actions.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| dtype_to_fill_value | Optional[dict] | Maps a dtype to the value used to fill missing values in fields of that dtype. Each dtype should be a pyarrow DataType. | None |
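
To make the mapping concrete, here is a minimal, dependency-free sketch of the idea behind dtype_to_fill_value. It is not Rockfish's implementation: plain dtype-name strings stand in for pyarrow DataType keys, fill_missing is a hypothetical helper, and leaving unmapped dtypes untouched is an assumption of this sketch rather than documented behavior.

```python
from typing import Any, Optional

# Hypothetical stand-in for dtype_to_fill_value. With Rockfish the keys would be
# pyarrow DataType objects; strings are used here so the sketch has no dependencies.
dtype_to_fill_value: dict[str, Any] = {"int64": 0, "string": "unknown"}


def fill_missing(values: list[Optional[Any]], dtype: str,
                 fill_values: dict[str, Any]) -> list[Any]:
    """Replace None entries with the fill value registered for dtype.

    Assumption of this sketch: a dtype without an entry is left unchanged.
    """
    if dtype not in fill_values:
        return list(values)
    fill = fill_values[dtype]
    return [fill if v is None else v for v in values]


ages = fill_missing([31, None, 47], "int64", dtype_to_fill_value)
cities = fill_missing(["Oslo", None], "string", dtype_to_fill_value)
scores = fill_missing([0.5, None], "float64", dtype_to_fill_value)  # no entry for float64
```

Keying the mapping by dtype rather than by field name means one entry covers every field of that type.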

HandleAssociatedFields

Handles merging and splitting associated fields by adding JoinFields (pre-processing) actions and SplitField (post-processing) actions.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| separator | Optional[str] | The separator used when merging and splitting fields. If unspecified, the default JoinFields/SplitField separator ";" is used. | None |
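
The merge-then-split round trip can be pictured with a short, self-contained sketch. The helpers join_fields and split_field below are hypothetical illustrations written for this example; they are not the Rockfish JoinFields and SplitField actions, and only the ";" default is taken from the description above.

```python
def join_fields(values: list[str], separator: str = ";") -> str:
    """Merge several field values into one string (pre-processing direction)."""
    return separator.join(values)


def split_field(value: str, separator: str = ";") -> list[str]:
    """Split a merged value back into its parts (post-processing direction)."""
    return value.split(separator)


# Example record with two associated fields (illustrative data).
row = {"city": "Oslo", "country": "NO"}

merged = join_fields([row["city"], row["country"]])
restored = split_field(merged)

# A custom separator must be passed to both directions.
merged_pipe = join_fields(["Oslo", "NO"], separator="|")
restored_pipe = split_field(merged_pipe, separator="|")
```

In this sketch the round trip is only lossless when the separator does not occur inside any of the field values, which is one reason to pick a different separator for data that already contains ";".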

HandlePiiFields

Handles anonymizing fields with personally identifiable info (PII) by adding Transform (pre-processing) actions that remap PII.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| pii_type_to_remap | Optional[dict] | Maps a pii_type to the remap type used to transform fields with that pii_type. Supported PII types: ["DATE_TIME", "EMAIL_ADDRESS", "IP_ADDRESS", "PHONE_NUMBER", "US_SSN", "CREDIT_CARD", "PERSON", "GENERAL_PII"]. Supported remap types: ["date", "email", "ip", "phone_number", "ssn", "credit_card", "name", "redact"]. Currently, only string fields can be anonymized using "redact". Any pii_type that is not specified uses the default mapping. | None |
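
Because both sides of the mapping are restricted to fixed lists, it can help to check an override dictionary before passing it in. The following is a minimal sketch using only the two lists given above; validate_pii_remap is a hypothetical helper, and whether Rockfish itself raises an error on unsupported values is not stated here.

```python
SUPPORTED_PII_TYPES = [
    "DATE_TIME", "EMAIL_ADDRESS", "IP_ADDRESS", "PHONE_NUMBER",
    "US_SSN", "CREDIT_CARD", "PERSON", "GENERAL_PII",
]
SUPPORTED_REMAP_TYPES = [
    "date", "email", "ip", "phone_number",
    "ssn", "credit_card", "name", "redact",
]


def validate_pii_remap(mapping: dict[str, str]) -> dict[str, str]:
    """Return a copy of mapping, or raise ValueError on an unsupported entry.

    Hypothetical helper: it only checks membership in the two supported lists.
    """
    for pii_type, remap in mapping.items():
        if pii_type not in SUPPORTED_PII_TYPES:
            raise ValueError(f"unsupported PII type: {pii_type!r}")
        if remap not in SUPPORTED_REMAP_TYPES:
            raise ValueError(f"unsupported remap type: {remap!r}")
    return dict(mapping)


# Override two PII types; any type left out would fall back to the default mapping.
overrides = validate_pii_remap({"EMAIL_ADDRESS": "email", "PERSON": "redact"})
```

Only the PII types that need non-default handling have to appear in the dictionary.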

ModelSelection

Selects a Rockfish model and returns the corresponding Train and Generate actions.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model_type | ModelType | Rockfish model for which the Train and Generate actions will be returned. | None |
| keep_session_keys | bool | If true, the session_key field values from the training dataset are preserved in the synthetic dataset. If false, the session_key field contains session numbers instead. | False |