Recommendation Engine

Overview

Choosing the appropriate model for your dataset is crucial to generating high-quality data that addresses your use case. However this process can be challenging. To ease the onboarding process, Rockfish has introduced a powerful Recommendation Engine.

Recommendation Engine

Rockfish's Recommendation engine auto suggests the steps you need to take to onboard your dataset quickly taking into conisderation your fidelity requirements. It recommends the optimal Rockfish model along with the necessary configurations and hyperparameters.

Pre-Processing Dataset
Model and Hyperparameter Selection

The Recommendation engine provides a Recommendation report that describes your source dataset and the recommended Rockfish actions you can add to your Rockfish workflow to create synthetic data

Using Rockfish's Recommendation Engine

Let's walk through a simple example of how to use the recommendation engine to onboard a dataset. Run this example yourself:

Suppose you want to onboard finance.csv:

customer	age	gender	merchant	category	amount	fraud	timestamp
C1093826151	4	M	M348934600	transportation	4.55	0	2023-01-01 00:00:00
C575345520	2	F	M348934600	wellnessandbeauty	76.67	0	2023-01-01 00:00:00
C1787537369	2	M	M1823072687	transportation	48.02	0	2023-01-01 00:00:00
C1732307957	5	F	M348934600	health	55.06	0	2023-01-01 00:00:00
...	...	...	...	...	...	...	...

First, import the required components from Rockfish SDK:

import rockfish as rf
from rockfish.labs.dataset_properties import DatasetPropertyExtractor
from rockfish.labs.steps import Recommender

Then, load the finance dataset:

dataset = rf.Dataset.from_csv("finance", "finance.csv")

Then, extract dataset properties (note how you can specify your domain knowledge while extracting these properties):

dataset_properties = DatasetPropertyExtractor(
    dataset,
    session_key="customer",
    metadata_fields=["age", "gender"],
    additional_property_keys=["association_rules"]
).extract()

Run the recommendation engine:

recommender_output = Recommender(dataset_properties).run()

View the report describing the recommendations:

print(recommender_output.report)

Run the recommended actions in a Rockfish workflow:

rec_actions = recommender_output.actions
save = ra.DatasetSave({"name": "synthetic"})

# use recommended actions in a Rockfish workflow
builder = rf.WorkflowBuilder()
builder.add_path(dataset, *rec_actions, save)

# run the Rockfish workflow
workflow = await builder.start(conn)
print(f"Workflow: {workflow.id()}")

Once the workflow finishes running, download the synthetic dataset!

syn = None
async for sds in workflow.datasets():
    syn = await sds.to_local(conn)
syn.to_pandas()

Configuring Rockfish's Recommendation Engine

The behaviour of Rockfish's recommendation engine can be configured to meet your requirements.

In particular, you can specify the steps you want recommendations for, and you can fix the output recommendations for a step according to your domain knowledge. The examples below demonstrate how you can configure our recommendation engine.

Currently, our recommendation engine can provide Rockfish actions for the following steps:

Filling in missing values
Handling dependent/associated fields
Model selection

Please see the Recommendation Engine SDK reference page for more information on these steps.

Example 1: Get recommendations only for filling out missing values

Suppose you know that your dataset has missing values, and you want the Recommender to only return the corresponding Rockfish FillNull actions.

You can use the steps argument while initializing the Recommender to specify this intent:

from rockfish.labs.steps import Recommender, HandleMissingValues

# load dataset as before
# extract dataset properties as before

# initialize Recommender to only give the required recommendations
recommender_output = Recommender(
   dataset_properties,
   steps=[HandleMissingValues()]
).run()

Example 2: Get train and generate actions for a specific model

Suppose you already know that you want to use a particular Rockfish model for your dataset, and you want the Recommender to return the Rockfish Train and Generate actions for this model.

You can initialize the ModelSelection step according to this intent, and pass the step to the Recommender using the steps argument:

from rockfish.labs.steps import Recommender, ModelSelection
from rockfish.labs.recommender import ModelType

# load dataset as before
# extract dataset properties as before

# configure ModelSelection step
model_selection = ModelSelection(
   model_type=ModelType.TIME_GAN
) 
# initialize Recommender to only give the required recommendations
recommender_output = Recommender(
   dataset_properties,
   steps=[model_selection]
).run()

Next Steps

Once you have onboarded your dataset using Rockfish's recommendation engine, you can modify your Rockfish workflow according to your requirements, and get the final synthetic dataset:

To add more Rockfish actions to a Rockfish workflow, see the Actions and Workflows section.
To tune Rockfish models' hyperparameters and improve synthetic data quality, refer to the Models page.
To evaluate your synthetic dataset using metrics and plots, see the Data Evaluation page.