
Containers

Docker Compose is the quickest way to get Rockfish Data workers up and running. It is suitable for small or test installations, but you may want to consider a container orchestration system for production deployments; see our Kubernetes documentation for details on setting one up. You will need the Docker runtime and Docker Compose.

Once you have set up your Rockfish Data workers, installed Docker, and connected to our container registry, create the Docker Compose configuration and bring the environment up. Copy the template below into a file called docker-compose.yaml.

services:
  redis:
    image: redis:7.2.4
    volumes:
      - /opt/redis-data:/data
  rockfish-worker:
    image: us-central1-docker.pkg.dev/still-totality-370100/saas/cuttlefish:0.15.0.dev2
    restart: always
    command:
      - "--model-store-url"
      - "http://modelstore:8070"
      - "--redis-client-use-hostname"
      - "--redis-host"
      - "redis"
      - "--rockfish-url"
      - "https://api.rockfish.ai"
      - "--api-key"
      - "<worker api key>"
      - "--worker-id"
      - "<worker id>"
      - "--action-module"
      - "actions"
      - "workers"
      - "train"
      - "generate"
      - "tabular-gan-train"
      - "tabular-gan-generate"
      - "rtf-train"
      - "rtf-generate"
      - "train-tab-gan"
      - "train-time-gan"
      - "train-transformer"
      - "generate-tab-gan"
      - "generate-time-gan"
      - "generate-transformer"
      - "databricks-sql-load"
      - "databricks-dbfs-save"

In the file you will need to change two values, indicated with angle brackets (<...>).

The first value is <worker api key>. This is the API key that was sent with your onboarding email. If you do not have access to the key, please contact support for assistance.

The second value is the <worker id> that was created when adding new workers via the API.
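Before starting the stack, it can help to confirm that both placeholders were actually replaced. A minimal sketch of such a check (the helper names are illustrative, not part of the Rockfish tooling):

```python
import re
from pathlib import Path

def unreplaced_placeholders(text: str) -> list[str]:
    """Return any <...> template values still present in the text."""
    return re.findall(r"<[^>]+>", text)

def check_compose(path: str = "docker-compose.yaml") -> None:
    """Fail loudly if the Compose file still contains template values."""
    leftover = unreplaced_placeholders(Path(path).read_text())
    if leftover:
        raise SystemExit(f"Replace these placeholders first: {leftover}")
```

Running `check_compose()` in the directory containing the file will exit with a message listing any values you still need to fill in.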

Save the file; now you only need to run docker compose up. This will pull the Docker image from the registry and start the container with the parameters listed above. This starts a single worker and lets you verify that everything is installed and configured correctly. To run full Rockfish Data workflows you will need multiple workers running; see the next section to update the Docker Compose file to launch multiple workers. Once you have the desired number of workers running, you are ready to use the Rockfish Data workbench via the SDK.

Launching Multiple Workers via Docker Compose

TBD

Package descriptions

Needs polish and full package list

API

The API package is our HTTP API; it is the entrypoint for interacting with the Workbench. It accepts a YAML configuration file, which you need to mount into the container via --volume.

Have a look at the inline documentation in this YAML to get an idea of what each section does.

# PoC 2 exposes a debug http server that can be used for monitoring and
# profiling.
debug_server:
  addr: :9092
# sssql is the SQL query engine; PoC 2 relies on this service to run queries
# against your datasets
sssql:
  address: http://sssql:3031
# datasets need to be stored in an object store, in this case AWS S3
dataset:
  object_store:
    s3:
      bucket_name: model-store-beta-prod02
# Logging configuration
logger:
  development: false
  encoding: "json"
# Connection to the metadata store
metadata_store:
  # Currently redis is the only supported metadata store
  redis_store:
    addr: redis:6379
    db: 0
    pool_size: 1000
# The HTTP API configuration
api:
  addr: :8080
  # The HMAC secret used to sign and verify the JWT Token
  jwt_hmac_secret: "secret-personal"

Admin

Admin 2 is an API that we put together to do user management and troubleshooting. It shares the metadata store and the JWT HMAC secret with the PoC 2 API.

# Connection to the metadata store; this is the same one used by PoC 2
metadata_store:
  redis_store:
    addr: redis:6379
    db: 0
# Same as PoC 2, this is a debugging HTTP server
debug_server:
  addr: :9093
logger:
  development: false
  encoding: "json"
api:
  # The HMAC secret used to sign and verify the JWT Token
  jwt_hmac_secret: "secret-personal"

sssql

sssql is a container that exposes a SQL query engine; it connects to the same object store populated by PoC 2 to run queries against your datasets.

api_addr: 0.0.0.0:3031
object_store:
  s3:
    bucket: model-store-beta-prod02
    region: us-east-1

Cuttlefish

Cuttlefish is a Python application that executes the jobs instantiated as part of a workflow via the PoC 2 API.

It supports many kinds of actions; the list passed to the worker above is a subset of them.
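Conceptually, each action name passed on the worker command line maps to a handler inside the action module. A purely illustrative sketch of that dispatch pattern (names and signatures are hypothetical, not the real Cuttlefish internals):

```python
from typing import Callable, Dict

# Hypothetical registry mapping action names (as passed on the worker
# command line) to handler functions. Illustrative only.
ACTIONS: Dict[str, Callable[[dict], dict]] = {}

def action(name: str):
    """Register a handler under an action name."""
    def wrap(fn: Callable[[dict], dict]):
        ACTIONS[name] = fn
        return fn
    return wrap

@action("train")
def train(job: dict) -> dict:
    # A real handler would run the training job; this just echoes metadata.
    return {"status": "trained", "model": job.get("model", "default")}

def dispatch(name: str, job: dict) -> dict:
    """Look up the handler for an action name and invoke it."""
    return ACTIONS[name](job)
```

In this sketch, only the actions listed on the worker command line would be registered, which is why each worker handles a subset of the available actions.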