Integration
A Dataset is the abstraction we use to segment and organize training and generated data, but how can customers move their data into our platform?
Rockfish offers an API to create a new dataset by uploading Parquet files, and the python-sdk provides utilities to upload CSV and other formats, but we know you need more.
For this reason we started to develop connectors capable of importing data from, and exporting data to, different sources.
Below are the connectors we have so far and the ones we are working on; if you have specific requirements, let us know.
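Every connector below shares the same job-spec shape: a jobs array whose entries carry a worker_name, a worker_version, and a worker-specific config. As a minimal sketch of submitting such a spec programmatically, here is a Python example; the endpoint URL and the ROCKFISH_API_KEY environment variable are illustrative assumptions, not our actual API.

import os
import requests

# Hypothetical endpoint and auth scheme, shown only to illustrate how a
# job spec is shaped; consult the Rockfish API docs for the real values.
API_URL = "https://rockfish.example.com/api/jobs"   # assumption
API_KEY = os.environ["ROCKFISH_API_KEY"]            # assumption

job_spec = {
    "jobs": [
        {
            "worker_name": "s3-loader",
            "worker_version": "1",
            "config": {
                "bucket": "netflow-small",
                "format": "parquet",
                "prefix": "/",
            },
        }
    ]
}

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=job_spec,
)
resp.raise_for_status()
print(resp.json())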
AWS S3
For AWS S3 we developed two workers. The first, s3-loader, downloads data from an S3 bucket you own and moves it into our system, either for direct training or to be stored as a dataset for future use.
{
  "jobs": [
    {
      "worker_name": "s3-loader",
      "worker_version": "1",
      "config": {
        "bucket": "netflow-small",
        "format": "parquet",
        "prefix": "/",
        "access_key": "ss",
        "secret_access_key": "sssss"
      }
    }
  ]
}
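Before submitting the job, it can help to confirm that the credentials, bucket, and prefix are valid. Here is a small sanity check using boto3 (not part of our platform; the bucket name and keys are the placeholders from the config above):

import boto3

# Same placeholder values as in the s3-loader config above.
s3 = boto3.client(
    "s3",
    aws_access_key_id="ss",
    aws_secret_access_key="sssss",
)

# List the first few objects under the prefix the loader will read.
resp = s3.list_objects_v2(Bucket="netflow-small", Prefix="/", MaxKeys=5)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])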
If you use our product on-prem, you can authenticate our workers directly via an AWS service account, so you don't need to specify an access_key and a secret_access_key. We have also developed a secret store that can be used to decrypt sensitive information.
The second worker, s3-dumper, uploads generated data back to an S3 bucket you own:
{
  "jobs": [
    {
      "worker_name": "s3-dumper",
      "worker_version": "1",
      "config": {
        "bucket": "netflow-small",
        "format": "parquet",
        "prefix": "/",
        "access_key": "ss",
        "secret_access_key": "sssss"
      }
    }
  ]
}
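Once a dump completes, you can read the generated Parquet files straight from the bucket to inspect them. A quick sketch with pyarrow, again using the placeholder bucket and credentials from the config above:

import pyarrow.dataset as ds
from pyarrow import fs

# Placeholder credentials from the s3-dumper config above.
s3 = fs.S3FileSystem(access_key="ss", secret_key="sssss")

# Read every Parquet file the dumper wrote to the bucket.
dataset = ds.dataset("netflow-small/", format="parquet", filesystem=s3)
print(dataset.to_table().num_rows)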
Databricks
Databricks is another popular SaaS where you can host data and train models. It offers various ways to interact with the system, and for now we decided to develop a worker that downloads data via the SQL interface.
{
  "jobs": [
    {
      "worker_name": "databricks-sql-loader",
      "worker_version": "1",
      "config": {
        "sql": "select * from default.databricks_table",
        "token": "",
        "http_path": "",
        "server_hostname": ""
      }
    }
  ]
}
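To verify the token, http_path, and server_hostname before submitting the job, you can run the same query locally with Databricks' official databricks-sql-connector package. This is just a connectivity check, not something our worker requires:

from databricks import sql

# Fill in the same values you would put in the worker config.
with sql.connect(
    server_hostname="<server_hostname>",
    http_path="<http_path>",
    access_token="<token>",
) as conn:
    with conn.cursor() as cursor:
        cursor.execute("select * from default.databricks_table limit 5")
        for row in cursor.fetchall():
            print(row)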
For uploading generated data back to your Databricks account, we use the Databricks DBFS API.
{
  "jobs": [
    {
      "worker_name": "databricks-dbfs-save",
      "worker_version": "1",
      "config": {
        "path": "dbfs:/mnt/path/to/remote/file",
        "format": "csv",
        "token": "",
        "server_hostname": ""
      }
    }
  ]
}
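For reference, this is roughly what an upload looks like against the DBFS REST API the worker uses (the /api/2.0/dbfs/put endpoint); the hostname, token, and file contents here are placeholders:

import base64
import requests

SERVER_HOSTNAME = "<server_hostname>"
TOKEN = "<token>"

# DBFS "put" takes base64-encoded contents and is limited to files of about
# 1 MB; larger files go through the streaming create/add-block/close endpoints.
contents = base64.b64encode(b"col_a,col_b\n1,2\n").decode("ascii")

resp = requests.post(
    f"https://{SERVER_HOSTNAME}/api/2.0/dbfs/put",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        # The REST API takes the absolute DBFS path, without the dbfs:/ scheme
        # prefix used in the worker config.
        "path": "/mnt/path/to/remote/file",
        "contents": contents,
        "overwrite": True,
    },
)
resp.raise_for_status()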
What do we have in development?
- Azure Blob Store
- GCP Object Store