Data sources
The Rockfish platform supports training on data in several formats, with a strong focus on operational data, particularly time series data. It also handles tabular data effectively.
Handling Timeseries/Sequential data
Time series data is sequential: it has temporal dependencies and can exhibit a range of patterns. Any data with a time dependency can be treated as time series data.
Some common examples of time series/sequential data:
- Financial Transactions
- IoT Sensor Data
- Healthcare Monitoring
- Web Data
- Network traffic data
Time series data can be viewed as a combination of two interrelated tables:
- Metadata Table (Parent Table): Contains fields that describe the overall attributes of an entity or session.
- Measurement Table (Child Table): Contains fields that record the activities or events over time for each session.
This structure allows us to track activity over time across different groups, where each group of metadata fields defines a unique session.
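The parent/child view above can be sketched in plain Python. This is a minimal illustration, not the platform's implementation: the field names are borrowed from the finance example in this section, while the `timestamp` field, the `session_id` key, the `split_tables` helper, and all values are assumptions made up for the sketch.

```python
# Hypothetical flat records: one row per transaction (values are made up).
rows = [
    {"customer": "C1", "age": 34, "gender": "F",
     "timestamp": "2023-01-01T09:00:00", "merchant": "M1", "amount": 12.5},
    {"customer": "C2", "age": 51, "gender": "M",
     "timestamp": "2023-01-01T09:15:00", "merchant": "M2", "amount": 7.25},
    {"customer": "C1", "age": 34, "gender": "F",
     "timestamp": "2023-01-01T10:30:00", "merchant": "M3", "amount": 40.0},
]

METADATA_FIELDS = ("customer", "age", "gender")


def split_tables(rows, metadata_fields):
    """Split flat rows into a metadata (parent) and a measurement (child) table."""
    session_ids = {}        # unique metadata combination -> session id
    metadata_table = []     # one row per session
    measurement_table = []  # one row per event, linked by session_id
    for row in rows:
        key = tuple(row[f] for f in metadata_fields)
        if key not in session_ids:
            session_ids[key] = len(session_ids)
            metadata_table.append(
                {"session_id": session_ids[key], **dict(zip(metadata_fields, key))}
            )
        # Everything that is not metadata belongs to the measurement table.
        child = {k: v for k, v in row.items() if k not in metadata_fields}
        measurement_table.append({"session_id": session_ids[key], **child})
    return metadata_table, measurement_table


metadata_table, measurement_table = split_tables(rows, METADATA_FIELDS)
```

Here each unique combination of metadata values becomes one row in the parent table, and every event keeps only a `session_id` reference back to it.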
Key Components of Time Series Data:
- Metadata Fields: Describe entities (e.g., users or devices).
- Timestamp Field: Tracks the order of events within each session.
- Measurement Fields: Record activities or actions taken by entities.
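To show how the three components work together, the sketch below groups rows into sessions by their metadata fields and orders each session's events by timestamp. The `timestamp` field name, the ISO 8601 string format, the helper name, and the sample values are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows, deliberately out of time order (values are made up).
rows = [
    {"customer": "C1", "timestamp": "2023-01-01T10:30:00", "amount": 40.0},
    {"customer": "C2", "timestamp": "2023-01-01T09:15:00", "amount": 7.25},
    {"customer": "C1", "timestamp": "2023-01-01T09:00:00", "amount": 12.5},
]


def sessions_in_time_order(rows, metadata_fields, timestamp_field):
    """Group rows by metadata values, then sort each group by timestamp."""
    sessions = defaultdict(list)
    for row in rows:
        key = tuple(row[f] for f in metadata_fields)
        sessions[key].append(row)
    for events in sessions.values():
        events.sort(key=lambda r: datetime.fromisoformat(r[timestamp_field]))
    return dict(sessions)


ordered = sessions_in_time_order(rows, ("customer",), "timestamp")
```

Parsing the timestamp before sorting avoids relying on string ordering, which only happens to work when every value uses the same format.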
For example, in a finance dataset, metadata fields like "customer," "age," and "gender" describe user attributes, while measurement fields like "merchant," "category," "amount," and "fraud" capture transaction-related activities.
Let's select three sessions from the finance dataset:
- Metadata fields: "customer", "age", and "gender" all describe the users.
- Measurement fields: "merchant", "category", "amount", and "fraud" all describe the financial transactions conducted by users. In other words, they represent the users' transaction activity.
Note: If the dataset contains only one session (a single group of metadata fields), it will be treated as tabular data rather than time series data, as there are no distinct sessions to learn from.
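One way to check this condition on your own data before training is to count the distinct combinations of metadata values. The sketch below only illustrates the rule stated in the note; it is not how the platform makes the decision, and the function names and sample rows are hypothetical.

```python
def count_sessions(rows, metadata_fields):
    """Count distinct combinations of metadata values (one per session)."""
    return len({tuple(row[f] for f in metadata_fields) for row in rows})


def looks_like_time_series(rows, metadata_fields):
    """More than one session is needed for the data to be treated as time series."""
    return count_sessions(rows, metadata_fields) > 1


METADATA_FIELDS = ("customer", "age", "gender")

# Hypothetical data: every row shares the same metadata, so there is one session.
single_session = [
    {"customer": "C1", "age": 34, "gender": "F", "amount": 12.5},
    {"customer": "C1", "age": 34, "gender": "F", "amount": 40.0},
]

# Adding a row with different metadata introduces a second session.
two_sessions = single_session + [
    {"customer": "C2", "age": 51, "gender": "M", "amount": 7.25},
]

single_result = looks_like_time_series(single_session, METADATA_FIELDS)
multi_result = looks_like_time_series(two_sessions, METADATA_FIELDS)
```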
Sample Datasets
Handling Tabular data
In tabular data, each row represents an individual record and columns contain either categorical (discrete) or numerical (continuous) data. While a timestamp column can exist in tabular data, the absence of distinct sessions or groups means that tabular models might not capture sequential characteristics as effectively as time series models do.
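As a rough illustration of the categorical/continuous distinction, the sketch below labels each column by inspecting its values. It is a simple heuristic with assumed names and made-up rows, not a description of how the platform infers column types.

```python
from numbers import Number

# Hypothetical tabular records (values are made up).
rows = [
    {"merchant": "M1", "category": "food", "amount": 12.5},
    {"merchant": "M2", "category": "travel", "amount": 230.0},
]


def column_kinds(rows):
    """Label a column 'continuous' if every value is numeric, else 'categorical'."""
    kinds = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_numeric = all(
            isinstance(v, Number) and not isinstance(v, bool) for v in values
        )
        kinds[column] = "continuous" if is_numeric else "categorical"
    return kinds


kinds = column_kinds(rows)
```

A heuristic like this will label integer-coded categories (for example a 0/1 flag) as continuous, so such columns would need to be marked as categorical by hand.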