rockfish.actions

import rockfish.actions as ra

Source and Sink Actions

`rockfish.actions.DatasetLoad`

Load a Dataset as the output table.

Attributes:

Name	Type	Description
`Config`	`type[LoadConfig]`	Alias for `LoadConfig`.

`rockfish.actions.DatasetSave`

Save table as a Dataset.

Attributes:

Name	Type	Description
`Config`	`type[SaveConfig]`	Alias for `SaveConfig`.

`rockfish.actions.ModelLoad`

Produce a model table.

Attributes:

Name	Type	Description
`Config`	`type[Config]`	Alias for `Config`.

Data Processing Actions

`rockfish.actions.Apply`

Apply a function and append the results to the table as a new field.

Attributes:

Name	Type	Description
`Config`	`type[ApplyConfig]`	Alias for `ApplyConfig`.

`rockfish.actions.Transform`

Transform a field replacing the values with the result of the function.

Attributes:

Name	Type	Description
`Config`	`type[TransformConfig]`	Alias for `TransformConfig`.

`rockfish.actions.AppendUUID`

Return table with new field of UUID values.

Append field 'a' with UUID values

import rockfish.actions as ra
append_uuid = ra.AppendUUID(
    append_field="a",
    seed=1234
)

Append field 'b' with UUID values, per session

import rockfish.actions as ra
append_uuid = ra.AppendUUID(
    group_fields=["session_key"],
    append_field="b",
    seed=1234
)

Append field 'c' with UUID values, per other group_fields

import rockfish.actions as ra
append_uuid = ra.AppendUUID(
    group_fields=["d", "e"],
    append_field="c",
    seed=1234
)

Attributes:

Name	Type	Description
`Config`		Alias for `AppendUUIDConfig`.

`rockfish.actions.append.AppendUUIDConfig`

Config class for the AppendUUID action.

Attributes:

Name	Type	Description
`group_fields`	`Optional[list[str]]`	List of fields to group over. Each group will be assigned a new value in the append_field. If an empty list is specified, each row will be assigned a new value. If unspecified, group_fields will be taken from the dataset's TableMetadata.
`append_field`	`str`	The name of the new field to append.
`seed`	`Optional[int]`	The seed for the random number generator.
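
The effect of `group_fields` can be illustrated without the SDK. The following is a minimal sketch in plain Python, assuming rows are represented as dicts; `append_uuid_sketch` is a hypothetical helper and not part of `rockfish`, and the way it derives UUIDs from the seed is an assumption, not the library's method. It does not model the fallback to the dataset's TableMetadata.

```python
import random
import uuid


def append_uuid_sketch(rows, append_field, group_fields=(), seed=None):
    """Hypothetical helper: one UUID per group, or per row if group_fields is empty."""
    rng = random.Random(seed)
    group_to_uuid = {}
    out = []
    for index, row in enumerate(rows):
        # An empty group_fields list means every row forms its own group.
        key = tuple(row[f] for f in group_fields) if group_fields else index
        if key not in group_to_uuid:
            group_to_uuid[key] = str(uuid.UUID(int=rng.getrandbits(128), version=4))
        out.append({**row, append_field: group_to_uuid[key]})
    return out


rows = [
    {"session_key": "s1", "value": 1},
    {"session_key": "s1", "value": 2},
    {"session_key": "s2", "value": 3},
]
per_session = append_uuid_sketch(rows, "b", group_fields=["session_key"], seed=1234)
per_row = append_uuid_sketch(rows, "a", group_fields=[], seed=1234)
```

In `per_session` the two rows of session `s1` share one UUID, while `per_row` gives every row its own value.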

`rockfish.actions.AppendDomain`

Return table with new field of values from the given domain. All values in the domain should be of the same type. It is possible to pass only one value in the domain, in case one wants to add a single-valued field.

Append field 'a' with values from given domain

import rockfish.actions as ra
append_domain = ra.AppendDomain(
    append_field="a",
    domain=["one", "two", "three"],
    seed=1234
)

Append field 'a' with a constant value

import rockfish.actions as ra
append_domain = ra.AppendDomain(
    append_field="a",
    domain=[10],
    seed=1234
)

Append field 'b' with values from given domain, per session

import rockfish.actions as ra
append_domain = ra.AppendDomain(
    group_fields=["session_key"],
    append_field="b",
    domain=["one", "two", "three"],
    seed=1234
)

Append field 'c' with values from given domain, per other group_fields

import rockfish.actions as ra
append_domain = ra.AppendDomain(
    group_fields=["d", "e"],
    append_field="c",
    domain=["one", "two", "three"],
    seed=1234
)

Attributes:

Name	Type	Description
`Config`		Alias for `AppendDomainConfig`.

`rockfish.actions.append.AppendDomainConfig`

Config class for the AppendDomain action.

Attributes:

Name	Type	Description
`group_fields`	`Optional[list[str]]`	List of fields to group over. Each group will be assigned a value in the append_field. If an empty list is specified, each row will be assigned a value. If unspecified, group_fields will be taken from the dataset's TableMetadata.
`append_field`	`str`	The name of the new field to append.
`domain`	`Union[list[str], list[int], list[float]]`	List of values that the new field can have. All values should have the same data type. The list should be of size <= 100.
`seed`	`Optional[int]`	The seed for the random number generator.
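
The documented constraints on `domain` (a single value type, at most 100 entries) can be checked up front. The sketch below is plain Python with hypothetical helper names (`validate_domain`, `append_domain_sketch`); it assumes uniform sampling and covers only the per-row case, neither of which is confirmed by this reference.

```python
import random


def validate_domain(domain):
    """Hypothetical check mirroring the documented constraints on `domain`."""
    if len(domain) > 100:
        raise ValueError("domain must contain at most 100 values")
    if len({type(v) for v in domain}) != 1:
        raise ValueError("all domain values must have the same type")


def append_domain_sketch(rows, append_field, domain, seed=None):
    """Assign each row a value drawn from the domain (assumed uniform)."""
    validate_domain(domain)
    rng = random.Random(seed)
    return [{**row, append_field: rng.choice(domain)} for row in rows]


rows = [{"id": i} for i in range(5)]
constant = append_domain_sketch(rows, "a", [10], seed=1234)
mixed = append_domain_sketch(rows, "a", ["one", "two", "three"], seed=1234)
```

A single-valued domain such as `[10]` produces a constant field, matching the constant-value example for `AppendDomain`.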

`rockfish.actions.AppendNormal`

Return table with new field of values from the given normal distribution.

Append field 'a' with values from normal(mean=0.0, scale=1.0)

import rockfish.actions as ra
append_normal = ra.AppendNormal(
    append_field="a",
    mean=0.0,
    scale=1.0,
    seed=1234
)

Append field 'a' with values from normal(mean=0.0, scale=1.0), precision = 3 digits

import rockfish.actions as ra
append_normal = ra.AppendNormal(
    append_field="a",
    mean=0.0,
    scale=1.0,
    append_field_ndigits=3,
    seed=1234
)

Append field 'b' with values from normal(mean=0.0, scale=1.0), per session

import rockfish.actions as ra
append_normal = ra.AppendNormal(
    group_fields=["session_key"],
    append_field="b",
    mean=0.0,
    scale=1.0,
    seed=1234
)

Append field 'c' with values from normal(mean=0.0, scale=1.0), per other group_fields

import rockfish.actions as ra
append_normal = ra.AppendNormal(
    group_fields=["d", "e"],
    append_field="c",
    mean=0.0,
    scale=1.0,
    seed=1234
)

Attributes:

Name	Type	Description
`Config`		Alias for `AppendNormalConfig`.

`rockfish.actions.append.AppendNormalConfig`

Config class for the AppendNormal action.

Attributes:

Name	Type	Description
`group_fields`	`Optional[list[str]]`	List of fields to group over. Each group will be assigned a value in the append_field. If an empty list is specified, each row will be assigned a value. If unspecified, group_fields will be taken from the dataset's TableMetadata.
`append_field`	`str`	The name of the new field to append.
`mean`	`float`	Mean of the normal distribution from which new field values are sampled.
`scale`	`float`	Standard deviation of the normal distribution from which new field values are sampled.
`append_field_ndigits`	`int`	Precision of append field (default = 2).
`seed`	`Optional[int]`	The seed for the random number generator.

`rockfish.actions.Flatten`

Flatten a table by expanding json objects / pyarrow structs in a column into multiple columns. e.g.

col1	col2	col3
a	{"b": 1}	c

turns into

col1	col2.b	col3
a	1	c

This action recursively flattens the table until no more JSON nestings are present. It does not handle lists or JSON arrays, and raises an error if the table contains them.

`rockfish.actions.flatten.FlattenConfig` `dataclass`

Configuration class for the Flatten action.

Attributes:

Name	Type	Description
`separator`	`str`	String used to join the parent field name and the nested key when a struct is expanded (as in `col2.b`).
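
The recursive expansion can be sketched in plain Python on a single row held as a dict. `flatten_row` is a hypothetical helper, and the `"."` default for the separator is an assumption taken from the `col2.b` example rather than a documented default.

```python
def flatten_row(row, separator="."):
    """Recursively expand nested dicts into separator-joined column names."""
    flat = {}
    for key, value in row.items():
        if isinstance(value, dict):
            for sub_key, sub_value in flatten_row(value, separator).items():
                flat[f"{key}{separator}{sub_key}"] = sub_value
        elif isinstance(value, list):
            # Lists / JSON arrays are rejected, as the action description states.
            raise ValueError(f"column {key!r} contains a list, which is not supported")
        else:
            flat[key] = value
    return flat


row = {"col1": "a", "col2": {"b": 1, "c": {"d": 2}}, "col3": "c"}
flat = flatten_row(row)
```

Here `flat` has the columns `col1`, `col2.b`, `col2.c.d` and `col3`.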

`rockfish.actions.Unflatten`

Unflatten a table by condensing multiple columns into json objects / pyarrow structs. e.g.

col1	col2.b	col3
a	1	c

turns into

col1	col2	col3
a	{"b": 1}	c

`rockfish.actions.flatten.UnflattenConfig` `dataclass`

Configuration class for the Unflatten action.

Attributes:

Name	Type	Description
`separator`	`str`	String on which field names are split when constructing structs (for example, `col2.b` becomes `col2` with key `b`).
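
The inverse operation can be sketched the same way on a single row held as a dict. `unflatten_row` is a hypothetical helper, and the `"."` default is an assumption based on the `col2.b` example.

```python
def unflatten_row(row, separator="."):
    """Rebuild nested dicts from separator-joined column names."""
    nested = {}
    for column, value in row.items():
        parts = column.split(separator)
        target = nested
        # Walk (and create) the intermediate structs, then set the leaf value.
        for part in parts[:-1]:
            target = target.setdefault(part, {})
        target[parts[-1]] = value
    return nested


nested = unflatten_row({"col1": "a", "col2.b": 1, "col3": "c"})
```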

`rockfish.actions.Sample`

Return table with sampled rows according to the provided sample_type.

Sample using default sampling method

import rockfish.actions as ra
sample = ra.Sample(sample_size=100, sample_type=None)

Sample using random sampling with replacement

import rockfish.actions as ra
sample = ra.Sample(frac=0.23, sample_type="random", replace=True, seed=3)

Attributes:

Name	Type	Description
`Config`		Alias for `SampleConfig`.

`rockfish.actions.sample.SampleConfig` `dataclass`

Config class for the Sample action.

Attributes:

Name	Type	Description
`sample_size`	`Optional[int]`	the number of rows to sample
`frac`	`Optional[float]`	the fraction of rows to sample
`sample_type`	`Optional[SampleType]`	the type of sampling to use, if None, uses first_n
`seed`	`Optional[int]`	the seed for the random number generator
`replace`	`Optional[bool]`	sample with replacement, if true, allows the same row to be sampled multiple times
`session_key`	`Optional[str]`	the field name that defines the session for timeseries datasets
`chunk`	`bool`	produce chunks of data
`chunk_row_limit`	`int`	number of rows in each chunk
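
How `sample_size`, `frac`, `sample_type` and `replace` interact can be sketched in plain Python. `sample_rows` is a hypothetical helper; the string `"first_n"`, the truncating conversion of `frac` to a row count, and the omission of sessions and chunking are all assumptions of this sketch.

```python
import random


def sample_rows(rows, sample_size=None, frac=None, sample_type=None,
                replace=False, seed=None):
    """Hypothetical helper covering first_n and random sampling only."""
    n = sample_size if sample_size is not None else int(len(rows) * frac)
    if sample_type is None or sample_type == "first_n":
        return rows[:n]
    if sample_type == "random":
        rng = random.Random(seed)
        if replace:
            # With replacement: the same row may be picked more than once.
            return [rng.choice(rows) for _ in range(n)]
        return rng.sample(rows, n)
    raise ValueError(f"unsupported sample_type: {sample_type!r}")


rows = list(range(10))
first_three = sample_rows(rows, sample_size=3)
with_replacement = sample_rows(rows, frac=0.5, sample_type="random",
                               replace=True, seed=3)
permutation = sample_rows(rows, sample_size=10, sample_type="random", seed=3)
```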

`rockfish.actions.SampleLabel`

Sample rows/sessions that match a label.

Sample from a label field

import rockfish.actions as ra
sample = ra.SampleLabel(
    field="my_label",
    dist={
        "value1": ra.SampleLabel.Count(2),
        "value2": ra.SampleLabel.Count(4),
        "": ra.SampleLabel.Count(6),  # "" matches all unspecified values
    },
    replace=True,
)

Attributes:

Name	Type	Description
`Config`		Alias for `SampleLabelConfig`.

`rockfish.actions.sample_label.SampleLabelConfig`

Config class for the SampleLabel action.

Attributes:

Name	Type	Description
`field`	`str`	field containing the sampling label
`dist`	`SampleDist`	distribution for each label; the empty string matches all unspecified values
`replace`	`bool`	sample with replacement, if true, allows the same row to be sampled multiple times
`session_key`	`Optional[str]`	the field name that defines the session for timeseries datasets
`seed`	`Optional[int]`	the seed for the random number generator
`chunk`	`bool`	produce chunks of data
`chunk_row_limit`	`int`	number of rows in each chunk
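
The role of the empty-string key in `dist` can be sketched in plain Python. `sample_by_label` is a hypothetical helper, plain integers stand in for `ra.SampleLabel.Count`, and capping the count at the bucket size when sampling without replacement is an assumption of this sketch.

```python
import random
from collections import Counter, defaultdict


def sample_by_label(rows, field, dist, replace=False, seed=None):
    """Hypothetical helper: draw dist[label] rows for each label."""
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for row in rows:
        label = row[field]
        # Labels without their own entry fall into the "" catch-all bucket.
        buckets[label if label in dist else ""].append(row)
    sampled = []
    for label, count in dist.items():
        candidates = buckets.get(label, [])
        if not candidates:
            continue
        if replace:
            sampled.extend(rng.choice(candidates) for _ in range(count))
        else:
            sampled.extend(rng.sample(candidates, min(count, len(candidates))))
    return sampled


labels = ["value1"] * 3 + ["value2"] * 3 + ["other", "other", "another"]
rows = [{"my_label": label} for label in labels]
sampled = sample_by_label(
    rows, "my_label", {"value1": 2, "value2": 4, "": 6}, replace=True, seed=0
)
label_counts = Counter(row["my_label"] for row in sampled)
```

With `replace=True`, four `value2` rows can be drawn even though only three exist in the input.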

`rockfish.actions.AlterTimestamp`

Alter a timestamp field in the table.

The method to generate new timestamps depends on the interarrival_type option.

`fixed`

The fixed type generates new timestamps with fixed/regular interarrivals spread over the time range at a per session level.

`random`

The random type generates new timestamps with random interarrivals at a per session level.

`squeeze`

The squeeze type takes the original interarrivals and shifts them to the starting or ending of the time range depending on the value of flow_start_type. If the interarrivals are larger than the range they are linearly scaled to fit.

`chop`

The chop type takes the original interarrivals and shifts them to the starting or ending of the time range depending on the value of flow_start_type. If the interarrivals are larger than the range they are trimmed.

`original`

The original type takes the original interarrivals and shifts them to the starting or ending of the time range depending on the value of flow_start_type. They are not scaled or trimmed.
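
The difference between `fixed`, `squeeze` and `chop` can be sketched for a single session in plain Python. The helper names are hypothetical, the input is assumed to be sorted ascending, only the `starting` placement is shown, and reading "trimmed" as dropping events that fall past the end of the range is an assumption of this sketch.

```python
from datetime import datetime, timedelta


def fixed_interarrivals(n, start, end):
    """Evenly spaced timestamps across [start, end]."""
    if n == 1:
        return [start]
    step = (end - start) / (n - 1)
    return [start + i * step for i in range(n)]


def shift_to_start(timestamps, start):
    """Keep the original interarrivals but move the first event to `start`."""
    offset = start - timestamps[0]
    return [ts + offset for ts in timestamps]


def squeeze(timestamps, start, end):
    shifted = shift_to_start(timestamps, start)
    span = shifted[-1] - start
    window = end - start
    if span <= window:
        return shifted
    # Linearly scale the offsets so the last event lands on `end`.
    scale = window / span
    return [start + (ts - start) * scale for ts in shifted]


def chop(timestamps, start, end):
    # Events that still fall after `end` once shifted are dropped.
    return [ts for ts in shift_to_start(timestamps, start) if ts <= end]


start = datetime(2024, 11, 11, 0, 0, 0)
end = datetime(2024, 11, 11, 6, 0, 0)
original = [datetime(2024, 1, 1, h) for h in (0, 6, 12)]  # spans 12 hours

squeezed = squeeze(original, start, end)
chopped = chop(original, start, end)
regular = fixed_interarrivals(3, start, end)
```

The 12-hour session is compressed into the 6-hour range by `squeeze`, while `chop` keeps only the events that fit.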

AlterTimestamp Action Example

from datetime import datetime

import rockfish.actions as ra

alter_timestamp = ra.AlterTimestamp(
    field="ts",
    start_time=datetime(2024, 11, 11, 0, 0, 0),
    end_time=datetime(2024, 11, 11, 23, 59, 59),
    interarrival_type="random",
)

Attributes:

Name	Type	Description
`Config`		Alias for `AlterTimestampConfig`.

`rockfish.actions.timestamps.AlterTimestampConfig`

Configuration class for the AlterTimestamp action.

Attributes:

Name	Type	Description
`field`	`str`	Field name containing the timestamp to alter.
`start_time`	`datetime`	Start time for the desired output range.
`end_time`	`datetime`	End time for the desired output range.
`flow_start_type`	`Literal['starting', 'ending', 'random']`	Method for placing the flow within the range, if the `interarrival_type` supports it.
`interarrival_type`	`Literal['fixed', 'random', 'squeeze', 'chop', 'original']`	Method to use for generating new timestamps.
`seed`	`Optional[int]`	Fixed seed for the random number generator.

`rockfish.actions.PostAmplify`

`rockfish.actions.SQL`

Return table after applying the provided SQL query.

Run query on one table

import rockfish.actions as ra
sql = ra.SQL(
    query="select col_1 from foo_table;",
    table_name="foo_table"
)

Join two tables on a common column

import rockfish.actions as ra
query = "select t1.col_1, t2.col_1 from t1 inner join t2 on t1.id = t2.id;"
t2_id = "<ID_OF_REMOTE_DATASET>"  # using rockfish.RemoteDataset.id
sql = ra.SQL(
    query=query,
    table_name="t1",
    dataset_name_to_id={"t2": t2_id}
)

Note: If your table(s) contains columns that have uppercase names, please wrap the column names in backticks or quotation marks. For example, if your table has a column called 'Color', the SQL query should be passed as:

"select `Color` from my_table", OR
'select "Color" from my_table'

Attributes:

Name	Type	Description
`Config`		Alias for `Config`.

`rockfish.actions.sql.Config` `dataclass`

Config class for the SQL action.

Attributes:

Name	Type	Description
`query`	`str`	The SQL query to run on the table.
`table_name`	`str`	Name by which the table is referred to in the SQL query. The default name is 'my_table'.
`dataset_name_to_id`	`dict[str, str]`	Dict that maps additional remote dataset names to their dataset IDs. These datasets are retrieved before the query is applied.
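
A join query of the kind used with `dataset_name_to_id` can be tried locally before it is passed to the action. The sketch below uses SQLite from the Python standard library purely as a stand-in; the engine Rockfish runs queries on may differ in dialect, and the tables `t1` and `t2` and their contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table t1 (id integer, col_1 text);
    create table t2 (id integer, col_1 text);
    insert into t1 values (1, 'a'), (2, 'b');
    insert into t2 values (2, 'x'), (3, 'y');
    """
)
# Only id = 2 exists in both tables, so the inner join yields one row.
query = "select t1.col_1, t2.col_1 from t1 inner join t2 on t1.id = t2.id;"
joined = conn.execute(query).fetchall()
conn.close()
```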

Encoding Actions

`rockfish.actions.JoinFields`

Merge fields using a separator and append the merged field to the table. The original fields are dropped from the table.

Join fields 'a', 'b' and 'c'

import rockfish.actions as ra
join = ra.JoinFields(fields=["a", "b", "c"])

Join fields 'a' and 'b' with a custom separator

import rockfish.actions as ra
join = ra.JoinFields(fields=["a", "b"], separator="++")

Join fields 'a' and 'b' with a custom name for the new field

import rockfish.actions as ra
join = ra.JoinFields(fields=["a", "b"], append_field="a_and_b")

`rockfish.actions.join_split.JoinConfig`

Configuration class for the JoinFields action.

Attributes:

Name	Type	Description
`fields`	`list[str]`	List of field names in the table that need to be merged.
`append_field`	`Optional[str]`	Name of merged field that will be appended to the table.
`separator`	`str`	String that field values in the merged field will be separated by.

`rockfish.actions.SplitField`

Split a field using a separator and append the split fields to the table. The original field is dropped from the table.

Split previously joined fields 'a', 'b' and 'c'

import rockfish.actions as ra
split = ra.SplitField(field="a;b;c")

Split multiple previously joined fields 'a;b' and 'c;d'

import rockfish.actions as ra

# suppose the join actions were added as follows:
builder.add(join_ab, parents=[dataset])
builder.add(join_cd, parents=[join_ab])

# the corresponding split actions should be added
# in the reverse order:
split_ab = ra.SplitField(field="a;b")
split_cd = ra.SplitField(field="c;d")

builder.add(split_cd, parents=[model])
builder.add(split_ab, parents=[split_cd])

`rockfish.actions.join_split.SplitConfig`

Configuration class for the SplitField action.

Attributes:

Name	Type	Description
`field`	`Optional[str]`	Field name in the table that needs to be split.
`append_fields`	`Optional[list[str]]`	List of split field names that will be appended to the table.
`separator`	`Optional[str]`	String that field values in the split field will be separated by.
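
The round trip between JoinFields and SplitField can be sketched in plain Python on a single row held as a dict. Both helpers are hypothetical; the `";"` default separator and the default merged name (the field names joined by the separator) are assumptions inferred from the `"a;b;c"` example.

```python
def join_fields(row, fields, separator=";", append_field=None):
    """Merge `fields` into one field and drop the originals."""
    name = append_field or separator.join(fields)
    joined = separator.join(str(row[f]) for f in fields)
    rest = {k: v for k, v in row.items() if k not in fields}
    return {**rest, name: joined}


def split_field(row, field, separator=";", append_fields=None):
    """Split `field` back into separate fields and drop the merged one."""
    names = append_fields or field.split(separator)
    values = row[field].split(separator)
    rest = {k: v for k, v in row.items() if k != field}
    return {**rest, **dict(zip(names, values))}


row = {"a": "x", "b": "y", "c": "z", "d": 1}
joined = join_fields(row, ["a", "b", "c"])
restored = split_field(joined, "a;b;c")
```

Note that in this sketch the split values come back as strings, so non-string fields would need to be cast after splitting.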

`rockfish.actions.LabelEncode`

Return table after label encoding has been applied on the given field.

Label encode field 'a'

import rockfish.actions as ra
label_encode = ra.LabelEncode(field="a")

`rockfish.actions.encode.LabelConfig`

Config class for the LabelEncode and LabelDecode actions.

Attributes:

Name	Type	Description
`field`	`str`	field to be encoded (should be categorical)

`rockfish.actions.LabelDecode`

Return table after label decoding has been applied on the given field. Assumes a LabelEncode action was applied on the field before training.

Label decode previously encoded field 'a'

import rockfish.actions as ra
label_decode = ra.LabelDecode(field="a")

Label decode previously encoded fields 'a', 'b'

import rockfish.actions as ra

# suppose the encoding actions were added as follows:
builder.add(label_encode_a, parents=[dataset])
builder.add(label_encode_b, parents=[label_encode_a])

# the corresponding decoding actions should be added
# in the reverse order:
label_decode_a = ra.LabelDecode(field="a")
label_decode_b = ra.LabelDecode(field="b")

builder.add(label_decode_b, parents=[model])
builder.add(label_decode_a, parents=[label_decode_b])

`rockfish.actions.encode.LabelConfig`

Config class for the LabelEncode and LabelDecode actions.

Attributes:

Name	Type	Description
`field`	`str`	field to be encoded (should be categorical)
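
Label encoding and its inverse can be sketched in plain Python. The helpers are hypothetical, and assigning integer codes in sorted order of the categories is an assumption; the mapping Rockfish actually stores is not described here.

```python
def fit_label_mapping(values):
    """Map each distinct category to an integer code, in sorted order."""
    return {label: code for code, label in enumerate(sorted(set(values)))}


def label_encode(values, mapping):
    return [mapping[v] for v in values]


def label_decode(codes, mapping):
    # Invert the mapping so codes can be turned back into categories.
    inverse = {code: label for label, code in mapping.items()}
    return [inverse[c] for c in codes]


values = ["tcp", "udp", "tcp", "icmp"]
mapping = fit_label_mapping(values)
encoded = label_encode(values, mapping)
decoded = label_decode(encoded, mapping)
```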

`rockfish.actions.LogEncode`

Return table after log encoding has been applied on the given field.

Log encode field 'a'

import rockfish.actions as ra
log_encode = ra.LogEncode(field="a")

`rockfish.actions.encode.LogEncodeConfig`

Config class for the LogEncode action.

Attributes:

Name	Type	Description
`field`	`str`	field to be encoded (should be continuous)

`rockfish.actions.LogDecode`

Return table after log decoding has been applied on the given field. Assumes a LogEncode action was applied on the field before training.

Log decode previously encoded field 'a'

import rockfish.actions as ra
log_decode = ra.LogDecode(field="a")

Log decode previously encoded field 'a', specify precision for decoded field

import rockfish.actions as ra
log_decode = ra.LogDecode(field="a", field_ndigits=2)

Log decode previously encoded fields 'a', 'b'

import rockfish.actions as ra

# suppose the encoding actions were added as follows:
builder.add(log_encode_a, parents=[dataset])
builder.add(log_encode_b, parents=[log_encode_a])

# the corresponding decoding actions should be added
# in the reverse order:
log_decode_a = ra.LogDecode(field="a")
log_decode_b = ra.LogDecode(field="b")

builder.add(log_decode_b, parents=[model])
builder.add(log_decode_a, parents=[log_decode_b])

`rockfish.actions.encode.LogDecodeConfig`

Config class for the LogDecode action.

Attributes:

Name	Type	Description
`field`	`str`	field to be decoded (should be continuous)
`field_ndigits`	`Optional[int]`	precision of decoded field, applicable for float fields only (default = 3)
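
The effect of `field_ndigits` on a decode round trip can be sketched in plain Python. The exact transform Rockfish applies is not stated in this reference; the sketch assumes `log(1 + x)` for encoding and its inverse for decoding, and both helper names are hypothetical.

```python
import math


def log_encode(values):
    """Assumed transform: natural log of (1 + x)."""
    return [math.log1p(v) for v in values]


def log_decode(values, ndigits=3):
    """Invert the assumed transform and round to `ndigits` decimal places."""
    return [round(math.expm1(v), ndigits) for v in values]


original = [0.0, 1.5, 1000.0]
encoded = log_encode(original)
decoded = log_decode(encoded, ndigits=3)
```

Rounding on decode hides the small floating-point error that the encode and decode steps introduce.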

`rockfish.actions.SubtractTimestamp`

Calculate deltas for a list of timestamp fields relative to a base timestamp. This is useful for modeling the time difference between timestamps, for example when using the TimeGAN model.

Example:

timestamp1	timestamp2	timestamp3
2021-01-01	2021-01-02	2021-01-03

SubtractTimestamp Action Workflow Example

import rockfish.actions as ra
subtract = ra.SubtractTimestamp(base_timestamp="timestamp1",
                 fields=["timestamp2", "timestamp3"],
                 timestamp_format="%Y-%m-%d")

After running the workflow:

timestamp1	timestamp2	timestamp3
2021-01-01	1 day	2 days

Another example, if not all timestamps are correlated:

timestamp1	timestamp2	timestamp3
2021-01-01	2021-01-02	2011-10-03

SubtractTimestamp Action Workflow Example [uncorrelated timestamp3]

import rockfish.actions as ra
subtract = ra.SubtractTimestamp(base_timestamp="timestamp1",
                 fields=["timestamp2"],
                 timestamp_format="%Y-%m-%d")

After running the workflow:

timestamp1	timestamp2	timestamp3
2021-01-01	1 day	2011-10-03

Another example, if you do not want to replace the fields:

timestamp1	timestamp2	timestamp3
2021-01-01	2021-01-02	2021-01-03

SubtractTimestamp Action Workflow Example [append_fields]

import rockfish.actions as ra
subtract = ra.SubtractTimestamp(base_timestamp="timestamp1",
                 fields=["timestamp2", "timestamp3"],
                 append_fields=["timestamp2_delta", "timestamp3_delta"],
                 timestamp_format="%Y-%m-%d")

After running the workflow:

timestamp1	timestamp2	timestamp3	timestamp2_delta	timestamp3_delta
2021-01-01	2021-01-02	2021-01-03	1 day	2 days

`rockfish.actions.timestamps.SubtractTimestampConfig` `dataclass`

Configuration class for the SubtractTimestamp action

Attributes:

Name	Type	Description
`base_timestamp`	`str`	the timestamp to which the other timestamps are compared
`fields`	`list[str]`	the list of timestamps to calculate the deltas for
`append_fields`	`Optional[list[str]]`	the list of columns to append the durations to. If None, the durations replace the values in the original columns.
`timestamp_format`	`Optional[str]`	the format of the timestamps IF they are strings.

`rockfish.actions.AddDuration`

Calculate timestamps from deltas for a list of fields relative to a base timestamp. After synthesis, this is useful for converting the deltas back to timestamps.

Example:

timestamp1	timestamp2	timestamp3
2021-01-01	1 day	2 days

AddDuration Action Workflow Example

import rockfish.actions as ra
add = ra.AddDuration(base_timestamp="timestamp1",
                     fields=["timestamp2", "timestamp3"],
                     timestamp_format="%Y-%m-%d")

After running the workflow:

timestamp1	timestamp2	timestamp3
2021-01-01	2021-01-02	2021-01-03

Another example, if not all timestamps are correlated (will be ignored):

timestamp1	timestamp2	timestamp3
2021-01-01	1 day	2011-10-03

AddDuration Action Workflow Example [uncorrelated timestamp3]

import rockfish.actions as ra
add = ra.AddDuration(base_timestamp="timestamp1",
                     fields=["timestamp2"],
                     timestamp_format="%d-%m-%Y")

After running the workflow:

timestamp1	timestamp2	timestamp3
01-01-2021	02-01-2021	03-10-2011

`rockfish.actions.timestamps.AddDurationConfig` `dataclass`

Configuration class for the AddDuration action

Attributes:

Name	Type	Description
`base_timestamp`	`str`	the timestamp to which the other timestamps are compared
`fields`	`list[str]`	the list of columns that are timestamp deltas, or duration[s] dtype
`timestamp_format`	`str`	the format of the timestamps. This parameter is REQUIRED. This converts the primary timestamp to this format if it is a string. This also converts all relative_timestamps into this format after delta conversion.
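
The relationship between SubtractTimestamp and AddDuration can be sketched as a round trip in plain Python. Both helpers are hypothetical, operate on a single row held as a dict, and assume string timestamps that all share one `timestamp_format`.

```python
from datetime import datetime, timedelta


def subtract_timestamp(row, base_timestamp, fields, timestamp_format):
    """Replace each listed timestamp with its delta from the base timestamp."""
    base = datetime.strptime(row[base_timestamp], timestamp_format)
    out = dict(row)
    for field in fields:
        out[field] = datetime.strptime(row[field], timestamp_format) - base
    return out


def add_duration(row, base_timestamp, fields, timestamp_format):
    """Turn each listed delta back into a formatted timestamp."""
    base = datetime.strptime(row[base_timestamp], timestamp_format)
    out = dict(row)
    for field in fields:
        out[field] = (base + row[field]).strftime(timestamp_format)
    return out


row = {
    "timestamp1": "2021-01-01",
    "timestamp2": "2021-01-02",
    "timestamp3": "2021-01-03",
}
fields = ["timestamp2", "timestamp3"]
deltas = subtract_timestamp(row, "timestamp1", fields, "%Y-%m-%d")
restored = add_duration(deltas, "timestamp1", fields, "%Y-%m-%d")
```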

Train and Generate Actions

`rockfish.actions.TrainTimeGAN`

Train a Rockfish DoppelGANger based model.

train = ra.Train(ra.Train.Config())

Attributes:

Name	Type	Description
`Config`	`type[Config]`	Alias for `Config`
`DGConfig`	`type[DGConfig]`	Alias for `DGConfig`
`DatasetConfig`	`type[DatasetConfig]`	Alias for `DatasetConfig`
`TimestampConfig`	`type[TimestampConfig]`	Alias for `TimestampConfig`
`FieldConfig`	`type[FieldConfig]`	Alias for `FieldConfig`
`EmbeddingConfig`	`type[EmbeddingConfig]`	Alias for `EmbeddingConfig`
`PrivacyConfig`	`type[PrivacyConfig]`	Alias for `PrivacyConfig`

`rockfish.actions.GenerateTimeGAN`

Generate synthetic data using the Rockfish DoppelGANger model.

generate = ra.Generate(ra.Generate.Config())

Attributes:

Name	Type	Description
`Config`	`type[Config]`	Alias for `Config`
`DGConfig`	`type[DGConfig]`	Alias for `DGConfig`
`DatasetConfig`	`type[DatasetConfig]`	Alias for `DatasetConfig`
`TimestampConfig`	`type[TimestampConfig]`	Alias for `TimestampConfig`
`FieldConfig`	`type[FieldConfig]`	Alias for `FieldConfig`
`EmbeddingConfig`	`type[EmbeddingConfig]`	Alias for `EmbeddingConfig`
`PrivacyConfig`	`type[PrivacyConfig]`	Alias for `PrivacyConfig`

`rockfish.actions.TrainTabGAN`

Train a model using a tabular GAN.

Attributes:

Name	Type	Description
`Config`	`type[TrainTabGANConfig]`	Alias for `TrainTabGANConfig`
`TrainConfig`	`type[TrainConfig]`	Alias for `TrainConfig`
`DatasetConfig`	`type[DatasetConfig]`	Alias for `DatasetConfig`
`TimestampConfig`	`type[TimestampConfig]`	Alias for `TimestampConfig`
`FieldConfig`	`type[FieldConfig]`	Alias for `FieldConfig`

`rockfish.actions.GenerateTabGAN`

Generate synthetic data using a tabular GAN model.

Attributes:

Name	Type	Description
`Config`	`type[GenerateTabGANConfig]`	Alias for `GenerateTabGANConfig`
`GenerateConfig`	`type[GenerateConfig]`	Alias for `GenerateConfig`

`rockfish.actions.TrainTabTransformer`

Train a Tab Transformer model.

`rockfish.actions.GenerateTabTransformer`

Generate synthetic data using the Tab Transformer model.

Attributes:

Name	Type	Description
`Config`	`TypeAlias`	Alias for `GenerateTabTransformerConfig`.

`rockfish.actions.TrainTimeTransformer`

Train a Time Transformer model.

Attributes:

Name	Type	Description
`Config`	`TypeAlias`	Alias for `TrainTimeTransformerConfig`.
`TrainConfig`	`TypeAlias`	Alias for `TrainTimeConfig`.
`ParentConfig`	`TypeAlias`	Alias for `ParentConfig`.
`ChildConfig`	`TypeAlias`	Alias for `ChildConfig`.
`GPT2Config`	`TypeAlias`	Alias for `GPT2Config`.
`DatasetConfig`	`TypeAlias`	Alias for `DatasetConfig`.
`TimestampConfig`	`TypeAlias`	Alias for `TimestampConfig`.
`FieldConfig`	`TypeAlias`	Alias for `FieldConfig`.

`rockfish.actions.GenerateTimeTransformer`

Generate synthetic data using the Time Transformer model.

Attributes:

Name	Type	Description
`Config`	`TypeAlias`	Alias for `GenerateTimeTransformerConfig`.

`rockfish.actions.SessionTarget`

SessionTarget can be used to trigger generation cycles until a desired target number of sessions is reached.

Attributes:

Name	Type	Description
`Config`	`type[Config]`	Alias for `Config`.

Evaluation

`rockfish.actions.EvaluateLogisticRegression`

Evaluate the classification performance using Logistic Regression.

Example:

Consider the fall detection dataset with labels for train and test sets.

Sex	Body Temperature	Heart Rate	Respiratory Rate	SBP	DBP	split
M	97	80	15	140	90	train
F	96	78	14	145	95	train
M	98	81	13	143	93	test
...	...	...	...	...	...	...

The configuration for the action includes the numerical features, the binary-valued target, and the positive label.

import rockfish.actions as ra

config = {
    "features": [
        "Body Temperature",
        "Heart Rate",
        "Respiratory Rate",
        "SBP",
        "DBP",
    ],
    "target": "Sex",
    "pos_label": "F",
}
evaluate_logistic_regression = ra.EvaluateLogisticRegression(config)

The output of the action is a table with a single AUC value.
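
For readers unfamiliar with the metric, an AUC value can be computed from classifier scores as the fraction of positive/negative pairs that are ranked correctly. The sketch below is plain Python with a hypothetical `roc_auc` helper and made-up scores; it illustrates the metric only and is not the action's implementation.

```python
def roc_auc(labels, scores, pos_label):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for label, s in zip(labels, scores) if label == pos_label]
    neg = [s for label, s in zip(labels, scores) if label != pos_label]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Ties between a positive and a negative score count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = ["F", "M", "F", "M"]
scores = [0.9, 0.2, 0.4, 0.6]  # made-up predicted probabilities of "F"
auc = roc_auc(labels, scores, pos_label="F")
```

Three of the four positive/negative pairs are ordered correctly here, so `auc` is 0.75.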

`rockfish.actions.txtr.LogisticRegressionConfig`

Configuration for the EvaluateLogisticRegression action.

See details on some of the arguments in sklearn.linear_model.LogisticRegression v1.6.1.

Attributes:

Name	Type	Description
`features`	`list[str]`	Numerical features to use in the model.
`target`	`str`	The classification target. Must have two unique values.
`pos_label`	`Optional[str]`	The positive label. If None and the target value set is {0, 1} or {-1, 1}, then the positive label is 1.
`table_split_col_name`	`str`	The name of the column that contains the split label (train/test).
`penalty`	`Optional[Literal['l1', 'l2', 'elasticnet']]`	Specify the norm of the penalty.
`dual`	`bool`	Dual (constrained) or primal (regularized) formulation.
`tol`	`float`	Tolerance for stopping criteria.
`C`	`float`	Inverse of regularization strength; must be a positive float.
`fit_intercept`	`bool`	Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.
`intercept_scaling`	`float`	Useful only when the solver `'liblinear'` is used and `fit_intercept` is set to `True`.
`class_weight`	`ClassWeight`	Weights associated with classes in the form `{class_label: weight}`. If not given, all classes are supposed to have weight one.
`random_state`	`Optional[int]`	Used when `solver` == `'sag'`, `'saga'` or `'liblinear'` to shuffle the data.
`solver`	`str`	Algorithm to use in the optimization problem.
`max_iter`	`int`	Maximum number of iterations taken for the solvers to converge.

`rockfish.actions.EvaluateRandomForest`

Evaluate the classification performance using Random Forest.

See the example in EvaluateLogisticRegression for usage.

`rockfish.actions.txtr.RandomForestConfig`

Configuration for the EvaluateRandomForest action.

See details on some of the arguments in sklearn.ensemble.RandomForestClassifier v1.6.1.

Attributes:

Name	Type	Description
`features`	`list[str]`	Numerical features to use in the model.
`target`	`str`	The classification target. Must have two unique values.
`pos_label`	`Optional[str]`	The positive label. If None and the target value set is {0, 1} or {-1, 1}, then the positive label is 1.
`table_split_col_name`	`str`	The name of the column that contains the split label (train/test).
`n_estimators`	`int`	The number of trees in the forest.
`criterion`	`Literal['gini', 'entropy', 'log_loss']`	The function to measure the quality of a split.
`max_depth`	`Optional[int]`	The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than `min_samples_split` samples.
`min_samples_split`	`int`	The minimum number of samples required to split an internal node.
`min_samples_leaf`	`int`	The minimum number of samples required to be at a leaf node.
`min_weight_fraction_leaf`	`float`	The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node.
`max_features`	`Union[str, int, float, None]`	The number of features to consider when looking for the best split.
`max_leaf_nodes`	`Optional[int]`	Grow trees with `max_leaf_nodes` in best-first fashion.
`min_impurity_decrease`	`float`	A node will be split if this split induces a decrease of the impurity greater than or equal to this value.
`bootstrap`	`bool`	Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.
`oob_score`	`bool`	Whether to use out-of-bag samples to estimate the generalization score.
`n_jobs`	`Optional[int]`	The number of jobs to run in parallel.
`random_state`	`Optional[int]`	Controls both the randomness of the bootstrapping of the samples used when building trees (if `bootstrap=True`) and the sampling of the features to consider when looking for the best split at each node (if `max_features < n_features`).
`class_weight`	`ClassWeight`	Weights associated with classes in the form `{class_label: weight}`. If not given, all classes are supposed to have weight one.
`ccp_alpha`	`float`	Complexity parameter used for Minimal Cost-Complexity Pruning.
`max_samples`	`Optional[float]`	If bootstrap is True, the number of samples to draw from X to train each base estimator.

`rockfish.actions.txtr.ClassWeight = Union[dict[str, float], str, None]` `module-attribute`