rockfish.actions

import rockfish.actions as ra

Source and Sink Actions

rockfish.actions.DatasetLoad

Load a Dataset as the output table.

Attributes:

Name Type Description
Config type[LoadConfig]

Alias for LoadConfig.

rockfish.actions.DatasetSave

Save table as a Dataset.

Attributes:

Name Type Description
Config type[SaveConfig]

Alias for SaveConfig.

rockfish.actions.ModelLoad

Produce a model table.

Attributes:

Name Type Description
Config type[Config]

Alias for Config.

Data Processing Actions

rockfish.actions.Apply

Apply a function and append the results to the table as a new field.

Attributes:

Name Type Description
Config type[ApplyConfig]

Alias for ApplyConfig.

rockfish.actions.Transform

Transform a field replacing the values with the result of the function.

Attributes:

Name Type Description
Config type[TransformConfig]

Alias for TransformConfig.

rockfish.actions.AppendUUID

Return table with new field of UUID values.

Append field 'a' with UUID values
import rockfish.actions as ra
append_uuid = ra.AppendUUID(
    append_field="a",
    seed=1234
)
Append field 'b' with UUID values, per session
import rockfish.actions as ra
append_uuid = ra.AppendUUID(
    group_fields=["session_key"],
    append_field="b",
    seed=1234
)
Append field 'c' with UUID values, per other group_fields
import rockfish.actions as ra
append_uuid = ra.AppendUUID(
    group_fields=["d", "e"],
    append_field="c",
    seed=1234
)

Attributes:

Name Type Description
Config

Alias for AppendUUIDConfig.

rockfish.actions.append.AppendUUIDConfig

Config class for the AppendUUID action.

Attributes:

Name Type Description
group_fields Optional[list[str]]

List of fields to group over. Each group will be assigned a new value in the append_field. If an empty list is specified, each row will be assigned a new value. If unspecified, group_fields will be taken from the dataset's TableMetadata.

append_field str

The name of the new field to append.

seed Optional[int]

The seed for the random number generator.
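The grouping rule described for group_fields can be sketched in plain Python. This is only an illustration of the documented semantics, not the Rockfish implementation: the helper name append_uuid and the way the seed drives UUID generation are assumptions, and the "unspecified, so read TableMetadata" case is left out.

```python
import random
import uuid


def append_uuid(rows, append_field, group_fields, seed=None):
    """Return copies of rows with a UUID stored under append_field.

    Hypothetical helper: rows that share values in group_fields share one
    UUID; an empty group_fields list gives every row its own UUID.
    """
    rng = random.Random(seed)
    group_to_uuid = {}
    out = []
    for index, row in enumerate(rows):
        # With no group fields the row index is the key, so each row is its own group.
        key = tuple(row[f] for f in group_fields) if group_fields else index
        if key not in group_to_uuid:
            group_to_uuid[key] = str(uuid.UUID(int=rng.getrandbits(128), version=4))
        out.append({**row, append_field: group_to_uuid[key]})
    return out


rows = [
    {"session_key": "s1", "value": 1},
    {"session_key": "s1", "value": 2},
    {"session_key": "s2", "value": 3},
]
per_session = append_uuid(rows, "b", ["session_key"], seed=1234)
per_row = append_uuid(rows, "a", [], seed=1234)
```

Here the two rows of session s1 receive the same value in 'b', while per_row holds three distinct values in 'a'.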

rockfish.actions.AppendDomain

Return table with new field of values from the given domain. All values in the domain should be of the same type. It is possible to pass only one value in the domain, in case one wants to add a single-valued field.

Append field 'a' with values from given domain
import rockfish.actions as ra
append_domain = ra.AppendDomain(
    append_field="a",
    domain=["one", "two", "three"],
    seed=1234
)
Append field 'a' with a constant value
import rockfish.actions as ra
append_domain = ra.AppendDomain(
    append_field="a",
    domain=[10],
    seed=1234
)
Append field 'b' with values from given domain, per session
import rockfish.actions as ra
append_domain = ra.AppendDomain(
    group_fields=["session_key"],
    append_field="b",
    domain=["one", "two", "three"],
    seed=1234
)
Append field 'c' with values from given domain, per other group_fields
import rockfish.actions as ra
append_domain = ra.AppendDomain(
    group_fields=["d", "e"],
    append_field="c",
    domain=["one", "two", "three"],
    seed=1234
)

Attributes:

Name Type Description
Config

Alias for AppendDomainConfig.

rockfish.actions.append.AppendDomainConfig

Config class for the AppendDomain action.

Attributes:

Name Type Description
group_fields Optional[list[str]]

List of fields to group over. Each group will be assigned a value in the append_field. If an empty list is specified, each row will be assigned a value. If unspecified, group_fields will be taken from the dataset's TableMetadata.

append_field str

The name of the new field to append.

domain Union[list[str], list[int], list[float]]

List of values that the new field can have. All values should have the same data type. The list should be of size <= 100.

seed Optional[int]

The seed for the random number generator.
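Before building an AppendDomain action it can help to check the domain against the constraints listed above. The following is a minimal sketch; validate_domain is a hypothetical helper, and rejecting an empty domain is an assumption that this reference does not state.

```python
def validate_domain(domain, max_size=100):
    """Raise ValueError unless domain is non-empty, single-typed and small enough."""
    if not domain:
        # Assumption: an empty domain cannot be sampled from.
        raise ValueError("domain must contain at least one value")
    if len(domain) > max_size:
        raise ValueError(f"domain has {len(domain)} values; the limit is {max_size}")
    types = {type(value) for value in domain}
    if len(types) > 1:
        names = sorted(t.__name__ for t in types)
        raise ValueError(f"domain mixes types: {names}")
    return domain


checked = validate_domain(["one", "two", "three"])
constant = validate_domain([10])
```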

rockfish.actions.AppendNormal

Return table with new field of values from the given normal distribution.

Append field 'a' with values from normal(mean=0.0, scale=1.0)
import rockfish.actions as ra
append_normal = ra.AppendNormal(
    append_field="a",
    mean=0.0,
    scale=1.0,
    seed=1234
)
Append field 'a' with values from normal(mean=0.0, scale=1.0), precision = 3 digits
import rockfish.actions as ra
append_normal = ra.AppendNormal(
    append_field="a",
    mean=0.0,
    scale=1.0,
    append_field_ndigits=3,
    seed=1234
)
Append field 'b' with values from normal(mean=0.0, scale=1.0), per session
import rockfish.actions as ra
append_normal = ra.AppendNormal(
    group_fields=["session_key"],
    append_field="b",
    mean=0.0,
    scale=1.0,
    seed=1234
)
Append field 'c' with values from normal(mean=0.0, scale=1.0), per other group_fields
import rockfish.actions as ra
append_normal = ra.AppendNormal(
    group_fields=["d", "e"],
    append_field="c",
    mean=0.0,
    scale=1.0,
    seed=1234
)

Attributes:

Name Type Description
Config

Alias for AppendNormalConfig.

rockfish.actions.append.AppendNormalConfig

Config class for the AppendNormal action.

Attributes:

Name Type Description
group_fields Optional[list[str]]

List of fields to group over. Each group will be assigned a value in the append_field. If an empty list is specified, each row will be assigned a value. If unspecified, group_fields will be taken from the dataset's TableMetadata.

append_field str

The name of the new field to append.

mean float

Mean of the normal distribution from which new field values are sampled.

scale float

Standard deviation of the normal distribution from which new field values are sampled.

append_field_ndigits int

Precision of append field (default = 2).

seed Optional[int]

The seed for the random number generator.
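The interaction between mean, scale, append_field_ndigits and seed can be pictured with the standard library alone. This is a sketch of the described behaviour, not the library's code; sample_normal is a hypothetical name and the use of round() for the precision is an assumption.

```python
import random


def sample_normal(n, mean, scale, ndigits=2, seed=None):
    """Draw n values from normal(mean, scale), rounded to ndigits decimals."""
    rng = random.Random(seed)
    return [round(rng.gauss(mean, scale), ndigits) for _ in range(n)]


values = sample_normal(5, mean=0.0, scale=1.0, ndigits=3, seed=1234)
repeat = sample_normal(5, mean=0.0, scale=1.0, ndigits=3, seed=1234)
```

With the same seed the two calls return the same list, which is the reproducibility the seed attribute is meant to provide.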

rockfish.actions.Flatten

Flatten a table by expanding json objects / pyarrow structs in a column into multiple columns. e.g.

col1 col2 col3
a {"b": 1} c

turns into

col1 col2.b col3
a 1 c

This action recursively flattens the table until no JSON nesting remains. It does not handle lists or JSON arrays, and raises an error if any are present in the table.

rockfish.actions.flatten.FlattenConfig dataclass

Configuration class for the Flatten action.

Attributes:

Name Type Description
separator str

String used to join the parent and child field names when a struct is expanded (for example, the "." in col2.b).
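The recursive expansion described for Flatten can be sketched on a single row represented as a dict. This is an illustration only: flatten_row is a hypothetical helper, and the default separator "." is inferred from the col2.b example rather than stated in this reference.

```python
def flatten_row(row, separator="."):
    """Expand nested dicts into top-level keys joined by separator."""
    flat = {}
    for name, value in row.items():
        if isinstance(value, list):
            # Mirrors the documented limitation: lists and JSON arrays are rejected.
            raise ValueError(f"field {name!r} holds a list, which is not supported")
        if isinstance(value, dict):
            # Recurse so that any depth of nesting is flattened.
            for child_name, child_value in flatten_row(value, separator).items():
                flat[f"{name}{separator}{child_name}"] = child_value
        else:
            flat[name] = value
    return flat


flat = flatten_row({"col1": "a", "col2": {"b": 1}, "col3": "c"})
deep = flatten_row({"x": {"y": {"z": 2}}})
```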

rockfish.actions.Unflatten

Unflatten a table by condensing multiple columns into json objects / pyarrow structs. e.g.

col1 col2.b col3
a 1 c

turns into

col1 col2 col3
a {"b": 1} c

rockfish.actions.flatten.UnflattenConfig dataclass

Configuration class for the Unflatten action.

Attributes:

Name Type Description
separator str

String on which field names are split when constructing structs.
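The inverse operation can be sketched the same way, again on a dict row. unflatten_row is a hypothetical helper; it assumes the "." separator from the example and that no flat name collides with a nested prefix.

```python
def unflatten_row(row, separator="."):
    """Rebuild nested dicts from keys that contain separator."""
    nested = {}
    for name, value in row.items():
        parts = name.split(separator)
        target = nested
        # Walk (and create) one dict level per name component except the last.
        for part in parts[:-1]:
            target = target.setdefault(part, {})
        target[parts[-1]] = value
    return nested


nested = unflatten_row({"col1": "a", "col2.b": 1, "col3": "c"})
```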

rockfish.actions.Sample

Return table with sampled rows according to the provided sample_type.

Sample using default sampling method
import rockfish.actions as ra
sample = ra.Sample(sample_size=100, sample_type=None)
Sample using random sampling with replacement
import rockfish.actions as ra
sample = ra.Sample(frac=0.23, sample_type="random", replace=True, seed=3)

Attributes:

Name Type Description
Config

Alias for SampleConfig.

rockfish.actions.sample.SampleConfig dataclass

Config class for the Sample action.

Attributes:

Name Type Description
sample_size Optional[int]

the number of rows to sample

frac Optional[float]

the fraction of rows to sample

sample_type Optional[SampleType]

the type of sampling to use; if None, first_n is used

seed Optional[int]

the seed for the random number generator

replace Optional[bool]

sample with replacement; if true, the same row can be sampled multiple times

session_key Optional[str]

the field name that defines the session for timeseries datasets

chunk bool

produce chunks of data

chunk_row_limit int

number of rows in each chunk
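The difference between the default first_n behaviour and random sampling with replacement can be sketched as follows. sample_rows is a hypothetical helper, not the library's code; requiring exactly one of sample_size or frac and rounding frac to a row count are assumptions, and sessions and chunking are ignored.

```python
import random


def sample_rows(rows, sample_size=None, frac=None, sample_type=None,
                replace=False, seed=None):
    """Return a sample of rows using first_n (default) or random sampling."""
    if (sample_size is None) == (frac is None):
        raise ValueError("give exactly one of sample_size or frac")
    n = sample_size if sample_size is not None else round(len(rows) * frac)
    if sample_type is None or sample_type == "first_n":
        return rows[:n]
    if sample_type == "random":
        rng = random.Random(seed)
        if replace:
            # choices() may return the same row more than once.
            return rng.choices(rows, k=n)
        return rng.sample(rows, k=min(n, len(rows)))
    raise ValueError(f"unknown sample_type: {sample_type!r}")


rows = list(range(100))
first = sample_rows(rows, sample_size=10)
with_replacement = sample_rows(rows, frac=0.23, sample_type="random",
                               replace=True, seed=3)
```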

rockfish.actions.SampleLabel

Sample rows/sessions that match a label.

Sample from a label field
import rockfish.actions as ra
sample = ra.SampleLabel(
    field="my_label",
    dist={
        "value1": ra.SampleLabel.Count(2),
        "value2": ra.SampleLabel.Count(4),
        "": ra.SampleLabel.Count(6),
    },
    replace=True,
)

Attributes:

Name Type Description
Config

Alias for SampleLabelConfig.

rockfish.actions.sample_label.SampleLabelConfig

Config class for the SampleLabel action.

Attributes:

Name Type Description
field str

field containing the sampling label

dist SampleDist

distribution for each label; the empty string matches all unspecified values

replace bool

sample with replacement; if true, the same row can be sampled multiple times

session_key Optional[str]

the field name that defines the session for timeseries datasets

seed Optional[int]

the seed for the random number generator

chunk bool

produce chunks of data

chunk_row_limit int

number of rows in each chunk
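How dist is interpreted, including the empty string that matches all unspecified values, can be sketched in plain Python. sample_by_label is a hypothetical helper that uses plain integer counts in place of ra.SampleLabel.Count; it ignores sessions and chunking.

```python
import random


def sample_by_label(rows, field, dist, replace=False, seed=None):
    """Sample dist[label] rows per label; the "" entry covers unlisted labels."""
    rng = random.Random(seed)
    buckets = {}
    for row in rows:
        label = row[field]
        # Labels not named in dist fall into the catch-all "" bucket.
        key = label if label in dist else ""
        buckets.setdefault(key, []).append(row)
    sampled = []
    for key, count in dist.items():
        candidates = buckets.get(key, [])
        if not candidates:
            continue
        if replace:
            sampled.extend(rng.choices(candidates, k=count))
        else:
            sampled.extend(rng.sample(candidates, k=min(count, len(candidates))))
    return sampled


rows = (
    [{"my_label": "value1"}] * 3
    + [{"my_label": "value2"}] * 5
    + [{"my_label": "x"}, {"my_label": "y"}]
)
picked = sample_by_label(
    rows, "my_label", {"value1": 2, "value2": 4, "": 6}, replace=True, seed=0
)
```

Because replace is true, six rows are drawn from the two rows labelled x and y even though only two candidates exist.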

rockfish.actions.AlterTimestamp

Alter a timestamp field in the table.

The method to generate new timestamps depends on the interarrival_type option.

fixed

The fixed type generates new timestamps with fixed/regular interarrivals spread over the time range at a per session level.

random

The random type generates new timestamps with random interarrivals at a per session level.

squeeze

The squeeze type takes the original interarrivals and shifts them to the start or end of the time range depending on the value of flow_start_type. If the interarrivals are larger than the range, they are linearly scaled to fit.

chop

The chop type takes the original interarrivals and shifts them to the start or end of the time range depending on the value of flow_start_type. If the interarrivals are larger than the range, they are trimmed.

original

The original type takes the original interarrivals and shifts them to the start or end of the time range depending on the value of flow_start_type. They are not scaled or trimmed.

AlterTimestamp Action Example
from datetime import datetime

import rockfish.actions as ra

alter_timestamp = ra.AlterTimestamp(
    field="ts",
    start_time=datetime(2024, 11, 11, 0, 0, 0),
    end_time=datetime(2024, 11, 11, 23, 59, 59),
    interarrival_type="random",
)

Attributes:

Name Type Description
Config

rockfish.actions.timestamps.AlterTimestampConfig

Configuration class for the AlterTimestamp action.

Attributes:

Name Type Description
field str

Field name containing the timestamp to alter.

start_time datetime

Start time for the desired output range.

end_time datetime

End time for the desired output range.

flow_start_type Literal['starting', 'ending', 'random']

Method for placing the flow within the range, if the interarrival_type supports it.

interarrival_type Literal['fixed', 'random', 'squeeze', 'chop', 'original']

Method to use for generating new timestamps.

seed Optional[int]

Fixed seed for the random number generator.
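To make the fixed and squeeze interarrival types more concrete, here is a rough sketch for one session. Both function names are hypothetical, the even spacing with both endpoints included is an assumption about what "spread over the time range" means, and only the flow_start_type="starting" case is shown.

```python
from datetime import datetime, timedelta


def fixed_timestamps(n, start_time, end_time):
    """Return n evenly spaced timestamps from start_time to end_time inclusive."""
    if n == 1:
        return [start_time]
    step = (end_time - start_time) / (n - 1)
    return [start_time + i * step for i in range(n)]


def squeeze_timestamps(timestamps, start_time, end_time):
    """Shift a sorted session to start_time; scale it only if it overruns the range."""
    first = timestamps[0]
    offsets = [ts - first for ts in timestamps]
    span = offsets[-1]
    limit = end_time - start_time
    # Linear scaling keeps the relative spacing of the original interarrivals.
    scale = limit / span if span > limit else 1.0
    return [start_time + offset * scale for offset in offsets]


start = datetime(2024, 11, 11, 0, 0, 0)
end = datetime(2024, 11, 11, 1, 0, 0)
fixed = fixed_timestamps(4, start, end)
session = [
    datetime(2020, 1, 1, 0, 0),
    datetime(2020, 1, 1, 1, 0),
    datetime(2020, 1, 1, 4, 0),
]
squeezed = squeeze_timestamps(session, start, end)
```

The four-hour session is scaled by 0.25 to fit the one-hour range, so its middle event lands 15 minutes after start_time.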

rockfish.actions.PostAmplify

rockfish.actions.SQL

Return table after applying the provided SQL query.

Run query on one table
import rockfish.actions as ra
sql = ra.SQL(
    query="select col_1 from foo_table;",
    table_name="foo_table"
)
Join two tables on a common column
import rockfish.actions as ra
query = "select t1.col_1, t2.col_1 from t1 inner join t2 on t1.id = t2.id;"
t2_id = "<ID_OF_REMOTE_DATASET>"  # using rockfish.RemoteDataset.id
sql = ra.SQL(
    query=query,
    table_name="t1",
    dataset_name_to_id={"t2": t2_id}
)

Note: If your tables contain columns with uppercase names, wrap those column names in backticks or double quotation marks. For example, if your table has a column called 'Color', the SQL query should be passed as:

  1. "select `Color` from my_table", OR
  2. 'select "Color" from my_table'
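When queries are built programmatically, the quoting rule above can be applied with a small helper. This is a sketch: quote_column is a hypothetical name, and sqlite3 is used here only to confirm that the quoted query is valid SQL; this reference does not say which engine runs queries for the SQL action.

```python
import sqlite3


def quote_column(name):
    """Wrap a column name in double quotes, doubling any embedded quote."""
    return '"' + name.replace('"', '""') + '"'


query = f"select {quote_column('Color')} from my_table"

# Run the query against a throwaway in-memory table to check that it parses.
connection = sqlite3.connect(":memory:")
connection.execute('create table my_table ("Color" text)')
connection.execute("insert into my_table values ('red')")
result = connection.execute(query).fetchall()
connection.close()
```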

Attributes:

Name Type Description
Config

Alias for Config.

rockfish.actions.sql.Config dataclass

Config class for the SQL action.

Attributes:

Name Type Description
query str

The SQL query to run on the table.

table_name str

Name used to refer to the table in the SQL query; the default is 'my_table'.

dataset_name_to_id dict[str, str]

Dict that maps additional remote dataset names to their dataset IDs; these datasets are retrieved before the query is applied.

Encoding Actions

rockfish.actions.JoinFields

Merge fields using a separator and append the merged field to the table. The original fields are dropped from the table.

Join fields 'a', 'b' and 'c'
import rockfish.actions as ra
join = ra.JoinFields(fields=["a", "b", "c"])
Join fields 'a' and 'b' with a custom separator
import rockfish.actions as ra
join = ra.JoinFields(fields=["a", "b"], separator="++")
Join fields 'a' and 'b' with a custom name for the new field
import rockfish.actions as ra
join = ra.JoinFields(fields=["a", "b"], append_field="a_and_b")

rockfish.actions.join_split.JoinConfig

Configuration class for the JoinFields action.

Attributes:

Name Type Description
fields list[str]

List of field names in the table that need to be merged.

append_field Optional[str]

Name of merged field that will be appended to the table.

separator str

String that field values in the merged field will be separated by.

rockfish.actions.SplitField

Split a field using a separator and append the split fields to the table. The original field is dropped from the table.

Split previously joined fields 'a', 'b' and 'c'
import rockfish.actions as ra
split = ra.SplitField(field="a;b;c")
Split multiple previously joined fields 'a;b' and 'c;d'
import rockfish.actions as ra

# suppose the join actions were added as follows:
builder.add(join_ab, parents=[dataset])
builder.add(join_cd, parents=[join_ab])

# the corresponding split actions should be added
# in the reverse order:
split_ab = ra.SplitField(field="a;b")
split_cd = ra.SplitField(field="c;d")

builder.add(split_cd, parents=[model])
builder.add(split_ab, parents=[split_cd])

rockfish.actions.join_split.SplitConfig

Configuration class for the SplitField action.

Attributes:

Name Type Description
field Optional[str]

Field name in the table that needs to be split.

append_fields Optional[list[str]]

List of split field names that will be appended to the table.

separator Optional[str]

String that field values in the split field will be separated by.
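The relationship between JoinFields and SplitField can be sketched as a round trip on one row. Both helpers are hypothetical; the default separator ";" and the default merged field name (the joined field names) are inferred from the SplitField(field="a;b;c") example, not stated explicitly in this reference.

```python
def join_fields(row, fields, separator=";", append_field=None):
    """Merge fields into one string field and drop the originals."""
    name = append_field if append_field is not None else separator.join(fields)
    joined = separator.join(str(row[f]) for f in fields)
    rest = {k: v for k, v in row.items() if k not in fields}
    return {**rest, name: joined}


def split_field(row, field, separator=";", append_fields=None):
    """Split one string field back into several fields and drop the original."""
    names = append_fields if append_fields is not None else field.split(separator)
    values = row[field].split(separator)
    rest = {k: v for k, v in row.items() if k != field}
    return {**rest, **dict(zip(names, values))}


row = {"a": "x", "b": "y", "c": "z", "d": 1}
joined = join_fields(row, ["a", "b", "c"])
restored = split_field(joined, "a;b;c")
```

Note that in this sketch the split values come back as strings, so numeric fields would need to be cast again after splitting.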

rockfish.actions.LabelEncode

Return table after label encoding has been applied on the given field.

Label encode field 'a'
import rockfish.actions as ra
label_encode = ra.LabelEncode(field="a")

rockfish.actions.encode.LabelConfig

Config class for the LabelEncode and the LabelDecode action.

Attributes:

Name Type Description
field str

field to be encoded (should be categorical)

rockfish.actions.LabelDecode

Return table after label decoding has been applied on the given field. Assumes a LabelEncode action was applied on the field before training.

Label decode previously encoded field 'a'
import rockfish.actions as ra
label_decode = ra.LabelDecode(field="a")
Label decode previously encoded fields 'a', 'b'
import rockfish.actions as ra

# suppose the encoding actions were added as follows:
builder.add(label_encode_a, parents=[dataset])
builder.add(label_encode_b, parents=[label_encode_a])

# the corresponding decoding actions should be added
# in the reverse order:
label_decode_a = ra.LabelDecode(field="a")
label_decode_b = ra.LabelDecode(field="b")

builder.add(label_decode_b, parents=[model])
builder.add(label_decode_a, parents=[label_decode_b])

rockfish.actions.encode.LabelConfig

Config class for the LabelEncode and the LabelDecode action.

Attributes:

Name Type Description
field str

field to be encoded (should be categorical)
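Conceptually, label encoding maps each category to an integer code, and decoding needs the same mapping to reverse it. The sketch below is illustrative only; label_encode and label_decode are hypothetical helpers, and assigning codes in sorted order is an assumption.

```python
def label_encode(values):
    """Return integer codes for values plus the category list needed to decode."""
    categories = sorted(set(values))
    to_code = {category: code for code, category in enumerate(categories)}
    return [to_code[v] for v in values], categories


def label_decode(codes, categories):
    """Map integer codes back to their original categories."""
    return [categories[code] for code in codes]


original = ["tcp", "udp", "tcp", "icmp"]
codes, categories = label_encode(original)
decoded = label_decode(codes, categories)
```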

rockfish.actions.LogEncode

Return table after log encoding has been applied on the given field.

Log encode field 'a'
import rockfish.actions as ra
log_encode = ra.LogEncode(field="a")

rockfish.actions.encode.LogEncodeConfig

Config class for the LogEncode action.

Attributes:

Name Type Description
field str

field to be encoded (should be continuous)

rockfish.actions.LogDecode

Return table after log decoding has been applied on the given field. Assumes a LogEncode action was applied on the field before training.

Log decode previously encoded field 'a'
import rockfish.actions as ra
log_decode = ra.LogDecode(field="a")
Log decode previously encoded field 'a', specify precision for decoded field
import rockfish.actions as ra
log_decode = ra.LogDecode(field="a", field_ndigits=2)
Log decode previously encoded fields 'a', 'b'
import rockfish.actions as ra

# suppose the encoding actions were added as follows:
builder.add(log_encode_a, parents=[dataset])
builder.add(log_encode_b, parents=[log_encode_a])

# the corresponding decoding actions should be added
# in the reverse order:
log_decode_a = ra.LogDecode(field="a")
log_decode_b = ra.LogDecode(field="b")

builder.add(log_decode_b, parents=[model])
builder.add(log_decode_a, parents=[log_decode_b])

rockfish.actions.encode.LogDecodeConfig

Config class for the LogDecode action.

Attributes:

Name Type Description
field str

field to be decoded (should be continuous)

field_ndigits Optional[int]

precision of decoded field, applicable for float fields only (default = 3)
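The role of field_ndigits can be seen in a small encode and decode round trip. This is a sketch under an assumption: this reference does not state which logarithm is used, so log(1 + x) is chosen here because it handles zero; log_encode and log_decode are hypothetical helpers.

```python
import math


def log_encode(values):
    """Apply log(1 + x) to each value (assumed transform)."""
    return [math.log1p(v) for v in values]


def log_decode(values, ndigits=3):
    """Invert log(1 + x) and round away floating-point noise."""
    return [round(math.expm1(v), ndigits) for v in values]


original = [0.0, 1.5, 1000.25]
decoded = log_decode(log_encode(original), ndigits=3)
```

Rounding to a fixed number of digits is what lets the decoded values match the originals despite small floating-point errors in the round trip.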

rockfish.actions.SubtractTimestamp

This calculates deltas for a list of timestamps relative to a primary timestamp. This is useful for calculating the time difference between two timestamps, if using the TimeGAN model.

Example:

timestamp1 timestamp2 timestamp3
2021-01-01 2021-01-02 2021-01-03
SubtractTimestamp Action Workflow Example
import rockfish.actions as ra
subtract = ra.SubtractTimestamp(base_timestamp="timestamp1",
                 fields=["timestamp2", "timestamp3"],
                 timestamp_format="%Y-%m-%d")

After running the workflow:

timestamp1 timestamp2 timestamp3
2021-01-01 1 day 2 days

Another example, if not all timestamps are correlated:

timestamp1 timestamp2 timestamp3
2021-01-01 2021-01-02 2011-10-03
SubtractTimestamp Action Workflow Example [uncorrelated timestamp3]
import rockfish.actions as ra
subtract = ra.SubtractTimestamp(base_timestamp="timestamp1",
                 fields=["timestamp2"],
                 timestamp_format="%Y-%m-%d")

After running the workflow:

timestamp1 timestamp2 timestamp3
2021-01-01 1 day 2011-10-03

Another example, if you do not want to replace the fields:

timestamp1 timestamp2 timestamp3
2021-01-01 2021-01-02 2021-01-03
SubtractTimestamp Action Workflow Example [append_fields]
import rockfish.actions as ra
subtract = ra.SubtractTimestamp(base_timestamp="timestamp1",
                 fields=["timestamp2", "timestamp3"],
                 append_fields=["timestamp2_delta", "timestamp3_delta"],
                 timestamp_format="%Y-%m-%d")

After running the workflow:

timestamp1 timestamp2 timestamp3 timestamp2_delta timestamp3_delta
2021-01-01 2021-01-02 2021-01-03 1 day 2 days

rockfish.actions.timestamps.SubtractTimestampConfig dataclass

Configuration class for the SubtractTimestamp action

Attributes:

Name Type Description
base_timestamp str

the timestamp to which the other timestamps are compared

fields list[str]

the list of timestamps to calculate the deltas for

append_fields Optional[list[str]]

the list of columns to append the durations to. If None, the durations replace the values in the original columns.

timestamp_format Optional[str]

the format of the timestamps IF they are strings.
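The delta calculation can be sketched with the standard datetime module for string timestamps. subtract_timestamp is a hypothetical helper operating on one dict row; it is not the library's implementation.

```python
from datetime import datetime, timedelta


def subtract_timestamp(row, base_timestamp, fields, timestamp_format,
                       append_fields=None):
    """Store (field - base_timestamp) for each field, in place or in new columns."""
    base = datetime.strptime(row[base_timestamp], timestamp_format)
    # Without append_fields the deltas overwrite the source columns.
    targets = append_fields if append_fields is not None else fields
    out = dict(row)
    for source, target in zip(fields, targets):
        out[target] = datetime.strptime(row[source], timestamp_format) - base
    return out


row = {
    "timestamp1": "2021-01-01",
    "timestamp2": "2021-01-02",
    "timestamp3": "2021-01-03",
}
replaced = subtract_timestamp(
    row, "timestamp1", ["timestamp2", "timestamp3"], "%Y-%m-%d"
)
appended = subtract_timestamp(
    row, "timestamp1", ["timestamp2", "timestamp3"], "%Y-%m-%d",
    append_fields=["timestamp2_delta", "timestamp3_delta"],
)
```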

rockfish.actions.AddDuration

This calculates timestamps from deltas for a list of timestamps relative to a primary timestamp. After synthesis, this is useful for converting the deltas back to timestamps.

Example:

timestamp1 timestamp2 timestamp3
2021-01-01 1 day 2 days
AddDuration Action Workflow Example
import rockfish.actions as ra
add = ra.AddDuration(base_timestamp="timestamp1",
                     fields=["timestamp2", "timestamp3"],
                     timestamp_format="%Y-%m-%d")

After running the workflow:

timestamp1 timestamp2 timestamp3
2021-01-01 2021-01-02 2021-01-03

Another example, if not all timestamps are correlated (will be ignored):

timestamp1 timestamp2 timestamp3
2021-01-01 1 day 2011-10-03
AddDuration Action Workflow Example [uncorrelated timestamp3]
import rockfish.actions as ra
add = ra.AddDuration(base_timestamp="timestamp1",
                     fields=["timestamp2"],
                     timestamp_format="%d-%m-%Y")

After running the workflow:

timestamp1 timestamp2 timestamp3
01-01-2021 02-01-2021 03-10-2011

rockfish.actions.timestamps.AddDurationConfig dataclass

Configuration class for the AddDuration action

Attributes:

Name Type Description
base_timestamp str

the timestamp to which the other timestamps are compared

fields list[str]

the list of columns that are timestamp deltas, or duration[s] dtype

timestamp_format str

the format of the timestamps. This parameter is REQUIRED. This converts the primary timestamp to this format if it is a string. This also converts all relative_timestamps into this format after delta conversion.
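The reverse conversion, including the role of timestamp_format in both parsing the base timestamp and formatting the results, can be sketched like this. add_duration is a hypothetical helper that assumes the delta columns already hold timedelta values.

```python
from datetime import datetime, timedelta


def add_duration(row, base_timestamp, fields, timestamp_format):
    """Replace each delta in fields with base_timestamp + delta as a string."""
    base = datetime.strptime(row[base_timestamp], timestamp_format)
    out = dict(row)
    for field in fields:
        # The same format is used to render the reconstructed timestamps.
        out[field] = (base + row[field]).strftime(timestamp_format)
    return out


row = {
    "timestamp1": "2021-01-01",
    "timestamp2": timedelta(days=1),
    "timestamp3": timedelta(days=2),
}
restored = add_duration(
    row, "timestamp1", ["timestamp2", "timestamp3"], "%Y-%m-%d"
)
```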

Train and Generate Actions

rockfish.actions.TrainTimeGAN

Train a Rockfish DoppelGANger based model.

train = ra.Train(ra.Train.Config())

Attributes:

Name Type Description
Config type[Config]

Alias for Config

DGConfig type[DGConfig]

Alias for DGConfig

DatasetConfig type[DatasetConfig]

Alias for DatasetConfig

TimestampConfig type[TimestampConfig]

Alias for TimestampConfig

FieldConfig type[FieldConfig]

Alias for FieldConfig

EmbeddingConfig type[EmbeddingConfig]

Alias for EmbeddingConfig

PrivacyConfig type[PrivacyConfig]

Alias for PrivacyConfig

rockfish.actions.GenerateTimeGAN

Generate synthetic data using the Rockfish DoppelGANger model.

generate = ra.Generate(ra.Generate.Config())

Attributes:

Name Type Description
Config type[Config]

Alias for Config

DGConfig type[DGConfig]

Alias for DGConfig

DatasetConfig type[DatasetConfig]

Alias for DatasetConfig

TimestampConfig type[TimestampConfig]

Alias for TimestampConfig

FieldConfig type[FieldConfig]

Alias for FieldConfig

EmbeddingConfig type[EmbeddingConfig]

Alias for EmbeddingConfig

PrivacyConfig type[PrivacyConfig]

Alias for PrivacyConfig

rockfish.actions.TrainTabGAN

Train a model using a tabular GAN.

Attributes:

Name Type Description
Config type[TrainTabGANConfig]

Alias for TrainTabGANConfig

TrainConfig type[TrainConfig]

Alias for TrainConfig

DatasetConfig type[DatasetConfig]

Alias for DatasetConfig

TimestampConfig type[TimestampConfig]

Alias for TimestampConfig

FieldConfig type[FieldConfig]

Alias for FieldConfig

rockfish.actions.GenerateTabGAN

Generate synthetic data using a tabular GAN model.

Attributes:

Name Type Description
Config type[GenerateTabGANConfig]
GenerateConfig type[GenerateConfig]

Alias for GenerateConfig

rockfish.actions.TrainTabTransformer

Train a Tab Transformer model.

rockfish.actions.GenerateTabTransformer

Generate synthetic data using the Tab Transformer model.

Attributes:

Name Type Description
Config TypeAlias

rockfish.actions.TrainTimeTransformer

Train a Time Transformer model.

Attributes:

Name Type Description
Config TypeAlias
TrainConfig TypeAlias

Alias for TrainTimeConfig.

ParentConfig TypeAlias

Alias for ParentConfig.

ChildConfig TypeAlias

Alias for ChildConfig.

GPT2Config TypeAlias

Alias for GPT2Config.

DatasetConfig TypeAlias

Alias for DatasetConfig.

TimestampConfig TypeAlias

Alias for TimestampConfig.

FieldConfig TypeAlias

Alias for FieldConfig.

rockfish.actions.GenerateTimeTransformer

Generate synthetic data using the Time Transformer model.

Attributes:

Name Type Description
Config TypeAlias

rockfish.actions.SessionTarget

SessionTarget can be used to trigger generation cycles until a desired target number of sessions is reached.

Attributes:

Name Type Description
Config type[Config]

Alias for Config.
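The idea of repeating generation cycles until a target is reached can be pictured as a simple loop. Everything in this sketch is hypothetical: generate_cycle stands in for one generation run, and the max_cycles guard and the trimming to exactly target sessions are assumptions, not documented behaviour of SessionTarget.

```python
def generate_until_target(generate_cycle, target, max_cycles=100):
    """Call generate_cycle() until at least target sessions have been collected."""
    sessions = []
    cycles = 0
    while len(sessions) < target and cycles < max_cycles:
        sessions.extend(generate_cycle())
        cycles += 1
    # Assumption: surplus sessions from the last cycle are dropped.
    return sessions[:target], cycles


def fake_cycle():
    """Stand-in generator that yields 30 sessions per cycle."""
    return [f"session-{i}" for i in range(30)]


sessions, cycles = generate_until_target(fake_cycle, target=100)
```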

Evaluation

rockfish.actions.EvaluateLogisticRegression

Evaluate the classification performance using Logistic Regression.

Example:

Consider the fall detection dataset with labels for train and test sets.

Sex | Body Temperature | Heart Rate | Respiratory Rate | SBP | DBP | split
M   | 97               | 80         | 15               | 140 | 90  | train
F   | 96               | 78         | 14               | 145 | 95  | train
M   | 98               | 81         | 13               | 143 | 93  | test
... | ...              | ...        | ...              | ... | ... | ...

The configuration for the action includes the numerical features, the binary-valued target, and the positive label.

config = {
    "features": [
        "Body Temperature",
        "Heart Rate",
        "Respiratory Rate",
        "SBP",
        "DBP",
    ],
    "target": "Sex",
    "pos_label": "F",
}
evaluate_logistic_regression = ra.EvaluateLogisticRegression(config)

The output of the action is a table with a single AUC value.
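For readers unfamiliar with AUC, the metric can be computed directly from predicted scores as the fraction of positive and negative pairs that are ranked correctly. The sketch below is a plain-Python illustration of that definition; roc_auc is a hypothetical helper and the scores are made up, not output of the action.

```python
def roc_auc(labels, scores, pos_label):
    """Return the probability that a positive row outscores a negative row."""
    positives = [s for l, s in zip(labels, scores) if l == pos_label]
    negatives = [s for l, s in zip(labels, scores) if l != pos_label]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative row")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                # Ties count as half a correctly ranked pair.
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = ["F", "M", "F", "M"]
scores = [0.9, 0.2, 0.4, 0.6]  # illustrative predicted probabilities of "F"
auc = roc_auc(labels, scores, pos_label="F")
```

Three of the four positive and negative pairs are ranked correctly here, so auc is 0.75.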

rockfish.actions.txtr.LogisticRegressionConfig

Configuration for the EvaluateLogisticRegression action.

See details on some of the arguments in sklearn.linear_model.LogisticRegression v1.6.1.

Attributes:

Name Type Description
features list[str]

Numerical features to use in the model.

target str

The classification target. Must have two unique values.

pos_label Optional[str]

The positive label. If None and the target value set is {0, 1} or {-1, 1}, then the positive label is 1.

table_split_col_name str

The name of the column that contains the split label (train/test).

penalty Optional[Literal['l1', 'l2', 'elasticnet']]

Specify the norm of the penalty.

dual bool

Dual (constrained) or primal (regularized) formulation.

tol float

Tolerance for stopping criteria.

C float

Inverse of regularization strength; must be a positive float.

fit_intercept bool

Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.

intercept_scaling float

Useful only when the solver 'liblinear' is used and fit_intercept is set to True.

class_weight ClassWeight

Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one.

random_state Optional[int]

Used when solver == 'sag', 'saga' or 'liblinear' to shuffle the data.

solver str

Algorithm to use in the optimization problem.

max_iter int

Maximum number of iterations taken for the solvers to converge.

rockfish.actions.EvaluateRandomForest

Evaluate the classification performance using Random Forest.

See the example in EvaluateLogisticRegression for usage.

rockfish.actions.txtr.RandomForestConfig

Configuration for the EvaluateRandomForest action.

See details on some of the arguments in sklearn.ensemble.RandomForestClassifier v1.6.1.

Attributes:

Name Type Description
features list[str]

Numerical features to use in the model.

target str

The classification target. Must have two unique values.

pos_label Optional[str]

The positive label. If None and the target value set is {0, 1} or {-1, 1}, then the positive label is 1.

table_split_col_name str

The name of the column that contains the split label (train/test).

n_estimators int

The number of trees in the forest.

criterion Literal['gini', 'entropy', 'log_loss']

The function to measure the quality of a split.

max_depth Optional[int]

The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

min_samples_split int

The minimum number of samples required to split an internal node.

min_samples_leaf int

The minimum number of samples required to be at a leaf node.

min_weight_fraction_leaf float

The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node.

max_features Union[str, int, float, None]

The number of features to consider when looking for the best split.

max_leaf_nodes Optional[int]

Grow trees with max_leaf_nodes in best-first fashion.

min_impurity_decrease float

A node will be split if this split induces a decrease of the impurity greater than or equal to this value.

bootstrap bool

Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.

oob_score bool

Whether to use out-of-bag samples to estimate the generalization score.

n_jobs Optional[int]

The number of jobs to run in parallel.

random_state Optional[int]

Controls both the randomness of the bootstrapping of the samples used when building trees (if bootstrap=True) and the sampling of the features to consider when looking for the best split at each node (if max_features < n_features).

class_weight ClassWeight

Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one.

ccp_alpha float

Complexity parameter used for Minimal Cost-Complexity Pruning.

max_samples Optional[float]

If bootstrap is True, the number of samples to draw from X to train each base estimator.

rockfish.actions.txtr.ClassWeight = Union[dict[str, float], str, None] module-attribute