rockfish.actions
import rockfish.actions as ra
Source and Sink Actions
rockfish.actions.DatasetLoad
Load a Dataset as the output table.
Attributes:
Name | Type | Description |
---|---|---|
Config | type[LoadConfig] | Alias for |
rockfish.actions.DatasetSave
Save table as a Dataset.
Attributes:
Name | Type | Description |
---|---|---|
Config | type[SaveConfig] | Alias for |
rockfish.actions.ModelLoad
Data Processing Actions
rockfish.actions.Apply
Apply a function and append the results to the table as a new field.
Attributes:
Name | Type | Description |
---|---|---|
Config | type[ApplyConfig] | Alias for |
rockfish.actions.Transform
Transform a field replacing the values with the result of the function.
Attributes:
Name | Type | Description |
---|---|---|
Config | type[TransformConfig] | Alias for |
rockfish.actions.AppendUUID
Return table with new field of UUID values.
# group_fields unspecified: taken from the dataset's TableMetadata
import rockfish.actions as ra
append_uuid = ra.AppendUUID(
    append_field="a",
    seed=1234
)
# one UUID per session_key group
import rockfish.actions as ra
append_uuid = ra.AppendUUID(
    group_fields=["session_key"],
    append_field="b",
    seed=1234
)
# one UUID per unique combination of fields d and e
import rockfish.actions as ra
append_uuid = ra.AppendUUID(
    group_fields=["d", "e"],
    append_field="c",
    seed=1234
)
Attributes:
Name | Type | Description |
---|---|---|
Config | | Alias for |
rockfish.actions.append.AppendUUIDConfig
Config class for the AppendUUID action.
Attributes:
Name | Type | Description |
---|---|---|
group_fields | Optional[list[str]] | List of fields to group over. Each group will be assigned a new value in the append_field. If an empty list is specified, each row will be assigned a new value. If unspecified, group_fields will be taken from the dataset's TableMetadata. |
append_field | str | The name of the new field to append. |
seed | Optional[int] | The seed for the random number generator. |
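The grouping behavior of group_fields can be illustrated without the Rockfish SDK. The following is a minimal sketch in plain Python, not the library's implementation: the helper name `append_uuid_sketch` and the use of `random.Random` to derive repeatable UUIDs are assumptions made for illustration.

```python
import random
import uuid


def append_uuid_sketch(rows, group_fields, append_field, seed=None):
    """Hypothetical helper: one UUID per group, or per row if group_fields is empty."""
    rng = random.Random(seed)
    group_ids = {}
    out = []
    for index, row in enumerate(rows):
        # An empty group_fields list means every row forms its own group.
        key = tuple(row[f] for f in group_fields) if group_fields else index
        if key not in group_ids:
            # Build a version-4 UUID from seeded random bits so runs are repeatable.
            group_ids[key] = str(uuid.UUID(int=rng.getrandbits(128), version=4))
        out.append({**row, append_field: group_ids[key]})
    return out


rows = [
    {"session_key": "s1", "value": 1},
    {"session_key": "s1", "value": 2},
    {"session_key": "s2", "value": 3},
]
grouped = append_uuid_sketch(rows, ["session_key"], "b", seed=1234)
per_row = append_uuid_sketch(rows, [], "b", seed=1234)
```

In `grouped`, the two rows of session `s1` share one UUID, while `per_row` carries a distinct UUID on every row.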
rockfish.actions.AppendDomain
Return table with new field of values from the given domain. All values in the domain should be of the same type. To add a single-valued field, pass a domain that contains only one value.
# group_fields unspecified: taken from the dataset's TableMetadata
import rockfish.actions as ra
append_domain = ra.AppendDomain(
    append_field="a",
    domain=["one", "two", "three"],
    seed=1234
)
# single-valued field: the domain contains only one value
import rockfish.actions as ra
append_domain = ra.AppendDomain(
    append_field="a",
    domain=[10],
    seed=1234
)
# one domain value per session_key group
import rockfish.actions as ra
append_domain = ra.AppendDomain(
    group_fields=["session_key"],
    append_field="b",
    domain=["one", "two", "three"],
    seed=1234
)
# one domain value per unique combination of fields d and e
import rockfish.actions as ra
append_domain = ra.AppendDomain(
    group_fields=["d", "e"],
    append_field="c",
    domain=["one", "two", "three"],
    seed=1234
)
Attributes:
Name | Type | Description |
---|---|---|
Config | | Alias for |
rockfish.actions.append.AppendDomainConfig
Config class for the AppendDomain action.
Attributes:
Name | Type | Description |
---|---|---|
group_fields | Optional[list[str]] | List of fields to group over. Each group will be assigned a value in the append_field. If an empty list is specified, each row will be assigned a value. If unspecified, group_fields will be taken from the dataset's TableMetadata. |
append_field | str | The name of the new field to append. |
domain | Union[list[str], list[int], list[float]] | List of values that the new field can have. All values should have the same data type. The list should be of size <= 100. |
seed | Optional[int] | The seed for the random number generator. |
rockfish.actions.AppendNormal
Return table with new field of values from the given normal distribution.
# group_fields unspecified: taken from the dataset's TableMetadata
import rockfish.actions as ra
append_normal = ra.AppendNormal(
    append_field="a",
    mean=0.0,
    scale=1.0,
    seed=1234
)
# set the precision of the appended values to 3 digits
import rockfish.actions as ra
append_normal = ra.AppendNormal(
    append_field="a",
    mean=0.0,
    scale=1.0,
    append_field_ndigits=3,
    seed=1234
)
# one sampled value per session_key group
import rockfish.actions as ra
append_normal = ra.AppendNormal(
    group_fields=["session_key"],
    append_field="b",
    mean=0.0,
    scale=1.0,
    seed=1234
)
# one sampled value per unique combination of fields d and e
import rockfish.actions as ra
append_normal = ra.AppendNormal(
    group_fields=["d", "e"],
    append_field="c",
    mean=0.0,
    scale=1.0,
    seed=1234
)
Attributes:
Name | Type | Description |
---|---|---|
Config | | Alias for |
rockfish.actions.append.AppendNormalConfig
Config class for the AppendNormal action.
Attributes:
Name | Type | Description |
---|---|---|
group_fields | Optional[list[str]] | List of fields to group over. Each group will be assigned a value in the append_field. If an empty list is specified, each row will be assigned a value. If unspecified, group_fields will be taken from the dataset's TableMetadata. |
append_field | str | The name of the new field to append. |
mean | float | Mean of the normal distribution from which new field values are sampled. |
scale | float | Standard deviation of the normal distribution from which new field values are sampled. |
append_field_ndigits | int | Precision of append field (default = 2). |
seed | Optional[int] | The seed for the random number generator. |
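How mean, scale, append_field_ndigits and seed interact can be sketched with the standard library alone. This is an illustration under assumptions, not the action's implementation: the function name `normal_values_sketch` is hypothetical, and it draws one value per row (the empty group_fields case).

```python
import random


def normal_values_sketch(n, mean, scale, ndigits=2, seed=None):
    """Hypothetical helper: n seeded normal draws, rounded to ndigits."""
    rng = random.Random(seed)
    # scale is the standard deviation; ndigits plays the role of append_field_ndigits.
    return [round(rng.gauss(mean, scale), ndigits) for _ in range(n)]


values = normal_values_sketch(5, mean=0.0, scale=1.0, ndigits=3, seed=1234)
```

Using the same seed reproduces the same values, which is the reason to fix the seed in a workflow that must be repeatable.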
rockfish.actions.Flatten
Flatten a table by expanding json objects / pyarrow structs in a column into multiple columns. e.g.
col1 | col2 | col3 |
---|---|---|
a | {"b": 1} | c |
turns into
col1 | col2.b | col3 |
---|---|---|
a | 1 | c |
This action recursively flattens the table until no more json nestings are present. This action does not handle lists or JSON arrays, and raises an error if any are present in the table.
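The recursive flattening can be sketched on a single row represented as a Python dict. This is a minimal illustration, not the action's code: the function name `flatten_row` and the choice of `ValueError` for list values are assumptions, while the dot-separated column names follow the example table above.

```python
def flatten_row(row, parent_key=""):
    """Recursively expand nested dicts into dot-separated column names."""
    flat = {}
    for key, value in row.items():
        name = f"{parent_key}.{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse until no nested objects remain.
            flat.update(flatten_row(value, name))
        elif isinstance(value, (list, tuple)):
            # Mirrors the documented limitation: lists / JSON arrays are rejected.
            raise ValueError(f"column {name!r} contains a list, which is not supported")
        else:
            flat[name] = value
    return flat


flat = flatten_row({"col1": "a", "col2": {"b": 1}, "col3": "c"})
deep = flatten_row({"x": {"y": {"z": 2}}})
```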
rockfish.actions.flatten.FlattenConfig
dataclass
rockfish.actions.Unflatten
Unflatten a table by condensing multiple columns into json objects / pyarrow structs. e.g.
col1 | col2.b | col3 |
---|---|---|
a | 1 | c |
turns into
col1 | col2 | col3 |
---|---|---|
a | {"b": 1} | c |
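The inverse operation can be sketched the same way, on a single row represented as a Python dict. The function name `unflatten_row` is hypothetical, and the sketch assumes that a dot in a column name always marks a nesting level.

```python
def unflatten_row(row, separator="."):
    """Condense dot-separated column names back into nested dicts."""
    nested = {}
    for name, value in row.items():
        parts = name.split(separator)
        target = nested
        # Walk (and create) the intermediate objects for every prefix part.
        for part in parts[:-1]:
            target = target.setdefault(part, {})
        target[parts[-1]] = value
    return nested


nested = unflatten_row({"col1": "a", "col2.b": 1, "col3": "c"})
```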
rockfish.actions.flatten.UnflattenConfig
dataclass
rockfish.actions.Sample
Return table with sampled rows according to the provided sample_type.
import rockfish.actions as ra
sample = ra.Sample(sample_size=100, sample_type=None)
import rockfish.actions as ra
sample = ra.Sample(frac=0.23, sample_type="random", replace=True, seed=3)
Attributes:
Name | Type | Description |
---|---|---|
Config | | Alias for |
rockfish.actions.sample.SampleConfig
dataclass
Config class for the Sample action.
Attributes:
Name | Type | Description |
---|---|---|
sample_size | Optional[int] | the number of rows to sample |
frac | Optional[float] | the fraction of rows to sample |
sample_type | Optional[SampleType] | the type of sampling to use; if None, uses first_n |
seed | Optional[int] | the seed for the random number generator |
replace | Optional[bool] | sample with replacement; if true, the same row can be sampled multiple times |
session_key | Optional[str] | the field name that defines the session for timeseries datasets |
chunk | bool | produce chunks of data |
chunk_row_limit | int | number of rows in each chunk |
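The difference between the first_n default and random sampling with or without replacement can be sketched on a plain list of rows. This is an illustration, not the action's implementation: the helper name `sample_rows_sketch` is hypothetical, it assumes that one of sample_size or frac is given, and rounding `frac * len(rows)` to the nearest integer is an assumption.

```python
import random


def sample_rows_sketch(rows, sample_size=None, frac=None,
                       sample_type=None, replace=False, seed=None):
    """Hypothetical helper mirroring the documented options on a list of rows."""
    # Assumption: frac is converted to a row count by rounding.
    n = sample_size if sample_size is not None else round(frac * len(rows))
    if sample_type is None or sample_type == "first_n":
        # Default behavior: take the first n rows, no randomness involved.
        return rows[:n]
    rng = random.Random(seed)
    if replace:
        # With replacement, the same row may appear more than once.
        return rng.choices(rows, k=n)
    return rng.sample(rows, k=n)


rows = list(range(10))
first = sample_rows_sketch(rows, sample_size=3)
with_replacement = sample_rows_sketch(
    rows, frac=0.23, sample_type="random", replace=True, seed=3
)
without_replacement = sample_rows_sketch(
    rows, sample_size=4, sample_type="random", seed=3
)
```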
rockfish.actions.SampleLabel
Sample rows/sessions that match a label.
import rockfish.actions as ra
sample = ra.SampleLabel(
    field="my_label",
    dist={
        "value1": ra.SampleLabel.Count(2),
        "value2": ra.SampleLabel.Count(4),
        # the empty string matches all unspecified values
        "": ra.SampleLabel.Count(6),
    },
    replace=True,
)
Attributes:
Name | Type | Description |
---|---|---|
Config | | Alias for |
rockfish.actions.sample_label.SampleLabelConfig
Config class for the SampleLabel action.
Attributes:
Name | Type | Description |
---|---|---|
field | str | field containing the sampling label |
dist | SampleDist | distribution for each label; the empty string matches all unspecified values |
replace | bool | sample with replacement; if true, the same row can be sampled multiple times |
session_key | Optional[str] | the field name that defines the session for timeseries datasets |
seed | Optional[int] | the seed for the random number generator |
chunk | bool | produce chunks of data |
chunk_row_limit | int | number of rows in each chunk |
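The role of the empty-string key in dist can be sketched on a list of dict rows. This is an illustration under assumptions, not the action's implementation: the helper name `sample_label_sketch` is hypothetical, dist is simplified to a mapping from label to row count, and the count for the empty-string key is assumed to apply to the pool of all unspecified labels taken together.

```python
import random


def sample_label_sketch(rows, field, dist, replace=False, seed=None):
    """Hypothetical helper: sample a fixed number of rows per label."""
    rng = random.Random(seed)
    buckets = {label: [] for label in dist}
    for row in rows:
        # Labels without their own entry fall back to the "" bucket, if present.
        key = row[field] if row[field] in dist else ""
        if key in dist:
            buckets[key].append(row)
    sampled = []
    for label, count in dist.items():
        pool = buckets[label]
        if not pool:
            continue
        if replace:
            sampled.extend(rng.choices(pool, k=count))
        else:
            # Without replacement, a label cannot yield more rows than it has.
            sampled.extend(rng.sample(pool, k=min(count, len(pool))))
    return sampled


rows = (
    [{"my_label": "value1", "id": i} for i in range(3)]
    + [{"my_label": "value2", "id": 3}]
    + [{"my_label": "other", "id": 4}, {"my_label": "misc", "id": 5}]
)
sampled = sample_label_sketch(
    rows, "my_label", {"value1": 2, "value2": 4, "": 6}, replace=True, seed=7
)
```

With replace=True, the single `value2` row is drawn four times, which is why replacement is needed whenever a requested count exceeds the number of matching rows.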
rockfish.actions.AlterTimestamp
Alter a timestamp field in the table.
The method to generate new timestamps depends on the interarrival_type option.
fixed
The fixed type generates new timestamps with fixed/regular interarrivals spread over the time range at a per session level.
random
The random type generates new timestamps with random interarrivals at a per session level.
squeeze
The squeeze type takes the original interarrivals and shifts them to the start or end of the time range depending on the value of flow_start_type. If the interarrivals are larger than the range they are linearly scaled to fit.
chop
The chop type takes the original interarrivals and shifts them to the start or end of the time range depending on the value of flow_start_type. If the interarrivals are larger than the range they are trimmed.
original
The original type takes the original interarrivals and shifts them to the start or end of the time range depending on the value of flow_start_type. They are not scaled or trimmed.
from datetime import datetime
import rockfish.actions as ra
alter_timestamp = ra.AlterTimestamp(
    field="ts",
    start_time=datetime(2024, 11, 11, 0, 0, 0),
    end_time=datetime(2024, 11, 11, 23, 59, 59),
    interarrival_type="random",
)
Attributes:
Name | Type | Description |
---|---|---|
Config | | Alias for |
rockfish.actions.timestamps.AlterTimestampConfig
Configuration class for the AlterTimestamp action.
Attributes:
Name | Type | Description |
---|---|---|
field | str | Field name containing the timestamp to alter. |
start_time | datetime | Start time for the desired output range. |
end_time | datetime | End time for the desired output range. |
flow_start_type | Literal['starting', 'ending', 'random'] | Method for placing the flow within the range, if the |
interarrival_type | Literal['fixed', 'random', 'squeeze', 'chop', 'original'] | Method to use for generating new timestamps. |
seed | Optional[int] | Fixed seed for the random number generator. |
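Two of the interarrival types, fixed and squeeze, can be sketched for a single session using only `datetime`. This is an illustration under stated assumptions, not the action's implementation: the function names are hypothetical, the input timestamps are assumed to be sorted, the fixed sketch assumes both ends of the range are used, and the squeeze sketch assumes a flow_start_type of 'starting'.

```python
from datetime import datetime, timedelta


def fixed_timestamps(n, start_time, end_time):
    """Assumed 'fixed' behavior: n regularly spaced timestamps over the range."""
    if n == 1:
        return [start_time]
    step = (end_time - start_time) / (n - 1)
    return [start_time + i * step for i in range(n)]


def squeeze_timestamps(timestamps, start_time, end_time):
    """Assumed 'squeeze' behavior: keep interarrivals, scale only if too long."""
    offsets = [t - timestamps[0] for t in timestamps]
    span = offsets[-1]
    window = end_time - start_time
    # Scale linearly only when the session is longer than the target range.
    scale = window / span if span > window else 1.0
    return [start_time + offset * scale for offset in offsets]


start = datetime(2024, 11, 11)
end = datetime(2024, 11, 12)
regular = fixed_timestamps(3, start, end)
session = [datetime(2020, 1, 1), datetime(2020, 1, 1, 12), datetime(2020, 1, 3)]
squeezed = squeeze_timestamps(session, start, end)
```

Here the two-day session is scaled by one half to fit the one-day range, so its 12-hour first interarrival becomes 6 hours.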
rockfish.actions.PostAmplify
rockfish.actions.SQL
Return table after applying the provided SQL query.
import rockfish.actions as ra
sql = ra.SQL(
    query="select col_1 from foo_table;",
    table_name="foo_table"
)
import rockfish.actions as ra
# join the input table (t1) with a remote dataset (t2)
query = "select t1.col_1, t2.col_1 from t1 inner join t2 on t1.id = t2.id;"
t2_id = "<ID_OF_REMOTE_DATASET>"  # using rockfish.RemoteDataset.id
sql = ra.SQL(
    query=query,
    table_name="t1",
    dataset_name_to_id={"t2": t2_id}
)
Note: If your tables contain columns that have uppercase names, wrap the column names in backticks or quotation marks. For example, if your table has a column called 'Color', the SQL query should be passed as:
"select `Color` from my_table", or
'select "Color" from my_table'
Attributes:
Name | Type | Description |
---|---|---|
Config | | Alias for |
rockfish.actions.sql.Config
dataclass
Config class for the SQL action.
Attributes:
Name | Type | Description |
---|---|---|
query | str | The SQL query to run on the table. |
table_name | str | Name that the table is referred to in the SQL query, the default name is 'my_table'. |
dataset_name_to_id | dict[str, str] | Dict that maps additional remote dataset names to their dataset IDs, these are retrieved before the query is applied. |
Encoding Actions
rockfish.actions.JoinFields
Merge fields using a separator and append the merged field to the table. The original fields are dropped from the table.
import rockfish.actions as ra
join = ra.JoinFields(fields=["a", "b", "c"])
import rockfish.actions as ra
join = ra.JoinFields(fields=["a", "b"], separator="++")
import rockfish.actions as ra
join = ra.JoinFields(fields=["a", "b"], append_field="a_and_b")
rockfish.actions.join_split.JoinConfig
Configuration class for the JoinFields action.
Attributes:
Name | Type | Description |
---|---|---|
fields | list[str] | List of field names in the table that need to be merged. |
append_field | Optional[str] | Name of merged field that will be appended to the table. |
separator | str | String that field values in the merged field will be separated by. |
rockfish.actions.SplitField
Split a field using a separator and append the split fields to the table. The original field is dropped from the table.
import rockfish.actions as ra
split = ra.SplitField(field="a;b;c")
import rockfish.actions as ra
# suppose the join actions were added as follows:
builder.add(join_ab, parents=[dataset])
builder.add(join_cd, parents=[join_ab])
# the corresponding split actions should be added
# in the reverse order:
split_ab = ra.SplitField(field="a;b")
split_cd = ra.SplitField(field="c;d")
builder.add(split_cd, parents=[model])
builder.add(split_ab, parents=[split_cd])
rockfish.actions.join_split.SplitConfig
Configuration class for the SplitField action.
Attributes:
Name | Type | Description |
---|---|---|
field | Optional[str] | Field name in the table that needs to be split. |
append_fields | Optional[list[str]] | List of split field names that will be appended to the table. |
separator | Optional[str] | String that field values in the split field will be separated by. |
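The relationship between joining and splitting can be sketched on a single row represented as a dict. This is an illustration, not the actions' implementation: both helper names are hypothetical, and the default separator of ";" and the default merged field name (the field names joined by the separator) are assumptions inferred from the `SplitField(field="a;b")` examples above.

```python
def join_fields_sketch(row, fields, separator=";", append_field=None):
    """Hypothetical helper: merge fields into one and drop the originals."""
    name = append_field or separator.join(fields)
    joined = separator.join(str(row[f]) for f in fields)
    rest = {k: v for k, v in row.items() if k not in fields}
    return {**rest, name: joined}


def split_field_sketch(row, field, separator=";", append_fields=None):
    """Hypothetical helper: split a merged field and drop it."""
    names = append_fields or field.split(separator)
    values = row[field].split(separator)
    rest = {k: v for k, v in row.items() if k != field}
    return {**rest, **dict(zip(names, values))}


joined = join_fields_sketch({"a": "x", "b": "y", "c": 1}, ["a", "b"])
restored = split_field_sketch(joined, "a;b")
```

Note that in this sketch the split values come back as strings; restoring original types is outside its scope.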
rockfish.actions.LabelEncode
Return table after label encoding has been applied on the given field.
import rockfish.actions as ra
label_encode = ra.LabelEncode(field="a")
rockfish.actions.encode.LabelConfig
Config class for the LabelEncode and LabelDecode actions.
Attributes:
Name | Type | Description |
---|---|---|
field | str | field to be encoded (should be categorical) |
rockfish.actions.LabelDecode
Return table after label decoding has been applied on the given field. Assumes a LabelEncode action was applied on the field before training.
import rockfish.actions as ra
label_decode = ra.LabelDecode(field="a")
import rockfish.actions as ra
# suppose the encoding actions were added as follows:
builder.add(label_encode_a, parents=[dataset])
builder.add(label_encode_b, parents=[label_encode_a])
# the corresponding decoding actions should be added
# in the reverse order:
label_decode_a = ra.LabelDecode(field="a")
label_decode_b = ra.LabelDecode(field="b")
builder.add(label_decode_b, parents=[model])
builder.add(label_decode_a, parents=[label_decode_b])
rockfish.actions.encode.LabelConfig
Config class for the LabelEncode and LabelDecode actions.
Attributes:
Name | Type | Description |
---|---|---|
field | str | field to be encoded (should be categorical) |
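Label encoding and decoding form a round trip over a categorical field, which can be sketched in plain Python. This is a generic illustration of the technique, not the Rockfish implementation: the function names are hypothetical, and assigning codes in sorted label order is an assumption.

```python
def fit_label_mapping(values):
    """Assign an integer code to each distinct label (sorted order assumed)."""
    return {label: code for code, label in enumerate(sorted(set(values)))}


def label_encode(values, mapping):
    return [mapping[v] for v in values]


def label_decode(codes, mapping):
    # Decoding needs the same mapping that was used for encoding.
    inverse = {code: label for label, code in mapping.items()}
    return [inverse[c] for c in codes]


colors = ["red", "blue", "red", "green"]
mapping = fit_label_mapping(colors)
encoded = label_encode(colors, mapping)
decoded = label_decode(encoded, mapping)
```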
rockfish.actions.LogEncode
Return table after log encoding has been applied on the given field.
import rockfish.actions as ra
log_encode = ra.LogEncode(field="a")
rockfish.actions.encode.LogEncodeConfig
rockfish.actions.LogDecode
Return table after log decoding has been applied on the given field. Assumes a LogEncode action was applied on the field before training.
import rockfish.actions as ra
log_decode = ra.LogDecode(field="a")
import rockfish.actions as ra
log_decode = ra.LogDecode(field="a", field_ndigits=2)
import rockfish.actions as ra
# suppose the encoding actions were added as follows:
builder.add(log_encode_a, parents=[dataset])
builder.add(log_encode_b, parents=[log_encode_a])
# the corresponding decoding actions should be added
# in the reverse order:
log_decode_a = ra.LogDecode(field="a")
log_decode_b = ra.LogDecode(field="b")
builder.add(log_decode_b, parents=[model])
builder.add(log_decode_a, parents=[log_decode_b])
rockfish.actions.encode.LogDecodeConfig
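The round trip between log encoding and log decoding, including the effect of rounding with field_ndigits, can be sketched with the `math` module. This is an illustration under assumptions: the exact transform used by the actions is not stated here, so the sketch assumes log(1 + x) and its inverse, and the function names are hypothetical.

```python
import math


def log_encode_sketch(values):
    # Assumption: log(1 + x), which is defined for zero-valued fields.
    return [math.log1p(v) for v in values]


def log_decode_sketch(values, field_ndigits=None):
    decoded = [math.expm1(v) for v in values]
    if field_ndigits is None:
        return decoded
    # Rounding removes floating-point noise introduced by the round trip.
    return [round(v, field_ndigits) for v in decoded]


original = [0.0, 9.5, 120.25]
restored = log_decode_sketch(log_encode_sketch(original), field_ndigits=2)
```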
rockfish.actions.SubtractTimestamp
Calculates deltas for a list of timestamps relative to a primary timestamp. This is useful for expressing additional timestamps as time differences from the primary timestamp when using the TimeGAN model.
Example:
timestamp1 | timestamp2 | timestamp3 |
---|---|---|
2021-01-01 | 2021-01-02 | 2021-01-03 |
import rockfish.actions as ra
subtract = ra.SubtractTimestamp(base_timestamp="timestamp1",
fields=["timestamp2", "timestamp3"],
timestamp_format="%Y-%m-%d")
After running the workflow:
timestamp1 | timestamp2 | timestamp3 |
---|---|---|
2021-01-01 | 1 day | 2 days |
Another example, if not all timestamps are correlated:
timestamp1 | timestamp2 | timestamp3 |
---|---|---|
2021-01-01 | 2021-01-02 | 2011-10-03 |
import rockfish.actions as ra
subtract = ra.SubtractTimestamp(base_timestamp="timestamp1",
fields=["timestamp2"],
timestamp_format="%Y-%m-%d")
After running the workflow:
timestamp1 | timestamp2 | timestamp3 |
---|---|---|
2021-01-01 | 1 day | 2011-10-03 |
Another example, if you do not want to replace the fields:
timestamp1 | timestamp2 | timestamp3 |
---|---|---|
2021-01-01 | 2021-01-02 | 2021-01-03 |
import rockfish.actions as ra
subtract = ra.SubtractTimestamp(base_timestamp="timestamp1",
fields=["timestamp2", "timestamp3"],
append_fields=["timestamp2_delta", "timestamp3_delta"],
timestamp_format="%Y-%m-%d")
After running the workflow:
timestamp1 | timestamp2 | timestamp3 | timestamp2_delta | timestamp3_delta |
---|---|---|---|---|
2021-01-01 | 2021-01-02 | 2021-01-03 | 1 day | 2 days |
rockfish.actions.timestamps.SubtractTimestampConfig
dataclass
Configuration class for the SubtractTimestamp action.
Attributes:
Name | Type | Description |
---|---|---|
base_timestamp | str | the timestamp to which the other timestamps are compared |
fields | list[str] | the list of timestamps to calculate the deltas for |
append_fields | Optional[list[str]] | the list of columns to append the durations to. If None, the durations replace the values in the original columns. |
timestamp_format | Optional[str] | the format of the timestamps IF they are strings. |
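The delta calculation can be sketched for one row with the standard `datetime` module. This is an illustration, not the action's implementation: the helper name `subtract_timestamp_sketch` is hypothetical, and it assumes that all timestamps are strings in timestamp_format.

```python
from datetime import datetime


def subtract_timestamp_sketch(row, base_timestamp, fields,
                              timestamp_format, append_fields=None):
    """Hypothetical helper: durations of fields relative to base_timestamp."""
    base = datetime.strptime(row[base_timestamp], timestamp_format)
    # Without append_fields, the durations overwrite the original columns.
    targets = append_fields or fields
    out = dict(row)
    for field, target in zip(fields, targets):
        out[target] = datetime.strptime(row[field], timestamp_format) - base
    return out


row = {"timestamp1": "2021-01-01", "timestamp2": "2021-01-02", "timestamp3": "2021-01-03"}
replaced = subtract_timestamp_sketch(
    row, "timestamp1", ["timestamp2", "timestamp3"], "%Y-%m-%d"
)
appended = subtract_timestamp_sketch(
    row, "timestamp1", ["timestamp2"], "%Y-%m-%d",
    append_fields=["timestamp2_delta"],
)
```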
rockfish.actions.AddDuration
Calculates timestamps from deltas for a list of fields relative to a primary timestamp. After synthesis, this is useful for converting the deltas back to timestamps.
Example:
timestamp1 | timestamp2 | timestamp3 |
---|---|---|
2021-01-01 | 1 day | 2 days |
import rockfish.actions as ra
add = ra.AddDuration(base_timestamp="timestamp1",
fields=["timestamp2", "timestamp3"],
timestamp_format="%Y-%m-%d")
After running the workflow:
timestamp1 | timestamp2 | timestamp3 |
---|---|---|
2021-01-01 | 2021-01-02 | 2021-01-03 |
Another example, if not all timestamps are correlated (will be ignored):
timestamp1 | timestamp2 | timestamp3 |
---|---|---|
2021-01-01 | 1 day | 2011-10-03 |
import rockfish.actions as ra
add = ra.AddDuration(base_timestamp="timestamp1",
fields=["timestamp2"],
timestamp_format="%d-%m-%Y")
After running the workflow:
timestamp1 | timestamp2 | timestamp3 |
---|---|---|
01-01-2021 | 02-01-2021 | 03-10-2011 |
rockfish.actions.timestamps.AddDurationConfig
dataclass
Configuration class for the AddDuration action.
Attributes:
Name | Type | Description |
---|---|---|
base_timestamp | str | the timestamp to which the other timestamps are compared |
fields | list[str] | the list of columns that are timestamp deltas, or duration[s] dtype |
timestamp_format | str | the format of the timestamps. This parameter is REQUIRED. This converts the primary timestamp to this format if it is a string. This also converts all relative_timestamps into this format after delta conversion. |
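Converting deltas back to timestamps can be sketched for one row with the standard `datetime` module. This is an illustration, not the action's implementation: the helper name `add_duration_sketch` is hypothetical, and it assumes the base timestamp is a string already in timestamp_format and that the delta columns hold `timedelta` values.

```python
from datetime import datetime, timedelta


def add_duration_sketch(row, base_timestamp, fields, timestamp_format):
    """Hypothetical helper: turn durations back into formatted timestamps."""
    base = datetime.strptime(row[base_timestamp], timestamp_format)
    out = dict(row)
    for field in fields:
        # Add the duration to the base and render it in the requested format.
        out[field] = (base + row[field]).strftime(timestamp_format)
    return out


row = {
    "timestamp1": "2021-01-01",
    "timestamp2": timedelta(days=1),
    "timestamp3": timedelta(days=2),
}
restored = add_duration_sketch(
    row, "timestamp1", ["timestamp2", "timestamp3"], "%Y-%m-%d"
)
```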
Train and Generate Actions
rockfish.actions.TrainTimeGAN
Train a Rockfish DoppelGANger based model.
train = ra.Train(ra.Train.Config())
Attributes:
Name | Type | Description |
---|---|---|
Config | type[Config] | Alias for |
DGConfig | type[DGConfig] | Alias for |
DatasetConfig | type[DatasetConfig] | Alias for |
TimestampConfig | type[TimestampConfig] | Alias for |
FieldConfig | type[FieldConfig] | Alias for |
EmbeddingConfig | type[EmbeddingConfig] | Alias for |
PrivacyConfig | type[PrivacyConfig] | Alias for |
rockfish.actions.GenerateTimeGAN
Generate synthetic data using the Rockfish DoppelGANger model.
generate = ra.Generate(ra.Generate.Config())
Attributes:
Name | Type | Description |
---|---|---|
Config | type[Config] | Alias for |
DGConfig | type[DGConfig] | Alias for |
DatasetConfig | type[DatasetConfig] | Alias for |
TimestampConfig | type[TimestampConfig] | Alias for |
FieldConfig | type[FieldConfig] | Alias for |
EmbeddingConfig | type[EmbeddingConfig] | Alias for |
PrivacyConfig | type[PrivacyConfig] | Alias for |
rockfish.actions.TrainTabGAN
Train a model using a tabular GAN.
Attributes:
Name | Type | Description |
---|---|---|
Config | type[TrainTabGANConfig] | Alias for |
TrainConfig | type[TrainConfig] | Alias for |
DatasetConfig | type[DatasetConfig] | Alias for |
TimestampConfig | type[TimestampConfig] | Alias for |
FieldConfig | type[FieldConfig] | Alias for |
rockfish.actions.GenerateTabGAN
Generate synthetic data using a tabular GAN model.
Attributes:
Name | Type | Description |
---|---|---|
Config | type[GenerateTabGANConfig] | Alias for |
GenerateConfig | type[GenerateConfig] | Alias for |
rockfish.actions.TrainTabTransformer
Train a Tab Transformer model.
rockfish.actions.GenerateTabTransformer
Generate synthetic data using the Tab Transformer model.
Attributes:
Name | Type | Description |
---|---|---|
Config | TypeAlias | Alias for |
rockfish.actions.TrainTimeTransformer
Train a Time Transformer model.
Attributes:
Name | Type | Description |
---|---|---|
Config | TypeAlias | Alias for |
TrainConfig | TypeAlias | Alias for |
ParentConfig | TypeAlias | Alias for |
ChildConfig | TypeAlias | Alias for |
GPT2Config | TypeAlias | Alias for |
DatasetConfig | TypeAlias | Alias for |
TimestampConfig | TypeAlias | Alias for |
FieldConfig | TypeAlias | Alias for |
rockfish.actions.GenerateTimeTransformer
Generate synthetic data using the Time Transformer model.
Attributes:
Name | Type | Description |
---|---|---|
Config | TypeAlias | Alias for |
rockfish.actions.SessionTarget
Evaluation
rockfish.actions.EvaluateLogisticRegression
Evaluate the classification performance using Logistic Regression.
Example:
Consider the fall detection dataset with labels for train and test sets.
Sex | Body Temperature | Heart Rate | Respiratory Rate | SBP | DBP | split |
---|---|---|---|---|---|---|
M | 97 | 80 | 15 | 140 | 90 | train |
F | 96 | 78 | 14 | 145 | 95 | train |
M | 98 | 81 | 13 | 143 | 93 | test |
... | ... | ... | ... | ... | ... | ... |
The configuration for the action includes the numerical features, the binary-valued target, and the positive label.
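import rockfish.actions as ra
config = {
    # numerical features used by the classifier
    "features": [
        "Body Temperature",
        "Heart Rate",
        "Respiratory Rate",
        "SBP",
        "DBP",
    ],
    # binary-valued target and its positive label
    "target": "Sex",
    "pos_label": "F",
}
evaluate_logistic_regression = ra.EvaluateLogisticRegression(config)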
The output of the action is a table with a single AUC value.
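For readers unfamiliar with the metric, AUC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The following plain-Python sketch computes it by comparing all positive/negative pairs. It is independent of how the action computes the value, and the function name `auc_sketch` and the example scores are made up for illustration.

```python
def auc_sketch(labels, scores, pos_label):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for label, s in zip(labels, scores) if label == pos_label]
    neg = [s for label, s in zip(labels, scores) if label != pos_label]
    # A tie between a positive and a negative score counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = ["F", "M", "F", "M"]
scores = [0.9, 0.2, 0.6, 0.7]  # hypothetical predicted probabilities of "F"
auc = auc_sketch(labels, scores, pos_label="F")
```

In this example three of the four positive/negative pairs are ordered correctly, so the AUC is 0.75; a value of 0.5 corresponds to random ranking and 1.0 to perfect separation.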
rockfish.actions.txtr.LogisticRegressionConfig
Configuration for the EvaluateLogisticRegression action.
See details on some of the arguments in sklearn.linear_model.LogisticRegression v1.6.1.
Attributes:
Name | Type | Description |
---|---|---|
features | list[str] | Numerical features to use in the model. |
target | str | The classification target. Must have two unique values. |
pos_label | Optional[str] | The positive label. If None and the target value set is {0, 1} or {-1, 1}, then the positive label is 1. |
table_split_col_name | str | The name of the column that contains the split label (train/test). |
penalty | Optional[Literal['l1', 'l2', 'elasticnet']] | Specify the norm of the penalty. |
dual | bool | Dual (constrained) or primal (regularized) formulation. |
tol | float | Tolerance for stopping criteria. |
C | float | Inverse of regularization strength; must be a positive float. |
fit_intercept | bool | Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function. |
intercept_scaling | float | Useful only when the solver |
class_weight | ClassWeight | Weights associated with classes in the form |
random_state | Optional[int] | Used when |
solver | str | Algorithm to use in the optimization problem. |
max_iter | int | Maximum number of iterations taken for the solvers to converge. |
rockfish.actions.EvaluateRandomForest
Evaluate the classification performance using Random Forest.
See the example in EvaluateLogisticRegression for usage.
rockfish.actions.txtr.RandomForestConfig
Configuration for the EvaluateRandomForest action.
See details on some of the arguments in sklearn.ensemble.RandomForestClassifier v1.6.1.
Attributes:
Name | Type | Description |
---|---|---|
features | list[str] | Numerical features to use in the model. |
target | str | The classification target. Must have two unique values. |
pos_label | Optional[str] | The positive label. If None and the target value set is {0, 1} or {-1, 1}, then the positive label is 1. |
table_split_col_name | str | The name of the column that contains the split label (train/test). |
n_estimators | int | The number of trees in the forest. |
criterion | Literal['gini', 'entropy', 'log_loss'] | The function to measure the quality of a split. |
max_depth | Optional[int] | The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than |
min_samples_split | int | The minimum number of samples required to split an internal node. |
min_samples_leaf | int | The minimum number of samples required to be at a leaf node. |
min_weight_fraction_leaf | float | The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node. |
max_features | Union[str, int, float, None] | The number of features to consider when looking for the best split. |
max_leaf_nodes | Optional[int] | Grow trees with |
min_impurity_decrease | float | A node will be split if this split induces a decrease of the impurity greater than or equal to this value. |
bootstrap | bool | Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree. |
oob_score | bool | Whether to use out-of-bag samples to estimate the generalization score. |
n_jobs | Optional[int] | The number of jobs to run in parallel. |
random_state | Optional[int] | Controls both the randomness of the bootstrapping of the samples used when building trees (if |
class_weight | ClassWeight | Weights associated with classes in the form |
ccp_alpha | float | Complexity parameter used for Minimal Cost-Complexity Pruning. |
max_samples | Optional[float] | If bootstrap is True, the number of samples to draw from X to train each base estimator. |