rockfish.labs.vis

Functions

plot_bar(datasets: list[LocalDataset], field: str, weights: Optional[str] = None, order: Optional[list] = None, orient: str = 'vertical', nlargest: Optional[int] = 10, stat: BinStat = 'percent', **kwargs)

Plot data as a bar plot.

This plot should only be used for categorical data. For numerical data, consider using plot_kde.

Parameters:

Name Type Description Default
datasets list[LocalDataset]

List of Datasets, all with the same schema.

required
field str

A categorical field name.

required
weights Optional[str]

If set, a field name containing frequencies of each category for the specified field. Set this if you have pre-aggregated data.

None
order Optional[list]

Order of categories to display.

None
orient str

Orientation of the plot. Can be either "vertical" or "horizontal". Default is "vertical", meaning the x-axis will represent the field. If set to "horizontal", the y-axis will represent the field.

'vertical'
nlargest Optional[int]

Limit the number of categories to display. Has no effect if the Dataset is pre-aggregated and weights is provided. Default is 10. Set to None to display all categories.

10
stat BinStat

Statistic to compute for each bin. Default is "percent", which represents the percentage of each bin relative to the total counts and is useful for comparing distributions with different data sizes.

'percent'
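The interaction between stat="percent" and nlargest can be sketched in plain Python. This is only an illustration of the arithmetic, not the library's implementation: the helper name bar_percentages and the sample values are hypothetical, and it assumes percentages are taken relative to the total count of all categories before the nlargest cut is applied.

```python
from collections import Counter


def bar_percentages(values, nlargest=10):
    """Return (category, percent) pairs for the most frequent categories.

    Hypothetical helper: percentages are relative to the total count of
    all categories, not only the ones that survive the nlargest cut.
    """
    counts = Counter(values)
    total = sum(counts.values())
    # most_common(None) returns every category, mirroring nlargest=None.
    top = counts.most_common(nlargest)
    return [(category, 100.0 * count / total) for category, count in top]


# Two samples of different sizes become comparable once expressed as percent.
real = ["tcp", "tcp", "udp", "tcp", "icmp", "udp", "tcp", "tcp"]
synthetic = ["tcp", "udp", "udp", "tcp"]

print(bar_percentages(real, nlargest=2))
print(bar_percentages(synthetic, nlargest=None))
```

Because each bar is a share of its own dataset's total, an 8-row dataset and a 4-row dataset can be drawn on the same axis without the larger one dominating.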

plot_kde(datasets: list[LocalDataset], field: str, weights: Optional[str] = None, duration_unit: DurationUnit = 's', **kwargs)

Create a kernel density estimate plot.

Parameters:

Name Type Description Default
datasets list[LocalDataset]

List of Datasets, all with the same schema.

required
field str

A continuous numerical field.

required
weights Optional[str]

If set, a field name containing the weights for the specified field. Set this if you have pre-aggregated data.

None
duration_unit DurationUnit

When the specified field is a duration type, display it using these units.

's'
kwargs

Additional arguments are passed to the seaborn displot function.

{}

plot_cdf(datasets: list[LocalDataset], field: str, weights: Optional[str] = None, duration_unit: DurationUnit = 's', **kwargs)

Create a cumulative distribution function plot.

Parameters:

Name Type Description Default
datasets list[LocalDataset]

List of Datasets, all with the same schema.

required
field str

A continuous numerical field.

required
weights Optional[str]

If set, a field name containing the weights for the specified field. Set this if you have pre-aggregated data.

None
duration_unit DurationUnit

When the specified field is a duration type, display it using these units.

's'
kwargs

Additional arguments are passed to the seaborn displot function.

{}
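To make the role of weights concrete, the following is a minimal sketch of a weighted empirical CDF in plain Python. It is not how plot_cdf is implemented (that is delegated to seaborn, per the kwargs description); the function name weighted_ecdf and the sample numbers are hypothetical.

```python
def weighted_ecdf(values, weights=None):
    """Return sorted (value, cumulative_probability) pairs.

    Hypothetical helper: with weights=None every row counts once; with
    weights, each row counts as many times as its weight says.
    """
    if weights is None:
        weights = [1.0] * len(values)
    pairs = sorted(zip(values, weights))
    total = float(sum(weights))
    cdf = []
    running = 0.0
    for value, weight in pairs:
        running += weight
        cdf.append((value, running / total))
    return cdf


# Pre-aggregated data: each distinct value appears once with its frequency.
aggregated = weighted_ecdf([10, 20, 30], weights=[1, 2, 1])
# The equivalent raw data, one row per observation.
raw = weighted_ecdf([10, 20, 20, 30])

print(aggregated)
print(raw)
```

Both calls reach the same cumulative probability at each distinct value, which is why a pre-aggregated dataset with a weights field can stand in for the raw rows.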

plot_hist(datasets: list[LocalDataset], field: str, weights: Optional[str] = None, duration_unit: DurationUnit = 's', stat: BinStat = 'density', **kwargs)

Create a histogram plot.

Parameters:

Name Type Description Default
datasets list[LocalDataset]

List of Datasets, all with the same schema.

required
field str

A continuous numerical field name.

required
weights Optional[str]

If set, a field name containing the weights for the specified field. Set this if you have pre-aggregated data.

None
duration_unit DurationUnit

When the specified field is a duration type, display it using these units.

's'
kwargs

Additional arguments are passed to the seaborn displot function.

{}
stat BinStat

Statistic to compute for each bin. Default is "density", which normalizes the histogram so that the area under the histogram equals 1 and is useful for comparing distributions with different sample sizes.

'density'
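The "density" normalization described above can be checked by hand. The sketch below is illustrative only: density_histogram, the bin edges, and the sample values are hypothetical, and the actual binning is done by seaborn rather than by code like this.

```python
def density_histogram(values, bin_edges):
    """Return per-bin densities so that sum(density * bin_width) == 1.

    Hypothetical helper: bins are half-open [left, right), except the last
    bin, which also includes its right edge.
    """
    n_bins = len(bin_edges) - 1
    counts = [0] * n_bins
    for v in values:
        for i in range(n_bins):
            is_last = i == n_bins - 1
            in_bin = bin_edges[i] <= v < bin_edges[i + 1]
            if in_bin or (is_last and v == bin_edges[-1]):
                counts[i] += 1
                break
    n = sum(counts)
    return [
        counts[i] / (n * (bin_edges[i + 1] - bin_edges[i]))
        for i in range(n_bins)
    ]


edges = [0.0, 1.0, 2.0, 4.0]
small = density_histogram([0.5, 1.5, 3.0, 3.5], edges)
large = density_histogram([0.5] * 10 + [1.5] * 10 + [3.0] * 20, edges)

# Area under each histogram: density times bin width, summed over bins.
area = sum(d * (edges[i + 1] - edges[i]) for i, d in enumerate(small))
print(small, large, area)
```

The 4-row and 40-row samples produce identical densities, which is the property that makes "density" suitable for comparing datasets of different sizes.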

plot_distribution(datasets: list[LocalDataset], field: str, weights: Optional[str] = None, order: Optional[list] = None, **kwargs)

Plot the data as either a histogram or a KDE, depending on the type of data.

If you don't like which one this picks, you can call one of the lower-level functions directly, either plot_kde or plot_hist.

plot_scatter(datasets: list[LocalDataset], field_x: str, field_y: str, **kwargs)

Create a scatter plot of the data in the x and y fields of the tables.

Each table is plotted with a different color and listed in the legend by name.

Parameters:

Name Type Description Default
datasets list[LocalDataset]

Tables containing plot data. Each table must have fields with names field_x and field_y.

required
field_x str

A continuous numerical field name to plot as the x-axis.

required
field_y str

A continuous numerical field name to plot as the y-axis.

required

plot_correlation(datasets: list[LocalDataset], field_x: str, field_y: str, **kwargs)

Create a scatter plot annotated with the Pearson correlation coefficient.

Parameters:

Name Type Description Default
datasets list[LocalDataset]

Tables containing plot data. Each table must have fields with names field_x and field_y.

required
field_x str

A continuous numerical field name to plot as the x-axis.

required
field_y str

A continuous numerical field name to plot as the y-axis.

required
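For reference, the Pearson correlation coefficient itself can be computed as in the sketch below. This is a plain-Python illustration of the statistic, not the code path plot_correlation uses; the function name pearson_r and the sample data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


increasing = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
decreasing = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(increasing, decreasing)
```

A value near 1 or -1 indicates a strong linear relationship between field_x and field_y; a value near 0 indicates little linear relationship, although a non-linear one may still exist.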

plot_correlation_heatmap(datasets: list[LocalDataset], fields: list[str], cmap='Reds', **kwargs)

Plot correlation heatmap.

This plot should only be used for numerical columns.

Parameters:

Name Type Description Default
datasets list[LocalDataset]

List of Datasets all with the same schema.

required
fields list[str]

List of field names for the numerical values.

required
kwargs

Additional arguments are passed to the seaborn heatmap function.

{}

plot_association_heatmap(datasets: list[LocalDataset], fields: list, correction: bool = False, cmap='Blues', **kwargs)

Plot association heatmap. This plot should only be used for categorical columns.

Parameters:

Name Type Description Default
datasets list[LocalDataset]

List of Datasets all with the same schema.

required
fields list

List of field names for the categorical values.

required
correction bool

If True, apply bias correction to Cramér's V. Default is False.

False
kwargs

Additional arguments are passed to the seaborn heatmap function.

{}
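The effect of the correction flag can be illustrated with a small, self-contained computation of Cramér's V. This is a sketch under assumptions: the bias correction shown is the commonly used Bergsma-style adjustment, and whether plot_association_heatmap applies exactly this formula is not stated in this reference. The function name cramers_v and the sample data are hypothetical.

```python
import math
from collections import Counter


def cramers_v(xs, ys, correction=False):
    """Cramér's V between two categorical sequences of equal length."""
    n = len(xs)
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    joint = Counter(zip(xs, ys))

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for x, nx in x_counts.items():
        for y, ny in y_counts.items():
            expected = nx * ny / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    r, k = len(x_counts), len(y_counts)
    if correction:
        # Assumed Bergsma-style correction: shrink phi2 and the table size.
        phi2 = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
        r = r - (r - 1) ** 2 / (n - 1)
        k = k - (k - 1) ** 2 / (n - 1)
    denom = min(r - 1, k - 1)
    return math.sqrt(phi2 / denom) if denom > 0 else 0.0


xs = ["a", "a", "a", "b", "b", "b"]
ys = ["x", "x", "y", "y", "y", "x"]
plain = cramers_v(xs, ys)
corrected = cramers_v(xs, ys, correction=True)
print(plain, corrected)
```

On a sample this small the uncorrected statistic reports a moderate association while the corrected one shrinks it to zero, which is the kind of small-sample overestimate the correction is meant to counter.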

custom_plot(datasets: list, query: str, plot_func, *args, **kwargs)

Create a custom plot using custom datasets via the chosen plot_func.

Parameters:

Name Type Description Default
datasets list

List of LocalDatasets all with the same schema.

required
query str

An SQL query against a table named my_table; the query is applied to each of the LocalDatasets.

required
plot_func

A callable plotting function chosen from rf.labs.vis to create a plot.

required
kwargs

Additional arguments, other than datasets, are passed to the chosen plot_func according to the parameters it requires.

{}
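The shape of a query that targets my_table can be tried out locally before passing it to custom_plot. The sketch below uses Python's built-in sqlite3 purely as a stand-in: the SQL engine and dialect used by the library may differ, and the column names protocol and bytes are hypothetical.

```python
import sqlite3

# In-memory table named my_table, standing in for one LocalDataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (protocol TEXT, bytes INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("tcp", 1200), ("udp", 300), ("tcp", 800), ("icmp", 64)],
)

# The kind of string that would be supplied as the query argument:
# filter the rows first, then hand the result to the plotting function.
query = "SELECT protocol, bytes FROM my_table WHERE bytes > 100 ORDER BY bytes"
rows = conn.execute(query).fetchall()
conn.close()

print(rows)
```

Since the same query is meant to apply to every dataset in the list, it should reference only fields that exist in the shared schema.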