Skore: getting started#

This getting started guide illustrates how to use skore and why:

  1. Get assistance when developing your ML/DS projects to avoid common pitfalls and follow recommended practices.

  2. Track your ML/DS results using skore’s Project (for storage).

Machine learning evaluation and diagnostics#

Skore implements new tools or wraps key scikit-learn classes and functions to automatically provide insights and diagnostics when using them, as a way to facilitate good practices and avoid common pitfalls.

Model evaluation with skore#

To assist its users, skore provides a skore.EstimatorReport class.

Let us load some synthetic data and get the estimator report for a LogisticRegression:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

from skore import EstimatorReport

X, y = make_classification(n_classes=2, n_samples=100_000, n_informative=4)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

log_reg = LogisticRegression(random_state=0)

log_reg_report = EstimatorReport(
    log_reg, X_train=X_train, X_test=X_test, y_train=y_train, y_test=y_test
)

Now, we can display the help tree to see all the insights that are available to us (skore detected that we are doing binary classification):
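For example, assuming the report exposes a help() method (whose output is the tree shown below):

log_reg_report.help()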

╭────────────────── Tools to diagnose estimator LogisticRegression ───────────────────╮
│ EstimatorReport                                                                     │
│ ├── .metrics                                                                        │
│ │   ├── .accuracy(...)         (↗︎)     - Compute the accuracy score.                │
│ │   ├── .brier_score(...)      (↘︎)     - Compute the Brier score.                   │
│ │   ├── .log_loss(...)         (↘︎)     - Compute the log loss.                      │
│ │   ├── .precision(...)        (↗︎)     - Compute the precision score.               │
│ │   ├── .precision_recall(...)         - Plot the precision-recall curve.           │
│ │   ├── .recall(...)           (↗︎)     - Compute the recall score.                  │
│ │   ├── .roc(...)                      - Plot the ROC curve.                        │
│ │   ├── .roc_auc(...)          (↗︎)     - Compute the ROC AUC score.                 │
│ │   ├── .custom_metric(...)            - Compute a custom metric.                   │
│ │   └── .report_metrics(...)           - Report a set of metrics for our estimator. │
│ ├── .cache_predictions(...)            - Cache estimator's predictions.             │
│ ├── .clear_cache(...)                  - Clear the cache.                           │
│ └── Attributes                                                                      │
│     ├── .X_test                                                                     │
│     ├── .X_train                                                                    │
│     ├── .y_test                                                                     │
│     ├── .y_train                                                                    │
│     ├── .estimator_                                                                 │
│     └── .estimator_name_                                                            │
│                                                                                     │
│                                                                                     │
│ Legend:                                                                             │
│ (↗︎) higher is better (↘︎) lower is better                                            │
╰─────────────────────────────────────────────────────────────────────────────────────╯

We can get the report metrics that were computed for us:
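A sketch of the call, using the report_metrics() accessor listed in the help tree above (the exact call and its defaults are an assumption):

log_reg_report.metrics.report_metrics()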

                                LogisticRegression
Metric        Label / Average
Precision     0                           0.875310
              1                           0.871969
Recall        0                           0.872449
              1                           0.874839
ROC AUC                                   0.944256
Brier score                               0.092121


We can also plot the ROC curve that was generated for us:

import matplotlib.pyplot as plt

roc_plot = log_reg_report.metrics.roc()
roc_plot.plot()
plt.tight_layout()
[Figure: ROC curve of the LogisticRegression estimator report]

See also

For more information about the motivation and usage of skore.EstimatorReport, see EstimatorReport: Get insights from any scikit-learn estimator.

Cross-validation with skore#

skore has also (re-)implemented a skore.CrossValidationReport class, which contains one skore.EstimatorReport per fold.

from skore import CrossValidationReport

cv_report = CrossValidationReport(log_reg, X, y, cv_splitter=5)

We display the help tree of the cross-validation report:
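As above, assuming the report exposes a help() method:

cv_report.help()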

╭─────────────────── Tools to diagnose estimator LogisticRegression ───────────────────╮
│ CrossValidationReport                                                                │
│ ├── .metrics                                                                         │
│ │   ├── .accuracy(...)         (↗︎)     - Compute the accuracy score.                 │
│ │   ├── .brier_score(...)      (↘︎)     - Compute the Brier score.                    │
│ │   ├── .log_loss(...)         (↘︎)     - Compute the log loss.                       │
│ │   ├── .precision(...)        (↗︎)     - Compute the precision score.                │
│ │   ├── .precision_recall(...)         - Plot the precision-recall curve.            │
│ │   ├── .recall(...)           (↗︎)     - Compute the recall score.                   │
│ │   ├── .roc(...)                      - Plot the ROC curve.                         │
│ │   ├── .roc_auc(...)          (↗︎)     - Compute the ROC AUC score.                  │
│ │   ├── .custom_metric(...)            - Compute a custom metric.                    │
│ │   └── .report_metrics(...)           - Report a set of metrics for our estimator.  │
│ ├── .cache_predictions(...)            - Cache the predictions for sub-estimators    │
│ │   reports.                                                                         │
│ ├── .clear_cache(...)                  - Clear the cache.                            │
│ └── Attributes                                                                       │
│     ├── .X                                                                           │
│     ├── .y                                                                           │
│     ├── .estimator_                                                                  │
│     ├── .estimator_name_                                                             │
│     ├── .estimator_reports_                                                          │
│     └── .n_jobs                                                                      │
│                                                                                      │
│                                                                                      │
│ Legend:                                                                              │
│ (↗︎) higher is better (↘︎) lower is better                                             │
╰──────────────────────────────────────────────────────────────────────────────────────╯

We display the metrics for each fold:
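A sketch of the call, using the report_metrics() accessor from the helper above and the df_cv_report_metrics name that is reused later in this guide:

df_cv_report_metrics = cv_report.metrics.report_metrics()
df_cv_report_metrics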

                                                    LogisticRegression
                                Split #0  Split #1  Split #2  Split #3  Split #4
Metric        Label / Average
Precision     0                 0.872391  0.875463  0.874875  0.877140  0.875163
              1                 0.876635  0.873640  0.872531  0.867285  0.870365
Recall        0                 0.877449  0.873451  0.872251  0.865654  0.869652
              1                 0.871549  0.875650  0.875150  0.878651  0.875850
ROC AUC                         0.946881  0.945183  0.944682  0.945068  0.945238
Brier score                     0.090180  0.091689  0.092021  0.092021  0.091441


We display the ROC curves for each fold:
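A sketch of the call, assuming the cross-validation roc() display (listed in the helper above) can be plotted like the estimator one; the cv_roc_plot name is ours:

cv_roc_plot = cv_report.metrics.roc()  # hypothetical variable name
cv_roc_plot.plot()
plt.tight_layout()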

[Figure: ROC curves for each cross-validation fold]

We can retrieve the estimator report of a specific fold to investigate further, for example the first fold:
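For example, using the estimator_reports_ attribute listed in the helper above (the indexing shown here is an assumption):

cv_report.estimator_reports_[0].metrics.report_metrics()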

                                LogisticRegression
Metric        Label / Average
Precision     0                           0.872391
              1                           0.876635
Recall        0                           0.877449
              1                           0.871549
ROC AUC                                   0.946881
Brier score                               0.090180


See also

For more information about the motivation and usage of skore.CrossValidationReport, see Simplified experiment reporting.

Comparing estimator reports#

skore.ComparisonReport enables users to compare several estimator reports (corresponding to several estimators) on the same test set, as in a benchmark of estimators.

Apart from the previous log_reg_report, let us define another estimator report:

from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(max_depth=2, random_state=0)
rf_report = EstimatorReport(
    rf, X_train=X_train, X_test=X_test, y_train=y_train, y_test=y_test
)

Now, let us compare these two estimator reports, which were applied to the exact same test set:
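A sketch of the construction, assuming ComparisonReport accepts the estimator reports as a list (the reports= keyword is an assumption):

from skore import ComparisonReport

# pass the two estimator reports to compare
comparison_report = ComparisonReport(reports=[log_reg_report, rf_report])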

As with the EstimatorReport and the CrossValidationReport, a help tree is available:
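As above, assuming a help() method on the comparison report:

comparison_report.help()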

╭──────────────────────────── Tools to compare estimators ─────────────────────────────╮
│ ComparisonReport                                                                     │
│ ├── .metrics                                                                         │
│ │   ├── .accuracy(...)         (↗︎)     - Compute the accuracy score.                 │
│ │   ├── .brier_score(...)      (↘︎)     - Compute the Brier score.                    │
│ │   ├── .log_loss(...)         (↘︎)     - Compute the log loss.                       │
│ │   ├── .precision(...)        (↗︎)     - Compute the precision score.                │
│ │   ├── .precision_recall(...)         - Plot the precision-recall curve.            │
│ │   ├── .recall(...)           (↗︎)     - Compute the recall score.                   │
│ │   ├── .roc(...)                      - Plot the ROC curve.                         │
│ │   ├── .roc_auc(...)          (↗︎)     - Compute the ROC AUC score.                  │
│ │   ├── .custom_metric(...)            - Compute a custom metric.                    │
│ │   └── .report_metrics(...)           - Report a set of metrics for the estimators. │
│ ├── .cache_predictions(...)            - Cache the predictions for sub-estimators    │
│ │   reports.                                                                         │
│ ├── .clear_cache(...)                  - Clear the cache.                            │
│ └── Attributes                                                                       │
│     ├── .estimator_reports_                                                          │
│     ├── .n_jobs                                                                      │
│     └── .report_names_                                                               │
│                                                                                      │
│                                                                                      │
│ Legend:                                                                              │
│ (↗︎) higher is better (↘︎) lower is better                                             │
╰──────────────────────────────────────────────────────────────────────────────────────╯

Let us display the result of our benchmark:
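A sketch of the call, using the report_metrics() accessor listed in the helper above:

comparison_report.metrics.report_metrics()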

Estimator                     LogisticRegression  RandomForestClassifier
Metric        Label / Average
Precision     0                         0.875310                0.830577
              1                         0.871969                0.861436
Recall        0                         0.872449                0.868782
              1                         0.874839                0.821532
ROC AUC                                 0.944256                0.919813
Brier score                             0.092121                0.149600


This gives us the result of our benchmark. We can also display the ROC curves of the two estimator reports by superimposing them on the same figure:

comparison_report.metrics.roc().plot()
plt.tight_layout()
[Figure: superimposed ROC curves of the two compared estimators]

Train-test split with skore#

Skore has implemented a skore.train_test_split() function that wraps scikit-learn’s sklearn.model_selection.train_test_split().

Let us load a dataset containing some time series data:

import pandas as pd
from skrub.datasets import fetch_employee_salaries

dataset = fetch_employee_salaries()
X, y = dataset.X, dataset.y
X["date_first_hired"] = pd.to_datetime(X["date_first_hired"])
X.head(2)
  gender department       department_name                                            division assignment_category      employee_position_title date_first_hired  year_first_hired
0      F        POL  Department of Police  MSB Information Mgmt and Tech Division Records...     Fulltime-Regular  Office Services Coordinator       1986-09-22              1986
1      M        POL  Department of Police         ISB Major Crimes Division Fugitive Section     Fulltime-Regular        Master Police Officer       1988-09-12              1988


We can observe that there is a date_first_hired column, which is time-based. Now, let us apply skore.train_test_split() on this data:

import skore

X_train, X_test, y_train, y_test = skore.train_test_split(
    X, y, random_state=0, shuffle=False
)
╭─────────────────────────────── TimeBasedColumnWarning ───────────────────────────────╮
│ We detected some time-based columns (column "date_first_hired") in your data. We     │
│ recommend using scikit-learn's TimeSeriesSplit instead of train_test_split.          │
│ Otherwise you might train on future data to predict the past, or get inflated model  │
│ performance evaluation because natural drift will not be taken into account.         │
╰──────────────────────────────────────────────────────────────────────────────────────╯

We get a TimeBasedColumnWarning advising us to use sklearn.model_selection.TimeSeriesSplit instead! Indeed, we should not shuffle time-ordered data!

See also

More methodological advice is available. For more information about the motivation and usage of skore.train_test_split(), see train_test_split: get diagnostics when splitting your data.

Tracking: skore project#

A key feature of skore is its Project, which allows us to store items of many types.

Setup: creating and loading a skore project#

Let’s start by creating a skore project directory named my_project.skore in our current directory:

my_project = skore.Project("my_project")

Skore project: storing and retrieving some items#

Now that the project exists, we can store some useful items in it (in the same directory) using put(), with a “universal” key-value convention:

my_project.put("my_int", 3)
my_project.put("df_cv_report_metrics", df_cv_report_metrics)
my_project.put("roc_plot", roc_plot)

Note

With the skore put(), there is no need to remember the API for each type of object: df.to_csv(...), plt.savefig(...), np.save(...), etc.

We can retrieve the value of an item:

my_project.get("my_int")
3
my_project.get("df_cv_report_metrics")
                                                    LogisticRegression
                                Split #0  Split #1  Split #2  Split #3  Split #4
Metric        Label / Average
Precision     0                 0.872391  0.875463  0.874875  0.877140  0.875163
              1                 0.876635  0.873640  0.872531  0.867285  0.870365
Recall        0                 0.877449  0.873451  0.872251  0.865654  0.869652
              1                 0.871549  0.875650  0.875150  0.878651  0.875850
ROC AUC                         0.946881  0.945183  0.944682  0.945068  0.945238
Brier score                     0.090180  0.091689  0.092021  0.092021  0.091441


See also

For more information about the functionalities and the different types of items that we can store in a skore Project, see Working with projects.

Tracking the history of items#

Suppose we store several values for the same item called my_key_metric:

my_project.put("my_key_metric", 4)

my_project.put("my_key_metric", 9)

my_project.put("my_key_metric", 16)

Skore does not overwrite items with the same name (key): instead, it stores their history so that nothing is lost:

history = my_project.get("my_key_metric", version="all")
history
[4, 9, 16]

These tracking functionalities are very useful to:

  • never lose key machine learning metrics,

  • and observe their evolution over time and across runs.

See also

For more functionalities about the tracking of items using their history, see Tracking items.

Stay tuned!

These are only the initial features: skore is a work in progress and aims to be an end-to-end library for data scientists.

Feedback is welcome: please feel free to join our Discord or create an issue.
