Skore: getting started
This getting started guide illustrates how to use skore and why:
- Get assistance when developing your ML/DS projects to avoid common pitfalls and follow recommended practices:
  - skore.EstimatorReport: get an insightful report on your estimator
  - skore.CrossValidationReport: get an insightful report on your cross-validation results
  - skore.ComparisonReport: benchmark your skore estimator reports
  - skore.train_test_split(): get diagnostics when splitting your data
- Track your ML/DS results using skore’s Project (for storage).
Machine learning evaluation and diagnostics
Skore implements new tools and wraps some key scikit-learn classes and functions to automatically provide insights and diagnostics, as a way to facilitate good practices and avoid common pitfalls.
Model evaluation with skore
To assist its users when programming, skore implements a skore.EstimatorReport class.
Let us load some synthetic data and get the estimator report for a LogisticRegression:
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from skore import EstimatorReport
X, y = make_classification(n_classes=2, n_samples=100_000, n_informative=4)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
log_reg = LogisticRegression(random_state=0)
log_reg_report = EstimatorReport(
log_reg, X_train=X_train, X_test=X_test, y_train=y_train, y_test=y_test
)
Now, we can display the help tree to see all the insights that are available to us (skore detected that we are doing binary classification):
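The call producing this tree was dropped from the text; presumably it is the report's help() method:
log_reg_report.help()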
╭────────────────── Tools to diagnose estimator LogisticRegression ───────────────────╮
│ EstimatorReport │
│ ├── .metrics │
│ │ ├── .accuracy(...) (↗︎) - Compute the accuracy score. │
│ │ ├── .brier_score(...) (↘︎) - Compute the Brier score. │
│ │ ├── .log_loss(...) (↘︎) - Compute the log loss. │
│ │ ├── .precision(...) (↗︎) - Compute the precision score. │
│ │ ├── .precision_recall(...) - Plot the precision-recall curve. │
│ │ ├── .recall(...) (↗︎) - Compute the recall score. │
│ │ ├── .roc(...) - Plot the ROC curve. │
│ │ ├── .roc_auc(...) (↗︎) - Compute the ROC AUC score. │
│ │ ├── .custom_metric(...) - Compute a custom metric. │
│ │ └── .report_metrics(...) - Report a set of metrics for our estimator. │
│ ├── .cache_predictions(...) - Cache estimator's predictions. │
│ ├── .clear_cache(...) - Clear the cache. │
│ └── Attributes │
│ ├── .X_test │
│ ├── .X_train │
│ ├── .y_test │
│ ├── .y_train │
│ ├── .estimator_ │
│ └── .estimator_name_ │
│ │
│ │
│ Legend: │
│ (↗︎) higher is better (↘︎) lower is better │
╰─────────────────────────────────────────────────────────────────────────────────────╯
We can get the report metrics that were computed for us:
df_log_reg_report_metrics = log_reg_report.metrics.report_metrics()
df_log_reg_report_metrics
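The individual metrics listed in the help tree can also be computed on their own; for example, a minimal sketch using one of the accessors shown above:
log_reg_report.metrics.accuracy()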
We can also plot the ROC curve that was generated for us:
import matplotlib.pyplot as plt
roc_plot = log_reg_report.metrics.roc()
roc_plot.plot()
plt.tight_layout()

See also
For more information about the motivation and usage of skore.EstimatorReport, see EstimatorReport: Get insights from any scikit-learn estimator.
Cross-validation with skore
Skore has also (re-)implemented a skore.CrossValidationReport class that contains one skore.EstimatorReport per fold.
from skore import CrossValidationReport
cv_report = CrossValidationReport(log_reg, X, y, cv_splitter=5)
We display the cross-validation report's help tree:
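As before, the tree below is presumably produced by the report's help() method:
cv_report.help()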
╭─────────────────── Tools to diagnose estimator LogisticRegression ───────────────────╮
│ CrossValidationReport │
│ ├── .metrics │
│ │ ├── .accuracy(...) (↗︎) - Compute the accuracy score. │
│ │ ├── .brier_score(...) (↘︎) - Compute the Brier score. │
│ │ ├── .log_loss(...) (↘︎) - Compute the log loss. │
│ │ ├── .precision(...) (↗︎) - Compute the precision score. │
│ │ ├── .precision_recall(...) - Plot the precision-recall curve. │
│ │ ├── .recall(...) (↗︎) - Compute the recall score. │
│ │ ├── .roc(...) - Plot the ROC curve. │
│ │ ├── .roc_auc(...) (↗︎) - Compute the ROC AUC score. │
│ │ ├── .custom_metric(...) - Compute a custom metric. │
│ │ └── .report_metrics(...) - Report a set of metrics for our estimator. │
│ ├── .cache_predictions(...) - Cache the predictions for sub-estimators │
│ │ reports. │
│ ├── .clear_cache(...) - Clear the cache. │
│ └── Attributes │
│ ├── .X │
│ ├── .y │
│ ├── .estimator_ │
│ ├── .estimator_name_ │
│ ├── .estimator_reports_ │
│ └── .n_jobs │
│ │
│ │
│ Legend: │
│ (↗︎) higher is better (↘︎) lower is better │
╰──────────────────────────────────────────────────────────────────────────────────────╯
We display the metrics for each fold:
df_cv_report_metrics = cv_report.metrics.report_metrics()
df_cv_report_metrics
We display the ROC curves for each fold:
roc_plot_cv = cv_report.metrics.roc()
roc_plot_cv.plot()
plt.tight_layout()

We can retrieve the estimator report of a specific fold to investigate further, for example the first fold:
log_reg_report_fold = cv_report.estimator_reports_[0]
df_log_reg_report_fold_metrics = log_reg_report_fold.metrics.report_metrics()
df_log_reg_report_fold_metrics
See also
For more information about the motivation and usage of skore.CrossValidationReport, see Simplified experiment reporting.
Comparing estimator reports
skore.ComparisonReport enables users to compare several estimator reports (corresponding to several estimators) on the same test set, as in a benchmark of estimators.
Apart from the previous log_reg_report, let us define another estimator report:
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(max_depth=2, random_state=0)
rf_report = EstimatorReport(
rf, X_train=X_train, X_test=X_test, y_train=y_train, y_test=y_test
)
Now, let us compare these two estimator reports, which were applied to the exact same test set:
from skore import ComparisonReport
comparison_report = ComparisonReport(reports=[log_reg_report, rf_report])
As for the EstimatorReport and the CrossValidationReport, we have a helper:
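Again, the tree below is presumably produced by the report's help() method:
comparison_report.help()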
╭──────────────────────────── Tools to compare estimators ─────────────────────────────╮
│ ComparisonReport │
│ ├── .metrics │
│ │ ├── .accuracy(...) (↗︎) - Compute the accuracy score. │
│ │ ├── .brier_score(...) (↘︎) - Compute the Brier score. │
│ │ ├── .log_loss(...) (↘︎) - Compute the log loss. │
│ │ ├── .precision(...) (↗︎) - Compute the precision score. │
│ │ ├── .precision_recall(...) - Plot the precision-recall curve. │
│ │ ├── .recall(...) (↗︎) - Compute the recall score. │
│ │ ├── .roc(...) - Plot the ROC curve. │
│ │ ├── .roc_auc(...) (↗︎) - Compute the ROC AUC score. │
│ │ ├── .custom_metric(...) - Compute a custom metric. │
│ │ └── .report_metrics(...) - Report a set of metrics for the estimators. │
│ ├── .cache_predictions(...) - Cache the predictions for sub-estimators │
│ │ reports. │
│ ├── .clear_cache(...) - Clear the cache. │
│ └── Attributes │
│ ├── .estimator_reports_ │
│ ├── .n_jobs │
│ └── .report_names_ │
│ │
│ │
│ Legend: │
│ (↗︎) higher is better (↘︎) lower is better │
╰──────────────────────────────────────────────────────────────────────────────────────╯
Let us display the result of our benchmark:
benchmark_metrics = comparison_report.metrics.report_metrics()
benchmark_metrics
We can also display the ROC curves for the two estimator reports we are comparing, superimposed on the same figure:
comparison_report.metrics.roc().plot()
plt.tight_layout()

Train-test split with skore
Skore has implemented a skore.train_test_split() function that wraps scikit-learn's sklearn.model_selection.train_test_split().
Let us load a dataset containing some time series data:
import pandas as pd
from skrub.datasets import fetch_employee_salaries
dataset = fetch_employee_salaries()
X, y = dataset.X, dataset.y
X["date_first_hired"] = pd.to_datetime(X["date_first_hired"])
X.head(2)
We can observe that there is a date_first_hired column, which is time-based.
Now, let us apply skore.train_test_split() on this data:
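The actual call was dropped from the text; a minimal sketch, assuming skore.train_test_split() accepts the same keyword arguments as scikit-learn's function:
import skore
# Perform the split; skore additionally inspects the data and emits diagnostics.
X_train, X_test, y_train, y_test = skore.train_test_split(X=X, y=y, random_state=0)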
╭─────────────────────────────── TimeBasedColumnWarning ───────────────────────────────╮
│ We detected some time-based columns (column "date_first_hired") in your data. We │
│ recommend using scikit-learn's TimeSeriesSplit instead of train_test_split. │
│ Otherwise you might train on future data to predict the past, or get inflated model │
│ performance evaluation because natural drift will not be taken into account. │
╰──────────────────────────────────────────────────────────────────────────────────────╯
We get a TimeBasedColumnWarning advising us to use sklearn.model_selection.TimeSeriesSplit instead!
Indeed, we should not shuffle time-ordered data!
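For illustration only (not part of the original example), here is a minimal sketch of how the recommended splitter could be applied to this data, assuming we sort by the time-based column and use 5 splits:
from sklearn.model_selection import TimeSeriesSplit
# Order rows chronologically so that every training fold precedes its test fold.
order = X["date_first_hired"].sort_values().index
X_sorted, y_sorted = X.loc[order], y.loc[order]
ts_cv = TimeSeriesSplit(n_splits=5)
for train_idx, test_idx in ts_cv.split(X_sorted):
    X_tr, X_te = X_sorted.iloc[train_idx], X_sorted.iloc[test_idx]
    y_tr, y_te = y_sorted.iloc[train_idx], y_sorted.iloc[test_idx]
Each fold then trains on earlier hires and evaluates on later ones, avoiding leakage from the future.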
See also
More methodological advice is available.
For more information about the motivation and usage of skore.train_test_split(), see train_test_split: get diagnostics when splitting your data.
Tracking: skore project
A key feature of skore is its Project, which allows storing items of many types.
Setup: creating and loading a skore project
Let’s start by creating a skore project directory named my_project.skore in our current directory:
import skore
my_project = skore.Project("my_project")
Skore project: storing and retrieving some items
Now that the project exists, we can store some useful items in it (in the same directory) using put(), with a “universal” key-value convention:
my_project.put("my_int", 3)
my_project.put("df_cv_report_metrics", df_cv_report_metrics)
my_project.put("roc_plot", roc_plot)
Note
With the skore put(), there is no need to remember the API for each type of object: df.to_csv(...), plt.savefig(...), np.save(...), etc.
We can retrieve the value of an item:
my_project.get("my_int")
3
my_project.get("df_cv_report_metrics")
See also
For more information about the functionalities and the different types of items that we can store in a skore Project, see Working with projects.
Tracking the history of items
Suppose we store several values for the same item called my_key_metric:
my_project.put("my_key_metric", 4)
my_project.put("my_key_metric", 9)
my_project.put("my_key_metric", 16)
Skore does not overwrite items with the same name (key): instead, it stores their history so that nothing is lost:
history = my_project.get("my_key_metric", version="all")
history
[4, 9, 16]
These tracking functionalities are very useful to:
- never lose key machine learning metrics,
- observe their evolution over time and across runs.
See also
For more functionalities about the tracking of items using their history, see Tracking items.
Stay tuned!
These are only the initial features: skore is a work in progress and aims to be an end-to-end library for data scientists.
Feedback is welcome: please feel free to join our Discord or create an issue.