.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/skore_project/plot_working_with_projects.py"
.. LINE NUMBERS ARE GIVEN BELOW.
.. only:: html
.. note::
:class: sphx-glr-download-link-note
:ref:`Go to the end <sphx_glr_download_auto_examples_skore_project_plot_working_with_projects.py>`
to download the full example code.
.. rst-class:: sphx-glr-example-title
.. _sphx_glr_auto_examples_skore_project_plot_working_with_projects.py:
.. _example_working_with_projects:
=====================
Working with projects
=====================
This example provides an overview of the functionalities and the different types
of items that we can store in a skore :class:`~skore.Project`.
.. GENERATED FROM PYTHON SOURCE LINES 13-15
Creating and loading the skore project
======================================
.. GENERATED FROM PYTHON SOURCE LINES 17-18
We create and load the skore project from the current directory:
.. GENERATED FROM PYTHON SOURCE LINES 18-22
.. code-block:: Python
import skore
my_project = skore.Project("my_project")
.. GENERATED FROM PYTHON SOURCE LINES 32-43
A single, simple API handles all object types:
.. code-block:: python
my_project.put("my_key", "my_value")
There is no need to remember ``plt.savefig(...)``, ``df.to_csv(...)``,
``np.save(...)``, etc. for each type of object.
In the following, we will list all the different types of objects that we can
:func:`~skore.Project.put` inside a skore :class:`~skore.Project`.
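To make the idea of a single entry point concrete, here is a toy in-memory stand-in. It is only a sketch: ``MiniStore`` is a hypothetical class written for this illustration, it is not skore's implementation, and it assumes that every stored object can be pickled.

.. code-block:: python

    import pickle


    class MiniStore:
        """Toy in-memory stand-in for a project; NOT skore's implementation."""

        def __init__(self):
            self._blobs = {}  # key -> pickled bytes

        def put(self, key, value):
            # One entry point for every picklable object, whatever its type.
            self._blobs[key] = pickle.dumps(value)

        def get(self, key):
            # Deserialize on the way out, so the caller gets an equal object back.
            return pickle.loads(self._blobs[key])


    store = MiniStore()
    items = {
        "my_int": 3,
        "my_string": "Hello world!",
        "my_list": [1, 2, 3, 4],
        "my_dict": {"company": "probabl", "year": 2023},
    }
    for key, value in items.items():
        store.put(key, value)

    # Every object comes back equal to what was stored.
    round_trip_ok = all(store.get(key) == value for key, value in items.items())
    print(round_trip_ok)

The caller never picks a file format or a save function; the same two methods cover every type in ``items``.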
.. GENERATED FROM PYTHON SOURCE LINES 46-51
Storing integers
================
Now, let us store our first object using :func:`~skore.Project.put`, for example an
integer:
.. GENERATED FROM PYTHON SOURCE LINES 53-55
.. code-block:: Python
my_project.put("my_int", 3)
.. GENERATED FROM PYTHON SOURCE LINES 56-59
Here, the name of the object is ``my_int`` and the integer value is 3.
We can read it from the project by using :func:`~skore.Project.get`:
.. GENERATED FROM PYTHON SOURCE LINES 61-63
.. code-block:: Python
my_project.get("my_int")
.. rst-class:: sphx-glr-script-out
.. code-block:: none
3
.. GENERATED FROM PYTHON SOURCE LINES 64-65
More generally, we follow the principle of "what you put is what you get".
.. GENERATED FROM PYTHON SOURCE LINES 67-69
Like in a traditional Python dictionary, the ``put`` method will *overwrite*
past data if we use a key that already exists:
.. GENERATED FROM PYTHON SOURCE LINES 71-73
.. code-block:: Python
my_project.put("my_int", 30_000)
.. GENERATED FROM PYTHON SOURCE LINES 74-75
We can check the updated value:
.. GENERATED FROM PYTHON SOURCE LINES 77-79
.. code-block:: Python
my_project.get("my_int")
.. rst-class:: sphx-glr-script-out
.. code-block:: none
30000
.. GENERATED FROM PYTHON SOURCE LINES 80-85
.. seealso::
Strictly speaking, skore does not *overwrite* an item: it stores the history of items.
For more information about the tracking of items using their history,
see :ref:`example_tracking_items`.
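As a rough mental model of "keep the history, return the latest", consider the sketch below. ``HistoryStore`` and its ``history`` method are hypothetical names invented for this illustration; they are not part of skore, and skore's actual storage may work differently.

.. code-block:: python

    class HistoryStore:
        """Hypothetical store that appends versions instead of replacing them."""

        def __init__(self):
            self._versions = {}  # key -> list of values, oldest first

        def put(self, key, value):
            # Append rather than replace, so earlier values stay available.
            self._versions.setdefault(key, []).append(value)

        def get(self, key):
            # The latest value wins, which looks like an overwrite to the caller.
            return self._versions[key][-1]

        def history(self, key):
            # Hypothetical helper; return a copy so callers cannot mutate the log.
            return list(self._versions[key])


    store = HistoryStore()
    store.put("my_int", 3)
    store.put("my_int", 30_000)

    latest = store.get("my_int")
    versions = store.history("my_int")
    print(latest, versions)

Here ``latest`` is the most recent value, while ``versions`` still contains both values in insertion order.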
.. GENERATED FROM PYTHON SOURCE LINES 87-88
By using the :func:`~skore.Project.delete` method, we can also delete an object:
.. GENERATED FROM PYTHON SOURCE LINES 90-92
.. code-block:: Python
my_project.put("my_int_2", 10)
.. GENERATED FROM PYTHON SOURCE LINES 93-95
.. code-block:: Python
my_project.delete("my_int_2")
.. GENERATED FROM PYTHON SOURCE LINES 96-97
We can display all the keys in our project:
.. GENERATED FROM PYTHON SOURCE LINES 99-101
.. code-block:: Python
my_project.keys()
.. rst-class:: sphx-glr-script-out
.. code-block:: none
['my_int']
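The dictionary analogy used above can be spelled out with a plain ``dict``. This sketch uses only built-in Python and does not call skore; the comments map each statement to the corresponding project call shown earlier.

.. code-block:: python

    store = {}
    store["my_int"] = 3        # like put("my_int", 3)
    store["my_int"] = 30_000   # same key: the previous value is replaced
    store["my_int_2"] = 10     # like put("my_int_2", 10)
    del store["my_int_2"]      # like delete("my_int_2")

    remaining_keys = list(store)
    print(remaining_keys)      # ['my_int']
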
.. GENERATED FROM PYTHON SOURCE LINES 102-104
Storing strings and texts
=========================
.. GENERATED FROM PYTHON SOURCE LINES 106-107
We just stored an integer; now let us store some text using strings!
.. GENERATED FROM PYTHON SOURCE LINES 109-111
.. code-block:: Python
my_project.put("my_string", "Hello world!")
.. GENERATED FROM PYTHON SOURCE LINES 112-114
.. code-block:: Python
my_project.get("my_string")
.. rst-class:: sphx-glr-script-out
.. code-block:: none
'Hello world!'
.. GENERATED FROM PYTHON SOURCE LINES 115-118
:func:`~skore.Project.get` infers the type of the inserted object by default. For
example, strings are assumed to be in Markdown format. Hence, we can customize the
display of our text:
.. GENERATED FROM PYTHON SOURCE LINES 120-133
.. code-block:: Python
my_project.put(
"my_string_2",
(
"""Hello world!, **bold**, *italic*, `code`
```python
def my_func(x):
    return x+2
```
"""
),
)
.. GENERATED FROM PYTHON SOURCE LINES 134-136
We can also tell skore explicitly how to display an object, for
example as HTML:
.. GENERATED FROM PYTHON SOURCE LINES 138-145
.. code-block:: Python
my_project.put(
"my_string_3",
"<p><h1>Title</h1> <b>bold</b>, <i>italic</i>, etc.</p>",
display_as="HTML",
)
.. GENERATED FROM PYTHON SOURCE LINES 146-147
Note that ``display_as`` only affects how the item is rendered in the UI; it does not change what :func:`~skore.Project.get` returns in this notebook:
.. GENERATED FROM PYTHON SOURCE LINES 149-151
.. code-block:: Python
my_project.get("my_string_3")
.. rst-class:: sphx-glr-script-out
.. code-block:: none
'<p><h1>Title</h1> <b>bold</b>, <i>italic</i>, etc.</p>'
.. GENERATED FROM PYTHON SOURCE LINES 152-153
We can also conveniently use a Python f-string:
.. GENERATED FROM PYTHON SOURCE LINES 155-159
.. code-block:: Python
x = 2
y = [1, 2, 3, 4]
my_project.put("my_string_4", f"The value of `x` is {x} and the value of `y` is {y}.")
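An f-string is evaluated once, at the moment it is written, so what gets stored is an ordinary string. The short sketch below uses plain Python only (no skore call) to show the resulting text and that rebinding ``x`` later does not alter it.

.. code-block:: python

    x = 2
    y = [1, 2, 3, 4]
    text = f"The value of `x` is {x} and the value of `y` is {y}."

    # Rebinding `x` afterwards does not change the string that was already built.
    x = 5
    print(text)  # The value of `x` is 2 and the value of `y` is [1, 2, 3, 4].

To record a new value of ``x``, build the string again and store it again.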
.. GENERATED FROM PYTHON SOURCE LINES 160-162
Storing many kinds of data
==========================
.. GENERATED FROM PYTHON SOURCE LINES 164-165
Python list:
.. GENERATED FROM PYTHON SOURCE LINES 167-171
.. code-block:: Python
my_list = [1, 2, 3, 4]
my_project.put("my_list", my_list)
my_list
.. rst-class:: sphx-glr-script-out
.. code-block:: none
[1, 2, 3, 4]
.. GENERATED FROM PYTHON SOURCE LINES 172-173
Python dictionary:
.. GENERATED FROM PYTHON SOURCE LINES 175-182
.. code-block:: Python
my_dict = {
"company": "probabl",
"year": 2023,
}
my_project.put("my_dict", my_dict)
my_dict
.. rst-class:: sphx-glr-script-out
.. code-block:: none
{'company': 'probabl', 'year': 2023}
.. GENERATED FROM PYTHON SOURCE LINES 183-184
NumPy array:
.. GENERATED FROM PYTHON SOURCE LINES 186-192
.. code-block:: Python
import numpy as np
my_arr = np.random.randn(3, 3)
my_project.put("my_arr", my_arr)
my_arr
.. rst-class:: sphx-glr-script-out
.. code-block:: none
array([[-0.86330973, 1.35312969, -0.19322573],
[ 0.48155874, 0.5343325 , 1.42763297],
[-0.39758877, 1.33064685, 0.32829924]])
.. GENERATED FROM PYTHON SOURCE LINES 193-194
Pandas data frame:
.. GENERATED FROM PYTHON SOURCE LINES 196-202
.. code-block:: Python
import pandas as pd
my_df_pandas = pd.DataFrame(np.random.randn(10, 5))
my_project.put("my_df_pandas", my_df_pandas)
my_df_pandas.head()
.. rst-class:: sphx-glr-script-out

.. code-block:: none

              0         1         2         3         4
    0 -0.163540  0.082665  0.041304 -0.437077 -1.739408
    1  0.524022  1.450858  1.676659  0.738416  0.253720
    2 -1.349323 -0.302111 -0.318985 -0.222656 -1.673151
    3 -0.204321 -0.924742 -1.236936  1.696530 -0.768724
    4 -0.951621 -0.357377 -0.856962  0.066722  1.118867

.. GENERATED FROM PYTHON SOURCE LINES 203-204
Polars data frame:
.. GENERATED FROM PYTHON SOURCE LINES 206-212
.. code-block:: Python
import polars as pl
my_df_polars = pl.DataFrame(np.random.randn(10, 5))
my_project.put("my_df_polars", my_df_polars)
my_df_polars.head()
.. GENERATED FROM PYTHON SOURCE LINES 226-231
Storing data visualizations
===========================
Note that the dashboard supports interactive plots, for example Altair and
Plotly charts.
.. GENERATED FROM PYTHON SOURCE LINES 233-234
Matplotlib figure:
.. GENERATED FROM PYTHON SOURCE LINES 236-252
.. code-block:: Python
import matplotlib.pyplot as plt
x = np.linspace(0, 2, 100)
fig, ax = plt.subplots(layout="constrained")
ax.plot(x, x, label="linear")
ax.plot(x, x**2, label="quadratic")
ax.plot(x, x**3, label="cubic")
ax.set_xlabel("x label")
ax.set_ylabel("y label")
ax.set_title("Simple Plot")
ax.legend()
plt.show()
my_project.put("my_figure", fig)
.. image-sg:: /auto_examples/skore_project/images/sphx_glr_plot_working_with_projects_001.png
:alt: Simple Plot
:srcset: /auto_examples/skore_project/images/sphx_glr_plot_working_with_projects_001.png
:class: sphx-glr-single-img
.. GENERATED FROM PYTHON SOURCE LINES 253-254
Altair chart:
.. GENERATED FROM PYTHON SOURCE LINES 257-276
.. code-block:: Python
import altair as alt
alt.renderers.enable("default")
num_points = 100
df_plot = pd.DataFrame(
{"x": np.random.randn(num_points), "y": np.random.randn(num_points)}
)
my_altair_chart = (
alt.Chart(df_plot)
.mark_circle()
.encode(x="x", y="y", tooltip=["x", "y"])
.interactive()
.properties(title="My title")
)
my_project.put("my_altair_chart", my_altair_chart)
.. GENERATED FROM PYTHON SOURCE LINES 277-287
.. note::
For Plotly figures, some users have reported the following error when running Plotly
cells: ``ValueError: Mime type rendering requires nbformat>=4.2.0 but it is not
installed``. This is a known Plotly issue; to solve it, we recommend
installing ``nbformat`` in your environment, e.g. with:
.. code-block:: console
pip install --upgrade nbformat
.. GENERATED FROM PYTHON SOURCE LINES 289-290
Plotly figure:
.. GENERATED FROM PYTHON SOURCE LINES 292-303
.. code-block:: Python
import plotly.express as px
df = px.data.iris()
fig = px.scatter(
df, x=df.sepal_length, y=df.sepal_width, color=df.species, size=df.petal_length
)
my_project.put("my_plotly_fig", fig)
fig
.. GENERATED FROM PYTHON SOURCE LINES 328-332
Storing scikit-learn models and pipelines
=========================================
First of all, we can store a scikit-learn model:
.. GENERATED FROM PYTHON SOURCE LINES 334-340
.. code-block:: Python
from sklearn.linear_model import Lasso
my_model = Lasso(alpha=2)
my_project.put("my_model", my_model)
my_model
.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Lasso(alpha=2)

.. GENERATED FROM PYTHON SOURCE LINES 341-342
We can also store scikit-learn pipelines:
.. GENERATED FROM PYTHON SOURCE LINES 344-353
.. code-block:: Python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
my_pipeline = Pipeline(
[("standard_scaler", StandardScaler()), ("lasso", Lasso(alpha=2))]
)
my_project.put("my_pipeline", my_pipeline)
my_pipeline
.. GENERATED FROM PYTHON SOURCE LINES 354-355
Moreover, we can store fitted scikit-learn pipelines:
.. GENERATED FROM PYTHON SOURCE LINES 357-367
.. code-block:: Python
from sklearn.datasets import load_diabetes
diabetes = load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]
my_pipeline.fit(X, y)
my_project.put("my_fitted_pipeline", my_pipeline)
my_pipeline