Jupyter Cache
Execute and cache multiple Jupyter Notebook-like files via an API and CLI.
- 🤓 Smart re-execution
Notebooks are only re-executed when code cells (or code-related metadata) have changed, not when Markdown/Raw cells change.
- 🧩 Pluggable execution modes
Select the executor for notebooks, including serial and parallel execution
- 📈 Execution reports
Timing statistics and exception tracebacks are stored for analysis
- 📖 jupytext integration
Read and execute notebooks written in multiple formats
Why use jupyter-cache?
Use jupyter-cache if you have a number of notebooks whose execution outputs you want to keep up to date, without having to re-execute them every time (particularly for long-running code, or text-based formats that do not store the outputs).
The notebooks must have deterministic execution outputs:
- You use the same environment to run them (e.g. the same installed packages).
- They run no non-deterministic code (e.g. random numbers).
- They do not depend on external resources (e.g. files or network connections) that change over time.
For example, jupyter-book uses it to allow fast document rebuilds.
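As a small illustration of the requirement about non-deterministic code, the sketch below contrasts nothing more than a seeded random draw run twice. It is not part of jupyter-cache, and `simulate` is a hypothetical stand-in for a notebook cell. With a fixed seed the two runs agree, which is what allows a cached output to stand in for a fresh execution.

```python
import random


def simulate(seed=None, n=3):
    """Return n pseudo-random integers; a fixed seed makes the result repeatable."""
    rng = random.Random(seed)  # a private generator, so global state is untouched
    return [rng.randint(0, 100) for _ in range(n)]


# With a fixed seed, two runs give identical outputs, so a cached result stays valid.
first = simulate(seed=42)
second = simulate(seed=42)
print(first == second)
```

Calling `simulate()` without a seed would typically give different values on each run, so its cached output could not be trusted to match a re-execution.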
Installation
Install jupyter-cache via pip or Conda:
pip install jupyter-cache
conda install jupyter-cache
Quick-start
Add one or more source notebook files to the “project” (a folder containing a database and a cache of executed notebooks):
$ jcache notebook add tests/notebooks/basic_unrun.ipynb tests/notebooks/basic_failing.ipynb
Cache path: ../.jupyter_cache
The cache does not yet exist, do you want to create it? [y/N]: y
Adding: ../tests/notebooks/basic_unrun.ipynb
Adding: ../tests/notebooks/basic_failing.ipynb
Success!
These files are now ready for execution:
$ jcache notebook list
  ID  URI                                  Reader    Added             Status
----  -----------------------------------  --------  ----------------  --------
   1  tests/notebooks/basic_unrun.ipynb    nbformat  2023-11-13 16:34  -
   2  tests/notebooks/basic_failing.ipynb  nbformat  2023-11-13 16:34  -
Now run the execution:
$ jcache project execute
Executing 2 notebook(s) in serial
Executing: ../tests/notebooks/basic_unrun.ipynb
0.00s - Debugger warning: It seems that frozen modules are being used, which may
0.00s - make the debugger miss breakpoints. Please pass -Xfrozen_modules=off
0.00s - to python to disable frozen modules.
0.00s - Note: Debugging will proceed. Set PYDEVD_DISABLE_FILE_VALIDATION=1 to disable this validation.
Execution Successful: ../tests/notebooks/basic_unrun.ipynb
Executing: ../tests/notebooks/basic_failing.ipynb
warning: Execution Excepted: ../tests/notebooks/basic_failing.ipynb
warning: CellExecutionError: An error occurred while executing the following cell:
warning: ------------------
warning: raise Exception('oopsie!')
warning: ------------------
warning:
warning:
warning: ---------------------------------------------------------------------------
warning: Exception Traceback (most recent call last)
warning: Cell In[1], line 1
warning: ----> 1 raise Exception('oopsie!')
warning:
warning: Exception: oopsie!
Finished! Successfully executed notebooks have been cached.
succeeded:
- ../tests/notebooks/basic_unrun.ipynb
excepted:
- ../tests/notebooks/basic_failing.ipynb
errored: []
Successfully executed files will now be associated with a record in the cache:
$ jcache notebook list
  ID  URI                                  Reader    Added             Status
----  -----------------------------------  --------  ----------------  --------
   1  tests/notebooks/basic_unrun.ipynb    nbformat  2023-11-13 16:34  ✅ [1]
   2  tests/notebooks/basic_failing.ipynb  nbformat  2023-11-13 16:34  ❌
The cache record includes execution statistics:
$ jcache cache info 1
ID: 1
Origin URI: ../tests/notebooks/basic_unrun.ipynb
Created: 2023-11-13 16:34
Accessed: 2023-11-13 16:34
Hashkey: 94c17138f782c75df59e989fffa64e3a
Data:
execution_seconds: 1.1906915819999995
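To illustrate how a statistic such as `execution_seconds` and an exception traceback could be captured together, here is a minimal sketch using only the standard library. It is not jupyter-cache's implementation: `run_with_report` is a hypothetical helper, and the report keys other than `execution_seconds` are assumptions made for this example.

```python
import time
import traceback


def run_with_report(func):
    """Call func() and return a report with timing and any traceback text."""
    start = time.perf_counter()
    try:
        func()
        tb = None
    except Exception:
        tb = traceback.format_exc()  # keep the traceback text for later analysis
    return {
        "execution_seconds": time.perf_counter() - start,
        "succeeded": tb is None,
        "traceback": tb,
    }


def failing():
    raise Exception("oopsie!")


ok = run_with_report(lambda: sum(range(1000)))
bad = run_with_report(failing)
print(ok["succeeded"], bad["succeeded"])
```

The failing call still produces a report, so the timing and traceback remain available for inspection even though nothing would be cached for it.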
Next time we execute, jupyter-cache will check which files require re-execution:
$ jcache project execute
Executing 1 notebook(s) in serial
Executing: ../tests/notebooks/basic_failing.ipynb
warning: Execution Excepted: ../tests/notebooks/basic_failing.ipynb
warning: CellExecutionError: An error occurred while executing the following cell:
warning: ------------------
warning: raise Exception('oopsie!')
warning: ------------------
warning:
warning:
warning: ---------------------------------------------------------------------------
warning: Exception Traceback (most recent call last)
warning: Cell In[1], line 1
warning: ----> 1 raise Exception('oopsie!')
warning:
warning: Exception: oopsie!
Finished! Successfully executed notebooks have been cached.
succeeded: []
excepted:
- ../tests/notebooks/basic_failing.ipynb
errored: []
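The check for which files require re-execution can be pictured roughly as follows. This is a simplified sketch under stated assumptions, not the library's actual algorithm: it hashes only the code cell sources (the real comparison also covers code-related metadata), MD5 is an arbitrary choice, and `code_hash`, `needs_execution`, and the in-memory `cached_hashes` set are hypothetical names.

```python
import hashlib
import json


def code_hash(notebook):
    """Hash the ordered sources of the code cells, ignoring Markdown/Raw cells."""
    sources = [
        "".join(cell["source"])  # nbformat stores source as a str or a list of str
        for cell in notebook["cells"]
        if cell["cell_type"] == "code"
    ]
    payload = json.dumps(sources).encode("utf-8")
    return hashlib.md5(payload).hexdigest()


def needs_execution(notebook, cached_hashes):
    """A notebook needs (re-)execution if no cache record matches its hash."""
    return code_hash(notebook) not in cached_hashes


nb = {
    "cells": [
        {"cell_type": "markdown", "source": "# Title"},
        {"cell_type": "code", "source": "print(1)"},
    ]
}
cached_hashes = {code_hash(nb)}  # pretend the notebook was executed and cached

nb["cells"][0]["source"] = "# New title"  # text-only edit
print(needs_execution(nb, cached_hashes))

nb["cells"][1]["source"] = "print(2)"  # code edit
print(needs_execution(nb, cached_hashes))
```

In this sketch a notebook that raised an exception never gets a hash in `cached_hashes`, which mirrors why `basic_failing.ipynb` is picked up again on the second run.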
The source files themselves will not be modified during or after execution. You can create a new “final” notebook, in which the cached outputs are merged into the source notebook, with:
$ jcache notebook merge 1 final_notebook.ipynb
Merged with cache PK 1
Success!
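Conceptually, merging takes the text from the source and the outputs from the cached copy. The sketch below illustrates that idea on plain dictionaries shaped like notebook JSON. It is an illustration only, not the implementation behind `jcache notebook merge`; `merge_outputs` is a hypothetical helper that assumes both notebooks contain the same code cells in the same order.

```python
import copy


def merge_outputs(source_nb, cached_nb):
    """Return a copy of source_nb whose code cells carry cached_nb's outputs."""
    merged = copy.deepcopy(source_nb)  # never modify the source notebook
    cached_code = [c for c in cached_nb["cells"] if c["cell_type"] == "code"]
    merged_code = [c for c in merged["cells"] if c["cell_type"] == "code"]
    if len(cached_code) != len(merged_code):
        raise ValueError("code cells do not match; re-execution is required")
    for target, cached in zip(merged_code, cached_code):
        target["outputs"] = copy.deepcopy(cached.get("outputs", []))
        target["execution_count"] = cached.get("execution_count")
    return merged


source = {"cells": [
    {"cell_type": "markdown", "source": "Edited text"},
    {"cell_type": "code", "source": "1 + 1", "outputs": [], "execution_count": None},
]}
cached = {"cells": [
    {"cell_type": "markdown", "source": "Old text"},
    {"cell_type": "code", "source": "1 + 1", "execution_count": 1,
     "outputs": [{"output_type": "execute_result", "data": {"text/plain": "2"}}]},
]}

final = merge_outputs(source, cached)
print(final["cells"][0]["source"], final["cells"][1]["execution_count"])
```

The merged notebook keeps the edited Markdown from the source while picking up the cached output, and the `source` dictionary itself is left untouched.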
You can also add notebooks with custom formats, such as those read by jupytext:
$ jcache notebook add --reader jupytext tests/notebooks/basic.md
Adding: ../tests/notebooks/basic.md
Success!
$ jcache notebook list
  ID  URI                                  Reader    Added             Status
----  -----------------------------------  --------  ----------------  --------
   1  tests/notebooks/basic_unrun.ipynb    nbformat  2023-11-13 16:34  ✅ [1]
   2  tests/notebooks/basic_failing.ipynb  nbformat  2023-11-13 16:34  ❌
   3  tests/notebooks/basic.md             jupytext  2023-11-13 16:34  ✅ [1]
Design considerations
Although there are certainly other use cases, the principal use case this was written for is generating books and websites from multiple notebooks (and other text documents). The aim is for a notebook to be auto-executed only if it has been modified in a way that may alter its code cell outputs.
Some desired requirements (not yet all implemented):
- A clear and robust API.
- The cache is persistent on disk.
- Notebook comparisons separate out “edits to content” from “edits to code cells”. Rearranging cells and changing code cells should require re-execution; changing text content should not.
- Allow parallel access to notebooks (for execution).
- Store execution statistics/reports.
- Store external assets: notebooks being executed often require external assets (imported scripts, data, etc.), which are prepared by the users.
- Store execution artefacts created during execution.
- Transparent and robust cache invalidation: imagine the user updating an external dependency or a Python module, or checking out a different git branch.
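One way to picture the cache invalidation requirement is a cache key that covers more than the notebook's code. The sketch below is purely illustrative, since the list above is explicitly not fully implemented: `invalidation_key` is a hypothetical function that combines a code string, the bytes of external asset files, and a mapping of package versions into one digest, so that a change to any of them yields a new key.

```python
import hashlib
import tempfile
from pathlib import Path


def invalidation_key(code, asset_paths=(), package_versions=None):
    """Digest of the code, external asset contents, and environment versions."""
    digest = hashlib.sha256()
    digest.update(code.encode("utf-8"))
    for path in sorted(Path(p) for p in asset_paths):
        digest.update(path.name.encode("utf-8"))
        digest.update(path.read_bytes())  # a content change gives a new key
    for name, version in sorted((package_versions or {}).items()):
        digest.update(f"{name}=={version}".encode("utf-8"))
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    asset = Path(tmp) / "data.csv"
    asset.write_text("a,b\n1,2\n")
    before = invalidation_key("import data", [asset], {"numpy": "1.26.0"})

    asset.write_text("a,b\n3,4\n")  # the user updates an external asset
    after_asset = invalidation_key("import data", [asset], {"numpy": "1.26.0"})

    # the user upgrades a dependency
    after_env = invalidation_key("import data", [asset], {"numpy": "2.0.0"})

print(before != after_asset, after_asset != after_env)
```

A git branch checkout would be covered in the same way only to the extent that it changes the hashed code, assets, or package versions.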