Python API#
This page outlines how to utilise the cache programmatically. We step through the three aspects illustrated in the diagram below: caching, staging and executing.
Note
The full Jupyter notebook for this page can be accessed here: api.ipynb.
Try it for yourself!
Initialisation#
from pathlib import Path
import nbformat as nbf
from jupyter_cache import get_cache
from jupyter_cache.base import CacheBundleIn
from jupyter_cache.executors import load_executor, list_executors
from jupyter_cache.utils import (
tabulate_cache_records,
tabulate_project_records
)
First we set up a cache and ensure that it is cleared.
Important
Clearing a cache wipes its entire content, including any settings (such as cache limit).
cache = get_cache(".jupyter_cache")
cache.clear_cache()
cache
JupyterCacheBase('/Users/chrisjsewell/Documents/GitHub/jupyter-cache/docs/using/.jupyter_cache')
print(cache.list_cache_records())
print(cache.list_project_records())
[]
[]
Caching Notebooks#
To directly cache a notebook:
record = cache.cache_notebook_file(
path=Path("example_nbs", "basic.ipynb")
)
record
NbCacheRecord(pk=1)
This will add a physical copy of the notebook to the cache (stripped of any text cells) and return the record that has been added to the cache database.
Important
The returned record is static: it will not update if the database is updated.
The record stores metadata for the notebook:
record.to_dict()
{'description': '',
'hashkey': '94c17138f782c75df59e989fffa64e3a',
'created': datetime.datetime(2022, 1, 12, 15, 15, 27, 255299),
'accessed': datetime.datetime(2022, 1, 12, 15, 15, 27, 255312),
'data': {},
'uri': 'example_nbs/basic.ipynb',
'pk': 1}
Important
The URI that the notebook is read from is stored, but it has no impact on later comparison of notebooks; notebooks are compared only by their internal content.
We can retrieve cache records by their primary key (pk):
cache.list_cache_records()
[NbCacheRecord(pk=1)]
cache.get_cache_record(1)
NbCacheRecord(pk=1)
To load the entire notebook that is related to a pk:
nb_bundle = cache.get_cache_bundle(1)
nb_bundle
CacheBundleOut(nb=Notebook(cells=1), record=NbCacheRecord(pk=1), artifacts=NbArtifacts(paths=0))
nb_bundle.nb
{'cells': [{'cell_type': 'code',
'execution_count': 1,
'metadata': {},
'outputs': [{'name': 'stdout', 'output_type': 'stream', 'text': '1\n'}],
'source': 'a=1\nprint(a)'}],
'metadata': {'kernelspec': {'display_name': 'Python 3',
'language': 'python',
'name': 'python3'},
'language_info': {'codemirror_mode': {'name': 'ipython', 'version': 3},
'file_extension': '.py',
'mimetype': 'text/x-python',
'name': 'python',
'nbconvert_exporter': 'python',
'pygments_lexer': 'ipython3',
'version': '3.6.1'},
'test_name': 'notebook1'},
'nbformat': 4,
'nbformat_minor': 2}
Trying to add a notebook to the cache that matches an existing one will result in an error, since the cache ensures that all notebook hashes are unique:
record = cache.cache_notebook_file(
path=Path("example_nbs", "basic.ipynb")
)
---------------------------------------------------------------------------
CachingError Traceback (most recent call last)
/var/folders/t2/xbl15_3n4tsb1vr_ccmmtmbr0000gn/T/ipykernel_99993/3576020660.py in <module>
----> 1 record = cache.cache_notebook_file(
2 path=Path("example_nbs", "basic.ipynb")
3 )
~/Documents/GitHub/jupyter-cache/jupyter_cache/cache/main.py in cache_notebook_file(self, path, uri, artifacts, data, check_validity, overwrite)
268 """
269 notebook = nbf.read(str(path), nbf.NO_CONVERT)
--> 270 return self.cache_notebook_bundle(
271 CacheBundleIn(
272 notebook,
~/Documents/GitHub/jupyter-cache/jupyter_cache/cache/main.py in cache_notebook_bundle(self, bundle, check_validity, overwrite, description)
213 if path.exists():
214 if not overwrite:
--> 215 raise CachingError(
216 "Notebook already exists in cache and overwrite=False."
217 )
CachingError: Notebook already exists in cache and overwrite=False.
If we load a notebook external to the cache, then we can try to match it to one stored inside the cache:
notebook = nbf.read(str(Path("example_nbs", "basic.ipynb")), 4)
notebook
{'cells': [{'cell_type': 'markdown',
'metadata': {},
'source': '# a title\n\nsome text\n'},
{'cell_type': 'code',
'execution_count': 1,
'metadata': {},
'source': 'a=1\nprint(a)',
'outputs': [{'name': 'stdout', 'output_type': 'stream', 'text': '1\n'}]}],
'metadata': {'test_name': 'notebook1',
'kernelspec': {'display_name': 'Python 3',
'language': 'python',
'name': 'python3'},
'language_info': {'codemirror_mode': {'name': 'ipython', 'version': 3},
'file_extension': '.py',
'mimetype': 'text/x-python',
'name': 'python',
'nbconvert_exporter': 'python',
'pygments_lexer': 'ipython3',
'version': '3.6.1'}},
'nbformat': 4,
'nbformat_minor': 2}
cache.match_cache_notebook(notebook)
NbCacheRecord(pk=1)
Notebooks are matched by a hash based only on the aspects of the notebook that affect its execution (and hence its outputs). So a notebook with changed text cells will still match the cached notebook:
notebook.cells[0].source = "change some text"
cache.match_cache_notebook(notebook)
NbCacheRecord(pk=1)
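To illustrate the idea (this is a simplified sketch, not jupyter-cache's actual hashing implementation, and `execution_hash` is a hypothetical helper), a hash that ignores text cells could be computed roughly like this:

```python
import hashlib
import json

def execution_hash(nb: dict) -> str:
    """Hash only the parts of a notebook that affect execution:
    code cell sources (text cells and outputs are ignored)."""
    sources = [
        cell["source"] for cell in nb["cells"]
        if cell["cell_type"] == "code"
    ]
    return hashlib.md5(json.dumps(sources).encode("utf8")).hexdigest()

nb = {"cells": [
    {"cell_type": "markdown", "source": "# a title\n\nsome text\n"},
    {"cell_type": "code", "source": "a=1\nprint(a)"},
]}
key = execution_hash(nb)

# Changing a text cell leaves the hash unchanged...
nb["cells"][0]["source"] = "change some text"
assert execution_hash(nb) == key

# ...but changing a code cell changes it.
nb["cells"][1]["source"] = "change some source code"
assert execution_hash(nb) != key
```

This mirrors the behaviour shown above: text-only edits still match the cache record, while code edits produce a new hashkey.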
But changing code cells will result in a different hash, and so will not be matched:
notebook.cells[1].source = "change some source code"
cache.match_cache_notebook(notebook)
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/var/folders/t2/xbl15_3n4tsb1vr_ccmmtmbr0000gn/T/ipykernel_99993/941642554.py in <module>
----> 1 cache.match_cache_notebook(notebook)
~/Documents/GitHub/jupyter-cache/jupyter_cache/cache/main.py in match_cache_notebook(self, nb)
333 """
334 _, hashkey = self.create_hashed_notebook(nb)
--> 335 cache_record = NbCacheRecord.record_from_hashkey(hashkey, self.db)
336 return cache_record
337
~/Documents/GitHub/jupyter-cache/jupyter_cache/cache/db.py in record_from_hashkey(hashkey, db)
158 )
159 if result is None:
--> 160 raise KeyError(
161 "Cache record not found for NB with hashkey: {}".format(hashkey)
162 )
KeyError: 'Cache record not found for NB with hashkey: 07e6a47c8c180cb7851ede6dbb088769'
To understand the difference between an external notebook, and one stored in the cache, we can ‘diff’ them:
print(cache.diff_nbnode_with_cache(1, notebook, as_str=True))
nbdiff
--- cached pk=1
+++ other:
## inserted before nb/cells/0:
+ code cell:
+ execution_count: 1
+ source:
+ change some source code
+ outputs:
+ output 0:
+ output_type: stream
+ name: stdout
+ text:
+ 1
## deleted nb/cells/0:
- code cell:
- execution_count: 1
- source:
- a=1
- print(a)
- outputs:
- output 0:
- output_type: stream
- name: stdout
- text:
- 1
If we cache this altered notebook, note that this will not remove the previously cached notebook:
nb_bundle = CacheBundleIn(
nb=notebook,
uri=Path("example_nbs", "basic.ipynb"),
data={"tag": "mytag"}
)
cache.cache_notebook_bundle(nb_bundle)
NbCacheRecord(pk=2)
print(tabulate_cache_records(
cache.list_cache_records(), path_length=1, hashkeys=True
))
ID Origin URI Created Accessed Hashkey
---- ------------ ---------------- ---------------- --------------------------------
2 basic.ipynb 2022-01-12 15:16 2022-01-12 15:16 07e6a47c8c180cb7851ede6dbb088769
1 basic.ipynb 2022-01-12 15:15 2022-01-12 15:16 94c17138f782c75df59e989fffa64e3a
Notebooks are retained in the cache until the cache limit is reached, at which point the oldest notebooks are removed.
cache.get_cache_limit()
1000
cache.change_cache_limit(100)
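The eviction behaviour can be sketched with a minimal model (a hypothetical illustration, not the actual cache code): when the limit is exceeded, the least recently accessed records are dropped first:

```python
from collections import OrderedDict

class LruSketch:
    """Minimal model of a size-limited cache that evicts the
    least recently accessed entries first."""

    def __init__(self, limit: int):
        self.limit = limit
        self._records = OrderedDict()  # hashkey -> notebook

    def add(self, hashkey, nb):
        self._records[hashkey] = nb
        self._records.move_to_end(hashkey)  # newest = most recently used
        while len(self._records) > self.limit:
            self._records.popitem(last=False)  # drop the oldest

    def get(self, hashkey):
        self._records.move_to_end(hashkey)  # accessing refreshes the entry
        return self._records[hashkey]

cache_model = LruSketch(limit=2)
cache_model.add("hash-a", "nb-a")
cache_model.add("hash-b", "nb-b")
cache_model.get("hash-a")          # refresh "hash-a"
cache_model.add("hash-c", "nb-c")  # evicts "hash-b", the oldest access
assert list(cache_model._records) == ["hash-a", "hash-c"]
```

Note how the `accessed` timestamp in the cache records shown earlier plays the role of `move_to_end` here: reading a record keeps it "fresh" relative to the limit.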
Staging Notebooks for Execution#
Notebooks can be staged by adding their path as a stage record.
Important
This does not physically add the notebook to the cache; it merely stores its URI for later use.
record = cache.add_nb_to_project(Path("example_nbs", "basic.ipynb"))
record
NbProjectRecord(pk=1)
record.to_dict()
{'uri': '/Users/chrisjsewell/Documents/GitHub/jupyter-cache/docs/using/example_nbs/basic.ipynb',
'assets': [],
'created': datetime.datetime(2022, 1, 12, 15, 16, 27, 64960),
'traceback': '',
'read_data': {'name': 'nbformat', 'type': 'plugin'},
'pk': 1}
If the staged notebook relates to one in the cache, we will be able to retrieve the cache record:
cache.get_cached_project_nb(1)
NbCacheRecord(pk=1)
print(tabulate_project_records(
cache.list_project_records(), path_length=2, cache=cache
))
ID URI Reader Added Status
---- ----------------------- -------- ---------------- --------
1 example_nbs/basic.ipynb nbformat 2022-01-12 15:16 ✅ [1]
We can also retrieve a merged notebook. This is a copy of the source notebook with the following added to it from the cached notebook:
Selected notebook metadata keys (generally only those that affect its execution)
All code cells, with their outputs and metadata (only selected cell metadata is merged if cell_meta is not None)
In this way we create a notebook that is fully up-to-date for both its code and textual content:
cache.merge_match_into_file(
cache.get_project_record(1).uri,
nb_meta=('kernelspec', 'language_info', 'widgets'),
cell_meta=None
)
(1,
{'cells': [{'cell_type': 'markdown',
'metadata': {},
'source': '# a title\n\nsome text\n'},
{'cell_type': 'code',
'execution_count': 1,
'metadata': {},
'outputs': [{'name': 'stdout', 'output_type': 'stream', 'text': '1\n'}],
'source': 'a=1\nprint(a)'}],
'metadata': {'test_name': 'notebook1',
'kernelspec': {'display_name': 'Python 3',
'language': 'python',
'name': 'python3'},
'language_info': {'codemirror_mode': {'name': 'ipython', 'version': 3},
'file_extension': '.py',
'mimetype': 'text/x-python',
'name': 'python',
'nbconvert_exporter': 'python',
'pygments_lexer': 'ipython3',
'version': '3.6.1'}},
'nbformat': 4,
'nbformat_minor': 2})
If we add a notebook that cannot be found in the cache, it will be listed for execution:
record = cache.add_nb_to_project(Path("example_nbs", "basic_failing.ipynb"))
record
NbProjectRecord(pk=2)
cache.get_cached_project_nb(2) # returns None
cache.list_unexecuted()
[NbProjectRecord(pk=2)]
print(tabulate_project_records(
cache.list_project_records(), path_length=2, cache=cache
))
ID URI Reader Added Status
---- ------------------------------- -------- ---------------- --------
1 example_nbs/basic.ipynb nbformat 2022-01-12 15:16 ✅ [1]
2 example_nbs/basic_failing.ipynb nbformat 2022-01-12 15:17 -
To remove a notebook from the staging area:
cache.remove_nb_from_project(1)
print(tabulate_project_records(
cache.list_project_records(), path_length=2, cache=cache
))
ID URI Reader Added Status
---- ------------------------------- -------- ---------------- --------
2 example_nbs/basic_failing.ipynb nbformat 2022-01-12 15:17 -
Execution#
If we have some staged notebooks:
cache.clear_cache()
cache.add_nb_to_project(Path("example_nbs", "basic.ipynb"))
cache.add_nb_to_project(Path("example_nbs", "basic_failing.ipynb"))
NbProjectRecord(pk=2)
print(tabulate_project_records(
cache.list_project_records(), path_length=2, cache=cache
))
ID URI Reader Added Status
---- ------------------------------- -------- ---------------- --------
1 example_nbs/basic.ipynb nbformat 2022-01-12 15:17 -
2 example_nbs/basic_failing.ipynb nbformat 2022-01-12 15:17 -
Then we can select an executor (registered as an entry point) to execute the notebooks.
Tip
To view the executor's log, make sure logging is enabled, or pass a logger directly to load_executor().
list_executors()
{'local-parallel', 'local-serial', 'temp-parallel', 'temp-serial'}
from logging import basicConfig, INFO
basicConfig(level=INFO)
executor = load_executor("local-serial", cache=cache)
executor
JupyterExecutorLocalSerial(cache=JupyterCacheBase('/Users/chrisjsewell/Documents/GitHub/jupyter-cache/docs/using/.jupyter_cache'))
Calling run_and_cache()
will run all staged notebooks that do not already have matches in the cache.
It will return an ExecutorRunResult containing lists for:
succeeded: The notebook was executed successfully with no (or only expected) exceptions
excepted: A notebook cell was encountered that raised an unexpected exception
errored: An exception occurred before/after the actual notebook execution
Tip
Code cells can be tagged with raises-exception
to let the executor know that
a cell may raise an exception (see this issue on its behaviour).
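For example, a cell tagged this way carries the tag in its metadata (a sketch using the nbformat v4 cell structure shown earlier on this page; `may_raise` is a hypothetical helper illustrating the check an executor could make):

```python
# A code cell expected to raise, in nbformat v4 dict form.
cell = {
    "cell_type": "code",
    "execution_count": None,
    "metadata": {"tags": ["raises-exception"]},
    "outputs": [],
    "source": "raise Exception('oopsie!')",
}

def may_raise(cell: dict) -> bool:
    """Is an exception in this cell expected (i.e. is it tagged)?"""
    return "raises-exception" in cell["metadata"].get("tags", [])

assert may_raise(cell)
assert not may_raise({"cell_type": "code", "metadata": {}, "source": "a=1"})
```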
Note
You can use the filter_uris
and/or filter_pks
options to only run selected staged notebooks.
You can also specify the timeout for execution in seconds using the timeout
option.
result = executor.run_and_cache()
result
INFO:jupyter_cache.executors.base:Executing 2 notebook(s) in serial
INFO:jupyter_cache.executors.base:Executing: /Users/chrisjsewell/Documents/GitHub/jupyter-cache/docs/using/example_nbs/basic.ipynb
INFO:jupyter_cache.executors.base:Execution Successful: /Users/chrisjsewell/Documents/GitHub/jupyter-cache/docs/using/example_nbs/basic.ipynb
INFO:jupyter_cache.executors.base:Executing: /Users/chrisjsewell/Documents/GitHub/jupyter-cache/docs/using/example_nbs/basic_failing.ipynb
WARNING:jupyter_cache.executors.base:Execution Excepted: /Users/chrisjsewell/Documents/GitHub/jupyter-cache/docs/using/example_nbs/basic_failing.ipynb
CellExecutionError: An error occurred while executing the following cell:
------------------
raise Exception('oopsie!')
------------------
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
/var/folders/t2/xbl15_3n4tsb1vr_ccmmtmbr0000gn/T/ipykernel_1308/340246212.py in <module>
----> 1 raise Exception('oopsie!')
Exception: oopsie!
Exception: oopsie!
ExecutorRunResult(succeeded=['/Users/chrisjsewell/Documents/GitHub/jupyter-cache/docs/using/example_nbs/basic.ipynb'], excepted=['/Users/chrisjsewell/Documents/GitHub/jupyter-cache/docs/using/example_nbs/basic_failing.ipynb'], errored=[])
Successfully executed notebooks will be added to the cache, and data about their execution (such as time taken) will be stored in the cache record:
cache.list_cache_records()
[NbCacheRecord(pk=1)]
record = cache.get_cache_record(1)
record.to_dict()
{'description': '',
'hashkey': '94c17138f782c75df59e989fffa64e3a',
'created': datetime.datetime(2022, 1, 12, 15, 17, 45, 471862),
'accessed': datetime.datetime(2022, 1, 12, 15, 17, 45, 471871),
'data': {'execution_seconds': 1.8344826350000005},
'uri': '/Users/chrisjsewell/Documents/GitHub/jupyter-cache/docs/using/example_nbs/basic.ipynb',
'pk': 1}
Notebooks that failed to run will not be added to the cache, but details of their execution (including the exception traceback) will be added to the stage record:
record = cache.get_project_record(2)
print(record.traceback)
Traceback (most recent call last):
File "/Users/chrisjsewell/Documents/GitHub/jupyter-cache/jupyter_cache/executors/utils.py", line 58, in single_nb_execution
executenb(
File "/Users/chrisjsewell/Documents/GitHub/jupyter-cache/.tox/py38/lib/python3.8/site-packages/nbclient/client.py", line 1093, in execute
return NotebookClient(nb=nb, resources=resources, km=km, **kwargs).execute()
File "/Users/chrisjsewell/Documents/GitHub/jupyter-cache/.tox/py38/lib/python3.8/site-packages/nbclient/util.py", line 84, in wrapped
return just_run(coro(*args, **kwargs))
File "/Users/chrisjsewell/Documents/GitHub/jupyter-cache/.tox/py38/lib/python3.8/site-packages/nbclient/util.py", line 62, in just_run
return loop.run_until_complete(coro)
File "/Users/chrisjsewell/Documents/GitHub/jupyter-cache/.tox/py38/lib/python3.8/site-packages/nest_asyncio.py", line 81, in run_until_complete
return f.result()
File "/Users/chrisjsewell/Documents/GitHub/jupyter-cache/.tox/py38/lib/python3.8/asyncio/futures.py", line 178, in result
raise self._exception
File "/Users/chrisjsewell/Documents/GitHub/jupyter-cache/.tox/py38/lib/python3.8/asyncio/tasks.py", line 280, in __step
result = coro.send(None)
File "/Users/chrisjsewell/Documents/GitHub/jupyter-cache/.tox/py38/lib/python3.8/site-packages/nbclient/client.py", line 559, in async_execute
await self.async_execute_cell(
File "/Users/chrisjsewell/Documents/GitHub/jupyter-cache/.tox/py38/lib/python3.8/site-packages/nbclient/client.py", line 854, in async_execute_cell
self._check_raise_for_error(cell, exec_reply)
File "/Users/chrisjsewell/Documents/GitHub/jupyter-cache/.tox/py38/lib/python3.8/site-packages/nbclient/client.py", line 756, in _check_raise_for_error
raise CellExecutionError.from_cell_and_msg(cell, exec_reply_content)
nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
------------------
raise Exception('oopsie!')
------------------
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
/var/folders/t2/xbl15_3n4tsb1vr_ccmmtmbr0000gn/T/ipykernel_1308/340246212.py in <module>
----> 1 raise Exception('oopsie!')
Exception: oopsie!
Exception: oopsie!
We now have two staged records, and one cache record:
print(tabulate_project_records(
cache.list_project_records(), path_length=2, cache=cache
))
ID URI Reader Added Status
---- ------------------------------- -------- ---------------- --------
1 example_nbs/basic.ipynb nbformat 2022-01-12 15:17 ✅ [1]
2 example_nbs/basic_failing.ipynb nbformat 2022-01-12 15:17 ❌
print(tabulate_cache_records(
cache.list_cache_records(), path_length=1, hashkeys=True
))
ID Origin URI Created Accessed Hashkey
---- ------------ ---------------- ---------------- --------------------------------
1 basic.ipynb 2022-01-12 15:17 2022-01-12 15:17 94c17138f782c75df59e989fffa64e3a
Timeout#
A timeout argument, taking a value in seconds, can also be passed to run_and_cache().
Alternatively, the timeout can be specified inside the notebook metadata:
'execution': {
'timeout': 30
}
Note
A timeout specified in the notebook metadata takes precedence over one passed as an argument to run_and_cache().
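As a sketch, this metadata can be set on a notebook dict before staging it (using the nbformat v4 dict structure shown earlier on this page):

```python
# Minimal nbformat v4 notebook dict.
nb = {
    "cells": [],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 2,
}

# Set a 30 second execution timeout in the notebook metadata;
# this takes precedence over any timeout passed to run_and_cache().
nb["metadata"].setdefault("execution", {})["timeout"] = 30

assert nb["metadata"]["execution"]["timeout"] == 30
```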