Python API#

This page outlines how to utilise the cache programmatically. We step through the three aspects illustrated in the diagram below: caching, staging and executing.

../_images/execution_process.svg

Illustration of the execution process.#

Note

The full Jupyter notebook for this page can be accessed here: api.ipynb. Try it for yourself!

Initialisation#

from pathlib import Path
import nbformat as nbf
from jupyter_cache import get_cache
from jupyter_cache.base import CacheBundleIn
from jupyter_cache.executors import load_executor, list_executors
from jupyter_cache.utils import (
    tabulate_cache_records, 
    tabulate_project_records
)

First we setup a cache and ensure that it is cleared.

Important

Clearing a cache wipes its entire content, including any settings (such as cache limit).

cache = get_cache(".jupyter_cache")
cache.clear_cache()
cache
JupyterCacheBase('/Users/chrisjsewell/Documents/GitHub/jupyter-cache/docs/using/.jupyter_cache')
print(cache.list_cache_records())
print(cache.list_project_records())
[]
[]

Caching Notebooks#

To directly cache a notebook:

record = cache.cache_notebook_file(
    path=Path("example_nbs", "basic.ipynb")
)
record
NbCacheRecord(pk=1)

This will add a physical copy of the notebook to the cache (stripped of any text cells) and return the record that has been added to the cache database.

Important

The returned record is static: it will not update if the database is subsequently updated.

The record stores metadata for the notebook:

record.to_dict()
{'description': '',
 'hashkey': '94c17138f782c75df59e989fffa64e3a',
 'created': datetime.datetime(2022, 1, 12, 15, 15, 27, 255299),
 'accessed': datetime.datetime(2022, 1, 12, 15, 15, 27, 255312),
 'data': {},
 'uri': 'example_nbs/basic.ipynb',
 'pk': 1}

Important

The URI that the notebook was read from is stored, but it has no impact on later comparison of notebooks; they are compared only by their internal content.

We can retrieve cache records by their primary key (pk):

cache.list_cache_records()
[NbCacheRecord(pk=1)]
cache.get_cache_record(1)
NbCacheRecord(pk=1)

To load the entire notebook that is related to a pk:

nb_bundle = cache.get_cache_bundle(1)
nb_bundle
CacheBundleOut(nb=Notebook(cells=1), record=NbCacheRecord(pk=1), artifacts=NbArtifacts(paths=0))
nb_bundle.nb
{'cells': [{'cell_type': 'code',
   'execution_count': 1,
   'metadata': {},
   'outputs': [{'name': 'stdout', 'output_type': 'stream', 'text': '1\n'}],
   'source': 'a=1\nprint(a)'}],
 'metadata': {'kernelspec': {'display_name': 'Python 3',
   'language': 'python',
   'name': 'python3'},
  'language_info': {'codemirror_mode': {'name': 'ipython', 'version': 3},
   'file_extension': '.py',
   'mimetype': 'text/x-python',
   'name': 'python',
   'nbconvert_exporter': 'python',
   'pygments_lexer': 'ipython3',
   'version': '3.6.1'},
  'test_name': 'notebook1'},
 'nbformat': 4,
 'nbformat_minor': 2}

Trying to add a notebook to the cache that matches an existing one will result in an error, since the cache ensures that all notebook hashes are unique:

record = cache.cache_notebook_file(
    path=Path("example_nbs", "basic.ipynb")
)
---------------------------------------------------------------------------
CachingError                              Traceback (most recent call last)
/var/folders/t2/xbl15_3n4tsb1vr_ccmmtmbr0000gn/T/ipykernel_99993/3576020660.py in <module>
----> 1 record = cache.cache_notebook_file(
      2     path=Path("example_nbs", "basic.ipynb")
      3 )

~/Documents/GitHub/jupyter-cache/jupyter_cache/cache/main.py in cache_notebook_file(self, path, uri, artifacts, data, check_validity, overwrite)
    268         """
    269         notebook = nbf.read(str(path), nbf.NO_CONVERT)
--> 270         return self.cache_notebook_bundle(
    271             CacheBundleIn(
    272                 notebook,

~/Documents/GitHub/jupyter-cache/jupyter_cache/cache/main.py in cache_notebook_bundle(self, bundle, check_validity, overwrite, description)
    213         if path.exists():
    214             if not overwrite:
--> 215                 raise CachingError(
    216                     "Notebook already exists in cache and overwrite=False."
    217                 )

CachingError: Notebook already exists in cache and overwrite=False.

If we load a notebook external to the cache, then we can try to match it to one stored inside the cache:

notebook = nbf.read(str(Path("example_nbs", "basic.ipynb")), 4)
notebook
{'cells': [{'cell_type': 'markdown',
   'metadata': {},
   'source': '# a title\n\nsome text\n'},
  {'cell_type': 'code',
   'execution_count': 1,
   'metadata': {},
   'source': 'a=1\nprint(a)',
   'outputs': [{'name': 'stdout', 'output_type': 'stream', 'text': '1\n'}]}],
 'metadata': {'test_name': 'notebook1',
  'kernelspec': {'display_name': 'Python 3',
   'language': 'python',
   'name': 'python3'},
  'language_info': {'codemirror_mode': {'name': 'ipython', 'version': 3},
   'file_extension': '.py',
   'mimetype': 'text/x-python',
   'name': 'python',
   'nbconvert_exporter': 'python',
   'pygments_lexer': 'ipython3',
   'version': '3.6.1'}},
 'nbformat': 4,
 'nbformat_minor': 2}
cache.match_cache_notebook(notebook)
NbCacheRecord(pk=1)

Notebooks are matched by a hash based only on the aspects of the notebook that affect its execution (and hence its outputs). So changing a text cell will still match the cached notebook:

notebook.cells[0].source = "change some text"
cache.match_cache_notebook(notebook)
NbCacheRecord(pk=1)

But changing code cells will result in a different hash, and so will not be matched:

notebook.cells[1].source = "change some source code"
cache.match_cache_notebook(notebook)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/var/folders/t2/xbl15_3n4tsb1vr_ccmmtmbr0000gn/T/ipykernel_99993/941642554.py in <module>
----> 1 cache.match_cache_notebook(notebook)

~/Documents/GitHub/jupyter-cache/jupyter_cache/cache/main.py in match_cache_notebook(self, nb)
    333         """
    334         _, hashkey = self.create_hashed_notebook(nb)
--> 335         cache_record = NbCacheRecord.record_from_hashkey(hashkey, self.db)
    336         return cache_record
    337 

~/Documents/GitHub/jupyter-cache/jupyter_cache/cache/db.py in record_from_hashkey(hashkey, db)
    158             )
    159             if result is None:
--> 160                 raise KeyError(
    161                     "Cache record not found for NB with hashkey: {}".format(hashkey)
    162                 )

KeyError: 'Cache record not found for NB with hashkey: 07e6a47c8c180cb7851ede6dbb088769'

To understand the difference between an external notebook, and one stored in the cache, we can ‘diff’ them:

print(cache.diff_nbnode_with_cache(1, notebook, as_str=True))
nbdiff
--- cached pk=1
+++ other: 
## inserted before nb/cells/0:
+  code cell:
+    execution_count: 1
+    source:
+      change some source code
+    outputs:
+      output 0:
+        output_type: stream
+        name: stdout
+        text:
+          1

## deleted nb/cells/0:
-  code cell:
-    execution_count: 1
-    source:
-      a=1
-      print(a)
-    outputs:
-      output 0:
-        output_type: stream
-        name: stdout
-        text:
-          1


If we cache this altered notebook, note that this will not remove the previously cached notebook:

nb_bundle = CacheBundleIn(
    nb=notebook,
    uri=Path("example_nbs", "basic.ipynb"),
    data={"tag": "mytag"}
)
cache.cache_notebook_bundle(nb_bundle)
NbCacheRecord(pk=2)
print(tabulate_cache_records(
    cache.list_cache_records(), path_length=1, hashkeys=True
))
  ID  Origin URI    Created           Accessed          Hashkey
----  ------------  ----------------  ----------------  --------------------------------
   2  basic.ipynb   2022-01-12 15:16  2022-01-12 15:16  07e6a47c8c180cb7851ede6dbb088769
   1  basic.ipynb   2022-01-12 15:15  2022-01-12 15:16  94c17138f782c75df59e989fffa64e3a

Notebooks are retained in the cache until the cache limit is reached, at which point the oldest notebooks are removed.

cache.get_cache_limit()
1000
cache.change_cache_limit(100)

Staging Notebooks for Execution#

Notebooks can be staged by adding their path as a stage record.

Important

This does not physically add the notebook to the cache; it merely stores its URI for later use.

record = cache.add_nb_to_project(Path("example_nbs", "basic.ipynb"))
record
NbProjectRecord(pk=1)
record.to_dict()
{'uri': '/Users/chrisjsewell/Documents/GitHub/jupyter-cache/docs/using/example_nbs/basic.ipynb',
 'assets': [],
 'created': datetime.datetime(2022, 1, 12, 15, 16, 27, 64960),
 'traceback': '',
 'read_data': {'name': 'nbformat', 'type': 'plugin'},
 'pk': 1}

If the staged notebook relates to one in the cache, we will be able to retrieve the cache record:

cache.get_cached_project_nb(1)
NbCacheRecord(pk=1)
print(tabulate_project_records(
    cache.list_project_records(), path_length=2, cache=cache
))
  ID  URI                      Reader    Added             Status
----  -----------------------  --------  ----------------  --------
   1  example_nbs/basic.ipynb  nbformat  2022-01-12 15:16  ✅ [1]

We can also retrieve a merged notebook. This is a copy of the source notebook with the following added to it from the cached notebook:

  • Selected notebook metadata keys (generally only those keys that affect its execution)

  • All code cells, with their outputs and metadata (only selected metadata can be merged if cell_meta is not None)

In this way we create a notebook that is fully up-to-date for both its code and textual content:

cache.merge_match_into_file(
    cache.get_project_record(1).uri,
    nb_meta=('kernelspec', 'language_info', 'widgets'),
    cell_meta=None
)
(1,
 {'cells': [{'cell_type': 'markdown',
    'metadata': {},
    'source': '# a title\n\nsome text\n'},
   {'cell_type': 'code',
    'execution_count': 1,
    'metadata': {},
    'outputs': [{'name': 'stdout', 'output_type': 'stream', 'text': '1\n'}],
    'source': 'a=1\nprint(a)'}],
  'metadata': {'test_name': 'notebook1',
   'kernelspec': {'display_name': 'Python 3',
    'language': 'python',
    'name': 'python3'},
   'language_info': {'codemirror_mode': {'name': 'ipython', 'version': 3},
    'file_extension': '.py',
    'mimetype': 'text/x-python',
    'name': 'python',
    'nbconvert_exporter': 'python',
    'pygments_lexer': 'ipython3',
    'version': '3.6.1'}},
  'nbformat': 4,
  'nbformat_minor': 2})

If we add a notebook that cannot be found in the cache, it will be listed for execution:

record = cache.add_nb_to_project(Path("example_nbs", "basic_failing.ipynb"))
record
NbProjectRecord(pk=2)
cache.get_cached_project_nb(2)  # returns None
cache.list_unexecuted()
[NbProjectRecord(pk=2)]
print(tabulate_project_records(
    cache.list_project_records(), path_length=2, cache=cache
))
  ID  URI                              Reader    Added             Status
----  -------------------------------  --------  ----------------  --------
   1  example_nbs/basic.ipynb          nbformat  2022-01-12 15:16  ✅ [1]
   2  example_nbs/basic_failing.ipynb  nbformat  2022-01-12 15:17  -

To remove a notebook from the staging area:

cache.remove_nb_from_project(1)
print(tabulate_project_records(
    cache.list_project_records(), path_length=2, cache=cache
))
  ID  URI                              Reader    Added             Status
----  -------------------------------  --------  ----------------  --------
   2  example_nbs/basic_failing.ipynb  nbformat  2022-01-12 15:17  -

Execution#

If we have some staged notebooks:

cache.clear_cache()
cache.add_nb_to_project(Path("example_nbs", "basic.ipynb"))
cache.add_nb_to_project(Path("example_nbs", "basic_failing.ipynb"))
NbProjectRecord(pk=2)
print(tabulate_project_records(
    cache.list_project_records(), path_length=2, cache=cache
))
  ID  URI                              Reader    Added             Status
----  -------------------------------  --------  ----------------  --------
   1  example_nbs/basic.ipynb          nbformat  2022-01-12 15:17  -
   2  example_nbs/basic_failing.ipynb  nbformat  2022-01-12 15:17  -

Then we can select an executor (specified via entry points) to execute the notebooks.

Tip

To view the executor's log, make sure logging is enabled, or pass a logger directly to load_executor().

list_executors()
{'local-parallel', 'local-serial', 'temp-parallel', 'temp-serial'}
from logging import basicConfig, INFO
basicConfig(level=INFO)

executor = load_executor("local-serial", cache=cache)
executor
JupyterExecutorLocalSerial(cache=JupyterCacheBase('/Users/chrisjsewell/Documents/GitHub/jupyter-cache/docs/using/.jupyter_cache'))

Calling run_and_cache() will run all staged notebooks that do not already have a match in the cache. It returns an ExecutorRunResult with lists for:

  • succeeded: The notebook was executed successfully with no (or only expected) exceptions

  • excepted: A notebook cell was encountered that raised an unexpected exception

  • errored: An exception occurred before/after the actual notebook execution

Tip

Code cells can be tagged with raises-exception to let the executor know that a cell may raise an exception (see this issue on its behaviour).
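As a sketch of how such a tag might be applied programmatically (the cell source here is purely illustrative), cell tags live in the cell metadata under the "tags" key:

```python
import nbformat as nbf

# A cell whose exception we expect and want the executor to tolerate
cell = nbf.v4.new_code_cell("raise ValueError('this failure is expected')")

# Tags are stored as a list in the cell metadata; the "raises-exception"
# tag marks this cell as allowed to raise during execution
cell.metadata["tags"] = ["raises-exception"]
print(cell.metadata["tags"])
```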

Note

You can use the filter_uris and/or filter_pks options to only run selected staged notebooks. You can also specify the timeout for execution in seconds using the timeout option.

result = executor.run_and_cache()
result
INFO:jupyter_cache.executors.base:Executing 2 notebook(s) in serial
INFO:jupyter_cache.executors.base:Executing: /Users/chrisjsewell/Documents/GitHub/jupyter-cache/docs/using/example_nbs/basic.ipynb
INFO:jupyter_cache.executors.base:Execution Successful: /Users/chrisjsewell/Documents/GitHub/jupyter-cache/docs/using/example_nbs/basic.ipynb
INFO:jupyter_cache.executors.base:Executing: /Users/chrisjsewell/Documents/GitHub/jupyter-cache/docs/using/example_nbs/basic_failing.ipynb
WARNING:jupyter_cache.executors.base:Execution Excepted: /Users/chrisjsewell/Documents/GitHub/jupyter-cache/docs/using/example_nbs/basic_failing.ipynb
CellExecutionError: An error occurred while executing the following cell:
------------------
raise Exception('oopsie!')
------------------

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
/var/folders/t2/xbl15_3n4tsb1vr_ccmmtmbr0000gn/T/ipykernel_1308/340246212.py in <module>
----> 1 raise Exception('oopsie!')

Exception: oopsie!
Exception: oopsie!
ExecutorRunResult(succeeded=['/Users/chrisjsewell/Documents/GitHub/jupyter-cache/docs/using/example_nbs/basic.ipynb'], excepted=['/Users/chrisjsewell/Documents/GitHub/jupyter-cache/docs/using/example_nbs/basic_failing.ipynb'], errored=[])

Successfully executed notebooks will be added to the cache, and data about their execution (such as time taken) will be stored in the cache record:

cache.list_cache_records()
[NbCacheRecord(pk=1)]
record = cache.get_cache_record(1)
record.to_dict()
{'description': '',
 'hashkey': '94c17138f782c75df59e989fffa64e3a',
 'created': datetime.datetime(2022, 1, 12, 15, 17, 45, 471862),
 'accessed': datetime.datetime(2022, 1, 12, 15, 17, 45, 471871),
 'data': {'execution_seconds': 1.8344826350000005},
 'uri': '/Users/chrisjsewell/Documents/GitHub/jupyter-cache/docs/using/example_nbs/basic.ipynb',
 'pk': 1}

Notebooks which failed to run will not be added to the cache, but details about their execution (including the exception traceback) will be added to the stage record:

record = cache.get_project_record(2)
print(record.traceback)
Traceback (most recent call last):
  File "/Users/chrisjsewell/Documents/GitHub/jupyter-cache/jupyter_cache/executors/utils.py", line 58, in single_nb_execution
    executenb(
  File "/Users/chrisjsewell/Documents/GitHub/jupyter-cache/.tox/py38/lib/python3.8/site-packages/nbclient/client.py", line 1093, in execute
    return NotebookClient(nb=nb, resources=resources, km=km, **kwargs).execute()
  File "/Users/chrisjsewell/Documents/GitHub/jupyter-cache/.tox/py38/lib/python3.8/site-packages/nbclient/util.py", line 84, in wrapped
    return just_run(coro(*args, **kwargs))
  File "/Users/chrisjsewell/Documents/GitHub/jupyter-cache/.tox/py38/lib/python3.8/site-packages/nbclient/util.py", line 62, in just_run
    return loop.run_until_complete(coro)
  File "/Users/chrisjsewell/Documents/GitHub/jupyter-cache/.tox/py38/lib/python3.8/site-packages/nest_asyncio.py", line 81, in run_until_complete
    return f.result()
  File "/Users/chrisjsewell/Documents/GitHub/jupyter-cache/.tox/py38/lib/python3.8/asyncio/futures.py", line 178, in result
    raise self._exception
  File "/Users/chrisjsewell/Documents/GitHub/jupyter-cache/.tox/py38/lib/python3.8/asyncio/tasks.py", line 280, in __step
    result = coro.send(None)
  File "/Users/chrisjsewell/Documents/GitHub/jupyter-cache/.tox/py38/lib/python3.8/site-packages/nbclient/client.py", line 559, in async_execute
    await self.async_execute_cell(
  File "/Users/chrisjsewell/Documents/GitHub/jupyter-cache/.tox/py38/lib/python3.8/site-packages/nbclient/client.py", line 854, in async_execute_cell
    self._check_raise_for_error(cell, exec_reply)
  File "/Users/chrisjsewell/Documents/GitHub/jupyter-cache/.tox/py38/lib/python3.8/site-packages/nbclient/client.py", line 756, in _check_raise_for_error
    raise CellExecutionError.from_cell_and_msg(cell, exec_reply_content)
nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
------------------
raise Exception('oopsie!')
------------------

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
/var/folders/t2/xbl15_3n4tsb1vr_ccmmtmbr0000gn/T/ipykernel_1308/340246212.py in <module>
----> 1 raise Exception('oopsie!')

Exception: oopsie!
Exception: oopsie!

We now have two staged records and one cache record:

print(tabulate_project_records(
    cache.list_project_records(), path_length=2, cache=cache
))
  ID  URI                              Reader    Added             Status
----  -------------------------------  --------  ----------------  --------
   1  example_nbs/basic.ipynb          nbformat  2022-01-12 15:17  ✅ [1]
   2  example_nbs/basic_failing.ipynb  nbformat  2022-01-12 15:17  ❌
print(tabulate_cache_records(
    cache.list_cache_records(), path_length=1, hashkeys=True
))
  ID  Origin URI    Created           Accessed          Hashkey
----  ------------  ----------------  ----------------  --------------------------------
   1  basic.ipynb   2022-01-12 15:17  2022-01-12 15:17  94c17138f782c75df59e989fffa64e3a

Timeout#

A timeout argument, in seconds, can also be passed to run_and_cache(). Alternatively, the timeout can be specified inside the notebook metadata:

'execution': {
   'timeout': 30
 }

Note

A timeout specified in the notebook metadata takes precedence over one passed as an argument to run_and_cache().
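As a minimal sketch, the metadata key shown above can be set with nbformat before staging the notebook (the value 30 is illustrative):

```python
import nbformat as nbf

nb = nbf.v4.new_notebook()
# Per-notebook execution timeout, in seconds; this overrides any
# timeout argument passed to run_and_cache()
nb.metadata["execution"] = {"timeout": 30}
print(nb.metadata["execution"]["timeout"])
```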