
demo: notebook with attached cluster + declarative workflows #116

@lukasheinrich

Description

Hi all,

This touches on #16, #51, and other issues.

As promised, here is a small demo of how one could use declaratively defined workflows together with a Docker Swarm cluster to run workflows whose steps are each captured in a different Docker container:

https://github.com/lukasheinrich/yadage-binder/blob/master/example_three.ipynb

In the GIF, each of the yellow bubbles executes in its own container, in parallel where possible. All of these containers, as well as the container the notebook runs in, share a Docker volume mounted at /workdir, so they at least share filesystem state. This keeps the execution itself isolated while still allowing steps to read the outputs of previous steps and take them as inputs.
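The shared-workdir pattern can be sketched in plain Python (no Docker here; the file layout and step names are purely illustrative):

```python
import json
import os
import tempfile

def run_step(name, workdir, deps, func):
    """Run one step: read the outputs of its dependencies from the
    shared workdir, compute, and write this step's own output file,
    mimicking containers that share a volume mounted at /workdir."""
    inputs = []
    for dep in deps:
        with open(os.path.join(workdir, dep + ".json")) as f:
            inputs.append(json.load(f))
    result = func(*inputs)
    with open(os.path.join(workdir, name + ".json"), "w") as f:
        json.dump(result, f)
    return result

workdir = tempfile.mkdtemp()
run_step("acquire", workdir, [], lambda: [1, 2, 3])
total = run_step("sum", workdir, ["acquire"], lambda xs: sum(xs))
print(total)  # -> 6
```

Because every step only touches files under the shared directory, the isolated containers never need to talk to each other directly.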

Let me explain the different parts:

This is a small workflow tool I wrote in order to be able to execute arbitrary DAGs of Python callables in cases where the full DAG is not known upfront but only develops over time. It keeps track of a graph and has a set of rules for when and how to extend that graph.
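A minimal sketch of that idea, a DAG that grows via extension rules as results become available (the class and method names below are hypothetical, not the tool's actual API):

```python
class DynamicDAG:
    """Sketch: a DAG of callables plus rules that may extend the
    graph once their predicate on the current results holds."""

    def __init__(self):
        self.nodes = {}    # name -> (dependency names, callable)
        self.results = {}  # name -> computed result
        self.rules = []    # (predicate, extend) pairs

    def add_node(self, name, deps, func):
        self.nodes[name] = (deps, func)

    def add_rule(self, predicate, extend):
        self.rules.append((predicate, extend))

    def run(self):
        progressed = True
        while progressed:
            progressed = False
            # fire any rule whose predicate is now satisfied
            for pred, extend in list(self.rules):
                if pred(self.results):
                    extend(self)
                    self.rules.remove((pred, extend))
                    progressed = True
            # run every node whose dependencies are all done
            for name, (deps, func) in list(self.nodes.items()):
                if name not in self.results and all(d in self.results for d in deps):
                    self.results[name] = func(*[self.results[d] for d in deps])
                    progressed = True
        return self.results

dag = DynamicDAG()
dag.add_node("gen", [], lambda: [1, 2, 3])

def fan_out(d):
    # only once "gen" has run do we know how many downstream nodes to add
    items = d.results["gen"]
    for i, x in enumerate(items):
        d.add_node("sq%d" % i, [], lambda x=x: x * x)
    d.add_node("sum", ["sq%d" % i for i in range(len(items))],
               lambda *xs: sum(xs))

dag.add_rule(lambda results: "gen" in results, fan_out)
print(dag.run()["sum"])  # 1 + 4 + 9 -> 14
```

The key point is that the scatter width (three `sq` nodes here) is decided at runtime, which is exactly the "DAG develops with time" situation.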

This is the same concept, but adds a declarative layer. In effect, it defines a callable based on a YAML file like this one:

https://github.com/lukasheinrich/yadage-workflows/blob/master/lhcb_talk/dataacquisition.yml

that defines a process with a couple of parameters, complete with its environment and a procedure for determining the result.
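A sketch of how such a spec can become a plain Python callable (the field names below are assumptions for illustration, not the actual yadage schema, and the command runs in a local shell rather than in the declared image):

```python
import os
import subprocess
import tempfile

# Illustrative spec, loosely modeled on the linked YAML file.
spec = {
    "process": {"cmd": "echo {events} > {outputfile}"},
    "environment": {"image": "busybox"},
}

def make_callable(spec):
    """Turn a declarative step spec into a Python callable."""
    template = spec["process"]["cmd"]
    def step(**parameters):
        cmd = template.format(**parameters)
        # the real setup would run this inside spec["environment"]["image"];
        # for the sketch we run it in a local shell instead
        subprocess.run(cmd, shell=True, check=True)
    return step

out = os.path.join(tempfile.mkdtemp(), "out.txt")
make_callable(spec)(events=100, outputfile=out)
with open(out) as f:
    print(f.read().strip())  # -> 100
```

The spec carries everything needed to run the step, so the caller only supplies parameter values.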

This alone is already helpful for using Docker containers essentially as black-box Python callables, like here:

https://github.com/lukasheinrich/yadage-binder/blob/master/example_two.ipynb
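The black-box-callable idea boils down to building a `docker run` invocation per step. A sketch (it returns the argv list for inspection instead of executing it, and assumes the shared /workdir volume described above):

```python
def docker_callable(image, cmd_template, volume="/workdir"):
    """Wrap a container invocation so it looks like a Python function.
    Sketch only: returns the `docker run` argv rather than running it."""
    def step(**parameters):
        return ["docker", "run", "--rm", "-v", volume, image,
                "sh", "-c", cmd_template.format(**parameters)]
    return step

generate = docker_callable("busybox", "echo {msg} > /workdir/msg.txt")
print(generate(msg="hello"))
```

Passing the argv to `subprocess.run` would then execute the step in its container, while the shared volume makes its output visible to later steps.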

On top of these callables, there is also a way to define complete workflows in a declarative manner like here:

https://github.com/lukasheinrich/yadage-workflows/blob/master/lhcb_talk/simple_mapreduce.yml

https://github.com/lukasheinrich/yadage-binder/blob/master/example_four.ipynb (try changing the number of input datasets, but don't forget to clean up the workdir using the cell above)

which can then be executed from the notebook. As a result, we get the full resulting DAG (complete with execution times) as well as a PROV-like graph of "entities" and "activities".
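A much-simplified picture of such a declarative map-reduce workflow (the stage layout and field names are invented for illustration, not the linked YAML's actual structure):

```python
# A toy declarative map-reduce: a map stage scattered over the input
# datasets, followed by a single reduce stage.
workflow = {
    "stages": [
        {"name": "map", "scatter": "datasets", "step": lambda d: d.upper()},
        {"name": "reduce", "step": lambda parts: ",".join(parts)},
    ]
}

def execute(workflow, datasets):
    """Scatter the map stage over the datasets, then reduce once."""
    map_stage, reduce_stage = workflow["stages"]
    mapped = [map_stage["step"](d) for d in datasets]
    return reduce_stage["step"](mapped)

print(execute(workflow, ["a", "b", "c"]))  # -> A,B,C
```

Changing the number of input datasets simply changes how wide the map stage scatters, which is why the notebook example invites you to vary it.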

  • yadage-binder

Just a small wrapper on top of yadage that installs the IPython notebook. It doesn't really work in Binder as originally intended, since I can't get Binder to provide writable volumes. So currently you have to start it on Carina instead, like so:

docker run -v /workdir -p 80:8888 -e YADAGE_WITHIN_DOCKER=true -e CARINA_USERNAME=$CARINA_USERNAME -e CARINA_APIKEY=$CARINA_APIKEY -e YADAGE_CLUSTER=yadage lukasheinrich/yadage-binder

where you pass your Carina credentials and the cluster name.
