Hi all,
this touches on #16, #51, and other issues.

As promised, here is a small demo of how one could use declaratively defined workflows together with a Docker Swarm cluster to run workflows whose steps are each captured in different Docker containers. This notebook shows it in action:
https://github.com/lukasheinrich/yadage-binder/blob/master/example_three.ipynb
In the GIF, each of the yellow bubbles executes in its own container, in parallel where possible. All of these containers, as well as the container the notebook runs in, share a Docker volume mounted at /workdir, so that they at least share filesystem state. This keeps the execution itself isolated, but allows steps to read the outputs of previous steps and take them as inputs.
Let me explain the different parts:
This is a small workflow tool I wrote to be able to execute arbitrary DAGs of Python callables in cases where the full DAG is not known upfront, but only develops over time. It keeps track of a graph and has a set of rules for when and how to extend the graph.
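The idea of a DAG that grows via extension rules can be sketched like this (a minimal toy, with hypothetical names, not the tool's actual API):

```python
from collections import deque

class DynamicDAG:
    """Toy DAG of callables that can be extended by rules at runtime."""
    def __init__(self):
        self.nodes = {}        # name -> (callable, list of dependency names)
        self.results = {}      # name -> result of the finished callable
        self.rules = deque()   # (applicable, extend) pairs waiting to fire

    def add_node(self, name, func, deps=()):
        self.nodes[name] = (func, list(deps))

    def add_rule(self, applicable, extend):
        # `applicable(dag)` says whether the graph can be extended yet;
        # `extend(dag)` then adds the new nodes.
        self.rules.append((applicable, extend))

    def run(self):
        while True:
            # fire every extension rule whose predicate is satisfied
            for _ in range(len(self.rules)):
                applicable, extend = self.rules.popleft()
                if applicable(self):
                    extend(self)
                else:
                    self.rules.append((applicable, extend))
            # run every node whose dependencies have all finished
            ready = [n for n, (f, deps) in self.nodes.items()
                     if n not in self.results
                     and all(d in self.results for d in deps)]
            if not ready and not self.rules:
                break
            for n in ready:
                func, deps = self.nodes[n]
                self.results[n] = func(*(self.results[d] for d in deps))

dag = DynamicDAG()
dag.add_node('init', lambda: [1, 2, 3])
# rule: once 'init' has produced its list, fan out one node per element --
# the number of these nodes is not known before 'init' runs
dag.add_rule(
    lambda d: 'init' in d.results,
    lambda d: [d.add_node('sq%d' % i, lambda data, i=i: data[i] ** 2,
                          deps=['init'])
               for i in range(len(d.results['init']))],
)
dag.run()
```

The fan-out rule is the key point: the graph only gains the `sq*` nodes after `init` has run, which is exactly the "DAG develops over time" situation.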
This is the same concept, but adds a declarative layer. In effect it defines a callable based on a YAML file like this one:
https://github.com/lukasheinrich/yadage-workflows/blob/master/lhcb_talk/dataacquisition.yml
which defines a process with a couple of parameters, complete with its environment and a procedure for determining the result.
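Schematically, "calling" such a spec means rendering its templates with concrete parameter values (the field names below are made up for illustration; see the linked file for the real schema):

```python
# Hypothetical spec layout, illustrating the concept only
spec = {
    'process': {'command': 'generate --nevents {nevents} --output {output}'},
    'environment': {'docker_image': 'some/image'},
    'result': {'output_file': '{output}'},
}

def render(spec, **parameters):
    """Turn the declarative spec plus parameters into a concrete job."""
    return {
        'command': spec['process']['command'].format(**parameters),
        'image': spec['environment']['docker_image'],
        'result': spec['result']['output_file'].format(**parameters),
    }

job = render(spec, nevents=100, output='/workdir/out.root')
```

The spec stays fully declarative; only `render` turns it into something executable, which is what makes the container-as-callable trick below possible.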
This is already helpful for using Docker containers basically as black-box Python callables, like here:
https://github.com/lukasheinrich/yadage-binder/blob/master/example_two.ipynb
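The container-as-callable pattern boils down to something like this (a sketch with made-up image names; it assumes the shared /workdir volume described above):

```python
import subprocess

def docker_command(image, command, workdir='/workdir'):
    """Build the `docker run` invocation, sharing the work directory."""
    return ['docker', 'run', '--rm',
            '-v', '{0}:{0}'.format(workdir),
            image, 'sh', '-c', command]

def containerized(image, command_template, workdir='/workdir'):
    """Wrap a Docker image as an ordinary Python callable."""
    def call(**parameters):
        subprocess.check_call(
            docker_command(image,
                           command_template.format(**parameters),
                           workdir))
    return call

# usage: the caller never sees the container at all
# simulate = containerized('some/image', 'simulate --seed {seed}')
# simulate(seed=42)
```

From the notebook's point of view, `simulate` is just a function; the environment it runs in is entirely determined by the image.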
On top of these callables, there is also a way to define complete workflows declaratively, like here:
https://github.com/lukasheinrich/yadage-workflows/blob/master/lhcb_talk/simple_mapreduce.yml
https://github.com/lukasheinrich/yadage-binder/blob/master/example_four.ipynb (try changing the number of input datasets, but don't forget to clean up the workdir using the cell above)
which can then be executed from the notebook. As a result we get the full resulting DAG (complete with execution times), as well as a PROV-like graph of "entities" and "activities".
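For the map-reduce example, the expansion that happens when you change the number of input datasets looks conceptually like this (a sketch with hypothetical step names, not the real YAML-driven expansion):

```python
def expand_mapreduce(datasets):
    """Expand a declarative map-reduce stage into concrete steps:
    one map step per dataset, plus a single reduce step that
    depends on all of them."""
    map_steps = [{'name': 'map_%d' % i, 'input': ds, 'depends_on': []}
                 for i, ds in enumerate(datasets)]
    reduce_step = {'name': 'reduce',
                   'input': [s['name'] for s in map_steps],
                   'depends_on': [s['name'] for s in map_steps]}
    return map_steps + [reduce_step]

steps = expand_mapreduce(['ds1', 'ds2', 'ds3'])
```

This is why adding an input dataset in the notebook grows the DAG by exactly one map bubble while the reduce node just gains an edge.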
- yadage-binder
This is just a small wrapper on top of yadage that installs the IPython notebook. It doesn't really work in Binder as originally intended, since I can't get Binder to have writable VOLUMEs. So currently you have to start it on Carina instead, like so:
```
docker run -v /workdir -p 80:8888 -e YADAGE_WITHIN_DOCKER=true -e CARINA_USERNAME=$CARINA_USERNAME -e CARINA_APIKEY=$CARINA_APIKEY -e YADAGE_CLUSTER=yadage lukasheinrich/yadage-binder
```
where you pass your Carina credentials and the cluster name.