
demo: notebook with attached cluster + declarative workflows #116

@lukasheinrich

Description

Hi all,

This touches on #16, #51, and other issues.

As promised, here is a small demo of how one could use declaratively defined workflows together with a Docker Swarm cluster to run workflows whose steps are each captured in a different Docker container:

https://github.com/lukasheinrich/yadage-binder/blob/master/example_three.ipynb

In the GIF, each of the yellow bubbles executes in its own container, in parallel where possible. All of these containers, as well as the container the notebook runs in, share a Docker volume mounted at /workdir, so they at least share filesystem state. This keeps the execution itself isolated while still allowing steps to read the outputs of previous steps and take them as inputs.
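The shared-workdir pattern can be sketched in plain Python (no Docker here; the file layout and step names are purely illustrative):

```python
import json
import os
import tempfile

def run_step(name, workdir, deps, func):
    """Run one step: read the outputs of its dependencies from the
    shared workdir, compute, and write this step's own output file,
    mimicking containers that share a volume mounted at /workdir."""
    inputs = []
    for dep in deps:
        with open(os.path.join(workdir, dep + ".json")) as f:
            inputs.append(json.load(f))
    result = func(*inputs)
    with open(os.path.join(workdir, name + ".json"), "w") as f:
        json.dump(result, f)
    return result

workdir = tempfile.mkdtemp()
run_step("acquire", workdir, [], lambda: [1, 2, 3])
total = run_step("sum", workdir, ["acquire"], lambda xs: sum(xs))
print(total)  # -> 6
```

Because every step only touches files under the shared directory, the isolated containers never need to talk to each other directly.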

Let me explain the different parts:

This is a small workflow tool I wrote in order to be able to execute arbitrary DAGs of Python callables in cases where the full DAG is not known upfront but only develops over time. It keeps track of a graph and has a set of rules for when and how to extend that graph.
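A minimal sketch of that idea, a DAG that grows via extension rules as results become available (the class and method names below are hypothetical, not the tool's actual API):

```python
class DynamicDAG:
    """Sketch: a DAG of callables plus rules that may extend the
    graph once their predicate on the current results holds."""

    def __init__(self):
        self.nodes = {}    # name -> (dependency names, callable)
        self.results = {}  # name -> computed result
        self.rules = []    # (predicate, extend) pairs

    def add_node(self, name, deps, func):
        self.nodes[name] = (deps, func)

    def add_rule(self, predicate, extend):
        self.rules.append((predicate, extend))

    def run(self):
        progressed = True
        while progressed:
            progressed = False
            # fire any rule whose predicate is now satisfied
            for pred, extend in list(self.rules):
                if pred(self.results):
                    extend(self)
                    self.rules.remove((pred, extend))
                    progressed = True
            # run every node whose dependencies are all done
            for name, (deps, func) in list(self.nodes.items()):
                if name not in self.results and all(d in self.results for d in deps):
                    self.results[name] = func(*[self.results[d] for d in deps])
                    progressed = True
        return self.results

dag = DynamicDAG()
dag.add_node("gen", [], lambda: [1, 2, 3])

def fan_out(d):
    # only once "gen" has run do we know how many downstream nodes to add
    items = d.results["gen"]
    for i, x in enumerate(items):
        d.add_node("sq%d" % i, [], lambda x=x: x * x)
    d.add_node("sum", ["sq%d" % i for i in range(len(items))],
               lambda *xs: sum(xs))

dag.add_rule(lambda results: "gen" in results, fan_out)
print(dag.run()["sum"])  # 1 + 4 + 9 -> 14
```

The key point is that the scatter width (three `sq` nodes here) is decided at runtime, which is exactly the "DAG develops with time" situation.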

This is the same concept, but adds a declarative layer. In effect, it defines a callable based on a YAML file like this one:

https://github.com/lukasheinrich/yadage-workflows/blob/master/lhcb_talk/dataacquisition.yml

that defines a process with a couple of parameters, complete with its environment and a procedure for determining the result.
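A sketch of how such a spec can become a plain Python callable (the field names below are assumptions for illustration, not the actual yadage schema, and the command runs in a local shell rather than in the declared image):

```python
import os
import subprocess
import tempfile

# Illustrative spec, loosely modeled on the linked YAML file.
spec = {
    "process": {"cmd": "echo {events} > {outputfile}"},
    "environment": {"image": "busybox"},
}

def make_callable(spec):
    """Turn a declarative step spec into a Python callable."""
    template = spec["process"]["cmd"]
    def step(**parameters):
        cmd = template.format(**parameters)
        # the real setup would run this inside spec["environment"]["image"];
        # for the sketch we run it in a local shell instead
        subprocess.run(cmd, shell=True, check=True)
    return step

out = os.path.join(tempfile.mkdtemp(), "out.txt")
make_callable(spec)(events=100, outputfile=out)
with open(out) as f:
    print(f.read().strip())  # -> 100
```

The spec carries everything needed to run the step, so the caller only supplies parameter values.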

This alone is already helpful for using Docker containers essentially as black-box Python callables, like here:

https://github.com/lukasheinrich/yadage-binder/blob/master/example_two.ipynb
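The black-box-callable idea boils down to building a `docker run` invocation per step. A sketch (it returns the argv list for inspection instead of executing it, and assumes the shared /workdir volume described above):

```python
def docker_callable(image, cmd_template, volume="/workdir"):
    """Wrap a container invocation so it looks like a Python function.
    Sketch only: returns the `docker run` argv rather than running it."""
    def step(**parameters):
        return ["docker", "run", "--rm", "-v", volume, image,
                "sh", "-c", cmd_template.format(**parameters)]
    return step

generate = docker_callable("busybox", "echo {msg} > /workdir/msg.txt")
print(generate(msg="hello"))
```

Passing the argv to `subprocess.run` would then execute the step in its container, while the shared volume makes its output visible to later steps.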

On top of these callables, there is also a way to define complete workflows in a declarative manner like here:

https://github.com/lukasheinrich/yadage-workflows/blob/master/lhcb_talk/simple_mapreduce.yml

https://github.com/lukasheinrich/yadage-binder/blob/master/example_four.ipynb (try changing the number of input datasets, but don't forget to clean up the workdir using the cell above)

which can then be executed from the notebook. As a result, we get the full resulting DAG (complete with execution times) as well as a PROV-like graph of "entities" and "activities".
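A much-simplified picture of such a declarative map-reduce workflow (the stage layout and field names are invented for illustration, not the linked YAML's actual structure):

```python
# A toy declarative map-reduce: a map stage scattered over the input
# datasets, followed by a single reduce stage.
workflow = {
    "stages": [
        {"name": "map", "scatter": "datasets", "step": lambda d: d.upper()},
        {"name": "reduce", "step": lambda parts: ",".join(parts)},
    ]
}

def execute(workflow, datasets):
    """Scatter the map stage over the datasets, then reduce once."""
    map_stage, reduce_stage = workflow["stages"]
    mapped = [map_stage["step"](d) for d in datasets]
    return reduce_stage["step"](mapped)

print(execute(workflow, ["a", "b", "c"]))  # -> A,B,C
```

Changing the number of input datasets simply changes how wide the map stage scatters, which is why the notebook example invites you to vary it.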

  • yadage-binder

Just a small wrapper on top of yadage that installs the IPython notebook. It doesn't really work in Binder as originally intended, since I can't get Binder to provide writable volumes. So currently you have to start it on Carina instead, like so:

docker run -v /workdir -p 80:8888 -e YADAGE_WITHIN_DOCKER=true -e CARINA_USERNAME=$CARINA_USERNAME -e CARINA_APIKEY=$CARINA_APIKEY -e YADAGE_CLUSTER=yadage lukasheinrich/yadage-binder

where you pass your Carina credentials and the cluster name.
