Brane: The User Guide

Brane logo

This book is still in development. Everything is currently subject to change.

Welcome to the user guide for the Brane infrastructure!

In this book, we document and outline how to use Brane, a programmable application orchestration framework, from a user perspective.

If you want to know more about Brane before you begin, check out the overview chapter. Otherwise, we recommend you read the Before you read chapter. It explains how this book is structured, and also goes through some need-to-know terminology.

Attribution

The icons used in this book (info, warning, source) are provided by Flaticon.

Overview

In this chapter, we will provide a brief overview of what the framework is, how it is built and what kind of features it supports.

It is not, however, a complete, technical description of its implementation; for that, we recommend you read our other book.

Brane: Programmable Orchestration of Applications and Networking

Regardless of the context and rationale, running distributed applications on geographically dispersed IT resources often comes with various technical and organizational challenges. If not addressed appropriately, these challenges may impede development, and in turn, scientific and business innovation. We have designed and developed Brane to support implementers in addressing these challenges. Brane makes use of containerization to encapsulate functionalities as portable building blocks. Through programmability, application orchestration can be expressed using intuitive domain-specific languages. As a result, end-users with limited or no programming experience are empowered to compose applications by themselves, without having to deal with the underlying technical details.

In the context of the EPI project, Brane is extended to orchestrate data distribution as well. Because the project concerns itself with health data, this orchestration does not just include distributing the data, but also policing access and making sure that applications adhere to both global and local data access policies. The same applies to the network orchestration of Brane; here, too, we have to make sure that secure and policy-compliant networking between different sites is possible and automated by Brane.

The framework in a nutshell

Concretely, the Brane framework is primarily designed to take an application in the form of a workflow and perform the work specified in it over multiple nodes spread over multiple domains, which we refer to as compute sites. This basic idea is shown in figure 1.

An image showing the abstraction the Brane framework provides over multiple compute sites Figure 1: Schematic showing the abstraction Brane provides over multiple domains / compute sites. The framework orchestrates over multiple sites, where each site orchestrates over its own nodes. Together, this allows the user to utilize all compute sites as if they were one.

An important design feature of Brane is that it tries to be intuitive to use for the different roles that users have when developing a workflow. We identify three: system engineers, who build and manage the compute sites; software engineers, who implement compute steps or algorithms; and scientists, who use the algorithms to write workflows that implement their research.

Since Brane's integration into the EPI project, there is also a fourth role: that of policy makers, who define and write the policies that are related to data handling.

Finally, there is also a fifth, "hidden" role: the Brane administrators, who manage the framework itself.

This separation of concerns means that the framework provides different levels of abstraction to interact with it, where each of these levels is designed to be familiar to the users who will use it.

For system engineers, the framework hosts a number of tools and configuration files that allow them to setup and specify their infrastructures; software engineers can write software in any language they like, and then package that using a script-like domain-specific language (BraneScript); policy makers can define the policies in an already existing reasoner language called eFLINT; and for scientists, the framework provides a natural language-like domain-specific language to write the workflows (Bakery), so their work is easily shareable with scientists who do not have extensive Brane knowledge.

Before you read

As discussed in the overview chapter, the Brane framework is aimed at different kinds of users, each of them with their own role within the framework. This split of the framework in terms of roles is referred to as the separation of concerns.

The four roles that this book focusses on are:

  • system engineers, who are in charge of one of the compute sites that Brane abstracts over. They have to prepare their site for the framework and discuss its integration with the infrastructure managers.
  • software engineers, who write the majority of the software used in the Brane framework. This software will be distributed in the form of packages.
  • policy makers, who define and write the policies that are relevant to the framework. These are both data-level policies, which describe who can access what data and how; and network-level policies, which describe where the data can be sent on the infrastructure and what kind of security measures are needed when that happens.
  • scientists, who orchestrate different packages into a workflow, which eventually implements the computation needed for their research.

To this end, the book itself is split into four groups of chapters, one for each of the roles in the separation of concerns.

Terminology

Before you begin, there is some extra terminology that is used throughout this book and that is useful to know.

The Brane instance

Viewed from the highest level of abstraction, Brane has a client part (in the form of a command-line tool or a Jupyter notebook) and a server part. The former is referred to as a Brane client, while the latter is referred to as the Brane instance.

Where next

To continue reading, we suggest you start at the first chapter for your role. You can select it in the sidebar to the left.

If you are part of the fifth, "hidden" role (the Brane administrators), you have your own book; we recommend you continue there. It also details how to obtain, compile and run the framework for testing purposes.

Introduction

In this series of chapters, we will discuss the role of system administrators and how they may prepare their system for Brane. The chapters will talk about what the requirements are on their system and what kind of information they are expected to share with the Brane instance. Finally, we will also discuss defining datasets.

To learn more about the inner workings of Brane, we recommend you check out the Brane: A Specification book, which details the framework's internals.

Background & Terminology

The Brane instance defines a control node (or central node), which is where the orchestrator itself and associated services run. This node is run by the Brane administrators. Then, as a counterpart to this control node, there is the worker plane, which is composed of all the different compute sites that Brane orchestrates over. Each such compute site is referred to as a domain, a location or, since Brane treats them as a single entity, a worker node. Multiple worker nodes may exist per physical domain (e.g., a single hospital can host multiple worker nodes for different tasks), but Brane will treat these as conceptually different places.

Within the framework, a system administrator is someone who acts as the 'technical owner' of a certain worker node. They are the ones who make sure their system is prepared and meets the Brane requirements, and who define the security requirements of any operation of the framework on their system. They are also the ones who make any data published from their domain technically available. And although policies are typically handled by policy writers, another role in the framework, in practice this can be the same person as the system administrator.

The Central node

For every Brane instance, there is typically only one control node. Even if multiple VMs are used, the framework expects them to behave like a single node, due to its centralized design.

The control node consists of the following few services:

  • The driver service is, as the name suggests, the driving service behind a control node. It takes incoming workflows submitted by scientists, and starts executing them, emitting jobs that need to be executed on the worker nodes.
  • The planner service takes incoming workflows submitted to the driver service and plans them. Planning is simply the act of deciding which worker node will execute which task, taking into account the available resources on each of the domains, as well as policies that determine whether a domain can actually transfer data or execute the job.
  • The registry service (sometimes called central registry service or API service for disambiguation) is the centralized version of the local registry services (see below). It acts as a centralized database for the framework, which provides information about which dataset is located where, which domains are participating and where to find them, and in addition hosts a central package repository.
  • Finally, the proxy service acts as a gateway between the other services and the outside world to enable proxying (i.e., it does not accept proxied requests, but rather creates them). In addition, it is also the point that handles server certificates and parses client certificates for identification.

For more details, check the specification.

Note that, if you need any compute to happen on the central node, this cannot be done through the central node itself; instead, set up a worker node alongside the central node to emulate the same behaviour.

The Worker node

As described above, a domain typically hosts a worker node. This worker node collectively describes both a local control part of the framework, referred to as the framework delegate, and some computing backend that actually executes the jobs. In this section, we provide a brief overview of both.

The delegate itself consists of a few services. Their exact working is detailed in the specification, but as a brief overview:

  • The delegate service is the main service on the delegate, and takes incoming job requests and will attempt to schedule them. This is also the service that directly connects to the compute backend (see below). You can think of it as a local driver service.
  • The registry service (sometimes called local registry service for disambiguation) keeps track of the locally available datasets and intermediate results (see the data tutorial for Software Engineers or the data tutorial for Scientists for more information) and acts as a point from where the rest of the framework downloads them.
  • The checker service acts as the Policy Enforcement Point (PEP) for the framework. It hosts a reasoner, typically eFLINT, and is queried by both the delegate and registry services to see if operations are allowed.
  • Finally, the local node also has a proxy service, just like the central node.

As for the compute backend, Brane is designed to connect to different types. An overview:

  • A local backend schedules new jobs on the same Docker engine where the control plane of Brane runs. This is the simplest infrastructure of them all, and requires no other preparation than required when installing the control plane. This is typically the choice of backend when the worker node is running on a single server or VM.
  • A VM backend uses an SSH connection (via the Xenon middleware) to launch jobs on the Docker engine of another server or VM. This is typically useful for simple setups that still emphasise a split between a local control plane and a local compute plane, but don't have extensive clusters to connect to.
  • A Kubernetes backend connects to a Kubernetes cluster on which incoming jobs are hosted. This is the recommended option if you need larger compute power, since Kubernetes is designed to work with containers.
  • A Slurm backend connects to a Slurm cluster on which incoming jobs are hosted. This infrastructure type may be harder to set up, as Slurm does not have any built-in container support. However, when set up properly, it can be used to connect to existing large-scale compute clusters to execute Brane jobs on.

More information on each backend and how to set it up is discussed in the backends chapter(s).

Next

To start setting up your own worker node, we recommend checking out the installation chapters. These will walk you through everything you need to set up a node, both control nodes and worker nodes.

For information on setting up different backends, check the backend chapters.

Alternatively, if you are looking for extensive documentation on the Brane configuration files relevant to a worker node, check out the documentation chapters.

Introduction

In these chapters, we will walk you through installing a node in the Brane instance.

There are two types of nodes: a central node (or control node), and a worker node (see the generic introduction chapter for more information). This series of chapters will discuss how to install both of them.

First, for any kind of node, you should start by downloading the dependencies on the machine or VM where your node will run. Then, install the branectl executable, which will help you in setting up and managing your node.

You can then go into the specifics for each kind of node. You can either set up a control node, a worker node or a proxy node.

Dependencies

The first step to install any piece of software is to install its dependencies.

The next section will discuss the runtime dependencies. If you plan to compile the framework instead of downloading the prebuilt executables, you must install the dependencies in both the Runtime dependencies and Compilation dependencies sections.

Runtime dependencies

In all Brane node types, the Brane services are implemented as containers, which means that there are relatively few runtime dependencies.

However, the following dependencies are required:

  1. You have to install Docker to run the container services. To install, follow one of the following links: Ubuntu, Debian, Arch Linux or macOS (note the difference between Ubuntu and Debian; they use different keys and repositories).
    • If you are running Docker on Linux, it is extremely convenient to set it up such that no root is required (a quick check for this is shown after this list):

      sudo usermod -aG docker "$USER"
      

      warning Don't forget to log out and in again after running the above command for the changes to take effect.

      warning This effectively gives all non-root users in the docker group the power to modify any file as if they had root access. Be careful who you include in this group.

  2. Install the BuildKit plugin for Docker:
    # Clone the repo, CD into it and install the plugin
    # NOTE: You will need to install 'make'
    # (check https://github.com/docker/buildx for alternative methods if that fails)
    git clone https://github.com/docker/buildx.git && cd buildx
    make install
    
    # Set the plugin as the default builder
    docker buildx install
    
    # Switch to the buildx driver
    docker buildx create --use
    
    If these instructions don't work for you, you can also check the plugin's repository README for more installation methods.

    info Docker Buildx is included by default in most distributions of Docker nowadays. You can just run the docker buildx install and docker buildx create --use commands first, and if they work, skip the steps above.

  3. Install OpenSSL for the branectl executable:
    • Ubuntu / Debian:
      sudo apt-get install openssl
      
    • Arch Linux:
      sudo pacman -Syu openssl
      
    • macOS:
      # We assume you installed Homebrew (https://brew.sh/)
      brew install openssl
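
If you added yourself to the docker group in step 1, you can verify that Docker works without sudo by running the standard hello-world test image (this pulls a tiny image from Docker Hub):

docker run --rm hello-world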
      

Aside from that, you have to make sure that your system can run executables compiled against GLIBC 2.27 or higher. You can verify this by running:

ldd --version

The top line of the result shows the GLIBC version installed on your machine.

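For example, on a recent Ubuntu system the first line of the output might look something like this (the exact numbers will differ per distribution and version):

ldd (Ubuntu GLIBC 2.35-0ubuntu3) 2.35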

If you do not meet this requirement, you will have to compile branectl (and any other non-containerized binaries) yourself on a machine with that GLIBC version or lower installed. In that case, also install the compilation dependencies.

Compilation dependencies

If you want to compile the framework yourself, you have to install additional dependencies to do so.

There are two parts you may want to compile yourself as a system administrator: branectl, the tool for managing a node, and the service images themselves.

For the latter, two modes are available:

  • In release mode, you will compile the framework directly in the containers that will be using it. This is the recommended method in most cases.
  • In debug or development mode, you will compile the framework with debug symbols, additional debug prints and outside of a container, which may optimize builds if you're running Docker virtualized (e.g., on macOS). Additionally, it also statically links GLIBC (using the musl toolchain) so the resulting binaries are more portable. This method should only be preferred if you are actively developing the framework.

We will describe the dependencies for compiling branectl, compiling the services in release mode and compiling them in debug mode in the following subsections.

branectl

To compile branectl, we will be depending on Rust's compiler rustc. Additionally, some of Brane's dependencies require some additional packages to be installed too.

Ubuntu / Debian

  1. Install Rust and its tools using rustup:
    # Same command as suggested by Rustup itself
    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    
    Or, if you want a version that installs the default setup and that does not ask for confirmation:
    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- --profile default -y
    
  2. Install the packages required by Brane's dependencies with apt:
    sudo apt-get update && sudo apt-get install -y \
        gcc g++ \
        cmake \
        pkg-config libssl-dev
    

Arch Linux

  1. Install Rust and its tools using rustup:
    # Same command as suggested by Rustup itself
    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    
    Or, if you want a version that installs the default setup and that does not ask for confirmation:
    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- --profile default -y
    
  2. Install the packages required by Brane's dependencies with pacman:
    sudo pacman -Syu \
        gcc g++ \
        cmake \
        pkg-config
    
    • Note that the source for OpenSSL is already provided by openssl (which is a runtime dependency)

macOS

  1. Install Rust and its tools using rustup:
    # Same command as suggested by Rustup itself
    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    
    Or, if you want a version that installs the default setup and that does not ask for confirmation:
    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- --profile default -y
    
  2. Install the Xcode Command-Line Tools, since that contains most of the compilers and other tools we need
    # Follow the prompt after this to install it (might take a while)
    xcode-select --install
    
  3. Install other packages using Homebrew:
    • If you have not installed Homebrew yet:
      # As suggested on their own website
      /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
      
    • Install the packages
      brew install \
          pkg-config \
          openssl
      

info The main compilation script, make.py, is tested for Python 3.7 and higher. If you have an older Python version, you may have to upgrade it first. We recommend using some virtualized environment such as pyenv to avoid breaking your OS or other projects.
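
Whichever platform you are on, you can do a quick sanity check that the Rust toolchain is available on your PATH before continuing (the reported versions will differ per installation):

rustc --version
cargo --version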

The services (release mode)

No dependencies are required to build the services in release mode other than the runtime dependencies. This is because the containers in which the services are built already contain all of the dependencies.

The services (debug mode)

Debug mode is the most work to install, because it relies on statically linking GLIBC using the musl-toolchain.

warning Before you consider installing in debug mode, be aware that the resulting images will be very large (due to the debug symbols and the statically linked GLIBC). Moreover, the build cache kept in between builds is also huge. Make sure you have enough space on your machine available (~10GB) before continuing, and regularly clean the cache yourself to avoid it growing boundlessly.

Note that most of these dependencies overlap with the dependencies for compiling branectl, so you should first install all the dependencies there. Then, extend upon those by doing the following:

Ubuntu / Debian

  1. Install the musl toolchain:
    sudo apt-get install -y musl-tools
    
  2. Add shortcuts to GNU tools that emulate missing musl tools (well enough)
    # You can place these shortcuts anywhere in your PATH
    sudo ln -s /bin/g++ /usr/local/bin/musl-g++
    sudo ln -s /usr/bin/ar /usr/local/bin/musl-ar
    
  3. Add the musl target for Rust:
    rustup target add x86_64-unknown-linux-musl
    

Arch Linux

  1. Install the musl toolchain:
    sudo pacman -Syu musl
    
  2. Add shortcuts to GNU tools that emulate missing musl tools (well enough)
    # You can place these shortcuts anywhere in your PATH
    sudo ln -s /bin/g++ /usr/local/bin/musl-g++
    sudo ln -s /usr/bin/ar /usr/local/bin/musl-ar
    
  3. Add the musl target for Rust:
    rustup target add x86_64-unknown-linux-musl
    

macOS (TODO untested)

  1. Install the musl toolchain:
    brew install filosottile/musl-cross/musl-cross
    
  2. Add shortcuts to GNU tools that emulate missing musl tools (well enough)
    # You can place these shortcuts anywhere in your PATH
    sudo ln -s /bin/g++ /usr/local/bin/musl-g++
    sudo ln -s /usr/bin/ar /usr/local/bin/musl-ar
    
  3. Add the musl target for Rust:
    rustup target add x86_64-unknown-linux-musl
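
Regardless of which platform you are on, you can afterwards check that the musl target has been registered with Rust; the following command should list x86_64-unknown-linux-musl among the installed targets:

rustup target list --installed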
    

Next

Congratulations, you have prepared your machine for running (or compiling) a Brane instance! In the next chapter, we will discuss installing the invaluable node management tool branectl. After that, depending on which node you want to set up, you can follow the guide for installing control nodes or worker nodes.

branectl

Your best friend for managing a Brane node is the Brane server Command-Line Tool, or branectl (not to be confused with the user tool, brane, also known as the Brane CLI).

This chapter concerns itself with installing branectl itself. Make sure that you have followed the previous chapter to install the necessary dependencies before you begin.

Precompiled binary

In most cases, it's easiest to download the precompiled binary from the GitHub repository.

To download it, go to the repository (https://github.com/epi-project/brane), navigate to 'tags' and select your desired release from the list. Alternatively, you can go directly to the latest release by following this link: https://github.com/epi-project/brane/releases/latest.

info Note that branectl was only introduced in version 1.0.0, so any version before that will not have a downloadable branectl executable (or any compatible one, for that matter).

Once downloaded, it's highly recommended to move the executable to a location in your PATH (for example, /usr/local/bin). You can do so by running:

sudo mv ./branectl /usr/local/bin/branectl

if you are in the folder where you downloaded the tool.

Alternatively, you can also download the latest version using curl from the command-line:

# For Linux (x86-64)
sudo curl -Lo /usr/local/bin/branectl https://github.com/epi-project/brane/releases/latest/download/branectl-linux-x86_64

# For macOS (Intel)
sudo curl -Lo /usr/local/bin/branectl https://github.com/epi-project/brane/releases/latest/download/branectl-darwin-x86_64

# For macOS (M1/M2)
sudo curl -Lo /usr/local/bin/branectl https://github.com/epi-project/brane/releases/latest/download/branectl-darwin-aarch64

Don't forget to make the executable runnable:

sudo chmod +x /usr/local/bin/branectl
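
To check that the binary is found and runs on your machine, you can ask it for its help text:

branectl --help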

Compile it yourself

Sometimes, though, the executable provided on the repository doesn't suit your needs. This is typically the case if you need a cutting-edge version that isn't released yet, if you have an uncommon OS or processor architecture, or if your GLIBC version is incompatible.

In that case, it's necessary to compile the branectl executable yourself.

To do so, first make sure that you have installed the compilation dependencies of branectl as discussed in the previous chapter.

Then, you can clone the repository to obtain the source code:

# Will clone to './brane'
git clone https://github.com/epi-project/brane

Navigate to the source directory, and then use the make.py script to compile branectl:

# Replace './brane' with some other path if needed
cd ./brane
./make.py ctl

The make.py script will handle the rest.

You can also compile the ctl in development mode (i.e., with added debug statements and symbols) by appending the --dev flag:

./make.py ctl --dev

Finally, you can also compile the binary for another architecture:

# To compile for M1 macs on a Linux machine, for example
./make.py ctl --os macOS --arch aarch64

Note, however, the additional dependencies if you do so.
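
Once the build finishes, you can install the resulting binary in the same way as the precompiled one. The exact output location depends on the make script; assuming it ends up under ./target/release (or ./target/debug for --dev builds), something along these lines should work:

sudo cp ./target/release/branectl /usr/local/bin/branectl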

Next

If you can now run branectl --help without errors, congratulations! You have successfully installed the management tool for the Brane instance.

You can now choose what kind of node to install. To install a central node, go to the next chapter; go to the chapter after that to install a worker node; or go to the final chapter to set up a proxy node.

Control node

Before you follow the steps in this chapter, we assume you have installed the required dependencies and installed branectl, as discussed in the previous two chapters.

If you did, then you are ready to install the control node. This chapter will explain how to do that.

Obtaining images

Just as with branectl itself, there are two ways of obtaining the Docker images and related resources: downloading them from the repository or compiling them. Note, however, that multiple files should be downloaded; and to aid with this, the branectl executable can be used to automate the downloading process for you.

info In the future, a third option might be to download the standard images from DockerHub. However, due to the experimental nature of the framework, the images are not yet published. Instead, rely on branectl to make the process easy for you.

Downloading prebuilt images

The recommended way to download the Brane images is to use branectl. It downloads the images to .tar files, which can be sent around at your leisure; and, if you will be deploying the framework on a device where internet is limited or restricted, you can also use it to download Brane's auxiliary images (ScyllaDB).

Run the following command to download the Brane services themselves:

# Download the images
branectl download services central -f

And to download the auxiliary images (run in addition to the previous command):

branectl download services auxillary -f

(the -f will automatically create missing directories for the target output path)

Once these complete successfully, you should have the images for the control node in the directory target/release. While this path may be changed, it is recommended to stick to the default to make the commands in subsequent sections easier.
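You can do a quick sanity check that the image .tar files are actually there (the exact file names depend on the Brane version you downloaded):

ls target/release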

info By default, branectl will download the version for which it was compiled. However, you can change this with the --version option:

# You should change this on all download commands
branectl download services central --version 1.0.0

Note, however, that not every Brane version may have the same services or the same method of downloading, and so this option may fail. Download the branectl for the desired version instead for a more reliable experience.

Compiling the images

The other way to obtain the images is to compile them yourself.

Make sure that you have installed the additional compilation dependencies before continuing (and make sure you match the mode you choose below).

There are two modes of compilation:

  • In release mode, you will compile the framework directly in the containers that will be using it. This is the recommended method in most cases.
  • In debug or development mode, you will compile the framework with debug symbols, additional debug prints and outside of a container which optimizes repeated recompilation. Additionally, it also statically links GLIBC so the resulting binaries are very portable. This method should only be preferred if you are actively developing the framework.

warning Before you consider installing in debug mode, be aware that the resulting images will be very large (due to the debug symbols and the statically linked GLIBC). Moreover, the build cache kept in between builds is also huge. Make sure you have enough space on your machine available (~10GB) before continuing, and regularly clean the cache yourself to avoid it growing boundlessly.

Regardless of which one you choose, though, clone the repository first:

# Will clone to './brane'
git clone https://github.com/epi-project/brane

Navigate to the source directory, and then use the make.py script to compile the service images:

# Run the compilation in release mode
cd ./brane && ./make.py instance

# Run the compilation in debug mode (note the '--dev')
cd ./brane && ./make.py instance --dev

The make.py script will handle the rest, compiling the Docker images to the target/release directory for release mode, and target/debug for the debug mode.

Generating configuration

Once you have downloaded the images, it is time to set up the configuration files for the node. These files determine the type of node, as well as any of the node's properties and network specifications.

For a control node, this means generating the following files:

  • An infrastructure file (infra.yml), which will determine the worker nodes available in the instance;
  • A proxy file (proxy.yml), which describes if any proxying should occur and how; and
  • A node file (node.yml), which will contain the node-specific configuration like service names, ports, file locations, etc.

All of these can be generated with branectl for convenience.

First, we generate the infra.yml file. This can be done using the following command:

branectl generate infra <ID>:<ADDR> ...

Here, multiple <ID>:<ADDR> pairs can be given, one per worker node that is available to the instance. In such a pair, the <ID> is the location ID of that domain (which must be the same as indicated in that node; see the chapter for setting up worker nodes), and the <ADDR> is the address (IP or hostname) where that domain is available.

For example, suppose that we want to instantiate a central node for a Brane instance with two worker nodes: one called amy, at amy-worker-node.com, and one called bob, at 1.2.3.4. We would generate an infra.yml as follows:

branectl generate infra -f -p ./config/infra.yml amy:amy-worker-node.com bob:1.2.3.4

Running this command will generate the file ./config/infra.yml for you, with default settings for each domain. If you want to change these, you can simply use more options and flags in the tool itself (see the branectl documentation or the builtin branectl generate infra --help), or change the file manually (see the infra.yml documentation).
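To give a rough impression of what is generated: the file is a small YAML document that maps each location ID to the addresses on which that domain's services can be reached. The sketch below is purely illustrative (the field names and placeholders are assumptions, not the actual schema); consult the infra.yml documentation for the authoritative format.

# Hypothetical sketch of an infra.yml -- see the infra.yml documentation for the real schema
locations:
  amy:
    name: amy
    delegate: grpc://amy-worker-node.com:<DELEGATE PORT>
    registry: https://amy-worker-node.com:<REGISTRY PORT>
  bob:
    name: bob
    delegate: grpc://1.2.3.4:<DELEGATE PORT>
    registry: https://1.2.3.4:<REGISTRY PORT>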

info While the -f flag (fix missing directories) and the -p option (path of generated file) are not required, you will typically use these to make your life easier down the road. See the branectl generate node command below to find out why.

Next, we will generate the proxy.yml file. Typically, this configuration can be left to the default settings, and so the following command will do the trick in most situations:

branectl generate proxy -f -p ./config/proxy.yml

A proxy.yml file should be available in ./config/proxy.yml after running this command.

The contents of this file will typically only differ if you have advanced networking requirements. If so, consult the branectl documentation or the builtin branectl generate proxy --help, or the proxy.yml documentation.

info This file may be skipped if you are setting up an external proxy node for this node. See the chapter on proxy nodes for more information.

Then we will generate the final file, the node.yml file. This file is generated last, because it itself defines where the Brane software may find any of the other configuration files.

When generating this file, it is possible to manually specify where to find each of those files. However, in practice, it is more convenient to make sure that the files are at the default locations that the tool expects. The following tree structure displays the default locations for the configuration of a central node:

<current dir>
├ config
│ ├ certs
│ │ └ <domain certs>
│ ├ infra.yml
│ └ proxy.yml
└ node.yml

The config/certs directory will be used to store the certificates for each of the domains; we will do that in the following section.

Assuming that you have the files stored as above, the following command can be used to create a node.yml for a central node:

branectl generate node -f central <HOSTNAME>

Here, <HOSTNAME> is the address where any worker node may reach the central node. Only the hostname will suffice (e.g., some-domain.com), but any scheme or path you supply will be automatically stripped away.

The -f flag will make sure that any of the missing directories (e.g., config/certs) will be generated automatically.
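For example, if the central node is reachable at central-domain.com (a placeholder hostname), the command becomes:

branectl generate node -f central central-domain.com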

Once again, you can change many of the properties in the node.yml file by specifying additional command-line options (see the branectl documentation or the builtin branectl generate node --help) or by changing the file manually (see the node.yml documentation).

warning Due to a bug in one of the framework's dependencies, it cannot handle certificates on IP addresses. To work around this issue, the -H option is provided; it can be used to specify a certain hostname/IP mapping for this node only. Example:

# We can address '1.2.3.4' with 'bob-domain' now
branectl generate node -f -H bob-domain:1.2.3.4 central central-domain.com

Note that this is local to this domain only; you have to specify this on other nodes as well. For more information, see the node.yml documentation.

info Since the above is highly localized, it can be abused to do node-specific routing, by assigning the same hostname to different IPs on different machines. Definitely entering "hacky" territory here, though...

Adding certificates

Before the framework can be fully used, the central node will need the public certificates of the worker nodes to be able to verify their identity during connection. Since we assume Brane may be running in a decentralized and shielded environment, the easiest approach is to add each domain's certificate to the config/certs directory.

To do so, obtain the public certificate of each of the workers in your instance. Then, navigate to the config/certs directory (or wherever you pointed it to in node.yml), and do the following for each certificate:

  1. Create a directory with that domain's name (for the example above, you would create a directory named amy for that domain)
  2. Move the certificate to that folder and call it ca.pem.

At runtime, the Brane services will look for the peer domain's identity by looking up the folder with their name in it. Thus, make sure that every worker in your system has a name that your filesystem can represent.
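
For the amy example above, and assuming you received her public certificate as ./amy-ca.pem (a hypothetical file name), this boils down to:

# Create the directory named after the domain and move the certificate there as ca.pem
mkdir -p ./config/certs/amy
mv ./amy-ca.pem ./config/certs/amy/ca.pem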

Launching the instance

Finally, now that you have the images and the configuration files, it's time to start the instance.

We assume that you have installed your images to target/release. If you have built your images in development mode, however, they will be in target/debug; see the box below for the appropriate command.

This can be done with one branectl command:

branectl start central

This will launch the services in the local Docker daemon, which completes the setup!

info The command above assumes default locations for the images (./target/release) and for the node.yml file (./node.yml). If you use non-default locations, however, you can use the following flags:

  • Use -n or --node to specify another location for the node.yml file:
    branectl -n <PATH TO NODE YML> start central
    
    It will define where the rest of the configuration files are located.
  • If you have installed all images to another folder than ./target/release (e.g., ./target/debug), you can use the quick option --image-dir to change the folders. Specifically:
    branectl start --image-dir "./target/debug" central
    
  • If you want to use pre-downloaded images for the auxiliary services (aux-scylla) that are in the same folder as the one indicated by --image-dir, you can specify --local-aux to use those local versions instead:
    branectl start central --local-aux
    
  • You can also specify the location of each image individually. To see how, refer to the branectl documentation or the builtin branectl start --help.

warning Note that the Scylla database this command launches might need a minute to come online, even though its container already reports as ready. Thus, before you can use your instance, wait until docker ps shows all Brane containers running (in particular, the brane-api service will crash until the Scylla service is done). You can use watch docker ps if you don't want to re-run the command yourself.

Next

Congratulations, you have configured and setup a Brane control node!

Depending on which domains you are in charge of, you may also have to set up one or more worker nodes or proxy nodes. Note, though, that those chapters are written to be used on their own, so parts of them overlap with this chapter.

Otherwise, you can move on to other work! If you want to test your instance like a normal user, you can go to the documentation for Software Engineers or Scientists.

Worker node

Before you follow the steps in this chapter, we assume you have installed the required dependencies and installed branectl, as discussed in the previous two chapters.

If you did, then you are ready to install a worker node. This chapter will explain how to do that.

Obtaining images

Just as with branectl itself, there are two ways of obtaining the Docker images: downloading them from the repository or compiling them. Note, however, that multiple files should be downloaded; and to aid with this, the branectl executable can be used to automate the downloading process for you.

info In the future, a third option might be to download the standard images from DockerHub. However, due to the experimental nature of the framework, the images are not yet published. Instead, rely on branectl to make the process easy for you.

Downloading prebuilt images

The recommended way to download the Brane images is to use branectl. It downloads the images to .tar files, which can be sent around at your leisure.

Run the following command to download the Brane service images for a worker node:

# Download the images
branectl download services worker -f

(the -f will automatically create missing directories for the target output path)

Once these complete successfully, you should have the images for the worker node in the directory target/release. While this path may be changed, it is recommended to stick to the default to make the commands in subsequent sections easier.

info By default, branectl will download the version for which it was compiled. However, you can change this with the --version option:

branectl download services worker -f --version 1.0.0

Note, however, that not every Brane version may have the same services or the same method of downloading, and so this option may fail. Download the branectl for the desired version instead for a more reliable experience.

Compiling the images

The other way to obtain the images is to compile them yourself.

Make sure that you have installed the additional compilation dependencies before continuing (and make sure you match the mode you choose below).

There are two modes of compilation:

  • In release mode, you will compile the framework directly in the containers that will be using it. This is the recommended method in most cases.
  • In debug or development mode, you will compile the framework with debug symbols, additional debug prints and outside of a container which optimizes repeated recompilation. Additionally, it also statically links GLIBC so the resulting binaries are very portable. This method should only be preferred if you are actively developing the framework.

warning Before you consider installing in debug mode, be aware that the resulting images will be very large (due to the debug symbols and the statically linked GLIBC). Moreover, the build cache kept in between builds is also huge. Make sure you have enough space on your machine available (~10GB) before continuing, and regularly clean the cache yourself to avoid it growing boundlessly.

Regardless of which one you choose, though, clone the repository first:

# Will clone to './brane'
git clone https://github.com/epi-project/brane

Navigate to the source directory, and then use the make.py script to compile the service images:

# Run the compilation in release mode
cd ./brane && ./make.py worker-instance

# Run the compilation in debug mode (note the '--dev')
cd ./brane && ./make.py worker-instance --dev

The make.py script will handle the rest, compiling the Docker images to the target/release directory for release mode, and target/debug for the debug mode.

Generating configuration

Once you have downloaded the images, it is time to set up the configuration files for the node. These files determine the type of node, as well as any of the node's properties and network specifications.

For a worker node, this means generating the following files:

  • A backend file (backend.yml), which defines how the worker node connects to the backend that will actually execute the tasks;
  • A proxy file (proxy.yml), which describes if any proxying should occur and how;
  • A policy secret for the deliberation API (policy_deliberation_secret.json), which contains the private key for accessing the Brane-side of brane-chk;
  • A policy secret for the policy expert API (policy_expert_secret.json), which contains the private key for accessing the management-side of brane-chk;
  • A policy database (policies.db), which is the persistent storage for brane-chk's policies; and
  • A node file (node.yml), which will contain the node-specific configuration like service names, ports, file locations, etc.

All of these can be generated with branectl for convenience.

We will first generate a backend.yml file. This will define how the worker node can connect to the infrastructure that will actually execute incoming containers. Multiple backend types are possible (see the series of chapters on backends), but by default, the configuration assumes that work will be executed on the local machine's Docker daemon.

Thus, to generate such a backend.yml file, you can use the following command:

branectl generate backend -f -p ./config/backend.yml local

Running this command will generate the file ./config/backend.yml for you, with default settings for how to connect to the local daemon. If you want to change these, you can simply use more options and flags in the tool itself (see the branectl documentation or the builtin branectl generate backend --help), or change the file manually (see the backend.yml documentation).

info While the -f flag (--fix-dirs, fix missing directories) and the -p option (--path, path of generated file) are not required, you will typically use these to make your life easier down the road. See the branectl generate node command below to find out why.

Next up is the proxy.yml file. Typically, this can be left to the default settings, and so the following command will do the trick in most situations:

branectl generate proxy -f -p ./config/proxy.yml

A proxy.yml file should be available in ./config/proxy.yml after running this command.

The contents of this file will typically only differ if you have advanced networking requirements. If so, consult the branectl documentation or the builtin branectl generate proxy --help, or the proxy.yml documentation.

info This file may be skipped if you are setting up an external proxy node for this node. See the chapter on proxy nodes for more information.

Next, we will generate the policy keys. To do so, run the following two commands:

branectl generate policy_secret -f -p ./config/policy_deliberation_secret.json
branectl generate policy_secret -f -p ./config/policy_expert_secret.json

The default settings should suffice. If not, check branectl generate policy_secret --help for more information.

Then, we will generate the policy database. This is not a configuration file, but does need to be bootstrapped and explicitly passed to the node's brane-chk service. To generate it, run:

branectl generate policy_db -f -p ./policies.db

Finally, we will generate the node.yml file. This file is generated last, because it itself defines where the Brane software may find any of the other files.

When generating this file, it is possible to manually specify where to find each of those files. However, in practice, it is more convenient to make sure that the files are at the default locations that the tool expects. The following tree structure displays the default locations for the configuration of a worker node:

<current dir>
├ config
│ ├ certs
│ │ └ <domain certs>
│ ├ backend.yml
│ ├ policy_deliberation_secret.json
│ ├ policy_expert_secret.json
│ └ proxy.yml
├ policies.db
└ node.yml

The config/certs directory will be used to store the certificates for this worker node and any node it wants to download data from. We will do that in the following section.

Assuming that you have the other configuration files stored at their default locations, the following command can be used to create a node.yml for a worker node:

branectl generate node -f worker <HOSTNAME> <LOCATION_ID>

Here, the <HOSTNAME> is the address where the central node and other worker nodes may reach this worker node. Only the hostname will suffice (e.g., some-domain.com), but any scheme or path you supply will be automatically stripped away. Then, the <LOCATION_ID> is the identifier that the system will use for your location. Accordingly, it must be unique in the instance, and you must choose the same one as defined in the central node of the instance.

The -f flag will make sure that any of the missing directories (e.g., config/certs) will be generated automatically.

For example, we can generate a node.yml file for a worker with the identifier bob:

branectl generate node -f worker 1.2.3.4 bob

Once again, you can change many of the properties in the node.yml file by specifying additional command-line options (see the branectl documentation or the builtin branectl generate node --help) or by changing the file manually (see the node.yml documentation).

warning Due to a bug in one of the framework's dependencies, it cannot handle certificates on IP addresses. To work around this issue, the -H option is provided; it can be used to specify a certain hostname/IP mapping for this node only. Example:

# We can address '1.2.3.4' with 'some-domain' now
branectl generate node -f -H some-domain:1.2.3.4 worker bob-domain.com bob

Note that this is local to this domain only; you have to specify this on other nodes as well. For more information, see the node.yml documentation.

Generating certificates

In contrast to setting up a control node, a worker node will have to strongly identify itself to prove to other worker nodes who it is. This is relevant because worker nodes may want to download data from one another; and if a dataset is private, the other domains likely won't share it unless they know who they are talking to.

In Brane, the identity of domains is proven by the use of X.509 certificates. Thus, before you can start your worker node, we will have to generate some certificates.

Server-side certificates

Every worker node is required to have at least a certificate authority (CA) certificate and a server certificate. The former is used as the "authority" of the domain, which signs other certificates such that the worker can later verify that it signed them itself. The latter, in contrast, is used to prove the identity of the worker when it plays the role of a server (i.e., some other domain connects to it and requests a dataset).

Once again, we can use the power of branectl to generate both of these certificates for us. Use the following command to generate both a certificate authority and a server certificate:

branectl generate certs -f -p ./config/certs server <LOCATION_ID> -H <HOSTNAME>

where <LOCATION_ID> is the identifier of the worker node (the one configured in the node.yml file), and <HOSTNAME> is the hostname at which other domains can reach this domain.

You can omit the -H <HOSTNAME> option to default the hostname to the same value as the <LOCATION_ID>. This is useful if you've given manual host mappings when generating the node.yml file (i.e., the -H option there).

For example, to generate certificates for the domain amy that lives at amy-worker-node.com:

branectl generate certs -f -p ./config/certs server amy -H amy-worker-node.com

This should generate multiple files in the ./config/certs directory, chief of which are ca.pem and server.pem.
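If you want to inspect what was generated, the OpenSSL you installed as a runtime dependency can show the CA certificate's subject and validity period, for example:

openssl x509 -in ./config/certs/ca.pem -noout -subject -dates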

info Certificate generation is done using cfssl, which is dynamically downloaded by branectl. The checksum of the downloaded file is verified, and if you ever see a checksum-related error, you might be dealing with a fake binary served under a real address. In that case, tread with care.

When the certificates are generated, be sure to share ca.pem with the central node. If you are also administrating that node, see here for instructions on what to do with it.

Client-side certificates

The previous certificates only authenticate a server to a client; not the other way around. That is where the client certificates come into play.

The power of client certificates comes from the fact that they are signed using the certificate authority of the domain to which they want to authenticate. In other words, the domain has to "approve" that a certain user exists by creating a certificate for them, and then sending it over.

Note, however, that currently, Brane does not use any hostnames or IPs embedded in the client certificate. This means that anyone with the client certificate can obtain access to the domain as if they were the user for which it was issued. Treat the certificates with care, and be sure that the client is also careful with the certificate.

info If a certificate is leaked or compromised, don't worry; the certificate only proves the identity of a user. What kind of rights that user has can be separately determined (see the chapter series for policy experts), and so you can simply withdraw any rights that user has when it happens.

To generate a client certificate, it's easiest to navigate to the ./config/certs directory where you generated the server certificates. Then, you can run:

branectl generate certs client <LOCATION_ID> -H <HOSTNAME> -f -p ./client-certs

Note that the <LOCATION_ID> is now the ID of the worker for which you are generating the certificate, and <HOSTNAME> is their address. Similarly to server certificates, you can omit -H <HOSTNAME> to default to the <LOCATION_ID>.

warning Note the -f and -p options. These are optional, and work together to redirect the output of the commands to a nested folder called client-certs. This is, however, highly recommended, since running this command without these options in the server certificates folder will accidentally clear the ca.pem file, rendering the rest of the certificates useless.

For example, continuing the example in the previous subsection, we now generate a client certificate for bob, who lives at 1.2.3.4:

branectl generate certs client bob -H 1.2.3.4

Once the client certificates are generated, you can share the ca.pem and client-id.pem files with the client who intends to connect to this node.

Adding client certificates of other domains

If your worker node needs to download data from other worker nodes, you will have to add the client certificates they generated to your configuration.

The procedure to do so is identical to that for central nodes. For every pair of ca.pem and client-id.pem certificates, you should:

  1. Create a directory with that domain's name in the certs directory (for the example, you would create a directory named certs/amy for a domain named amy)
  2. Move the certificates to that folder.

At runtime, whenever your worker node needs to download a dataset from another worker, it will read the certificates in that worker's folder (if they exist) to authenticate itself.
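
Continuing the amy example, and assuming her certificates arrived as ./amy-ca.pem and ./amy-client-id.pem (hypothetical file names), this boils down to:

# Create the directory named after the domain and move both certificates there
mkdir -p ./config/certs/amy
mv ./amy-ca.pem ./config/certs/amy/ca.pem
mv ./amy-client-id.pem ./config/certs/amy/client-id.pem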

Writing policies

Before you launch the instance, you may want to change the node's policy. If not, the default policy kicks in, which is to deny everything.

You can learn how to do so by assuming the role of a policy expert and learning how to manage policy. You can skip most of the installation, except perhaps for some practical test environments.

Launching the instance

Finally, now that you have the images and the configuration files, it's time to start the instance.

We assume that you have installed your images to target/release. If you have built your images in development mode, however, they will be in target/debug; see the box below for the appropriate command.

This can be done with one branectl command:

branectl start worker

This will launch the services in the local Docker daemon, which completes the setup!

info The command above assumes default locations for the images (./target/release) and for the node.yml file (./node.yml). If you use non-default locations, however, you can use the following flags:

  • Use -n or --node to specify another location for the node.yml file:
    branectl -n <PATH TO NODE YML> start worker
    
    It will define where the rest of the configuration files are located.
  • If you have installed all images to another folder than ./target/release (e.g., ./target/debug), you can use the quick option --image-dir to change the folders. Specifically:
    branectl start --image-dir "./target/debug" worker
    
  • You can also specify the location of each image individually. To see how, refer to the branectl documentation or the builtin branectl start --help.

Next

Congratulations, you have configured and setup a Brane worker node!

If you are in charge of more worker nodes, you can repeat the steps in this chapter to add more. If you are also charged with setting up a control node, you can check the previous chapter for control-node specific instructions.

Alternatively, you can also see if a proxy node is something for your use-case in the next chapter.

Otherwise, you can move on to other work! If you want to test your node like a normal user, you can go to the documentation for Software Engineers or Scientists.

Proxy node

Before you follow the steps in this chapter, we assume you have installed the required dependencies and installed branectl, as discussed in the previous two chapters.

If you did, then you are ready to install a proxy node. This chapter will explain how to do that.

Obtaining images

Just as with branectl itself, there are two ways of obtaining the Docker images: downloading them from the repository or compiling them. Note, however, that multiple files should be downloaded; and to aid with this, the branectl executable can be used to automate the downloading process for you.

info In the future, a third option might be to download the standard images from DockerHub. However, due to the experimental nature of the framework, the images are not yet published. Instead, rely on branectl to make the process easy for you.

Downloading prebuilt images

The recommended way to download the Brane images is to use branectl. It downloads the images to .tar files, which can be sent around at your leisure.

Run the following command to download the Brane service images for a proxy node:

# Download the images
branectl download services proxy -f

(the -f will automatically create missing directories for the target output path)

Once these complete successfully, you should have the images for the proxy node in the directory target/release. While this path may be changed, it is recommended to stick to the default to make the commands in subsequent sections easier.

info By default, branectl will download the version for which it was compiled. However, you can change this with the --version option:

branectl download services proxy -f --version 1.0.0

Note, however, that not every Brane version may have the same services or the same method of downloading, and so this option may fail. Download the branectl for the desired version instead for a more reliable experience.

Compiling the images

The other way to obtain the images is to compile them yourself.

Make sure that you have installed the additional compilation dependencies before continuing (and make sure you match the mode you choose below).

There are two modes of compilation:

  • In release mode, you will compile the framework directly in the containers that will be using it. This is the recommended method in most cases.
  • In debug or development mode, you will compile the framework with debug symbols and additional debug prints, and outside of a container, which speeds up repeated recompilation. Additionally, it statically links GLIBC so the resulting binaries are very portable. This method should only be preferred if you are actively developing the framework.

warning Before you consider installing in debug mode, be aware that the resulting images will be very large (due to the debug symbols and the statically linked GLIBC). Moreover, the build cache kept in between builds is also huge. Make sure you have enough space on your machine available (~10GB) before continuing, and regularly clean the cache yourself to avoid it growing boundlessly.

Regardless of which one you choose, though, clone the repository first:

# Will clone to './brane'
git clone https://github.com/epi-project/brane

Navigate to the source directory, and then use the make.py script to compile the images for the proxy node:

# Run the compilation in release mode
cd ./brane && ./make.py proxy-instance

# Run the compilation in debug mode (note the '--dev')
cd ./brane && ./make.py proxy-instance --dev

The make.py script will handle the rest, compiling the Docker images to the target/release directory for release mode, and target/debug for the debug mode.

Generating configuration

Once you have obtained the images, it is time to set up the configuration files for the node. These files determine the type of node, as well as any of the node's properties and network specifications.

For a proxy node, this means generating the following files:

  • A proxy file (proxy.yml), which describes if any proxying should occur and how; and
  • A node file (node.yml), which will contain the node-specific configuration like service names, ports, file locations, etc.

Both of these can be generated with branectl for convenience.

We first generate the proxy.yml file. Typically, this can be left at the default settings, and so the following command will do the trick in most situations:

branectl generate proxy -f -p ./config/proxy.yml

A proxy.yml file should be available in ./config/proxy.yml after running this command.

The contents of this file will typically only differ if you have advanced networking requirements. If so, consult the branectl documentation or the builtin branectl generate proxy --help, or the proxy.yml documentation.

info While the -f flag (--fix-dirs, fix missing directories) and the -p option (--path, path of generated file) are not required, you will typically use these to make your life easier down the road. See the branectl generate node command below to find out why.

Then we will generate the node.yml file. This file is generated last, because it defines where the Brane software may find all of the other configuration files.

When generating this file, it is possible to manually specify where to find each of those files. In practice, however, it is more convenient to make sure that the files are at the default locations that the tool expects. The following tree structure displays the default locations for the configuration of a proxy node:

<current dir>
├ config
│ ├ certs
│ │ └ <domain certs>
│ └ proxy.yml
└ node.yml

The config/certs directory will be used to store the certificates for this proxy node and any node it wants to download data from. We will do that in the following section.

Assuming that you have the other configuration files stored at their default locations, the following command can be used to create a node.yml for a proxy node:

branectl generate node -f proxy <HOSTNAME>

Here, the <HOSTNAME> is the address where any other node may reach the proxy node. The hostname alone will suffice (e.g., some-domain.com); any scheme or path you supply will be automatically stripped away.

The -f flag will make sure that any of the missing directories (e.g., config/certs) will be generated automatically.

For example, we can generate a node.yml file for a proxy found at 1.2.3.4:

branectl generate node -f proxy 1.2.3.4

Once again, you can change many of the properties in the node.yml file by specifying additional command-line options (see the branectl documentation or the builtin branectl generate node --help) or by changing the file manually (see the node.yml documentation).

warning Due to a bug in one of the framework's dependencies, it cannot handle certificates on IP addresses. To work around this issue, the -H option is provided; it can be used to specify a certain hostname/IP mapping for this node only. Example:

# We can address '1.2.3.4' with 'some-domain' now
branectl generate node -f -H some-domain:1.2.3.4 proxy bob-domain.com

Note that this is local to this domain only; you have to specify this on other nodes as well. For more information, see the node.yml documentation.

Generating certificates

In contrast to setting up a control node, a proxy node will have to strongly identify itself to prove to other nodes who it is. This is relevant because worker nodes may want to download data from one another through their proxy nodes; and if that data is private, the other domain likely won't share it unless it knows who it is talking to.

In Brane, the identity of domains is proven by the use of X.509 certificates. Thus, before you can start your proxy node, we will have to generate some certificates.

Server-side certificates

Every proxy node is required to have at least a certificate authority (CA) certificate and a server certificate. The first acts as the "authority" of the domain and is used to sign other certificates, so that the proxy can later verify that it signed them itself. The latter, in contrast, is used to prove the identity of the proxy when it plays the role of a server (i.e., some other domain connects to us and requests a dataset).

Once again, we can use the power of branectl to generate both of these certificates for us. Use the following command to generate both a certificate authority and a server certificate:

branectl generate certs -f -p ./config/certs server <LOCATION_ID> -H <HOSTNAME>

where <LOCATION_ID> is the identifier of the proxy node (the one configured in the node.yml file), and <HOSTNAME> is the hostname that other domains use to connect to this domain.

You can omit the -H <HOSTNAME> flag to default the hostname to the same value as the <LOCATION_ID>. This is useful if you've given manual host mappings when generating the node.yml file (i.e., the -H option there).

For example, to generate certificates for the domain amy that lives at amy-proxy-node.com:

branectl generate certs -f -p ./config/certs server amy -H amy-proxy-node.com

This should generate multiple files in the ./config/certs directory, chief of which are ca.pem and server.pem.

info Certificate generation is done using cfssl, which is dynamically downloaded by branectl. The checksum of the downloaded file is verified, and if you ever see a checksum-related error, you might be dealing with a fake binary being served under the real address. In that case, tread with care.

When the certificates are generated, be sure to share ca.pem with the central node. If you are also administrating that node, see here for instructions on what to do with it.
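How you transfer the file is up to you; as one illustrative option (the hostname and destination path below are made up), you could copy it over with scp:

# Copy the CA certificate to the administrator of the central node
scp ./config/certs/ca.pem admin@central-node.example.com:~/incoming-certs/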

Client-side certificates

The previous certificates only authenticate a server to a client; not the other way around. That is where the client certificates come into play.

The power of client certificates comes from the fact that they are signed using the certificate authority of the domain to which they want to authenticate. In other words, the domain has to "approve" that a certain user exists by creating a certificate for them, and then sending it over.

Note, however, that currently, Brane does not use any hostnames or IPs embedded in the client certificate. This means that anyone with the client certificate can obtain access to the domain as if they were the user for which it was issued. Treat the certificates with care, and be sure that the client is also careful with the certificate.

info If a certificate is leaked or compromised, don't worry; the certificate only proves the identity of a user. What kind of rights that user has can be separately determined (see the chapter series for policy experts), and so you can simply withdraw any rights that user has when it happens.

To generate a client certificate, it's easiest to navigate to the ./config/certs directory where you generated the server certificates. Then, you can run:

branectl generate certs client <LOCATION_ID> -H <HOSTNAME> -f -p ./client-certs

Note that the <LOCATION_ID> is now the ID of the proxy for which you are generating the certificate, and <HOSTNAME> is its address. Similarly to server certificates, you can omit -H <HOSTNAME> to default to the <LOCATION_ID>.

warning Note the -f and -p options. These are optional, and work together to redirect the output of the command to a nested folder called client-certs. This is highly recommended, however, since running the command without them in the server certificates folder will accidentally clear the ca.pem file, rendering the rest of the certificates useless.

For example, continuing the example in the previous subsection, we now generate a client certificate for bob at bobs-emporium.com:

branectl generate certs client bob -H bobs-emporium.com -f -p ./client-certs

Once the client certificates are generated, you can share the ca.pem and client-id.pem files with the client who intends to connect to this node.

Adding client certificates of other domains

If your proxy node needs to download data from other nodes, you will have to add the client certificates they generated to your configuration.

The procedure to do so is identical to that for central nodes. For every pair of ca.pem and client-id.pem certificates, you should do the following (a concrete sketch follows the list):

  1. Create a directory with that domain's name in the certs directory (for the example, you would create a directory named certs/amy for a domain named amy)
  2. Move the certificates to that folder.
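As a concrete sketch, assuming you received amy's certificates in your current working directory and use the default ./config/certs location shown earlier in this chapter:

# Create a folder for the 'amy' domain and move its certificates there
mkdir -p ./config/certs/amy
mv ./ca.pem ./client-id.pem ./config/certs/amy/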

At runtime, whenever your proxy node needs to download a dataset from another node, it will read the certificates in that node's folder (if they exist) to authenticate itself.

Launching the instance

Finally, now that you have the images and the configuration files, it's time to start the instance.

We assume that you have installed your images to target/release. If you have built your images in development mode, however, they will be in target/debug; see the box below for the command then.

This can be done with one branectl command:

branectl start proxy

This will launch the services in the local Docker daemon, which completes the setup!

info The command above assumes default locations for the images (./target/release) and for the node.yml file (./node.yml). If you use non-default locations, however, you can use the following flags:

  • Use -n or --node to specify another location for the node.yml file:
    branectl -n <PATH TO NODE YML> start proxy
    
    It will define the rest of the configuration locations.
  • If you have installed all images to another folder than ./target/release (e.g., ./target/debug), you can use the quick option --image-dir to change the folders. Specifically:
    branectl start --image-dir "./target/debug" proxy
    
  • You can also specify the location of each image individually. To see how, refer to the branectl documentation or the builtin branectl start --help.

Next

Congratulations, you have configured and set up a Brane proxy node!

If you are in charge of more proxy nodes, you can repeat the steps in this chapter to add more. If you are also charged with setting up a control node or worker node, you can check the control node chapter or the worker node chapter, respectively, for node-specific instructions.

Otherwise, you can move on to other work! If you want to test your node like a normal user, you can go to the documentation for Software Engineers or Scientists.

Brane computer backends

Local backend

SSH infrastructures

drawing Documentation for this infrastructure type will be added soon

Kubernetes infrastructures

One of the supported infrastructure types that Brane may orchestrate over is Kubernetes.

Brane is essentially built around containers, each of which represents a package with pieces of code that may be orchestrated in a workflow. Thus, connecting Brane to a system like Kubernetes is an easy choice, and requires only a little preparation from the perspective of the system engineer in charge of the cluster.

Essentially, all that has to be done is provide Brane with the proper credentials to connect to the cluster, and prepare a separate namespace.

drawing For now, Brane containers do not yet need any additional support beyond access to a network so they may reach the Brane control plane. Any distributed file systems are implemented in the container as a Redis networked filesystem.

Prepare the cluster

First, you should create a namespace for Brane to run in. While you can technically also let it run in the default namespace, we highly recommend creating a dedicated namespace to avoid any naming conflicts. For example, run:

kubectl create namespace brane

to create a namespace called brane.

Then, the second step is to provide Brane with a Kubernetes config file that contains the cluster location and credentials to log in.

Because providing the administrator config is very bad practice, it's better to create a new account that only has access to Brane's dedicated namespace. To do so, follow the steps detailed in this tutorial.

That said, any configuration file will do, as long as Brane can launch Jobs and Pods in the namespace where it is supposed to run.
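As a rough sketch of what such a restricted account could look like (the account and binding names are illustrative; the linked tutorial remains the authoritative procedure):

# Create a service account that only lives in the 'brane' namespace
kubectl create serviceaccount brane-sa --namespace brane

# Allow it to manage workloads (Jobs, Pods, etc.) within that namespace only
kubectl create rolebinding brane-sa-edit \
    --clusterrole=edit \
    --serviceaccount=brane:brane-sa \
    --namespace brane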

Information requirements

To add your infrastructure to the Brane instance, the control plane will require the following information:

  • The address of your Kubernetes control plane (with port)
  • A configuration file with the proper tokens and certificates so Brane may access its namespace

And that's it. Because Brane simply submits containers to the cluster, everything else is already supported by a standard Kubernetes installation.

Slurm infrastructures

drawing Documentation for this infrastructure type will be added soon

Introduction

Installation

The Policy File

warning This page is for the deprecated method of entering policies into the system using a policies.yml file. A better method (involving eFLINT) is implemented through the policy-reasoner project.

Brane used to read its policies from a so-called policy file (also known as policies.yml) which defines a very simplistic set of access-control policies.

Typically, there is one such policy file per domain, which instructs the "reasoner" for that domain what it should allow and what not.

In this chapter, we discuss how one might write such a policy file. In particular, we will discuss the general layout of the file, and then the two kinds of policies currently supported: user policies and container policies.

Overview

The policies.yml file is written in YAML for the time being.

It has two sections, each of them corresponding to a kind of policy (users and containers, respectively). Each section is then a simple list of rules. At runtime, the framework will consider the rules top-to-bottom, in order, to find the first rule that says something about the user/dataset pair or the container in question. A full list of available policies can be found below.
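To illustrate that ordering with a small fragment (using policy types that are listed further below): the first matching rule wins, so the catch-all at the end never applies to Amy's access to dataset A.

users:
# Amy may access dataset 'A', because this rule matches first...
- policy: allow
  user: Amy
  data: A
# ...while this catch-all rule denies every other request
- policy: deny_all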

Before that, we will first describe the kinds of policies in some more detail in the following sections.

User policies

User policies concern themselves with what a user may access, and specifically, which datasets they may access. These policies thus always describe some kind of rule on a pair of a user (known by their ID) and a dataset (also known by its ID).

As a policy expert, you may assume that by the time your policy file is consulted, the framework has already verified the user's ID. As for datasets, your policies are only consulted when data is accessed on your own domain, and so you can also assume that dataset IDs used correspond to the desired dataset.

Note that deciding which user IDs and dataset IDs to use should be done in cooperation with the system administrator of your domain. Currently, the framework doesn't provide a safe way of communicating which IDs are available to the policy file, so you will have to retrieve the up-to-date list of IDs the old-fashioned way.

Container policies

Container policies concern themselves with which containers are allowed to be run on a certain domain. Ideally, these would be triplets of users, datasets and containers, but due to time constraints, they currently only feature a container hash (i.e., its ID) that determines whether the container is allowed to be executed or not.

Because the ID of a container is a SHA256 hash, you can safely assume that whatever container you're referencing will actually be the container with the properties you know of it. However, similarly to user policies, there is no list of known container hashes available in the framework itself; thus, this list must be obtained by asking the system administrator or, perhaps more relevant, the scientist who wants to run their container.

Policies

In this section, we describe the concrete policies and their syntax. Remember that policies are checked in-order for a matching rule, and that the framework will throw an error if no matching rule is found.

In general, there are two possible actions to be taken for a given request: allow it, in which case the framework proceeds, or deny it, in which case the framework aborts the request. For each of those actions, though, there are multiple ways of matching a user/dataset pair or a container hash, which results in the different policies described below.

Syntax-wise, the policies are given as a vector of dictionaries, where each dictionary is a policy. Then, every such dictionary must always have the policy key, which denotes its type (see the two sections below). Any other key is policy-dependent.

User policies

The following policies are available for user/dataset pairs:

  • allow: Matches a specific user/dataset pair and allows it.
    • user: The identifier of the user to match.
    • data: The identifier of the dataset to match.
  • deny: Matches a specific user/dataset pair and denies it.
    • user: The identifier of the user to match.
    • data: The identifier of the dataset to match.
  • allow_user_all: Matches all datasets for the given user and allows them.
    • user: The identifier of the user to match.
  • deny_user_all: Matches all datasets for the given user and denies them.
    • user: The identifier of the user to match.
  • allow_all: Matches all user/dataset pairs and allows them.
  • deny_all: Matches all user/dataset pairs and denies them.

Container policies

The following policies are available for containers:

  • allow: Matches a specific container hash and allows it.
    • hash: The hash of the container to match.
    • name (optional): A human-friendly name for the container (no effect on policy, but for debugging purposes).
  • deny: Matches a specific container hash and denies it.
    • hash: The hash of the container to match.
    • name (optional): A human-friendly name for the container (no effect on policy, but for debugging purposes).
  • allow_all: Matches all container hashes and allows them.
  • deny_all: Matches all container hashes and denies them.

Example

The following snippet is an example policy file:

# The user policies
users:
# Allow the user 'Amy' to access the datasets 'A', 'B', but not 'C'
- policy: allow
  user: Amy
  data: A
- policy: allow
  user: Amy
  data: B
- policy: deny
  user: Amy
  data: C

# Specifically deny access to `Dan` to do anything
- policy: deny_user_all
  user: Dan

# For any other case, we deny access
- policy: deny_all



# The container policies
containers:
# We allow the `hello_world` container to be run
- policy: allow
  hash: "GViifYnz2586qk4n7fdyaJB7ykASVuptvZyOpRW3E7o="
  name: hello_world

# But not the `cat` container
- policy: deny
  hash: "W5WS23jAAtjatN6C5PQRb0JY3yktDpFHnzZBykx7fKg="
  name: cat

# Any container not matched is allowed (bad practice, but to illustrate)
- policy: allow_all

Introduction

In these series of chapters, we will discuss how you can develop and then upload packages to the Brane instance for use by scientists and other software engineers.

First, in the next section, we will give a bit of background that will help you understand what you're doing. Then, in the next chapter, we will help you prepare your local machine for Brane package development.

Background & Terminology

In Brane, every kind of job that is executed is done so by submitting a workflow. This is simply a high-level specification of which external functions will be called in what order, and how data is passed between them.

You may think of them as a program, except that it's meant to be more high-level and abstracted over the actual algorithms that are run as part of the execution.

That means that the bulk of the work is done in these external function calls. Because of the modularity of workflows, Brane collects these functions in packages, which may be used in zero or more workflows as independent compute steps.

Technically, these packages are implemented as containers, which means that they might be written in any language (as long as they adhere to the protocol Brane uses to communicate with packages) and will ship together with all required dependencies.

As a consequence, this means that Brane package calls are, in principle, always completely self-contained. After execution, the container is destroyed, removing any work that the package has done. The only way to retrieve results is by either sending them back to the workflow-space directly as a return value (which can contain limited data), or by returning so-called datasets or intermediate results (see the scientist chapters for more background information, or the software engineer's data chapter for practical usage).

Next

Before we will go more in-depth on the functionality and process of developing Brane packages, we will first walk you through setting up your machine for development in the next chapter.

Then, in the chapter after that, we will discuss the different types of packages supported by Brane and how to create them.

Installation

To develop Brane packages, you will need three components:

  • The Brane Command-Line Interface (Brane CLI), which you use to package your code and publish it to an instance
  • A Docker engine, which is used to build the package containers by the Brane CLI
  • Support for your language of choice

The third component, language support, is hard to generalize as it depends on the language you choose. However, there is an important difference in setup between interpreted languages and compiled languages.

For interpreted languages (such as Python), you should set up your machine in such a way that it is able to run the scripts locally (for development purposes). Additionally, you should make sure that you have some way of installing the interpreter (and any dependencies) on Ubuntu (since the Brane containers are based on that OS).

For compiled languages (such as Rust), you should prepare your machine to not only develop but also compile the language for use in an Ubuntu container. Then, you should only package the resulting binaries so that the package container remains as lightweight as possible.

The other two prerequisites will be discussed below.

The Docker engine

First, you should install Docker on the machine that you will use for development. Brane will use this to build the containers, since Docker features an excellent build system. However, Brane also requires you to have the BuildKit plugin installed on top of the normal Docker build system.

To install Docker, refer to their official documentation (macOS, Ubuntu, Debian or Arch Linux). Note that, if you install Docker on Linux, you should make sure that you can execute Docker commands without sudo (see here, first section). Then, you should install the BuildKit plugin by running the following commands:

# Clone the repo, CD into it and install the plugin (check https://github.com/docker/buildx for alternative methods if that fails)
git clone https://github.com/docker/buildx.git && cd buildx
make install

# Set the plugin as the default builder
docker buildx install

# Switch to the buildx driver
docker buildx create --use

The Brane CLI

With Docker installed, you may then install the Brane Command-Line Interface.

You can either download the binary directly from the repository, or build the tool from scratch. The first method should be preferred in most cases, while the latter is only required if you need a non-released version or run Brane on non-x86_64 hardware.

info Note that you probably already installed the Brane Command-Line Interface if you've installed a node on your local machine (follow this guide, for example).

Downloading the binary

To download the Brane CLI binary, use the following commands:

# For Linux
sudo wget -O /usr/local/bin/brane https://github.com/epi-project/brane/releases/latest/download/brane-linux-x86_64

# For macOS (Intel)
sudo wget -O /usr/local/bin/brane https://github.com/epi-project/brane/releases/latest/download/brane-darwin-x86_64

# For macOS (M1/M2)
sudo wget -O /usr/local/bin/brane https://github.com/epi-project/brane/releases/latest/download/brane-darwin-aarch64

These commands download the latest Brane CLI binary for your OS and store it in /usr/local/bin (which is why the command requires sudo). You may install the binary anywhere you like, but don't forget to add the binary to your PATH if you choose a location that is not part of it already.
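Depending on how the release asset was packaged, the downloaded file may not be marked as executable; if running brane gives a "permission denied" error, you can fix that with:

# Mark the downloaded binary as executable
sudo chmod +x /usr/local/bin/brane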

Compiling the binary

You may also compile the binary from source if you need the cutting-edge latest version or are running a system that doesn't have any default binary available.

To do so, you should first install a couple of additional dependencies that are required when building the framework:

  • Install Rust's compiler and the associated Cargo package manager (the easiest is to install using rustup (cross-platform))

    • If you use rustup, don't forget to logout and in again to refresh the PATH.
  • On macOS:

    • Install XCode Command-Line Tools:
      # On macOS 10.9 or higher, running any command that is part of the tools will prompt you to install them:
      git --version
      
    • Install OpenSSL, pkg-config (so the Rust packages find your OpenSSL installation) and CMake:
      # We assume you already have Homebrew (https://brew.sh/) installed
      brew install pkg-config openssl cmake
      
    • Make sure that pkg-config is able to find the OpenSSL installation by running:
      export PKG_CONFIG_PATH="/usr/local/opt/openssl@3/lib/pkgconfig"
      
      (Run this command every time you open a new terminal and want to compile Brane stuff. Alternatively, if you want it to be permanent, add the command to your ~/.zshrc file)
  • On Ubuntu / Debian:

    • Install the build dependencies for Rust packages: GCC (gcc and g++), OpenSSL (headers only), pkg-config, make and CMake:
      sudo apt-get update && sudo apt-get install \
          gcc g++ \
          libssl-dev \
          pkg-config \
          make \
          cmake
      
    • To clone the repository, also install git:
      sudo apt-get install git
      
  • On Arch Linux:

    • Install the build dependencies for Rust packages: GCC, OpenSSL, pkg-config, make and CMake:
      sudo pacman -Syu gcc openssl pkg-config make cmake
      
    • To clone the repository, also install git:
      sudo pacman -Syu git
      

With the dependencies installed, you may then clone the repository and build the Command-Line Interface:

# Clone the repo and CD into it
git clone https://github.com/epi-project/brane && cd brane

# Run the make script to build the CLI
chmod +x ./make.py
./make.py cli

drawing Note that compiling the CLI generates quite a large build cache (~2.4 GB). Be sure to have at least 7 GB available on your device before you start compiling to make sure your OS keeps functioning.

Once done (this may take some time), the resulting binary will be written to ./target/release/brane. You can then copy the binary to /usr/local/bin to make it available in your PATH:

sudo cp ./target/release/brane /usr/local/bin/brane

Alternatively, you can also add the ./target/release folder to your PATH instead (don't forget to prepend the path to the cloned repository, e.g., /home/user/Downloads/brane/target/release).
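For example, for the current shell session only (adjust the path to wherever you cloned the repository):

# Make the freshly built binary available in this shell
export PATH="$PATH:/home/user/Downloads/brane/target/release"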

Next

Now that you have the Brane CLI installed, we will give a brief tutorial on how to start writing packages in the next chapter.

If you would like to know more about the different packages types that Brane supports, check the Packages series of chapters.

Your first package

In this chapter, we will guide you through creating the simplest and most basic package available: the hello world package.

This tutorial assumes that you have experience with programming. In particular, it's useful to know about standard streams and environment variables.

info The code used in this tutorial can be found in examples/doc/hello-world of the repository.

1. Writing the code

Because Brane will package your code as an Ubuntu container, you may choose virtually any language you like to write your code in.

For the purpose of this tutorial (because the code is very simple), we will write in GNU Bash, which is a very commonly used Unix shell.

To begin, create a new directory (which we will call hello-world), and create a file hello_world.sh in it. All it does is print "Hello, world!", and so we only need an echo-statement:

#!/bin/bash
echo 'Hello, world!'

info Don't forget the shebang at the top of the file; this special comment, #!/bin/bash, tells the terminal how it should run this script (using the Bash interpreter, in this case). If you omit it, it will try to run your script as a normal Linux executable - which will not work, as this is not binary code.

However, if we were to build this as a package and launch it in Brane as it is, we wouldn't see anything. That's because Brane doesn't pass the stdout directly to the user; instead, it reads it and parses it as YAML.

Specifically, Brane will expect a YAML file as output that has a certain key/value mapping, where it will only return the result of a specific key. The name of this key is arbitrary; for this tutorial, we will call it output.

Thus, change your script to:

#!/bin/bash
echo 'output: "Hello, world!"'

which just writes the YAML equivalent of a key output with a value Hello, world!.

For now, this is all the code that we will package in a container, and so you can save the script and move to the next section.

2. Creating a container.yml

Every Brane (code) package consists of two components: the code to run and a file describing how to interface with the code. For us, the first part is the hello_world.sh script, and the second is a file conventionally called container.yml.

For this tutorial, we will only focus on the general file structure of the container.yml and how to read package output. The next tutorial will focus on how to partition a package into multiple functions and provide them with input.

The container.yml file describes a couple of things about your package:

  • Metadata (name, version, kind, etc).
  • Files and dependencies that should be added to the container
  • The functions that your package implements and how they can be called.

We will go through these step-by-step.

Create a file container.yml in the hello-world directory, and populate it with the following to start:

# The package metadata
name: hello_world
version: 1.0.0
kind: ecu

The first line (name) specifies the package name, and the second line (version) specifies the package version. Together, they provide a unique identifier for each package. This means that we can have multiple versions of the same package around, which the framework will treat as different packages.

The third line (kind) is the most important one, because this specifies that this package contains arbitrary code (Executable Code Unit; see the packages series of chapters).

Next, we will specify the dependencies of this package. Because Bash is installed in the Ubuntu image by default, we only have to provide the files that should be copied over to the container, and then which file the container should run. Do this by adding the following to your file:

...

# Specify the files to copy over (relative to the container.yml file)
files:
- hello_world.sh

# Specify which file to run
entrypoint:
  # 'task' means the script should be run synchronously (i.e., blocking)
  kind: task
  exec: hello_world.sh

As you can see, Brane supports only one entry point, even though a package may contain multiple functions. As you will see in the next tutorial, Brane will tell your entrypoint which function to run by specifying certain command-line arguments or environment variables. However, because our script contains only one function, we do not worry about this; every time it is called, it will only ever have to return the Hello, world! message.

However, we still have to define this function. To do so, add the following lines to your container.yml:

...

actions:
  "hello_world":
    command:
    input:
    output:
    - type: string
      name: output

This defines a function with the identifier hello_world, which requires no input (input: is empty) and also doesn't need any command-line arguments passed to the script (command: is empty). What it does define, however, is that it should return the value of the output-key in the function's output. We define that value to be of type string, and the name of the key corresponds to the one we set in the hello_world.sh Bash script.

With that defined, your container.yml file should now look like this:

# Container.yml for the hello_world package
name: hello_world
version: 1.0.0
kind: ecu

files:
  - hello_world.sh

entrypoint:
  kind: task
  exec: hello_world.sh

actions:
  'hello_world':
    command:
    input:
    output:
    - name: output
      type: string

We are now ready to build the package.

3. Building the package

To build a package, we will finally use the Brane CLI. We will assume that you have named it brane, and that it is reachable under the PATH of your machine.

To build the package, simply run the following from within the hello-world directory:

brane build ./container.yml

While the command above seems simple, there are a couple of semantics to think about (a combined example follows the list):

  • All relative paths in the file are relative to the container.yml file; use the --workdir option to change the working directory.
  • Brane will automatically try to deduce the kind of the package based on the name of the file you specify. container.yml will default to an ecu package (see the packages series of chapters). To change this, or if Brane could not deduce the package kind, use the --kind option to manually specify it.
  • The CLI will automatically download the branelet executable that will live in the container from the repository. However, if you have a non-released version of the CLI in any way, you should probably build your own (download the repository as described here) and pass it to the build command with the --init option.
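For example, a hypothetical invocation that combines these options (the paths are illustrative) could look like:

# Build from outside the package directory, explicitly setting the working directory and the package kind
brane build --workdir ./hello-world --kind ecu ./hello-world/container.yml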

If everything succeeds, you should see something along the lines of:

Successfully built version 1.0.0 of container (ECU) package hello_world.

Your package is now available in the local repository that only exists on your laptop. To verify it, you can run:

brane list

which should show you:

An entry for 'hello_world' in the list

4. Testing your package

Because publishing your package to a Brane instance immediately exposes it for others to use, it is often better to first test your package locally to catch any errors or bugs.

To do so, the Brane CLI provides a built-in test capability, which can run any function you defined in the package container with some (properly-typed) input and test its computation.

To run it for the hello_world package, run the following command:

brane test hello_world

You will then be greeted by something along the lines of:

The function to execute (and then you can select a function)

This TUI (Terminal UI) will help you select a function, give input to it (though that is not relevant now) and show its output.

If everything went alright, you should see the Hello, world! message if you hit 'enter':

Hello, world!

This confirms that your package is working and Brane can interact with it! If it doesn't, you'll see an error that hopefully allows you to debug your package. You can check the troubleshooting chapter with some general tips on how to debug any such errors.

If everything checks out, you are now ready to push your package to a Brane instance.

5. Publishing your package

For this step, you will need to have a running Brane instance. If you do not have one where you can test this tutorial, you can download and install one yourself by following the steps listed in the chapters for system administrators.

We will assume that you have a Brane instance available at 127.0.0.1 (localhost). If you will be using a remote instance, replace all of the occurrences of the localhost IP address with the address of your instance.

The first step, before you can publish to a cluster, is to login to one. Run the following command for that:

brane login http://127.0.0.1 --username <user>

where you should replace <user> with a name of your choosing. This is the name that will be used to 'sign' all your packages (i.e., list you as owner).

info This command does not actually interact with the instance you log in to; it simply remembers the value for subsequent commands. This means it will return instantly and always succeed, even if the IP is invalid (this will likely change in a future release).

Next, you may try to publish your package by pushing it to the instance you just logged-in to:

brane push hello_world

This command will automatically push the latest version of your package to the remote instance. If you want to be explicit about which version to push, you may add it to the end of the command. For this tutorial, this command will give the same result as the one above:

brane push hello_world 1.0.0

Your package is now available in the remote instance. You can verify this by running:

brane search

This command does exactly the same as the brane list command, except that it doesn't inspect your local repository but instead the remote one you are logged in to. Thus, it should show you something along the lines of:

An entry for 'hello_world' in the list

6. Running your package

Finally, we can properly run the function that you have just created!

To do so, we will connect to the remote instance using the REPL (Read, Eval, Print-Loop) of the Brane CLI tool. This loop will take BraneScript statements line-by-line, and run them on the remote instance. Effectively, this will be like "interactively" running a workflow on the remote instance.

To start the REPL, run:

brane repl --remote http://127.0.0.1:50053

info If you omit the --remote option from the command, you will run a local REPL instead. This can be used to test workflows and run packages locally more thoroughly, and should work the same (except that you don't actually push anything to a Brane instance).

If the REPL launched and connected successfully, you will see:

An empty Brane CLI REPL

Any command you write will be executed as BraneScript. For a more in-depth documentation of how BraneScript works, you can refer to its documentation chapters.

For now, we will restrict ourselves to testing our package.

First, we will bring the function that we have defined in our package into scope, by importing the package:

import hello_world;

(Note the delimiter, ;. BraneScript requires all statements to be terminated by it.)

If the instance was able to find the package, then the command will return without printing anything. Otherwise, it might give you an error saying the package is unknown. If so, try re-pushing your package and making sure you are logged-in to the correct instance.

Next, you can call the function to run your package on the instance:

hello_world();

After running that command, you should see:

Hello, world!

Wait, we're not seeing anything?! Did something go wrong?!

No, it didn't! Remember, Brane never simply shows the user the stdout of the package. Instead, it uses the value of the parsed YAML field (in our case, output) as the return value of the function that we defined. Thus, if we wrap the hello_world()-call in a println-statement (a builtin in BraneScript), you will finally see:

Hello, world!
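For reference, the wrapped call in the REPL is simply the following (using only the println builtin and the function we imported earlier):

println(hello_world());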

info You may notice that the second time, the package call went significantly faster than the first call. This is because Brane lazily imports packages in the Docker engine, which means that it still had to download the container during the first call, while it was already loaded during the second.

Congratulations! You have now written, built, tested, published and then executed your first Brane package.

// TODO: Replace pic above here with one that uses println

Next

In the next chapter, we will consider a slightly more complicated case, where we will talk about passing inputs to functions and separating a package to have multiple functions. To do so, you will implement a simple Base64 encoding/decoding package.

Alternatively, you can also look at the Package documentation to find out the details of the different package types, or dive into BraneScript by reading its documentation.

Package inputs & multiple functions

In the previous chapter, you created your first package, and learned how to build and run a function that takes no inputs.

However, this makes for very boring workflows. Thus, in this chapter, we will extend upon this by creating a container with multiple functions, to which we can pass inputs. Concretely, we will describe how to implement a base64 package, which contains functions to encode and decode a string to and from Base64, respectively.

info The code used in this tutorial can be found in examples/doc/base64 of the repository.

1. Writing code

To implement the package, we will write a simple Python script that contains the two functions.

First, create the directory for this package. We will call it base64. Then, create a Python file code.py with the skeletons for the two functions:

#!/usr/bin/env python3


# Imports
# TODO


# The functions
def encode(s: str) -> str:
    """
        Encodes a given string as Base64, and returns the result as a string
        again.
    """

    # TODO


def decode(b: str) -> str:
    """
        Decodes the given Base64 string back to plain text.
    """

    # TODO


# The entrypoint of the script
if __name__ == "__main__":
    # TODO

(Don't forget the shebang at the top of the file!)

info You may notice the strs in the function headers. If you're unfamiliar with them, these annotate the types of the arguments. If you're interested, you can read more about it here.

The functions themselves are pretty straightforward to implement if we employ the help of the base64 module, which is part of the Python standard library. Thus, import it first:

# Imports
import base64

...

The implementation of encode():

...

def encode(s: str) -> str:
    """
        Encodes a given string as Base64, and returns the result as a string
        again.
    """

    # First, get the raw bytes of the string (to have correct padding and such)
    b = s.encode("utf-8")

    # We simply encode using the b64encode function
    b = base64.b64encode(b)

    # We return the value, but not after interpreting the raw bytes returned by the function as a string
    return b.decode("utf-8")

...

The implementation of decode() is very similar:

...

def decode(b: str) -> str:
    """
        Decodes the given Base64 string back to plain text.
    """

    # Remove any newlines that may be present from line splitting first, as these are not part of the Base64 character set
    b = b.replace("\n", "")

    # Decode using the base64 module again
    s = base64.b64decode(b)

    # Finally, we return the value, once again casting it
    return s.decode("utf-8")

...

Up to this point, we have just been writing a plain Python script; Brane is not yet involved.

But that will change now. In the entrypoint of our package, we have to do two things: we have to let Brane select which of the functions to call, and we have to be able to process the input that Brane presents us with.

The first is done via a command-line argument (see below) that Brane passes to select the function to call. Thus, we will write a piece of code that reads the first argument passed to the script, and then uses that to select the function.

...

# The entrypoint of the script
if __name__ == "__main__":
    # Make sure that at least one argument is given, that is either 'encode' or 'decode'
    if len(sys.argv) != 2 or (sys.argv[1] != "encode" and sys.argv[1] != "decode"):
        print(f"Usage: {sys.argv[0]} encode|decode")
        exit(1)

    # If it checks out, call the appropriate function
    command = sys.argv[1]
    if command == "encode":
        result = encode(<TODO>)
    else:
        result = decode(<TODO>)

    # TODO

Don't forget to import the sys module:

# Imports
import base64
import sys

...

However, to call our functions, we will first have to know the input that the caller of the function wants to be encoded or decoded.

Brane does this by passing the arguments of the function call to the package as environment variables. Specifically, it takes the value in BraneScript, serializes it to JSON and then sets the resulting string in the matching environment variable. The names of these variables are derived from the container.yml file (see below), but let's assume for now that it's called INPUT.

Thus, to give our functions their input, we can just pass the value of the INPUT environment variable to the json package, and pass the resulting string to our functions:

...

if __name__ == "__main__":
    ...

    # If it checks out, call the appropriate function
    command = sys.argv[1]
    if command == "encode":
        # Parse the input as JSON, then pass that to the `encode` function
        arg = json.loads(os.environ["INPUT"])
        result = encode(arg)
    else:
        # Parse the input as JSON, then pass that to the `decode` function
        arg = json.loads(os.environ["INPUT"])
        result = decode(arg)

    # TODO

Again, don't forget to add our new dependencies as imports:

# Imports
import base64
import json    # new
import os      # new
import sys

...

Now, finally, we have to give the result back to Brane like we did before.

We will do so in a slightly complicated manner, using the yaml package of Python. This is both to show that Brane just expects YAML (which might make it easier to return arbitrary output), and to give us an opportunity to talk about package dependencies in a later section.

To return the values, we will return the value as a YAML key/value pair with the key name called output:

...
if __name__ == "__main__":
    ...

    # Print the result with the YAML package
    print(yaml.dump({ "output": result }))

    # Done!

Finally, add the yaml-module dependency:

# Imports
import base64
import json
import os
import sys
import yaml

...

And that gives us the final base64/code.py Python file that implements the base64-package:

#!/usr/bin/env python3


# Imports
import base64
import json
import os
import sys
import yaml


# The functions
def encode(s: str) -> str:
    """
        Encodes a given string as Base64, and returns the result as a string
        again.
    """

    # First, get the raw bytes of the string (to have correct padding and such)
    b = s.encode("utf-8")

    # We simply encode using the b64encode function
    b = base64.b64encode(b)

    # We return the value, but not after interpreting the raw bytes returned by the function as a string
    return b.decode("utf-8")


def decode(b: str) -> str:
    """
        Decodes the given Base64 string back to plain text.
    """

    # Remove any newlines that may be present from line splitting first, as these are not part of the Base64 character set
    b = b.replace("\n", "")

    # Decode using the base64 module again
    s = base64.b64decode(b)

    # Finally, we return the value, once again casting it
    return s.decode("utf-8")


# The entrypoint of the script
if __name__ == "__main__":
    # Make sure that at least one argument is given, that is either 'encode' or 'decode'
    if len(sys.argv) != 2 or (sys.argv[1] != "encode" and sys.argv[1] != "decode"):
        print(f"Usage: {sys.argv[0]} encode|decode")
        exit(1)

    # If it checks out, call the appropriate function
    command = sys.argv[1]
    if command == "encode":
        # Parse the input as JSON, then pass that to the `encode` function
        arg = json.loads(os.environ["INPUT"])
        result = encode(arg)
    else:
        # Parse the input as JSON, then pass that to the `decode` function
        arg = json.loads(os.environ["INPUT"])
        result = decode(arg)

    # Print the result with the YAML package
    print(yaml.dump({ "output": result }))

    # Done!

2. Creating a container.yml

With the code complete, we will once again create a container.yml.

Again, write the package metadata first, together with the files that contain the code and the entrypoint:

name: base64
version: 1.0.0
kind: ecu

files:
  - code.py

entrypoint:
  kind: task
  exec: code.py

(see the previous chapter for a more in-depth explanation on these)

Next, we can specify additional dependencies for the package. Not only do we require Python to run our script, we also require the yaml package in Python. To do so, we will add an extra section, which will tell Brane to install both of these in the package container:

...

dependencies:
  - python3
  - python3-yaml

The dependencies are just apt packages for Ubuntu 20.04. If you require another OS or system, you should check the in-depth container.yml documentation.

Next, we once again write the section that describes the functions. However, this time, we have two functions (encode and decode), and so we will create two entries:

...

actions:
  encode:
    command:
      # TODO
    input:
      # TODO
    output:
      # TODO

  decode:
    command:
      # TODO
    input:
      # TODO
    output:
      # TODO

First, we will fill in the command-field.

If you think back to the previous section, we said that Brane would tell us which function to run based on the argument given to the script. We can fulfill this assumption by using the command-field of each function:

...

actions:
  encode:
    command:
      # This is just a list of arguments we pass to the function
      args:
      - encode
    input:
      # TODO
    output:
      # TODO

  decode:
    command:
      # Note that we give another argument here, selecting the other function
      args:
      - decode
    input:
      # TODO
    output:
      # TODO

This (correctly) implies that there are other ways of selecting functions in a package. See the container.yml documentation for more information.

With the function selected, we will next specify the input arguments to each function. For both functions, this is a simple string that we would like to encode or decode.

Now, remember that Brane will pass the input arguments as environment variables. Because environment variables are (by convention) spelled in CAPS, Brane will translate the name you give to an input argument to an appropriate environment variable name - which is the same name with all alphabetical characters converted to UPPERCASE.

Thus, for each function, we define an input argument input (which translates to the INPUT in the code.py file) that is of type string:

...

actions:
  encode:
    command:
      args:
      - encode
    input:
    # This specifies one input of type string, in similar syntax to how we specified outputs.
    - name: input
      type: string
    output:
      # TODO

  decode:
    command:
      args:
      - decode
    input:
    # This specifies one input of type string, in similar syntax to how we specified outputs.
    - name: input
      type: string
    output:
      # TODO

Finally, we will define an output (called output again) in much the same way as in the Your first package tutorial:

...

actions:
  encode:
    command:
      args:
      - encode
    input:
    - name: input
      type: string
    output:
    # See the previous section
    - name: output
      type: string

  decode:
    command:
      args:
      - decode
    input:
    - name: input
      type: string
    output:
    # See the previous section
    - name: output
      type: string

The complete container.yml may be found in the project repository (examples/doc/base64/container.yml).

3. Building & Publishing the package

If you've done everything right, this will be exactly the same as with the previous tutorial.

First, we will build the package:

brane build ./container.yml

Once that's ready, test your package by running brane test:

brane test base64

If you test your encode function and then your decode function, you should get something along the lines of:

Encoding and decoding text

Once you've verified everything works, we will push it to the remote repository:

brane push base64

info If you get errors saying that you haven't logged-in yet (or perhaps errors saying a file is missing), login first with brane login. Refer to the previous tutorial for more details.

And then, like before, we can use the REPL to interact with our package:

brane repl --remote http://<IP>:50053

For example, you can now do the following:

Encoding and decoding text

// TODO: Replace pic above here with one that uses println

You can refer to the chapters on writing workflows or the documentation of BraneScript for more explanation on the syntax used here.

Next

You should now be able to build most functions, congratulations!

In the next chapter, we will consider a last-but-not-least aspect of building packages: datasets and intermediate results. If you plan to do any serious data processing with Brane, we highly recommend you to check that chapter out.

Otherwise, check the in-depth documentation on the package system. It will talk about the different types of packages, how they are implemented and the complete overview of the interface with code and the container.yml file.

You can also continue with the chapters for scientists to know more about how to write workflows, or check the documentation of BraneScript and Bakery.

Datasets & Intermediate results

If you have followed the previous two tutorials (here and here), you should be a little familiar with how to package your code as one or more Brane functions that can accept input and return output.

However, so far, your code will not be very usable to data scientists. That's because a key ingredient is missing: datasets, and especially large ones.

In this tutorial, we will cover exactly that: how you can define a (local) dataset and use it in your package. This is illustrated by creating a package that can compute the minimum or maximum of a given file. First, however, we will provide a little background on how datasets are represented, and what the difference is between Brane's concept of data and Brane's concept of intermediate results. If you're eager and already know this stuff, you can skip ahead to the section after the next one.

info The code used in this tutorial can be found in examples/doc/minmax of the repository.

0. Background: Variables & Data

In Brane, there is an explicit distinction between variables and data.

Variables are probably familiar to you from other programming languages. There, they can be thought of as (simple) values or data structures that live in memory only, and that the processor can typically manipulate directly1. This is almost exactly the same in Brane, except that they are emphasised to be simple, and mostly used for configuration or control flow decisions only.

Data, on the other hand, represents the complex, large data structures that typically live on disk or on remote servers. In Brane, this is typically the information that a package wants to work on, and is also the information that may be sensitive. It is thus subject to policies.

Another useful advantage of being able to separate variables and data this way is that we can now leave the transfer of large datasets up to the framework to handle. This significantly reduces complexity when attempting to use data from different sources.

As a rule of thumb, something is a variable if it can be created, accessed and manipulated in BraneScript (or Bakery). In contrast, data can only be accessed by the code in packages, and exists in BraneScript itself only as a reference. It isn't possible to inspect the contents of a dataset in BraneScript, unless a package is used.
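
To make this concrete, here is a minimal, hypothetical BraneScript sketch contrasting the two (assuming C-style // comments; the Data syntax is explained later in this series):

// A variable: a simple value that lives in memory and can be manipulated directly
let threshold := 42;

// A data reference: it only points at a dataset; the contents stay on disk, managed by Brane
let numbers := new Data { name := "numbers" };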

1

From a programmer's perspective, anyway.

Datasets & Intermediate Results

Data itself, however, has a smaller but important distinction of its own: Brane calls a given piece of data either a dataset or an intermediate result. Conceptually, both are data (i.e., they reference some file on a disk or some other source), but the first can outlive a workflow whereas the second cannot. This distinction matters for policies, where it's important that intermediate results can only be referenced by users participating in the same workflow, and not by others.

For you, a software engineer, the important thing to know is that functions can take both as input, but return only intermediate results as output. To get a dataset out of a workflow, a scientist has to use builtin functions to commit an intermediate result to a full dataset.

1. Creating a dataset

This time, before we write any code, we first have to create the dataset that we will be using.

Note, though, that creating datasets is typically the role of the system administrator of a given domain that offers the dataset. In other words, you will typically only use datasets already available on the domains in a Brane instance.

However, it can still be useful to create a dataset that is locally available only - typically for testing purposes. That's what we will do here.

For the purpose of this tutorial, we will use a very simple dataset: a single list of numbers of which our code may find the min/max. To do so, create a folder for the package (which we will call minmax) and a folder for the dataset (we will use minmax/data). Then, you can either download the dataset from the repository or generate it yourself by running (from inside the minmax/data folder):

echo "numbers" > numbers.csv && for i in $(awk 'BEGIN{srand(); for(i = 0; i < 100; i++) print int(rand()*100)}'); do echo "$i" >> numbers.csv; done

We will assume that after this step, you have a file called minmax/data/numbers.csv.

Next, similar to how we use a container.yml file to define a package, we will create a data.yml file to define a dataset. Create the file (minmax/data/data.yml) and write in it:

# This determines the name, or more accurately, identifier, of the dataset.
name: numbers

# This determines how we access the data. In this example, we use a file, but check the wiki to find all possible kinds.
access:
  kind: file
  # Note that relative paths, per default, are relative to this file.
  path: ./numbers.csv

This tells Brane which file(s) this dataset consists of, and by which identifier it is known. The identifier is arbitrary, but should be unique across your local machine. We will assume numbers.

info To package multiple files in a dataset, simply create a folder and refer to that in your data.yml file. Be aware, though, that this adds additional uniqueness to your dataset; see below.
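
For illustration, a data.yml for a hypothetical folder-based dataset (assuming a folder called ./data_files next to the data.yml file) could look like this:

# Hypothetical dataset that packages a whole folder instead of a single file
name: numbers_folder

access:
  kind: file
  # Pointing at a folder attaches the entire folder to containers as-is
  path: ./data_files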

Then you can build the dataset by running:

brane data build ./data.yml

in the data folder.

You can confirm that this has worked by executing:

brane data list

which lists all locally available datasets. You should see something like this:

// TODO

2. Creating a container.yml

In this tutorial, we will deviate from the format you've come to expect a little more by first looking at the container.yml that we will use for our package.

This is almost exactly the same as in previous tutorials, so you should be able to write it yourself (use any of the previous tutorials as example, or check the repository). The only thing that differs is the input and output to the functions we define in our package:

...

actions:
  # The max command, which should be mostly familiar by now
  max:
    command:
      args:
      - max
    input:
    - name: column
      type: string
    - name: file
      # This is new!
      type: Data
    output:
    - name: output
      # This is also new!
      type: IntermediateResult

  # Same here
  min:
    command:
      args:
      - min
    input:
    - name: column
      type: string
    - name: file
      type: Data
    output:
    - name: output
      type: IntermediateResult

We will focus on the two new parts in max only, since they are identical for min.

The first is that, instead of requiring an atomic variable such as a string or an int as input, we now require a class named Data. Classes are a whole different story altogether (see the BraneScript documentation or the container.yml documentation), but because Data is a special builtin we can safely ignore the details for now.

All that you have to know is that Data represents a dataset reference; it is not the data itself, but merely some way for the framework to know which dataset you are talking about. You can find more information about this in the chapters for scientists, but as a teaser, this is how such a reference is created:

let data_reference := new Data { name := "numbers" };

This creates a reference for a dataset called numbers (what a coincidence!). Thus, by specifying that our package takes a Data as input, Brane will know that it's actually some larger dataset that we're referencing.

In the output, we are using something extremely similar: a class named IntermediateResult. This is Brane's builtin class for intermediate results, and it is once again a reference to a piece of data. The only concrete difference between the two (other than those specified in the background section) is that Data cannot be the output of your function, only IntermediateResult. This follows from the semantic difference between them.

This is all that is necessary for Brane to arrange that data is appropriately made available to our package. The rest is done in the package code itself.

info Typically, it is better practice to take an IntermediateResult as input instead of a Data. This is because Data-objects are trivially convertible to IntermediateResult objects, but the reverse isn't true. Using IntermediateResult is thus more general.
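
For example, the file input of our max function could have been declared like this instead (a sketch only; the rest of the definition stays the same):

input:
- name: file
  # IntermediateResult also accepts Data references, since those are trivially convertible
  type: IntermediateResult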

3. Writing code

We can now finally start writing the code that runs in our package. Because we have already written the container.yml file, we can safely assume that we will have two inputs, COLUMN and FILE, and that our function should return an intermediate result called output somehow.

The code itself will be based on Python, like in the previous tutorial, and specifically on the Pandas library, since that is able to compute the minimum/maximum of a CSV file in just a few lines.

Like before, create a file (code.py) in the package directory (remember, we use minmax as that directory) that will contain our Python code:

#!/usr/bin/env python3


# Imports
import json
import os
import pandas as pd
import sys


# The functions
def max(column: str, df: pd.DataFrame) -> int:
    """
        Finds the maximum number in the given column in the given pandas
        DataFrame.
    """

    # We use the magic of pandas
    return df[column].max()


def min(column: str, df: pd.DataFrame) -> int:
    """
        Finds the minimum number in the given column in the given pandas
        DataFrame.
    """

    # We use the magic of pandas again
    return df[column].min()


# The entrypoint of the script
if __name__ == "__main__":
    # This bit is identical to that in the previous tutorial, but with different keywords
    if len(sys.argv) != 2 or (sys.argv[1] != "max" and sys.argv[1] != "min"):
        print(f"Usage: {sys.argv[0]} max|min")
        exit(1)

    # Read the column from the Brane-specified arguments
    column = json.loads(os.environ["COLUMN"])

    # TODO 1

    # Use the loaded file to call the functions
    command = sys.argv[1]
    if command == "max":
        result = max(column, <TODO>)
    else:
        result = min(column, <TODO>)

    # TODO 2

(Don't forget the shebang!)

More than in the previous tutorial, we will leave understanding the Python code up to you. If you have trouble understanding what it does, we refer you to the Pandas documentation. The two # TODOs are the places where we will interact with the given dataset and write the resulting output, respectively.

First, we will examine how to access given datasets. We assume that two arguments are given to the package: COLUMN (which defines the name of the column to read) and FILE (which will somehow be our dataset). COLUMN will be a simple string, and FILE will be some reference to the dataset that the scientist wants our package to work on (see the container.yml section).

But what is passed exactly? The answer is very case-specific, since Brane assumes that every dataset is completely unique - even up to its representation (i.e., a file, a remote API, ...). This means that, as a package writer, it is very hard to write general packages; instead, you will have to make assumptions about the specific format of a dataset. Thus, if you want to support multiple types of datasets, it's recommended to create multiple functions, one per data type, and to verbosely document the types of data required.

info In the future, it is likely that BraneScript will be extended with a concept of Dataset types that exactly defines what kind of dataset is allowed to be passed to a function. Until that time, the best you can do is simply error at runtime if the dataset has an invalid format.
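
As a sketch of such a runtime check (the helper name load_numbers_csv is ours, not part of the tutorial code), you could validate the assumed CSV format before using it:

# Minimal sketch: verify the assumed format of the dataset before using it
import os
import sys

import pandas as pd

def load_numbers_csv(path: str, column: str) -> pd.DataFrame:
    """Loads the dataset at 'path' and errors out if it doesn't match the assumed format."""
    if not os.path.isfile(path):
        print(f"Expected dataset at '{path}' to be a single CSV file", file=sys.stderr)
        sys.exit(1)
    df = pd.read_csv(path)
    if column not in df.columns:
        print(f"Dataset at '{path}' has no column '{column}'", file=sys.stderr)
        sys.exit(1)
    return df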

For the tutorial, however, we can commit ourselves to the numbers dataset only. This is of kind file (see above), which means that Brane will do two things when it passes it to your package:

  1. Before the container with your package is launched, the dataset's referenced file (or folder) will be made available under some path (in practice, this is typically a folder nested in the /data directory in the container).

  2. It will pass the path of the dataset's file (or folder) to you as a string. This is the value passed in the FILE argument.

    info You should always use the given path instead of a hardcoded one. Not only is the generated path undefined (it may differ per implementation or even per domain you're running on), it's also a different path each time a result is passed to your function. Relying on hardcoded values is very bad practice.

Concretely, the following Python snippet will use Pandas to load the dataset at the path given by the FILE argument:

...

# TODO 1

# Load the path given in FILE (you can assume it's always absolute)
file = json.loads(os.environ["FILE"])
df = pd.read_csv(file)

...

if command == "max":
    # Note that we replaced '<TODO>' with the loaded dataset here
    result = max(column, df)
else:
    result = min(column, df)

...

Despite all the theoretical background, accessing the dataset is typically relatively easy; the only thing to keep in mind is that it is highly specific to the dataset you are committing yourself to.

info If you package a folder as a dataset, this procedure becomes slightly more complex. The path given by Brane is the path pointing to the folder itself in that case, meaning that you will manually have to append the target file in the folder to the path. For example, if the numbers dataset packaged a folder with the file numbers.csv in it, the following should be done instead:

file = json.loads(os.environ["FILE"])
df = pd.read_csv(f"{file}/numbers.csv")

However, in this tutorial things are kept simple, and a single file is packaged directly.

With the dataset loaded, we will now consider the second part, which is writing the result.

For educational purposes, we assume that we do not want to use the minimum / maximum number directly, but instead package it as a new dataset. This is actually very common, since this way the result is also subject to policies and cannot be sent just anywhere.

Recall from the container.yml section that we have defined that our package returns an IntermediateResult with name output. By using that return type, Brane will do the following:

  1. A folder /result becomes available that is writable (in contrast to the input files/folders). Everything that is written to that folder is, after your package call completes, automatically packaged as a new piece of data (an IntermediateResult, to be precise).

This means that for our package, all that it has to do to write the result is simply write it to a file in the /result directory. This is exactly what we'll be doing:

...

# TODO 2

# We will write the `result` variable to `/result/result.txt`
with open("/result/result.txt", "w") as h:
    h.write(f"{result}")

Perhaps a bit counter-intuitively, our earlier statement that we have to return the result as output somehow isn't actually true; because functions can have only a single output, and this output now lives solely on disk under a defined folder, Brane packages shouldn't actually return anything on stdout when they return an intermediate result. Thus, the output name defined in the container.yml is actually unused in this case.

And with that, our package code is complete! The full code can be inspected in the repository.

info Be sure to document properly what the /result directory looks like once your package call is done with it. Other packages will get the same directory as-is, and so have to know which files to load and in what format they are written.
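
As an illustration only, a follow-up package that consumes our result might read it as in the sketch below. We assume its container.yml declares an input called result of type IntermediateResult (so Brane passes the directory path in the RESULT environment variable) and a string output called output:

#!/usr/bin/env python3
# Hypothetical follow-up function that reads the result written by our min/max package
import json
import os

# Brane passes the path of the result directory in the uppercased input name
result_path = json.loads(os.environ["RESULT"])

# We documented that the directory contains a single 'result.txt' file
with open(os.path.join(result_path, "result.txt"), "r") as h:
    value = h.read().strip()

# Return it as a regular string output on stdout
print(f"output: {value}")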

4. Building & Publishing the package

This will mostly be the same as in the previous tutorial(s), and because this tutorial is getting pretty long already, we assume you are getting familiar with this by now.

One key difference with before is that when testing your package, you should now be prompted to use a dataset as input:

// TODO

It will only show you the locally available datasets, which should include the numbers dataset. If not, go back to the first section and redo those steps.

Similarly, calling your package from the terminal will require you to explicitly reference the numbers dataset:

// TODO

You should also see that executing your package call is not very exciting, since all it does is produce a new dataset. This is alright, since subsequent package calls in a workflow are still able to use it; however, for demonstration purposes, you can try to download the cat package to inspect it:

// TODO

(Refer to the pull chapter for scientists to learn how to install it).

Next

Congratulations! You have now mastered Brane's packaging system. This should allow you to create useful data science packages for the Brane ecosystem that scientists may rely upon in their workflows.

As a follow-up to these chapters, you can continue with the chapters for scientists to learn about the workflows for which you write packages. Alternatively, you can also check the documentation of container.yml or data.yml to see everything you can do with those files. Finally, you can also go to the BraneScript documentation to find a complete overview of the language if you're interested.

Alternative packages: OpenAPI Standard

info The fourth tutorial will be written soon.

Introduction

In these few chapters, we will explain the role of scientists within the framework. Specifically, we will talk about BraneScript and Bakery, two domain-specific languages (DSLs) for Brane that are used to write workflows. Concretely, these chapters will thus focus on writing the high-level workflows that may implement a specific use-case.

To start, we recommend that you first read the next section to get a little background and read about some terminology that we will be using. After that, you can go to the next chapter, where we will discuss preparing your machine for interacting with Brane.

Background

Typically, workflows revolve around packages that contain external functions (also known as package functions). These are treated extensively in the chapters for software engineers, but all a scientist needs to know is that each function is an algorithm that may be executed on a remote backend, managed by Brane.

Another important concept is that of datasets, which are (typically large) files or other sources that contain the data that package functions may operate on. For example, a dataset may be a CSV file with tabular data; or in another instance, it's a compressed archive of CT-scan images.

Workflows are typically in the business of using a combination of package functions acting on certain datasets to achieve certain goals. In short, they are high-level descriptions and implementations of a use-case. And that's exactly the role that a scientist has in the Brane framework: writing these high-level workflows, using the low-level packages provided by software engineers as implementation.

Next

In the next chapter, we will walk you through setting up your machine to start writing workflows. If you have already done so previously, you can also skip ahead and learn how to manage packages for your workflows.

Installation

In this chapter, we will discuss how to install the Brane Command-Line Tool, or the brane-executable, on your machine.

If you already have this executable available, you can skip ahead to the next chapter instead. If you do not, read on below.

info Aside from the brane executable, you may make your life easier by installing the Brane JupyterLab environment; check out its repository.

Prerequisites

Before you can write and test workflows on your machine, make sure that you install the following:

  • Install Docker on your machine. You can refer to the official documentation to find how to install it for Debian, Ubuntu, Arch Linux, macOS or other operating systems.
  • Install the Docker Buildx plugin. Its repository contains information on how to install it, but typically, the following works:
    # Clone the repo, CD into it and install the plugin (check https://github.com/docker/buildx#building for alternative methods if that fails)
    git clone https://github.com/docker/buildx.git && cd buildx
    make install
    
    # Set the plugin as the default builder
    docker buildx install
    
    # Switch to the buildx driver
    docker buildx create --use
    

Downloading the binaries

The easiest way to install the brane-executable is by downloading it from the project's repository.

Head to https://github.com/epi-project/brane/releases/latest/ to find the latest release. From there, you can download the appropriate brane executable by clicking on the desired entry in the Assets-list:

Example list of assets in a specific Brane release; you can click the one you want to download.

To know which of the executables you need, it helps to know the naming scheme behind the assets:

  • Every asset starts with some name identifying the kind of asset. We are looking for the brane executable, so find one that starts with only brane.
  • Next, the OS is listed. Linux users can select a binary with linux, whereas macOS users should select darwin instead.
  • Finally, the processor architecture is listed. Typically, this will be x86_64, unless you are on a mac device running an M-series processor; then you should select aarch64.

So, for example, if you want to write workflows on a Linux machine, choose brane-linux-x86_64; for Intel-based Macs choose brane-darwin-x86_64, and for M1/M2-based Macs, choose brane-darwin-aarch64.
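
If you prefer the command line, a download could look like the sketch below (this assumes GitHub's standard "latest release" download URL layout; adjust the asset name for your OS and architecture as described above):

# Download the Linux x86-64 build of the brane executable
curl -L -o brane https://github.com/epi-project/brane/releases/latest/download/brane-linux-x86_64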

Once you have downloaded the executable, it is very useful to put it somewhere in your $PATH so that your terminal can find it for you. To do so, open up a terminal (Ctrl+Alt+T on Ubuntu) and type:

sudo mv <download_location> /usr/local/bin/brane

where you should replace <download_location> with the path of the downloaded executable.

For example, if you are running on an Intel-based Mac, you can typically use:

sudo mv ~/Downloads/brane-darwin-x86_64 /usr/local/bin/brane

To verify that the installation was successful, you can run:

brane --version

If you see a version number, the installation was successful; but if you see an error (likely something along the lines of No such file or directory), you should try to re-do the above steps and try again.

info Note that copying the brane executable to somewhere in your PATH is not strictly necessary. However, if you don't, remember that you will have to replace all calls to brane with the path to where you downloaded the executable. For example, to verify whether it works, use this command instead:

~/Downloads/brane-darwin-x86_64 --version

info If you see an error along the lines of Permission denied, you can try to give execution rights to the binary:

sudo chmod +x /usr/local/bin/brane

and try again.

Compiling the binary

Instead of downloading the binary, you can also choose to compile it yourself. This is usually only necessary if you need the cutting-edge, unreleased version; if you have an OS or processor architecture for which no brane executable is readily available; or if you are actively developing the framework.

To compile the binary, we refer you to the installation chapter for software engineers, which contains information on how to do exactly this.

Next

If you are able to run the brane --version command, you have installed your brane executable successfully! You can now move to the next chapter, which contains information on how to connect to remote instances and manage your credentials. After that, continue with the chapter on package management, or start by writing your first workflow in either BraneScript or Bakery.

Managing instances

The main goal of the Brane framework is to act as an interface to one or more High-Performance Compute systems, such as cluster computers or grid computers.

To this end, it is often the case that you want to connect to such a system. In Brane terminology, this is called a Brane instance1, and in this chapter we will discuss how you can connect to it.

In the first section, we will discuss how to define an instance in the CLI and manage it. Then, in the second section, we will show how to add credentials (certificates) to the CLI so that you can easily use them when connecting to an instance.

1

It's a little bit more complex than presented here. A single Brane instance may actually abstract over multiple HPC systems at once, effectively acting as a "HPC orchestrator". However, from your point of view, the scientist, Brane will act as if it is a single HPC divided into separate domains.

The instance-command

All of the commands for managing the basic information about instances are grouped under the brane instance subcommand in the brane CLI. We will assume in this chapter that you have already installed this tool, so consult the installation chapter if you have not.

In the CLI, instances are defined as separate entities that can be created and destroyed. Think of them as keys in a keychain, where each of them has a unique name to identify them, and furthermore carries information such as where to reach the instance or credentials to connect with.

If you have just installed the CLI, you won't have any instances yet. You can check this by running:

brane instance list

This should show you an empty table:

An empty list of defined instances.

Let's change that by defining our own instance!

Defining instances

For the purpose of this tutorial, we assume that there is a Brane instance running at some-domain.com, which is what we want to connect to.

The most basic form of the command to define a new instance is as follows:

brane instance add <HOSTNAME>

where we want to replace <HOSTNAME> with the address where we can reach the instance.

For our example, you can run:

brane instance add some-domain.com

which then adds a new instance with default settings. You can see this by running brane instance list again:

List of defined instances, showing the newly added instance.

This shows you the name by which you can refer to this instance and the addresses that the CLI uses to connect to this instance. In addition, you can also add the --show-status flag to ping the remote backend and see if it's online:
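
For example, combining the flag with the list command shown above:

brane instance list --show-status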

List of defined instances, including each instance's online status.

warning By default, the CLI will also ping the remote instance when you define it to help you to see if you entered the hostname correctly. If you want to disable this behaviour, or if you are not connected to the internet when you define a new instance, add the --unchecked flag:

brane instance add <HOSTNAME> --unchecked

Defining non-default instances

While the command used above is nice and concise, it is often desirable to change some properties of the instance upon creation.

One of such properties is the name of the instance. By default, this equals the hostname, but you can easily specify this to be something else using the --name option:

brane instance add <HOSTNAME> --name <SOME_OTHER_NAME>

For example:

brane instance add some-domain.com --name instance1

Inspecting the instance using brane instance list now shows:

List of defined instances, showing the instance under its custom name.

info As you can see, you can use the --name flag to define multiple instances that point to the same hostname. This might be useful if you have two sets of credentials you want to login with (see below).

There are other properties that can be set, too. You can inspect them using brane instance add --help, or consult this list:

  • --api-port <NUMBER> changes the port number with which the CLI connects to the instance's registry. Leaving this to the default value is probably fine, unless the system administrator of the instance told you to use something else.
  • --drv-port <NUMBER> changes the port number with which the CLI connects to the instance's execution engine. Leaving this to the default value is probably fine, unless the system administrator of the instance told you to use something else.
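
For example, a sketch of adding an instance with non-default ports (the port numbers here are placeholders for whatever the system administrator gave you):

brane instance add some-domain.com --api-port 12345 --drv-port 12346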

Selecting instances

After you have created an instance, however, you must select it before you can use it. This effectively tells the CLI that all subsequent commands should be executed on the selected instance, if relevant, until the selection is changed.

To do so, use the following command:

brane instance select <NAME>

For example:

brane instance select instance1

You can verify that you have selected an instance by running brane instance list again. The selected instance should be printed in bold:

List of defined instances, with the selected instance printed in bold.

info When creating an instance, you can also add the --use flag to instantly select it:

brane instance add <HOSTNAME> --use

avoiding the need to manually call brane instance select ... afterwards.

Editing instances

If you ever need to change some property of the instance, then you can use the brane instance edit subcommand to do so.

You can change the same properties of an instance as given during creation, except for its name. To "change" the name of an instance, you have to re-define it with the same properties as the old one.
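
For example, a sketch of "renaming" instance1 (which points at some-domain.com) to instance2:

# Re-define the instance under the new name, then remove the old entry
brane instance add some-domain.com --name instance2
brane instance remove instance1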

The properties that can be changed can be found when running brane instance edit --help, or else in this list:

  • --hostname: Change the hostname where this instance lives. For example: brane instance edit instance1 --hostname some-other-domain.com.
  • --api-port: Change the port number with which the CLI connects to the instance's registry.
  • --drv-port: Change the port number with which the CLI connects to the instance's execution engine.

Note that you can specify multiple options at once, e.g.:

brane instance edit instance1 --hostname some-other-domain.com --api-port 42

changes both the hostname and the API port for the instance instance1.

Removing instances

Finally, if you no longer have the need to connect to an instance, you can remove it using the following command:

brane instance remove <NAME>

When you attempt to remove it, brane will ask for confirmation before doing so. Simply hit y if you want to remove it (no need to press Enter), or n if you changed your mind.

For example, if you run:

brane instance remove some-domain.com

and then hit y, you should no longer see it in the list generated by brane instance list:

List of defined instances after the removal.

If you remove a selected instance, then no instance will be selected afterwards, and you have to re-run brane instance select with a different one.

info For unattended access, you can also provide the --force flag to skip the confirmation check:

brane instance remove <HOSTNAME> --force

Use at your own risk!

info You can also specify multiple instances to remove at once, simply by giving multiple names. For example:

brane instance remove some-domain.com instance1

would remove both of those instances.

Note that you will only be asked for confirmation once.

And that's it for basic instance management!

Credentials

Aside from the basic properties of an instance, there is also the matter of credential management. After all, an instance may handle sensitive data, in which case it's paramount that a user is able to identify themselves.

A complicating factor in this story is that a Brane instance may consist of multiple domains (for example, it may feature two hospitals that want to collaboratively run some workflow). The problem, however, is that they are each in charge of their own authentication scheme; while this is very nice for the hospitals, it gets a little complicated for you, the scientist, because you will have to have credentials for each domain within an instance. Typically, this will be a certificate, and every domain will provide you with one that proves you are who you say you are - but only on their domain.

info Note that you don't have to have credentials for every domain to use the Brane instance. This is only relevant if you directly need to interact with a domain, and that is only relevant if a part of your workflow will be executed there or if you attempt to download a dataset from that domain.

For this section, we once again assume that there is some instance over at some-domain.com and that you have already defined an instance called instance1 to refer to it (see the previous section). Additionally, we assume that you have been provided with the certificates for a domain called domain1: two files, ca.pem and client-id.pem.

Adding certificates

Brane always assumes that a certificate pair for the purpose of connecting to a domain consists of two files:

  • A root certificate, canonically called ca.pem, which allows the Brane CLI to detect if the remote domain is who they say they are. It is a public certificate, so it is not very sensitive.
  • A client identity file, canonically called client-id.pem, which contains both the public and private parts of your key to that domain. Because of this private key, however, this file is sensitive, so never share this with anyone!

Note, however, that it may be the case that the system administrator of the target domain provides you with a single file that contains both, or three files to separate the client certificate and key. Regardless, to add them to an instance, you can run the following command:

brane certs add <FILES>

By default, the CLI will add the certificates to the instance you have currently selected, but you can also use the --instance option to target some other instance.

So, for our certificates:

brane certs add ca.pem client-id.pem

Similarly to how you can use brane instance list to check your instances, you can use brane certs list to check your certificates:

List of certificates registered in the currently selected instance.

info Note that the CA and CLIENT mentioned in the table refer to the files generated by the command, not by your input. That means that regardless of how many certificate/key files you specify, it will always separate them into one CA file and one client file internally.

Note that the domain name is automatically deduced from the issuer of the certificates. Typically, this is what you want, since the CLI selects which certificates to use based on the name of the domain it connects to. However, if necessary, you can specify the name manually using the --domain flag.
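
For example, to explicitly register the certificates under the domain name domain1:

brane certs add ca.pem client-id.pem --domain domain1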

Removing certificates

Just as with instances, removing certificates is also useful at times. To do so, use the following command:

brane certs remove <DOMAIN>

This will remove the certificates for the given domain in the currently selected instance. Just as with brane certs add, you can remove them in another instance using the --instance flag, and just like brane instance remove, you can specify multiple domains at once to mass-delete them.
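
For example (domain2 is a hypothetical second domain):

# Remove the certificates for domain1 in the currently selected instance
brane certs remove domain1

# Or remove certificates for several domains at once in another instance
brane certs remove domain1 domain2 --instance instance1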

After running the command, the certificates will disappear again if you run brane certs list:

List of certificates after the removal.

info You can specify brane certs list --all to see all of the certificates in all of the instances.

Next

Now that you can manage instances and their credentials, you are ready to start writing your own workflow. This can be done in multiple languages - either in BraneScript or in Bakery, and the remainder of the chapters for scientists will be dependent on one of these languages.

Playing with packages

In this chapter, we will explain how to manage packages on your local system and push them to a remote instance for running your workflows.

Currently, Brane has three ways of obtaining packages that contain functions: packaging code yourself, downloading a package from GitHub, or pulling one from a Brane instance. The first option is typically the role of a software engineer, and so we focus on the latter two in this chapter.

We will first discuss where you can move packages to/from in the first section. Then, in the two following sections, we will explain how to get packages from GitHub and from a Brane instance, respectively. Then, in the section after that, we will also explain how to push the obtained packages to (another) Brane instance to use them, and we close by discussing how to remove packages in the final section.

Note that we assume that you are able to login to some remote instance, as described in the previous chapter.

Package locations

Brane packages can have two possible kinds of locations: they can be local, in which case they are only usable on your machine; or they can be remote, in which case they are usable in some Brane instance.

Typically, you first download a package to your local machine from whatever source (see the next section and the section after that) to play around with it and to test your workflows; and then, when you are ready to schedule the workflow "for real", you push them to a remote instance and submit your workflow there.

// TODO package diagram

Another thing to note is that every Brane instance hosts its own package repository. So another situation in which you have to manage packages is pulling a package from one instance to your local machine, and then pushing it to another instance for use there.

Downloading packages from GitHub

The first method to download a package is by downloading it from GitHub using the brane-executable. You can find how to install this tool in the installation chapter.

info Of course, you can easily download the packages manually from GitHub and then build them as if you wrote the package yourself. See the chapters for software engineers on how to do that. This section focusses on using the more convenient method provided by brane.

As an example repository, we will use the brane-std repository. It provides a set of packages that are useful in general scenarios, and so can be thought of as a kind of standard library for Brane.

Before we can install a package from a repository, first we have to find the identifier of the repository. This identifier is written as the GitHub user or organisation name, a slash, and then the name of the repository. Or, more precisely, the identifier is the part of the URL of a repository that comes after https://github.com/. For example, for the standard library, which can be found at https://github.com/epi-project/brane-std, the ID would be:

epi-project/brane-std

To download a package and install it locally, you can use the following command:

brane import <REPO> <FILE>

where <REPO> is the identifier of the repository, and <FILE> is the path to the package to download in that repository. Note that you have to refer to the container.yml file (or similar) for that package; consult the documentation of the package to find which file to refer to specifically.

info If the target repository contains only one package such that there is a container.yml file in the root of the repository, you can also omit the <FILE> argument.

So, for example, to download the hello_world package from the standard library:

brane import epi-project/brane-std hello_world/container.yml

Brane will then download the package and install it, making it available for local use.

Pulling packages from an instance

Another method is to pull a package from a remote instance to your local machine so you can distribute it later.

To use it, you first have to define and then select an instance to work on. We won't go into detail here; consult the previous chapter for that. Instead, you can use this command to quickly log into an instance if you haven't already:

brane instance add <ADDRESS> --use

where <ADDRESS> is the URL where the instance may be reached.

warning If the above command fails, you may want to retry it with the --unchecked flag behind it:

brane instance add <ADDRESS> --use --unchecked

However, note that if it only works with this flag, the remote instance isn't actually reachable - so any of the subsequent commands won't work either.

Once logged-in, you can fetch a list of available packages by using:

brane search

Which should display something like:

List of packages living in a remote instance.

Then you can use the brane pull command to pull one of the available packages:

brane pull <ID>

where <ID> is typically the name of the package. However, if you want to download a specific package version instead of just the latest version, you can also use <NAME>:<VERSION>.

For example, to download the hello_world package from an instance that is reachable at some-domain.com:

# Needs only doing once
brane instance add some-domain.com --use

# Pull the package
brane pull hello_world
# Or, to pull version 1.0.0 specifically:
brane pull hello_world:1.0.0

info Note that you only have to log in once; the selection is saved and remembered until you manually log in to another instance later.

Pushing packages

Aside from making packages available on your local machine, you also need the ability to publish packages to a remote instance.

warning Note that this action makes packages available for everyone with access to that instance. Make sure that you have permission to do so before you publish a package.

To publish a package, you first have to make sure you are logged-in to an instance. If you have not already in the previous section, do so by running:

brane instance add <ADDRESS> --use

where <ADDRESS> is the URL where the instance may be reached. Consult the previous chapter for more information on this and related commands.

Then, you can find a list of the packages installed locally by running:

brane list

which should return something like:

List of packages living locally.

Next, you can push the latest version of a package to the remote instance by using:

brane push <ID>

where <ID> is the name of the package. However, if you want to push a specific package version instead of just the latest one, you can also use <NAME>:<VERSION>.

For example, to push the package hello_world to an instance that is reachable at some-domain.com:

# Needs only doing once
brane instance add some-domain.com --use

# Push the package
brane push hello_world
# Or, to push version 1.0.0 specifically:
brane push hello_world:1.0.0

Removing packages

Finally, Brane conveniently offers you functions for removing existing packages without diving into filesystems.

Local packages

For local packages, you can use the following command:

brane remove <ID>

where <ID> is the name of the package. If you want to delete a specific version instead of all its versions, you can use <NAME>:<VERSION> instead.

For example, to remove the hello_world package from the local repository:

brane remove hello_world
# Or, a specific version:
brane remove hello_world:1.0.0

info Don't worry - Brane will always ask you if you are sure before removing a package. Should you want to consciously skip that, however, you can use the --force flag to skip the check:

brane remove hello_world --force

Use at your own risk.

The same can be done for remote packages, except that you should use brane unpublish instead:

brane unpublish <ID>

# Don't forget to log in first if you haven't already - you are interacting with an instance again
brane instance add some-domain.com --use

# For hello_world:
brane unpublish hello_world
# Or a specific version:
brane unpublish hello_world:1.0.0

Next

Now that you can manage packages, it is finally time to move on to that which you are here to do: write workflows. There are multiple languages to choose from; you can go to the next chapter for BraneScript, or to the Bakery chapter for the Bakery language.

However, if you already have a basic idea, you can also skip ahead to further chapters in each language to discuss increasingly advanced concepts. Alternatively, you can also just check the more extensive tutorials on BraneScript or Bakery documentation in their own series of chapters.

BraneScript Workflows

In these chapters, we will discuss what it means to write a workflow, and how BraneScript can help you to do this.

In the first chapter, we will discuss what it is that we exactly try to model with BraneScript. To this end, we will go in a bit more depth on workflows, and discuss the example that we will spend untangling for the rest of this series.

In the second chapter, we will discuss calling the external functions in BraneScript, which is arguably the most elementary yet useful operation that can be done. Then, in the third chapter, we will discuss variables, after which we will treat control flow statements in the fourth chapter. Finally, we will discuss the notion of Data in Brane, after which you will have completed your brief BraneScript bootcamp.

Note that this tutorial is not meant to give you a complete overview of BraneScript. Instead, it will teach you the most important concepts for writing most workflows. If you are eager to learn about all of its features, consider checking the extended tutorial in the chapters on BraneScript itself, or even consult the language specification in the Brane: A Specification book.

The language in high-level

Calling functions

Variables

Control flow

Datasets in a workflow

Bakery Workflows

Your first workflow

Workflows and Data

Advanced workflows

Writing a full workflow: Jupyter-style

Introduction

In this series of chapters, you can find reference materials for the various configuration files used in BRANE. These are mostly relevant for system administrators, but they also include user-facing configuration files such as the container or data files.

The configuration files are ordered by user. In the admin chapters, you can find configuration files for system administrators, and in the user chapters you can find configuration files for system engineers or scientists.

Alternatively, you can use the sidebar to the left to find an overview of all configuration files.

Conventions

Throughout these chapters, we use the following convention to represent configuration files.

In most cases, configuration files are defined as either YAML or JSON. In either such case, we typically define the toplevel fields in the struct as the JSON type they are.

Configuration files for users

In this chapter, you can find an overview of the configuration files for software engineers and scientists.

There are other configuration files in BRANE, but these are for use of administrators of nodes and instances. You can find those here, or check the sidebar to the left.

In addition, there are a few configuration files that are present on the user's machine, but only relevant for the framework itself. These are discussed in the specification.

The following configuration files are relevant for users of the framework:

  • container.yml: A YAML file that defines the metadata of a BRANE package.
  • data.yml: A YAML file that defines the metadata of a BRANE dataset.

The container file

source ContainerInfo in specifications/data.rs.

The container file, or more commonly referenced as the container.yml file, is a user-facing configuration file that describes the metadata of a BRANE package. Most notably, it defines how the package's container may be built by stating which files to include, how to run the code, and which BRANE functions are contained within. Additionally, it can also carry data such as the owner of the package or any hardware requirements the package has.

Examples where simple container.ymls are written can be found in the chapters for software engineers.

Toplevel layout

The container.yml file is written in YAML. It has quite a lot of toplevel fields, so they are discussed separately in the following subsections.

First, we discuss all required fields in the first subsection. Then, in the subsequent sections, all optional fields are discussed.

Required fields

  • name: A string defining the package's identifier. Must be unique within a BRANE instance.
  • version: A string defining the package's version number. Given as three non-negative numbers, separated by a dot (e.g., 0.1.0), representing the major, minor and patch versions, respectively. Conventionally adheres to semantic versioning. Forms a unique identifier for this specific package together with the name.
  • kind: A string defining the kind of the package. For a container.yml, this must always be ecu (which stands for ExeCutable Unit).
  • entrypoint: A map that describes which file to run when any function in the package is executed. The following fields are supported:
    • kind: The kind of entrypoint. Currently, only task is supported.
    • exec: The path to the file to execute. Note that all paths are relative to the rootmost file or directory defined in the files-field.

An example of the required toplevel fields:

# Shows an absolute bare minimum header for a `hello_world` package that contributes nothing

name: hello_world
version: 1.0.0
kind: ecu

entrypoint:
  kind: task
  # This won't do anything
  exec: ":"

Extra metadata

  • owners [optional]: A sequence of strings which defines the owners/writers of the package. Omitting this field will default to no owners.
  • description [optional]: A string describing the package in more detail. This is only used for the brane inspect-subcommand (see the chapters for software engineers). If omitted, will default to an empty string / no description.

An example of using these fields:

...

owners:
- Amy
- Bob

description: |
  An example package with a lengthy description!

  We even have two lines, using YAML's bar-syntax.

Functions & Classes

  • actions [optional]: A map of strings to nested maps that specifies which functions are available to call in this package. Every key defines the name of the function, and every nested map defines what BRANE needs to know about it. The definition of this nested map is given below. Omitting the field will default to no functions defined.
  • types [optional]: A map of strings to nested maps that specifies any custom classes contributed by this package. Omitting this field will default to no such types defined. Every key defines the name of the class, and the map accepts the following possible fields:
    • properties [optional]: A map of string property names to string data types. The data types are listed in the appropriate section below. Omitting the field will default to no properties defined.
    • methods [optional]: A map of string method names to nested maps that define what BRANE needs to know about a function body. The definition of this nested map is given below, with the additional requirement that there must be at least one input argument called self that has the same type as the class of which it is part. Omitting the field will default to no methods defined.

An example of either of the two fields:

# Shows an example function that returns a "Hello, world!"-string as discussed in the chapters for software engineers

...

entrypoint:
  kind: task
  exec: hello_world.sh

actions:
  hello_world:
    output:
    - name: output
      type: string
# Shows an example function that returns a "Hello, world!"-string using commands only

...

entrypoint:
  kind: task
  exec: /bin/bash

actions:
  hello_world:
    command:
      args:
      - "-c"
      - "echo 'output: Hello, world!'"
    output:
    - name: output
      type: string
# Example that shows the definition of a HelloWorld-class that would say hello to a replacement of 'world'

...

types:
  HelloWorld:
    properties:
      world: string
    methods:
      hello_world:
        # Mandatory argument
        input:
        - name: self
          type: HelloWorld
        output:
        - name: output
          type: string

warning Older BRANE versions (<= 3.0.0) have more limited support for custom classes. First, the key is effectively ignored, and instead an additional name-field defines the string name. Second, the properties are not defined by a map but instead more like arguments (i.e., as a sequence with a separate name field). Finally, the methods-field is not supported at all. The closest alternative to the above example would thus be:

actions:
  # Replaces the method, instead requiring a manual function call
  hello_world_method:
    input:
    - name: self
      type: HelloWorld
    output:
    - name: output
      type: string

types:
  HelloWorld:
    name: HelloWorld
    properties:
    - name: world
      type: string

Container creation fields

  • base [optional]: A string describing the name of the base image for the container. Note that currently, only Debian-based images are supported (due to dependencies being installed with apt-get). If omitted, will default to ubuntu:20.04.
  • environment [optional]: A map of string environment variable names to their values to set in the container using the ENV-command. These occur first in the Dockerfile. Omitting this field means no custom environment variables will be set.
  • dependencies [optional]: A sequence of strings describing additional packages to install in the image. They should be given as package names in the repository of the base image, since they will be installed using apt-get. The installation of these occurs before any of the subsequent fields in the Dockerfile. If omitted, no custom dependencies will be installed.
  • install [optional]: A sequence of strings that defines additional commands to run in the container. Every string will be one RUN-command. Since these are placed after the dependencies-step, but before the files-step in the Dockerfile, they are conventionally used to install non-apt dependencies. If omitted, none of such RUN-steps will be added.
  • files [optional]: A sequence of strings that refer to files to copy into the container. Every entry is one file, which can either be an absolute path or a relative path. The latter will be interpreted as relative to the container.yml file itself, unless the brane build --context flag is used (see brane build --help for more information). The copy of the files will occur after the install-steps, but before the postinstall-steps. If omitted, no files will be copied.
  • postinstall (or unpack) [optional]: A sequence of strings that defines additional commands to run in the container after the files have been copied. Every string will be one RUN-command. Since these are placed after the files-step in the Dockerfile, they are conventionally used to post-process the source files, such as unpacking archives, downloading additional files or executing Pipfiles. If omitted, none of such RUN-steps will be added.

A few examples of the above fields:

# Shows a typical installation for a Hello-World python script

...

dependencies:
- python3
- python3-pip

install:
- pip3 install yaml

files:
- hello_world.py

...
# Shows a more complex example where we use a new Ubuntu version and postinstall from a requirements.txt

...

base: ubuntu:22.04

dependencies:
- python3
- python3-pip

files:
- hello_world.py
- requirements.txt

postinstall:
- pip3 install -r requirements.txt

...

Function layout

The following fields define a function layout:

  • requirements [optional]: A sequence of strings that defines hardware capabilities that are required for this package. An overview of the possible capabilities can be found here.
  • command [optional]: A map that can modify the command to call the file defined in the entrypoint toplevel field. Omitting it implies that no additional arguments should be passed. It has the following two possible fields:
    • args: A sequence of strings that defines the arguments to pass to the file. Conventionally, this is used to distinguish between the various functions in the file (since there is only one entrypoint).
    • capture [optional]: A string that defines the possible modes of capturing the entrypoint's output. The output should be a YAML file, but only that defined by the capture is identified as such. Possible options are: complete to capture the entire stdout; marked to capture everything in between a --> START CAPTURE-line and an --> END CAPTURE-line (not including those markers, and only once); or prefixed to capture every line that starts with ~~> (which is stripped away). Omitting the capture-field, or omitting the command-field altogether, will default to the complete capture mode.
  • input [optional]: A sequence of maps that defines the input arguments to the function. The order of the sequence determines the order of the arguments in the workflow language. Omitting the sequence defaults to an empty sequence, i.e., no input arguments taken. The following fields can be used in each nested map:
    • name: A string name of the argument. This will define the name of the environment variable that the entrypoint executable can read to obtain the given value for this argument. It is not a one-to-one mapping; instead, the environment variable has the same name but in UPPERCASE. In addition, this name will also be used in BraneScript error messages when relevant.
    • type: A string describing the BRANE data type of the input argument. The possible types are defined in the relevant section below.
  • output [optional]: A sequence of maps that defines the possible output values of a function. The nested maps have the same layout as for the input-field (see above), except that the names of the values are used as field names in the YAML outputted by the package code. The order of the sequence determines the order of the returned values in the workflow language. Omitting the sequence defaults to an empty sequence, i.e., returning a void-value.

    warning Older BRANE versions (<= 3.0.0) do not support more than one output value, even though they do require a YAML map to be passed. In other words, the sequence cannot be longer than one entry.

  • description [optional]: An additional description for this specific function. This is only used for the brane inspect-subcommand (see the chapters for software engineers). If omitted, will default to an empty string / no description.

For examples, see the Functions & Classes section.
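
To give a feel for how these fields fit together, below is a minimal sketch of a single function definition inside the actions-map. The function name, argument names and values are made up for illustration:

# A hypothetical 'add' function that sums two integers
actions:
  'add':
    # Pass an argument so the entrypoint script knows which function is being called
    command:
      args:
      - add
      # Only capture lines starting with '~~>' as the output YAML
      capture: prefixed

    # Two integer inputs; the entrypoint reads these as the environment variables LHS and RHS
    input:
    - name: lhs
      type: int
    - name: rhs
      type: int

    # One integer output, read from the 'sum' field of the captured YAML
    output:
    - name: sum
      type: int

    description: Adds two integers together.

With the prefixed capture mode, the entrypoint would then emit a line such as ~~> sum: 3 on stdout to return its result.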

Data types

source DataType in brane_ast/data_type.rs.

BRANE abstracts the various workflow languages it accepts as input to a common representation. This representation is what is referred to in the container.yml file when we are talking about data types, and so these data types are language-agnostic.

The following identifiers can be used to refer to certain data types:

  • bool or boolean refers to a boolean value.
  • int or integer refers to a whole number (64-bit integer).
  • float or real refers to a floating-point number (64-bit float).
  • string refers to a string value.
  • Any other alphanumerical identifier is interpreted to be a custom class name (see the toplevel types-field).
  • Any of the above can be wrapped in square brackets (e.g., [int]) or suffixed by square brackets (e.g., int[]) to define an array of the wrapped/suffixed type.
    • Nested arrays are possible (e.g., [[float]] or float[][]).

For examples, see the Functions & Classes section.
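
As a quick illustration, the following hypothetical input definition uses several of these identifiers (the argument names and the Matrix class are made up; the class would have to be declared under the toplevel types-field):

input:
- name: threshold
  type: real
- name: labels
  # Arrays can be written with surrounding or trailing brackets; quotes prevent YAML from parsing this as a list
  type: "[string]"
- name: points
  type: Matrix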

Full example

A full example of a container.yml file, taken from the Hello, world!-tutorial:

# Define the file metadata
# Note the 'kind', which defines that it is an Executable Code Unit (i.e., runs arbitrary code)
name: hello_world
version: 1.0.0
kind: ecu

# Specify the files that are part of the package. All entries will be resolved relative to the container.yml file (by default)
files:
- hello_world.sh

# Define the entrypoint: i.e., which file to call when the package function(s) are run
entrypoint:
  kind: task
  exec: hello_world.sh

# Define the functions in this package
actions:
  # We only have one: the 'hello_world()' function
  'hello_world':
    # We define the output: a string value, which will be read from the returned YAML under the 'output' key.
    output:
    - type: string
      name: output

The data file

source AssetInfo in specifications/data.rs.

The data file, or more commonly referenced as the data.yml file, is a user-facing configuration file that describes the metadata of a BRANE dataset. Most notably, it defines how the dataset can be referenced in BraneScript (i.e., its identifier) and describes which files or other resources actually make up the dataset.

Examples where simple data.ymls are written can be found in the chapters for scientists.

Toplevel layout

The data.yml file is written in YAML. It has the following toplevel fields:

  • name: The name/identifier of the dataset. This will be used in the workflow language to refer to it, and must thus be unique within an instance.
  • access: A map that describes how the dataset may be accessed. The map has multiple variants in principle, although currently only the !file-variant is supported:
    • path: A string that refers to a file or folder that has the actual dataset. Can be absolute or relative, where the latter case is interpreted as relative to the data.yml file itself (unless brane data build --context changes it; see brane data build --help for more information). The pointed file or folder will be attached to containers as-is.
  • owners [optional]: A sequence of strings which defines the owners/writers of the dataset. Omitting this field will default to no owners.
  • description [optional]: A string describing the dataset in more detail. This is only used for the brane data inspect-subcommand. If omitted, will default to an empty string / no description.

For example, the following defines a data.yml file for a simple CSV file called jedis.csv:

name: jedis
description: A simple CSV file listing Jedis that survived Order 66.
owners:
- Sheev Palpatine
access: !file
  path: ./jedis.csv

Configuration files for administrators

In this chapter, you can find an overview of the configuration files for administrators.

There are also a few configuration files not mentioned here, which are user-facing. You can find them in these chapters instead, or check the sidebar on the left.

The configuration files for administrators are sorted by node type. The files are referenced by their canonical name.

Control node

  • infra.yml: A YAML file that defines the worker nodes in the instance represented by the control node.
  • proxy.yml: A YAML file that defines the proxy settings for outgoing node traffic. Can also be found on the worker and proxy nodes.
  • node.yml: A YAML file that defines the environment settings for this node, such as paths of the directories and the other configuration files, ports, hostnames, etc. Can also be found on the worker and proxy nodes.

Worker node

  • backend.yml: A YAML file that defines how the worker node connects to the container execution backend. More information about the specific backends can be found in the chapters for administrators.
  • policies.yml: A YAML file that defines any access control rules for which containers may be executed, and which datasets may be downloaded by whom. For more information, see the chapters for policy experts.
  • data.yml: A YAML file that defines the layout of a dataset that is published on a worker node. Is the same file as used by users.
  • proxy.yml: A YAML file that defines the proxy settings for outgoing node traffic. Can also be found on the central and proxy nodes.
  • node.yml: A YAML file that defines the environment settings for this node, such as paths of the directories and the other configuration files, ports, hostnames, etc. Can also be found on the central and proxy nodes.

Proxy node

  • proxy.yml: A YAML file that defines the proxy settings for outgoing node traffic. Can also be found on the central and worker nodes.
  • node.yml: A YAML file that defines the environment settings for this node, such as paths of the directories and the other configuration files, ports, hostnames, etc. Can also be found on the central and worker nodes.

The infrastructure file

source InfraFile in brane_cfg/infra.rs.

The infrastructure file, or more commonly referenced as the infra.yml file, is a control node configuration file that is used to define the worker nodes that are part of a particular BRANE instance. Its location is defined by the node.yml file.

The branectl tool can generate this file for you, using the branectl generate infra subcommand. See the chapter on installing a control node for a realistic example.

Toplevel layout

The infra.yml file is written in YAML. It features only the following toplevel field:

  • locations: A map that details the nodes present in the instance. It maps from strings, representing the node identifiers, to another map with three fields:
    • name: Defines a human-friendly name for the node. This is only used on the control node, and only to make some logging messages nicer; there are therefore no constraints on this name.
    • delegate: The address of the delegate service (i.e., brane-job) on the target worker node. Must be given using a scheme (either http or grpc), an IP address or hostname and a port.
    • registry: The address of the local registry service (i.e., brane-reg) on the target worker node. Must be given using a scheme (https), an IP address or hostname and a port.

For example, the following defines an infra.yml file for two workers, amy at amy-worker-node.com and bob at 1.2.3.4:

locations:
  # Amy's node
  amy:
    name: Amy's Worker Node
    delegate: grpc://amy-worker-node.com:50052
    registry: https://amy-worker-node.com:50051

  # Bob's node
  bob:
    name: Bob's Worker Node
    delegate: http://1.2.3.4:1234
    registry: https://1.2.3.4:1235

The backend file

source BackendFile in brane_cfg/backend.rs.

The backend file, or more commonly referenced as the backend.yml file, is a worker node configuration file that describes how to connect to the container execution backend. Its location is defined by the node.yml file.

The branectl tool can generate this file for you, using the branectl generate backend subcommand. See the chapter on installing a worker node for a realistic example.

Toplevel layout

The backend.yml file is written in YAML. It features only the following three toplevel fields:

  • method: A map that defines the method of accessing the container execution backend. Can be one of the following options:
    • !Local: Connects to the Docker engine local to the node on which the worker node runs. This variant has the following two fields:
      • path [optional]: A string with the path to the Docker socket to connect to. If omitted, will default to /var/run/docker.sock.
      • version [optional]: A sequence of two numbers detailing the Docker client version to connect to. If omitted, will negotiate a client version on the fly.
  • capabilities [optional]: A sequence of strings, each of which defines a capability of the computing backend. Currently supported capabilities are defined below. If omitted, no capabilities are enabled.
  • hash_containers [optional]: A boolean that defines whether to hash containers, which enables container policies in the policies.yml file. Disabling it may give a significant performance boost when using many different, large containers (100MB+), although hashes are cached for as long as the containers themselves are cached. If omitted, will default to 'true'.

A few example backend.yml files:

# Defines the simplest possible file, which is a local file with default options
method: !Local
# Defines a local file that has a different Docker socket path
method: !Local
  path: /home/amy/my-own-docker.sock
# Defines a default local backend that supports CUDA containers and explicitly hashes all containers
capabilities:
- cuda_gpu
method: !Local
hash_containers: true

warning In older versions of BRANE (<= 2.0.0), the tagged enum representation (e.g., !Local) was not yet supported. Instead, use the additional kind-field to distinguish. For example:

# This is the same as the first example
method:
  kind: local
# This is the same as the second example
method:
  kind: local
  path: /home/amy/my-own-docker.sock

...

Capabilities

The following capabilities can be used in the backend.yml file:

  • cuda_gpu: States that the compute backend can provide a CUDA accelerator to containers that request it. See the requirements-field in the user's container.yml file.
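
On the package side, a function can then request such a capability through its requirements-field in the container.yml file. A minimal sketch, assuming a hypothetical train() function:

actions:
  'train':
    # This function will only be scheduled on backends that advertise the 'cuda_gpu' capability
    requirements:
    - cuda_gpu

    output:
    - name: accuracy
      type: real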

The data file

source AssetInfo in specifications/data.rs.

The data file, or more commonly referenced as the data.yml file, is a user-facing configuration file that describes the metadata of a BRANE dataset. Most notably, it defines how the dataset can be referenced in BraneScript (i.e., its identifier) and describes which files or other resources actually make up the dataset.

Examples where simple data.ymls are written can be found in the chapters for scientists.

Toplevel layout

The data.yml file is written in YAML. It has the following toplevel fields:

  • name: The name/identifier of the dataset. This will be used in the workflow language to refer to it, and must thus be unique within an instance.
  • access: A map that describes how the dataset may be accessed. The map has multiple variants in principle, although currently only the !file-variant is supported:
    • path: A string that refers to a file or folder that has the actual dataset. Can be absolute or relative, where the latter case is interpreted as relative to the data.yml file itself (unless brane data build --context changes it; see brane data build --help for more information). The pointed file or folder will be attached to containers as-is.
  • owners [optional]: A sequence of strings which defines the owners/writers of the dataset. Omitting this field will default to no owners.
  • description [optional]: A string describing the dataset in more detail. This is only used for the brane data inspect-subcommand. If omitted, will default to an empty string / no description.

For example, the following defines a data.yml file for a simple CSV file called jedis.csv:

name: jedis
description: A simple CSV file listing Jedis that survived Order 66.
owners:
- Sheev Palpatine
access: !file
  path: ./jedis.csv

The proxy file

source ProxyConfig in brane_cfg/proxy.rs.

The proxy file, or more commonly referenced as the proxy.yml file, is a central-, worker- and proxy node configuration file that describes how to deal with outgoing connections out of the node. For more information, see the documentation for the brane-prx service. Its location is defined by the node.yml file.

The branectl tool can generate this file for you, using the branectl generate proxy subcommand. See the chapter on installing a control node for a realistic example.

Toplevel layout

The proxy.yml file is written in YAML. It features only the following three toplevel fields:

  • outgoing_range: A map that defines the range of ports that can be allocated when other BRANE services request new outgoing connections. This range should be sufficiently large to support at least two connections to every worker node that this node will talk to (which, in the case of a central node('s proxy node), is all worker nodes). The map has the following two fields:
    • start: A positive number indicating the start port (inclusive).
    • end: A positive number indicating the end port (inclusive).
  • incoming: A map that maps incoming ports to BRANE service addresses for incoming connections. Specifically, every key is a number indicating the port that can be connected to, where the connection will then be forwarded to the address specified in the value. Must be given using a scheme, an IP address or hostname and a port.
  • forward [optional]: A map that carries any configuration for forwarding traffic through a SOCKS proxy. Specifically, it is a map with the following fields:
    • address: The address to forward the traffic to. Must be given using a scheme (either socks5 or socks6), an IP address or hostname and a port.
    • protocol: The protocol to use for forwarding traffic. Can be either socks5 or socks6 to use the SOCKS protocol version 5 or 6, respectively.

The following are examples of valid proxy.yml files:

# This is a minimal example, supporting up to ~50 worker nodes
outgoing_range:
  start: 4200
  end: 4299
incoming: {}
# A more elaborate example mapping a few incoming ports as well
outgoing_range:
  start: 4200
  end: 4299
incoming:
  5200: http://brane-api:50051
  5201: grpc://brane-drv:50053
# An example where we route some network traffic
outgoing_range:
  start: 4200
  end: 4299
incoming: {}
forward:
  address: socks5://socks-proxy.net:1234
  protocol: socks5

info The protocol-field in the forward-map may become obsolete in future versions of BRANE if we apply stricter restrictions in code on the protocol used in the address-field. You can already ease the transition by being careful which protocol you use.

The node file

source NodeConfig in brane_cfg/node.rs.

The node file, or more commonly referenced as the node.yml file, is a central-, worker- and proxy node configuration file that describes the environment in which the node should run. Most notably, it defines the type of node, where any BRANE software (branectl, services) may find other configuration files and which ports to use for all of the services.

The branectl tool can generate this file for you, using the branectl generate node subcommand. See the chapter on installing a control node for a realistic example.

Toplevel layout

The node.yml file is written in YAML. It defines only two toplevel fields:

  • hostnames: A map of strings to other strings, which maps hostnames to IP addresses. This is used to work around the issue that certificates cannot be issued for raw IP addresses alone, and need a hostname instead. The hostnames can be defined in this map to make them available to all the services running in this node. For more information, see the chapter on installing a control node (at the end).
  • node: A map that has multiple variants based on the specific node configuration. These are all treated below in their own sections.

An example of just the toplevel fields would be:

# We don't define any hostnames
hostnames: {}
node: ...
  ...
# This example allows us to use `amy-worker-node.com` on any of the services to refer to `4.3.2.1`
hostnames:
  amy-worker-node.com: 4.3.2.1
node: ...
  ...

Because there are quite a lot of nested fields, we will discuss the various variants of the node-map in subsequent sections.

Central nodes

source CentralConfig in brane_cfg/node.rs.

The first variant of the node-map is the !central variant, which defines a central node. There are two fields in this map:

  • paths: A map that defines all paths relevant to the central node. Specifically, it maps a string identifier to a string path. The following identifiers are defined:

    • certs: The path to the directory with certificate authority files for the worker nodes in the instance. See the chapter on installing a control node for more information.
    • packages: The path to the directory where uploaded packages will be stored. This should be a persistent directory, or at the very least exactly as persistent as the storage of the instance's Scylla database.
    • infra: The path to the infra.yml configuration file.
    • proxy: The path to the proxy.yml configuration file.

    warning Note that all paths defined in the node.yml file must be absolute paths, since they are mounted as Docker volumes.

  • services: A map that defines the service containers in the central node and how they are reachable. It is a map of a service identifier to one of three possible maps: a private service, a public service or a variable service. Each of these is explained at the end of the chapter.
    The following identifiers are available:

An example illustrating just the central node:

...

node: !central
  paths:
    # Note all paths are full, absolute paths
    certs: /home/amy/config/certs
    packages: /home/amy/packages
    infra: /home/amy/config/infra.yml
    proxy: /home/amy/config/proxy.yml

  services:
    api:
      ...
    drv:
      ...
    # (We can also use the aliases, if we like)
    planner:
      ...
    proxy: ...
      ...

    aux_scylla:
      ...
    aux_kafka:
      ...
    zookeeper:
      ...

Worker nodes

source WorkerConfig in brane_cfg/node.rs.

The second variant of the node-map is the !worker variant, which defines a worker node. There are four fields in this map:

  • name (or location_id): A string that contains the identifier used to recognize this worker node throughout the system.

  • usecases (or use_cases): A map of string identifiers to worker usecases. This essentially defines the central instances that the worker trusts and is aware of, mapping each identifier to the address where that instance's registry can be found.

  • paths: A map that defines all paths relevant to the worker node. Specifically, it maps a string identifier to a string path. The following identifiers are defined:

    • certs: The path to the directory with certificate authority files for the worker nodes in the instance. See the chapter on installing a control node for more information.
    • packages: The path to the directory where uploaded packages will be stored. This should be a persistent directory, or at the very least exactly as persistent as the storage of the instance's Scylla database.
    • backend: The path to the backend.yml configuration file.
    • policy_database (or policy_db): The path to the policies.db file that is the persistent storage for the policies of the worker's brane-chk service.
    • policy_deliberation_secret: The path to a JWK that defines the secret used for brane-chk's deliberation API.
    • policy_expert_secret: The path to a JWK that defines the secret used for brane-chk's policy expert (management) API.
    • policy_audit_log: An optional path to a file to which the brane-chk service writes its audit log. If omitted, the audit log only exists within the brane-chk container.
    • proxy: The path to the proxy.yml configuration file.
    • data: The path to the directory where datasets may be defined that are available on this node. See data.yml for more information.
    • results: The path to a directory where intermediate results are stored that are created on this node. It does not have to be persistent per se, although the services will assume it is persistent for the duration of a workflow execution.
    • temp_data: The path to a directory where datasets are stored that are downloaded from other nodes. It does not have to be a persistent folder.
    • temp_results: The path to a directory where intermediate results are stored that are downloaded from other nodes. It does not have to be a persistent folder.

    warning Note that all paths defined in the node.yml file must be absolute paths, since they are mounted as Docker volumes.

  • services: A map that defines the service containers in the worker node and how they are reachable. It is a map of a service identifier to one of three possible maps: a private service, a public service or a variable service. Each of these is explained at the end of the chapter.
    The following identifiers are available:

An example illustrating just the worker node:

...

node: !worker
  paths:
    # Note all paths are full, absolute paths
    certs: /home/amy/config/certs
    packages: /home/amy/packages
    backend: /home/amy/config/backend.yml
    policy_database: /home/amy/policies.db
    policy_deliberation_secret: /home/amy/config/policy_delib_secret.json
    policy_expert_secret: /home/amy/config/policy_expert_secret.json
    policy_audit_log: /home/amy/checker-audit.log     # May be omitted!
    proxy: /home/amy/config/proxy.yml
    data: /home/amy/data
    results: /home/amy/results
    temp_data: /tmp/data
    temp_results: /tmp/results

  services:
    reg:
      ...
    job:
      ...
    # (We can also use the aliases, if we like)
    checker:
      ...
    proxy: ...
      ...

Proxy nodes

source ProxyConfig in brane_cfg/node.rs.

The third variant of the node-map is the !proxy variant, which defines a proxy node. There are two fields in this map:

  • paths: A map that defines all paths relevant to the proxy node. Specifically, it maps a string identifier to a string path. The following identifiers are defined:

    • certs: The path to the directory with certificate authority files for the worker nodes in the instance. See the chapter on installing a control node for more information.
    • proxy: The path to the proxy.yml configuration file.

    warning Note that all paths defined in the node.yml file must be absolute paths, since they are mounted as Docker volumes.

  • services: A map that defines the service containers in the proxy node and how they are reachable. It is a map of a service identifier to a public service. This is explained at the end of the chapter.
    The following identifiers are available:

    • prx (or proxy): Defines the brane-prx container as a public service (note: this is different from the other node types).

An example illustrating just the proxy node:

...

node: !proxy
  paths:
    # Note all paths are full, absolute paths
    certs: /home/amy/config/certs
    proxy: /home/amy/config/proxy.yml

  services:
    prx:
      ...

Service maps

Through the various node variants, a few types of service maps appear. In this section, we will define their layouts.

Private services

source PrivateService in brane_cfg/node.rs.

A private service represents a service that is only accessible for other BRANE services, but not from outside of the Docker network. A few examples of such services are aux-scylla or aux-kafka.

Private services have three fields:

  • name: A string with the name of the Docker container. This can be anything, but by convention, this is brane- followed by the ID of the service (e.g., brane-prx or brane-api). On worker nodes, this may optionally be suffixed by the name of the worker (e.g., brane-reg-bob), and on proxy nodes, this may be suffixed by proxy (e.g., brane-prx-proxy). Finally, third-party services are often named aux- and then the service ID instead of brane- (e.g., aux-scylla).
  • address: A string with the address that other services running on this node can use to reach this service. Because this only applies to services in the same Docker network, you can use Docker DNS names (e.g., you can use aux-scylla as a hostname to refer to a container with the same name).
  • bind: A string with the socket address (and port) that the service should bind to, i.e., listen on. The port should match the one given in address.

An example showing a private service:

...

node: !central
  services:
    # The type of service is hardcoded for every node, so no need for the tags (e.g., `!kafka`)
    aux_scylla:
      name: aux-scylla
      # The Scylla image listens on port 9042 by default, so we might as well use that
      address: aux-scylla:9042
      # Accepts any connection
      bind: 0.0.0.0:9042

  ...

warning Note that providing 127.0.0.1 as a bind address will not work, since 127.0.0.1 refers to the container and not the host. Thus, using that address will make the service inaccessible for everyone.

Public services

source PublicService in brane_cfg/node.rs.

A public service represents a service that is accessible for other BRANE services and from outside of the Docker network. A few examples of such services are brane-drv or brane-reg.

Public services have four fields:

  • name: A string with the name of the Docker container. This can be anything, but by convention, this is brane- followed by the ID of the service (e.g., brane-prx or brane-api). On worker nodes, this may optionally be suffixed by the name of the worker (e.g., brane-reg-bob), and on proxy nodes, this may be suffixed by proxy (e.g., brane-prx-proxy). Finally, third-party services are often named aux- and then the service ID instead of brane- (e.g., aux-scylla).
  • address: A string with the address that other services running on this node can use to reach this service. Because this only applies to services in the same Docker network, you can use Docker DNS names (e.g., you can use brane-drv as a hostname to refer to a container with the same name).
  • bind: A string with the socket address (and port) that the service should bind to, i.e., listen on. The port should match the one given in address.
  • external_address: A string with an address that services running on other nodes can use to reach this service. Specifically, this is the address that the node will send to other nodes as a kind of calling card, i.e., an address where they can be reached.

    info Because this is just an advertised address, this address can be used to connect through a gateway (or proxy node) that then redirects the traffic to the correct machine and port.

An example showing a public service:

...

node: !central
  services:
    # The type of service is hardcoded for every node, so no need for the tags (e.g., `!kafka`)
    api:
      name: brane-api
      address: http://brane-api:50051
      # Accepts any connection
      bind: 0.0.0.0:50051
      # In this example, we are running on node `amy` living at `amy-central-node.com`
      external_address: http://amy-central-node.com:50051

  ...

warning Note that providing 127.0.0.1 as a bind address will not work, since 127.0.0.1 refers to the container and not the host. Thus, using that address will make the service inaccessible for everyone.

Variable services

source PrivateOrExternalService in brane_cfg/node.rs.

A variable service is one where a choice can be made between two different kinds of services. Specifically, one can choose to either host a private service, or something called an external service, which defines a service hosted on another node or machine. This is currently only used by the brane-prx service in central or worker nodes, to support optionally outsourcing the proxy service to a dedicated node.

Subsequently, there are two variants of this type of service:

  • !private: Defines a private service map that describes how to host the service. This is exactly identical to the private service other than the tag.

  • !external: Defines an externally running service. It has one field only:

    • address: A string with the address where all the other services on this node should send their traffic to.

    (source The external map variant is defined as ExternalService in brane_cfg/node.rs.)

A few examples of variable services:

# Example that shows the private variant of the variable service.

...

node: !worker
  # Note that this is just a private service
  prx: !private
    name: brane-prx
    address: brane-prx:50050
    bind: 0.0.0.0:50050

  ...
# Example that shows the external variant of the variable service

...

node: !worker
  # We refer to a node living at the host `amy-proxy-node.com`
  prx: !external
    address: amy-proxy-node.com:50050

  ...

Full examples

Finally, we show a few full examples of node.yml files.

# Shows a full central node
hostnames: {}
node: !central
  paths:
    # Note all paths are full, absolute paths
    certs: /home/amy/config/certs
    packages: /home/amy/packages
    infra: /home/amy/config/infra.yml
    proxy: /home/amy/config/proxy.yml

  services:
    api:
      name: brane-api
      address: http://brane-api:50051
      # Accepts any connection
      bind: 0.0.0.0:50051
      # In this example, we are running on node `amy` living at `amy-central-node.com`
      external_address: http://amy-central-node.com:50051
    drv:
      name: brane-drv
      address: http://brane-drv:50053
      bind: 0.0.0.0:50053
      external_address: http://amy-central-node.com:50053
    # (We can also use the aliases, if we like)
    planner:
      name: brane-plr
      address: http://brane-plr:50052
      bind: 0.0.0.0:50052
    # (Shows the private variant of the proxy service)
    proxy: !private
      name: brane-prx
      address: brane-prx:50050
      bind: 0.0.0.0:50050

    aux_scylla:
      name: aux-scylla
      address: aux-scylla:9042
      bind: 0.0.0.0:9042
# Shows a full worker node, with a hostname mapping for `amy-worker-node.com`
hostnames:
  amy-worker-node.com: 4.3.2.1
node: !worker
  name: amy-worker-node
  usecases:
    central:
      api: http://amy-central-node.com:50051
  paths:
    # Note all paths are full, absolute paths
    certs: /home/amy/config/certs
    packages: /home/amy/packages
    backend: /home/amy/config/backend.yml
    policy_database: /home/amy/policies.db
    policy_deliberation_secret: /home/amy/config/policy_delib_secret.json
    policy_expert_secret: /home/amy/config/policy_expert_secret.json
    policy_audit_log: /home/amy/checker-audit.log
    proxy: /home/amy/config/proxy.yml
    data: /home/amy/data
    results: /home/amy/results
    temp_data: /tmp/data
    temp_results: /tmp/results

  services:
    reg:
      name: brane-reg-amy
      address: http://brane-reg:50051
      bind: 0.0.0.0:50051
      external_address: http://amy-worker-node.com:50051
    job:
      name: brane-job-amy
      address: http://brane-job:50052
      bind: 0.0.0.0:50052
      external_address: http://amy-worker-node.com:50052
    # (We can also use the aliases, if we like)
    checker:
      name: brane-chk-amy
      address: http://brane-chk:50053
      bind: 0.0.0.0:50053
    # (Shows the external variant of the proxy service)
    proxy: !external
      address: amy-proxy-node.com:50050
# Shows a full proxy node
hostnames: {}
node: !proxy
  paths:
    # Note all paths are full, absolute paths
    certs: /home/amy/config/certs
    proxy: /home/amy/config/proxy.yml

  services:
    # The proxy node uses a hardcoded public service
    proxy:
      name: brane-prx-proxy
      address: http://brane-prx:50050
      bind: 0.0.0.0:50050
      external_address: http://amy-proxy-node.com:50050

The Policy File

warning This page is for the deprecated method of entering policies into the system using a policies.yml file. A better method (involving eFLINT) is implemented through the policy-reasoner project.

Brane used to read its policies from a so-called policy file (also known as policies.yml) which defines a very simplistic set of access-control policies.

Typically, there is one such policy file per domain, which instructs the "reasoner" for that domain what it should allow and what not.

In this chapter, we discuss how one might write such a policy file. In particular, we will discuss the general layout of the file, and then the two kinds of policies currently supported: user policies and container policies.

Overview

The policies.yml file is written in YAML for the time being.

It has two sections, each of them corresponding to a kind of policy (users and containers, respectively). Each section is then a simple list of rules. At runtime, the framework will consider the rules top-to-bottom, in order, to find the first rule that says something about the user/dataset pair or the container in question. A full list of available policies can be found below.

Before that, we will first describe the kinds of policies in some more detail in the following sections.

User policies

User policies concern themselves with what a user may access and, specifically, which datasets they may access. These policies thus always describe some kind of rule on a pair of a user (known by their ID) and a dataset (also known by its ID).

As a policy expert, you may assume that by the time your policy file is consulted, the framework has already verified the user's ID. As for datasets, your policies are only consulted when data is accessed on your own domain, and so you can also assume that dataset IDs used correspond to the desired dataset.

Note that deciding which user IDs and dataset IDs to use should be done in cooperation with the system administrator of your domain. Currently, the framework doesn't provide a safe way of communicating which IDs are available to the policy file, so you will have to retrieve the up-to-date list of IDs the old-fashioned way.

Container policies

Container policies concern themselves with which container is allowed to be run at a certain domain. Ideally, these would be triplets of users, datasets and containers - but due to time constraints, they currently only feature a container hash (i.e., its ID) that determines whether the container is allowed to be executed or not.

Because the ID of a container is a SHA256-hash, you can safely assume that whatever container you're referencing actually is the container with the properties you know of it. However, similarly to user policies, there is no list available in the framework itself of known container hashes; thus, this list must be obtained by asking the system's administrator or, perhaps more relevantly, a scientist who wants to run their container.

Policies

In this section, we describe the concrete policies and their syntax. Remember that policies are checked in-order for a matching rule, and that the framework will throw an error if no matching rule is found.

In general, there are two possible actions to be taken for a given request: allow it, in which case the framework proceeds, or deny it, in which case the framework aborts the request. For each of those actions, though, there are multiple ways of matching a user/dataset pair or a container hash, which results in the different policies described below.

Syntax-wise, the policies are given as a vector of dictionaries, where each dictionary is a policy. Then, every such dictionary must always have the policy key, which denotes its type (see the two sections below). Any other key is policy-dependent.

User policies

The following policies are available for user/dataset pairs:

  • allow: Matches a specific user/dataset pair and allows it.
    • user: The identifier of the user to match.
    • data: The identifier of the dataset to match.
  • deny: Matches a specific user/dataset pair and denies it.
    • user: The identifier of the user to match.
    • data: The identifier of the dataset to match.
  • allow_user_all: Matches all datasets for the given user and allows them.
    • user: The identifier of the user to match.
  • deny_user_all: Matches all datasets for the given user and denies them.
    • user: The identifier of the user to match.
  • allow_all: Matches all user/dataset pairs and allows them.
  • deny_all: Matches all user/dataset pairs and denies them.

Container policies

The following policies are available for containers:

  • allow: Matches a specific container hash and allows it.
    • hash: The hash of the container to match.
    • name (optional): A human-friendly name for the container (no effect on policy, but for debugging purposes).
  • deny: Matches a specific container hash and denies it.
    • hash: The hash of the container to match.
    • name (optional): A human-friendly name for the container (no effect on policy, but for debugging purposes).
  • allow_all: Matches all container hashes and allows them.
  • deny_all: Matches all container hashes and denies them.

Example

The following snippet is an example policy file:

# The user policies
users:
# Allow the user 'Amy' to access the datasets 'A', 'B', but not 'C'
- policy: allow
  user: Amy
  data: A
- policy: allow
  user: Amy
  data: B
- policy: deny
  user: Amy
  data: C

# Specifically deny access to `Dan` to do anything
- policy: deny_user_all
  user: Dan

# For any other case, we deny access
- policy: deny_all



# The container policies
containers:
# We allow the `hello_world` container to be run
- policy: allow
  hash: "GViifYnz2586qk4n7fdyaJB7ykASVuptvZyOpRW3E7o="
  name: hello_world

# But not the `cat` container
- policy: deny
  hash: "W5WS23jAAtjatN6C5PQRb0JY3yktDpFHnzZBykx7fKg="
  name: cat

# Any container not matched is allowed (bad practice, but to illustrate)
- policy: allow_all

Introduction

In this chapter, we will provide a brief overview of the different packages that Brane supports.

Overview

To provide as versatile and easy-to-use an interface as possible, Brane has different ways of defining packages. Apart from just being able to execute arbitrary code, it also supports performing requests according to the OpenAPI standard and (in the future) will support publishing Common Workflow Language workflows as packages as well.

Concretely, the different types that are supported are:

  • Executable Code Unit (ecu) packages are containers containing arbitrary code that is run via the branelet wrapper.
  • OpenAPI Standard (oas) packages are packages that make API requests defined in the OpenAPI format. It is, once again, the branelet executable that performs these calls.

drawing We are working on adding other package formats, which will be added to this list in the future. One prominent technology that we would like to add is support for the Common Workflow Language, and another is publishing Brane's DSLs (BraneScript and Bakery) as packages as well.

In the subsequent chapters, we will document the exact workings of each supported package kind. The next chapter starts with a documentation of the Executable Code Unit packages, but you can skip to others by using the sidebar to the left.

container.yml documentation

drawing Documentation for container.yml will be added soon.

Introduction

The whole Brane framework revolves around workflows, which define how package functions need to be called, in what order and with what data.

Because Brane aims to be easily accessible by multiple roles (the famous separation of concerns), it provides two Domain-Specific Languages (DSLs) that can be used to write workflows: BraneScript and Bakery.

Under the hood, these languages translate to the same code, and thus have the same semantics (i.e., meaning behind the code). Their syntax, however, is different: BraneScript resembles classical scripting languages (such as Bash or Lua) and aims to be convenient to use for software engineers; Bakery, in contrast, is designed to be more "natural language-like", to help scientists without much programming experience to understand the code they are writing or that someone else wrote.

In this series of chapters, we will be focussing on BraneScript and its syntax. For Bakery, you should refer to its own series of chapters.

Concept & Terminology

As stated, BraneScript is designed as a workflow specification. This means that the real work of any BraneScript file is not performed within the domain of BraneScript, but rather in the domain of the package functions that BraneScript calls. It only acts as a way to "glue" all these functions together and show the result(s) to the caller of the workflow.

In these few chapters, we will refer to BraneScript files as both workflows and scripts, making the terms interchangeable for the purpose of this documentation. Anything that the workflows call is referred to as package functions or external functions, which are implemented by deploying the package container and running the appropriate function therein.

info In the code snippets in these chapters, we will use text enclosed in triangular brackets (<example>) to define parts of the syntax that are variable. For example, Hello <world> means that there must be a token Hello, followed by some arbitrary token that we will name world for being able to reference it.

Structure

This series aims to be a comprehensive introduction to BraneScript's features, much more elaborate than given in the chapters for Scientists. It will list all of BraneScript's language features in a tutorial-like fashion, assuming minimal programming experience in languages such as Python, Lua, C or Java.

In the first chapter, we will write a simple "Hello, world!" workflow to get your feet wet, and to practise submitting workflows. Then, chapter two covers the basic language concepts, such as variables, control flow and builtin functions.

Next

You can start this series by reading the next chapter, which will set you on your journey. It is recommended to follow the chapters in-order if it is your first time reading about BraneScript, but you can also jump between them using the sidebar on the left.

Alternatively, if you are looking for more technical details on how the BraneScript language is specified, we recommend you to inspect the specifications of the language in the Brane: A Specification book.

Writing a full workflow

In Brane, the role of a scientist is to write workflows: high-level descriptions of some algorithm or other program that implements some goal. They are high-level in the sense that Brane tries to handle the tedious work, such as choosing the best location for each task, moving datasets around or even applying optimisations.

In this chapter, we describe the basics of writing workflows. Specifically, we will discuss how to write a workflow to run the hello_world package and print its output (first section), as well as how to run it or submit it to a remote instance (second section). Finally, we also briefly discuss the Read-Eval-Print Loop (REPL), which provides a more interactive way of running workflows (last section).

Writing a workflow

To write a workflow, all you have to do is open a plain text file with your favourite text editor. The name does not matter, but it is conventional to have it end in .bs or .bscript if you are writing in BraneScript. For this tutorial, we will use the name: hello_world.bs.

Next, it is good practice to write a header comment to explain what a file does. This gives us a good excuse to talk about BraneScript comments: everything after a double slash (//) is considered a comment. So, a documentation header for this workflow might look like:

// HELLO WORLD.bs
//   by Rick Sanchez
// 
// A BraneScript workflow for printing "Hello, world!" to the screen.

Next, we have to tell the Brane system which packages we want to use in our workflow. To do so, we will use the import-statement. This statement takes the identifier of the package, followed by a semicolon (;).

info Note that all Brane statements end with a semicolon ;. If you forget it, you may encounter weird syntax errors; so if you don't know what the error means but it's a syntax error, you should first check for proper usage of the semicolons.

We want to import the hello_world package, so we add:

// ...

import hello_world;

This will import all of the package's functions into the global namespace. Be aware of this, since this may lead to naming conflicts. See the advanced workflows-chapter for ways of dealing with this.

In our case, this imports only one function: the hello_world() function. As you can read in the package's README, this function takes no arguments, but returns a string whose value is: Hello, world!.

info If you do not want to go to the internet to find out what a package does, you can also use the inspect-subcommand of the brane-executable:

brane inspect hello_world

which will print something like: Details about the 'hello_world' package.

Even though hello_world() is an external function, BraneScript treats it like any other function, and calling it works just like in other languages. So to call the package function, simply add:

// ...

hello_world();

to your file.

However, running the file like this will probably not work. Remember that the package function returns the string rather than printing it; so to show it to us, the user, we have to use the builtin println() function:

// ...

// Use this instead, where we pass the result of the 'hello_world()'-call to 'println()'
// (as you would in other languages)
println(hello_world());

See the BraneScript documentation for a full overview of all supported builtins.

We now have a workflow that should print Hello, world! when we run it, which is what we set out to do!

The full workflow file, with some additional comments:

// HELLO WORLD.bs
//   by Rick Sanchez
// 
// A BraneScript workflow for printing "Hello, world!" to the screen.

// Define which packages we use, which makes its functions available ('hello_world()', in this case)
import hello_world;

// Prints the result of the 'hello_world()' call by using the 'println()' builtin
println(hello_world());

Be sure to save it somewhere where you can find it. Remember, we will refer to it as hello_world.bs.

Running a workflow

After you have written a workflow file, you can run it using the brane executable. Thus, make sure you have it installed and available in your PATH (see the installation chapter).

There are two modes of running a workflow: you can run it locally, in which all tasks are executed on your own machine and using the packages that are available locally. Alternatively, you can also run it on a remote instance, in which the tasks are executed on domains and nodes within that instance, using packages available only in that instance.

Local execution

Typically, you test your workflow locally first to make sure that it works and compiles fine without consuming instance resources.

To run it locally, you first have to make sure you have all the packages available locally. For us, this is the hello_world package. You can check whether you have it by running brane list, and then install it if you don't by downloading it from GitHub:

brane import epi-project/brane-std hello_world

For more details on this, see the previous chapter.

With the packages in place, you can then use the brane run-command to run the file we have just created:

brane run hello_world.bs

(Replace hello_world.bs with the path to the file you have created)

This will execute the workflow on your laptop. If it succeeded, you should see something like:

Result of executing the workflow.

If your workflow failed, Brane will try to offer you as much help as it can. Make sure that your Docker daemon is running (use sudo systemctl start docker if you see errors relating to "Failed to connect to Docker") and that you have written the workflow correctly, and try again.

warning Note that the execution of such a simple workflow may take slightly longer than you expect; it will take a few seconds even on fast machines. This is due to the fact that packages are implemented as containers, which have to be spun up and, if this is the first time you run a workflow, also loaded into the daemon.

Remote execution

The procedure for executing a workflow on a remote instance is very comparable to running a workflow locally.

The first step is to make sure that the instance has all the packages you need. Use a combination of brane search and brane push to achieve this (see the previous chapter for more information).
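
For our hello_world example, that could look something like the following (assuming the package has already been built or imported locally):

# Check whether the instance already knows about the package...
brane search hello_world
# ...and push our local version to the selected instance if it doesn't
brane push hello_world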

Then, to execute your workflow, you can do the same, but now specify the --remote flag to use the instance currently selected:

brane run --remote ...

Thus, to run our workflow on the remote instance we are currently logged in to, we would use the following command:

# We assume you already executed 'brane instance add'
brane run --remote hello_world.bs

If your packages are in order, this should produce the same result as when executing the workflow locally.

The REPL

As an alternative to writing an entire file and running that, you can also use the Brane Read-Eval-Print Loop (REPL). This is an interactive environment that you can use to provide workflows in a segmented way, typically providing one statement at a time and seeing the result immediately.

warning The REPL works in most cases, but it is known to be buggy for some design patterns (see subsequent chapters). If you run into an issue where something works in a file but not in the REPL, you can typically solve it by writing the separate statements on a single line. Please also let us know by raising an issue.

Because our workflow is so short, we will re-do it in the REPL.

First, open it by running:

brane repl

This should welcome you with the following:

Result of executing the workflow.

The REPL-environment works similar to a normal terminal, except that it takes BraneScript statements as input.

We can reproduce our workflow by writing its two statements separately:

// In the Brane REPL
import hello_world;
println(hello_world());

which should produce:

Result of executing the workflow.

This is the same result as with the separate file, except that we've now interleaved writing and executing the workflow.

You can also use the REPL in a remote scenario, by providing the --remote option when running it, similar to brane run:

brane repl --remote

Every command executed in this REPL is executed on the specified instance.

info In principle, executing the same workflow as a file or in the REPL as separate statements should give you the same result. Unfortunately, in the context of Brane, this might not hold true depending on the policies present in an instance. For example, some policies may want to have guarantees about what happens in the next step of a workflow, which is impossible for Brane to provide if it's executing the statements one-by-one. Thus, you can typically expect your workflow to be authorized more easily if it's running in one go as a file.

Next

In the next chapter, we will treat datasets and intermediate results, which are an essential component to writing workflows. If you are already familiar with those, you can also check the subsequent chapter, which introduces the finer concepts of workflow writing. Alternatively, you can also checkout the full BraneScript documentation.

Basic concepts

In the previous chapter, we discussed your first "Hello, world!"-workflow. In this chapter, we will extend upon this, and go over the basic language features of BraneScript. We will talk about things like variables, if-statements and loops, parallel statements and builtin-functions.

More complex features, such as arrays, function definitions, classes or Data and IntermediateResults, are left to the next few chapters.

Variables

First things first: how do variables work in BraneScript?

They work like in most languages, where you can think of a variable as a single memory location where we can store some information. Similarly to most languages, it can be used to store a single object only; e.g., we can only store a single number, string or other value in a single variable¹.

Variables are also typed, i.e., a single variable can only store values of the same type. While in some low-level languages, such as C or Rust, this is necessary to be able to compute the size of the variable, BraneScript only implements this for the purpose of being able to do static analysis: it can tell you beforehand whether the correct types are passed to the correct variables, which will help to eliminate mistakes made before you run a potentially lengthy workflow.

Finally, unlike other languages such as Python, BraneScript has an explicit notion of declaration: there is a difference between creating a new variable and updating it. This is also done to make static analysis easier, since the compiler can explicitly know which variables exist and how to analyse them.

So, how can we use this? The first step is to declare a new variable, to make BraneScript aware that it exists. The general syntax for this is:

let <ID> := <EXPR>;

where <ID> is some identifier that you want to use for your variable (consisting only of alphanumeric characters and underscores, _), and <EXPR> is some code that evaluates to a certain value. We've already seen an example of this: a function call is an expression, since it has a return value that we can pass to other functions or statements. Other expressions include literal values (e.g., true, 42, 3.14 or "Hello, there!") or logical or mathematical operations (e.g., addition, subtraction, logical conjunction, comparison, etc). For some more examples, see below, or check the BraneScript documentation for a full overview.

Yet another example of an expression is a variable reference, which effectively reads a particular variable. To use it, simply specify the identifier of the variable you declared (ID) any time you can use an expression. For example:

// Declare one variable with a value
let foo := 21 + 21;

// We can use it here to assign the same value to `bar`!
let bar := foo;

Finally, you can also update the value of a variable using similar syntax to a declaration:

<ID> := <EXPR>;

(note the omission of the let).

This is known as an assignment, and can only be done on variables already declared. For example:

// This will print '42'...
let foo := 42;
println(foo);

// ...and this will print '84'
foo := 84;
println(foo);

Technically, variables won't be updated until the expression is evaluated (i.e., computed). This guaranteed ordering means that the following also works:

// This works because foo is first read to compute `foo * 2`, and only then updated
let foo := 42;
foo := foo * 2;
// Foo is now 84
¹ You may already have guessed that Arrays or Classes may contain multiple variables themselves. However, arrays or classes are objects too; and while they can contain any number of nested values, we still consider them a single object themselves.

Functions

Something that you've already seen used in the previous chapter and the previous section, is the use of function calls.

This concept is used in almost any language, and essentially represents a temporary jump to some other part of code that is executed, and then the program continues from the function call onwards. Crucially, we typically allow these snippets to take in some values - arguments - and hand us back a value when they are done - a return value.

BraneScript uses a syntax that is very widely used in languages like C, Python, Rust, Lua, C#, Java, ... It is defined as:

<ID>( <ARG1>, <ARG2>, ... )

The <ID> is the identifier of the function (i.e., its name), and in between the parenthesis (()) there are zero or more arguments to pass to the function, separated by commas.

The return value of the function is returned "invisibly", in the sense that it is returned as a value in an expression. To illustrate this, consider the following function zero that simply returns the integer 0:

let zero := zero();
println(zero);   // Should print '0'

(It should be obvious now that println was a regular function call all along!)

To use expression language, we can say that a function will always evaluate to its return value. To this end, there is a strict ordering implied: first, BraneScript will evaluate all of the function's arguments (in-order), then the function is called and executed, after which the remainder of the expression continues using the function's return value.

This makes it possible for us to write the following, which uses the zero function from the previous example and some add-function that takes two integers as its arguments and returns their sum:

let forty_two := add(add(add(2, add(zero(), 20)), zero()), 20);
println(forty_two);   // Should print '42'

Note that BraneScript uses the same syntax for calling imported functions (see the previous chapter with the hello_world()-function), builtin functions (think println(); see below) and defined functions (check the relevant chapter).

To be complete, you can import all of the functions within a package using the import-statement:

import <id>;

You've already seen examples of this in the previous chapter.
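As a small sketch, assuming a hello_world package that provides a hello_world() function (as in the previous chapter):

// Make all functions in the hello_world package available...
import hello_world;

// ...so that we can call them like any other function
println(hello_world());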

Control flow

Another very important and common feature of a programming language is that it typically has syntax for defining the control flow of a program. In BraneScript, this is even more important, since effectively that is what a workflow is: defining some control flow for a set of function calls.

To that end, BraneScript supports different kinds of statements that allow your workflow to branch or loop, or that define things such as where functions are executed.

In the following subsections, we will go through each of the control-flow statements currently supported.

If-statements

Arguably one of the most important statements, an if-statement allows your code to take one of two branches based on some condition. Most languages feature an if-statement, and most use comparable syntax for it.

For BraneScript, this syntax is:

if (<EXPR>) {
    <STATEMENTS>
}

This means that, if the <EXPR> evaluates to a true-boolean value, the code inside the block (i.e., the curly brackets {}) is executed; but if it evaluates to false, then it isn't.

An example of an if-statement is:

// Let's assume this has an arbitrary value
let some_value := 42;

if (some_value == 42) {
    println("some_value was 42!");
}

Because the expression some_value == 42 is computed at runtime, this allows the program to become flexible and respond differently to different values stored in variables.

The if-statement also comes in another form:

if (<EXPR>) {
    <STATEMENTS>
} else {
    <OTHER-STATEMENTS>
}

This is known as an if-else-statement, and essentially has the same definition except that, if the condition now evaluates to false, the second block of statements is run instead of nothing. To illustrate: these two blocks of code are equivalent:

let some_value := 42;
if (some_value == 42) {
    println("some_value was 42!");
} else {
    println("some_value was not 42 :(");
}

let some_value := 42;
if (some_value == 42) {
    println("some_value was 42!");
}
if (some_value != 42) {
    println("some_value was not 42 :(");
}

info From other languages, you may be familiar with a sequence of else-if's. For example, C allows you to do:

int some_value = 42;
if (some_value == 42) {
    printf("some_value was 42!");
} else if (some_value == 43) {
    printf("some_value was 43!");
} else if (some_value == 44) {
    printf("some_value was 44!");
} else {
    printf("some_value had some other value :(");
}

BraneScript, however, has no such syntax (yet). Instead, you should write the following to emulate the same behaviour:

let some_value := 42;
if (some_value == 42) {
    println("some_value was 42!");
} else {
    if (some_value == 43) {
        println("some_value was 43!");
    } else {
        if (some_value == 44) {
            println("some_value was 44!");
        } else {
            println("some_value had some other value :(");
        }
    }
}

Tedious, but produces equivalent results.

For-loop

Another type of control-flow statement is a so-called for-loop. These repeat a piece of code multiple times, based on some specific kind of condition being true.

Let's start with the syntax:

for (<STATEMENT>; <EXPR>; <STATEMENT>) {
    <STATEMENTS>
}

BraneScript for-loops are very similar to C for-loops, in that they have three parts (respectively):

  • An initializer, which is a statement that is run once before any iteration;
  • A condition, which is run at the start of every iteration. The iteration continues if it evaluates to true, or else the loop quits;
  • and an increment, which is a statement that is run at the end of every iteration.

Typically, you use the initializer to initialize some variable, the condition to check if the variable has exceeded some bounds and the increment to increment the variable at the end of every iteration. For example:

for (let i := 0; i < 10; i := i + 1) {
    println("Hello there!");
}

This will print the phrase Hello there! exactly 10 times.

info Note that the syntax for for-loops might become a lot more restrictive in the future. This is because, in their current form, they are quite similar to while-loops (see below), but without the advantage that the compiler can easily deduce the number of iterations when it is statically known.

While-loop

While-loops are generalizations of for-loops: they repeat a piece of code as long as some condition holds true. Essentially, they only define the condition part of a for-loop; the initializer and increment are left to be implemented as normal statements.

The syntax for a while-loop is as follows:

while (<EXPR>) {
    <STATEMENTS>
}

The statements in the body of the while-loop are thus executed as long as the expression evaluates to true. Just as with the for-loop, this check happens at the start of every iteration.

For example, we can emulate the same for-loop as above by writing the following:

let i := 0;
while (i < 10) {
    println("Hello there!");
    i := i + 1;
}

More interestingly, we often use a while-loop for work that requires an unknown number of iterations. A classic example is iterating while some error is larger than a threshold:

let err := 100.0;
while (err > 1.0) {
    train_some_network();
    err := compute_error();
}

(A real example would probably require arguments in the functions, but they are left out here for simplicity).

Finally, another common pattern, which is an infinite loop, can also most easily be written with while-loops:

print("The");
while (true) {
    print(" end is never the");
}

Note, however, that BraneScript currently has no support for a break-statement (as you may find in other languages). Instead, use a simple boolean variable to iterate until you want to stop (see the sketch below), or use a return-statement (see the next chapter).
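A minimal sketch of emulating a break with such a boolean variable:

// Loop until we decide to stop
let keep_going := true;
let i := 0;
while (keep_going) {
    i := i + 1;
    // Pretend this is the condition on which we would normally 'break'
    if (i == 5) { keep_going := false; }
}
// i is now 5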

Parallel statements

A feature that is somewhat unique to BraneScript is the parallel-statement. Like if-statements, they have multiple branches, but instead of taking only one of them, all of them are taken - in parallel.

The syntax for a parallel statement is:

parallel [{
    <STATEMENTS>
}, {
    <MORE-STATEMENTS>
}, ...]

(Think of it as a list ([]) of one or more code blocks ({}))

Unlike the if-statement, a parallel-statement can have any number of branches. For example:

parallel [{
    println("This is printed...");
}, {
    println("...while this is printed...");
}, {
    println("...at the same time this is printed!");
}]

There is more to say about parallel statements, but we save that for the chapter on advanced workflows, since it mixes with other BraneScript features. For now, assume that the branches are run in arbitrary order and (conceptually) at the same time. Once every branch has completed, the workflow continues (i.e., the "end" of the parallel statement acts as a joining point).

Builtin functions

Finally, it is very useful to know the builtin functions in BraneScript. They are:

  • print(<string>): Prints the given string (or other value) to the terminal (stdout, to be precise). Does not add a newline at the end of the string.
  • println(<string>): Prints the given string (or other value) to the terminal (stdout, to be precise). Does add a newline at the end of the string.
  • len(<array>): Returns the length of the given array, as an integer.
  • commit_result(<string>, <result>): A function that promotes an intermediate result to a dataset. Don't worry if this doesn't make sense yet - for that, examine the chapter on data.

You've already seen println being used in this and the previous chapter, and that's also the builtin you will likely be using the most.
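A small sketch that combines print, println and len (arrays are introduced in the next chapter):

// Print a sentence piece by piece using the builtins
let words := [ "Hello", "there" ];
print("The array has ");
print(len(words));
println(" elements.");
// Prints 'The array has 2 elements.'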

Examples

To help you grasp the presented concepts, here is a workflow that uses a little bit of all of them:

let hello := "Hello, world!";
println(hello);

hello := "Hello there!";
println(hello);

if (hello == "Hello, world!") {
    println("Goodbye, world!");
} else {
    println("Goodbye there!");
}

println("I love the world so much, I'm going to say hi...");
for (let i := 0; i < 5; i := i + 1) {
    println(i);
}
println("...times!");

println("In fact, I will say 'hi' until...");
let i := 0;
let say_hi := true;
while (say_hi) {
    i := i + 1;
    if (i == 3) { say_hi := false; }
    print("say_hi is ");
    print(say_hi);
    println("!");
}

parallel [{
    println("HELLO WORLD!");
}, {
    println("HELLO WORLD!");
}, {
    println("HELLO WORLD!");
}];

It may help to first try and guess what the workflow will print, and only then execute it to see if your guess was right.

Next

If you feel you understand these basic constructs a little, congratulations! This should allow you to write basic workflows.

In the next chapter, we examine how to define functions and classes and how to use the latter. Then, in the chapter after that, we examine BraneScript's builtin Data-class, which is integral to writing useful workflows. Finally, in the last chapter of the BraneScript-part, we discuss some of the finer details of BraneScript as a language.

Separate from these introductory chapters, there is also the complete and more formal overview of the language in the BraneScript documentation. Those chapters should cover all of its details, and function as useful reference material once you've grasped the basics.

Functions & Composite Types

In the previous chapter, we discussed the basic functionality and constructs of the BraneScript language: variables, control flow constructs (if, for, while, parallel) and function calls.

This chapter extends on that, and explains how to define functions (to match the function calls). Moreover, we will also discuss classes and, while we're at it, arrays.

Function definitions

To start, we will examine function definitions.

As already discussed in the previous chapter, functions are essentially snippets of code that can be executed from somewhere "in between" other code. We've already discussed how to call them, i.e., run their code from somewhere else; in this section we discuss how to define them.

A definition uses the following syntax:

func <ID> ( <ARG1>, <ARG2>, ... ) {
    <STATEMENTS>
}

Just as with a call, the <ID> is the name of the new function, and in between the parentheses (()) are zero or more arguments that this function accepts. They are given as identifiers, each of which specifies the name of that argument. The <STATEMENTS>, then, are the statements that are executed when this function is called.

The simplest function is one that neither takes any arguments, nor returns any value. An example of such a function is:

// Define the function first
func print_hello_world() {
    println("Hello, world!");
}

// Now run it
print_hello_world();

This should print the string Hello, world! to the terminal.

In practice, however, there are very few functions that neither take nor produce any values. So let's consider a function that takes some arguments:

func print_text(text) {
    println(text);
}

The text is the argument that we want to pass to the function, and println(text) then passes that argument as input to the println function. It may seem like arguments can be used in much the same way as variables, and that is exactly right - because they are: they act as (and are) local variables that are initialized with the values passed to the function.

Another example that is a bit more complex:

func print_greeting_place(greeting, place) {
    print(greeting);
    print(", ");
    print(place);
    println("!");
}

// To do the same as `print_hello_world()`, we can run:
print_greeting_place("Hello", "world");

// But we can also do other stuff now
print_greeting_place("Sup", "planet");

The only thing left, then, is to define how a function returns a value.

To do so, we use the return-statement. It has the following syntax:

return <EXPR>;

where <EXPR> is the expression that creates the value to return.

An example of how this works is by implementing the zero()- and add()-functions from the previous chapter:

func zero() {
    return 0;
}

func add(lhs, rhs) {
    return lhs + rhs;
}

When called, these functions will evaluate to 0 and the sum of their arguments, respectively.

In addition to just returning values, a return acts as a 'quit'-command for a function; whenever it is executed, the function is exited immediately, and the program resumes execution from the function call onwards - even if there are subsequent statements in the function body.

For example, consider the following function:

func greet(person) {
    // Filter out rude names
    if (person == "stinky") {
        println("That is rude, I won't print that.");
        return;
    }

    // Otherwise, we can print
    print("Hello, ");
    print(person);
    println("!");
}

(Note that the expression can be omitted from the return-statement if the function does not return a value, as in this example; otherwise, it is combined with an expression, as shown before.)

info Unlike other languages, BraneScript also allows the usage of a return-statement from the main workflow body (i.e., outside of a function). In this case, it can be used to early-quit the workflow (e.g., in an infinite while-loop) or to return a value from a workflow (relevant for packaged workflows (see here) or automatically downloading datasets (see here)).

Arrays

Next, we will talk about arrays, before we move on to classes.

Most languages that have a concept of variables, also have a concept of arrays. These are essentially (ordered) sequences of values, collected into a single object. You can thus think of them as a single variable that contains multiple values, instead of one.

Note, however, that arrays can only accept values of the same type. For example, they can contain multiple integers, or multiple strings - but not a mix of those. This essentially makes them homogeneous - every element has the same layout.

There are multiple syntaxes for working with arrays. The first is the array literal:

[ <EXPR1>, <EXPR2>, ... ]

Here, there are zero or more expressions, where every <EXPRX> is some expression whose evaluated value will be stored in the array.

For example, this will generate an array with the values 1, -5 and 0:

let value := -5;
let array := [ 1, value, zero() ];

It is also possible to create an array of arrays:

let array := [ [ 0, 1, 2 ], [ 3, 4, 5 ], [ 6, 7, 8 ] ];

Then, to read a specific element in an array, or to write to the element, we can index it. This is done using the following syntax:

<ARRAY-EXPR>[ <INDEX-EXPR> ]

The <ARRAY-EXPR> is something that evaluates to an array (e.g., an array literal, a variable that contains an array, ...), and the <INDEX-EXPR> is something that evaluates to an integral number. Note that array indices in BraneScript are zero-indexed, so the first element is addressed by 0, the second by 1, and so on.

The following examples show some array indexing:

let array1 := [ 1, 2, 3 ];
println(array1[0]);

let index1 := 2;
println(array1[index1]);

println([ 4, 5, 6 ][1]);
println(generate_array_with_zeroes()[0]);
println(array1[zero()]);

This will print 1, 3, 5, 0 and 1, respectively.

We use the same syntax to write to an array, except that we then use the array in the variable position in an assignment:

let array2 := [ 7, 8, 9 ];
array2[0] := 42;
println(array2);

This will print [ 42, 8, 9 ].

Classes

It is probably easier to understand classes after you understand arrays, so be sure to check out their section first.

If arrays provide a homogeneous collection of values, then classes provide a heterogeneous collection. Specifically, we can think of a class as a collection of values that can be of different types. Usually, because of this inherent difference between the values, we don't index classes by position (like arrays); instead, we assign a name to each value and index by that. Some languages allow this quite literally (e.g., JavaScript), whereas others choose a different kind of syntax called projection (e.g., C or Python). BraneScript uses the latter syntax as well.

Because of this heterogeneity, BraneScript requires you to specifically define classes, so that it knows beforehand which values are allowed in a specific class and how to name them.

info A specific class definition will act as its own type in BraneScript. This means that it's usually impossible to assign one class to another.

Technically, however, arrays do this as well, since it usually makes no sense to assign an array of strings to an array of integers. However, because of their uniform element type, array types are more lenient, whereas classes are almost always completely disjoint from each other.

Another key difference between arrays and classes (at least, in BraneScript) is that a class can associate functions with it, usually called methods. These methods, then, work on an explicit instance of that class (i.e., a particular set of values) in addition to their normal arguments. This allows for Object-Oriented Programming (OOP) design patterns. For more information on OOP in general, see here.

Definition & instantiation

We will first discuss the syntax and usage of classes as just data containers. To define a class, use the following syntax:

class <ID> {
    <FIELD-ID-1>: <FIELD-TYPE-1>;
    <FIELD-ID-2>: <FIELD-TYPE-2>;
    ...
}

Here, <ID> is the name of the class (conventionally written in upper camel case). Then follow zero or more field definitions (an element within a class is referred to as a field), each of which consists of some identifier (<FIELD-ID>) as its name and a type (<FIELD-TYPE>) that determines what kind of values are allowed for that field.

To illustrate, consider the following class:

class Jedi {
    name: string;
    lightsaber_colour: string;
    is_master: bool;
}

This class contains three fields, of type string, string and bool, respectively.

Note, however, that a class definition acts like a "blueprint" rather than a usable value. To obtain a value, we instantiate the class, which is the act of assigning values to the fields to create an object that we can use. In BraneScript, we use the following syntax for that:

new <ID> {
    <FIELD-ID-1> := <EXPR1>,
    <FIELD-ID-2> := <EXPR2>,
    ...
}

(Note the usage of commas (,) instead of semicolons (;) at the end of each line)

This tells the backend to create a new object from the definition with the name <ID>, and then populate the fields with the given names (<FIELD-ID>) with the values that the given expressions evaluate to (<EXPR>).

Note that this is an expression itself, which will thus evaluate to an instance of the referred class. Furthermore, because the fields are named, you don't have to use the same order in assigning the value as used in the definitions of the fields.

For example, we can instantiate our Jedi class as follows:

let anakin := new Jedi {
    name := "Anakin Skywalker",
    lightsaber_colour := "blue",
    is_master := false,
};

Similarly, we can create another Jedi with different properties:

// Note the different order - still works!
let obi_wan := new Jedi {
    lightsaber_colour := "blue",
    name := "Obi-Wan Kenobi",
    is_master := true,
};

As long as they refer to the same class, they have the same type, and can thus be used interchangeably.

Projection

You can now create classes - great! So now it's time to learn how to use them.

The most basic operation on a class instance is accessing one of its fields - the operation for doing so is called projection. The syntax for it is as follows:

<CLASS-EXPR>.<FIELD-ID>

Here, <CLASS-EXPR> is some expression that evaluates to a class, and <FIELD-ID> is the name of the field that should be accessed.

For our Jedi class, we could do something like this:

// A function that prints information about a given jedi
func print_jedi(jedi) {
    print(jedi.name);
    print(" swishes his ");
    print(jedi.lightsaber_colour);
    print(" lightsaber ");
    if (jedi.is_master) {
        println("masterfully!");
    } else {
        println("amateurishly!");
    }
}

// Call it
print_jedi(anakin);
print_jedi(obi_wan);

// Setting values works just like with array indices
anakin.lightsaber_colour := "green";
print_jedi(anakin);

Datasets in workflows

Advanced workflows

In this chapter, we will discuss some loosely connected but very useful concepts for when you are writing more extensive and advanced workflows.

Arrays

Another, more complex form of an expression is an array. This is simply an (ordered) collection of values, indexable by an integral number. They are very similar to arrays in other languages.

To create an array, use the following syntax:

[ <VALUE>, <ANOTHER-VALUE>, ... ]

Note that arrays are homogeneous in the sense that all elements must have the same type. For example, this will throw errors:

let uh_oh := [ 42, "fourty two", 42.0 ];

Instead, assign it with values of the same type:

let ok := [ 83, 112, 97, 109 ];

To index an array, use the following syntax:

<ARRAY-EXPR> [ <INDEX-EXPR> ]

This may be a bit confusing, but the first expression evaluates to the array to index (i.e., an array literal, a variable or a function call), and the second expression evaluates to a number that is used as the index. Some examples:

let array1 := [ 1, 2, 3 ];
// Arrays are zero-indexed, so this refers to the first element
println(array1[0]);

let index1 := 2;
// And this to the last element
println(array1[index1]);

// Some other examples using weirder expressions
println([ 4, 5, 6 ][1]);
println(generate_array_with_zeroes()[0]);
println(array1[zero()]);

This will print 1, 3, 5, 0 and 1, respectively.

Array indexing can be used to assign a value as well as read it:

let array1 := [ "a", "b", "c" ];
array1[0] := "z";
println(array1);
// Will print '[ z, b, c ]'

Finally, when you have an array that you got from some function or other source and that you don't know the size of, you can retrieve its length using the builtin len-function:

println(len([ 0, 0, 0 ]));
// Will print 3

This is very useful when iterating over an array with a for-loop, as sketched below.
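A minimal sketch of that pattern:

// Print every element of an array whose length we don't know up front
let values := [ 8, 3, 5 ];
for (let i := 0; i < len(values); i := i + 1) {
    println(values[i]);
}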

Returning

A different kind of control flow statement is the return-statement. This is used to essentially halt the current control flow and go back to whatever was the calling context. In other languages, this is often used in functions, but in BraneScript it is used a bit more generally.

The syntax is:

return;

This statement can be thought of as a 'stop' or 'exit' command: any statement following it (if not in a branch) will not be executed.

There are two possible ways to use a return statement:

  • When used in a function, the function is exited immediately and the program resumes execution from the function call onwards (see the next chapter).
  • When used outside of a function, the statement exits the workflow entirely. This can be used to quit the workflow early if desired.

For example, this workflow:

println("Hello, ");
return;
println("world!");

will only print Hello, , not world!, because of the early quit in between the statements.

A really useful alternative syntax of the return-statement allows it to carry a value to the calling scope:

return <EXPR>;

This is used to return a value from a function, or to return a value from a workflow.

For example, one can run this workflow in the Brane CLI:

return "A special value";
A workflow returning the string 'A special value'

While this doesn't seem much different from just printing, it actually matters in a few use-cases, such as automatically downloading datasets or creating a workflow package.

Advanced parallelism

Note that there are a few peculiarities about parallel statements:

  • The code inside the blocks is run in parallel, which means that the statement itself will only return once all of the branches do. To illustrate:
    parallel [{
        println("The order of this print...");
    }, {
        println("...and this print may vary");
    }];
    println("But this print is only run after the other two finished");
    
  • Instead of being able to refer to variables like normal, every branch receives its own copy of those variables. In practice, this means that any changes they make to variables are only local to that branch. For example:
    let value := 42;
    parallel [{
        println(value);   // Will print 42
    }, {
        value := 84;
        println(value);   // Will print 84
    }];
    println(value);   // Will still print 42!
    
  • The order of execution of the branches is arbitrary (as hinted at above), as it depends on the scheduling of the runtime itself and on the OS's scheduling of the VM threads.
  • In addition, although they are said to run in parallel, in practice, the only guarantee is that each branch is run concurrently (but it may still be run in parallel, depending on the setup). To understand the precise difference, check https://freecontent.manning.com/concurrency-vs-parallelism/.
  • Each parallel branch forms its own "workflow"; or, to be more precise, when you return in a parallel branch, it actually returns from the branch - not the workflow. For example:
    parallel [{
        println("1");
        return;
        println("2");
    }];
    println("3");
    
    will actually print 1 and 3, in that order.
  • The only way to get values out of parallel branches is to use the declaration syntax of the parallel statement, where the parallel statement is assigned to a variable declaration:
    let <ID> := parallel[{ <STATEMENTS> }, { <MORE-STATEMENTS> }, ...];
    
    If this syntax is used, then every branch must return a value of the same type (using a return-statement). For example:
    let jedis := parallel [{
        return "Obi-Wan Kenobi";
    }, {
        return "Anakin Skywalker";
    }, {
        return "Master Yoda";
    }];
    println(jedis);
    
    will actually print an array with the returned strings.

    warning Note that, due to the undefined order of execution, the order of the array is also undefined; it is first-come, first-served, so it typically only makes sense to process such arrays uniformly using some loop (e.g., a for-loop) - see the sketch after this list.

  • Finally, as a variation on returning an array, multiple merge strategies exist to do different things with the results. For example, one such strategy is the sum-strategy, which simply adds the results returned by the branches. The syntax to specify a strategy is:
    parallel [ <STRATEGY> ] [{
        <STATEMENTS>
    }, ...]
    
    To merge using sum:
    let res := parallel [sum] [{
        return 42;
    }, {
        return 42;
    }];
    println(res);
    
    which will print 84.

    info For a complete overview of all merge strategies, check the BraneScript documentation.
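To illustrate processing such a result with a loop (as mentioned in the warning above), here is a small sketch that re-uses the jedis example:

let jedis := parallel [{
    return "Obi-Wan Kenobi";
}, {
    return "Anakin Skywalker";
}, {
    return "Master Yoda";
}];

// The order of the elements is undefined, so treat every element the same
for (let i := 0; i < len(jedis); i := i + 1) {
    println(jedis[i]);
}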

Introduction

drawing Documentation for Bakery will be added soon.

Tutorials

In this chapter, you can find the resources used in tutorials hosted to promote the BRANE framework.

The resources are ordered by date of tutorial. The first tutorial will be hosted at ICT.OPEN, 20-04-2023.

In addition, the most recent resources to host a tutorial are:

Overview

An overview of all tutorials, ordered chronologically:

You can also select a tutorial in the sidebar to the left.

First BRANE tutorial at ICT.OPEN

The first tutorial that introduced users to the BRANE framework is given at ICT.OPEN 2023, a conference aiming to bring academia and industry together. Its purpose is to have users experience the roles of software engineer and scientist in the framework, mainly to develop an understanding of what working with the framework looks like in practice.

The tutorial is written for framework version 2.0.0.

The tutorial consists of the following parts:

  • 12:30-12:45: Introduction (presentation)
  • 12:45-13:30: Part 1: Hello, world! (guided hands-on)
  • 13:30-13:45: Break
  • 13:45-14:15: Part 2: A workflow for Disaster Tweets (hands-on)
  • 14:15-14:30: Evaluation

The following resources are used, which are hosted on this website:

  • Generic handout (here)
  • Handout for Part 1: Hello, world! (here)
  • Handout for Part 2: A workflow for disaster tweets (here)
  • Introduction slides (here)

Part 1: Hello, world!

In this document, we will detail the steps to take to write a simple package within the EPI Framework (also called the BRANE Framework). Specifically, this tutorial will focus on how to install the CLI client (the brane executable), build a hello_world package and then call the hello_world() function within that package on a local machine.

Background

The framework revolves around workflows, which are high-level descriptions of an algorithm or data processing pipeline that the system will execute. Specifically, every workflow contains zero or more tasks, which are conceptual functions that take input and produce output, composed in a particular order or control flow. It helps to think of them as graphs, where the nodes are tasks that must be called, and the edges are some form of data flowing between them. An example of such a workflow is given in Figure 1.

A workflow with three functions and four steps.

Figure 1: A very simple workflow using three tasks, f, g and h. The nodes represent function calls, whereas the edges represent data dependencies. Specifically, this workflow depicts that f has to be run first; then a second call of f and a call of h can run in parallel, after which the third task, g, must be called.

While workflows can be expressed in any kind of language, the EPI Framework features its own Domain-Specific Language (DSL) to do so, called BraneScript1. This language is very script-like, which allows us to think of tasks as a special kind of function. Any control flow (i.e., dependencies between tasks) is then given using variables and commonly used structures, such as if-statements, for-loops, while-loops, and less commonly used structures, such as on-structs or parallel-statements.

Objective

In this tutorial, we will mainly focus on creating a single task to execute. Traditionally, this task will be called hello_world(), and will return a string "Hello, world!" once it is called. To illustrate, the following Python snippet implements the logic of the task:

def hello_world():
	return "Hello, world!"

By using print(hello_world()), we can print Hello, world! to the terminal. In the EPI Framework, we will implement the function as a task, and then write a very simple workflow that implements the print-part.

1

There is a second DSL, Bakery, which is more unique to the framework and features a natural language-like syntax. However, this language is still in development, and so we focus on BraneScript.

Installation

To start, first download the brane executable from the repository. This is a command line-based client for the framework, providing a wide range of tools for developing packages and workflows for the EPI Framework. We will use it to build and then test a package, which can contain one or more tasks. Since we are only creating the hello_world() task, our package (called hello_world) will contain only one task.

The executable is pre-compiled for Windows, macOS (Intel and M1/M2) and Linux. The binaries in the repository follow the pattern of <name>-<os>-<arch>, where <name> is the name of the executable (brane for us), <os> is an identifier representing the target OS (windows for Windows, darwin for macOS and linux for Linux), and <arch> is the target processor architecture (x86_64, typically, or aarch64 for M1/M2 Macs).

So, for example, download brane-windows-x86_64 if you are on Windows, or brane-darwin-aarch64 if you have an M1/M2 Mac. You can see the commands below for the most likely executable per OS/architecture.

info When in doubt, choose x86_64 for your processor architecture. Or ask a tutorial host.

Once downloaded, it is recommended to rename the executable to brane to follow the calling convention we are using in the remainder of this document. Open a terminal in the folder where you downloaded the executable (probably Downloads), and run:

:: For Windows
move .\brane-windows-x86_64 .\brane
# For macOS (Intel)
mv ./brane-darwin-x86_64 ./brane
# For macOS (M1/M2)
mv ./brane-darwin-aarch64 ./brane
# For Linux
mv ./brane-linux-x86_64 ./brane

If you are on Unix, you probably want to execute a second step: after just renaming the executable, you would still have to call it using ./brane instead of brane. To fix that, move the executable to somewhere in your PATH, e.g.:

sudo mv ./brane /usr/local/bin/brane

If you installed it successfully, you can then run:

brane --version

without getting brane not found errors.

info If you don't want to put brane in your PATH, you can also replace all occurrences of brane with ./brane in the subsequent commands (or any other path/name). Additionally, you can also run:

export PATH="$PATH:$(pwd)"

to add your current directory to the PATH variable. However, note that this only lasts for your current terminal session (i.e., until you close or restart the window).

Writing the code

The next step is to write the code that we will be running when we execute the task. In the EPI Framework, tasks are bundled in packages; and every package is implemented in a container. This means that every task has its own dependencies shipped within, and that multiple tasks can share the same dependencies. This also means that a task can be implemented in any language, as long as the program follows a particular convention in how it takes input and writes output. Specifically, the EPI Framework will call a specific executable file with a specific set of arguments and environment variables set, and then receive return values from it by reading the executable's stdout.

For the purpose of this tutorial, though, we will choose Python to implement our hello_world-function. Because our function is so simple, we will only need a single file, which we will call hello.py. Create it, and then write the following in it (including comments):

#!/usr/bin/env python3

def hello_world():
	return "Hello, world!"

print(f'output: "{hello_world()}"')

Let's break this down:

  • The first line, #!/usr/bin/env python3, tells the operating system that this file is a Python script (i.e., that it must be run with the python3 executable). Any file that has this on its first line can be run by calling the file directly instead of prefixing python3, e.g.,
    ./hello.py
    
    instead of
    python3 hello.py
    
    This is important, because the framework will use the first calling convention.
  • The function, def hello_world(): ..., is the same function as presented before; it simply returns the "Hello, world!" string. This is actually the functionality we want to implement.
  • The final line, print(f'output: "{hello_world()}"'), prints the generated string to stdout. Note, however, that we wrap the value in quotes (") and prefix it with output:; we do this because of the convention that packages for the EPI Framework have to follow. The framework expects the output to be given in YAML format, under a specific name. We choose output (see below).

And that's it! You can save and close the file, and then we will move on to the second part of a package: the container.yml file.

Writing the container.yml

A few text files do not make a package. In addition to the raw code, the EPI Framework also needs to know some metadata of a package. This includes things such as its name, its version and its owners, but, more importantly, also which tasks the package contributes.

This information is conventionally provided using a file called container.yml. This is a YAML file whose top-level keys define various pieces of metadata. Create a file with that name, and then write the following to it:

# A few generic properties of the file
name: hello_world
version: 1.0.0
kind: ecu

# Defines things we need to install
dependencies:
- python3

# Specifies the files we need to put in this package
files:
- hello.py

# Defines which of the files is the file that the framework will call
entrypoint:
  kind: task
  exec: hello.py

# Defines the tasks in this package
actions:
  'hello_world':
    command:
    input:
    output:
    - name: output
      type: string

This is quite a lot, so we will break it down in the following subsections. Every subsection will contain the highlighted part of the container.yml first, and then uses three dots (...) to indicate parts that have been left out for that snippet.

Minimal metadata

# A few generic properties of the file
name: hello_world
version: 1.0.0
kind: ecu

...

The top of the file starts by providing the bare minimum information that the EPI Framework has to know. First are the name of the package (name) and the version number (version). Together, they form the identifier of the package, which is how the system knows which package we are calling tasks from.

Then there is also the kind-field, which determines what kind of package this is. Currently, the only fully implemented package kind is an Executable Code Unit (ECU), which is a collection of arbitrary code files. However, other package types that will be supported in the future are OpenAPI packages and packages written in BraneScript or Bakery.

Specifying dependencies

...

# Defines things we need to install
dependencies:
- python3

...

Because packages are implemented as containers, we have the freedom to specify the set of dependencies to install in the container. By default, the framework uses Ubuntu 20.04 as its base image, and the dependencies specified are apt-packages. Note that the base container is fairly minimal, and so we have to specify we need Python installed (which is distributed as the python3-package).

Collecting files

...

# Specifies the files we need to put in this package
files:
- hello.py

...

Then the framework also has to know which files to put in the package. Because we have only one file, this is relatively simple: just the hello.py file. Note that any filepath is, by default, relative to the container.yml file itself; so by just writing hello.py we mean that the framework needs to include a file with that name in the same folder as container.yml.

info The files included will, by default, mimic the file structure that is defined. So if you include a file that is in some directory, then it will also be in that directory in the resulting package. For example, if you include:

files:
- foo/hello.py

then it will be put in a foo directory in the container as well.

Setting the entrypoint

...

# Defines which of the files is the file that the framework will call
entrypoint:
  kind: task
  exec: hello.py

...

Large projects typically have multiple files, and only one of them serves as the entrypoint for that project. Moreover, not every file included will be executable code; and thus it is relevant for the framework to know which file it must call. This is specified in this snippet: we define that the hello.py file in the container's root is the one to call first.

As already mentioned, the framework will call the executable "directly" (e.g., ./hello.py in this case). This means that, if the file is a script (like ours), we need a shebang line (e.g., #!/usr/bin/env python3) to tell the OS how to run it.

info Even if your package implements multiple tasks, it can only have a single entrypoint. To this end, most packages define a simple entrypoint script that takes the input arguments and uses that to call an appropriate second script or executable for the task at hand.

Defining tasks

...

# Defines the tasks in this package
actions:
  hello_world:
    command:
    input:
    output:
    - name: output
      type: string

The final part of the YAML-file specifies the most important part: which tasks can be found in your container, and how the framework can call them.

In our container, we only have a single task (hello_world), and so we only have one entry. Then, if required, we can define a command-line argument to pass to the entrypoint to distinguish between tasks (the command-field). In our case, this is not necessary because we only have a single one, and so it is empty.

Next, one can specify inputs to the specific task. These are like function arguments, and are defined by a name and a specific data type. At runtime, the framework will serialize the value to JSON and make these available to the entrypoint using environment variables. However, because our hello_world() function does not need any, we can leave the input-field empty too.

Finally, in the output section, we can define any return value our task has. Similar to the input, it is defined by a name and a type. The name given must match the name returned by the executable. Specifically, we returned output: ... in our Python script, meaning that we must name the output variable output here as well. Then, because the output itself is a string, we denote it as such by using the type: string.

In summary, the above actions field defines a single function that has the following pseudo-signature:

hello_world() -> string

Building a package

After you have a container.yml file and the matching code (hello.py), it is time to build the package. We will use the brane CLI tool for this, which requires Docker and the Docker Buildx plugin to be installed.

On Windows and macOS, you should install Docker Desktop, which already includes the Buildx-plugin. On Linux, install the Docker engine for your distro (Debian, Ubuntu, Arch Linux), and then install the Buildx plugin using:

# Install the executable
docker buildx bake "https://github.com/docker/buildx.git"
mkdir -p ~/.docker/cli-plugins
mv ./bin/build/buildx ~/.docker/cli-plugins/docker-buildx

# Create a build container to use
docker buildx create --use

If you have everything installed, you can then build the package container using:

brane build ./container.yml

The executable will work for a bit, and should eventually let you know it's done with:

Successfully built version 1.0.0 of container (ECU) package hello_world.

If you then run

brane list

you should see your hello_world container there. Congratulations!

Running your package

All that remains is to see it in action! The brane executable has multiple ways of running packages locally: by running tasks in isolation in a test environment, or by running a local workflow. We will do both in this section.

The test environment

The brane test-subcommand implements a suite for testing single tasks in a package, in isolation. If you run it for a specific package, you can use a simple terminal interface to select the task to run, define its input and witness its output. In our case, we can call it with:

brane test hello_world

This should show you something like:

A list of tasks that can be executed when running `brane test hello_world`

If you hit Enter, the tool will query you for input parameters - but since there are none, it will instead proceed to execution immediately. If you wait a bit, you will eventually see:

The result of testing the hello_world task, showing the string 'Hello, world!'

And that's indeed the string we want to see!

info The first time you run a newly built package, you will likely see some additional delay when executing it. This is because the Docker backend has to load the container first. However, if you re-run the same task, you should see a significant speedup compared to the first time because the container has been cached.

Running a local workflow

The above is, however, not very interesting. We can verify the function works, but we cannot do anything with its result.

Instead of using the test environment, we can also write a very simple workflow with only one task. To do so, create a new file called workflow.bs, and write the following in it:

import hello_world;

println(hello_world());

Let's examine what happens in this workflow:

  • In the first line, import hello_world;, we tell the framework which package to use. We reference our package by its name, and because we omit a specific version, we let the framework pick the latest version for us.
  • In the second line, println(hello_world());, we call our hello_world() task. The result of it will be passed to a builtin function, println(), which will print it to the stdout.

Save the file, close the editor, and then run the following in your terminal to run the workflow:

brane run ./workflow.bs

If everything is alright, you should see:

A terminal showing the command to run the workflow, and then 'Hello, world!'

info The brane-tool also features an interactive Read-Eval-Print Loop (REPL) that you can use to write workflows as well. Run brane repl, and then you can write the two lines of your workflow separately:

A REPL session in which the two lines of the workflow are entered separately, printing 'Hello, world!'

Because it is interactive, you can be more flexible and call it repeatedly, for example:

A REPL session in which the hello_world() function is called repeatedly

Simply type exit to quit the REPL.

Conclusion

And that's it! You've successfully written your first EPI Framework package, and then you ran it locally and verified it worked.

In the second half of the tutorial, we will focus more on workflows, and write one for an extensive package already developed by students. You can find the matching handout here.

Part 2: A workflow for Disaster Tweets

In this document, we detail the steps that can be taken during the second part of the tutorial. In this part, participants will write a larger workflow file for an already existing package and submit it to a running EPI Framework instance. The package implements a data pipeline for doing Natural Language Processing (NLP) on the Disaster Tweets dataset, created for the matching Kaggle challenge.

Background

In the first part of the tutorial, you've created your own Hello, world!-package. In this tutorial, we will assume a more complex package has already been created, and you will take on the role of a Domain-Specific Scientist who wants to use it in the framework.

The pipeline implements a classifier that aims to predict whether a tweet indicates that a natural disaster is happening or not. To do so, a naive Bayes classifier has been implemented that takes preprocessed tweets as input and outputs a 1 if a tweet references a disaster, or a 0 if it does not. In addition, various visualisations have been implemented that can be used to analyse the model and the dataset.

The package has been implemented by Andrea Marino and Jingye Wang for the course Web Services and Cloud-Based Systems. Their original code can be found here, but we will be working with a version compatible with the most recent version of the framework which can be found here.

Objective

As already mentioned, this part focusses on implementing a workflow that can do classification on the disaster tweets dataset. To do so, the dataset has to be downloaded and the two packages have to be built. Then, a workflow should be written that does the following:

  1. Clean the training and test datasets (clean())
  2. Tokenize the training and test datasets (tokenize())
  3. Remove stopwords from the tweets in both datasets (remove_stopwords())
  4. Vectorize the datasets (create_vectors())
  5. Train the model (train_model())

All of these functions can be found in the compute package.

Then, optionally, any number of visualizations can be implemented as well to obtain results from the dataset and the model. Conveniently, you can generate all of them in a single HTML file by calling the visualization_action() function from the visualization package, but you can also generate the plots separately.

info Tip: If you use brane inspect <package>, you can see the tasks defined in a package, together with the inputs and outputs those tasks define. For example:

The output of brane inspect, listing the tasks defined in a package with their inputs and outputs.

Installation

Before you can begin writing your workflow, you should first build the packages and download the required datasets. We will treat both of these separately in this section.

We assume that you have already completed part 1. If not, install the brane executable and install Docker as specified in the previous document before you continue.

Building packages

Because the packages are hosted in a GitHub repository, this step is fairly easy using the brane import command.

Open a terminal that has access to the brane-command, and then run:

brane import epi-project/brane-disaster-tweets-example -c packages/compute/container.yml
brane import epi-project/brane-disaster-tweets-example -c packages/visualization/container.yml

These commands build packages from the source in the repository brane-disaster-tweets-example of the user epi-project. An eagle-eyed person may notice that this is exactly the URL of the repository, except that https://github.com/ is omitted. The second part of each command, -c ..., specifies which container.yml to use in that repository. We need to specify this because the repository defines two different packages - but that is also what allows us to build both of them.

After the command completes, you can verify that you have them installed by running brane list again.

Obtaining data

In the EPI Framework, datasets are considered assets, much like packages. That means that, similarly, we will have to get the data file(s), define some metadata, and then use the brane tool to build the assets and make them available for local execution.

To save some time, we have already pre-packaged the training dataset here, and the test dataset here. These are both ZIP-archives containing a directory with a metadata file (data.yml) and another directory with the data in it (data/dataset.csv). Once downloaded, you should unpack them, and then open a terminal.

Navigate to the folder of the training dataset first, and then run this command:

brane data build ./data.yml

Once it completes, navigate to the directory of the second dataset and repeat the command. You can then use brane data list to assert they have been added successfully.

The data.yml file itself is relatively straightforward, and so we encourage you to take a look at it yourself. Similarly, also take a look at the dataset itself to see what the pipeline will be working on.

info By default, the above command does not copy the dataset file referenced in data.yml, but instead just links it. This is usually fine, but if you intend to delete the downloaded files immediately afterwards, use the --no-links flag to copy the dataset instead.

Writing the workflow - Compute

Once you have prepared your local machine for the package and the data, it is time to write a proper workflow!

To do so, open a new file (called, for example, workflow.bs) in which we will write the workflow. Then, let's start by including the packages we need:

import compute;
import visualization;

The first package implements everything up to training the classifier, and the visualization package implements functions that generate graphs to inspect the dataset and the model, so we'll need both of them to see anything useful.

Next up, we'll do something new:

// ... imports

// We refer to the datasets we want to use
let train := new Data{ name := "nlp_train" };
let test  := new Data{ name := "nlp_test" };

The first step is to decide which data we want to use in this pipeline. This is done by creating an instance of the builtin Data class, to which we give a name that refers to a dataset registered in the instance. If you check brane data list, you'll see that nlp_train is the identifier of the training set, and nlp_test is the identifier of the test set.

Note, however, that this is merely a data reference. The variable does not represent the data itself, and cannot be inspected from within BraneScript (you may note that the Data class has no functions, for example). Instead, its only job is to let the framework know which dataset to attach to which task at which moment. You can verify that the framework attaches it by inspecting the package code and observing that the task is passed a path where it can find the dataset in question.

Next, we will do the first step: cleanup the dataset.

// ... datasets

// Preprocess the datasets
let train_clean := clean(train);
let test_clean  := clean(test);

You can see that the same function is called with different datasets as input, and that it returns a new dataset that contains the same data, but cleaned. Note, however, that this dataset won't be externally reachable; instead, we call it an intermediate result, which is a dataset that will be deleted after the workflow completes.

Let's continue, and tokenize and then remove stopwords from the two datasets:

// ... cleaning

let train_final := tokenize(train_clean);
let test_final  := tokenize(test_clean);
train_final := remove_stopwords(train_final);
test_final  := remove_stopwords(test_final);

As you can see, we don't need a new variable for every new result; we can just overwrite old ones if we don't need them anymore.

Now that we have preprocessed datasets, we will vectorize them so that it becomes quicker for a subsequent call to load them. However, by design of the package, these datasets are vectorized together; so we have to give them both as input, and only get a single result containing both output files:

// ... preprocessing

let vectors := create_vectors(train_final, test_final);

And with that, we have a fully preprocessed dataset. That means we can now train the classifier, which is done conveniently by calling a single function:

// ... vectorization

let model := train_model(train, vectors);
commit_result("nlp_model", model);

The second line is the most interesting here, because we are using the builtin commit_result-function to "promote" the result of the function to a publicly available dataset. Specifically, we tell the framework to make the intermediate result in the model-variable public under the identifier nlp_model. By doing this, we can later write a workflow that simply references that model in the first place, and pick up where we left off.

You might notice that the model is returned as a dataset as well. While the function could have returned a class or array in BraneScript to represent it, this has two disadvantages:

  • Most Python libraries write models to files anyway, so converting them to BraneScript values needs additional work; and
  • By making something a dataset, it becomes subject to policy. This means that participating domains will be able to say something about where the result may go. For this reason, in practice, a package will likely not be approved by a hospital if it does not return important values like these as a dataset so that they can stay in control of it.

This is useful to remember if you ever find yourself writing Brane packages.

And with that, we have a workflow that can train a binary classifier on the Disaster Tweets dataset! However, we are not doing anything with the classifier yet; that will be done in the next section.

Writing a workflow - Visualization

The next step is to add inference to the network, and to generate some plots that show it works. To do so, we will add a few extra function calls to the bottom of your workflow.bs file.

info You can also easily create a new workflow file to separate training and inference. If you want to, create a new workflow file and try to write the start yourself. You will probably have to commit the cleaned and final datasets in the previous workflow, and then use them and the model here. Also, don't forget to add the imports at the top of your file. A small sketch of such a start is given below.
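If you do go down that route, the start of the inference workflow could look something like the following sketch; nlp_test and nlp_model are the identifiers used in this document, while nlp_vectors is a hypothetical name you would have chosen yourself when committing the vectors in the training workflow:

import compute;
import visualization;

// Reference the original test set, plus the results committed by the training workflow
let test    := new Data{ name := "nlp_test" };
let vectors := new Data{ name := "nlp_vectors" };
let model   := new Data{ name := "nlp_model" };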

Scroll past the training, and write the following:

// ... training

// Create a "submission", i.e., classify the test set
let submission := create_submission(test, vectors, model);

This line will use the existing test set, its vectors (the training-vectors are unused) and the trained model to create a so-called submission. This is just a small dataset that matches tweet identifiers to the prediction the model made (1 if it classified it as a disaster tweet, or 0 otherwise). The terminology stems from the package being written for a Kaggle challenge, where this classification has to be submitted to achieve a particular score.

We can then use this submission to generate the visualizations. The easiest way is to call the visualization_action() function from the visualization package:

// ... submission

// Create the plots, bundled in an HTML file
let plot := visualization_action(
    train,
    test,
    submission
);
return commit_result("nlp_plot", plot);

Here, we call the function (which takes both datasets and the classification), and commit its resulting plot. Note, however, that we return this dataset from the workflow. This means that, upon completion, the client will automatically attempt to download this dataset from the remote instance. Only one result can be returned at a time; if you ever need to download more, simply submit a new workflow with only the return statement (see the sketch below).
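For example, a later workflow that only downloads the committed plot could be as small as the following sketch (re-using the nlp_plot identifier from above):

// Reference the previously committed plot and return it,
// so that the client downloads it upon completion
let plot := new Data{ name := "nlp_plot" };
return plot;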

info As an alternative to using the generic function, the visualization package exposes its individual plot generation logic as separate functions. It might be a fun exercise to try and add these yourself, by using brane inspect and the package's code itself.

And that's it! You can now save and close your workflow file(s), and then move on to the next step: executing it.

Local execution

We can execute the workflow locally first to see if it all works properly. To do so, open up a terminal, and then run the following:

brane run <PATH_TO_WORKFLOW>

If your workflow works, you should see the command block while the workflow executes. Eventually, it should return and show you where it stored the final result of the workflow. If not, it will likely show an error explaining what went wrong, which may be anything from passing the wrong arguments to forgetting a semicolon (the latter tends to generate "end-of-file" errors, as do missing parentheses).

info Tip: If you want to better monitor the progression, insert println() calls in your workflow! It takes a single argument, which will always be serialized to a string before printing it to stdout. By mixing print() (print without newline) and println(), you can even write formatted strings.
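For example, a hypothetical progress message around the training call could be built up like this:

// ... vectorization

print("Training the model");
print("...");
let model := train_model(train, vectors);
println(" done!");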

After having added some additional println() statements, you might see something like the following:

The output of running the workflow file.

(You can ignore the warning message)

You can then inspect the index file by navigating to the suggested folder, and then opening index.html. If you have a browser like Firefox installed, you can also run:

firefox "<PATH>/index.html"

to open it immediately, replacing <PATH> with the path returned by Brane.

You will then see a nice web page containing the plots generated about the model and the dataset. It should look something like:

A view of the HTML with the predictions done by the model.

Remote execution

More interesting than running the code locally, though, is to run it remotely on a running Brane instance.

For the purpose of the tutorial, we have set up an instance with two worker nodes: one resides at the University of Amsterdam (at the OpenLab cluster), and the other resides in SURF's ResearchCloud environment.

Updating the workflow

We can run the workflow on either of these locations. To do so, you have to make a small edit to your workflow, because the planner of the framework is a little too simplistic in its current form. Open your workflow.bs file again, and wrap the code as follows:

on "uva" {
    // ... your code
}

(You can leave the import statements outside of it, if you like)

This on-struct tells the framework that any task executed within must be run on the given location. Because there are two locations, you can use two different identifiers: uva, for the University of Amsterdam server, or surf for the SURF server.

The reason that you have to specify this manually is that both sites have access to the required dataset. This means that the framework has two equally possible locations, and to avoid complications with policy, it simply gives up and requires the programmer to decide manually where to run it. In the future, this should obviously be resolved by the framework itself.

You can now close your file again, after having saved it.

Adding the instance to your client

Then, go back to your terminal so that we can register this instance with your client.

You can register an instance to your client by running the following command:

brane instance add brane01.lab.uvalight.net --name tutorial --use

This will register an instance whose central node is running at brane01.lab.uvalight.net. The ports are all left at their defaults, so there is no need to specify them. The tutorial-part is just a name so you can recognize the instance later; you can replace it with something else if you want.

If the command runs successfully, you should see something like:

The output of adding the instance with brane instance add.

You can then query the status of the instance by running:

brane instance list --show-status
The output of brane instance list --show-status.

We now know the instance has been added successfully!

Adding certificates

Before we can run the workflow in the remote environment, we have to add a certificate to our client so that the remote domains know who they might be sharing data with. In typical situations, this requires contacting the domain's administrator and asking them for a certificate. However, because this is a tutorial, you will all be working as the same user.

Note that you need separate certificates for every domain, since you may not be involved with every domain. You can download the certificates for the University of Amsterdam here, and the certificates for the SURF server here.

warning Obviously, posting your private client key on a publicly available website is about the worst thing you can do, security-wise. Luckily for us, this tutorial is about the Brane framework and not security best practices - but just be aware this isn't one.

Download both of these files, and extract the archives. Then, for each of the two directories with certificates, run the following command to add them to the client:

# For the University of Amsterdam
brane certs add ./ca.pem ./client-id.pem --domain uva
# For SURF
brane certs add ./ca.pem ./client-id.pem --domain surf

Unfortunately, there is a problem with certificate generation: the domains are not properly annotated in the certificates, which causes warnings to appear when adding them. However, this is not an issue, since the certificates are still signed with the private keys of the domains, and thus still provide reliable authentication.

After they are added, you can verify they exist with:

brane certs list
The output of brane certs list.

You are now ready to run your workflow online!

The final step

With an instance and certificates set up, and the proper instance selected, you can run your workflow on the target instance. Normally, you would have to push your package to the instance (brane push <package name>) and make sure that the required datasets are available (this can only be done by the domains' administrators). However, for the tutorial, both of these steps have already been done in advance.

Thus, all that is left is to execute your workflow remotely. Do so by running:

brane run <PATH_TO_WORKFLOW> --remote

You might note that this is exactly the same command as for running it locally, save for the additional --remote flag. Consequently, your output should also be roughly the same:

The output of running the workflow remotely.

In the final step, the part with Workflow returned value..., the dataset is downloaded to your local machine. This means it is now available in the same way as a locally generated result, even though the workflow was executed remotely.

info If you use the --debug flag, you might see that the final result is actually downloaded from a different location than where you executed the workflow. This is because the resulting dataset is available on both sites (under the same identifier), and because the on-struct only affects tasks, not the builtin commit_result-function. Whether this has to be changed in the future remains to be seen, but just repeat the execution of your workflow a few times to also see the download from the other location.

Conclusion

Congratulations! You have now written a more complex workflow for a more complex package, and successfully ran it online. Hopefully, you find the framework (relatively) easy to work with, and enjoyed the experience of getting to know it!

If you still have time left, take the opportunity to play around and ask questions. You can select various topics in the sidebar to the left of this wiki page to find more explanations about the framework. Especially the topics on BraneScript might be interesting, to learn more about what is possible with the workflow language.

Note, however, that the wiki is still incomplete and unpolished, like the framework itself. If you want to know anything, feel free to ask the tutorial hosts!

Thanks for attending :)

Brane Demo at the UMC Utrecht

The second tutorial given about Brane was held at the UMC Utrecht to conclude a Proof-of-Concept performed with them, which focused on Brane serving as a data sharing infrastructure for various analyses on pseudonymised patient data.

The tutorial is written for framework version 3.0.0.

The demo is split into two halves: the first half consists of a presentation introducing the framework at a generic SIG-meeting, whereas the second half features a hands-on session and a more technical presentation about the setup of Brane in the Proof-of-Concept.

The following resources are used, which are hosted on this website:

  • First half: SIG-meeting
  • Second half: Workshop
    • Slides (here)
    • Handout for hands-on part of the tutorial (here)

Part 1: Hello, world!

In this document, we detail the steps to take to write a simple package within the EPI Framework (also called the Brane Framework). Specifically, this tutorial will focus on how to install the CLI client (the brane executable), build a hello_world package and then call the hello_world() function within that package on a local machine. Finally, we will also practise submitting the code to a remote machine.

Background

The framework revolves around workflows, which are high-level descriptions of an algorithm or data processing pipeline that the system will execute. Specifically, every workflow contains zero or more tasks: conceptual functions that take an input and produce an output, composed in a particular order or control flow. It helps to think of workflows as graphs, where the nodes are tasks that must be called, and the edges are some form of data flowing between them.

We could formalise a particular data pipeline as a workflow. For example, suppose we have the following function calls:

g(f(f(input)), h(f(input)))

We can then represent this as a workflow graph of tasks that indicates which tasks to execute and how the data flows between them. This is visualised in Figure 1.

A workflow with three functions and four steps.

Figure 1: A very simple workflow using three tasks, f, g and h. The nodes represent function calls, whereas the edges represent data dependencies. Specifically, this workflow depicts that f has to run first; then a second call of f and a call of h can run in parallel, because they don't depend on each other; after which g must be called.

While workflows can be expressed in any kind of language, the EPI Framework features its own Domain-Specific Language (DSL) to do so, called BraneScript1. This language is very script-like, which allows us to think of tasks as a special kind of function. Any control flow (i.e., dependencies between tasks) is then given using variables and commonly used structures, such as if-statements, for-loops, while-loops, and less commonly used structures, such as on-structs or parallel-statements.

1: There is a second DSL, Bakery, which is more unique to the framework and features a natural language-like syntax. However, this language is still in development, and so we focus on BraneScript.
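To make this concrete, the call graph from Figure 1 could be written in BraneScript roughly as follows. This is only a sketch: the package fgh_package and the tasks f, g and h are made up for illustration.

// Hypothetical package providing the tasks f, g and h
import fgh_package;

let input := 42;          // some placeholder input
let x := f(input);        // the first call of f
let left := f(x);         // the second call of f...
let right := h(x);        // ...and the call of h, which don't depend on each other
println(g(left, right));  // finally, g combines both results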

Objective

In this tutorial, we will mainly focus on creating a single task to execute. Traditionally, this task will be called hello_world(), and will return a string "Hello, world!" once it is called. To illustrate, the following Python snippet implements the logic of the task:

def hello_world():
	return "Hello, world!"

By using print(hello_world()), we can print Hello, world! to the terminal. In the EPI Framework, we will implement the function as a task, and then write a very simple workflow that implements the print-part.

Installation

To start, first download the brane executable from the repository. This is a command line-based client for the framework, providing a wide range of tools to use to develop packages and workflows for the EPI Framework. We will use it to build and then test a package, which can contain one or more tasks. Since we are only creating the hello_world() task, our package (called hello_world) will contain only one task.

The executable is pre-compiled for Windows, macOS (Intel and M1/M2) and Linux. The binaries in the repository follow the pattern of <name>-<os>-<arch>, where <name> is the name of the executable (brane for us), <os> is an identifier representing the target OS (windows for Windows, darwin for macOS and linux for Linux), and <arch> is the target processor architecture (x86_64, typically, or aarch64 for M1/M2 Macs).

To make your life easy, however, you can directly download the binaries here:

info When in doubt, choose x86_64 for your processor architecture. Or ask a tutorial host.

Once downloaded, it is recommended to rename the executable to brane to follow the calling convention we are using in the remainder of this document. Open a terminal in the folder where you downloaded the executable (probably Downloads), and run:

:: For Windows
move .\brane-windows-x86_64 .\brane
# For macOS (Intel)
mv ./brane-darwin-x86_64 ./brane
# For macOS (M1/M2)
mv ./brane-darwin-aarch64 ./brane
# For Linux
mv ./brane-linux-x86_64 ./brane

If you are on Unix (macOS/Linux), you probably want to execute a second step: by just renaming the executable, you would have to call it using ./brane instead of brane. To fix that, add the executable to somewhere in your PATH, e.g.:

sudo mv ./brane /usr/local/bin/brane

If you installed it successfully, you can then run:

brane --version

without getting not found-errors.

info If you don't want to put brane in your PATH, you can also replace all occurrences of brane with ./brane in the subsequent commands (or any other path/name). Additionally, you can also run:

export PATH="$PATH:$(pwd)"

to add your current directory to the PATH variable. Note that this lasts only for your current terminal window; if you open a new one or restart the current one, you have to run the export-command again.

Writing the code

The next step is to write the code that we will be running when we execute the task. In the EPI Framework, tasks are bundled in packages; and every package is implemented in a container. This means that every task has its own dependencies shipped within, and that multiple tasks can share the same dependencies. This also means that a task can be implemented in any language, as long as the program follows a particular convention as to how it reads input and writes output. Specifically, the EPI Framework will call a specific executable file with environment variables as input, and then retrieve return values from it by reading the executable's stdout.

For the purpose of this tutorial, though, we will choose Python to implement our hello_world-function. Because our function is so simple, we only need a single file, which we will call hello.py. Create it, and then write the following in it (including comments):

#!/usr/bin/env python3

def hello_world():
	return "Hello, world!"

print(f'output: "{hello_world()}"')

Let's break this down:

  • The first line, #!/usr/bin/env python3, is a so-called shebang: it tells the operating system that this file is a Python script (i.e., that it must be called with the python3 executable). Any file that has this on the first line can be run by calling the file directly instead of having to prefix it with python3, e.g.,
    ./hello.py
    
    instead of
    python3 hello.py
    
    This is important, because the framework will use the first calling convention.
  • The function, def hello_world(): ..., is the same function as presented before; it simply returns the "Hello, world!" string. This is actually the functionality we want to implement.
  • The final line, print(f'output: "{hello_world()}"'), prints the generated string to stdout. Note, however, that we wrap the value in quotes (") and prefix it with output:. We do this because of the convention that packages for the EPI Framework have to follow: the framework expects the output to be given in YAML format, under a specific name. We chose output (see below).

And that's it! You can save and close the file; we will now move on to the second part of a package: the container.yml file.

Writing the container.yml

A few text files do not make a package. In addition to the raw code, the EPI Framework also needs to know some metadata of a package. This includes things such as its name, its version and its owners, but, more importantly, also which tasks the package contributes.

This information is defined in a file conventionally called container.yml. This is another YAML file, where the top-level keys provide various pieces of metadata. Create a file with that name, and then write the following to it:

# A few generic properties of the file
name: hello_world
version: 1.0.0
kind: ecu

# Defines things we need to install
dependencies:
- python3

# Specifies the files we need to put in this package
files:
- hello.py

# Defines which of the files is the file that the framework will call
entrypoint:
  kind: task
  exec: hello.py

# Defines the tasks in this package
actions:
  hello_world:
    command:
    input:
    output:
    - name: output
      type: string

This is quite a lot, so we will break it down in the following subsections. Every subsection will first show the highlighted part of the container.yml, and then use three dots (...) to indicate parts that have been left out for that snippet.

Minimal metadata

# A few generic properties of the file
name: hello_world
version: 1.0.0
kind: ecu

...

The top of the file starts by providing the bare minimum information that the EPI Framework has to know. First are the name of the package (name) and the version number (version). Together, they form the identifier of the package, which is how the system knows which package we are calling tasks from.

Then there is also the kind-field, which determines what kind of package this is. Currently, the only fully implemented package kind is an Executable Code Unit (ECU), which is a collection of arbitrary code files. However, other package kinds may be supported in the future; for example, support for external workflow files (BraneScript/Bakery) or, if network support is added, OpenAPI containers.

Specifying dependencies

...

# Defines things we need to install
dependencies:
- python3

...

Because packages are implemented as containers, we have the freedom to specify the set of dependencies to install in the container. By default, the framework uses Ubuntu 20.04 as its base image, and the dependencies specified are apt-packages. Note that the base container is fairly minimal, and so we have to specify we need Python installed (which is distributed as the python3-package).
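For instance, if a package also needed pip to install Python libraries, the list would simply grow with the corresponding apt package (shown here purely as an example):

# Defines things we need to install
dependencies:
- python3
- python3-pip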

Collecting files

...

# Specifies the files we need to put in this package
files:
- hello.py

...

Then the framework also has to know which files to put in the package. Because we have only one file, this is relatively simple: just the hello.py file. Note that any filepath is, by default, relative to the container.yml file itself; so by just writing hello.py we mean that the framework should include the file with that name in the same folder as container.yml.

info The files included will, by default, mimic the file structure that is defined. So if you include a file that is in some directory, then it will also be in that directory in the resulting package. For example, if you include:

files:
- foo/hello.py

then it will be put in a foo directory in the container as well.

Setting the entrypoint

...

# Defines which of the files is the file that the framework will call
entrypoint:
  kind: task
  exec: hello.py

...

Large projects typically have multiple files, and only one of them serves as the entrypoint for that project. Moreover, not every file included will be executable code; and thus it is relevant for the framework to know which file it must call. This is specified in this snippet: we define that the hello.py file in the container's root is the one to call first.

As already mentioned, the framework will call the executable "directly" (e.g., ./hello.py in this case). This means that, if the file is a script (like ours), we need a shebang line (e.g., #!/usr/bin/env python3) to tell the OS how to call it.

info Even if your package implements multiple tasks, it can only have a single entrypoint. To this end, most packages define a simple entrypoint script that takes the input arguments and uses that to call an appropriate second script or executable for the task at hand.
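As a sketch of what such an entrypoint could look like (purely hypothetical; our package does not need this, goodbye_world is made up, and we assume here that the task's command is passed as a command-line argument):

#!/usr/bin/env python3
# Hypothetical entrypoint for a package with multiple tasks: here we assume the
# 'command' defined per task in container.yml is passed as a command-line argument,
# which the script uses to dispatch to the right function.
import sys

def hello_world():
    return "Hello, world!"

def goodbye_world():
    return "Goodbye, world!"

if __name__ == "__main__":
    task = sys.argv[1] if len(sys.argv) > 1 else "hello_world"
    if task == "goodbye_world":
        print(f'output: "{goodbye_world()}"')
    else:
        print(f'output: "{hello_world()}"')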

Defining tasks

...

# Defines the tasks in this package
actions:
  hello_world:
    command:
    input:
    output:
    - name: output
      type: string

The final part of the YAML file specifies the most important information: which tasks can be found in your container, and how the framework can call them.

In our container, we only have a single task (hello_world), and so we only have one entry. Then, if required, we can define a command-line argument to pass to the entrypoint to distinguish between tasks (the command-field). In our case, this is not necessary because we only have a single task, and so the field is left empty.

Next, one can specify inputs to the specific task. These are like function arguments, and are defined by a name and a specific data type. At runtime, the framework will serialize the value to JSON and make these available to the entrypoint using environment variables. However, because our hello_world() function does not need any, we can leave the input-field empty too.

Finally, in the output section, we can define any return value our task has. Similar to the input, it is defined by a name and a type. The name given must match the name returned by the executable. Specifically, we returned output: ... in our Python script, meaning that we must name the output variable output here as well. Then, because the output itself is a string, we denote it as such by using the type: string.

In summary, the above actions field defines a single function that has the following pseudo-signature:

hello_world() -> string
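To illustrate the other direction, here is a sketch of what the code side of a task with an input could look like. This is hypothetical: our hello_world task takes no input, and the exact environment variable naming is an assumption (taken here as the uppercased input name holding a JSON-serialized value).

#!/usr/bin/env python3
# Hypothetical task with a single input 'name' of type string.
import json
import os

def greet(name: str) -> str:
    return f"Hello, {name}!"

# Assumption: the framework passes the input as a JSON-serialized value in an
# environment variable named after the input (uppercased here).
name = json.loads(os.environ.get("NAME", '"world"'))
print(f'output: "{greet(name)}"')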

Building a package

After you have a container.yml file and the matching code (hello.py), it is time to build the package. We will use the brane CLI-tool for this, which requires Docker and the Docker Buildx-plugin to be installed.

On Windows and macOS, you should install Docker Desktop, which already includes the Buildx-plugin. On Linux, install the Docker engine for your distro (Debian, Ubuntu, Arch Linux), and then install the Buildx plugin using:

# Install the executable
docker buildx bake "https://github.com/docker/buildx.git"
mkdir -p ~/.docker/cli-plugins
mv ./bin/build/buildx ~/.docker/cli-plugins/docker-buildx

# Create a build container to use
docker buildx create --use

If you have everything installed, you can then build the package container using:

brane build ./container.yml

The executable will work for a bit, and should eventually let you know it's done with:

Successfully built version 1.0.0 of container (ECU) package hello_world.

If you then run

brane list

you should see your hello_world container there. Congratulations!

Running your package locally

All that remains is to see it in action! The brane executable has two ways of running packages locally: running tasks in isolation in a test environment, or running a local workflow. We will do both in this section.

The test environment

The brane test-subcommand implements a suite for testing single tasks in a package, in isolation. If you run it for a specific package, you can use a simple terminal interface to select the task to run, define its input and witness its output. In our case, we can call it with:

brane test hello_world

This should show you something like:

A list of tasks that can be executed when running `brane test hello_world`

If you hit Enter, the tool will query you for input parameters - but since there are none, it will instead proceed to execution immediately. If you wait a bit, you will eventually see:

The output of the hello_world task when running `brane test hello_world`

And that's indeed the string we want to see!

info The first time you run a newly built package, you will likely see some additional delay when executing it. This is because the Docker backend has to load the container first. However, if you re-run the same task, you should see a significant speedup compared to the first time because the container has been cached.

Running a workflow

The above is, however, not very interesting. We can verify the function works, but we cannot do anything with its result.

Instead of using the test environment, we can also write a very simple workflow with only one task. To do so, create a new file called workflow.bs, and write the following in it:

import hello_world;

println(hello_world());

Let's examine what happens in this workflow:

  • In the first line, import hello_world;, we tell the framework which package to use. We refer to our package by its name, and because we omit a specific version, we let the framework pick the latest version for us (we could have used import hello_world[1.0.0]; instead).
  • In the second line, println(hello_world());, we call our hello_world() task. Its result is passed to the builtin function println(), which prints it to stdout.

Save the file, close the editor, and then run the following in your terminal to run the workflow:

brane run ./workflow.bs

If everything is alright, you should see:

A terminal showing the command to run the workflow, and then 'Hello, world!'

info The brane-tool also features an interactive Read-Eval-Print Loop (REPL) that you can use to write workflows as well. Run brane repl, and then you can write the two lines of your workflow separately:

A terminal showing a brane repl session running the two workflow lines, printing 'Hello, world!'

Because it is interactive, you can be more flexible and call it repeatedly, for example:

A terminal showing the REPL calling hello_world() repeatedly

Simply type exit to quit the REPL.

Running your package remotely

Of course, running your package locally is good for testing and for tutorials, but the real use-case of the framework is running your code remotely on a Brane instance (i.e., server).

Adding the instance

First, we have to make the brane-tool aware of where the remote Brane instance can be found. We can use the brane instance-command for that, which offers keychain-like functionality so that you can easily switch between multiple instances.

Prior to this tutorial, we've set up an instance at brane01.lab.uvalight.net. To add it to your client, run the following command:

brane instance add brane01.lab.uvalight.net -a 50051 -d 50053 -n demo -u

To break down what this command does:

  1. brane instance add brane01.lab.uvalight.net tells the client that a new instance is being defined that is found at the given host;
  2. -a 50051 tells the client that the API service is found at port 50051 (the central registry service);
  3. -d 50053 tells the client that the driver service is found at port 50053 (the central workflow execution service);
  4. -n demo tells the client to call this new instance demo, which is an arbitrary name only useful to distinguish multiple instances (you can freely change it, as long as it's unique); and
  5. -u tells the client to use the instance as the new default instance.

Once the command completes, you can run the following command to verify it was a success:

brane instance list
A terminal showing `brane instance add ...` and `brane instance list`

Pushing your package

Now that you have defined the instance to use, we can push your package code to the server so that it can use it.

This is done by running the following command:

brane push hello_world

This will push the specified package hello_world to the instance that is currently active. Wait for the command to complete; once it has, we can prepare the workflow itself for remote execution.

warning Note that this instance is shared by all participants in the system; so if you just upload your own package with the hello_world-name, you will probably overwrite someone else's. To avoid this problem, re-build your package with a unique name before pushing.
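For example (the alternative package name below is purely illustrative), you could change the name field in container.yml and then rebuild and push under that name; don't forget to update the import in your workflow accordingly:

# After changing 'name: hello_world' to e.g. 'name: hello_world_alice' in container.yml
brane build ./container.yml
brane push hello_world_alice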

Adapting the workflow

In the ideal case, Brane can take a workflow that runs locally and deduce by itself where each step in the workflow must be executed (i.e., plan it). Unfortunately, the current implementation can't do this yet (ask me why :) ), and so we have to adapt our workflow a little bit to make it compatible with the instance that we're going to be running on.

Our instance has two worker nodes: worker1 and worker2. To tell Brane which of them we want to use, we wrap the line with the hello_world()-call in a so-called on-struct to force Brane to run it on that node.

Open the workflow.bs-file again, and write:

import hello_world;

on "worker1" {
    println(hello_world());
}

Save it and close it, and we're ready to run your workflow remotely!

Running remotely

In the first step of running the workflow remotely, we already defined the instance and marked it as the default; so all that is left is to run the same command as before, but now with the --remote flag to indicate it must be executed on the currently active instance:

brane run ./workflow.bs --remote
A terminal showing the effects of `brane run ./workflow.bs --remote`

And that's it! While it looks like there isn't a lot of difference, your code just got executed on a remote server!

info You may see warnings relating to the 'On'-structures being deprecated (see the image above). These can safely be ignored; they will be replaced by a better method soon, but that method is not implemented yet.

Using the IDE

If the workflow is going to be run remotely anyway, one can also step away from the CLI-tool and instead use the Brane IDE project, which is built on top of Jupyter Lab to provide a BraneScript notebook interface.

warning Note that currently, only writing and running workflows is supported (i.e., the brane run ... command). Managing packages still has to be done through the CLI.

To use it, download the source code from the repository and unpack it. Also download the Brane CLI library, which is used by the IDE to send commands to the Brane server. You can download it here. Unpack it as well, and place the libbrane_cli.so file in the root folder of the Brane IDE repository.

Once you have everything in place, you can launch an IDE connecting to this tutorial's Brane instance by running in the repository root:

./make.py start-ide -1 http://brane01.lab.uvalight.net:50051 -2 grpc://brane01.lab.uvalight.net:50053 

The command may take a second to complete, because it will first build the container that will run the server.

A terminal showing the initial effects of starting the IDE. A terminal showing the final effects of starting the IDE.

Once done, you can copy/paste the suggested link to your browser, and you should be greeted by something like:

The welcome page of the IDE in the browser.

If you click on the BraneScript-tile, you should see a notebook; and now you can run BraneScript workflows in the classic notebook fashion!

The execution of the workflow in a Brane IDE notebook.

Conclusion

And that's it! You've successfully written your first EPI Framework package, ran it locally to verify it works, and then executed it remotely on a Brane instance.

In the second half of the tutorial, we will focus more on workflows, and write one for an extensive package already developed by students. You can find the matching handout here.