Introduction

In this series of chapters, we will discuss the role of system administrators and how they can prepare their system for Brane. The chapters describe what Brane requires of their system and what kind of information they are expected to share with the Brane instance. Finally, we will also discuss defining datasets.

For more on the inner workings of Brane, we recommend checking out the Brane: A Specification book.

Background & Terminology

The Brane instance defines a control node (or central node), which is where the orchestrator itself and its associated services run. This node is run by the Brane administrators. Then, as a counterpart to this control node, there is the worker plane, which is composed of all the different compute sites that Brane orchestrates over. Each such compute site is referred to as a domain, a location or, since Brane treats it as a single entity, a worker node. Multiple worker nodes may exist within one physical organisation (e.g., a single hospital can host multiple domains for different tasks), but Brane will treat these as conceptually different places.

Within the framework, a system administrator is someone who acts as the 'technical owner' of a certain worker node. They are the ones who make sure their system is prepared and meets the Brane requirements, and who define the security requirements for any operation of the framework on their system. They are also the ones who make any data published from their domain technically available. And although policies are typically handled by policy writers, another role in the framework, in practice this can be the same person as the system administrator.

The Central node

For every Brane instance, there is typically only one control node. Even if multiple VMs are used, the framework expects them to behave as a single node, due to its centralized nature.

The control node consists of the following few services:

  • The driver service is, as the name suggests, the driving service behind the control node. It takes incoming workflows submitted by scientists and starts executing them, emitting jobs that need to be executed on the worker nodes.
  • The planner service takes incoming workflows submitted to the driver service and plans them. This is simply the act of deciding which worker node will execute which task, taking into account the available resources on each of the domains, as well as policies that determine whether a domain can actually transfer data or execute the job (a simple sketch of this decision follows the list).
  • The registry service (sometimes called the central registry service or API service for disambiguation) is the centralized version of the local registry services (see below). It acts as a centralized database for the framework, providing information about which dataset is located where, which domains are participating and where to find them, and in addition hosts a central package repository.
  • Finally, the proxy service acts as a gateway between the other services and the outside world to enable proxying (i.e., it does not accept proxied requests, but rather creates them). It is also the point that handles server certificates and parses client certificates for identification.
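
To make the planner's role more concrete, the sketch below shows how such a planning decision could look. This is purely illustrative Python pseudologic under assumed names (plan_task, domains_with_datasets, allows); it is not Brane's actual code or API.

    # Hypothetical sketch of a planning decision; not Brane's actual API.
    def plan_task(task, registry, checkers):
        """Pick a worker node for a task, respecting data locality and policy."""
        # Ask the central registry which domains host the task's input datasets.
        candidates = registry.domains_with_datasets(task.inputs)

        for domain in candidates:
            # Ask that domain's checker whether it permits executing this task
            # (and transferring the data it needs).
            if checkers[domain].allows(task):
                return domain  # plan the task on this worker node

        # If no domain both holds the data and allows the task, planning fails.
        raise RuntimeError(f"No domain may execute task '{task.name}'")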

For more details, check the specification.

Note that, if you need any compute to happen on the central node, this cannot be done through the central node itself; instead, set up a worker node alongside it to emulate the same behaviour.

The Worker node

As mentioned above, a domain typically hosts a worker node. This worker node collectively describes both a local control part of the framework, referred to as the framework delegate, and some compute backend that actually executes the jobs. In this section, we provide a brief overview of both.

The delegate itself consists of a few services. Their exact workings are detailed in the specification, but as a brief overview:

  • The delegate service is the main service on the delegate; it takes incoming job requests and attempts to schedule them. This is also the service that directly connects to the compute backend (see below). You can think of it as a local driver service (a sketch of the resulting job flow follows the list).
  • The registry service (sometimes called local registry service for disambiguation) keeps track of the locally available datasets and intermediate results (see the data tutorial for Software Engineers or the data tutorial for Scientists for more information) and acts as a point from where the rest of the framework downloads them.
  • The checker service acts as the Policy Enforcement Point (PEP) for the framework. It hosts a reasoner, typically eFLINT, and is queried by both the delegate and registry services to see if operations are allowed.
  • Finally, the local node also has a proxy service, just like the central node.
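
To tie these services together, here is a rough sketch of the path a single job request takes on a worker node. Again, this is illustrative Python with hypothetical names (handle_job, allows, run, register); the real delegate is implemented differently.

    # Hypothetical sketch of the job flow on a worker node; not Brane's real API.
    def handle_job(job, checker, backend, local_registry):
        # 1. The delegate consults the checker (the PEP) before doing anything.
        if not checker.allows(job):
            raise PermissionError(f"Policy denies job '{job.name}'")

        # 2. The job's container is executed on the configured compute backend.
        result = backend.run(job.image, job.arguments)

        # 3. The result is registered with the local registry service, so that
        #    the rest of the framework can download it from there (if allowed).
        local_registry.register(job.output_name, result)
        return result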

As for the compute backend, Brane is designed to connect to different types. An overview:

  • A local backend schedules new jobs on the same Docker engine where the control plane of Brane runs. This is the simplest infrastructure of them all, and requires no preparation beyond what is already needed to install the control plane. It is typically the choice of backend when the worker node is running on a single server or VM (a minimal illustration follows the list).
  • A VM backend uses an SSH connection (via the Xenon middleware) to launch jobs on the Docker engine of another server or VM. This is typically useful for simple setups that still emphasise a split between a local control plane and a local compute plane, but don't have extensive clusters to connect to.
  • A Kubernetes backend connects to a Kubernetes cluster on which incoming jobs are hosted. This is the recommended option if you need larger compute power, since Kubernetes is designed to work with containers.
  • A Slurm backend connects to a Slurm cluster on which incoming jobs are hosted. This infrastructure type may be harder to set up, as Slurm does not have any built-in container support. However, when set up properly, it can be used to execute Brane jobs on existing large-scale compute clusters.
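
As an illustration of what the simplest option, the local backend, boils down to, the snippet below runs a container on the local Docker engine using the Docker SDK for Python. It only shows the kind of operation the delegate performs against such a backend; the image is a placeholder and this is not how Brane itself drives Docker.

    # Illustration only: running a "job" on the local Docker engine via the
    # Docker SDK for Python (pip install docker).
    import docker

    client = docker.from_env()   # connect to the local Docker daemon

    output = client.containers.run(
        "hello-world",           # placeholder image, not a real Brane package
        remove=True,             # clean up the container afterwards
    )
    print(output.decode("utf-8"))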

More information on each backend and how to set it up is discussed in the backends chapter(s).

Next

To start setting up your own worker node, we recommend checking out the installation chapters. These will walk you through everything you need to set up a node, both control nodes and worker nodes.

For information on setting up different backends, check the backend chapters.

Alternatively, if you are looking for extensive documentation on the Brane configuration files relevant to a worker node, check out the documentation chapters.