Orchestrating Docker 1.12 Swarm With Cloudify

Posted By: DeWayne Filppi on September 21, 2016


The recent release of Docker 1.12 introduced a heavily upgraded version of Swarm baked into it. This new release puts native Docker container orchestration in direct competition with Google's Kubernetes, although Swarm doesn't yet match all of Kubernetes' capabilities. Features such as load-balanced services, scaling, overlay networking, placement affinity and anti-affinity, security, and high availability make for a compelling platform.


Orchestration Strategy for Docker Swarm

Having container orchestration native to the Docker platform makes the creation of a cluster much easier than with Kubernetes. Swarm has a familiar master/worker architecture, with the capability of high availability for the master. Swarm uses a Raft implementation for leader election and consensus storage rather than an external provider (e.g. etcd). It also includes built-in overlay networking. The integrated nature of Swarm makes the orchestration straightforward.

swarm arch

As with the Cloudify Kubernetes Cluster Blueprint, the main value delivered by the orchestration of Swarm is:

  • creating a cluster in an infrastructure neutral way
  • auto-healing the cluster when cluster nodes fail
  • manual and/or auto-scaling the cluster when arbitrary metrics indicate the need.

Given these goals, the orchestration can be fairly simple because they fall into well-worn patterns that Cloudify supports directly. This initial attempt starts a single manager and an arbitrary number of workers. The workers depend on the manager. The workers are outfitted with Diamond metric collectors, and scale and heal policies are defined in the blueprint. When workers are spun up, they get the security token from the manager and join the swarm. When they are scaled down, the process is reversed. A load-generating image is provided in the form of a Dockerfile to ease validating the scaling behavior, and part of the orchestration is the generation of the image on each worker node.

Orchestration Details

The blueprint defines two node types corresponding to the different Swarm node roles, manager and worker. The blueprint targets OpenStack, and each of the roles is contained in a corresponding cloudify.openstack.nodes.Server type. The worker host nodes are outfitted with standard Cloudify metric collectors. The worker nodes are configured to depend on the manager node. The blueprint assumes that Docker is preinstalled on the image. While Cloudify could easily automate the installation of Docker itself, the process is too time-consuming to make sense in an auto-scaling use case.

docker swarm  blueprint
Blueprint Representation In Cloudify UI

The Manager

The manager stores its cluster tokens in runtime properties, which are used later by workers to join the swarm. The script that enables this is quite simple and is called as the result of the manager configuration in the blueprint.

    type: cloudify.nodes.SoftwareComponent
          implementation: scripts/start-manager.sh
                IP: {get_attribute: [manager_host, ip]}
      - target: manager_host
        type: cloudify.relationships.contained_in

Note that the IP of the containing host is passed through the environment to the very minimal configuration/startup required by Docker Swarm.

# initialize this host as the Swarm manager
sudo docker swarm init --advertise-addr "$IP"

# store the join tokens in the node's runtime properties
ctx instance runtime-properties master_token "$(sudo docker swarm join-token -q manager)"
ctx instance runtime-properties worker_token "$(sudo docker swarm join-token -q worker)"

To initialize a Swarm manager, only the init command is needed. The script then stores away the access tokens for future reference in the node runtime properties.
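
Because the workers later depend on these tokens, it may be worth failing fast if join-token returns something unexpected. The following is a minimal sketch, not part of the original blueprint: is_join_token is a hypothetical helper, and it assumes tokens carry the SWMTKN-1- prefix that Docker 1.12 emits.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the original blueprint): succeed only if the
# argument looks like a Swarm join token, i.e. starts with "SWMTKN-1-".
is_join_token() {
  case "$1" in
    SWMTKN-1-?*) return 0 ;;
    *)           return 1 ;;
  esac
}

# Possible use inside start-manager.sh (commented out; needs Docker and ctx):
# token=$(sudo docker swarm join-token -q worker)
# is_join_token "$token" || { echo "unexpected worker token" >&2; exit 1; }
# ctx instance runtime-properties worker_token "$token"
```

With a check like this, a failed or partial init would stop the manager's start operation instead of storing an empty token for the workers to trip over later.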

The Worker

The worker configuration is likewise simple and depends on the existence of the manager. The first step is to put the "stress" image on each worker. This supports easy demonstration of scaling, and could also serve as a pattern for loading arbitrary images. The blueprint indicates this for the worker node as follows:

    type: cloudify.nodes.SoftwareComponent
        configure: scripts/configure-worker.sh

The configure-worker script downloads an archive from the blueprint that contains a Dockerfile and supporting artifacts, unpacks it, and builds the image:

# create image for generating cpu load
ctx download-resource containers/stress.tgz /tmp/stress.tgz
cd /tmp
tar xzf /tmp/stress.tgz
cd /tmp/stress
sudo docker build -t stress:latest .

The next step is to actually start the worker and join the swarm. This is facilitated by Cloudify intrinsic functions that put required deployment info into the environment of the start-worker.sh script:

          implementation: scripts/start-worker.sh
                IP: {get_attribute: [worker_host, ip]}
                MASTERIP: {get_attribute: [manager_host, ip]}
                TOKEN: {get_attribute: [manager, worker_token]}

start-worker.sh can then run Docker's very simple join command to join the cluster:

sudo docker swarm join --advertise-addr $IP --token $TOKEN $MASTERIP
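
Docker's own examples pass the manager address to the join command as host:port, with 2377 as the default Swarm management port. As a hedged sketch (manager_addr is a hypothetical helper, not something the blueprint contains), the script could normalize MASTERIP before joining:

```bash
#!/usr/bin/env bash
# Hypothetical helper: append the default Swarm management port (2377)
# when the given address has no port. IPv4 addresses and hostnames only;
# IPv6 literals contain colons and would need separate handling.
manager_addr() {
  local host="$1"
  case "$host" in
    *:*) printf '%s\n' "$host" ;;
    *)   printf '%s:2377\n' "$host" ;;
  esac
}

# Possible use inside start-worker.sh (commented out; needs Docker):
# sudo docker swarm join --advertise-addr "$IP" --token "$TOKEN" "$(manager_addr "$MASTERIP")"
```

Whether the explicit port is required depends on the Docker version in use, so treat this as a defensive option rather than a correction.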

Scaling and Healing

Worker hosts are outfitted with standard metric collectors, installed via the Cloudify Diamond plugin, for CPU, memory, and I/O metrics. Autoscaling configuration in Cloudify consists of defining a group, which associates a number of blueprint nodes with a policy. The policy dictates under what circumstances the scaling (or other workflow) is triggered. The actual workflow (scale, heal, or other) is associated in the scaling group definition. This means that the policy itself is just raising a flag (as it were): it doesn't command a certain workflow to execute, or even send parameters to a certain workflow.

Since each group is statically defined in the blueprint, a separate group must be defined for each potential action. In the case of this Swarm blueprint, that means separate groups for scale up, scale down, and heal. Looking at the scale up group, you can see the policy association and the policy configuration, which sets the threshold for scaling, the metric to use, and other parameters. Note that the actual policy implementation, in policies/scale.clj, is not baked into Cloudify itself, but is a general purpose autoscaling detection policy that you can reuse in your own blueprints.

   members: [worker_host]
       type: scale_policy_type
         policy_operates_on_group: true
         scale_limit: 4
         scale_direction: '<'
         scale_threshold: 50
         service_selector: .*worker_host.*cpu.total.user
         cooldown_time: 120
           type: cloudify.policies.triggers.execute_workflow
             workflow: scale
               delta: 1
               scalable_entity_name: worker_host
               scale_compute: true

Note that scale_policy_type is defined in the imported imports/scale.yaml file, which ultimately points at policies/scale.clj. The triggers section defines the workflow that will be executed (and its parameters) when the policy "raises its flag" (actually it calls the process-policy-triggers function). Nothing new or exotic here. The scale down group is almost identical, with slightly tweaked policy and workflow parameters. The heal group uses the built-in host failure policy, which then triggers the built-in heal workflow.
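
To make the policy parameters more concrete, here is a rough sketch of the kind of decision they describe. This is not the logic of policies/scale.clj, which is not shown here. It assumes that a scale_direction of '<' means "trigger when the threshold is below the metric" (scale up), that '>' means the reverse (scale down), and that scale_limit bounds the instance count. It ignores cooldown_time and any metric aggregation, and works on integers only. should_trigger is a hypothetical name.

```bash
#!/usr/bin/env bash
# Illustration only, under the assumptions stated above.
# Usage: should_trigger DIRECTION THRESHOLD METRIC INSTANCES LIMIT
# Exit status 0 means the policy would raise its flag.
should_trigger() {
  local direction="$1" threshold="$2" metric="$3" instances="$4" limit="$5"
  case "$direction" in
    '<') [ "$threshold" -lt "$metric" ] && [ "$instances" -lt "$limit" ] ;;
    '>') [ "$threshold" -gt "$metric" ] && [ "$instances" -gt "$limit" ] ;;
    *)   return 2 ;;  # unknown direction
  esac
}

# With the scale up values above (threshold 50, limit 4):
# should_trigger '<' 50 80 1 4   # succeeds: load is high, room to grow
# should_trigger '<' 50 80 4 4   # fails: already at the limit
```

Under these assumptions, the scale down group would use the same function with '>' and a lower bound as its limit.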

Test Drive

In order to test drive the Swarm integration with auto healing and scaling, you'll need access to an OpenStack cloud with a Cloudify manager bootstrapped there. Recall that you'll need an Ubuntu 14+ image with Docker 1.12 installed. Then clone the blueprint from the git repo and edit the inputs/openstack.yaml file. Upload the blueprint to the manager and create a deployment using the inputs. Alternatively, you can create the deployment from the UI and enter the inputs manually. Run the install workflow on the deployment. This will create a Swarm cluster with one manager and one worker.
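
From the CLI, the upload, create, and install steps might look roughly like the sketch below. The flag spellings follow the Cloudify 3.x-era cfy CLI and may differ in other versions, so check cfy --help before relying on them. The blueprint file name and the "swarm" IDs are placeholders, and deploy_swarm is a hypothetical wrapper.

```bash
#!/usr/bin/env bash
# Sketch only: assumes 3.x-style cfy flags; IDs and paths are placeholders.
# Usage: deploy_swarm BLUEPRINT_PATH INPUTS_PATH [NAME]
deploy_swarm() {
  local blueprint_path="$1" inputs_path="$2" name="${3:-swarm}"
  # Stop at the first failing step so a bad upload is not followed by an install.
  cfy blueprints upload -p "$blueprint_path" -b "$name" &&
  cfy deployments create -b "$name" -d "$name" -i "$inputs_path" &&
  cfy executions start -w install -d "$name"
}

# Example (commented out; needs a bootstrapped Cloudify manager):
# deploy_swarm path/to/blueprint.yaml inputs/openstack.yaml
```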


From your OpenStack Horizon dashboard, terminate the worker instance. Now return to the Cloudify Manager UI and note on the deployment view that the heal workflow has started.

swarm heal


From the Cloudify UI in the deployments view, get the public IP of the Swarm manager.

swarm ip

You'll need access to the agent key for the Swarm manager in order to ssh there. First, ssh to the Cloudify manager from the CLI:

cfy ssh

Now ssh to the Swarm manager using the IP you got from the UI:

sudo ssh -i /root/.ssh/agent-key.pem ubuntu@<manager-ip>

Now that you're on the Swarm manager host, you can run the pre-installed service to generate load:

sudo docker service create --constraint 'node.role == worker' --restart-condition none stress /start.sh

This will run the stress tool on an arbitrary worker. In the current blueprint configuration, only the workers can auto-scale, and only metrics from the workers are used to decide whether to scale. This means the stress must be limited to worker nodes, which the Docker service placement constraint mechanism nicely supplies. Return to the Cloudify deployment view and see the scale workflow start:

swarm scale

Keep watching for a couple of minutes and another instance will join the cluster. Wait around for a few more minutes, and the deployment will scale down automatically to accommodate the decreased load.
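
If you would rather confirm the new worker from the Swarm manager's shell than from the UI, a small helper along these lines could count the nodes the swarm knows about. swarm_node_count is a hypothetical name; it relies only on docker node ls -q, which prints one node ID per line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print how many nodes the swarm currently lists.
# Run on the Swarm manager host.
swarm_node_count() {
  sudo docker node ls -q | wc -l | tr -d ' '
}

# Example (commented out; needs Docker in swarm mode):
# echo "nodes in swarm: $(swarm_node_count)"
```

Starting from one manager and one worker, the count should go from 2 to 3 once the scale workflow finishes. Nodes that have left the swarm may still be listed with a Down status, so the count is a better signal for scale up than for scale down.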


Docker Swarm has stepped up big time to become a real competitor in the container management space. Cloudify can add value to a Swarm deployment by supplying portability, healing, and scaling capabilities beyond the container scaling and healing provided by Swarm itself. Cloudify can also orchestrate Swarm services in concert with systems external to the Swarm infrastructure (and possibly not containerized). The source code is available on GitHub. As always, comments are most welcome.

Watch the video of this demo below:
