Rolling Release with Ansible

Problem

How can we use Ansible to follow a Rolling Release / Rolling Deployment strategy?
How can we ensure that our service remains available at deployment time?

Solution

Use serial in a Play:

Run the tasks on one host before going to the next host.

Note

As an additional benefit, if that one host fails, the Play stops, because 100% of the hosts in the batch have failed. For more control over this, there is the Play keyword max_fail_percentage.

- hosts: nameservers
  serial: 1
  tasks:
  ...
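
To make the elided tasks more concrete, here is a minimal sketch of what such a Play could look like. The package and service name (bind9), the port and the timeout are assumptions for illustration only; the modules used are the built-in package, service and wait_for modules.

```yaml
# Sketch only: package/service name "bind9", port 53 and the timeout
# are illustrative assumptions, not part of the original playbook.
- hosts: nameservers
  become: true
  serial: 1          # finish one host completely before touching the next
  tasks:
    - name: Upgrade the DNS server package on the current host
      ansible.builtin.package:
        name: bind9
        state: latest

    - name: Restart the DNS service
      ansible.builtin.service:
        name: bind9
        state: restarted

    - name: Wait until the service accepts TCP connections again
      ansible.builtin.wait_for:
        port: 53
        timeout: 60
```

If the last task times out, the host counts as failed and the Play stops before the remaining nameservers are touched.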

For larger clusters, we can define serial as a list and use max_fail_percentage to set the maximum percentage of hosts in a batch that are allowed to fail:

- hosts: apps
  max_fail_percentage: 30
  serial:
    - 1
    - 3
    - 10%
    - 30%
  tasks:
  ...

Explanation

A rolling release strategy is especially useful for a logical cluster, e.g. several authoritative DNS servers, CDN hosts or a worker queue: if anything goes wrong, some hosts remain untouched, which protects the user experience.

With serial we define the batch size, i.e. the maximum number of hosts that Ansible is allowed to change in one run. A percentage that works out to less than one host is rounded up to 1. The Play then runs in a loop, batch by batch, until all hosts have been processed.

If serial is defined as a list, the batch size is recalculated for each run from the next value in the list. Once the list is exhausted, the last value is applied for as long as hosts remain.
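
As a worked example of the rules above, the following sketch annotates the apps Play with the resulting batch sizes. It assumes that the group contains exactly 20 hosts (a made-up number) and that percentages are taken from the total number of hosts in the Play, which is how the Ansible documentation describes it. The ping task is only a placeholder.

```yaml
# Assumption: the inventory group "apps" contains exactly 20 hosts.
- hosts: apps
  max_fail_percentage: 30
  serial:
    - 1      # run 1: 1 host
    - 3      # run 2: 3 hosts
    - "10%"  # run 3: 10% of 20 = 2 hosts
    - "30%"  # run 4: 30% of 20 = 6 hosts
             # run 5: last value reused, 6 hosts
             # run 6: last value reused, only 2 hosts remain
  tasks:
    - name: Placeholder task standing in for the real deployment steps
      ansible.builtin.ping:
```

Under these assumptions the Play needs six runs (1 + 3 + 2 + 6 + 6 + 2 = 20 hosts). Since max_fail_percentage has to be exceeded within a batch, a single failure already stops the Play in the batches of 1, 2 or 3 hosts, while a batch of 6 tolerates one failed host and stops at two.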

Reduce Pressure on Peripheral Systems

As our fleet grows, serial execution can reduce the pressure on peripheral systems. Running hundreds or thousands of package updates at the same time can have a noticeable impact on a centralised package repository. The Playbook run itself can slow down or fail because of the load on the peripheral system. Keep this in mind.
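
A minimal sketch of this idea, assuming a fleet-wide package upgrade: the package name and the batch size of 10% are placeholders and would need tuning for the repository in question.

```yaml
# Sketch only: the package name and the 10% batch size are placeholders.
- hosts: all
  become: true
  serial: "10%"      # at most 10% of the fleet hits the repository at once
  tasks:
    - name: Upgrade a package in batches instead of on all hosts at once
      ansible.builtin.package:
        name: openssl
        state: latest
```

A smaller batch size lowers the peak load on the repository at the cost of a longer overall Playbook run.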