Ansible and CloudFormation

🎵… sitting in a tree…🎵

For both my personal projects and projects at Unbounce, I use Ansible to create CloudFormation stacks. This greatly improves the experience for people who are not used to working with CloudFormation and its asynchronous behaviour. Before I show some of the benefits, I will explain what each piece of technology does.

What is Ansible?

Ansible is a configuration and orchestration tool for servers and services. People primarily use Ansible to configure packages, libraries, and services on Linux machines (though other operating systems are supported). It is focused on idempotency: running an Ansible playbook once may change the system, but running it again against the now-unchanged system makes no further changes.

---
- hosts: all
  tasks:
    - name: ensure nginx is installed
      apt:
        pkg: 'nginx'
        state: 'present'
      become: yes

Ansible playbooks and roles are written in YAML, which is not a programming language and so keeps you from writing elaborate loops or conditionals. Only basic looping and conditionals are supported, which should get you 99% of the way to where you want to be. When a playbook uses an Ansible module, the underlying code is written in Python. Anyone can write a module, but the ones shipped with Ansible are good enough for most people, and I rarely see anyone creating custom modules (they are easy to write, though, so I suggest you try).
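To give a feel for what "basic looping and conditionals" means, here is a minimal sketch that extends the nginx example above. The package list is purely illustrative, and the condition assumes fact gathering is left on (the default) so that ansible_os_family is available.

```yaml
---
- hosts: all
  become: yes
  tasks:
    # with_items runs the task once per list entry; the current entry is "item"
    - name: ensure web packages are installed
      apt:
        pkg: "{{ item }}"
        state: 'present'
      with_items:
        - 'nginx'   # from the earlier example
        - 'curl'    # illustrative extra package
      # when skips the task on hosts where the expression is false
      when: ansible_os_family == 'Debian'
```

That is about as far as the looping goes. Anything more elaborate usually belongs inside a module rather than the playbook.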

The best part about Ansible modules is that they contain all of the validation checks and retry logic, none of which is exposed to the person writing the playbook. Thus, when you state "the package nginx should be present", the underlying packaging module will perform all the conditional checks to ensure the call is idempotent.

This abstraction is especially useful with CloudFormation, because each AWS service is ultimately just an API endpoint.

What is CloudFormation?

CloudFormation is an AWS service that allows you to provision other AWS resources. For instance, instead of writing a script or clicking through the AWS console to create four EC2 machines, an Elastic Load Balancer, and some S3 buckets, you can put those specifications into a JSON template and tell CloudFormation to do it for you. The JSON templates, although annoying to write, are declarative and give you immediate access to every attribute for almost every AWS service. It is a bit like being thrown in the back of the kitchen where sausage is being made, but the power alone is worth the sight.

{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "My awesome 2-tier web stack",
  "Parameters": {
    ...
  },
  "Resources": {
    ...
  }
}

Each AWS service is called via an HTTP API endpoint, and each endpoint has its own API limits. When someone starts using AWS, they invariably ask the question, "What are the API limits?" There is no official answer, even from AWS, because the limits change based on the load and health of the overall AWS system or service. You may find you can call the S3 API 10 times per second in the morning, then only once per second in the afternoon. The official response from AWS is that, if you hit an API limit, you will be told via the HTTP response, and you should implement exponential back-off in your code. Thus you get the following logic:

BackOffMultiplier = 2
WaitTime = 200ms
Loop
    Call API
    If the response is not an API limit error
        Exit Loop
    Wait WaitTime
    WaitTime = WaitTime * BackOffMultiplier

One of the best things CloudFormation provides the end-user is that the exponential back-off code is built into the service, so you never concern yourself with that. Instead, you get to focus on building the template of resources to create. Should there be a service outage when you are creating a resource, CloudFormation will happily wait, backing off as needed, and create the resource once the service is back online.

Many people, including us at Unbounce, started using CloudFormation by calling it with the AWS CLI tool. This allows us to build orchestration around gathering the parameters and variables that our CloudFormation template requires, or around interacting with other services (for event notifications, for example). Typically this is written as a Bash script, but that can become unwieldy and hard to read.

Using the AWS CLI, one of the first things that you will notice is that the aws cloudformation create-stack call returns immediately. This is an asynchronous operation that returns the unique ID of the stack being created. This is good, but stacks take a while to create all of their underlying resources, and people want to know whether the stack creation succeeded or failed so that further action can be taken. The end result is that you are back to polling the CloudFormation API and adding in exponential back-off. Ugh.

Ansible Orchestrates CloudFormation

Ansible can help with the asynchronous nature of CloudFormation because the underlying module code contains all the back-off and retry logic. Once the module is running, it is a blocking call and will wait until the stack completes (successfully or not). This is very useful because services like CloudFront CDNs take 15 minutes or more to complete as each edge node needs to be updated.

$ ansible-playbook -i localhost.inventory create-stack.yml

PLAY [all] *********************************************************************

TASK [setup] *******************************************************************
ok: [localhost]

TASK [find the latest ami] *****************************************************
ok: [localhost]

TASK [set_fact] ****************************************************************
ok: [localhost]

TASK [debug] *******************************************************************
ok: [localhost] => {
    "msg": "AMI ID to be used is: ami-g430f9dd"
}

TASK [ensure master stack exists] **********************************************
changed: [localhost]

PLAY RECAP *********************************************************************
localhost                  : ok=5    changed=1    unreachable=0    failed=0

Ansible also has a Vault, a way to encrypt secrets before they are stored in version control. The secrets are decrypted when the playbook is run and made available to its tasks when needed. With Bash scripts, you have to perform the encryption and decryption yourself. Not having to write any code for this lets me sleep easier and focus on the task at hand.
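As a rough sketch of how this can look in practice, the playbook below loads a variables file that has been encrypted beforehand with ansible-vault encrypt. The path vars/secrets.yml and the message text are hypothetical; slack_webhook_token is simply the variable name I use for the Slack token in my playbooks.

```yaml
---
- hosts: all
  connection: local
  vars_files:
    # Hypothetical path. The file is stored encrypted in version control,
    # e.g. after running: ansible-vault encrypt vars/secrets.yml
    # Once decrypted it is plain YAML that defines slack_webhook_token.
    - 'vars/secrets.yml'
  tasks:
    - name: message slack using the decrypted token
      slack:
        token: "{{ slack_webhook_token }}"
        msg: 'Vault-protected token loaded successfully'
        channel: "#typicalrunt"
```

Running it with ansible-playbook --ask-vault-pass prompts for the vault password, and Ansible decrypts the file in memory for the duration of the run.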

Another nice feature of using Ansible is its strong support for idempotence, which I mentioned earlier. When applied to CloudFormation, you can run the same playbook multiple times, and the Ansible cloudformation module will ensure that the stack is changed only if its inputs (the template or the template parameters) have changed. Otherwise, the stack remains unchanged and the user is told so. A stack is updated with the same call that creates it, so you (as the end-user) do not have to duplicate code. Underneath, Ansible determines whether to create a new stack, update it, or leave it alone.

In this example, we can see that running the playbook again results in nothing being changed, because none of the resources in the template or the parameters given to it have changed.

$ ... make no changes to parameters or the template ...
$ ansible-playbook -i localhost.inventory create-stack.yml

PLAY [all] *********************************************************************

TASK [setup] *******************************************************************
ok: [localhost]

TASK [find the latest ami] *****************************************************
ok: [localhost]

TASK [set_fact] ****************************************************************
ok: [localhost]

TASK [debug] *******************************************************************
ok: [localhost] => {
    "msg": "AMI ID to be used is: ami-g430f9dd"
}

TASK [ensure master stack exists] **********************************************
ok: [localhost]

PLAY RECAP *********************************************************************
localhost                  : ok=5    changed=0    unreachable=0    failed=0

Because Ansible playbooks can contain one or more tasks (or roles), our playbooks create stacks and also notify services like Slack or Datadog (both of which have modules supported in Ansible). Thus, we get the same support for orchestration that we had in our Bash scripts, but now people do not have to write Bash, and it works on more operating systems.

- hosts: all
  connection: local
  tasks:
    - name: ensure stack exists
      cloudformation:
        stack_name: 'typicalrunt-website-production'
        region: 'us-east-1'
        state: 'present'
        template: 'files/stack-templates/template.json'
        template_parameters:
          DomainName: 'typicalrunt.me'
      register: cfn_result
    - name: message slack on success
      slack:
        token: "{{ slack_webhook_token }}"
        msg: 'A new version of the infrastructure has been deployed'
        channel: "#typicalrunt"
      when: cfn_result is changed

This playbook creates or updates the stack and then, only if the stack changed, notifies me on Slack that a new version of the website infrastructure has been released.

A Note About Localhost

Ansible likes to work with remote servers. However, when using Ansible to orchestrate CloudFormation stacks, you are effectively telling Ansible to skip any remote activity. You should use a localhost inventory file and use connection: local in your playbooks.

Here is a sample localhost.inventory file that I use. Yes, it contains just one line.

localhost ansible_connection=local

In your playbook, you inform it (again) that it will be a local connection:

- hosts: all
  connection: local
  tasks:
    ...

This works for most workflows. However, it is possible that some workflows may need to interact with the EC2 servers that have been created by CloudFormation. In this case, you can use the add_host Ansible module to dynamically add the host to your inventory and then you can run remote tasks on it.
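A minimal sketch of that pattern follows. It assumes the template declares an output named InstancePublicIp (a hypothetical name) and that the instances accept SSH logins as the ubuntu user; the group name stack_instances is also made up for illustration. The cloudformation module returns the stack's outputs under stack_outputs, which is what the add_host task reads.

```yaml
---
# Play 1: runs locally, creates or updates the stack, registers the new host
- hosts: all
  connection: local
  tasks:
    - name: ensure stack exists
      cloudformation:
        stack_name: 'typicalrunt-website-production'
        region: 'us-east-1'
        state: 'present'
        template: 'files/stack-templates/template.json'
        template_parameters:
          DomainName: 'typicalrunt.me'
      register: cfn_result

    - name: add the new instance to the in-memory inventory
      add_host:
        # InstancePublicIp is an assumed output name in the template
        name: "{{ cfn_result.stack_outputs.InstancePublicIp }}"
        groups: 'stack_instances'

# Play 2: runs over SSH against the host added by add_host above
- hosts: stack_instances
  remote_user: 'ubuntu'   # assumption: depends on the AMI in use
  become: yes
  tasks:
    - name: ensure nginx is installed
      apt:
        pkg: 'nginx'
        state: 'present'
```

The host added by add_host only lives for the duration of the run, so the inventory file itself stays at one line. In practice you may also want to wait for SSH to become reachable before the second play starts.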