Emitting UserData Events With Bosky

This article assumes intermediate knowledge of an EC2 instance's lifecycle and various AWS services.

When a Linux EC2 instance starts up, user data runs as part of the cloud-init system. This allows system administrators to configure an EC2 instance at first boot, exactly once, because by default user data does not run again once the server has started.

At Unbounce, we built our user data to install our services onto the machine at runtime, then configure it for the specific environment (production, staging, etc.) it runs in. One issue was what to do with failures within the user data script. When user data fails and the service does not start, the machine (if behind an auto-scaling group) is terminated and we lose the reason the user data failed. The first fix for this is off-box logging, such as CloudWatch Logs or, in our case, SumoLogic. That helped, but the delay between the log service receiving the log entry and developers being notified can be 5 minutes or more. By that time, the box is dead and gone. And this all assumes that enough user data ran to configure and start the off-box logging service successfully.

We then introduced a second fix for this problem: a single SNS topic that user data can publish to when it fails. This works quite well, and through Lambda or OpsGenie, we get a near real-time response to a failing system. The issue is that every EC2 instance now depends on that specific resource and requires permission to publish to it. As noted in my previous article about decoupled event bus architectures, this is a tightly coupled, leaky abstraction.

Using what I learned from the previous article, I wanted user data to publish an event, upon success or failure, that another system can read and process into an actionable response. I also wanted this solution to be easily deployed to systems without dependency management.

Introducing Bosky, a tool that lets system administrators emit user data events to the CloudWatch Events service, to be picked up by consumers that care about specific types of user data events.

Depending on the use case, an administrator may want to:

  • emit an event when user data fails, so that Slack notifies the team that owns the machine, identified by the machine's tags. This can be implemented with a CloudWatch Event Rule and a Lambda function.
  • emit an event when user data finishes successfully, so that further continuous delivery or smoke testing can start.
  • emit an informational event in case developers need a near real-time play-by-play of what an EC2 instance is doing in user data.
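As a rough illustration of the consumer side of these use cases, here is a minimal sketch of a Lambda handler that decides what to do with a user data event. It assumes the event carries a "Status" key inside "detail" (the rule pattern later in this article matches on it). The "fail" value is an assumption guessed from the --fail flag, and the action names are purely hypothetical placeholders for real integrations such as a Slack notification.

```python
def handler(event, context):
    """Lambda entry point for a CloudWatch Events rule target (sketch)."""
    detail = event.get("detail", {})
    status = detail.get("Status", "unknown")
    project = event.get("source", "unknown")

    # "info" appears in the rule pattern later in this article.
    # "fail" is an assumed value, guessed from the --fail flag.
    if status == "fail":
        action = "notify-owning-team"   # hypothetical: e.g. post to Slack
    elif status == "info":
        action = "log-only"             # hypothetical: e.g. append to a timeline
    else:
        action = "ignore"
    return {"project": project, "status": status, "action": action}
```

Keeping the decision in the consumer, rather than in user data, is the point of the design: the instance only reports what happened.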

All of this is possible with Bosky. Truth be told, all of this is possible without Bosky, but you would need to write your own calls to the CloudWatch Events API, which means installing dependencies on the instance as well (e.g. the AWS CLI, Python, etc.).
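To give a sense of what the hand-rolled alternative involves, here is a minimal sketch using boto3 and the PutEvents API. Source, DetailType and Detail are real PutEvents fields, but the keys inside the detail payload other than "Status" are hypothetical, and this is not necessarily the structure Bosky itself emits.

```python
import json


def build_entry(status, message, instance_id, project):
    """Build one entry for the CloudWatch Events PutEvents API."""
    detail = {
        "Status": status,            # e.g. "info"
        "Message": message,          # hypothetical field name
        "InstanceId": instance_id,   # hypothetical field name
    }
    return {
        "Source": project,
        "DetailType": "User Data",
        "Detail": json.dumps(detail),  # PutEvents expects Detail as a JSON string
    }


def publish(entry):
    """Send the entry; needs boto3 installed and events:PutEvents permission."""
    import boto3  # imported lazily so build_entry works without the SDK

    response = boto3.client("events").put_events(Entries=[entry])
    # PutEvents reports per-entry failures in the response instead of raising.
    if response["FailedEntryCount"]:
        raise RuntimeError(f"PutEvents rejected the entry: {response['Entries']}")
```

Every instance would need a Python runtime and boto3 just to run this, and every script would need to agree on the payload shape, which is the overhead a single binary avoids.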

Bosky is written in Go and released as a single statically-linked binary. Download the binary for your operating system and put it on the EC2 instance, then call it in your user data script. Bosky creates a predetermined, structured event that can be consumed by other processes, meaning you don't need to ensure the payload structure is the same everywhere.

Once installed, you can tell Bosky to emit an event like so:

bosky --info "I am an informational event"

Within CloudWatch Events, a consumer may choose to consume all user data events, with a pattern like this:

{
  "detail-type": [
    "User Data"
  ]
}

Or maybe you only want to consume the user data informational events for a specific project/service. First make sure the environment variable BOSKEY_PROJECT is set on your EC2 instance; the Event Rule pattern then looks like:

{
  "source": [
    "your-project-name-here"
  ],
  "detail-type": [
    "User Data"
  ],
  "detail": {
    "Status": [
      "info"
    ]
  }
}
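To make the matching rules concrete, here is a simplified sketch of how such a pattern is evaluated against an event. Every key in the pattern must be present in the event, a list means "any of these values", and nested objects are matched recursively. It covers only exact-value matching, not the full pattern syntax the service supports, and the "Message" key in the sample event is hypothetical.

```python
def matches(pattern, event):
    """Return True if the event satisfies a (simplified) event pattern."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested object: every nested key must match as well.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif isinstance(actual, list):
            # Array in the event: one element in the allowed values is enough.
            if not any(value in expected for value in actual):
                return False
        elif actual not in expected:
            # Scalar in the event: must be one of the allowed values.
            return False
    return True


pattern = {
    "source": ["your-project-name-here"],
    "detail-type": ["User Data"],
    "detail": {"Status": ["info"]},
}
event = {
    "source": "your-project-name-here",
    "detail-type": "User Data",
    "detail": {"Status": "info", "Message": "hypothetical extra field"},
}
is_match = matches(pattern, event)
```

Keys that appear in the event but not in the pattern are ignored, which is why the first, shorter pattern above catches every user data event regardless of project or status.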

And that's it. You can now consume, in near real-time, as many or as few of the user data events as you want. You can even test out the tool yourself without using EC2: simply download the binary for your operating system and run it. Bosky detects whether it is running on EC2, so when it is not, you will need to provide it with a dummy instance ID (the event payload uses it to show which instance is running the user data script, but for testing the value does not matter).

bosky --instance-id i-123456 --fail 'testing failure event'
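Putting the two flags shown in this article together, a provisioning step could be wrapped so that every outcome produces an event. The sketch below is one hypothetical way to do that from Python. The helper names are made up for illustration, only the --info and --fail flags come from this article, and it assumes the bosky binary is on the PATH.

```python
import subprocess


def emit_with_bosky(flag, message):
    """Call the bosky binary with one of the flags shown in this article."""
    # check=False: a failed event emission should not abort provisioning.
    subprocess.run(["bosky", flag, message], check=False)


def run_step(name, command, emit=emit_with_bosky):
    """Run one provisioning command and emit an event describing the outcome.

    Returns True when the command exits with status 0.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode == 0:
        emit("--info", f"{name} succeeded")
        return True
    emit("--fail", f"{name} failed with exit code {result.returncode}")
    return False
```

The emit parameter exists so the wrapper can be exercised without Bosky installed, for example by passing a function that simply records the flag and message.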

The benefits of Bosky are:

  • No need to write custom code
  • Decouples event publishing from consuming
  • Adheres to least privilege principles
  • No dependencies or libraries required
  • Runs fast
  • Deploys faster
  • Near real-time event notification
  • Better insight into your user data operations

Try it out and see if it helps you gain insight into how your user data is performing. Let me know what you think or, if you have issues or suggestions, please see the GitHub repository.