Forget SSH on AWS, Use SSM Session Manager
I don't often talk about my employer, mainly to keep an arm's-length distance between them and the writing on my blog. However, one of the great things about working at Unbounce is the concept of a Professional Development day (Pro-D). This is just like when you were in school and the teachers would take a day for themselves to improve. At Unbounce, every employee gets one day (8 hours) every 2 weeks to educate themselves and elevate their professional and career interests. Some of my best ideas have come out of things I learned on Pro-D day. Today, I decided to take a moment to learn about the AWS Systems Manager Session Manager (whoa, that's a mouthful).
In this article I'll detail my thoughts on the experience using this new service as well as some insight into the benefits for employees in security and infrastructure roles.
What is the SSM Session Manager?
A few years ago Amazon Web Services (AWS) released a service called Systems Manager. It is an agent installed and enabled on each EC2 machine that can report back to the central service in the Web console. It is used to provide inventory reports as well as autonomously run "documents", scripts that are transferred to the EC2 machine to run and do anything with root privileges (think Puppet). Dangerous? Absolutely, that's why there is a copious amount of access control and auditing, all backed by IAM. Then the ParameterStore came out, an awesome free secrets manager, which I've talked about previously (go read; I'll wait).
But then a month ago the SSM team at AWS released the Session Manager as a way to replace SSH CLI access to Linux and Windows machines. Even though EC2 machines are often intended to be ephemeral and short-lived ("cattle, not pets"), there are situations where you just have to remotely administer the machine. The Session Manager, via the Web console, connects through the agent on the machine and opens a WebSocket connection, providing you with a shell on the machine. It sounds like magic, and it helps provide some much-needed security around remotely administering machines.
The Security Implications
Allowing someone to remotely connect to your EC2 machine from the AWS Web console is bound to raise some eyebrows at first. Let me help you understand why this is A Good Thing(tm).
In my position as Head of Security I talk a lot with technical teams who are building solutions for our customers. It inevitably ends up being multiple sessions of design reviews where we build a threat model of the new service and try to identify and assess vulnerable points of entry. Whenever an EC2 machine is used, the design review starts to get difficult because the machine needs to provide SSH access for emergency maintenance.
That one feature is the cause of so many issues because you have many delicate items that must be carefully maintained: firewall configurations, network ACLs, user accounts, SSH public keys, and two-factor authentication (primarily, Duo). I require it from our teams not just because of good hygiene, but also because our customers routinely ask about our operational security. Sure it's not sexy, but it's a necessity.
I have had a number of conversations with colleagues about how the world would be easier if we could just shut off port 22 (the SSH port). But the need for emergency maintenance stymies that wish. With Session Manager this wish can be granted (sort of; keep reading). Not only that, your infrastructure and systems employees will love not having to manage SSH public keys or modify system user accounts. That work is not only toil, it also means your systems are changing, and that change could cause outages. We are now much closer to the goal of immutable systems and immutable infrastructure.
How Session Manager Works
Configuration and Setup
Getting set up was easy, but the service is new, so the latest AMIs don't have the correct version of the agent, even in the official package repositories. I imagine this will be fixed within a week. You will be notified of this when you try to start a session. Performing a manual install fixed the problem immediately. I used an Amazon Linux 2 AMI for my testing, so your experience may differ from mine. Your EC2 machine must have an IAM policy allowing access to the SSM service, and to the other services used for delivering logs (CloudWatch, S3).
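To make the IAM requirement concrete, here is a minimal sketch of what the instance role's policy document might look like, built as a plain Python dictionary. The bucket name and the function name are hypothetical, and the action list is my understanding of the minimum the agent needs for sessions and log delivery, so check it against the current AWS documentation before relying on it.

```python
import json

# Hypothetical bucket name, used only for illustration.
LOG_BUCKET = "example-session-logs"


def instance_policy(log_bucket: str) -> dict:
    """Build an IAM policy document for the EC2 instance role (a sketch)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Lets the agent register itself and open session channels.
                "Sid": "SessionManagerChannels",
                "Effect": "Allow",
                "Action": [
                    "ssm:UpdateInstanceInformation",
                    "ssmmessages:CreateControlChannel",
                    "ssmmessages:CreateDataChannel",
                    "ssmmessages:OpenControlChannel",
                    "ssmmessages:OpenDataChannel",
                ],
                "Resource": "*",
            },
            {
                # Lets the agent stream session output to CloudWatch Logs.
                "Sid": "SessionLogsToCloudWatch",
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                    "logs:DescribeLogGroups",
                    "logs:DescribeLogStreams",
                ],
                "Resource": "*",
            },
            {
                # Lets the agent upload the session log to the S3 bucket.
                "Sid": "SessionLogsToS3",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{log_bucket}/*",
            },
        ],
    }


print(json.dumps(instance_policy(LOG_BUCKET), indent=2))
```

The printed JSON can be pasted into an inline policy on the instance role. In practice you would likely narrow the `"Resource": "*"` entries to specific log groups.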
Once the SSM agent was running, the Web console noticed that this instance was available to start a CLI session. I clicked on the button marked Start Session and was brought to a page with a shell waiting for input. A cursory whoami revealed I was ssm-user and I had passwordless sudo privileges. I had full control of everything I needed, just like it was a real SSH session. The responsiveness was good, but I did have a session that didn't work properly: I could only use Enter, Ctrl-C and Ctrl-D. Opening a new session fixed the problem, so I chalked it up to a WebSocket connection problem.
In the AWS Web Console you can set up Session Manager to log the output of any session to an S3 bucket or to CloudWatch Logs. Do this immediately; your security staff will love you for it. Once I terminated a running session on my EC2 machine, I found it took 1-2 minutes for the console log to arrive in the S3 bucket. When I looked at the data, it showed everything printed to the screen. This is amazing… and scary. Be careful to restrict access to this bucket because any secrets shown on the EC2 machine during the session will also exist in the log. The same goes for CloudWatch Logs.
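One way to restrict read access to the session logs is a bucket policy that denies object reads to everyone except a designated auditing role. The sketch below builds such a policy as a Python dictionary. The bucket name, role ARN, and function name are hypothetical, and this is only one possible approach, not something the service sets up for you.

```python
import json


def log_bucket_policy(bucket: str, reader_role_arn: str) -> dict:
    """Deny reads of session logs to every principal except one role (a sketch)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "OnlyAuditorsReadSessionLogs",
                "Effect": "Deny",
                "Principal": "*",
                # Only reads are denied, so instances can still upload logs.
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {
                    # The deny applies unless the caller is the auditing role.
                    "ArnNotEquals": {"aws:PrincipalArn": reader_role_arn}
                },
            }
        ],
    }


# Hypothetical bucket and role, for illustration only.
policy = log_bucket_policy(
    "example-session-logs",
    "arn:aws:iam::123456789012:role/security-auditor",
)
print(json.dumps(policy, indent=2))
```

Because the statement only denies `s3:GetObject`, writes from the SSM agent are unaffected. A similar restriction on the CloudWatch Logs group would have to be expressed through IAM policies on your users and roles.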
The ability to start sessions on EC2 machines can be controlled through IAM policies, which means fine-grained access to particular machines (or groups of machines) can be provided to your employees. In addition to fine-grained access control, the ability to require MFA on IAM means you are effectively adding MFA capabilities to your EC2 machines without the use of a second service like Duo (sorry Duo, I still love your product though).
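As an illustration of that fine-grained control, here is a rough sketch of a user-facing IAM policy that only allows sessions on instances carrying a particular tag, and only when the caller authenticated with MFA. The tag key and value and the function name are hypothetical; the condition keys are the ones I understand IAM and SSM to support, so verify them against the AWS documentation.

```python
import json


def operator_policy(tag_key: str, tag_value: str) -> dict:
    """Allow MFA-authenticated sessions on instances with a given tag (a sketch)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "StartSessionOnTaggedInstances",
                "Effect": "Allow",
                "Action": "ssm:StartSession",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {
                    # Only instances tagged tag_key=tag_value are reachable.
                    "StringEquals": {f"ssm:resourceTag/{tag_key}": tag_value},
                    # The caller must have signed in with MFA.
                    "Bool": {"aws:MultiFactorAuthPresent": "true"},
                },
            },
            {
                # Users may only end sessions that carry their own user name.
                "Sid": "ManageOwnSessions",
                "Effect": "Allow",
                "Action": ["ssm:TerminateSession"],
                "Resource": "arn:aws:ssm:*:*:session/${aws:username}-*",
            },
        ],
    }


# Hypothetical tag, for illustration only.
print(json.dumps(operator_policy("Team", "web"), indent=2))
```

Attaching a policy like this to a group gives each team access to its own machines and nothing else.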
No SSH keys are used and everyone goes through a shared user account on the EC2 machine. This isn't an issue because the session is named after your IAM user (suffixed with random characters to make it unique), even through an IAM role switch between AWS accounts. This provides auditability without any complex infrastructure maintenance.
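Because the session name carries the IAM user name, attributing a session to a person is a matter of string handling. The helper below is a small sketch that assumes the name is the user name followed by a hyphen and a random suffix, as I observed; the sample session names are made up.

```python
def session_owner(session_id: str) -> str:
    """Return the user-name part of a session name such as 'jane.doe-0a1b2c'.

    Assumes the random suffix follows the LAST hyphen, so user names that
    contain hyphens themselves are preserved.
    """
    owner, _, _suffix = session_id.rpartition("-")
    # With no hyphen at all, fall back to the whole string.
    return owner or session_id


# Made-up session names, for illustration only.
for sid in ["jane.doe-0a1b2c3d4e5f67890", "ops-team-0f9e8d7c6b5a43210"]:
    print(sid, "->", session_owner(sid))
```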
I should mention that, during all of my testing, there were no ingress rules on my EC2 machine's security group. None. I am so impressed with this technology.
I am pleased to see that CloudTrail captures all of the session API calls for auditing. This also means that the event can be captured by CloudWatch Events, processed, and sent to other services in AWS or to third parties (like Slack) to notify interested employees whenever someone starts an "SSH" connection to an EC2 machine. Just think of the monitoring and auditing capabilities that arise with this!
Unfortunately, there is no built-in CloudWatch Events pattern for TerminateSession, so you must fall back on matching against CloudTrail API calls. It's not a big deal, but it means you may have to wait 1-5 minutes for the event to be captured by CloudWatch Events, instead of the usual 500ms with the built-in events. As is usual with new features, it takes time to integrate into other AWS services, so I'm certain that a built-in CloudWatch Events pattern will appear soon.
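Matching against CloudTrail API calls could look roughly like the sketch below. It shows an event pattern for the session calls, a deliberately tiny local matcher so the pattern can be tried without AWS, and a formatter for a notification message. The `matches` and `describe` helpers are hypothetical, the sample event is made up, and the `requestParameters.target` field is an assumption about what CloudTrail records for StartSession.

```python
import json

# Event pattern for a CloudWatch Events rule (uses CloudTrail delivery).
EVENT_PATTERN = {
    "source": ["aws.ssm"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["ssm.amazonaws.com"],
        "eventName": ["StartSession", "TerminateSession"],
    },
}


def matches(pattern: dict, event: dict) -> bool:
    """Local stand-in for rule matching: exact values and nested keys only."""
    for key, expected in pattern.items():
        actual = event.get(key)
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def describe(event: dict) -> str:
    """Format a one-line notification, e.g. for a chat webhook."""
    detail = event["detail"]
    who = detail.get("userIdentity", {}).get("arn", "unknown principal")
    # Assumed field name; TerminateSession events may not carry a target.
    target = (detail.get("requestParameters") or {}).get("target", "unknown target")
    return f"{who} called {detail['eventName']} on {target}"


# Made-up event, shaped like a CloudTrail-backed CloudWatch event.
sample = {
    "source": "aws.ssm",
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {
        "eventSource": "ssm.amazonaws.com",
        "eventName": "StartSession",
        "userIdentity": {"arn": "arn:aws:iam::123456789012:user/jane"},
        "requestParameters": {"target": "i-0123456789abcdef0"},
    },
}

print(json.dumps(EVENT_PATTERN, indent=2))
if matches(EVENT_PATTERN, sample):
    print(describe(sample))
```

In a real setup the pattern would go on a CloudWatch Events rule and `describe` would run inside whatever target the rule invokes, such as a Lambda function that posts to Slack.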
Not everything was smooth sailing for me. All of the issues I encountered, except one, were minor and I know they will disappear over time. I'm listing them here to provide a balanced view of what to expect if you start using this service right now.
The WebSocket connection code isn't the most stable. I had it fail twice in the 10 sessions I started on my machine. It could be my browser (Chrome), but I imagine it's just due to edge cases in the code that will be fixed later.
If the SSM Agent fails for some reason, there is no feedback on the AWS Web console. You have to SSH into the machine and parse the agent logs. In my case it was because CloudWatch Logs encryption was not set up correctly.
The end result was that I could not initiate a new session on the machine until that was fixed. This is a good safe default, but the lack of feedback in the AWS console is unacceptable.
After terminating a session, the status of the session in the Web console remains as "Terminating" forever. I suspect this is a bug because I could not see any orphaned sessions on the EC2 machine when I started a new session.
Can We Stop Using SSH Now?
So is this the time when we can finally shut down the SSH daemon (sshd) and close port 22 on our security groups (firewall)? Not yet, but it depends on your situation.
I don't place a lot of trust in computers: they fail often enough, and services aren't available 100% of the time. Thus, a disaster plan is needed in the event that you must get onto the machine.
I suggest keeping the SSH daemon enabled and running on an EC2 machine using the default keypair. Keep that keypair safely in the hands of a few people for use during emergency situations. Then remove the ingress SSH access on the security group. In the event SSM Session Manager fails, you can always enable the ingress rule again temporarily and get into the machine, but it is otherwise closed off to the world.
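The "break glass" step of temporarily re-enabling the ingress rule can be scripted. The sketch below assumes a boto3 EC2 client is passed in (its `authorize_security_group_ingress` and `revoke_security_group_ingress` methods do the work). The function names, security group ID, and CIDR range are hypothetical.

```python
def ssh_rule(cidr: str) -> dict:
    """Describe an inbound TCP/22 rule limited to one CIDR range."""
    return {
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        "IpRanges": [{"CidrIp": cidr}],
    }


def open_emergency_ssh(ec2, group_id: str, cidr: str) -> None:
    """Temporarily allow SSH from `cidr` when Session Manager is unavailable."""
    ec2.authorize_security_group_ingress(
        GroupId=group_id, IpPermissions=[ssh_rule(cidr)]
    )


def close_emergency_ssh(ec2, group_id: str, cidr: str) -> None:
    """Remove the emergency rule once the incident is over."""
    ec2.revoke_security_group_ingress(
        GroupId=group_id, IpPermissions=[ssh_rule(cidr)]
    )


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   ec2 = boto3.client("ec2")
#   open_emergency_ssh(ec2, "sg-0123456789abcdef0", "203.0.113.10/32")
#   ... fix the machine over SSH ...
#   close_emergency_ssh(ec2, "sg-0123456789abcdef0", "203.0.113.10/32")
```

Passing the client in keeps the functions easy to test with a stub, and limiting the rule to a single address keeps the machine closed to everyone else even while the rule exists.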
How Much Does it Cost?
It's free. Seriously. You only pay for usage of the underlying AWS resources (e.g. EC2, S3, etc.).
SSM Session Manager is a great new service and I can't believe it is free. Go try it out: the service is easy to set up, provides far more automation features than you would expect, and works amazingly well for the rare occasions when you need to remotely manage a machine. Your security posture will improve immediately. But make sure you keep your backup SSH access available until the kinks are ironed out of the system.
I honestly can't believe this stuff is free. Great work, Amazon!