Hybrid Risk Management with AWS Systems Manager

This post was written by Chris Coombs, Cloud Architect at Datacom, and Samual Brown, Senior Technical Account Manager at AWS. Datacom is an AWS Premier Partner providing migration, transformation and managed services across Australia and New Zealand.

At Datacom, our Cloud Ops team now use AWS (Amazon Web Services) Systems Manager as the default task runner and desired state configuration tool for all new managed services customers. Our on-premises solution had served us well for many years, but it required multiple platforms, each with its own licensing costs and scaling challenges. It also carried a significant operational overhead, requiring frequent updates to maintain vendor support and complex infrastructure for high availability. With AWS Systems Manager, we don’t need to worry about licensing or the underlying infrastructure; it just works.

Our transition to AWS Systems Manager was born out of a desire to focus more on the customer and less on the tooling. Since migrating, however, we have found that it provides even more value thanks to its extensibility and ease of use.

Whilst AWS Systems Manager has many uses, this blog post focuses on our hybrid implementation and the risk dashboard we built on top of AWS Systems Manager.

Activating AWS Systems Manager

When setting up the AWS Systems Manager agent on Amazon EC2, you would usually create an instance profile to allow the agent to run. An often overlooked feature of AWS Systems Manager is that it will also run outside of AWS. However, as your on-premises hypervisor doesn’t understand IAM (Identity and Access Management), AWS provides another mechanism for configuring AWS Systems Manager: activation codes.

With activation codes, you can install the AWS Systems Manager agent prior to a cloud migration or as part of a multi-cloud strategy. What’s more, you can also use AWS Systems Manager activation codes in AWS itself, providing a standard setup for your entire fleet, whether it’s on AWS, on-premises or within other public cloud platforms.

Naming Instances

If we add an instance to AWS Systems Manager using activation codes, it appears in the management console with a funny-looking ID, something like mi-1234. Don’t be fooled by the bit after the m (i-1234); that isn’t the AWS instance ID! So how do we map AWS Systems Manager IDs to AWS instance IDs (or some other on-premises ID)? Simple, we give it a name!

Screenshot 1: Managed Instances tab of the AWS Systems Manager console
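
The ID-to-name mapping can also be read back programmatically. The following is a minimal sketch, assuming a boto3 SSM client is passed in; the helper name managed_instance_names is ours for illustration and is not part of the Datacom tooling.

```python
from typing import Any, Dict


def managed_instance_names(ssm_client: Any) -> Dict[str, str]:
    """Return {managed instance ID: name} for activation-registered instances."""
    names: Dict[str, str] = {}
    kwargs: Dict[str, Any] = {}
    while True:
        page = ssm_client.describe_instance_information(**kwargs)
        for info in page.get("InstanceInformationList", []):
            # Instances registered with an activation code are reported
            # with a ResourceType of "ManagedInstance" and an mi- prefixed ID.
            if info.get("ResourceType") == "ManagedInstance":
                names[info["InstanceId"]] = info.get("Name", "")
        token = page.get("NextToken")
        if not token:
            return names
        # Follow pagination until the service stops returning a token.
        kwargs["NextToken"] = token
```

Calling managed_instance_names(boto3.client("ssm")) would return a dictionary keyed by the mi- ID, with the name given at activation time as the value.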

We don’t give the instance a name during registration, though; we have to specify the name when the activation code is created. As such, we have to generate the codes in real time. We do this using an API backed by Lambda, which we call as part of the instance UserData (or a similar bootstrap script on non-AWS resources).
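
As an illustration of the idea, here is a rough sketch of what such a Lambda function could look like. It is not our production code. The role name SSMServiceRole, the API Gateway style event with a JSON body, and the "name" field are all assumptions made for the example.

```python
import json
from typing import Any, Dict

# Hypothetical IAM role name; the role must be assumable by Systems Manager.
SSM_SERVICE_ROLE = "SSMServiceRole"


def create_named_activation(ssm_client: Any, instance_name: str) -> Dict[str, str]:
    """Create a single-use activation whose registered instance gets instance_name."""
    response = ssm_client.create_activation(
        Description=f"Activation for {instance_name}",
        DefaultInstanceName=instance_name,  # the name shown in the console
        IamRole=SSM_SERVICE_ROLE,
        RegistrationLimit=1,  # one code per instance
    )
    return {
        "ActivationId": response["ActivationId"],
        "ActivationCode": response["ActivationCode"],
    }


def lambda_handler(event, context, ssm_client=None):
    """Assumed event shape: an API Gateway proxy request with {"name": "..."}."""
    body = json.loads(event.get("body") or "{}")
    name = body.get("name")
    if not name:
        return {"statusCode": 400, "body": json.dumps({"error": "name is required"})}
    if ssm_client is None:
        import boto3  # provided by the Lambda Python runtime

        ssm_client = boto3.client("ssm")
    return {
        "statusCode": 200,
        "body": json.dumps(create_named_activation(ssm_client, name)),
    }
```

The bootstrap script would then pass the returned ID and code to the agent's register command. Because the response contains a secret, an API like this should sit behind authentication.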

Screenshot 2: code (adjusted)

It might seem odd not to use the native IAM integration with AWS Systems Manager in AWS, but this method doesn’t require development teams to mess around with IAM, and treating all instances in the same way ensures that we have a single workflow for all instances, regardless of their location.

Stating the Risk

AWS Systems Manager gives operations teams a lot of power to run scheduled and ad hoc commands against entire hybrid workloads at once, which is a real time saver for Ops. Where AWS Systems Manager really excels, though, is in its flexibility. For example, we also use it to report on compliance and security risk in near real time, which provides huge customer value.

Screenshot 3: Datacom risk dashboard
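
The dashboard itself is bespoke, but the raw numbers behind a view like this are available from the Systems Manager compliance API. The sketch below is one possible starting point rather than a description of how our dashboard is built; the function name and the shape of the returned dictionary are assumptions for illustration.

```python
from typing import Any, Dict


def compliance_by_type(ssm_client: Any) -> Dict[str, Dict[str, int]]:
    """Return compliant and non-compliant resource counts per compliance type."""
    summary: Dict[str, Dict[str, int]] = {}
    kwargs: Dict[str, Any] = {}
    while True:
        page = ssm_client.list_compliance_summaries(**kwargs)
        for item in page.get("ComplianceSummaryItems", []):
            summary[item["ComplianceType"]] = {
                "compliant": item.get("CompliantSummary", {}).get("CompliantCount", 0),
                "non_compliant": item.get("NonCompliantSummary", {}).get(
                    "NonCompliantCount", 0
                ),
            }
        token = page.get("NextToken")
        if not token:
            return summary
        kwargs["NextToken"] = token
```

A scheduled job could call compliance_by_type(boto3.client("ssm")) and push the counts into whatever store feeds the dashboard.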

With AWS Systems Manager we can run State Manager (a scheduled command of sorts) in either of two modes. First we run in report-only mode. This allows us to gather patching, anti-virus (AV) and compliance information from the entire fleet without breaking anything. We can then discuss this data (using our risk dashboard) with the business, who may accept some risks (e.g. a legacy application that the vendor won’t let you patch) but may mandate that others are fixed (e.g. missing AV). With this information we can then move some or all workloads into enforcement mode, and it’s as simple as switching the AWS Systems Manager tag from report to enforce!
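
To make the two modes concrete, here is a minimal sketch using patching as the example. It assumes a tag key of SSMMode (a name invented for this example) and maps the two modes onto the Scan and Install operations of the AWS-RunPatchBaseline document. AV and other compliance checks would need their own documents, and the daily schedule is an arbitrary choice.

```python
from typing import Any, Dict

# Hypothetical tag key; any key works as long as instances and targets agree.
MODE_TAG_KEY = "SSMMode"

# "report" only scans for missing patches, "enforce" installs them.
PATCH_OPERATION = {"report": "Scan", "enforce": "Install"}


def patch_association_request(mode: str) -> Dict[str, Any]:
    """Build create_association arguments for instances tagged with this mode."""
    if mode not in PATCH_OPERATION:
        raise ValueError(f"unknown mode: {mode!r}")
    return {
        "Name": "AWS-RunPatchBaseline",
        "AssociationName": f"patching-{mode}",
        "Targets": [{"Key": f"tag:{MODE_TAG_KEY}", "Values": [mode]}],
        "Parameters": {"Operation": [PATCH_OPERATION[mode]]},
        "ScheduleExpression": "rate(1 day)",
    }


def set_mode(ssm_client: Any, managed_instance_id: str, mode: str) -> None:
    """Move an activation-registered instance (mi-...) into the given mode."""
    if mode not in PATCH_OPERATION:
        raise ValueError(f"unknown mode: {mode!r}")
    ssm_client.add_tags_to_resource(
        ResourceType="ManagedInstance",
        ResourceId=managed_instance_id,
        Tags=[{"Key": MODE_TAG_KEY, "Value": mode}],
    )
```

With one association created per mode, for example ssm.create_association(**patch_association_request("report")), changing the tag value is all it takes to move an instance from one association's targets to the other's.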

This is great for migrations. We can run the agent on-premises, analyse the results and remediate any gaps (e.g. missing AV) using Run Command prior to relocation, reducing both the risk of rollback and the duration of the migration window. It also has the benefit of providing real-time insight into born-in-the-cloud workloads, which disappear at night or scale massively during the day. What’s really powerful is that the business can see what the risk profile looks like at any point in time; they can set alerts and take action with their development teams as things change.
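
A remediation step of this kind might be sketched as follows, assuming Linux targets selected by tag and the AWS-RunShellScript document. The tag and the shell commands are placeholders supplied by the caller, not real values from our environment.

```python
from typing import Any, Dict, Iterable


def remediation_command_request(
    tag_key: str, tag_value: str, commands: Iterable[str]
) -> Dict[str, Any]:
    """Build send_command arguments that run shell commands on tagged instances."""
    return {
        "DocumentName": "AWS-RunShellScript",
        "Targets": [{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        "Parameters": {"commands": list(commands)},
        "Comment": "Pre-migration remediation",
    }


def run_remediation(
    ssm_client: Any, tag_key: str, tag_value: str, commands: Iterable[str]
) -> str:
    """Send the command and return its ID so the results can be checked later."""
    request = remediation_command_request(tag_key, tag_value, commands)
    response = ssm_client.send_command(**request)
    return response["Command"]["CommandId"]
```

The returned command ID can be used to follow up on the outcome per instance, which is useful evidence to have before the migration window opens.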

What’s Next?

The extensibility of AWS Systems Manager is one of its greatest features. With AWS Systems Manager you can build a solution using cutting edge AWS technology and run it anywhere, from AWS to traditional tin. What Datacom build next is up to you. The idea for our risk dashboard came from customer feedback, and we’d love to hear what challenges you’re facing and how we can help.
