Putting terminated instances in wait mode/error setting up put-lifecycle-hook on AWS
Sometimes when your AWS instances are terminated by Auto Scaling, you want to know why. AWS has a feature that puts those instances into a Terminating:Wait state, so that they no longer serve requests but are still running and you can SSH in and diagnose the problem.
If you follow the steps in the AWS tutorial Analyzing an Instance Before Termination, you're bound to run into a few problems.
Problem #1 "Invalid choice"
If you try to run the given command:
aws autoscaling put-lifecycle-hook \
    --lifecycle-hook-name WaitForDiagnostics \
    --auto-scaling-group-name group-name \
    --lifecycle-transition autoscaling:EC2_INSTANCE_TERMINATING \
    --notification-target-arn arn:aws:sns:us-east-1:999999999999:topic-name \
    --role-arn arn:aws:iam::999999999999:role/role-name
You may see the following error:
aws: error: argument operation: Invalid choice, valid choices are…
This happens when you have an older version of the AWS CLI (command line interface) that doesn't know about the put-lifecycle-hook operation yet. Frustratingly, the latest package available via yum is out of date.
You have to install it with pip (the Python package installer) rather than yum. I'm assuming you already have pip installed (if not, that step is left as an exercise for the reader).
sudo pip install --upgrade awscli
will install the latest version.
The put-lifecycle-hook command from the tutorial should now be recognized.
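If the error persists, it may be that your shell is still picking up the old yum-installed CLI instead of the pip one. Comparing the reported version before and after the upgrade is a quick way to check. Here's a minimal Python sketch, assuming `aws --version` prints a banner that begins with `aws-cli/<version>`; the helper names are my own, not something from the AWS tutorial.

```python
import re
import subprocess


def parse_awscli_version(banner):
    """Extract (major, minor, patch) from an `aws --version` banner.

    Assumes the banner contains something like "aws-cli/1.2.3".
    Returns None if no version number is found.
    """
    match = re.search(r"aws-cli/(\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def installed_awscli_version():
    """Run `aws --version` and parse whatever it prints.

    Some CLI builds write the banner to stderr rather than stdout,
    so both streams are merged before parsing.
    """
    result = subprocess.run(
        ["aws", "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return parse_awscli_version(result.stdout)
```

Calling `installed_awscli_version()` before and after the pip upgrade should show the number going up; if it doesn't, check which `aws` binary comes first on your PATH.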
Problem #2 "Unable to publish test message"
Ha ha, just kidding. Of course there are more errors. If you get this one:
A client error (ValidationError) occurred when calling the PutLifecycleHook operation: Unable to publish test message to notification target arn:aws:sns:us-east-1:999999999999:topic-name using IAM role arn:aws:iam::999999999999:role/role-name. Please check your target and role configuration and try to put lifecycle hook again.
there are (likely) two problems:
The IAM role doesn't have permissions for the needed SNS actions. Most people familiar with IAM have probably already figured this out and adjusted accordingly, but for the record, here's the policy you need (a simpler way to get the same result is to attach the managed policy AutoScalingNotificationAccessRole):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Resource": "*",
      "Action": [
        "sqs:SendMessage",
        "sqs:GetQueueUrl",
        "sns:Publish"
      ]
    }
  ]
}
The IAM role doesn't have a trust relationship with autoscaling.amazonaws.com. Honestly, I wasn't even aware of trust relationships before this, and I'm still hazy on the details. As far as I can tell, a role's trust policy controls which principals (here, the Auto Scaling service) are allowed to assume the role. Without that trust relationship, Auto Scaling can't assume the role, so it can't publish the test message on your behalf. It wasn't until I looked at a sample IAM role created exclusively for Auto Scaling notifications that I figured this out.
To solve this, go to the IAM Roles page, choose your role, click the "Trust Relationships" tab, and then click "Edit Trust Relationship". In the section labeled "Statement", add the following (make sure to put a comma between it and the previous statement):
{
  "Effect": "Allow",
  "Principal": {
    "Service": "autoscaling.amazonaws.com"
  },
  "Action": "sts:AssumeRole"
}
Once I did this, the put-lifecycle-hook command finally worked.
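If you'd rather script both fixes than click through the console, something like the following Python sketch might work. It builds the permissions policy shown above and appends the Auto Scaling trust statement to an existing trust policy without touching the statements already there. The function names and the default policy name are hypothetical. `apply_to_role` assumes you pass in a boto3 IAM client and that `get_role` returns the trust policy as an already-decoded dict. I haven't run this against a real account, so treat it as a starting point.

```python
import json

AUTOSCALING_PRINCIPAL = "autoscaling.amazonaws.com"


def notification_policy():
    """Permissions the role needs to publish to SNS or send to SQS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Resource": "*",
                "Action": ["sqs:SendMessage", "sqs:GetQueueUrl", "sns:Publish"],
            }
        ],
    }


def autoscaling_trust_statement():
    """Trust statement letting the Auto Scaling service assume the role."""
    return {
        "Effect": "Allow",
        "Principal": {"Service": AUTOSCALING_PRINCIPAL},
        "Action": "sts:AssumeRole",
    }


def trusts_autoscaling(statement):
    """True if this statement already lets Auto Scaling assume the role."""
    if statement.get("Effect") != "Allow":
        return False
    actions = statement.get("Action", [])
    if isinstance(actions, str):
        actions = [actions]
    if "sts:AssumeRole" not in actions:
        return False
    principal = statement.get("Principal", {})
    if not isinstance(principal, dict):  # e.g. the wildcard "*"
        return False
    services = principal.get("Service", [])
    if isinstance(services, str):
        services = [services]
    return AUTOSCALING_PRINCIPAL in services


def add_autoscaling_trust(trust_policy):
    """Return a copy of trust_policy that includes the Auto Scaling statement.

    The input is not modified. If a matching statement is already present,
    nothing is added.
    """
    updated = json.loads(json.dumps(trust_policy))  # cheap deep copy
    statements = updated.get("Statement", [])
    if isinstance(statements, dict):  # IAM allows a single bare statement
        statements = [statements]
    if not any(trusts_autoscaling(s) for s in statements):
        statements = statements + [autoscaling_trust_statement()]
    updated["Statement"] = statements
    return updated


def apply_to_role(iam, role_name, policy_name="autoscaling-notifications"):
    """Apply both fixes to a role using a boto3 IAM client (assumed)."""
    current = iam.get_role(RoleName=role_name)["Role"]["AssumeRolePolicyDocument"]
    iam.update_assume_role_policy(
        RoleName=role_name,
        PolicyDocument=json.dumps(add_autoscaling_trust(current)),
    )
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(notification_policy()),
    )
```

With boto3 installed and credentials configured, the call would look like `apply_to_role(boto3.client("iam"), "role-name")`. Keeping the policy-building functions free of any AWS calls means you can print and inspect the JSON before anything is changed.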
If at a later point you decide you don't want instances put into the Terminating:Wait state, you can first use the command
aws autoscaling describe-lifecycle-hooks --auto-scaling-group-name group-name
to see what you named your hook and then
aws autoscaling delete-lifecycle-hook --auto-scaling-group-name group-name --lifecycle-hook-name hook-name
to delete it.
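The same cleanup can be sketched in Python. This assumes a boto3 Auto Scaling client is passed in; the function name and the dry_run flag are my own additions. Note that, unlike the single delete command above, it removes every hook on the group, so it defaults to only listing them.

```python
def remove_lifecycle_hooks(autoscaling, group_name, dry_run=True):
    """List a group's lifecycle hooks and optionally delete all of them.

    Returns the hook names that were found. With dry_run=True (the
    default) nothing is deleted.
    """
    response = autoscaling.describe_lifecycle_hooks(
        AutoScalingGroupName=group_name
    )
    names = [hook["LifecycleHookName"] for hook in response.get("LifecycleHooks", [])]
    if not dry_run:
        for name in names:
            autoscaling.delete_lifecycle_hook(
                AutoScalingGroupName=group_name,
                LifecycleHookName=name,
            )
    return names


# Example usage (requires boto3 and configured credentials):
#   import boto3
#   client = boto3.client("autoscaling")
#   print(remove_lifecycle_hooks(client, "group-name"))           # list only
#   remove_lifecycle_hooks(client, "group-name", dry_run=False)   # delete
```

Running it once with the default dry_run=True gives you the same information as describe-lifecycle-hooks, which is a reasonable sanity check before deleting anything.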















