DevOps occasionally gets alerts about untagged EC2 instances in the testeng AWS account. We try to make sure that all of our instances are tagged, but some still manage to spin up without a tag. I believe these are artifacts of the Packer build process (our build-packer-ami job on Jenkins). Packer grabs a vanilla Ubuntu AMI, provisions it with Ansible, and then saves it for later use. If provisioning fails, Packer cleans up the temporary instance. However, if the job is aborted or the Packer process itself errors out, the cleanup never happens, and the instance is never tagged (Packer only tags successfully built AMIs). These workers can stay running indefinitely.
The Janitor job should clean up untagged workers in the testeng account. Perhaps set some sort of rule, such as terminating any untagged instance that has been running longer than a few hours (longer than a typical Packer build should take).
IMPORTANT: make sure that this rule is ONLY in effect for the testeng account. Do not kill anything in other accounts.
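A minimal sketch of what such a rule could look like. The age threshold and function names here are assumptions, not decided policy; the selection logic is kept separate from any AWS calls so the caller is responsible for fetching data with credentials scoped to the testeng account only.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical cutoff: untagged instances older than this are janitor candidates.
MAX_AGE = timedelta(hours=6)

def untagged_instance_ids(reservations, now=None):
    """Return ids of running, untagged instances older than MAX_AGE.

    `reservations` is the "Reservations" list in the shape returned by
    EC2 describe-instances. This function only selects; it never terminates.
    """
    now = now or datetime.now(timezone.utc)
    ids = []
    for reservation in reservations:
        for inst in reservation.get("Instances", []):
            if inst.get("Tags"):
                continue  # tagged instances are out of scope for this rule
            if inst.get("State", {}).get("Name") != "running":
                continue
            if now - inst["LaunchTime"] > MAX_AGE:
                ids.append(inst["InstanceId"])
    return ids
```

A real janitor run would feed this `boto3.client("ec2").describe_instances()["Reservations"]` obtained with testeng-only credentials, then terminate (or at least report) the returned ids.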
To determine the source of these untagged instances, I did the following:

1. Found all the untagged instances in the account.
2. Checked the AWS console for each instance's launch time (shown in EST).
3. Checked https://build.testeng.edx.org/job/build-packer-ami/ for jobs that failed or were aborted around that time (but remember, Jenkins shows time in UTC). The beginning of the build log should have the instance-id for the temporary worker.
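Since the console shows launch times in Eastern time while Jenkins shows UTC, a small helper like this (a sketch; the timestamp format is an assumption) avoids off-by-hours mistakes when matching instances to builds:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def eastern_to_utc(launch_time: str) -> str:
    """Convert a console launch time in US/Eastern to UTC for Jenkins comparison.

    Uses America/New_York so EST vs. EDT is handled automatically.
    """
    dt = datetime.strptime(launch_time, "%Y-%m-%d %H:%M")
    dt = dt.replace(tzinfo=ZoneInfo("America/New_York"))
    return dt.astimezone(ZoneInfo("UTC")).strftime("%Y-%m-%d %H:%M")
```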
This seems like good work, but I think we might still get alerts if there are untagged instances. Is it possible to tag these instances when they come up, or is there a pattern we can filter out, such as the particular subnets these end up in?
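One option for tagging the workers as they come up: if the job uses Packer's amazon-ebs builder, its `run_tags` option applies tags to the temporary build instance at launch, so even aborted builds leave a tagged instance behind. A sketch of the relevant template fragment (the tag names and other builder fields are illustrative, not our actual template):

```json
{
  "builders": [{
    "type": "amazon-ebs",
    "run_tags": {
      "Name": "packer-temp-worker",
      "team": "testeng"
    }
  }]
}
```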
Rather than use a base AMI directly from AWS in the Packer job, could we take the AMI, tag it, and save it in our account, so that every time an instance is created from our AMI, it has a useful tag?