WAIT_TIMEOUT_SECONDS must be less than the boto http_socket_timeout


Jenkins runs of the olive minos `terminate-instances` job were failing when the SQS queue was empty (e.g. job#449675), but succeeding when there were any messages in the queue (e.g. job#449676).

From the errors reported in the failed job, we determined that, when the queue is empty, the boto `http_socket_timeout` of 3 sec was being hit before the configured queue wait timeout of 10 sec.

Since the `boto.cfg` file is part of the `aws` role, and so is shared by lots of services, the best solution here was to decrease the queue wait to match the upstream 1 second timeout instead of increasing the boto timeout.
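
To make the constraint in the title concrete, here is a minimal sketch of the check, using the values quoted above (3 sec socket timeout, 10 sec old queue wait, 1 sec new queue wait). It is not the job's actual code: the helper names are hypothetical, and the `[Boto]` section layout is my assumption about how `boto.cfg` sets `http_socket_timeout`.

```python
import configparser

# Assumed shape of the relevant boto.cfg fragment; the value 3 comes from
# the description above, the section layout is an assumption.
BOTO_CFG_TEXT = """
[Boto]
http_socket_timeout = 3
"""


def socket_timeout_from_cfg(cfg_text):
    """Read http_socket_timeout (in seconds) from boto.cfg-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.getfloat("Boto", "http_socket_timeout")


def wait_fits_socket_timeout(wait_timeout_seconds, socket_timeout_seconds):
    """True if a long poll of this length can return before the socket times out."""
    return wait_timeout_seconds < socket_timeout_seconds


socket_timeout = socket_timeout_from_cfg(BOTO_CFG_TEXT)
for wait in (10, 1):  # old and new WAIT_TIMEOUT_SECONDS
    if wait_fits_socket_timeout(wait, socket_timeout):
        print(f"WAIT_TIMEOUT_SECONDS={wait}: ok")
    else:
        print(f"WAIT_TIMEOUT_SECONDS={wait}: socket timeout fires first")
```

With an empty queue, the long poll only returns once the full wait has elapsed, which is why the 10 sec wait always lost the race against the 3 sec socket timeout, while a non-empty queue returned immediately and never hit it.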

To verify this fix, I:

  • The upstream reduction of the queue wait timeout was made as part of upgrading minos to boto3; however, issues with using that upgrade on olive caused the change to be reverted.
    We'll need to investigate these issues more thoroughly to maintain this configuration repo moving forward.

  • We are also working on changes to remove the need for OpenCraft to merge configuration changes to the `edx:configuration/olive` fork/branch, so we don't have to pester you about this stuff in the future.


