[Spike] Identify options for handling Enterprise overnight search traffic spikes
The Enterprise Catalog service sends requests to search/all at a high rate every night, from around 4:00 to 4:30am EST. This causes an extended period of low Apdex for the discovery service and usually triggers an OpsGenie alert. Consider the following options for dealing with it:
The upcoming upgrade of ElasticSearch to version 7 may give us an across-the-board improvement in response time for search queries.
Add rate limiting for this endpoint / requesting user (or adjust it if rate limiting is already configured).
Add an alert policy in OpsGenie for the small time window in which this alert keeps firing. This would be a temporary measure only.
Profile the endpoint to understand whether there’s some egregious slowness that can be fixed in code.
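To make the rate-limiting option more concrete, here is a minimal sketch of per-user token-bucket limiting in plain Python. It is not the discovery service's actual configuration: the class names, the "enterprise-catalog" user key, and the rate and burst numbers are all hypothetical, and a real change would more likely go through whatever throttling the service framework already provides.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity          # start with a full bucket
        self.updated = clock()

    def allow(self) -> bool:
        # Refill based on time elapsed since the last call, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0          # spend one token for this request
            return True
        return False                    # caller would respond with HTTP 429


class PerUserLimiter:
    """Keeps one bucket per requesting user (hypothetical helper)."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, user: str) -> bool:
        bucket = self._buckets.get(user)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self._buckets[user] = bucket
        return bucket.allow()


if __name__ == "__main__":
    # Assumed numbers for illustration only: 2 requests/second, burst of 3.
    limiter = PerUserLimiter(rate=2.0, capacity=3.0)
    for i in range(5):
        print(i, limiter.allow("enterprise-catalog"))
```

Keying the limiter on the requesting user rather than on the endpoint as a whole would let the nightly Enterprise Catalog job be slowed down without throttling other search/all callers, which is the trade-off the "endpoint / requesting user" option raises.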
Distributed trace link:
Steps to Reproduce
A performance analysis write-up might be helpful: