Moving from RabbitMQ to Redis
What is RabbitMQ?
RabbitMQ is the message broker that edX previously used with Celery. Celery is a Python library that provides an API for running tasks asynchronously and abstracts away the particular broker that carries those tasks to worker processes.
Where was it used?
RabbitMQ powered all asynchronous tasks run by edx-platform, as well as those for ecommerce, xqueue, notifier, and PDF certificate generation.
Why did we move?
Over the course of 2017, RabbitMQ ran into problems whenever we scaled Celery workers out past a certain size. When this happened, RabbitMQ stopped accepting requests without giving any useful information about why. As a result, the apps that depend on it suffered outages of varying severity. Given the severity of the impact and the lack of feedback from the system, we decided to switch Celery to a different broker: Redis.
Why Redis?
We chose Redis for a few different reasons.
In our testing, a single Redis instance could support 10x as many Celery workers as our RabbitMQ setup.
Redis is available as a managed service through AWS ElastiCache, which reduces the maintenance burden. ElastiCache also offers automated failover managed by AWS, similar to what we have for our MySQL database in AWS RDS.
Using Redis for asynchronous tasks also opens up the option of using Redis as our cache backend. That would reduce the number of technologies needed to get Open edX up and running.