2015.09.09 Asynchronous Task Processing
Asynchronous Task Processing Architecture V2
In room: Clinton, Nimisha, Ed, Mark, Zach, Miki, Joel, Renzo, Ned.
On Hangout: Cale, Dave, Felipe, Jim, Peter Pinch.
Action Items:
What does ecomm need right now?
make order fulfillment more robust
retries in case of system failure (e.g., modulestore down)
devops wants a lightweight way to run async tasks that doesn't require pushing the whole repo to another worker
devops wants versioning so that new code can be pushed without fear
if we make these tasks much smaller, do we have the same concerns about version changes?
version skew was a concern that motivated some of the new design, and the new design definitely helps with it.
performance: fulfillment is a bottleneck, so ecomm wants to make it async
Concerns with the proposal:
dependency conflicts: the proposal has all the tasks running in one virtualenv, so they have to agree on dependencies
use of celery:
celery doesn't have a pub-sub notion
but this doesn't feel like a pressing concern
replacing celery feels out of scope
miki: fulfillment seems like an area that will keep growing
do we need pub-sub to deal with it?
jim: the next six months don't need it
dave: pub-sub could be added afterward
mixed workload:
could big jobs starve little jobs?
ed can imagine having a queue per task
ballpark: we'll have a few dozen kinds of tasks?
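One way to keep big jobs from starving little ones is to route each kind of task to its own queue. A minimal sketch, assuming the Celery 3.x `CELERY_ROUTES` setting; the task names, queue names, and the `queue_for` helper are hypothetical.

```python
# Hypothetical task names. Each kind of task gets its own queue, so a backlog
# of slow jobs (e.g. reports) cannot delay fast ones (e.g. fulfillment).
CELERY_ROUTES = {
    "ecommerce.tasks.fulfill_order": {"queue": "ecommerce.fulfillment"},
    "ecommerce.tasks.send_receipt_email": {"queue": "ecommerce.email"},
    "ecommerce.tasks.generate_report": {"queue": "ecommerce.reports"},
}


def queue_for(task_name, routes=CELERY_ROUTES, default="celery"):
    """Return the queue a task is routed to; unrouted tasks use the default queue."""
    return routes.get(task_name, {}).get("queue", default)
```

Workers can then be started to consume only specific queues (Celery workers accept a `-Q` option for this), which also fits a worker pool per IDA.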
Dependency conflicts
if the tasks are API-oriented, then the set of requirements will be tiny
cale: fundamental worry
IDAs are meant to be independent
Should allow teams to work independently
Now their tasks have to agree on dependencies
Clarification: this proposal was intended to cover all edX async tasks eventually
tasks within a team don't need isolation
the team can coordinate requirements
If devops is OK with different workers for different teams, that reduces the conflicts to a manageable level
Worker pool per IDA is OK with everyone
Does ecomm want to be able to deploy individual tasks without deploying all of ecomm?
Deploying tasks independent of the front end makes sense
Deploying task A separate from task B isn't needed
Versioning
is it enough to make a new task when the version changes?
ed is worried that task names will become unwieldy and wants a naming convention
[ ] specify how to name tasks to deal with versions
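The naming convention is still an open action item. As one possible option, here is a sketch of a hypothetical `<ida>.<task>.v<version>` scheme, where a changed task gets a new name; `versioned_task_name` and `parse_task_name` are illustrative helpers, not an agreed design.

```python
import re

# Hypothetical convention: "<ida>.<task>.v<version>", e.g. "ecommerce.fulfill_order.v2".
_NAME_RE = re.compile(r"^(?P<ida>[a-z_]+)\.(?P<task>[a-z_]+)\.v(?P<version>\d+)$")


def versioned_task_name(ida, task, version):
    """Build a task name that embeds the version number."""
    name = f"{ida}.{task}.v{version}"
    if not _NAME_RE.match(name):
        raise ValueError(f"invalid task name: {name!r}")
    return name


def parse_task_name(name):
    """Split a versioned task name back into (ida, task, version)."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a versioned task name: {name!r}")
    return match["ida"], match["task"], int(match["version"])
```

Because v1 and v2 are distinct names, workers still running old code can keep draining v1 messages while v2 is deployed.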
What data is passed to the task?
Pass ids of objects, not the objects themselves.
What if the object changes before the task runs?
Should this be decided universally? Or case-by-case?
Passing values means that you can detect if the data has changed in the meantime.
Passing values also makes debugging easier
Passing values makes idempotency harder
Passing a version number along with the id lets the task detect changes without passing the whole object.
Tasks should be idempotent
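The "id plus version number" option can be combined with idempotency roughly as follows. This is a sketch under stated assumptions: `Order`, `ORDERS` (an in-memory dict standing in for the real database), and `fulfill_order` are hypothetical, and what to do with a stale order is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class Order:
    id: int
    version: int  # hypothetical: bumped whenever the order's contents change
    fulfilled: bool = False


# Hypothetical in-memory stand-in for the real order database.
ORDERS = {}


def fulfill_order(order_id, expected_version):
    """Fulfill an order given only its id and the version the caller saw."""
    order = ORDERS.get(order_id)
    if order is None:
        return "missing"
    if order.version != expected_version:
        # The order changed after the task was queued.
        return "stale"
    if order.fulfilled:
        # Already done: running the task again is a no-op (idempotent).
        return "already-fulfilled"
    order.fulfilled = True  # stands in for the real fulfillment work
    return "fulfilled"
```

Since a repeated run returns without redoing work, retrying or replaying the task is safe.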
Debugging
multi-machine configuration makes debugging hard
The proposal includes running tasks in-process for development.
not enough: we should also consider debugging in more realistic environments.
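Running tasks in-process for development amounts to a switch between calling the task directly and handing it to a queue. Celery 3.x has a setting for this (`CELERY_ALWAYS_EAGER`); the stdlib sketch below only illustrates the idea, and `TaskRunner` is a hypothetical name, with `queue.Queue` standing in for the broker.

```python
import queue


class TaskRunner:
    """Hypothetical dispatcher: runs tasks inline in development, queues them otherwise."""

    def __init__(self, in_process):
        self.in_process = in_process
        self._pending = queue.Queue()  # stand-in for the real broker

    def enqueue(self, func, *args, **kwargs):
        """Run the task now if in-process; otherwise queue it and return None."""
        if self.in_process:
            return func(*args, **kwargs)
        self._pending.put((func, args, kwargs))
        return None

    def run_pending(self):
        """Drain the queue as a worker would; return how many tasks ran."""
        count = 0
        while not self._pending.empty():
            func, args, kwargs = self._pending.get()
            func(*args, **kwargs)
            count += 1
        return count
```

In-process mode skips argument serialization and the broker entirely, which is one reason it is not enough on its own for debugging.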
Operational monitoring:
Proactive alerts: a queue filling up needs to raise alarms
this is part of an overall monitoring scheme
currently, ecomm relies on Splunk. Is celery-flower the newer option?
RabbitMQ is also monitored now, via Splunk
everything must be monitored
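The "queue getting full" alarm reduces to comparing each queue's depth against a threshold. A minimal sketch, assuming the depths have already been collected (for example from `rabbitmqctl list_queues name messages`); the function name and the default threshold of 1000 are hypothetical.

```python
def queues_over_threshold(depths, thresholds, default_threshold=1000):
    """Return the sorted names of queues whose depth exceeds their alert threshold.

    depths: mapping of queue name -> number of messages waiting.
    thresholds: per-queue overrides; other queues use default_threshold.
    """
    return sorted(
        name
        for name, depth in depths.items()
        if depth > thresholds.get(name, default_threshold)
    )
```

Any non-empty result would then be fed into whatever alerting the overall monitoring scheme uses.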
Error recovery
what if a worker drains a queue but every task fails? What retries those tasks?
now we have a manual process to replay orders
that would stay the same
tasks would be responsible for retrying
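If tasks are responsible for their own retries, a common pattern is exponential backoff on transient failures such as the modulestore being down. Celery's bound tasks offer `self.retry()` for this; the stdlib sketch below shows the idea only, and `TransientError` and `run_with_retries` are hypothetical names. Once retries are exhausted the error propagates, and the manual replay process remains the fallback.

```python
import time


class TransientError(Exception):
    """Hypothetical error for failures worth retrying, e.g. modulestore down."""


def run_with_retries(task, *args, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call task(*args), retrying transient failures with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last error once max_retries retries are used up.
    """
    for attempt in range(max_retries + 1):
        try:
            return task(*args)
        except TransientError:
            if attempt == max_retries:
                raise
            sleep(base_delay * 2 ** attempt)
```

The injectable `sleep` parameter keeps the function testable without real delays.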
Renzo will update the document, implementation can begin.