2015.09.09 Asynchronous Task Processing
In room: Clinton, Nimisha, Ed, Mark, Zach, Miki, Joel, Renzo, Ned.
On hangout: Cale, Dave, Felipe, Jim, Peter Pinch.
Action Items:
- Renzo Lucioni: specify task naming
- Renzo Lucioni: clarify that workers are per-IDA
- What does ecomm need right now?
- make order fulfillment more robust
- retries in the case of system failure (modulestore down)
- devops wants a lightweight way to run async tasks that doesn't require pushing the whole repo to another worker
- devops wanted task versioning so that new code could be deployed safely
- if we make these tasks much smaller, do we have the same concerns about version changes?
- version skew was a concern that motivated some of the new design, and the new design definitely helps with it.
- performance: fulfillment is a bottleneck, so ecomm wants to make it async
- Concerns with the proposal:
- dependency conflicts: the proposal has all the tasks running in one virtualenv, so they have to agree on dependencies
- use of celery:
- celery doesn't have a pub-sub notion
- but this doesn't feel like a pressing concern
- replacing celery feels out of scope
- miki: fulfillment seems like an area that will keep growing
- do we need pub-sub to deal with it?
- jim: the next six months don't need it
- dave: pub-sub could be added afterward
- mixed workload:
- could big jobs starve little jobs?
- ed can imagine having a queue per task
- ballpark: we'll have a few dozen kinds of tasks?
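Ed's queue-per-task idea maps onto Celery's static routing setting, `task_routes` (Celery 4+ name; `CELERY_ROUTES` in Celery 3.x). The following is a minimal sketch with hypothetical task and queue names. The `queue_for` helper is an illustration that only mimics how a static route map sends each task to its own queue, so that a slow bulk job cannot starve short, latency-sensitive ones.

```python
# Hypothetical task names, in the shape Celery's `task_routes` setting accepts:
# {task_name: {"queue": queue_name}}.
task_routes = {
    "ecommerce.tasks.fulfill_order": {"queue": "fulfillment"},
    "ecommerce.tasks.send_receipt": {"queue": "notifications"},
    "ecommerce.tasks.rebuild_reports": {"queue": "bulk"},
}

# Celery's default queue name when no route matches.
DEFAULT_QUEUE = "celery"


def queue_for(task_name: str) -> str:
    """Return the queue a task would be routed to (illustration only)."""
    route = task_routes.get(task_name)
    return route["queue"] if route else DEFAULT_QUEUE
```

Workers can then be started per queue (for example `celery -A proj worker -Q fulfillment`), so capacity for small tasks is reserved separately from large batch jobs.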
- Dependency conflicts
- if the tasks are API-oriented, then the set of requirements will be tiny
- cale: fundamental worry
- IDAs are meant to be independent
- Should allow teams to work independently
- Now their tasks have to agree on dependencies
- Clarification: this proposal was intended to cover all edX async tasks eventually
- tasks within a team don't need isolation
- the team can coordinate requirements
- If devops is OK with different workers for different teams, then that lessens the conflicts to manageable levels
- Worker pool per IDA is OK with everyone
- Does ecomm want to be able to deploy individual tasks without deploying all of ecomm?
- Deploying tasks independent of the front end makes sense
- Deploying task A separate from task B isn't needed
- Versioning
- is it enough to make a new task when the version changes?
- ed is worried that task names will become gross, and wants a convention
- [ ] specify how to name tasks to deal with versions
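One possible convention for this action item, offered only as a sketch: embed a version suffix in the task name so that a changed signature becomes a new task, and old and new workers can coexist during a deploy. The `<ida>.<task>.v<N>` format and both helpers are hypothetical, not an agreed edX convention.

```python
import re

# Hypothetical convention: "<ida>.<task>.v<version>", e.g. "ecommerce.fulfill_order.v2".
_NAME_RE = re.compile(
    r"^(?P<ida>[a-z][a-z0-9_]*)\.(?P<task>[a-z][a-z0-9_]*)\.v(?P<version>\d+)$"
)


def versioned_name(ida: str, task: str, version: int) -> str:
    """Build a task name; bump `version` whenever the task's arguments change."""
    if version < 1:
        raise ValueError("version must be >= 1")
    name = f"{ida}.{task}.v{version}"
    if not _NAME_RE.match(name):
        raise ValueError(f"invalid task name: {name!r}")
    return name


def parse_name(name: str) -> tuple[str, str, int]:
    """Split a versioned task name back into (ida, task, version)."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a versioned task name: {name!r}")
    return match["ida"], match["task"], int(match["version"])
```

Keeping the version as the last dotted component makes it easy to list every live version of a task by prefix.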
- What data is passed to the task?
- Pass ids of objects, not the objects themselves.
- What if the object changes before the task runs?
- Should this be decided universally? Or case-by-case?
- Passing values means that you can detect if the data has changed in the meantime.
- Passing values also makes debugging easier
- Passing values makes idempotency harder
- Passing a version number along with the reference lets the task detect whether the object changed after the task was queued.
- Tasks should be idempotent
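Passing an id plus a version fits together with the idempotency requirement. This minimal sketch uses a hypothetical in-memory `ORDERS` store and `Order` model in place of the real database. It is not the ecommerce service's actual fulfillment code.

```python
from dataclasses import dataclass


@dataclass
class Order:
    id: int
    version: int  # incremented on every change to the order
    fulfilled: bool = False


# Hypothetical in-memory stand-in for the ecommerce database.
ORDERS: dict[int, Order] = {}


class StaleOrderError(Exception):
    """The order changed after the task was queued."""


def fulfill_order(order_id: int, expected_version: int) -> str:
    """Fulfill an order passed by reference (id + version), idempotently."""
    order = ORDERS[order_id]
    if order.fulfilled:
        # Re-running a completed task is a no-op, so retries and replays are safe.
        return "already fulfilled"
    if order.version != expected_version:
        # The object changed between enqueue and execution; let the caller decide.
        raise StaleOrderError(
            f"order {order_id}: expected v{expected_version}, found v{order.version}"
        )
    order.fulfilled = True
    order.version += 1
    return "fulfilled"
```

The "already fulfilled" check runs before the version check, so replaying an old message after a successful run is harmless instead of being reported as stale.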
- Debugging
- multi-machine configuration makes debugging hard
- The proposal includes running tasks in-process for development.
- not enough: we should also support debugging in more production-like environments.
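For development, Celery's `task_always_eager` setting (`CELERY_ALWAYS_EAGER` in Celery 3.x) runs tasks synchronously in the calling process without going through the broker. This sketch imitates that switch with a hypothetical `dispatch` helper, using a stdlib `queue.Queue` as a stand-in for the broker. It also shows why in-process mode alone is not enough: the eager path never goes through the queue, so problems that only appear when a separate worker picks the task up stay hidden.

```python
import queue
from typing import Any, Callable

# Stand-in for the broker; a real deployment would use RabbitMQ via Celery.
BROKER: queue.Queue = queue.Queue()


def dispatch(task: Callable[..., Any], *args: Any, eager: bool = False) -> Any:
    """Run `task` in-process when `eager` is set, otherwise enqueue it."""
    if eager:
        return task(*args)  # direct call: no queue, no worker
    BROKER.put((task, args))
    return None


def run_worker_once() -> Any:
    """Pop one queued task and execute it, as a worker process would."""
    task, args = BROKER.get_nowait()
    return task(*args)
```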
- Operational monitoring:
- Proactive alerts: a queue filling up needs to raise an alarm
- this is part of an overall monitoring scheme
- currently, ecomm relies on splunk; is celery-flower the newer option?
- rabbit is also monitored now, via splunk
- everything must be monitored
- Error recovery
- what if a worker drains a queue but every task fails? What will retry those tasks?
- now we have a manual process to replay orders
- that would stay the same
- tasks would be responsible for retrying
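Because tasks would own their retries, a retry policy with a ceiling keeps a drained-but-failing queue from retrying forever and hands exhausted tasks back to the manual replay process. In Celery this is usually done inside a bound task via `self.retry(exc=..., countdown=...)`. The stdlib sketch below shows the same idea without Celery. `TransientError`, the defaults, and the exponential backoff schedule are assumptions, not decisions from this meeting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """A failure worth retrying, e.g. the modulestore being temporarily down."""


def run_with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying transient failures with exponential backoff.

    After `max_retries` retries the last error is re-raised so the task can be
    picked up by the existing manual replay process.
    """
    attempt = 0
    while True:
        try:
            return fn()
        except TransientError:
            if attempt >= max_retries:
                raise
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
            attempt += 1
```

Injecting `sleep` keeps the policy testable without real delays.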
Renzo will update the document, implementation can begin.