2015.09.09 Asynchronous Task Processing

In room: Clinton, Nimisha, Ed, Mark, Zach, Miki, Joel, Renzo, Ned.
On hangout: Cale, Dave, Felipe, Jim, Peter Pinch.
Action Items:

 

  • What does ecomm need right now?
    • make order fulfillment more robust
      • retries in the case of system failure (modulestore down)
    • devops wants a lightweight way to do async tasks that doesn't require pushing the whole repo to another worker
    • devops wanted versioning to be sure new code could be pushed without fear
      • if we make these tasks much smaller, do we have the same concerns about version changes?
      • version skew was a concern that motivated some of the new design, and the new design definitely helps with it.
    • performance: fulfillment is a bottleneck, so ecomm wants to make it async
  • Concerns with the proposal:
    • dependency conflicts: the proposal has all the tasks running in one virtualenv, so they have to agree on dependencies
    • use of celery:
      • celery doesn't have a pub-sub notion
      • but this doesn't feel like a pressing concern
      • replacing celery feels out of scope
      • miki: fulfillment seems like an area that will keep growing
        • do we need pubsub to deal with it?
        • jim: the next six months don't need it
        • dave: pubsub could be added afterward
    • mixed workload:
      • could big jobs starve little jobs?
      • ed can imagine having a queue per task
        • ballpark: we'll have a few dozen kinds of tasks?
  • Dependency conflicts
    • if the tasks are API-oriented, then the set of requirements will be tiny
      • cale: fundamental worry
        • IDAs are meant to be independent
        • Should allow teams to work independently
        • Now their tasks have to agree on dependencies
    • Clarification: this proposal was intended to cover all edX async tasks eventually
    • tasks within a team don't need isolation
      • the team can coordinate requirements
    • If devops is OK with different workers for different teams, then that lessens the conflicts to manageable levels
    • Worker pool per IDA is OK with everyone
  • Does ecomm want to be able to deploy individual tasks without deploying all of ecomm?
    • Deploying tasks independent of the front end makes sense
    • Deploying task A separate from task B isn't needed
  • Versioning
    • is it enough to make a new task when the version changes?
      • ed is worried that task names will become gross, and wants a convention
      • [  ] specify how to name tasks to deal with versions
  • What data is passed to the task?
    • Pass ids of objects, not the objects themselves.
    • What if the object changes before the task runs?
      • Should this be decided universally? Or case-by-case?
    • Passing values means that you can detect if the data has changed in the meantime.
    • Passing values also makes debugging easier
    • Passing values makes idempotency harder
    • Passing a version number with a reference can make things easier.
  • Tasks should be idempotent
  • Debugging
    • multi-machine configuration makes debugging hard
    • The proposal includes running tasks in-process for development.
      • not enough: should also consider debugging in "more real" environments.
  • Operational monitoring:
    • Proactive alerts: a queue getting full needs to raise alarms
    • this is part of an overall monitoring scheme
    • currently, ecomm relies on splunk. celery-flower is the new thing?
    • rabbit is also monitored now, via splunk
    • everything must be monitored
  • Error recovery
    • what if a worker drains a queue, but fails everything? What will retry those tasks?
      • now we have a manual process to replay orders
      • that would stay the same
    • tasks would be responsible for retrying
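
A minimal sketch of two points from the notes above: tasks being responsible for their own retries, and tasks being idempotent. It is written in plain Python rather than Celery so it runs standalone. Every name here (with_retries, TransientError, fulfill_order, enroll, FULFILLED, ENROLLMENTS) is hypothetical and not taken from the proposal. A real task would presumably also wait between attempts and keep its "already fulfilled" record in durable storage.

```python
import functools


class TransientError(Exception):
    """Stand-in for a recoverable failure, e.g. the modulestore being down."""


def with_retries(max_attempts):
    """Retry the wrapped task on TransientError, up to max_attempts tries."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except TransientError:
                    if attempt == max_attempts:
                        # Out of attempts: surface the error so a manual
                        # replay process can pick the order up.
                        raise
        return wrapper
    return decorator


FULFILLED = set()    # stand-in for a durable record of fulfilled orders
ENROLLMENTS = []     # stand-in for the task's side effect
_failures_left = {"count": 2}  # simulate two outages before success


def enroll(order_id):
    """Hypothetical side effect that fails while the backend is 'down'."""
    if _failures_left["count"] > 0:
        _failures_left["count"] -= 1
        raise TransientError("modulestore down")
    ENROLLMENTS.append(order_id)


@with_retries(max_attempts=3)
def fulfill_order(order_id):
    """Idempotent: running it again for the same order does nothing new."""
    if order_id in FULFILLED:
        return "already fulfilled"
    enroll(order_id)
    FULFILLED.add(order_id)
    return "fulfilled"
```

Because fulfill_order checks FULFILLED before acting, replaying an order that already went through (for example during a manual replay) does not enroll the learner twice.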

Renzo will update the document, and implementation can begin.
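
A small sketch of the "pass ids, plus a version number" idea from the discussion of what data is passed to a task. The task receives a reference and the version it was enqueued against, then checks on pickup whether the object has changed in the meantime. The Order class, the ORDERS dict (a stand-in for the database), and the payload keys are all assumptions made for illustration. What a task should do on a mismatch was left open in the meeting, so this sketch only reports it.

```python
from dataclasses import dataclass


@dataclass
class Order:
    id: int
    version: int   # assumed to be bumped on every change to the order
    status: str


# Stand-in for the database the task would read from.
ORDERS = {7: Order(id=7, version=1, status="paid")}


def make_payload(order):
    """Build the task arguments: a reference and a version, not the object."""
    return {"order_id": order.id, "order_version": order.version}


def process(payload):
    """Look the order up by id and detect changes made since enqueueing."""
    order = ORDERS[payload["order_id"]]
    if order.version != payload["order_version"]:
        return "stale"
    return "processed " + order.status
```

For example, process(make_payload(ORDERS[7])) returns "processed paid". If the order's version is bumped after the payload is built, the same payload yields "stale".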