2015.09.09 Asynchronous Task Processing

In room: Clinton, Nimisha, Ed, Mark, Zach, Miki, Joel, Renzo, Ned.
On hangout: Cale, Dave, Felipe, Jim, Peter Pinch.
Action Items:

 

  • What does ecomm need right now?
    • make order fulfillment more robust
      • retries in the case of system failure (modulestore down)
    • devops wants a lightweight way to run async tasks that doesn't require pushing the whole repo to another worker
    • devops wanted versioning to be sure new code could be pushed without fear
      • if we make these tasks much smaller, do we have the same concerns about version changes?
      • version skew was a concern that motivated some of the new design, and the new design definitely helps with it.
    • performance: fulfillment is a bottleneck, so ecomm wants to make it async
  • Concerns with the proposal:
    • dependency conflicts: the proposal has all the tasks running in one virtualenv, so they have to agree on dependencies
    • use of celery:
      • celery doesn't have a pub-sub notion
      • but this doesn't feel like a pressing concern
      • replacing celery feels out of scope
      • miki: fulfillment seems like an area that will keep growing
        • do we need pubsub to deal with it?
        • jim: the next six months don't need it
        • dave: pubsub could be added afterward
    • mixed workload:
      • could big jobs starve little jobs?
      • ed can imagine having a queue per task
        • ballpark: we'll have a few dozen kinds of tasks?
  • Dependency conflicts
    • if the tasks are API-oriented, then the set of requirements will be tiny
      • cale: fundamental worry
        • IDAs are meant to be independent
        • Should allow teams to work independently
        • Now their tasks have to agree on dependencies
    • Clarification: this proposal was intended to cover all edX async tasks eventually
    • tasks within a team don't need isolation
      • the team can coordinate requirements
    • If devops is OK with different workers for different teams, then that lessens the conflicts to manageable levels
    • Worker pool per IDA is OK with everyone
  • Does ecomm want to be able to deploy individual tasks without deploying all of ecomm?
    • Deploying tasks independent of the front end makes sense
    • Deploying task A separate from task B isn't needed
  • Versioning
    • is it enough to make a new task when the version changes?
      • ed is worried that task names will become gross, and wants a convention
      • [  ] specify how to name tasks to deal with versions
  • What data is passed to the task?
    • Pass ids of objects, not the objects themselves.
    • What if the object changes before the task runs?
      • Should this be decided universally? Or case-by-case?
    • Passing values means that you can detect if the data has changed in the meantime.
    • Passing values also makes debugging easier
    • Passing values makes idempotency harder
    • Passing a version number with a reference can make things easier.
  • Tasks should be idempotent
  • Debugging
    • multi-machine configuration makes debugging hard
    • The proposal includes running tasks in-process for development.
      • not enough: should also consider debugging in "more real" environments.
  • Operational monitoring:
    • Proactive alerts: a queue filling up needs to raise alarms
    • this is part of an overall monitoring scheme
    • currently, ecomm relies on splunk. is celery-flower the new thing?
    • rabbit is also monitored now, via splunk
    • everything must be monitored
  • Error recovery
    • what if a worker drains a queue, but fails everything? What will retry those tasks?
      • now we have a manual process to replay orders
      • that would stay the same
    • tasks would be responsible for retrying
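The points above about passing ids with a version number, idempotency, and tasks owning their own retries can be sketched together. This is a minimal illustration only, not the design agreed in the meeting: `Order`, `FulfillmentWorker`, `TransientError`, and the `enroll` callback are hypothetical names, and an in-memory dict stands in for the real order store.

```python
from dataclasses import dataclass, field


class TransientError(Exception):
    """Hypothetical stand-in for a recoverable failure (e.g. modulestore down)."""


@dataclass
class Order:
    order_id: int
    version: int            # assumed to be bumped whenever the order's contents change
    fulfilled: bool = False


@dataclass
class FulfillmentWorker:
    orders: dict                      # hypothetical in-memory order store
    max_attempts: int = 3
    log: list = field(default_factory=list)

    def fulfill(self, order_id, expected_version, enroll):
        """Fulfill an order given only its id and the version the caller saw."""
        order = self.orders[order_id]
        if order.version != expected_version:
            # The order changed after the task was queued: do not act on stale data.
            self.log.append("stale")
            return "stale"
        if order.fulfilled:
            # Idempotency: a replayed or duplicated task is a no-op.
            self.log.append("skipped")
            return "skipped"
        for attempt in range(1, self.max_attempts + 1):
            try:
                enroll(order)
            except TransientError:
                # The task itself is responsible for retrying.
                self.log.append(f"retry {attempt}")
                continue
            order.fulfilled = True
            self.log.append("fulfilled")
            return "fulfilled"
        # Retries exhausted: the order stays unfulfilled for the manual replay process.
        self.log.append("failed")
        return "failed"
```

Because the task receives a reference plus a version rather than a copy of the order, running it twice is harmless and a change made in the meantime is still detectable. A real worker would also wait between attempts; that is omitted here to keep the sketch short.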

Renzo will update the document; implementation can then begin.
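The open action item on naming versioned tasks, and the idea of a queue per kind of task, could look roughly like the sketch below. The `<name>.v<version>` convention, the `task` decorator, `TASK_REGISTRY`, and `queue_for` are all assumptions made for illustration; the meeting did not settle on a convention, and this is plain Python rather than celery's API.

```python
TASK_REGISTRY = {}  # hypothetical registry: full task name -> callable


def task(name, version):
    """Register a function under '<name>.v<version>' (an assumed convention)."""
    def decorator(func):
        full_name = f"{name}.v{version}"
        if full_name in TASK_REGISTRY:
            raise ValueError(f"task already registered: {full_name}")
        TASK_REGISTRY[full_name] = func
        func.task_name = full_name
        return func
    return decorator


@task("ecommerce.fulfill_order", version=1)
def fulfill_order_v1(order_id):
    return ("v1", order_id)


@task("ecommerce.fulfill_order", version=2)
def fulfill_order_v2(order_id, order_version):
    # A changed signature gets a new task name instead of altering v1,
    # so messages queued by old code still find the old handler.
    return ("v2", order_id, order_version)


def dispatch(task_name, *args):
    """Look up a task by its full versioned name and run it."""
    return TASK_REGISTRY[task_name](*args)


def queue_for(task_name):
    """One queue per kind of task, shared by all versions of that task."""
    return task_name.rsplit(".v", 1)[0]
```

Keeping both versions registered during a rollout means producers and workers can be deployed at different times, and routing by the unversioned name keeps the number of queues in line with the "few dozen kinds of tasks" ballpark.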

 
