XQueue/XQWatcher Architecture

TL;DR

  1. XQueue is a simple micro-service that sits between LMS and external task processing services (generally, “pull graders”).

  2. LMS submits grading requests to XQueue. XQueue holds on to each submission until a relevant pull grader asynchronously picks it up, executes it, and pushes the result back to XQueue. XQueue then pushes the result back to LMS.

  3. XQWatcher is edX’s canonical implementation of a pull grader, although other pull grader implementations can and do exist.

Role

XQueue is an asynchronous task processing service that currently handles three types of LMS tasks, in order of most to least utilized:

  1. Grading requests for CodeResponse problems

  2. Input validation requests for MatlabInput problems (similar to grading requests, except these aren't actually graded)

  3. PDF certificate generation requests for the Certificates app (deprecated - in the process of removing)

Its primary use case by far, and the one for which it was designed, is code grading or "autograding" (#1). In general, each CodeResponse problem will have a different grading backend and consequently a separate queue, so it's possible for XQueue to end up with many "queues under management."

Design

Dependencies

XQueue, of course, receives all of its tasks from the LMS, and sends all results back to the LMS.

XQueue also requires that, for each active task queue, some service exists that is generally available to process and remove tasks from the queue. In the event of a task consumer outage, XQueue will happily hold onto tasks indefinitely, allowing task consumers to resume processing once they're back online. That said, there is the obvious risk of XQueue filling up its queue capacity either during consumer downtime or if a consumer is simply too slow.

Technology stack

Under the hood, XQueue is a Python Django web application with whatever relational database (e.g. MySQL) and file storage backends (e.g. Amazon S3) you like to use with Django. (File storage is used for message payloads.)

XQueue uses its relational DB to store metadata about queued submissions; the actual submissions are stored in the configured file storage backend (e.g. S3).

Summary diagram


Authentication

XQueue has a separate user model that has no connection to edx-platform users. Service accounts must be created directly with XQueue, and clients must use basic (session) auth to communicate with it.

API Endpoints

There are three groups of HTTP web APIs exposed by XQueue, all of which communicate using JSON.

LMS-facing

  • /xqueue/submit

    • Receive a new task

    • Allowed methods: POST

    • Format: must have three keys defined in a JSON object in the POST body.

      • lms_callback_url: the exact location where XQueue should POST the final result of a task

      • lms_key: an opaque identifier that the caller wants returned in the final callback

      • queue_name

    • Any files included in the POST will be dumped into XQueue's configured file storage system.
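
As a rough illustration of the request format described above, the following sketch builds and checks a JSON object carrying the three required keys. The helper names and the example values are hypothetical, and the sketch does not attempt to reproduce how the LMS driver actually frames or sends the request.

```python
import json

# The three keys the section above says /xqueue/submit requires.
REQUIRED_KEYS = ("lms_callback_url", "lms_key", "queue_name")


def build_submit_payload(lms_callback_url, lms_key, queue_name):
    """Return the JSON text for a submit request (hypothetical helper)."""
    return json.dumps({
        "lms_callback_url": lms_callback_url,
        "lms_key": lms_key,
        "queue_name": queue_name,
    })


def missing_keys(payload_text):
    """Return the required keys absent from a JSON payload, in order."""
    payload = json.loads(payload_text)
    return [key for key in REQUIRED_KEYS if key not in payload]


example = build_submit_payload(
    "https://lms.example.com/callback",  # placeholder URL
    "opaque-key-123",                    # placeholder key
    "example-queue",                     # placeholder queue name
)
```

A check like missing_keys could run on either side of the request to catch malformed submissions early.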

External task processor-facing

  • /xqueue/get_queuelen?queue_name={queue}

    • Get the number of unprocessed tasks in the given queue.

    • Allowed methods: GET

  • /xqueue/get_submission?queue_name={queue}

    • Retrieves a single submission from the given queue. The response contains a unique submission identifier.

    • Allowed methods: GET

  • /xqueue/put_result

    • Pushes the result of a completed task back to XQueue.

    • Allowed methods: POST
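
To make the endpoint shapes above concrete, here is a minimal sketch of a URL builder that a task processor might use. The function name and the example host are assumptions for illustration; only the /xqueue/... paths and the queue_name parameter come from the list above.

```python
from urllib.parse import urlencode, urljoin


def endpoint_url(base_url, endpoint, **params):
    """Build the URL for an XQueue endpoint such as get_submission."""
    url = urljoin(base_url.rstrip("/") + "/", "xqueue/" + endpoint)
    if params:
        # Query parameters (e.g. queue_name) are URL-encoded.
        url += "?" + urlencode(params)
    return url


# Placeholder host and queue name.
queuelen_url = endpoint_url(
    "https://xqueue.example.com", "get_queuelen", queue_name="my-queue"
)
```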

Shared

  • /xqueue/login

  • /xqueue/logout

  • /xqueue/status

Producers

The edx-platform codebase contains an XQueue driver defined within CAPA (/common/lib/capa/capa/xqueue_interface.py) that interacts directly with the XQueue LMS endpoints. The CodeResponse and MatlabInput tasks are part of the CAPA codebase and directly use this driver.

The unique key generated by the LMS, in order to correlate each submission with a given activity, is a hash of course, user, usage key, and the current time.
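
The idea of hashing those four values into a key can be sketched as follows. The hash algorithm (SHA-256), the "|" separator, the field order, and the function name are all illustrative assumptions, not necessarily what the LMS does.

```python
import hashlib
import time


def make_submission_key(course_id, user_id, usage_key, now=None):
    """Hash course, user, usage key and time into a hex digest.

    Assumptions: SHA-256, the "|" separator and the field order are
    illustrative choices for this sketch.
    """
    if now is None:
        now = time.time()
    raw = "|".join([str(course_id), str(user_id), str(usage_key), repr(now)])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

Including the current time means two submissions by the same user to the same problem get different keys.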

Certificate generation and cert updates also use the CAPA driver directly (/lms/djangoapps/certificates/queue.py).

Consumers

XQueue was designed to handle two patterns of task consumers, pull and push. Push consumers have since been deprecated and removed due to the negative performance & scalability implications they had on XQueue.

Pull

Also called "active," these graders are constantly polling XQueue to see if there are new submissions to be processed. XQWatcher is a pull-grader implementation. We recommend using pull graders where possible.
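
A single iteration of that polling cycle might look like the sketch below. The client interface (get_submission and put_result methods), the submission dictionary shape, and the in-memory FakeClient are hypothetical stand-ins, not the XQWatcher implementation.

```python
def poll_once(client, queue_name, grade):
    """Fetch one submission, grade it, and push the result back.

    `client` is any object with get_submission(queue_name) and
    put_result(submission_id, result) methods (hypothetical interface).
    Returns True if a submission was processed, False if the queue was empty.
    """
    submission = client.get_submission(queue_name)
    if submission is None:
        return False
    result = grade(submission["body"])
    client.put_result(submission["id"], result)
    return True


class FakeClient:
    """In-memory stand-in for XQueue, used only to exercise poll_once."""

    def __init__(self, submissions):
        self.pending = list(submissions)
        self.results = {}

    def get_submission(self, queue_name):
        # Hand out the oldest pending submission, or None when empty.
        return self.pending.pop(0) if self.pending else None

    def put_result(self, submission_id, result):
        self.results[submission_id] = result
```

A real grader would call poll_once in a loop and sleep between empty polls.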

The below diagram illustrates a pull request flow.


Push (Removed)

Also called "passive," these graders exposed a web interface that XQueue called when a new submission was available for processing. XServer was a push-grader implementation. We moved away from the use of push graders in favor of more general-purpose XBlock integrations (e.g. an LTI consumer).

The below diagram illustrates a push request flow.


XQueue-Watcher

XQueue-Watcher (a.k.a. xqueue-watcher or just xqwatcher) is edX’s implementation of the XQueue pull-grader interface. It is not the only implementation of said pull-grader interface: there are different institutions that maintain their own forks of XQWatcher or have even written their own pull-grader implementations. However, XQWatcher is the only pull-grader that we maintain.

XQWatcher is a Flask-based Python application that polls XQueue for code submissions and executes them against course-team-authored graders. A grader runs the student submission against a series of tests and compares the result to the result obtained by running the same suite of tests against a course-team-authored exemplar submission. A single XQWatcher instance runs multiple threads that all listen to a single XQueue queue. Each thread waits for a single submission on the queue, responds to it, and then waits for another submission. Thus the number of threads limits how many submissions can be graded at once. Another result of this design is that a separate XQWatcher instance needs to be deployed for each course, with a configuration specific to that course.
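
The threading model described above can be illustrated with a small sketch in which a local queue.Queue stands in for the XQueue queue. The function name, the (id, body) tuple shape, and the fact that workers exit when the queue is empty are simplifying assumptions; a real watcher would keep polling.

```python
import queue
import threading


def run_workers(submissions, grade, num_threads=2):
    """Grade submissions with a fixed pool of worker threads.

    A local queue.Queue stands in for the XQueue queue (assumption);
    at most num_threads submissions are graded at the same time.
    """
    pending = queue.Queue()
    for item in submissions:
        pending.put(item)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                submission_id, body = pending.get_nowait()
            except queue.Empty:
                return  # a real watcher would keep polling instead
            result = grade(body)
            with lock:
                results[submission_id] = result

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```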

Graders are defined in a course-team-owned git repository, named in the deploy-time configuration for the XQWatcher instance. Each problem has its own grader, which lives in a separate directory. This directory also contains an answer.py file, which holds the course-team-authored exemplar submission.

Incoming requests contain the learner's code submission as well as a grader payload, specified by the course team as part of the CAPA code_response XBlock. The grader payload is a JSON object with the key "grader", whose value is the path to a Python file within the course team's grader repo. That file defines a series of tests to run against the submission: some by static analysis of the student's submission, and some by executing the code in a sandboxed environment managed by CodeJail.
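
As a sketch of how such a payload might be turned into a file path, consider the helper below. The function name and the example paths are hypothetical, and the rejection of absolute paths and ".." segments is an extra safety check added for this illustration rather than documented XQWatcher behavior.

```python
import json
from pathlib import PurePosixPath


def resolve_grader_path(grader_payload, repo_root):
    """Return the grader file path named by a grader payload.

    grader_payload is JSON text with a "grader" key, as described above.
    Rejecting absolute paths and ".." segments is an assumption of this
    sketch, not documented behavior.
    """
    payload = json.loads(grader_payload)
    relative = PurePosixPath(payload["grader"])
    if relative.is_absolute() or ".." in relative.parts:
        raise ValueError("grader path must stay inside the grader repo")
    return str(PurePosixPath(repo_root) / relative)
```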

Code execution is managed by the JailedGrader class, which marshals together the student submission (saved as submission.py), the problem's grader, and harness code that facilitates the authoring of grader tests. The student submission is graded, and its results are compared to the results of running the same suite of tests against the course-team-authored answer.py.
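
The comparison step can be sketched as below, assuming each run produces a mapping from test name to captured output. The function name and the shape of the returned dictionary are assumptions made for this illustration and do not describe JailedGrader's actual output.

```python
def compare_results(expected, actual):
    """Compare per-test outputs of the exemplar and the submission.

    Both arguments map a test name to its captured output. The result
    shape (a list of per-test dicts plus an overall flag) is an
    assumption made for this sketch.
    """
    tests = []
    for name, expected_output in expected.items():
        actual_output = actual.get(name)
        tests.append({
            "test": name,
            "correct": actual_output == expected_output,
            "expected": expected_output,
            "actual": actual_output,
        })
    return {"correct": all(t["correct"] for t in tests), "tests": tests}
```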

Future state / wish list

LMS

  • Stop generating PDF certificates in favor of Web certificates. Easier said than done but this is absolutely the direction we need to go.

  • Decrease coupling: pull the LMS XQueue interface out of CAPA, or at least remove the code dependency between Certificates and CAPA.

  • Convert all hard-coded queue names in edx-platform to be configurable. You never know when this could become useful...

XQueue Core

  • Move away from the push grader model. XQueue should really only exist to handle asynchronous communication with pull graders. For push graders, synchronous communication makes more sense and could be done using a separate XBlock or an LTI tool. (This includes the MatlabInput use case as well, as that's effectively a push grader.)

    • UPDATE: Push grading has been deprecated.

  • Remove get_queuelen. I know it's nice to have a "peek" operation but it's not really that interesting. Even XQWatcher doesn't bother with it and instead just tries repeatedly to get_submission until successful.

  • Bulk get submissions for parallel grading (and bulk push back in). Jobs typically come in waves.

  • Switch from using a separate user store and session auth, to using the edx-platform authentication backend and signed JSON Web Tokens for authorization.

    • This would require that each queue is mapped to a whitelist of allowed producers and consumers (user authorization), and those clients would need to be appropriately scoped (app authorization).

    • Possible scopes (just brainstorming): "xqueue:submit" (basically just the LMS), "xqueue:score" (all grader clients). "read/write" and "pull/push" don't really make sense as all XQueue clients need to both add and remove from queues. If we want to keep it simple, we could just have an "xqueue" scope.

Task processors

  • Consider modifying XQWatcher to have a "slow" poll when the queue is empty, and a "fast" poll when the queue is probably not empty (as in, if the last poll was successful).

  • XQWatcher is more of an instance of a code grader than a code grader framework. A framework/client library may speed adoption of external graders.
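
The slow/fast polling idea in the first bullet above could be sketched as follows. The function name and the default intervals are hypothetical; this is a wish-list item, not existing XQWatcher behavior.

```python
def next_poll_delay(last_poll_found_work, fast=1.0, slow=10.0):
    """Return how many seconds to wait before the next poll.

    Poll quickly while the queue is probably non-empty (the last poll
    returned a submission) and slowly otherwise. The default intervals
    are placeholder values.
    """
    return fast if last_poll_found_work else slow


# Delays chosen after a sequence of poll outcomes (True = got a submission).
delays = [next_poll_delay(found) for found in (True, True, False, False, True)]
```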

Resources

Source code

Diagram source

DSL for the sequence diagrams:

Pull Flow

    title Pull Flow
    participant LMS
    participant XQueue
    participant XQWatcher
    XQWatcher-->XQueue: Poll for submissions
    note left of LMS: Initiate a request\n(asynchronously)
    LMS->XQueue: Submit request to queue
    XQWatcher-->XQueue: Poll for submissions
    XQWatcher<->XQueue: Get next submission
    note right of XQWatcher: Do some work\n(e.g. grade code)
    XQWatcher->XQueue: Push result back
    XQueue->LMS: Send result via callback
    note left of LMS: Display result
    XQWatcher-->XQueue: Poll for submissions

Push Flow

    title Push Flow
    participant LMS
    participant XQueue
    participant XServer
    note left of LMS: Initiate a request\n(asynchronously)
    LMS->XQueue: Submit request to queue
    XQueue->XServer: Submit to grader\n(blocking)
    note right of XServer: Do some work\n(e.g. grade code)
    XServer->XQueue: Respond with result
    XQueue->LMS: Send result via callback
    note left of LMS: Display result