Info |
---|
For more general architectural details: XQueue Architecture |
Development
So you need to develop/troubleshoot XQueue and its external grading? This page contains helpful details.
xqueue
A devstack xqueue Docker container is available, which supports the same development as other Python/Django IDAs.
To run xqueue in devstack, use: make dev.up.xqueue
xqueue-watcher
Local Development
Here are the steps required to get a functional xqueue-watcher running locally (without codejail):
Start up the xqueue container in devstack.
make dev.up.xqueue
Run xqueue migrations to create DB tables.
    make dev.shell.xqueue

Then, inside the container shell:

    source ../xqueue_env
    python manage.py migrate
Create a Django user to use for xqueue-watcher authentication into XQueue.
    make dev.shell.xqueue

Then, inside the container shell:

    source ../xqueue_env
    python manage.py shell

Then, at the Python prompt:

    >>> from django.contrib.auth.models import User
    >>> user = User.objects.create_user('lms', 'test@example.com', 'lms')
    >>> user.save()
Clone the xqueue-watcher repo.
git clone https://github.com/edx/xqueue-watcher.git
Create a Python 3.8 virtualenv.
Install the xqueue-watcher requirements.
pip install -r requirements/production.txt
Create a directory structure that mirrors the one below:
    .
    └── root
        ├── 600
        ├── config
        │   ├── conf.d
        │   │   └── 600.json
        │   └── logging.json
        └── xqueue-watcher
            ├── AUTHORS
            ├── LICENSE.TXT
            ├── Makefile
            ├── README.md
            ├── conf.d
            ├── coverage.xml
            ├── grader_support
            ├── load_test
            ├── openedx.yaml
            ├── requirements
            ├── setup.py
            ├── tests
            └── xqueue_watcher
Now, run the xqueue-watcher:
python -m xqueue_watcher -d ../config
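The non-repo part of the directory layout in the steps above could be scripted. The following is a minimal sketch, not part of xqueue-watcher: the helper name `make_watcher_skeleton` is hypothetical, `600` is assumed to be a directory, and the two JSON files are written as empty placeholders because this page does not specify their contents. The `xqueue-watcher` directory itself comes from the git clone step.

```python
import json
import tempfile
from pathlib import Path


def make_watcher_skeleton(base: Path) -> Path:
    """Create root/600 and root/config/... under base and return the root path.

    Hypothetical helper: the xqueue-watcher checkout is expected to be cloned
    into root/xqueue-watcher separately.
    """
    root = base / "root"
    # Assumption: 600 is a directory (the tree does not show its contents).
    (root / "600").mkdir(parents=True, exist_ok=True)
    conf_d = root / "config" / "conf.d"
    conf_d.mkdir(parents=True, exist_ok=True)
    # Placeholder contents only; real settings must be filled in by hand.
    for path in (conf_d / "600.json", root / "config" / "logging.json"):
        if not path.exists():
            path.write_text(json.dumps({}) + "\n")
    return root


if __name__ == "__main__":
    # Demonstrate in a throwaway directory and list what was created.
    with tempfile.TemporaryDirectory() as tmp:
        created_root = make_watcher_skeleton(Path(tmp))
        for p in sorted(created_root.rglob("*")):
            print(p.relative_to(created_root))
```

Existing files are left untouched, so the helper can be re-run after the configuration has been filled in.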
Troubleshooting
If the xqueue-watcher doesn’t connect successfully, look at its debug logs to see why. The logs are output to stdout when bringing up xqueue-watcher.
To view the xqueue logs and look for errors/activity, run this devstack make command:
make dev.logs.xqueue
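When the watcher cannot connect, it can help to try the login step on its own, outside xqueue-watcher. The sketch below is a rough illustration under assumptions that are not stated on this page: that devstack serves XQueue at `http://localhost:18040`, and that XQueue accepts a form-encoded POST of `username` and `password` at `/xqueue/login/` and replies with JSON. Verify both against your devstack and the xqueue repo before relying on it. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: devstack exposes XQueue on this host and port.
DEFAULT_SERVER = "http://localhost:18040"


def build_login_request(server: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the assumed login endpoint."""
    data = urllib.parse.urlencode({"username": username, "password": password}).encode()
    # Assumption: the login view lives at /xqueue/login/.
    url = server.rstrip("/") + "/xqueue/login/"
    return urllib.request.Request(url, data=data, method="POST")


def check_login(server: str, username: str, password: str, timeout: float = 5.0) -> dict:
    """Send the login request and return the decoded JSON reply.

    Raises urllib.error.URLError if the server cannot be reached and
    json.JSONDecodeError if the reply is not JSON.
    """
    request = build_login_request(server, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode())
```

With the xqueue container running, calling `check_login(DEFAULT_SERVER, "lms", "lms")` uses the Django user created in the setup steps. A connection error points at the container or port, while a reply that reports a failed login points at the credentials.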
Suggested Future Development Work
Developing graders or fixes to xqueue-watcher is not easy! Here are some suggested items to make things easier:
Provide a Docker devstack xqueue-watcher container
This container would be built with:
the same Python virtualenvs used in production
the same codejail environments (using AppArmor)
the ability to use arbitrary grading repos/code in the container
Make xqueue-watcher a Python module
Currently, several xqueue-watcher files are duplicated across grader repos - which is the very problem that Python modules are made to solve.
Have each grader install the xqueue-watcher Python module and derive its graders from the common code.
The module would include a script/method for testing all grader code written by a course team.
The script would accept a configuration file which specifies all tests to run.
Each test would specify the code to grade along with the expected results.
Define a clear best practice for writing graders.
Create an edX-written grader repo which demonstrates all the best practices.
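As an illustration of the grader test script suggested above, here is a minimal sketch. Nothing in it exists in xqueue-watcher today: the JSON config format, the `run_grader_tests` function, and the `toy_grader` stand-in are all hypothetical, and a real grader would return a richer result than a single `correct` flag.

```python
import json
from typing import Callable, Dict, List

# Hypothetical config format: a JSON list of cases, each holding the code to
# grade and the result the grader is expected to produce.
EXAMPLE_CONFIG = """
[
  {"name": "correct answer", "submission": "print('hello world')", "expected_correct": true},
  {"name": "wrong answer", "submission": "print('goodbye')", "expected_correct": false}
]
"""


def toy_grader(submission: str) -> Dict[str, object]:
    """Stand-in grader: marks a submission correct if it mentions 'hello world'."""
    return {"correct": "hello world" in submission}


def run_grader_tests(config_text: str, grade: Callable[[str], Dict[str, object]]) -> List[str]:
    """Run every case in the config and return the names of the cases that failed."""
    failures = []
    for case in json.loads(config_text):
        result = grade(case["submission"])
        if bool(result.get("correct")) != case["expected_correct"]:
            failures.append(case["name"])
    return failures


if __name__ == "__main__":
    failed = run_grader_tests(EXAMPLE_CONFIG, toy_grader)
    print("all cases passed" if not failed else f"failed cases: {failed}")
```

Passing the grader in as a callable keeps the runner independent of any one grader repo, which fits the goal of sharing common code across course teams.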
Operations
So how is xqueue/xqueue-watcher run in production?
XQueue
Production Environment
prod-edx
prod-edx uses an XQueue autoscaling group (ASG) backed by a single MySQL instance.
...
prod-edge
No separate XQueue resources are allocated for prod-edge. The prod-edx XQueue instance is used for all prod-edge courses which use external graders. The recent prod-edge submissions can be seen on the read-replica’s XQueue DB with this SQL:
select queue_name, lms_callback_url, count(*) from queue_submission where lms_callback_url not like 'https://courses.edx.org%' group by 1, 2 order by 1, 2;
stage-edx
There’s a stage ASG for XQueue used to test changes in the stage-edx environment. These tests only exercise a basic round-trip with minimal grading. There is currently no way to test course authors' grading code on stage-edx.
...
Deployment
XQueue is deployed using GoCD. Its pipelines are here:
https://gocd.tools.edx.org/go/pipelines#!/xqueue
Monitoring
New Relic is used to capture some XQueue performance data. High error rates and low Apdex scores trigger New Relic alerts that currently go to DevOps:
XQueue high queue depths are apparently alerted on here (I don’t have access):
XQueue queue depths are in CloudWatch and can be viewed here.
There are several different CloudWatch alerts associated with different course queue depths. Here’s an example one for MITx-6.00X.
These depths are sent to CloudWatch via a Jenkins job which runs a Django management command here: https://github.com/edx/xqueue/blob/780c758d5b080a4f3fde84da6e224fa98f4d21f7/submission_queue/management/commands/count_queued_submissions.py
The Jenkins job configuration section is here: https://github.com/edx/edx-internal/blob/master/ansible/vars/prod-edx.yml#L866-L884
Splunk can be used to search for XQueue requests/errors - with this query:
index=prod-edx source="/edx/var/log/supervisor/xqueue-stderr.log" <search_term>
Course Usage Statistics
A common question is “What courses currently have XQueue-graded problems?” There are a couple of ways to determine the answer.
DB Submissions Queues
Using the queries described below in the XQueue Database SQL, one can determine which submission queues are in existence and which ones have had recent submissions. However, XQueue submission queue names do not easily map back to course names, as the full course run key isn’t contained in each queue name. So this method doesn’t provide the full answer.
...
Coursegraph Queries
All of the prod-edx edx.org course data is scraped from the modulestore/MongoDB regularly and placed into a neo4j DB instance here: https://coursegraph.edx.org/browser/
...
https://docs.google.com/spreadsheets/d/1EEdk6FDGOi6MWZP5yeIEZHptEYMIeNVhsyuE-Lldugo/edit?usp=sharing
XQueue Database
The XQueue Django IDA has a single model - submission. The XQueue database can be viewed via the read replica using /edx/bin/prod-edx-xqueue-mysql-iam-auth.sh:
...
https://tools-edx-jenkins.edx.org/job/xqueue/job/prod-edx-delete_old_submissions/
Useful SQL
The single interesting non-Django table is queue_submission. Here are various queries to view the queued submissions:
    mysql> describe queue_submission;
    +------------------+---------------+------+-----+---------+----------------+
    | Field            | Type          | Null | Key | Default | Extra          |
    +------------------+---------------+------+-----+---------+----------------+
    | id               | int(11)       | NO   | PRI | NULL    | auto_increment |
    | requester_id     | varchar(128)  | NO   |     | NULL    |                |
    | queue_name       | varchar(128)  | NO   | MUL | NULL    |                |
    | xqueue_header    | varchar(1024) | NO   |     | NULL    |                |
    | xqueue_body      | longtext      | NO   |     | NULL    |                |
    | s3_keys          | varchar(1024) | NO   |     | NULL    |                |
    | s3_urls          | varchar(1024) | NO   |     | NULL    |                |
    | arrival_time     | datetime      | NO   |     | NULL    |                |
    | pull_time        | datetime      | YES  |     | NULL    |                |
    | push_time        | datetime      | YES  |     | NULL    |                |
    | return_time      | datetime      | YES  |     | NULL    |                |
    | grader_id        | varchar(128)  | NO   |     | NULL    |                |
    | pullkey          | varchar(128)  | NO   |     | NULL    |                |
    | grader_reply     | longtext      | NO   |     | NULL    |                |
    | num_failures     | int(11)       | NO   |     | NULL    |                |
    | lms_ack          | tinyint(1)    | NO   |     | NULL    |                |
    | lms_callback_url | varchar(128)  | NO   | MUL | NULL    |                |
    | retired          | tinyint(1)    | NO   | MUL | NULL    |                |
    +------------------+---------------+------+-----+---------+----------------+

    -- The DB submission queue length goes up and down over time.
    -- Completed submissions are marked as "retired" and deleted after a few weeks.
    mysql> select count(*) from queue_submission;
    +----------+
    | count(*) |
    +----------+
    |   102885 |
    +----------+

    -- Shows all the submission queues and how many submissions are in each queue.
    mysql> select queue_name, count(*) from queue_submission group by 1 order by 2 desc;

    -- Shows examples of S3 bucket locations where submissions have been stored.
    mysql> select queue_name, s3_keys, s3_urls from queue_submission where char_length(s3_keys) > 3 limit 10;
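To see the shape of the results these queries return without access to the read replica, the per-queue count and the prod-edge filter can be tried against a toy table. The sketch below is only an illustration: it uses an in-memory SQLite database rather than MySQL, keeps only the columns the queries need, and the rows and function names are made up.

```python
import sqlite3

# Toy stand-in for queue_submission with made-up rows. SQLite is used so the
# sketch runs anywhere; the production database is MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE queue_submission ("
    "id INTEGER PRIMARY KEY, queue_name TEXT, lms_callback_url TEXT)"
)
conn.executemany(
    "INSERT INTO queue_submission (queue_name, lms_callback_url) VALUES (?, ?)",
    [
        ("queue-a", "https://courses.edx.org/callback"),
        ("queue-a", "https://courses.edx.org/callback"),
        ("queue-b", "https://other.example.com/callback"),
    ],
)


def submissions_per_queue(db: sqlite3.Connection) -> list:
    """Per-queue submission counts, largest first."""
    return db.execute(
        "SELECT queue_name, COUNT(*) AS n FROM queue_submission "
        "GROUP BY queue_name ORDER BY n DESC, queue_name"
    ).fetchall()


def non_edx_org_submissions(db: sqlite3.Connection) -> list:
    """Counts per queue and callback URL, excluding courses.edx.org callbacks."""
    return db.execute(
        "SELECT queue_name, lms_callback_url, COUNT(*) FROM queue_submission "
        "WHERE lms_callback_url NOT LIKE 'https://courses.edx.org%' "
        "GROUP BY queue_name, lms_callback_url "
        "ORDER BY queue_name, lms_callback_url"
    ).fetchall()


if __name__ == "__main__":
    print(submissions_per_queue(conn))
    print(non_edx_org_submissions(conn))
```

The sketch spells out column names in GROUP BY and ORDER BY instead of the positional `group by 1` form used on this page; both select the same groups, and the named form is easier to read when a query is reused in scripts.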
xqueue-watcher
Production Environment
prod-edx
The EC2 instances running xqueue-watcher graders in production can be found here:
...
stage-edx
The EC2 instances running xqueue-watcher in stage-edx for simple smoke tests can be found here:
...
Deployment
xqueue-watcher is built and deployed using GoCD. Its pipelines are here:
...
Note |
---|
WARNING: Currently, whenever the stage xqueue-watcher is deployed via GoCD, the production xqueue-watcher is also built and deployed - without waiting for a verification step on stage-edx. |
MIT Grader Integration
Two MIT repositories are built into the xqueue-watcher AMIs that edX deploys:
...
NOTE: The release branch from each grader repository above is deployed. So any changes merged to the master branch should be merged into the release branch for deployment to production.
Stage XQWatcher Testing
A simple smoke-test of xqueue-watcher can be performed on stage-edx. To test xqueue-watcher, visit this course problem:
...
The second problem on the page (named pull grader debug: hello world) has the queue Watcher-MITx-6.00x set up for running the task. A successful submission to it (whether correct or incorrect) demonstrates that xqueue-watcher is operational, though it doesn’t check any problem-specific logic.
Prod XQWatcher Testing
A simple smoke-test of xqueue-watcher can also be performed on prod-edx. To test xqueue-watcher, visit this course problem:
...
As before, the second problem on the page (named pull grader debug: hello world) has the queue Watcher-MITx-6.00x set up for running the task. A successful submission to it (whether correct or incorrect) demonstrates that xqueue-watcher is operational, though it doesn’t check any problem-specific logic.
Monitoring
xqueue-watcher does not send data to New Relic.
xqueue-watcher logs are sent to Splunk. The easiest way to find them is to search for a specific queue name, such as:
MITx-6.00x
Database
xqueue-watcher does not use a database. Its state is maintained in-memory only.