Grades backfill execution plan


Pre-execution

  1. Land Cliff's PR: https://github.com/edx/edx-platform/pull/14925
  2. Ops – Upgrade rabbit servers.  OPS-1837 - Getting issue details... STATUS   OPS-1838 - Getting issue details... STATUS   OPS-1835 - Getting issue details... STATUS
  3. Create new celery queue / routing key in edx-internal TNL-6875 - Getting issue details... STATUS
    1. https://github.com/edx/edx-internal/pull/50
    2. Update jenkins job to reference --routing_key
  4. Enable persistent grades on edge. ( TNL-6626 - Getting issue details... STATUS

Backfill Grades

  1. Enable write_only_if_engaged waffle switch on all environments (prod, stage, edge, load test). - ticket
  2. Slow delete persistent grades tables (34 million rows in prod). ~5-8 hours expected - Devops ticket
  3. Export list of courses (per environment) from course overviews table on read replica or call Courses API.  (Note: Do not start this until tables have been deleted. - ticket

Stage TNL-6874 - Getting issue details... STATUS

  1. Delete persistent grades tables.
  2. Run against demo course, with 100 users per batch.
  3. Delete persistent grades tables again.
  4. Run against demo course, with 1000 users per batch.
  5. Run against all other courses on stage, using 1000 users per batch.
  6. Decide if we're happy with batching and worker count.

Edge  TNL-6876 - Getting issue details... STATUS

  1. Redo batch size testing, if stage didn't have enough data for useful metrics.
  2. Using batch size from batch size test here or or stage, run against 10 courses
  3. Run against 100 courses
  4. Run against all courses.

Production TNL-6877 - Getting issue details... STATUS

  1. Use batch size from edge
  2. Run against demo course
  3. Run against 10 courses
  4. Add extra celery worker
  5. Run against 10 courses
  6. Run against 100 courses
  7. Run against 200 more courses (optional)
  8. Run against all courses

Load test

  1. Can we set up a jenkins job against this environment, or should we just run from command line? Kevin Falcone (Deactivated)?
  2. Run against all courses when environment is idle.

Notes:

  • Use Jenkins management command for all test runs, setting course ids and batch size in config model.
  • Watch celery logs to ensure tasks are being picked up and run smoothly.
  • Watch new relic to make sure we aren't negatively affecting site performance.