Grades backfill execution plan
Pre-execution
- Land Cliff's PR: https://github.com/edx/edx-platform/pull/14925
- Ops: Upgrade RabbitMQ servers (OPS-1837, OPS-1838, OPS-1835).
- Create a new Celery queue / routing key in edx-internal (TNL-6875)
- https://github.com/edx/edx-internal/pull/50
- Update the Jenkins job to pass --routing_key
- Enable persistent grades on edge (TNL-6626).
Backfill Grades
- Enable the write_only_if_engaged waffle switch on all environments (prod, stage, edge, load test). (ticket)
- Slowly delete the persistent grades tables (about 34 million rows in prod); expected to take 5-8 hours. (DevOps ticket)
- Export the list of courses for each environment, either from the course overviews table on the read replica or via the Courses API. Note: do not start this until the tables have been deleted. (ticket)
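For scale, 34 million rows in 5 to 8 hours works out to roughly 1,200 to 1,900 rows deleted per second. A "slow delete" usually means removing rows in small, bounded chunks with a pause between them, so no single statement holds locks for long and replicas can keep up. The sketch below shows that pattern under stated assumptions: it uses SQLite only so it runs standalone (prod is MySQL and the real delete is a DevOps task), the table name `persistent_grades_demo` is hypothetical, and it assumes the table has an integer primary key named `id`.

```python
import sqlite3
import time


def slow_delete(conn: sqlite3.Connection, table: str, chunk_size: int = 1000,
                pause_seconds: float = 0.0) -> int:
    """Delete all rows of `table` in chunks, pausing between chunks.

    `table` is interpolated into the SQL, so it must be a trusted,
    hard-coded name. Returns the total number of rows deleted.
    """
    total = 0
    while True:
        # Remove one bounded chunk so each statement holds locks only briefly.
        cur = conn.execute(
            f"DELETE FROM {table} WHERE id IN (SELECT id FROM {table} LIMIT ?)",
            (chunk_size,),
        )
        conn.commit()
        if cur.rowcount == 0:
            return total
        total += cur.rowcount
        # Back off so replicas and live traffic can keep up.
        time.sleep(pause_seconds)


if __name__ == "__main__":
    # Hypothetical table and data, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE persistent_grades_demo (id INTEGER PRIMARY KEY, grade REAL)")
    conn.executemany("INSERT INTO persistent_grades_demo (grade) VALUES (?)", [(0.5,)] * 2500)
    print(slow_delete(conn, "persistent_grades_demo"))  # 2500
```

Raising `pause_seconds` trades total runtime for lower load; the chunk size and pause would need tuning against the 5-8 hour estimate.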
Stage (TNL-6874)
- Delete persistent grades tables.
- Run against demo course, with 100 users per batch.
- Delete persistent grades tables again.
- Run against demo course, with 1000 users per batch.
- Run against all other courses on stage, using 1000 users per batch.
- Decide if we're happy with batching and worker count.
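To compare the 100-user and 1000-user runs, it helps to know how many tasks each batch size produces. The sketch below assumes, hypothetically, that each Celery task grades one contiguous slice of a course's enrolled users identified by an offset; the function names and enrollment counts are illustrative, not taken from edx-platform or from stage data.

```python
from typing import Dict, List, Tuple


def batch_offsets(total_users: int, batch_size: int) -> List[Tuple[int, int]]:
    """Return (offset, batch_size) pairs that together cover `total_users` users.

    Each pair stands for one hypothetical task that grades users
    [offset, offset + batch_size) of a course; the last slice may be short.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [(offset, batch_size) for offset in range(0, total_users, batch_size)]


def task_count(enrollments_by_course: Dict[str, int], batch_size: int) -> int:
    """Total tasks needed to backfill every course at the given batch size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Integer ceiling division: a partial batch still needs its own task.
    return sum((n + batch_size - 1) // batch_size for n in enrollments_by_course.values())


if __name__ == "__main__":
    # Hypothetical enrollment counts, not real stage data.
    enrollments = {"course-v1:DemoX+Demo+2017": 2500, "course-v1:DemoX+Other+2017": 100}
    print(task_count(enrollments, 100))   # 26
    print(task_count(enrollments, 1000))  # 4
```

Smaller batches mean more, shorter tasks (more queue overhead, finer-grained retries); larger batches mean fewer, longer tasks. Together with the worker count, that is the trade-off the stage runs are meant to settle.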
Edge (TNL-6876)
- Redo batch size testing, if stage didn't have enough data for useful metrics.
- Using the batch size from the batch size test here or on stage, run against 10 courses
- Run against 100 courses
- Run against all courses.
Production (TNL-6877)
- Use batch size from edge
- Run against demo course
- Run against 10 courses
- Add extra celery worker
- Run against 10 courses
- Run against 100 courses
- Run against 200 more courses (optional)
- Run against all courses
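The production steps above amount to splitting the exported course list into ordered waves: the demo course alone, then fixed-size waves, then everything left. The sketch below shows one way to do that split so no course is graded twice; the function name and course IDs are hypothetical, and the default wave sizes simply mirror the list above (10, 10, 100, then the optional 200).

```python
from typing import List, Sequence


def rollout_waves(
    course_ids: Sequence[str],
    demo_course_id: str,
    wave_sizes: Sequence[int] = (10, 10, 100, 200),
) -> List[List[str]]:
    """Split course IDs into ordered backfill waves.

    The demo course (if present) runs alone first, then one wave per entry in
    `wave_sizes`, then a final wave with every course that is left.
    Each course appears in exactly one wave.
    """
    remaining = [c for c in course_ids if c != demo_course_id]
    waves: List[List[str]] = []
    if demo_course_id in course_ids:
        waves.append([demo_course_id])
    for size in wave_sizes:
        if not remaining:
            break
        waves.append(remaining[:size])
        remaining = remaining[size:]
    if remaining:
        # "Run against all courses": everything not covered by an earlier wave.
        waves.append(remaining)
    return waves


if __name__ == "__main__":
    # Hypothetical course IDs for illustration only.
    courses = [f"course-v1:DemoX+C{i}+2017" for i in range(500)]
    waves = rollout_waves(courses, demo_course_id=courses[0])
    print([len(w) for w in waves])  # [1, 10, 10, 100, 200, 179]
```

Skipping the optional wave is just `wave_sizes=(10, 10, 100)`; the edge plan would be `wave_sizes=(10, 100)`.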
Load test
- Can we set up a Jenkins job against this environment, or should we just run from the command line? (Question for Kevin Falcone.)
- Run against all courses when environment is idle.
Notes:
- Use the Jenkins job for the management command for all test runs, setting course IDs and batch size in the config model.
- Watch the Celery logs to ensure tasks are being picked up and run smoothly.
- Watch New Relic to make sure we aren't negatively affecting site performance.