...
- All prod mongos are currently 3.0
- Devstack is still running 2.6
- Odd that there is still a 2.6 package but no 3.0
- Kevin outlined an upgrade process as such:
- Testing in devstack
- Testing in Jenkins
- Testing the following steps on a load test cluster during a load test, then doing them on the prod cluster:
- Replace the hidden secondary with mongo 3.2
- Replace each secondary mongo with 3.2
- Fail over the primary to a secondary and replace with a 3.2 instance
- Repeat with 3.4 if desired
- Upgrade process is somewhat complicated by the size of the database (~1.3 TB, ~650 GB on disk)
- May need to do a pymongo upgrade to support upgrading to 3.2 or 3.4
- BMez tests on devstack showed no failures on a naive upgrade to 3.4 without a pymongo upgrade
- Doesn't mean it wouldn't break in other, more complicated environments
- Discussion of the potential value of trimming the data before moving
- Deleting orphaned nodes
- Fun problem, but complicated and potentially dangerous from a data loss and database performance perspective
- Some work has been done on this by Ed?
- Unknown priority
- Pruning course history to only keep X versions
- Some work may have been done?
- Unknown priority
- No SLA for number of versions to keep
- No user-facing tools even exist to roll back versions (management command occasionally used)
- Tools may get built soon, though, with new teams?
- Potentially irritating to users, potential to cause prod database problems
- Splitting forums to a separate cluster
- Only about 10% of the database size
- Expensive to run a 2nd cluster
- Maybe worth it if the alternative is changes to forums to support 3.2 or 3.4?
- Depending on the savings we can get in time of the upgrades it might be worth doing some of this work sooner
- 8 different ~1TB upgrades is a loooooong maintenance window (should be zero downtime, but a degraded-performance window; 11 hours per sync right now)
- Being in an asymmetric state during this window increases risk if something goes wrong over that time (week or more?)
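
The replacement order Kevin outlined (hidden secondary first, then each secondary, then the primary after a failover) and the sync-time arithmetic can be sketched roughly as below. This is only an illustration: the four-member topology, the member names, and the helper functions `rolling_upgrade_plan` and `estimated_sync_hours` are hypothetical, and nothing here talks to a real cluster. The 8 replacements and 11 hours per sync are the figures from the notes.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    role: str  # "primary", "secondary", or "hidden"


def rolling_upgrade_plan(members, target_version):
    """Order the steps: hidden secondaries, then secondaries, then the primary."""
    order = {"hidden": 0, "secondary": 1, "primary": 2}
    steps = []
    # sorted() is stable, so members with the same role keep their listed order.
    for m in sorted(members, key=lambda m: order[m.role]):
        if m.role == "primary":
            # The primary is never replaced in place; fail over first.
            steps.append(f"fail over {m.name} to an upgraded secondary")
        steps.append(f"replace {m.name} with a {target_version} instance")
    return steps


def estimated_sync_hours(num_replacements, hours_per_sync=11):
    """Total initial-sync time if replacements are done one at a time."""
    return num_replacements * hours_per_sync


# Hypothetical replica set; the real topology is not given in the notes.
members = [
    Member("mongo-a", "primary"),
    Member("mongo-b", "secondary"),
    Member("mongo-c", "secondary"),
    Member("mongo-d", "hidden"),
]

plan = rolling_upgrade_plan(members, "3.2")
for step in plan:
    print(step)

# 8 replacements at 11 hours per sync, as quoted in the notes.
print(estimated_sync_hours(8), "hours of sync time")
```

At those figures the syncs alone add up to 88 hours, before counting failovers or verification, which is why trimming the data first might pay for itself.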