...

  • The SRE Team is tracking each stage, prod and edge database upgrade under this Jira epic: https://openedx.atlassian.net/browse/PSRE-301

  • Because each of our services has different requirements (owners, data pipelines), each database upgrade may require specific steps.

  • Each of the services in the diagram below (except Forums, Prospectus & Marketing) has a MySQL database that will be upgraded to Aurora 5.7. In addition to the service itself, our data pipelines also connect to these databases, so take special care to ensure that those pipelines and the downstream reports are not impacted by the upgrades.

  • At least the LMS, Ecommerce, Discovery, License Manager, Enterprise Catalog, Demographics and Credentials services also have DBT-generated Snowflake views, described here. Please be sure to sync with DE about requirements before doing each cutover.

  • At least LMS and Ecommerce have Jenkins EMR jobs that pull data into Vertica and/or Swoop jobs. These jobs should be checked carefully before and after each respective cutover.
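
One way to confirm after a cutover that every database the pipelines connect to is on the target version is sketched below. It is a minimal illustration, not the SRE team's tooling: the function names and the sample values are hypothetical, and it assumes the version strings have already been collected (for example by running SELECT VERSION() against each database).

```python
import re


def parse_major_minor(version_string):
    """Return (major, minor) from a MySQL version string such as '5.7.12-log'."""
    match = re.match(r"\s*(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognized MySQL version: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def databases_not_on_target(reported_versions, target=(5, 7)):
    """Return the sorted names of databases whose reported version is not the target."""
    return sorted(
        name
        for name, version in reported_versions.items()
        if parse_major_minor(version) != target
    )


if __name__ == "__main__":
    # Hypothetical values: in practice each string would come from the
    # database itself after its cutover.
    reported = {"notes": "5.7.12", "xqueue": "5.6.48-log"}
    print(databases_not_on_target(reported))
```

Only the leading major and minor numbers are compared, so patch levels and suffixes such as "-log" do not cause false alarms.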

...

  1. Notes (SRE) - Done on 10/23 (not on Aurora)

  2. Discovery & Ecommerce (Engagement) (not on Aurora)

    1. Discovery - Done on 10/29

    2. Ecommerce - Done on 11/17 @ 10pm

  3. Credentials and Demographics (Aperture)

    1. Credentials - Done on 11/17 @ 3pm

    2. Demographics - Done on 12/7 @ 10am

  4. Registrar and Portal Designer (Masters)

    1. Registrar Target - Done on 12/2 @ 10am

    2. Designer Target - Done 12/9 @ 10am

  5. Analytics API and Insights (DE)

    1. Prod Analytics API and Analytics Data - Done 12/17 @ 10am-12pm ET

    2. Edge Analytics - Done on 12/18 @ 10am-12:11pm

  6. XQueue (T&L)

    1. Move Stage xqueue schema from edxapp db to shared db - Done

    2. Prod Xqueue Target - Done on 1/5 @ 10am

  7. License Manager, Enterprise Catalog, Blockstore, Video Encode Manager (Enterprise, T&L, Incident Management)

    1. Shared Cluster Target - Done (was tentatively scheduled for between 1/14 and 1/26)

  8. Platform (Arch + TNL):

    1. Stage edxapp Target - Done on 1/25

    2. Edge edxapp Target - Done on 1/27

    3. Prod csmhe & edxapp Target - Done on 2/2

Communications Plan

Note: These steps may be followed twice if we do a segmented upgrade to support TLS 1.2 first on 5.6.48 before going to 5.7 (e.g., for Notes).
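
The segmented path in the note above can be sketched as a small planning helper: each hop it returns is one pass through the communication steps. This is an illustrative sketch only. The function names are hypothetical, and so is the assumption that a database already at or past 5.6.48 skips the intermediate hop.

```python
def version_tuple(version):
    """Turn '5.6.48' into (5, 6, 48) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def plan_hops(current, segmented, intermediate="5.6.48", final="5.7"):
    """List the versions to upgrade through, in order.

    Assumption (not from the plan itself): the intermediate hop is only
    needed when the database is still below the intermediate version.
    """
    hops = []
    if segmented and version_tuple(current) < version_tuple(intermediate):
        hops.append(intermediate)
    # Compare major.minor only, so any 5.7.x counts as already upgraded.
    if version_tuple(current)[:2] < version_tuple(final)[:2]:
        hops.append(final)
    return hops
```

For example, plan_hops("5.6.34", segmented=True) yields two hops, so the steps would be followed twice, while segmented=False yields a single hop straight to 5.7. The starting version 5.6.34 is a made-up value for illustration.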

...

Fred Smith (Deactivated) I just realized this row hasn’t been added. Do you know if any work needs to be done for License Manager?

Service

Business and Technical Owners (see list here)

Current MySQL Versions (best effort)

In Open edX?

5.7 DevStack Upgrade

Due November 9th if in Open edX, December 15th if not.

5.7 Travis Upgrade

Due November 9th if in Open edX, December 15th if not.

5.7 Sandbox Upgrade

Planned 5.7 Stage Upgrade

Due December 15

Planned 5.7 Prod Upgrade (+ Any Read Replicas)

Due Jan 1

Planned 5.7 Edge Upgrade

Maintenance Mode / Smoke Test Docs

Platform (LMS + CMS) + csmhe

Arch -
B&T:Nimisha Asthagiri (Deactivated)

cc Jeremy Bowman (Deactivated)

Stage: 5.7
Prod: 5.7

Yes

https://openedx.atlassian.net/browse/BOM-2059 - Muhammad Arif (Deactivated)

Upgraded on 10/29

Tests run on Jenkins instead of Travis

https://openedx.atlassian.net/browse/BOM-2059

Upgraded on 10/29

SRE will create a test sandbox once https://openedx.atlassian.net/browse/BOM-2059 is done.

Muhammad Nadeem Shahzad do you think we are ready to create a test sandbox? Or did we already do this after the edx-platform changes?

https://openedx.atlassian.net/browse/PSRE-309

(dedicated aurora)
https://openedx.atlassian.net/browse/PSRE-329 is done.

Upgraded on 01/26 @ 1 AM

CNAME: stage-edx-edxapp.rds.edx.org

Coordinate with Jeremy/Nimisha/Sarina/Stu/Feanil

Note: There are two databases.

(dedicated aurora)
https://openedx.atlassian.net/browse/PSRE-332
dev by SRE, coordinate with Jeremy

Completed on 2/2

Coordinate with Jeremy/Nimisha/Sarina/Stu/Feanil

Completed on 1/27

Maintenance Mode: See Runbook.

Smoke Test:

edxapp + csmhe Smoke Tests Run Book cc Sarina Canelake (Do Not Use) (Deactivated)

eCommerce

Engagement -
T:Emma Green (Deactivated)
B:Seth McCann

cc #revenue

Stage: 5.7
Prod: 5.7.31

Yes

https://github.com/edx/devstack/pull/639
https://github.com/edx/configuration/pull/6066 - need review by Emma Green (Deactivated) / Ben Holt (Deactivated)

https://openedx.atlassian.net/browse/PSRE-314

Travis File

https://github.com/edx/ecommerce/pull/3210

https://openedx.atlassian.net/browse/REV-1568, now ecom team helping via https://openedx.atlassian.net/browse/REV-1568

https://openedx.atlassian.net/browse/PSRE-309

(dedicated aurora)
https://openedx.atlassian.net/browse/PSRE-305
Upgraded on 11/17 @ 11am

(dedicated aurora)
https://openedx.atlassian.net/browse/PSRE-368
Upgraded on 11/17 @ 10pm

Not applicable (ecommerce not in Edge)

Maintenance Mode: 1: Disable ASG in Asgard
2: Alter the payment MFE page rule to point at something else (Emma Green (Deactivated) would Purchase be able to put up a PR that we can merge and revert to put up a friendly maintenance page? possibly with this: https://github.com/adinhodovic/terraform-cloudflare-maintenance )

Smoke Test: For stage, SRE can run e2e tests; reach out to #revenue to have their team smoke test prod. cc Emma Green (Deactivated)

Credentials

Aperture -
T:Former user (Deleted)
B: Ryan O'Connell
eSRE:Matt Tuchfarber (Deactivated)
Oncall: TJ Tracy (Deactivated)

Stage: 5.7
Prod: 5.7

Yes


Matt Tuchfarber (Deactivated)
https://openedx.atlassian.net/browse/MICROBA-546

https://openedx.atlassian.net/browse/MICROBA-546

N/A

(shared aurora)
https://openedx.atlassian.net/browse/PSRE-304
Completed on 10/20

(dedicated aurora)
https://openedx.atlassian.net/browse/PSRE-344?atlOrigin=eyJpIjoiOWM4Nzc0MmZiY2RiNDNjNGEzYWYyYWIzMGU3ZWMzMzMiLCJwIjoiamlyYS1zbGFjay1pbnQifQ
Upgraded on 11/17 at 5pm

N/A

Maintenance Mode: Disable ASG in Asgard

Smoke Test: Adam Blackwell (Deactivated) will reach out to Matt Tuchfarber (Deactivated)

Demographics

Aperture -
T:Former user (Deleted)
B: Ryan O'Connell
eSRE:Matt Tuchfarber (Deactivated)

Stage: 5.7
Prod: 5.7

No

Done as part of https://github.com/edx/demographics/pull/80/files

N/A

Adam Blackwell (Deactivated) ask Matt Tuchfarber (Deactivated) if demographics runs in sandbox.

(dedicated aurora)
https://openedx.atlassian.net/browse/PSRE-410
Completed on 11/24

(dedicated aurora)
https://openedx.atlassian.net/browse/PSRE-412
Upgraded on 12/7 at 10am

N/A

Maintenance Mode: Disable in ArgoCD

Smoke Test: Adam Blackwell (Deactivated) will reach out to Matt Tuchfarber (Deactivated)

Notes

SRE -
B&T:Bill DeRusha (Deactivated)

Stage: 5.7
Prod: 5.7
Edge: 5.7

Yes

Adam Blackwell (Deactivated)

https://github.com/edx/devstack/pull/629
https://openedx.atlassian.net/browse/PSRE-311

Travis File
https://github.com/edx/edx-notes-api/pull/213
https://openedx.atlassian.net/browse/PSRE-311

https://openedx.atlassian.net/browse/PSRE-309

(shared aurora)
https://openedx.atlassian.net/browse/PSRE-304

(dedicated mysql)
https://openedx.atlassian.net/browse/PSRE-333
(we moved to 5.6.48 on 10/20 to unblock Ubuntu 20.04 work; this involved ~5m of reduced functionality)
5.7 upgrade completed on 10/23

(dedicated mysql)
https://openedx.atlassian.net/browse/PSRE-333
(we want to move to 5.6.48 to unblock Ubuntu 20.04 work)
5.7 upgrade completed on 10/23

Maintenance Mode: We can’t currently put just notes in maintenance, but the LMS fails gracefully for users and we can use Cloudflare.

Smoke Test: Log into LMS, go to a course with notes enabled, add a note with a tag, check that it’s in the notes tab, delete a note.

Xqueue

T&L -
T:Nimisha Asthagiri (Deactivated)
B:Marco Morales (Deactivated)

Jeremy Bowman (Deactivated)

Stage: 5.7
Prod: 5.7

Yes

Upgraded as of https://openedx.atlassian.net/browse/PSRE-322

UPGRADED 11/19

Travis File

https://openedx.atlassian.net/browse/PSRE-322 https://github.com/edx/xqueue/pull/783

UPGRADED 11/19

(edxapp aurora)
https://openedx.atlassian.net/browse/PSRE-329

(dedicated mysql)
TODO: Find or create ticket
TBD

Will be done by SRE team, need coordination with Feanil Patel

N/A

Check with Feanil Patel

Maintenance Mode: Disable ASG in Asgard

Smoke Test:
Submit something that gets graded and look for the task in Flower

Registrar & Workers

Programs -
T:Simon Chen
B: Deen Abdul-Hathi (Deactivated)

Stage: 5.7
Prod: 5.7

No

Upgraded as of

https://openedx.atlassian.net/browse/MST-389

No Travis Docker Tests

https://openedx.atlassian.net/browse/PSRE-309

(shared aurora)
https://openedx.atlassian.net/browse/PSRE-304

(dedicated mysql)
https://openedx.atlassian.net/browse/PSRE-345
Upgraded on 12/2 @ 10am

Endpoint for Stitch: prod-edx-registrar-003.cluster-ciqreuddjk02.us-east-1.rds.amazonaws.com

CNAME: prod-edx-registrar.rds.edx.org

N/A

Maintenance Mode: Disable in Asgard

Smoke Test: https://openedx.atlassian.net/wiki/x/bYCCeQ

Enterprise Catalog

Enterprise -
T:Brittney Exline (Deactivated)
B: Joe Cassaro
eSRE:Brandon Baker (Deactivated)

Stage: 5.7
Prod: 5.7

Yes

Upgraded as of

https://openedx.atlassian.net/browse/ENT-3316

No Travis Docker Tests

https://openedx.atlassian.net/browse/PSRE-309

(shared aurora)
https://openedx.atlassian.net/browse/PSRE-304
Completed

(shared aurora)
https://openedx.atlassian.net/browse/PSRE-308
Completed

N/A

Maintenance Mode: Disable in Asgard

Smoke Test: Ask Brandon Baker (Deactivated)

License Manager

Enterprise -
T:Brittney Exline (Deactivated)
B: Joe Cassaro
eSRE:Brandon Baker (Deactivated)

Stage: 5.7
Prod: 5.7

Yes, for now

Upgraded as of

https://openedx.atlassian.net/browse/ENT-3317

No Travis or CircleCI

(shared aurora)
https://openedx.atlassian.net/browse/PSRE-304
Completed

(shared aurora)
https://openedx.atlassian.net/browse/PSRE-308
Completed

N/A

Maintenance Mode: Disable in Asgard

Smoke Test:
Ask Brandon Baker (Deactivated)

Discovery

Engage -
T: Jason Myatt (Deactivated)
B: Kaitlin Ahern (Deactivated)

cc #discovery

Stage: 5.7
Prod: 5.7

Yes

Upgraded as of

https://openedx.atlassian.net/browse/DISCO-1675

Upgraded Travis as of

https://openedx.atlassian.net/browse/PSRE-309

(shared aurora)
https://openedx.atlassian.net/browse/PSRE-304

(dedicated mysql)
https://openedx.atlassian.net/browse/PSRE-336
Completed 10/29 @ 3pm

N/A

Portal Designer

Masters -
T:Simon Chen
B:
eSRE: Matt Hughes (Deactivated)?

Stage: 5.7
Prod: 5.7

No

Upgraded local compose file as of
Alison Langston
https://openedx.atlassian.net/browse/MST-390

No Travis or CircleCI

https://openedx.atlassian.net/browse/PSRE-309

(shared aurora)
https://openedx.atlassian.net/browse/PSRE-304

(dedicated aurora)
https://openedx.atlassian.net/browse/PSRE-413
Completed on 12/9 @ 10am

N/A

Maintenance Mode: Disable in Asgard

Smoke Tests: Designer Smoke Tests Run Book

Blockstore

T&L -
B:Marco Morales (Deactivated)

Stage: 5.7
Prod: 5.7

Yes

https://openedx.atlassian.net/browse/TNL-7642

https://openedx.atlassian.net/browse/OSPR-5101

Done 11/6

CircleCI file

https://openedx.atlassian.net/browse/TNL-7642 Done 11/6

https://openedx.atlassian.net/browse/OSPR-5101

N/A

(shared cluster)

(shared cluster)
https://openedx.atlassian.net/browse/PSRE-307
Completed

N/A

Analytics Pipeline
(may also include https://github.com/edx/edx-analytics-configuration )

Data Engineering - T:Brian Beggs

N/A

Yes

https://openedx.atlassian.net/browse/DEPR-119

https://openedx.atlassian.net/browse/DEPR-119

N/A

prod-edx-analyticsapi-data.rds.edx.org (pipeline001) & prod-edx-analyticsapi-data-readonly.rds.edx.org (not in use)

Analytics API

Data Engineering - T:Brian Beggs

Stage: 5.7
Prod: 5.7
Edge: N/A

Yes

N/A (According to Brian Beggs , insights and analytics-api do not have a devstack.)

Travis doesn’t use MySQL

Doesn’t have sandboxes.

(shared aurora) https://openedx.atlassian.net/browse/PSRE-304

No action items

(analytics aurora)
https://openedx.atlassian.net/browse/PSRE-335
Upgraded on 12/17 @ 10am ET

CNAMES:
prod-edx-analyticsapi-django.rds.edx.org

(analytics mysql)
https://openedx.atlassian.net/browse/PSRE-408
Upgraded on 12/18 @ 10am ET

prod-edge-analyticsapi.rds.edx.org

Maintenance Mode: Brian Beggs will check with his team + Asgard

Smoke Test: Download button in Insights

Insights and/or Analytics Dashboard

DB Name:
analytics-report + analytics-report-replica-001

Data Engineering - T:Brian Beggs

Stage: 5.7
Prod: 5.7
Edge: 5.7

Yes

N/A (According to Brian Beggs , insights and analytics-api do not have a devstack.)

Travis doesn’t use MySQL

Doesn’t have sandboxes.

(shared aurora) https://openedx.atlassian.net/browse/PSRE-304

No action items

(analytics aurora)
https://openedx.atlassian.net/browse/PSRE-335 Upgraded on 12/17 @ 10am ET

CNAMES:
prod-edx-analyticsapi-django.rds.edx.org, prod-edx-analyticsapi-data.rds.edx.org (reports001) & prod-edx-analyticsapi-data-readonly.rds.edx.org (not in use)

(analytics mysql)
https://openedx.atlassian.net/browse/PSRE-408
(we moved to 5.6.48 on 10/21 which involved 7 hours of unexpected downtime)
Upgraded on 12/18 @ 10am ET

prod-edge-analyticsapi.rds.edx.org

Maintenance Mode: Brian Beggs will check with his team + Asgard

Smoke Test: Download button in Insights

Notifier

ensure retirement


Simon Chen , Julie Davis (Deactivated) - repo ownership

Jeremy Bowman (Deactivated) - for deprecation

N/A

No

Done 11/13

https://openedx.atlassian.net/browse/EDUCATOR-5378 - closed in favor of https://openedx.atlassian.net/browse/DEPR-106?searchSessionId=4511027c-eb42-4c57-b290-aaec5aaf03fa&searchObjectId=186639&searchContainerId=17023&searchContentType=issue

No Travis Docker or CircleCI Tests

N/A

https://openedx.atlassian.net/browse/DEPR-106?searchSessionId=4511027c-eb42-4c57-b290-aaec5aaf03fa&searchObjectId=186639&searchContainerId=17023&searchContentType=issue

https://openedx.atlassian.net/browse/DEPR-106?searchSessionId=4511027c-eb42-4c57-b290-aaec5aaf03fa&searchObjectId=186639&searchContainerId=17023&searchContentType=issue

N/A

Video Encode Manager

Kashif Chaudhry (Unlicensed) , Dawoud Sheraz

Stage: 5.7
Prod: 5.7

Yes

Not in DevStack: https://openedx.atlassian.net/browse/PSRE-371

N/A

N/A

(shared cluster)

(shared cluster)
https://openedx.atlassian.net/browse/PSRE-307

Completed

N/A

Kashif Chaudhry (Unlicensed) Are you aware of how SRE can smoketest this service after we complete this database upgrade?

VEM Smoke Tests

License Manager

Stage: 5.7
Prod: 5.7

Yes

N/A

(shared cluster)

(shared cluster)
https://openedx.atlassian.net/browse/PSRE-307

Completed

N/A

Taxonomy

ensure retirement

Website:
T:Albemarle (Deactivated)

N/A

No

Due December 15

Will be retired before then - https://openedx.atlassian.net/browse/WS-1413
https://github.com/edx/taxonomy/pull/11

Due December 15

Will be retired before then - https://openedx.atlassian.net/browse/WS-1413

N/A

Needs to be removed from shared

Needs to be removed from shared

N/A

N/A

Enterprise Reporting

Markhors:
T:Mike O'Connell (Deactivated)

Stage: 5.7
Prod: 5.7

No

N/A

N/A

N/A

stage-edx-enterprise-reporting

prod-edx-enterprise-reporting

N/A

N/A
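
Several rows above list a CNAME next to the cluster endpoint it should point at after the upgrade. A quick post-cutover check could compare what the two names resolve to, as in the sketch below. The function names are hypothetical, and comparing resolved IPv4 addresses is an assumed proxy for "the CNAME points at the new cluster", not a documented SRE procedure.

```python
import socket


def resolved_addresses(hostname):
    """Return the set of IPv4 addresses that hostname currently resolves to."""
    _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    return set(addresses)


def resolves_to_same_hosts(cname, expected_endpoint, resolve=resolved_addresses):
    """Return True if both names resolve to the same non-empty address set.

    `resolve` is injectable so the check can be exercised without DNS access.
    """
    cname_addresses = set(resolve(cname))
    endpoint_addresses = set(resolve(expected_endpoint))
    return bool(cname_addresses) and cname_addresses == endpoint_addresses
```

For the Registrar row, this would be called with prod-edx-registrar.rds.edx.org and the Stitch endpoint prod-edx-registrar-003.cluster-ciqreuddjk02.us-east-1.rds.amazonaws.com from a host that can resolve both names.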

...