Upgrade RDS Databases from MySQL 5.6 to 5.7

This article describes the upgrade path and expectations for IDAs moving from MySQL 5.6 to 5.7.

1: This work was completed as of 2/2; this page has been left in place for context.

2: This page is public, but some of the details here pertain only to edX deployments, and many of the links are to protected pages.

Engineering Team Instructions

  1. Coordinate with product to determine an upgrade timeframe for devstack.

    1. Update the “Planned 5.7 DevStack Upgrade” column below.

    2. Please link any created tickets to https://openedx.atlassian.net/browse/OPS-4652

  2. Identify dependency changes

    1. If your app uses any Python database libraries, please update these.

  3. Upgrading Devstack

    1. Find your app’s docker-compose block(s) in

    2. Update your app’s container to point at mysql57 instead of mysql, and update any related make commands.
      (e.g. https://github.com/edx/devstack/blob/a76042b19aeb8fc8b02d93d3db4be783a54a0be1/docker-compose-marketing-site.yml#L15 )

    3. If possible, run e2e tests to see whether the change causes any issues.

    4. Merge your PR and notify engineering if any manual steps are needed to recreate data locally.
      (e.g. make dev.dbcopy57.<service> , see 10/6 “Discovery switching to mysql 5.7” email thread from Mike for examples)

    5. If manual steps are needed and your service is an Open edX service, please also post to Discourse (e.g. https://discuss.openedx.org/t/discovery-switching-to-mysql-5-7/3360) and make sure the Koa release planning notes reflect your changes.

    6. If you learned any lessons that would be valuable to other teams, please describe or link to them here.

  4. Upgrading Travis or CircleCI

    1. Make sure that any Travis or CircleCI tests that use MySQL run against a 5.7 container. (e.g. https://github.com/edx/xqueue/pull/783/files)

  5. Upgrading Sandboxes

    1. SRE will be upgrading Sandboxes before we upgrade stage with this PR: https://github.com/edx/configuration/pull/5695/files

    2. However, you may need to coordinate the timing of persistent sandbox deployments.
      (e.g. partner registrar sandboxes should get rebuilt after prod gets upgraded, int sandboxes get upgraded ahead of time)

  6. Be on the lookout for any performance impacts once SRE notifies you that stage and prod are being upgraded.

    1. Note: RDS Aurora MySQL Databases will likely require downtime, but RDS MySQL databases can be upgraded without downtime.

    2. If there are downtime constraints for prod, please let the SRE team know. We will be estimating the downtime and choosing which upgrade path makes the most sense and will be attempting to
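
As a rough aid for the devstack and CI steps above, the following Python sketch scans compose-style text for lines that still reference the old mysql service rather than mysql57. It is only a heuristic under stated assumptions: the function name, the sample compose text, and the rule of skipping `image:` lines are illustrative and are not taken from the devstack repository.

```python
import re

# Matches the bare name "mysql" but not "mysql57" or "mysqlclient".
OLD_MYSQL = re.compile(r"\bmysql(?!57)\b")


def find_old_mysql_references(compose_text):
    """Return (line_number, line) pairs that still mention the old `mysql` name."""
    hits = []
    for number, line in enumerate(compose_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip comments and image tags such as `image: mysql:5.7`, which name
        # the Docker image rather than the compose service (an assumption).
        if stripped.startswith("#") or stripped.startswith("image:"):
            continue
        if OLD_MYSQL.search(stripped):
            hits.append((number, stripped))
    return hits


# Hypothetical compose fragment, for illustration only.
SAMPLE = """\
services:
  myapp:
    depends_on:
      - mysql
    environment:
      DB_HOST: edx.devstack.mysql57
  mysql57:
    image: mysql:5.7
"""

if __name__ == "__main__":
    for number, line in find_old_mysql_references(SAMPLE):
        print(f"line {number}: {line}")
```

Run against a real compose or CI file, any reported line is a candidate for review, not a definite error; some references to `mysql` may be intentional while both containers coexist.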

 Significant Changes since 5.6

  • As of right now we only know of improvements. However, some libraries, such as django-mysql <1.0.6, do not support MySQL 5.7 and may need to be updated (edx-platform is on 3.0.0). We may also use some features of MySQL 5.6 that are no longer available in django-mysql or mysqlclient.
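
A minimal Python sketch of the version checks implied above, assuming version strings are available as plain text (for example from `SELECT VERSION()` on the server, or from `importlib.metadata.version("django-mysql")` in the app's environment). The helper names are hypothetical; only the 1.0.6 threshold comes from the note above.

```python
def parse_version(text):
    """Turn '5.7.31-log' or '1.0.6' into a tuple of ints, ignoring any suffix."""
    numbers = []
    for part in text.split("-")[0].split("."):
        if not part.isdigit():
            break  # stop at the first non-numeric component
        numbers.append(int(part))
    return tuple(numbers)


# From the note above: django-mysql releases before 1.0.6 do not support 5.7.
MIN_DJANGO_MYSQL = (1, 0, 6)


def django_mysql_supports_57(installed_version):
    """True if the given django-mysql version is at least 1.0.6."""
    return parse_version(installed_version) >= MIN_DJANGO_MYSQL


def is_mysql_57(server_version):
    """True if the server version string reports a 5.7.x release."""
    return parse_version(server_version)[:2] == (5, 7)


if __name__ == "__main__":
    print(django_mysql_supports_57("3.0.0"))
    print(is_mysql_57("5.6.48"))
```

Comparing tuples of integers avoids the string-comparison trap where "1.0.10" sorts before "1.0.6".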

Site Reliability Team Instructions

  • The SRE Team is tracking each stage, prod and edge database upgrade under this Jira epic:

  • As each of our services has different requirements (owners, data pipelines), each database upgrade may have specific steps required.

  • Each of the services in the diagram below (except Forums, Prospectus & Marketing) has a MySQL database that will be upgraded to Aurora 5.7. In addition to the service, our data pipelines also connect to these databases, so special care should be taken to ensure those pipelines and the downstream reports are not impacted by the upgrades.

  • At least the LMS, Ecommerce, Discovery, License Manager, Enterprise Catalog, Demographics and Credentials services also have DBT-generated Snowflake views described here. Please be sure to sync with DE about requirements before doing each cutover.

  • At least LMS and Ecommerce have Jenkins EMR and/or Swoop jobs. These jobs should be checked carefully before and after each respective cutover.

(lucid chart link)

The current proposed order for updating our databases is here (please keep this in sync with ):

  1. Notes (SRE) - Done on 10/23 (not on Aurora)

  2. Discovery & Ecommerce (Engagement) (not on Aurora)

    1. Discovery - Done on 10/29

    2. Ecommerce - Done on 11/17 @ 10pm

  3. Credentials and Demographics (Aperture)

    1. Credentials - Done on 11/17 @ 3pm

    2. Demographics - Done on 12/7 @ 10am

  4. Registrar and Portal Designer (Masters)

    1. Registrar Target - Done on 12/2 @ 10am

    2. Designer Target - Done 12/9 @ 10am

  5. Analytics API and Insights (DE)

    1. Prod Analytics API and Analytics Data - Done 12/17 @ 10am-12pm ET

    2. Edge Analytics - Done on 12/18 @ 10am-12:11pm

  6. XQueue (T&L)

    1. Move Stage xqueue schema from edxapp db to shared db - Done

    2. Prod Xqueue Target - Done on 1/5 @ 10am

  7. License Manager, Enterprise Catalog, Blockstore, Video Encode Manager (Enterprise, T&L, Incident Management)

    1. Shared Cluster - Done

  8. Platform (Arch + TNL):

    1. Stage edxapp - Done on 1/25

    2. Edge edxapp - Done on 1/27

    3. Prod csmhe & edxapp - Done on 2/2

Communications Plan

Note: These steps may be followed twice if we do a segmented upgrade to support TLS 1.2 first on 5.6.48 before going to 5.7 (e.g. for notes).

Stage:

  1. We will send a message to #dev, @'ing @sre-team and the technical owners listed below, for each stage database upgrade. If you would like to be on the cc list for a specific service, please add your name to the column.

    1. At the request of the service owner or an SRE team member, we will cross-post announcements to #warroom or other channels. (Note: we may also pause pipelines.)

  2. Once maintenance is complete and any pipelines are unpaused, we will post in the #dev announcement thread with “also send to channel” checked to let engineers know that their services should be back up and that they should notify us if they see anything unusual.

  3. At the service owner’s request, we will ask a specified engineer or channel to manually run smoke tests or trigger e2e tests.

Prod & Edge:

  1. We will coordinate any required maintenance windows with the business and technical owners for each service; these will be documented in the table below and the corresponding tickets.

    1. At the service owner’s request, we can also include impacted external contractors in this coordination (e.g. OpenCraft/LabXchange).

  2. At the service owner’s request, we will post a maintenance window on status.edx.org, which will usually be either 1pm-2pm or 1am-2am depending on which engineers need to be available (e.g. https://status.edx.org/incidents/v1n8lpl6sz13 - we will not be posting to the status page for edge notes).

  3. We will post an announcement in #warroom, @'ing @sre-team + @status + the service owners, at the beginning of the maintenance window.

    1. At the service owner’s request, we will cross-post #warroom announcements to other channels (e.g. #partner-support).

  4. Once maintenance is complete, we will perform an appropriate smoke test and update the status page.

  5. We will update the announcement thread with “also send to channel” checked to let engineers know that their services should be back up and that they should notify us if they see anything unusual.

Read Replicas:

  1. Read Replicas (e.g. for Platform, Ecommerce & Discovery) will be handled on a case by case basis.

Timeline

Service

Business and Technical Owners (see list here)

Current MySQL Versions (best effort)

In Open edX?

5.7 DevStack Upgrade

Due November 9th if in Open edX, December 15th if not.

5.7 Travis Upgrade

Due November 9th if in Open edX, December 15th if not.

5.7 Sandbox Upgrade

Planned 5.7 Stage Upgrade

Due December 15

Planned 5.7 Prod Upgrade (+ Any Read Replicas)

Due Jan 1

Planned 5.7 Edge Upgrade

Maintenance Mode / Smoke Test Docs

Platform (LMS + CMS) + csmhe

 

Arch -
B&T:@Nimisha Asthagiri (Deactivated)

cc @Jeremy Bowman (Deactivated)

Stage: 5.7
Prod: 5.7

Yes

- @Muhammad Arif (Deactivated)

Upgraded on 10/29

Tests run on Jenkins instead of Travis

Upgraded on 10/29

SRE will create a test sandbox once is done.

@Muhammad Nadeem Shahzad do you think we are ready to create a test sandbox? Or did we already do this after the edx-platform changes?

(dedicated aurora)
is done.

Upgraded on 01/26 @ 1 AM

 

CNAME: stage-edx-edxapp.rds.edx.org

Coordinate with Jeremy/Nimisha/Sarina/Stu/Feanil

Note: There are two databases.

(dedicated aurora)

dev by SRE, coordinate with Jeremy

Completed on 2/2

Coordinate with Jeremy/Nimisha/Sarina/Stu/Feanil

Completed on 1/27

Maintenance Mode: See Runbook.

Smoke Test:

cc @Sarina Canelake (Do Not Use) (Deactivated)

eCommerce

Engagement -
T:@Emma Green (Deactivated)
B:@Seth McCann

cc #revenue

Stage: 5.7
Prod: 5.7.31

Yes


- need review by @Emma Green (Deactivated) / @Ben Holt (Deactivated)

Travis File

, now ecom team helping via

(dedicated aurora)

Upgraded on 11/17 @ 11am

(dedicated aurora)

Upgraded on 11/17 @ 10pm

Not applicable (ecommerce not in Edge)

Maintenance Mode: 1: Disable ASG in Asgard
2: Alter the payment MFE page rule to point at something else (@Emma Green (Deactivated) would Purchase be able to put up a PR that we can merge and revert to put up a friendly maintenance page? possibly with this: )

Smoke Test: For stage SRE can run e2e tests, reach out to #revenue to have their team smoke test prod. cc @Emma Green (Deactivated)

Credentials

Aperture -
T:@Former user (Deleted)
B: @Ryan O'Connell
eSRE:@Matt Tuchfarber (Deactivated)
Oncall: @TJ Tracy (Deactivated)

Stage: 5.7
Prod: 5.7

Yes

Oct 6, 2020
@Matt Tuchfarber (Deactivated)

Oct 8, 2020

N/A

(shared aurora)

Completed on 10/20

(dedicated aurora)

Upgraded on 11/17 at 5pm

N/A

Maintenance Mode: Disable ASG in Asgard

Smoke Test: @Adam Blackwell (Deactivated) will reach out to @Matt Tuchfarber (Deactivated)

Demographics

Aperture -
T:@Former user (Deleted)
B: @Ryan O'Connell
eSRE:@Matt Tuchfarber (Deactivated)

Stage: 5.7
Prod: 5.7

No

Done as part of https://github.com/edx/demographics/pull/80/files


N/A

@Adam Blackwell (Deactivated) ask @Matt Tuchfarber (Deactivated) if demographics runs in sandbox.

(dedicated aurora)

Completed on 11/24

(dedicated aurora)
Upgraded on 12/7 at 10am

N/A

Maintenance Mode: Disable in ArgoCD

Smoke Test: @Adam Blackwell (Deactivated) will reach out to @Matt Tuchfarber (Deactivated)

Notes

SRE -
B&T:@Bill DeRusha (Deactivated)

Stage: 5.7
Prod: 5.7
Edge: 5.7

Yes

@Adam Blackwell (Deactivated)


Travis File

(shared aurora)

(dedicated mysql)

(we moved to 5.6.48 on 10/20 to unblock Ubuntu 20.04 work, this involved ~5m of reduced functionality)
5.7 upgrade completed on 10/23

(dedicated mysql)

(we want to move to 5.6.48 to unblock Ubuntu 20.04 work)
5.7 upgrade completed on 10/23

Maintenance Mode: We can’t currently put just notes in maintenance, but the LMS fails gracefully for users and we can use Cloudflare.

Smoke Test: Log into LMS, go to a course with notes enabled, add a note with a tag, check that it’s in the notes tab, delete a note.

Xqueue

T&L -
T:@Nimisha Asthagiri (Deactivated)
B:@Marco Morales (Deactivated)

@Jeremy Bowman (Deactivated)

Stage: 5.7
Prod: 5.7

Yes

Upgraded as of Oct 6, 2020

UPGRADED 11/19

Travis File

UPGRADED 11/19

 

(edxapp aurora)

(dedicated mysql)
TODO: Find or create ticket
TBD

 

Will be done by SRE team, need coordination with @Feanil Patel

N/A

Check with @Feanil Patel

Maintenance Mode: Disable ASG in Asgard

Smoke Test:
Try to do something that gets graded and look for it in flower

Registrar & Workers

Programs -
T:@Simon Chen
B: @Deen Abdul-Hathi (Deactivated)

Stage: 5.7
Prod: 5.7

No

Upgraded as of Sep 24, 2020

No Travis Docker Tests

(shared aurora)

(dedicated mysql)

Upgraded on 12/2 @ 10am

Endpoint for Stitch: prod-edx-registrar-003.cluster-ciqreuddjk02.us-east-1.rds.amazonaws.com

CNAME: prod-edx-registrar.rds.edx.org

N/A

Maintenance Mode: Disable in Asgard

Smoke Test: https://openedx.atlassian.net/wiki/x/bYCCeQ

Enterprise Catalog

Enterprise -
T:@Brittney Exline (Deactivated)
B: @Joe Cassaro
eSRE:@Brandon Baker (Deactivated)

Stage: 5.7
Prod: 5.7

Yes

Upgraded as of Sep 28, 2020

No Travis Docker Tests

(shared aurora)

Completed

(shared aurora)

Completed

N/A

Maintenance Mode: Disable in Asgard

Smoke Test: Ask @Brandon Baker (Deactivated)

License Manager

Enterprise -
T:@Brittney Exline (Deactivated)
B: @Joe Cassaro
eSRE:@Brandon Baker (Deactivated)

Stage: 5.7
Prod: 5.7

Yes, for now

Upgraded as of Sep 28, 2020

No Travis or CircleCI

 

(shared aurora)

Completed

(shared aurora)

Completed

N/A

Maintenance Mode: Disable in Asgard

Smoke Test:
Ask @Brandon Baker (Deactivated)

Discovery

Engage -
T: @Jason Myatt (Deactivated)
B: @Kaitlin Ahern (Deactivated)

cc #discovery

Stage: 5.7
Prod: 5.7

Yes

Upgraded as of Oct 6, 2020

Upgraded Travis as of Oct 6, 2020

(shared aurora)

(dedicated mysql)

Completed 10/29 @ 3pm

N/A

 

Portal Designer

Masters -
T:@Simon Chen
B:
eSRE: @Matt Hughes (Deactivated)?

Stage: 5.7
Prod: 5.7

No

Upgraded local compose file as of Sep 18, 2020
@Alison Langston

No Travis or CircleCI

(shared aurora)

(dedicated aurora)

Completed on 12/9 @ 10am

N/A

Maintenance Mode: Disable in Asgard

Smoke Tests:

Blockstore

T&L -
B:@Marco Morales (Deactivated)

Stage: 5.7
Prod: 5.7

Yes

Done 11/6

CircleCI file

Done 11/6

N/A

(shared cluster)

(shared cluster)

Completed

N/A

 

Analytics Pipeline
(may also include )

Data Engineering - T:@Brian Beggs

N/A

Yes

 

N/A

prod-edx-analyticsapi-data.rds.edx.org(pipeline001) & prod-edx-analyticsapi-data-readonly.rds.edx.org (not in use)

 

 

Analytics API

Data Engineering - T:@Brian Beggs

Stage: 5.7
Prod: 5.7
Edge: N/A

Yes

N/A (According to @Brian Beggs , insights and analytics-api do not have a devstack.)

Travis doesn’t use MySQL

Doesn’t have sandboxes.

(shared aurora)

No action items

(analytics aurora)
Upgraded on 12/17 @ 10am ET

CNAMES:
prod-edx-analyticsapi-django.rds.edx.org

(analytics mysql)

Upgraded on 12/18 @ 10am ET

prod-edge-analyticsapi.rds.edx.org

Maintenance Mode: @Brian Beggs will check with his team + Asgard

Smoke Test: Download button in Insights

Insights and/or Analytics Dashboard

DB Name:
analytics-report + analytics-report-replica-001

Data Engineering - T:@Brian Beggs

Stage: 5.7
Prod: 5.7
Edge: 5.7

Yes

N/A (According to @Brian Beggs , insights and analytics-api do not have a devstack.)

Travis doesn’t use MySQL

Doesn’t have sandboxes.

(shared aurora)

No action items

(analytics aurora)
Upgraded on 12/17 @ 10am ET

CNAMES:
prod-edx-analyticsapi-django.rds.edx.org, prod-edx-analyticsapi-data.rds.edx.org(reports001) & prod-edx-analyticsapi-data-readonly.rds.edx.org (not in use)

(analytics mysql)

(we moved to 5.6.48 on 10/21 which involved 7 hours of unexpected downtime)
Upgraded on 12/18 @ 10am ET

prod-edge-analyticsapi.rds.edx.org

Maintenance Mode: @Brian Beggs will check with his team + Asgard

Smoke Test: Download button in Insights

Notifier

ensure retirement



@Simon Chen , @Julie Davis (Deactivated) - repo ownership

@Jeremy Bowman (Deactivated) - for deprecation

N/A

No

Done 11/13

- closed in favor of

No Travis Docker or CircleCI Tests

N/A

N/A

 

Video Encode Manager

@Kashif Chaudhry (Unlicensed) , @Dawoud Sheraz

Stage: 5.7
Prod: 5.7

Yes

Not in DevStack:

N/A

N/A

(shared cluster)

(shared cluster)

Completed

N/A

@Kashif Chaudhry (Unlicensed) Are you aware of how SRE can smoketest this service after we complete this database upgrade?

 

License Manager

 

Stage: 5.7
Prod: 5.7

Yes

 

 

N/A

(shared cluster)

(shared cluster)

Completed

N/A

 

Taxonomy

ensure retirement

Website:
T:@Albemarle (Deactivated)

N/A

No

 

Will be retired before then -

 

Will be retired before then -

N/A

Needs to be removed from shared

Needs to be removed from shared

N/A

N/A

Enterprise Reporting

Markhors:
T:@Mike O'Connell (Deactivated)

Stage: 5.7
Prod: 5.7

No

N/A

N/A

N/A

stage-edx-enterprise-reporting

prod-edx-enterprise-reporting

N/A

N/A

- needs to be done by 11/9

Important Dates

  • MySQL 5.6 reaches end of life in February 2021

  • Open edX Juniper will ship with support for 5.6, but Open edX Koa will ship with support for 5.7

  • Application owners should plan to upgrade to 5.7 as part of the Koa release (December 9). Upgrades should be in place by Nov 9.

  • Some apps (like edxapp, ecommerce, discovery) may define MySQL URLs outside of their compose files
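
For apps that define MySQL URLs outside their compose files, a small Python sketch like the one below can help confirm which host a URL points at. It assumes the URL uses the common `mysql://user:password@host:port/dbname` shape and that the 5.7 container's host name ends in `mysql57`; both the helper names and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def mysql_host(database_url):
    """Return the host from a URL such as mysql://user:pw@host:3306/db, or None."""
    return urlsplit(database_url).hostname


def points_at_57(database_url, expected_suffix="mysql57"):
    """True if the URL's host ends with the (assumed) 5.7 host name suffix."""
    host = mysql_host(database_url)
    return host is not None and host.endswith(expected_suffix)


if __name__ == "__main__":
    # Example URLs are illustrative, not real credentials or endpoints.
    print(points_at_57("mysql://edxapp:secret@edx.devstack.mysql57:3306/edxapp"))
    print(points_at_57("mysql://edxapp:secret@edx.devstack.mysql:3306/edxapp"))
```

Settings that build the connection from separate host and port values would need a different check; this only covers the single-URL form.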

Out of Scope

The following services do not need to be upgraded within this project, or upgrade is not applicable to them.
