Runbooks from the ES1 migration
Generalized Strategy
These steps will have to be done twice in most cases: first on stage, then on prod, so that we can uncover any major infrastructural issues.
0. Create/Find a Test Bed on Stage and Prod
Test before upgrade
After upgrade
After re-index
1. Create ES7 clusters in terraform
An example PR: https://github.com/edx/terraform/pull/2830
2. Create a new setting for the ES7 configuration
Set this setting to point to the new clusters
3. Spin up new instances of app with ES7-compatible code
This is a manual process
eSREs will rely on SRE for assistance with how to do this
4. Index ES data from ES7-compatible instance
Each app has its own method for indexing data; this will need to be discussed with eSRE during setup and implementation
Understand complications based on deploy time (e.g. missing notes that were added between indexing and deploy)
5. Merge ES7 code and configuration changes
6. Deploy the newly built changes using GoCD
Communicate schedule to #support ahead of time.
7. Re-index ES7 to include any writes that were missed
8. Clean up settings and clusters
Remove old settings
Remove old clusters from terraform
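Steps 4 and 7 hinge on the window between the start of the first index run and the deploy: anything written in that window is missing from ES7 until the re-index. A minimal sketch of that reasoning, assuming each record carries a last-modified timestamp. The `Record` type, its field names, and the dates are hypothetical and not taken from any of the apps.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Record:
    # Hypothetical stand-in for an app model row (e.g. a note).
    id: int
    updated: datetime


def missed_writes(records, index_started_at):
    """Return records changed at or after the start of the first ES7 index run.

    These are the rows the step 7 re-index has to pick up.
    """
    return [r for r in records if r.updated >= index_started_at]


# Arbitrary illustrative timestamps.
index_started_at = datetime(2000, 1, 1, 14, 0)
records = [
    Record(1, datetime(2000, 1, 1, 13, 30)),  # already covered by the step 4 index run
    Record(2, datetime(2000, 1, 1, 14, 45)),  # written during the window
]
print([r.id for r in missed_writes(records, index_started_at)])
```

A full re-index, as in step 7, covers this window without needing timestamps; the sketch only shows why the second pass is necessary.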
Specific App Strategies
edx-notes
Plan deploy for week of Sept 7.
Does not have remote config, due to Kubernetes
Can’t send out config synchronously with image
Set different variables for different clusters
Different code uses different variables
Can we hide search from users temporarily?
Action Items:
Diana Huang - try to get this working in devstack
Try adding new ELASTICSEARCH_DSL in the yaml; clean up HAYSTACK_CONNECTIONS and ELASTICSEARCH_URL after we confirm that the deploy goes smoothly
Diana Huang - schedule time next week to try to deploy to stage
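Keeping the old and new variables side by side until the deploy is confirmed could look roughly like the sketch below. It is not the edx-notes configuration: the `ELASTICSEARCH_DSL` shape follows the django-elasticsearch-dsl convention of a `default` connection with `hosts`, while `build_search_settings`, the `config` dict, and the hostnames are hypothetical placeholders for whatever the yaml provides.

```python
def build_search_settings(config):
    """Build search settings from a dict loaded from the app's yaml.

    Old and new settings are both emitted until the deploy is confirmed,
    so ES1-era code and ES7-era code can each read the variable they expect.
    """
    settings = {}
    if "ELASTICSEARCH_DSL" in config:
        # New ES7 setting (django-elasticsearch-dsl style).
        settings["ELASTICSEARCH_DSL"] = config["ELASTICSEARCH_DSL"]
    # Legacy settings: remove once the ES7 deploy has gone smoothly.
    for legacy_key in ("HAYSTACK_CONNECTIONS", "ELASTICSEARCH_URL"):
        if legacy_key in config:
            settings[legacy_key] = config[legacy_key]
    return settings


# Placeholder values, not real cluster addresses.
config = {
    "ELASTICSEARCH_DSL": {"default": {"hosts": "es7.example.internal:9200"}},
    "ELASTICSEARCH_URL": "http://es1.example.internal:9200",
}
search_settings = build_search_settings(config)
print(sorted(search_settings))
```

Because the image and its config cannot be shipped in lockstep, having both sets of variables present means either version of the code finds the setting it reads.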
course-discovery
Instead of reusing the existing ELASTICSEARCH_URL var for the URL of the new cluster, we’ll add a new one, ELASTICSEARCH_CLUSTER_URL, to make the cutover easier and less error-prone.
Runbook
Merge remote-config PR
Merge course-discovery PR
Deploy to stage.
After Deploy
manage.py update_index --disable-change-limit
Test on stage.
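The point of a separate ELASTICSEARCH_CLUSTER_URL is that the old code path never silently picks up the ES7 cluster, and vice versa. A sketch of how settings might read the two variables, assuming they arrive as environment-style key/value pairs. The `load_search_urls` helper and the hostnames are hypothetical and not part of course-discovery.

```python
def load_search_urls(env):
    """Return (old_url, new_url) without letting one fall back to the other."""
    old_url = env.get("ELASTICSEARCH_URL")
    new_url = env.get("ELASTICSEARCH_CLUSTER_URL")
    if new_url is not None and new_url == old_url:
        # Catch a copy-paste error before any indexing runs against the wrong cluster.
        raise ValueError(
            "ELASTICSEARCH_CLUSTER_URL must point at the new cluster, not the old one"
        )
    return old_url, new_url


# Placeholder values, not real cluster addresses.
env = {
    "ELASTICSEARCH_URL": "http://es1.example.internal:9200",
    "ELASTICSEARCH_CLUSTER_URL": "http://es7.example.internal:9200",
}
old_url, new_url = load_search_urls(env)
print(old_url, new_url)
```

With both values present during the cutover, rolling back only requires redeploying the old code; no config change is needed.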
edx-platform
Pause prod pipeline
Merge changes to master
Have revert PR available
Do testing in stage
If we feel confident with stage, unpause prod pipeline
Prevent the prod/edge deploy from going live - new instances are not added to the ELB
Index data - prod/edge
This needs to be discovered and investigated - is it a checkbox? - Fred Smith
Checkbox - switch deploy_ami step from 'On Success' to 'Manual'
Manually set up ASGs for
Diana Huang - schedule a 4 hour window for attempting to deploy