[BD 19-20] Technical Plan (Haystack Replacement, Elasticsearch Upgrade)
Upgrade plans from the Elasticsearch Upgrade Task Force.
Goals
Upgrade the version of Elasticsearch used by Open edX to ES7 (the latest version).
We currently use ES1.5, which is now supported only by AWS.
The focus of this project will be the repos that do not have dependencies on Haystack.
Remove our dependencies on Haystack as a library, which is not compatible with ES7.
Also remove drf-haystack, our library for integrating Haystack into django-rest-framework.
We will consider this successful when Haystack is removed as a dependency from these repositories.
[BD-20] Haystack Replacement
Two of our repos, edx-notes-api and course-discovery, use django-haystack (django-haystack/django-haystack, modular search for Django) and drf-haystack (rhblind/drf-haystack, Haystack for Django REST Framework) to integrate with ES.
Haystack and DRF-Haystack are lagging behind on their ES support, and so we cannot move off of ES1/2 while still depending on them.
This project would involve replacing our Haystack dependencies in these two repos with django-elasticsearch-dsl-drf (barseghyanartur/django-elasticsearch-dsl-drf, which integrates Elasticsearch DSL with Django REST framework) and elasticsearch-dsl-py (elastic/elasticsearch-dsl-py, a high-level Python client for Elasticsearch).
A recommended approach to tackling this:
Start with edx-notes-api (openedx/edx-notes-api)
A smaller and simpler repo
Then move on to course-discovery (openedx/course-discovery), the service providing access to consolidated course and program metadata
Will benefit from the work done on edx-notes-api
[BD-19] Elasticsearch Upgrade
While the effort to remove Haystack is ongoing, we would like to also upgrade the other repos dependent on ES to the latest version of ES (as of right now, ES7).
These repos include:
edx-platform (edx/edx-platform)
edx-enterprise (openedx/edx-enterprise)
edx-search (openedx/edx-search)
cs_comments_service (edx/cs_comments_service)
edx-analytics-data-api (edx/edx-analytics-data-api)
From our initial research, none of these repos are being held back by dependency issues the way the Haystack repos are, but there may need to be significant updates to the code to ensure that they are still functional, since we are moving from ES1.5 to ES7.
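As one illustration of the kind of code update this may involve: ES5 split the `string` field type into `text` and `keyword`, and ES7 removed mapping types, so index mappings written for ES1.5 need rewriting. The sketch below is a minimal example under those assumptions, not code from any of these repos. The function names `convert_field` and `convert_mappings` and the sample `course` mapping are hypothetical.

```python
def convert_field(field):
    """Return an ES7-style copy of one ES1-style field (or object) definition."""
    converted = dict(field)
    if converted.get("type") == "string":
        index = converted.pop("index", None)
        if index == "not_analyzed":
            converted["type"] = "keyword"   # exact-match field
        elif index in (None, "analyzed"):
            converted["type"] = "text"      # full-text field
        else:
            # Other legacy settings are left for manual review.
            raise ValueError(f"unhandled legacy index setting: {index!r}")
    if "properties" in converted:
        # Recurse into nested object fields.
        converted["properties"] = {
            name: convert_field(sub)
            for name, sub in converted["properties"].items()
        }
    return converted


def convert_mappings(es1_body):
    """Drop the ES1 mapping-type level and convert the fields beneath it."""
    types = es1_body["mappings"]
    if len(types) != 1:
        raise ValueError("split multi-type indices before converting")
    (type_mapping,) = types.values()
    return {"mappings": convert_field(type_mapping)}


# Hypothetical ES1-style index body with a single mapping type, "course".
es1_body = {
    "mappings": {
        "course": {
            "properties": {
                "id": {"type": "string", "index": "not_analyzed"},
                "title": {"type": "string"},
            }
        }
    }
}
es7_body = convert_mappings(es1_body)
```

A helper like this covers only field types and the type level. Queries need similar attention: for example, the `filtered` query available in ES1 was removed in ES5 in favor of a `bool` query with a `filter` clause.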
A recommended ordering for tackling this work would be:
edx-analytics-data-api
This already uses elasticsearch-dsl and may be simpler to upgrade
Stuart Young has already attempted to move it onto ES5
edx-search
This is a low-level library that primarily indexes course content
It is used by edx-platform and edx-enterprise
edx-enterprise
This may “just work” once edx-search is upgraded
A library installed by edx-platform
edx-platform
This also may “just work” once edx-search is upgraded
cs_comments_service
Uses Ruby and Ruby libraries
Can be done in parallel to the work above.
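The dependency reasoning behind this ordering can be sketched as a small graph. This is only an illustration: the edges encode the two relationships stated above (edx-platform and edx-enterprise use edx-search, and edx-platform installs edx-enterprise), the variable names are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Maps each repo to the repos that should be upgraded before it.
upgrade_prerequisites = {
    "edx-analytics-data-api": set(),
    "edx-search": set(),
    "edx-enterprise": {"edx-search"},
    "edx-platform": {"edx-search", "edx-enterprise"},
    "cs_comments_service": set(),  # independent, so it can proceed in parallel
}

# static_order() yields every repo after all of its prerequisites.
upgrade_order = list(TopologicalSorter(upgrade_prerequisites).static_order())
print(upgrade_order)
```

Repos with no prerequisites, such as edx-analytics-data-api and cs_comments_service, may appear anywhere relative to each other; only the edx-search, edx-enterprise, edx-platform chain is constrained.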
[Completion] Elasticsearch & Haystack
To bring the above two projects to full completion, let’s ensure we do the following:
Upgrade all Analytics repos, including newly found ones (Analytics Pipeline), to ES7 (1 → 2 weeks)
Fix any issues that arise when edX deploys the upgrades for each service to edX stage and production; see Elasticsearch Ownership and Rollout ( ? )
Deprecate and remove all unneeded code that existed for the prior versions. (2 days → 2 weeks)