[BD 19-20] Technical Plan (Haystack Replacement, Elasticsearch Upgrade)

Upgrade plans from the Elasticsearch Upgrade Task Force.

Goals

Upgrade the version of Elasticsearch used by Open edX to ES7 (the latest version).

  • The version of ES we use today is ES1.5, which is now supported only by AWS.

  • The focus of this project will be the repos that do not have dependencies on Haystack.

Remove our dependencies on Haystack as a library, which is not compatible with ES7.

  • Also remove drf-haystack, our library for integrating Haystack into django-rest-framework.

  • We will consider this successful when Haystack is removed as a dependency from these repositories.
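The success criterion above can be checked mechanically. The following is a minimal sketch, assuming the dependencies are pinned in pip-style requirements files; the helper name `find_haystack_requirements` is hypothetical and not part of any existing tooling.

```python
import re
from typing import List

# Package names taken from the goals above.
HAYSTACK_PACKAGES = {"django-haystack", "drf-haystack"}


def normalize(name: str) -> str:
    """Normalize a distribution name: lowercase, runs of '-', '_', '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_haystack_requirements(requirements_text: str) -> List[str]:
    """Return the Haystack-related packages still listed in a requirements file."""
    found = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match and normalize(match.group(0)) in HAYSTACK_PACKAGES:
            found.append(normalize(match.group(0)))
    return found
```

An empty result for every requirements file in a repo would indicate that the Haystack dependency has been removed there.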

[BD-20] Haystack Replacement

Two of our repos, edx-notes-api and course-discovery, use django-haystack (django-haystack/django-haystack on GitHub: modular search for Django) and drf-haystack (rhblind/drf-haystack on GitHub: Haystack for Django REST Framework) to integrate with ES.

Haystack and DRF-Haystack are lagging behind on their ES support, and so we cannot move off of ES1/2 while still depending on them.

This project would involve replacing our Haystack dependencies in these two repos with django-elasticsearch-dsl-drf (barseghyanartur/django-elasticsearch-dsl-drf on GitHub: integrates Elasticsearch DSL with Django REST framework) and elasticsearch-dsl-py (elastic/elasticsearch-dsl-py on GitHub: high-level Python client for Elasticsearch).

A recommended approach to tackling this:

  1. Start with edx-notes-api (openedx/edx-notes-api on GitHub)

    1. A smaller and simpler repo

  2. Then move on to course-discovery (openedx/course-discovery on GitHub: service providing access to consolidated course and program metadata)

    1. Will benefit from the work done on notes
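As a rough illustration of the replacement described above, the Django settings for a service might change along these lines. This is a sketch under assumptions: the host, URL, and index name are placeholder values, and it assumes django-elasticsearch-dsl (the package that django-elasticsearch-dsl-drf builds on) supplies the connection settings. The actual settings in edx-notes-api and course-discovery may differ.

```python
# Before: Haystack connection settings in a Django settings module.
# "notes_index" and the URL are placeholder values, not taken from either repo.
HAYSTACK_CONNECTIONS = {
    "default": {
        "ENGINE": "haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine",
        "URL": "http://localhost:9200/",
        "INDEX_NAME": "notes_index",
    },
}
INSTALLED_APPS_BEFORE = ["rest_framework", "haystack"]

# After: connection settings read by django-elasticsearch-dsl.
# The host is again a placeholder.
ELASTICSEARCH_DSL = {
    "default": {"hosts": "localhost:9200"},
}
INSTALLED_APPS_AFTER = [
    "rest_framework",
    "django_elasticsearch_dsl",      # Django integration for elasticsearch-dsl
    "django_elasticsearch_dsl_drf",  # DRF integration built on top of it
]
```

Beyond settings, each Haystack search index and drf-haystack serializer or viewset would also need a counterpart in the new libraries; that part is repo-specific and not sketched here.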

[BD-19] Elasticsearch Upgrade

While the effort to remove Haystack is ongoing, we would like to also upgrade the other repos dependent on ES to the latest version of ES (as of right now, ES7).

These repos include those listed in the recommended ordering below.

From our initial research, none of these repos are being held back by dependency issues the way the Haystack repos are, but there may need to be significant updates to the code to ensure that they are still functional, since we are moving from ES1.5 to ES7.
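One example of the kind of code update this implies: the `filtered` query available in ES1 was removed in ES5, and the documented replacement is a `bool` query with `must` and `filter` clauses. The helper below is a hypothetical sketch (not code from any of these repos) that rewrites only a top-level `filtered` query. It does not handle nested cases or the other filter types that were also removed.

```python
from typing import Any, Dict


def filtered_to_bool(query: Dict[str, Any]) -> Dict[str, Any]:
    """Rewrite a top-level ES1-style `filtered` query as a `bool` query.

    Queries without a top-level `filtered` key are returned unchanged.
    """
    if "filtered" not in query:
        return query
    filtered = query["filtered"]
    bool_query: Dict[str, Any] = {
        # A `filtered` query with no inner query matches all documents.
        "must": filtered.get("query", {"match_all": {}}),
    }
    if "filter" in filtered:
        # The filter clause restricts matches without affecting scoring.
        bool_query["filter"] = filtered["filter"]
    return {"bool": bool_query}


# Usage: an ES1-style query body and its ES7-compatible form.
old_query = {
    "filtered": {
        "query": {"match": {"text": "notes"}},
        "filter": {"term": {"user": "abc"}},
    }
}
new_query = filtered_to_bool(old_query)
```

Other changes across this version range, such as the split of the `string` field type into `text` and `keyword` and the removal of mapping types, would need similar per-repo review.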

A recommended ordering for tackling this work would be:

  1. edx-analytics-data-api

    1. This already uses elasticsearch-dsl and may be simpler to upgrade

    2. Stuart Young has already attempted to move it onto ES5

  2. edx-search

    1. This is a low-level library that primarily indexes course content

    2. It is used by edx-platform and edx-enterprise

  3. edx-enterprise

    1. This may “just work” once edx-search is upgraded

    2. A library installed by edx-platform

  4. edx-platform

    1. This also may “just work” once edx-search is upgraded

  5. cs_comments_service

    1. Uses Ruby and Ruby libraries

    2. Can be done in parallel to the work above.
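The ordering above follows from the dependency notes in the list: edx-search is used by edx-platform and edx-enterprise, and edx-enterprise is installed by edx-platform. As a small sketch, the same constraints can be encoded and checked with the standard library's `graphlib` (Python 3.9+). The `DEPENDS_ON` mapping is an illustration built from those notes, not an exhaustive dependency graph.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# Maps each repo to the repos it depends on, per the notes in the list above.
DEPENDS_ON: Dict[str, Set[str]] = {
    "edx-analytics-data-api": set(),
    "edx-search": set(),
    "edx-enterprise": {"edx-search"},
    "edx-platform": {"edx-search", "edx-enterprise"},
    "cs_comments_service": set(),  # independent, so it can proceed in parallel
}


def upgrade_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return one valid upgrade order in which dependencies come first."""
    return list(TopologicalSorter(depends_on).static_order())


order = upgrade_order(DEPENDS_ON)
```

Any order produced this way places edx-search before edx-enterprise and edx-enterprise before edx-platform. The positions of the independent repos are not constrained.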

[Completion] Elasticsearch & Haystack

To bring the above 2 projects to full completion, let’s ensure we do the following:

  1. Upgrade all Analytics repos, including the newly found ones (Analytics Pipeline), to ES7 (1 to 2 weeks)

  2. Fix any issues that arise when edX deploys the upgrades for each service to edX stage and production; see Elasticsearch Ownership and Rollout (estimate: ?)

  3. Deprecate and remove all code that was needed only for the prior ES versions. (2 days to 2 weeks)