[BD-19, BD-20] Meeting Notes

2021-01-12

  • course-discovery deployed to prod

    • mostly going well. some small issues surfacing from prospectus builds

  • edX working to clean up the old usages of ES 1.5

2021-01-05

  • talking to DE, we want to halt work on edx-analytics-data-api for now, because DE cannot spare the resources

  • course-discovery

    • try to work on this earlier in the day for edX

    • Hopefully one last issue will be resolved this week

    • If we find others, we will focus on getting RG a good dev setup to run the same tests we are

2020-12-22

  • course-discovery

    • handled rebasing onto master

    • did some more testing on stage

      • some issues arose during testing

      • understand how to address these issues

      • should be handled by the next meeting

  • edx-analytics-pipeline

    • no updates at the moment

    • questions:

      • who can handle questions from DE while Stuart is out?

        • A: not sure, @Diana Huang will check in with DE but it’s possible that they are too understaffed at the moment.

      • test model that handles connection with ES

2020-12-15

  • course-discovery

    • nothing has changed since last week

    • have not had any of Feanil’s time

    • one last question that needs to be addressed, having trouble replicating it

    • can we check the latest commit?

  • edx-analytics-api

    • did a big debugging session between RG and edX

    • made progress on acceptance tests

      • tests are failing and logs will be sent to RG

    • waiting for edX to return feedback/logs

2020-12-08

  • course-discovery had fixes added

    • found some new smaller issues

      • filtering for org field in exclude mode

      • fixes for this have been added

    • waiting for the next phase of testing

    • fix managed to handle other potential problems

    • one unresolved issue left behind

      • don’t have the same number of results between searches

      • have the database dump - trying to recreate on devstack

      • trying the old debugging process

        • blocked on Feanil’s time

  • edx-analytics-pipeline

    • asked DE about how the ES is run

      • run on AWS Elasticsearch

    • helpful if we run service tests using the OpenDistro image

    • cannot reproduce the issue locally, but will continue to try it

    • unfortunately, will need to reconfigure ES to work this way

    • check in with DE about what we might be able to do in terms of accelerating testing

2020-12-01

  • Nikolay is back from vacation

    • working on resolving the outstanding issues with course-discovery

    • plans to do another testing session today

    • currently on stage - testing to happen later today

    • needed to handle requirements for the enterprise team(s)

      • add more filters to handle these cases

    • waiting for results

  • edx-analytics-pipeline

    • fixed Travis issues with the PR

    • need edX to run acceptance tests to verify that these changes are working

2020-11-24

  • edx-analytics-pipeline is being worked on

    • Travis CI integration fixed

    • tests being added for integration with ES

    • should be ready to go

    • update acceptance tests to use ES7 cluster settings

    • talk with DE - create a thread in Slack where we can discuss the changes that may need to happen

  • waiting for Nikolay for course-discovery

    • resolving the last few issues

    • aiming for the first couple of weeks of December to deploy

  • edx-search

    • deployed! currently no issues

  • just a note: cs_comments_service ES nodes are crashing periodically.

    • TNL team on edX is investigating why this might be happening - slow queries or similar

    • should we just allocate more memory to the nodes?

2020-11-17

  • New PRs for course-discovery/Haystack replacement

    • RG hasn’t reviewed, but edX can/will do that review

    • Waiting on a few last issues to be resolved

    • Waiting for Nikolay to get back from vacation to address those issues

  • prepared PR to edx-analytics-pipeline

    • the code that was broken should now be fixed

    • once unit tests are green, we can ask DE/Stu to run the acceptance tests again

  • edx-search

    • edX is working on testing this on stage

    • we’re seeing some failures indexing courses on stage. edX will be investigating this more closely and will inform RG if code changes are needed.

2020-11-10

  • no blockers on course-discovery, edx-notes-api

    • waiting for feedback from edX

    • course-discovery

      • waiting for time to do the testing

      • filtering in ES does not work case-insensitively

        • differences between ES1.5 and ES7

        • came up with a solution to leave everything lower case

      • planning on doing some more testing

    • edx-notes-api

      • one issue found

      • getting a bad request into ES

      • tried to do a direct query

      • edX to try to do more testing and investigation into the issue

  • edx-analytics-data-api

    • found strange error - ES library failing request to ES

      • need to investigate this issue

    • writing more tests

  • edx-search

    • results need to be paginated

      • who should update platform side to support pagination?

        • RG for now

        • will reach out to edX if anything changes
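
The lowercase workaround noted under course-discovery above can be sketched roughly as follows. This is only an illustration, not the actual course-discovery code: the helper names `normalize_term` and `build_term_filter` and the field name `org` are assumptions made for the example. An exact-match `term` filter compares against the stored value as-is, so lowercasing on both the indexing side and the query side makes the comparison behave case-insensitively.

```python
def normalize_term(value):
    """Lowercase and trim a value so index-time and query-time forms match."""
    return value.strip().lower()


def build_term_filter(field, value):
    """Build an Elasticsearch `term` filter clause using the normalized value."""
    return {"term": {field: normalize_term(value)}}


# Index time: store the normalized form (hypothetical document).
doc = {"org": normalize_term("edX")}

# Query time: normalize user input the same way before filtering.
query = {"query": {"bool": {"filter": [build_term_filter("org", "EDX")]}}}
```

Because both sides go through the same helper, a filter for "EDX" matches a document that was indexed from "edX".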

2020-11-03

  • conversation about accelerating syncing

  • trying to do a daily sync update. this seems to be going well.

  • finding new and unexpected bugs

  • course-discovery

    • last time, one issue with a POST request

    • added a fix - still a small problem

    • waiting to find some time to test the latest fix

  • edx-notes-api

    • found an issue on prod

    • couldn’t figure out why this was happening

    • have some tests for running against the ES instance directly

  • edx-search

    • trying to test with the settings updated

    • found an issue locally yesterday - probably just an earlier version of edx-search being used incorrectly

  • edx-analytics-pipeline

    • RG to resolve the issue that was found by the acceptance tests

    • all calls to ES in this repo are mocked, so tests didn’t catch it

2020-10-27

  • work at RG focused on bugfixing/testing/etc

  • course-discovery - actively working on testing and fixing bugs

    • results showing

    • Feanil prepared tests that needed to pass

    • 500 error on some endpoints for course discovery

    • found one last issue on Monday - POST request issue with search all

    • fixes submitted this morning. we need to test to ensure these work

  • Talking with Robert about forums issues - trying to resolve issues that were found on prod

    • We can’t reproduce these issues locally - sort by vote issues that were found

    • This is the main issue found - can we find a way to reproduce it

    • Found a course, provided json to prove - can RG use this json to reproduce locally? if we need to provide staging comments data, we can probably pass it along

  • Seeing some issues with edx-search and edx-platform - can see if we can fix those issues with ES settings

2020-10-13

  • Focused on two repos currently in deployment: cs_comments_service, course-discovery

  • Two other repos in progress: analytics work, mostly resolving conflicts, addressing requests

    • Adding tests to analytics-pipeline

  • RG to be on vacation on Wednesday

  • course-discovery

    • some progress being made

    • made some more updates and resolved conflicts

    • did some more testing against Stage data

    • lots of communication via Slack - managed to resolve issues

    • only 4 issues left - 3 of them have the same root cause

    • still didn’t have Stage data on RG - using raw ES queries to edX for testing

    • tests have been written for the new issues/cases we have so that we can try to triage

    • manual testing still showing issues

  • edx-analytics-api

    • resolved conflicts

  • cs_comments_service

    • resolved all issues and comments

  • analytics-pipeline

    • adding tests

2020-10-06

  • Actively working on course-discovery

    • Did second round of testing on Stage - produced a list

    • Went through every item on the list - did a thorough evaluation of every issue

    • Submitted fixes and tests

    • Explanation on the issues posted in Slack

    • PR of fixes has some conflicts to resolve against edx master branch - just needs to be rebased - hoping Feanil can review new code today

    • Don’t need access to data for now

  • Resolved issues with edx-search

  • Added alias functionality back to cs_comments_service - responding to questions and review comments

  • Slack threads can be hard to respond to without context

  • Settings - @Diana Huang to submit PRs to update the settings against RG side

    • at least give an example for RG to look at

2020-09-29

  • RG mainly focused on bugs handled on deployed services

    • course-discovery continuing work - addressed issues found on Stage

    • forums feedback from the TNL squad, addressing issues

    • edx-search and edx-platform updates and fixes based on feedback as well

  • course-discovery

    • progress happening here

    • initial testing on stage last week, found issues

    • working between edX and RG to resolve issues in course-discovery

      • more testing on stage with the fixes, some issues found. old ones not entirely fixed and new ones discovered

      • despite local testing with similar data, still seeing issues on stage with stage data

      • unsure if the data edX provided can replicate all the issues we are seeing

        • can we get a dump of data off of stage? - probably some PII concerns

        • edX to talk to DE about the security/PII concerns

      • planning on getting in fixes back for tomorrow

  • edx-notes-api is in production

    • with issues - had problems indexing all the notes

    • having some devstack problems, but ES seems to be working

  • reopened PR against edx-platform

    • resolved test failures on branch

  • opened PR against edx-analytics-pipeline

    • code coverage issues - not enough code coverage of ES sections

    • hopefully the pipeline acceptance tests cover some of this

  • cs_comments_service PR being reviewed by edX

    • most of the issues resolved

    • planning on returning alias functionality to allow for swap between indexes, allowing for quick swapover

  • devstack issues

    • edx-notes-api uses the new ES7 code but still pointing at the old ES container

    • edX working to resolve devstack issues right now

      • focus on skipping notes in the devstack - high priority - @Diana Huang

    • last week - community asked about the devstack issues
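
The alias-based swapover planned for cs_comments_service above could look something like the sketch below. The notes do not describe the actual implementation, so the helper name `build_alias_swap` and the index names are hypothetical. It relies on the Elasticsearch `_aliases` endpoint, which accepts a list of actions in a single request.

```python
def build_alias_swap(alias, old_index, new_index):
    """Return a request body for the Elasticsearch `_aliases` endpoint.

    Both actions are sent in one request, so the alias moves from the old
    index to the new one in a single step.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical alias and index names, for illustration only.
body = build_alias_swap("comments", "comments_20200901", "comments_20200929")
```

With the official Python client, a body like this would typically be passed to `indices.update_aliases(body=body)`. Clients keep querying the alias name and never need to know which concrete index is behind it.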

2020-09-22

  • issues with devstack and provisioning that need to get fixed

    • ES7 code deployed but not finished

  • course-discovery test was positive but some issues were raised

    • most of them are easily fixable, but some of them need to be discussed when Mike gets back

  • testing edx-platform and edx-search, PR raised some issues

    • can’t reproduce it locally

    • can see which tests are failing - exploring what the reasons are

      • fixes for edx-search

2020-09-15

  • responded to issues with edx-notes-api and edx-search questions

  • edx-analytics-pipeline PR is being prepared and will be ready to be opened against upstream

  • will aim to get edx-search and course-discovery onto stage to do some manual testing this week

  • more people being added to the project from the edX side, may see new names and new questions.

2020-09-08

  • Updates first, then retrospective

  • edx-search and edx-platform changes

    • last week, we talked about removing the old interface

    • submitted updates and need to test this

  • forums is in testing - resolved some issues

    • unit test passed, but did some small manual testing

    • found that previously created indexes were not removed

    • waiting for review from edX/ @Diana Huang

  • working on edx-analytics-pipeline right now

Retrospective

  • What went well:

    • Generally - good and interesting job, an adventure

    • Did the best to get it done quickly and well

    • Good project for RG to pick up, due to familiarity with ES

    • Managed to handle the volume and complexity of the work well and adapted quickly

    • Team worked well together

    • Good communication channels, found solutions together

      • Good to look at PRs early

      • Good feedback loops between teams

      • Especially across timezones

      • Excellent transparency

      • Weekly checkins were a good cadence for discussion

    • Lots of tests already existed in the Open edX codebase - made it easier to ensure that functionality didn’t break in the upgrade/library change

    • Even if the project was larger, was able to react quickly

    • RG pro-actively suggesting retrospective!

  • What could be improved:

    • Underestimated the size of the project

      • Found a repo very late in the project that needed to be converted as well

      • Always have to account for “padding” in cases where there are exceptions

    • When switching from Haystack to Elasticsearch-DSL, it was necessary to understand the difference between these tools and how different they are

    • Better documentation around the ES architecture +4

      • Want to take this on to improve the experience for Open edX overall

    • Maybe could have used more tests - better tests? +3

      • Didn’t have time to add tests that we saw were missing

    • RG wanted to do more manual testing +1 - parking lot discussion

      • Wanted to help with manual testing

      • Should be part of the project

      • How does this get delivered?

      • We did manual testing for course-discovery and will for forums

    • Would like to test and deploy work sooner +3

    • Could we have done code in smaller PRs because of the changes? +1

      • PRs were broken into smaller chunks - each commit

Documentation around ES architecture

  • ADRs(?)

    • Good for the decisions in the project

  • More comprehensive documentation

    • difficult to maintain

  • Is there a tool that could generate documentation of the ES schema?

  • HOWTO guides

    • how does this impact Koa?

    • these are easier to keep up to date

  • Put together a proposal of work for documentation for RG

    • timebox - maybe a few days, maybe more

    • can decide on what combinations of work

Manual Testing - or Testing

  • What concrete actions can we take?

  • Local testing for ES - take a look at how it works in action

  • Could we add more unit tests(?)

    • Found some unit tests were missing

    • Three API views without any tests

  • Manual testing will help us to find things immediately

 

2020-09-01

  • Introductions for the team, welcoming Robert to the group

  • Close to the end

  • Want to have a retrospective on the project

    • Discuss what went well and what didn’t

    • Extend Sept 8 meeting for an hour

  • analytics-pipeline

    • Estimate: only 1-2 weeks to discuss

  • course-discovery

    • Who will be doing manual testing?

      • edX will start doing deployment preparation

      • RG will do some limited manual testing

  • Questions:

    • When is schema created?

      • All READMEs in the PRs should have instructions on creating the new schemas using management commands

2020-08-25

  • BD-19

    • cs_comments_service PR

      • one review comment from edX

    • edx-search

      • what are we doing about DEPR?

  • BD-20

    • course-discovery PR

      • lots of feedback from edX

        • still waiting for more review

      • may open PR against edX soon

      • Travis CI build is passing

      • Still worried about testing

        • Want to get this done sooner rather than later, due to recent context

  • Services

    • Lower risk on upgrade (since not active writes in real-time)

      • analytics

      • course-discovery

    • notes

    • comments-service

    • edx-search

  • Action Item:

    • Schedule a retrospective/wrap-up for second week of September - @Diana Huang

2020-08-18

  • BD-19

    • cs_comments_service PR

      • still under review

      • addressing review comments

      • still waiting on edX review

    • edx-search

      • once we switched back to the old

      • LMS tests are passing with some minor changes

      • switching two document types to two indexes requires work in Studio

  • BD-20

    • course-discovery

      • tests are passing after removal of Haystack and switching over to elasticsearch-dsl with global

      • multi-processor mode issues with xdist

        • old prefixing strategy wasn’t working with ES7/elasticsearch-dsl

        • found solution for handling this

      • do we need aliases?

        • yes, we want to keep this as a sanity check for our work

        • this feature is now in the codebase

        • all tests related to this are passing

      • connected course-discovery PR to travis

        • travis is now passing, lots of issues with style-checkers like pylint errors that needed to be fixed

        • still some style issues that can be fixed
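
The notes above do not say which fix was found for the xdist prefixing problem. One common approach, sketched here purely as an assumption, is to derive a per-worker index name from the `PYTEST_XDIST_WORKER` environment variable that pytest-xdist sets in each worker process. The helper name `worker_index_name` and the index name `course_run` are hypothetical.

```python
import os


def worker_index_name(base_name, environ=None):
    """Return an index name unique to the current pytest-xdist worker.

    pytest-xdist sets PYTEST_XDIST_WORKER (e.g. "gw0") in each worker
    process; without xdist the variable is absent and the base name is used.
    """
    environ = os.environ if environ is None else environ
    worker = environ.get("PYTEST_XDIST_WORKER")
    return f"{worker}_{base_name}" if worker else base_name


# Passing an explicit mapping makes the behaviour easy to check without xdist.
name_on_worker = worker_index_name("course_run", {"PYTEST_XDIST_WORKER": "gw1"})
name_without_xdist = worker_index_name("course_run", {})
```

Giving each worker its own index names keeps parallel test processes from creating, refreshing, or deleting each other's indexes.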

2020-08-11

  • BD-19

    • working with edx-platform, edx-search changes caused breakages in tests, will try to update edx-platform

      • the current plan:

        • switch interface back to old one, run edx-platform tests to ensure that there are no issues

        • once tests are passing, update edx-platform code to use new, better interface from edx-search

    • cs_comments_service changes are up for review internally, will be looked at by Diana

      • be on the look out for the PR against edx org

  • BD-20

    • course-discovery - making progress

      • got comments/review from Mike Terry on the PR

      • ES7 doesn’t support some of the features that ES1 and Haystack relied on

      • close to getting final code review - some blockers are in the way

        • lots of tests in course-discovery, only 6 tests are still failing - related to index creation

          • do we need these tests? can we delete them now that we are changing them.

          • take a look at them in the pull requests

        • still some places where Haystack is still being used, like filters.py

          • will be rewritten into elasticsearch-dsl-drf

        • running tests for course-discovery - take 40 minutes to complete

          • cannot make use of xdist to run ES tests - connection fails to ES container

          • will take a look at the prefixing used to make this work, see if we can get it working with elasticsearch-dsl

2020-08-04

  • BD-19

    • Optimistic that we will have a course-discovery PR next week

      • Lots of changes needed, going to be a big, in-depth code review

      • About 1000 tests pass, a bunch of tests still fail

      • Only have edx-haystack-extensions in course-discovery left

      • Need to configure Travis CI

      • Update documentation

      • Still the question of testing manually

      • Test coverage

        • Three apis in search.py that are not tested via unit tests

        • Look at percentage of coverage

    • Both edx-notes-api and course-discovery will be on our end

  • BD-20

    • PRs on edx-search - all tests are passing

      • needs review from TNL

      • need to be tested against edx-platform tests

    • cs_comments_service

      • Making lots of progress

      • Cleaning up code

      • PR against internal repo for review

  • edX internally is ramping up the process of working on upgrades and deployment

    • going to focus on edx-analytics-data-api

 

2020-07-28

  • BD-19

    • Got edx-analytics-data-api review comments from DE

      • Updated based on the

    • Working on cs_comments_service

      • remove aliases from indexes

      • two schemas.

      • Updating tests

      • Lots of cleanup

  • BD-20

    • Open WIP PR for course-discovery

      • tests passing in test_search.py, test_models.py

      • Moving up on ES7 as well

      • Good pace on course-discovery

      • Can take a look at the intermediate PR

      • Lots of legacy code that needs to be dealt with

    • edx-notes-api - need SRE approval and review - blocked on edX side

2020-07-21

  • Only Mike was able to attend from edX side this time

  • BD-19 Update

    • edx-search

      • Made PR, possibly done with this

    • comments service

      • Starting work here

  • BD-20 Update

    • course-discovery

      • still understanding full scope while getting some work done

      • getting some tests to pass

      • working on course run index and aggregators

      • working on translation layer from haystack query language to es5 (more haystack language use than we thought)

      • floated possibility of bumping the API version of some endpoints, to speak pure elasticsearch query language, and be able to eventually one day drop the above translation layer. May need more discussion here. Let’s wait for Diana’s feedback.

    • edx-notes

      • Open edX PR needs review

Action Items

@Michael Terry to review the edx-notes PR
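
As a rough idea of what the Haystack-to-Elasticsearch translation layer discussed under course-discovery above involves, the sketch below maps a few Haystack-style `field__lookup` filters onto Elasticsearch query clauses. It is an assumption-laden illustration rather than the project's code: the function name `translate_filters`, the field names, and the small set of supported lookups are all chosen for the example.

```python
def translate_filters(filters):
    """Translate Haystack-style lookups into an Elasticsearch bool query.

    Only a small, illustrative subset of lookups is handled here.
    """
    clauses = []
    for key, value in filters.items():
        # "org__in" -> field "org", lookup "in"; a bare "status" means exact.
        field, _, lookup = key.partition("__")
        lookup = lookup or "exact"
        if lookup == "exact":
            clauses.append({"term": {field: value}})
        elif lookup == "in":
            clauses.append({"terms": {field: list(value)}})
        elif lookup in ("gt", "gte", "lt", "lte"):
            clauses.append({"range": {field: {lookup: value}}})
        else:
            raise ValueError(f"unsupported lookup: {lookup}")
    return {"query": {"bool": {"filter": clauses}}}


# Hypothetical filter arguments, as an API view might receive them.
es_query = translate_filters({"org__in": ["edx", "mitx"], "status": "published"})
```

Rejecting unknown lookups loudly, rather than ignoring them, makes it easier to see how much Haystack query language is actually in use.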

2020-07-14

  • Two PRs - edx-notes-api, edx-analytics-data-api need review

  • BD-19 Update

    • edx-analytics-data-api

      • Review needs to happen

      • Request DE to also review

    • edx-search

      • Rewrite search method which builds a complex query that needs to be converted

  • BD-20 Update

    • edx-notes-api - opened against edx repository

      • need SRE/Devops Support for anything merged

    • focus on course-discovery

      • much more complicated

      • lots more search

      • haystack uses features for ES1 that might not exist for ES7

    • questions:

      • there will be many changes to course-discovery

      • worried about breaking things unintentionally

      • what acceptance criteria can we use for this testing?

        • manual testing

        • check in with website and enterprise teams about usages of course discovery

      • How strict are we going to be about the API interfaces for course discovery?

        • elasticsearch-dsl-drf doesn’t match the Haystack API

        • URL structure is based on Haystack assumptions, ES-dsl-drf has different URL

        • A: most of the cases where we have query params, this is passed through to ES backend

        • Can write an adapter for the differences

      • Planning on breaking the current index into four different indexes

        • Need to write ADR around the decision

          • Diana and Mike to review this ADR when it’s ready

 

2020-07-07

  • Action items - Diana - get information on ES cluster sizes, review PRs in edx-notes-api

  • First PRs ready

  • Devstack changes for smoketesting

    • Will require some updates from someone after changes are done

  • BD-19 (Upgrade)

    • edx-analytics-data-api PR available

      • Resolving some remaining issues

    • Starting on edx-search next

  • BD-20 (Haystack)

    • Almost done with edx-notes-api

      • PR available

      • Test passes

    • Looking into course-discovery

      • Starting to do discovery on this service

      • Will be bigger and more involved

2020-06-30

2020-06-23

  • Set up the teams and the project infrastructure

  • BD-19 (Elastic upgrade): Igor and Sergiy

    • Starting to create the testing infrastructure

    • Plan: (start with phase 3) Upgrade and fix tests as they come up

  • BD-20 (Haystack): Nikolay

    • Starting with edx-notes

    • Currently using Ironwood

    • 4-part phase

      1. Evaluation (DONE)

        1. very different libraries

        2. but it’s all Python code at the end of the day

      2. Tests (DONE)

        1. all tests are passing (pre-changes)

        2. currently, there is 40% test coverage in one model; but 90% coverage in general

      3. Upgrade

        1. anticipating issues even within Haystack-supported versions

      4. Migration, within the context of ES-7

  • PR workflow

    • RG will work off of their own fork

    • edX prefers smaller PRs

    • edX (Diana) is open to being tagged on RG’s fork while work is in progress so early communication happens.

      • edX review should not block RG PRs

Action Items

@Diana Huang to get information on the current size(s) of our ES databases/instances