Data WG 2022-03-24 Meeting Notes

 Date

Mar 24, 2022

 Participants

  • @Edward Zarecor

  • @Diego Millan

  • @Andy Shultz (Deactivated)

  • @Simon Chen

  • @Dave Ormsbee (Axim)

  • @Maria Fernanda Magallanes Z

  • @Tobias Macey

 Goals

  •  

 Discussion topics

Time

Item

Presenter

Notes

5M

Followup on Elasticsearch → OpenSearch plans

@Edward Zarecor

  • There’s a new Slack channel, #search-migration, for tracking plans and progress

  • The current proposal is to migrate a number of our services to use OpenSearch

  • We investigated abandoning Elasticsearch/OpenSearch in favor of MySQL full-text search, but this was deemed not currently feasible because of performance problems

10M

edX Insights acceptance tests

@Simon Chen

  • Do any of the Open edX members run acceptance tests on edx-analytics-dashboard? If so, how do you do it?

  • How valuable do you find these acceptance tests?

  • Notes

    • The tests are not trivial to run locally

    • They use bok-choy, which is being deprecated

    • Do they run in CI or via the deployment pipeline?

    • There do seem to be some acceptance tests running in GitHub

    • The cost of doing nothing is ongoing maintenance, but it isn’t high

    • @Jill Vogel, any opinion on this matter?

10M

Learner View deprecation

@Andy Shultz (Deactivated)

  • We’re deprecating the Learner View because it is expensive and isn’t really analytics

  • Where is the right place for this student info to live? Is this “data” even a concern for this WG? The only actual users we have found for it use it as a student directory!

  • Notes

    • The view showed detailed user data and the modules each learner interacted with

    • It was sparsely used

    • PII risk

    • Other alternatives exist for getting similar data, but maybe not to the module level

    • We can still see how learners are doing on questions, but not at as granular a level

    • Will Insights by design avoid pulling in PII in the future? Yes. It will focus on aggregates rather than learner-level data, which can leak out in numerous ways.

    • Can we add an ADR specifying this decision?

    • Already turned off at edX

    • Deprecation has been announced; the comment deadline is the end of March

    • It would be removed in the Nutmeg release

    • If this data (individual learner engagement) comes back, is that a matter for this group?

      • Yes and no; however, it would be helpful to be informed of any plans to build a student directory.
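To make the "aggregates rather than learner-level data" stance concrete, here is a minimal Python sketch. It assumes hypothetical `(learner_id, block_id, score)` records and a hypothetical minimum group size of 5, below which a block's statistics are suppressed. Small-cell suppression is one common safeguard against re-identification; it is an illustration, not a documented part of the Insights design.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable, Optional

# Hypothetical threshold: blocks with fewer distinct learners than this are suppressed.
MIN_GROUP_SIZE = 5


def aggregate_scores(
    records: Iterable[tuple[str, str, float]],
    min_group_size: int = MIN_GROUP_SIZE,
) -> dict[str, Optional[dict[str, float]]]:
    """Summarize (learner_id, block_id, score) records per block without exposing learner IDs."""
    scores_by_block: dict[str, dict[str, float]] = defaultdict(dict)
    for learner_id, block_id, score in records:
        # Keep one score per learner per block; a later record overwrites an earlier one.
        scores_by_block[block_id][learner_id] = score

    summary: dict[str, Optional[dict[str, float]]] = {}
    for block_id, scores in scores_by_block.items():
        if len(scores) < min_group_size:
            # Too few learners: even an average could reveal an individual's score.
            summary[block_id] = None
        else:
            summary[block_id] = {
                "learners": len(scores),
                "mean_score": mean(scores.values()),
            }
    return summary


# Usage: five learners answered block-a, only two answered block-b.
records = [(f"learner{i}", "block-a", s) for i, s in enumerate([0.0, 0.25, 0.5, 0.75, 1.0])]
records += [("learner0", "block-b", 1.0), ("learner1", "block-b", 0.0)]
summary = aggregate_scores(records)
```

The output carries only counts and averages; block-b is reported as `None` because two learners are too few to summarize safely.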

 

Exporting courses via an API

@Tobias Macey

  • MIT ODL has a beta API for course export

  • The package is on PyPI: ol-openedx-course-export

  • https://pypi.org/project/ol-openedx-course-export/

  • https://github.com/mitodl/open-edx-plugins/tree/main/src/ol_openedx_course_export

  • To inform analytics, they pull down course content to work out block IDs so that learner progress can be mapped to content

  • You can POST a list of course IDs to export

  • It has a GET endpoint for checking the status of an export

  • Are they interested in contributors?

    • Yes

    • It lives in a plugin monorepo

    • Contributors need to be familiar with the Pants build tool

  • How does it work?

    • Uses mostly the same machinery as the Studio export, but exposes it as a RESTful API

    • The format is OLX

    • Is there a RESTful endpoint for posting a tarball? Unsure. There’s a way to connect to a git repository.

    • The sysadmin functionality has also been moved into a plugin

    • The API is asynchronous and uses Celery to execute the work

    • Currently the API is opinionated about S3, but could easily be adapted to use django-storages

    • There is this version of an import API.

  • Should we be deprecating other versions of export/import in favor of this or an extended version of this?

  • The main need here is to rehydrate block IDs, and there’s hope that there will be an easier way to do that in the future

    • Would the Course Blocks API enable this today? Maybe; it would require some discovery.

    • With a block ID you can get usage IDs, titles, etc.

    • It’s unclear what happens for items that are not connected to the course tree, such as textbooks
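The block-ID rehydration discussed in this item (pull down OLX, then map block IDs to content so progress can be labeled) can be sketched in Python. This is a minimal sketch, not how ol-openedx-course-export or the Course Blocks API works. It assumes an OLX course whose blocks are all inlined in a single XML document and each carry a `url_name`; real exports usually split blocks into per-type files referenced by pointer tags, which this sketch does not follow. Usage keys are built in the `block-v1:ORG+COURSE+RUN+type@TYPE+block@ID` form used by Open edX opaque keys. `build_block_index`, `attach_titles`, and the sample course are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_block_index(olx: str, org: str, course: str, run: str) -> dict[str, dict[str, str]]:
    """Map usage keys to block type and display name for an inlined OLX course tree."""
    root = ET.fromstring(olx.strip())
    index: dict[str, dict[str, str]] = {}
    for element in root.iter():  # document order, root included
        url_name = element.get("url_name")
        if url_name is None:
            # Elements without a url_name (here, the course root) are skipped.
            continue
        usage_key = f"block-v1:{org}+{course}+{run}+type@{element.tag}+block@{url_name}"
        index[usage_key] = {"type": element.tag, "display_name": element.get("display_name", "")}
    return index


def attach_titles(
    progress: dict[str, float], index: dict[str, dict[str, str]]
) -> dict[str, tuple[Optional[str], float]]:
    """Join per-block progress (keyed by usage key) to block titles; unknown keys get None."""
    return {
        key: (index[key]["display_name"] if key in index else None, value)
        for key, value in progress.items()
    }


# Hypothetical sample course with every block inlined.
SAMPLE_OLX = """
<course display_name="Demo Course">
  <chapter url_name="week1" display_name="Week 1">
    <sequential url_name="intro" display_name="Introduction">
      <vertical url_name="unit1" display_name="Unit 1">
        <problem url_name="q1" display_name="Checkpoint 1"/>
      </vertical>
    </sequential>
  </chapter>
</course>
"""

index = build_block_index(SAMPLE_OLX, org="edX", course="DemoX", run="2022")
labeled = attach_titles({"block-v1:edX+DemoX+2022+type@problem+block@q1": 0.5}, index)
```

Keying the index by full usage key lets progress data from other sources be joined directly, and items missing from the tree (for example, content outside the course hierarchy) surface as `None` rather than being silently dropped.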

 

Standard for ETL for Insights

@Maria Fernanda Magallanes Z

  • Starting a conversation among Insights users on standards and how to keep Insights running as 2U deprecates the analytics pipeline

  • Today, Insights is the only place where access control is enforced in our instructor analytics system

  • Insights currently does data transformation beyond what is done in the API. Can we eliminate this?

  • How can we move into a phase of converging on a design recommendation?

  • Starting with ETL and defining our migration path in phases may be a way of doing this.

 Action items

Review the notes for accuracy, Everyone.
Create an ADR for the PII stance of Insights as part of the Learner View deprecation work. @Andy Shultz (Deactivated)
MIT had issues authenticating against Studio because authentication is delegated to the LMS. They were redirected to the login path and were unable to exercise the import/export APIs that exist in Studio.
@Maria Fernanda Magallanes Z will start a conversation in Discourse to try to get more engagement from Insights users.

 Decisions