Data WG 2022-05-18 Meeting notes

 Date

May 18, 2022

 Participants

  • @Edward Zarecor

  • @Dean Jay Mathew

  • @Sofiane Bebert

  • @Dave Ormsbee (Axim)

  • @Julien Maupetit

  • @Maria Fernanda Magallanes Z (eduNEXT)

  • @Former user (Deleted)


 Goals


 Discussion topics

Time

Item

Presenter

Notes

10 min

 

@Andrés González

  • Announcing the first beta releases of panorama-elt and tutor-contrib-panorama, the basic tools to integrate Open edX and other systems into a datalake. Contributions are welcome!

  • Notes

    • Python based ELT toolkit that attempts to be modular and support diverse data sources and data lakes

    • Currently focused on AWS and only supports Athena today

    • Tutor plugin allows running the ELT tools alongside tutor, but expects an AWS destination for the data

    • Full support is currently only available in the Kubernetes version; local installations only support the ELT part for RDBMS tables

    • Athena TL;DR:

      • Put files in an S3 bucket

      • Athena allows SQL over CSV, JSON and other formats

      • Athena is based on Hive, so an open-source alternative is available, but there are no plans to work on this

    • The plugin is usable for local installations and dev installations

      • Docs are currently empty

      • Tracking logs are not supported locally
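
As a rough illustration of the Athena TL;DR above, the sketch below builds the kind of Hive-style DDL that lets Athena run SQL over CSV files stored under an S3 prefix. It is not taken from panorama-elt; the function name `athena_csv_table_ddl` and the database, table and bucket names are hypothetical, and the code only generates the statement text (nothing is sent to AWS).

```python
from typing import Mapping


def athena_csv_table_ddl(
    database: str,
    table: str,
    columns: Mapping[str, str],
    s3_location: str,
) -> str:
    """Build a CREATE EXTERNAL TABLE statement for headered CSV files in S3."""
    if not s3_location.startswith("s3://"):
        raise ValueError("s3_location must be an s3:// URI")
    # Athena expects LOCATION to point at a folder-like prefix.
    if not s3_location.endswith("/"):
        s3_location += "/"
    column_lines = ",\n".join(
        f"  `{name}` {sql_type}" for name, sql_type in columns.items()
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"{column_lines}\n"
        ")\n"
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'\n"
        f"LOCATION '{s3_location}'\n"
        "TBLPROPERTIES ('skip.header.line.count'='1')"
    )


# Hypothetical names, for illustration only.
ddl = athena_csv_table_ddl(
    "openedx_lake",
    "student_courseenrollment",
    {"id": "string", "course_id": "string", "is_active": "string"},
    "s3://example-datalake-bucket/lms/student_courseenrollment",
)
print(ddl)
```

Once a table like this exists, the files under that prefix can be queried with ordinary SQL, which is the property the notes above describe.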

5 min

Insights 2.0?

@Edward Zarecor

  • At the conference the 2U product team spoke about “Insights 2.0.”

  • Can someone describe the scope of that effort?

  • Will it be valuable outside of 2U?

  • Notes

    • Insights 2.0 is super aspirational

    • 2U have designs

    • Work is not planned for the next two quarters

    • Insights data is useful, but has significant gaps

    • Intends to allow in-context analytics

    • What they have been working on was focused on replacing the pipeline

    • If they have to replace the Insights frontend, that would open a number of architectural questions

      • Should it be combined with the data API?

      • Should Insights be an MFE?

      • Should Django be side-lined to avoid the ORM performance tax?

    • As this isn’t actively in development, this is all speculation

    • @Dave Ormsbee (Axim) asks whether the intended audience of Insights 2.0 has been clarified.

      • Yes, it will stay focused on aggregates, not individual learners.

      • Learner view is being deprecated

    • @Dave Ormsbee (Axim) will the data be primarily instructional, or will it include things like program enrollments?

      • The most popular piece currently is the enrollment dashboard

      • Imagine that any investment would support the admin persona and probably add additional features for admins

    • @Edward Zarecor were the designs based on user research or blue sky?

      • 2U did user research, mostly with administrators

      • They would like to do a big round of customer interviews with instructors

    • @Edward Zarecor are the designs sharable?

      • They are very preliminary; Ed would need to work with 2U product to discuss further

    • @Andrés González what are the specific challenges that 2U are facing with PII?

      • They are really planning to focus on aggregated data as a general rule to avoid any risks associated with data that could be associated with any particular learner.

    • Is there a business specific data need that requires individual data access regimes?

    • @Andy Shultz (Deactivated) proposes an early fork in the analytics design that separates aggregated data from individual data.

 

10 min

Bite-sized work

@Edward Zarecor

Let’s think about what bite-sized work would be valuable to start on now and commit to doing some of it. I have a few ideas.

2 min

Starting a conversation about Embargoing the Open edX event system

@Edward Zarecor

5 min

Completion Aggregator

@Sofiane Bebert

  • openedx-completion-aggregator

  • @Sofiane Bebert thinks this plugin is very useful and asks whether it should be part of the default Open edX release.

  • @Dave Ormsbee (Axim) is not sure whether http://edx.org runs the plugin.

  • This plugin wasn’t upgraded to Django 3.2, so it is unlikely that it runs on edx.org

  • Walking the course tree is expensive, but we do that elsewhere, so that in itself is not a disqualification

  • Need more data from OpenCraft

    • Have they considered making it part of the default release?

  • @Julien Maupetit is there currently an event for completion at the block level that includes the block name and the block ID?

  • @Dave Ormsbee (Axim) if this event does not already exist, it should be easy to add, because the completion API is already persisting this data to a table.
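
To make the discussion above about walking the course tree concrete, here is a minimal sketch of rolling block-level completion up into chapter and course totals. It is not the openedx-completion-aggregator implementation; `Block`, `aggregate`, the block IDs and the choice to count every leaf as one possible unit are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Block:
    """A node in a (hypothetical) course tree; leaves are completable blocks."""
    block_id: str
    children: List["Block"] = field(default_factory=list)


def aggregate(
    block: Block,
    completions: Dict[str, float],
    out: Dict[str, Tuple[float, float]],
) -> Tuple[float, float]:
    """Return (earned, possible) for `block`, recording each container in `out`."""
    if not block.children:
        # Leaf: completion is a value in [0.0, 1.0]; a missing entry means not started.
        return completions.get(block.block_id, 0.0), 1.0
    earned = possible = 0.0
    for child in block.children:
        child_earned, child_possible = aggregate(child, completions, out)
        earned += child_earned
        possible += child_possible
    out[block.block_id] = (earned, possible)
    return earned, possible


course = Block("course", [
    Block("chapter1", [Block("video1"), Block("problem1")]),
    Block("chapter2", [Block("html1")]),
])
aggregates: Dict[str, Tuple[float, float]] = {}
aggregate(course, {"video1": 1.0, "problem1": 0.5}, aggregates)
print(aggregates)
```

Every block in the tree is visited once per learner on each full recomputation, which is why the cost of the walk comes up in the discussion above.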

 

Figures

@John Baldwin

 Action items

@Dave Ormsbee (Axim) to report back on block level completion events in the tracking logs.

Update: It doesn’t look like it currently makes it to the tracking logs. It is possible to add this support, but it will likely require refactoring work. See Slack thread for details.

@Jill Vogel Would it be possible to get more context on open-craft/openedx-completion-aggregator (GitHub), an Open edX Django app that aggregates block-level completion data to report on different block types? There was a question of whether there’s an intention to support it. What are its known limitations, if any? Should we consider making it part of the default installation of the Open edX platform?

Update: Yes, OpenCraft have a few clients which use this plugin, and so we do support it, though we’re lagging behind somewhat. @Sofiane Bebert has submitted an update for Maple (thank you!) and @Gábor Boros is reviewing.

Should we consider making it part of Open edX by default? Sure, but note:

* Synchronous vs. asynchronous: synchronous aggregation can hurt performance, and asynchronous aggregation requires cron jobs (it would be better served by celery-beat). Synchronous is the default, so large deployments should configure it to run asynchronously.
* It provides a Course/Chapter Progress Bar view which could be integrated into the Learning MFE (current clients use a custom theme on the old courseware view).
* The APIs might be useful for data analytics.
* There are also a few old open issues.



 Decisions