Data WG 2022-05-18 Meeting notes

 Date

May 18, 2022

 Participants

  • @Edward Zarecor

  • @Dean Jay Mathew

  • @Sofiane Bebert

  • @Dave Ormsbee (Axim)

  • @Julien Maupetit

  • @Maria Fernanda Magallanes Z (eduNEXT)

  • @Quitterie Lucas

 Goals

 Discussion topics

Time

Item

Presenter

Notes

10 min

 

@Andrés González

  • Announcing the first beta releases of panorama-elt and tutor-contrib-panorama, the basic tools to integrate Open edX and other systems into a data lake. Contributions are welcome!

  • Notes

    • Python-based ELT toolkit that aims to be modular and to support diverse data sources and data lakes

    • Currently focused on AWS and only supports Athena

    • The Tutor plugin allows running the ELT tools alongside Tutor, but expects an AWS destination for the data

    • Full support is currently only available in the Kubernetes version; locally, only the ELT part for RDBMS tables is available

    • Athena TL;DR:

      • Put files in an S3 bucket

      • Athena allows SQL queries over CSV, JSON, and other formats

      • Athena is based on Hive, so an open source alternative exists, but there are no plans to work on it

    • The plugin is usable for local and dev installations, with caveats:

      • Docs are currently empty

      • Tracking logs are not supported locally

5 min

Insights 2.0?

@Edward Zarecor

  • At the conference, the 2U product team spoke about “Insights 2.0.”

  • Can someone describe the scope of that effort?

  • Will it be valuable outside of 2U?

  • Notes

    • Insights 2.0 is super aspirational

    • 2U have designs

    • No work is planned on it for the next two quarters

    • Insights data is useful, but has significant gaps

    • Intended to allow in-context analytics

    • What they have been working on was focused on replacing the pipeline

    • If they have to replace the Insights frontend, that would open a number of architectural questions:

      • Should it be combined with the data API?

      • Should Insights be an MFE?

      • Should Django be sidelined to avoid the ORM performance tax?

    • As this isn’t actively in development, this is all speculation

    • @Dave Ormsbee (Axim) asks whether there has been clarification about the intended audience of Insights 2.0.

      • Yes, it will stay focused on aggregates, not individual learners.

      • Learner view is being deprecated

    • @Dave Ormsbee (Axim) will the data be primarily instructional, or will it include things like program enrollments?

      • The most popular piece currently is the enrollment dashboard

      • The expectation is that any investment would support the admin persona and probably add additional features for admins

    • @Edward Zarecor were the designs based on user research or blue sky?

      • 2U did user research, mostly with administrators

      • They would like to do a big round of customer interviews with instructors

    • @Edward Zarecor are the designs sharable?

      • They are very preliminary; Ed would need to work with 2U product to discuss further

    • @Andrés González what are the specific challenges that 2U are facing with PII?

      • They plan to focus on aggregated data as a general rule, to avoid the risks of data that could be tied to any particular learner.

    • Is there a business specific data need that requires individual data access regimes?

    • @Andy Shultz (Deactivated) proposes an early fork in the analytics design that separates aggregated data from individual data.

 

10 min

Bite-sized work

@Edward Zarecor

Let’s think about what bite-sized work would be valuable to start on now and commit to doing some of it. I have a few ideas.

2 min

Starting a conversation about Embargoing the Open edX event system

@Edward Zarecor

  • I’ll start a thread in Discourse to discuss

  • Hope to have that published later today

5 min

Completion Aggregator

@Sofiane Bebert

  • openedx-completion-aggregator

  • @Sofiane Bebert thinks this plugin is very useful and asks whether it should be part of the default Open edX release.

  • @Dave Ormsbee (Axim) is not sure whether edx.org runs this plugin.

  • This plugin wasn’t upgraded to Django 3.2, so it is unlikely that it runs on edx.org

  • Walking the course tree is expensive, but we do that elsewhere, so that in itself is not disqualifying

  • Need more information from OpenCraft

    • Have they considered making it part of the default installation?

  • @Julien Maupetit is there currently a completion event at the block level, with the block name and the block ID?

  • @Dave Ormsbee (Axim) if this event does not already exist, it should be easy to add, because the completion API already persists this data to a table.

 

Figures

@John Baldwin

  • Figures was an Appsembler contribution to the community

  • Replaced an internal product they had that generated CSV files

  • Figures is a dashboard that provides different context views

  • Course-centric with drill down to view learners

  • Not great at analytics, really more of an exploratory dashboard

  • During development they struggled getting robust requirements from users or future users

  • John dug into what was available in the platform and accepted the definitions that he found there

  • How to find block IDs seems to be a perennial problem for multiple users.

  • Figures

    • A Django app

    • It’s an Open edX plugin that plugs into the LMS and uses its resources

    • It is not opinionated about the architecture.

    • It does not currently process tracking logs

    • It does create its own tables, because:

      • The platform models don’t track rich history

      • Analytics queries perform poorly against the platform tables; for example, courseware_studentmodule doesn’t have the indexes needed for efficient querying

    • Figures uses Celery jobs to marshal the data into its data mart

  • @Sofiane Bebert is working on an update of Figures for Maple. He doesn’t think that an update for Nutmeg will be difficult.

  • @John Baldwin is also working on a series of fixes that improve performance and fix known bugs. He is not really adding new features, but is adding some instrumentation. If persistent grades are not enabled, queries can be very expensive

  • Figures is an active project, but current level of investment is not high because of other business priorities.

  • Currently runs on the Appsembler Tahoe platform

  • Tahoe has a feature flag that allows Figures to use a different Celery queue than the LMS to prevent resource contention; this is really a server variable set at startup.

  • Interested in a conversation with folks about the Instructor dashboard

    • Question of user friendliness versus robustness of the data available.

    • (Note: There is some work spec’d out for doing an MFE conversion of the instructor dashboard, but development hasn’t started: , )

  • Is there a Tutor plugin for Figures?

    • There is one, but it is not currently maintained

    • Maintenance was paused until Figures is updated

    • After Sofiane completes the upgrades for Maple and Nutmeg, it would be possible for someone to take on maintenance of the Tutor plugin.

 Action items

@Dave Ormsbee (Axim) to report back on block level completion events in the tracking logs.

Update: It doesn’t look like the block-level completion event currently makes it into the tracking logs. It is possible to add this support, but it will likely require refactoring work. See Slack thread for details.

@Jill Vogel Would it be possible to get more context on ? Is there an intention to support it? What are its known limitations, if any? Should we consider making it part of the default installation of the Open edX platform?

Update: Yes, OpenCraft have a few clients which use this plugin, and so we do support it, though we’re lagging behind somewhat. @Sofiane Bebert has submitted an update for Maple (thank you!) and @Gábor Boros is reviewing.

Should we consider making it part of Open edX by default? Sure, but note:

* Synchronous vs. asynchronous: synchronous aggregation can hurt performance, and asynchronous aggregation requires cron jobs (it would be better served by celery-beat). Synchronous is the default, so large deployments should configure it to run asynchronously.
* Provides a Course/Chapter Progress Bar view which could be integrated into the Learning MFE (current clients use custom theme on the old courseware view).
* APIs might be useful for data analytics.
* There are also a few old open issues.



 Decisions