Announcing the first beta releases of panorama-elt and tutor-contrib-panorama, the basic tools for integrating Open edX and other systems into a data lake. Contributions are welcome!
Notes
Python-based ELT toolkit designed to be modular and to support diverse data sources and data lakes
Currently focused on AWS; Athena is the only supported data lake today
The Tutor plugin allows running the ELT tools alongside Tutor, but expects an AWS destination for the data
Full support is currently available only in the Kubernetes deployment; local installations support only the ELT for RDBMS tables
Athena TL;DR
Put files in an S3 bucket
Athena allows SQL queries over CSV, JSON and other formats
Athena is based on Hive, so an open-source alternative exists, but there are no plans to work on supporting it
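To make the "files in S3, SQL on top" model concrete, here is a minimal sketch using boto3's Athena client. It is not taken from panorama-elt: the bucket names, the `openedx` database, and the `auth_user` column layout are hypothetical placeholders. The table definition only points Athena at an S3 prefix; the CSV files themselves are never loaded or copied.

```python
"""Sketch: query CSV files stored in S3 through Athena (hypothetical names)."""

# Hypothetical locations; replace with your own bucket and database.
RESULTS_LOCATION = "s3://example-athena-results/"
DATABASE = "openedx"

# Hive-style DDL: Athena reads the files in place under LOCATION.
# The column list is an assumed, simplified layout of an extracted table.
CREATE_TABLE_DDL = """
CREATE EXTERNAL TABLE IF NOT EXISTS auth_user (
  id INT,
  username STRING,
  date_joined STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://example-datalake/auth_user/'
"""


def build_query_request(sql: str, database: str = DATABASE,
                        output: str = RESULTS_LOCATION) -> dict:
    """Return keyword arguments for the Athena client's start_query_execution."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output},
    }


def run_query(sql: str) -> str:
    """Submit a query to Athena and return its execution ID.

    Requires boto3 and AWS credentials. The call is asynchronous: poll
    get_query_execution, then read rows with get_query_results.
    """
    import boto3  # imported lazily so build_query_request works without boto3

    client = boto3.client("athena")
    response = client.start_query_execution(**build_query_request(sql))
    return response["QueryExecutionId"]
```

Typical use would be `run_query(CREATE_TABLE_DDL)` once, followed by ordinary queries such as `run_query("SELECT count(*) FROM auth_user")`.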
The plugin can be used for local and dev installations
They are very preliminary; Ed would need to work with 2U Product to discuss further
Andrés González: What specific challenges is 2U facing with PII?
As a general rule, they plan to focus on aggregated data to avoid the risks that come with data that could be tied to any individual learner.
Is there a business-specific data need that requires individual data access regimes?
Andy Shultz proposes an early fork in the analytics design that separates aggregated data from individual data.
This plugin wasn’t upgraded to Django 3.2, so it is unlikely to be running on edx.org.
Walking the course tree is expensive, but we do that elsewhere too, so that alone is not disqualifying.
Need more data from OpenCraft
Have they considered making it part of the default installation?
Julien Maupetit: Is there currently a block-level completion event that includes the block name and block ID?
Dave Ormsbee (Axim): If this event does not already exist, it should be easy to add, because the completion API already persists this data to a table.
Replaced an internal product they had that generated CSV files
Figures is a dashboard that provides different contextual views
Course-centric with drill down to view learners
Not great at analytics, really more of an exploratory dashboard
During development they struggled to get robust requirements from users or future users
John dug into what was available in the platform and accepted the definitions that he found there
How can one find block IDs? This seems to be a perennial problem for many users.
Figures
A Django app
It’s an Open edX plugin that plugs into the LMS and uses its resources
It is not opinionated about the architecture.
It does not currently process tracking logs
It does create its own tables
The platform models don’t track rich history
Performance of analytics queries is a concern; for example, courseware_studentmodule doesn’t have the indexes needed for efficient querying
Figures uses Celery jobs to marshal the data into its datamart
Sofiane Bebert is working on an update of Figures for Maple. He doesn’t think that an update for Nutmeg will be difficult.
John Baldwin is also working on a series of fixes that improve performance and fix known bugs. He is not really adding new features, but is adding some instrumentation. If persistent grades are not enabled, queries can be very expensive.
Figures is an active project, but the current level of investment is low because of other business priorities.
Currently runs in the Appsembler Tahoe platform
Tahoe has a feature flag that lets Figures use a different Celery queue from the LMS to prevent resource contention; this is really a server variable and is set at startup.
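The queue separation described above can be sketched as a Celery `task_routes` mapping that is computed once at startup. This is an illustration under assumptions, not Tahoe's actual implementation: the flag, the `figures` queue name, and the `figures.tasks.*` task names are hypothetical. `resolve_queue` is a simplified stand-in for Celery's router, used only to show which queue a task would land on.

```python
from fnmatch import fnmatch

# Hypothetical queue name; the notes do not give Tahoe's actual settings.
FIGURES_QUEUE = "figures"
# Celery's built-in default queue name; Open edX deployments usually override it.
DEFAULT_QUEUE = "celery"


def figures_task_routes(use_dedicated_queue: bool) -> dict:
    """Build a Celery-style task_routes mapping for Figures tasks.

    Meant to be evaluated once at process start, like a server setting.
    """
    if not use_dedicated_queue:
        return {}  # no entry: Figures tasks share the default LMS queue
    # Celery accepts glob patterns as keys in task_routes.
    return {"figures.tasks.*": {"queue": FIGURES_QUEUE}}


def resolve_queue(task_name: str, routes: dict, default: str = DEFAULT_QUEUE) -> str:
    """Return the queue a task would be sent to under the given routes.

    Simplified glob matching for illustration, not Celery's own router.
    """
    for pattern, options in routes.items():
        if fnmatch(task_name, pattern):
            return options["queue"]
    return default
```

In a real deployment the mapping would be assigned to the Celery app's `task_routes` setting, and a separate worker would consume only that queue (for example `celery -A <app> worker -Q figures`), so Figures jobs cannot starve the LMS workers.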
Interested in a conversation with folks about the Instructor dashboard
Question of user friendliness versus robustness of the data available.
After Sofiane completes the upgrades for Maple and Nutmeg, it would be possible for someone to take on maintenance of the Tutor plugin.
✅ Action items
Dave Ormsbee (Axim) to report back on block level completion events in the tracking logs.
Jill Vogel: Would it be possible to get more context on https://github.com/open-craft/openedx-completion-aggregator? There was a question of whether there is an intention to support it. What are its known limitations, if any? Should we consider making it part of the default installation of the Open edX platform?