Announcing the first beta releases of panorama-elt and tutor-contrib-panorama, the basic tools to integrate Open edX and other systems into a data lake. Contributions are welcome!
Notes
Python-based ELT toolkit that aims to be modular and to support diverse data sources and data lakes
Currently focused on AWS and only supports Athena today
The Tutor plugin allows running the ELT tools alongside Tutor, but expects an AWS destination for the data
Full support is currently only available in the Kubernetes version; local installations only get the ELT part for RDBMS tables
Athena TL;DR
Put files in an S3 bucket
Athena allows SQL over CSV, JSON and other formats
Athena is based on Hive, so an open source alternative is available, but there are no plans to work on supporting it
The plugin is usable for local installations and dev installations
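The Athena workflow summarized above (put files in an S3 bucket, then query them with SQL) can be sketched roughly as follows. This is only an illustration and not the panorama-elt API: the function names, the database, table, bucket and file names are all hypothetical, the DDL assumes CSV files with a header row, and the upload step assumes boto3 is installed and AWS credentials are configured.

```python
def build_external_table_ddl(database, table, columns, s3_location):
    """Build a Hive-style CREATE EXTERNAL TABLE statement for CSV files in S3.

    `columns` is a list of (name, type) pairs. All names are hypothetical.
    """
    column_defs = ",\n  ".join(f"`{name}` {col_type}" for name, col_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"  {column_defs}\n"
        ")\n"
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'\n"
        f"LOCATION '{s3_location}'\n"
        "TBLPROPERTIES ('skip.header.line.count'='1')"
    )


def upload_and_register(local_path, bucket, key, ddl, results_location):
    """Upload one file to S3 and submit the DDL to Athena.

    Assumes boto3 is installed and credentials are available; returns the
    Athena query execution id without waiting for the query to finish.
    """
    import boto3  # third-party, imported lazily so the DDL helper works without it

    boto3.client("s3").upload_file(local_path, bucket, key)
    response = boto3.client("athena").start_query_execution(
        QueryString=ddl,
        ResultConfiguration={"OutputLocation": results_location},
    )
    return response["QueryExecutionId"]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    ddl = build_external_table_ddl(
        database="panorama",
        table="course_enrollments",
        columns=[("user_id", "string"), ("course_id", "string")],
        s3_location="s3://example-panorama-bucket/course_enrollments/",
    )
    print(ddl)
```

Once the table is registered, ordinary SQL such as `SELECT course_id, COUNT(*) FROM panorama.course_enrollments GROUP BY course_id` can be run over the files in the bucket. Keeping the DDL builder separate from the AWS calls makes it possible to inspect the generated statement without any AWS access.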