Analytics Breakout Session

Insights & Analytics Pipeline sub-session

How does the Analytics Pipeline work?

How can we make Analytics Pipeline near-realtime?

  • Smaller hourly partitions for more frequent runs
  • Streaming data using Apache Spark (instead of Hadoop, Hive, Sqoop...). The edX Analytics team is in the process of doing this right now!
  • Use a lighter-weight solution if direct MySQL queries are sufficient
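
As a rough illustration of the hourly-partition idea, here is a minimal Python sketch that groups timestamped events by hour, so that each run only needs to process the newest partition. The function names, the key format and the sample events are assumptions made for illustration; they are not taken from the actual Analytics Pipeline.

```python
from collections import defaultdict
from datetime import datetime


def hour_partition_key(timestamp: datetime) -> str:
    """Return the hourly partition a timestamp falls into, e.g. '2018-05-30T14'."""
    return timestamp.strftime("%Y-%m-%dT%H")


def partition_events(events):
    """Group (timestamp, payload) pairs by their hourly partition key."""
    partitions = defaultdict(list)
    for timestamp, payload in events:
        partitions[hour_partition_key(timestamp)].append(payload)
    return dict(partitions)


# Illustrative sample data only.
events = [
    (datetime(2018, 5, 30, 14, 5), "play_video"),
    (datetime(2018, 5, 30, 14, 50), "problem_check"),
    (datetime(2018, 5, 30, 15, 1), "play_video"),
]
partitions = partition_events(events)
```

With daily partitions a job has to wait for the whole day to close; with hourly keys like these, a job could run once an hour over a much smaller batch.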

What reporting do we want?

  • Dashboard for the blended learning use case (small classes, many copies of the "same" course)
    • Data per learner
    • Divide by class, subject, geography, organisation
  • Timeline to show learning rates/engagement/enrollment as course progresses
    • Tag significant events (course start/end, advertising events, assignment deadlines...)
  • Survey data integration, e.g. to measure learner satisfaction

How has reporting improved?

Lightweight Analytics sub-session

In this sub-session, we mostly discussed Figures: a Q&A on how it works, its current development status, and its goals moving forward. We also briefly discussed Open edX moving toward decoupling the LMS front end from the back end, using React for the front end.

This breakout session had a small group. 

Add yourself here if you want to be listed as having attended the sub-session:

Figures, lightweight analytics, what is it?

  • Analytics for small sites hosted on a single server, where Insights is out of the question. You can start with Figures and grow. Insights is great for course-specific analytics and can handle MOOC-size data; Figures is here to fill the gap.
  • Currently only working on devstack; it will soon be production ready.
  • It includes a React-based JavaScript single-page application and a reusable Django app plug-in (to minimize modifications to the platform), in line with the open/closed principle (open for extension but closed for modification, cf. Nimisha).
  • The target audience for Figures is people looking for high-level metrics at the site-wide, cross-course and per-course levels, such as learning program managers and administrators who need to make management decisions based on how their courseware is doing.
  • Figures gets data from the Django models and provides REST API endpoints for direct model access and aggregation
  • It takes daily snapshots of aggregate data, which can then be rendered as charts. A future enhancement could be a customizable time granularity (such as capturing data every N minutes).
  • Plan for doing a code walkthrough via hangout.
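
To make the daily-snapshot idea concrete, here is a minimal sketch in plain Python. It is not Figures' actual code: the `DailySnapshot` fields, the `take_snapshot` function and the tuple layouts are hypothetical stand-ins for what would really be Django model queries.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DailySnapshot:
    """One row of aggregate data for one course on one day (hypothetical shape)."""
    day: date
    course_id: str
    enrollment_count: int
    active_learner_count: int


def take_snapshot(day, course_id, enrollments, activity):
    """Aggregate raw records into a single snapshot for `day`.

    enrollments: iterable of (course_id, learner, enrolled_on) tuples
    activity:    iterable of (course_id, learner, active_on) tuples
    """
    # Everyone enrolled in the course on or before the snapshot day.
    enrolled = {
        learner
        for cid, learner, enrolled_on in enrollments
        if cid == course_id and enrolled_on <= day
    }
    # Everyone with at least one activity record on the snapshot day itself.
    active = {
        learner
        for cid, learner, active_on in activity
        if cid == course_id and active_on == day
    }
    return DailySnapshot(day, course_id, len(enrolled), len(active))
```

Storing one small row per course per day means a chart can be drawn from the snapshots alone, without re-querying the raw records each time it is rendered.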

What other metrics would we want or not want?

  • A list of courses and a per-course page, each providing metrics and charts.
  • We have existing metrics; what other metrics would we want or not want?
  • What demographics would we pull from registration or another external source?


  • Appsembler uses Ansible. EduNext too.
  • Kyoto U: Developing a management system with a dashboard for around 16 courses on edX. Notifications, course invites for new courses and a newsletter (msnses) for marketing. The concern is around data processing.

Real-time analytics while keeping a light-weight infrastructure?

  • Near-real time jobs using Celery
  • Does it need to be real time?
    • For the course authors, it'd be great to see progress on live assignments.
    • Marketing would also love it; in reality, near-real time is probably sufficient.
    • It might be useful for customer retention.
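
One way to keep near-real time jobs light is to have each run process only the records that arrived since the previous run. The sketch below shows that pattern in plain Python; the `IncrementalCounter` class and the record layout are hypothetical, and the scheduling itself (for example a periodic Celery task calling `run` every few minutes) is assumed rather than shown.

```python
from datetime import datetime


class IncrementalCounter:
    """Counts submissions per assignment, touching only new records on each run."""

    def __init__(self):
        self.counts = {}
        self.last_seen = datetime.min  # high-water mark of processed timestamps

    def run(self, submissions):
        """Process (submitted_at, assignment_id) pairs newer than the last run.

        Returns the number of new records processed.
        """
        new = [(ts, aid) for ts, aid in submissions if ts > self.last_seen]
        for _, assignment_id in new:
            self.counts[assignment_id] = self.counts.get(assignment_id, 0) + 1
        if new:
            self.last_seen = max(ts for ts, _ in new)
        return len(new)
```

Because each run only does work proportional to the new records, it can be scheduled often without much load. Note that this simple high-water mark would miss a late record carrying exactly the same timestamp as the last one processed; a real job would need a more robust cursor, such as an auto-incrementing id.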

Source of persistent data

  • Course enrollment
  • User profile
  • Number of learners per course
  • Courses per learner
  • Student module records, showing how many learners are active
  • Grade percentage
  • When did they do a section
  • Average learner progress aggregated
  • Generated certificates, showing how many course completions there are
  • etc.
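
Several of the metrics above can be derived from enrollment records alone. A minimal sketch, assuming enrollments are available as (learner, course) pairs and progress as a fraction between 0 and 1 per learner; the function names and data shapes are illustrative, not an existing API.

```python
from collections import Counter
from statistics import mean


def learners_per_course(enrollments):
    """Count distinct learners in each course from (learner, course) pairs."""
    return Counter(course for _, course in set(enrollments))


def courses_per_learner(enrollments):
    """Count distinct courses for each learner from (learner, course) pairs."""
    return Counter(learner for learner, _ in set(enrollments))


def average_progress(progress_by_learner):
    """Mean of per-learner progress fractions; 0.0 when there are no learners."""
    if not progress_by_learner:
        return 0.0
    return mean(progress_by_learner.values())
```

Converting the pairs to a set first means a duplicated enrollment record is counted only once.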


  • John, Jhony, Qasim are interested in working on this.

Ideas for new features?

  • Which learners are responding and how, plus aggregates
  • Timeline to see engagement rates and assignment deadlines
  • What the learners' goals are, with survey data included
  • Improve the problem response report in the instructor dashboard, both its data and its usability.