Insights & Analytics Pipeline sub-session
How does the Analytics Pipeline work?
How can we make Analytics Pipeline near-realtime?
- Use smaller, hourly partitions so jobs can run more frequently
- Stream data using Apache Spark (instead of Hadoop, Hive, Sqoop...). The edX Analytics team is in the process of doing this right now!
- Use a lighter-weight solution if direct MySQL queries are sufficient
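A minimal sketch of the hourly-partition idea follows. It assumes every event carries a timezone-aware timestamp. The `dt=.../hour=...` key layout and the function names are hypothetical and are not the pipeline's actual scheme. Once events are bucketed by hour, a scheduled job only needs to reprocess the newest partition instead of a whole day.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(ts: datetime) -> str:
    """Return a Hive-style hourly partition key (hypothetical layout) for an aware timestamp."""
    ts = ts.astimezone(timezone.utc)  # normalise to UTC so partitions don't depend on server locale
    return f"dt={ts:%Y-%m-%d}/hour={ts:%H}"


def partition_events(events):
    """Group (timestamp, payload) pairs into hourly buckets keyed by partition_key()."""
    buckets = defaultdict(list)
    for ts, payload in events:
        buckets[partition_key(ts)].append(payload)
    return dict(buckets)


if __name__ == "__main__":
    events = [
        (datetime(2018, 5, 29, 14, 5, tzinfo=timezone.utc), "problem_check"),
        (datetime(2018, 5, 29, 14, 50, tzinfo=timezone.utc), "play_video"),
        (datetime(2018, 5, 29, 15, 2, tzinfo=timezone.utc), "problem_check"),
    ]
    for key, payloads in sorted(partition_events(events).items()):
        print(key, payloads)
```

An hourly job would then read only the `hour=15` bucket once that hour closes, rather than rescanning the day.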
What reporting do we want?
- Dashboard for the blended learning use case (small classes, many copies of the "same" course)
- Data per learner
- Break down by class, subject, geography, organisation
- Timeline to show learning rates/engagement/enrollment as the course progresses
- Tag significant events (course start/end, advertising events, assignment deadlines...)
- Survey data integration, e.g. to measure learner satisfaction
How has reporting improved?
- Problem Response reports in the Instructor Dashboard have been made more readable and easier to use. See:
Lightweight Analytics sub-session
- Figures provides analytics for small sites hosted on a single server, where running Insights is out of the question. You can start with it and grow. Insights is great for course-specific analytics and can handle MOOC-size data; Figures is here to fill the gap.
- It currently only works on devstack, but will soon be production-ready.
- It consists of a JavaScript single-page application and a reusable Django app, which minimizes modifications to the platform. It is a plug-in, in line with the open/closed principle (open for extension, closed for modification; cf. Nimisha).
- It is a useful analytics tool for data scientists, or for anyone who needs to make management decisions based on how their courseware is doing.
- Figures gets its data from the platform's Django models. More endpoints will be built in the future.
- It takes daily snapshots of aggregate data, which can then be rendered as charts.
- A code walkthrough via Hangouts is planned.
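The daily-snapshot approach can be sketched in a standalone way. Figures reads the platform's Django models, but here plain dataclasses stand in for them. `Enrollment`, `Activity`, `CourseDailySnapshot` and `take_daily_snapshot` are all hypothetical names and do not reflect Figures' actual schema. Each run produces one row per course for the given day. That row holds the number of learners enrolled up to that day and the number of distinct learners active on it.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Enrollment:  # hypothetical stand-in for an enrollment model
    user_id: int
    course_id: str
    enrolled_on: date


@dataclass(frozen=True)
class Activity:  # hypothetical stand-in for per-day learner activity
    user_id: int
    course_id: str
    day: date


@dataclass(frozen=True)
class CourseDailySnapshot:  # hypothetical snapshot row, one per course per day
    course_id: str
    day: date
    enrollment_count: int
    active_learners: int


def take_daily_snapshot(day, enrollments, activity):
    """Aggregate enrollments up to `day` and distinct learners active on `day`, per course."""
    enrolled = Counter(e.course_id for e in enrollments if e.enrolled_on <= day)
    active = {}
    for a in activity:
        if a.day == day:
            active.setdefault(a.course_id, set()).add(a.user_id)  # a set de-duplicates repeat activity
    courses = sorted(set(enrolled) | set(active))
    return [
        CourseDailySnapshot(c, day, enrolled.get(c, 0), len(active.get(c, ())))
        for c in courses
    ]


if __name__ == "__main__":
    day = date(2018, 5, 29)
    enrollments = [Enrollment(1, "A", date(2018, 5, 1)), Enrollment(3, "B", date(2018, 5, 2))]
    activity = [Activity(1, "A", day), Activity(1, "A", day)]
    for row in take_daily_snapshot(day, enrollments, activity):
        print(row)
```

Storing these rows day after day gives the time series that the charts plot. Past days never need recomputing.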
What other metrics would we want or not want?
- A list of courses, and a course page that provides metrics and charts.
- We have existing metrics; which others would we want, or not want?
- Which demographics would we pull from registration or from another external source?
Deployment?
- Appsembler uses Ansible, and so does EduNext.
- Kyoto U: developing a management system with a dashboard for around 16 courses on edX, with notifications, invitations to new courses and newsletters (msnses) for marketing. Their main concern is data processing.
Real-time analytics while keeping a light-weight infrastructure?
- Near-real-time jobs using Celery
- Does it need to be real-time?
- For course authors, it would be great to see progress on live assignments.
- Marketing would also love it, though in reality near-real-time is probably sufficient.
- It might be useful for customer retention.
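Near-real-time numbers are possible without streaming infrastructure. One approach is to run a small job every few minutes, for example as a Celery periodic task, that only folds in events newer than the last timestamp it processed. The sketch below shows that incremental "watermark" pattern in plain Python. The class name and the `(timestamp, course_id)` event shape are hypothetical, and the Celery scheduling wiring is left out.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class IncrementalCounter:
    """Running per-course event counts, updated only with events newer than the watermark."""
    watermark: datetime = datetime.min  # timestamp of the newest event already counted
    counts: dict = field(default_factory=dict)

    def run(self, events):
        """Fold in (timestamp, course_id) events with timestamp > watermark; return how many were new."""
        new = [e for e in events if e[0] > self.watermark]
        for _ts, course_id in new:
            self.counts[course_id] = self.counts.get(course_id, 0) + 1
        if new:
            self.watermark = max(ts for ts, _ in new)  # advance so the next run skips these events
        return len(new)


if __name__ == "__main__":
    counter = IncrementalCounter()
    log = [(datetime(2018, 5, 29, 14, 0), "A"), (datetime(2018, 5, 29, 14, 1), "B")]
    print(counter.run(log), counter.counts)
    log.append((datetime(2018, 5, 29, 14, 5), "A"))
    print(counter.run(log), counter.counts)
```

This strict `>` watermark has a limitation. An event that arrives late with a timestamp at or before the watermark is silently skipped, so a production job would need a grace window or an idempotent re-count of the latest partition.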
Sources of persistent data
- Course enrollment
- User profile
- Number of learners per course
- Courses per learner
- Student module: how many learners are active
- Grade percentage
- When learners completed a section
- Aggregated average learner progress
- Generated certificates: how many course completions
- etc.
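Several of the metrics above can be computed with plain SQL aggregates, which also fits the "direct MySQL queries" option for lightweight setups. The sketch below runs against an in-memory SQLite database so it is self-contained. The three tables and their columns form a hypothetical, simplified schema, not the platform's real tables, and the `'issued'` certificate status is likewise an assumption.

```python
import sqlite3

# Hypothetical, simplified schema standing in for the platform's tables.
SCHEMA = """
CREATE TABLE enrollment (user_id INTEGER, course_id TEXT, is_active INTEGER);
CREATE TABLE course_grade (user_id INTEGER, course_id TEXT, percent_grade REAL);
CREATE TABLE certificate (user_id INTEGER, course_id TEXT, status TEXT);
"""

QUERIES = {
    # Number of (active) learners per course
    "learners_per_course": """
        SELECT course_id, COUNT(DISTINCT user_id) FROM enrollment
        WHERE is_active = 1 GROUP BY course_id""",
    # Courses per learner
    "courses_per_learner": """
        SELECT user_id, COUNT(DISTINCT course_id) FROM enrollment
        WHERE is_active = 1 GROUP BY user_id""",
    # Average grade percentage per course
    "avg_grade": """
        SELECT course_id, AVG(percent_grade) FROM course_grade GROUP BY course_id""",
    # Course completions, counted from issued certificates
    "completions": """
        SELECT course_id, COUNT(*) FROM certificate
        WHERE status = 'issued' GROUP BY course_id""",
}


def course_metrics(conn):
    """Run each aggregate query and return {metric_name: {key: value}}."""
    return {name: dict(conn.execute(sql).fetchall()) for name, sql in QUERIES.items()}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO enrollment VALUES (?, ?, ?)",
                     [(1, "A", 1), (2, "A", 1), (2, "B", 1), (3, "B", 0)])
    conn.executemany("INSERT INTO course_grade VALUES (?, ?, ?)",
                     [(1, "A", 0.5), (2, "A", 1.0)])
    conn.executemany("INSERT INTO certificate VALUES (?, ?, ?)",
                     [(2, "A", "issued"), (1, "A", "notpassing")])
    print(course_metrics(conn))
```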
Multi-tenancy
- John, Jhony and Qasim are interested in working on this.
Ideas for new features?
- Which learners are responding and how, plus aggregates
- Timeline showing engagement rates and assignment deadlines
- Learner goals, from survey data
- Improve the data and usability of the Problem Response report in the Instructor Dashboard.