edx-video-pipeline (VEDA)


Feature Overview

Category: Video Service
Documentation:
User Impact: As a course instructor I am able to encode my course videos for delivery across all edX viewing platforms.

The edX video pipeline is an independently deployable application (IDA) for video encoding and delivery, intended to provide rapid, scalable course video encoding across all edX viewing platforms. edx-video-pipeline handles all video uploaded through the edX Studio course video upload page for edx.org courses, all content generated by the in-house post-production department, and all course marketing videos.

edx-video-pipeline delivers to YouTube, AWS, and several transcription services. There is also a limited 'review/revision' workflow for in-house productions, which can be triggered to complete via the video production planning software. Each individual workflow is completely modular, and each course team can deliver to specific endpoints (or accounts within endpoints) independently. In addition, edx-video-pipeline handles marketing video automation (as much as is possible given the current 'about page' workflow) via a native upload tool.

edx-video-pipeline and edx-video-worker encode video with a forked build of ffmpeg. A central node acts as the interface to the various intake and delivery providers, and a cluster of Celery workers performs the actual encoding jobs. Effectively, it is a Django app that sends tasks to be consumed by a cluster of Celery workers.
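
As a rough illustration of the split between the central node and the workers, the sketch below fans one uploaded file out into one ffmpeg invocation per encode profile. It is a minimal sketch, not the pipeline's actual code: `EncodeProfile`, `build_ffmpeg_command`, `dispatch`, and the profile values are all hypothetical, plain functions stand in for Celery tasks, and stock ffmpeg options are assumed rather than anything specific to the forked build.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EncodeProfile:
    """Hypothetical encode profile; names and values are illustrative only."""
    name: str
    height: int        # output frame height in pixels
    video_kbps: int    # target video bitrate
    audio_kbps: int    # target audio bitrate


def build_ffmpeg_command(source: str, dest: str, profile: EncodeProfile) -> List[str]:
    """Worker side: return the argv list that could be passed to subprocess.run()."""
    return [
        "ffmpeg",
        "-y",                                 # overwrite the output if it exists
        "-i", source,
        "-c:v", "libx264",
        "-b:v", f"{profile.video_kbps}k",
        "-vf", f"scale=-2:{profile.height}",  # keep aspect ratio, force an even width
        "-c:a", "aac",
        "-b:a", f"{profile.audio_kbps}k",
        dest,
    ]


def dispatch(source: str, profiles: List[EncodeProfile]) -> List[List[str]]:
    """Central-node side: turn one upload into one encode task per profile."""
    stem = source.rsplit(".", 1)[0]
    return [
        build_ffmpeg_command(source, f"{stem}_{p.name}.mp4", p)
        for p in profiles
    ]
```

In a Celery deployment, each command returned by `dispatch` would instead be the payload of a task sent to the worker queue.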

Pipeline:



Milestone: HLS

HLS (HTTP Live Streaming) is a method of delivering video to users wherein the video player uses connection speed to determine the best-quality stream that is deliverable with no, or minimal, buffering. The video quality is re-evaluated at set time intervals (usually 10 seconds) as the video is played back, improving or degrading as the connection speed rises or falls.

Simply put, it matches video quality to connection speed, and adjusts as the connection changes.
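
The matching of quality to connection speed can be sketched in a few lines. The example below builds an HLS master playlist from a bitrate ladder and picks the rendition a player might choose for a measured bandwidth. The `Variant` type, the ladder values, and the 0.8 safety factor are illustrative assumptions; real players use their own selection heuristics.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    bandwidth_bps: int  # peak bits per second the rendition needs
    resolution: str     # e.g. "1280x720"
    uri: str            # media playlist for this rendition


def master_playlist(variants: List[Variant]) -> str:
    """Render a master playlist listing every rendition, lowest bitrate first."""
    lines = ["#EXTM3U"]
    for v in sorted(variants, key=lambda v: v.bandwidth_bps):
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={v.bandwidth_bps},RESOLUTION={v.resolution}"
        )
        lines.append(v.uri)
    return "\n".join(lines) + "\n"


def pick_variant(variants: List[Variant], measured_bps: float, safety: float = 0.8) -> Variant:
    """Pick the highest-bitrate rendition that fits within the bandwidth budget.

    Falls back to the lowest rendition when nothing fits, so playback can
    continue (with buffering) on very slow connections.
    """
    budget = measured_bps * safety
    affordable = [v for v in variants if v.bandwidth_bps <= budget]
    if affordable:
        return max(affordable, key=lambda v: v.bandwidth_bps)
    return min(variants, key=lambda v: v.bandwidth_bps)


# Hypothetical four-step ladder.
LADDER = [
    Variant(400_000, "416x234", "low.m3u8"),
    Variant(1_200_000, "640x360", "mid.m3u8"),
    Variant(2_500_000, "1280x720", "high.m3u8"),
    Variant(5_000_000, "1920x1080", "full.m3u8"),
]
```

A player would call something like `pick_variant` again at each segment boundary, which is what produces the periodic quality adjustment described above.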


HTTP Live Streaming milestones are broken down into 'phases' to match concurrent work being done in edx-platform on both the video player and within Studio. 

Current State:

Upon creating a new course instance in Studio, individual course teams must ask a support team member (via email or support ticket) to enable the video upload page. Each course has a custom workflow, an individual associated YouTube channel, and an optional transcription service provider. All of these options (and associated authentication credentials) are handled by edx-video-pipeline. In addition, each partner institution has a YouTube partner account that requires manual intervention from Google to implement. Prior to October 2016, edX enjoyed a concierge level of support from Google, and was able (with the assistance of a dedicated Media Support Specialist) to enable these institutions and courses in relatively short order.

As of October 2016, Google has terminated this relationship with edX, and the dedicated Media Support Specialist position has been vacated.

New partners have since been added with support provided in an ad-hoc fashion, though that has proven to be exceedingly difficult (see: OxfordU Jan 2017). Existing partners can continue using the existing workflow, though manual intervention from edX support is still needed for each new course because of the requirement for a dedicated associated YouTube channel. This pain has been alleviated somewhat by a machine-learning-enabled tool that replaces a large portion of the work previously done by the Media Support Specialist; however, the process is still somewhat manual for support teams and requires a specific request from course teams.

The YouTube enablement work cannot be further automated without violating the YouTube Terms of Service.

The actual workflow to enable the video upload tool is as follows:

  1. Enable Studio course instance (edX staff)
  2. Activate dedicated YouTube channel (course team)
  3. If new partner, request YouTube partner CMS from Google (edX staff, Google staff)
  4. Associate course YouTube channel with YouTube partner CMS (edX staff)
  5. Generate edx-video-pipeline token (edX staff)
    1. Add associated workflow information (review process, transcription)
  6. Input edx-video-pipeline token into Studio Advanced Settings, enabling video upload tool (edX staff)

The current goal is to eliminate steps 2-6 through a series of compartmentalized upgrades to edx-video-pipeline and edx-platform in parallel with work being done to enable HLS. The plan is currently comprehensive for Phase I, with more ambiguity around phases II and III allowing for improvisation and an updated course of action as more data is gathered.



Phase I:

  • Enable HLS

  • Automate Video Upload Enablement in Studio

  • Begin phase-out for edx-video-pipeline Transcription workflow

Category: Feature
Documentation: TBD
User Impact:

As a learner I am able to watch content optimized for my level of bandwidth access, whether or not I am in a YouTube-embargoed country.

As a course team member I am able to deliver content across viewing platforms in an automated way with minimal intervention and without having to contact a support team for assistance. If I have a course currently in production, I will not experience a change in my workflow, and if I am an existing partner, I can request a customized workflow.

Proposal: 'Default Mode'

Enable a 'default' workflow in Studio, which is an HLS/Mobile only workflow with no customizations. Currently running courses would remain unchanged, and this workflow could be overridden by a custom edx-video-pipeline token. Every new course instance would have this default token, and new partners would not have the option of overriding this default workflow. edx-video-pipeline would then not differentiate between courses/institutions for this default workflow, instead tracking videos as individual assets rather than attempting to assign custom workflows based on course instances. 

This new 'default' workflow would be an HLS and mobile only workflow, not delivering to Youtube or transcription services. This has a very short time-to-value, and, in addition, requires very little (if any) advance socialization with our partners. We could slip this feature in fairly quietly with no disruption to current 'in-production' courses, and monitor support ticket volume.
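
The override rule in this proposal can be sketched as a simple lookup: a course with a custom token gets its custom workflow, and everything else gets the shared default. The `Workflow` fields, the `DEFAULT_TOKEN` value, and the choice to fall back to the default for unknown tokens are all assumptions made for illustration, not the pipeline's actual data model.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Workflow:
    """Hypothetical description of where a video is delivered."""
    deliver_hls: bool = True
    deliver_mobile: bool = True
    deliver_youtube: bool = False
    transcription_provider: Optional[str] = None


DEFAULT_TOKEN = "default"      # hypothetical shared token value
DEFAULT_WORKFLOW = Workflow()  # HLS and mobile only, no customizations


def resolve_workflow(course_token: Optional[str], custom: Dict[str, Workflow]) -> Workflow:
    """Return the custom workflow for a known custom token, else the default.

    Assumption: missing or unrecognized tokens fall back to the default
    rather than raising, so new courses work without any setup.
    """
    if course_token and course_token != DEFAULT_TOKEN and course_token in custom:
        return custom[course_token]
    return DEFAULT_WORKFLOW
```

Under this sketch, currently running courses keep their existing entry in `custom`, while every new course resolves to `DEFAULT_WORKFLOW` and is tracked as an individual asset.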

In addition, HarvardX is our only partner currently availing itself of our transcription workflow, and we should begin discovery around whether or not we wish to automate this workflow further. My recommendation, if we wish to automate, is a Studio-based solution built around a subset of known vendors and a simplified file-handling workflow. Most vendors will push eagerly to an endpoint (such as AWS S3) when transcripts are ready, and there are opportunities for real improvement in this area.

Work needed: Studio

  • Changes to the Studio advanced settings template
    • Add a 'default' shared token, automatically enabling the video upload page for all new courses.
  • Thumbnail handling for individual videos
    • Optional video thumbnail upload
    • Selection of one-of-three automatically generated (by edx-video-pipeline) thumbnails served from the video streaming S3 bucket.
  • edx-platform video player HLS playback
  • Transcription workflow discovery.
    • Infrastructure plan
    • Vendor acquisition, planning
    • UX changes
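
The one-of-three thumbnail option listed above could be driven by frame grabs at evenly spaced points in the video. The sketch below builds the ffmpeg commands for that; the function name, the even spacing, and the output filenames are assumptions, and only standard ffmpeg options (`-ss`, `-i`, `-frames:v`) are used.

```python
from typing import List


def thumbnail_commands(source: str, duration_seconds: float, count: int = 3) -> List[List[str]]:
    """Build one ffmpeg frame-grab command per candidate thumbnail.

    Offsets are spaced evenly inside the video (for count=3: 25%, 50%, 75%)
    so that the first and last frames, often black or title cards, are skipped.
    """
    if duration_seconds <= 0 or count < 1:
        raise ValueError("duration must be positive and count at least 1")
    commands = []
    for i in range(1, count + 1):
        offset = duration_seconds * i / (count + 1)
        commands.append([
            "ffmpeg", "-y",
            "-ss", f"{offset:.2f}",  # seek before decoding the input
            "-i", source,
            "-frames:v", "1",        # write a single video frame
            f"thumb_{i}.jpg",
        ])
    return commands
```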

Work Needed: edx-video-pipeline

  • Activate HLS encoding
    • Test and upgrade validation for failed or suboptimal encodes
    • Determine optimal object invalidation for CloudFront-cached transport stream manifests, in the case of failed encode streams.
      • Option A: Cache rename (check header)
      • Option B: Max-Life
  • Encode and revalidate legacy video objects
    • Encode for HLS
    • (optional) Re-encode legacy streams, optimize object storage
    • Delete unused objects from AWS
  • Upgrade operational support
    • Terraform plan upgrade
    • NewRelic alerting upgrade
  • Enable shared-token default course workflow
    • Track new course objects via VAL ID
  • Optimize database tables
  • Video thumbnail workflow/discovery
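
The two manifest invalidation options in the list above can be compared with a small sketch. It assumes Option A means publishing a re-encoded manifest under a new, content-derived object key (so the stale cached copy is simply never requested again) and Option B means capping cache lifetime with a `Cache-Control` header; both readings, and every name below, are illustrative assumptions rather than the pipeline's design.

```python
import hashlib
from typing import Dict


def versioned_manifest_key(video_id: str, manifest_body: bytes) -> str:
    """Option A (as assumed here): derive the object key from the manifest content.

    A re-encode produces a different body, hence a different key, so the CDN
    treats it as a new object and no explicit invalidation is needed.
    """
    digest = hashlib.sha256(manifest_body).hexdigest()[:12]
    return f"{video_id}/{digest}/master.m3u8"


def max_life_headers(max_age_seconds: int) -> Dict[str, str]:
    """Option B (as assumed here): let cached manifests expire after a fixed lifetime."""
    if max_age_seconds < 0:
        raise ValueError("max_age_seconds must be non-negative")
    return {"Cache-Control": f"public, max-age={max_age_seconds}"}
```

The trade-off in this sketch is that renaming requires the player to be told the new key, while a maximum lifetime keeps keys stable but leaves a failed stream visible until the cached copy expires.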

Work Needed (as prep for Phase II):

  • Improved tracking for encode product/resultant parameters 
  • Improved stats for video overhead/bandwidth switching.
  • Socialization of youtube deprecation, socialization of edx-video-pipeline transcription workflow deprecation.

Unanswered Questions:

  • What will be the reduction in support ticket volume from this one step?
  • Do our partners get value from having their videos on youtube?
  • What is the optimal number of transport streams (bandwidth switching options)?



Phase II:

  • Deprecate Youtube

  • Deprecate transcription services

  • MAYBE: Staged rollout of Automated Transcription Services.

Category: Feature
Documentation: TBD
User Impact:

As a learner I am able to watch content optimized for my level of bandwidth access, whether or not I am in a YouTube-embargoed country.

As a course team member I am able to deliver content across viewing platforms in an automated way with NO manual intervention and without having to contact a support team for assistance.

With the work in place from Phase I, very little new building needs to occur in edx-video-pipeline. As we stop uploading and validating videos to/from YouTube, that encode stream and module is simply deprecated and eventually deleted. There are opportunities to do some further database cleaning, and some deleting of code from the edx-video-pipeline repo. Some discovery around the marketing video roadmap should be done as well.

One note of special importance: as the manually entered 'unique-key' is the method by which individual videos are routed to their respective transcription providers, some consideration should be given to what the future intention for transcription services is. If we decide to automate transcription services completely, much of that infrastructure will need to be in place by the end of work for this phase, allowing for a reasonable and staged rollout, as well as minimal service disruptions for existing partners.

Work Needed: Studio

  • Deactivate course token metadata
  • Deactivate Studio Advanced Setting for video upload token
  • IF WE DECIDE TO AUTOMATE TRANSCRIPTION:
    • Simplified transcription upload workflow and file handling 
      • AWS S3 storage/serving of static transcript assets
      • Eager push of completed assets from vendors
    • Credential storage and handling.
    • UX changes

Work Needed: edx-video-pipeline

  • Deactivate token validation from edx-studio
  • Deactivate youtube encoding and youtube upload
  • Drop course table
  • (Optional) Enable ML Encode optimization
  • Deactivate Transcription workflow

Work Needed (as prep for Phase III):

  • Generate and test ML algorithm for determining optimal encoding schema.

Unanswered Questions:

  • What is the operational need for transcription automation?
  • Should we handle non-English transcriptions? How many languages and service providers should we interface with?



Phase III:

Proposal 1: Decentralizing edx-video-pipeline

If we were to completely eliminate custom workflows, with no service to YouTube or transcription providers, edx-video-pipeline could become a decentralized cluster of workers, consuming a queue of tasks generated by Studio uploads. A separate Django instance dedicated to edx-video-pipeline would be redundant, and we could eliminate the central edx-video-pipeline control node entirely, which would greatly reduce both operational and code complexity. This would, in addition, create opportunities to easily bundle edx-video-pipeline with devstack and open instances, and allow for an easily scaled edx-video-pipeline 'swarm' to consume the video tasks generated by Studio.
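
The 'swarm' idea can be sketched with nothing but a shared queue and interchangeable workers: there is no control node, and adding capacity means starting more workers. The sketch below uses in-process threads and `queue.Queue` purely as stand-ins for a real message broker and worker machines; `run_swarm` and the `encode` callable are hypothetical names.

```python
import queue
import threading
from typing import Callable, List


def run_swarm(video_ids: List[str], encode: Callable[[str], str], workers: int = 3) -> List[str]:
    """Drain a queue of upload tasks with identical, independent workers."""
    pending: "queue.Queue[str]" = queue.Queue()
    for video_id in video_ids:
        pending.put(video_id)  # in this proposal, Studio uploads would enqueue these

    results: List[str] = []
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                video_id = pending.get_nowait()
            except queue.Empty:
                return  # queue drained, worker exits
            output = encode(video_id)
            with lock:  # protect the shared results list
                results.append(output)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because every worker runs the same code against the same queue, every video receives the same treatment, which is also the source of the drawbacks discussed next.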

Potential Drawbacks:

We would not be able to create customized workflows for course instances: every course would get the same video pipeline treatment, and we would not be able to provide any level of custom content handling for partner institutions. In addition, the current marketing video workflow could not be supported without work to create a dedicated marketing video node.