Video Pipeline 1.0

UX Design

 https://edx-wiki.atlassian.net/wiki/display/UX/Video+Uploads+in+Studio

Assumptions

  1. Video Production process happens before course content creation.  The set of actors (video producers and course authors) may or may not be different individuals.
  2. Videos will need to be used across course runs and possibly across courses.
  3. An out-of-band manual handshake needs to occur between the VEDA team and the Course Team to properly set up YouTube channels, 3play (for transcript translation), and possibly accounting.
  4. Video Producers will upload multiple video files at once.

Video Module

One of the primary building blocks used to deliver course content on the edx platform is the Video Module (an xModule).  Course Authors add a video module to a unit in a course and configure it with various metadata such as video URLs, transcript files, and video download settings.  Currently, course authors manually enter URLs of all encodings of their video into the Video Module's Advanced editor.  The various encodings of a video could be a YouTube link, mp4 and webm versions, and various speeds (.75x, 1x, 1.25x, 1.5x).  While they can manually upload transcripts for the videos, the platform also automatically checks for the existence of transcripts at the given YouTube location. 

See the docs for more information.

VEDA (Video Encoding Processor)

VEDA is a Video Encoding Processing framework that, given a video file, spawns workers to create various video encodings and transcript translation jobs.  Theoretically, VEDA is course-agnostic since videos can be shared across multiple courses and multiple runs of a course.  It is currently not organization-agnostic since "institution"-specific out-of-band accounting and configuration steps must take place before VEDA can be used by an organization.

ARCH NOTE: The VEDA process does not (and need not) run on the same server cluster as the other edX servers, since all communications happen through RESTful APIs.

BEFORE Mobile

Before Mobile came into the picture, VEDA was an optional (paid?) service used by several partners.

VEDA Jobs

VEDA was responsible for the following encodings and jobs:

      • A. Uploading the high resolution file to YouTube (1080p)
      • B. Creating and uploading mp4 and webm versions to S3 (720p)
      • C. Sending a request for transcript translation to 3play (720p)

Configuring VEDA

Before using VEDA, edx partners must first work through (at least) the following steps with the VEDA team, directly or indirectly:

      1. Creating a VEDA-unique Institute Name.
      2. Setting up and properly configuring (for sftp) their YouTube channel.
      3. Setting up and enabling a 3play account.
      4. Creating an FTP account with correct credentials for uploading their videos.

Using VEDA

The process flow for using VEDA has been:

      1. Video producers upload their video file to the institute's folder on the FTP server.
      2. VEDA would create its various encodings and maintain the various video URLs.
      3. The VEDA team would email each institute a spreadsheet with the status and URLs of their various videos and encodings.
      4. The institute could also access a (publicly accessible) webpage with the status of their videos.
      5. The Course team would need to manually enter all the VEDA-supplied URLs into the appropriate video modules.

WITH Mobile via Studio

Now with Mobile in play, it is essential to create video encodings for the various supported screen sizes of our edx clients.  As we continue to add support for additional devices in the future (e.g., tablets), our list of required encodings may continue to increase.  As a result, having course teams manually enter and manage all the video URLs of the various encodings is no longer acceptable.  Hence these changes in the video pipeline.

VEDA Jobs

VEDA has been updated to create and upload the following additional encodings to S3 (in addition to A, B, C listed above):

      • D. Mobile High to S3 (640p)
      • E. Mobile Low to S3 (320p)
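
The full set of jobs (A through E) can be written down as a small lookup table. The sketch below is illustrative only: the `EncodingJob` class, the `VEDA_JOBS` list, and the `jobs_for_destination` helper are hypothetical names, not part of VEDA. The labels, destinations, and resolutions are the ones listed in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodingJob:
    label: str        # letter used in this document (A-E)
    description: str  # what the job produces or requests
    destination: str  # service that receives the output
    resolution: str   # resolution quoted in this document


# Jobs A-C predate Mobile; D and E were added for mobile clients.
VEDA_JOBS = [
    EncodingJob("A", "High-resolution upload", "YouTube", "1080p"),
    EncodingJob("B", "mp4 and webm versions", "S3", "720p"),
    EncodingJob("C", "Transcript translation request", "3play", "720p"),
    EncodingJob("D", "Mobile High", "S3", "640p"),
    EncodingJob("E", "Mobile Low", "S3", "320p"),
]


def jobs_for_destination(destination):
    """Return the labels of all jobs whose output goes to `destination`."""
    return [job.label for job in VEDA_JOBS if job.destination == destination]
```

For example, `jobs_for_destination("S3")` lists the jobs whose URLs course teams previously had to copy by hand for S3-hosted encodings.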

Configuring VEDA

Before using VEDA, edx partners still need to personally communicate and set up their accounts with the VEDA team (including YouTube and 3play).  So steps 1 to 3 under "Configuring VEDA" above still need to be done.

However, step 4 will no longer be needed since we are choosing to replace the FTP part of the process with a new Studio upload UI.

Using VEDA

The proposed new process flow for using VEDA would entail:

      1. Course PM creates a (shell) course in Studio.
        1. Enables a new "Video Upload Pipeline" Advanced Setting in the course and enters the Institute Name and corresponding Access Token (mutually agreed upon in step 1 under "Configuring VEDA" above).
        2. Adds the Video Producers to the Course Staff role in the new course in Studio.  (LATER: we can create a new "Video Producer" role with limited access to the course.)
        3. Communicates the Studio URL of the newly created Course to the Course's Video Producers.

      2. Video Producers log in and upload their file(s) through Studio into the specific Course.
        PROD NOTE: Having video producers know which video corresponds to which course is a new requirement.  Up until now, they only needed to know the institute to which a video belongs, not the course.  In the future, we should consider using the Content Library feature, which is external to a course.

        1. Studio automatically uploads the Raw Video file(s) to an institute-specific folder in an S3 bucket accessible by Studio and VEDA.
          1. (API:0@B_STUD) The Browser calls the Studio server with the list of filenames that the user wants to upload.

          2. The Studio server
            1. Creates the following metadata information for each of the files:
              1. edx video ID. An ID that uniquely identifies this logical video.
              2. User Supplied File Name.  The user-provided name of the file.
                PROD NOTE: We store the user-provided filename in the metadata instead of using it as the actual name of the uploaded file so (1) we can treat duplicate filenames as completely different videos and (2) we don't have to worry about unintended name collisions.
              3. Institute Name and Token.  The previously agreed-upon values as stored in the Studio Advanced Setting.
              4. Course Name.  The 'course' field of the (org, course, run) tuple information for the course that this file was uploaded in.
                ARCH NOTE: Although VEDA does not really need the Course Name (being course-agnostic), it currently uses the name to generate its own internal IDs.
            2. Places each file in a folder named after the Institute Name, using the edx video ID as the file's name.
               
            3. (DBW:1@STUD) Stores the edx-video-ids in the Asset Meta Data Store (newly minted by the Platform team) with category "video".
            4. (API:1@STUD_S3) Authenticates with S3 and requests upload URLs for each of the Video file(s).

          3. (API:2@STUD_VAL) Calls VAL with the Institute Name and list of edx-video-IDs generated for each uploaded file.
            1. (DBW:2@VAL) VAL creates new entries for the edx-video-IDs, with VAL status "Uploading".
            2. (API:3@VAL_VEDA) Optionally, VAL notifies VEDA to check the S3 bucket for newly incoming files.

          4. (API:4@B_S3) The Browser uploads the files directly to S3 with the server-supplied one-time-use URLs and corresponding metadata.  Here are examples of how this is done in ORA2:
            1. https://github.com/edx/edx-ora2/blob/master/openassessment/xblock/static/js/src/lms/oa_response.js#L483-483
            2. https://github.com/edx/edx-ora2/blob/master/openassessment/xblock/submission_mixin.py

        2. VEDA does the following for each new uploaded file found in the S3 bucket:
          1. NEW since Beta: (API:5@VEDA_VAL) Notifies VAL with the validity status of the uploaded file, including the file's edx-video-id (embedded in the metadata of the uploaded file).
            1. (DBW:3@VAL) VAL updates the entry for the video with the status.  VAL status: "File Valid" or "File Invalid"
          2. As before, spawns new processes for creating the various encodings and kicks off transcript translation jobs.
          3. Already in Beta:
            1. (API:6-10@VEDA_VAL) Notifies and updates VAL whenever one of the spawned processes completes.
              1. (DBW:4-8@VAL) VAL creates a new mapping between the video's corresponding edx-video-ID and the S3 URL for each new encoding.  VAL status: "In Progress"
            2. (API:11@VEDA_VAL) Notifies and updates VAL when all processes have completed.
              1. (DBW:9@VAL)  VAL updates the status accordingly.  VAL status: "Complete"

      3. Video Producers can check the status of their previously uploaded files within Studio.
        1. (DBR:1@VAL) Studio retrieves the list of edx-video-ids associated with the course from the Asset Meta Data Store.
        2. (API:12@STUD_VAL), (DBR:2@VAL) It queries VAL for the status of each edx-video-id in the course.
        3. It displays a table with metadata for each uploaded video, including its status, edx-video-ID, user-provided upload file name, date uploaded, and video time duration.

      4. Course Teams can also use the above table to see the list of uploaded videos.  They copy and paste the edX-video-ID from that table into the corresponding Video Module.
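
The Studio-side bookkeeping in step 2 (creating metadata and choosing where each file lands in S3) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Studio implementation: the function names, the metadata key names, and the ID format produced by `new_edx_video_id` are all hypothetical. Only the list of metadata fields and the "folder named after the Institute Name, file named by edx video ID" layout come from the steps above.

```python
import uuid


def new_edx_video_id():
    # Hypothetical ID format; the real scheme is not specified in this document.
    return "edx-vid-v1-" + uuid.uuid4().hex[:10]


def prepare_upload(file_name, institute_name, token, course_name):
    """Build the S3 key and metadata for one file the user wants to upload."""
    video_id = new_edx_video_id()
    return {
        # Folder is the Institute Name; the object name is the edx video ID,
        # so the user-supplied file name never affects the storage location.
        "s3_key": f"{institute_name}/{video_id}",
        "metadata": {
            "edx_video_id": video_id,
            "client_file_name": file_name,
            "institute_name": institute_name,
            "institute_token": token,
            "course_name": course_name,
        },
    }


def prepare_uploads(file_names, institute_name, token, course_name):
    """Prepare every file named in the browser's request (API:0)."""
    return [
        prepare_upload(name, institute_name, token, course_name)
        for name in file_names
    ]
```

Because the storage key is derived from a freshly generated ID, two uploads with the same user-supplied file name get distinct keys, which is the behavior the PROD NOTE in step 2 asks for.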


edX Video ID

An edX-Video-ID uniquely identifies a logical video and provides an abstraction over the locations of the individual encodings of the video.

For example, an edx-Video-ID of "edx-vid-v1-j3ui8jsdf0" could refer to the following files for its various encodings:

VEDA's Video ID

Currently, VEDA generates its own ID for keeping track of videos.  For now, VEDA requires a hard-coded structure of its ID.


VAL (Video Abstraction Layer)

VAL is a course-agnostic abstraction layer that is used as a storage and querying service that manages the mapping between edx-Video-ID and Video encoding files.  It now lives in the edx-platform and has RESTful APIs that are called by VEDA and the platform. 
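
To make the mapping and the status values concrete, here is a toy in-memory stand-in for VAL. It is a sketch only: the class and method names are hypothetical, the real service is reached through RESTful APIs and backed by a database, and the profile name and URL used in the usage note are placeholders. The status strings are the ones used in the process flow in this document.

```python
class VideoAbstractionLayer:
    """Toy in-memory stand-in for VAL (hypothetical; not the real API)."""

    def __init__(self):
        self._videos = {}

    def create(self, edx_video_id):
        # DBW:2 - a new entry starts with status "Uploading".
        self._videos[edx_video_id] = {"status": "Uploading", "encodings": {}}

    def set_validity(self, edx_video_id, is_valid):
        # DBW:3 - VEDA reports whether the uploaded file is usable.
        status = "File Valid" if is_valid else "File Invalid"
        self._videos[edx_video_id]["status"] = status

    def add_encoding(self, edx_video_id, profile, url):
        # DBW:4-8 - one call per finished encoding.
        video = self._videos[edx_video_id]
        video["encodings"][profile] = url
        video["status"] = "In Progress"

    def mark_complete(self, edx_video_id):
        # DBW:9 - all spawned processes have finished.
        self._videos[edx_video_id]["status"] = "Complete"

    def get(self, edx_video_id):
        # DBR:2 - status and encoding URLs for one logical video.
        return self._videos[edx_video_id]
```

A caller would `create` an ID when the upload starts, let VEDA call `set_validity` and `add_encoding("mobile_low", "https://example.com/low.mp4")` as jobs finish, and read the result back with `get`. Note that nothing in this structure refers to a course, which is what keeps VAL course-agnostic.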

Although a single VAL instance could theoretically be used by multiple edx deployments (e.g., Stage, Prod, Edge, Sandboxes), it is currently tied to a deployment.  This means, for example, the VAL instance on Stage would not have edx-Video-ID data for videos that were uploaded on Prod.  PROD NOTE: Because of this deployment configuration, we must change course XML import/export to include VAL data along with the course content.  This will be a backward-incompatible change since old servers will not know that the VAL data is included and needs to be imported.

Mobile API Endpoint

The Mobile API endpoint needs to return all the videos that are accessible to a given user in a given course.  In order to make this scalable, the API will call the get_items method on the modulestore, which may need to be optimized to return the precise video information needed by the API.  This approach is different than what is currently implemented in Beta.  Currently, in Beta, VAL maintains a table of course-id to edx-video-id mappings in addition to maintaining a mapping between edx-video-id and encodings.  The proposal here is to have the API instead query the modulestore to retrieve all video descriptors with an edx-video-id configured within a course.  The benefits of this change are:

  • Allows VAL to once again be course-agnostic as it was before.
  • Aligns better with the new shift in our modulestore design (introduced by Split) where content is stored independently of course structure.
  • Supports rerunning of a course.
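
The proposed lookup can be sketched with a stand-in modulestore. Everything here is an assumption made for illustration: `FakeModulestore`, `VideoDescriptor`, and `video_ids_for_course` are hypothetical, and the `get_items` signature is simplified rather than copied from the platform. The point is only that the course-to-video relationship is derived from course content instead of being stored in VAL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VideoDescriptor:
    display_name: str
    edx_video_id: Optional[str] = None  # unset for manually configured videos


class FakeModulestore:
    """Stand-in for the modulestore; the signature is simplified."""

    def __init__(self, items_by_course):
        self._items_by_course = items_by_course

    def get_items(self, course_key):
        return list(self._items_by_course.get(course_key, []))


def video_ids_for_course(store, course_key):
    """Return the edx-video-ids configured on video modules in a course."""
    return [
        descriptor.edx_video_id
        for descriptor in store.get_items(course_key)
        if descriptor.edx_video_id
    ]
```

The returned IDs would then be passed to VAL to fetch encodings. A rerun of a course carries its video modules, and therefore its edx-video-ids, along with it, so no per-course table in VAL needs updating.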

Performance Considerations

The following areas will need to be carefully optimized for scalability and performance:

  • Uploading multiple video files to S3, including API:1@STUD_S3 and API:4@B_S3.
  • Database queries by VAL to retrieve data for a given set of edx-video-ids (DBR:2@VAL).  API:12@STUD_VAL should support pagination.
  • Querying the modulestore for a list of video modules with edx-video-ids in a given course.
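
As one way to picture pagination for API:12@STUD_VAL, the sketch below slices a list of edx-video-ids into 1-based pages. The `paginate` function and the shape of the returned dictionary are hypothetical; this document does not define the response format.

```python
def paginate(items, page, page_size):
    """Return one 1-based page of `items` plus paging information."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size
    # Ceiling division: a partially filled last page still counts as a page.
    num_pages = (len(items) + page_size - 1) // page_size
    return {
        "results": items[start:start + page_size],
        "page": page,
        "num_pages": num_pages,
        "count": len(items),
    }
```

With paging, Studio only asks VAL for the statuses of the videos currently shown in the table, so the cost of a request stays bounded as a course accumulates uploads.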

Advanced Setting

 

There is a new advanced setting that needs to be configured in order to enable the Video Uploads page in Studio.

{
    "course_video_upload_token": "xxx"
}
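
A check for this setting might look like the following sketch. It assumes, without confirmation from this document, that any non-empty string token enables the page; the `video_upload_enabled` function is hypothetical and only the `course_video_upload_token` key comes from the example above.

```python
import json


def video_upload_enabled(advanced_settings_json):
    """Return True if the advanced settings carry a usable upload token."""
    settings = json.loads(advanced_settings_json)
    token = settings.get("course_video_upload_token")
    # Assumption: a missing, non-string, or blank token means "disabled".
    return isinstance(token, str) and token.strip() != ""
```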