Epic

 Phase 1.A 

Jira: EDUCATOR-610 (openedx.atlassian.net)

...

  • video_id: the edx_video_id, or an ID from an external source, e.g. an external YouTube ID
  • language_code: the transcript's language code
  • provider: 3PlayMedia / Cielo24 / custom upload
  • format: the transcript's format, srt or sjson
  • transcript (CustomizableFileField): the transcript's file name in S3 (the URL is generated at run time)

WARNING: Changing the transcript for a video in one course changes it for every edX course that uses the same video.
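The warning follows from how transcripts are keyed: by video and language, not by course. The minimal sketch below assumes a plain dataclass as a stand-in for the VAL model (it is not the actual Django model), and the in-memory store and helper functions are hypothetical. It shows two courses writing to the same entry.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class VideoTranscript:
    """Illustrative stand-in for VAL's VideoTranscript model (not the Django model)."""
    video_id: str       # edx_video_id or an external id, e.g. a YouTube id
    language_code: str  # e.g. "en"
    provider: str       # "3PlayMedia", "Cielo24" or "Custom"
    format: str         # "srt" or "sjson"
    transcript: str     # file name in S3; the URL is generated at run time


# Hypothetical in-memory store. The key contains no course_id, so every
# course that embeds the same video reads and writes the same entry.
store: Dict[Tuple[str, str], VideoTranscript] = {}


def save_transcript(t: VideoTranscript) -> None:
    store[(t.video_id, t.language_code)] = t


def get_transcript(video_id: str, language_code: str) -> VideoTranscript:
    return store[(video_id, language_code)]


# Course A uploads an English transcript for a shared video ...
save_transcript(VideoTranscript("vid-1", "en", "Custom", "srt", "vid-1-en.srt"))
# ... course B replaces it, and course A now sees course B's file too.
save_transcript(VideoTranscript("vid-1", "en", "Custom", "srt", "vid-1-en-v2.srt"))
```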

TranscriptPreference

Course-specific transcript preferences are saved here.

  • course_id
  • provider
  • cielo24_fidelity
  • cielo24_turnaround
  • three_play_turnaround
  • preferred_languages
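A sketch of the preference record, assuming a dataclass stand-in for the model. The provider strings and the turnaround() helper are hypothetical; the helper illustrates an assumption that only the selected provider's settings are meaningful.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TranscriptPreference:
    """Illustrative stand-in for the course-level TranscriptPreference model."""
    course_id: str
    provider: str                               # "Cielo24" or "3PlayMedia" (assumed values)
    cielo24_fidelity: Optional[str] = None
    cielo24_turnaround: Optional[str] = None
    three_play_turnaround: Optional[str] = None
    preferred_languages: List[str] = field(default_factory=list)

    def turnaround(self) -> Optional[str]:
        """Hypothetical helper: return the turnaround for the selected provider.

        Assumption: settings belonging to the other provider are ignored.
        """
        if self.provider == "Cielo24":
            return self.cielo24_turnaround
        if self.provider == "3PlayMedia":
            return self.three_play_turnaround
        raise ValueError(f"unknown provider: {self.provider!r}")
```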

TranscriptCredentials

Organization-wide third-party credentials live here.

...

In Phase 1.A, we decided to use the video pipeline's Django admin to collect third-party credentials. This is not ideal because course teams do not have access to the pipeline's admin. This phase focuses solely on addressing that concern, which Phase 1.A left open.

...

  • In this phase, we'll add a new UX on the Video Uploads page to collect third-party credentials from course teams.
  • These credentials will be organization-specific, and all courses under a particular org will use them for the transcription process.
  • Third-party credentials will be stored in the same data model, TranscriptCredentials, that we defined in Phase 1.A.
  • The platform needs to communicate with the video pipeline to store these credentials in the pipeline.
    • These server-to-server requests are authenticated with JWT.
  • We won't store the credentials themselves in the platform/VAL, so that the same credentials don't live in multiple places. Instead, VAL will cache only the credentials state (whether credentials exist).
  • We won't show saved credentials, to reduce reads, but course teams can update them at any time without seeing the existing ones.
  • The credentials should be encrypted, as required by the edX legal department.
  • We will use django-fernet-fields (which is built on Fernet), as recommended in Storing (3rd party) Secrets.
  • Fernet key rotation follows our Fernet Keys Rotation Policy.
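The page states only that platform-to-pipeline requests use JWT authentication; the algorithm, secret, and claims are not specified. The minimal sketch below assumes HS256 with a secret shared by Studio and the pipeline, and it builds and checks the token with the standard library so the three token parts are visible. The claim values and secret are hypothetical, and a production service would normally rely on a maintained JWT library instead.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    # JWTs use base64url encoding with the "=" padding stripped.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # Restore the stripped padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Build a compact HS256 JWT: base64url(header).base64url(claims).signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def verify_hs256(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Verify signature, algorithm and expiry; return the claims or raise ValueError."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    signing_input = (header_b64 + "." + payload_b64).encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many signature bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", 0) <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return claims
```

In this sketch, Studio would sign a short-lived token (for example, with "iss" and "exp" claims) and send it with the credentials request. The pipeline would call verify_hs256 before storing anything.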

...

OrganizationTranscriptCredentialsState – VAL

  • org – the short organization name
  • provider – Cielo24 or 3PlayMedia
  • exists – True or False
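The state model lets Studio show whether an org has credentials without VAL ever holding the secrets. In the sketch below, the dataclass stand-in, the in-memory table, and the send_to_pipeline callable are all hypothetical. The callable represents the JWT-authenticated request to the pipeline, and only its outcome is cached in VAL.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class OrganizationTranscriptCredentialsState:
    """Illustrative stand-in for the VAL model: it records whether credentials
    exist for an org/provider pair, never the credentials themselves."""
    org: str       # short organization name, e.g. "edX"
    provider: str  # "Cielo24" or "3PlayMedia"
    exists: bool


# Hypothetical in-memory table keyed by (org, provider).
states: Dict[Tuple[str, str], OrganizationTranscriptCredentialsState] = {}

# Hypothetical stand-in for the JWT-authenticated request to the pipeline;
# it returns True once the pipeline has stored the credentials.
SendToPipeline = Callable[[str, str, Dict[str, str]], bool]


def update_credentials(org: str, provider: str, credentials: Dict[str, str],
                       send_to_pipeline: SendToPipeline) -> OrganizationTranscriptCredentialsState:
    """Forward the secrets to the pipeline and cache only the outcome in VAL."""
    if not send_to_pipeline(org, provider, credentials):
        # Leave the cached state untouched if the pipeline rejects the update.
        raise RuntimeError("pipeline did not store the credentials")
    state = OrganizationTranscriptCredentialsState(org, provider, exists=True)
    states[(org, provider)] = state
    return state


def credentials_exist(org: str, provider: str) -> bool:
    """What Studio would check to show 'credentials provided' without reading secrets."""
    state = states.get((org, provider))
    return state.exists if state else False
```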

To release this phase, we are setting up the video pipeline via Ansible, just like other IDAs. This also brings in tools such as Splunk and New Relic, which will make debugging easier and help keep the testing and production environments in sync. Those details are outside the scope of this document.

...

After this phase, transcripts will preferably be looked up in S3; if they are not found there, the contentstore will serve as a fallback until all the data is migrated to S3. After this phase, transcripts will only be served from VAL's configured storage (e.g. S3), and the contentstore will not be used, not even as a fallback. We will need to migrate the contentstore transcripts to S3 before rolling this phase out. Transcript migration is covered in more detail on the Migration from contentstore to S3 page.
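The paragraph describes two lookup behaviors. One checks S3 first and falls back to the contentstore while migration is in progress. The other uses VAL's configured storage only. The sketch below models the difference as a hypothetical allow_contentstore_fallback flag, and the two fetcher callables stand in for the real storage backends.

```python
from typing import Callable, Optional

# Hypothetical fetcher signature: (video_id, language_code) -> content or None.
Fetcher = Callable[[str, str], Optional[bytes]]


def get_transcript_content(video_id: str, language_code: str,
                           fetch_from_val_storage: Fetcher,
                           fetch_from_contentstore: Fetcher,
                           allow_contentstore_fallback: bool) -> Optional[bytes]:
    """Look the transcript up in VAL's configured storage (e.g. S3) first.

    While migration is in progress the flag is True and a miss falls back to
    the contentstore; once migration is done the contentstore is never read.
    """
    content = fetch_from_val_storage(video_id, language_code)
    if content is not None or not allow_contentstore_fallback:
        return content
    return fetch_from_contentstore(video_id, language_code)
```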


Course Import/Export (Transcript S3 URLs)

Info: Transcript Export/Import

This section covers transcript import/export that includes transcript S3 URLs. It is deprecated following the decision to include transcript content in the course OLX (Jira: EDUCATOR-2233).

Please have a look at the comments section:

Regarding export/import, please see Transcript import/export scenarios.  At this time, we do not need to export the Transcript content along with the course's OLX.  The behavior should be the same as it is today for export/import of VAL-produced videos.  That is, only the VAL metadata needs to be exported with the course.  The assets (transcript content and video content) do not need to be exported along with the course.

The plan is to keep transcript metadata in the course's OLX on export. On import, we will not allow overwriting (or creating) any transcripts if the related video is already present in the Video data model.


Export

On a course export, video transcript metadata, including the URLs of the transcript files, will be exported with the course. The original transcript files will not be exported with the course OLX; they will continue to live in the configured storage (for example, S3, local, or other).
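The sketch below shows an export that writes only metadata and file URLs. It uses xml.etree.ElementTree, and the element and attribute names are hypothetical, not the actual OLX schema. No transcript content is written, so the files themselves stay in the configured storage.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Mapping


def export_transcripts_metadata(video_id: str,
                                transcripts: Iterable[Mapping[str, str]]) -> str:
    """Serialize transcript metadata (not content) for one video as XML.

    Element and attribute names here are illustrative assumptions.
    """
    root = ET.Element("transcripts", {"video_id": video_id})
    for t in transcripts:
        # Only metadata plus the URL to the stored file is exported.
        ET.SubElement(root, "transcript", {
            "language_code": t["language_code"],
            "provider": t["provider"],
            "file_format": t["file_format"],
            "url": t["url"],
        })
    return ET.tostring(root, encoding="unicode")
```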

Import
On a course import,

  1. For existing videos, transcripts will not be updated in VAL. This avoids overwriting the transcripts of an existing video (video encodings are already handled this way), as suggested here.
  2. For all other videos, transcripts will be created from the exported metadata, and the transcript content will be fetched (via the exported transcript URL) and stored in the instance's configured storage.
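The two import rules can be sketched as a single loop. In this version, exported maps each video_id to its exported transcript metadata. The download and save callables are hypothetical hooks for fetching the exported URL and writing to the instance's configured storage, and the returned outcome map exists only for illustration.

```python
from typing import Callable, Dict, Iterable, Mapping, Set


def import_transcripts(exported: Mapping[str, Iterable[Mapping[str, str]]],
                       existing_video_ids: Set[str],
                       download: Callable[[str], bytes],
                       save: Callable[[str, Mapping[str, str], bytes], None]) -> Dict[str, str]:
    """Apply the two import rules and return a per-video outcome."""
    outcome: Dict[str, str] = {}
    for video_id, transcripts in exported.items():
        if video_id in existing_video_ids:
            # Rule 1: never overwrite transcripts of a video VAL already has.
            outcome[video_id] = "skipped"
            continue
        for metadata in transcripts:
            # Rule 2: recreate from metadata, copying content from the exported URL.
            save(video_id, metadata, download(metadata["url"]))
        outcome[video_id] = "created"
    return outcome
```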

Course Import/Export (Transcripts in OLX)

...

Exporting a course from edx.org and importing it into an older Open edX instance (<= Ginkgo) does not import the new transcripts. This is because "open-release/ginkgo.master" does not have the feature (i.e. "Transcripts Phase 2 Deprecate Contentstore"). So we need to stay backward compatible until Ginkgo is deprecated and an official open release with the feature is available.

...