External Video Transcripts


External Video Transcripts are the transcripts for a video component that does not have edx_video_id set. This document proposes a transcripts flow for such video components. For context: in the VideoTranscript data model, video_id is unique together with language_code, so each video can have at most one transcript record per language.
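
As a rough illustration of the "unique together" constraint, the sketch below uses an in-memory SQLite table. The table name, the content_url column, and the add_transcript helper are assumptions made for this example only; they are not the actual VideoTranscript schema.

```python
import sqlite3

# Hypothetical table standing in for the VideoTranscript data model.
# Only video_id and language_code come from the proposal; the rest is assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE video_transcript (
        video_id      TEXT NOT NULL,
        language_code TEXT NOT NULL,
        content_url   TEXT NOT NULL,
        UNIQUE (video_id, language_code)
    )
    """
)


def add_transcript(video_id, language_code, content_url):
    """Insert a row; return False if the (video_id, language_code) pair exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO video_transcript VALUES (?, ?, ?)",
                (video_id, language_code, content_url),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = add_transcript("video-a", "en", "transcripts/a-en.srt")
second_language = add_transcript("video-a", "fr", "transcripts/a-fr.srt")
duplicate = add_transcript("video-a", "en", "transcripts/a-en-v2.srt")
```

Here the second English transcript for the same video_id is rejected, while a different language for the same video_id is accepted.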

Assuming a video component that does not have an edx_video_id:

On Fresh Video Component
When a transcript is uploaded for a language from the video component, the component calls VAL. VAL generates a new UUID, which serves as the video_id for the corresponding record in the VideoTranscript data model (the content is uploaded to S3 or whatever storage is configured). The UUID is returned to the video component, which stores it in a new, non-editable field on the video XBlock.

Video Component with existing Transcripts
For all subsequent transcript uploads, the video component will ask VAL to store the transcript under its existing UUID. On retrieval, VAL will be able to return all the transcripts attached to the UUID it receives from the video component. When a transcript is removed, VAL will likewise be able to delete transcripts related to the UUID received from the video component.
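
The upload, retrieval, and removal flow described above could be sketched roughly as follows. FakeVAL and its method names are hypothetical stand-ins, not the real VAL API; storage is an in-memory dict rather than S3, and deleting by language as well as UUID is an assumption of this sketch.

```python
import uuid


class FakeVAL:
    """Hypothetical in-memory stand-in for VAL; not the real VAL API."""

    def __init__(self):
        # (video_id, language_code) -> transcript content
        self._transcripts = {}

    def upload_transcript(self, language_code, content, video_id=None):
        """Store a transcript; generate a UUID on the first upload."""
        if video_id is None:
            video_id = str(uuid.uuid4())
        # Assumption: re-uploading the same language replaces the old content.
        self._transcripts[(video_id, language_code)] = content
        return video_id

    def get_transcripts(self, video_id):
        """Return {language_code: content} for every transcript of a UUID."""
        return {
            lang: content
            for (vid, lang), content in self._transcripts.items()
            if vid == video_id
        }

    def delete_transcript(self, video_id, language_code):
        """Delete one transcript; return True if something was removed."""
        return self._transcripts.pop((video_id, language_code), None) is not None


val = FakeVAL()
# Fresh component: no UUID yet, so VAL generates one and returns it.
component_video_id = val.upload_transcript("en", "Hello")
# Subsequent upload: the component passes the UUID it stored earlier.
val.upload_transcript("fr", "Bonjour", video_id=component_video_id)
languages = sorted(val.get_transcripts(component_video_id))
val.delete_transcript(component_video_id, "en")
remaining = sorted(val.get_transcripts(component_video_id))
```

The value held in component_video_id plays the role of the non-editable field on the video component.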

Import / Export within the same platform
On course export, all the transcripts (metadata + content) will be exported with the course. On course import, all the transcript metadata will be written back into the VideoTranscript data model, and we can generate new UUIDs (which will serve as video_ids) for the imported transcripts. This benefits us when we import/export across different platforms, but it duplicates transcript content when we export and import within the same platform.

Import / Export across different platforms (e.g. edX to Edge)
This is the case when we export a course from one platform and import it into another (for example, from edX to edX Edge). On course export, all the transcripts (metadata + content) will be exported with the course. On course import, the transcripts will not conflict with existing records in the VideoTranscript data model, because we generate new UUIDs for the imported transcripts.
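
A minimal sketch of the import step, under stated assumptions: the exported record layout and the import_transcripts helper are hypothetical, and the store is a plain dict keyed by (video_id, language_code). The sketch also assumes that transcripts sharing one video_id before export should share one new UUID after import.

```python
import uuid


def import_transcripts(exported, store):
    """Re-create exported transcripts under freshly generated UUIDs.

    Returns a mapping of old video_id -> new video_id so that the imported
    video components can be pointed at their new UUIDs.
    """
    id_map = {}
    for record in exported:
        old_id = record["video_id"]
        if old_id not in id_map:
            id_map[old_id] = str(uuid.uuid4())
        store[(id_map[old_id], record["language_code"])] = record["content"]
    return id_map


# Same-platform case: the store already holds the original transcripts.
platform_store = {("old-1", "en"): "Hello", ("old-1", "fr"): "Bonjour"}
exported = [
    {"video_id": vid, "language_code": lang, "content": content}
    for (vid, lang), content in platform_store.items()
]
id_map = import_transcripts(exported, platform_store)
```

After the import, platform_store holds four entries: the two originals plus two copies under the new UUID. Nothing conflicts, but the content is duplicated, which is the trade-off described for the same-platform case.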

Is this such a frequent use case that we have to allow data duplication? (not sure)