External video transcripts are the transcripts for a video component that does not have edx_video_id set. This document proposes a transcript flow for such video components. For context: in the VideoTranscript data model, video_id is unique together with language_code.
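The uniqueness rule above can be illustrated with a minimal in-memory sketch. This is not the actual VAL model: `TranscriptStore` and its `add` method are hypothetical names used only to show that a (video_id, language_code) pair may appear at most once.

```python
class TranscriptStore:
    """Hypothetical stand-in for the VideoTranscript table (not VAL code)."""

    def __init__(self):
        # Keyed by (video_id, language_code), mirroring the unique-together rule.
        self.rows = {}

    def add(self, video_id, language_code, content):
        key = (video_id, language_code)
        if key in self.rows:
            # A second transcript for the same video and language is rejected.
            raise ValueError(f"transcript already exists for {key}")
        self.rows[key] = content
```

The same video_id can carry several languages, but each language only once.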
Assuming a video component that does not have an edx_video_id:
On Fresh Video Component
A transcript is uploaded for a language from the video component. The component calls VAL, which generates a new UUID that serves as the video_id when creating the corresponding record in the VideoTranscript data model (the content is uploaded to S3 or whatever storage is configured). This UUID is returned to the video component, which stores it in a new non-editable XBlock field.
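The flow above might look roughly like the following sketch. Everything here is an assumption made for illustration: `FakeVAL`, `create_transcript`, `VideoComponent`, and the field name `transcript_uuid` are hypothetical, and storage is reduced to a dictionary instead of S3.

```python
import uuid


class FakeVAL:
    """Hypothetical in-memory stand-in for VAL."""

    def __init__(self):
        self.transcripts = {}  # (video_id, language_code) -> content

    def create_transcript(self, language_code, content, video_id=None):
        # On the first upload there is no UUID yet, so one is generated.
        if video_id is None:
            video_id = str(uuid.uuid4())
        self.transcripts[(video_id, language_code)] = content
        return video_id


class VideoComponent:
    """Hypothetical video component without an edx_video_id."""

    def __init__(self):
        self.transcript_uuid = None  # assumed name for the non-editable field

    def upload_transcript(self, val, language_code, content):
        # Store whatever UUID VAL returns; later uploads pass it back in.
        self.transcript_uuid = val.create_transcript(
            language_code, content, video_id=self.transcript_uuid
        )
        return self.transcript_uuid
```

In this sketch a second upload from the same component reuses the stored UUID, so all languages end up attached to one video_id.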
Video Component with existing Transcripts
For all subsequent transcript uploads, the video component requests transcripts from VAL via the UUID, and VAL returns all the transcripts attached to that UUID. To delete a transcript from the video component, VAL requires the UUID from the video component.
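As a rough sketch of the lookup and delete operations, assuming transcripts are kept in a dictionary keyed by (video_id, language_code). The function names `get_transcripts` and `delete_transcript` are hypothetical and are not taken from the VAL API.

```python
def get_transcripts(store, video_id):
    """Return {language_code: content} for every transcript under video_id."""
    return {
        lang: content
        for (vid, lang), content in store.items()
        if vid == video_id
    }


def delete_transcript(store, video_id, language_code):
    """Delete one transcript; return True if a record was removed."""
    return store.pop((video_id, language_code), None) is not None
```

Both operations need the UUID held by the video component, since it is the only link between the component and its records.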
Import / Export across the same platform
On course export, all the transcripts (metadata + content) are exported with the course. On course import, all the transcript metadata is restored into the VideoTranscript data model, and we can generate new UUIDs (which will serve as video_ids) for the imported transcripts. This benefits us when we import/export across different platforms, but transcript content will be duplicated if we export and import on the same platform.
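A minimal sketch of the import step, assuming the export is a list of dictionaries with `video_id`, `language_code`, and `content` keys (a made-up format for illustration). It also assumes one new UUID per exported video_id, so that the languages of one component stay grouped; the proposal does not specify this.

```python
import uuid


def import_transcripts(store, exported):
    """Insert exported transcripts under freshly generated UUIDs.

    Returns a mapping of old video_id -> new video_id.
    """
    id_map = {}
    for item in exported:
        old_id = item["video_id"]
        if old_id not in id_map:
            # Never reuse the exported id, so existing records cannot collide.
            id_map[old_id] = str(uuid.uuid4())
        store[(id_map[old_id], item["language_code"])] = item["content"]
    return id_map
```

Importing the same export twice into one store creates two independent sets of records, which is the content duplication noted above.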
Import / Export across different platforms (e.g. edX to Edge)
This is the case when we export a course from one platform and import it into another (for example, from edX to edX Edge). On course export, all the transcripts (metadata + content) are exported with the course. On course import, transcripts cannot conflict with existing records in the VideoTranscript data model, because we generate new UUIDs for the imported transcripts.
Is this such a frequent use case that we have to allow data duplication? (not sure)