The finalized design decisions are recorded in the Video Transcript Design document; the approaches below were the initial draft proposals.


A Video Component with multiple video sources should not have different transcript content for each source in the same language. For instance, it does not make sense for a video component to have multiple English transcripts, each with different content.


Users will be required to add a video source before adding any transcript from the Video Component's Advanced Settings. This is because S3 transcripts are specific to a video source, so a video source needs to exist before any transcript can be linked to it.


Video Component Advanced Settings Input Field | Internal/Technical Field Name
"Default Timed Transcript" | sub
"Transcript Languages" | transcripts

Background

The video component has two transcript fields: sub and transcripts. The sub field stores English transcript metadata, i.e. the en transcript's filename. The transcripts field is a dict that contains non-English transcript metadata, i.e. each transcript's filename keyed by its language, e.g. {'es': 'transcript_file_name.srt'}. The transcript content for 'transcript_file_name.srt' is in the Content-store.
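As a rough illustration of the two fields described above, the sketch below models a video component as a plain dict. Only the field names sub and transcripts and the 'es' example come from the text; the English filename and the helper transcript_filenames are hypothetical and are not part of any edX API.

```python
# Hypothetical stand-in for a video component's transcript fields.
# `sub` and `transcripts` are the field names from the text; the
# English filename below is an invented placeholder.
video_component = {
    "sub": "english_transcript_name",                    # English transcript metadata
    "transcripts": {"es": "transcript_file_name.srt"},   # non-English, keyed by language
}


def transcript_filenames(component):
    """Return one {language: filename} map covering both fields."""
    filenames = dict(component.get("transcripts", {}))
    if component.get("sub"):
        # The English transcript is stored separately, in `sub`.
        filenames["en"] = component["sub"]
    return filenames


all_transcripts = transcript_filenames(video_component)
```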

Moving from Content-Store to S3

We are moving transcript storage from the Mongo Content-store to S3. The transcript metadata (i.e. the transcript language, the video id to which the transcript is linked, etc.) now lives in edx-val's VideoTranscript data model. Previously, this metadata was on the Video Component, in the transcripts and sub fields mentioned above.

Approaches:

Approach # 1

Request transcripts from VAL's VideoTranscript data model every time the Video Component loads, and do not set them as sub and transcripts on the Video Component itself.

Pros:

Cons:

Implementation:

On loading a video component, we will retrieve transcripts from edx-val and include them in the transcripts field's context (which is rendered on the Video Advanced Settings tab). Whenever a video component is saved, we will look for the transcripts that came from VAL and discard them from the transcripts field so that they are not persisted on the Video Component.
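The load and save steps of this approach can be sketched as two small functions. This is only a sketch under stated assumptions: the function names are hypothetical, VAL's transcripts are represented as a plain {language: filename} dict rather than fetched through edx-val, and letting component entries win when both sides define the same language is an assumption the text does not settle.

```python
def transcripts_for_editor(component_transcripts, val_transcripts):
    """On load: build the context shown in Advanced Settings.

    Both arguments are {language: filename} dicts. `val_transcripts`
    stands in for whatever edx-val would return (hypothetical shape).
    """
    merged = dict(val_transcripts)
    # Assumption: entries already on the component take precedence.
    merged.update(component_transcripts)
    return merged


def transcripts_to_persist(submitted_transcripts, val_transcripts):
    """On save: drop entries that came from VAL so they are not persisted."""
    return {
        language: filename
        for language, filename in submitted_transcripts.items()
        if val_transcripts.get(language) != filename
    }
```

For example, a component holding {'es': 'component_es.srt'} and a VAL result of {'fr': 'val_fr.srt'} would show both languages in the editor, while only the 'es' entry would be written back on save.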

Approach # 2

Copy transcript metadata from VideoTranscript to the Video Component (e.g. on adding a new source) and use it afterwards, i.e. do not ask the VideoTranscript data model for transcript metadata every time. When we delete a transcript from the Video Component (i.e. from the "Default Timed Transcript" or "Transcript Languages" fields in Advanced Settings), it will not be deleted from VAL (S3); instead, it will just be unlinked from the Video Component.

Why soft-delete?

Suppose deleting a transcript from a video component also removed the corresponding transcript from the VideoTranscript data model and S3. Then deleting a transcript from one video component's Advanced Settings would leave every other video component that uses the same video source(s) with stale metadata, because those components also reference the removed transcript.
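The reasoning above can be illustrated with two components that share one video source. Everything here is a hypothetical in-memory stand-in: val_store is a dict keyed by (video id, language) rather than the real VideoTranscript model, and unlink_transcript is an invented helper, not an edx-val call.

```python
# Hypothetical stand-in for VAL's VideoTranscript records,
# keyed by (video_id, language).
val_store = {("video-1", "es"): "val_es.srt"}

# Two components that use the same video source and transcript.
component_a = {"sub": "", "transcripts": {"es": "val_es.srt"}}
component_b = {"sub": "", "transcripts": {"es": "val_es.srt"}}


def unlink_transcript(component, language):
    """Soft delete: remove the link from this component only.

    `val_store` is deliberately left untouched, so other components
    that reference the same transcript keep working.
    """
    if language == "en":
        component["sub"] = ""
    else:
        component["transcripts"].pop(language, None)


unlink_transcript(component_a, "es")
```

After the call, component_a no longer lists the Spanish transcript, while component_b and the VAL record are unchanged.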

Pros:

Cons:

If we store S3 transcript metadata on the video component as well as in the VideoTranscript data model, then:

Implementation:

Every added source will bring its transcript metadata from the VideoTranscript data model into the Video Component's sub and transcripts fields.
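A minimal sketch of that step follows, assuming VAL's records can be viewed as a dict keyed by (video id, language). The function name link_source_transcripts and the record shape are hypothetical, and overwriting an existing entry for the same language is an assumption the text does not specify.

```python
def link_source_transcripts(component, source_video_id, val_records):
    """Copy one source's transcript metadata onto the component.

    `val_records` is a hypothetical {(video_id, language): filename}
    view of the VideoTranscript data model.
    """
    for (video_id, language), filename in val_records.items():
        if video_id != source_video_id:
            continue  # transcript belongs to a different video source
        if language == "en":
            component["sub"] = filename  # English goes in `sub`
        else:
            # Assumption: an existing entry for this language is overwritten.
            component["transcripts"][language] = filename
    return component
```

Calling it once per newly added source keeps the component's fields in step with VAL at the time the source is added; later changes in VAL would not be reflected until the source is added again.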