S3 Transcripts Metadata
Note
Multiple Video Sources
Advanced Settings
Users will be required to add a video source before adding any transcript from the Video Component's Advanced Settings, because S3 transcripts are specific to a video source: a video source needs to exist before any transcript can be linked to it.
| Video Component Advanced Settings Input Field | Internal/Technical Field Name |
|---|---|
| "Default Timed Transcript" | `sub` |
| "Transcript Languages" | `transcripts` |
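The rule that a video source must exist before a transcript can be added can be sketched as a guard on the component. This is a minimal illustration only: `VideoComponent`, `video_sources`, `add_transcript` and `NoVideoSourceError` are hypothetical names, not edx-platform or edx-val APIs. Routing `en` to `sub` and other languages to `transcripts` follows the field mapping described in this document.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class NoVideoSourceError(ValueError):
    """Raised when a transcript is added before any video source exists."""


@dataclass
class VideoComponent:
    # Hypothetical, simplified model holding only what this rule needs.
    video_sources: List[str] = field(default_factory=list)  # e.g. edx_video_id values
    sub: str = ""  # "Default Timed Transcript" (English)
    transcripts: Dict[str, str] = field(default_factory=dict)  # "Transcript Languages"

    def add_transcript(self, language: str, filename: str) -> None:
        # S3 transcripts are specific to a video source, so refuse to add one
        # while there is no source to link it to.
        if not self.video_sources:
            raise NoVideoSourceError("Add a video source before adding a transcript.")
        if language == "en":
            self.sub = filename
        else:
            self.transcripts[language] = filename
```

Calling `add_transcript` on a component with an empty `video_sources` list raises; once a source such as an `edx_video_id` is present, the transcript is stored.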
Background
The video component has two transcript-related fields, `sub` and `transcripts`. `sub` stores the English transcript metadata, i.e. the filename of the `en` transcript. `transcripts` is a dict field that stores the non-English transcript metadata, i.e. each transcript's filename keyed by its language, e.g. `{'es': 'transcript_file_name.srt'}`. The transcript content for `'transcript_file_name.srt'` itself is stored in the Content-store.
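As a rough sketch of how these two fields divide the metadata between them, assuming a simplified, hypothetical `TranscriptFields` class rather than the real XBlock field definitions:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TranscriptFields:
    """Hypothetical stand-in for the video component's transcript fields."""
    sub: str = ""  # filename of the English ("en") transcript
    transcripts: Dict[str, str] = field(default_factory=dict)  # language -> filename (non-English)

    def filename_for(self, language: str) -> Optional[str]:
        """Return the Content-store filename for a language, or None if absent."""
        if language == "en":
            return self.sub or None
        return self.transcripts.get(language)

    def available_languages(self) -> List[str]:
        """All languages that have transcript metadata on the component, sorted."""
        langs = set(self.transcripts)
        if self.sub:
            langs.add("en")
        return sorted(langs)
```

For example, `TranscriptFields(sub="english.srt", transcripts={"es": "transcript_file_name.srt"})` resolves `"es"` to `"transcript_file_name.srt"` and reports `["en", "es"]` as its available languages.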
Moving to S3 from Content-Store
We are now moving transcript storage from the Mongo Content-store to S3, and the transcript metadata (i.e. the transcript language, the video ID the transcript is linked to, etc.) now lives in edx-val's `VideoTranscript` data model. Previously, this metadata lived on the Video Component itself, in the `sub` and `transcripts` fields described above.
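For comparison, the metadata now held by VAL can be pictured as one row per (video, language) pair. The `VideoTranscriptRecord` class and `transcripts_for_video` helper below are hypothetical simplifications, not the actual `VideoTranscript` model or the edx-val API; the real model may carry additional fields.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class VideoTranscriptRecord:
    """Hypothetical, simplified shape of a VAL transcript row."""
    video_id: str       # edx_video_id the transcript is linked to
    language_code: str  # e.g. "en", "es"
    file_name: str      # name of the transcript file stored in S3


def transcripts_for_video(records: List[VideoTranscriptRecord], video_id: str) -> Dict[str, str]:
    """Return {language_code: file_name} for one video.

    This is the same language-to-filename shape that the component's
    sub/transcripts fields used to hold.
    """
    return {r.language_code: r.file_name for r in records if r.video_id == video_id}
```

Because the rows are keyed by `video_id`, every component that uses the same `edx_video_id` sees the same set of transcripts.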
Approaches:
Approach # 1
Request transcripts from VAL's `VideoTranscript` data model from the component every time (i.e. whenever the video component is loaded), and do not store them in the `sub` and `transcripts` fields on the Video Component itself.
Pros:
- Since we are also planning to upload transcripts for VAL videos directly from the Video Uploads page, a transcript uploaded there for a VAL video will start showing up on all Video Components that are (or were) using that `edx_video_id`.
- For auto-generated transcripts, the generated transcript will automatically show up on all Video Components that are (or were) using that `edx_video_id`.
- We will use the `VideoTranscript` data model as the single source of truth, so we will not be managing transcript metadata in multiple places (i.e. VAL's `VideoTranscript` and the Video Component).
Cons:
- Scenario 1 from "Video Component Scenarios with Contentstore (transcript-related)" will not work
- Scenario 6 from "Video Component Scenarios with Contentstore (transcript-related)" will not work
- Scenario 7 from "Video Component Scenarios with Contentstore (transcript-related)" will not work
- We won't be able to delete any transcript from the Video Component, because all deletion-related UI is strictly bound to Video Component metadata (i.e. `sub` and `transcripts`). For this reason, we should go with Approach # 2.
Implementation:
On loading a video component, we will retrieve its transcripts from edx-val and include them in the `transcripts` field's context (which is rendered on the Video Advanced Settings tab). Whenever a video component is saved, we will look for the transcripts that were included from VAL and discard them from the `transcripts` field so that they are not persisted on the Video Component.
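A minimal sketch of this load/save round trip, assuming VAL's transcripts for the component's `edx_video_id` have already been fetched into a `{language_code: file_name}` dict (the fetch itself is not shown). The function names are hypothetical; for brevity the sketch treats every language, including `en`, as one mapping, and it assumes an entry already stored on the component wins over a VAL entry for the same language.

```python
from typing import Dict


def transcripts_context_on_load(component: Dict[str, str], val: Dict[str, str]) -> Dict[str, str]:
    """Build the 'transcripts' context rendered on the Advanced Settings tab.

    Assumption: entries already stored on the component override VAL entries
    for the same language.
    """
    return {**val, **component}


def transcripts_to_persist_on_save(submitted: Dict[str, str], val: Dict[str, str]) -> Dict[str, str]:
    """Discard entries that came from VAL so they are not persisted on the component."""
    return {lang: name for lang, name in submitted.items() if val.get(lang) != name}
```

On save, only entries that still match VAL exactly are dropped, so a transcript the user changed in the editor stays on the component; this is a design choice of the sketch, not something the approach above prescribes.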
Approach # 2
Copy transcript metadata from `VideoTranscript` to the Video Component (e.g. when a new source is added) and use it from then on (i.e. do not query the `VideoTranscript` data model for transcript metadata every time). When a transcript is deleted from the Video Component (i.e. from the Advanced Settings "Default Timed Transcript" or "Transcript Languages" fields), it will not be deleted from VAL (S3); instead, it will only be unlinked from the Video Component.
Why soft-delete?
Suppose deleting a transcript from a video component also removed the corresponding transcript from the `VideoTranscript` data model and S3. Then, if a transcript were deleted from one video component's Advanced Settings, every other video component using the same video source(s) would go stale, because those components also use the removed transcript.
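The staleness argument can be illustrated with a small in-memory model. `SharedTranscriptStore`, `Component`, `soft_delete`, `hard_delete` and `is_stale` are hypothetical names; the store stands in for `VideoTranscript` plus S3 and is shared by every component that uses the same video source.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class SharedTranscriptStore:
    # Stand-in for VideoTranscript + S3: (video_id, language) -> file name.
    files: Dict[Tuple[str, str], str] = field(default_factory=dict)


@dataclass
class Component:
    video_id: str
    transcripts: Dict[str, str] = field(default_factory=dict)  # language -> file name


def soft_delete(component: Component, language: str) -> None:
    """Unlink the transcript from this component only; the shared store is untouched."""
    component.transcripts.pop(language, None)


def hard_delete(component: Component, store: SharedTranscriptStore, language: str) -> None:
    """What soft-delete avoids: also removing the transcript from the shared store."""
    component.transcripts.pop(language, None)
    store.files.pop((component.video_id, language), None)


def is_stale(component: Component, store: SharedTranscriptStore) -> bool:
    """True if the component links a transcript that no longer exists in the store."""
    return any((component.video_id, lang) not in store.files for lang in component.transcripts)
```

With two components `a` and `b` sharing `vid-1`, `soft_delete(a, "es")` leaves `b` intact, whereas `hard_delete(a, store, "es")` leaves `b` pointing at a transcript that no longer exists.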
Pros:
- Once a source is added to a video component, all of its transcripts become specific to that Video Component in terms of deletion.
- Deleting a transcript from a video component does not remove it from the `VideoTranscript` data model or from S3; it is only unlinked from this video component (i.e. soft-deleted for this video component).
- Covers all the documented scenarios in "Video Component Scenarios with Contentstore (transcript-related)".
Cons:
If we store S3 transcript metadata on the video component as well as in the `VideoTranscript` data model, then:
- The video component may hold different transcript metadata than the `VideoTranscript` data model for the same video source.
- Under our plan to upload transcripts for VAL videos directly from the Video Uploads page, a transcript uploaded there for a VAL video will not automatically show up on Video Components that were already using that `edx_video_id`.
- For auto-generated transcripts, the generated transcript will not automatically show up on Video Components that were already using that `edx_video_id`.
Implementation:
Every added source will copy its transcript metadata from the `VideoTranscript` data model into the Video Component's `sub` and `transcripts` fields.
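A minimal sketch of this copy-on-add behaviour, assuming VAL's transcripts for the new source are passed in as a `{language_code: file_name}` dict (how they are fetched is not shown). `VideoComponent` and `add_source` are hypothetical names, and overwriting an existing entry for the same language is an assumption of the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class VideoComponent:
    edx_video_id: str = ""
    sub: str = ""  # English transcript filename
    transcripts: Dict[str, str] = field(default_factory=dict)  # non-English: language -> filename

    def add_source(self, edx_video_id: str, val_transcripts: Dict[str, str]) -> None:
        """Copy VAL transcript metadata onto the component when a source is added.

        After this call the component reads its own sub/transcripts and does not
        query VAL again (assumption: existing entries are overwritten).
        """
        self.edx_video_id = edx_video_id
        for lang, name in val_transcripts.items():
            if lang == "en":
                self.sub = name
            else:
                self.transcripts[lang] = name
```

Because the metadata is copied once, a transcript added to VAL afterwards (e.g. uploaded from the Video Uploads page) does not appear on the component, which is the trade-off listed under Cons.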