Content-store Deprecation Summary

The purpose of this document is to summarize, as simply as possible, the changes made to the video transcripts module to deprecate content-store:

TL;DR

For video transcripts, edx.org now uses S3 as its primary storage, and content-store has been deprecated. In addition, transcript metadata, which was previously referenced by the `sub` and `transcripts` fields on the video component, is now managed by edxval (val stands for video abstraction layer). The `transcripts` and `sub` fields have been deprecated, and backward compatibility has been ensured for old video components that already use these fields.

Export

In the course OLX, a new tag <transcripts/> has been introduced under <video_asset/>. It lists the transcripts stored in S3, and the transcript format is always SRT. A sample OLX is shown below:

Sample Video Transcripts in OLX
<video  sub="" display_name="Test Video" edx_video_id="9c563e7d-c86c-4b97-8154-815b421bc80f" youtube_id_1_0="FEWSxCV">
    <video_asset client_video_id="test_video.mp4" duration="319.94" image="">
        <encoded_video bitrate="174" file_size="6988066" profile="mobile_low" url="https://d2f1egay8yehza.cloudfront.net/ABC_MB2.mp4"/>
        <encoded_video bitrate="192" file_size="7681678" profile="audio_mp3" url="https://d2f1egay8yehza.cloudfront.net/ABC_MB1.mp3"/>
        <encoded_video bitrate="279" file_size="11201200" profile="desktop_mp4" url="https://d2f1egay8yehza.cloudfront.net/ABC_MB4.mp4"/>
        <encoded_video bitrate="0" file_size="0" profile="youtube" url="FEWSxCV"/>
        <transcripts>
             <transcript file_format="srt" language_code="en" provider="Custom"/>
             <transcript file_format="srt" language_code="de" provider="Custom"/>
        </transcripts>
    </video_asset>
    <transcript language="en" src="9c563e7d-c86c-4b97-8154-815b421bc80f-en.srt"/>
</video>

Transcript content files are placed in the `/static` directory of the course OLX, and their filenames follow the format "<edx_video_id>-<language_code>.srt". So, for the above OLX, a transcript with the filename "9c563e7d-c86c-4b97-8154-815b421bc80f-en.srt" should be discoverable in the course `/static` directory.
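
The naming rule above can be checked mechanically. The following is a minimal sketch, not part of edx-platform or edxval: the function name `expected_transcript_filenames` and the trimmed `SAMPLE_OLX` string are hypothetical, and the code assumes the `.srt` extension and the tag layout described in this document.

```python
import xml.etree.ElementTree as ET
from typing import List

# Trimmed copy of the sample OLX above (illustrative only).
SAMPLE_OLX = """
<video sub="" display_name="Test Video" edx_video_id="9c563e7d-c86c-4b97-8154-815b421bc80f">
    <video_asset client_video_id="test_video.mp4" duration="319.94" image="">
        <transcripts>
            <transcript file_format="srt" language_code="en" provider="Custom"/>
            <transcript file_format="srt" language_code="de" provider="Custom"/>
        </transcripts>
    </video_asset>
</video>
"""


def expected_transcript_filenames(video_olx: str) -> List[str]:
    """Hypothetical helper: list the /static filenames implied by the
    <transcripts/> tag under <video_asset/>, using the
    "<edx_video_id>-<language_code>.srt" format described above."""
    video = ET.fromstring(video_olx)
    edx_video_id = video.get("edx_video_id")
    filenames = []
    for transcript in video.findall("./video_asset/transcripts/transcript"):
        language_code = transcript.get("language_code")
        filenames.append("{}-{}.srt".format(edx_video_id, language_code))
    return filenames


if __name__ == "__main__":
    for name in expected_transcript_filenames(SAMPLE_OLX):
        print(name)
```

Each returned name can then be compared against the files actually present in the exported course's `/static` directory.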

You may also see a <transcript/> tag directly under <video/>, as in the above example. This tag previously held the contents of the `transcripts` field. It is now kept for backward compatibility, so that OLX in the new format can be imported into an older Open edX instance (<= ginkgo), and it references both old-style and new-style transcripts.

Import

When a course is imported, all of its video transcripts are imported into S3, and their metadata is managed by edxval in the VideoTranscript data model.

The above also holds true when an old-format OLX is imported into edx.org or a newer Open edX instance (>= hawthorne), and even when a course OLX mixes old-style and new-style transcripts.
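
To see which style a given video component uses before importing it, the two tag locations can be inspected separately. This is only an illustrative sketch, not the platform's import logic: the function name `transcript_languages_by_style` and the `MIXED_OLX` sample are hypothetical. It assumes that old-style entries are <transcript/> tags directly under <video/> with a `language` attribute, and that new-style entries sit under <video_asset/> with a `language_code` attribute, as in the sample OLX above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Set

# Hypothetical video component mixing both transcript styles.
MIXED_OLX = """
<video sub="" edx_video_id="9c563e7d-c86c-4b97-8154-815b421bc80f">
    <video_asset client_video_id="test_video.mp4" duration="319.94" image="">
        <transcripts>
            <transcript file_format="srt" language_code="en" provider="Custom"/>
            <transcript file_format="srt" language_code="de" provider="Custom"/>
        </transcripts>
    </video_asset>
    <transcript language="en" src="9c563e7d-c86c-4b97-8154-815b421bc80f-en.srt"/>
    <transcript language="fr" src="old_style_fr.srt"/>
</video>
"""


def transcript_languages_by_style(video_olx: str) -> Dict[str, Set[str]]:
    """Hypothetical helper: group transcript languages by where they are
    declared in the video OLX (directly under <video/> vs. under
    <video_asset/>)."""
    video = ET.fromstring(video_olx)
    # <transcript/> directly under <video/>: kept for backward compatibility.
    under_video = {
        t.get("language")
        for t in video.findall("./transcript")
        if t.get("language")
    }
    # <transcript/> under <video_asset/><transcripts/>: new-style entries.
    under_video_asset = {
        t.get("language_code")
        for t in video.findall("./video_asset/transcripts/transcript")
        if t.get("language_code")
    }
    return {"under_video": under_video, "under_video_asset": under_video_asset}


if __name__ == "__main__":
    print(transcript_languages_by_style(MIXED_OLX))
```

In this made-up sample, `fr` appears only directly under <video/>, so it would be an old-style transcript, while `en` is declared in both places.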

Note:
As of August 13, existing transcripts in the content-store have also been migrated into S3 for edx.org. Import/Export of these new-style transcripts only makes use of the <transcripts/> tag under <video_asset/> in the course OLX.