...

Once the transcripts for a video have been successfully migrated, all of the changes made to S3 are permanent. Because the transcripts will then be available to end users, a subsequent run of the script must not overwrite the S3 data. Each course migration event will be logged via the PersistOnFailureTask mixin. The script will be idempotent, so it can be re-run safely after a previous run failed midway.
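
The no-overwrite guarantee can be illustrated as a write-if-absent upload. The following is only a minimal sketch: `TranscriptStore` is a hypothetical in-memory stand-in for the S3 bucket, and the `migrate_transcript` helper and the key format are assumptions made for illustration, not part of the design.

```python
class TranscriptStore:
    """Hypothetical stand-in for the S3 bucket (in-memory, illustration only)."""

    def __init__(self):
        self._objects = {}

    def exists(self, key):
        return key in self._objects

    def put(self, key, content):
        self._objects[key] = content

    def get(self, key):
        return self._objects[key]


def migrate_transcript(store, key, content):
    """Upload a transcript only if the key is absent.

    Returns True if the content was written, and False if an existing
    object was left untouched, which is what keeps re-runs idempotent.
    """
    if store.exists(key):
        return False
    store.put(key, content)
    return True
```

Against a real bucket the existence check and the write are not a single atomic step, so this sketch assumes only one migration task writes a given course at a time.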

Implementation

A Django management script will be written that traverses the contentstore and, for each course, finds all the video objects. Any transcripts related to those video objects will be pushed to S3 as an atomic task. Within that transaction, a video's transcript metadata will be migrated from the video component to edxval, and the corresponding transcript content will be migrated from contentstore to S3.

Pseudo-Code

Happy Path:

Search for all courses in modulestore()
Put the course ids in the migration status table with status as ‘Not-Migrated’; do not overwrite course ids that are already present

For each 'Not-Migrated' or 'Failed' course id in the migration status table, create a celery task with the PersistOnFailureTask mixin which will:
    Update the migration status for that course from ‘Not-Migrated’ to ‘In-Progress’

    Search for videos

        For each video create an atomic transaction

...

        If all the videos of a course have been processed, update course migration status in the migration status table from ‘In-Progress’ to ‘Migrated’

        Update the Feature Flag to switch the user to S3 transcripts

...

    Retry after 2 sec with maximum retries of 3

    After 3 retries, update the migration status for that course from ‘In-Progress’ to ‘Failed’ in the migration status table
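
The status transitions and the retry rule in the pseudo-code can be sketched in plain Python. This is a simplified illustration, not the Celery implementation: the status table is modelled as a dict, `migrate_videos` is a hypothetical callable standing in for the per-video atomic transactions, and all function names are assumptions. It also assumes that "maximum retries of 3" means one initial attempt followed by up to three retries.

```python
import time

NOT_MIGRATED, IN_PROGRESS, MIGRATED, FAILED = (
    "Not-Migrated", "In-Progress", "Migrated", "Failed",
)
MAX_RETRIES = 3
RETRY_DELAY_SECONDS = 2


def seed_status_table(status_table, course_ids):
    """Add unseen course ids as 'Not-Migrated' without overwriting existing rows."""
    for course_id in course_ids:
        status_table.setdefault(course_id, NOT_MIGRATED)


def migrate_course(status_table, course_id, migrate_videos, sleep=time.sleep):
    """Migrate one course, retrying up to MAX_RETRIES times after a failure."""
    status_table[course_id] = IN_PROGRESS
    for attempt in range(MAX_RETRIES + 1):  # initial attempt plus retries
        try:
            migrate_videos(course_id)
        except Exception:
            if attempt == MAX_RETRIES:
                # Retries exhausted: record the failure for a later run.
                status_table[course_id] = FAILED
                return FAILED
            sleep(RETRY_DELAY_SECONDS)
        else:
            status_table[course_id] = MIGRATED
            return MIGRATED


def run_migration(status_table, course_ids, migrate_videos, sleep=time.sleep):
    """Seed the table, then process every 'Not-Migrated' or 'Failed' course."""
    seed_status_table(status_table, course_ids)
    pending = [c for c, s in status_table.items() if s in (NOT_MIGRATED, FAILED)]
    for course_id in pending:
        migrate_course(status_table, course_id, migrate_videos, sleep=sleep)
    return status_table
```

Because courses already marked ‘Migrated’ are never selected, re-running `run_migration` only touches courses that have not yet succeeded.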

      

Rollout

Mock Runs

There should be at least two mock runs of the script on a refreshed staging instance. An issue-fix cycle will follow each mock run.

...

Migration validation will be at the video level. Before the feature flag is updated to enable Phase II for a course, the transcript count for each video will be compared between the source (contentstore/modulestore) and the destination (S3/edx-val).
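
The video-level count check could look roughly like the sketch below. The function name and the input shape (a mapping from video id to the list of transcript language codes found on each side) are assumptions made for illustration; how those mappings are gathered from the source and destination is left out.

```python
def validate_video_migration(source_transcripts, migrated_transcripts):
    """Return the video ids whose transcript counts differ between source and destination.

    Both arguments map a video id to the list of transcript language
    codes found for that video.
    """
    mismatched = []
    for video_id, languages in source_transcripts.items():
        migrated = migrated_transcripts.get(video_id, [])
        if len(languages) != len(migrated):
            mismatched.append(video_id)
    return mismatched
```

Under this sketch, the feature flag for a course would be updated only when the returned list is empty.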

End User Communication

The migration time window will be chosen as a time of least activity.

...

  1. Where should the script search for transcripts: in draft, published, or both? (Resolved: both will be migrated.)
  2. Should we delete any partially migrated data for a “failed migration” course from S3? Alternatively, we could overwrite the data in a subsequent run of the script. (Resolved: transcripts will not be overwritten. See section 4b in comments by Nimisha.)