There were 7767 courses for which transcripts were migrated from content-store to S3.

...

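Run # | Videos submitted for transcripts migration | Videos with no transcripts | Videos that completed transcripts migration | Number of external videos
----- | ------- | ------ | ------ | ------
1     | 0       | 0      | 0      | 0
2     | 0       | 0      | 0      | 0
3     | 278     | 2      | 276    | 216
4     | 250     | 6      | 244    | 1
5     | 11,155  | 723    | 10,432 | 3,072
6     | 50,555  | 5,611  | 45,029 | 10,346
7     | 50,651  | 5,394  | 45,334 | 9,865
8     | 97,303  | 8,684  | 88,807 | 18,547
9     | 107,406 | 11,073 | 96,333 | 19,764
10    | 101,718 | 13,636 | 88,071 | 18,946
11    | 103,437 | 10,053 | 93,630 | 20,023
12    | 98,299  | 11,585 | 86,895 | 17,756
13    | 101,329 | 10,037 | 91,512 | 19,799
14    | 99,473  | 9,902  | 89,783 | 9,100
15    | 102,728 | 9,382  | 91,150 | 3,674
16    | 64,315  | 7,964  | 56,278 | 10,835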


Below are the queries for the above-mentioned artifacts; "run" can be adjusted to the desired job run:

...

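Run # | Number of successfully migrated transcripts | Number of transcripts whose migration failed | Transcripts with no content to migrate
----- | ------- | ----- | ---
1     | 0       | 0     | 0
2     | 0       | 0     | 0
3     | 276     | 0     | 0
4     | 242     | 0     | 2
5     | 11,791  | 81    | 0
6     | 54,688  | 4     | 54
7     | 50,213  | 3     | 65
8     | 98,717  | 8     | 55
9     | 109,211 | 2     | 158
10    | 99,990  | 3     | 75
11    | 103,298 | 5     | 73
12    | 96,878  | 9     | 119
13    | 101,484 | 5     | 115
14    | 99,206  | 0     | 100
15    | 101,217 | 2,495 | 97
16    | 63,167  | 191   | 52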


Below are the queries for the above-mentioned artifacts; "run" can be adjusted to the desired job run:

...

Chart: Transcript Migrations (line chart; x-axis: migrations job run, y-axis: courses/videos/transcripts migrated)

Run # | Courses | Videos | Transcripts | Failures
----- | ----- | ------ | ------- | -----
1     | 0     | 0      | 0       | 0
2     | 0     | 0      | 0       | 0
3     | 2     | 276    | 276     | 0
4     | 2     | 244    | 242     | 0
5     | 100   | 10,432 | 11,791  | 81
6     | 500   | 45,029 | 54,688  | 4
7     | 500   | 45,334 | 50,213  | 3
8     | 1,000 | 88,807 | 98,717  | 8
9     | 1,000 | 96,333 | 109,211 | 2
10    | 1,000 | 88,071 | 99,990  | 3
11    | 1,000 | 93,630 | 103,298 | 5
12    | 1,000 | 86,895 | 96,878  | 9
13    | 1,000 | 91,512 | 101,484 | 5
14    | 1,000 | 89,783 | 99,206  | 0
15    | 1,000 | 91,150 | 101,217 | 2,495
16    | 663   | 56,278 | 63,167  | 191
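As a sanity check on the per-run counts, the totals and the overall failure rate can be recomputed with a few lines of Python. This is only a sketch: the two lists are copied from the per-run transcript numbers on this page, and the choice of denominator (successful migrations only, or successes plus failures) is an assumption, since the page does not say which one was used. Both round to the same figure.

```python
# Per-run counts for runs 1 to 16, copied from the transcript numbers on this page.
migrated = [0, 0, 276, 242, 11791, 54688, 50213, 98717, 109211,
            99990, 103298, 96878, 101484, 99206, 101217, 63167]
failed = [0, 0, 0, 0, 81, 4, 3, 8, 2, 3, 5, 9, 5, 0, 2495, 191]

total_migrated = sum(migrated)
total_failed = sum(failed)

# Assumption 1: failures relative to successfully migrated transcripts.
rate_vs_migrated = 100 * total_failed / total_migrated
# Assumption 2: failures relative to all attempts (successes + failures).
rate_vs_attempts = 100 * total_failed / (total_migrated + total_failed)

# Share of all failures that came from run 15 (index 14).
run15_share = 100 * failed[14] / total_failed

print(f"failed transcripts: {total_failed}")
print(f"failure rate: {rate_vs_migrated:.3f}% or {rate_vs_attempts:.3f}%")
print(f"run 15 share of failures: {run15_share:.1f}%")
```

Run 15 alone accounts for most of the failures, which is why it is singled out further down.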



Let's talk about the 0.283% of transcripts that failed (i.e. 2,806 transcripts). This is taken as the maximum number of transcripts that might not have been migrated because of exceptions. The following is what I observed during manual verification of the errors:

Migration failures that fall into the categories below can be ignored:

  1. Integrity error raised due to race conditions
  2. For a few legacy courses, transcripts are stored in a different format than their declared content-type (e.g. sjson content in an SRT file)
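The second category can be spotted by looking at the content rather than the file extension. The sketch below is illustrative only and is not the check used by the migration job: `looks_like_sjson` is a hypothetical helper, and it assumes sjson is a JSON object with `start`, `end` and `text` keys.

```python
import json


def looks_like_sjson(content: str) -> bool:
    """Hypothetical check: True if content parses as JSON with sjson-style keys."""
    try:
        data = json.loads(content)
    except ValueError:
        # Real SRT text is not valid JSON, so it ends up here.
        return False
    # Assumed sjson layout: an object holding "start", "end" and "text".
    return isinstance(data, dict) and {"start", "end", "text"} <= set(data)


srt_content = "1\n00:00:00,000 --> 00:00:02,000\nHello\n"
sjson_content = json.dumps({"start": [0], "end": [2000], "text": ["Hello"]})

print(looks_like_sjson(srt_content))    # False: genuine SRT
print(looks_like_sjson(sjson_content))  # True: sjson stored under an .srt name
```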


Also, the 15th run has considerably more failures (2,495) than the other runs, and most of these are for non-English transcript languages that failed to decode with utf-8-sig.
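To illustrate this kind of failure, here is a minimal sketch of how decoding with `utf-8-sig` behaves. It is not the migration code: `try_decode` is a hypothetical helper, and the cp1251 sample is only an assumed example of a non-UTF-8 legacy encoding; the actual encodings of the failed transcripts are not recorded here.

```python
def try_decode(raw: bytes):
    """Hypothetical helper: return (text, error); text is None if decoding fails."""
    try:
        # utf-8-sig is UTF-8 that also strips a leading byte-order mark (BOM).
        return raw.decode("utf-8-sig"), None
    except UnicodeDecodeError as exc:
        return None, str(exc)


# UTF-8 bytes with a BOM decode cleanly and the BOM is removed.
with_bom = b"\xef\xbb\xbf" + "Привет".encode("utf-8")
# The same text in a legacy single-byte encoding (assumed example: cp1251)
# is not valid UTF-8, so decoding raises UnicodeDecodeError.
legacy = "Привет".encode("cp1251")

ok_text, ok_err = try_decode(with_bom)
bad_text, bad_err = try_decode(legacy)

print(ok_text, ok_err)
print(bad_text, bad_err)
```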
