
Initial Tests:

Note: Locust user wait times were not set correctly here. Ignore the user count and wait time, and assume requests were sent as soon as any user was ready.
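
Because there was effectively no wait time, each simulated user sent its next request as soon as the previous response arrived. The achieved request rate was therefore governed mainly by response time. A rough way to reason about this is the interactive response time law for a closed workload, X = N / (R + Z):

- N is the number of users.
- R is the mean response time.
- Z is the mean wait (think) time.

The sketch below only illustrates that formula. The function name and the example numbers are hypothetical and are not taken from these tests.

```python
def closed_loop_throughput(users: int, avg_response_s: float, wait_s: float = 0.0) -> float:
    """Estimate requests/second for a closed workload: X = N / (R + Z).

    users          -- number of simulated users (N)
    avg_response_s -- mean response time per request, in seconds (R)
    wait_s         -- mean wait time between a user's requests, in seconds (Z)
    """
    if users < 0 or avg_response_s < 0 or wait_s < 0:
        raise ValueError("inputs must be non-negative")
    cycle_s = avg_response_s + wait_s  # time for one user to finish one request cycle
    if cycle_s == 0:
        raise ValueError("response time plus wait time must be positive")
    return users / cycle_s


if __name__ == "__main__":
    # Hypothetical numbers: 10 users and a 0.2 s mean response time.
    print(closed_loop_throughput(10, 0.2))       # no wait time: about 50 req/s
    print(closed_loop_throughput(10, 0.2, 1.0))  # 1 s wait per request: about 8.3 req/s
```

With zero wait time the request rate depends almost entirely on how fast the server responds, so the user count alone says little about the load in these runs.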

...

Test comparing course sizes

On the left is the initial portion of the BerkeleyX/CS.CS169.1x/3T2013 (783 threads, 3385 comments at the time) load test. On the right is the SMES/ASLCx/1T2015 (1700 Threads, 3047 comments) load test.

On the left is the HarvardX/1368.2x/2T2015 (92 threads, 1697 comments) load test. On the right is the SMES/ASLCx/1T2015 (1700 Threads, 3047 comments) load test.

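Method | Name             | # reqs | # fails   | Avg (ms) | Min (ms) | Max (ms) | Median (ms) | req/s | 95% (ms)
GET    | GET_comment_list | 324    | 0 (0.00%) | 221      | 139      | 1036     | 190         | 0.1   | 460
GET    | GET_thread       | 20485  | 0 (0.00%) | 168      | 124      | 2079     | 170         | 12.7  | 220
GET    | GET_thread_list  | 10919  | 0 (0.00%) | 461      | 132      | 1677     | 450         | 5.6   | 840
PATCH  | PATCH_comment    | 325    | 0 (0.00%) | 348      | 176      | 1354     | 300         | 0.1   | 720
PATCH  | PATCH_thread     | 356    | 3 (0.84%) | 272      | 134      | 895      | 230         | 0.2   | 500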
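
The failure percentages in these tables appear to be computed against all attempts, that is fails / (reqs + fails). This suggests the reqs column counts successful requests only. For example, the TsinghuaX PATCH_thread row (590 reqs, 7 fails) shows 1.17%, which matches 7 / 597; 7 / 590 would give 1.19%. The helper below is a hypothetical sketch of that calculation for checking table rows, not code taken from Locust.

```python
def fail_percent(successes: int, failures: int) -> float:
    """Failure percentage over all attempts: failures / (successes + failures) * 100."""
    if successes < 0 or failures < 0:
        raise ValueError("counts must be non-negative")
    attempts = successes + failures
    if attempts == 0:
        return 0.0  # no requests recorded, so nothing failed
    return 100.0 * failures / attempts


if __name__ == "__main__":
    # (reqs, fails, percentage shown) taken from rows in this section's tables
    rows = [(356, 3, 0.84), (590, 7, 1.17), (2928, 240, 7.58)]
    for reqs, fails, shown in rows:
        print(reqs, fails, shown, round(fail_percent(reqs, fails), 2))
```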

On the left is the TsinghuaX/00690242_2x/3T2014 (29 threads, 155 comments) load test. On the right is the SMES/ASLCx/1T2015 (1700 Threads, 3047 comments) load test.

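Method | Name             | # reqs | # fails   | Avg (ms) | Min (ms) | Max (ms) | Median (ms) | req/s | 95% (ms)
GET    | GET_comment_list | 474    | 0 (0.00%) | 188      | 139      | 433      | 180         | 0.3   | 250
GET    | GET_thread       | 29492  | 0 (0.00%) | 169      | 129      | 1408     | 160         | 10.3  | 220
GET    | GET_thread_list  | 15707  | 0 (0.00%) | 198      | 127      | 1322     | 160         | 4.6   | 340
PATCH  | PATCH_comment    | 474    | 0 (0.00%) | 314      | 177      | 1042     | 300         | 0.3   | 510
PATCH  | PATCH_thread     | 590    | 7 (1.17%) | 246      | 148      | 1315     | 230         | 0.2   | 380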

Tests while Mixing Courses:

    In this series of tests, we test against courses of different sizes, mixed in different ratios. Production shows spikes in response times, and one possible cause is the few very large courses. This initial test ran for 3 hours, with requests split between a small course and a large course in a 20:1 ratio. The large course is BerkeleyX/ColWri2.2x/1T2014 (~25,000 threads, 30,000 comments), and the small course is SMES/ASLCx/1T2015 (1700 Threads, 3047 comments). The response times show spikes. The large course also returned many unexpected 500 errors, which need to be addressed. Upon further investigation, we found a memory issue caused by missing indexes.
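
One simple way to produce a 20:1 mix is to choose the course for each request by weighted random selection. The sketch below assumes that approach; the test's actual selection logic is not described here, and the constant and function names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical constants mirroring the mix described above (small:large = 20:1).
SMALL_COURSE = "SMES/ASLCx/1T2015"
LARGE_COURSE = "BerkeleyX/ColWri2.2x/1T2014"
COURSES = [SMALL_COURSE, LARGE_COURSE]
WEIGHTS = [20, 1]


def pick_course(rng: random.Random) -> str:
    """Choose the course for the next request; the small course is 20x as likely."""
    return rng.choices(COURSES, weights=WEIGHTS, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    counts = Counter(pick_course(rng) for _ in range(21_000))
    print(counts[SMALL_COURSE] / counts[LARGE_COURSE])  # roughly 20
```

Weighting each request, rather than assigning whole users to one course, keeps the ratio stable over a long run.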

Initial mixed course test

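Course size | Name             | # reqs | # fails     | Avg (ms) | Min (ms) | Max (ms) | Median (ms) | req/s | 95% (ms)
Large       | GET_comment_list | 106    | 0 (0.00%)   | 864      | 263      | 4680     | 680         | 0     | 2700
Large       | GET_thread       | 6053   | 0 (0.00%)   | 213      | 156      | 1271     | 200         | 0.3   | 280
Large       | GET_thread_list  | 2928   | 240 (7.58%) | 1012     | 175      | 5465     | 1000        | 0.2   | 1800
Large       | PATCH_comment    | 106    | 0 (0.00%)   | 1017     | 305      | 4738     | 790         | 0     | 2700
Large       | PATCH_thread     | 81     | 3 (3.57%)   | 681      | 167      | 4083     | 420         | 0     | 2000
Small       | GET_comment_list | 3054   | 0 (0.00%)   | 247      | 143      | 1484     | 210         | 0.2   | 550
Small       | GET_thread       | 183499 | 1 (0.00%)   | 187      | 0        | 6247     | 180         | 28.1  | 250
Small       | GET_thread_list  | 97488  | 21 (0.02%)  | 501      | 141      | 6685     | 420         | 12.7  | 1100
Small       | PATCH_comment    | 3054   | 1 (0.03%)   | 368      | 173      | 4345     | 330         | 0.1   | 750
Small       | PATCH_thread     | 3305   | 43 (1.28%)  | 287      | 140      | 2584     | 250         | 0.2   | 570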

The table below shows the SMES/ASLCx/1T2015 (1700 Threads, 3047 comments) load test on its own, for reference. These numbers are not directly comparable, because this test ran with 10 users versus 20 users in the mixed test above.

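Method | Name             | # reqs | # fails   | Avg (ms) | Min (ms) | Max (ms) | Median (ms) | req/s | 95% (ms)
GET    | GET_comment_list | 324    | 0 (0.00%) | 221      | 139      | 1036     | 190         | 0.1   | 460
GET    | GET_thread       | 20485  | 0 (0.00%) | 168      | 124      | 2079     | 170         | 12.7  | 220
GET    | GET_thread_list  | 10919  | 0 (0.00%) | 461      | 132      | 1677     | 450         | 5.6   | 840
PATCH  | PATCH_comment    | 325    | 0 (0.00%) | 348      | 176      | 1354     | 300         | 0.1   | 720
PATCH  | PATCH_thread     | 356    | 3 (0.84%) | 272      | 134      | 895      | 230         | 0.2   | 500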

Addressing the 500s:

    Although they appeared only on the larger courses, a significant number of 500s were being returned. A fix has been proposed that adds an index to handle this. For now, to continue the load tests, the offending parameters "last_activity_at" and "asc" will be removed from our random selection. This also means that our thread retrieval for PATCHing (and DELETEing, which has been disabled) will be less random, though still sufficiently random. By removing these parameters, we hope to eliminate the 500s and see a performance improvement for both large and small courses.
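
The change amounts to filtering the excluded values out of the candidate pools before each random draw. In the sketch below, only "last_activity_at" and "asc" come from the text. The other sort keys and the parameter names sort_key and sort_order are hypothetical placeholders, not the service's confirmed API.

```python
import random
from typing import AbstractSet, Dict

# Hypothetical candidate pools; only "last_activity_at" and "asc" come from the text.
SORT_KEYS = ["created_at", "last_activity_at", "votes", "comment_count"]
SORT_ORDERS = ["desc", "asc"]

# Values left out of the random selection until the missing index is added.
EXCLUDED = frozenset({"last_activity_at", "asc"})


def random_thread_list_params(
    rng: random.Random, excluded: AbstractSet[str] = EXCLUDED
) -> Dict[str, str]:
    """Build random sort parameters for a thread-list request, skipping excluded values."""
    keys = [k for k in SORT_KEYS if k not in excluded]
    orders = [o for o in SORT_ORDERS if o not in excluded]
    if not keys or not orders:
        raise ValueError("the exclusion list removed every option")
    return {
        "sort_key": rng.choice(keys),      # hypothetical parameter name
        "sort_order": rng.choice(orders),  # hypothetical parameter name
    }


if __name__ == "__main__":
    rng = random.Random(7)
    print(random_thread_list_params(rng))               # never uses the excluded values
    print(random_thread_list_params(rng, frozenset()))  # unrestricted pools
```

Narrowing the pools is what makes the thread retrieval "less random": with "asc" removed, every draw uses the same sort order.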

Test reruns:

    These tests were rerun with the non-indexed fields removed, to verify that those fields were causing the 500s. All tests were run with 10 Locust users for 30 minutes. Response times improved across the board with no spikes, and the smaller the course, the better the response times.

...

TsinghuaX/00690242_2x/3T2014 (29 threads, 155 comments)

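Method | Name             | # reqs | # fails   | Avg (ms) | Min (ms) | Max (ms) | Median (ms) | req/s | 95% (ms)
GET    | GET_comment_list | 514    | 0 (0.00%) | 176      | 133      | 514      | 170         | 0.2   | 240
GET    | GET_thread       | 29734  | 0 (0.00%) | 164      | 126      | 2162     | 160         | 19.2  | 220
GET    | GET_thread_list  | 15738  | 0 (0.00%) | 195      | 123      | 1427     | 150         | 10.1  | 340
PATCH  | PATCH_comment    | 512    | 0 (0.00%) | 310      | 167      | 2377     | 290         | 0.4   | 520
PATCH  | PATCH_thread     | 506    | 8 (1.56%) | 239      | 136      | 3293     | 220         | 0.3   | 380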

After adding indices:

Rerun of mixed courses test after the first index was added

This test was run after adding the "last_activity_at" index but before adding the "asc" index. Compared with the initial mixed course test, the major differences are the absence of spikes, a consistent RPM, better response times (especially for the large course), and no 500s.

This test was run for 3 minutes with a small course and a large course in a 20:1 ratio. The large course is BerkeleyX/ColWri2.2x/1T2014 (~25,000 threads, 30,000 comments), and the small course is SMES/ASLCx/1T2015 (1700 Threads, 3047 comments).

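Course size | Name             | # reqs | # fails   | Avg (ms) | Min (ms) | Max (ms) | Median (ms) | req/s | 95% (ms)
Large       | GET_comment_list | 46     | 0 (0.00%) | 443      | 207      | 1155     | 370         | 0     | 840
Large       | GET_thread       | 2404   | 0 (0.00%) | 225      | 168      | 1275     | 210         | 0.7   | 290
Large       | GET_thread_list  | 1296   | 0 (0.00%) | 489      | 181      | 1104     | 480         | 0.6   | 830
Large       | PATCH_comment    | 45     | 0 (0.00%) | 490      | 221      | 2060     | 450         | 0     | 780
Large       | PATCH_thread     | 53     | 0 (0.00%) | 489      | 174      | 1912     | 410         | 0.1   | 1100
Small       | GET_comment_list | 877    | 0 (0.00%) | 211      | 154      | 1072     | 200         | 0.3   | 300
Small       | GET_thread       | 55710  | 2 (0.00%) | 198      | 141      | 1562     | 190         | 28.2  | 260
Small       | GET_thread_list  | 29694  | 0 (0.00%) | 449      | 147      | 3919     | 430         | 13.2  | 810
Small       | PATCH_comment    | 875    | 4 (0.46%) | 340      | 184      | 1584     | 320         | 0.2   | 560
Small       | PATCH_thread     | 1038   | 9 (0.86%) | 259      | 150      | 1055     | 240         | 0.5   | 420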

Page size vs. response time:

    In a new test, we found that SMES/ASLCx/1T2015 (1700 Threads, 3047 comments) was slowing down over time. To identify what was happening, we seeded a new course, which we will call DAPI (1000 Threads, 500 comments). In our analysis of the forums, we saw that the median body size was 250 characters. Our PATCH operations send bodies of 4, 250, 1000, 5000, or 10000 characters. This test was run over 10 hours. The increase in response time needed to be addressed.
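
A PATCH body generator for this test only needs to pick one of the listed sizes and produce that many characters. The sketch below assumes that sizes are drawn uniformly and that the content can be random letters and spaces. Neither detail is stated in the text, and the function name is hypothetical.

```python
import random
import string

# Body sizes, in characters, listed in the text (250 is the observed median).
PATCH_BODY_SIZES = [4, 250, 1000, 5000, 10000]
ALPHABET = string.ascii_letters + " "  # assumption: filler text, not real forum content


def random_patch_body(rng: random.Random) -> str:
    """Return a random body whose length is one of PATCH_BODY_SIZES (chosen uniformly)."""
    size = rng.choice(PATCH_BODY_SIZES)
    return "".join(rng.choices(ALPHABET, k=size))


if __name__ == "__main__":
    rng = random.Random(3)
    print(sorted(len(random_patch_body(rng)) for _ in range(10)))
```

Three of the five sizes are well above the 250-character median. Uniform selection would therefore push PATCH traffic toward larger bodies than typical posts; a weighted choice is the alternative if that matters.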

...