...

    In the initial testing phase of the Discussion API, the wait time between top-level tasks was 500 ms. If a task made multiple calls, such as PATCH comment, the wait time between those calls was 1000 ms. These waits are very short and represent accelerated usage of the Discussion API. The first test was run against BerkeleyX/ColWri2.2x/1T2014 (roughly 25,000 to 38,000 threads and 30,000 to 40,000 comments) with 10 Locust users. A high percentage of the responses were failures, and response times were in the 1000-3000 ms range. Assuming that a course with more posts performs worse, this course, the 5th largest in our system, was expected to perform poorly. To get a better baseline, we tried a smaller course.
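The pacing described above can be modeled without Locust itself. This is a minimal, hypothetical sketch: `run_tasks`, the task lists, and the two-call split of PATCH comment (fetch, then patch) are illustrative assumptions rather than the actual test script; only the 500 ms and 1000 ms waits come from the text. Injecting `sleep` lets the schedule be checked without real delays.

```python
import time
from typing import Callable, List, Sequence

# Waits from the test description, in milliseconds.
TASK_WAIT_MS = 500   # between top-level tasks
CALL_WAIT_MS = 1000  # between calls inside one multi-call task

Call = Callable[[], None]


def run_tasks(tasks: Sequence[List[Call]],
              sleep: Callable[[float], None] = time.sleep) -> None:
    """Run tasks in order, pausing between tasks and between calls in a task.

    Hypothetical helper: models the pacing only, not the real Locust script.
    """
    for task_index, calls in enumerate(tasks):
        if task_index > 0:
            sleep(TASK_WAIT_MS / 1000)  # time.sleep takes seconds
        for call_index, call in enumerate(calls):
            if call_index > 0:
                sleep(CALL_WAIT_MS / 1000)
            call()


# Usage: record the pauses instead of sleeping for real.
pauses: List[float] = []
log: List[str] = []
tasks = [
    [lambda: log.append("GET thread")],
    # Hypothetical two-call PATCH comment task: fetch the comment, then patch it.
    [lambda: log.append("GET comment"), lambda: log.append("PATCH comment")],
]
run_tasks(tasks, sleep=pauses.append)
print(pauses)  # [0.5, 1.0]
print(log)     # ['GET thread', 'GET comment', 'PATCH comment']
```

Keeping the two waits as separate constants makes it easy to slow the pacing down toward realistic user behavior in later runs.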

Initial flow test against a very large course

This test ran against the course BerkeleyX/ColWri2.2x/1T2014 (roughly 25,000 to 38,000 threads and 30,000 to 40,000 comments) on courses-loadtest.edx.org.

 

Namereqs#failsAvgMinMaxMedian95%
GETGET_comment_list213190(47.15%)163518955749604500
GETGET_thread1009353035(2.92%)172291000314004400
GETGET_thread_list778851851(86.94%)2186163931320004900
PATCHPATCH_comment2077(3.27%)1622227953712004000
PATCHPATCH_thread19187(31.29%)12101476729600400
POSTPOST_comment_comment1572(1.26%)1973363883915005300
POSTPOST_comment_response444216(32.73%)24163161008016006800
POSTPOST_thread444926(0.58%)120019766639003200
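
The failure percentages in the table above are consistent with failures divided by all attempts (the reqs value plus the fails value), which suggests the reqs column counts only successful requests; for example, 190 / (213 + 190) ≈ 47.15%. A short check, assuming that reading; `fail_percent` is a hypothetical helper and the (reqs, fails) pairs are read from the flattened rows:

```python
def fail_percent(reqs: int, fails: int) -> float:
    """Failure percentage, assuming `reqs` counts successful requests only.

    Total attempts are taken to be reqs + fails; the result is rounded to
    two decimals to match the table.
    """
    attempts = reqs + fails
    if attempts == 0:
        return 0.0
    return round(100.0 * fails / attempts, 2)


# (reqs, fails) pairs as read from the flattened rows of the table above.
ROWS = {
    "GET_comment_list": (213, 190),
    "GET_thread": (100935, 3035),
    "GET_thread_list": (7788, 51851),
    "PATCH_comment": (207, 7),
    "PATCH_thread": (191, 87),
    "POST_comment_comment": (157, 2),
    "POST_comment_response": (444, 216),
    "POST_thread": (4449, 26),
}

for name, (reqs, fails) in ROWS.items():
    print(f"{name:<22} {fail_percent(reqs, fails):6.2f}%")
```

Under this reading, GET_thread_list failed on roughly 87% of attempts, which accounts for most of the failures in the run.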

 

Almost all of the errors were 500 Internal Server Errors; the only exceptions were four 404 Not Found responses on PATCH_thread. In New Relic, the 500s all showed up as timeouts.

Occurrences  Error
3035         GET_thread:HTTPError('500 Server Error: INTERNAL SERVER ERROR',)
7            PATCH_comment:HTTPError('500 Server Error: INTERNAL SERVER ERROR',)
51851        GET_thread_list:HTTPError('500 Server Error: INTERNAL SERVER ERROR',)
83           PATCH_thread:HTTPError('500 Server Error: INTERNAL SERVER ERROR',)
216          POST_comment_response:HTTPError('500 Server Error: INTERNAL SERVER ERROR',)
2            POST_comment_comment:HTTPError('500 Server Error: INTERNAL SERVER ERROR',)
4            PATCH_thread:HTTPError('404 Client Error: NOT FOUND',)
26           POST_thread:HTTPError('500 Server Error: INTERNAL SERVER ERROR',)
190          GET_comment_list:HTTPError('500 Server Error: INTERNAL SERVER ERROR',)
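
Totalling the error table by status code and by endpoint makes the breakdown explicit: all but four failures were 500s, and the two PATCH_thread entries (83 + 4) sum to 87, matching the PATCH_thread fail count that the results table appears to show. A minimal sketch; the helper names are hypothetical and the rows are copied from the error table above:

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# (occurrences, error) rows copied from the error table above.
ERRORS = [
    (3035, "GET_thread:HTTPError('500 Server Error: INTERNAL SERVER ERROR',)"),
    (7, "PATCH_comment:HTTPError('500 Server Error: INTERNAL SERVER ERROR',)"),
    (51851, "GET_thread_list:HTTPError('500 Server Error: INTERNAL SERVER ERROR',)"),
    (83, "PATCH_thread:HTTPError('500 Server Error: INTERNAL SERVER ERROR',)"),
    (216, "POST_comment_response:HTTPError('500 Server Error: INTERNAL SERVER ERROR',)"),
    (2, "POST_comment_comment:HTTPError('500 Server Error: INTERNAL SERVER ERROR',)"),
    (4, "PATCH_thread:HTTPError('404 Client Error: NOT FOUND',)"),
    (26, "POST_thread:HTTPError('500 Server Error: INTERNAL SERVER ERROR',)"),
    (190, "GET_comment_list:HTTPError('500 Server Error: INTERNAL SERVER ERROR',)"),
]

# The status code is the three digits right after "HTTPError('".
STATUS_RE = re.compile(r"HTTPError\('(\d{3}) ")


def tally_by_status(rows: Iterable[Tuple[int, str]]) -> Counter:
    """Total occurrences per HTTP status code."""
    counts: Counter = Counter()
    for occurrences, error in rows:
        match = STATUS_RE.search(error)
        if match:
            counts[int(match.group(1))] += occurrences
    return counts


def tally_by_endpoint(rows: Iterable[Tuple[int, str]]) -> Counter:
    """Total occurrences per endpoint name (the text before the first colon)."""
    counts: Counter = Counter()
    for occurrences, error in rows:
        counts[error.split(":", 1)[0]] += occurrences
    return counts


print(dict(tally_by_status(ERRORS)))              # {500: 55410, 404: 4}
print(tally_by_endpoint(ERRORS)["PATCH_thread"])  # 87
```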

...

    In this series of tests, we test courses of different sizes, mixed in different ratios. Production shows spikes in response times, and one possible cause is the few very large courses. This initial test ran for 3 hours, mixing requests to a small course and a large course at a 20:1 ratio. The large course is BerkeleyX/ColWri2.2x/1T2014 (roughly 25,000 to 38,000 threads and 30,000 to 40,000 comments), and the small course is SMES/ASLCx/1T2015 (1,700 threads, 3,047 comments). The response times show spikes, and the large course returned many unexpected 500 errors that need to be addressed. Further investigation traced these errors to a memory issue caused by missing indexes.
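The text does not say how the 20:1 mix was implemented. One way to sketch it is a weighted random choice per task; `pick_course` and `COURSE_WEIGHTS` are hypothetical names, and a real Locust script might instead use task or user-class weights:

```python
import random
from collections import Counter

# Course IDs from the text; the 20:1 weights mirror the described mix.
SMALL_COURSE = "SMES/ASLCx/1T2015"
LARGE_COURSE = "BerkeleyX/ColWri2.2x/1T2014"
COURSE_WEIGHTS = {SMALL_COURSE: 20, LARGE_COURSE: 1}


def pick_course(rng: random.Random) -> str:
    """Hypothetical helper: choose the course for the next task by weight."""
    courses = list(COURSE_WEIGHTS)
    weights = list(COURSE_WEIGHTS.values())
    return rng.choices(courses, weights=weights, k=1)[0]


rng = random.Random(2015)  # fixed seed so the run is reproducible
counts = Counter(pick_course(rng) for _ in range(21_000))
ratio = counts[SMALL_COURSE] / counts[LARGE_COURSE]
print(f"small:large = {ratio:.1f}:1")  # close to 20:1; exact value depends on the seed
```

Weighting per task keeps the large course in the mix continuously rather than in bursts, which matters when looking for the response-time spikes described above.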

...

Rerun of mixed-course test after the first index was added

This run was after adding the "last_activity_at" index but before adding the "asc" index. Compared with the previous run, there are no spikes, RPM is consistent, response times are better (especially for the large course), and there are no 500s.

This test ran for 3 minutes, mixing requests to a small course and a large course at a 20:1 ratio. The large course is BerkeleyX/ColWri2.2x/1T2014 (roughly 25,000 to 38,000 threads and 30,000 to 40,000 comments), and the small course is SMES/ASLCx/1T2015 (1,700 threads, 3,047 comments).

Course  Name              # reqs  # fails    Avg (ms)  Min (ms)  Max (ms)  Median (ms)  req/s  95% (ms)
Large   GET_comment_list  46      0 (0.00%)  443       207       1155      370          0      840
        GET_thread        2404    0 (0.00%)  225       168       1275      210          0.7    290
        GET_thread_list   1296    0 (0.00%)  489       181       1104      480          0.6    830
        PATCH_comment     45      0 (0.00%)  490       221       2060      450          0      780
        PATCH_thread      53      0 (0.00%)  489       174       1912      410          0.1    1100
Small   GET_comment_list  877     0 (0.00%)  211       154       1072      200          0.3    300
        GET_thread        55710   2 (0.00%)  198       141       1562      190          28.2   260
        GET_thread_list   29694   0 (0.00%)  449       147       3919      430          13.2   810
        PATCH_comment     875     4 (0.46%)  340       184       1584      320          0.2    560
        PATCH_thread      1038    9 (0.86%)  259       150       1055      240          0.5    420
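
As a rough check on the 20:1 mix, the reqs column of the rerun table can be summed per course. This is a back-of-the-envelope sketch; the (reqs, fails) pairs are read from the flattened rows, and `total_reqs` is a hypothetical helper:

```python
# (reqs, fails) per endpoint, read from the rerun table above.
LARGE = {
    "GET_comment_list": (46, 0),
    "GET_thread": (2404, 0),
    "GET_thread_list": (1296, 0),
    "PATCH_comment": (45, 0),
    "PATCH_thread": (53, 0),
}
SMALL = {
    "GET_comment_list": (877, 0),
    "GET_thread": (55710, 2),
    "GET_thread_list": (29694, 0),
    "PATCH_comment": (875, 4),
    "PATCH_thread": (1038, 9),
}


def total_reqs(rows: dict) -> int:
    """Sum the reqs column over all endpoints of one course."""
    return sum(reqs for reqs, _fails in rows.values())


large_total = total_reqs(LARGE)
small_total = total_reqs(SMALL)
print(large_total, small_total)                            # 3844 88194
print(f"small:large = {small_total / large_total:.2f}:1")  # small:large = 22.94:1
```

By request count, the small course received about 23 requests for every large-course request, somewhat above the nominal 20:1.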

...