
...

In the initial testing phase of the Discussion API, the wait time between top-level tasks was 500 ms. If a task made multiple calls, such as PATCH comment, the wait time between those calls was 1000 ms. These waits are very short and represent an accelerated usage of the Discussion API. The first test was run against BerkeleyX/ColWri2.2x/1T2014 (~25~38,000 threads, 30~40,000 comments) with 10 Locust users. A high percentage of the responses were failures, and the response times were in the 1000-3000 ms range. Under the assumption that a course with more posts performs worse, this course, the 5th biggest in our system, was expected to perform poorly. To get a better baseline, we tried a smaller course.
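The pacing described above can be sketched in plain Python. This is only an illustration of the two wait levels, not the actual Locust test code: `run_tasks`, the constant names, and the idea of a task as a list of callables are assumptions made for the example.

```python
import time

TASK_WAIT_S = 0.5  # 500 ms between top-level tasks (from the text)
CALL_WAIT_S = 1.0  # 1000 ms between calls inside one task (from the text)


def run_tasks(tasks, sleep=time.sleep):
    """Run each task's calls in order and return the total seconds waited.

    `tasks` is a list of tasks; each task is a list of zero-argument
    callables (hypothetical stand-ins for the real API calls).
    """
    waited = 0.0
    for task_index, calls in enumerate(tasks):
        for call_index, call in enumerate(calls):
            call()
            # Pause only between calls of the same task, not after the last one.
            if call_index < len(calls) - 1:
                sleep(CALL_WAIT_S)
                waited += CALL_WAIT_S
        # Pause between top-level tasks, not after the final task.
        if task_index < len(tasks) - 1:
            sleep(TASK_WAIT_S)
            waited += TASK_WAIT_S
    return waited
```

Passing a fake `sleep` (for example `list.append`) makes it possible to inspect the wait schedule without actually sleeping.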

Initial Flowtest against very large course

This test ran against the course BerkeleyX/ColWri2.2x/1T2014 (~25~38,000 threads, 30~40,000 comments) on courses-loadtest.edx.org.

Method	Name	# reqs	# fails	Avg	Min	Max	Median	95%
GET	GET_comment_list	213	190(47.15%)	1635	189	5574	960	4500
GET	GET_thread	100935	3035(2.92%)	1722	9	10003	1400	4400
GET	GET_thread_list	7788	51851(86.94%)	2186	163	9313	2000	4900
PATCH	PATCH_comment	207	7(3.27%)	1622	227	9537	1200	4000
PATCH	PATCH_thread	191	87(31.29%)	1210	147	6729	600	400
POST	POST_comment_comment	157	2(1.26%)	1973	363	8839	1500	5300
POST	POST_comment_response	444	216(32.73%)	2416	316	10080	1600	6800
POST	POST_thread	4449	26(0.58%)	1200	197	6663	900	3200
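The failure percentages in this table appear to be computed as fails / (reqs + fails), which suggests that the reqs column counts only successful requests. That reading is an inference from the numbers, not something the report states. A minimal sketch that checks it against the rows above (the function name `failure_rate` is made up for the example):

```python
def failure_rate(successes, failures):
    """Return failures as a percentage of all requests (successes + failures)."""
    total = successes + failures
    return 100.0 * failures / total if total else 0.0


# (reqs, fails) pairs copied from the table above.
ROWS = {
    "GET_comment_list": (213, 190),
    "GET_thread": (100935, 3035),
    "GET_thread_list": (7788, 51851),
    "PATCH_comment": (207, 7),
    "PATCH_thread": (191, 87),
    "POST_comment_comment": (157, 2),
    "POST_comment_response": (444, 216),
    "POST_thread": (4449, 26),
}

for name, (reqs, fails) in ROWS.items():
    print(f"{name}: {failure_rate(reqs, fails):.2f}%")
```

Under this reading, GET_thread_list failed on roughly seven out of every eight requests, which makes it the clearest problem endpoint in this run.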

Nearly all of the errors were 500 server errors; the only exceptions were four 404s on PATCH_thread. In New Relic, the 500 errors all showed up as timeout errors.

# occurrences	Name	Error
3035	GET_thread:	HTTPError('500 Server Error: INTERNAL SERVER ERROR',)
7	PATCH_comment:	HTTPError('500 Server Error: INTERNAL SERVER ERROR',)
51851	GET_thread_list:	HTTPError('500 Server Error: INTERNAL SERVER ERROR',)
83	PATCH_thread:	HTTPError('500 Server Error: INTERNAL SERVER ERROR',)
216	POST_comment_response:	HTTPError('500 Server Error: INTERNAL SERVER ERROR',)
2	POST_comment_comment:	HTTPError('500 Server Error: INTERNAL SERVER ERROR',)
4	PATCH_thread:	HTTPError('404 Client Error: NOT FOUND',)
26	POST_thread:	HTTPError('500 Server Error: INTERNAL SERVER ERROR',)
190	GET_comment_list:	HTTPError('500 Server Error: INTERNAL SERVER ERROR',)

...

In this series of tests, we test against courses of different sizes in different ratios. Production shows spikes in response times, and one possible cause is the few very large courses. This initial test ran for 3 hours with a small course and a large course at a 20:1 ratio. The large course is BerkeleyX/ColWri2.2x/1T2014 (~25~38,000 threads, 30~40,000 comments), while the small course is SMES/ASLCx/1T2015 (1700 threads, 3047 comments). The response times show spikes. There were also many unexpected 500 responses from the large course, which need to be addressed. Further investigation found a memory issue due to missing indexes.
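One way to produce a 20:1 small-to-large mix is weighted random selection of the target course for each task. The sketch below is an assumption about how such a ratio could be implemented, not the mechanism the actual test used; `pick_course` and the constant names are hypothetical.

```python
import random

# Course IDs from the text; weights encode the 20:1 small-to-large ratio.
COURSES = ["SMES/ASLCx/1T2015", "BerkeleyX/ColWri2.2x/1T2014"]
WEIGHTS = [20, 1]


def pick_course(rng):
    """Pick the course for the next task, favoring the small course 20:1."""
    return rng.choices(COURSES, weights=WEIGHTS, k=1)[0]


if __name__ == "__main__":
    rng = random.Random()
    sample = [pick_course(rng) for _ in range(2100)]
    for course in COURSES:
        print(course, sample.count(course))
```

With random selection the observed ratio only approximates 20:1 over a run, so per-course request counts will drift somewhat from the exact ratio.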

...

Rerun of mixed courses test after first index was added

This run is after adding the "last_activity_at" index and before adding the "asc" index. The major differences are the lack of spikes, consistent RPM, better response times (especially for the large course), and no 500s.

This test was run for 3 minutes with a small course and a large course at a 20:1 ratio. The large course is BerkeleyX/ColWri2.2x/1T2014 (~25~38,000 threads, 30~40,000 comments), while the small course is SMES/ASLCx/1T2015 (1700 threads, 3047 comments).

Course	Name	# reqs	# fails	Avg	Min	Max	Median	req/s	95%
Large	GET_comment_list	46	0(0.00%)	443	207	1155	370	0	840
	GET_thread	2404	0(0.00%)	225	168	1275	210	0.7	290
	GET_thread_list	1296	0(0.00%)	489	181	1104	480	0.6	830
	PATCH_comment	45	0(0.00%)	490	221	2060	450	0	780
	PATCH_thread	53	0(0.00%)	489	174	1912	410	0.1	1100
Small	GET_comment_list	877	0(0.00%)	211	154	1072	200	0.3	300
	GET_thread	55710	2(0.00%)	198	141	1562	190	28.2	260
	GET_thread_list	29694	0(0.00%)	449	147	3919	430	13.2	810
	PATCH_comment	875	4(0.46%)	340	184	1584	320	0.2	560
	PATCH_thread	1038	9(0.86%)	259	150	1055	240	0.5	420

...
