Initial Tests:

Note: Locust user wait times were not correctly set here. Ignore the user count and wait time, and assume requests were sent as soon as one of the users was ready.

...

Test comparing course sizes

On the left is the initial portion of the BerkeleyX/CS.CS169.1x/3T2013 (783 threads, 3385 comments at the time) load test. On the right is the SMES/ASLCx/1T2015 (1700 threads, 3047 comments) load test.

On the left is the HarvardX/1368.2x/2T2015 (92 threads, 1697 comments) load test. On the right is the SMES/ASLCx/1T2015 (1700 Threads, 3047 comments) load test.

Method	Name	# reqs	# fails	Avg (ms)	Min (ms)	Max (ms)	Median (ms)	req/s	95% (ms)
GET	GET_comment_list	324	0(0.00%)	221	139	1036	190	0.1	460
GET	GET_thread	20485	0(0.00%)	168	124	2079	170	12.7	220
GET	GET_thread_list	10919	0(0.00%)	461	132	1677	450	5.6	840
PATCH	PATCH_comment	325	0(0.00%)	348	176	1354	300	0.1	720
PATCH	PATCH_thread	356	3(0.84%)	272	134	895	230	0.2	500

On the left is the TsinghuaX/00690242_2x/3T2014 (29 threads, 155 comments) load test. On the right is the SMES/ASLCx/1T2015 (1700 Threads, 3047 comments) load test.

Method	Name	# reqs	# fails	Avg (ms)	Min (ms)	Max (ms)	Median (ms)	req/s	95% (ms)
GET	GET_comment_list	474	0(0.00%)	188	139	433	180	0.3	250
GET	GET_thread	29492	0(0.00%)	169	129	1408	160	10.3	220
GET	GET_thread_list	15707	0(0.00%)	198	127	1322	160	4.6	340
PATCH	PATCH_comment	474	0(0.00%)	314	177	1042	300	0.3	510
PATCH	PATCH_thread	590	7(1.17%)	246	148	1315	230	0.2	380

Tests while Mixing Courses:

In this series of tests, we test against courses of different sizes mixed at different ratios. Production shows spikes in response times, and one possible cause is the few very large courses. This initial test ran for 3 hours with a small course and a large course at a 20:1 ratio. The large course is BerkeleyX/ColWri2.2x/1T2014 (~25,000 threads, 30,000 comments), and the small course is SMES/ASLCx/1T2015 (1700 threads, 3047 comments). The response times show spikes, and the test returned many unexpected 500 responses from the large course, which need to be addressed. Further investigation found a memory issue caused by missing indexes.
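As a rough illustration of how a 20:1 small-to-large mix could be produced, here is a minimal sketch using weighted random selection. The course IDs and the ratio come from the description above; the names `COURSE_WEIGHTS` and `pick_course` are hypothetical, and the actual load test may select courses differently.

```python
import random

# Course IDs and the 20:1 ratio are taken from the test description.
# The constant and function names are hypothetical.
SMALL_COURSE = "SMES/ASLCx/1T2015"
LARGE_COURSE = "BerkeleyX/ColWri2.2x/1T2014"
COURSE_WEIGHTS = {SMALL_COURSE: 20, LARGE_COURSE: 1}


def pick_course(rng):
    """Return a course ID; the small course is chosen 20 times as often on average."""
    courses = list(COURSE_WEIGHTS)
    weights = [COURSE_WEIGHTS[course] for course in courses]
    return rng.choices(courses, weights=weights, k=1)[0]
```

Each simulated request would call `pick_course(random.Random())` (or share one generator) before building its URL. Note that the ratio applies to how often a course is selected, not to the completed request counts, since slower responses from the large course reduce how many of its requests finish in a given time.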

Initial mixed course test

Course Size	Name	# reqs	# fails	Avg (ms)	Min (ms)	Max (ms)	Median (ms)	req/s	95% (ms)
Large	GET_comment_list	106	0(0.00%)	864	263	4680	680	0	2700
	GET_thread	6053	0(0.00%)	213	156	1271	200	0.3	280
	GET_thread_list	2928	240(7.58%)	1012	175	5465	1000	0.2	1800
	PATCH_comment	106	0(0.00%)	1017	305	4738	790	0	2700
	PATCH_thread	81	3(3.57%)	681	167	4083	420	0	2000
Small	GET_comment_list	3054	0(0.00%)	247	143	1484	210	0.2	550
	GET_thread	183499	1(0.00%)	187	0	6247	180	28.1	250
	GET_thread_list	97488	21(0.02%)	501	141	6685	420	12.7	1100
	PATCH_comment	3054	1(0.03%)	368	173	4345	330	0.1	750
	PATCH_thread	3305	43(1.28%)	287	140	2584	250	0.2	570
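The failure percentages in these tables appear to match fails / (reqs + fails), which suggests that the reqs column counts only successful requests. This is an inference from the numbers, not something the report states. A small sketch that reproduces a few of the percentages, with a hypothetical function name:

```python
def failure_percent(reqs, fails):
    """Failure rate as a percentage, assuming reqs excludes failed requests."""
    total = reqs + fails
    if total == 0:
        return 0.0
    return round(100.0 * fails / total, 2)


# Rows from the mixed course table: (reqs, fails)
large_thread_list = failure_percent(2928, 240)
large_patch_thread = failure_percent(81, 3)
small_patch_thread = failure_percent(3305, 43)
```

Under this assumption, the large course's GET_thread_list failed on roughly one request in thirteen, far more often than any endpoint on the small course.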

The table below is the SMES/ASLCx/1T2015 (1700 threads, 3047 comments) load test by itself, for reference. These numbers are not directly comparable, since the test below ran with 10 users and the test above ran with 20 users.

Method	Name	# reqs	# fails	Avg (ms)	Min (ms)	Max (ms)	Median (ms)	req/s	95% (ms)
GET	GET_comment_list	324	0(0.00%)	221	139	1036	190	0.1	460
GET	GET_thread	20485	0(0.00%)	168	124	2079	170	12.7	220
GET	GET_thread_list	10919	0(0.00%)	461	132	1677	450	5.6	840
PATCH	PATCH_comment	325	0(0.00%)	348	176	1354	300	0.1	720
PATCH	PATCH_thread	356	3(0.84%)	272	134	895	230	0.2	500

Addressing the 500s:

Although they occurred only on the larger courses, a significant number of 500s were being returned. Adding an index has been proposed as a fix, but for now, in order to continue the load tests, the offending parameters "last_activity_at" and "asc" will be removed from our random selection. This also means that thread retrieval for PATCHing (and DELETEing, which has been disabled) will be less random, but still sufficiently random. By removing these parameters, we hope to eliminate the 500s and see a performance improvement for both large and small courses.
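A minimal sketch of what excluding the two values from the random selection might look like. Only "last_activity_at" and "asc" are named in this report; the parameter names `order_by` and `order_direction`, the other pool values, and the function names are assumptions made for illustration.

```python
import random

# Hypothetical parameter pools. Only "last_activity_at" and "asc"
# are named in the report; everything else here is assumed.
ORDER_BY = ["last_activity_at", "comment_count", "vote_count"]
ORDER_DIRECTION = ["asc", "desc"]
EXCLUDED = {"last_activity_at", "asc"}


def allowed(values):
    """Drop the values that triggered 500s on the non-indexed fields."""
    return [value for value in values if value not in EXCLUDED]


def random_thread_list_params(rng):
    """Pick query parameters at random from the remaining, indexed options."""
    return {
        "order_by": rng.choice(allowed(ORDER_BY)),
        "order_direction": rng.choice(allowed(ORDER_DIRECTION)),
    }
```

Keeping the exclusions in one set makes it easy to restore the full selection once the indexes are in place.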

Test reruns:

These tests have the non-indexed fields removed to verify that those fields were causing the 500s. All tests were run with 10 Locust users for 30 minutes. Response times improved across the board with no spikes, and the smaller the course, the better the response times.

...

TsinghuaX/00690242_2x/3T2014 (29 threads, 155 comments)

Method	Name	# reqs	# fails	Avg (ms)	Min (ms)	Max (ms)	Median (ms)	req/s	95% (ms)
GET	GET_comment_list	514	0(0.00%)	176	133	514	170	0.2	240
GET	GET_thread	29734	0(0.00%)	164	126	2162	160	19.2	220
GET	GET_thread_list	15738	0(0.00%)	195	123	1427	150	10.1	340
PATCH	PATCH_comment	512	0(0.00%)	310	167	2377	290	0.4	520
PATCH	PATCH_thread	506	8(1.56%)	239	136	3293	220	0.3	380

After adding indices:

Rerun of mixed courses test after first index was added

This run is after adding the "last_activity_at" index and before adding the "asc" index. The major differences are the lack of spikes, a consistent RPM, better response times (especially for the large course), and no 500s.

This test was run for 3 minutes with a small course and a large course at a 20:1 ratio. The large course is BerkeleyX/ColWri2.2x/1T2014 (~25,000 threads, 30,000 comments), and the small course is SMES/ASLCx/1T2015 (1700 threads, 3047 comments).

Course Size	Name	# reqs	# fails	Avg (ms)	Min (ms)	Max (ms)	Median (ms)	req/s	95% (ms)
Large	GET_comment_list	46	0(0.00%)	443	207	1155	370	0	840
	GET_thread	2404	0(0.00%)	225	168	1275	210	0.7	290
	GET_thread_list	1296	0(0.00%)	489	181	1104	480	0.6	830
	PATCH_comment	45	0(0.00%)	490	221	2060	450	0	780
	PATCH_thread	53	0(0.00%)	489	174	1912	410	0.1	1100
Small	GET_comment_list	877	0(0.00%)	211	154	1072	200	0.3	300
	GET_thread	55710	2(0.00%)	198	141	1562	190	28.2	260
	GET_thread_list	29694	0(0.00%)	449	147	3919	430	13.2	810
	PATCH_comment	875	4(0.46%)	340	184	1584	320	0.2	560
	PATCH_thread	1038	9(0.86%)	259	150	1055	240	0.5	420
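To put a number on "better response times especially for the large course", the sketch below compares the large course's average response times from the initial mixed course test with those from this rerun. The values are copied from the two tables; the names are hypothetical, and the comparison is only indicative because the two runs differ in duration.

```python
# Large-course average response times copied from the two mixed course
# tables: (initial test, rerun after the first index).
LARGE_COURSE_AVG = {
    "GET_comment_list": (864, 443),
    "GET_thread": (213, 225),
    "GET_thread_list": (1012, 489),
    "PATCH_comment": (1017, 490),
    "PATCH_thread": (681, 489),
}


def percent_reduction(before, after):
    """Percentage drop in response time; negative means it got slower."""
    return round(100.0 * (before - after) / before, 1)


reductions = {
    name: percent_reduction(before, after)
    for name, (before, after) in LARGE_COURSE_AVG.items()
}
```

By this measure, GET_thread_list and PATCH_comment averages roughly halved on the large course, while GET_thread, which was already fast, stayed about the same.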

Page size vs. Response time:

In a new test, SMES/ASLCx/1T2015 (1700 threads, 3047 comments) was found to slow down over time. To identify the cause, a new course was seeded, which we will call DAPI (1000 threads, 500 comments). Our analysis of the forums showed a median body size of 250 characters. Our PATCH bodies can be 4, 250, 1000, 5000, or 10000 characters long. This test was run over 10 hours. The degradation in response time needed to be addressed.
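A minimal sketch of how a PATCH body of one of the listed sizes could be generated. The size list comes from the paragraph above; the function name and the choice of random ASCII letters as filler are assumptions, not the load test's actual implementation.

```python
import random
import string

# Body sizes in characters, as listed in the test description.
BODY_SIZES = [4, 250, 1000, 5000, 10000]


def random_patch_body(rng):
    """Return filler text whose length is one of the configured sizes."""
    size = rng.choice(BODY_SIZES)
    return "".join(rng.choice(string.ascii_letters) for _ in range(size))
```

Since every size is equally likely here, the average body is about 3250 characters, well above the 250-character median seen in the real forums. Weighting the choice toward 250 would be closer to production if that matters for the test.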

...

Version	Old Version 41	New Version 42
Changes made by	Christopher Lee	Christopher Lee
Saved on	Oct 16, 2015	Oct 16, 2015
