Initial Tests:
In the initial testing phase of the discussion api, the wait time between each top-level task was 500ms. If a task has multiple calls, such as PATCH comment, the wait time between each of those calls were 1000ms. These waits are very short and represent an accelerated usage of the discussions api. The first test that was run was against BerkeleyX/ColWri2.2x/1T2014 (~25,000 threads, 30,000 comments) with 10 locust users. A high percentage of the responses were failures and the response times were in the 1000-3000ms range. Under the assumption that a course with more posts would have worse performance, it would be expected that this course which is the 5th biggest course in our system, would not perform. To get a better baseline, we tried a smaller course.
...
Expand | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
On the left is the initial portion of the BerkeleyX/CS.CS169.1x/3T2013 (783 threads, 3385 comments at the time) loadtest. On the right is the SMES/ASLCx/1T2015 (1700 Threads, 3047 comments) loadtest. On the left is the HarvardX/1368.2x/2T2015 (92 threads, 1697 comments) load test. On the right is the SMES/ASLCx/1T2015 (1700 Threads, 3047 comments) load test.
On the left is the TsinghuaX/00690242_2x/3T2014 (29 threads, 155 comments) load test. On the right is the SMES/ASLCx/1T2015 (1700 Threads, 3047 comments) load test.
|
Tests while Mixing Courses:
In this series of tests, we will be testing against different sized courses with different ratios. There are spikes in response times in production and one possibility may be that the few very large courses may be the reason. In this initial test, we ran it for 3 hours, with a small course vs large course on a 20:1 ratio. The large course used is BerkeleyX/ColWri2.2x/1T2014 (~25,000 threads, 30,000 comments) , while the small course is SMES/ASLCx/1T2015 (1700 Threads, 3047 comments). We can see that there are spikes in the response times. There have been many 500 responses from the large course which needs to be addressed. There were many unexpected 500 errors during this test. Upon further investigation, it was found that there was a memory issue due to missing indexes.
Expand | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
The chart below is the SMES/ASLCx/1T2015 (1700 Threads, 3047 comments) loadtest by itself for reference. These numbers are not comparable since the test in the chart below was at 10 users vs 20 users in the above test.
|
Addressing the 500s:
Although only present on the larger courses, a significant amount of 500s were being returned. A solution has been proposed to include an index to account for this but for now in order to continue the load tests, the offending parameters "last_activity_at" and "asc" will be removed from out random selection. This also means that our thread retrieval for PATCHing (and DELETEing which has been disabled) will be less random, but sufficiently random. By removing these parameters, we hope to get rid of these 500s and see a performance improvement for large and small courses.
Test reruns:
These tests have the non-indexed fields removed just to verify that those were causing the 500s. All tests were run with 10 locust users for 30 minutes. Response times were all better, no spikes, and response times were better the smaller the course.
...
Expand | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
NEXT STEPS:
Run with new index.