Page Comparison

Jira: MA-1099 (openedx.atlassian.net)

Goals:

Understand the load we are able to handle with the discussion API for when the mobile app is released.
1. What can the server handle?
Understand the overhead between the discussion API and the Ruby forums code.
1. Does the Discussion API perform better, worse, or on par with the browser's forums?
2. What does the forums performance look like in general?

Usage patterns to look out for:

The default page size for the browser is 25, while the mobile app will use 10. More requests could therefore be sent for the same amount of information.
Push notifications
- Notifications may introduce a different usage pattern. If a thread is popular, bursts of requests can be expected.
- Forum usage may increase, as there are currently no notifications for the browser.
The browser can display the Threads, Responses, and Comments all at once. The mobile app treats all three of these as separate views. More requests could therefore be sent for the same amount of information.
General usage. Discussions on mobile could naturally increase discussion forum usage.
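
The page size effect can be estimated with simple arithmetic. The sketch below is only an illustration: the function name and the 100-entry example are assumptions, and only the page sizes 25 and 10 come from the text above.

```python
import math


def requests_needed(items: int, page_size: int) -> int:
    """Number of paginated requests needed to fetch `items` entries."""
    return math.ceil(items / page_size)


# Hypothetical listing with 100 entries; page sizes taken from the text.
browser_requests = requests_needed(100, 25)
mobile_requests = requests_needed(100, 10)
print(browser_requests, mobile_requests)
```

Under these assumptions the mobile client issues more requests than the browser to read the same listing, which is the effect the load test should account for.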

...

/courses/(course_id)/
- GET
/course_topics/(course_id)
- GET
/threads/
- GET
- POST
/threads/(thread_id)
/comments
- GET
- POST
/comments/(comment_id)
- GET not implemented
- PATCH
- DELETE

...

Testing Strategy:

Originally the plan was to isolate each endpoint and determine what kind of load it can handle, but after analysis of the data, some of these endpoints seem unnecessary to isolate for a load test. These endpoints include DELETE and PATCH, which are a very small part of the overall load in production. In the isolated test for each of these endpoints, the request is paired with its appropriate GET Thread/Comment. For example, every DELETE Thread request requires a thread_id. We obtain this thread_id by calling GET Thread List with randomized parameters, which returns a list of threads from which one is randomly selected. The selected thread is then DELETEd. Below is the chart of the additional requests we make. As long as the ratio of how many of these requests happen in each task is understood, we can get the desired endpoint distribution.

...

*GET Thread List can always return a response (so we delete a random response), but will not always return a comment so the comment created will be the one deleted.
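
The pairing of GET Thread List with DELETE Thread can be sketched as follows. This is not the actual locust task: FakeDiscussionAPI is a hypothetical in-memory stand-in, and its method names, paging parameters, and status codes (204, 404) are assumptions made so the flow can run without a server.

```python
import random


class FakeDiscussionAPI:
    """In-memory stand-in for the discussion API (hypothetical, for illustration)."""

    def __init__(self, thread_ids):
        self.threads = set(thread_ids)

    def get_thread_list(self, page, page_size):
        # Return one page of thread ids in a stable order.
        ordered = sorted(self.threads)
        start = (page - 1) * page_size
        return ordered[start:start + page_size]

    def delete_thread(self, thread_id):
        # Assumed status codes: 404 if the thread is gone, 204 on success.
        if thread_id not in self.threads:
            return 404
        self.threads.remove(thread_id)
        return 204


def delete_random_thread(api, rng, page_size=10, max_page=5):
    """GET Thread List with randomized parameters, then DELETE one result."""
    page = rng.randint(1, max_page)
    candidates = api.get_thread_list(page=page, page_size=page_size)
    if not candidates:
        return None  # the randomly chosen page was empty
    thread_id = rng.choice(candidates)
    return thread_id, api.delete_thread(thread_id)
```

Each call issues exactly one GET Thread List and at most one DELETE Thread, so the ratio between the two requests inside the task is known in advance.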

Thread and Comment pool:

Various methods of selecting post data were considered.

Selecting threads from a smaller pool or selecting the same thread. Rather than getting the entire list of thread_ids to send requests against, we would store only a random portion of the threads. A test was run to see whether it matters if the retrieved thread is random or not, but the sandbox it was run against did not have the correct mongo indexes set up. Regardless, this strategy would not work when trying to DELETE threads, as the pool of potential threads would shrink. Additionally, it relies on storing data that must be shared amongst the locust users, which could lead to race conditions: one locust user could try to GET a thread that another locust user is in the middle of DELETEing. With much larger file IO operations, it could also strain the machine that spawns the locusts.
Retrieving the list of thread ids when starting locust. This method was effective until the number of threads in the data set started to increase. As the median number of posts in a course is ~2000 and the maximum page size is 100, retrieving them all would take 20 queries. Additionally, as mentioned in the above strategy, sharing data amongst the locust users is not a trivial task. Each locust user would try to generate its own list of threads, which is unacceptable. If a thread was POSTed or DELETEd, only that locust user would have the updated information. Attempts at using the lazy module did not work either, as each list of threads was instantiated separately by each locust user. Again, even if the locust users were able to use the same global variables, there would be race conditions.

Calling GET thread_list per DELETE/PATCH/GET_comment. Since GET thread_list is called significantly more often than any other call except GET Thread, we can achieve the desired distribution of requests for the discussion API without having to store any of the thread_ids. The table below is a 7-day snapshot from NewRelic for the discussion forums. The only drawback is that in order to GET a single thread, we need to have a thread_id. This issue will be discussed in the next bullet.

Action	Count	Relative weight (Count / 142)	Discussion API Call
.forum.views:single_thread	675980	4760	GET Thread
.forum.views:forum_form_discussion	234783	1653	GET Thread List
.forum.views:inline_discussion	155176	1093	GET Thread List
create_thread	31176	220	POST Thread
create_comment	27438	193	POST comment
create_sub_comment	14345	101	POST comment
users	13820	97	-
.forum.views:user_profile	12336	87	-
.forum.views:followed_threads	7698	54	GET Thread List
vote_for_comment	6731	47	PATCH Comment
vote_for_thread	6242	44	PATCH Thread
upload	4208	30	-
update_comment	3403	24	PATCH Comment
follow_thread	3870	27	PATCH Thread
update_thread	2827	20	PATCH Thread
delete_thread	2091	15	DELETE Thread
endorse_comment	1232	9	PATCH Comment
delete_comment	770	5	DELETE Comment
flag_abuse_for_comment	373	3	PATCH Comment
flag_abuse_for_thread	142	1	PATCH Thread
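
To turn the snapshot above into a request distribution, the counts can be grouped by Discussion API call. The sketch below copies the counts from the table and leaves out the rows with no API equivalent; the function names are illustrative assumptions, not part of the load test code.

```python
from collections import defaultdict

# (action, 7-day count, Discussion API call) copied from the table above;
# rows with no API equivalent ("-") are left out.
SNAPSHOT = [
    (".forum.views:single_thread", 675980, "GET Thread"),
    (".forum.views:forum_form_discussion", 234783, "GET Thread List"),
    (".forum.views:inline_discussion", 155176, "GET Thread List"),
    ("create_thread", 31176, "POST Thread"),
    ("create_comment", 27438, "POST comment"),
    ("create_sub_comment", 14345, "POST comment"),
    (".forum.views:followed_threads", 7698, "GET Thread List"),
    ("vote_for_comment", 6731, "PATCH Comment"),
    ("vote_for_thread", 6242, "PATCH Thread"),
    ("update_comment", 3403, "PATCH Comment"),
    ("follow_thread", 3870, "PATCH Thread"),
    ("update_thread", 2827, "PATCH Thread"),
    ("delete_thread", 2091, "DELETE Thread"),
    ("endorse_comment", 1232, "PATCH Comment"),
    ("delete_comment", 770, "DELETE Comment"),
    ("flag_abuse_for_comment", 373, "PATCH Comment"),
    ("flag_abuse_for_thread", 142, "PATCH Thread"),
]


def totals_by_api_call(rows):
    """Sum the counts of all actions that map to the same API call."""
    totals = defaultdict(int)
    for _action, count, api_call in rows:
        totals[api_call] += count
    return dict(totals)


def request_shares(totals):
    """Fraction of all mapped requests that each API call accounts for."""
    grand_total = sum(totals.values())
    return {call: count / grand_total for call, count in totals.items()}


totals = totals_by_api_call(SNAPSHOT)
shares = request_shares(totals)
for call, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{call:16s} {totals[call]:7d} {share:6.2%}")
```

The resulting shares could serve as task weights, keeping in mind that each DELETE or PATCH task also issues its paired GET request.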

Using pre-stored thread_id data. Since GET Thread is called more often than GET Thread List, we cannot use GET Thread List to get a thread_id for every GET Thread. Instead, we can use a pre-defined set of thread_ids as mentioned in the first two bullets. This allows us to test GET Thread in isolation. Unfortunately, the issue of trying to GET a DELETEd thread may still arise. Another option could be to have the locust user call GET Thread List only once and then run multiple GET Thread requests. Again, the same issue arises if one of those threads happens to get DELETEd.

Production Spikes in response time:

When running some early tests, it was found that some of the requests that were believed to be slow on production were not appearing that way in the load tests.

Looking at the errors that show up on GET Thread, the response time is 20s, which is the timeout limit. When looking for this thread, a 404 is returned. Another factor is that courses with many posts may take longer to GET information from. These courses, although the exception, are mixed in with normal courses and could explain the spikes in the data.

Staff vs. Normal User:

Using users with staff access was taken into consideration, as it would make some of the permissions a bit more difficult for some discussion forum actions, such as editing the body of a thread. Some tests were run to see if there was a difference. No difference was found in the tests that were designed to check for one.

There were some concerns that the pagination in the forums code is not working properly. A series of tests will be run against courses of different sizes and compared. The idea is that if the pagination is working correctly, all the courses should return threads with similar response times. If it is not working correctly, courses with more posts will take longer to return threads than smaller courses.
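
The comparison described above could be automated along these lines. This is a minimal sketch: the function name, the 25% tolerance, and the idea of comparing each course against the smallest one are all assumptions, and no measured data is implied.

```python
def pagination_looks_healthy(median_ms_by_course_size, tolerance=0.25):
    """Return True if every median is within `tolerance` of the smallest course's median.

    `median_ms_by_course_size` maps a course's post count to the median
    response time (in ms) measured for GET Thread List on that course.
    """
    sizes = sorted(median_ms_by_course_size)
    baseline = median_ms_by_course_size[sizes[0]]
    return all(
        abs(median_ms_by_course_size[size] - baseline) <= tolerance * baseline
        for size in sizes
    )
```

A result of False would suggest that response time grows with course size, which is the symptom expected if pagination is not applied correctly.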

Things that were left out:

Moderator actions

Pin Thread - Not implemented
Open/Close Thread - Not implemented
Endorsed - Not implemented

Course topics - This will be addressed at another time.

Pagination:

(To be filled out.)

...

/threads/

GET:

Will also be testing against different course sizes.

Invalid: Old test that was invalid due to the way mongo is set up

Using 10 locust users with min/max wait time of 1000/2000ms, GET requests were sent for either a single thread or a thread chosen randomly from a selection of 10% of the threads in a course. This was tested against a sandbox.

Getting a random thread vs. the same thread does not seem to make a reliably noticeable difference.
As the number of posts in the database increases, the response time also increases.
The number of posts in a course does not seem to matter for GETting a post.

Update: This conclusion is invalid as the sandbox does not have the proper indexes.

Approx Total Posts	Name	# requests	# fails	Median	Average	Min	Max	Content Size	# reqs/sec
1000	GET a random Thread out of 1000 Posts	2671	0	140	150	125.5698204	410.5699062	1178	5.8
1000	GET a single Thread out of 1000 Posts	2262	0	140	151	126.8491745	337.1069431	1178	5.9
11000	GET a random Thread out of 10000 Posts	1522	0	180	196	164.470911	447.6130009	1178	6
11000	GET a single Thread out of 10000 Posts	1593	0	180	198	164.4868851	480.9308052	1178	5.8
11100	GET a random Thread out of 100 Posts	542	0	180	193	164.2448902	419.1830158	1178	5.9
11100	GET a single Thread out of 100 Posts	738	0	180	199	165.0490761	434.2639446	1178	6
12100	GET a random Thread out of 1000 Posts	683	0	180	204	166.1930084	489.5379543	1178	5.6
12100	GET a single Thread out of 1000 Posts	1049	0	180	202	169.2481041	443.4149265	1178	6
13100	GET a random Thread out of 1000 Posts	1473	0	180	209	171.4019775	1157.299042	1178	6.1
13100	GET a single Thread out of 1000 Posts	1317	0	180	204	170.7370281	510.4939938	1178	5.8
14100	GET a random Thread out of 1000 Posts	855	0	190	209	175.9641171	468.7230587	1178	6
14100	GET a single Thread out of 1000 Posts	7557	0	190	213	173.609972	1970.304012	1178	5.1
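
As a sanity check on the # reqs/sec column, throughput for a closed loop of users can be estimated from the wait time and the response time. The sketch assumes each locust user waits a uniformly random time between the min and max wait after every request, and that each row comes from a separate 10-user run; both are assumptions, not statements from the test setup.

```python
def expected_throughput(users, min_wait_ms, max_wait_ms, response_ms):
    """Rough requests per second for users that wait, request, and repeat."""
    mean_wait_ms = (min_wait_ms + max_wait_ms) / 2
    cycle_s = (mean_wait_ms + response_ms) / 1000  # one full loop per user
    return users / cycle_s


# 10 users, 1000/2000ms wait, and an average response time near 150ms.
estimate = expected_throughput(10, 1000, 2000, 150)
print(f"{estimate:.2f} req/s")
```

With these inputs the estimate comes out at about 6 requests per second, in line with the 5 to 6 reported in the table. Throughput here is dominated by the wait time, so these runs show response time under light load rather than the maximum load the server can take.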

POST:

Missing some data: After 1,000,000 posts in 24 hours, the response time remained constant.

Unfortunately, locust ran into a calculation error when running a POST test, so there is no table data. This was tested against https://courses-loadtest.edx.org/

/threads/{thread_id}

GET:

Waiting on Loadtest env to get meaningful results. Refer to /threads/get

PATCH:

Patch events for the boolean values took more or less the same amount of time.

This was tested against a t2.large sandbox.

Type	Name	# requests	# fails	Median	Average	Min	Max	Content Size	# reqs/sec
PATCH	abuse_flagged	1814	0	300	315	96.83394432	1307.781935	2009	2.5
PATCH	following	1847	0	300	314	97.83601761	1730.396986	1939	1.9
PATCH	voted	1875	0	310	319	97.41687775	1427.90103	2104	1.3

...

Different body sized edits did not seem to make a difference in response times.

Type	Name	# reqs	# fails	Avg	Min	Max	Median	req/s
PATCH	abuse_flag_thread	16	0(0.00%)	162	117	212	150	0
PATCH	edit_thread_with_10000char	50	0(0.00%)	199	151	424	190	0.1
PATCH	edit_thread_with_1000char	61	0(0.00%)	234	138	3707	170	0.1
PATCH	edit_thread_with_250char	54	0(0.00%)	178	136	422	160	0.1
PATCH	edit_thread_with_4char	57	0(0.00%)	183	138	331	170	0.1
PATCH	edit_thread_with_5000char	57	0(0.00%)	188	141	341	180	0.1
PATCH	following_thread	42	0(0.00%)	168	130	337	160	0.2
PATCH	vote_on_thread	698	0(0.00%)	160	114	652	150	1.1

DELETE:

DELETE will be best tested with the other endpoints

For every DELETE thread, we POST a Thread and then GET a thread from the thread pool. This was tested against a t2.large sandbox.

Type	Name	# reqs	# fails	Avg	Min	Max	Median	req/s
DELETE	DELETE_thread	305	1(0.33%)	203	137	352	190	3
GET	GET_thread_list	306	0(0.00%)	190	154	420	170	2.9
POST	POST_thread	306	0(0.00%)	127	102	277	110	3

/comments/

GET:

mark_as_read and page did not seem to affect the response time at all.

Using 10 locust users with min/max wait time of 1000/2000ms, GET requests were sent for 1 of 10 threads holding 100, 200, ... responses (increasing in increments of 100), each response with a single comment. The page_size seemed to be the only parameter that affected the response time. This was tested against a t2.large sandbox.

Type	Name	# requests	# fails	Median	Average	Min	Max	Content Size	# reqs/sec
GET	page_size=100	552	0	1200	1212	689.6290779	2079.323053	167747	0.6
GET	page_size=75	588	0	1000	1053	570.6920624	1957.83782	125858	0.7
GET	page_size=50	552	0	900	927	459.9750042	1813.49206	83958	1.1
GET	page_size=25	525	0	790	810	345.9990025	1692.13295	42058	1
GET	page_size=1	557	0	680	710	237.9820347	1673.289061	1833	0.8

None of the parameters affect the response time. As the number of comments on a response increases, so does the response time.

Using 10 locust users with min/max wait time of 1000/2000ms, GET requests were sent for 1 of 10 threads, where the chosen thread contains a response with 50, 100, 150, 200... comments, increasing in increments of 50. This was tested against a t2.large sandbox.

Type	Name	# requests	# fails	Median	Average	Min	Max	Content Size	# reqs/sec
GET	comments=500	352	0	1700	1822	1361.582041	3062.257051	406832	0.1
GET	comments=450	395	0	1600	1710	1244.348049	3462.92305	366232	0.3
GET	comments=400	387	0	1400	1552	1093.059063	4270.447969	325632	0.1
GET	comments=350	394	0	1300	1394	972.3510742	2612.555027	285032	0.4
GET	comments=300	379	0	1100	1233	827.7709484	2370.13793	244432	0.2
GET	comments=250	353	0	970	1056	708.3749771	2795.8529	203832	0
GET	comments=200	352	0	830	943	583.6598873	2230.697155	163232	0.1
GET	comments=150	368	0	680	785	441.0719872	2042.181969	122632	0.2
GET	comments=100	342	0	540	658	323.1511116	1866.328001	82032	0.2
GET	comments=50	390	0	390	512	194.7860718	2818.300009	41432	0.3
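
The growth in the table above can be quantified with a least-squares line through the median response times. The medians are copied from the table; the linear model itself is an assumption made for illustration and is not part of the original analysis.

```python
# (comments on the response, median response time in ms) from the table above.
MEDIANS = [
    (50, 390), (100, 540), (150, 680), (200, 830), (250, 970),
    (300, 1100), (350, 1300), (400, 1400), (450, 1600), (500, 1700),
]


def least_squares_line(points):
    """Return (slope, intercept) of the ordinary least-squares fit y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = least_squares_line(MEDIANS)
print(f"{slope:.2f} ms per comment, {intercept:.0f} ms base")
```

On this data the fit gives roughly 3 ms of additional median response time per comment, which supports the observation that response time scales with the number of comments on a response.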

POST:

POST should be similar to POSTing threads.

/comments/comment_id

PATCH:

As with PATCHing a thread, body edit size did not seem to matter. Boolean patches also seemed to be the same.

Type	Name	# reqs	# fails	Avg	Min	Max	Median	req/s
GET	GET_comment_list	916	0(0.00%)	260	119	643	240	3
GET	GET_thread_list	916	0(0.00%)	275	187	571	260	2.9
PATCH	abuse_flag_comment	47	0(0.00%)	271	159	513	260	0.4
PATCH	edit_comment_with_10000char	60	0(0.00%)	307	205	498	300	0.2
PATCH	edit_comment_with_1000char	57	0(0.00%)	264	165	448	250	0.2
PATCH	edit_comment_with_250char	71	0(0.00%)	264	160	467	250	0.3
PATCH	edit_comment_with_4char	76	0(0.00%)	265	166	469	250	0.4
PATCH	edit_comment_with_5000char	60	0(0.00%)	293	195	597	280	0.1
PATCH	vote_on_comment	547	0(0.00%)	283	130	641	270	1.5

DELETE:

DELETE is best tested with the other endpoints. For every comment delete, we POST a thread, GET a random thread, and then DELETE that random thread.

Versions Compared

Old Version 54

New Version 55
