Jira: MA-1099 (JIRA, openedx.atlassian.net)
Table of Contents:
- Goals
- Testing Strategy
- Thread and comment pool
- Spikes in production data
- Staff vs. Normal user
- Pagination issue
- Other notes
- Endpoints
- Seeding data
- Test Details
Goals:
Understand the load we can handle with the discussion API when the mobile app is released.
What can the server handle?
Understand the overhead between the discussion API and the Ruby forums code.
- Does the Discussion API perform better, worse, or on par with the browser's forums?
- What does the forums performance look like in general?
...
Testing Strategy:
Originally the plan was to isolate each endpoint and determine what kind of load it can handle, but after analyzing the data, some of these endpoints seem unnecessary to isolate for a load test. These include DELETE and PATCH, which are a very small part of the overall load in production. The isolated tests for these endpoints will be paired with the appropriate GET Thread/Comment. For example, every DELETE Thread request requires a thread_id. We obtain this thread_id by calling GET Thread List with randomized parameters, which returns a list of threads from which one is randomly selected. That thread is then DELETEd. Below is the chart of the additional requests we make. As long as the ratio of how many of these requests happen in each task is understood, we can get the desired endpoint distribution.
...
*GET Thread List will always return a response (so we can delete a random response), but will not always return a comment, so the comment that was created will be the one deleted.
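As a concrete sketch of this pairing, a DELETE Thread task could look like the following. This is illustrative plain Python rather than the actual locustfile; the `client` object and the exact URL paths are assumptions for the example.

```python
import random

def delete_random_thread(client, course_id):
    """Pair a DELETE Thread with the GET Thread List call that supplies its
    thread_id: fetch one randomized page of threads, pick one, delete it.
    `client` is a hypothetical HTTP wrapper with .get/.delete methods."""
    page = random.randint(1, 10)  # randomized parameter for GET Thread List
    resp = client.get("/api/discussion/v1/threads/",
                      params={"course_id": course_id, "page": page})
    threads = resp["results"]
    if not threads:
        return None  # nothing to delete on this page
    victim = random.choice(threads)
    client.delete("/api/discussion/v1/threads/{}".format(victim["id"]))
    return victim["id"]
```

Because every DELETE is preceded by exactly one GET Thread List, the extra GETs this adds must be counted toward the overall distribution, which is what the chart above tracks.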
Various methods of selecting post data were considered.
- Selecting threads from a smaller pool, or selecting the same thread. Rather than getting the entire list of thread_ids to send requests against, we would store only a random portion of the threads. A test was run to see whether it matters if the retrieved thread was random or not, but the sandbox it was run against did not have the correct mongo indexes set up. Regardless, this strategy would not work when trying to DELETE threads, as the pool of potential threads would shrink. Additionally, this relies on storing data that must be shared amongst the locust users, which could lead to race conditions: a locust user could be trying to GET a thread that another locust user was in the middle of DELETEing. When dealing with much larger file IO operations, it could also strain the machine that spawns the locusts.
- Retrieving the list of thread ids when starting locust. This method was effective until the number of threads in the data set started to increase. As the median number of posts in a course is ~2000, retrieving them at the page size max of 100 would take 20 queries. Additionally, as mentioned in the strategy above, sharing data amongst the locust users is not a trivial task. Each locust user would generate its own list of threads, which is unacceptable: if a thread was POSTed or DELETEd, only that locust user would have the updated information. Attempts at using the lazy module did not work either, as each list of threads was instantiated separately by each locust user. Again, even if the locust users were able to share global variables, there would be race conditions.
- Calling GET thread_list per DELETE/PATCH/GET_comment. Since the ratio of GET thread_list is significantly higher than any of the other calls except for GET Thread, we can achieve the desired distribution of requests for the discussion API without having to store any of the thread_ids. The table below is a 7-day snapshot from New Relic for the discussion forums. The only drawback is that in order to GET a single thread, we need to have a thread_id. This issue will be discussed in the next bullet.
Action | Count | Relative Count | Discussion API Call |
---|---|---|---|
.forum.views:single_thread | 675980 | 4760 | GET Thread |
.forum.views:forum_form_discussion | 234783 | 1653 | GET Thread List |
.forum.views:inline_discussion | 155176 | 1093 | GET Thread List |
create_thread | 31176 | 220 | POST Thread |
create_comment | 27438 | 193 | POST comment |
create_sub_comment | 14345 | 101 | POST comment |
users | 13820 | 97 | - |
.forum.views:user_profile | 12336 | 87 | - |
.forum.views:followed_threads | 7698 | 54 | GET Thread List |
vote_for_comment | 6731 | 47 | PATCH Comment |
vote_for_thread | 6242 | 44 | PATCH Thread |
upload | 4208 | 30 | - |
update_comment | 3403 | 24 | PATCH Comment |
follow_thread | 3870 | 27 | PATCH Thread |
update_thread | 2827 | 20 | PATCH Thread |
delete_thread | 2091 | 15 | DELETE Thread |
endorse_comment | 1232 | 9 | PATCH Comment |
delete_comment | 770 | 5 | DELETE Comment |
flag_abuse_for_comment | 373 | 3 | PATCH Comment |
flag_abuse_for_thread | 142 | 1 | PATCH Thread |
- Using pre-stored thread_id data. Since GET Thread is called more often than GET Thread List, we cannot use GET Thread List to get every thread_id. Instead, we can use a pre-defined set of thread_ids as mentioned in the first two bullets. This will allow us to test GET Thread in isolation. Unfortunately, the issue of trying to GET a DELETEd thread may still arise. Another option could be to have the locust user call GET Thread List once and then run multiple GET Threads. Again, the same issue arises if one of those threads happened to get DELETEd.
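To turn the New Relic counts above into a task distribution, the counts mapped to the same Discussion API call can be summed and normalized so the rarest call gets weight 1. A minimal sketch of that arithmetic (the groupings below follow the table; `task_weights` is an illustrative helper, not part of the test harness):

```python
from collections import defaultdict

# 7-day production counts from the New Relic snapshot above,
# grouped by the Discussion API call they map to.
COUNTS = [
    ("GET Thread", 675980),
    ("GET Thread List", 234783 + 155176 + 7698),
    ("POST Thread", 31176),
    ("POST Comment", 27438 + 14345),
    ("PATCH Thread", 6242 + 3870 + 2827 + 142),
    ("PATCH Comment", 6731 + 3403 + 1232 + 373),
    ("DELETE Thread", 2091),
    ("DELETE Comment", 770),
]

def task_weights(counts, smallest=1):
    """Scale counts so the rarest call gets weight `smallest`, rounding
    to integers suitable for locust-style task weights."""
    totals = defaultdict(int)
    for call, n in counts:
        totals[call] += n
    floor = min(totals.values())
    return {call: max(1, round(n * smallest / floor))
            for call, n in totals.items()}

weights = task_weights(COUNTS)
```

With these weights, GET Thread dominates by almost three orders of magnitude over DELETE Comment, which is why the DELETE and PATCH endpoints are folded into combined tasks rather than isolated.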
Production Spikes in Response Time:
When running some early tests, it was found that some requests believed to be slow in production did not appear that way in the load tests.
Looking at the errors that show up on GET Thread, the response time is 20s, which is the timeout limit. Another factor is that courses with many posts may take longer to GET information from. These courses, although the exception, mixed in with normal courses, could explain the spikes in the data.
Staff vs. Normal User:
Using users with staff access was considered, since permissions make some discussion forum actions, such as editing the body of a thread, more difficult for normal users. Some tests were run to see if there was a difference between staff and normal users; the tests designed to check for a difference found none.
Pagination:
There were some concerns that the pagination in the forums code is not working properly. A series of tests will be run against courses of different sizes and compared. If pagination is working correctly, all the courses should return threads with similar response times. If it is not, courses with more posts will take longer to return threads than smaller courses.
Things that were left out:
Moderator actions
- Pin Thread - Not implemented
- Open/Close Thread - Not implemented
- Endorsed - Not implemented
Course topics - This will be addressed at another time.
...
Endpoints:
- /courses/(course_id)/
- /course_topics/(course_id)
- /threads/
- /threads/(thread_id)
- /comments
- /comments/(comment_id)
Usage patterns to look out for:
- Default page size for the browser is 25 while the mobile device will be using 10. It is possible that more requests could be sent for the same amount of information.
- Push notifications
- There is a possibility for a different usage pattern to look out for. If there is a popular thread, bursts of requests can be expected.
- Increased forum usage as there is currently no notifications for the browser.
- The browser can display the Threads, Response, and Comments all at once. The mobile app treats all three of these as separate views. It is possible that more requests could be sent for the same amount of information.
- General Usage. Discussions on mobile could naturally increase discussion forums usage.
/threads/
GET:
Will also be testing against different course sizes.
Expand: Invalid: Old test that was invalid due to the way mongo is set up
Using 10 locust users with min/max wait time of 1000/2000ms, GET requests were sent for either a single thread or randomly from a selection of 10% of the threads in a course. This was tested against a sandbox.
- Getting a random thread vs. the same thread does not make a reliably noticeable difference.
- As the number of posts in the database increases, the response time also increases.
- The number of posts in a course does not seem to matter for GETting a post.
Update: This conclusion is invalid as the sandbox does not have the proper indexes.

Approx Total Posts | Name | # requests | # fails | Median (ms) | Average (ms) | Min (ms) | Max (ms) | Content Size | # reqs/sec |
---|---|---|---|---|---|---|---|---|---|
1000 | GET a random Thread out of 1000 Posts | 2671 | 0 | 140 | 150 | 126 | 411 | 1178 | 5.8 |
1000 | GET a single Thread out of 1000 Posts | 2262 | 0 | 140 | 151 | 127 | 337 | 1178 | 5.9 |
11000 | GET a random Thread out of 10000 Posts | 1522 | 0 | 180 | 196 | 164 | 448 | 1178 | 6 |
11000 | GET a single Thread out of 10000 Posts | 1593 | 0 | 180 | 198 | 164 | 481 | 1178 | 5.8 |
11100 | GET a random Thread out of 100 Posts | 542 | 0 | 180 | 193 | 164 | 419 | 1178 | 5.9 |
11100 | GET a single Thread out of 100 Posts | 738 | 0 | 180 | 199 | 165 | 434 | 1178 | 6 |
12100 | GET a random Thread out of 1000 Posts | 683 | 0 | 180 | 204 | 166 | 490 | 1178 | 5.6 |
12100 | GET a single Thread out of 1000 Posts | 1049 | 0 | 180 | 202 | 169 | 443 | 1178 | 6 |
13100 | GET a random Thread out of 1000 Posts | 1473 | 0 | 180 | 209 | 171 | 1157 | 1178 | 6.1 |
13100 | GET a single Thread out of 1000 Posts | 1317 | 0 | 180 | 204 | 171 | 510 | 1178 | 5.8 |
14100 | GET a random Thread out of 1000 Posts | 855 | 0 | 190 | 209 | 176 | 469 | 1178 | 6 |
14100 | GET a single Thread out of 1000 Posts | 7557 | 0 | 190 | 213 | 174 | 1970 | 1178 | 5.1 |
POST:
Expand: Missing some data: After 1,000,000 posts in 24 hours, the response time remained constant.
Unfortunately locust ran into a calculation error when running the POST test, so there is no table data. After 1,000,000 posts in 24 hours, the response time remained constant. This was tested against https://courses-loadtest.edx.org/
/threads/{thread_id}
GET:
Waiting on Loadtest env to get meaningful results. Refer to /threads/get
PATCH:
Expand: PATCH events for the boolean values took more or less the same amount of time.
This was tested against a t2.large sandbox.

Type | Name | # requests | # fails | Median (ms) | Average (ms) | Min (ms) | Max (ms) | Content Size | # reqs/sec |
---|---|---|---|---|---|---|---|---|---|
PATCH | abuse_flagged | 1814 | 0 | 300 | 315 | 97 | 1308 | 2009 | 2.5 |
PATCH | following | 1847 | 0 | 300 | 314 | 98 | 1730 | 1939 | 1.9 |
PATCH | voted | 1875 | 0 | 310 | 319 | 97 | 1428 | 2104 | 1.3 |
...
Expand: Different body-size edits did not seem to make a difference in response times.
Type | Name | # requests | # fails | Avg (ms) | Min (ms) | Max (ms) | Median (ms) | req/s |
---|---|---|---|---|---|---|---|---|
PATCH | abuse_flag_thread | 16 | 0 (0.00%) | 162 | 117 | 212 | 150 | 0 |
PATCH | edit_thread_with_10000char | 50 | 0 (0.00%) | 199 | 151 | 424 | 190 | 0.1 |
PATCH | edit_thread_with_1000char | 61 | 0 (0.00%) | 234 | 138 | 3707 | 170 | 0.1 |
PATCH | edit_thread_with_250char | 54 | 0 (0.00%) | 178 | 136 | 422 | 160 | 0.1 |
PATCH | edit_thread_with_4char | 57 | 0 (0.00%) | 183 | 138 | 331 | 170 | 0.1 |
PATCH | edit_thread_with_5000char | 57 | 0 (0.00%) | 188 | 141 | 341 | 180 | 0.1 |
PATCH | following_thread | 42 | 0 (0.00%) | 168 | 130 | 337 | 160 | 0.2 |
PATCH | vote_on_thread | 698 | 0 (0.00%) | 160 | 114 | 652 | 150 | 1.1 |
DELETE:
Expand: DELETE will be best tested with the other endpoints.
For every DELETE Thread, we POST a Thread and then GET a thread from the thread pool. This was tested against a t2.large sandbox.

Type | Name | # requests | # fails | Avg (ms) | Min (ms) | Max (ms) | Median (ms) | req/s |
---|---|---|---|---|---|---|---|---|
DELETE | DELETE_thread | 305 | 1 (0.33%) | 203 | 137 | 352 | 190 | 3 |
GET | GET_thread_list | 306 | 0 (0.00%) | 190 | 154 | 420 | 170 | 2.9 |
POST | POST_thread | 306 | 0 (0.00%) | 127 | 102 | 277 | 110 | 3 |
/comments/
GET:
Expand: mark_as_read and page did not seem to affect the response time at all.
Using 10 locust users with min/max wait time of 1000/2000ms, GET requests were sent for 1 of 10 threads with responses in increasing increments of 100, each response with a single comment. page_size seemed to be the parameter that affected the response time. This was tested against a t2.large sandbox.

Type | Name | # requests | # fails | Median (ms) | Average (ms) | Min (ms) | Max (ms) | Content Size | # reqs/sec |
---|---|---|---|---|---|---|---|---|---|
GET | page_size=100 | 552 | 0 | 1200 | 1212 | 690 | 2079 | 167747 | 0.6 |
GET | page_size=75 | 588 | 0 | 1000 | 1053 | 571 | 1958 | 125858 | 0.7 |
GET | page_size=50 | 552 | 0 | 900 | 927 | 460 | 1813 | 83958 | 1.1 |
GET | page_size=25 | 525 | 0 | 790 | 810 | 346 | 1692 | 42058 | 1 |
GET | page_size=1 | 557 | 0 | 680 | 710 | 238 | 1673 | 1833 | 0.8 |
Expand: None of the parameters affect the response time. As the number of comments on a response increases, so does the response time.
Using 10 locust users with min/max wait time of 1000/2000ms, GET requests were sent for 1 of 10 threads, where the chosen thread contains a response with 50, 100, 150, 200... comments, increasing in increments of 50. This was tested against a t2.large sandbox.

Type | Name | # requests | # fails | Median (ms) | Average (ms) | Min (ms) | Max (ms) | Content Size | # reqs/sec |
---|---|---|---|---|---|---|---|---|---|
GET | comments=500 | 352 | 0 | 1700 | 1822 | 1362 | 3062 | 406832 | 0.1 |
GET | comments=450 | 395 | 0 | 1600 | 1710 | 1244 | 3463 | 366232 | 0.3 |
GET | comments=400 | 387 | 0 | 1400 | 1552 | 1093 | 4270 | 325632 | 0.1 |
GET | comments=350 | 394 | 0 | 1300 | 1394 | 972 | 2613 | 285032 | 0.4 |
GET | comments=300 | 379 | 0 | 1100 | 1233 | 828 | 2370 | 244432 | 0.2 |
GET | comments=250 | 353 | 0 | 970 | 1056 | 708 | 2796 | 203832 | 0 |
GET | comments=200 | 352 | 0 | 830 | 943 | 584 | 2231 | 163232 | 0.1 |
GET | comments=150 | 368 | 0 | 680 | 785 | 441 | 2042 | 122632 | 0.2 |
GET | comments=100 | 342 | 0 | 540 | 658 | 323 | 1866 | 82032 | 0.2 |
GET | comments=50 | 390 | 0 | 390 | 512 | 195 | 2818 | 41432 | 0.3 |
POST:
POST should be similar to POSTing threads.
/comments/comment_id
PATCH:
Expand: As with PATCHing a thread, body edit size did not seem to matter. Boolean patches also seemed to be the same.
Type | Name | # requests | # fails | Avg (ms) | Min (ms) | Max (ms) | Median (ms) | req/s |
---|---|---|---|---|---|---|---|---|
GET | GET_comment_list | 916 | 0 (0.00%) | 260 | 119 | 643 | 240 | 3 |
GET | GET_thread_list | 916 | 0 (0.00%) | 275 | 187 | 571 | 260 | 2.9 |
PATCH | abuse_flag_comment | 47 | 0 (0.00%) | 271 | 159 | 513 | 260 | 0.4 |
PATCH | edit_comment_with_10000char | 60 | 0 (0.00%) | 307 | 205 | 498 | 300 | 0.2 |
PATCH | edit_comment_with_1000char | 57 | 0 (0.00%) | 264 | 165 | 448 | 250 | 0.2 |
PATCH | edit_comment_with_250char | 71 | 0 (0.00%) | 264 | 160 | 467 | 250 | 0.3 |
PATCH | edit_comment_with_4char | 76 | 0 (0.00%) | 265 | 166 | 469 | 250 | 0.4 |
PATCH | edit_comment_with_5000char | 60 | 0 (0.00%) | 293 | 195 | 597 | 280 | 0.1 |
PATCH | vote_on_comment | 547 | 0 (0.00%) | 283 | 130 | 641 | 270 | 1.5 |
DELETE:
DELETE is best tested with the other endpoints. For every comment DELETE, we POST a comment, GET a random thread, and then DELETE the comment that was created.
...
Jira: MA-1102 (JIRA, openedx.atlassian.net)
Seeding Data:
Course Structure Setup:
A tarfile with a very simple setup will be used for each load test. This course was created in Studio and then exported. When seeding data, this tarfile will be used as the course skeleton.
Forums Data Analysis:
After some analysis of the forums database, there were some pieces of information that were found to be useful. The link to the google doc is here.
...
Using this data, we were able to get an idea of what a course might look like. Most notably, the largest comment_count (comments and responses) for a thread is 5907 and the median seems to be 1. Although that value is an outlier, each course has an "Introduce yourself" topic, which consistently puts a thread with a high comment_count in each course. Also, when thinking about mobile usage, push notifications could produce a different usage pattern where these high comment_count threads see large spikes in traffic.
...
Test details and their importance:
Since the request distribution is very disproportionate, the individual endpoint tests are categorized based on how often these requests are hit.
Important individual endpoint tests:
GET Thread - This test will be for the common case of a thread. ~2000 posts (median) in total will be created as the base course to GET from.
- Each thread has a ~250-character body
- Of the 1000 threads created
- 200 have no comments
- 300 have some sort of flag (abused/voted/following)
- 100 have a response and a comment
- 500 have a response
- 200 will be of the type "question"
- Of the response-heavy threads
- n threads will be created with a response that has n*20 comments (this could change)
In addition to this test, different course sizes will be created as well and tested against as we expect the course size to affect the performance.
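The seeding distribution above could be generated by a script along these lines. This is a sketch only: the buckets in the list overlap and do not sum to 1000, so the code makes explicit assumptions about how they combine, and `build_thread_specs` and its field names are illustrative, not the actual seeding code.

```python
import random

def build_thread_specs(total=1000, seed=0):
    """Sketch of per-thread seeding specs: the first 500 threads get a
    response (the first 100 of those also get a comment), 300 random
    threads get one of the flags, and 200 random threads become type
    "question". The no-comment bucket falls out of the remainder."""
    rng = random.Random(seed)
    specs = []
    for i in range(total):
        spec = {"body_chars": 250, "responses": 0, "comments": 0,
                "flag": None, "type": "discussion"}
        if i < 500:
            spec["responses"] = 1
        if i < 100:
            spec["comments"] = 1
        specs.append(spec)
    # Flags and type are assigned independently, since the buckets overlap.
    for spec in rng.sample(specs, 300):
        spec["flag"] = rng.choice(["abuse_flagged", "voted", "following"])
    for spec in rng.sample(specs, 200):
        spec["type"] = "question"
    return specs
```

Fixing the random seed makes repeated load-test runs seed an identical course, which keeps results comparable across runs.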
GET Comment (a response is depth=1, a comment is depth=2) - This test will be for the expected edge cases of a thread. It is important to note that although the largest comment_count is ~5000, the ratio of responses to comments is unknown.
- Each response/comment has a ~250-character body
- Each response will have 20*n comments (could change)
Less important:
POST Thread/Comments - Expected to be constant, this test will just be POSTing threads.
PATCH Comments/Threads - Will use the same setup as GET thread. This test will modify fields such as "abuse_flagged", "following", "voted", "body"
Insignificant:
DELETE Comment/Thread - These endpoints are hit significantly less than the other endpoints. If running these individually, Threads/Comments will be created to delete. Refer to "Testing Strategy" for more information.