...
...
...
We have gathered and compared two sets of load test results. The former (i.e. with profile_image) were run against the latest changes and the latter (i.e. without profile_image) against the older implementation.
There appears to be an anomaly in the results below (highlighted 99%) where the new implementation is faster than the old. Otherwise, the percentiles of the new implementation increase significantly as the load increases: for the first pair of results (no. of clients = 48) the difference is in the tens or hundreds, while for the last pair (no. of clients = 610) it is in the thousands.
Logically, only the GET endpoints should show an increase in response time, but the results show differences for all the other endpoints too. This is because every PATCH, POST and DELETE endpoint first calls a GET endpoint to retrieve the 'id' of the thread and/or comment before doing any further processing on it (a sketch of this pattern follows).
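For illustration, here is a minimal sketch of that pattern as a load-test task might implement it. The base URL, endpoint paths and response shape below are placeholders, not the actual load-test code:

```python
import requests

BASE_URL = "http://localhost:8000/api/discussion/v1"  # placeholder, not the real test target


def patch_random_thread(session: requests.Session, course_id: str) -> requests.Response:
    # Even though this is a PATCH task, it starts with a GET to discover a
    # thread id, so any slowdown in the GET endpoints shows up here as well.
    threads = session.get(f"{BASE_URL}/threads/", params={"course_id": course_id}).json()
    thread_id = threads["results"][0]["id"]  # id obtained via the GET call (placeholder shape)
    return session.patch(
        f"{BASE_URL}/threads/{thread_id}/",
        json={"raw_body": "updated body"},
    )
```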
New Relic:
Here are the New Relic permaLink_without_profile and permaLink_with_profile, where the average rpm for the former is 1.55k and for the latter 1.48k, and for the latter 84.5% of requests go to AccountViewSet.list (i.e. the user accounts API for multiple usernames).
...
With Profile Image
...
Without Profile Image
...
No. of clients =
req/s =
Methods | median response time | 95% | 99% |
---|---|---|---|
DELETE_comment | |||
DELETE_thread | |||
GET_comment_list | |||
GET_thread | |||
GET_thread_list | |||
PATCH_comment | |||
PATCH_thread | |||
POST_comment_comment | |||
POST_comment_response | |||
POST_thread | |||
auto_auth |
No. of clients =
req/s =
Methods | median response time | 95% | 99% |
---|---|---|---|
DELETE_comment | |||
DELETE_thread | |||
GET_comment_list | |||
GET_thread | |||
GET_thread_list | |||
PATCH_comment | |||
PATCH_thread | |||
POST_comment_comment | |||
POST_comment_response | |||
POST_thread | |||
auto_auth |
No. of clients =
req/s =
Methods | median response time | 95% | 99% |
---|---|---|---|
DELETE_comment | |||
DELETE_thread | |||
GET_comment_list | |||
GET_thread | |||
GET_thread_list | |||
PATCH_comment | |||
PATCH_thread | |||
POST_comment_comment | |||
POST_comment_response | |||
POST_thread | |||
auto_auth |
No. of clients =
req/s =
...
No. of clients =
req/s =
Methods | median response time | 95% | 99% |
---|---|---|---|
DELETE_comment | |||
DELETE_thread | |||
GET_comment_list | |||
GET_thread | |||
GET_thread_list | |||
PATCH_comment | |||
PATCH_thread | |||
POST_comment_comment | |||
POST_comment_response | |||
POST_thread | |||
auto_auth |
No. of clients =
req/s =
Methods | median response time | 95% | 99% |
---|---|---|---|
DELETE_comment | |||
DELETE_thread | |||
GET_comment_list | |||
GET_thread | |||
GET_thread_list | |||
PATCH_comment | |||
PATCH_thread | |||
POST_comment_comment | |||
POST_comment_response | |||
POST_thread | |||
auto_auth |
No. of clients =
req/s =
Methods | median response time | 95% | 99% |
---|---|---|---|
DELETE_comment | 270 | 350 | 1300 |
DELETE_thread | 180 | 210 | 280 |
GET_comment_list | 170 | 250 | 300 |
GET_thread | 170 | 230 | 260 |
GET_thread_list | 240 | 590 | 740 |
PATCH_comment | 270 | 350 | 450 |
PATCH_thread | 190 | 260 | 370 |
POST_comment_comment | 330 | 440 | 650 |
POST_comment_response | 280 | 350 | 400 |
POST_thread | 170 | 230 | 250 |
auto_auth | 220 | 230 | 230 |
No. of clients =
req/s =
Methods | median response time | 95% | 99% |
---|---|---|---|
DELETE_comment | 270 | 340 | 380 |
DELETE_thread | 180 | 240 | 360 |
GET_comment_list | 150 | 210 | 330 |
GET_thread | 160 | 220 | 300 |
GET_thread_list | 170 | 340 | 480 |
PATCH_comment | 260 | 340 | 370 |
PATCH_thread | 150 | 260 | 270 |
POST_comment_comment | 330 | 410 | 550 |
POST_comment_response | 280 | 360 | 440 |
POST_thread | 170 | 230 | 270 |
auto_auth | 210 | 210 | 210 |
No. of clients = 192
req/s = 16.10
Methods | median response time | 95% | 99% |
---|---|---|---|
DELETE_comment | 280 | 370 | 550 |
DELETE_thread | 180 | 230 | 290 |
GET_comment_list | 180 | 260 | 370 |
GET_thread | 170 | 240 | 340 |
GET_thread_list | 240 | 610 | 780 |
PATCH_comment | 260 | 380 | 1100 |
PATCH_thread | 170 | 260 | 300 |
POST_comment_comment | 340 | 430 | 540 |
POST_comment_response | 290 | 370 | 400 |
POST_thread | 180 | 230 | 350 |
auto_auth | 210 | 220 | 220 |
No. of clients = 192
req/s = 16
Methods | median response time | 95% | 99% |
---|---|---|---|
DELETE_comment | 280 | 360 | 430 |
DELETE_thread | 180 | 290 | 290 |
GET_comment_list | 150 | 210 | 290 |
GET_thread | 160 | 220 | 310 |
GET_thread_list | 180 | 350 | 490 |
PATCH_comment | 250 | 340 | 450 |
PATCH_thread | 190 | 260 | 430 |
POST_comment_comment | 330 | 430 | 550 |
POST_comment_response | 280 | 370 | 470 |
POST_thread | 170 | 230 | 290 |
auto_auth | 220 | 230 | 230 |
No. of clients = 240
req/s = 18.70
Methods | median response time | 95% | 99% |
---|---|---|---|
DELETE_comment | 290 | 390 | 470 |
DELETE_thread | 180 | 250 | 350 |
GET_comment_list | 180 | 270 | 400 |
GET_thread | 180 | 240 | 300 |
GET_thread_list | 240 | 620 | 800 |
PATCH_comment | 270 | 400 | 860 |
PATCH_thread | 200 | 270 | 2300 |
POST_comment_comment | 340 | 470 | 760 |
POST_comment_response | 290 | 390 | 760 |
POST_thread | 180 | 240 | 390 |
auto_auth | 220 | 230 | 230 |
No. of clients = 240
req/s = 19.7
Methods | median response time | 95% | 99% |
---|---|---|---|
DELETE_comment | 280 | 370 | 460 |
DELETE_thread | 180 | 250 | 300 |
GET_comment_list | 150 | 220 | 280 |
GET_thread | 160 | 230 | 280 |
GET_thread_list | 180 | 350 | 500 |
PATCH_comment | 260 | 380 | 500 |
PATCH_thread | 190 | 250 | 380 |
POST_comment_comment | 330 | 420 | 500 |
POST_comment_response | 280 | 370 | 460 |
POST_thread | 180 | 240 | 340 |
auto_auth | 210 | 220 | 220 |
No. of clients = 288
req/s = 23.20
Methods | median response time | 95% | 99% |
---|---|---|---|
DELETE_comment | 290 | 380 | 460 |
DELETE_thread | 190 | 290 | 380 |
GET_comment_list | 180 | 270 | 340 |
GET_thread | 180 | 240 | 300 |
GET_thread_list | 250 | 630 | 790 |
PATCH_comment | 260 | 350 | 440 |
PATCH_thread | 170 | 240 | 280 |
POST_comment_comment | 340 | 430 | 560 |
POST_comment_response | 290 | 390 | 470 |
POST_thread | 180 | 240 | 320 |
auto_auth | 210 | 210 | 210 |
No. of clients = 288
req/s = 22.2
Methods | median response time | 95% | 99% |
---|---|---|---|
DELETE_comment | 280 | 420 | 1400 |
DELETE_thread | 180 | 240 | 270 |
GET_comment_list | 160 | 240 | 660 |
GET_thread | 160 | 240 | 490 |
GET_thread_list | 180 | 360 | 570 |
PATCH_comment | 270 | 360 | 740 |
PATCH_thread | 190 | 260 | 630 |
POST_comment_comment | 340 | 450 | 620 |
POST_comment_response | 290 | 440 | 1300 |
POST_thread | 180 | 240 | 1500 |
auto_auth | 220 | 220 | 220 |
No. of clients = 336
req/s = 26.70
Methods | median response time | 95% | 99% |
---|---|---|---|
DELETE_comment | 290 | 410 | 570 |
DELETE_thread | 190 | 320 | 1200 |
GET_comment_list | 190 | 270 | 370 |
GET_thread | 180 | 250 | 350 |
GET_thread_list | 250 | 630 | 800 |
PATCH_comment | 240 | 370 | 410 |
PATCH_thread | 180 | 270 | 330 |
POST_comment_comment | 350 | 460 | 560 |
POST_comment_response | 300 | 390 | 470
POST_thread | 180 | 250 | 320 |
auto_auth | 240 | 250 | 250 |
No. of clients = 336
req/s = 28.2
Methods | median response time | 95% | 99% |
---|---|---|---|
DELETE_comment | 290 | 440 | 2400 |
DELETE_thread | 190 | 260 | 440 |
GET_comment_list | 160 | 240 | 550 |
GET_thread | 170 | 240 | 690 |
GET_thread_list | 180 | 370 | 540 |
PATCH_comment | 270 | 380 | 1700 |
PATCH_thread | 190 | 280 | 1500 |
POST_comment_comment | 360 | 480 | 2400 |
POST_comment_response | 300 | 430 | 1300
POST_thread | 180 | 240 | 290 |
auto_auth | 240 | 260 | 260 |
No. of clients = 384
req/s = 31
Methods | median response time | 95% | 99% |
---|---|---|---|
DELETE_comment | 300 | 400 | 890 |
DELETE_thread | 190 | 300 | 320 |
GET_comment_list | 190 | 280 | 410 |
GET_thread | 180 | 260 | 360 |
GET_thread_list | 260 | 660 | 830 |
PATCH_comment | 230 | 380 | 430 |
PATCH_thread | 200 | 260 | 400 |
POST_comment_comment | 360 | 500 | 590 |
POST_comment_response | 300 | 410 | 470 |
POST_thread | 180 | 250 | 290 |
auto_auth | 290 | 360 | 360 |
No. of clients = 384
req/s = 31.4
...
Background:
In reference to the related Jira ticket, the following changes were made:
- Changed post/response/comment behaviour so that a post's 'last_activity_at' is updated only when the post is created or when a response/comment is created on it. Previously, 'last_activity_at' was updated on both creation and update.
- To calculate the 'read' status of a post, used 'last_activity_at' instead of 'updated_at'.
- To calculate the 'unread comment count' for a post, used 'created_at' instead of 'updated_at'.
The 'unread comment count' was being calculated as:
unread_comment_count = Comment.collection.find(:comment_thread_id => t._id, :author_id => {"$ne" => user.id}, :updated_at => {"$gte" => read_dates[thread_key]}).count
and had a compound index against it:
index({_type: 1, comment_thread_id: 1, author_id: 1, updated_at: 1})
With the new implementation:
unread_comment_count = Comment.collection.find(:comment_thread_id => t._id, :author_id => {"$ne" => user.id}, :created_at => {"$gte" => read_dates[thread_key]}).count
So we removed the index
index({_type: 1, comment_thread_id: 1, author_id: 1, updated_at: 1})
and added a new one:
index({comment_thread_id: 1, author_id: 1, created_at: 1})
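For illustration only, here is a rough pymongo sketch of the query and index change described above. The forum service itself is Ruby/Mongoid; the connection details, database and collection names below are assumptions rather than the service's actual code, while the index keys and query filter mirror the Ruby snippets above:

```python
from pymongo import MongoClient, ASCENDING

# Hypothetical connection; database and collection names are assumptions.
db = MongoClient()["cs_comments_service"]
contents = db["contents"]

# Old compound index (removed):
# contents.create_index([("_type", ASCENDING), ("comment_thread_id", ASCENDING),
#                        ("author_id", ASCENDING), ("updated_at", ASCENDING)])

# New compound index supporting the created_at-based query:
contents.create_index([("comment_thread_id", ASCENDING),
                       ("author_id", ASCENDING),
                       ("created_at", ASCENDING)])


def unread_comment_count(thread_id, user_id, last_read_at):
    # Mirrors the new query: comments on the thread, by other authors,
    # created at or after the user's last read time for that thread.
    return contents.count_documents({
        "comment_thread_id": thread_id,
        "author_id": {"$ne": user_id},
        "created_at": {"$gte": last_read_at},
    })
```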
Results:
The load tests were run on 4x c4.2xlarge instances for the LMS and 3x m4.large instances for forums.
The load test results below show the differences between the old and new implementation. The two sets of results look quite similar until "No. of clients = 336", where there is a huge difference between the old and new percentiles, as well as a sudden rise in percentiles for both the old and new index compared with "No. of clients = 224". For all subsequent tests (i.e. No. of clients = 460, 510 and 578), the difference between the old and new percentiles is minimised and the new index has lower percentiles for most of the endpoints.
I have captured New Relic charts too: permaLink_old_index with average rpm = 1.86k and permaLink_new_index with average rpm = 1.81k.
Old Index vs. New Index (for each client count, the Old Index run is listed first, followed by the New Index run):
Old Index: No. of clients = 480
req/s =
...
New Index: No. of clients = 480
req/s =
...
Old Index: No. of clients = 510
req/s =
...
New Index: No. of clients = 510
req/s =
...
Old Index: No. of clients = 544
req/s =
Methods | median response time | 95% | 99% |
---|---|---|---|
DELETE_comment | 340 | 1500 | 4700
DELETE_thread | 210 | 1900 | 2900
GET_comment_list | 220 | 1400 | 3200
GET_thread | 200 | 2100 | 5400
GET_thread_list | 290 | 1600 | 3300
PATCH_comment | 290 | 1800 | 3200
PATCH_thread | 220 | 1200 | 2900
POST_comment_comment | 410 | 1900 | 3800
POST_comment_response | 350 | 1800 | 3800
POST_thread | 200 | 1400 | 2900
auto_auth | 260 | 270 | 270
New Index: No. of clients = 544
req/s =
Methods | median response time | 95% | 99% |
---|---|---|---|
DELETE_comment | | 490 | 710
DELETE_thread | 200 | 340 | 500
GET_comment_list | 170 | 280 | 550
GET_thread | | 300 | 870
GET_thread_list | 210 | 420 | 700
PATCH_comment | 290 | 500 | 670
PATCH_thread | 200 | 340 | 440
POST_comment_comment | 400 | 600 | 840
POST_comment_response | 340 | 530 |
POST_thread | | |
auto_auth | 210 | 250 |
Old Index: No. of clients = 578
req/s =
Methods | median response time | 95% | 99% |
---|---|---|---|
DELETE_comment | 370 | 2100 | 5500
DELETE_thread | 210 | 950 | 1600
GET_comment_list | 250 | 1700 | 4100
GET_thread | 210 | 2800 | 5700
GET_thread_list | 310 | 1800 | 4200
PATCH_comment | | 1100 | 2600
PATCH_thread | 230 | 1700 | 4200
POST_comment_comment | 440 | 1900 | 5700
POST_comment_response | 380 | 2000 | 5400
POST_thread | 210 | 1700 | 5300
auto_auth | 530 | 5300 | 5300
New Index: No. of clients = 578
req/s =
Methods | median response time | 95% | 99% |
---|---|---|---|
DELETE_comment | 330 | 540 | 2300
DELETE_thread | 200 | 320 |
GET_comment_list | 180 | 310 | 1100
GET_thread | 180 | 330 | 1900
GET_thread_list | 210 | 450 | 1200
PATCH_comment | | 610 | 3600
PATCH_thread | 210 | 360 | 940
POST_comment_comment | 400 | 600 | 1300
POST_comment_response | 340 | 550 | 1400
POST_thread | 190 | | 1200
auto_auth | 240 | 280 | 280
Old Index: No. of clients = 610
req/s =
...
New Index: No. of clients = 610
req/s = 48.9
...
UPDATE:
To narrow down the behaviour of the feature endpoints under high traffic and large data volumes, I have conducted some more tests; here are the results.
Case 1:
I ran a few tests with a fresh course for each run, seeded with 100 threads, 10 responses per thread and 7 comments per response. The percentiles show acceptable numbers, as opposed to the results above where the data in a single course kept growing with each run (see the "With Profile Image" column above). Hence the growing data in a course has a directly proportional effect on the response-time percentiles (a sketch of the seeding shape follows).
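As a rough sketch of that seeding shape, the snippet below only illustrates the 100 x 10 x 7 structure described above; the endpoint paths, payload fields and helper name are hypothetical, not the actual seeding code:

```python
import requests

API = "http://localhost:8000/api/discussion/v1"  # placeholder URL


def seed_course(session: requests.Session, course_id: str,
                threads: int = 100, responses_per_thread: int = 10,
                comments_per_response: int = 7) -> None:
    """Seed a fresh course: 100 threads, 10 responses each, 7 comments per response."""
    for t in range(threads):
        thread = session.post(f"{API}/threads/", json={
            "course_id": course_id, "topic_id": "general",     # placeholder payload shape
            "type": "discussion", "title": f"thread {t}", "raw_body": "seed",
        }).json()
        for r in range(responses_per_thread):
            response = session.post(f"{API}/comments/", json={
                "thread_id": thread["id"], "raw_body": f"response {r}",
            }).json()
            for c in range(comments_per_response):
                session.post(f"{API}/comments/", json={
                    "thread_id": thread["id"], "parent_id": response["id"],
                    "raw_body": f"comment {c}",
                })
```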
...
No. of clients = 336
req/s = 53
...
No. of clients = 336
req/s = 54.3
Methods | median response time | 95% | 99% |
---|---|---|---|
DELETE_comment | 350 | 520 | 990
DELETE_thread | 220 | 330 | 1100
GET_comment_list | 220 | 360 | 490
GET_thread | 200 | 310 | 450
GET_thread_list | 490 | |
PATCH_comment | 390 | |
PATCH_thread | | |
POST_comment_comment | 420 | 600 | 780
POST_comment_response | 350 | 530 | 720
POST_thread | 200 | 310 | 500
auto_auth | 230 | 240 | 240
No. of clients = 544: (error rate = 364 (1.57%) )
...
Case 2:
I created a fresh course populated with a huge number of threads, responses and comments (in the thousands) and then ran the GET thread and comment endpoints both with and without profile image (as these are the only endpoints the change is reflected in).
Comparing the two sets of results, we can see the difference in the 99% for with profile image, but I believe the numbers are acceptable; only the last two cases, which I have highlighted, show an anomaly.
- For with profile image: no. of clients = 510 shows a greater response time than no. of clients = 544. The reason, I assume, could be that fewer users were involved in the threads and comments for the latter test.
- For without profile image:
  - the 99% for no. of clients = 544 is greater than that of with profile image, which is odd.
  - there is a sharp rise from clients = 510 to clients = 544; to check whether this is a valid increase I used a client count between the two, i.e. 522, but the 99% was even higher than for 544, which is again odd.
With Profile Image | Without Profile Image
No. of clients = 460 req/s = 44.5
No. of clients = 48460 req/s = 449.10 error rate = 1 (0.03%)
No. of clients = 48 req/s = 4 error rate = 0
No. of clients = 144510 req/s = 12 error rate = 0 32.4
No. of clients = 144 req/s = 11.4 error rate = 0
No. of clients = 288510 req/s = 2333.9 error rate = 0
No. of clients = 288 req/s = 23.3 error rate = 0
No. of clients = 510578 req/s = 42.3 error rate = 1 (0.00%) 26.70
No. of clients = 510 req/s = 40.8 error rate = 0
No. of clients = 544578 req/s = 4440.402 error rate = 0
No. of clients = 544 req/s = 45.4 error rate = 0
...