These are notes on the entire load testing process for the LMS commerce order creation endpoint. The LMS endpoint calls an ecommerce order creation and fulfillment endpoint, which calls back to the LMS enrollment API during fulfillment.
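As a hedged sketch of the chain's entry point, a client request might look like the following; the host, path, and payload are placeholders rather than the real endpoints:

    # Placeholder sketch of the chain's entry point; host, path, and payload
    # are illustrative, not the real LMS API.
    import requests

    # Client -> LMS commerce endpoint. Internally, the LMS calls the
    # ecommerce order creation/fulfillment endpoint, which calls back to
    # the LMS enrollment API during fulfillment.
    resp = requests.post(
        "https://lms.example.com/api/commerce/v0/orders/",
        json={"course_id": "course-v1:edX+DemoX+Demo_Course"},
        timeout=5,
    )
    print(resp.status_code, resp.elapsed.total_seconds())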

TL;DR

When running at 78 RPM (on two m3.medium instances), 95% of requests to the LMS commerce endpoint complete in 460 ms.

When running at 1000 RPM (on two m3.xlarge instances), 95% of requests to the LMS commerce endpoint complete in 460 ms.

Order fulfillment is bottlenecked by the enrollment API.

Performance

...

Precedents

For reference, the statistics below represent throughput and response times, averaged over the seven-day period between March 5 and March 12, 2015, for the change_enrollment (iframe) and EnrollmentListView (enrollment API) processes on the LMS. These serve as targets for our new endpoint.

...

However, it quickly became clear that the ecommerce service was not persisting anything to the database. The Oscar management dashboard appeared as follows:

[Screenshot: Oscar management dashboard]

Note the last order at 5:58 PM UTC, or 1:58 PM EDT. New Relic appeared as follows:

[Screenshot: New Relic]

We disabled autocommit at about 1:58 PM. However, Splunk logs showed orders being created and fulfilled successfully after 1:58 PM, and enrollments in the demo course continued to increase steadily between 2 and 3 PM.

Despite New Relic showing order_line select and update operations, it appears that order model objects were being created and operated on in memory without being persisted to the database. I confirmed that each of the utility functions we're using from Oscar core (e.g., place_order, used to create an order) explicitly saves model objects before returning them, yet disabling autocommit appears to prevent the SQL generated by those saves from being committed to the database, behavior which aligns with what's described in the docs.

One point of confusion was that the order numbers visible in Splunk continued to increment. Order numbers are derived from basket IDs, so it would seem that incrementing them requires the baskets created for each new user to be written to the database. However, it's more likely that Django was instantiating models in memory and assigning them IDs without ever persisting them to the database.
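To make that failure mode concrete, here is a minimal sketch using Django's low-level transaction API; the Order model and its app are hypothetical stand-ins for Oscar's models, not code from our services:

    # Sketch only: assumes a configured Django project; Order is a
    # hypothetical stand-in for Oscar's order model.
    from django.db import transaction

    from myapp.models import Order  # hypothetical app and model

    transaction.set_autocommit(False)
    try:
        # The INSERT runs and the object gets a primary key...
        order = Order.objects.create(number="100001")
        # ...but with autocommit off, nothing is durable until an explicit
        # commit. Skip the line below and the row is rolled back when the
        # connection closes, even though create()/save() returned normally.
        transaction.commit()
    finally:
        transaction.set_autocommit(True)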


The "MySQL other" issue was finally resolved by leaving autocommit enabled (it's set to TRUE by default) and enabling atomic requests. When enabled, autocommit wraps every SQL query in a transaction, committing it to the database on success. Django is smart enough to prevent SQL queries from being wrapped in transactions when those queries are executed in the context of an existing transaction. Enabling atomic requests wraps every HTTP request to a view in a transaction. As a result, instead of committing queries to the database many times per request, Django commits queries to the database once per request after the successful conclusion of the transaction wrapping the target view. The result is lower database response times (atomic requests and autocommit were enabled together at 12:46 PM):

...

These statistics serve as a baseline for a single m3.medium instance running three synchronous workers. The number of concurrent users matches the number of available workers, yielding optimal performance.

Users | RPS  | RPM  | 95% (ms) | Error rate
3     | 0.39 | 23.4 | 570      | 0.00%
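These notes don't record which load testing tool produced these numbers; as a hedged illustration, a Locust test reproducing the three-user baseline might look like this (the tool choice, endpoint path, and payload are assumptions):

    # Illustrative Locust file; path and payload are placeholders.
    from locust import HttpUser, constant, task

    class CommerceUser(HttpUser):
        wait_time = constant(1)  # pause 1 s between requests

        @task
        def create_order(self):
            # Placeholder path for the LMS commerce order creation endpoint.
            self.client.post("/api/commerce/v0/orders/", json={"course_id": "demo"})

    # Run headless with three concurrent users to match the table above:
    #   locust -f locustfile.py --headless -u 3 -r 3 --host https://lms.example.com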

While running this test, we observed CPU utilization of only 1% on the server. Since we're bottlenecked by the number of workers, Ed suggested increasing the worker count until we hit 60-80% CPU utilization, so that we use our allotted resources more effectively.
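The notes don't name the WSGI server; assuming gunicorn sync workers, the suggested change amounts to raising the worker count in the server config, roughly:

    # gunicorn.conf.py - hedged sketch; values are illustrative, not ours.
    bind = "0.0.0.0:8000"
    worker_class = "sync"   # synchronous workers, as in the baseline
    workers = 3             # baseline; raise stepwise while watching CPU,
                            # targeting 60-80% utilization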

...