Mocking modulestore - optimal seam investigation

There are a handful of places where we might be able to mock out modulestore to save unit test time. Here are the results of my investigation into those places and the estimated work / time savings for each. Due to time constraints and the limitations of the hotshot profiler that Nose uses, I restricted my benchmarking to CMS tests.

Mongomock at the split mongo layer

I was able to test this fairly easily by hacking a mongomock client in to replace self.database in mongo_connection.py (MongoConnection.__init__) and making some changes in mongomock. All but a dozen CMS tests succeeded with these changes in place. Unfortunately, the benefits weren't great, because the mock only affects split mongo tests and doesn't cover GridFS, etc.
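
As a rough illustration, the hack amounted to something like the sketch below. This is a minimal, hypothetical version: the import path, the database attribute name, and the patching mechanics are assumptions based on the description above, not a finished design.

```python
# Minimal sketch: after MongoConnection's real __init__ runs, point
# self.database at an in-memory mongomock database instead of the real one.
# A production-quality mock would also avoid opening the real connection at all.
from unittest import mock

import mongomock

# Assumed import path for the class described above.
from xmodule.modulestore.split_mongo.mongo_connection import MongoConnection

_REAL_INIT = MongoConnection.__init__
_MOCK_CLIENT = mongomock.MongoClient()


def _mongomock_init(self, *args, **kwargs):
    _REAL_INIT(self, *args, **kwargs)
    self.database = _MOCK_CLIENT['test_split_modulestore']  # hypothetical db name


# In a test (or a shared mixin), apply the patch for the duration of the test:
# with mock.patch.object(MongoConnection, '__init__', _mongomock_init):
#     ...exercise split mongo code against the in-memory database...
```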

Work

  • Write a mock to monkey patch in the hack I put in - 1d
  • Submit PRs to mongomock or fork and make changes - 1-2d
  • Identify tests that should be mocked and add mocking to them - 2d (there are hundreds) 
  • Investigate and fix up the dozen or so tests that were still failing (optional, could just not mock those tests) - 1d

Savings

  • 10-20 seconds on a local run of CMS-only tests
  • Estimated total across all paver tests ~80 seconds.

Advantages

  • Still exercises a lot of modulestore code, so tests are more authentic
  • Conceptually simple
  • Mongomock handles almost everything we need for this

Disadvantages

  • Need to submit some PRs to, or fork, mongomock to make some necessary bug fixes (one example)
  • New tests might incur more mongomock support costs, as mongomock is still fairly young
  • Leaves a lot of time on the table (covers split mongo only; old mongo modulestore, GridFS, etc. are untouched)
  • Potentially hard to segregate tests with and without the mock


Mongomock at the mongo connection layer

I was mostly able to test this fairly easily by hacking a mongomock client into the return value of connect_to_mongodb in mongo_utils.py. With some additional hacking of GridFS and mongomock I was able to get 90% of CMS tests to pass, and I'm fairly confident another day's work would pretty much close that gap: the vast majority of the failing tests were DDT tests that should be fixed by the same change, or could simply not use the mocking.
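
The shape of that hack is roughly the sketch below. The dotted patch target and the function signature are assumptions based on the mongo_utils.py / connect_to_mongodb reference above; GridFS still needs separate handling, as described.

```python
# Minimal sketch: have connect_to_mongodb hand back a mongomock database so
# everything that goes through it talks to in-memory storage.
from unittest import mock

import mongomock

# One shared client so separate callers see the same data, just as they would
# when sharing a single real Mongo server.
_SHARED_CLIENT = mongomock.MongoClient()


def _fake_connect_to_mongodb(db, host='localhost', port=27017, **kwargs):
    return _SHARED_CLIENT[db]


# Example usage on a test class. The patch target is an assumption; depending
# on how callers import the function, it may need to be patched in the
# caller's namespace instead.
# @mock.patch('xmodule.mongo_utils.connect_to_mongodb', _fake_connect_to_mongodb)
# class SampleCmsTest(ModuleStoreTestCase):
#     ...
```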

Work

  • Write a mock to patch connect_to_mongodb - 1d
  • Submit PRs to mongomock or fork and make changes - 2-3d
  • Submit PRs to pymongo or fork and make changes - 1-2d
  • Identify tests that should be mocked and add mocking to them - 3d (there are hundreds) 
  • Investigate and fix up the tests that were still failing in CMS, etc. (optional, could just not mock those tests) - 3d

Savings

  • ~70 seconds on a local run of CMS-only tests
  • Estimated total savings across all paver tests ~280 secs

Advantages

  • Saves quite a bit more time than mocking at the split layer
  • Still exercises a lot of modulestore code, so tests are more authentic
  • Mongomock handles a lot of what we need for this
  • Clear boundaries

Disadvantages

  • Need to submit some PRs to, or fork, mongomock to make some necessary bug fixes (one example); new tests might incur more mongomock support costs
  • Need some assistance from Mongo for GridFS fixup, or to fork pymongo and fix it ourselves
  • Still leaves a lot of time on the table (modulestore code outside of mongo)
  • Potentially hard to segregate tests with and without the mock


New modulestore backend

Work

  • Create a new (in local memory?) modulestore backend that can behave like old mongo and split mongo as necessary (~15 days?) - see the sketch after this list
    • ~40 public methods in old mongo backend, ~60 in split
    • May take additional time / iterations to realize the full savings
  • Identify tests that should be mocked and add mocking to them - 3d (there are hundreds) 
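
For a sense of the shape (not a design), a dict-backed backend might start out like the sketch below. Every class and method name here is hypothetical and the real ModuleStore interface is far wider, as the estimate above notes; the point is only that skipping the to/from Mongo transformations reduces each operation to dict access.

```python
# Hypothetical skeleton of an in-memory modulestore backend. The methods
# loosely mirror the CRUD operations the real stores expose but do not
# reflect the actual ModuleStore API.
class InMemoryModuleStore(object):

    def __init__(self):
        self._items = {}  # (course_key, block_type, block_id) -> field dict

    def create_item(self, user_id, course_key, block_type, block_id, fields=None, **kwargs):
        usage_key = (course_key, block_type, block_id)
        self._items[usage_key] = dict(fields or {})
        return usage_key

    def get_item(self, usage_key, **kwargs):
        return self._items[usage_key]

    def update_item(self, usage_key, user_id, fields=None, **kwargs):
        self._items[usage_key].update(fields or {})

    def delete_item(self, usage_key, user_id, **kwargs):
        self._items.pop(usage_key, None)
```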

Savings

  • Potential savings of 10+ minutes on a full paver test run
    • Depending on how much savings we can squeeze from skipping the various to/from Mongo transformations, and whether we can fit everything we need into local memory

Advantages

  • Saves the most test time

Disadvantages

  • Huge undertaking due to the wide API and the potential for direct calls into internals that may need to be mocked out or changed
  • Likely to grow in scope as new access patterns / edge cases are discovered
  • Needs to be maintained along with other storage engines
  • Less authentic tests; may need to beef up integration tests for explicit old mongo / split mongo backend testing


K/V level mock

Mocking out the k/v store was suggested as a possible place to look. Presuming I'm looking at the right places (InheritanceKeyValueStore / MongoKeyValueStore / SplitMongoKVS), it doesn't seem to be a layer where we spend much time. Further, there don't seem to be any clear optimizations to be made here, as the data already appears to be stored fundamentally in local memory as a dict. The time spent in all of the KVSs is somewhere in the ~10-15 second range for CMS tests, and it's unclear how a mock could improve on that or what the gains might be.
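
For illustration, a mock at this layer would amount to roughly the dict wrapper below (the get/set/delete/has shape is illustrative, not the actual edx-platform interface), which is essentially what the existing stores already do, so there is little to be gained.

```python
# Illustrative dict-backed key/value store. Because the real KVS classes
# already keep their field data in an in-memory dict, a mock like this would
# be doing the same work they do, which is why there is little time to recover.
class DictKeyValueStore(object):

    def __init__(self, initial=None):
        self._data = dict(initial or {})

    def get(self, key):
        return self._data[key]

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        del self._data[key]

    def has(self, key):
        return key in self._data
```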