...

In terms of management commands to rebuild things, I'd like to create a generic publish-signal command, and make it the responsibility of the individual listening tasks to determine whether or not they need to do work, and how much (e.g. collecting missing pieces). I've gone back and forth on this, but I'm afraid of having too many code paths, or forcing people upgrading from Cypress to Elm to run five different bootstrapping scripts (course overviews, course structures, block transforms, etc.).
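The shape of that command can be sketched in plain Python. This is a hypothetical illustration (all names invented, a minimal dispatcher standing in for Django signals): the command body reduces to a single signal send, and each listener consults its own bookkeeping to decide whether it has work to do.

```python
class PublishSignal:
    """Minimal signal dispatcher standing in for e.g. Django signals."""
    def __init__(self):
        self._listeners = []

    def connect(self, listener):
        self._listeners.append(listener)

    def send(self, course_key):
        return [listener(course_key) for listener in self._listeners]

course_published = PublishSignal()

# Each subsystem tracks its own freshness (here a fake version stamp),
# so the command never needs to know which caches exist.
CURRENT_VERSION = {"course-v1:Demo": 7}
overview_versions = {"course-v1:Demo": 7}   # up to date -> no work
structure_versions = {"course-v1:Demo": 5}  # stale -> rebuild

def rebuild_course_overview(course_key):
    if overview_versions.get(course_key) == CURRENT_VERSION[course_key]:
        return ("course_overview", "skipped")
    overview_versions[course_key] = CURRENT_VERSION[course_key]
    return ("course_overview", "rebuilt")

def rebuild_block_structure(course_key):
    if structure_versions.get(course_key) == CURRENT_VERSION[course_key]:
        return ("block_structure", "skipped")
    structure_versions[course_key] = CURRENT_VERSION[course_key]
    return ("block_structure", "rebuilt")

course_published.connect(rebuild_course_overview)
course_published.connect(rebuild_block_structure)

# The entire management command body is just:
results = course_published.send("course-v1:Demo")
```

Because every listener is idempotent, re-running the command is always safe, which is exactly what avoids the "five different bootstrapping scripts" problem during upgrades.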


...

Next Steps: More Efficient Caching

Goals

  1. Store data as compactly as possible.
  2. Don't query data for nodes we don't need (e.g. all the course's structure data to render a vertical).
  3. Don't query data for transforms we don't need.
  4. Data should be stored such that we can rebuild an individual Transformer's collected data without rebuilding everything.

Proposal

  1. A Block Cache Unit has each of the following, stored as a separate key:
    1. structure
    2. one key for each Transformer's collect, covering all the nodes in the structure (containing both transform-level data and per-block data).
    3. one key for each requested XBlock field
  2. Two tiers:
    1. Course-wide BCU in permanent storage.
    2. Memcached sub-tree BCUs for any particular requested part of the course, derived quickly from the course-level BCU while serving requests.
  3. Packing data:
    1. Because the collected data always has an associated BCU, we don't actually need to store the location keys anywhere but in the structure. Everything else can assume an order based on sorting the keys found in the structure, and use an array to represent values. This has a huge impact on our ability to separate out collect data, since for many transformers the location key is actually much larger than the data they want to collect.
      1. Another possibility is using something like xxHash to represent the locations for really sparse data, but I don't think that's necessary at this time.
    2. We can exploit the repetitive or sparse nature of many fields. For instance, if we store 4K entries of "graded":false in the middle of a lot of other attribute data, it can take up a fair amount of space. However, when flattened out into a list with just this attribute's values, the large course (which has 900+ graded items) compresses down to 72 bytes. Doing this on the XBlock Field data for the large course brought the compressed size of that section down from 96K to 23K. Extrapolating out from that (and assuming that we eliminate some redundant structure data), we could get a 4X storage improvement on the part of our system that's most likely to grow.
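The two-tier idea (point 2 above) can be sketched as follows. This is a hypothetical illustration, not the real storage layer: dicts stand in for permanent storage and memcached, and all key names are invented. On a cache miss, the sub-tree BCU is sliced out of the course-wide BCU and cached for subsequent requests.

```python
permanent_store = {}   # stands in for the database
memcache = {}          # stands in for memcached

course_bcu = {
    "structure": {
        "chapter1": ["vertical1", "vertical2"],
        "vertical1": [],
        "vertical2": [],
    },
    "transformer:grades": {"chapter1": None, "vertical1": 0.5, "vertical2": 1.0},
}
permanent_store["course-v1:Demo"] = course_bcu

def descendants(structure, root):
    """All blocks in the sub-tree rooted at `root`, including root."""
    out, stack = [], [root]
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(structure.get(node, []))
    return out

def get_subtree_bcu(course_key, root):
    """Fetch a sub-tree BCU from memcache, deriving it from the
    course-wide BCU on a miss."""
    cache_key = f"{course_key}/{root}"
    if cache_key in memcache:
        return memcache[cache_key]
    full = permanent_store[course_key]
    keep = set(descendants(full["structure"], root))
    sub = {
        name: {k: v for k, v in data.items() if k in keep}
        for name, data in full.items()
    }
    memcache[cache_key] = sub
    return sub

sub = get_subtree_bcu("course-v1:Demo", "vertical1")
```

Because each per-transformer key is filtered by the same block-key set, the sub-tree BCU keeps the same shape as the course-wide one, so request-serving code doesn't need to care which tier it was read from.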
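The packing scheme in point 3 can be demonstrated concretely. This is a hypothetical sketch (block keys and field names invented): only the structure stores location keys; every other collection is a plain array aligned with the sorted order of those keys, and flattening repetitive field values makes them extremely compressible.

```python
import json
import zlib

structure = {
    "block-v1:Demo+type@vertical+block@b": ["block-v1:Demo+type@problem+block@a"],
    "block-v1:Demo+type@problem+block@a": [],
    "block-v1:Demo+type@problem+block@c": [],
}

# Canonical order shared by all collections: sort the structure's keys.
sorted_keys = sorted(structure)

# A transformer's per-block data packed as an array aligned with
# sorted_keys, instead of a dict repeating the (much longer) location keys.
graded = [False, True, True]

def value_for(block_key):
    """Look up a packed value by block key via its position in sorted_keys."""
    return graded[sorted_keys.index(block_key)]

# Flattened, repetitive field values compress far better than a dict that
# repeats a distinct location key next to every value:
many_graded = [False] * 4000
packed = zlib.compress(json.dumps(many_graded).encode())
unpacked_dict = zlib.compress(
    json.dumps({f"block-{i}": {"graded": False} for i in range(4000)}).encode()
)
```

The exact byte counts will differ from the 72-byte figure above (that was measured on the real course data), but the ordering of the two sizes illustrates why stripping location keys out of everything but the structure pays off.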