We can measure how latency correlates with different courses, XBlock types, and size of request (both data moved and number of records).
Are there any other metrics you think should be captured while doing this work?
I assume we can get method-level detail about hot spots from transaction traces if we need to get to that level, and we can also get geo data in Insights already.
This seems like something that could become a common part of our "common client" library. Steps toward generalization would be great, e.g. segregating the New Relic-specific details.
Will this be enabled/disabled via a config model?
We can get method-level detail for hot spots, but we'd have to add some specific instrumentation to get truly low-level detail. That said, most of the concern here is database I/O, and I wasn't planning to do CPU-level profiling in this PR. There are a couple of caveats around geo data (we'd need to query against PageView rather than Transaction for that, so we'd lose some percentage of our overall CSM requests), but I don't think geo really matters for the purposes of measuring this particular issue.
I generally support the idea of putting an abstraction around this, but I'm honestly not sure what that would be at this point. Datadog and New Relic don't really match up in terms of their semantics, and AppNeta doesn't support custom metrics like Insights does. Do you have a sense of what else we might plug in underneath?
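One way to make the "what would the abstraction be" question concrete is a minimal backend seam, sketched below. Every name here is hypothetical; the point is only that vendor-specific details (like New Relic's lack of first-class tags) get confined to one class, which is roughly where the semantic mismatches between backends would surface:

```python
from abc import ABC, abstractmethod


class MetricsBackend(ABC):
    """Hypothetical seam between calling code and a vendor agent."""

    @abstractmethod
    def record(self, name, value, tags=None):
        ...


class InMemoryBackend(MetricsBackend):
    """Test/dev backend; also what runs when no vendor agent is active."""

    def __init__(self):
        self.events = []

    def record(self, name, value, tags=None):
        self.events.append((name, value, tags or {}))


class NewRelicBackend(MetricsBackend):
    """All New Relic-specific details live here and nowhere else."""

    def record(self, name, value, tags=None):
        import newrelic.agent
        # NR custom metrics have no tag dimension; fold tags into
        # transaction attributes instead.
        newrelic.agent.record_custom_metric("Custom/%s" % name, value)
        for key, tag in (tags or {}).items():
            newrelic.agent.add_custom_parameter(key, tag)
```

A Datadog backend would map `tags` to native statsd tags; a backend without custom-metric support (the AppNeta case) would presumably just drop the call.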
I was not planning to gate it with a config model, but just to have it turned on whenever New Relic is running, like our other custom metrics. Do you think it's important to flag it separately?
Takeaway from offline discussion: there is room for improvement in batching/caching writes to XBlock user state. This will be a new story.
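For the follow-on story, the batching idea might amount to coalescing repeated writes within a request and flushing once. This is a sketch only, assuming a hypothetical `save_many` callable and `(user, block, field)` keying; it is not the real courseware student module API:

```python
class UserStateWriteBuffer:
    """Coalesce repeated writes to the same (user, block, field) within a
    request so only the last value hits the database at flush time.
    Hypothetical sketch, not the actual CSM implementation.
    """

    def __init__(self, save_many):
        self._save_many = save_many  # callable taking a list of rows
        self._pending = {}

    def set(self, user_id, block_id, field, value):
        # Later writes to the same key overwrite earlier ones in memory.
        self._pending[(user_id, block_id, field)] = value

    def flush(self):
        rows = [(u, b, f, v) for (u, b, f), v in self._pending.items()]
        self._pending.clear()
        self._save_many(rows)
        return len(rows)
```

The win is that N writes to the same field collapse into one database row write, which is exactly the kind of effect the latency metrics above should be able to confirm.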