Potential Organizational Velocity Improvements

Proposals

Columns: Priority, Project Name, Dependencies, Description, Projects Which Benefit, Notes, Pain (0-5), Frequency (0-5), Cost (0-5)
Speed up tests locally
  Priority: 1
  Dependencies: None
  Description: Running the full test suite locally takes a long time, so people push to CI instead, which also takes a long time. If we shortened the full test suite, or provided a good selection of fast smoke tests, developers would get a tighter turnaround time. Perhaps we could use per-test coverage data to do a reverse lookup from edited lines to the tests that cover them?
  Projects which benefit: All projects in edx-platform
  Notes: BJP (4/2015) - TestEng undertook an effort in summer 2014 to evaluate a tool to do this for us (i.e. auto-determine which tests to run, using nose-knows). However, we think it is worth revisiting. See TE-539.
  Pain: 1, Frequency: 5, Cost: 2
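The reverse lookup suggested in the "Speed up tests locally" description could work roughly as follows. This is a minimal sketch, assuming per-test coverage has already been collected into a mapping from test name to the set of (file, line) pairs that test executes. The data layout, function names, and sample paths are hypothetical and are not taken from nose-knows or from any existing edx-platform tooling.

```python
from collections import defaultdict


def build_reverse_index(coverage_by_test):
    """Invert {test: {(path, line), ...}} into {(path, line): {test, ...}}."""
    index = defaultdict(set)
    for test, covered in coverage_by_test.items():
        for location in covered:
            index[location].add(test)
    return index


def select_tests(index, edited_lines):
    """Return the sorted names of tests that executed any edited line."""
    selected = set()
    for location in edited_lines:
        selected |= index.get(location, set())
    return sorted(selected)


# Hypothetical coverage data, for illustration only.
coverage_by_test = {
    "test_grades": {("grades.py", 10), ("grades.py", 11)},
    "test_progress": {("grades.py", 11), ("progress.py", 4)},
    "test_login": {("auth.py", 7)},
}
index = build_reverse_index(coverage_by_test)
tests_to_run = select_tests(index, [("grades.py", 11)])
```

In this sketch, edited lines with no recorded coverage select no tests at all, so a real tool would presumably need a fallback (such as running the full suite) for new or previously uncovered code.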
Speed up tests in CI
  Priority: 1
  Dependencies: None
  Description: For an average PR, just running the test jobs to completion takes 40-50 minutes, not including time in the queue. That is long enough that developers context switch, which costs additional time.
  Projects which benefit: All projects in edx-platform
  Notes: TestEng is actively working on this, which lowers the usefulness of Platform putting time in.
    BJP (4/2015) - A truer solution for this would be a more mature build pipeline (likely coupled with the above item). That would involve testing a subset on every PR commit, and having a way to run more tests at/near merge-time. This could also imply a different branching strategy.
  Pain: 1, Frequency: 5, Cost: 2
Move grade storage to submissions app
  Priority: (blank)
  Dependencies: None
  Description: Dave Ormsbee can fill in more, but essentially, we can change the storage model for grades so that they are append-only, and use that to make score computation for things like the progress page more predictable and less expensive.
  Projects which benefit:
    • SaudiX: Gradebook ETL
    • Analytics: Overall course grade distribution
    • Analytics: Export all student submissions to a problem or an assignment
    • Analytics: Export programming assignment submissions
  Pain: 4, Frequency: 1, Cost: 4
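To illustrate what an append-only storage model for grades might look like, here is a minimal in-memory sketch. The class names, fields, and methods are hypothetical and do not reflect the actual submissions app schema. The only point is that a new score is recorded as a new row, and the current state is derived by reading the log.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class ScoreRecord:
    """One immutable row in the score log (hypothetical schema)."""
    sequence: int
    student_id: str
    problem_id: str
    points_earned: int
    points_possible: int


class AppendOnlyScoreLog:
    """Scores are only ever appended; existing rows are never updated."""

    def __init__(self):
        self._records = []
        self._sequence = count(1)

    def record_score(self, student_id, problem_id, earned, possible):
        record = ScoreRecord(next(self._sequence), student_id, problem_id, earned, possible)
        self._records.append(record)
        return record

    def latest_scores(self, student_id):
        """Return the most recent record per problem for one student."""
        latest = {}
        for record in self._records:  # records are already in append order
            if record.student_id == student_id:
                latest[record.problem_id] = record
        return latest

    def total(self, student_id):
        """Sum (earned, possible) over the latest score for each problem."""
        latest = list(self.latest_scores(student_id).values())
        return (
            sum(r.points_earned for r in latest),
            sum(r.points_possible for r in latest),
        )


log = AppendOnlyScoreLog()
log.record_score("alice", "p1", 1, 2)
log.record_score("alice", "p2", 3, 3)
log.record_score("alice", "p1", 2, 2)  # a resubmission appends; nothing is overwritten
alice_total = log.total("alice")
```

Because earlier rows are never modified, the full history stays available for exports, while a progress-style total only needs the latest row per problem.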
Make Runtimes per-app, rather than per-modulestore
  Priority: (blank)
  Dependencies: None
  Description: Convert from Storage-centric runtimes to Application-centric runtimes
  Pain: 3, Frequency: 2, Cost: 5
Provide access to field storage via FieldData interface, rather than via Modulestore API methods
  Priority: (blank)
  Dependencies: None
  Description: We can move towards the XBlock-standard model of field storage, which should mean our runtimes have to jump through fewer hoops to work correctly, and will let us re-use some building blocks across different data stores.
  Pain: 3, Frequency: 1, Cost: 5
Kill XModule
  Priority: 2
  Dependencies: None
  Description: Lots of technical debt here, some of which will go away with conversion to XBlock.
  Projects which benefit:
    • Devops: Sandboxing javascript isolation
    • SaudiX: Course Search Pt 2
    • T&L: Editing support for XBlocks in Studio
    • Load XModule/XBlock javascript with require.js
  Pain: 4, Frequency: 3, Cost: 5
Kill XModule-specific APIs/workarounds
  Priority: 4
  Dependencies: Kill XModule
  Description: Lots more technical debt related to making two classes look like one class. This contributes many of the most brittle pieces of code.
  Projects which benefit:
    • T&L: Editing support for XBlocks in Studio
    • Analytics: XBlock displays of insights visualization
  Pain: 5, Frequency: 1, Cost: 5
Switch InstructorTask to generate micro-tasks and use a distributed counter (memcache) for results on particular jobs
  Priority: (blank)
  Dependencies: None
  Description: The asynchronous tasks pattern gives us a strategy option. This code is really hairy right now, and jumps through a lot of hoops in order to not spam the relational database.
  Projects which benefit:
    • T&L: Email students in a cohort
  Pain: 4, Frequency: 1, Cost: 3
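The micro-task idea in the InstructorTask proposal could be sketched as below. This is only an illustration under stated assumptions: `InMemoryCounterStore` is a hypothetical stand-in for a shared cache with an atomic increment (the role memcache would play), threads stand in for queued worker tasks, and `send_email` is a made-up action. None of these names come from the existing InstructorTask code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class InMemoryCounterStore:
    """Hypothetical stand-in for a shared cache with atomic increment."""

    def __init__(self):
        self._values = {}
        self._lock = threading.Lock()

    def incr(self, key, delta=1):
        with self._lock:
            self._values[key] = self._values.get(key, 0) + delta
            return self._values[key]

    def get(self, key):
        with self._lock:
            return self._values.get(key, 0)


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_micro_task(store, job_id, student_ids, action):
    """Process one small batch and report results only through counters."""
    for student_id in student_ids:
        try:
            action(student_id)
            store.incr(f"{job_id}:succeeded")
        except Exception:
            store.incr(f"{job_id}:failed")


def run_job(store, job_id, student_ids, action, batch_size=2):
    """Split a job into micro-tasks and run them concurrently."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        for batch in chunked(student_ids, batch_size):
            pool.submit(run_micro_task, store, job_id, batch, action)
    # Leaving the with-block waits for every submitted micro-task.


def send_email(student_id):
    """Hypothetical action; fails for one student to exercise the failure counter."""
    if student_id == "s3":
        raise ValueError("undeliverable")


store = InMemoryCounterStore()
run_job(store, "job-1", ["s1", "s2", "s3", "s4", "s5"], send_email)
succeeded = store.get("job-1:succeeded")
failed = store.get("job-1:failed")
```

In this sketch, progress is read from the two counters, so no micro-task writes per-item status to the relational database.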
Set up infrastructure for building performance tests (working with Corey, locust.io)
  Priority: 1
  Dependencies: None
  Description: Setting up performance tests from scratch introduces risk to any project that needs to measure server-side performance.
  Projects which benefit:
    • Mobile: Infrastructure (VAL/VEDA) - Perf / scale - app/client
    • Mobile: Infrastructure (VAL/VEDA) - Perf / Scale - video pipeline
    • All projects that we want to measure performance impact
  Pain: 3, Frequency: 3, Cost: 2
Move LMS to require.js
  Priority: 3
  Dependencies: None
  Description: Integrate require.js into the system.
  Projects which benefit:
    • T&L: LMS Performance
    • Mobile: Responsive LMS chrome
    • Devops: Sandboxing javascript isolation
  Pain: 4, Frequency: 2, Cost: 3
Load XModule/XBlock javascript with require.js
  Priority: 3
  Dependencies: Move LMS to require.js
  Description: Moving XModule to require.js would allow us to get rid of the large blob of all of the XModule js that's loaded on every courseware page. Moving XBlock to require.js would allow us to stop including js snippets on the main page directly (and hopefully cache the offloaded js).
  Projects which benefit:
    • T&L: LMS Performance
    • Mobile: Responsive LMS chrome
    • Devops: Sandboxing javascript isolation
  Pain: 4, Frequency: 2, Cost: 4
Clean up certificates code
  Priority: (blank)
  Dependencies: None
  Description: The certificate generation code is a mess.
  Projects which benefit:
    • Dest Edx: Improve Self-Paced Courses and Course availability approach (includes Certificate presentation and Download)
  Pain: 5, Frequency: 1, Cost: 4
Pre-process and check-in Javascript and CSS
  Priority: 1
  Dependencies: None
  Description: Currently, django collectstatic and the require.js optimizer eat up lots of time whenever they're invoked. If we separated the build of all of our javascript and CSS from the execution of the platform, we'd save time on tests and on deployment, with the only cost being a test that verifies that no one forgot the asset compilation/check-in step.
  Projects which benefit:
    • All projects in edx-platform
  Pain: 1, Frequency: 5, Cost: 1
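The verification test mentioned in the "Pre-process and check-in Javascript and CSS" description could take many forms. One possibility, sketched here under assumptions, is to check in a digest of the asset sources alongside the compiled output and have a test compare it against the current sources. The file patterns, the manifest file name, and both helper functions are hypothetical, and this is not how collectstatic or the require.js optimizer work.

```python
import hashlib
import tempfile
from pathlib import Path


def digest_sources(source_dir, patterns=("*.js", "*.scss")):
    """Hash the relative path and contents of every matching source file."""
    root = Path(source_dir)
    paths = sorted(p for pattern in patterns for p in root.rglob(pattern))
    sha = hashlib.sha256()
    for path in paths:
        sha.update(path.relative_to(root).as_posix().encode("utf-8"))
        sha.update(b"\0")
        sha.update(path.read_bytes())
    return sha.hexdigest()


def assets_are_current(source_dir, manifest_path):
    """True if the checked-in manifest matches the current sources."""
    manifest = Path(manifest_path)
    return manifest.exists() and manifest.read_text().strip() == digest_sources(source_dir)


# Demonstration in a throwaway directory (all paths are hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "static_src"
    src.mkdir()
    (src / "app.js").write_text("console.log('v1');")
    manifest = Path(tmp) / "assets.sha256"

    manifest.write_text(digest_sources(src))  # stands in for "compile and check in"
    fresh = assets_are_current(src, manifest)

    (src / "app.js").write_text("console.log('v2');")  # edit without recompiling
    stale = not assets_are_current(src, manifest)
```

A check like this only reads and hashes files, so it should be much cheaper than running the asset build itself on every test run.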

Definitions

Pain: How bad is this right now?

Frequency: How often do we have to experience the pain?

Cost: How much will this cost to fix?

All estimates are rough and relative only to the rest of the table, in Calen Pennington's estimation.

Input Data

Summary of Talking to Teams

I talked to most of the development teams about what they felt had made them slow historically. The two most common themes were that Jenkins and the test suite are slow to run, and that devstack setup is somewhat unstable, especially when switching between environments, or for occasional devstack users. Improving the LMS javascript architecture also came up from two different teams (T&L and destEdX).

Per-Team Notes

Analytics (Gabe)

  • Static assets compilation is slow
  • Paver tasks aren't well documented
  • Devstack requires rebuild after an absence from platform
    • issues with:
      • vmware shutdown
      • mongodb lock file
  • Lack of Python IDE
    • Common setup tips?
  • Long test cycle
    • Leads to task switching
  • no time to optimize workflow
    • cluster provisioning
    • 2hr acceptance tests
    • poor tooling for debugging
    • no automated validation
  • Dockerize vs long-lived cluster
  • stack uncertainty
  • code review tools
    • outdated comments
  • Support of legacy infrastructure - no tests, no non-prod environments, very brittle
    • analytics-server
    • analytics-exporter
  • bugs with low customer impact but cause periodic manual intervention

Destination edX (Steve S)

  • paver run_pylint is slow
  • jenkins build times
    • rerun particular shards?
  • devstack reinstalls on new machines
  • lack of theming/isolation
    • risk of breaking whitelabel/opensource
  • distractions
  • interaction with delicate code
    • third party auth
    • certificates
  • lms javascript architecture
  • a/b testing adds a tax

Solutions (Steve M)

  • devstack issues (~5 person-days)
    • switching environments
  • Code reviews are slow
  • Enforcement of standards may be slowing us down without providing value to our users or velocity
    • example: line length restrictions, whitespace nits
  • jenkins is slow
  • ramping up on additional testing requirements
    • bok_choy
  • building the right tests?
  • pep8 cleanliness combined w/ slow jenkins
    • pep8 enforcement a waste of time?
  • Do we have default/shared editor configurations?
  • Do we have a blessed technology list to help people make decisions about what technologies to use?
  • [ cdodge ] "flakey" acceptance tests cause many manual re-triggerings of Jenkins builds which take about 30-45 minutes to complete.

Teaching and Learning (Andy)

  • require.js/backbone lacking in LMS
  • Lack of XBlock editing in studio
  • lack of architecture docs (Daniel)
  • old libraries -> unfixed bugs to work around (Daniel)
  • interteam PR review process
    • no code ownership
  • devstack reinstalls are hard
  • jenkins turnaround is too long
  • quality included as part of tests results -> builds are always red until the end of a PR

Mobile (Nimisha + Chris)

  • Video module fires
    • delicate
    • legacy code
  • lack of mobile test environment
  • mobile dev environments
  • IDE issues
    • eclipse for android is slow
  • lack of standards for APIs
  • hard to communicate with other teams
    • hard to know who to talk to at first
    • interteam projects
  • no mobile escalation team
  • jenkins turnaround is slow
  • devstack setup
  • S3 for video pipeline
  • content library missing