Django 1.8 Retro

What went well

  • Coordination between us and Lahore
    • Timezone shift = follow-the-sun development
  • Only one bug on release day
  • Released on time (after updated schedule)
    • Touched over 500 files with no regressions and no new defects

  • Bug bash was successful and offers a model for the future
  • Diverse team - TNL, platform, devops - enabled us to understand the side effects of what we were doing
  • Estimation overall was pretty good 
    • Initially TNL team estimated 4.5 months
    • After adding resources, we were able to finish in 2.5 months
      • Work was well suited for parallelizing, particularly addressing test failures
  • Ned's test output categorization tool was very helpful!
  • Pairing on transactional changes worked well in terms of peer review back and forth
  • Improved a lot of things along the way
    • Requirements management - everything off of tags or hashes
  • Rollback plan was written before we pushed to production
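
The "everything off of tags or hashes" rule for requirements could be spot-checked with something like the sketch below. It is only an illustration, not the tooling the team used: the function name classify_vcs_line and the example URLs are made up, and it cannot tell a tag from a branch without querying the repo.

```python
import re

# A full git commit hash is 40 lowercase hex characters.
_FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def classify_vcs_line(line):
    """Classify one 'git+...' requirement line as 'hash', 'named-ref', or 'unpinned'.

    'named-ref' may be a tag or a branch; telling them apart needs the repo itself.
    Refs containing '/' are not handled by this sketch.
    """
    # Drop the '#egg=...' fragment, then look at the last path segment only,
    # so that 'git@host' in ssh URLs is not mistaken for a ref.
    url = line.split("#", 1)[0].strip()
    last_segment = url.rsplit("/", 1)[-1]
    if "@" not in last_segment:
        return "unpinned"
    ref = last_segment.split("@", 1)[1]
    return "hash" if _FULL_SHA.match(ref) else "named-ref"
```

Lines that come back as "unpinned" are the ones that drift over time; "named-ref" lines still need a human to confirm the ref is a tag and not a branch.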

What went poorly / what can we improve

  • We didn't finish before Django 1.4 was EOL'd (JE, FP)
    • We didn't start early enough
  • Upgrading from 1.4 to 1.8 was a large jump (UK)
    • The jump included switching migration frameworks, a changed transaction framework, etc. - everything touching data changed
      • We didn't expect migrations to be such a huge hassle
  • We have a lot of use cases for the platform and considering all of them created a lot of work
    • Some of this is not finished - we need to document the Cypress to Dogwood upgrade (potentially engineering work also)
  • The Python toolchain sucks (FP, NB, JE, UK, MA, BB)
    • Dependency management sucks
      • Lots of pip quirks, e.g. when there are multiple references to a dependency, the last one wins
    • Satellite repos referencing each other caused similar dependency issues, and could continue to do so
      • Particularly an issue with devstack which does incremental updates
  • Bug bash found lots of existing bugs (JE)
  • We tried to write down what we were going to do, but we ended up winging it a lot anyway (NB, FP, BB)
    • Merge back to master plan
    • Release plan
    • Should have picked a consistent branch name
      • Didn't understand the variety of git usage across repos
  • Some confusion resulted in duplicated work between developers (NB)
    • Lots of people in the same codebase
    • Wasn't a huge drag
  • Rollback plan came together late
  • Load testing should have been reviewed more thoroughly

Python toolchain sucks

  • We mean pip, virtualenv, setuptools, etc
  • We need to understand it better
  • We need to put discipline around it and set policies about how it's used
    • Standardization about how we write requirements.txt files
    • Need to clean up what we have - too much copypasta
  • We should push for changes to pip that we think make sense
  • Investigate other tools
  • Upgrade pip after upgrading to Python 2.7.10, which gives us pip-tools
  • We need an owner for pushing work forward on dependency management
    • Ned has volunteered along with Feanil and Brian
    • Think about what we need from a tool in order to do the next one better
  • What group does this fall to? Release/deployment, coding standards, new?
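
As a starting point for the requirements.txt cleanup, a rough sketch like the one below could list dependencies that are referenced more than once with different version specifiers, which is where the "last one wins" behavior bites. The function names, file names, and pins here are hypothetical, and the parser deliberately skips -r/-e/VCS lines instead of trying to handle them.

```python
import re
from collections import defaultdict

# Matches the project name at the start of a simple requirement line,
# e.g. "Django==1.8.7" or "django >=1.8,<1.9".
_NAME_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*)$")


def parse_requirements(text):
    """Yield (normalized_name, specifier) for simple 'name<spec>' lines.

    Comments, blank lines, and option/VCS/URL lines (-r, -e, git+...) are
    skipped; this sketch does not try to handle them.
    """
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-") or "://" in line:
            continue
        match = _NAME_RE.match(line)
        if match:
            # Treat "Foo_Bar", "foo.bar" and "foo-bar" as the same project.
            name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
            yield name, match.group(2).replace(" ", "")


def find_conflicts(files):
    """Return {name: {specifier: [filenames]}} for names with >1 distinct specifier.

    `files` maps a file name to that file's text.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for filename, text in files.items():
        for name, spec in parse_requirements(text):
            seen[name][spec].append(filename)
    return {
        name: dict(specs)
        for name, specs in seen.items()
        if len(specs) > 1
    }
```

Taking file contents as strings keeps the sketch easy to test; a real check would read the files from each repo and probably run in CI.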

Wrote things down but ended up winging it

  • There's a difference between writing things down and figuring out what needs to be written down
  • The satellite merge plan was not fleshed out well enough
  • Huge number of repositories made documenting all the different scenarios difficult
  • The fact that satellites have references to other satellites caused issues
  • Making it up as you go isn't necessarily a problem, so long as that decision is made consciously
    • If we start winging it, we need to write things down as we go
  • Muzaffar's wiki page on satellite repos was very useful
    • We should add relationships between repos
  • Need to designate a scribe when documenting plans as a group, and ask the question "who is in charge?"
    • Who is the keeper of history?
    • Write notes each day on what was done
    • This improved over the course of the project - nightly notes were sent
      • This came together when the work started to converge
    • Documentation can be overdone - can't document every little issue, otherwise we'd be overwhelmed
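
One possible way to record relationships between repos, and to get a merge order out of them, is sketched below. The repo names are placeholders, not the real satellite repos, and graphlib assumes Python 3.9 or later. It only shows that once "A references B" is written down as data, a safe ordering and any satellite-to-satellite cycles fall out mechanically.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

# Hypothetical repo names; each repo maps to the repos it references.
REPO_REFERENCES = {
    "platform": {"satellite-a", "satellite-b"},
    "satellite-a": {"satellite-c"},
    "satellite-b": {"satellite-c"},
    "satellite-c": set(),
}


def merge_order(references):
    """Return repos ordered so that every repo comes after the repos it references."""
    return list(TopologicalSorter(references).static_order())


def find_cycle(references):
    """Return one reference cycle as a list of repo names, or None if there is none."""
    try:
        merge_order(references)
    except CycleError as exc:
        # graphlib reports the cycle as the second exception argument;
        # the first and last names in that list are the same repo.
        return exc.args[1]
    return None
```

A cycle reported by find_cycle would be exactly the case where the merge plan has to be thought through by hand.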


Action items

  • Brian Beggs: Finish converting bug bash tickets
  • Feanil Patel: Create or join working group regarding deployment & dependency management toolchain
    • Ticket to adopt better dependency management tools - pipdeptree, or find or build our own
    • Document the best practices learned while doing this work, particularly wrt requirements and dependency management
    • Standardize release branching process across repos (though there are some that are forks that complicate this)
  • Can we align devstack more closely with production by fixing pip or some other strategy?
  • Joel Barciauskas: What is our overall Django upgrade strategy? Document the shell of a plan for how to tackle large upgrades like this based on our experience
    • As part of documentation of best practices, consider a "scribe" role
  • Feanil Patel: Rename the table outside devops "the crisis table" and announce it at the eng all hands