Clean Code Notes - Run 2

August 29 - Chapter 1

September 5 - Chapters 2, 3

Chapter 2 - Naming

Overall:

  • Names should be unambiguous and clear and should reveal their intent. Your reader should be able to assess the purpose of the named entity from the name alone.

Dos:

  • Names should not require mental mapping to extract meaning.
  • Class and object names should have noun or noun phrase names
  • Method names should have verb or verb phrase names.
  • Similar concepts should have names that are spelled similarly to aid code completion/discovery.
  • Attempt to make names that are spelled similarly easily distinguishable.
  • Make names distinguishable in their purpose.
  • Make names pronounceable.
  • Use names that are easily searchable, so no single-letter variables or numeric constants.
  • Pick one word per concept.
    • For example, don’t use fetch, retrieve, and get as method names unless they are semantically different.

Donts:

  • Don’t use abbreviations. Don’t use lower-case l or upper-case o as variable names.
  • Don’t use single letter variables unless they’re extremely limited in scope and purpose.”The length of a name should correspond to the size of its scope.”
  • For example, an index in a small loop.
  • Don’t be “disinformative”. Words have meaning, and readers will expect certain qualities or behaviors from your named entities based on their names.
    • For example, don’t refer to a collection of accounts as accountList unless it is actually a list. Author alludes to not encoding container type in names anyway.
  • Don’t use inconsistent spelling.
  • Do not use encodings.
  • When dealing with an interface and the implementing class(es), favor encoding the implementation over the interface, if you must.
  • Do not use member prefixes.


Specific Names to Avoid:

  • list series naming (a1, a2, … aN)
  • noise words (ProductInfo or ProductData; what purpose do “Info” or “Data” serve?)


"People are also afraid of renaming things for fear that some other developers will object. We do not share that fear and find that we are actually grateful when names change (for the better). Most of the time we don’t really memorize the names of classes and methods"

What is noise?

  • "Manager" in a name is useless in practice
    • Might sometimes be necessary to use?

Refactoring

  • Pycharm refactoring utilities - difficult to do well in python because there isn't static typing

If you are using an integer vs. a variable, use a variable if you're going to use it more than once

  • Consistency across codebase
  • No "magic numbers/strings"

Use something from the solution domain, not the problem domain

  • Opposite of domain-driven design book?
  • Clearly separate two types of code
    • Factory vs. factory implementation

-I prefix for interfaces vs. -Implementation prefix for implementation of interfaces

  • Interface prefix helps programmers look for subclasses
  • Implementation prefix conveys that you are by default interacting with the interface
  • Should just follow given conventions

Summary: names need to be clear and understandable

Chapter 3 - Functions

Small functions and stepdown rule make it easier to test code

Separating on level of abstraction different but helpful way to approach writing functions

Low-level implementation should be separated from higher-level functions

Functions vs. methods:

  • Python files encapsulate functionality, even if the functions in a python file are not all necessarily methods
  • Should function know about the outside world? (Requests, django settings, etc)
    • Nimisha: context is needed
  • Functions should have everything passed into them, and not mutate other objects
  • Keep number of arguments short vs. functions should have everything passed into them
    • If you keep a function very short, then this isn't a problem
    • Python: can use dictionaries
    • "Compositional functions" (that call many other methods/functions) might need many things passed into them, but lower level functions should be shorter and simpler

Does Python's looseness contribute to messier code?

  • Pep8 as a solution to this problem
  • Sometimes Java's rigidity prevents clean code practices

Higher-level functions that require a lot of information:

  • Functions that dictate what the user sees
    • to render a page, header, request, and template, with context, are needed
  • Courseware page - "monolith" page
  • Could refactor and make many components
  • Elm
    • Takes immutable model variable, one solution to passing in too many variables
  • React
    • "Smart components" vs "dumb components"
      • Dumb components only given what they need, unaware of anything else
      • Smart are state-aware, aware of their context
    • Attempt to use, as often as possible, dumb components
  • Javascript
    • pushes you into passing many parameters
    • Similar to Python with keyword arguments and kwargs?
    • A lot of edx-platform uses old, outdated javascript (backbone, coffeescript)
    • Hopefully frameworks like React help us enforce better design patterns

Adding another parameter into a function

  • Does that extra parameter introduce another "thing" that the function does?
    • If yes, should add another function
  • Boy scout rule ... 11 parameters is no good, should probably refactor

Active code reviews/PRs contribute to better code quality

Liked point about having a logical/natural ordering to arguments

  • assertEqual(expected, equal) example ... which order do the arguments go in?

Summary: keep functions short, do only 1 thing, and readable. Keep number of parameters to a function low (3 or fewer). Refactor code as you go along.

Chapter 4: Comments

  • Encapsulate the comment into the name of the function
    • encourages smaller logically separated functions
    • detours size creep
  • The problem with required docstrings
    • this can result in meaningless descriptions
    • generally, our linting at edX doesn't require docstrings
  • Radical thinking: comments are a failure
  • What's the level of comfort with refactoring?
    • Test coverage feels like a security blanket
    • Understanding of the architecture, system, file

Chapter 5: Formatting

  • edX has a lot of guidelines for formatting (e.g. edx-lint)
    • unfortunately, it isn't enforced on legacy code
  • edX has a different recommendation for docstrings (80) vs. code (120)
  • Ordering related functions
    • step down rule – helper functions come after the root
    • the opposite (helper functions first) is used in our codebase too
      • argument is that python encourages bottom up because main is defined at the end
    • the book recommends reading down from general to specific
    • the key is to think about your reader and how to best convey information
  • Homework:
    • Find code samples of step-down and bottom-up-down and discuss at the next meeting

Chapter 6: Objects and Data Structures

  • Distinction between objects and "bundles of data" hadn't really occurred to many people before
  • We're using the "hybrid" anti-pattern in many of our Django models (because Django kind of encourages it)
  • How relevant is this to Python, where "private" isn't a direct language construct?
    • But there is the leading underscore convention.
    • Which we should probably use more consistently
  • Law of Demeter / train wreck was deemed somewhat insightful
    • Maybe chaining is ok without mutation? (JS / jQuery)
    • IDEs arguably make this too easy
    • Makes refactoring difficult

Summary: Objects expose behavior and hide data; Data Structures expose data and have no significant behavior.

Chapter 7: Error Handling

  • Agreement that returning "None" should be avoided when practical
  • "Special case objects" as return values can in fact make code work more smoothly
  • It's sometimes good for exceptions to reach the top level; if they're handled too silently, we may not realize it's happening
  • Use HTTP error codes wisely
  • When to catch and translate an exception vs. passing it along?
    • If it's an expected condition, give it an appropriate exception and message
    • If it caught us by surprise, just pass it along to retain all useful detail for debugging?

Summary: Indicate errors by throwing exceptions rather than relying on special return types. Propagate exceptions to places that should know about them.

Chapter 8: Boundaries

Summary: Interfaces are cool. Use tests to lock down API behaviors before relying on the behaviors in production code.  Use interfaces to mock out non-existing code.

  • Writing "learning" tests along the way, and they will eventually become your integration tests.
  • Adapter pattern - modify the API according to your needs.

Chapter 9: Unit Tests

  • TDD (test driven development):
    • edX platform might be too large/not condusive for TDD
    • Cycle of testing → writing production code → testing too long
  • Coverage
    • edX used to look at test coverage, but don't anymore
  • One assert per test method
    • Might be overkill, especially if high setup costs
  • Bok choy tests
    • Should not set up a large amount of test setup and only test one small change at the end
    • Should be feature tests, not unit tests
  • Too many testing frameworks makes it difficult to know where to start/what framework to use for each use case
  • Write tests as close to the code that they're testing
    • ex. Cover python code with python unit tests
  • Easy to get false positives if mocking too many functions
    • autospec
  • One school of thought: "You don't need to write tests to test private methods"
    • Difficult to know how "deep" to go into the code when testing - should you go into the private methods?
  • Does not make sense to have a separate test eng team that writes code
  • Are lettuce and bok choy tests needed?

Chapter 10: Classes

  • Many small classes (and in Java, files), vs. a few large ones with a bunch of things thrown into them.
  • Mixin hell, and relatedly, class-based views in Django.  George does not recommend.
    • What is a mixin?  No distinction vs. "real inheritance" from python language, but by convention, we treat it basically like interfaces with non-abstract functionality.
  • Dependency injection with python - py.test fixtures are a good example.  Another example from George: https://github.com/edx/openedx-webhooks/blob/development/openedx_webhooks/github/dispatcher/actions/github_activity.py#L24
  • Dependency Inversion Principle - our classes should depend on abstractions, not concrete details.
  • George states that the rule of thumb in JS community is to keep inheritance heirarchies to depth <= 2.  We mostly agree that this makes code easier to reason about.
  • Balance between ease-of-use and richness of features: too much magic makes it harder to debug and understand code.
  • OCP - Classes that are open for extension, but closed to modification.  If we adopted this for openedx, wouldn't it lend itself to more and more subclasses, plugins?
    • Example: auto-complete libraries in JS - many library providers try to provide something that is responsible for behavior and rendering, which can become messy.  Better to isolate behavior to library, and have rendering left to the user (or other library).
  • Balance of OCP with single-resp. principle.
  • Nimisha - An example of how we like to organize code is in grades:
    • CourseGradeFactory - responsible for creating domain objects (given some input data).
    • CourseGrade - A Domain object (that doesn't deal with backend/storage) that is responsible for business logic.
    • PersistentCourseGrade - A Django models.Model that is only responsible for storage and queries against the database.
  • We think it would be a good idea to do a group design kata together at the end of this book club.
    • George says we could also jump right into React and Redux.

Chapter 11: Systems

  • Beans and POJOs (Plain Old Java Objects)
  • Dependency Injection in Python
    • barrier entry of low since you can pass functions around
  • Duck typing
    • how does one specify the dependency requirements of input parameters in Python?
      • docstrings
      • annotation of methods using Pycontracts
      • Python 3.5 comes with typing
  • Main take away
    • Being aware of dependencies
    • Separation between layers
    • Distinction between core and the details at scale
      • Once you choose a framework, it's super-hard to switch over
    • No Big Design Up Front
      • Delayed decision making
    • With detailed upfront design, it's inevitable that you'll run into issues when you eventually get into implementation phase
    • Separating construction of the system from its use
      • For example, at initialization time, keep construction separate from wiring
  • Also, keep in mind Javascript since similar concepts apply
  • What's the mindset when stepping into an established system vs. design of a brand-new system?
    • With IDAs, or even djangoapps (or experimental addaptive learning systems), we're adding or modifying subsystems all of the time.
  • How do we do system-level architecture at edX?
    • Has evolved over time from Arch-council (few years ago) to now more team-specific (bit ad-hoc) designing.
    • Our cross-cutting efforts now focus more on techonological decisions/discussions - for example, django-upgrade, docker, FED-modernization, etc.
  • 12-factor App: https://12factor.net/
  • Abstract architecture versus Reality architecture
    • sometimes it's easier said than done in the abstract
    • the book provides guidelines
  • Domain-specific language (DSL)
    • for example, Ansible has this
    • By having DSL, developers now need to learn a new language
    • More commonly used in the Ruby community - since the language is very flexible
    • Lettuce tests and RSpec are other examples
  • Over-designing examples
    • Shea stadium built too big
    • Parts of xBlock framework
  • Coupling of database model from object model in django
    • By having clients know that it's linked to a database, clients remember that they are using the database, so could potentially result in a more performant system.
    • On the other hand, by having an object model interface that is well optimized (via caching, optimized queries, etc), allows clients not to need to understand the details of your data model (including indices, etc) and how it should be used.
    • However, needs proper monitoring to verify that clients aren't calling the interface in an unintended way.

Chapter 12: Emergence

  • Easier to write code than read it
    • Similar to: easier to write shorthand notes on a lecture than to give a full lecture to someone
  • How much refactoring is too much?
    • For every few lines of code, stop and think if you can simplify
    • Too hard to refactor everything at once, should incrementally refactor
  • Early returns
    • Grouping early returns into another function and calling that function
    • Keep functions small, and then early returns are not a problem to have
  • Alex is a fan of the template method

Chapter 13: Concurrency

  • Myths and misconceptions
  • Seperate concurrent code from non-concurrent code
    • Celery makes this difficult
    • Celery tasks don't really do the same thing as our python code, so they won't tend to step on each other 
  • Easy to make new processes in Python
    • Processes don't share memory so you don't run into Java-y problems
  • Follow functional programming paradigm
  • RQ for making processes in Python

Code Smells

  • Comments
    • C1 - C5
      • "Most comments are lies"
      • Easy to fix, prevalent in the codebase
  • Environment
    • E1 - Build requires more than one step
      • Should keep this in mind, because we don't do this often enough
      • Does this count as code? Is it worth bringing up?
  • Functions
    • F1 - Too many arguments
      • For python, too many positional arguments, but not too many keyword arguments
    • F3 - Flag arguments, G15 - Selector arguments
      • Keyword boolean arguments - ok
      • Positional boolean arguments - not ok
      • Also easy to fix
    • F4 - Dead function, G9 - Dead code, G12 - Clutter
      • Strongly agree with this one, should get rid of dead code whenever possible
      • Coverage for dead code?
  • General
    • G2 - (Non)Obvious behavior is unimplemented
      • Generally agree with G2, except don't implement things just because you think you might need them, even if you don't have a use case for them currently
    • G4 - Overwritten safeties
      • What is an example of overwritten safeties in our codebase?
        • except <Exception>
      • It is unclear what this means for us
    • G5 - Duplication/DRY 
    • G6 (Code at wrong level of abstraction), G7 (Base classes depending on their derivatives), G8 (Too much information)
      • Yes, but maybe for later
    • G8 - Too much information
      • Edx-platform has a few cases where there are more than one way to do something, or very similar functions
      • Would this be easy to work on/educate people on?
        • "Poorly defined interface provides lots of functions that you must call" - easy
          • Example: In comment client, to find all the children of a thread, you must first find() the thread and then call retrieve with responses=True
      • Not a well-written rule, if we include this we should try to come up with some concrete examples
    • G10 - Vertical separation, G11 - Inconsistency
      • Nice for the future
    • G16 - Obscured intent
      • Important, but is this significant/prevalent in the codebase?
      • We already have a document for this
      • Also a little vague, we can probably wrap the ideas of this into some more precise rules
    • G19 - Use explanatory variables, G28 - Encapsulate conditionals, G29 - Avoid negative conditionals
      • Explain your code through variables, explain your code through conditionals
      • G28 is not done as regularly as it should be
      • PEP8 - What's the right way to wrap this really long line? Clean code
    • G20 - Function names should say what they do
    • G23 - Prefer polymorphism to if/else or switch/case
      • Important, but for later
    • G24 - Follow standard conventions
      • Is our style guide followed?
    • G25 - Replace magic numbers with named constants
      • Should do, but easily overlooked
      • Should we explore the different "meanings" of random numbers?
        • Not just a random integer, but any other magic value
    • G30 - Functions should do one thing, G34 - Functions should only descend one level of abstraction
      • What is "one thing"? Even in the "clean code" example, the function does more than one thing.
      • → Modify this rule to be "functions should be smaller", one way of thinking about it is through G34
      • Pain of super long functions is better than arbitrarily short functions - it might be better to have overly short than overly long functions
      • There was once a linter enabled for too many lines in a function - where did that go?
    • G36 - Avoid transitive navigation
      • Python - does it encourage transitive navigation or discourage it?
      • Sometimes used in django model queries to get better performance at the expense of clean code, but it is all at the same level of abstraction
  • Java
    • J1 - Avoid long import lists by using wildcards
      • DEFINITELY NOT THIS
  • Names
    • N1 - Choose descriptive names
      • N1 - N7 all are standards for using good names for things
    • N5 - Use long names for long scopes
      • But also...use short names for small scopes
  • Tests
    • None for now.
  • Not covered in this chapter, but we've discussed in this book club:
    • Side-effect free functions
    • Boy Scout Rule
    • Step-down/Step-up rule → relates to G30, G34, G6
    • TODO comments (including ticket #, expiration date)
    • Controversial
      • Returning None versus Raising an Exception
      • Early Returns

Presentation

  • Decisions:
    • Combo mock trial and presentation? Just presentation? Just mock trial?
    • Weekly "Clean Code" email?  "Clean Code Moment" at Eng meetings?  defer this decision
  • Pick a time: December 20th, 12 PM December 19th, 10am-11:30am

Present (30 min)

Goals for having Clean Code Best Practices

  • Time to Value 
    • Efficient PR reviews - alignment, common terminology, reference best practices document
    • More maintainable code
    • More readable code
    • More extensible code
  • Open source support
    • Efficient PR reviews
    • Consistency

Best Practices presentation

Code smells presented as recommendations at the beginning of the meeting:

Mock Trials (15 min each, total 30 min)

Goals

  1. To remind folks that things aren't always clear-cut and it's important for folks to think for themselves as each context is different - even though there's a well-respected published book on this topic.
  2. To remind folks that sometimes things can get subjective and be conscious of our own righteous/religious POVs when reviewing people's code.
  3. To have fun as we explore these topics further in edX engineering with light-hearted debates.

Roles

Trials

(Defendants: Uncle Bob's position on these)