Clean Code Notes - Run 2
- 1 August 29 - Chapter 1
- 2 September 5 - Chapters 2, 3
- 3 Chapter 4: Comments
- 4 Chapter 5: Formatting
- 5 Chapter 6: Objects and Data Structures
- 6 Chapter 7: Error Handling
- 7 Chapter 8: Boundaries
- 8 Chapter 9: Unit Tests
- 9 Chapter 10: Classes
- 10 Chapter 11: Systems
- 11 Chapter 12: Emergence
- 12 Chapter 13: Concurrency
- 13 Code Smells
- 14 Presentation
August 29 - Chapter 1
September 5 - Chapters 2, 3
Chapter 2 - Naming
Overall:
Names should be unambiguous and clear and should reveal their intent. Your reader should be able to assess the purpose of the named entity from the name alone.
Dos:
Names should not require mental mapping to extract meaning.
Class and object names should have noun or noun phrase names
Method names should have verb or verb phrase names.
Similar concepts should have names that are spelled similarly to aid code completion/discovery.
Attempt to make names that are spelled similarly easily distinguishable.
Make names distinguishable in their purpose.
Make names pronounceable.
Use names that are easily searchable, so no single-letter variables or numeric constants.
Pick one word per concept.
For example, don’t use fetch, retrieve, and get as method names unless they are semantically different.
Donts:
Don’t use abbreviations. Don’t use lower-case l or upper-case o as variable names.
Don’t use single letter variables unless they’re extremely limited in scope and purpose.”The length of a name should correspond to the size of its scope.”
For example, an index in a small loop.
Don’t be “disinformative”. Words have meaning, and readers will expect certain qualities or behaviors from your named entities based on their names.
For example, don’t refer to a collection of accounts as accountList unless it is actually a list. Author alludes to not encoding container type in names anyway.
Don’t use inconsistent spelling.
Do not use encodings.
When dealing with an interface and the implementing class(es), favor encoding the implementation over the interface, if you must.
Do not use member prefixes.
Specific Names to Avoid:
list series naming (a1, a2, … aN)
noise words (ProductInfo or ProductData; what purpose do “Info” or “Data” serve?)
"People are also afraid of renaming things for fear that some other developers will object. We do not share that fear and find that we are actually grateful when names change (for the better). Most of the time we don’t really memorize the names of classes and methods"
What is noise?
"Manager" in a name is useless in practice
Might sometimes be necessary to use?
Refactoring
Pycharm refactoring utilities - difficult to do well in python because there isn't static typing
If you are using an integer vs. a variable, use a variable if you're going to use it more than once
Consistency across codebase
No "magic numbers/strings"
Use something from the solution domain, not the problem domain
Opposite of domain-driven design book?
Clearly separate two types of code
Factory vs. factory implementation
-I prefix for interfaces vs. -Implementation prefix for implementation of interfaces
Interface prefix helps programmers look for subclasses
Implementation prefix conveys that you are by default interacting with the interface
Should just follow given conventions
Summary: names need to be clear and understandable
Chapter 3 - Functions
Small functions and stepdown rule make it easier to test code
Separating on level of abstraction different but helpful way to approach writing functions
Low-level implementation should be separated from higher-level functions
Functions vs. methods:
Python files encapsulate functionality, even if the functions in a python file are not all necessarily methods
Should function know about the outside world? (Requests, django settings, etc)
Nimisha: context is needed
Functions should have everything passed into them, and not mutate other objects
Keep number of arguments short vs. functions should have everything passed into them
If you keep a function very short, then this isn't a problem
Python: can use dictionaries
"Compositional functions" (that call many other methods/functions) might need many things passed into them, but lower level functions should be shorter and simpler
Does Python's looseness contribute to messier code?
Pep8 as a solution to this problem
Sometimes Java's rigidity prevents clean code practices
Higher-level functions that require a lot of information:
Functions that dictate what the user sees
to render a page, header, request, and template, with context, are needed
Courseware page - "monolith" page
Could refactor and make many components
Elm
Takes immutable model variable, one solution to passing in too many variables
React
"Smart components" vs "dumb components"
Dumb components only given what they need, unaware of anything else
Smart are state-aware, aware of their context
Attempt to use, as often as possible, dumb components
Javascript
pushes you into passing many parameters
Similar to Python with keyword arguments and kwargs?
A lot of edx-platform uses old, outdated javascript (backbone, coffeescript)
Hopefully frameworks like React help us enforce better design patterns
Adding another parameter into a function
Does that extra parameter introduce another "thing" that the function does?
If yes, should add another function
Boy scout rule ... 11 parameters is no good, should probably refactor
Active code reviews/PRs contribute to better code quality
Liked point about having a logical/natural ordering to arguments
assertEqual(expected, equal) example ... which order do the arguments go in?
Summary: keep functions short, do only 1 thing, and readable. Keep number of parameters to a function low (3 or fewer). Refactor code as you go along.
Chapter 4: Comments
Encapsulate the comment into the name of the function
encourages smaller logically separated functions
detours size creep
The problem with required docstrings
this can result in meaningless descriptions
generally, our linting at edX doesn't require docstrings
Radical thinking: comments are a failure
What's the level of comfort with refactoring?
Test coverage feels like a security blanket
Understanding of the architecture, system, file
Chapter 5: Formatting
edX has a lot of guidelines for formatting (e.g. edx-lint)
unfortunately, it isn't enforced on legacy code
edX has a different recommendation for docstrings (80) vs. code (120)
Ordering related functions
step down rule – helper functions come after the root
the opposite (helper functions first) is used in our codebase too
argument is that python encourages bottom up because main is defined at the end
the book recommends reading down from general to specific
the key is to think about your reader and how to best convey information
Homework:
Find code samples of step-down and bottom-up-down and discuss at the next meeting
Chapter 6: Objects and Data Structures
Distinction between objects and "bundles of data" hadn't really occurred to many people before
We're using the "hybrid" anti-pattern in many of our Django models (because Django kind of encourages it)
How relevant is this to Python, where "private" isn't a direct language construct?
But there is the leading underscore convention.
Which we should probably use more consistently
Law of Demeter / train wreck was deemed somewhat insightful
Maybe chaining is ok without mutation? (JS / jQuery)
IDEs arguably make this too easy
Makes refactoring difficult
Summary: Objects expose behavior and hide data; Data Structures expose data and have no significant behavior.
Chapter 7: Error Handling
Agreement that returning "None" should be avoided when practical
"Special case objects" as return values can in fact make code work more smoothly
It's sometimes good for exceptions to reach the top level; if they're handled too silently, we may not realize it's happening
Use HTTP error codes wisely
When to catch and translate an exception vs. passing it along?
If it's an expected condition, give it an appropriate exception and message
If it caught us by surprise, just pass it along to retain all useful detail for debugging?
Summary: Indicate errors by throwing exceptions rather than relying on special return types. Propagate exceptions to places that should know about them.
Chapter 8: Boundaries
Summary: Interfaces are cool. Use tests to lock down API behaviors before relying on the behaviors in production code. Use interfaces to mock out non-existing code.
Writing "learning" tests along the way, and they will eventually become your integration tests.
Adapter pattern - modify the API according to your needs.
Chapter 9: Unit Tests
TDD (test driven development):
edX platform might be too large/not condusive for TDD
Cycle of testing → writing production code → testing too long
Coverage
edX used to look at test coverage, but don't anymore
One assert per test method
Might be overkill, especially if high setup costs
Bok choy tests
Should not set up a large amount of test setup and only test one small change at the end
Should be feature tests, not unit tests
Too many testing frameworks makes it difficult to know where to start/what framework to use for each use case
Write tests as close to the code that they're testing
ex. Cover python code with python unit tests
Easy to get false positives if mocking too many functions
autospec
One school of thought: "You don't need to write tests to test private methods"
Difficult to know how "deep" to go into the code when testing - should you go into the private methods?
Does not make sense to have a separate test eng team that writes code
Are lettuce and bok choy tests needed?
Chapter 10: Classes
Many small classes (and in Java, files), vs. a few large ones with a bunch of things thrown into them.
Mixin hell, and relatedly, class-based views in Django. George does not recommend.
What is a mixin? No distinction vs. "real inheritance" from python language, but by convention, we treat it basically like interfaces with non-abstract functionality.
Dependency injection with python - py.test fixtures are a good example. Another example from George: https://github.com/edx/openedx-webhooks/blob/development/openedx_webhooks/github/dispatcher/actions/github_activity.py#L24
Dependency Inversion Principle - our classes should depend on abstractions, not concrete details.
George states that the rule of thumb in JS community is to keep inheritance heirarchies to depth <= 2. We mostly agree that this makes code easier to reason about.
Balance between ease-of-use and richness of features: too much magic makes it harder to debug and understand code.
OCP - Classes that are open for extension, but closed to modification. If we adopted this for openedx, wouldn't it lend itself to more and more subclasses, plugins?
Example: auto-complete libraries in JS - many library providers try to provide something that is responsible for behavior and rendering, which can become messy. Better to isolate behavior to library, and have rendering left to the user (or other library).
Balance of OCP with single-resp. principle.
Nimisha - An example of how we like to organize code is in grades:
CourseGradeFactory - responsible for creating domain objects (given some input data).
CourseGrade - A Domain object (that doesn't deal with backend/storage) that is responsible for business logic.
PersistentCourseGrade - A Django models.Model that is only responsible for storage and queries against the database.
We think it would be a good idea to do a group design kata together at the end of this book club.
George says we could also jump right into React and Redux.
Chapter 11: Systems
Beans and POJOs (Plain Old Java Objects)
Dependency Injection in Python
barrier entry of low since you can pass functions around
Duck typing
how does one specify the dependency requirements of input parameters in Python?
docstrings
annotation of methods using Pycontracts
Python 3.5 comes with typing
Main take away
Being aware of dependencies
Separation between layers
Distinction between core and the details at scale
Once you choose a framework, it's super-hard to switch over
No Big Design Up Front
Delayed decision making
With detailed upfront design, it's inevitable that you'll run into issues when you eventually get into implementation phase
Separating construction of the system from its use
For example, at initialization time, keep construction separate from wiring
Also, keep in mind Javascript since similar concepts apply
What's the mindset when stepping into an established system vs. design of a brand-new system?
With IDAs, or even djangoapps (or experimental addaptive learning systems), we're adding or modifying subsystems all of the time.
How do we do system-level architecture at edX?
Has evolved over time from Arch-council (few years ago) to now more team-specific (bit ad-hoc) designing.
Our cross-cutting efforts now focus more on techonological decisions/discussions - for example, django-upgrade, docker, FED-modernization, etc.
12-factor App: https://12factor.net/
Abstract architecture versus Reality architecture
sometimes it's easier said than done in the abstract
the book provides guidelines
Domain-specific language (DSL)
for example, Ansible has this
By having DSL, developers now need to learn a new language
More commonly used in the Ruby community - since the language is very flexible
Lettuce tests and RSpec are other examples
Over-designing examples
Shea stadium built too big
Parts of xBlock framework
Coupling of database model from object model in django
By having clients know that it's linked to a database, clients remember that they are using the database, so could potentially result in a more performant system.
On the other hand, by having an object model interface that is well optimized (via caching, optimized queries, etc), allows clients not to need to understand the details of your data model (including indices, etc) and how it should be used.
However, needs proper monitoring to verify that clients aren't calling the interface in an unintended way.
Chapter 12: Emergence
Easier to write code than read it
Similar to: easier to write shorthand notes on a lecture than to give a full lecture to someone
How much refactoring is too much?
For every few lines of code, stop and think if you can simplify
Too hard to refactor everything at once, should incrementally refactor
Early returns
Grouping early returns into another function and calling that function
Keep functions small, and then early returns are not a problem to have
Alex is a fan of the template method
Chapter 13: Concurrency
Myths and misconceptions
Seperate concurrent code from non-concurrent code
Celery makes this difficult
Celery tasks don't really do the same thing as our python code, so they won't tend to step on each other
Easy to make new processes in Python
Processes don't share memory so you don't run into Java-y problems
Follow functional programming paradigm
RQ for making processes in Python
Code Smells
Comments
C1 - C5
"Most comments are lies"
Easy to fix, prevalent in the codebase
Environment
E1 - Build requires more than one step
Should keep this in mind, because we don't do this often enough
Does this count as code? Is it worth bringing up?
Functions
F1 - Too many arguments
For python, too many positional arguments, but not too many keyword arguments
F3 - Flag arguments, G15 - Selector arguments
Keyword boolean arguments - ok
Positional boolean arguments - not ok
Also easy to fix
F4 - Dead function, G9 - Dead code, G12 - Clutter
Strongly agree with this one, should get rid of dead code whenever possible
Coverage for dead code?
General
G2 - (Non)Obvious behavior is unimplemented
Generally agree with G2, except don't implement things just because you think you might need them, even if you don't have a use case for them currently
G4 - Overwritten safeties
What is an example of overwritten safeties in our codebase?
except <Exception>
It is unclear what this means for us
G5 - Duplication/DRY
G6 (Code at wrong level of abstraction), G7 (Base classes depending on their derivatives), G8 (Too much information)
Yes, but maybe for later
G8 - Too much information
Edx-platform has a few cases where there are more than one way to do something, or very similar functions
Would this be easy to work on/educate people on?
"Poorly defined interface provides lots of functions that you must call" - easy
Example: In comment client, to find all the children of a thread, you must first find() the thread and then call retrieve with responses=True
Not a well-written rule, if we include this we should try to come up with some concrete examples
G10 - Vertical separation, G11 - Inconsistency
Nice for the future
G16 - Obscured intent
Important, but is this significant/prevalent in the codebase?
We already have a document for this
Also a little vague, we can probably wrap the ideas of this into some more precise rules
G19 - Use explanatory variables, G28 - Encapsulate conditionals, G29 - Avoid negative conditionals
Explain your code through variables, explain your code through conditionals
G28 is not done as regularly as it should be
PEP8 - What's the right way to wrap this really long line? Clean code
G20 - Function names should say what they do
G23 - Prefer polymorphism to if/else or switch/case
Important, but for later
G24 - Follow standard conventions
Is our style guide followed?
G25 - Replace magic numbers with named constants
Should do, but easily overlooked
Should we explore the different "meanings" of random numbers?
Not just a random integer, but any other magic value
G30 - Functions should do one thing, G34 - Functions should only descend one level of abstraction
What is "one thing"? Even in the "clean code" example, the function does more than one thing.
→ Modify this rule to be "functions should be smaller", one way of thinking about it is through G34
Pain of super long functions is better than arbitrarily short functions - it might be better to have overly short than overly long functions
There was once a linter enabled for too many lines in a function - where did that go?
G36 - Avoid transitive navigation
Python - does it encourage transitive navigation or discourage it?
Sometimes used in django model queries to get better performance at the expense of clean code, but it is all at the same level of abstraction
Java
J1 - Avoid long import lists by using wildcards
DEFINITELY NOT THIS
Names
N1 - Choose descriptive names
N1 - N7 all are standards for using good names for things
N5 - Use long names for long scopes
But also...use short names for small scopes
Tests
None for now.