Python 3 Conversion and Usage Tips

Design Principles

  1. Convert to bytes as close to the perimeter as possible.
    • For example, pass strings around within the code as they were before, and convert them to bytes only when serializing (e.g., persisting them to a CSV or sending them over the wire). Here's an example where I would suggest having send_response take a string instead of bytes: https://github.com/edx/edx-platform/blob/a9d4f3c972223b502ef1addd6621e31fa49e61e7/common/djangoapps/terrain/stubs/youtube.py#L112-L114
    • From OEP-7: Bytes should only be used when you can answer the question: “Do I need this specific sequence of bytes.” The most error-resistant way to achieve this is to use what is called a “unicode sandwich.” This means that as soon as you receive data from a file or network interface, it should be converted to text. Your code should then treat it as text for as long as possible, only encoding it back to bytes when sending it to an interface that requires bytes (such as a file, a network interface, or a bytes-oriented library).
  2. Business logic in the code should scream back at you. Keep the code readable, so that it is not swimming in details related to string conversions.
    • For example, use self.assertContains and self.assertNotContains instead of converting response.content from bytes. Reason: self.assertContains and self.assertNotContains handle the bytes/text conversion for us, and they read much better.
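
A minimal sketch of the unicode sandwich from principle 1, using only the standard library. The function names (read_names, greet, write_greetings) and the in-memory streams are illustrative assumptions, not code from edx-platform.

```python
import io


def read_names(byte_stream):
    """Decode at the input boundary: bytes in, text out."""
    return byte_stream.read().decode("utf-8").splitlines()


def greet(names):
    """Business logic works on text only; no encode/decode calls here."""
    return ["Hello, {}!".format(name) for name in names]


def write_greetings(greetings, byte_stream):
    """Encode at the output boundary: text in, bytes out."""
    byte_stream.write("\n".join(greetings).encode("utf-8"))


# io.BytesIO stands in for a file or socket opened in binary mode.
incoming = io.BytesIO("Zo\u00eb\nAmir".encode("utf-8"))
outgoing = io.BytesIO()
write_greetings(greet(read_names(incoming)), outgoing)
```

Only the two boundary functions mention an encoding, so the middle of the sandwich stays free of conversion details, which is what principle 2 asks for.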

Efficiency Tricks

  1. Use regex replacements whenever possible, and bulk-update the entire codebase. PR 21842 is an example that does this for all assertIn(..response.content) cases.
  2. Since test startup is slow, run and fix all tests within a file in bulk rather than one test at a time.
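
The regex approach in trick 1 could look roughly like the sketch below. The pattern and the rewrite helper are hypothetical and are not taken from PR 21842. The pattern only handles single-line calls whose first argument contains no commas or parentheses, so anything it rewrites still needs review.

```python
import re

# Hypothetical pattern: matches self.assertIn(<needle>, <resp>.content) and
# the assertNotIn variant, for simple single-line calls only.
ASSERT_IN = re.compile(
    r"self\.assert(?P<neg>Not)?In\(\s*(?P<needle>[^,()]+?)\s*,\s*"
    r"(?P<resp>\w+)\.content\s*\)"
)


def rewrite(source):
    """Rewrite assertIn/assertNotIn on response.content to Django's helpers."""
    def replace(match):
        name = "assertNotContains" if match.group("neg") else "assertContains"
        return "self.{}({}, {})".format(
            name, match.group("resp"), match.group("needle")
        )
    return ASSERT_IN.sub(replace, source)


before = (
    'self.assertIn("Welcome", response.content)\n'
    "self.assertNotIn(error_text, resp.content)\n"
)
after = rewrite(before)
```

One thing to watch after such a rewrite: Django's assertContains also checks that the response status code is 200 by default, so tests on non-200 responses need an explicit status_code argument.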

FAQ


1. What version of Python 3 should our code work on?

For web services, 3.5. For libraries, 3.5 and 3.6.

2. How do I indicate that a repo works with Python 3?

Make sure that your openedx.yaml file has an up-to-date oeps section that indicates that your repo is OEP-7 compliant. See here for an example.

3. Invalid syntax: ur"regex_string"

The ur raw-string prefix goes away in Python 3. You can achieve the same effect by wrapping the raw r"regex" string in six.text_type().

If other strings are interpolated into the pattern via format, be careful to escape the incoming strings as well. You can use re.escape to do this before interpolating them into your unicode regex string.
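
The ur replacement described above can be sketched as follows. The function org_number_pattern and its pattern are hypothetical, and the fallback to str is only there so the sketch runs on Python 3 when six is not installed.

```python
import re

try:
    from six import text_type  # unicode on Python 2, str on Python 3
except ImportError:
    text_type = str  # fallback so the sketch runs on Python 3 without six


def org_number_pattern(org):
    """Build a text regex that matches '<org>+<digits>' literally."""
    # Python 2 only spelling: ur"^{org}\+\d+$"
    template = text_type(r"^{org}\+\d+$")
    # re.escape keeps characters such as '.' in org from acting as wildcards.
    return re.compile(template.format(org=re.escape(org)))


pattern = org_number_pattern("edX.demo")
```

Without re.escape, the "." in "edX.demo" would match any character, so "edXxdemo+101" would also be accepted. Note that regex quantifiers written with braces, such as {2}, would collide with format placeholders and would need doubling.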


4. Sorted with cmp seems to be broken?

The cmp argument of sorted goes away in Python 3. You need to change how the sorting is done so that it produces a sane order using the key function instead.
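
A small sketch of moving from cmp to key, with made-up data. The second variant uses functools.cmp_to_key from the standard library, which this page does not mention; it is shown as a fallback for comparisons that are hard to express as a key.

```python
from functools import cmp_to_key

scores = [("carol", 82), ("alice", 95), ("bob", 82)]

# Python 2 only:
#   sorted(scores, cmp=lambda a, b: cmp(b[1], a[1]) or cmp(a[0], b[0]))

# Preferred: express the ordering as a key (highest score first, then name).
by_key = sorted(scores, key=lambda item: (-item[1], item[0]))


def compare(a, b):
    """Legacy-style comparison function: negative, zero, or positive."""
    # (x > y) - (x < y) replaces the removed cmp() builtin.
    by_score = (b[1] > a[1]) - (b[1] < a[1])
    return by_score or (a[0] > b[0]) - (a[0] < b[0])


# Fallback when the comparison is too involved to rewrite as a key.
by_cmp = sorted(scores, key=cmp_to_key(compare))
```

Both variants give the same order here. The key version is usually preferable because the key is computed once per element instead of once per comparison.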

5. response.content is now a bytestring

Option A (when using django.test): Convert any assertIn(..response.content) and assertNotIn(..response.content) test code to use assertContains and assertNotContains. See this PR.

Option B (if not using django.test): Convert the string being compared to a bytestring. Example PR: TODO
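
For Option B, a minimal sketch of the two ways to make the types agree on Python 3. The content value is a stand-in for response.content, and the UTF-8 encoding is an assumption about the response.

```python
# Stand-in for response.content from a test client; on Python 3 it is bytes.
content = "<h1>Bienvenue, Zo\u00eb</h1>".encode("utf-8")

# Variant 1: compare bytes with bytes.
assert b"<h1>" in content

# Variant 2: decode once, then compare text with text.
# Needed for non-ASCII expectations, since bytes literals are ASCII-only.
assert "Zo\u00eb" in content.decode("utf-8")

# Mixing the two types raises TypeError on Python 3.
try:
    "Bienvenue" in content
    mixed_types_allowed = True
except TypeError:
    mixed_types_allowed = False
```

Variant 2 follows the unicode sandwich more closely, since the assertion itself then works on text.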

6. Output of random is inconsistent across Python versions

Option A (when no compatibility issues): Use Python 3's random, since it produces a better output distribution. As explained in this Python issue, randomization output is not guaranteed to remain the same across Python versions. So our tests (and production code) should not assume consistent randomization, even with the same seed value.

Option B (when compatibility issues exist): If there are any data compatibility issues because legacy code assumes randomization consistency, use the random2 library in order to stick with Python 2's old implementation. This can happen if we store only a seed and recompute the state of an XBlock from that seed using random.choice/shuffle/etc., as happens with Capa problems. See this thread.
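
For Option A, a sketch of a seeded helper and of assertions that do not depend on the Python version. The function shuffled_choices and the seed value are hypothetical. Under Option B the import would presumably be swapped for random2, assuming it mirrors the standard random API.

```python
import random


def shuffled_choices(choices, seed):
    """Return a seeded shuffle without touching the global random state."""
    rng = random.Random(seed)  # local generator; seed is a hypothetical stored value
    result = list(choices)
    rng.shuffle(result)
    return result


choices = ["A", "B", "C", "D"]
first = shuffled_choices(choices, seed=42)
second = shuffled_choices(choices, seed=42)

# Safe to assert: repeatability within one interpreter, and the set of items.
assert first == second
assert sorted(first) == sorted(choices)

# Not safe to assert: one exact order for the seed, because that order
# may differ between Python versions.
```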