Data WG 2022-01-27 Meeting notes

Date

Jan 18, 2022

Participants

  • @Edward Zarecor

  • @Dave Ormsbee (Axim)

  • @Andy Shultz (Deactivated)

  • @Simon Chen

  • @Andrés González

  • @Maria Fernanda Magallanes Z

  • @Sofiane Bebert

Goals

  •  

Discussion topics

Time

Item

Presenter

Notes

Time

Item

Presenter

Notes

15 minutes

Review our Working group landing page and details like how we work, how we make decisions

@Edward Zarecor

  • Quick review of the Landing page.

  • Discussion of a shared working group calendar. tCRIL can provide Zoom service to all working groups. Should discuss with someone at OpenCraft to understand if there are any issues with changing the Zoom for BTR? @Xavier Antoviaque Should I discuss with anyone in particular?

20 minutes

Discuss platform database support stance.

@Dave Ormsbee (Axim)

We currently run MySQL 5.7 across our stack, though we’ve generally avoided assuming MySQL (tests run in SQLite).

Some recent discussions and developments w.r.t. our use of databases:

  • Tutor and MariaDB: There are no officially built MySQL 5.7 images for arm64, and amd64 emulation caused commands like tutor local init to take twice as long to run. As a result, Tutor now recommends using MariaDB 10.4 for M1 Macs.

  • Full text search: The idea of using in-database full text search has been floated as a long term replacement to Elasticsearch to help simplify the stack. It’s unclear whether the result quality makes this practical though, particularly with MySQL. At the moment, there is no vendor-neutral abstraction for this in the Django ORM.

  • Per-field collation: As of Django 3.2, we can specify collation at the field-level. This would allow us to more easily fine tune issues around case sensitivity (e.g. making a UsageKey field case sensitive, but leaving a title to use an accent-aware, case-insensitive collation). These settings are vendor-specific though, and not compatible across databases.

We’ve also had periodic conversations about PostgreSQL–it’s generally better regarded and better supported by Django than MySQL. PostgreSQL would have been our default if we were starting from scratch today, but migrating over would be non-trivial and would likely put us in a years-long transitional period where both databases would need to be supported.

Short term question: Does it make sense to push up the move to MySQL 8.0?

  • MySQL 5.7 isn’t end of life until Oct. 2023, but 8.0 is already over three years old. It features better Unicode support, window functions, and improved JSON field queries. Not to mention proper ARM64 builds.

  • Would we even consider MariaDB? (I’m guessing no, since Aurora tracks only MySQL and PostgreSQL.)

Longer term question: What is our stance on PostgreSQL support in general? I can see at least four plausible paths:

  1. Status quo: MySQL is official and well tested, but aim for PostgreSQL compatibility and accept patches to fix any issues.

  2. Officially support both databases, by increasing our testing with PostgreSQL.

  3. Lean into MySQL, by allowing more MySQL-specific features via libraries like django-mysql.

  4. Lean into PostgreSQL, with an eye towards long term migration.

It is also possible that we can fund improvements to Django’s MySQL support in order to address specific gaps.

Notes:

  • Dave Ormsbee presented the problems we are facing with MySQL collation outlined above.

  • Is anyone currently running the platform under PostgreSQL

    • @Igor Degtiarov mentioned that RaccoonGang have done this recently and was going to do some investigation of how hard/easy this was.

  • Would we consider upgrading the MySQL 5.8 right now?

  • Should we consider upgrading to MariaDB, it is being currently used on Macs for development

  • We need to answer and document our approach to multi-database support. We are currently implicitly cross db and behind the ORM, but

    • Is anyone actually using that feature?

    • What are we losing in terms of specialized DB features?

  • Key questions

  • Does MySQL 8 have features we really want or will we come to regret staying with MySQL in 2 years.

  • Blockstore team was also facing a MySQL issue.

    • We sometimes have columns we want insensitive, some we want sensitive

    • This already breaks cross database compatibility because we need to pass in the collation name to the field in the ORM

    • @Jill Vogel is investigating creating a “custom field” that auto-magically handles field level collation settings.

    • @Andy Shultz (Deactivated) proposes solving with a field for insensitive sort, a field for sensitive sort. A simple solution that makes the table a little wider, but would be cross platform.

      • Currently facing this in pipeline from Snowflake (sensitive) → MySQL (insensitive)

  • If we cannot/will not upgrade to MySQL 5.7 on the server side for Nutmeg – which we can’t – @Dave Ormsbee (Axim) recommends sticking with MariaDB in ARM64 dev environments as it “just works.”

 

 

Action items

Develop a solution for running MySQL 8 in Development environments
Could build ARM images ourselves ([Dave]: This is only relevant if we want to run MySQL 5.7 in dev envs–8.0 has pre-built images)
Could simply stick with MariaDB which seems an adequate drop-in replacement
Develop a way to run our migration tests against MySQL 5.7 and 8 simultaneously. This is very important for 2U
We should specify which named release we will upgrade to MySQL 8
Simon to connect with Jeremy on the timing of potential upgrade at 2U

Decisions

  1. Nutmeg: MySQL 5.7 on the server side, and ARM64 dev environments can use MariaDB 10.4
  2. Olive: Support both MySQL 5.7 and 8.0 on the server side, and ARM64 dev environments will move to MySQL 8.0
  3. Poplar: Support MySQL 8.0 only