MySQL vs PostgreSQL

Currently MySQL is the only relational database supported for use with Open edX, but there are solid reasons to seriously consider also adding support for PostgreSQL. There are also very good reasons why this hasn't been done already. This page summarizes the arguments on both sides and aims to help determine if and when it becomes worthwhile to prioritize adding PostgreSQL support.

History

Why was MySQL chosen in the first place?

Almost exclusively because when edX was started back in 2011, AWS RDS supported MySQL but not PostgreSQL (PostgreSQL support was added in November 2013). The general consensus among the developers at the time was that PostgreSQL would have been a better long-term choice, but RDS offered vastly lower operational overhead than the other database hosting options available then. Using RDS freed up developer bandwidth to work on the new features needed in the startup phase, to ensure that edX would survive long enough for anyone to still care about the choice of database.

Why didn’t we switch to PostgreSQL once RDS offered it?

First, edX developers were too busy adding the aforementioned new features. Then in November 2014, AWS announced Aurora, with functionality even more appealing for operating a site at the scale edx.org was reaching. PostgreSQL support again lagged, not becoming available until October 2017.

So, why didn’t we switch once Aurora offered it?

By that point we had a pretty large database with six years of history for one of the highest-traffic educational websites in the world. The migration would not have been trivial, and the arguments for it boiled down to "we could stop wasting time on this assortment of MySQL bugs and quirks that we've mostly learned to work around and live with".

Why hasn’t anybody outside edX/2U added PostgreSQL support?

Most of the other development organizations in the Open edX ecosystem also have large legacy MySQL databases that would need to be migrated or maintained in parallel with newer PostgreSQL databases. Even more problematic, they would have a fairly foundational difference in their software stack from edx.org, the site driving most development on the code. That would leave them exposed to bugs from new commits that don't correctly account for the different database, and to having their bug reports de-prioritized as potentially caused by a custom choice of database that doesn't affect most site operators.

Why add PostgreSQL support?

At first the gaps in functionality and reliability between MySQL and PostgreSQL were perceived as annoying nuisances, but as Open edX and the sites running it have grown, so has the pain of dealing with MySQL's foibles. Some of the more notable ones:

Why not add PostgreSQL support?

As noted in the history section above, there are some pretty good reasons why PostgreSQL support hasn’t been added to Open edX yet despite its advantages.

  • Development effort opportunity cost. Although we primarily use database-agnostic Django functionality, we have hard-coded references to MySQL in devstack, Tutor, some migration steps with custom SQL, etc. We don’t yet have a solid grasp of the total scope of effort, and any development effort put into it will come at the expense of other valuable initiatives.

  • More maintenance overhead. Supporting two databases takes more work than supporting one, and it’s unlikely that all current Open edX sites would promptly switch from MySQL to PostgreSQL once the support is added.

  • Operational unknowns. We collectively have a lot of experience with running MySQL at scale and have tweaked various indexes and queries to accommodate MySQL (and occasionally Aurora) quirks. While it's clear that PostgreSQL can also scale, we may see performance regressions while migrating any large site.

After adding PostgreSQL support, should we drop MySQL support?

To allow for sane migration plans, we’d need at least some span of time when MySQL and PostgreSQL are both supported. But would we want to continue supporting both indefinitely, or drop MySQL support after allowing a reasonable amount of time for migration?

Reasons to drop MySQL support

  • Lower maintenance overhead. If we drop MySQL support, we get back to only needing to support a single database, reducing the long-term maintenance burden.

  • Freedom to use PostgreSQL-only features. Continuing to officially support MySQL would effectively prevent use of many of the nicer PostgreSQL-only features in the core Open edX code. We'd only be able to use them in optional features or extensions, if at all.

  • No need to work around MySQL quirks. Several prominent open source projects have declined to support MySQL despite code contributions to add it, because their experience has been that resolving bugs involving its use takes valuable development effort away from other high priority tasks (and is simply unpleasant to deal with). Having experienced many of these problems ourselves (as described above), we might well make the same decision once PostgreSQL is a viable option.

Reasons to keep MySQL support

  • Community pressure. A significant percentage of the Open edX community may be reluctant to migrate from MySQL to PostgreSQL, especially if they already have significant in-house MySQL experience or use other software which only works with MySQL. We’d have to poll the community to gauge sentiment regarding this.

  • Fewer blockers to adoption. Organizations that have already invested heavily in MySQL might be more reluctant to adopt Open edX if it only works with PostgreSQL.