Hackathon XXVI: django-wiki: DEPR or fix?

Summary

Many years ago, we forked django-wiki and diverged from the upstream repo, so the course wiki feature no longer benefits from upstream security updates and new features. This project looks at our options for improving the wiki situation, particularly deprecation and getting back onto the upstream repo.

TL;DR

While not a critical feature of most courses, wikis are present in enough courses to make removing the feature problematic. There are about 10 material changes in our fork vs. upstream, several of which would likely be welcomed upstream if submitted as PRs. So while not trivial, a switch back to an upstream release seems feasible if some effort is dedicated to working with the upstream maintainer on it over a period of time.

Current Usage

The first question to answer: how much is the course wiki feature used?

Questions To Answer

  1. What’s the current usage of the django-wiki course wiki feature for prod-edx and prod-edge?

    1. How many courses have it activated?

    2. How many pages/content have been generated over time?

      • By course team? By learners?

    3. How many active learners are accessing the generated wiki content over time?

Answers

Technique

I first extracted all the wiki slugs and their associated course keys from the read replica's MongoDB collection which backs the Split Modulestore. I saved them to a JSON object using this JS (and some minor hand-edits to the output):

    print("{\"all_wiki_slugs\": [ {}\n");
    db.modulestore.active_versions.find(
        { "search_targets.wiki_slug": { $exists: true } },
        { org: 1, course: 1, run: 1, "search_targets.wiki_slug": 1, _id: 0 }
    ).forEach(
        function(obj) {
            print(",");
            printjson(obj);
        }
    );
    print("\n]}");

I then wrote a Python script which saved the modulestore wiki slug data to Snowflake for SQL querying, which was the quickest way to get answers. The script:
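The script itself is not reproduced on this page. As a rough sketch only, the transform step might look like the following. It assumes the export has the shape produced by the JS above (a leading empty `{}` placeholder followed by one object per course run) and that the target table has `env` and `slug` columns, as the queries further down suggest. The function names, the `course_key` column, the `course-v1:` key format, and the CSV hand-off are assumptions for illustration; the actual load into Snowflake is left out.

```python
import csv
import io
import json


def wiki_slug_rows(export_json, env):
    """Yield one (env, course_key, slug) row per course run in the export.

    Hypothetical helper: the row layout is an assumption, not the original script.
    """
    data = json.loads(export_json)
    for entry in data["all_wiki_slugs"]:
        if not entry:
            # Skip the empty {} placeholder that the export prints first.
            continue
        # Assumed key format; extra keys in entry are ignored by format().
        course_key = "course-v1:{org}+{course}+{run}".format(**entry)
        slug = entry.get("search_targets", {}).get("wiki_slug") or ""
        yield (env, course_key, slug)


def rows_to_csv(rows):
    """Serialize rows to CSV text, e.g. for a bulk load into a warehouse table."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["env", "course_key", "slug"])
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample input in the same shape as the export.
sample_export = json.dumps({
    "all_wiki_slugs": [
        {},
        {"org": "DemoX", "course": "D101", "run": "2020",
         "search_targets": {"wiki_slug": "DemoX.D101.2020"}},
    ]
})
sample_rows = list(wiki_slug_rows(sample_export, "prod-edx"))
sample_csv = rows_to_csv(sample_rows)
```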

Facts: prod-edx

There are 3,658 root course wiki pages in the main site. Root course wiki pages are always unique and represented by a wiki slug.

    select count(*) from wiki_urlpath where site_id=1 and level=1;
    -- +----------+
    -- | count(*) |
    -- +----------+
    -- |     3658 |
    -- +----------+

Many course runs share the same top-level course wiki: 2,783 wiki slugs are referenced by more than one course run.

    WITH shared_wiki_per_run as (
      SELECT slug, count(*) AS cnt
      FROM USER_DATA.JESKEW.COURSERUN_WIKI_SLUGS
      WHERE env='prod-edx' AND slug!=''
      GROUP BY 1
    )
    SELECT COUNT(*) FROM shared_wiki_per_run WHERE cnt > 1;
    -- +----------+
    -- | COUNT(*) |
    -- |----------|
    -- |     2783 |
    -- +----------+

Here are the top-level wikis with the most course runs pointing to them:

    SELECT slug, count(*)
    FROM USER_DATA.JESKEW.COURSERUN_WIKI_SLUGS
    WHERE env='prod-edx' AND slug!=''
    GROUP BY 1
    ORDER BY 2 DESC LIMIT 20;
    -- +--------------------------------------------+----------+
    -- | SLUG                                       | COUNT(*) |
    -- |--------------------------------------------+----------|
    -- | TUDelft.ProfEd.2016_v0                     |      124 |
    -- | NYIF.FIPR0410x.2017                        |       52 |
    -- | EPFLx.templateEN.0000                      |       50 |
    -- | HarvardMedGlobalAcademy.B101.2T2017        |       50 |
    -- | NYIF.CL-RISK1001x.2016                     |       49 |
    -- | GTx.CS1301x.1T2017                         |       39 |
    -- | MITx.7.00x.3T2017                          |       39 |
    -- | GTx.MGT6203x.1T2018                        |       38 |
    -- | UQx.CORPINN1x.1T2018                       |       36 |
    -- | Tecnologico_de_Monterrey.LREE1I01x.2017_T1 |       32 |
    -- | RITx.GAME103x.2T2017                       |       31 |
    -- | IDBx.IDB6x.2T2017                          |       29 |
    -- | BigDataUniversity.BD000EN.2016             |       25 |
    -- | GTx.ISYE8803x.2T2017                       |       25 |
    -- | UQx.BUSLEAD5x.3T2018                       |       23 |
    -- | USMx.AFM602x.2T2017                        |       23 |
    -- | HarvardX.HKS101A.1T2017                    |       22 |
    -- | Wellesley.ItalianOnline.Summer_2015        |       22 |
    -- | UWX.Networking1.2018_3                     |       22 |
    -- | RWTHx.foundationsofentrepreneurship.wise16 |       21 |
    -- +--------------------------------------------+----------+

Across all course runs, 7,079 distinct top-level wiki slugs are referenced.

    SELECT count(distinct(slug))
    FROM USER_DATA.JESKEW.COURSERUN_WIKI_SLUGS
    WHERE env='prod-edx' AND slug!='';
    -- +-----------------------+
    -- | COUNT(DISTINCT(SLUG)) |
    -- |-----------------------|
    -- |                  7079 |
    -- +-----------------------+

There are 16,934 course runs in the modulestore; 8,195 of those point to existing top-level wikis.

    SELECT COUNT(*)
    FROM USER_DATA.JESKEW.COURSERUN_WIKI_SLUGS
    WHERE env='prod-edx';
    -- +----------+
    -- | COUNT(*) |
    -- |----------|
    -- |    16934 |
    -- +----------+

    WITH all_slugs AS (
      SELECT slug FROM PROD.LMS.WIKI_URLPATH WHERE site_id=1 and level=1
    )
    SELECT COUNT(*)
    FROM USER_DATA.JESKEW.COURSERUN_WIKI_SLUGS
    WHERE env='prod-edx' AND slug in (SELECT slug from all_slugs);
    -- +----------+
    -- | COUNT(*) |
    -- |----------|
    -- |     8195 |
    -- +----------+
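To put those two counts in proportion, the arithmetic can be sketched as below, using the figures from the query output above. The variable names are illustrative only.

```python
# Counts copied from the query output above (prod-edx).
total_course_runs = 16934        # all course runs in the modulestore export
runs_with_existing_wiki = 8195   # runs whose slug matches an existing root wiki page

share = runs_with_existing_wiki / total_course_runs
share_percent = round(share * 100, 1)
print(f"{share_percent}% of course runs point to an existing top-level wiki")
```

In other words, a little under half of all course runs reference a wiki that actually exists.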

Wiki editing was high when the platform launched, particularly relative to the number of courses available at the time. Wiki article edits have dropped off significantly since then.

    SELECT year(modified), month(modified), count(*)
    FROM PROD.LMS.WIKI_ARTICLEREVISION
    WHERE modified >= '2012-01-01 00:00:00'
    GROUP BY 1, 2
    ORDER BY 1, 2;
    -- +----------------+-----------------+----------+
    -- | YEAR(MODIFIED) | MONTH(MODIFIED) | COUNT(*) |
    -- |----------------+-----------------+----------|
    -- |           2012 |               8 |      129 |
    -- |           2012 |               9 |      618 |
    -- |           2012 |              10 |     6169 |
    -- |           2012 |              11 |     1844 |
    -- |           2012 |              12 |      889 |
    -- |           2013 |               1 |     1495 |
    -- |           2013 |               2 |     1968 |
    -- |           2013 |               3 |     1652 |
    -- |           2013 |               4 |      550 |
    -- |           2013 |               5 |      773 |
    -- |           2013 |               6 |      422 |
    -- |           2013 |               7 |     3184 |
    -- |           2013 |               8 |      903 |
    -- |           2013 |               9 |     2198 |
    -- |           2013 |              10 |     3522 |
    -- |           2013 |              11 |     2000 |
    -- |           2013 |              12 |      655 |
    -- |           2014 |               1 |      721 |
    -- |           2014 |               2 |     2838 |
    -- |           2014 |               3 |     4988 |
    -- |           2014 |               4 |      760 |
    -- |           2014 |               5 |      846 |
    -- |           2014 |               6 |      976 |
    -- |           2014 |               7 |      380 |
    -- |           2014 |               8 |      303 |
    -- |           2014 |               9 |      743 |
    -- |           2014 |              10 |     2878 |
    -- |           2014 |              11 |      890 |
    -- |           2014 |              12 |      413 |
    -- |           2015 |               1 |      750 |
    -- |           2015 |               2 |     1211 |
    -- |           2015 |               3 |     1171 |
    -- |           2015 |               4 |     1603 |
    -- |           2015 |               5 |     2564 |
    -- |           2015 |               6 |     4221 |
    -- |           2015 |               7 |      693 |
    -- |           2015 |               8 |      405 |
    -- |           2015 |               9 |      606 |
    -- |           2015 |              10 |     1549 |
    -- |           2015 |              11 |     1974 |
    -- |           2015 |              12 |      510 |
    -- |           2016 |               1 |      435 |
    -- |           2016 |               2 |      792 |
    -- |           2016 |               3 |     1078 |
    -- |           2016 |               4 |     1171 |
    -- |           2016 |               5 |      830 |
    -- |           2016 |               6 |      670 |
    -- |           2016 |               7 |      608 |
    -- |           2016 |               8 |      553 |
    -- |           2016 |               9 |      500 |
    -- |           2016 |              10 |      446 |
    -- |           2016 |              11 |      306 |
    -- |           2016 |              12 |      352 |
    -- |           2017 |               1 |      647 |
    -- |           2017 |               2 |      389 |
    -- |           2017 |               3 |      395 |
    -- |           2017 |               4 |      357 |
    -- |           2017 |               5 |      796 |
    -- |           2017 |               6 |      891 |
    -- |           2017 |               7 |      656 |
    -- |           2017 |               8 |      502 |
    -- |           2017 |               9 |      767 |
    -- |           2017 |              10 |     1002 |
    -- |           2017 |              11 |      636 |
    -- |           2017 |              12 |      452 |
    -- |           2018 |               1 |      770 |
    -- |           2018 |               2 |      509 |
    -- |           2018 |               3 |      605 |
    -- |           2018 |               4 |      556 |
    -- |           2018 |               5 |      470 |
    -- |           2018 |               6 |      603 |
    -- |           2018 |               7 |      979 |
    -- |           2018 |               8 |      556 |
    -- |           2018 |               9 |      799 |
    -- |           2018 |              10 |      763 |
    -- |           2018 |              11 |      323 |
    -- |           2018 |              12 |      271 |
    -- |           2019 |               1 |      282 |
    -- |           2019 |               2 |      390 |
    -- |           2019 |               3 |      110 |
    -- |           2019 |               4 |      162 |
    -- |           2019 |               5 |      120 |
    -- |           2019 |               6 |      107 |
    -- |           2019 |               7 |      149 |
    -- |           2019 |               8 |      354 |
    -- |           2019 |               9 |      127 |
    -- |           2019 |              10 |      142 |
    -- |           2019 |              11 |      116 |
    -- |           2019 |              12 |      217 |
    -- |           2020 |               1 |      366 |
    -- |           2020 |               2 |      398 |
    -- |           2020 |               3 |      823 |
    -- |           2020 |               4 |     1278 |
    -- |           2020 |               5 |      709 |
    -- |           2020 |               6 |      785 |
    -- |           2020 |               7 |      624 |
    -- |           2020 |               8 |      589 |
    -- |           2020 |               9 |      438 |
    -- |           2020 |              10 |      272 |
    -- |           2020 |              11 |      178 |
    -- |           2020 |              12 |      151 |
    -- |           2021 |               1 |      247 |
    -- |           2021 |               2 |      243 |
    -- |           2021 |               3 |      283 |
    -- |           2021 |               4 |      149 |
    -- |           2021 |               5 |      309 |
    -- |           2021 |               6 |      215 |
    -- |           2021 |               7 |      256 |
    -- |           2021 |               8 |      100 |
    -- +----------------+-----------------+----------+
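One way to make the decline easier to see is to roll the monthly counts up into yearly totals. A minimal sketch, assuming the query result has been fetched as `(year, month, count)` tuples; the `yearly_totals` helper is hypothetical, and only the 2012 and 2021 rows from the table above are included as sample input.

```python
from collections import defaultdict


def yearly_totals(monthly_rows):
    """Sum (year, month, count) rows into a {year: total} dict."""
    totals = defaultdict(int)
    for year, _month, count in monthly_rows:
        totals[year] += count
    return dict(totals)


# Sample rows copied from the table above: Aug-Dec 2012 and Jan-Aug 2021.
rows = [
    (2012, 8, 129), (2012, 9, 618), (2012, 10, 6169), (2012, 11, 1844), (2012, 12, 889),
    (2021, 1, 247), (2021, 2, 243), (2021, 3, 283), (2021, 4, 149),
    (2021, 5, 309), (2021, 6, 215), (2021, 7, 256), (2021, 8, 100),
]
totals = yearly_totals(rows)
```

Both sample years are partial (five and eight months respectively), so per-month averages are a fairer comparison than the raw totals.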

The course wikis receive very little traffic compared to the rest of the site:

Splunk Report: Course wiki access over time

Fix: Getting Back to Upstream Repo

How active is the upstream repository?

It is still the most widely used Django-based wiki package listed on https://djangopackages.org/, and is one of only two that are still actively maintained: https://djangopackages.org/grids/g/wikis/. It added support for Python 3.9 and Django 3.2 four months ago, which we still need to do for our fork. It has merged 32 PRs in the past year, from 10 different developers.

How many changes are in our fork that don’t have equivalents upstream?

No obvious equivalent upstream; many of these are probably worth submitting as upstream PRs:

Probably upstream, but needs verification:

Other points that need care:

  • Our migration history is different from upstream. If we move back to an upstream release, we’d need to do something like add one last custom migration that brings our schema to a point in upstream’s migration history, and then edit the migration history table to match.

  • See https://github.com/edx/edx-platform/pull/6019 for a much earlier attempt at a similar update that was ultimately abandoned, but raised some good points to consider
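The history edit described in the first bullet can be illustrated on a throwaway SQLite database. This is only a sketch: `django_migrations`, with `app`, `name`, and `applied` columns, is the table Django uses to record applied migrations, but every migration name below is made up, and a real switch would need the schema itself to match upstream before the history is rewritten.

```python
import sqlite3
from datetime import datetime, timezone


def rewrite_wiki_history(conn, upstream_names):
    """Replace the recorded 'wiki' migrations with the given upstream names."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # commit on success, roll back on error
        conn.execute("DELETE FROM django_migrations WHERE app = 'wiki'")
        conn.executemany(
            "INSERT INTO django_migrations (app, name, applied) VALUES ('wiki', ?, ?)",
            [(name, now) for name in upstream_names],
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE django_migrations ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, app TEXT, name TEXT, applied TEXT)"
)
# Hypothetical fork history, including the "one last custom migration".
conn.executemany(
    "INSERT INTO django_migrations (app, name, applied) VALUES (?, ?, ?)",
    [
        ("wiki", "0001_fork_initial", "2014-01-01"),
        ("wiki", "0002_fork_align_with_upstream", "2021-08-01"),
        ("auth", "0001_initial", "2014-01-01"),
    ],
)
# Hypothetical upstream migration names that the schema is assumed to match.
rewrite_wiki_history(conn, ["0001_initial", "0002_upstream_example"])
recorded = conn.execute(
    "SELECT app, name FROM django_migrations ORDER BY app, name"
).fetchall()
```

In practice, Django's own `migrate --fake` option records migrations as applied without running them, and may be a safer route than editing the table by hand.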

Note that we made our fork over 8 years ago, so upstream has made many changes in the meantime that we’d need to review, probably meriting somewhat careful a11y and security reviews at the very least.

Next Steps

@Jeremy Bowman will put together a blended project brief to determine if submitting our modifications upstream and getting back onto an official django-wiki release could work as a blended development project. This would save us a nontrivial amount of effort on future Django upgrades, etc. in addition to gaining many improvements made to the upstream package in the past 8 years.

Future Ideas

For future hackathon projects, here are two ideas about which @Marco Morales is hopeful:

Community Response

  • We could integrate django-wiki to power a special response type inside Open edX discussions.

  • Specifically, we'd add a new type of response: the community response.

  • This response would allow all members of the community to collaborate on a single answer.

  • In particular, this response type could be used on question post types in the discussion forum.

  • Piazza (a platform for instructors to efficiently manage class discussions) has actually added this feature.

Resource Pages

  • Perhaps django-wiki could power a new type of post in our discussion forums called a resource.

  • These resource pages would be indexed and searchable in our discussions experience, but would operate as a wiki with a full-page editable interaction mode.

Both of these special post types (community response / resource page) could be an interesting way to combine wiki behavior in a discussion forum.