Summary

Many years ago, we forked django-wiki and diverged from the upstream repo, so the course wiki feature no longer benefits from upstream security updates and new features. This project looks at our options for improving the wiki situation, particularly deprecating the feature or getting back onto the upstream repo.

TL;DR

While not a critical feature of most courses, wikis are present in enough courses to make removing the feature problematic. There are about 10 material changes in our fork vs. upstream, several of which would likely be welcomed upstream if submitted as PRs. So while not trivial, a switch back to an upstream release seems feasible if some effort is dedicated to working with the upstream maintainer on it over a period of time.

Current Usage

The first question to answer: how much is the course wiki feature used?

Questions To Answer

  1. What’s the current usage of the django-wiki course wiki feature for prod-edx and prod-edge?

    1. How many courses have it activated?

    2. How many pages/content have been generated over time?

      • By course team? By learners?

    3. How many active learners are accessing the generated wiki content over time?

Answers

Technique

I first extracted all the wiki slugs and their associated course keys from the read replica's MongoDB collection which backs the Split Modulestore. I saved them to a JSON object using this JS (and some minor hand-edits to the output):

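// Emit a JSON object wrapping an array; the leading empty {} lets every
// real entry below be prefixed with a comma without special-casing the first.
print("{\"all_wiki_slugs\": [ {}\n");
db.modulestore.active_versions.find(
        // Only course runs that define a wiki slug.
        { "search_targets.wiki_slug": { $exists: true } },
        // Project just the course key parts and the slug.
        {org: 1, course: 1, run: 1, "search_targets.wiki_slug": 1, _id: 0 }
    ).forEach(
        function(obj) {
            print(",");
            printjson(obj);
        }
    );
print("\n]}");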

I then wrote a Python script which saved the modulestore wiki slug data to Snowflake for SQL querying, which was the quickest way to get answers. The script:
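The original script is not reproduced on this page. As a rough sketch of the flattening step only, and not the author's actual code, the JSON produced above could be turned into one row per course run like this. The function name `flatten_wiki_slugs` and the row layout are assumptions for illustration; the `env` and `slug` names mirror the columns used in the SQL queries below, and the Snowflake load itself is left out.

```python
import json


def flatten_wiki_slugs(raw_json, env):
    """Return one dict per course run from the exported modulestore JSON.

    Hypothetical helper: the row layout is an assumption, chosen to match
    the env/slug columns queried later on this page.
    """
    doc = json.loads(raw_json)
    rows = []
    for entry in doc["all_wiki_slugs"]:
        if not entry:
            # Skip the empty {} placeholder emitted at the start of the array.
            continue
        rows.append({
            "env": env,
            "org": entry["org"],
            "course": entry["course"],
            "run": entry["run"],
            # A slug can be present but empty; keep it so it can be filtered in SQL.
            "slug": entry.get("search_targets", {}).get("wiki_slug", ""),
        })
    return rows
```

Each returned dict can then be written to whatever table the analysis uses; empty slugs are kept on purpose, since the queries below filter them with `slug!=''`.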

Facts: prod-edx

There are 3,658 root course wiki pages in the main site. Root course wiki pages are always unique and represented by a wiki slug.

select count(*) from wiki_urlpath where site_id=1 and level=1;
-- +----------+
-- | count(*) |
-- +----------+
-- |     3658 |
-- +----------+

Many course runs share the same top-level course wiki: 2,783 wiki slugs are each referenced by more than one course run.

WITH shared_wiki_per_run as (
    SELECT slug, count(*) AS cnt
        FROM USER_DATA.JESKEW.COURSERUN_WIKI_SLUGS
        WHERE env='prod-edx' AND slug!='' GROUP BY 1
)
SELECT COUNT(*) FROM shared_wiki_per_run WHERE cnt > 1;
-- +----------+
-- | COUNT(*) |
-- |----------|
-- |     2783 |
-- +----------+

Here are the top-level wikis with the most course runs pointing to them:

SELECT slug, count(*) 
FROM USER_DATA.JESKEW.COURSERUN_WIKI_SLUGS 
WHERE env='prod-edx' AND slug!='' 
GROUP BY 1 ORDER BY 2 DESC LIMIT 20;

-- +--------------------------------------------+----------+
-- | SLUG                                       | COUNT(*) |
-- |--------------------------------------------+----------|
-- | TUDelft.ProfEd.2016_v0                     |      124 |
-- | NYIF.FIPR0410x.2017                        |       52 |
-- | EPFLx.templateEN.0000                      |       50 |
-- | HarvardMedGlobalAcademy.B101.2T2017        |       50 |
-- | NYIF.CL-RISK1001x.2016                     |       49 |
-- | GTx.CS1301x.1T2017                         |       39 |
-- | MITx.7.00x.3T2017                          |       39 |
-- | GTx.MGT6203x.1T2018                        |       38 |
-- | UQx.CORPINN1x.1T2018                       |       36 |
-- | Tecnologico_de_Monterrey.LREE1I01x.2017_T1 |       32 |
-- | RITx.GAME103x.2T2017                       |       31 |
-- | IDBx.IDB6x.2T2017                          |       29 |
-- | BigDataUniversity.BD000EN.2016             |       25 |
-- | GTx.ISYE8803x.2T2017                       |       25 |
-- | UQx.BUSLEAD5x.3T2018                       |       23 |
-- | USMx.AFM602x.2T2017                        |       23 |
-- | HarvardX.HKS101A.1T2017                    |       22 |
-- | Wellesley.ItalianOnline.Summer_2015        |       22 |
-- | UWX.Networking1.2018_3                     |       22 |
-- | RWTHx.foundationsofentrepreneurship.wise16 |       21 |
-- +--------------------------------------------+----------+

Across all course runs, 7,079 distinct top-level wiki slugs are referenced.

SELECT count(distinct(slug)) 
FROM USER_DATA.JESKEW.COURSERUN_WIKI_SLUGS 
WHERE env='prod-edx' AND slug!='';
-- +-----------------------+
-- | COUNT(DISTINCT(SLUG)) |
-- |-----------------------|
-- |                  7079 |
-- +-----------------------+

There are 16,934 course runs in the modulestore; 8,195 of those course runs point to existing top-level wikis.

SELECT COUNT(*) FROM USER_DATA.JESKEW.COURSERUN_WIKI_SLUGS WHERE env='prod-edx';
-- +----------+
-- | COUNT(*) |
-- |----------|
-- |    16934 |
-- +----------+

WITH all_slugs AS (
    SELECT slug
        FROM PROD.LMS.WIKI_URLPATH
        WHERE site_id=1 and level=1
)
SELECT COUNT(*)
    FROM USER_DATA.JESKEW.COURSERUN_WIKI_SLUGS
    WHERE env='prod-edx' AND slug in (SELECT slug from all_slugs);
-- +----------+
-- | COUNT(*) |
-- |----------|
-- |     8195 |
-- +----------+
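To put the two counts in proportion, here is a small worked calculation using the figures exactly as printed in the query outputs above (16934 total course runs, 8195 pointing at an existing wiki). The helper name `share_with_wiki` is hypothetical.

```python
def share_with_wiki(runs_with_wiki, total_runs):
    """Fraction of course runs whose wiki slug matches an existing root wiki page."""
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    return runs_with_wiki / total_runs


# Counts taken from the two query outputs above.
share = share_with_wiki(8195, 16934)
print(f"{share:.1%}")  # prints 48.4%
```

So a little under half of the course runs in the modulestore resolve to a wiki that actually exists.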

Wiki editing was high when the platform launched, particularly when averaged over the number of courses available at the time. Wiki article edits have dropped off significantly since then.

SELECT year(modified), month(modified), count(*)
    FROM PROD.LMS.WIKI_ARTICLEREVISION
    WHERE modified >= '2012-01-01 00:00:00'
    GROUP BY 1, 2 ORDER BY 1, 2;
-- +----------------+-----------------+----------+
-- | YEAR(MODIFIED) | MONTH(MODIFIED) | COUNT(*) |
-- |----------------+-----------------+----------|
-- |           2012 |               8 |      129 |
-- |           2012 |               9 |      618 |
-- |           2012 |              10 |     6169 |
-- |           2012 |              11 |     1844 |
-- |           2012 |              12 |      889 |
-- |           2013 |               1 |     1495 |
-- |           2013 |               2 |     1968 |
-- |           2013 |               3 |     1652 |
-- |           2013 |               4 |      550 |
-- |           2013 |               5 |      773 |
-- |           2013 |               6 |      422 |
-- |           2013 |               7 |     3184 |
-- |           2013 |               8 |      903 |
-- |           2013 |               9 |     2198 |
-- |           2013 |              10 |     3522 |
-- |           2013 |              11 |     2000 |
-- |           2013 |              12 |      655 |
-- |           2014 |               1 |      721 |
-- |           2014 |               2 |     2838 |
-- |           2014 |               3 |     4988 |
-- |           2014 |               4 |      760 |
-- |           2014 |               5 |      846 |
-- |           2014 |               6 |      976 |
-- |           2014 |               7 |      380 |
-- |           2014 |               8 |      303 |
-- |           2014 |               9 |      743 |
-- |           2014 |              10 |     2878 |
-- |           2014 |              11 |      890 |
-- |           2014 |              12 |      413 |
-- |           2015 |               1 |      750 |
-- |           2015 |               2 |     1211 |
-- |           2015 |               3 |     1171 |
-- |           2015 |               4 |     1603 |
-- |           2015 |               5 |     2564 |
-- |           2015 |               6 |     4221 |
-- |           2015 |               7 |      693 |
-- |           2015 |               8 |      405 |
-- |           2015 |               9 |      606 |
-- |           2015 |              10 |     1549 |
-- |           2015 |              11 |     1974 |
-- |           2015 |              12 |      510 |
-- |           2016 |               1 |      435 |
-- |           2016 |               2 |      792 |
-- |           2016 |               3 |     1078 |
-- |           2016 |               4 |     1171 |
-- |           2016 |               5 |      830 |
-- |           2016 |               6 |      670 |
-- |           2016 |               7 |      608 |
-- |           2016 |               8 |      553 |
-- |           2016 |               9 |      500 |
-- |           2016 |              10 |      446 |
-- |           2016 |              11 |      306 |
-- |           2016 |              12 |      352 |
-- |           2017 |               1 |      647 |
-- |           2017 |               2 |      389 |
-- |           2017 |               3 |      395 |
-- |           2017 |               4 |      357 |
-- |           2017 |               5 |      796 |
-- |           2017 |               6 |      891 |
-- |           2017 |               7 |      656 |
-- |           2017 |               8 |      502 |
-- |           2017 |               9 |      767 |
-- |           2017 |              10 |     1002 |
-- |           2017 |              11 |      636 |
-- |           2017 |              12 |      452 |
-- |           2018 |               1 |      770 |
-- |           2018 |               2 |      509 |
-- |           2018 |               3 |      605 |
-- |           2018 |               4 |      556 |
-- |           2018 |               5 |      470 |
-- |           2018 |               6 |      603 |
-- |           2018 |               7 |      979 |
-- |           2018 |               8 |      556 |
-- |           2018 |               9 |      799 |
-- |           2018 |              10 |      763 |
-- |           2018 |              11 |      323 |
-- |           2018 |              12 |      271 |
-- |           2019 |               1 |      282 |
-- |           2019 |               2 |      390 |
-- |           2019 |               3 |      110 |
-- |           2019 |               4 |      162 |
-- |           2019 |               5 |      120 |
-- |           2019 |               6 |      107 |
-- |           2019 |               7 |      149 |
-- |           2019 |               8 |      354 |
-- |           2019 |               9 |      127 |
-- |           2019 |              10 |      142 |
-- |           2019 |              11 |      116 |
-- |           2019 |              12 |      217 |
-- |           2020 |               1 |      366 |
-- |           2020 |               2 |      398 |
-- |           2020 |               3 |      823 |
-- |           2020 |               4 |     1278 |
-- |           2020 |               5 |      709 |
-- |           2020 |               6 |      785 |
-- |           2020 |               7 |      624 |
-- |           2020 |               8 |      589 |
-- |           2020 |               9 |      438 |
-- |           2020 |              10 |      272 |
-- |           2020 |              11 |      178 |
-- |           2020 |              12 |      151 |
-- |           2021 |               1 |      247 |
-- |           2021 |               2 |      243 |
-- |           2021 |               3 |      283 |
-- |           2021 |               4 |      149 |
-- |           2021 |               5 |      309 |
-- |           2021 |               6 |      215 |
-- |           2021 |               7 |      256 |
-- |           2021 |               8 |      100 |
-- +----------------+-----------------+----------+
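The monthly figures are easier to compare when rolled up by year. The sketch below parses query output pasted in the format shown above and sums the edit counts per year; the function name `edits_per_year` and the regular expression are illustrative assumptions, not part of the original analysis.

```python
import re
from collections import defaultdict

# Matches data rows such as "-- |  2012 |  8 |  129 |"; header and border
# lines contain no year/month/count digits in this shape and are skipped.
ROW = re.compile(r"\|\s*(\d{4})\s*\|\s*(\d{1,2})\s*\|\s*(\d+)\s*\|")


def edits_per_year(table_text):
    """Sum the COUNT(*) column per year from pasted query output."""
    totals = defaultdict(int)
    for line in table_text.splitlines():
        match = ROW.search(line)
        if match:
            year, _month, count = (int(g) for g in match.groups())
            totals[year] += count
    return dict(totals)


sample = """
-- | YEAR(MODIFIED) | MONTH(MODIFIED) | COUNT(*) |
-- |----------------+-----------------+----------|
-- |           2012 |               8 |      129 |
-- |           2012 |               9 |      618 |
-- |           2012 |              10 |     6169 |
-- |           2012 |              11 |     1844 |
-- |           2012 |              12 |      889 |
"""
print(edits_per_year(sample))  # prints {2012: 9649}
```

When reading yearly totals from the full table, note that 2012 and 2021 are partial years (August to December and January to August respectively), so they are not directly comparable with the full years in between.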

The course wikis receive little traffic compared to the rest of the site:

Splunk Report: Course wiki access over time

Fix: Getting Back to Upstream Repo

How active is the upstream repository?

It is still the most widely used Django-based wiki package listed on https://djangopackages.org/ , and is one of only two that are still actively maintained: https://djangopackages.org/grids/g/wikis/ . It added support for Python 3.9 and Django 3.2 four months ago, which we still need to do for our fork. It has merged 32 PRs in the past year, from 10 different developers.

How many changes are in our fork that don’t have equivalents upstream?

No obvious equivalent upstream; many of these are probably worth submitting as upstream PRs:

Probably upstream, but need verification:

Other points that need care:

Note that we made our fork over 8 years ago, so upstream has made many changes in the meantime that we’d need to review, probably meriting somewhat careful a11y and security reviews at the very least.

Next Steps

Jeremy Bowman will put together a blended project brief to determine whether submitting our modifications upstream and getting back onto an official django-wiki release could work as a blended development project. This would save us a nontrivial amount of effort on future Django upgrades, etc., in addition to gaining the many improvements made to the upstream package in the past 8 years.

Future Ideas

For future hackathon projects, here are two ideas about which Marco Morales is hopeful:

Community Response

Resource Pages

Both of these special post types (community response / resource page) could be an interesting way to bring wiki behavior into a discussion forum.