This is a proposal for an Open edX tagging service that allows Open edX administrators and content authors to define taxonomies, and then allows users to tag most entities with tags from those taxonomies.

This proposed tagging service has the following features:

This proposal is designed with the following behavior in mind:

Implementation

The gist of the implementation is that we will create a “Tagging Service” as an independently deployable application that uses the Neo4j graph database to store taxonomies and the relationships between taxonomies and Open edX entities. The LMS, Studio, and other apps make API calls to the Tagging Service, which in turn reads from and writes to Neo4j.

Taxonomies and the tags they contain are represented in Neo4j as Nodes, with relationships among them as appropriate. A Taxonomy node CONTAINS many Tag nodes, and Tag nodes may optionally have relationships among each other, such as (LearningOutcomeMultiplication) NARROWS (LearningOutcomeBasicMath) which specifies that the “Multiplication” learning outcome(s) are a subset of the “Basic Math” learning outcomes.

Open edX entities (courses, users, etc.) are also represented in Neo4j as Nodes, with a TaggableEntity label and a type-specific label such as User, and an “externalId” property. Then any TaggableEntity can be TAGGEDWITH a Tag.
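To make the data model above more concrete, here is a minimal in-memory sketch in Python. It is not part of the proposed service: the TagGraph class and its method names are hypothetical, and the real data would live in Neo4j. It only illustrates how NARROWS and TAGGEDWITH relationships combine, so that an entity tagged with a narrow tag can also be found under a broader one.

```python
from collections import defaultdict


class TagGraph:
    """Hypothetical in-memory stand-in for the Neo4j graph described above."""

    def __init__(self):
        self.narrows = defaultdict(set)      # tag -> set of broader tags
        self.tagged_with = defaultdict(set)  # (type, externalId) -> set of tags

    def add_narrows(self, narrower, broader):
        """Record (narrower) NARROWS (broader)."""
        self.narrows[narrower].add(broader)

    def tag_entity(self, entity_type, external_id, tag):
        """Record (entity) TAGGEDWITH (tag)."""
        self.tagged_with[(entity_type, external_id)].add(tag)

    def broader_tags(self, tag):
        """Return every tag reachable from `tag` by following NARROWS."""
        found = set()
        stack = [tag]
        while stack:
            for parent in self.narrows.get(stack.pop(), ()):
                if parent not in found:
                    found.add(parent)
                    stack.append(parent)
        return found

    def entities_under(self, tag):
        """Return entities tagged with `tag` or with any tag that narrows it."""
        return {
            entity
            for entity, tags in self.tagged_with.items()
            if any(t == tag or tag in self.broader_tags(t) for t in tags)
        }


graph = TagGraph()
graph.add_narrows("Multiplication", "Basic Math")
graph.tag_entity("course", "course-v1:OpenCraft+math+course", "Multiplication")
```

In Neo4j itself, the same lookup would be a variable-length path match over NARROWS rather than a loop in application code.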

At large scales, the Tagging Service would likely require Neo4j Enterprise which supports clustering/HA; that is fine since it is licensed under the AGPL, like Open edX itself.

Since the Tagging Service itself will do relatively little computation and is mostly focused on translating API requests to Neo4j queries, it should probably be written using an asynchronous framework or runtime such as AIOHTTP or Node.js, so that a single instance of the Tagging Service can serve even large Open edX instances with many LMS nodes.
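The asynchronous approach can be sketched with Python's standard asyncio module. Everything here is an assumption made for illustration: FakeNeo4j stands in for a real asynchronous Neo4j client, and get_tags is a hypothetical handler. The point is only that each request is a thin translation to one query, and that many such requests can be in flight at once on a single process.

```python
import asyncio


class FakeNeo4j:
    """Hypothetical stand-in for an async Neo4j client; it only records queries."""

    def __init__(self):
        self.log = []

    async def run(self, query, params):
        await asyncio.sleep(0)  # yield to the event loop, as real network I/O would
        self.log.append((query, params))
        return []  # a real client would return matching records


async def get_tags(db, entity_type, external_id):
    """Translate one API request into one parameterized Cypher query."""
    query = (
        "MATCH (te:TaggableEntity {type: $type, externalId: $externalId})"
        "-[:TAGGEDWITH]->(t:Tag) RETURN t"
    )
    return await db.run(query, {"type": entity_type, "externalId": external_id})


async def main():
    db = FakeNeo4j()
    block_ids = ["block-v1:a+b+c", "block-v1:d+e+f", "block-v1:g+h+i"]
    # Serve several requests concurrently from a single service instance.
    results = await asyncio.gather(
        *(get_tags(db, "content", block_id) for block_id in block_ids)
    )
    return db, results
```

Running asyncio.run(main()) issues the three queries concurrently; with a real database, the service would spend most of its time waiting on Neo4j rather than computing.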

Separation of Concerns

LMS/Studio

Tagging Service

Neo4j

Example

Here is a complete statement in Cypher, the Neo4j query language, that creates examples of all of these nodes and relationships as they would be used by the tagging service.

In the following example, there are:

And the following tags have been applied:

Example Cypher statement:

CREATE
(bob:User:TaggableEntity {type: 'user', externalId: 327645, displayAs: 'bob'}),
(mathCourse:Course:TaggableEntity {type: 'course', externalId: 'course-v1:OpenCraft+math+course'}),
(mathUnit:Content:TaggableEntity {type: 'content', externalId: 'block-v1:OpenCraft+math+course+type@vertical+block@interpreting_data'}),

(cc:Taxonomy {name: 'Common Core State Standards', type: 'public'}),
(cc)-[:OWNEDBY {}]->(bob),
(a:Tag {tag: 'CCSS.MATH', description: 'Mathematics'}),
(aa:Tag {tag: 'CCSS.MATH.CONTENT.HSS', description: 'High School: Statistics & Probability'}),
(aa)-[:NARROWS {}]->(a),
(aaa:Tag {tag: 'CCSS.MATH.CONTENT.HSS.ID', description: 'Interpreting Categorical & Quantitative Data'}),
(aaa)-[:NARROWS {}]->(aa),
(aaaa:Tag {tag: 'CCSS.MATH.CONTENT.HSS.ID.A', description: 'Summarize, represent, and interpret data on a single count or measurement variable'}),
(aaaa)-[:NARROWS {}]->(aaa),
(aaaaa1:Tag {tag: 'CCSS.MATH.CONTENT.HSS.ID.A.1', description: 'Represent data with plots on the real number line (dot plots, histograms, and box plots).'}),
(aaaaa1)-[:NARROWS {}]->(aaaa),
(aaaaa2:Tag {tag: 'CCSS.MATH.CONTENT.HSS.ID.A.2', description: 'Use statistics appropriate to the shape of the data distribution to compare center (median, mean) and spread (interquartile range, standard deviation) of two or more different data sets.'}),
(aaaaa2)-[:NARROWS {}]->(aaaa),
(cc)-[:CONTAINS {}]->(a),
(cc)-[:CONTAINS {}]->(aa),
(cc)-[:CONTAINS {}]->(aaa),
(cc)-[:CONTAINS {}]->(aaaa),
(cc)-[:CONTAINS {}]->(aaaaa1),
(cc)-[:CONTAINS {}]->(aaaaa2),

(mathUnit)-[:TAGGEDWITH {}]->(aaaaa1),
(mathCourse)-[:TAGGEDWITH {}]->(aa),

(bpt:Taxonomy {name: 'Bob Private Tags', type: 'private'}),
(bpt)-[:OWNEDBY {}]->(bob),
(bobFavorite:Tag {tag: 'Favorite', description: 'Favorite'}),
(bobWIP:Tag {tag: 'WIP', description: 'Work in Progress'}),
(bpt)-[:CONTAINS {}]->(bobFavorite),
(bpt)-[:CONTAINS {}]->(bobWIP),

(mathUnit)-[:TAGGEDWITH {}]->(x1:PrivateTag)-[:OWNEDBY {}]->(bob),
(x1)-[:TAGGEDWITH {}]->(bobFavorite)

Visualizing this in Neo4j gives:

API Examples

Here are some examples of API calls that could be made to the Tagging Service, and the corresponding Neo4j queries it would run internally to serve them. As you can see, most API calls translate directly into a single Neo4j query, which is key to keeping the tagging service lightweight and fast.


Apply public tag “CCSS.MATH.CONTENT.HSS.ID.A.2” from Taxonomy “Common Core State Standards” to block “block-v1:a+b+c”

MATCH (taxonomy:Taxonomy {name: 'Common Core State Standards', type: 'public'})-[:CONTAINS]->(tag:Tag {tag: 'CCSS.MATH.CONTENT.HSS.ID.A.2'})
MERGE (block:Content:TaggableEntity {type: 'content', externalId: 'block-v1:a+b+c'})
MERGE (block)-[:TAGGEDWITH]->(tag)


User 123 applies private tag “needswork” to block “block-v1:a+b+c”

This API call will automatically create a private taxonomy called “My Tags” for the user, to contain their unstructured keyword tags, if one doesn’t exist already.

MERGE (user:User:TaggableEntity {type: 'user', externalId: 123})
MERGE (block:Content:TaggableEntity {type: 'content', externalId: 'block-v1:a+b+c'})
MERGE (pt:Taxonomy {name: 'My Tags', type: 'private'})-[:OWNEDBY]->(user)
MERGE (pt)-[:CONTAINS]->(tag:Tag {tag: 'needswork'})
CREATE
   (block)-[:TAGGEDWITH]->(x1:PrivateTag)-[:OWNEDBY]->(user),
   (x1)-[:TAGGEDWITH]->(tag)

TBD: how to change that last CREATE to a MERGE so it’s idempotent?
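One possible answer to the TBD above, offered as an untested sketch rather than a settled design: MERGE the whole path from the block through the PrivateTag node to the tag, then MERGE the OWNEDBY relationship separately. Because the tag node belongs to this user's private taxonomy, a path from the block to that tag should identify the user's own application of it. The Python helper below (build_private_tag_query is a hypothetical name) builds such a query with parameters instead of inline literals; it has not been run against a live Neo4j instance.

```python
import textwrap


def build_private_tag_query(user_id, block_id, tag):
    """Return (cypher, params) for applying a private tag using MERGE only."""
    cypher = textwrap.dedent("""\
        MERGE (user:User:TaggableEntity {type: 'user', externalId: $userId})
        MERGE (block:Content:TaggableEntity {type: 'content', externalId: $blockId})
        MERGE (pt:Taxonomy {name: 'My Tags', type: 'private'})-[:OWNEDBY]->(user)
        MERGE (pt)-[:CONTAINS]->(tag:Tag {tag: $tag})
        MERGE (block)-[:TAGGEDWITH]->(x1:PrivateTag)-[:TAGGEDWITH]->(tag)
        MERGE (x1)-[:OWNEDBY]->(user)
        """)
    # Parameters keep user-supplied values out of the query text.
    params = {"userId": user_id, "blockId": block_id, "tag": tag}
    return cypher, params


query, params = build_private_tag_query(123, "block-v1:a+b+c", "needswork")
```

Since every clause is a MERGE, repeating the same call should leave the graph unchanged after the first run, assuming no concurrent writers create duplicates in between.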


Get all tags (public and private) on block “block-v1:a+b+c” visible to user 123


MATCH (te:TaggableEntity {type: 'content', externalId: 'block-v1:a+b+c'})
OPTIONAL MATCH (te)-[:TAGGEDWITH]->(t:Tag)
OPTIONAL MATCH (te)-[:TAGGEDWITH]->(pt:PrivateTag)-[:OWNEDBY]->(:User:TaggableEntity {externalId: 123}), (pt)-[:TAGGEDWITH]->(t2:Tag)
RETURN collect(DISTINCT t) AS publicTags, collect(DISTINCT t2) AS privateTags


Get all tags (public and private) on block “block-v1:a+b+c”

Only admins/servers could make this API call.

MATCH (te:TaggableEntity {type: 'content', externalId: 'block-v1:a+b+c'})
MATCH (te)-[:TAGGEDWITH*1..2]->(t:Tag)
RETURN t
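The visibility rule implied by the last two API calls can be stated compactly. The sketch below is illustrative only: AppliedTag and visible_tags are hypothetical names, and in the proposal this filtering happens inside the Neo4j query, not in application code. It assumes a public tag has no owner, a private tag is visible only to its owner, and admins see everything.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AppliedTag:
    """A tag applied to some entity; owner_id is None for public tags."""
    tag: str
    owner_id: Optional[int] = None


def visible_tags(applied, user_id=None, is_admin=False):
    """Return the tag names this caller is allowed to see, in input order."""
    return [
        a.tag
        for a in applied
        if is_admin or a.owner_id is None or a.owner_id == user_id
    ]


# Tags on one block; user 456 is a made-up second user for the example.
applied = [
    AppliedTag("CCSS.MATH.CONTENT.HSS.ID.A.2"),
    AppliedTag("needswork", owner_id=123),
    AppliedTag("Favorite", owner_id=456),
]
```

With this data, user 123 sees the public tag and their own “needswork” tag, an anonymous caller sees only the public tag, and an admin call returns all three.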


Integration with Studio

Initially, a Taxonomy section could be added to Studio, to allow authoring of Taxonomies as first-class entities, for tagging content or users. The Studio UI would use Neo4j’s existing taxonomy visualization code (D3-based) to display taxonomies, but editing could only be done by uploading a CSV that defines the taxonomy. At first, taxonomies would only support simple hierarchies or unstructured keyword sets.
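As an illustration of the CSV upload, here is a small Python sketch that parses a simple hierarchy. The CSV layout is an assumption, since the proposal does not define one: it uses three hypothetical columns, tag, parent, and description, with an empty parent marking a top-level tag. The output maps directly onto Tag nodes and NARROWS relationships.

```python
import csv
import io

# Hypothetical CSV layout; the proposal does not specify the column names.
SAMPLE_CSV = """\
tag,parent,description
CCSS.MATH,,Mathematics
CCSS.MATH.CONTENT.HSS,CCSS.MATH,High School: Statistics & Probability
CCSS.MATH.CONTENT.HSS.ID,CCSS.MATH.CONTENT.HSS,Interpreting Categorical & Quantitative Data
"""


def parse_taxonomy_csv(text):
    """Return (tags, narrows): a tag -> description dict and (child, parent) pairs."""
    tags = {}
    narrows = []
    for row in csv.DictReader(io.StringIO(text)):
        tag = row["tag"].strip()
        parent = (row.get("parent") or "").strip()
        tags[tag] = (row.get("description") or "").strip()
        if parent:
            narrows.append((tag, parent))
    # Reject files that reference a parent tag that is never defined.
    unknown = sorted({parent for _, parent in narrows if parent not in tags})
    if unknown:
        raise ValueError(f"Unknown parent tags: {unknown}")
    return tags, narrows


tags, narrows = parse_taxonomy_csv(SAMPLE_CSV)
```

An unstructured keyword set is the degenerate case: every row has an empty parent column and narrows comes back empty.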

In addition, Studio (and maybe the LMS, for discussion posts?) would allow users to publicly or privately tag content with free-form keywords. Taxonomies can be linked to a course. When typing out a keyword in the “tags” field of any block/unit/course, a dropdown would appear showing an autocomplete menu of matching tags from all taxonomies currently linked to the course, as well as options like “Create new tag ‘respiration’ (My Personal Tags) (Private)” or “Create new tag ‘respiration’ in (AP Biology 12 Taxonomy)”.
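The dropdown behavior described above could be sketched as follows. This is a hypothetical helper, not a proposed API: the autocomplete function, its prefix-matching rule, and the exact option labels are assumptions chosen to mirror the examples in the paragraph.

```python
def autocomplete(typed, linked_taxonomies, private_taxonomy="My Personal Tags"):
    """Build dropdown options for the text typed into a "tags" field.

    linked_taxonomies maps a taxonomy name to the list of tags it contains.
    Matching here is a case-insensitive prefix match (an assumption).
    """
    keyword = typed.strip()
    if not keyword:
        return []
    needle = keyword.lower()
    options = []
    # Existing tags from every taxonomy linked to the course come first.
    for taxonomy, tags in linked_taxonomies.items():
        for tag in tags:
            if tag.lower().startswith(needle):
                options.append(f"{tag} ({taxonomy})")
    # Then the "create" options: a private tag, or a new tag in a linked taxonomy.
    options.append(f"Create new tag '{keyword}' ({private_taxonomy}) (Private)")
    for taxonomy in linked_taxonomies:
        options.append(f"Create new tag '{keyword}' in ({taxonomy})")
    return options


linked = {"AP Biology 12 Taxonomy": ["Respiratory system", "Photosynthesis"]}
suggestions = autocomplete("resp", linked)
```

A real implementation would probably query the Tagging Service for matches rather than hold every tag in memory, and might hide the “create” options when an exact match already exists.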

Try it Yourself

You can see examples of modelling tags and taxonomies in Neo4j easily by running:

docker run --publish=7474:7474 --publish=7687:7687 neo4j:3.4

Then, browse to http://localhost:7474/browser/ (the initial password is "neo4j"; you must change it upon login). Enter the example commands shown in this proposal, then run the command “MATCH (n) RETURN n LIMIT 100” to see the graph.

To reset this test database at any time, run this command:

MATCH (n)
DETACH DELETE n


Open Questions

  1. Should Taxonomies be typed (User hierarchy taxonomies, learning outcome taxonomies, etc.)? Alternatively, should Taxonomies specify what types of TaggableEntity they apply to?
  2. Should Taxonomies be a Node that CONTAINS all their tags, or a Label in Neo4j applied to all of that Taxonomy’s tags? (Probably a Node)
  3. Should/can we support RDF import/export?
  4. Should we support an additional type of tag, which is a parametrized tag? So a taxonomy could contain things like “Author {applies_to: ‘User’}”, “Publisher {applies_to: ‘Group’}”, and then content could be tagged like
    Block X has tag “Author” with user “Braden MacDonald”

    This is useful for Open edX instances that want to have custom structured data about their content. For example, a particular Open edX instance may need to tag all of their content with “Author”, “Publisher”, “Copyright Owner”, and “Department”.

    In Neo4j syntax, a parametrized Author tag could look like:

    MATCH (taxonomy:Taxonomy {name: 'MyOrg Attribution Taxonomy', type: 'public'})-[:CONTAINS]->(tag:ParametrizedTag {tag: 'Author'})
    MERGE (user:User:TaggableEntity {type: 'user', externalId: 345})
    MERGE (block:Content:TaggableEntity {type: 'content', externalId: 'block-v1:a+b+c'})
    CREATE
       (block)-[:TAGGEDWITH]->(x1:AppliedParametrizedTag)-[:TAGGEDWITH]->(tag),
       (x1)-[:PARAMETER]->(user)