Background: Community Needs Driving The Project
We’ve heard many use cases that illustrate the need to align course content to taxonomies. These use cases are diverse and span a range of needs and outcomes, but they share a common need: connecting course content to taxonomies, or controlled vocabularies. This need is often expressed as wanting to “tag content” or “add tags to content”.
The outcomes of these use cases range from improving content authoring and reuse workflows, to enabling content recommendations for learners, to supporting instructional design goals. Some examples include:
“As a content author, I want to attach topic tags to questions I create in Content Libraries so it's easy for me to find questions by topic and reuse them in different assessments.”
“As a content author, I want to search for all the videos we’ve created that are about a certain topic, like “soil health” so that it’s easy to find the content I need in my Library and use it in my courses.”
“As an Instructor, I want to be able to search for all content tagged for a certain skill. For example, I may have a student who needs more practice with factoring binomial equations. I want to search for and find all the content tagged for factoring binomial equations. Even better, I want to refine the search for all formative assessments that cover factoring binomial equations. This will make it easier for me to connect my learners with the content they need to fill knowledge gaps.”
“As a learning designer, I want to tag units for competencies and learning objectives so that I can check for alignment between unit learning objectives and the overall course objectives.”
“As a marketing team lead, I want to be able to make content recommendations to learners based on prior content they have interacted with.”
“As an administrator, I want to integrate an adaptive engine with our platform that can make content recommendations based on learner profiles, interests and goals.”
Value and Impact
By building platform capacity to align content with taxonomies, or to “add tags to content”, we can deliver many benefits to all Open edX user personas:
Value for authors and instructional designers
Align any part of your course to a competency or skill
Organize content in your libraries around competencies or skills, such as problem banks for factoring binomial equations or videos aligned to factoring binomial equations
Organize content in your libraries by subject, such as videos about ethnology or assessments about constitutional law
Conduct targeted searches in your library, such as videos that cover the skill of factoring binomial equations
Analyze student understanding of, and engagement with, particular competencies or skills.
For example, identify trends, such as high failure rates on assessments aligned to a certain skill, and assess the quality of the content used to teach that skill
For example, identify knowledge gaps for individual learners or learner cohorts and customize content for them (or allow a third-party adaptive engine to do so)
Define prerequisite relationships between sets of content
Value for administrators
Standardize and control the taxonomies your authors are using to align content to competencies, skills, and subjects
Choose to author your own taxonomy, or ingest third-party taxonomies, such as the Open Skills Network taxonomy or Lightcast Skills taxonomy
Value for learners
Discover specific content in courses to fill competency or skill gaps
Access more robust data to inform decisions on which courses to pursue
Value for organizations
Teaching and Learning: Stepping stone toward unlocking the potential to integrate with adaptive learning services. Aligning course content to tags is a step toward enabling external services to build complex, adaptive experiences that are customized to individual learner profiles.
Teaching and Learning: Stepping stone toward unlocking other modular learning capabilities, such as enabling learners to build self-directed learning pathways, or enabling course teams to build more flexible and diverse types of learning presentations.
Marketing and Discovery: Stepping stone toward unlocking personalized recommendations.
What We Propose To Build
From the above use cases, we can distill a core set of generalized platform capabilities. By building these core capabilities, we will enable the Open edX community to unlock the value described in many of the above use cases.
The platform must support the capability to associate tags with content at all levels of content. This includes adding tags to individual components (text blocks, video blocks, question blocks), to all parts of the course (units, subsections, and sections), and to full courses (and any new learning presentation types that might evolve in the future).
Tagging must be integrated into authoring workflows in Content Libraries and in the course outline.
Tags must be designed to align with content reuse workflows, i.e., integrated in both Content Libraries and the course outline.
The platform cannot be prescriptive about all taxonomies organizations use. Rather, it must enable instances and organizations to customize taxonomies to meet individual needs.
Conversely, the platform must support a core set of standardized, platform-wide taxonomies to enable platform-wide use cases such as content search, content recommendations and more.
The platform must support use/ingestion of third-party taxonomies and/or the ability to create custom taxonomies from scratch.
The platform must support the capability to create hierarchical and horizontal relationships between taxonomies.
Tagging infrastructure must be designed neutrally, in order to support multiple purposes, from instructional design needs to content management needs.
Tagging infrastructure must be designed with a unified UX/UI in mind, particularly across the authoring and reuse workflows in Content Libraries and the course outline.
Note: Content tagging capabilities will follow the release of Content Libraries, V2 (H1 2023). Content Libraries, V2 enables authors to create videos, text, and problems in Libraries and reuse them in any course.
How We Propose To Build It
Until now, tagging capabilities have been built to support the delivery of specific, narrowly scoped features. For example, the Discovery IDA has both a simple feature that provides tagging via the Django taggit library and a custom set of models that maps a skills taxonomy to courses. Additionally, a number of course-run-specific metadata fields are stored in the course_metadata_courserun table in the edx-platform database. These fields require database migrations to add or remove, and in many cases they are specific to particular deployments of the platform. While many details of the implementation will not be settled until the technical design has been approved, we can already list several important considerations for our expected approach.
First, the approach should suit a shared software platform. Each deployment should control which metadata it adds, starting from a set of sensible, platform-wide defaults, and adding or removing metadata should not require changes to the software core.
Second, the approach should recognize that the inability to apply metadata flexibly to content has impeded a number of efforts. To date, however, the pain felt by any single group of users has not been enough to motivate a team to build a general service. From a platform point of view, once that pain is summed across users, it is clear that a general capability for mapping metadata to course content would unlock value across the platform.
Third, we believe that metadata added by the owners of models should be authoritative, while acknowledging that the same data is valuable for other critical needs. For example, content authors should own adding (or approving) metadata about the content they create: its level of difficulty, its expected time to complete, and the skills it teaches. However, this metadata is also critical for marketing courses, computing learner analytics, and recommending content.
A general capability for classifying course content, with appropriate APIs, will have a number of benefits. First, it will allow us to deprecate limited implementations, reducing confusion and maintenance costs. Second, it will allow us to share metadata across domain boundaries while ensuring appropriate ownership of that data. Third, it will make the software a better platform, allowing specific instances to apply the appropriate metadata to course content without altering the platform core. Fourth, it will allow us to converge on a single vocabulary for platform metadata and to disambiguate terms like tag, taxonomy, and metadata.
Key Capabilities
Tagging Infrastructure
The platform will support tagging based primarily on name-value pair style tags. It will support the following four types:
Field Type | Description | Field (example names) | Tag (value) | Author Experience |
Free-form | Open-ended option to allow any author to add any desired tags; these are all collected in a single (possibly invisible) field for good data management | Tags | Anything author wants to enter | Author sees a field to add as many tags as desired and can free-form add. Could include predictive suggestions of existing tags as they type. |
System-defined | Core tags controlled at the platform level to keep search and discovery consistent and to limit messy data. Admins would not be able to change these. | Language, format/content type, organization | A specific set of values for each of these. | Many of these would not be visible, but some might be integrated into faceted search (e.g., content format, language). Most would not be editable. |
Admin-defined fields | Admins can set up specific fields on which authors can enter free-form tags. These would typically be used where there could be many possible tags but the admin wants them organized separately from free-form tags. | Outcomes, Learning Objectives | Anything author wants to enter | Author is presented with the field and can enter a single free-form value. |
Admin-defined closed taxonomies | Admins can set up specific fields that require closed taxonomies (selected from existing taxonomies, uploaded, or created manually). Includes the ability to create child and grandchild hierarchies. | Lightcast skills, state standards | Biology, Microbiology | Author is presented with the field and a drop-down (or other UI selection element) to select the value. |
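To make the four field types concrete, here is a minimal Python sketch of how a single name-value tag might be validated against its field. The names FieldType, Field, validate_tag, and the set_by_platform flag are hypothetical, not an existing Open edX API, and the rules encode only what the table states.

```python
from dataclasses import dataclass, field
from enum import Enum


class FieldType(Enum):
    # The four field types from the table above (enum names are hypothetical).
    FREE_FORM = "free-form"
    SYSTEM_DEFINED = "system-defined"
    ADMIN_FREE_TEXT = "admin-defined field"
    ADMIN_CLOSED = "admin-defined closed taxonomy"


@dataclass
class Field:
    name: str                 # the "name" half of a name-value pair, e.g. "skill"
    field_type: FieldType
    # Permitted values; only consulted for system-defined and closed-taxonomy fields.
    allowed_values: set[str] = field(default_factory=set)


def validate_tag(f: Field, value: str, set_by_platform: bool = False) -> None:
    """Raise ValueError if `value` may not be applied to field `f`."""
    if not value.strip():
        raise ValueError("tag value must not be empty")
    if f.field_type is FieldType.SYSTEM_DEFINED and not set_by_platform:
        # System-defined tags are controlled by the platform, not by admins or authors.
        raise ValueError(f"{f.name!r} is system-defined and cannot be edited")
    if f.field_type in (FieldType.SYSTEM_DEFINED, FieldType.ADMIN_CLOSED):
        if value not in f.allowed_values:
            raise ValueError(f"{value!r} is not a value of the {f.name!r} taxonomy")
    # FREE_FORM and ADMIN_FREE_TEXT fields accept any non-empty value.
```

For example, a closed "skill" field whose taxonomy contains Biology and Microbiology accepts "Biology" but rejects "Chemistry", while a free-form field accepts any non-empty string.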
Examples of name-value pairs might include:
Field | Tag |
subject | eg Anthropology |
competency | eg Conflict Resolution |
skill | eg Define Stakeholder Roles |
curriculum alignment | eg Operations and Algebraic Thinking |
learning outcome | eg Factor Binomial equations |
level of difficulty | eg Medium |
hours of effort | eg 4 - 6 |
prerequisites | eg Linear Algebra |
Taxonomies will support localization. This holds for both platform-supported taxonomies and imported taxonomies.
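One possible way to localize a taxonomy is to give each tag a stable identifier plus per-locale display labels, with a fallback chain. This is a minimal sketch under that assumption; LocalizedTag, label_for, the hyphenated locale codes, and the "en" default are illustrative choices, not part of the proposal.

```python
from dataclasses import dataclass


@dataclass
class LocalizedTag:
    tag_id: str              # stable identifier stored on content; never translated
    labels: dict[str, str]   # locale code (e.g. "es-419") -> display label

    def label_for(self, locale: str, default_locale: str = "en") -> str:
        # Try the exact locale, then its base language ("es-419" -> "es"),
        # then the default locale, and finally the raw identifier.
        base = locale.split("-")[0]
        for candidate in (locale, base, default_locale):
            if candidate in self.labels:
                return self.labels[candidate]
        return self.tag_id
```

Because content would reference tag_id rather than a label, translating or correcting a label would not require re-tagging any content.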
Tag Behavior
All of the fields and associated tags will display on content whether it’s in a Library or in a course outline.
Fields and tags travel with a piece of content when it moves from a Library into one or more courses. The same holds in reverse: when a tag is added to a course section in Studio and that content is exported to a Library, the tag goes with it.
The tags and taxonomies that an author sees are determined by the taxonomies chosen and configured for their Instance or their organization. So if Instance A uses the Lightcast Skills taxonomy and Instance B uses the Open Skills Network taxonomy, authors on each instance will see only the tags in the taxonomy configured for it.
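The visibility rule above, together with the instance- and organization-level scopes listed under Admin User Stories, could be resolved with a simple lookup. This minimal sketch assumes a taxonomy is enabled either instance-wide or for specific organizations; TaxonomyConfig, visible_taxonomies, and the organization key "MathX" are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TaxonomyConfig:
    # Hypothetical per-instance configuration.
    instance_wide: set[str] = field(default_factory=set)        # enabled for every organization
    per_org: dict[str, set[str]] = field(default_factory=dict)  # org key -> additional taxonomies


def visible_taxonomies(config: TaxonomyConfig, org: str) -> set[str]:
    """Taxonomies an author sees when tagging content owned by `org`."""
    return config.instance_wide | config.per_org.get(org, set())


instance_a = TaxonomyConfig(instance_wide={"Lightcast Skills"})
instance_b = TaxonomyConfig(
    instance_wide={"Open Skills Network"},
    per_org={"MathX": {"Core Subject Taxonomy for Mathematical Sciences Education"}},
)
```

Under this assumption, authors on Instance A see only Lightcast Skills, while on Instance B an author in the hypothetical MathX organization sees both the Open Skills Network taxonomy and the mathematics subject taxonomy.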
Platform Taxonomies
The platform will identify a core set of fields and taxonomies that are established and standardized, such as “language”, “content format” and “organization”. The platform will control these taxonomies to mitigate messy data and so that these fields can be used for faceted search infrastructure in libraries.
These core fields may be auto-generated from core data models in content blocks.
The fields and taxonomies will be unchangeable by the user.
Many of these fields would not be visible to authors, but some might be integrated into faceted search (e.g., content format, language).
The platform will offer a few recommended fields and a “menu” of optional closed taxonomies for those fields. For example, we may suggest a field for “skill” and offer three skills taxonomies for optional use.
Administrators would have the option to choose one of the taxonomies, upload their own taxonomy, or not use that field at all.
Admin User Stories
Admins can set up as many specific fields as they’d like on which authors can enter free-form tags.
Admins can set up as many specific fields as they’d like that are associated with closed taxonomies, such as associating the Open Skills Network taxonomy with the “Skills” field, or the Core Subject Taxonomy for Mathematical Sciences Education with the “Subject” field. Admins would have three pathways to associate closed taxonomies with fields:
Choosing from a menu of taxonomies that already exist in the platform
Ingesting a new external taxonomy
Creating a taxonomy from scratch
Admins can apply taxonomies:
across courses and libraries in an Instance
across courses and libraries in an organization
to specified organizations in an Instance
Admins can share taxonomies and tags across instances
Administrators can create hierarchical relationships (child and grandchild) between tags.
Administrators can create horizontal relationships between tags.
Admins can require that certain fields be filled in, for example, that all competency fields are filled in.
Admins can choose to add AI-generated tags from particular taxonomies to particular content sets in bulk.
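The requirement that certain fields be filled in could be enforced with a check like the one below. It is a minimal sketch that assumes an item's tags are held as a mapping from field name to a list of values; the function name is hypothetical, and when such a check would run (for example, before publishing) is left open by this proposal.

```python
def missing_required_fields(required: set[str], tags: dict[str, list[str]]) -> set[str]:
    """Return the required field names that have no non-blank tag value."""
    return {
        name
        for name in required
        # A field counts as filled if at least one of its values is non-blank.
        if not any(value.strip() for value in tags.get(name, []))
    }
```

An empty result means the content satisfies the admin's requirements; otherwise the returned field names could be shown to the author.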
Admin Experience: Taxonomy Management System
Environment for administrators to create a new taxonomy and edit it.
Mechanism to ingest third-party taxonomies and edit them.
Permission controls for editing, managing and finalizing taxonomies.
Option to enable bulk-add using AI generation.
Version control
Only one version of a taxonomy will be supported at a time
Author User Stories - Creating Tags
Authors can see tags displayed in content blocks in Libraries and in each level of the course hierarchy in Studio.
When content is reused from Libraries in a course, authors will be able to add additional tags to the content, but will not be able to change any tags that were associated with the content in the Library.
Authors can add free-form tags.
UI where author just sees a field to add as many tags as desired, with predictive suggestions.
Authors can add free-form tags on admin-defined fields.
UI where author is presented with the field and can enter a single free-form tag.
Authors can choose a tag from an admin-defined closed taxonomy.
UI where author is presented with the field and a drop-down (or other UI selection element) to select the value.
Permissions for adding or editing tags follow the same logic and permissions structure as editing content.
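The rule that course authors can add to, but not change, tags inherited from a Library can be modelled by keeping Library-level and course-level tags in separate sets and merging them for display. This minimal sketch assumes tags are stored as (field, value) pairs; effective_tags and remove_course_tag are hypothetical names.

```python
Tag = tuple[str, str]  # (field name, tag value), e.g. ("skill", "Define Stakeholder Roles")


def effective_tags(library_tags: set[Tag], course_tags: set[Tag]) -> set[Tag]:
    # Tags shown on reused content: inherited Library tags plus course additions.
    return library_tags | course_tags


def remove_course_tag(library_tags: set[Tag], course_tags: set[Tag], tag: Tag) -> set[Tag]:
    """Return the new course-level tag set, refusing to drop an inherited tag."""
    if tag in library_tags:
        raise PermissionError(f"{tag} is inherited from the Library and cannot be changed here")
    return course_tags - {tag}
```

Keeping the two sets apart means an update to the Library copy can refresh the inherited tags without discarding anything course authors added.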
Author User Stories - Content Management & Search
Authors can run basic keyword searches on all content in libraries.
Authors can utilize facet-style filtering on search results
Authors can conduct advanced searches such as “all videos covering topic X and competency Y”.
Authors can group content by tags, such as all videos tagged with Competency X.
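An advanced search such as “all videos covering topic X and competency Y” reduces to filtering on content type plus a conjunction of (field, value) constraints, and facet counts come from tallying tag values over a result set. The in-memory sketch below only illustrates that logic; a production implementation would presumably use a search index, and ContentItem, search, and facet_counts are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ContentItem:
    key: str
    content_type: str                                        # e.g. "video", "problem", "text"
    tags: dict[str, set[str]] = field(default_factory=dict)  # field name -> tag values


def search(items, content_type=None, required_tags=None):
    """Return items of `content_type` (any type if None) that carry every
    requested tag; `required_tags` maps field name -> required value."""
    required_tags = required_tags or {}
    return [
        item for item in items
        if (content_type is None or item.content_type == content_type)
        and all(value in item.tags.get(name, set())
                for name, value in required_tags.items())
    ]


def facet_counts(items, field_name):
    """Count how many items carry each value of `field_name`, for facet filters."""
    return Counter(value for item in items for value in item.tags.get(field_name, set()))
```

Grouping content by a tag, as in the last story above, is the same tally with the matching items kept instead of counted.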
Proposed Definitions
Tag: Any application of metadata to an object. In the Open edX context, a user story may be, “I want to add subject tags, or skills tags, to videos, to units, to sections, or to a course.”
Taxonomy: A controlled vocabulary in which all the values belong to a single hierarchical structure and have parent/child relationships to other terms, or horizontal relationships to other terms. For example, the Core Subject Taxonomy for Mathematical Sciences Education or the Open Skills Network Taxonomies.
Name-Value Pair: The mechanism to relate tags to data sets, where name functions as the constant that defines the data set, and value functions as the variable tags that belong to the set. For example:
Field (name) | Tag (value) |
Subject | Biology |
Subject | Anthropology |
Subject | History |
Some tags may require hierarchies three or more layers deep. For example,
Field | Parent Tag | Child Tag | Grandchild Tag |
Subject | Biology | Genetics | Molecular Genetics |
Subject | Anthropology | Cultural Anthropology | Ethnology |
Subject | History | Political History | Constitutional History |
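A closed taxonomy with parent, child, and grandchild tags can be stored as a map from each tag to its parent, from which a tag’s full path and its descendants can be derived. This minimal sketch assumes each tag has at most one parent (a strict tree); horizontal relationships would need a separate structure and are not modelled. tag_path, descendants, and the BIOLOGY example are hypothetical.

```python
from typing import Optional


def tag_path(parents: dict[str, Optional[str]], tag: str) -> list[str]:
    """Return [root, ..., tag], e.g. ["Biology", "Genetics", "Molecular Genetics"]."""
    path: list[str] = []
    seen: set[str] = set()
    current: Optional[str] = tag
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current!r}")
        if current not in parents:
            raise KeyError(current)
        seen.add(current)
        path.append(current)
        current = parents[current]  # None once we reach a top-level tag
    return list(reversed(path))


def descendants(parents: dict[str, Optional[str]], tag: str) -> set[str]:
    """All tags below `tag` at any depth (children, grandchildren, ...)."""
    result: set[str] = set()
    frontier = [tag]
    while frontier:
        node = frontier.pop()
        children = {t for t, p in parents.items() if p == node}
        new = children - result
        result |= new
        frontier.extend(new)
    return result


# Parent map for the Biology row of the table above.
BIOLOGY = {"Biology": None, "Genetics": "Biology", "Molecular Genetics": "Genetics"}
```

descendants could, for instance, let a search for Biology also match content tagged Molecular Genetics, if that behavior is wanted.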
Tags require multi-select functionality (each field supports multiple values). For example,
Field | Tag |
Subject | Biology, Chemistry |
Subject | Anthropology, Music |
Subject | History, Geography |
Projects for future product discovery
Does tagging extend to other entities besides content, such as people?
Do tags display for learners in the LMS?
Do we extend content searching capabilities to the course outline in Studio?