The Blockstore proposals are fairly long, so this doc summarizes the principal differences between them.


Content Primitives

  • Identified by UUID.
  • Versioned numerically (1, 2, 3, etc.)
  • Tagging metadata is stored outside of the core Blockstore.
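
As a rough illustration of the identity scheme shared by all proposals, the sketch below models a content primitive as a UUID plus an integer version. The class and method names (`ContentRef`, `next_version`) are hypothetical and do not come from any of the proposals.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ContentRef:
    """Hypothetical reference to one version of a piece of content."""
    content_id: uuid.UUID = field(default_factory=uuid.uuid4)
    version: int = 1  # versions count up from 1

    def next_version(self) -> "ContentRef":
        # Same identity (UUID), next numeric version.
        return ContentRef(content_id=self.content_id, version=self.version + 1)


unit_v1 = ContentRef()
unit_v2 = unit_v1.next_version()
```

Here `unit_v1` and `unit_v2` share a UUID and differ only in version number, so a reference can pin an exact version or follow the identity across versions.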

Differences: Granularity and Versioning

Original Proposal and Database Proposal

  • Files/Assets are tracked individually.
  • Units are tracked individually.

Original Proposal

  • In addition to per-Unit and per-File tracking, ContentSets (a group of Links) are also versioned.

File Proposal

  • ContentBundles are versioned as a whole, not individual assets inside them.
  • Depending on the intended usage, a ContentBundle could be a single video, a Unit, or an entire Sequence.

Differences: OLX vs. Assets

...

  • All content is stored in ContentBundles, each of which is like a small directory of files.
  • The OLX for a Unit would go into an XML file in a ContentBundle.
  • All Bundle content is stored in an S3-like object store.
  • Metadata about what content constitutes a particular version is in the object store, not the database.
  • Assets used by the Unit would go into the same ContentBundle.
  • Advantages
    • Units are more self-contained.
    • Easier to adapt for use cases outside of Open edX, since ContentBundles don't assume an OLX/Assets divide.
    • Easier to associate bundles of related Assets, like a Video's various encodings, subtitles, thumbnails, etc.
    • Cheaper storage.
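
One way to picture this storage model is a per-version snapshot that lists every file in the bundle, written to the object store next to the files themselves. This is only a sketch: the in-memory `dict` stands in for an S3-like store, and the key layout, the `snapshot.json` name, the content-hash deduplication, and the `put_bundle_version` helper are all assumptions rather than part of the proposal.

```python
import hashlib
import json

# Stand-in for an S3-like object store: key -> bytes.
object_store: dict[str, bytes] = {}


def put_bundle_version(bundle_id: str, version: int, files: dict[str, bytes]) -> str:
    """Store a whole-bundle snapshot (hypothetical layout) and return its key."""
    manifest = {}
    for path, data in sorted(files.items()):
        digest = hashlib.sha256(data).hexdigest()
        # Assumed content-addressed data key: identical files are stored once.
        object_store[f"{bundle_id}/data/{digest}"] = data
        manifest[path] = digest
    snapshot_key = f"{bundle_id}/versions/{version}/snapshot.json"
    # Version metadata lives in the object store, not in a database.
    object_store[snapshot_key] = json.dumps(manifest).encode("utf-8")
    return snapshot_key


v1 = put_bundle_version("unit-bundle", 1, {
    "unit.xml": b"<vertical/>",
    "static/diagram.png": b"\x89PNG...",
})
v2 = put_bundle_version("unit-bundle", 2, {
    "unit.xml": b"<vertical display_name='Intro'/>",
    "static/diagram.png": b"\x89PNG...",
})
```

In this sketch the bundle is versioned as a whole: editing only `unit.xml` still produces a new snapshot for the entire bundle, while the unchanged image is shared between both versions.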

Differences: Modeling Sequences and Courses

...

None of the proposals really addresses this, but all of them assume that there will be an external system (either a plugin or separate service) that uses ElasticSearch as a backend.

Collections

  • All proposals center ownership, permissions, and licensing around the concept of a Collection.
  • Each piece of content belongs to exactly one Collection.
  • Examples:
    • single course (possibly multiple runs)
    • problem bank
    • library of videos created by a video team

Differences: Collection Versioning

Original and Database Proposals

  • Collections point to versioned content, and are also versioned as a whole.

File Proposal

  • Collections point to versioned content, but the Collection itself is not versioned.

Neither of these stances is fundamental to the designs.

Questions for Discussion

  1. What are the use cases for Collection-level versioning?
    1. Licensing version ranges for the Collection.


Meeting Notes

Braden, on File-based:

  • Strengths and Weaknesses: Flexibility
    • more future-proof
    • Braden concerns
      • Validation
        • DO: OLX validation has to happen in any approach. There is more flexibility in the file-based store though, so it might be harder.
        • BM: Includes and references as part of the XML
          • What depends on what specifically
            • Pulling out a problem from a Unit Bundle, what assets does that Bundle need?
      • Queryability
        • Does not capture fine grained dependencies (individual assets, units?)
          • Use case:
            • 20 PDFs that I link to.
            • If there are multiple Units that link to the same PDF, link bundle to each unit that needs it.
            • No way to know which PDFs are used where.
            • NA: Are we going to expose the concept of a Bundle to users?
            • BM: Solution: When you link to another Bundle, can specify list of file paths that you're using.
              • NA: Will need to layer this onto search functionality.
        • Keeping metadata in sync with S3 source of truth
          • Version information, extract names
          • BM: Would be useful to know what's in use and what's not.
            • Could track via S3, CDN logging
          • Use Case: Blockstore V2. Would need to take on the horrible task of transforming all the data.
      • Doesn't offer many features
        • Course Structure (getting complete graph of course)
        • Versioning of Collections
          • Licensing changes use case?
          • Release Notes
        • Relies more heavily on search capability (since no DB query)
        • Can't do anything smart related to contents
          • BM:
            • Not sure if Blockstore is the right level for it, but XBlock data migrations.
            • OLX Validation in general. (plugin)
            • Having to do boilerplate to validate data integrity.
              • Does a file's contents match its name?
              • Does a bundle's contents match its declared type?
              • Does a reference to another bundle actually exist?
              • Does a version referenced actually exist?
  • Nimisha questions:
    • Theoretically decoupled from the fact that it's OLX, or that it's a filesystem...?
    • Blockstore used for read/write/edit, but read-optimized is a separate store?
      • Yes, read-optimized LMS store is separate
      • PP: Read-optimized store supporting adaptive learning?
        • BM: Have something higher level on top of Blockstore that stores Course Outlines
    • Search for re-use?
      • ES backed, separate module in Blockstore or external to it.
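
The link idea raised under Queryability (a link that lists the file paths it uses), together with two of the integrity checks listed above, can be sketched as follows. Everything here is illustrative: `BundleLink`, `files_in_use`, and `broken_links` are made-up names, and the in-memory structures stand in for whatever index or search layer would actually hold this data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BundleLink:
    """Hypothetical link from one bundle to a specific version of another."""
    source: str             # bundle that declares the link
    target: str             # bundle being linked to
    target_version: int
    paths: tuple[str, ...]  # files in the target that the source actually uses


def files_in_use(links: list[BundleLink]) -> dict[tuple[str, str], set[str]]:
    """Map (target bundle, file path) -> set of source bundles using that file."""
    usage: dict[tuple[str, str], set[str]] = {}
    for link in links:
        for path in link.paths:
            usage.setdefault((link.target, path), set()).add(link.source)
    return usage


def broken_links(links: list[BundleLink], versions: dict[str, set[int]]) -> list[BundleLink]:
    """Return links whose target bundle or target version does not exist."""
    return [
        link for link in links
        if link.target_version not in versions.get(link.target, set())
    ]


existing_versions = {"pdf-library": {1, 2}}
links = [
    BundleLink("unit-a", "pdf-library", 2, ("syllabus.pdf",)),
    BundleLink("unit-b", "pdf-library", 2, ("syllabus.pdf", "reading-01.pdf")),
    BundleLink("unit-c", "pdf-library", 3, ("reading-02.pdf",)),  # version 3 does not exist
]

usage = files_in_use(links)
bad = broken_links(links, existing_versions)
```

With per-link path lists, the "which PDFs are used where" question becomes a lookup in `usage`, and the unit pointing at a nonexistent version shows up in `bad`.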


Decision: Moving forward with file-based proposal, but come back to and examine:

  • precursor files
  • extracting reusable problems from Units
  • static file references from within XBlocks, how those translate into LMS URLs
  • Search! Need to get to the bottom of this, use cases, requirements.
  • Garbage collection – is there a good way to do this?
  • Links simplification

Dave to summarize meeting decisions, consolidate wiki entries.

  • OEP 20
    • enumerate main decisions
    • fuller details can be in Confluence
  • Potentially separate OEPs for:
    • Blockstore
    • Layer on top of Blockstore / Mapping edx-platform content
    • Tagging
    • Search