Open edX AuthZ Framework Long-Term Vision

This document will guide both the MVP and later phases of the AuthZ project, so it is not limited to either long-term or immediate needs.

Overview

This document proposes the long-term architecture for our new authorization framework built on Casbin. It sets the foundation to meet the current requirements, guides the MVP, and leaves room for future tooling and growth. The aim is to provide a high-level, concise direction that addresses immediate priorities while staying flexible for long-term use cases.

The framework will:

  • Resolve current limitations of the legacy roles and permissions system.

  • Provide a solid foundation for the MVP, starting with Content Libraries.

  • Enable long-term evolution toward ABAC and extensibility.

  • Integrate with the Admin Console as the central point for policy management.

Architecture Overview

The framework's design is guided by a set of principles that shape its technical architecture:

  • Centralized enforcement: all services delegate authorization checks to a single layer.

  • Abstraction over Casbin: Open edX services interact through stable APIs without direct exposure to Casbin internals.

  • Extensibility by design: plugins can contribute roles, permissions, and policies.

  • Explainability and auditability: authorization decisions are transparent and traceable.

  • Simplicity first: start with scoped RBAC, deferring ABAC and advanced matchers until needed.

Components & boundaries

1. Authorization Engine

The authorization engine is responsible for managing all authorization processes, including enforcement, policy evaluation, assignments, roles, and permissions. It externalizes access control logic from application code, providing a centralized and consistent way to determine if a subject (user) can perform an action on a resource.

Casbin

Casbin Key Notes

We will use Casbin as the authorization engine for the Open edX ecosystem. Casbin is a powerful and efficient open-source library that enforces authorization by combining two artifacts:

  • A model (model.conf) that defines how requests and policies are structured and evaluated.

  • A policy (policy.csv or database entries) that contains the actual rules.

The Casbin Enforcer ties these together: it loads the model and policies, evaluates requests, and returns a decision. We’ll use the production-ready Python library (pycasbin) and its Django integration (django-authorization) for Django-native APIs.

Model.conf for Open edX

The model tells Casbin how access control behaves: how requests are structured, how roles, permissions, and assignments (policies) are defined, how roles, permissions, and users are grouped, and how requests are matched against the policies.

For our use case, we want to support RBAC initially while leaving room for more flexible use cases through an ABAC approach. We should also consider the following:

  • Objects must belong to a scope → We represent scope/containment in two possible ways:

    • Explicit Grouping: store child→parent edges as g* entries in the Casbin policy store.

    • Dynamic Containment: resolve containment at runtime with an in_container() helper.

  • Support hierarchical containers → The model supports multiple levels of containment (org → course → report, org → lib → asset, …) using either:

    • g* chains (explicit)

    • in_container() lookups (dynamic)

  • Authorization at the object level → Policies and matchers operate on object IDs and container labels. Should we also consider API paths?

  • Scoped role assignments → Roles are always scoped with a g line:

    • g, user:<id>, role:<id>, <scope>
    • Scope may be an org, a course, a library, or * (global).

  • Global scope support → Use * in role assignments to represent system-wide scope:

    • g, user:maria, role:admin, *
  • Direct org-level permissions → a role can be granted actions directly on an org:

    • p, role:admin, org:DemoX, ^(read|write)$, allow
  • Type-wide policies → Policies may use type-wide patterns (course:*, lib:*, report:*, asset:*).

    • These patterns exist only in the policy model, not in raw object IDs.

    • Matching is handled by a helper (type_match).

  • Exception handling → Exceptions are implemented with deny overrides.

    • p, role:admin, course-v1:OpenedX+DemoX+DemoCourse, ^(delete)$, deny
  • Action grouping should be defined in model.conf → e.g., what does manage mean, and which actions does it imply?

  • Deny overrides → Policy effect is defined as:

    • e = some(where (p.eft == allow)) && !some(where (p.eft == deny))
  • Readable and consistent labels → All subjects and roles use labels (user:maria, role:admin).

    • Containers use prefixes (org:DemoX, course-v1:*, lib:*).

    • Raw IDs remain unchanged (e.g., course-v1:OpenedX+DemoX+DemoCourse).

  • Extensible grouping → The model is open to adding new container types (e.g., unit:*).

    • Explicit Grouping: add new g* edges.

    • Dynamic Containment: extend in_container() logic.

  • Performance-minded (we still need to choose between these options) → enforcement is designed to be efficient:

    • Explicit Grouping: relies on Casbin’s fast g* graph evaluation.

    • Dynamic Containment: requires caching inside in_container() and classify() to avoid repeated lookups.

    • Explicit containment: developers must deliberately write all containment decisions while writing checks.
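To make the rules above concrete, the following is a minimal pure-Python sketch of how one request could be evaluated: scoped g assignments (including the * global scope), explicit child→parent containment edges, a type_match helper for type-wide patterns, regex actions, and deny overrides. It does not use pycasbin, and the names Policy, ASSIGNMENTS, PARENT, ancestors, type_match, and enforce are hypothetical; in the real system this logic would be expressed in model.conf matchers and evaluated by the Casbin Enforcer.

```python
import re
from typing import NamedTuple

class Policy(NamedTuple):
    role: str
    obj: str  # raw ID, container label, or type-wide pattern such as "course-v1:*"
    act: str  # regex over action names, e.g. "^(read|write)$"
    eft: str  # "allow" or "deny"

# g, user, role, scope  ("*" = global scope)
ASSIGNMENTS = [
    ("user:maria", "role:admin", "org:DemoX"),
    ("user:root", "role:admin", "*"),
]

# Explicit Grouping: child -> parent containment edges (the g* entries).
PARENT = {
    "course-v1:OpenedX+DemoX+DemoCourse": "org:DemoX",
    "course-v1:OpenedX+DemoX+Advanced": "org:DemoX",
}

POLICIES = [
    Policy("role:admin", "org:DemoX", r"^(read|write)$", "allow"),
    Policy("role:admin", "course-v1:*", r"^(delete)$", "allow"),
    # Exception: no deletes on this one course, even for admins.
    Policy("role:admin", "course-v1:OpenedX+DemoX+DemoCourse", r"^(delete)$", "deny"),
]

def ancestors(obj):
    """Return obj followed by every container above it (explicit g* chain)."""
    chain = [obj]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def type_match(obj, pattern):
    """Match a raw ID against an exact label or a type-wide pattern like 'course-v1:*'."""
    if pattern.endswith(":*"):
        return obj.startswith(pattern[:-1])
    return obj == pattern

def enforce(user, obj, act):
    """Return True only if some policy allows and none denies (deny overrides)."""
    chain = ancestors(obj)
    # Scoped roles apply when their scope is the object, one of its containers, or "*".
    roles = {
        role for u, role, scope in ASSIGNMENTS
        if u == user and (scope == "*" or scope in chain)
    }
    effects = [
        p.eft for p in POLICIES
        if p.role in roles
        and any(type_match(node, p.obj) for node in chain)
        and re.match(p.act, act)
    ]
    return "deny" not in effects and "allow" in effects
```

With these sample rules, user:maria can read and write anything under org:DemoX and delete courses in it, except DemoCourse, where the deny overrides the type-wide allow.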

For a review of the model.conf and authz.policy proposals, see: How to Model: model.conf and authz.policy
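For comparison, the Dynamic Containment option resolves parents at runtime instead of storing g* edges. A minimal sketch, assuming a hypothetical lookup_parent() that would query domain data (a dict stands in for the database here) and using functools.lru_cache for the caching the performance notes call for. The object labels are illustrative. In pycasbin, a function like in_container could be registered with add_function and referenced from the matchers; that wiring is not shown.

```python
from functools import lru_cache

# Stand-in for domain data (e.g., library/asset tables); hypothetical labels.
DOMAIN_PARENTS = {
    "lib:DemoX:intro-lib": "org:DemoX",
    "asset:intro-lib:logo.png": "lib:DemoX:intro-lib",
}
LOOKUPS = {"count": 0}

def lookup_parent(obj):
    """Hypothetical DB lookup; counts calls so the effect of caching is observable."""
    LOOKUPS["count"] += 1
    return DOMAIN_PARENTS.get(obj)

@lru_cache(maxsize=10_000)
def in_container(obj, container):
    """True if obj is the container or is (transitively) contained in it."""
    node = obj
    while node is not None:
        if node == container:
            return True
        node = lookup_parent(node)
    return False
```

The cost of this option is cache invalidation: when containment changes (e.g., a library moves to another org), in_container.cache_clear() or a finer-grained scheme is needed.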

2. Policy Storage

Our policy file: authz.policy

Casbin stores policies in a datastore managed through an adapter, which exposes APIs for loading, querying, and updating rules.

For Open edX, we use the django-authorization library with the Django ORM adapter, enabling persistence in MySQL and access through a standard Django interface.

Policy storage will support:

  • Static policies: shipped in authz.policy, defining default roles, permissions, and (if needed) assignments. These act as safe defaults on startup.

  • Dynamic policies: created or updated at runtime through the Casbin API, persisted through the adapter.

This setup ensures predictable defaults while supporting flexible, runtime policy changes.
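A minimal sketch of seeding static defaults from an authz.policy file alongside dynamic policies, assuming Casbin's CSV-style policy lines. An in-memory set stands in for the adapter-backed table, and parse_policy_text and PolicyStore are hypothetical names. Seeding is idempotent, so restarting a service does not duplicate defaults, and runtime additions never modify the static file.

```python
import csv
import io

# Hypothetical contents of a service's authz.policy file.
DEFAULT_POLICY = """\
p, role:library_admin, lib:*, ^(read|write|delete)$, allow
p, role:library_author, lib:*, ^(read|write)$, allow
g, user:maria, role:library_admin, *
"""

def parse_policy_text(text):
    """Parse Casbin-style CSV policy lines into tuples, skipping blanks and comments.

    Assumes fields contain no unquoted commas (true for the rules above).
    """
    rules = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row or row[0].startswith("#"):
            continue
        rules.append(tuple(field.strip() for field in row))
    return rules

class PolicyStore:
    """In-memory stand-in for the adapter-backed policy table."""

    def __init__(self):
        self.rules = set()

    def add(self, rule):
        """Add a rule; return False if it already exists (like add_policy)."""
        if rule in self.rules:
            return False
        self.rules.add(rule)
        return True

    def seed_defaults(self, text):
        """Load static defaults; safe to call on every startup. Returns rules added."""
        return sum(self.add(rule) for rule in parse_policy_text(text))
```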

3. Open edX Layer

The Open edX Layer acts as a mediator between services and Casbin. Its main goals are:

  • Prevent Casbin’s internals from leaking directly into services.

  • Lower the cognitive load for developers and operators when using the new system.

  • Provide Open edX–specific definitions that bootstrap the entire authorization framework.

Services must never interact with Casbin directly. They will import the Open edX Layer or call its APIs instead, ensuring a consistent abstraction across the platform.

Core Functions

Capability | Details
--- | ---
Purpose & abstraction | Services never call Casbin directly. This layer mediates all enforcement and management, hides Casbin internals, and lowers cognitive load.
API Contracts | Self-explanatory JSON schemas for enforcement and management. An endpoint to retrieve the current auth model per service. An Explain API for debugging decisions. Consistent request/response formats and error taxonomy.
Enforcement utilities | Functions: has_permission, get_roles_for_user, get_permissions_on_scope, get_current_auth_model; strongly typed requests; batch checks. Maintain query parity with the current system (query coverage to be validated).
Management views | CRUD for roles, permissions, and assignments. Clear JSON request schemas whose writes map to Casbin policies (role→permission, user→role).
Casbin specifics | Matchers (has_role, has_attribute, keyMatch), adapters, role managers. Hierarchy management helpers.
Model configuration | Ship a default model.conf per service; allow volume overrides (Tutor); support RBAC by default and ABAC incrementally (domains, patterns, conditionals); matchers and hierarchy management included.
Policy defaults | Per-service defaults via Tutor plugins/config files; CLI management for policies (Tutor, or Casbin Go library support?); bootstrap safe roles and permissions.
Adapters & storage | MySQL via the Django ORM adapter, integrated with django-authorization. Casbin is used as a library (pip), with the necessary wiring working out of the box.
Consistency framework | Keep Casbin in sync with domain data. A model links rules (via FK) to Open edX objects; when the main object is deleted, the shell (proxy) entry is deleted and the related policies are removed. Event-based cleanup (*_deleted), an optional resource registry, and a reconciler as fallback. Prune the Casbin table regularly to remove orphans.
User lifecycle management | On user deletion, remove the related assignments from the policy store.
Logging & error handling | Structured logs for enforcement and management (who/what/why, matched rule), with sensitive fields redacted. Clear error codes and messages.
Testing helpers | Local rule simulation, utilities for unit/CI tests, dry-run of enforcement.
Tutor integration | Config files, default policy bundles, and optional CLI management (Go-based CLI support if needed).
Closed-to-modification | Do not modify Casbin core; build everything on top of it with an immutable-core approach.
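The API Contracts and Enforcement utilities rows call for strongly typed requests and an Explain API. A minimal sketch of what those contracts could look like; PermissionRequest, Decision, explain, has_permission, to_json, and the injected enforce callable are hypothetical, not existing APIs. Injecting the engine as a plain callable keeps Casbin types out of service code.

```python
import json
from dataclasses import dataclass, asdict
from typing import Callable, Optional, Tuple

@dataclass(frozen=True)
class PermissionRequest:
    subject: str                 # e.g. "user:maria"
    action: str                  # e.g. "write"
    obj: str                     # raw object ID, e.g. "lib:DemoX:intro-lib"
    scope: Optional[str] = None  # optional; defaults to the object itself

@dataclass(frozen=True)
class Decision:
    allowed: bool
    matched_rule: Optional[Tuple[str, ...]]  # policy line that decided, if any
    request: PermissionRequest

# The engine is injected as a plain callable so services never import Casbin types.
# Hypothetical signature: (subject, scope, obj, action) -> (allowed, matched_rule)
EnforceFn = Callable[[str, str, str, str], Tuple[bool, Optional[Tuple[str, ...]]]]

def explain(enforce: EnforceFn, req: PermissionRequest) -> Decision:
    """Evaluate a request and keep the matched rule for debugging and audit."""
    scope = req.scope or req.obj
    allowed, rule = enforce(req.subject, scope, req.obj, req.action)
    return Decision(allowed=allowed, matched_rule=rule, request=req)

def has_permission(enforce: EnforceFn, req: PermissionRequest) -> bool:
    return explain(enforce, req).allowed

def to_json(decision: Decision) -> str:
    """Serialize a decision for the REST contract (human-readable fields only)."""
    return json.dumps(asdict(decision), sort_keys=True)
```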

Implementation details

The Open edX Layer will be delivered as a combination of:

  • A Django plugin

  • A Tutor plugin

  • An external library

  • A Tutor patch

  • A management UI

With Casbin specific components like:

  • Adapters → Django ORM adapter with MySQL, managed by the AuthZ Layer.

  • Enforcer → use SyncedEnforcer for thread safety in multi-threaded environments.

  • Watcher → use auto-reload or watchers so every enforcer reads the latest policies.

  • Error handling → consistent JSON errors, structured logs with redacted sensitive attributes.

  • Testing → helpers for simulating rules, policy snapshots for CI, local dry-run utilities.

4. Client Services (LMS/CMS and other IDAs)

Client services (LMS/CMS, and other IDAs) consume the authorization framework. They must not interact with Casbin directly. Instead, they rely on the Open edX Layer for both enforcement and management, which abstracts Casbin internals and provides stable APIs.

  • Service-specific policy defaults → each service manages its own default policies (e.g., LMS defines collaborator roles for libraries), which the authorization engine uses. These defaults should live in a service-specific Tutor plugin, which can override them when needed.

  • Model.conf → services can use the default model or, when they need different behavior, extend it via Tutor.

  • Use of enforcement utilities → rely on the query and enforcement helpers (has_permission, get_roles_for_user, get_permissions_on_scope).

  • Typed structs/JSON requests → strongly typed request/response contracts for documentation, unit testing, and consistency.

  • Integration workflow:

    • Service calls → AuthZ Layer → Casbin Enforcer → Policy datastore → decision returned.

    • Lifecycle events and FK links keep policies in sync with domain objects.

  • Operator experience → services do not see Casbin tables (p, g, v0..vN); they consume clean JSON APIs with human-readable fields.

  • Extensibility → Tutor plugins or service-specific bundles provide default roles and permissions; plugins can contribute new roles/permissions. On uninstall, explicit warnings/errors are raised (not silent cleanup).

  • Other Clients: MFEs (micro-frontends) consume the same APIs through the AuthZ Layer.
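From a client service's point of view, enforcement could look like the following minimal sketch: a hypothetical require_permission decorator that guards a view using the layer's has_permission helper. The request is assumed to expose request.user.username as in Django, and PermissionDenied is a local stand-in for Django's exception so the example runs on its own.

```python
from functools import wraps

class PermissionDenied(Exception):
    """Stand-in for Django's PermissionDenied in this sketch."""

def require_permission(action, get_object_id, has_permission):
    """Hypothetical decorator that guards a view with an AuthZ Layer check.

    has_permission(subject, action, obj) -> bool is the layer's enforcement
    utility; the view never touches Casbin or its tables.
    """
    def decorator(view):
        @wraps(view)
        def wrapper(request, *args, **kwargs):
            obj = get_object_id(*args, **kwargs)
            subject = f"user:{request.user.username}"
            if not has_permission(subject, action, obj):
                raise PermissionDenied(f"{subject} cannot {action} {obj}")
            return view(request, *args, **kwargs)
        return wrapper
    return decorator
```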

Workflow Diagram

AuthZ-Ecosystem-Request-Workflow.png

Data & Storage Model

We have already established that policies will be stored in MySQL, with Casbin integrated through the Casbin adapter. This section provides more detail on how policies and related data will be managed, including consistency, pruning, caching, and operator overrides.

  • Static policies

    • Shipped in authz.policy files and loaded into the adapter at initialization.

    • These files are immutable: they define only the default roles, permissions, and (if needed) assignments.

    • If a new role is created through the API, it is persisted in the database and does not modify the static file.

  • Dynamic policies

    • Created and updated at runtime through the Casbin API.

    • Persisted directly in MySQL through the Casbin adapter.

  • Consistency strategy (Backreference/proxy model)

    • Provides transactional consistency between domain objects and policies.

    • Each Casbin policy is linked to an Open edX object via a proxy model with foreign keys (e.g., User–Resource).

    • When the domain object is deleted, the proxy entry is deleted in the same transaction, ensuring related policies are also removed.

  • User-role assignments → managed in the policy DB adapter.

  • Role-permission mappings → managed in the policy DB adapter.

  • Role-role hierarchies → stored in the policy DB adapter.

  • Policy loading strategy

    • Loading the full set of policies into memory is not feasible at scale.

    • Instead, policies should be loaded in chunks or subsets as needed for specific requests.

    • On invalidation, reload only the affected subset.

    • This avoids both scalability and consistency problems.

    • Casbin offers watchers to synchronize enforcers across instances, but there is currently no MySQL watcher available (see Casbin watchers).
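The backreference/proxy strategy can be sketched with SQLite standing in for MySQL (an assumption made so the example runs on its own). The casbin_rule table mimics the adapter's ptype/v0..vN layout (trimmed to v0..v3), and policy_backref is a hypothetical proxy table with foreign keys to both the rule and the domain object. Inserts and deletes each run in a single transaction, so a rule can neither be created for a missing object nor outlive a deleted one.

```python
import sqlite3

SCHEMA = """
CREATE TABLE library (id TEXT PRIMARY KEY);
CREATE TABLE casbin_rule (
    id INTEGER PRIMARY KEY,
    ptype TEXT, v0 TEXT, v1 TEXT, v2 TEXT, v3 TEXT
);
CREATE TABLE policy_backref (
    rule_id INTEGER NOT NULL REFERENCES casbin_rule(id) ON DELETE CASCADE,
    library_id TEXT NOT NULL REFERENCES library(id) ON DELETE CASCADE
);
"""

def connect():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn

def add_scoped_rule(conn, library_id, ptype, *values):
    """Insert a policy row and its backreference in one transaction."""
    padded = list(values) + [None] * (4 - len(values))
    with conn:  # commits on success, rolls back both inserts on error
        cur = conn.execute(
            "INSERT INTO casbin_rule (ptype, v0, v1, v2, v3) VALUES (?, ?, ?, ?, ?)",
            (ptype, *padded),
        )
        conn.execute(
            "INSERT INTO policy_backref (rule_id, library_id) VALUES (?, ?)",
            (cur.lastrowid, library_id),
        )

def delete_library(conn, library_id):
    """Delete the domain object and its linked policies atomically."""
    with conn:
        conn.execute(
            "DELETE FROM casbin_rule WHERE id IN "
            "(SELECT rule_id FROM policy_backref WHERE library_id = ?)",
            (library_id,),
        )
        # Rule deletion cascaded to policy_backref; now remove the object itself.
        conn.execute("DELETE FROM library WHERE id = ?", (library_id,))
```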

Policy Management & Discovery

Can someone do something? (blocking access control)

As mentioned above, services that use the authz layer as a direct dependency (library) must import the APIs it offers (api.py) to enforce checks. The minimal questions a service might ask are:

  • Can User X do Action on Object Y in Scope Z? → can(user, permission/action, object, scope) → allow/deny

    • Used at every enforcement point; the calling process blocks until a decision is returned.

  • And other variants:

    • Can User X do Action on (specific) Object?

    • Can User X do Action on (more generic than object) Scope?

  • Also in batch:

    • Can User X do multiple actions [(Action1, Object1, Scope1), (Action2, Object2, Scope2), ...]?

    • What can User X access from this list of resources?

When the question is about a specific object, the scope is optional and defaults to the object itself.
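The questions above could map to a small set of functions. A minimal sketch, assuming a hypothetical low-level check(user, action, obj, scope) provided by the AuthZ Layer; can, can_batch, and accessible are illustrative names. The scope falls back to the object itself when it is not given.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical low-level check provided by the AuthZ Layer: (user, action, obj, scope) -> bool
Check = Callable[[str, str, str, str], bool]

def can(check: Check, user: str, action: str, obj: str, scope: Optional[str] = None) -> bool:
    """Can user do action on obj in scope? The scope defaults to the object itself."""
    return check(user, action, obj, scope if scope is not None else obj)

def can_batch(check: Check, user: str,
              items: Iterable[Tuple[str, str, Optional[str]]]) -> List[bool]:
    """Batch form: one decision per (action, obj, scope) triple, in input order."""
    return [can(check, user, action, obj, scope) for action, obj, scope in items]

def accessible(check: Check, user: str, action: str, objs: Iterable[str]) -> List[str]:
    """What can user access (for action) from this list of resources?"""
    return [obj for obj in objs if can(check, user, action, obj)]
```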

The authz layer will also include REST APIs for communication over the network when needed. Clients (e.g., for permission-aware access and routing) consume these REST APIs to check whether a user has permissions on specific components, or to retrieve authorization data about users or resources. Typical queries include:

Discovery & filtering

  • What roles does User X have?

  • What roles does User X have in Scope Y?

  • Who has Role X?

  • Who has Role X in Scope Y?

  • What policies exist?

  • What role assignments exist?

  • What users have roles on Scope Y?
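The discovery questions are read-only queries over scoped role assignments. A minimal sketch over in-memory g tuples; roles_for_user, users_with_role, and users_on_scope are hypothetical names, and in practice these would likely be backed by pycasbin's RBAC API or by direct adapter queries rather than a Python set.

```python
from typing import Set, Tuple

# g, user, role, scope  ("*" = global)
Assignment = Tuple[str, str, str]

ASSIGNMENTS: Set[Assignment] = {
    ("user:maria", "role:library_admin", "lib:DemoX:intro-lib"),
    ("user:maria", "role:course_staff", "course-v1:OpenedX+DemoX+DemoCourse"),
    ("user:ana", "role:library_author", "lib:DemoX:intro-lib"),
    ("user:root", "role:admin", "*"),
}

def roles_for_user(user, scope=None, assignments=ASSIGNMENTS):
    """What roles does user have (optionally in scope)? Global roles apply everywhere."""
    return {r for u, r, s in assignments
            if u == user and (scope is None or s in (scope, "*"))}

def users_with_role(role, scope=None, assignments=ASSIGNMENTS):
    """Who has role (optionally in scope)?"""
    return {u for u, r, s in assignments
            if r == role and (scope is None or s in (scope, "*"))}

def users_on_scope(scope, assignments=ASSIGNMENTS):
    """What users have roles assigned directly on scope? Global assignments excluded."""
    return {u for u, _, s in assignments if s == scope}
```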

What about Bridgekeeper?

Our current system uses Bridgekeeper for advanced filtering, including:

  • Regarding permissions, Bridgekeeper is used in some views to do user.has_perms(). This behavior can be replicated in Casbin through the e.enforce() method.

  • Regarding queries, Bridgekeeper is also used to filter model QuerySets, such as ContentLibrary. Casbin, however, doesn’t provide a built-in mechanism for this type of query filtering.

  • Additionally, Bridgekeeper defines certain context-related rules, such as is_studio_request, is_course_creator, is_active_user. In Casbin, handling such cases would likely require the use of a custom matcher.

Our current option for continuing to support this level of filtering without overcomplicating policies is a hybrid of Casbin (with custom matchers if necessary), the Django ORM, and Bridgekeeper, prioritizing Casbin and the Django ORM to build complex queries that depend on domain objects (for example, retrieving all libraries a user can access).
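A minimal sketch of the hybrid approach for "all libraries a user can access": derive the user's allowed scopes from role assignments, then filter domain objects by those scopes. A list of dicts stands in for a QuerySet; in Django the last step would become an ORM filter on the library key and org, whose exact field names are not assumed here. accessible_scopes and filter_libraries are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

def accessible_scopes(user: str, assignments: Iterable[Tuple[str, str, str]],
                      roles_with_read: Set[str]) -> Tuple[Set[str], Set[str], bool]:
    """Split the user's read-capable assignments into library keys, org names, and a global flag."""
    libs, orgs, is_global = set(), set(), False
    for u, role, scope in assignments:
        if u != user or role not in roles_with_read:
            continue
        if scope == "*":
            is_global = True
        elif scope.startswith("org:"):
            orgs.add(scope[len("org:"):])
        else:
            libs.add(scope)  # object-level scope; non-library scopes simply never match
    return libs, orgs, is_global

def filter_libraries(libraries: Iterable[dict], user: str, assignments,
                     roles_with_read: Set[str]) -> List[dict]:
    """Stand-in for a queryset filter: key in libs OR org in orgs (everything if global)."""
    libs, orgs, is_global = accessible_scopes(user, assignments, roles_with_read)
    if is_global:
        return list(libraries)
    return [lib for lib in libraries if lib["key"] in libs or lib["org"] in orgs]
```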

Management

  • Add/Edit/Remove User X to Role Y in Scope Z

  • Add/Edit/Remove permission for Role X to do Action Y on Object Z

  • Assign Role X to user Y in Scope Z

  • Does policy/assignment exist?
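Management requests can stay human-readable while mapping one-to-one to policy lines. A minimal sketch of that mapping for role assignments and role permissions; the JSON field names (user, role, scope, object, actions, effect) and the helper names are hypothetical.

```python
import re
from typing import Dict, List, Tuple, Union

Rule = Tuple[str, ...]

def _require(payload, fields):
    """Reject requests with missing fields before anything reaches the policy store."""
    missing = sorted(set(fields) - set(payload))
    if missing:
        raise ValueError(f"missing fields: {missing}")

def assignment_from_json(payload: Dict[str, str]) -> Rule:
    """{user, role, scope} -> g, user:<id>, role:<id>, <scope>"""
    _require(payload, ("user", "role", "scope"))
    return ("g", f"user:{payload['user']}", f"role:{payload['role']}", payload["scope"])

def permission_from_json(payload: Dict[str, Union[str, List[str]]]) -> Rule:
    """{role, object, actions, effect?} -> p, role:<id>, <object>, ^(a|b)$, <effect>"""
    _require(payload, ("role", "object", "actions"))
    effect = payload.get("effect", "allow")
    if effect not in ("allow", "deny"):
        raise ValueError(f"invalid effect: {effect}")
    actions = "|".join(re.escape(a) for a in payload["actions"])
    return ("p", f"role:{payload['role']}", payload["object"], f"^({actions})$", effect)
```

The resulting tuples are the rows that would be handed to the management API (e.g., add_policy or add_grouping_policy in pycasbin), so existence checks and removals can reuse the same mapping.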

Extensibility

  • Add new roles, permissions, and assignments via the Policy API

    • Casbin policies are stored in authz.policy or in the database through an adapter. Policies can be managed at runtime using the Management API (add_policy, remove_policy, etc.).

    • This allows us to create or remove roles, permissions, and user–role assignments through our own APIs.

  • Load default roles, permissions, and assignments during initialization

    • Default rules can be placed in authz.policy or stored in the DB through an adapter.

    • Tutor can be used to inject defaults at service startup.

    • Another option is to load defaults through a Django plugin during initialization.

    • Casbin automatically handles duplicates in its policy storage.

    • Consistency note: loading defaults through plugins could create traceability problems. We must ensure there is a clear record of what plugin introduced each rule so that defaults can be audited and reproduced.

  • Extend model.conf

    • The model file is static, based on Casbin’s PERM metamodel (request_definition, policy_definition, policy_effect, matchers).

    • We can add new sections for extra role graphs, domains, or conditions in the matcher.

  • Extend the default policy

    • Policies can be extended at runtime with new entries (add_policy), or patched by modifying the default file or directly at runtime through the adapter.

    • This may not be required if defaults are already injected by Tutor or Django plugins at initialization, but it remains an option for incremental changes.

  • Add new matchers or functions

    • Casbin allows registering custom functions (add_function) and referencing them in matchers.

    • This enables extending the decision logic beyond the built-in operators (keyMatch, regexMatch, ipMatch, etc.).

    • Developers can replace or expand matchers in model.conf with these custom functions.

  • Tutor plugins or service-specific defaults

    • Tutor plugins can provide default roles and permissions by overriding configuration files.

    • Plugins can also contribute new roles and permissions.

    • On uninstall, explicit warnings or errors are raised. Silent cleanup is not allowed.
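To address the traceability note, a registry could record which plugin introduced each default rule. A minimal sketch; DefaultsRegistry and its methods are hypothetical. Duplicate rules collapse into one entry while provenance keeps every contributor, and uninstalling a plugin warns about the rules only it provided instead of deleting them silently.

```python
import warnings
from collections import defaultdict

class DefaultsRegistry:
    """Hypothetical registry recording which plugin contributed each default rule."""

    def __init__(self):
        self.rules = set()
        self.sources = defaultdict(set)  # rule -> names of plugins that shipped it

    def load_plugin_defaults(self, plugin, rules):
        for rule in rules:
            self.rules.add(rule)            # duplicates collapse in the set...
            self.sources[rule].add(plugin)  # ...but provenance keeps every contributor

    def uninstall(self, plugin):
        """Never clean up silently: warn about rules only this plugin introduced."""
        orphaned = sorted(r for r, srcs in self.sources.items() if srcs == {plugin})
        for rule in orphaned:
            warnings.warn(f"{plugin} is being uninstalled but contributed {rule}; review before removal")
        for srcs in self.sources.values():
            srcs.discard(plugin)
        return orphaned  # rules are kept; an operator decides what to remove
```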

Performance & Consistency

https://casbin.org/docs/performance/

Policy Design

  • Don't duplicate rules in authz.policy. Design role ↔ permission mappings carefully; these checks run on every request.

  • Grant permissions to roles, then assign users → roles (keeps lookups fast; smaller policies).

  • Load testing is required for the policies we define, to ensure they don't create unnecessary evaluation paths.

Enforcer Management

  • One enforcer per process: each LMS/CMS worker has its own enforcer; each Celery worker too.

  • Initialize the enforcer once per process, not per request (or adopt a more performant strategy). Open question: do we keep one enforcer per process and swap scopes with load_filtered_policy(...), or use a small per-scope pool (LRU)?

Policy Loading Strategy

  • Avoid per-call reloads: do not touch the DB/adapter on every enforce(...). Load once, reuse in memory.

  • Casbin has only one real cache: the decision cache (e.g., cached_enforce(sub, obj, act)), which caches answers to identical subject-object-action checks. It does not cache rules.

  • If you load every time you enforce a decision, you risk DB storms. Example: 50 concurrent requests → 50 DB hits. Never load policies "per enforce"; only on first use / refresh.
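A minimal sketch of "load once, reuse in memory" plus a decision cache; CachedChecker is hypothetical and is not pycasbin's CachedEnforcer. The loader runs only on first use and after invalidation, so 50 concurrent checks trigger one DB read instead of 50.

```python
import threading

class CachedChecker:
    """Hypothetical sketch: load rules once, then memoize identical (sub, obj, act) checks."""

    def __init__(self, load_policies, evaluate):
        self._load_policies = load_policies  # adapter/DB read; should run rarely
        self._evaluate = evaluate            # (policies, sub, obj, act) -> bool, in memory
        self._policies = None
        self._decisions = {}
        self._lock = threading.Lock()        # one coarse lock keeps the sketch simple

    def enforce(self, sub, obj, act):
        with self._lock:
            if self._policies is None:       # first use: one load, not one per request
                self._policies = self._load_policies()
            key = (sub, obj, act)
            if key not in self._decisions:
                self._decisions[key] = self._evaluate(self._policies, sub, obj, act)
            return self._decisions[key]

    def invalidate(self):
        """On policy change: drop both the loaded rules and the cached decisions."""
        with self._lock:
            self._policies = None
            self._decisions.clear()
```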

cache-workflow.png

Policy Residency Management

  • Load a subset with load_filtered_policy(scope) on the first request for that scope.

  • Keep it in memory and reuse; don't reload on every request.

  • Optionally keep a small per-process pool (LRU) of ready scopes (e.g., per org).

  • Take advantage of locality: if most requests in a short window hit the same org/component (e.g., editing a library), load that org once so the next checks don't hit the DB → drop entries for that scope when policy changes.
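The residency strategy could be sketched as a per-process LRU pool of policy subsets keyed by scope. ScopePool is hypothetical; the load_filtered callable stands in for a filtered adapter read such as pycasbin's load_filtered_policy(...) with an adapter that supports filtering.

```python
from collections import OrderedDict

class ScopePool:
    """Hypothetical per-process LRU pool of policy subsets, keyed by scope (e.g., org)."""

    def __init__(self, load_filtered, max_scopes=8):
        self._load_filtered = load_filtered  # reads only the rules for one scope
        self._max = max_scopes
        self._pool = OrderedDict()           # scope -> frozenset of rules

    def rules_for(self, scope):
        if scope in self._pool:
            self._pool.move_to_end(scope)    # mark as most recently used
            return self._pool[scope]
        rules = frozenset(self._load_filtered(scope))  # first request for this scope
        self._pool[scope] = rules
        if len(self._pool) > self._max:
            self._pool.popitem(last=False)   # evict the least recently used scope
        return rules

    def invalidate(self, scope):
        """Drop a scope when its policies change; the next request reloads it."""
        self._pool.pop(scope, None)
```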

Consistency Across Processes

  • Watchers are required so every enforcer (web and Celery) sees updates and calls load_filtered_policy(...) or load_policy() as needed.

  • Consistency is a must: if enforcers are spread across multiple processes, make sure updates propagate.

  • Duplicates: Casbin tolerates duplicates out-of-the-box. Still try to keep policy DRY.
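A minimal sketch of watcher-driven propagation. InProcessWatcher and WorkerEnforcer are hypothetical: here one shared object simulates a broadcast channel, whereas a real deployment would need a cross-process Casbin watcher (none exists for MySQL today, as noted earlier), and the update payload would depend on that watcher. Each enforcer reloads only the scopes it actually holds in memory.

```python
from typing import Callable, List

class InProcessWatcher:
    """Hypothetical stand-in for a cross-process watcher (simulated broadcast)."""

    def __init__(self):
        self._callbacks: List[Callable[[str], None]] = []

    def set_update_callback(self, callback):
        self._callbacks.append(callback)  # each subscriber registers its own callback

    def update(self, scope):
        """Publish that policies for scope changed; every subscriber reacts."""
        for callback in self._callbacks:
            callback(scope)

class WorkerEnforcer:
    """One per process (web or Celery); tracks which scopes it has loaded."""

    def __init__(self, name, watcher):
        self.name = name
        self.loaded_scopes = set()
        self.reloads = []
        watcher.set_update_callback(self.on_policy_change)

    def on_policy_change(self, scope):
        if scope in self.loaded_scopes:
            self.reloads.append(scope)  # real code: reload this scope's policies
```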

Fastbin

Correctness in the long term

  • The authz layer must ensure that performance stays within the agreed threshold as new rules are added. To guarantee this, we need testing mechanisms for benchmarking and for backward compatibility with the rules already in place.

  • Can we use something similar to the Casbin web editor to test out the correctness of our policies? https://casbin.org/editor
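The benchmarking and backward-compatibility checks could start as a decision snapshot plus a rough latency measurement run in CI. A minimal sketch; EXPECTED, check_parity, and mean_latency_us are hypothetical, and a real suite would use a proper benchmarking tool and an agreed threshold.

```python
import time

# Snapshot of expected decisions (hypothetical); grows as rules are added.
EXPECTED = [
    (("user:maria", "lib:DemoX:intro-lib", "write"), True),
    (("user:ana", "lib:DemoX:intro-lib", "delete"), False),
]

def check_parity(enforce, expected=EXPECTED):
    """Return the cases whose decision no longer matches the snapshot."""
    return [(req, want) for req, want in expected if enforce(*req) != want]

def mean_latency_us(enforce, requests, repeat=1000):
    """Rough mean per-check latency in microseconds, to compare against a threshold."""
    start = time.perf_counter()
    for _ in range(repeat):
        for req in requests:
            enforce(*req)
    elapsed = time.perf_counter() - start
    return elapsed / (repeat * len(requests)) * 1e6
```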

Observability & Audit

See https://casbin.org/docs/log-error/. Also consider building an Aspects dashboard for access control to improve visibility.

Evolution [WIP]

Here’s a proposal for the evolution of the authz project from the MVP to a robust authorization system compliant with Open edX long-term requirements:

Phase 1: MVP

  1. Create a solid model.conf to test Casbin with a use case close to what we'll implement.

  2. Build engine utilities for the Casbin-based authorization engine. This includes enforcers, adapters, matchers, and other Casbin-specific tools needed for our APIs.

  3. Develop APIs as the main interface to be used by services and our own REST APIs (this is our api.py).

  4. Add REST APIs which consume our api.py.

  5. Drop-in replacement of authorization management for Libraries.

    1. Requires a drop-in replacement for Bridgekeeper that covers the same use cases.

Phase 2: TBD!

Risks & Open Questions

  • Can Casbin alone handle the same queries and features as Bridgekeeper? Already documented here: