Open edX AuthZ Framework Long-Term Vision
- 1 Overview
- 2 Architecture Overview
- 2.1 Components & boundaries
- 2.1.1 1. Authorization Engine
- 2.1.1.1 Casbin
- 2.1.1.2 Model.conf for Open edX
- 2.1.2 2. Policy Storage
- 2.1.2.1 Our policy file: authz.policy
- 2.1.3 3. Open edX Layer
- 2.1.3.1 Core Functions
- 2.1.4 4. Client Service (LMS/CMS, and other IDAs.)
- 2.2 Workflow Diagram
- 3 Data & Storage Model
- 4 Policy Management & Discovery
- 5 Extensibility
- 6 Performance & Consistency
- 7 Observability & Audit
- 8 Evolution [WIP]
- 9 Risks & Open Questions
This document will be used for both the MVP and later phases of the AuthZ project, so it is not limited to either long-term or immediate needs.
Overview
This document proposes the long-term architecture for our new authorization framework built on Casbin. It sets the foundation to meet the current requirements, guides the MVP, and leaves room for future tooling and growth. The aim is to provide a high-level, concise direction that addresses immediate priorities while staying flexible for long-term use cases.
The framework will:
Resolve current limitations of the legacy roles and permissions system.
Provide a solid foundation for the MVP, starting with Content Libraries.
Enable long-term evolution toward ABAC and extensibility.
Integrate with the Admin Console as the central point for policy management.
Architecture Overview
The framework's design is guided by a set of principles that shape its technical architecture:
Centralized enforcement: all services delegate authorization checks to a single layer.
Abstraction over Casbin: Open edX services interact through stable APIs without direct exposure to Casbin internals.
Extensibility by design: plugins can contribute roles, permissions, and policies.
Explainability and auditability: authorization decisions are transparent and traceable.
Simplicity first: start with scoped RBAC, deferring ABAC and advanced matchers until needed.
Components & boundaries
1. Authorization Engine
The authorization engine is responsible for managing all authorization processes, including enforcement, policy evaluation, assignments, roles, and permissions. It externalizes access control logic from application code, providing a centralized and consistent way to determine if a subject (user) can perform an action on a resource.
Casbin
We will use Casbin as the authorization engine for the Open edX ecosystem. Casbin is a powerful and efficient open-source library that enforces authorization by combining two artifacts:
A model (model.conf) that defines how requests and policies are structured and evaluated.
A policy (policy.csv or database entries) that contains the actual rules.
The Casbin Enforcer ties these together, loading the model and policies, evaluating requests, and returning a decision. We’ll use the production-ready Python (pycasbin) library and Django integration (django-authorization) for Django native APIs.
Model.conf for Open edX
This tells the system how the access control model is going to behave: how to ask questions, how to define roles, permissions, and assignments (policies), how to group roles, permissions and users, and how to match requests to what’s in the policies.
For our use case, we want to support RBAC initially, but with the intent of supporting more flexible use cases with an ABAC approach. We should also consider the following:
Review for model.conf and authz.policy proposals: How to Model: model.conf and authz.policy
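As a starting point for discussion, a scoped RBAC model could follow Casbin's documented RBAC-with-domains layout, with the domain field acting as the Open edX scope. The sketch below embeds such a model as text and parses it with Python's configparser only to list its sections. The request shape (sub, scope, obj, act) and the variable names are assumptions for illustration, not the agreed model.

```python
import configparser

# Hypothetical scoped-RBAC model, adapted from Casbin's RBAC-with-domains
# example; "scope" stands in for Casbin's usual "dom" (domain) field.
MODEL_CONF = """
[request_definition]
r = sub, scope, obj, act

[policy_definition]
p = sub, scope, obj, act

[role_definition]
g = _, _, _

[policy_effect]
e = some(where (p.eft == allow))

[matchers]
m = g(r.sub, p.sub, r.scope) && r.scope == p.scope && r.obj == p.obj && r.act == p.act
"""

# interpolation=None keeps configparser from treating any "%" specially.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(MODEL_CONF)

sections = parser.sections()
request_fields = [f.strip() for f in parser["request_definition"]["r"].split(",")]
print(sections)
print(request_fields)
```

Moving to ABAC later would mostly mean changing the matcher line, which is why keeping the model in one reviewed file is useful.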
2. Policy Storage
Our policy file: authz.policy
Casbin stores policies in a datastore managed through an adapter, which exposes APIs for loading, querying, and updating rules.
For Open edX, we use the django-authorization library with the Django ORM adapter, enabling persistence in MySQL and access through a standard Django interface.
Policy storage will support:
Static policies: shipped in authz.policy, defining default roles, permissions, and (if needed) assignments. These act as safe defaults on startup.
Dynamic policies: created or updated at runtime through the Casbin API, persisted through the adapter.
This setup ensures predictable defaults while supporting flexible, runtime policy changes.
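The interplay between static defaults and dynamic rules can be sketched as an idempotent seeding step. This is a minimal illustration, not the adapter's behavior: the policy rows, role names, and the helper names parse_policy and seed_defaults are all hypothetical, and a plain Python set stands in for the database behind the adapter.

```python
import csv
import io

# Hypothetical contents of authz.policy: "p" rows grant a permission to a
# role, "g" rows assign a user to a role within a scope.
DEFAULT_POLICY = """\
p, library_admin, manage_team, lib:*
p, library_author, edit_content, lib:*
g, alice, library_admin, lib:DemoX:CSPROB
"""

def parse_policy(text):
    """Parse Casbin-style CSV policy text into a list of rule tuples."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [tuple(field.strip() for field in row) for row in reader if row]

def seed_defaults(store, rules):
    """Add static defaults to the store, skipping rules already present."""
    added = 0
    for rule in rules:
        if rule not in store:
            store.add(rule)
            added += 1
    return added

# The set stands in for the policy table; it already holds one dynamic rule.
store = {("g", "bob", "library_author", "lib:DemoX:CSPROB")}
first = seed_defaults(store, parse_policy(DEFAULT_POLICY))
second = seed_defaults(store, parse_policy(DEFAULT_POLICY))  # e.g. a restart
print(first, second, len(store))
```

Seeding on every startup is safe in this sketch because it never overwrites or removes rules created at runtime.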
3. Open edX Layer
The Open edX Layer acts as a mediator between services and Casbin. Its main goals are:
Prevent Casbin’s internals from leaking directly into services.
Lower the cognitive load for developers and operators when using the new system.
Provide Open edX–specific definitions that bootstrap the entire authorization framework.
Services must never interact with Casbin directly. They will import the Open edX Layer or call its APIs instead, ensuring a consistent abstraction across the platform.
Core Functions
| Capability | Details |
|---|---|
| Purpose & abstraction | Services never call Casbin directly. This layer mediates all enforcement/management, hides Casbin internals, and lowers cognitive load. |
| API Contracts | Self-explanatory JSON schemas for enforcement and management. Endpoint to retrieve the current auth model per service. Explain API for debugging decisions. Consistent request/response + error taxonomy |
| Enforcement utilities | Functions: |
| Management views | CRUD for roles, permissions, assignments. Clear JSON write requests map to Casbin policies (role→permission, user→role). |
| Casbin specifics | Matchers ( |
| Model configuration | Ship default model.conf per service; allow volume overrides (Tutor); support RBAC by default and ABAC incrementally (domains, patterns, conditionals); matchers and hierarchy management included. |
| Policy defaults | Per-service defaults via Tutor plugins/config files; CLI management for policies (Tutor - Casbin Go library support?); bootstrap safe roles & permissions. |
| Adapters & storage | MySQL via Django ORM adapter; integrated with django-authorization. Uses Casbin as a library (pip) with necessary wiring working out-of-the-box. |
| Consistency framework | Keep Casbin in sync with domain data. A proxy model links rules to Open edX objects through foreign keys; when the main object is deleted, the proxy entry is deleted and the related policies are removed. Event-based cleanup ( |
| User lifecycle management | On user delete, remove related assignments from the policy store. |
| Logging & error handling | Structured logs for enforcement & management (who/what/why, matched rule), sensitive fields redacted. Clear error codes/messages. |
| Testing helpers | Local rule simulation, utilities for unit/CI tests, dry-run of enforcement. |
| Tutor integration | Config files, default policy bundles, and optional CLI management (Go-based CLI support if needed). |
| Closed-to-modification | Do not modify Casbin core; build everything on top with an immutable core approach. |
Implementation details
The Open edX Layer will be delivered as a combination of:
A Django plugin,
A Tutor plugin,
An external library,
A Tutor patch,
Management UI
With Casbin specific components like:
Adapters → Django ORM adapter with MySQL, managed by the AuthZ Layer.
Enforcer → use SyncedEnforcer for thread safety in multi-threaded environments.
Watcher → use auto-reload or watchers so that every enforcer reads the latest policies.
Error handling → consistent JSON errors, structured logs with redacted sensitive attributes.
Testing → helpers for simulating rules, policy snapshots for CI, local dry-run utilities.
4. Client Service (LMS/CMS, and other IDAs.)
Client services (LMS/CMS, and other IDAs) consume the authorization framework. They must not interact with Casbin directly. Instead, they rely on the Open edX Layer for both enforcement and management, which abstracts Casbin internals and provides stable APIs.
Policy defaults specific to service → each service manages its own default policies (e.g., LMS defines collaborator roles for libraries), which are used by the authorization engine. This file should be hosted by a service-specific Tutor plugin, which can override the policy defaults when needed.
Model.conf → services can use the default model, or extend it via Tutor if they need different behavior.
Use of enforcement utilities → rely on queries and enforcer helpers (has_permission, roles_for_user, permissions_on_scope).
Typed structs/JSON requests → strongly typed request/response contracts for documentation, unit testing, and consistency.
Integration workflow:
Service calls → AuthZ Layer → Casbin Enforcer → Policy datastore → decision returned.
Lifecycle events and foreign-key links keep policies in sync with domain objects.
Operator experience → services do not see Casbin tables (p, g, v0..vN); they consume clean JSON APIs with human-readable fields.
Extensibility → Tutor plugins or service-specific bundles provide default roles and permissions; plugins can contribute new roles/permissions. On uninstall, explicit warnings/errors are raised (not silent cleanup).
Other Clients: MFEs (micro-frontends) consume the same APIs through the AuthZ Layer.
Workflow Diagram
[Diagram: authorization workflow (Excalidraw embed)]
Data & Storage Model
We have already established that policies will be stored in MySQL, with Casbin integrated through the Casbin adapter. This section provides more detail on how policies and related data will be managed, including consistency, pruning, caching, and operator overrides.
Static policies →
Shipped in authz.policy files and loaded into the adapter at initialization.
These files are immutable: they define only the default roles, permissions, and (if needed) assignments.
If a new role is created through the API, it is persisted in the database and does not modify the static file.
Dynamic policies →
Created and updated at runtime through the Casbin API.
Persisted directly in MySQL through the Casbin adapter.
Consistency strategy (Backreference/proxy model) →
Provides transactional consistency between domain objects and policies.
Each Casbin policy is linked to an Open edX object via a proxy model with foreign keys (e.g., User–Resource).
When the domain object is deleted, the proxy entry is deleted in the same transaction, ensuring related policies are also removed.
User-role assignments → managed in the policy DB adapter.
Role-permission mappings → managed in the policy DB adapter.
Role-role hierarchies → stored in the policy DB adapter.
Policy loading strategy →
Loading the full set of policies into memory is not feasible at scale.
Instead, policies should be loaded in chunks or subsets as needed for specific requests.
On invalidation, reload only the affected subset.
This avoids both scalability and consistency problems.
Casbin offers watchers to synchronize enforcers across instances, but there is currently no watcher implementation for MySQL (see Casbin watchers).
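The backreference/proxy consistency strategy described above can be illustrated with cascading foreign keys. The sketch below uses SQLite from the standard library purely as a stand-in for MySQL and the Django ORM; the table and column names (library, scope_proxy, policy_rule) are hypothetical and do not reflect the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this for cascades
conn.executescript("""
CREATE TABLE library (id INTEGER PRIMARY KEY, key TEXT UNIQUE);
-- Proxy row: ties a scope used in policies to the domain object.
CREATE TABLE scope_proxy (
    id INTEGER PRIMARY KEY,
    library_id INTEGER NOT NULL REFERENCES library(id) ON DELETE CASCADE
);
-- Policy rows reference the proxy, so they disappear together with it.
CREATE TABLE policy_rule (
    id INTEGER PRIMARY KEY,
    proxy_id INTEGER NOT NULL REFERENCES scope_proxy(id) ON DELETE CASCADE,
    ptype TEXT, v0 TEXT, v1 TEXT
);
""")

conn.execute("INSERT INTO library (id, key) VALUES (1, 'lib:DemoX:CSPROB')")
conn.execute("INSERT INTO scope_proxy (id, library_id) VALUES (1, 1)")
conn.execute(
    "INSERT INTO policy_rule (proxy_id, ptype, v0, v1) "
    "VALUES (1, 'g', 'alice', 'library_admin')"
)
conn.commit()

# One transaction: deleting the library removes the proxy and its policies.
with conn:
    conn.execute("DELETE FROM library WHERE id = 1")

remaining = conn.execute("SELECT COUNT(*) FROM policy_rule").fetchone()[0]
print(remaining)
```

Enforcers that already hold the deleted rules in memory would still need to be told to reload, which is a separate concern from the database cleanup shown here.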
Policy Management & Discovery
Can someone do something? (blocking access control)
As mentioned above, services that use the authz layer directly as a dependency (library) must import the APIs (api.py) it offers to enforce checks. The minimal questions a service might ask are:
Can User X do Action on Object Y in Scope Z? → can(user, permission/action, object, scope) → allow/deny
This check should be used at every enforcement point, and the calling process should block until a response is returned.
And other variants:
Can User X do Action on (specific) Object?
Can User X do Action on (more generic than object) Scope?
Also in batch:
Can User X do multiple actions [(Action1, Object1, Scope1), (Action2, Object2, Scope2), ...]?
What can User X access from this list of resources?
If the question is specifically about an object, the scope is optional and defaults to the object itself.
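The questions above could map onto a small facade like the following. This is only a sketch of the shape such an api.py might take: the function names can_many and filter_accessible, the argument order, and the StubEnforcer test double are assumptions, and the real layer would wrap a Casbin enforcer rather than a set of tuples.

```python
from typing import Optional, Protocol

class Enforcer(Protocol):
    """Anything with a Casbin-like enforce(); the real one stays hidden."""
    def enforce(self, *request: str) -> bool: ...

class StubEnforcer:
    """Test double holding allowed (user, action, object, scope) tuples."""
    def __init__(self, allowed):
        self.allowed = set(allowed)

    def enforce(self, *request: str) -> bool:
        return tuple(request) in self.allowed

def can(enforcer: Enforcer, user: str, action: str, obj: str,
        scope: Optional[str] = None) -> bool:
    """Can `user` do `action` on `obj` in `scope`? Scope defaults to obj."""
    return enforcer.enforce(user, action, obj, scope if scope is not None else obj)

def can_many(enforcer, user, checks):
    """Batch variant: one decision per (action, object, scope) triple."""
    return [can(enforcer, user, action, obj, scope) for action, obj, scope in checks]

def filter_accessible(enforcer, user, action, objects):
    """Which of these objects may `user` perform `action` on?"""
    return [obj for obj in objects if can(enforcer, user, action, obj)]

enforcer = StubEnforcer({
    ("alice", "edit", "lib:DemoX:CSPROB", "lib:DemoX:CSPROB"),
    ("alice", "view", "block-1", "lib:DemoX:CSPROB"),
})
print(can(enforcer, "alice", "edit", "lib:DemoX:CSPROB"))
print(can_many(enforcer, "alice", [
    ("view", "block-1", "lib:DemoX:CSPROB"),
    ("edit", "block-1", "lib:DemoX:CSPROB"),
]))
```

Keeping the enforcer behind a protocol is what lets services and tests depend on the facade without importing Casbin.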
The authz layer will also include REST APIs for communication over the network when needed. For permission-aware access and routing, for instance, clients must consume these REST APIs to learn whether a user has permissions over specific components, or to get authorization data about users or resources. For example:
Blocking access to a page depending on the role → only library admins can access the authz management view. How this will be done is still WIP; see the review thread "Re: Open edX AuthZ Framework Long-Term Vision".
Discovery & filtering
What roles does User X have?
What roles does User X have in Scope Y?
Who has Role X?
Who has Role X in Scope Y?
What policies exist?
What role assignments exist?
What users have roles on Scope Y?
What about Bridgekeeper?
Our current system uses Bridgekeeper for advanced filtering, including:
Regarding permissions, Bridgekeeper is used in some views to do user.has_perms(). This behavior can be replicated in Casbin through the e.enforce() method.
Regarding queries, Bridgekeeper is also used to filter model QuerySets, such as ContentLibrary. Casbin, however, doesn’t provide a built-in mechanism for this type of query filtering.
Additionally, Bridgekeeper defines certain context-related rules, such as is_studio_request, is_course_creator, is_active_user. In Casbin, handling such cases would likely require the use of a custom matcher.
To keep supporting this level of filtering without overcomplicating policies, our current option is a hybrid of Casbin (with custom matchers if necessary), the Django ORM, and Bridgekeeper. Casbin and the Django ORM should always be preferred for complex queries that depend on domain objects, for example retrieving all libraries a user can access.
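One way to picture the "all libraries a user can access" query is to resolve the allowed scopes from policy data first and then filter domain objects against that set. The sketch below does this with in-memory stand-ins; the role names, data, and function names are hypothetical, and in Django the final step would presumably become a QuerySet filter instead of a list comprehension.

```python
# Hypothetical in-memory stand-ins for policy data and domain objects.
ROLE_PERMISSIONS = {
    "library_admin": {"view", "edit"},
    "library_user": {"view"},
}
ASSIGNMENTS = [  # (user, role, scope)
    ("alice", "library_admin", "lib:DemoX:CSPROB"),
    ("alice", "library_user", "lib:DemoX:MATH"),
    ("bob", "library_user", "lib:DemoX:CSPROB"),
]
LIBRARIES = ["lib:DemoX:CSPROB", "lib:DemoX:MATH", "lib:DemoX:ART"]

def scopes_with_permission(user, action):
    """Scopes where one of the user's roles grants the action."""
    return {
        scope
        for assigned_user, role, scope in ASSIGNMENTS
        if assigned_user == user and action in ROLE_PERMISSIONS.get(role, set())
    }

def accessible_libraries(user, action, libraries):
    """Filter libraries with a single set lookup instead of one check per row."""
    allowed = scopes_with_permission(user, action)
    return [lib for lib in libraries if lib in allowed]

print(accessible_libraries("alice", "view", LIBRARIES))
print(accessible_libraries("alice", "edit", LIBRARIES))
```

Resolving scopes once avoids calling the enforcer for every row, which matters when the list of libraries is large.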
Management
Add/Edit/Remove User X to Role Y in Scope Z
Add/Edit/Remove permission for Role X to do Action Y on Object Z
Assign Role X to user Y in Scope Z
Does policy/assignment exist?
Extensibility
Add new roles, permissions, and assignments via the Policy API
Casbin policies are stored in authz.policy or in the database through an adapter. Policies can be managed at runtime using the Management API (add_policy, remove_policy, etc.).
This allows us to create or remove roles, permissions, and user–role assignments through our own APIs.
Load default roles, permissions, and assignments during initialization
Default rules can be placed in authz.policy or stored in the DB through an adapter.
Tutor can be used to inject defaults at service startup.
Another option is to load defaults through a Django plugin during initialization.
Casbin automatically handles duplicates in its policy storage.
Consistency note: loading defaults through plugins could create traceability problems. We must ensure there is a clear record of which plugin introduced each rule so that defaults can be audited and reproduced.
Extend model.conf
The model file is static, based on Casbin’s PERM metamodel (request_definition, policy_definition, policy_effect, matchers).
We can add new sections for extra role graphs, domains, or conditions in the matcher.
Extend the default policy
Policies can be extended at runtime with new entries (add_policy), or patched by modifying the default file or directly at runtime using the adapter.
This may not be required if defaults are already injected by Tutor or Django plugins at initialization, but it remains an option for incremental changes.
Add new matchers or functions
Casbin allows registering custom functions (add_function) and referencing them in matchers.
This enables extending the decision logic beyond the built-in operators (keyMatch, regexMatch, ipMatch, etc.).
Developers can replace or expand matchers in model.conf with these custom functions.
Tutor plugins or service-specific defaults
Tutor plugins can provide default roles and permissions by overriding configuration files.
Plugins can also contribute new roles and permissions.
On uninstall, explicit warnings or errors are raised. Silent cleanup is not allowed.
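To make the custom matcher function idea above concrete, here is a sketch of a function that compares colon-separated scopes segment by segment, with "*" matching exactly one segment. The name scope_match and the scope format are assumptions. With pycasbin, such a function would be registered through the enforcer's add_function and then referenced by name in the [matchers] section of model.conf.

```python
def scope_match(request_scope: str, policy_scope: str) -> bool:
    """Segment-wise scope comparison.

    Scopes are split on ":"; a "*" segment in the policy scope matches any
    single segment of the request scope. Both must have the same depth.
    """
    request_parts = request_scope.split(":")
    policy_parts = policy_scope.split(":")
    if len(request_parts) != len(policy_parts):
        return False
    return all(
        policy_part == "*" or policy_part == request_part
        for request_part, policy_part in zip(request_parts, policy_parts)
    )

# Hypothetical registration, left as a comment so the sketch has no
# third-party dependency:
#   enforcer.add_function("scope_match", scope_match)
# and in model.conf:
#   m = ... && scope_match(r.scope, p.scope) && ...

print(scope_match("lib:DemoX:CSPROB", "lib:*:CSPROB"))
print(scope_match("lib:DemoX:CSPROB", "lib:OtherX:*"))
```

Because the function is plain Python, it can be unit tested on its own before it is ever wired into a matcher.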
Performance & Consistency
https://casbin.org/docs/performance/
Policy Design
Don't duplicate rules in authz.policy. Design role ↔ permission carefully; these checks run all the time.
Grant permissions to roles, then assign users → roles (keeps lookups fast; smaller policies).
Load testing is required for the policies we define, to ensure we don't create unnecessary paths when evaluating authz.
Enforcer Management
One enforcer per process: each LMS/CMS worker has its own enforcer; each Celery worker too.
Initialize enforcer once per process, not per request → or consider any other more performant strategy. Do we keep one enforcer per process and swap scopes with load_filtered_policy(...), or use a small per-scope pool (LRU)?
Policy Loading Strategy
Avoid per-call reloads: do not touch the DB/adapter on every enforce(...). Load once, reuse in memory.
Only one real cache in Casbin: the decision cache (e.g., cached_enforce(sub, obj, act)), which caches answers to identical S-A-O checks. It does not cache rules.
If you load every time you enforce a decision, you risk DB storms. Example: 50 concurrent requests → 50 DB hits. Never load policies "per enforce"; only on first use / refresh.
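The "load once per process, reuse in memory" rule can be sketched as a lazily built, lock-protected singleton. In this illustration _build_enforcer is a hypothetical placeholder for creating the enforcer and loading policies through the adapter, and build_count exists only to show how often that expensive path runs under 50 concurrent callers.

```python
import threading

_enforcer = None
_lock = threading.Lock()
build_count = 0  # only here to show how often the expensive path runs

def _build_enforcer():
    """Placeholder for the expensive step: create the enforcer and load
    policies through the adapter (one DB round trip)."""
    global build_count
    build_count += 1
    return object()

def get_enforcer():
    """Return this process's enforcer, building it on first use only."""
    global _enforcer
    if _enforcer is None:
        with _lock:
            if _enforcer is None:  # re-check after acquiring the lock
                _enforcer = _build_enforcer()
    return _enforcer

# Simulate 50 concurrent requests hitting the same worker process.
threads = [threading.Thread(target=get_enforcer) for _ in range(50)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(build_count)
```

The same pattern applies per Celery worker, since each process keeps its own module-level state.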
Policy Residency Management
Load a subset with load_filtered_policy(scope) on the first request for that scope.
Keep it in memory and reuse; don't reload on every request.
Optionally keep a small per-process pool (LRU) of ready scopes (e.g., per org).
Take advantage of locality: if most requests in a short window hit the same org/component (e.g., editing a library), load that org once so the next checks don't hit the DB → drop entries for that scope when policy changes.
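A small per-process LRU pool of loaded scopes, as suggested above, might look like the sketch below. The class name ScopePolicyPool and the loader callback are hypothetical; in a real setup the loader would call load_filtered_policy on an enforcer, while here it just records which scopes were fetched.

```python
from collections import OrderedDict

class ScopePolicyPool:
    """Keeps the rules for the most recently used scopes in memory."""

    def __init__(self, loader, max_scopes=2):
        self._loader = loader        # called as loader(scope) -> rules
        self._max = max_scopes
        self._pool = OrderedDict()   # scope -> rules, least recent first

    def get(self, scope):
        if scope in self._pool:
            self._pool.move_to_end(scope)    # mark as most recently used
            return self._pool[scope]
        rules = self._loader(scope)          # first request: hit the store
        self._pool[scope] = rules
        if len(self._pool) > self._max:
            self._pool.popitem(last=False)   # evict least recently used
        return rules

    def invalidate(self, scope):
        """Drop a scope when its policy changes; next get() reloads it."""
        self._pool.pop(scope, None)

loads = []

def fake_loader(scope):
    loads.append(scope)
    return [("g", "alice", "library_admin", scope)]

pool = ScopePolicyPool(fake_loader, max_scopes=2)
pool.get("org:A")
pool.get("org:A")         # served from memory, no load
pool.get("org:B")
pool.get("org:C")         # pool is full, evicts org:A
pool.get("org:A")         # reloaded after eviction
pool.invalidate("org:A")
pool.get("org:A")         # reloaded after invalidation
print(loads)
```

The pool size trades memory for database round trips, so it is a natural candidate for the load testing mentioned earlier.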
Consistency Across Processes
Watchers are required so every enforcer (web and Celery) sees updates and calls load_filtered_policy(...) or load_policy() as needed.
Consistency is a must: if enforcers are spread across multiple processes, make sure updates propagate.
Duplicates: Casbin tolerates duplicates out-of-the-box. Still try to keep policy DRY.
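The propagation requirement above can be pictured as a publish/subscribe loop: a policy change is announced with its scope, and every process drops that scope so the next check reloads it. The sketch below runs in a single process with hypothetical classes (PolicyChangeHub, WorkerCache); a real deployment would need a cross-process backend behind the hub, which this document has not chosen yet.

```python
class PolicyChangeHub:
    """Stand-in for a watcher backend such as a message broker."""

    def __init__(self):
        self._callbacks = []

    def subscribe(self, callback):
        self._callbacks.append(callback)

    def publish(self, scope):
        """Announce that policies for `scope` changed."""
        for callback in self._callbacks:
            callback(scope)

class WorkerCache:
    """One per process: remembers which scopes are loaded in memory."""

    def __init__(self, name, hub):
        self.name = name
        self.loaded = set()
        hub.subscribe(self.on_policy_change)

    def ensure_loaded(self, scope):
        # Real code would call load_filtered_policy(...) here when missing.
        self.loaded.add(scope)

    def on_policy_change(self, scope):
        # Forget only the affected scope; the next check reloads it.
        self.loaded.discard(scope)

hub = PolicyChangeHub()
web = WorkerCache("lms-web", hub)
celery = WorkerCache("lms-celery", hub)

web.ensure_loaded("org:A")
web.ensure_loaded("org:B")
celery.ensure_loaded("org:A")

hub.publish("org:A")  # e.g. a role assignment changed in org:A
print(sorted(web.loaded), sorted(celery.loaded))
```

Invalidating by scope rather than reloading everything keeps the cost of a policy change proportional to what actually changed.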
Fastbin
Consider the fastbin approach to improve performance:
wakemaster39/fastbin (GitHub): Performance orientated extensions to pycasbin
Correctness in the long term
The authz layer must ensure that adding new rules does not push performance past the agreed threshold. That calls for a set of testing mechanisms covering benchmarking and backward compatibility with the rules already in place.
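A minimal form of such testing mechanisms is a table of "golden" decisions that must keep holding, plus a simple latency measurement. The sketch below is illustrative only: the golden entries, the decide function, and the helper names are hypothetical, and a real suite would run against the actual enforcer and compare latency with an agreed budget.

```python
import time

# Hypothetical "golden" decisions that must keep holding as rules are added.
GOLDEN = [
    (("alice", "edit", "lib:DemoX:CSPROB"), True),
    (("bob", "edit", "lib:DemoX:CSPROB"), False),
]

def check_golden(decide, golden):
    """Return the requests whose decision differs from the expected one."""
    return [request for request, expected in golden if decide(*request) != expected]

def mean_latency(decide, requests, repeats=1000):
    """Average seconds per decision over `repeats` passes."""
    start = time.perf_counter()
    for _ in range(repeats):
        for request in requests:
            decide(*request)
    return (time.perf_counter() - start) / (repeats * len(requests))

# Stand-in for the real enforcement call.
ALLOWED = {("alice", "edit", "lib:DemoX:CSPROB")}

def decide(user, action, scope):
    return (user, action, scope) in ALLOWED

regressions = check_golden(decide, GOLDEN)
latency = mean_latency(decide, [request for request, _ in GOLDEN])
print(regressions)
```

Running both checks in CI whenever default policies change would flag behavior regressions and slowdowns before they reach operators.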
Can we use something similar to the Casbin web editor to test out the correctness of our policies? https://casbin.org/editor
Observability & Audit
See https://casbin.org/docs/log-error/. Also consider building an Aspects dashboard for access control to improve visibility.
Evolution [WIP]
Here’s a proposal for the evolution of the authz project from the MVP to a robust authorization system compliant with Open edX long-term requirements:
Risks & Open Questions
Can Casbin alone handle the same queries and features as Bridgekeeper? Already documented here: