xAPI/Caliper Meeting Notes

2021-05-28

Agenda

  1. Follow up on Slack message related to Caliper

  2. Clarification request on Processor: Validator in OEP-26 proposal doc

  3. Setting up periodic meeting/status sharing

  4. Discussion on previous Caliper certification attempt

  5. Discussion regarding scalability of solution (meaning, concerns, challenges etc.)

  6. Discussion on ADR procedure

Notes

  • Preventing inadvertent PII leakage

    • Requires all new transformers to use extract_subdict_by_keys properly

      • How to ensure this happens for all new event types?

        1. Create a test harness - that is a base test for all events

        2. Have an ADR

          1. publicized on the README

          2. explains our approach to this PII concern

          3. links to the test

        3. [Linter - automated parsing of code that checks for new event types and warns the developer]

    • Decision: Move forward with #1 and #2. Aamir will verify #1 with Zia.

  • IMS Certification

    • Membership is required for test site access and Slack access.

    • Will revisit once the rest of the framework is further along and we’ve added the course-completion event.

  • Scalability

    • #1 Performance retained with growth of users and events or #2 Adding more event types?

    • Answer: #1

    • 1. Performance concerns

      • Granular code optimizations - discovered those through Python profiling

      • Doing heavy-lifting operation within the web request of a user/learner

        • Not block on responding to web requests

      • Web request

        • → emits event → happens in the web process

          • → transformation of event → open question: does this happen in a separate Celery process or in the original web process?

            • → sending events to consumers → asynchronous job

      • Performance Testing: https://openedx.atlassian.net/wiki/spaces/AC/pages/29688477

        • Load Testing

          • Heavy-weight - instead, prefer gradual rollout and monitoring in production

        • Profile Testing

          • Python Profile testing tool

    • 2. Reliability and Resiliency concerns

      • sending events to consumers → asynchronous job

        • Handling errors when consumers are offline

        • Handling errors when our own backend system breaks temporarily

    • 3. Latency concerns

      • Latency - meaning - how long does it take between a learner performing an action and the consumer receiving the corresponding event?

      • We’ll want to optimize this at some point - but it doesn’t have to be part of v1.

      • For adaptive learning (in the future), we’ll want latency to be as minimal as possible.

  • Cadence: Weekly

  • ADRs

  • Validators

    • IMS certification testing → 1-2x a year

    • A nice-to-have - can revisit at another time
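The performance and latency points above (do not block the web request; transform and send asynchronously; measure the time from learner action to consumer receipt) can be sketched as follows. This is only an illustration under stated assumptions: a standard-library queue and thread stand in for Celery, and transform, send_to_consumer, emit, worker, and run_demo are hypothetical names, not the project's actual code.

```python
import queue
import threading
import time

# Stand-in for a task broker (assumption: production would use Celery tasks).
_event_queue = queue.Queue()


def transform(event):
    """Hypothetical transformer: keep only the fields the consumer needs."""
    return {"verb": event["name"], "actor": event["user_id"]}


def send_to_consumer(statement):
    """Hypothetical sender; a real one would POST to an LRS endpoint."""
    return statement


def emit(event):
    """Called inside the web request: only enqueue, never transform or send."""
    _event_queue.put({**event, "emitted_at": time.monotonic()})


def worker(delivered):
    """Background worker: transform and send outside the request cycle."""
    while True:
        event = _event_queue.get()
        if event is None:  # sentinel that stops the worker
            _event_queue.task_done()
            break
        statement = send_to_consumer(transform(event))
        # Latency: time between the learner's action and delivery.
        latency = time.monotonic() - event["emitted_at"]
        delivered.append((statement, latency))
        _event_queue.task_done()


def run_demo():
    """Emit one event and return the list of (statement, latency) pairs."""
    delivered = []
    thread = threading.Thread(target=worker, args=(delivered,), daemon=True)
    thread.start()
    emit({"name": "problem_check", "user_id": 42})
    _event_queue.put(None)
    _event_queue.join()
    thread.join()
    return delivered
```

Recording the latency per event in the worker gives a number to monitor during a gradual production rollout, which the notes prefer over heavy-weight load testing.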

2021-06-02

Agenda

  1. Discussion on solution for addressing accidental PII leakage

  2. Setting up and maintaining a list of fields that would be considered PII

  3. Discussion on ADR for backwards compatibility

  4. Discussion on error handling if a field is not found in an event

  5. Way forward for ADR of enterprise_uuid related modifications in edx-enterprise and edx-platform

Notes

1 and 2. PII leakage

  • Problem statements

    • A - A new event type is created. The integration developer forgets to call extract_subdict_by_keys for that new event type. Hence, all fields are forwarded on.

    • B - Even though an event type calls extract_subdict_by_keys, the developer doesn’t realize that field X is a PII field.

  • Possible solutions

    • For Problem B

      • 1 - Blocklist of field names - the code in extract_subdict_by_keys then does fuzzy comparison to ensure the extracted field is not in the blocklist.

        • Note: need to handle “fuzzy” name matching

          • For example: “ip_address” is listed in Blocklist versus “ip_addr” used in edX event

    • For Problem A

      • 2 - Create a test harness - that is a base test for all events - leveraging Base Transformer

        • Double check that extract_subdict_by_keys is called.

      • 3 - Have an ADR

        • publicized on the README

        • explains our approach to this PII concern

          • says new event types MUST call extract_subdict_by_keys

          • links to the test

      • 4 - Linter (LATER)

        • [Linter - automated parsing of code that checks for new event types and warns the developer]

      • 5 - Standardization Process (LATER)
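A minimal sketch of solution 1 (blocklist with fuzzy matching) combined with explicit extraction, to make the idea concrete. Everything here is an assumption for illustration: the blocklist contents, the 0.8 similarity threshold, the choice to raise rather than silently drop, and this stand-in version of extract_subdict_by_keys, which is not the project's actual implementation.

```python
from difflib import SequenceMatcher

# Hypothetical blocklist; the real list of PII fields was still to be defined.
PII_BLOCKLIST = {"ip_address", "email", "username"}
FUZZY_THRESHOLD = 0.8  # assumed cut-off; would need tuning on real field names


def looks_like_pii(field_name):
    """Return True if the name is close to any blocklisted field name."""
    name = field_name.lower()
    return any(
        SequenceMatcher(None, name, blocked).ratio() >= FUZZY_THRESHOLD
        for blocked in PII_BLOCKLIST
    )


def extract_subdict_by_keys(event, keys):
    """Illustrative stand-in: copy only the requested, non-PII keys."""
    flagged = [key for key in keys if looks_like_pii(key)]
    if flagged:
        raise ValueError(f"Possible PII fields requested: {flagged}")
    # Fields not listed in `keys` are never forwarded.
    return {key: event[key] for key in keys if key in event}
```

With these assumed values, "ip_addr" is close enough to the blocklisted "ip_address" to be flagged, while "course_id" passes. A base test for Problem A could then assert that every transformer's output contains only keys that went through this function.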

3. Backwards compatibility

  • What do the specs say?

  • Add our own versions

  • Field Types

    • Spec-required

    • Spec-optional

    • Open edX-optional

  • Problem statements

    • A - We add a new field to our transformed xAPI/Caliper event.

      • Solution idea: bump our own minor version (separate from spec).

        • Good communication tool to inform community and consumers.

    • B - We remove an existing field from our transformed xAPI/Caliper event.

      • B1 - We remove an existing field from our transformed xAPI/Caliper event, which is core to the spec.

        • Solution - Never do this! Caught by our tests.

      • B2 - We remove an existing field from our transformed xAPI/Caliper event, which is optional in the spec or absent from the spec.

    • C - The edX event removes fields.

      • C1 - Removed fields are required for an xAPI/Caliper event.

      • C2 - Removed fields are optional for an xAPI/Caliper event.

    • (NON-Problem) D - The edX event adds new fields to an existing transformed xAPI/Caliper event.
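The problem statements A, B1, and B2 above could be checked mechanically by comparing the field sets of a transformed event before and after a change. The sketch below is hypothetical: classify_change and its return labels are invented for illustration, and B2 is reported as "undecided" because the notes record no decision for it.

```python
def classify_change(old_fields, new_fields, spec_required):
    """Classify a change to the fields of a transformed xAPI/Caliper event."""
    old_fields, new_fields = set(old_fields), set(new_fields)
    removed = old_fields - new_fields
    added = new_fields - old_fields
    if removed & set(spec_required):
        return "forbidden"  # B1: never remove a field that is core to the spec
    if removed:
        return "undecided"  # B2: no decision recorded in the notes
    if added:
        return "minor"      # A: bump our own minor version
    return "none"
```

For example, adding a "context" field to an event that already had "actor" and "verb" would be classified as "minor", while dropping "verb" would be "forbidden" when "verb" is listed as spec-required.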

4. Error handling

Acked. No longer creating silent failures.

5. Enterprise

  • check-in with George

    • enterprise_uuid - ADR review; PR review - Engineer

    • project plan for rollout - PM

2021-06-10

Agenda

  • Review the ADR for addressing PII leakage issue

    • What: We are NOT letting fields pass through, unless they are explicitly extracted.

    • How: Abstraction layers

      • accessor method to extract the field.

      • extract_subdict_by_keys

    • Future Ideas (maybe future iterations):

      • Automation tooling and linting

      • Fuzzy comparison

  • Review the ADR for version control of transformers

  • Mapping document for events xAPI and Caliper

  • Connecting with Enterprise

    • sent an email to George

  • Async & Resiliency

    • problem statements: not blocking on web workers

    • handling failure scenarios when the recipient is offline.

Action Items

  • Project Plan

2021-06-25

Agenda:

  • Review of PR and ADR related to PII leakage

  • PR for find_nested method

  • Review of PR and ADR related to transformer versions

  • Update on discussion with George

  • Review project plan

Notes:

  • find_nested - be mindful of potential performance costs with deep searches in Python JSON structures

  • ADRs are currently in pending state.

    • ACTION Nim: review the ADRs

  • Project plan

    • ACTION Edly: ADR for access control

    • ACTION Edly: Update Read-the-docs to explain each config setting

    • ACTION Edly: ADR for resiliency
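To illustrate the performance note on find_nested, here is one way a nested lookup could bound its cost with a depth limit. The signature, the max_depth parameter, and the return-None-on-miss behaviour are assumptions for this sketch and may differ from the method in the PR.

```python
def find_nested(data, key, max_depth=10):
    """Return the first value stored under `key`, or None if not found.

    Iterative depth-first search over dicts and lists; `max_depth` caps
    how far the search descends so deep payloads cannot make it unbounded.
    """
    stack = [(data, 0)]
    while stack:
        node, depth = stack.pop()
        if isinstance(node, dict):
            if key in node:
                return node[key]
            children = node.values()
        elif isinstance(node, list):
            children = node
        else:
            continue  # scalars have no children to search
        if depth < max_depth:
            stack.extend((child, depth + 1) for child in children)
    return None
```

Where the path to a field is known in advance, a direct lookup such as event["data"]["grade"] avoids the search entirely, so a helper like this is best kept for fields whose location genuinely varies.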

2021-07-08

Agenda:

  • Discuss comments on access control ADR PR

    • Decision: Update ADR to be explicit about enterprise access control and university-partner access control.

    • The logic has been agreed upon and we can start the development.

  • Discuss PR for problem_check events

    • Requested Nimisha’s feedback. Not a high priority task.

  • Discuss project plan

  • Discuss problem_completed and video_completed event

    • Take a look at the completion framework and its usage: https://github.com/edx/completion

    • We need to use the completion framework as it is, for now, for video.completed.

    • Problem completed event may be skipped for initial release because its logic is no different from problem.submitted

  • Notes:

    • Allowlist instead of Whitelist

    • Add a separate epic for resiliency

      • Persistence is part of the to-do list but MAYBE we can do without it in the initial release. Need to ask George what their current strategy is.

      • Use persist on failure

      • Clarify treatment (# of retry attempts, persistence, etc) of different exception types

      • Ensure debug logs are self-evident and clear - so we know for what reason an event was skipped

      • Look at grades/tasks.py as a model example - of usage of persistence and differential treatment of exception types.
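The resiliency notes above (persist on failure, different treatment per exception type, self-evident debug logs) might look roughly like the following. This is a sketch only: the exception classes, the deliver function, the retry count, and the backoff are hypothetical, and it does not reproduce the pattern in grades/tasks.py.

```python
import logging
import time

logger = logging.getLogger(__name__)


class ConsumerOffline(Exception):
    """Hypothetical transient error: the consumer could not be reached."""


class InvalidEvent(Exception):
    """Hypothetical permanent error: retrying cannot help."""


def deliver(event, send, persist, max_retries=3, sleep=time.sleep):
    """Send an event; retry transient errors, then persist on failure."""
    for attempt in range(1, max_retries + 1):
        try:
            send(event)
            return "sent"
        except InvalidEvent as exc:
            # Permanent failure: skip, but state the reason in the log.
            logger.debug("Skipping event %s: invalid payload (%s)",
                         event.get("id"), exc)
            return "skipped"
        except ConsumerOffline as exc:
            logger.debug("Attempt %d/%d for event %s failed: consumer offline (%s)",
                         attempt, max_retries, event.get("id"), exc)
            if attempt < max_retries:
                sleep(2 ** attempt)  # simple exponential backoff
    persist(event)  # all retries used up: keep the event for a later retry
    return "persisted"
```

Passing send, persist, and sleep as arguments keeps the retry policy testable without a real consumer or real waiting.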

2021-07-15

Agenda:

  1. Discuss ADR for resiliency

    1. Delete 3

    2. Elaborate on saving Celery tasks instead of events

    3. Every time a persisted event is retried, a method checks the number of entries for xAPI/Caliper and generates an alert (New Relic, log, etc.)

  2. Discussion on having filter processors as synchronous or asynchronous

    1. Move event name filter to synchronous layer and have string comparisons instead of regex.

    2. Adding Unique event IDs - in the synchronous processor

    3. Add in ADR: have a robust method to avoid duplicate events in case one of the LRS is down

  3. Discussion on problem_check ADR

    1. Changing a type to a list - could be an issue for LRS parsers.

    2. Add question in the statement and mention in ADR that it can be incomplete (in case of multiple choice etc.)

    3. Explore sending multiple events in the case of multiple questions in a single problem

      1. Maybe divide the grade among events and add a “parent_problem_grade” in the events

      2. Maybe send an additional event having the overall grade of the problem.
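Items 2.1 to 2.3 above (string comparison instead of regex, unique event IDs assigned in the synchronous processor, and avoiding duplicates when an LRS is down) can be sketched together. The names ALLOWED_EVENTS, synchronous_filter, and DedupingReceiver are hypothetical, the allowlist contents are illustrative, and using uuid4 for the ID is an assumption rather than the project's confirmed choice.

```python
import uuid

# Illustrative allowlist; exact string membership replaces regex matching.
ALLOWED_EVENTS = frozenset({"problem_check", "edx.course.enrollment.activated"})


def synchronous_filter(event):
    """Run in the web process: drop unwanted events and assign an ID once."""
    if event.get("name") not in ALLOWED_EVENTS:
        return None
    if "id" in event:
        return event  # keep the existing ID so retries stay identifiable
    return {**event, "id": str(uuid.uuid4())}


class DedupingReceiver:
    """Toy consumer that ignores events whose ID it has already seen."""

    def __init__(self):
        self.seen_ids = set()
        self.accepted = []

    def receive(self, event):
        if event["id"] in self.seen_ids:
            return False  # duplicate delivery, e.g. after a retry
        self.seen_ids.add(event["id"])
        self.accepted.append(event)
        return True
```

Because the ID is assigned before any asynchronous work, a retried delivery carries the same ID as the first attempt, which is what lets a receiver discard the duplicate.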

2021-07-28

  1. Final review of ADR for resiliency

    1. #1 - add detail of nested celery task

    2. #4 - Discussion with Zia whether changes will be generic or specific to app

    3. #3 - remove the link is down thing

  2. Filters based on event name reconfigured as synchronous processors

    1. This is now done.

  3. Unique event IDs added to xAPI events (already exist in Caliper events)

    1. This is now added.

    2. LRS can rely on this - if needed for extenuating circumstances.

  4. Discussion on problem_check ADR (list response)

    1. Inputs we can get

      1. Roy Shillo - long-term consumer and producer of xAPI statements

      2. IMS representative

      3. Canvas and Moodle’s implementation

      4. edX team that owns CAPA problem

2021-08-03

Agenda:

  1. Discussion on PRs related to course completion event (old and new PR)

  2. Discussion on PR related to change in ADR of xAPI

Decision:

  1. edx.course.completed will be emitted based on timestamp of persisted grade in the database.

  2. Need to emit course.passed and course.failed events in addition to course.completed, when learner achieves a passing or failing grade respectively.

  3. Implementation notes: code is probably better placed within the grades django app, rather than a common utility folder or the certificates app.

2021-08-23 (meeting with Roi and Merav, Campus.il)

2021-08-25

2021-09-10

  • Final PRs are being merged.

  • Final end-to-end testing in progress for xAPI.

    • Using SCORM Cloud, same developers as Tin Can API.

    • Caliper’s LRS is an open-source one without any validations.

  • Caliper certification - seeking an edX squad to own this.

  • Remainder Effort

    • Enterprise usage

    • Caliper certification

    • Multi-problem CAPA

  • Community Evangelization

    • Video demo

    • Configuration documentation

    • Opinions on event transformations

      • Better to have a consistent standard model, than multiple variations.

2021-09-20

  • End-to-end tests

    • remaining: course completion and course enrollments with enterprise

    • then: schedule a demo with edX folks

  • Eliminating dependency with edx-platform