OEP-30 Implementation

This documentation is now stale and superseded by the implementation and documentation in the Code Annotations repository.



OEP-30 outlines our intentions regarding PII annotations in code that runs on edx.org and other Open edX instances. This article collects decisions about implementation details, as well as the outputs of OEP-30 discovery tasks.

Original documents

OEP 30: https://open-edx-proposals.readthedocs.io/en/latest/oep-0030-arch-pii-markup-and-auditing.html

Scratch notes: https://docs.google.com/document/d/1fu2DzfkUGXYRXnXo-V3kGVYB4SB-Z_E4XfodkqKW6bg/edit

Annotating 3rd Party Django Models (PLAT-2344)

Model Inheritance and Mixins

In Django, there are three different types of model inheritance:

  1. Multi-table Inheritance: Base model creates a table, and subclass models each create their own tables with a OneToOne link pointing up to the parent base model.
  2. Abstract Base Model: Base model does not create a table; subclass models each create tables and inherit any fields defined in the base class.
  3. Proxy Model: Base model creates a table, and subclass models neither create tables nor define fields. Subclass models can only define Python-level behavior and act as a proxy for the base model's DB table.

Proxy models would never be positively annotated as containing PII because they neither define fields nor have a DB table of their own. Only the non-proxy models in their ancestry (in most cases, just the one inherited model) could define PII fields. Similarly, abstract base models do not directly "store" PII because they do not have a table of their own. For the purposes of the annotation-asserting tool, I recommend that both abstract base models and proxy models be ignored.

All other kinds of models (non-abstract, non-proxy) are concrete and have a database table, so they MUST be annotated.

Model "mixins" may inherit from object or from models.Model, depending on the codebase norms and on whether they need to define fields (in which case they must inherit from models.Model). If a model mixin cannot define fields because it only inherits from object, then it does not require an annotation. Likewise, model mixins which inherit from models.Model CAN define fields, but they are necessarily abstract base models. They are therefore subject to the same annotation rules as abstract base models above (i.e. annotations not required).

Implementation Decision: All classes which inherit from models.Model, directly or indirectly, AND are neither proxy nor abstract must have a corresponding PII annotation. That is, a class definition requires an annotation (positive OR negative) if:

 issubclass(MyModel, models.Model) and not MyModel._meta.abstract and not MyModel._meta.proxy

Package Versions and Satellite Repos

This section is concerned only with 3rd party repos and edx satellite repos providing installable django apps containing models.

Problem Statement: 3rd party PII annotations must be tied to specific package versions (you cannot possibly introspect django apps without first picking a version to install, duh). However, the version that a satellite edx repo installs during unit tests may differ from the version that edx-platform or another IDA installs, and one version may contain more PII than the other. So, which version should be checked? Similarly, which repos are responsible for checking annotations for 3rd party models? Should both the satellite repo AND the IDA-level repo assert annotations on the same 3rd party model?

The version of a 3rd party package specified in the IDA-level repository (in a production.txt requirements file) is ultimately the version that will actually be installed in prod. Thus, the model introspection and annotation checking of all 3rd party applications SHOULD happen at the IDA-level repository. As the number of IDAs grows, so will the number of duplicated 3rd party annotations. That is a small price to pay, and the result is more accurate as 3rd party package versions diverge between IDAs.
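Because the IDA-level pin is what reaches production, it can help to see where two IDAs have drifted apart. The following is a minimal sketch. It assumes each IDA pins its packages with plain `name==version` lines in production.txt; comments, `-r`/`-e` lines, extras and other requirement syntax are simply skipped. The function names are hypothetical and not part of any OEP-30 tooling.

```python
import re

# Matches simple pinned requirements such as "somepackage==1.2.3  # via ...".
PIN_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;#]+)")


def parse_pins(requirements_text):
    """Return {normalized package name: pinned version} for `name==version` lines."""
    pins = {}
    for line in requirements_text.splitlines():
        match = PIN_RE.match(line)
        if match:
            # Normalize names the way pip does: case-insensitive, and runs of
            # "-", "_" and "." are all equivalent.
            name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
            pins[name] = match.group(2)
    return pins


def diverging_pins(ida_a_text, ida_b_text):
    """Return {package: (version_in_a, version_in_b)} for packages pinned differently."""
    a, b = parse_pins(ida_a_text), parse_pins(ida_b_text)
    return {
        name: (a[name], b[name])
        for name in sorted(a.keys() & b.keys())
        if a[name] != b[name]
    }
```

Packages reported by diverging_pins() are the ones whose safelist annotations might legitimately differ between the two IDAs.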

With all installed 3rd party apps already being checked at the IDA-level repo during CI tests, satellite repos need only check and annotate the django apps defined within them. This local-only mode of checking for internal satellite repos does not need to perform any model inheritance lookups. There is no case where an inherited model that actually requires annotations wouldn't already be checked at the IDA-level repository:

  • If the local model subclasses an Abstract Base Model in a 3rd party package (such as TimeStampedModel from django-model-utils), no action is necessary because we do not require annotations for abstract base models.
  • If the local model subclasses, or is a proxy for, a non-abstract model in a 3rd party package, then no action is necessary. That 3rd party model MUST be installed at the IDA level for the local model to be functional, in which case it is already checked for 3rd party annotations at the IDA level.

The "safelist" file will serve as remote annotations for 3rd party models. We already record the 3rd party package versions in production.txt of the IDA-level repo. Additionally recording the version number in the safelist would force developers to manually re-check the package, and bump the version in the safelist, every time the package gets updated. That would be the most foolproof approach, but it is far too much overhead, so we SHOULD NOT record version numbers in the safelist.

Implementation Decision: Only models defined locally within a satellite repo are subject to annotation assertions, regardless of their inheritance ancestry (call this local-only mode).  However, at the IDA-level repo, 1) local models, 2) inherited models, and 3) all other discoverable 3rd party models should be checked, employing a "safelist" to annotate them remotely if necessary.  Additionally, safelist annotations SHOULD NOT specify the package version providing the model.
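The decision above can be sketched as a small coverage check. This sketch makes several assumptions:

* The caller has already produced the enforceable model paths, for example with a discovery script like the one under Enforceable Model Discovery.
* The caller has split those paths into models defined in the current repository and non-local ones.
* The caller has loaded the safelist into a dict.

The function and its parameter names are hypothetical.

```python
def check_coverage(local_models, non_local_models, safelist, ida_level):
    """
    Compare discovered enforceable models against the safelist.

    local_models / non_local_models: iterables of "app_label.ModelName" strings.
    safelist: mapping of "app_label.ModelName" -> annotation dict (or None).
    ida_level: False for a satellite repo (local-only mode), True for an IDA.

    Local models are annotated in code, so they are never looked up here.
    Returns a dict of problem category -> sorted list of model paths.
    """
    local, non_local, safelisted = set(local_models), set(non_local_models), set(safelist)
    problems = {
        # The safelist must never annotate models defined in this repository.
        "local_models_in_safelist": sorted(safelisted & local),
    }
    if ida_level:
        # At the IDA level, every non-local (3rd party or satellite) model
        # needs a remote annotation in the safelist.
        problems["missing_from_safelist"] = sorted(non_local - safelisted)
        # Entries that match no installed model are probably stale.
        problems["stale_safelist_entries"] = sorted(safelisted - non_local - local)
    return problems
```

In local-only mode, non-local models are deliberately ignored, which mirrors the reasoning that they are already covered at the IDA level.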

The Safelist

This file will remotely annotate 3rd party models in IDA-level repositories.  Given the decisions above, this file need not describe package versions, nor any additional metadata.

.pii_safe_list.yaml
app_label1.ModelName1:
    pii: <description of the PII, field names, etc.>
    pii_types: <comma delimited list of types>
    pii_retirement: <description of the retirement functionality>
app_label1.ModelName2:
    no_pii: <use `null` or leave empty and yaml will assume null>
...

Each top level key is the model path in django's standard representation (app label in snake_case, and model class name in CamelCase, delimited by a period).  The values are themselves mappings containing the standard PII attributes in semantic YAML (rather than rST tags).

OEP-30 tooling should expect to find this file at the root of the repository next to .piiconf.
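The following is a minimal validation sketch for the structure described above. It takes the mapping that a YAML loader (for example PyYAML's `yaml.safe_load`) would return for .pii_safe_list.yaml, so the sketch itself needs no third-party imports. The exact rules are assumptions for illustration:

* Keys must look like `app_label.ModelName`.
* Each value must either be a lone `no_pii` key or provide all of `pii`, `pii_types` and `pii_retirement`.

```python
import re

# "app_label.ModelName": snake_case app label, CamelCase model class name.
MODEL_KEY_RE = re.compile(r"^[a-z][a-z0-9_]*\.[A-Z][A-Za-z0-9]*$")
POSITIVE_KEYS = {"pii", "pii_types", "pii_retirement"}


def validate_safelist(safelist):
    """Return a list of human-readable errors; an empty list means the safelist looks valid."""
    errors = []
    for key, annotation in safelist.items():
        if not MODEL_KEY_RE.match(key):
            errors.append("{}: not of the form app_label.ModelName".format(key))
        if not isinstance(annotation, dict):
            # e.g. "auth.User:" with nothing under it loads as None.
            errors.append("{}: missing annotation".format(key))
            continue
        keys = set(annotation)
        if keys == {"no_pii"}:
            continue  # negative annotation; its value may be null
        missing = POSITIVE_KEYS - keys
        extra = keys - POSITIVE_KEYS
        if missing or extra:
            errors.append("{}: expected no_pii alone or all of {} (missing {}, unexpected {})".format(
                key, sorted(POSITIVE_KEYS), sorted(missing), sorted(extra)))
        elif not annotation["pii_types"]:
            errors.append("{}: pii_types is empty".format(key))
    return errors
```

Returning a list of errors, rather than raising on the first one, lets a checker report every unannotated 3rd party model in a single run.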

Enforceable Model Discovery

PLAT-2357 provides an example code stub for iterating over installed django models. I've incorporated the findings on this page into the code stub below, which will serve as an updated starting point for the "django" plugin. It discovers all apps, whether local, satellite, or 3rd party, unless set to local-only mode by assigning an app label to local_app_label.

Script for listing all enforceable models
# Run from a configured Django shell (e.g. via manage.py shell) so that the app registry is loaded.
import inspect
from django.apps import apps
from django.db import models

# Set to the app label provided by this repository to enable local-only mode.
local_app_label = None
# local_app_label = 'milestones'

def enforceable(model):
    """
    Return True if the given model actually requires annotations according to PLAT-2344.
    """
    return (
        issubclass(model, models.Model)
        and model is not models.Model  # models.Model itself has no _meta
        and not model._meta.abstract
        and not model._meta.proxy
    )

for app in apps.get_app_configs():
    if local_app_label and app.label != local_app_label:
        # We are in local-only mode, so skip apps which are not provided by this repository.
        continue
    found_models = []
    for root_model in app.get_models():
        # Only walk the model inheritance hierarchy when NOT in local-only mode.
        if local_app_label:
            hierarchy = [root_model]
        else:
            # getmro() includes the _entire_ inheritance closure, not just the direct inherited classes.
            hierarchy = inspect.getmro(root_model)
        for model in hierarchy:
            if enforceable(model):
                # model._meta.app_label is the lowercase snake_case representation of the app.
                # model._meta.object_name is the CamelCase representation of the model.
                found_models.append('{}.{}'.format(model._meta.app_label, model._meta.object_name))
    if found_models:
        print('Found enforceable models via the {} app:'.format(app.label))
        for found in found_models:
            print('  {}'.format(found))

In devstack, it prints this:

Example list of enforceable models in edx-platform in devstack
Found enforceable models via the auth app:
  auth.Permission
  auth.Group
  auth.User
Found enforceable models via the contenttypes app:
  contenttypes.ContentType
Found enforceable models via the redirects app:
  redirects.Redirect
Found enforceable models via the sessions app:
  sessions.Session
Found enforceable models via the sites app:
  sites.Site
Found enforceable models via the djcelery app:
  djcelery.TaskMeta
  djcelery.TaskSetMeta
  djcelery.IntervalSchedule
  djcelery.CrontabSchedule
  djcelery.PeriodicTasks
  djcelery.PeriodicTask
  djcelery.WorkerState
  djcelery.TaskState
Found enforceable models via the waffle app:
  waffle.Flag
  waffle.Switch
  waffle.Sample
Found enforceable models via the status app:
  status.GlobalStatusMessage
  status.CourseMessage
Found enforceable models via the static_replace app:
  static_replace.AssetBaseUrlConfig
  static_replace.AssetExcludedExtensionsConfig
Found enforceable models via the contentserver app:
  contentserver.CourseAssetCacheTtlConfig
  contentserver.CdnUserAgentsConfig
Found enforceable models via the site_configuration app:
  site_configuration.SiteConfiguration
  site_configuration.SiteConfigurationHistory
Found enforceable models via the video_config app:
  video_config.HLSPlaybackEnabledFlag
  video_config.CourseHLSPlaybackEnabledFlag
  video_config.VideoTranscriptEnabledFlag
  video_config.CourseVideoTranscriptEnabledFlag
  video_config.TranscriptMigrationSetting
  video_config.MigrationEnqueuedCourse
  video_config.VideoThumbnailSetting
  video_config.UpdatedCourseVideos
Found enforceable models via the video_pipeline app:
  video_pipeline.VideoPipelineIntegration
  video_pipeline.VideoUploadsEnabledByDefault
  video_pipeline.CourseVideoUploadsEnabledByDefault
Found enforceable models via the courseware app:
  courseware.StudentModule
  courseware.StudentModuleHistory
  courseware.XModuleUserStateSummaryField
  courseware.XModuleStudentPrefsField
  courseware.XModuleStudentInfoField
  courseware.OfflineComputedGrade
  courseware.OfflineComputedGradeLog
  courseware.StudentFieldOverride
  courseware.DynamicUpgradeDeadlineConfiguration
  courseware.CourseDynamicUpgradeDeadlineConfiguration
  courseware.OrgDynamicUpgradeDeadlineConfiguration
Found enforceable models via the student app:
  student.AnonymousUserId
  student.UserStanding
  student.UserProfile
  student.UserSignupSource
  student.UserTestGroup
  student.Registration
  student.PendingNameChange
  student.PendingEmailChange
  student.PasswordHistory
  student.LoginFailures
  student.CourseEnrollment
  student.ManualEnrollmentAudit
  student.CourseEnrollmentAllowed
  student.CourseAccessRole
  student.DashboardConfiguration
  student.LinkedInAddToProfileConfiguration
  student.EntranceExamConfiguration
  student.LanguageProficiency
  student.SocialLink
  student.CourseEnrollmentAttribute
  student.EnrollmentRefundConfiguration
  student.RegistrationCookieConfiguration
  student.UserAttribute
  student.LogoutViewConfiguration
Found enforceable models via the track app:
  track.TrackingLog
Found enforceable models via the util app:
  util.RateLimitConfiguration
Found enforceable models via the certificates app:
  certificates.CertificateWhitelist
  certificates.GeneratedCertificate
  certificates.CertificateGenerationHistory
  certificates.CertificateInvalidation
  certificates.ExampleCertificateSet
  certificates.ExampleCertificate
  certificates.CertificateGenerationCourseSetting
  certificates.CertificateGenerationConfiguration
  certificates.CertificateHtmlViewConfiguration
  certificates.CertificateTemplate
  certificates.CertificateTemplateAsset
Found enforceable models via the instructor_task app:
  instructor_task.InstructorTask
  instructor_task.GradeReportSetting
Found enforceable models via the course_groups app:
  course_groups.CourseUserGroup
  course_groups.CohortMembership
  course_groups.CourseUserGroupPartitionGroup
  course_groups.CourseCohortsSettings
  course_groups.CourseCohort
  course_groups.UnregisteredLearnerCohortAssignments
Found enforceable models via the bulk_email app:
  bulk_email.Target
  bulk_email.CohortTarget
  bulk_email.Target
  bulk_email.CourseModeTarget
  bulk_email.Target
  bulk_email.CourseEmail
  bulk_email.Optout
  bulk_email.CourseEmailTemplate
  bulk_email.CourseAuthorization
  bulk_email.BulkEmailFlag
Found enforceable models via the branding app:
  branding.BrandingInfoConfig
  branding.BrandingApiConfig
Found enforceable models via the external_auth app:
  external_auth.ExternalAuthMap
Found enforceable models via the django_openid_auth app:
  django_openid_auth.Nonce
  django_openid_auth.Association
  django_openid_auth.UserOpenID
Found enforceable models via the oauth2 app:
  oauth2.Client
  oauth2.Grant
  oauth2.AccessToken
  oauth2.RefreshToken
Found enforceable models via the edx_oauth2_provider app:
  edx_oauth2_provider.TrustedClient
Found enforceable models via the oauth2_provider app:
  oauth2_provider.Application
  oauth2_provider.Grant
  oauth2_provider.AccessToken
  oauth2_provider.RefreshToken
Found enforceable models via the oauth_dispatch app:
  oauth_dispatch.RestrictedApplication
  oauth_dispatch.ApplicationAccess
  oauth_dispatch.ApplicationOrganization
Found enforceable models via the third_party_auth app:
  third_party_auth.OAuth2ProviderConfig
  third_party_auth.SAMLConfiguration
  third_party_auth.SAMLProviderConfig
  third_party_auth.SAMLProviderData
  third_party_auth.LTIProviderConfig
  third_party_auth.ProviderApiPermissions
Found enforceable models via the oauth_provider app:
  oauth_provider.Nonce
  oauth_provider.Scope
  oauth_provider.Scope
  oauth_provider.Consumer
  oauth_provider.Token
Found enforceable models via the wiki app:
  wiki.Article
  wiki.ArticleForObject
  wiki.ArticleRevision
  wiki.ArticlePlugin
  wiki.ReusablePlugin
  wiki.ArticlePlugin
  wiki.SimplePlugin
  wiki.ArticlePlugin
  wiki.RevisionPlugin
  wiki.ArticlePlugin
  wiki.RevisionPluginRevision
  wiki.URLPath
Found enforceable models via the django_notify app:
  django_notify.NotificationType
  django_notify.Settings
  django_notify.Subscription
  django_notify.Notification
Found enforceable models via the admin app:
  admin.LogEntry
Found enforceable models via the django_comment_common app:
  django_comment_common.Role
  django_comment_common.Permission
  django_comment_common.ForumsConfig
  django_comment_common.CourseDiscussionSettings
  django_comment_common.DiscussionsIdMapping
Found enforceable models via the notes app:
  notes.Note
Found enforceable models via the splash app:
  splash.SplashConfig
Found enforceable models via the user_api app:
  user_api.UserPreference
  user_api.UserCourseTag
  user_api.UserOrgTag
  user_api.RetirementState
  user_api.UserRetirementPartnerReportingStatus
  user_api.UserRetirementRequest
  user_api.UserRetirementStatus
Found enforceable models via the shoppingcart app:
  shoppingcart.Order
  shoppingcart.OrderItem
  shoppingcart.Invoice
  shoppingcart.InvoiceTransaction
  shoppingcart.InvoiceItem
  shoppingcart.CourseRegistrationCodeInvoiceItem
  shoppingcart.InvoiceItem
  shoppingcart.InvoiceHistory
  shoppingcart.CourseRegistrationCode
  shoppingcart.RegistrationCodeRedemption
  shoppingcart.Coupon
  shoppingcart.CouponRedemption
  shoppingcart.PaidCourseRegistration
  shoppingcart.OrderItem
  shoppingcart.CourseRegCodeItem
  shoppingcart.OrderItem
  shoppingcart.CourseRegCodeItemAnnotation
  shoppingcart.PaidCourseRegistrationAnnotation
  shoppingcart.CertificateItem
  shoppingcart.OrderItem
  shoppingcart.DonationConfiguration
  shoppingcart.Donation
  shoppingcart.OrderItem
Found enforceable models via the course_modes app:
  course_modes.CourseMode
  course_modes.CourseModesArchive
  course_modes.CourseModeExpirationConfig
Found enforceable models via the entitlements app:
  entitlements.CourseEntitlementPolicy
  entitlements.CourseEntitlement
  entitlements.CourseEntitlementSupportDetail
Found enforceable models via the verify_student app:
  verify_student.ManualVerification
  verify_student.SSOVerification
  verify_student.SoftwareSecurePhotoVerification
  verify_student.VerificationDeadline
Found enforceable models via the dark_lang app:
  dark_lang.DarkLangConfig
Found enforceable models via the microsite_configuration app:
  microsite_configuration.Microsite
  microsite_configuration.MicrositeHistory
  microsite_configuration.MicrositeOrganizationMapping
  microsite_configuration.MicrositeTemplate
Found enforceable models via the rss_proxy app:
  rss_proxy.WhitelistedRssUrl
Found enforceable models via the embargo app:
  embargo.EmbargoedCourse
  embargo.EmbargoedState
  embargo.RestrictedCourse
  embargo.Country
  embargo.CountryAccessRule
  embargo.CourseAccessRuleHistory
  embargo.IPFilter
Found enforceable models via the course_action_state app:
  course_action_state.CourseRerunState
Found enforceable models via the mobile_api app:
  mobile_api.MobileApiConfig
  mobile_api.AppVersionConfig
  mobile_api.IgnoreMobileAvailableFlagConfig
Found enforceable models via the social_django app:
  social_django.UserSocialAuth
  social_django.Nonce
  social_django.Association
  social_django.Code
  social_django.Partial
Found enforceable models via the survey app:
  survey.SurveyForm
  survey.SurveyAnswer
Found enforceable models via the lms_xblock app:
  lms_xblock.XBlockAsidesConfig
Found enforceable models via the problem_builder app:
  problem_builder.Answer
  problem_builder.Share
Found enforceable models via the submissions app:
  submissions.StudentItem
  submissions.Submission
  submissions.Score
  submissions.ScoreSummary
  submissions.ScoreAnnotation
Found enforceable models via the assessment app:
  assessment.Rubric
  assessment.Criterion
  assessment.CriterionOption
  assessment.Assessment
  assessment.AssessmentPart
  assessment.AssessmentFeedbackOption
  assessment.AssessmentFeedback
  assessment.PeerWorkflow
  assessment.PeerWorkflowItem
  assessment.TrainingExample
  assessment.StudentTrainingWorkflow
  assessment.StudentTrainingWorkflowItem
  assessment.StaffWorkflow
Found enforceable models via the workflow app:
  workflow.AssessmentWorkflow
  workflow.AssessmentWorkflowStep
  workflow.AssessmentWorkflowCancellation
Found enforceable models via the edxval app:
  edxval.Profile
  edxval.Video
  edxval.CourseVideo
  edxval.EncodedVideo
  edxval.VideoImage
  edxval.VideoTranscript
  edxval.TranscriptPreference
  edxval.ThirdPartyTranscriptCredentialsState
Found enforceable models via the course_overviews app:
  course_overviews.CourseOverview
  course_overviews.CourseOverviewTab
  course_overviews.CourseOverviewImageSet
  course_overviews.CourseOverviewImageConfig
Found enforceable models via the block_structure app:
  block_structure.BlockStructureConfiguration
  block_structure.BlockStructureModel
Found enforceable models via the cors_csrf app:
  cors_csrf.XDomainProxyConfiguration
Found enforceable models via the commerce app:
  commerce.CommerceConfiguration
Found enforceable models via the credit app:
  credit.CreditProvider
  credit.CreditCourse
  credit.CreditRequirement
  credit.CreditRequirementStatus
  credit.CreditEligibility
  credit.CreditRequest
  credit.CreditConfig
Found enforceable models via the teams app:
  teams.CourseTeam
  teams.CourseTeamMembership
Found enforceable models via the xblock_django app:
  xblock_django.XBlockConfiguration
  xblock_django.XBlockStudioConfigurationFlag
  xblock_django.XBlockStudioConfiguration
Found enforceable models via the programs app:
  programs.ProgramsApiConfig
Found enforceable models via the catalog app:
  catalog.CatalogIntegration
Found enforceable models via the self_paced app:
  self_paced.SelfPacedConfiguration
Found enforceable models via the thumbnail app:
  thumbnail.KVStore
Found enforceable models via the milestones app:
  milestones.Milestone
  milestones.MilestoneRelationshipType
  milestones.CourseMilestone
  milestones.CourseContentMilestone
  milestones.UserMilestone
Found enforceable models via the api_admin app:
  api_admin.ApiAccessRequest
  api_admin.ApiAccessConfig
  api_admin.Catalog
Found enforceable models via the verified_track_content app:
  verified_track_content.VerifiedTrackCohortedCourse
  verified_track_content.MigrateVerifiedTrackCohortsSetting
Found enforceable models via the badges app:
  badges.BadgeClass
  badges.BadgeAssertion
  badges.CourseCompleteImageConfiguration
  badges.CourseEventBadgesConfiguration
Found enforceable models via the email_marketing app:
  email_marketing.EmailMarketingConfiguration
Found enforceable models via the celery_utils app:
  celery_utils.FailedTask
  celery_utils.ChordData
Found enforceable models via the crawlers app:
  crawlers.CrawlersConfig
Found enforceable models via the waffle_utils app:
  waffle_utils.WaffleFlagCourseOverrideModel
Found enforceable models via the course_goals app:
  course_goals.CourseGoal
Found enforceable models via the experiments app:
  experiments.ExperimentData
  experiments.ExperimentKeyValue
Found enforceable models via the edx_proctoring app:
  edx_proctoring.ProctoredExam
  edx_proctoring.ProctoredExamReviewPolicy
  edx_proctoring.ProctoredExamReviewPolicyHistory
  edx_proctoring.ProctoredExamStudentAttempt
  edx_proctoring.ProctoredExamStudentAttemptHistory
  edx_proctoring.ProctoredExamStudentAllowance
  edx_proctoring.ProctoredExamStudentAllowanceHistory
  edx_proctoring.ProctoredExamSoftwareSecureReview
  edx_proctoring.ProctoredExamSoftwareSecureReviewHistory
  edx_proctoring.ProctoredExamSoftwareSecureComment
Found enforceable models via the organizations app:
  organizations.Organization
  organizations.OrganizationCourse
Found enforceable models via the enterprise app:
  enterprise.HistoricalEnterpriseCustomer
  enterprise.EnterpriseCustomer
  enterprise.EnterpriseCustomerUser
  enterprise.PendingEnterpriseCustomerUser
  enterprise.PendingEnrollment
  enterprise.EnterpriseCustomerBrandingConfiguration
  enterprise.EnterpriseCustomerIdentityProvider
  enterprise.HistoricalEnterpriseCustomerEntitlement
  enterprise.EnterpriseCustomerEntitlement
  enterprise.HistoricalEnterpriseCourseEnrollment
  enterprise.EnterpriseCourseEnrollment
  enterprise.HistoricalEnterpriseCustomerCatalog
  enterprise.EnterpriseCustomerCatalog
  enterprise.HistoricalEnrollmentNotificationEmailTemplate
  enterprise.EnrollmentNotificationEmailTemplate
  enterprise.EnterpriseCustomerReportingConfiguration
Found enforceable models via the consent app:
  consent.HistoricalDataSharingConsent
  consent.DataSharingConsent
  consent.DataSharingConsentTextOverrides
Found enforceable models via the integrated_channel app:
  integrated_channel.LearnerDataTransmissionAudit
  integrated_channel.ContentMetadataItemTransmission
Found enforceable models via the degreed app:
  degreed.DegreedGlobalConfiguration
  degreed.HistoricalDegreedEnterpriseCustomerConfiguration
  degreed.DegreedEnterpriseCustomerConfiguration
  degreed.DegreedLearnerDataTransmissionAudit
Found enforceable models via the sap_success_factors app:
  sap_success_factors.SAPSuccessFactorsGlobalConfiguration
  sap_success_factors.SAPSuccessFactorsEnterpriseCustomerConfiguration
  sap_success_factors.SapSuccessFactorsLearnerDataTransmissionAudit
Found enforceable models via the xapi app:
  xapi.XAPILRSConfiguration
Found enforceable models via the schedules app:
  schedules.Schedule
  schedules.ScheduleConfig
  schedules.ScheduleExperience
Found enforceable models via the grades app:
  grades.PersistentGradesEnabledFlag
  grades.CoursePersistentGradesFlag
  grades.ComputeGradesSetting
  grades.VisibleBlocks
  grades.PersistentSubsectionGrade
  grades.PersistentCourseGrade
  grades.PersistentSubsectionGradeOverride
Found enforceable models via the credentials app:
  credentials.CredentialsApiConfig
  credentials.NotifyCredentialsConfig
Found enforceable models via the bookmarks app:
  bookmarks.Bookmark
  bookmarks.XBlockCache
Found enforceable models via the theming app:
  theming.SiteTheme
Found enforceable models via the completion app:
  completion.BlockCompletion
Found enforceable models via the coursewarehistoryextended app:
  coursewarehistoryextended.StudentModuleHistoryExtended
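Some models appear more than once in the output above, for example bulk_email.Target, wiki.ArticlePlugin and shoppingcart.OrderItem. This happens because inspect.getmro() returns a shared concrete parent once for every child model that is walked. Below is a minimal, order-preserving way to collect each model only once. It is written against plain classes and a caller-supplied predicate (a stand-in for enforceable()) so that it runs without Django. The function name is hypothetical.

```python
import inspect


def collect_models(root_models, is_enforceable):
    """
    Walk the MRO of every root model and return each enforceable class
    exactly once, in first-seen order.
    """
    seen = set()
    found = []
    for root_model in root_models:
        for model in inspect.getmro(root_model):
            if model not in seen and is_enforceable(model):
                seen.add(model)
                found.append(model)
    return found
```

When not in local-only mode, the inner loop of the discovery script could then become a call such as collect_models(app.get_models(), enforceable). Moving the seen set outside the per-app loop would additionally report a model only once across all apps.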

3rd Party App Discovery

In order to seed the initial safelist for a given IDA, we shouldn't just start with every "enforceable" model, since many of those models do not belong in the safelist. For instance, the safelist should never include models that are defined locally in the same codebase as itself. It is also important to note the difference between "non-local" and "3rd party" with respect to an IDA codebase. "Non-local" includes 3rd party apps, but also edx-owned pip-installed apps (aka edx satellite apps).
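One way to tell local from non-local code is by where a model's source file lives. The sketch below is a hypothetical, path-only heuristic. It takes the file path that inspect.getsourcefile() would return, so it runs without Django. It applies these rules:

* Files under the checked-out repository are local.
* Anything under a site-packages or dist-packages directory is non-local.
* Anything under the virtualenv's src directory is non-local, since that is where pip checks out editable VCS installs by default.

It cannot tell 3rd party apps apart from edx satellite apps; that would need extra information, such as who owns the installed package.

```python
import os


def is_local_source(source_file, repo_root, venv_prefix):
    """
    Return True if source_file belongs to the IDA's own checked-out code.

    repo_root: the repository checkout (hypothetical parameter).
    venv_prefix: the virtualenv root, i.e. sys.prefix when run inside it.
    """
    path = os.path.realpath(source_file)
    parts = path.split(os.sep)
    if "site-packages" in parts or "dist-packages" in parts:
        return False  # pip-installed package: 3rd party or edx satellite app
    editable_src = os.path.join(os.path.realpath(venv_prefix), "src")
    if os.path.commonpath([path, editable_src]) == editable_src:
        return False  # editable (-e git+...) checkout made by pip
    repo = os.path.realpath(repo_root)
    return os.path.commonpath([path, repo]) == repo
```

The site-packages and editable-install checks run before the repository check, so a virtualenv that happens to live inside the repository checkout is still classified correctly.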

This script will generate a starting point for a safelist for any IDA, containing all models in all non-local applications:

Script for listing non-local models
# Run from a configured Django shell (e.g. via manage.py shell) so that the app registry is loaded.
import inspect
import os
from django.apps import apps
from django.db import models

def enforceable(model):
    """
    Return True if the given model actually requires annotations according to PLAT-2344.
    """
    return (
        issubclass(model, models.Model)
        and model is not models.Model  # models.Model itself has no _meta
        and not model._meta.abstract
        and not model._meta.proxy
    )

def is_not_local(model):
    """
    Return True if the given model is NOT local to the current IDA.
    """
    # "site-packages" in the source path is a dead giveaway that the source code
    # came from far away (a pip-installed 3rd party or satellite app).  Checking
    # the path components works for any Python version, unlike a hard-coded
    # sys.prefix + '/local/lib/python2.7/site-packages/' prefix.  Packages pip
    # installs in editable mode from VCS (checked out under sys.prefix + '/src/'
    # by default in a virtualenv) are not in site-packages, so this check
    # treats them as local.
    source_file = inspect.getsourcefile(model) or ''
    return 'site-packages' in source_file.split(os.sep)

for app in apps.get_app_configs():
    for root_model in app.get_models():
        # getmro() includes the _entire_ inheritance closure, not just the direct inherited classes.
        hierarchy = inspect.getmro(root_model)
        for model in hierarchy:
            if enforceable(model) and is_not_local(model):
                model_path = inspect.getsourcefile(model).split('site-packages/')[-1]
                # model._meta.app_label is the lowercase snake_case representation of the app.
                # model._meta.object_name is the CamelCase representation of the model.
                print('{}.{}:  # via {}'.format(
                    model._meta.app_label,
                    model._meta.object_name,
                    model_path,
                ))

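which generates the following safelist (.pii_safe_list.yaml) starting point for edx-platform. Note that it parses as YAML, but will fail the OEP-30 checker tooling since none of the 3rd party models are annotated yet. It also lists oauth_provider.Scope twice. YAML mapping keys are expected to be unique, so that entry should be collapsed into one when the safelist is seeded.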

.pii_safe_list.yaml.EXAMPLE
auth.Permission:  # via django/contrib/auth/models.py
auth.Group:  # via django/contrib/auth/models.py
auth.User:  # via django/contrib/auth/models.py
contenttypes.ContentType:  # via django/contrib/contenttypes/models.py
redirects.Redirect:  # via django/contrib/redirects/models.py
sessions.Session:  # via django/contrib/sessions/models.py
sites.Site:  # via django/contrib/sites/models.py
djcelery.TaskMeta:  # via djcelery/models.py
djcelery.TaskSetMeta:  # via djcelery/models.py
djcelery.IntervalSchedule:  # via djcelery/models.py
djcelery.CrontabSchedule:  # via djcelery/models.py
djcelery.PeriodicTasks:  # via djcelery/models.py
djcelery.PeriodicTask:  # via djcelery/models.py
djcelery.WorkerState:  # via djcelery/models.py
djcelery.TaskState:  # via djcelery/models.py
waffle.Flag:  # via waffle/models.py
waffle.Switch:  # via waffle/models.py
waffle.Sample:  # via waffle/models.py
django_openid_auth.Nonce:  # via django_openid_auth/models.py
django_openid_auth.Association:  # via django_openid_auth/models.py
django_openid_auth.UserOpenID:  # via django_openid_auth/models.py
oauth2.Client:  # via provider/oauth2/models.py
oauth2.Grant:  # via provider/oauth2/models.py
oauth2.AccessToken:  # via provider/oauth2/models.py
oauth2.RefreshToken:  # via provider/oauth2/models.py
edx_oauth2_provider.TrustedClient:  # via edx_oauth2_provider/models.py
oauth2_provider.Application:  # via oauth2_provider/models.py
oauth2_provider.Grant:  # via oauth2_provider/models.py
oauth2_provider.AccessToken:  # via oauth2_provider/models.py
oauth2_provider.RefreshToken:  # via oauth2_provider/models.py
oauth_provider.Nonce:  # via oauth_provider/models.py
oauth_provider.Scope:  # via oauth_provider/models.py
oauth_provider.Scope:  # via oauth_provider/models.py
oauth_provider.Consumer:  # via oauth_provider/models.py
oauth_provider.Token:  # via oauth_provider/models.py
admin.LogEntry:  # via django/contrib/admin/models.py
splash.SplashConfig:  # via splash/models.py
social_django.UserSocialAuth:  # via social_django/models.py
social_django.Nonce:  # via social_django/models.py
social_django.Association:  # via social_django/models.py
social_django.Code:  # via social_django/models.py
social_django.Partial:  # via social_django/models.py
problem_builder.Answer:  # via problem_builder/models.py
problem_builder.Share:  # via problem_builder/models.py
submissions.StudentItem:  # via submissions/models.py
submissions.Submission:  # via submissions/models.py
submissions.Score:  # via submissions/models.py
submissions.ScoreSummary:  # via submissions/models.py
submissions.ScoreAnnotation:  # via submissions/models.py
assessment.Rubric:  # via openassessment/assessment/models/base.py
assessment.Criterion:  # via openassessment/assessment/models/base.py
assessment.CriterionOption:  # via openassessment/assessment/models/base.py
assessment.Assessment:  # via openassessment/assessment/models/base.py
assessment.AssessmentPart:  # via openassessment/assessment/models/base.py
assessment.AssessmentFeedbackOption:  # via openassessment/assessment/models/peer.py
assessment.AssessmentFeedback:  # via openassessment/assessment/models/peer.py
assessment.PeerWorkflow:  # via openassessment/assessment/models/peer.py
assessment.PeerWorkflowItem:  # via openassessment/assessment/models/peer.py
assessment.TrainingExample:  # via openassessment/assessment/models/training.py
assessment.StudentTrainingWorkflow:  # via openassessment/assessment/models/student_training.py
assessment.StudentTrainingWorkflowItem:  # via openassessment/assessment/models/student_training.py
assessment.StaffWorkflow:  # via openassessment/assessment/models/staff.py
workflow.AssessmentWorkflow:  # via openassessment/workflow/models.py
workflow.AssessmentWorkflowStep:  # via openassessment/workflow/models.py
workflow.AssessmentWorkflowCancellation:  # via openassessment/workflow/models.py
edxval.Profile:  # via edxval/models.py
edxval.Video:  # via edxval/models.py
edxval.CourseVideo:  # via edxval/models.py
edxval.EncodedVideo:  # via edxval/models.py
edxval.VideoImage:  # via edxval/models.py
edxval.VideoTranscript:  # via edxval/models.py
edxval.TranscriptPreference:  # via edxval/models.py
edxval.ThirdPartyTranscriptCredentialsState:  # via edxval/models.py
thumbnail.KVStore:  # via sorl/thumbnail/models.py
milestones.Milestone:  # via milestones/models.py
milestones.MilestoneRelationshipType:  # via milestones/models.py
milestones.CourseMilestone:  # via milestones/models.py
milestones.CourseContentMilestone:  # via milestones/models.py
milestones.UserMilestone:  # via milestones/models.py
celery_utils.FailedTask:  # via celery_utils/models.py
celery_utils.ChordData:  # via celery_utils/models.py
edx_proctoring.ProctoredExam:  # via edx_proctoring/models.py
edx_proctoring.ProctoredExamReviewPolicy:  # via edx_proctoring/models.py
edx_proctoring.ProctoredExamReviewPolicyHistory:  # via edx_proctoring/models.py
edx_proctoring.ProctoredExamStudentAttempt:  # via edx_proctoring/models.py
edx_proctoring.ProctoredExamStudentAttemptHistory:  # via edx_proctoring/models.py
edx_proctoring.ProctoredExamStudentAllowance:  # via edx_proctoring/models.py
edx_proctoring.ProctoredExamStudentAllowanceHistory:  # via edx_proctoring/models.py
edx_proctoring.ProctoredExamSoftwareSecureReview:  # via edx_proctoring/models.py
edx_proctoring.ProctoredExamSoftwareSecureReviewHistory:  # via edx_proctoring/models.py
edx_proctoring.ProctoredExamSoftwareSecureComment:  # via edx_proctoring/models.py
organizations.Organization:  # via organizations/models.py
organizations.OrganizationCourse:  # via organizations/models.py
enterprise.HistoricalEnterpriseCustomer:  # via enterprise/models.py
enterprise.EnterpriseCustomer:  # via enterprise/models.py
enterprise.EnterpriseCustomerUser:  # via enterprise/models.py
enterprise.PendingEnterpriseCustomerUser:  # via enterprise/models.py
enterprise.PendingEnrollment:  # via enterprise/models.py
enterprise.EnterpriseCustomerBrandingConfiguration:  # via enterprise/models.py
enterprise.EnterpriseCustomerIdentityProvider:  # via enterprise/models.py
enterprise.HistoricalEnterpriseCustomerEntitlement:  # via enterprise/models.py
enterprise.EnterpriseCustomerEntitlement:  # via enterprise/models.py
enterprise.HistoricalEnterpriseCourseEnrollment:  # via enterprise/models.py
enterprise.EnterpriseCourseEnrollment:  # via enterprise/models.py
enterprise.HistoricalEnterpriseCustomerCatalog:  # via enterprise/models.py
enterprise.EnterpriseCustomerCatalog:  # via enterprise/models.py
enterprise.HistoricalEnrollmentNotificationEmailTemplate:  # via enterprise/models.py
enterprise.EnrollmentNotificationEmailTemplate:  # via enterprise/models.py
enterprise.EnterpriseCustomerReportingConfiguration:  # via enterprise/models.py
consent.HistoricalDataSharingConsent:  # via consent/models.py
consent.DataSharingConsent:  # via consent/models.py
consent.DataSharingConsentTextOverrides:  # via consent/models.py
integrated_channel.LearnerDataTransmissionAudit:  # via integrated_channels/integrated_channel/models.py
integrated_channel.ContentMetadataItemTransmission:  # via integrated_channels/integrated_channel/models.py
degreed.DegreedGlobalConfiguration:  # via integrated_channels/degreed/models.py
degreed.HistoricalDegreedEnterpriseCustomerConfiguration:  # via integrated_channels/degreed/__init__.py
degreed.DegreedEnterpriseCustomerConfiguration:  # via integrated_channels/degreed/models.py
degreed.DegreedLearnerDataTransmissionAudit:  # via integrated_channels/degreed/models.py
sap_success_factors.SAPSuccessFactorsGlobalConfiguration:  # via integrated_channels/sap_success_factors/models.py
sap_success_factors.SAPSuccessFactorsEnterpriseCustomerConfiguration:  # via integrated_channels/sap_success_factors/models.py
sap_success_factors.SapSuccessFactorsLearnerDataTransmissionAudit:  # via integrated_channels/sap_success_factors/models.py
xapi.XAPILRSConfiguration:  # via integrated_channels/xapi/models.py
completion.BlockCompletion:  # via completion/models.py

Forked repositories

Forked repos are a grey area for annotations. On one hand, we could freely annotate them because we have merge rights. On the other hand, the more we deviate from upstream, the harder it will be to merge upstream changes back in. For this reason, we have come to the following decision:

Implementation Decision: Treat forked repositories as 3rd party.  Do not annotate models in them directly, but rather use the 3rd party annotation mechanism (the safelist).

Generating RST docs (PLAT-2346)

In short, the Python libraries that offer RST generation are very basic, and no single one of them offers even the basic features we need for annotation reports. Fortunately, RST is meant to be human-readable and isn't painful to work with in raw form, so we should focus on constructing raw RST ourselves for annotation reporting. See PLAT-2346 for more details.
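
As a rough illustration of how little is involved, the sketch below builds raw RST from a report using only string formatting. The function names `rst_section` and `render_report_rst` are hypothetical. The input is assumed to follow the dict shape described under Reporting Output, where filename keys map to stand-alone annotations or annotation groups.

```python
def rst_section(title, underline_char):
    """Return an RST section title as lines: the title, an equal-length underline, a blank line."""
    return [title, underline_char * len(title), ""]


def render_report_rst(report):
    """
    Render an annotation report as raw RST.

    Hypothetical helper: `report` maps filenames to lists of entries, where each
    entry is either a stand-alone annotation or a group with an "annotations" list.
    """
    lines = rst_section("Annotation Report", "=")
    for filename in sorted(report):
        lines.extend(rst_section(filename, "-"))
        for entry in report[filename]:
            # Flatten groups so that every annotation becomes one bullet item
            for annotation in entry.get("annotations", [entry]):
                data = annotation["annotation_data"]
                if isinstance(data, list):
                    # Enum annotations carry a list of values
                    data = ", ".join(data)
                lines.append("* Line {}: ``{}`` {}".format(
                    annotation["line_number"], annotation["annotation_token"], data))
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    sample_report = {
        "/openedx/core/djangoapps/pii_enforcer/pii_searcher.py": [
            {
                "annotation_data": "No PII is stored here",
                "annotation_token": ".. no_pii::",
                "line_number": 2,
                "found_by": ["python"],
            },
        ],
    }
    print(render_report_rst(sample_report))
```

An RST section title only needs an underline at least as long as its text. Computing the underline from `len(title)` is therefore all the heading logic this requires.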

Extensions (PLAT-2365)

There are several languages that may need to be searched, each with its own comment style and challenges. Not every repository will need every language, so our goals are to:

  • Allow extensions to add new languages without having to update the tool
  • Allow configuration to define which filename extensions map to which annotation extensions
  • Run these extensions only for static analysis. Deeper-level introspection, such as we use for Django models, would probably need to be scripted in the parent language, but could still output report files in a format that our other tools can consume
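
One way to meet the second goal is to invert the configured mapping once. Each file is then handed only to the extensions that claim its filename extension. This is a minimal sketch in which `build_dispatch_map` and `extensions_for` are hypothetical names. The input is assumed to be the `extensions` section of the configuration, which maps each extension name to a list of filename extensions.

```python
import os


def build_dispatch_map(extensions_config):
    """
    Invert {extension name: [filename extensions]} into
    {filename extension: [extension names]}, with names sorted for stable output.
    """
    dispatch = {}
    for extension_name, filename_extensions in extensions_config.items():
        for filename_extension in filename_extensions:
            dispatch.setdefault(filename_extension, []).append(extension_name)
    for names in dispatch.values():
        names.sort()
    return dispatch


def extensions_for(filename, dispatch):
    """Return the extension names configured for a file, or [] if none handle it."""
    # os.path.splitext keeps the leading dot (".py"), so strip it to match the config
    filename_extension = os.path.splitext(filename)[1][1:]
    return dispatch.get(filename_extension, [])


if __name__ == "__main__":
    dispatch = build_dispatch_map({"python": ["py", "pyw"], "javascript": ["js", "jsx"]})
    print(extensions_for("models.py", dispatch))  # ['python']
    print(extensions_for("README", dispatch))  # []
```

The map stores a list for each filename extension, which leaves room for more than one extension to handle the same file type.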

To allow for simple adoption, ease of development, and consistency with other edX projects, we've settled on Stevedore to manage the extensions. We will use Stevedore's NamedExtensionManager to find installed plugins, and a YAML configuration section to map the desired filename extensions to the installed extensions. Extensions will inherit from an "abstract" base class that explicitly shows the necessary interface. Each extension should only need to implement two methods:

  1. validate: to examine the passed-in file and return any annotation formatting errors
  2. search: to actually search a passed-in file and return a list of found annotations
Sample base class and extension
class AnnotationExtension(object):
    """
    Abstract base class that annotation extensions will inherit from
    """
    def __init__(self, annotation_tokens):
        self.annotation_tokens = annotation_tokens

    def validate(self, file_handle):
        """
        Validates that any annotations in the given file are properly formatted 
        """
        raise NotImplementedError('validate called on base class!')

    def search(self, file_handle):
        """
        Does the actual annotation search for the given file
        """
        raise NotImplementedError('search called on base class!')


class PythonAnnotationExtension(AnnotationExtension):
    """
    Annotation extension for Python source files
    """
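    def validate(self, file_handle):
        results = []
        # ... check the annotations in file_handle, appending any formatting errors to results
        return results

    def search(self, file_handle):
        results = []
        # ... scan file_handle for self.annotation_tokens, appending each annotation found to results
        return results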
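
The following hypothetical searcher makes the `search` contract more concrete for languages that use `#` line comments. It assumes the annotation tokens have already been flattened into a list of strings, rather than passed as the nested `annotations` configuration section. It also only considers whole-line comments, so a real Python extension would likely need to handle docstrings as well. The result dicts use the same keys as the Reporting Output format, minus `found_by`.

```python
import io


class CommentAnnotationSearcher(object):
    """
    Hypothetical minimal searcher for '#' line comments.

    A line matches when its comment text starts with one of the annotation tokens;
    the rest of the comment is captured as the annotation data.
    """
    def __init__(self, annotation_tokens):
        # Try longer tokens first so a token that is a prefix of another cannot shadow it
        self.annotation_tokens = sorted(annotation_tokens, key=len, reverse=True)

    def search(self, file_handle):
        results = []
        for line_number, line in enumerate(file_handle, start=1):
            stripped = line.strip()
            if not stripped.startswith("#"):
                continue  # Only whole-line comments are considered
            comment = stripped.lstrip("#").strip()
            for token in self.annotation_tokens:
                if comment.startswith(token):
                    results.append({
                        "annotation_token": token,
                        "annotation_data": comment[len(token):].strip(),
                        "line_number": line_number,
                    })
                    break
        return results


if __name__ == "__main__":
    source = io.StringIO(
        "class Foo(object):\n"
        "    # .. no_pii:: No PII is stored here\n"
        "    pass\n"
    )
    searcher = CommentAnnotationSearcher([".. pii::", ".. no_pii::"])
    print(searcher.search(source))
```

Trailing comments on code lines (such as `x = 1  # .. no_pii::`) are deliberately ignored here. Whether they should count is a decision for the real extension.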

Plugins will need to be registered as entry points in the annotation_finder.searchers namespace, in either setup.py or setup.cfg.

Sample setup.py
from setuptools import setup

setup(
    # ... name, version, packages and other arguments omitted
    entry_points={
        'annotation_finder.searchers': [
            'python = extensions.python_extension:PythonAnnotationExtension',
            'javascript = plugins.javascript_extension:JavascriptAnnotationExtension',
        ],
    },
)
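
Once a package that declares these entry points is installed, the registration can be checked without loading any plugin code. The sketch below uses the standard library's `importlib.metadata` (Python 3.8+) to list what is registered in the namespace. This is roughly the discovery step Stevedore performs before `NamedExtensionManager` filters by name. `registered_searchers` is a hypothetical helper name.

```python
from importlib.metadata import entry_points

NAMESPACE = "annotation_finder.searchers"


def registered_searchers(namespace=NAMESPACE):
    """Return {entry point name: "module:attribute"} for the given namespace."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: select entry points by group
        group = eps.select(group=namespace)
    else:
        # Python 3.8/3.9: entry_points() returns a dict of group name -> entry points
        group = eps.get(namespace, [])
    return {ep.name: ep.value for ep in group}


if __name__ == "__main__":
    # With the sample setup.py installed, this would include the "python" and "javascript" entries
    print(registered_searchers())
```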

This script shows how we could find, validate configuration, and execute a search on all of the plugins:

test_extensions.py
import os
import yaml
from stevedore import named

test_config = """
annotations:
    pii:
        - ".. pii::"
        - ".. pii_types::":
            - id
            - name
            - other
        - ".. pii_retirement::":
            - retained
            - local_api
            - consumer_api
            - third_party
    nopii: ".. no_pii::"

extensions:
    python:
        - py
    javascript:
        - js
        - jsx
"""


def load_failed_handler(*args, **kwargs):
    """
    Callback for when we fail to load an extension, otherwise it fails silently
    """
    print(args)
    print(kwargs)


def search(ext, file_handle, file_extensions_map, filename_extension):
    """
    Executes a search on the given file, only if it is configured for this
    extension
    """
    if filename_extension not in file_extensions_map[ext.name]:
        print('{} does not support {}. Skipping.'.format(ext.name, filename_extension))
        return (ext.name, [])

    # Rewind first: mgr.map() passes the same handle to every extension in turn
    file_handle.seek(0)
    return ext.name, ext.obj.search(file_handle)


if __name__ == '__main__':
    config = yaml.safe_load(test_config)

    print(config)

    # These are the names of all of our configured extensions
    configured_extension_names = config['extensions'].keys()

    print(configured_extension_names)

    # Load Stevedore extensions that we are configured for (and only those)
    mgr = named.NamedExtensionManager(
        names=configured_extension_names,
        namespace='annotation_finder.searchers',
        invoke_on_load=True,
        on_load_failure_callback=load_failed_handler,
        invoke_args=(config['annotations'],),  # This is temporary
    )

    # Output all found extension entry points (whether or not they were loaded)
    print(mgr.list_entry_points())

    # Output all extensions that were actually able to load
    for extension in mgr.extensions:
        print(extension)

    # Map each extension name to its filename extensions, and collect every known filename extension
    file_extensions_map = {}
    known_extensions = set()
    for extension_name in config['extensions']:
        file_extensions_map[extension_name] = config['extensions'][extension_name]
        known_extensions.update(config['extensions'][extension_name])

    source_path = '/foo/bar/'

    # From here we could begin the actual file searching and reporting...
    # This is not optimized, but without the prints or doing any actual searching
    # it walks all of edx-platform in 1.18 seconds.
    for root, dirs, files in os.walk(source_path):
        for filename in files:
            filename_extension = os.path.splitext(filename)[1][1:]

            if not filename_extension:
                # Should we define a catchall in config?
                print("No file extension in {}, skipping.".format(os.path.join(root, filename)))
                continue

            if filename_extension not in known_extensions:
                print("{} is not a known extension, skipping.".format(filename_extension))
                continue

            full_name = os.path.join(root, filename)
            print(full_name)

            with open(full_name, 'r') as file_handle:
                # Call search() on all loaded extensions; each one skips files it isn't configured for
                results = mgr.map(search, file_handle, file_extensions_map, filename_extension)
                print(results)

Configuration (PLAT-2361)

Configuration for the annotation tooling needs to handle the following things:

  1. Describe the known annotation statements (the unique strings such as ".. pii::" that we search for), and their enum values (where necessary)
  2. State which extensions are to be used, and what filename extensions they are to be used for

We may also want to add options for things that are currently specified as command line options (input/output files and paths, safelist filename). Presumably those are simple enough to add at the top level if we choose to, and they don't need exposition here.


Config file format
# This section describes the known annotations
annotations:
    # An annotation can be a single statement that stands alone
    nopii: ".. no_pii::"

    # Or it can describe a group of statements, in which case 
    # the statements must appear in the same order as listed here
    pii:
        # A statement can be a simple value, in which case the
        # text that follows it will be captured
        - ".. pii::"

        # Or it can be an enum list, in which case only the values
        # included will be allowed. In this case a ".. pii::" 
        # annotation must be followed immediately by a 
        # ".. pii_types::" statement which must then be followed
        # immediately by a ".. pii_retirement::" statement.
        - ".. pii_types::":
            # Multiple enum values can be given on an annotation
            # as long as they are separated by spaces such as:
            # .. pii_types:: name username ip
            # An enum annotation must include at least one enum
            # value
            - id
            - name
            - username
            - password
            - location
            - phone_number
            - email_address
            - birth_date
            - ip
            - external_service
            - biography
            - gender
            - sex
            - image
            - video
            - other
        - ".. pii_retirement::":
            - retained
            - local_api
            - consumer_api
            - third_party

# This section is for extension configuration, each 
# sub-section is the name of a Stevedore extension
# that must be installed. Under each extension name
# is a list of file extensions that it will be used
# for.
extensions:
    python:
        - py
        - py3
        - pyw
        - rpy
        - pyt
    javascript:
        - js
        - jsx
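
The sketch below illustrates one way to enforce the enum rules above by checking the values on a single enum statement line. It assumes the `annotations` section has already been loaded into Python, for example with PyYAML's `yaml.safe_load`, which turns each `- ".. pii_types::":` entry into a one-key dict. The `pii_types` list is abbreviated. `enum_values` and `validate_enum_line` are hypothetical names, and checking statement order within a group is not covered.

```python
# Abbreviated form of what yaml.safe_load returns for the "annotations" section above
ANNOTATIONS = {
    "nopii": ".. no_pii::",
    "pii": [
        ".. pii::",
        {".. pii_types::": ["id", "name", "username", "email_address", "ip", "other"]},
        {".. pii_retirement::": ["retained", "local_api", "consumer_api", "third_party"]},
    ],
}


def enum_values(annotations_config, token):
    """Return the allowed values for an enum token, or None if the token is free text."""
    for group in annotations_config.values():
        if isinstance(group, str):
            continue  # A stand-alone statement such as nopii has no enum values
        for statement in group:
            if isinstance(statement, dict) and token in statement:
                return statement[token]
    return None


def validate_enum_line(annotations_config, token, data):
    """
    Check the space-separated values that follow `token`.
    Returns a list of error messages; an empty list means the line is valid.
    """
    allowed = enum_values(annotations_config, token)
    if allowed is None:
        return []  # Free-text statements capture whatever follows them
    values = data.split()
    if not values:
        return ["{} must include at least one value".format(token)]
    return ["{} is not a valid value for {}".format(value, token)
            for value in values if value not in allowed]


if __name__ == "__main__":
    print(validate_enum_line(ANNOTATIONS, ".. pii_types::", "name username ip"))  # []
    print(validate_enum_line(ANNOTATIONS, ".. pii_types::", "name shoe_size"))
```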

Reporting Output (PLAT-2350)

Reporting output from the tools should match the following format:


Reporting Output
# The top level is a dict whose keys are filenames relative to the search path
{
    '/openedx/core/djangoapps/pii_enforcer/pii_searcher.py':
    # Each key maps to a list of annotations
    [
        # Stand-alone annotations are formatted as follows:
        {
            'annotation_data': 'No PII is stored here',
            'annotation_token': '.. no_pii::',
            'line_number': 2,
            'found_by': ['python']  # These are the names of the extensions or scripts that found this annotation
        },
        {
            'annotation_data': 'We do not store PII in this model',
            'annotation_token': '.. no_pii::',
            'line_number': 17,
            'found_by': ['python']
        }
    ],
    '/openedx/core/djangoapps/user_api/legacy_urls.py': 
    [
        # Annotation groups are represented differently
        {
            'annotation_group': 'pii', # This is the name given to the group in configuration
            'annotations': 
            [
                {
                    'annotation_data': 'This model stores user addresses and phone numbers',
                    'annotation_token': '.. pii::',
                    'line_number': 16,
                    'found_by': ['python']
                },
                {
                    # In cases where the annotation type is an enum, "annotation_data" becomes a list
                    'annotation_data': ['address', 'phone_number'],
                    'annotation_token': '.. pii_types::',
                    'line_number': 17,
                    'found_by': ['python']
                },
                {
                    'annotation_data': ['local_api', 'consumer_api'],
                    'annotation_token': '.. pii_retirement::',
                    'line_number': 18,
                    'found_by': ['python']
                }
            ]
        }
    ]
}
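
Stand-alone annotations and annotation groups have different shapes, so consumers of this report need to handle both. Below is a minimal sketch of doing so. The helper names `iter_annotations` and `count_by_token` are hypothetical, and the sample data is taken from the example above.

```python
from collections import Counter


def iter_annotations(report):
    """
    Yield (filename, group_name, annotation) for every annotation in a report.
    group_name is None for stand-alone annotations.
    """
    for filename, entries in report.items():
        for entry in entries:
            if "annotation_group" in entry:
                for annotation in entry["annotations"]:
                    yield filename, entry["annotation_group"], annotation
            else:
                yield filename, None, entry


def count_by_token(report):
    """Count how many times each annotation token appears across the whole report."""
    return Counter(annotation["annotation_token"] for _, _, annotation in iter_annotations(report))


if __name__ == "__main__":
    report = {
        "/openedx/core/djangoapps/user_api/legacy_urls.py": [
            {
                "annotation_group": "pii",
                "annotations": [
                    {"annotation_data": "This model stores user addresses and phone numbers",
                     "annotation_token": ".. pii::", "line_number": 16, "found_by": ["python"]},
                    {"annotation_data": ["address", "phone_number"],
                     "annotation_token": ".. pii_types::", "line_number": 17, "found_by": ["python"]},
                ],
            },
        ],
    }
    for filename, group, annotation in iter_annotations(report):
        print(filename, group, annotation["annotation_token"], annotation["line_number"])
    print(count_by_token(report))
```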