OEP-30 outlines our intentions regarding PII annotations in code that runs on edx.org and other instances of openedx. This article is a place where we can collect decisions around implementation details and outputs from OEP-30 discovery tasks.
Original documents
OEP 30: https://open-edx-proposals.readthedocs.io/en/latest/oep-0030-arch-pii-markup-and-auditing.html
Scratch notes: https://docs.google.com/document/d/1fu2DzfkUGXYRXnXo-V3kGVYB4SB-Z_E4XfodkqKW6bg/edit
Annotating 3rd Party Django Models (PLAT-2344)
Model Inheritance and Mixins
In Django, there are three different types of model inheritance:
- Multi-Table Inheritance: Base model creates a table, and subclass models each create their own tables with OneToOne keys pointing up to the parent base model.
- Abstract Base Model: Base model does not create a table, subclass models each create tables and inherit any fields defined in the base class.
- Proxy Model: Base model creates a table, and subclass models do not create tables, nor define fields. Subclass models can only define python-level behavior and act as proxy to the base class DB table.
Proxy models would never be positively annotated to contain PII because they neither define fields nor have a DB table of their own. Only the non-proxy models in a proxy model's ancestry (in most cases, just the one inherited model) could define PII fields. Similarly, abstract base models do not directly "store" PII because they do not have a table of their own. For the purposes of the annotation-asserting tool, I recommend that both abstract base models and proxy models be ignored.
All other kinds of models (non-abstract, non-proxy) are concrete and have a database table, so they MUST be annotated.
Model "mixins" may inherit from object or from models.Model, depending on the codebase norms and on whether they need to define fields (in which case they must inherit from models.Model). If a model mixin cannot define fields because it only inherits from object, then it does not require an annotation. Likewise, model mixins which inherit from models.Model CAN define fields, but are necessarily abstract base models, so they are subject to the same annotation rules as Abstract Base Models above (i.e. annotations not required).
Implementation Decision: All classes which inherit from models.Model, directly or indirectly, AND are non-proxy/non-abstract must have a corresponding PII annotation. I.e., a class definition requires annotations (positive OR negative) if:
issubclass(MyModel, models.Model) and not MyModel._meta.abstract and not MyModel._meta.proxy
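The decision rule can be tried out without a Django installation by using stand-in classes. The sketch below is illustrative only: `Model`, `make_meta`, `requires_annotation`, and the example classes are hypothetical names that merely mimic the `_meta.abstract` and `_meta.proxy` flags found on real Django models. It is not Django's implementation.

```python
from types import SimpleNamespace


class Model:
    """Hypothetical stand-in for django.db.models.Model (not the real class)."""
    _meta = SimpleNamespace(abstract=False, proxy=False)


def make_meta(abstract=False, proxy=False):
    """Build a stand-in for the `_meta` options object found on Django models."""
    return SimpleNamespace(abstract=abstract, proxy=proxy)


class TimeStamped(Model):
    # Abstract base model: no table of its own.
    _meta = make_meta(abstract=True)


class Learner(TimeStamped):
    # Concrete model: has its own table.
    _meta = make_meta()


class LearnerProxy(Learner):
    # Proxy model: reuses the Learner table.
    _meta = make_meta(proxy=True)


class AuditMixin(object):
    # Plain mixin inheriting from object: cannot define fields.
    pass


def requires_annotation(cls, base=Model):
    """Apply the implementation decision: only concrete subclasses of `base`."""
    return (
        issubclass(cls, base)
        and cls is not base
        and not cls._meta.abstract
        and not cls._meta.proxy
    )


results = {
    cls.__name__: requires_annotation(cls)
    for cls in (TimeStamped, Learner, LearnerProxy, AuditMixin)
}
print(results)
```

Only `Learner` requires an annotation here. The `issubclass` test comes first so that classes without a `_meta` attribute, such as the plain mixin, are rejected before `_meta` is read.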
Package Versions and Satellite Repos
Problem Statement: 3rd party PII annotations must be tied to specific package versions (you cannot possibly introspect django apps without first picking a version to install, duh). However, the version that a satellite edx repo installs during unit tests may be different from the version that edx-platform or another IDA installs. Sometimes, perhaps, one version may have more PII than the other. So, which version to check? Similarly, which repos are responsible for checking annotations for 3rd party models? Should both the satellite repo AND the IDA-level repo assert annotations on the same 3rd party model?
The version of a 3rd party package specified in the IDA-level repository (in a production.txt requirements file) is ultimately the actual version which will be installed in prod. Thus, the model introspection and annotation checking of all 3rd party applications SHOULD happen at the IDA-level repository. As the number of our IDAs increases, there will be an increasing amount of duplicate 3rd party annotations, but that is a small price to pay, and ultimately more accurate as 3rd party package versions between IDAs diverge.
With all 3rd party installed apps already being checked at the IDA-level repo during CI tests, satellite repos need only check and annotate the django apps defined within them. This local-only mode of checking for internal satellite repos does not need to do any model inheritance lookups because there is no possible case where inherited models that actually require annotations wouldn't already be checked at the IDA-level repository:
- If the local model subclasses an Abstract Base Model in a 3rd party package (such as TimeStampedModel from django-model-utils), no action is necessary because we do not require annotations for abstract base models.
- If the local model subclasses, or is a Proxy to, a non-Abstract Base Model in a 3rd party package, then no action is necessary because that 3rd party model MUST be installed at the IDA-level in order for the local model to be functional, in which case it must already be checked for 3rd party annotations at the IDA-level.
The "safelist" file will serve as remote annotations for 3rd party models. We do already record the 3rd party package versions in production.txt of the IDA-level repo, but additionally recording the version number in the safelist would force developers to manually check the package every time it gets updated and manually bump the version in the safelist too. That seems like the most foolproof thing to do, but honestly that feels like way too much overhead, so we SHOULD NOT record version numbers in the safelist.
Implementation Decision: Only models defined locally within a satellite repo are subject to annotation assertions, regardless of their inheritance ancestry (call this local-only mode). However, at the IDA-level repo, 1) local models, 2) inherited models, and 3) all other discoverable 3rd party models should be checked, employing a "safelist" to annotate them remotely if necessary. Additionally, safelist annotations SHOULD NOT specify the package version providing the model.
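Because the safelist carries no version numbers, one way an IDA-level check could still notice a package upgrade that adds or removes models is to compare the discovered model labels against the safelist keys. The following is a minimal sketch of that comparison, not the OEP-30 tooling itself; `compare_with_safelist` and the sample inputs are hypothetical.

```python
def compare_with_safelist(discovered, safelisted):
    """Return (missing, stale) lists of model labels, each sorted.

    missing: discovered 3rd party models that have no safelist entry.
    stale:   safelist entries that no longer match a discovered model.
    """
    discovered = set(discovered)
    safelisted = set(safelisted)
    return sorted(discovered - safelisted), sorted(safelisted - discovered)


# Hypothetical inputs: labels found by introspection vs. keys in the safelist.
discovered_models = ['auth.User', 'sessions.Session', 'waffle.Flag']
safelist_keys = ['auth.User', 'waffle.Flag', 'djcelery.TaskMeta']

missing, stale = compare_with_safelist(discovered_models, safelist_keys)
print('missing:', missing)
print('stale:', stale)
```

A check along these lines would flag `sessions.Session` as needing a safelist entry and `djcelery.TaskMeta` as an entry that could be dropped, without anyone having to track package versions by hand.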
The Safelist
This file will remotely annotate 3rd party models in IDA-level repositories. Given the decisions above, this file need not describe package versions, nor any additional metadata.
app_label1.ModelName1:
    pii: <description of the PII, field names, etc.>
    pii_types: <comma delimited list of types>
    pii_retirement: <description of the retirement functionality>
app_label1.ModelName2:
    no_pii: <use `null` or leave empty and yaml will assume null>
...
Each top level key is the model path in django's standard representation (app label in snake_case, and model class name in CamelCase, delimited by a period). The values are themselves mappings containing the standard PII attributes in semantic YAML (rather than rST tags).
OEP-30 tooling should expect to find this file at the root of the repository next to .piiconf.
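To make the expected shape of a safelist entry concrete, here is a rough sketch of a per-entry check. It operates on plain dicts, as a YAML parser would return them, so it needs no YAML library. The rule that a positive annotation needs all three of `pii`, `pii_types`, and `pii_retirement` is an assumption made for this example, as are the function name `validate_entry` and the sample annotation values; the real OEP-30 tooling may behave differently.

```python
PII_KEYS = ('pii', 'pii_types', 'pii_retirement')


def validate_entry(label, annotations):
    """Return a list of problems for one safelist entry (empty if none found)."""
    problems = []

    # Keys are expected to look like "app_label.ModelName".
    app_label, _, model_name = label.partition('.')
    if not app_label or not model_name or '.' in model_name:
        problems.append('{}: key is not of the form app_label.ModelName'.format(label))

    # A key with no value parses to None; treat it as "no annotations yet".
    annotations = annotations or {}
    has_no_pii = 'no_pii' in annotations
    present = [key for key in PII_KEYS if key in annotations]

    if has_no_pii and present:
        problems.append('{}: has both no_pii and pii annotations'.format(label))
    elif not has_no_pii and len(present) < len(PII_KEYS):
        missing = [key for key in PII_KEYS if key not in annotations]
        problems.append('{}: missing {}'.format(label, ', '.join(missing)))
    return problems


# Hypothetical safelist content, already parsed into Python objects.
safelist = {
    'auth.User': {
        'pii': 'username, email, first_name, last_name',  # illustrative values
        'pii_types': 'username, email_address, name',
        'pii_retirement': 'retained',
    },
    'sites.Site': {'no_pii': None},
    'sessions.Session': None,  # key present but not annotated yet
}

all_problems = [
    problem
    for label, annotations in safelist.items()
    for problem in validate_entry(label, annotations)
]
print(all_problems)
```

Here only the un-annotated `sessions.Session` entry is reported. An empty `no_pii:` value counts as a valid negative annotation because the check looks at key presence, not at the value.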
Enforceable Model Discovery
PLAT-2357 provides an example code stub for running through installed django models. I've incorporated the findings on this page into the code stub below, which will serve as an updated starting-point for the "django" plugin. It discovers all apps, whether local, satellite, or 3rd party, unless set to local-only mode by assigning an app label to local_app_label.
import inspect

from django.apps import apps
from django.db import models

# Assign an app label here to switch to local-only mode.
local_app_label = None
#local_app_label = 'milestones'


def enforceable(model):
    """
    The given model actually requires annotations according to PLAT-2344.
    """
    return issubclass(model, models.Model) \
        and model is not models.Model \
        and not model._meta.abstract \
        and not model._meta.proxy


for app in apps.get_app_configs():
    if local_app_label and app.label != local_app_label:
        # We are in local-only mode, so skip apps which are not provided by this repository.
        continue
    found_models = []
    for root_model in app.get_models():
        # Do not check model inheritance iff in local-only mode.
        if local_app_label:
            hierarchy = [root_model]
        else:
            # getmro() includes the _entire_ inheritance closure, not just the direct inherited classes.
            hierarchy = inspect.getmro(root_model)
        for model in hierarchy:
            if enforceable(model):
                # model._meta.app_label is the lowercase snake_case representation of the app.
                # model._meta.object_name is the CamelCase representation of the model.
                found_models.append('{}.{}'.format(model._meta.app_label, model._meta.object_name))
    if found_models:
        print('Found enforceable models via the {} app:'.format(app.label))
        for found in found_models:
            print(' {}'.format(found))
In devstack, it prints this:
Found enforceable models via the auth app:
auth.Permission
auth.Group
auth.User
Found enforceable models via the contenttypes app:
contenttypes.ContentType
Found enforceable models via the redirects app:
redirects.Redirect
Found enforceable models via the sessions app:
sessions.Session
Found enforceable models via the sites app:
sites.Site
Found enforceable models via the djcelery app:
djcelery.TaskMeta
djcelery.TaskSetMeta
djcelery.IntervalSchedule
djcelery.CrontabSchedule
djcelery.PeriodicTasks
djcelery.PeriodicTask
djcelery.WorkerState
djcelery.TaskState
Found enforceable models via the waffle app:
waffle.Flag
waffle.Switch
waffle.Sample
Found enforceable models via the status app:
status.GlobalStatusMessage
status.CourseMessage
Found enforceable models via the static_replace app:
static_replace.AssetBaseUrlConfig
static_replace.AssetExcludedExtensionsConfig
Found enforceable models via the contentserver app:
contentserver.CourseAssetCacheTtlConfig
contentserver.CdnUserAgentsConfig
Found enforceable models via the site_configuration app:
site_configuration.SiteConfiguration
site_configuration.SiteConfigurationHistory
Found enforceable models via the video_config app:
video_config.HLSPlaybackEnabledFlag
video_config.CourseHLSPlaybackEnabledFlag
video_config.VideoTranscriptEnabledFlag
video_config.CourseVideoTranscriptEnabledFlag
video_config.TranscriptMigrationSetting
video_config.MigrationEnqueuedCourse
video_config.VideoThumbnailSetting
video_config.UpdatedCourseVideos
Found enforceable models via the video_pipeline app:
video_pipeline.VideoPipelineIntegration
video_pipeline.VideoUploadsEnabledByDefault
video_pipeline.CourseVideoUploadsEnabledByDefault
Found enforceable models via the courseware app:
courseware.StudentModule
courseware.StudentModuleHistory
courseware.XModuleUserStateSummaryField
courseware.XModuleStudentPrefsField
courseware.XModuleStudentInfoField
courseware.OfflineComputedGrade
courseware.OfflineComputedGradeLog
courseware.StudentFieldOverride
courseware.DynamicUpgradeDeadlineConfiguration
courseware.CourseDynamicUpgradeDeadlineConfiguration
courseware.OrgDynamicUpgradeDeadlineConfiguration
Found enforceable models via the student app:
student.AnonymousUserId
student.UserStanding
student.UserProfile
student.UserSignupSource
student.UserTestGroup
student.Registration
student.PendingNameChange
student.PendingEmailChange
student.PasswordHistory
student.LoginFailures
student.CourseEnrollment
student.ManualEnrollmentAudit
student.CourseEnrollmentAllowed
student.CourseAccessRole
student.DashboardConfiguration
student.LinkedInAddToProfileConfiguration
student.EntranceExamConfiguration
student.LanguageProficiency
student.SocialLink
student.CourseEnrollmentAttribute
student.EnrollmentRefundConfiguration
student.RegistrationCookieConfiguration
student.UserAttribute
student.LogoutViewConfiguration
Found enforceable models via the track app:
track.TrackingLog
Found enforceable models via the util app:
util.RateLimitConfiguration
Found enforceable models via the certificates app:
certificates.CertificateWhitelist
certificates.GeneratedCertificate
certificates.CertificateGenerationHistory
certificates.CertificateInvalidation
certificates.ExampleCertificateSet
certificates.ExampleCertificate
certificates.CertificateGenerationCourseSetting
certificates.CertificateGenerationConfiguration
certificates.CertificateHtmlViewConfiguration
certificates.CertificateTemplate
certificates.CertificateTemplateAsset
Found enforceable models via the instructor_task app:
instructor_task.InstructorTask
instructor_task.GradeReportSetting
Found enforceable models via the course_groups app:
course_groups.CourseUserGroup
course_groups.CohortMembership
course_groups.CourseUserGroupPartitionGroup
course_groups.CourseCohortsSettings
course_groups.CourseCohort
course_groups.UnregisteredLearnerCohortAssignments
Found enforceable models via the bulk_email app:
bulk_email.Target
bulk_email.CohortTarget
bulk_email.Target
bulk_email.CourseModeTarget
bulk_email.Target
bulk_email.CourseEmail
bulk_email.Optout
bulk_email.CourseEmailTemplate
bulk_email.CourseAuthorization
bulk_email.BulkEmailFlag
Found enforceable models via the branding app:
branding.BrandingInfoConfig
branding.BrandingApiConfig
Found enforceable models via the external_auth app:
external_auth.ExternalAuthMap
Found enforceable models via the django_openid_auth app:
django_openid_auth.Nonce
django_openid_auth.Association
django_openid_auth.UserOpenID
Found enforceable models via the oauth2 app:
oauth2.Client
oauth2.Grant
oauth2.AccessToken
oauth2.RefreshToken
Found enforceable models via the edx_oauth2_provider app:
edx_oauth2_provider.TrustedClient
Found enforceable models via the oauth2_provider app:
oauth2_provider.Application
oauth2_provider.Grant
oauth2_provider.AccessToken
oauth2_provider.RefreshToken
Found enforceable models via the oauth_dispatch app:
oauth_dispatch.RestrictedApplication
oauth_dispatch.ApplicationAccess
oauth_dispatch.ApplicationOrganization
Found enforceable models via the third_party_auth app:
third_party_auth.OAuth2ProviderConfig
third_party_auth.SAMLConfiguration
third_party_auth.SAMLProviderConfig
third_party_auth.SAMLProviderData
third_party_auth.LTIProviderConfig
third_party_auth.ProviderApiPermissions
Found enforceable models via the oauth_provider app:
oauth_provider.Nonce
oauth_provider.Scope
oauth_provider.Scope
oauth_provider.Consumer
oauth_provider.Token
Found enforceable models via the wiki app:
wiki.Article
wiki.ArticleForObject
wiki.ArticleRevision
wiki.ArticlePlugin
wiki.ReusablePlugin
wiki.ArticlePlugin
wiki.SimplePlugin
wiki.ArticlePlugin
wiki.RevisionPlugin
wiki.ArticlePlugin
wiki.RevisionPluginRevision
wiki.URLPath
Found enforceable models via the django_notify app:
django_notify.NotificationType
django_notify.Settings
django_notify.Subscription
django_notify.Notification
Found enforceable models via the admin app:
admin.LogEntry
Found enforceable models via the django_comment_common app:
django_comment_common.Role
django_comment_common.Permission
django_comment_common.ForumsConfig
django_comment_common.CourseDiscussionSettings
django_comment_common.DiscussionsIdMapping
Found enforceable models via the notes app:
notes.Note
Found enforceable models via the splash app:
splash.SplashConfig
Found enforceable models via the user_api app:
user_api.UserPreference
user_api.UserCourseTag
user_api.UserOrgTag
user_api.RetirementState
user_api.UserRetirementPartnerReportingStatus
user_api.UserRetirementRequest
user_api.UserRetirementStatus
Found enforceable models via the shoppingcart app:
shoppingcart.Order
shoppingcart.OrderItem
shoppingcart.Invoice
shoppingcart.InvoiceTransaction
shoppingcart.InvoiceItem
shoppingcart.CourseRegistrationCodeInvoiceItem
shoppingcart.InvoiceItem
shoppingcart.InvoiceHistory
shoppingcart.CourseRegistrationCode
shoppingcart.RegistrationCodeRedemption
shoppingcart.Coupon
shoppingcart.CouponRedemption
shoppingcart.PaidCourseRegistration
shoppingcart.OrderItem
shoppingcart.CourseRegCodeItem
shoppingcart.OrderItem
shoppingcart.CourseRegCodeItemAnnotation
shoppingcart.PaidCourseRegistrationAnnotation
shoppingcart.CertificateItem
shoppingcart.OrderItem
shoppingcart.DonationConfiguration
shoppingcart.Donation
shoppingcart.OrderItem
Found enforceable models via the course_modes app:
course_modes.CourseMode
course_modes.CourseModesArchive
course_modes.CourseModeExpirationConfig
Found enforceable models via the entitlements app:
entitlements.CourseEntitlementPolicy
entitlements.CourseEntitlement
entitlements.CourseEntitlementSupportDetail
Found enforceable models via the verify_student app:
verify_student.ManualVerification
verify_student.SSOVerification
verify_student.SoftwareSecurePhotoVerification
verify_student.VerificationDeadline
Found enforceable models via the dark_lang app:
dark_lang.DarkLangConfig
Found enforceable models via the microsite_configuration app:
microsite_configuration.Microsite
microsite_configuration.MicrositeHistory
microsite_configuration.MicrositeOrganizationMapping
microsite_configuration.MicrositeTemplate
Found enforceable models via the rss_proxy app:
rss_proxy.WhitelistedRssUrl
Found enforceable models via the embargo app:
embargo.EmbargoedCourse
embargo.EmbargoedState
embargo.RestrictedCourse
embargo.Country
embargo.CountryAccessRule
embargo.CourseAccessRuleHistory
embargo.IPFilter
Found enforceable models via the course_action_state app:
course_action_state.CourseRerunState
Found enforceable models via the mobile_api app:
mobile_api.MobileApiConfig
mobile_api.AppVersionConfig
mobile_api.IgnoreMobileAvailableFlagConfig
Found enforceable models via the social_django app:
social_django.UserSocialAuth
social_django.Nonce
social_django.Association
social_django.Code
social_django.Partial
Found enforceable models via the survey app:
survey.SurveyForm
survey.SurveyAnswer
Found enforceable models via the lms_xblock app:
lms_xblock.XBlockAsidesConfig
Found enforceable models via the problem_builder app:
problem_builder.Answer
problem_builder.Share
Found enforceable models via the submissions app:
submissions.StudentItem
submissions.Submission
submissions.Score
submissions.ScoreSummary
submissions.ScoreAnnotation
Found enforceable models via the assessment app:
assessment.Rubric
assessment.Criterion
assessment.CriterionOption
assessment.Assessment
assessment.AssessmentPart
assessment.AssessmentFeedbackOption
assessment.AssessmentFeedback
assessment.PeerWorkflow
assessment.PeerWorkflowItem
assessment.TrainingExample
assessment.StudentTrainingWorkflow
assessment.StudentTrainingWorkflowItem
assessment.StaffWorkflow
Found enforceable models via the workflow app:
workflow.AssessmentWorkflow
workflow.AssessmentWorkflowStep
workflow.AssessmentWorkflowCancellation
Found enforceable models via the edxval app:
edxval.Profile
edxval.Video
edxval.CourseVideo
edxval.EncodedVideo
edxval.VideoImage
edxval.VideoTranscript
edxval.TranscriptPreference
edxval.ThirdPartyTranscriptCredentialsState
Found enforceable models via the course_overviews app:
course_overviews.CourseOverview
course_overviews.CourseOverviewTab
course_overviews.CourseOverviewImageSet
course_overviews.CourseOverviewImageConfig
Found enforceable models via the block_structure app:
block_structure.BlockStructureConfiguration
block_structure.BlockStructureModel
Found enforceable models via the cors_csrf app:
cors_csrf.XDomainProxyConfiguration
Found enforceable models via the commerce app:
commerce.CommerceConfiguration
Found enforceable models via the credit app:
credit.CreditProvider
credit.CreditCourse
credit.CreditRequirement
credit.CreditRequirementStatus
credit.CreditEligibility
credit.CreditRequest
credit.CreditConfig
Found enforceable models via the teams app:
teams.CourseTeam
teams.CourseTeamMembership
Found enforceable models via the xblock_django app:
xblock_django.XBlockConfiguration
xblock_django.XBlockStudioConfigurationFlag
xblock_django.XBlockStudioConfiguration
Found enforceable models via the programs app:
programs.ProgramsApiConfig
Found enforceable models via the catalog app:
catalog.CatalogIntegration
Found enforceable models via the self_paced app:
self_paced.SelfPacedConfiguration
Found enforceable models via the thumbnail app:
thumbnail.KVStore
Found enforceable models via the milestones app:
milestones.Milestone
milestones.MilestoneRelationshipType
milestones.CourseMilestone
milestones.CourseContentMilestone
milestones.UserMilestone
Found enforceable models via the api_admin app:
api_admin.ApiAccessRequest
api_admin.ApiAccessConfig
api_admin.Catalog
Found enforceable models via the verified_track_content app:
verified_track_content.VerifiedTrackCohortedCourse
verified_track_content.MigrateVerifiedTrackCohortsSetting
Found enforceable models via the badges app:
badges.BadgeClass
badges.BadgeAssertion
badges.CourseCompleteImageConfiguration
badges.CourseEventBadgesConfiguration
Found enforceable models via the email_marketing app:
email_marketing.EmailMarketingConfiguration
Found enforceable models via the celery_utils app:
celery_utils.FailedTask
celery_utils.ChordData
Found enforceable models via the crawlers app:
crawlers.CrawlersConfig
Found enforceable models via the waffle_utils app:
waffle_utils.WaffleFlagCourseOverrideModel
Found enforceable models via the course_goals app:
course_goals.CourseGoal
Found enforceable models via the experiments app:
experiments.ExperimentData
experiments.ExperimentKeyValue
Found enforceable models via the edx_proctoring app:
edx_proctoring.ProctoredExam
edx_proctoring.ProctoredExamReviewPolicy
edx_proctoring.ProctoredExamReviewPolicyHistory
edx_proctoring.ProctoredExamStudentAttempt
edx_proctoring.ProctoredExamStudentAttemptHistory
edx_proctoring.ProctoredExamStudentAllowance
edx_proctoring.ProctoredExamStudentAllowanceHistory
edx_proctoring.ProctoredExamSoftwareSecureReview
edx_proctoring.ProctoredExamSoftwareSecureReviewHistory
edx_proctoring.ProctoredExamSoftwareSecureComment
Found enforceable models via the organizations app:
organizations.Organization
organizations.OrganizationCourse
Found enforceable models via the enterprise app:
enterprise.HistoricalEnterpriseCustomer
enterprise.EnterpriseCustomer
enterprise.EnterpriseCustomerUser
enterprise.PendingEnterpriseCustomerUser
enterprise.PendingEnrollment
enterprise.EnterpriseCustomerBrandingConfiguration
enterprise.EnterpriseCustomerIdentityProvider
enterprise.HistoricalEnterpriseCustomerEntitlement
enterprise.EnterpriseCustomerEntitlement
enterprise.HistoricalEnterpriseCourseEnrollment
enterprise.EnterpriseCourseEnrollment
enterprise.HistoricalEnterpriseCustomerCatalog
enterprise.EnterpriseCustomerCatalog
enterprise.HistoricalEnrollmentNotificationEmailTemplate
enterprise.EnrollmentNotificationEmailTemplate
enterprise.EnterpriseCustomerReportingConfiguration
Found enforceable models via the consent app:
consent.HistoricalDataSharingConsent
consent.DataSharingConsent
consent.DataSharingConsentTextOverrides
Found enforceable models via the integrated_channel app:
integrated_channel.LearnerDataTransmissionAudit
integrated_channel.ContentMetadataItemTransmission
Found enforceable models via the degreed app:
degreed.DegreedGlobalConfiguration
degreed.HistoricalDegreedEnterpriseCustomerConfiguration
degreed.DegreedEnterpriseCustomerConfiguration
degreed.DegreedLearnerDataTransmissionAudit
Found enforceable models via the sap_success_factors app:
sap_success_factors.SAPSuccessFactorsGlobalConfiguration
sap_success_factors.SAPSuccessFactorsEnterpriseCustomerConfiguration
sap_success_factors.SapSuccessFactorsLearnerDataTransmissionAudit
Found enforceable models via the xapi app:
xapi.XAPILRSConfiguration
Found enforceable models via the schedules app:
schedules.Schedule
schedules.ScheduleConfig
schedules.ScheduleExperience
Found enforceable models via the grades app:
grades.PersistentGradesEnabledFlag
grades.CoursePersistentGradesFlag
grades.ComputeGradesSetting
grades.VisibleBlocks
grades.PersistentSubsectionGrade
grades.PersistentCourseGrade
grades.PersistentSubsectionGradeOverride
Found enforceable models via the credentials app:
credentials.CredentialsApiConfig
credentials.NotifyCredentialsConfig
Found enforceable models via the bookmarks app:
bookmarks.Bookmark
bookmarks.XBlockCache
Found enforceable models via the theming app:
theming.SiteTheme
Found enforceable models via the completion app:
completion.BlockCompletion
Found enforceable models via the coursewarehistoryextended app:
coursewarehistoryextended.StudentModuleHistoryExtended
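Some labels in the listing above appear more than once (for example bulk_email.Target). This is presumably because walking the MRO of each subclass reaches the same concrete parent model again. If repeated labels are unwanted, an order-preserving de-duplication is enough. This is a minimal sketch; `unique_in_order` is a hypothetical helper and the sample list is copied from the bulk_email output above.

```python
def unique_in_order(labels):
    """Drop repeated labels while keeping first-seen order.

    Relies on dicts preserving insertion order (Python 3.7+).
    """
    return list(dict.fromkeys(labels))


# Excerpt of the labels collected for the bulk_email app.
found_models = [
    'bulk_email.Target',
    'bulk_email.CohortTarget',
    'bulk_email.Target',
    'bulk_email.CourseModeTarget',
    'bulk_email.Target',
    'bulk_email.CourseEmail',
]

print(unique_in_order(found_models))
```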
3rd Party App Discovery
In order to seed the initial safelist for a given IDA, we shouldn't just start with every "enforceable" model, since the safelist should not contain many of those models. For instance, the safelist should never include models that are defined locally in the same codebase as itself. Also, it is important to note the difference between "non-local" and "3rd party" w.r.t. an IDA codebase. "Non-local" includes 3rd party apps, and also edx-owned pip installed apps (aka edx satellite apps).
This script will generate a starting point for a safelist for any IDA, containing all models in all non-local applications:
import sys
import inspect

from django.apps import apps
from django.db import models


def enforceable(model):
    """
    The given model actually requires annotations according to PLAT-2344.
    """
    return issubclass(model, models.Model) \
        and model is not models.Model \
        and not model._meta.abstract \
        and not model._meta.proxy


def is_not_local(model):
    """
    Return True if the given model is NOT local to the current IDA.
    """
    # If it _was_ local to this IDA repository, it should be defined somewhere
    # under sys.prefix + '/src/' or in a path that points to the current
    # checked-out code. "site-packages" is a dead giveaway that the source
    # code came from far away.
    #
    # Checking for a "site-packages" path component under sys.prefix, rather
    # than the full '/local/lib/python2.7/site-packages/' prefix, avoids
    # hard-coding the Python version.
    source_file = inspect.getsourcefile(model) or ''
    return source_file.startswith(sys.prefix) and '/site-packages/' in source_file


for app in apps.get_app_configs():
    for root_model in app.get_models():
        # getmro() includes the _entire_ inheritance closure, not just the direct inherited classes.
        hierarchy = inspect.getmro(root_model)
        for model in hierarchy:
            if enforceable(model) and is_not_local(model):
                model_path = inspect.getsourcefile(model).split('site-packages/')[-1]
                # model._meta.app_label is the lowercase snake_case representation of the app.
                # model._meta.object_name is the CamelCase representation of the model.
                print('{}.{}: # via {}'.format(
                    model._meta.app_label,
                    model._meta.object_name,
                    model_path,
                ))
This generates the following safelist (.pii_safe_list.yaml) starting-point for edx-platform. Note that it is valid YAML, but it will fail the OEP-30 checker tooling since there are no annotations for any of the 3rd party models.
auth.Permission: # via django/contrib/auth/models.py
auth.Group: # via django/contrib/auth/models.py
auth.User: # via django/contrib/auth/models.py
contenttypes.ContentType: # via django/contrib/contenttypes/models.py
redirects.Redirect: # via django/contrib/redirects/models.py
sessions.Session: # via django/contrib/sessions/models.py
sites.Site: # via django/contrib/sites/models.py
djcelery.TaskMeta: # via djcelery/models.py
djcelery.TaskSetMeta: # via djcelery/models.py
djcelery.IntervalSchedule: # via djcelery/models.py
djcelery.CrontabSchedule: # via djcelery/models.py
djcelery.PeriodicTasks: # via djcelery/models.py
djcelery.PeriodicTask: # via djcelery/models.py
djcelery.WorkerState: # via djcelery/models.py
djcelery.TaskState: # via djcelery/models.py
waffle.Flag: # via waffle/models.py
waffle.Switch: # via waffle/models.py
waffle.Sample: # via waffle/models.py
django_openid_auth.Nonce: # via django_openid_auth/models.py
django_openid_auth.Association: # via django_openid_auth/models.py
django_openid_auth.UserOpenID: # via django_openid_auth/models.py
oauth2.Client: # via provider/oauth2/models.py
oauth2.Grant: # via provider/oauth2/models.py
oauth2.AccessToken: # via provider/oauth2/models.py
oauth2.RefreshToken: # via provider/oauth2/models.py
edx_oauth2_provider.TrustedClient: # via edx_oauth2_provider/models.py
oauth2_provider.Application: # via oauth2_provider/models.py
oauth2_provider.Grant: # via oauth2_provider/models.py
oauth2_provider.AccessToken: # via oauth2_provider/models.py
oauth2_provider.RefreshToken: # via oauth2_provider/models.py
oauth_provider.Nonce: # via oauth_provider/models.py
oauth_provider.Scope: # via oauth_provider/models.py
oauth_provider.Scope: # via oauth_provider/models.py
oauth_provider.Consumer: # via oauth_provider/models.py
oauth_provider.Token: # via oauth_provider/models.py
admin.LogEntry: # via django/contrib/admin/models.py
splash.SplashConfig: # via splash/models.py
social_django.UserSocialAuth: # via social_django/models.py
social_django.Nonce: # via social_django/models.py
social_django.Association: # via social_django/models.py
social_django.Code: # via social_django/models.py
social_django.Partial: # via social_django/models.py
problem_builder.Answer: # via problem_builder/models.py
problem_builder.Share: # via problem_builder/models.py
submissions.StudentItem: # via submissions/models.py
submissions.Submission: # via submissions/models.py
submissions.Score: # via submissions/models.py
submissions.ScoreSummary: # via submissions/models.py
submissions.ScoreAnnotation: # via submissions/models.py
assessment.Rubric: # via openassessment/assessment/models/base.py
assessment.Criterion: # via openassessment/assessment/models/base.py
assessment.CriterionOption: # via openassessment/assessment/models/base.py
assessment.Assessment: # via openassessment/assessment/models/base.py
assessment.AssessmentPart: # via openassessment/assessment/models/base.py
assessment.AssessmentFeedbackOption: # via openassessment/assessment/models/peer.py
assessment.AssessmentFeedback: # via openassessment/assessment/models/peer.py
assessment.PeerWorkflow: # via openassessment/assessment/models/peer.py
assessment.PeerWorkflowItem: # via openassessment/assessment/models/peer.py
assessment.TrainingExample: # via openassessment/assessment/models/training.py
assessment.StudentTrainingWorkflow: # via openassessment/assessment/models/student_training.py
assessment.StudentTrainingWorkflowItem: # via openassessment/assessment/models/student_training.py
assessment.StaffWorkflow: # via openassessment/assessment/models/staff.py
workflow.AssessmentWorkflow: # via openassessment/workflow/models.py
workflow.AssessmentWorkflowStep: # via openassessment/workflow/models.py
workflow.AssessmentWorkflowCancellation: # via openassessment/workflow/models.py
edxval.Profile: # via edxval/models.py
edxval.Video: # via edxval/models.py
edxval.CourseVideo: # via edxval/models.py
edxval.EncodedVideo: # via edxval/models.py
edxval.VideoImage: # via edxval/models.py
edxval.VideoTranscript: # via edxval/models.py
edxval.TranscriptPreference: # via edxval/models.py
edxval.ThirdPartyTranscriptCredentialsState: # via edxval/models.py
thumbnail.KVStore: # via sorl/thumbnail/models.py
milestones.Milestone: # via milestones/models.py
milestones.MilestoneRelationshipType: # via milestones/models.py
milestones.CourseMilestone: # via milestones/models.py
milestones.CourseContentMilestone: # via milestones/models.py
milestones.UserMilestone: # via milestones/models.py
celery_utils.FailedTask: # via celery_utils/models.py
celery_utils.ChordData: # via celery_utils/models.py
edx_proctoring.ProctoredExam: # via edx_proctoring/models.py
edx_proctoring.ProctoredExamReviewPolicy: # via edx_proctoring/models.py
edx_proctoring.ProctoredExamReviewPolicyHistory: # via edx_proctoring/models.py
edx_proctoring.ProctoredExamStudentAttempt: # via edx_proctoring/models.py
edx_proctoring.ProctoredExamStudentAttemptHistory: # via edx_proctoring/models.py
edx_proctoring.ProctoredExamStudentAllowance: # via edx_proctoring/models.py
edx_proctoring.ProctoredExamStudentAllowanceHistory: # via edx_proctoring/models.py
edx_proctoring.ProctoredExamSoftwareSecureReview: # via edx_proctoring/models.py
edx_proctoring.ProctoredExamSoftwareSecureReviewHistory: # via edx_proctoring/models.py
edx_proctoring.ProctoredExamSoftwareSecureComment: # via edx_proctoring/models.py
organizations.Organization: # via organizations/models.py
organizations.OrganizationCourse: # via organizations/models.py
enterprise.HistoricalEnterpriseCustomer: # via enterprise/models.py
enterprise.EnterpriseCustomer: # via enterprise/models.py
enterprise.EnterpriseCustomerUser: # via enterprise/models.py
enterprise.PendingEnterpriseCustomerUser: # via enterprise/models.py
enterprise.PendingEnrollment: # via enterprise/models.py
enterprise.EnterpriseCustomerBrandingConfiguration: # via enterprise/models.py
enterprise.EnterpriseCustomerIdentityProvider: # via enterprise/models.py
enterprise.HistoricalEnterpriseCustomerEntitlement: # via enterprise/models.py
enterprise.EnterpriseCustomerEntitlement: # via enterprise/models.py
enterprise.HistoricalEnterpriseCourseEnrollment: # via enterprise/models.py
enterprise.EnterpriseCourseEnrollment: # via enterprise/models.py
enterprise.HistoricalEnterpriseCustomerCatalog: # via enterprise/models.py
enterprise.EnterpriseCustomerCatalog: # via enterprise/models.py
enterprise.HistoricalEnrollmentNotificationEmailTemplate: # via enterprise/models.py
enterprise.EnrollmentNotificationEmailTemplate: # via enterprise/models.py
enterprise.EnterpriseCustomerReportingConfiguration: # via enterprise/models.py
consent.HistoricalDataSharingConsent: # via consent/models.py
consent.DataSharingConsent: # via consent/models.py
consent.DataSharingConsentTextOverrides: # via consent/models.py
integrated_channel.LearnerDataTransmissionAudit: # via integrated_channels/integrated_channel/models.py
integrated_channel.ContentMetadataItemTransmission: # via integrated_channels/integrated_channel/models.py
degreed.DegreedGlobalConfiguration: # via integrated_channels/degreed/models.py
degreed.HistoricalDegreedEnterpriseCustomerConfiguration: # via integrated_channels/degreed/__init__.py
degreed.DegreedEnterpriseCustomerConfiguration: # via integrated_channels/degreed/models.py
degreed.DegreedLearnerDataTransmissionAudit: # via integrated_channels/degreed/models.py
sap_success_factors.SAPSuccessFactorsGlobalConfiguration: # via integrated_channels/sap_success_factors/models.py
sap_success_factors.SAPSuccessFactorsEnterpriseCustomerConfiguration: # via integrated_channels/sap_success_factors/models.py
sap_success_factors.SapSuccessFactorsLearnerDataTransmissionAudit: # via integrated_channels/sap_success_factors/models.py
xapi.XAPILRSConfiguration: # via integrated_channels/xapi/models.py
completion.BlockCompletion: # via completion/models.py
Forked repositories
Forked repos are a grey area for annotations. On the one hand, we may freely annotate them because we have merge rights; on the other hand, the more we deviate from upstream, the harder it will be to merge upstream changes back in. For this reason, we have come to the following decision:
Implementation Decision: Treat forked repositories as 3rd party. Do not annotate models in them directly, but rather use the 3rd party annotation mechanism (the safelist).
Generating RST docs (PLAT-2346)
In short, the Python libraries offering RST generation are very basic, and none of them individually covers even the basic features we need for annotation reports. Fortunately, raw RST isn't a huge pain to read (it is meant to be human-readable), so we should just focus on constructing raw RST for annotations reporting. See PLAT-2346 for more details.
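As a rough illustration of what "constructing raw RST" could look like, here is a minimal sketch. It assumes the report structure described under Reporting Output (PLAT-2350); the function name render_rst and the chosen layout (one section per file, one bullet per annotation) are hypothetical and not a settled design.

```python
def render_rst(report):
    """Render an annotations report dict as a raw RST string (sketch only)."""
    title = 'Annotations Report'
    lines = [title, '=' * len(title), '']
    for filename in sorted(report):
        # One RST section per file, underlined to the length of the name
        lines += [filename, '-' * len(filename), '']
        for entry in report[filename]:
            # A group keeps its members under 'annotations'; a stand-alone
            # annotation is treated here as a group of one.
            for annotation in entry.get('annotations', [entry]):
                data = annotation['annotation_data']
                if isinstance(data, list):
                    # Enum annotations carry a list of values
                    data = ', '.join(data)
                lines.append('* line {}: ``{}`` {}'.format(
                    annotation['line_number'],
                    annotation['annotation_token'],
                    data,
                ))
        lines.append('')
    return '\n'.join(lines)
```

Plain string building like this keeps the tool free of extra dependencies, at the cost of having to handle RST escaping ourselves if annotation text ever contains markup characters.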
There are several languages that may need to be searched, each with their own unique comment style and challenges. Not every repository will need every language, so our goals are to:
- Allow extensions to add new languages without having to update the tool
- Allow configuration to define which filename extensions map to which language extensions (plugins)
- These extensions are only run for static analysis. Deeper introspection, such as we use for Django models, would probably need to be scripted in the parent language, but could still output report files in a format that allows them to be used with our other tools
To allow for simple adoption, ease of development, and consistency with other edX projects we've settled on Stevedore to manage the extensions. We will use Stevedore's NamedExtensionManager to find installed plugins, and a YAML configuration section to map the desired filename extensions to the installed extensions. Extensions will inherit from an "abstract" base class that will explicitly show the necessary interface, and should only need to implement perhaps 2 methods:
- validate: to examine the passed-in file and return any annotation formatting errors
- search: to actually search a passed-in file and return a list of found annotations
class AnnotationExtension(object):
    """
    Abstract base class that annotation extensions will inherit from
    """
    def __init__(self, annotation_tokens):
        self.annotation_tokens = annotation_tokens

    def validate(self, file_handle):
        """
        Validates that any annotations in the given file are properly formatted
        """
        raise NotImplementedError('validate called on base class!')

    def search(self, file_handle):
        """
        Does the actual annotation search for the given file
        """
        raise NotImplementedError('search called on base class!')


class PythonAnnotationExtension(AnnotationExtension):
    """
    Annotation extension for Python source files
    """
    def validate(self, file_handle):
        ...
        return results

    def search(self, file_handle):
        ...
        return results
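To make the search interface more concrete, the following is a minimal sketch of what a search implementation might do. It is not the planned PythonAnnotationExtension: the class name CommentAnnotationSearcher is hypothetical, it assumes annotation_tokens is a flat list of token strings, and it only looks at "#" comments (docstrings and multi-line annotations are ignored).

```python
import io
import re


class CommentAnnotationSearcher(object):
    """
    Hypothetical searcher exposing the same search() interface as
    AnnotationExtension. Only "#" comments are inspected in this sketch.
    """
    def __init__(self, annotation_tokens):
        # Assumption: a flat list of tokens such as ['.. pii::', '.. no_pii::']
        self.annotation_tokens = annotation_tokens
        alternatives = '|'.join(re.escape(token) for token in annotation_tokens)
        # Capture the token, then the rest of the line as the annotation text
        self.pattern = re.compile(r'#\s*(' + alternatives + r')\s*(.*)$')

    def search(self, file_handle):
        results = []
        for line_number, line in enumerate(file_handle, start=1):
            match = self.pattern.search(line)
            if match:
                results.append({
                    'annotation_token': match.group(1),
                    'annotation_data': match.group(2).strip(),
                    'line_number': line_number,
                })
        return results


# Usage: any file-like object that yields lines will do
sample = io.StringIO('x = 1\n# .. no_pii:: nothing stored here\ny = 2\n')
found = CommentAnnotationSearcher(['.. pii::', '.. no_pii::']).search(sample)
```

Returning plain dicts keyed like the report entries keeps the extension output close to the final report format, so the caller only needs to add the found_by information.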
Plugins will need to be defined as entry points in the annotation_finder.searchers namespace in setup.py or setup.cfg.
    entry_points={
        'annotation_finder.searchers': [
            'python = extensions.python_extension:PythonAnnotationExtension',
            'javascript = plugins.javascript_extension:JavascriptAnnotationExtension',
        ],
    },
This script shows how we could find, validate configuration, and execute a search on all of the plugins:
import os
import yaml
from stevedore import named

test_config = """
annotations:
    pii:
        - ".. pii::"
        - ".. pii_types::":
            - id
            - name
            - other
        - ".. pii_retirement::":
            - retained
            - local_api
            - consumer_api
            - third_party
    nopii: ".. no_pii::"
extensions:
    python:
        - py
    javascript:
        - js
        - jsx
"""


def load_failed_handler(*args, **kwargs):
    """
    Callback for when we fail to load an extension, otherwise it fails silently
    """
    print(args)
    print(kwargs)


def search(ext, file_handle, file_extensions_map, filename_extension):
    """
    Executes a search on the given file, only if it is configured for this
    extension
    """
    if filename_extension not in file_extensions_map[ext.name]:
        print('{} does not support {}. Skipping.'.format(ext.name, filename_extension))
        return (ext.name, [])
    return ext.name, ext.obj.search(file_handle)


if __name__ == '__main__':
    # safe_load avoids constructing arbitrary Python objects from the YAML
    config = yaml.safe_load(test_config)
    print(config)

    # These are the names of all of our configured extensions
    configured_extension_names = list(config['extensions'].keys())
    print(configured_extension_names)

    # Load Stevedore extensions that we are configured for (and only those)
    mgr = named.NamedExtensionManager(
        names=configured_extension_names,
        namespace='annotation_finder.searchers',
        invoke_on_load=True,
        on_load_failure_callback=load_failed_handler,
        invoke_args=(config['annotations'],),  # This is temporary
    )

    # Output all found extension entry points (whether or not they were loaded)
    print(mgr.list_entry_points())

    # Output all extensions that were actually able to load
    for extension in mgr.extensions:
        print(extension)

    # Map each extension name to its configured filename extensions
    file_extensions_map = {}
    known_extensions = set()
    for extension_name in config['extensions']:
        file_extensions_map[extension_name] = config['extensions'][extension_name]
        known_extensions.update(config['extensions'][extension_name])

    source_path = '/foo/bar/'

    # From here we could begin the actual file searching and reporting...
    # This is not optimized, but without the prints or doing any actual searching
    # runs all of edx-platform in 1.18 seconds.
    for root, dirs, files in os.walk(source_path):
        for filename in files:
            filename_extension = os.path.splitext(filename)[1][1:]
            if filename_extension not in known_extensions:
                print("{} is not a known extension, skipping.".format(filename_extension))
                continue
            full_name = os.path.join(root, filename)
            print(full_name)
            with open(full_name, 'r') as file_handle:
                try:
                    # Call search on all loaded extensions
                    results = mgr.map(search, file_handle, file_extensions_map, filename_extension)
                    print(results)
                except IndexError:
                    # Should we define a catchall in config?
                    print("No file extension in {}, skipping.".format(full_name))
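The script above asks every loaded plugin about every file and lets each one decide whether to skip it. One possible refinement, sketched here under the assumption that the "extensions" configuration section keeps the shape shown in test_config, is to invert that mapping once so each file can be routed directly to the plugins configured for it. The helper name invert_extension_map is hypothetical.

```python
def invert_extension_map(extensions_config):
    """
    Turn {plugin name: [filename extensions]} into
    {filename extension: [plugin names]}.
    """
    by_suffix = {}
    for plugin_name, suffixes in extensions_config.items():
        for suffix in suffixes:
            # More than one plugin may be configured for the same suffix
            by_suffix.setdefault(suffix, []).append(plugin_name)
    return by_suffix


# Usage with the same shape as config['extensions'] in the script
plugins_for = invert_extension_map({'python': ['py'], 'javascript': ['js', 'jsx']})
```

With this lookup the per-file work becomes a single dictionary access, and files with unknown extensions can be skipped before they are opened.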
Configuration for the annotation tooling needs to handle the following things:
- Describe the known annotation statements (the unique strings such as ".. pii::" that we search for), and their enum values (where necessary)
- State which extensions are to be used, and what filename extensions they are to be used for
We may also want to add options for the things that are currently spec'd as command line options (in / out files and paths, safelist filename) but presumably those are simple enough to add at the top level if we choose to, and don't need exposition here.
# This section describes the known annotations
annotations:
    # An annotation can be a single statement that stands alone
    nopii: ".. no_pii::"
    # Or it can describe a group of statements, in which case
    # the statements must appear in the same order as listed here
    pii:
        # A statement can be a simple value, in which case the
        # text that follows it will be captured
        - ".. pii::"
        # Or it can be an enum list, in which case only the values
        # included will be allowed. In this case a ".. pii::"
        # annotation must be followed immediately by a
        # ".. pii_types::" statement which must then be followed
        # immediately by a ".. pii_retirement::" statement.
        - ".. pii_types::":
            # Multiple enum values can be given on an annotation
            # as long as they are separated by spaces such as:
            # .. pii_types:: name username ip
            # An enum annotation must include at least one enum
            # value
            - id
            - name
            - username
            - password
            - location
            - phone_number
            - email_address
            - birth_date
            - ip
            - external_service
            - biography
            - gender
            - sex
            - image
            - video
            - other
        - ".. pii_retirement::":
            - retained
            - local_api
            - consumer_api
            - third_party
# This section is for extension configuration, each
# sub-section is the name of a Stevedore extension
# that must be installed. Under each extension name
# is a list of file extensions that it will be used
# for.
extensions:
    python:
        - py
        - py3
        - pyw
        - rpy
        - pyt
    javascript:
        - js
        - jsx
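The enum rules in the configuration comments (only listed values are allowed, values are space-separated, at least one value is required) could be checked along these lines. This is a sketch only: it assumes the "annotations" section has already been loaded into Python dicts and lists with the shape shown above, and the names build_token_map and check_enum are hypothetical.

```python
def build_token_map(annotations_config):
    """Map each annotation token to its allowed enum values (None = free text)."""
    token_map = {}
    for statements in annotations_config.values():
        # A stand-alone annotation is a single string; a group is a list
        if isinstance(statements, str):
            statements = [statements]
        for statement in statements:
            if isinstance(statement, dict):
                # Enum statement: a one-item mapping of token -> allowed values
                for token, choices in statement.items():
                    token_map[token] = set(choices)
            else:
                token_map[statement] = None
    return token_map


def check_enum(token_map, token, text):
    """Return a list of error strings for the text following an annotation token."""
    allowed = token_map[token]
    if allowed is None:
        return []  # free-text annotation, nothing to check
    values = text.split()
    if not values:
        return ['{} requires at least one value'.format(token)]
    return [
        '{} is not a valid value for {}'.format(value, token)
        for value in values
        if value not in allowed
    ]
```

A validate method on an extension could call something like check_enum for each annotation it finds and return the collected errors.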
Reporting Output (PLAT-2350)
Reporting output from the tools should match the following format:
# Top level is a dict, keys are filenames relative to the search path
{
    '/openedx/core/djangoapps/pii_enforcer/pii_searcher.py':
        # Underneath the keys are a list of annotations
        [
            # Stand-alone annotations are formatted as follows:
            {
                'annotation_data': 'No PII is stored here',
                'annotation_token': '.. no_pii::',
                'line_number': 2,
                'found_by': ['python']  # These are the names of the extensions or scripts that found this annotation
            },
            {
                'annotation_data': 'We do not store PII in this model',
                'annotation_token': '.. no_pii::',
                'line_number': 17,
                'found_by': ['python']
            }
        ],
    '/openedx/core/djangoapps/user_api/legacy_urls.py':
        [
            # Annotation groups are represented differently
            {
                'annotation_group': 'pii',  # This is the name given to the group in configuration
                'annotations':
                    [
                        {
                            'annotation_data': 'This model stores user addresses and phone numbers',
                            'annotation_token': '.. pii::',
                            'line_number': 16,
                            'found_by': ['python']
                        },
                        {
                            # In cases where the annotation type is an enum, "annotation_data" becomes a list
                            'annotation_data': ['address', 'phone_number'],
                            'annotation_token': '.. pii_types::',
                            'line_number': 17,
                            'found_by': ['python']
                        },
                        {
                            'annotation_data': ['local_api', 'consumer_api'],
                            'annotation_token': '.. pii_retirement::',
                            'line_number': 18,
                            'found_by': ['python']
                        }
                    ]
            }
        ]
}
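Because found_by is a list, the same annotation may be reported by more than one extension or script. One way this could be handled, shown here as a sketch rather than a decided approach, is to merge stand-alone annotation entries for a file that share a line number and token. The function name merge_found_by and the 'django' script name used in the example are hypothetical.

```python
def merge_found_by(annotations):
    """
    Merge stand-alone annotation dicts that share a line number and token,
    combining their 'found_by' lists. Input dicts are not modified.
    """
    merged = {}
    for annotation in annotations:
        key = (annotation['line_number'], annotation['annotation_token'])
        if key not in merged:
            # Copy the dict and its list so the caller's data stays untouched
            merged[key] = dict(annotation, found_by=list(annotation['found_by']))
        else:
            for name in annotation['found_by']:
                if name not in merged[key]['found_by']:
                    merged[key]['found_by'].append(name)
    # Return entries ordered by line number, then token
    return [merged[key] for key in sorted(merged)]


# Usage: the same annotation found by two different tools
first = {'annotation_data': 'No PII is stored here', 'annotation_token': '.. no_pii::',
         'line_number': 2, 'found_by': ['python']}
second = dict(first, found_by=['django'])
combined = merge_found_by([first, second])
```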