This documentation is now stale and superseded by the implementation and documentation in the Code Annotations repository.
OEP-30 outlines our intentions regarding PII annotations in code that runs on edx.org and other instances of openedx. This article is a place where we can collect decisions around implementation details and outputs from OEP-30 discovery tasks.
Scratch notes: https://docs.google.com/document/d/1fu2DzfkUGXYRXnXo-V3kGVYB4SB-Z_E4XfodkqKW6bg/edit
In Django, there are three different types of model inheritance: abstract base classes, multi-table inheritance, and proxy models.
Proxy models would never be positively annotated to contain PII because they neither define fields nor have a DB table of their own. Only the non-proxy models in a proxy model's ancestry (in most cases, just the one inherited model) could define PII fields. Similarly, abstract base models do not directly "store" PII because they do not have a table of their own. For the purposes of the annotation-asserting tool, I recommend that both abstract base models and proxy models be ignored.
All other kinds of models (non-abstract, non-proxy) are concrete and have a database table, so they MUST be annotated.
Model "mixins" may inherit from object or from models.Model, depending on the codebase norms and whether they need to define fields (in which case they must inherit from models.Model). If a model mixin cannot define fields because it only inherits from object, then it does not require an annotation. Likewise, model mixins which inherit from models.Model CAN define fields, but are necessarily abstract base models, so they are subject to the same annotation rules as abstract base models above (i.e. annotations not required).
Implementation Decision: All classes which inherit from models.Model, directly or indirectly, AND are non-proxy/non-abstract must have a corresponding PII annotation. I.e., a class definition requires annotations (positive OR negative) if:

    issubclass(MyModel, models.Model) and not MyModel._meta.abstract and not MyModel._meta.proxy
This section is concerned only with 3rd party repos and edx satellite repos providing installable django apps containing models.
Problem Statement: 3rd party PII annotations must be tied to specific package versions (you cannot possibly introspect django apps without first picking a version to install, duh). However, the version that a satellite edx repo installs during unit tests may be different than the version edx-platform or other IDA installs. Sometimes, perhaps, one version may have more PII than the other. So, which version to check? Similarly, which repos are responsible for checking annotations for 3rd party models? Should both the satellite repo AND the IDA-level repo assert annotations on the same 3rd party model?
The version of a 3rd party package specified in the IDA-level repository (in a production.txt requirements file) is ultimately the actual version which will be installed in prod. Thus, the model introspection and annotation checking of all 3rd party applications SHOULD happen at the IDA-level repository. As the number of our IDAs increases, there will be an increasing amount of duplicate 3rd party annotations, but that is a small price to pay, and ultimately more accurate as 3rd party package versions between IDAs diverge.
With all 3rd party installed apps already being checked at the IDA-level repo during CI tests, satellite repos need only check and annotate the django apps defined within. This local-only mode of checking for internal satellite repos does not need to do any model inheritance lookups because there is no possible case where inherited models that actually require annotations wouldn't already be checked at the IDA-level repository:
If a local model inherits from an abstract 3rd party model (e.g. TimeStampedModel from django-model-utils), no action is necessary because we do not require annotations for abstract base models.
The "safelist" file will serve as remote annotations for 3rd party models. We do already record the 3rd party package versions in production.txt of the IDA-level repo, but additionally recording the version number in the safelist would force developers to manually check the package every time it gets updated and manually bump the version in the safelist too. That seems like the most foolproof thing to do, but honestly that feels like way too much overhead, so we SHOULD NOT record version numbers in the safelist.
Implementation Decision: Only models defined locally within a satellite repo are subject to annotation assertions, regardless of their inheritance ancestry (call this local-only mode). However, at the IDA-level repo, 1) local models, 2) inherited models, and 3) all other discoverable 3rd party models should be checked, employing a "safelist" to annotate them remotely if necessary. Additionally, safelist annotations SHOULD NOT specify the package version providing the model.
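The two checking modes in this decision can be sketched without Django. This is only an illustration under stated assumptions: `ModelInfo`, `requires_annotation`, and `missing_annotations` are hypothetical names, and the `is_local` and `annotated` flags stand in for facts the real tooling would have to discover by introspection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    """Hypothetical stand-in for the facts a checker reads from a Django model."""
    app_label: str
    name: str
    is_local: bool           # defined in the repository being checked
    abstract: bool = False
    proxy: bool = False
    annotated: bool = False  # carries an in-code PII annotation

    @property
    def path(self):
        return '{}.{}'.format(self.app_label, self.name)


def requires_annotation(model):
    """Only concrete (non-abstract, non-proxy) models must be annotated."""
    return not model.abstract and not model.proxy


def missing_annotations(models, safelist=None, local_only=False):
    """
    Return the paths of models that still need an annotation.

    In local-only mode (satellite repos) only local models are checked.
    Otherwise (IDA-level repos) every discovered model is checked, and
    non-local models may be covered by a safelist entry instead.
    """
    safelist = safelist or {}
    missing = []
    for model in models:
        if not requires_annotation(model):
            continue
        if local_only and not model.is_local:
            continue
        covered = model.annotated or (not model.is_local and model.path in safelist)
        if not covered:
            missing.append(model.path)
    return missing
```

Calling `missing_annotations(models, local_only=True)` corresponds to a satellite repo's check; passing a parsed safelist and leaving `local_only` at its default corresponds to the IDA-level check.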
This file will remotely annotate 3rd party models in IDA-level repositories. Given the decisions above, this file need not describe package versions, nor any additional metadata.

    app_label1.ModelName1:
        pii: <description of the PII, field names, etc.>
        pii_types: <comma delimited list of types>
        pii_retirement: <description of the retirement functionality>
    app_label1.ModelName2:
        no_pii: <use `null` or leave empty and yaml will assume null>
    ...

Each top level key is the model path in django's standard representation (app label in snake_case, and model class name in CamelCase, delimited by a period). The values are themselves mappings containing the standard PII attributes in semantic YAML (rather than rST tags).
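A structural check of a parsed safelist might look like the sketch below. It operates on the mapping a YAML loader would return, so it needs no YAML library. The rule that a positive annotation must carry all three of `pii`, `pii_types`, and `pii_retirement` is an assumption, as are the regular expression and the name `validate_safelist`.

```python
import re

# Assumed key format: snake_case app label, a period, CamelCase model name.
MODEL_PATH_RE = re.compile(r'^[a-z][a-z0-9_]*\.[A-Z][A-Za-z0-9]*$')
POSITIVE_KEYS = {'pii', 'pii_types', 'pii_retirement'}
NEGATIVE_KEY = 'no_pii'


def validate_safelist(safelist):
    """
    Return a list of human-readable problems found in a parsed safelist.

    `safelist` maps model paths to annotation mappings (or None when the
    YAML entry was left empty).
    """
    errors = []
    for model_path, annotations in safelist.items():
        if not MODEL_PATH_RE.match(model_path):
            errors.append('{}: not in app_label.ModelName form'.format(model_path))
        keys = set(annotations or {})
        if not keys:
            errors.append('{}: no annotations'.format(model_path))
        elif NEGATIVE_KEY in keys:
            if keys != {NEGATIVE_KEY}:
                errors.append('{}: no_pii cannot be combined with other keys'.format(model_path))
        elif keys != POSITIVE_KEYS:
            errors.append('{}: expected keys {}, got {}'.format(
                model_path, sorted(POSITIVE_KEYS), sorted(keys)
            ))
    return errors
```

An entry with no value at all (for example a bare `auth.Permission:` line) is reported as having no annotations, so an un-annotated stub file would not pass this check.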
OEP-30 tooling should expect to find this file at the root of the repository next to .piiconf.
PLAT-2357 provides an example code stub for running through installed django models. I've incorporated the findings on this page into this code stub below, which will serve as an updated starting-point for the "django" plugin. It discovers all apps, whether local, satellite, or 3rd party, unless set to local-only mode by assigning an app label to local_app_label.

    import inspect

    from django.apps import apps
    from django.db import models

    local_app_label = None
    #local_app_label = 'milestones'


    def enforceable(model):
        """
        The given model actually requires annotations according to PLAT-2344.
        """
        return issubclass(model, models.Model) \
            and model is not models.Model \
            and not model._meta.abstract \
            and not model._meta.proxy


    for app in apps.get_app_configs():
        if local_app_label and not app.label == local_app_label:
            # We are in local-only mode, so skip apps which are not provided by this repository.
            continue
        found_models = []
        for root_model in app.get_models():
            # Do not check model inheritance iff in local-mode.
            if local_app_label:
                hierarchy = [root_model]
            else:
                # getmro() includes the _entire_ inheritance closure, not just the direct inherited classes.
                hierarchy = inspect.getmro(root_model)
            for model in hierarchy:
                if enforceable(model):
                    # model._meta.app_label is the lowercase snake_case representation of the app.
                    # model._meta.object_name is the CamelCase representation of the model.
                    found_models.append('{}.{}'.format(model._meta.app_label, model._meta.object_name))
        if found_models:
            print('Found enforceable models via the {} app:'.format(app.label))
            for found in found_models:
                print('  {}'.format(found))

In devstack, it prints this:
Found enforceable models via the auth app: auth.Permission auth.Group auth.User Found enforceable models via the contenttypes app: contenttypes.ContentType Found enforceable models via the redirects app: redirects.Redirect Found enforceable models via the sessions app: sessions.Session Found enforceable models via the sites app: sites.Site Found enforceable models via the djcelery app: djcelery.TaskMeta djcelery.TaskSetMeta djcelery.IntervalSchedule djcelery.CrontabSchedule djcelery.PeriodicTasks djcelery.PeriodicTask djcelery.WorkerState djcelery.TaskState Found enforceable models via the waffle app: waffle.Flag waffle.Switch waffle.Sample Found enforceable models via the status app: status.GlobalStatusMessage status.CourseMessage Found enforceable models via the static_replace app: static_replace.AssetBaseUrlConfig static_replace.AssetExcludedExtensionsConfig Found enforceable models via the contentserver app: contentserver.CourseAssetCacheTtlConfig contentserver.CdnUserAgentsConfig Found enforceable models via the site_configuration app: site_configuration.SiteConfiguration site_configuration.SiteConfigurationHistory Found enforceable models via the video_config app: video_config.HLSPlaybackEnabledFlag video_config.CourseHLSPlaybackEnabledFlag video_config.VideoTranscriptEnabledFlag video_config.CourseVideoTranscriptEnabledFlag video_config.TranscriptMigrationSetting video_config.MigrationEnqueuedCourse video_config.VideoThumbnailSetting video_config.UpdatedCourseVideos Found enforceable models via the video_pipeline app: video_pipeline.VideoPipelineIntegration video_pipeline.VideoUploadsEnabledByDefault video_pipeline.CourseVideoUploadsEnabledByDefault Found enforceable models via the courseware app: courseware.StudentModule courseware.StudentModuleHistory courseware.XModuleUserStateSummaryField courseware.XModuleStudentPrefsField courseware.XModuleStudentInfoField courseware.OfflineComputedGrade courseware.OfflineComputedGradeLog 
courseware.StudentFieldOverride courseware.DynamicUpgradeDeadlineConfiguration courseware.CourseDynamicUpgradeDeadlineConfiguration courseware.OrgDynamicUpgradeDeadlineConfiguration Found enforceable models via the student app: student.AnonymousUserId student.UserStanding student.UserProfile student.UserSignupSource student.UserTestGroup student.Registration student.PendingNameChange student.PendingEmailChange student.PasswordHistory student.LoginFailures student.CourseEnrollment student.ManualEnrollmentAudit student.CourseEnrollmentAllowed student.CourseAccessRole student.DashboardConfiguration student.LinkedInAddToProfileConfiguration student.EntranceExamConfiguration student.LanguageProficiency student.SocialLink student.CourseEnrollmentAttribute student.EnrollmentRefundConfiguration student.RegistrationCookieConfiguration student.UserAttribute student.LogoutViewConfiguration Found enforceable models via the track app: track.TrackingLog Found enforceable models via the util app: util.RateLimitConfiguration Found enforceable models via the certificates app: certificates.CertificateWhitelist certificates.GeneratedCertificate certificates.CertificateGenerationHistory certificates.CertificateInvalidation certificates.ExampleCertificateSet certificates.ExampleCertificate certificates.CertificateGenerationCourseSetting certificates.CertificateGenerationConfiguration certificates.CertificateHtmlViewConfiguration certificates.CertificateTemplate certificates.CertificateTemplateAsset Found enforceable models via the instructor_task app: instructor_task.InstructorTask instructor_task.GradeReportSetting Found enforceable models via the course_groups app: course_groups.CourseUserGroup course_groups.CohortMembership course_groups.CourseUserGroupPartitionGroup course_groups.CourseCohortsSettings course_groups.CourseCohort course_groups.UnregisteredLearnerCohortAssignments Found enforceable models via the bulk_email app: bulk_email.Target bulk_email.CohortTarget 
bulk_email.Target bulk_email.CourseModeTarget bulk_email.Target bulk_email.CourseEmail bulk_email.Optout bulk_email.CourseEmailTemplate bulk_email.CourseAuthorization bulk_email.BulkEmailFlag Found enforceable models via the branding app: branding.BrandingInfoConfig branding.BrandingApiConfig Found enforceable models via the external_auth app: external_auth.ExternalAuthMap Found enforceable models via the django_openid_auth app: django_openid_auth.Nonce django_openid_auth.Association django_openid_auth.UserOpenID Found enforceable models via the oauth2 app: oauth2.Client oauth2.Grant oauth2.AccessToken oauth2.RefreshToken Found enforceable models via the edx_oauth2_provider app: edx_oauth2_provider.TrustedClient Found enforceable models via the oauth2_provider app: oauth2_provider.Application oauth2_provider.Grant oauth2_provider.AccessToken oauth2_provider.RefreshToken Found enforceable models via the oauth_dispatch app: oauth_dispatch.RestrictedApplication oauth_dispatch.ApplicationAccess oauth_dispatch.ApplicationOrganization Found enforceable models via the third_party_auth app: third_party_auth.OAuth2ProviderConfig third_party_auth.SAMLConfiguration third_party_auth.SAMLProviderConfig third_party_auth.SAMLProviderData third_party_auth.LTIProviderConfig third_party_auth.ProviderApiPermissions Found enforceable models via the oauth_provider app: oauth_provider.Nonce oauth_provider.Scope oauth_provider.Scope oauth_provider.Consumer oauth_provider.Token Found enforceable models via the wiki app: wiki.Article wiki.ArticleForObject wiki.ArticleRevision wiki.ArticlePlugin wiki.ReusablePlugin wiki.ArticlePlugin wiki.SimplePlugin wiki.ArticlePlugin wiki.RevisionPlugin wiki.ArticlePlugin wiki.RevisionPluginRevision wiki.URLPath Found enforceable models via the django_notify app: django_notify.NotificationType django_notify.Settings django_notify.Subscription django_notify.Notification Found enforceable models via the admin app: admin.LogEntry Found enforceable models 
via the django_comment_common app: django_comment_common.Role django_comment_common.Permission django_comment_common.ForumsConfig django_comment_common.CourseDiscussionSettings django_comment_common.DiscussionsIdMapping Found enforceable models via the notes app: notes.Note Found enforceable models via the splash app: splash.SplashConfig Found enforceable models via the user_api app: user_api.UserPreference user_api.UserCourseTag user_api.UserOrgTag user_api.RetirementState user_api.UserRetirementPartnerReportingStatus user_api.UserRetirementRequest user_api.UserRetirementStatus Found enforceable models via the shoppingcart app: shoppingcart.Order shoppingcart.OrderItem shoppingcart.Invoice shoppingcart.InvoiceTransaction shoppingcart.InvoiceItem shoppingcart.CourseRegistrationCodeInvoiceItem shoppingcart.InvoiceItem shoppingcart.InvoiceHistory shoppingcart.CourseRegistrationCode shoppingcart.RegistrationCodeRedemption shoppingcart.Coupon shoppingcart.CouponRedemption shoppingcart.PaidCourseRegistration shoppingcart.OrderItem shoppingcart.CourseRegCodeItem shoppingcart.OrderItem shoppingcart.CourseRegCodeItemAnnotation shoppingcart.PaidCourseRegistrationAnnotation shoppingcart.CertificateItem shoppingcart.OrderItem shoppingcart.DonationConfiguration shoppingcart.Donation shoppingcart.OrderItem Found enforceable models via the course_modes app: course_modes.CourseMode course_modes.CourseModesArchive course_modes.CourseModeExpirationConfig Found enforceable models via the entitlements app: entitlements.CourseEntitlementPolicy entitlements.CourseEntitlement entitlements.CourseEntitlementSupportDetail Found enforceable models via the verify_student app: verify_student.ManualVerification verify_student.SSOVerification verify_student.SoftwareSecurePhotoVerification verify_student.VerificationDeadline Found enforceable models via the dark_lang app: dark_lang.DarkLangConfig Found enforceable models via the microsite_configuration app: microsite_configuration.Microsite 
microsite_configuration.MicrositeHistory microsite_configuration.MicrositeOrganizationMapping microsite_configuration.MicrositeTemplate Found enforceable models via the rss_proxy app: rss_proxy.WhitelistedRssUrl Found enforceable models via the embargo app: embargo.EmbargoedCourse embargo.EmbargoedState embargo.RestrictedCourse embargo.Country embargo.CountryAccessRule embargo.CourseAccessRuleHistory embargo.IPFilter Found enforceable models via the course_action_state app: course_action_state.CourseRerunState Found enforceable models via the mobile_api app: mobile_api.MobileApiConfig mobile_api.AppVersionConfig mobile_api.IgnoreMobileAvailableFlagConfig Found enforceable models via the social_django app: social_django.UserSocialAuth social_django.Nonce social_django.Association social_django.Code social_django.Partial Found enforceable models via the survey app: survey.SurveyForm survey.SurveyAnswer Found enforceable models via the lms_xblock app: lms_xblock.XBlockAsidesConfig Found enforceable models via the problem_builder app: problem_builder.Answer problem_builder.Share Found enforceable models via the submissions app: submissions.StudentItem submissions.Submission submissions.Score submissions.ScoreSummary submissions.ScoreAnnotation Found enforceable models via the assessment app: assessment.Rubric assessment.Criterion assessment.CriterionOption assessment.Assessment assessment.AssessmentPart assessment.AssessmentFeedbackOption assessment.AssessmentFeedback assessment.PeerWorkflow assessment.PeerWorkflowItem assessment.TrainingExample assessment.StudentTrainingWorkflow assessment.StudentTrainingWorkflowItem assessment.StaffWorkflow Found enforceable models via the workflow app: workflow.AssessmentWorkflow workflow.AssessmentWorkflowStep workflow.AssessmentWorkflowCancellation Found enforceable models via the edxval app: edxval.Profile edxval.Video edxval.CourseVideo edxval.EncodedVideo edxval.VideoImage edxval.VideoTranscript edxval.TranscriptPreference 
edxval.ThirdPartyTranscriptCredentialsState Found enforceable models via the course_overviews app: course_overviews.CourseOverview course_overviews.CourseOverviewTab course_overviews.CourseOverviewImageSet course_overviews.CourseOverviewImageConfig Found enforceable models via the block_structure app: block_structure.BlockStructureConfiguration block_structure.BlockStructureModel Found enforceable models via the cors_csrf app: cors_csrf.XDomainProxyConfiguration Found enforceable models via the commerce app: commerce.CommerceConfiguration Found enforceable models via the credit app: credit.CreditProvider credit.CreditCourse credit.CreditRequirement credit.CreditRequirementStatus credit.CreditEligibility credit.CreditRequest credit.CreditConfig Found enforceable models via the teams app: teams.CourseTeam teams.CourseTeamMembership Found enforceable models via the xblock_django app: xblock_django.XBlockConfiguration xblock_django.XBlockStudioConfigurationFlag xblock_django.XBlockStudioConfiguration Found enforceable models via the programs app: programs.ProgramsApiConfig Found enforceable models via the catalog app: catalog.CatalogIntegration Found enforceable models via the self_paced app: self_paced.SelfPacedConfiguration Found enforceable models via the thumbnail app: thumbnail.KVStore Found enforceable models via the milestones app: milestones.Milestone milestones.MilestoneRelationshipType milestones.CourseMilestone milestones.CourseContentMilestone milestones.UserMilestone Found enforceable models via the api_admin app: api_admin.ApiAccessRequest api_admin.ApiAccessConfig api_admin.Catalog Found enforceable models via the verified_track_content app: verified_track_content.VerifiedTrackCohortedCourse verified_track_content.MigrateVerifiedTrackCohortsSetting Found enforceable models via the badges app: badges.BadgeClass badges.BadgeAssertion badges.CourseCompleteImageConfiguration badges.CourseEventBadgesConfiguration Found enforceable models via the 
email_marketing app: email_marketing.EmailMarketingConfiguration Found enforceable models via the celery_utils app: celery_utils.FailedTask celery_utils.ChordData Found enforceable models via the crawlers app: crawlers.CrawlersConfig Found enforceable models via the waffle_utils app: waffle_utils.WaffleFlagCourseOverrideModel Found enforceable models via the course_goals app: course_goals.CourseGoal Found enforceable models via the experiments app: experiments.ExperimentData experiments.ExperimentKeyValue Found enforceable models via the edx_proctoring app: edx_proctoring.ProctoredExam edx_proctoring.ProctoredExamReviewPolicy edx_proctoring.ProctoredExamReviewPolicyHistory edx_proctoring.ProctoredExamStudentAttempt edx_proctoring.ProctoredExamStudentAttemptHistory edx_proctoring.ProctoredExamStudentAllowance edx_proctoring.ProctoredExamStudentAllowanceHistory edx_proctoring.ProctoredExamSoftwareSecureReview edx_proctoring.ProctoredExamSoftwareSecureReviewHistory edx_proctoring.ProctoredExamSoftwareSecureComment Found enforceable models via the organizations app: organizations.Organization organizations.OrganizationCourse Found enforceable models via the enterprise app: enterprise.HistoricalEnterpriseCustomer enterprise.EnterpriseCustomer enterprise.EnterpriseCustomerUser enterprise.PendingEnterpriseCustomerUser enterprise.PendingEnrollment enterprise.EnterpriseCustomerBrandingConfiguration enterprise.EnterpriseCustomerIdentityProvider enterprise.HistoricalEnterpriseCustomerEntitlement enterprise.EnterpriseCustomerEntitlement enterprise.HistoricalEnterpriseCourseEnrollment enterprise.EnterpriseCourseEnrollment enterprise.HistoricalEnterpriseCustomerCatalog enterprise.EnterpriseCustomerCatalog enterprise.HistoricalEnrollmentNotificationEmailTemplate enterprise.EnrollmentNotificationEmailTemplate enterprise.EnterpriseCustomerReportingConfiguration Found enforceable models via the consent app: consent.HistoricalDataSharingConsent consent.DataSharingConsent 
consent.DataSharingConsentTextOverrides Found enforceable models via the integrated_channel app: integrated_channel.LearnerDataTransmissionAudit integrated_channel.ContentMetadataItemTransmission Found enforceable models via the degreed app: degreed.DegreedGlobalConfiguration degreed.HistoricalDegreedEnterpriseCustomerConfiguration degreed.DegreedEnterpriseCustomerConfiguration degreed.DegreedLearnerDataTransmissionAudit Found enforceable models via the sap_success_factors app: sap_success_factors.SAPSuccessFactorsGlobalConfiguration sap_success_factors.SAPSuccessFactorsEnterpriseCustomerConfiguration sap_success_factors.SapSuccessFactorsLearnerDataTransmissionAudit Found enforceable models via the xapi app: xapi.XAPILRSConfiguration Found enforceable models via the schedules app: schedules.Schedule schedules.ScheduleConfig schedules.ScheduleExperience Found enforceable models via the grades app: grades.PersistentGradesEnabledFlag grades.CoursePersistentGradesFlag grades.ComputeGradesSetting grades.VisibleBlocks grades.PersistentSubsectionGrade grades.PersistentCourseGrade grades.PersistentSubsectionGradeOverride Found enforceable models via the credentials app: credentials.CredentialsApiConfig credentials.NotifyCredentialsConfig Found enforceable models via the bookmarks app: bookmarks.Bookmark bookmarks.XBlockCache Found enforceable models via the theming app: theming.SiteTheme Found enforceable models via the completion app: completion.BlockCompletion Found enforceable models via the coursewarehistoryextended app: coursewarehistoryextended.StudentModuleHistoryExtended |
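Some models appear more than once in the output above (for example bulk_email.Target), because the stub walks the full MRO of every child model and so revisits concrete parents. A minimal sketch of de-duplicating while preserving discovery order follows. It uses plain classes as stand-ins for Django models; `Model`, `enforceable`, and `discover` here are simplified assumptions, not the real checks.

```python
import inspect


class Model(object):
    """Stand-in for django.db.models.Model (an assumption for this sketch)."""


class Target(Model):
    """Stand-in for a concrete parent model."""


class CohortTarget(Target):
    """Stand-in for a multi-table child of Target."""


class CourseModeTarget(Target):
    """Stand-in for another multi-table child of Target."""


def enforceable(cls):
    """Simplified rule: every strict subclass of Model counts as enforceable."""
    return issubclass(cls, Model) and cls is not Model


def discover(root_models):
    """Walk each model's MRO and return each enforceable class name once."""
    seen = []
    for root in root_models:
        for cls in inspect.getmro(root):
            if enforceable(cls) and cls.__name__ not in seen:
                seen.append(cls.__name__)
    return seen
```

Without the `not in seen` guard, `Target` would be reported once for itself and once more for each child, which matches the repetition visible in the devstack output.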
In order to seed the initial safelist for a given IDA, we shouldn't just start with every "enforceable" model, since the safelist should not contain many of those models. For instance, the safelist should never include models that are defined locally in the same codebase as itself. Also, it is important to note the difference between "non-local" and "3rd party" w.r.t. an IDA codebase. "Non-local" includes 3rd party apps, and also edx-owned pip-installed apps (aka edx satellite apps).
This script will generate a starting point for a safelist for any IDA, containing all models in all non-local applications:

    import inspect
    import sys

    from django.apps import apps
    from django.db import models


    def enforceable(model):
        """
        The given model actually requires annotations according to PLAT-2344.
        """
        return issubclass(model, models.Model) \
            and model is not models.Model \
            and not model._meta.abstract \
            and not model._meta.proxy


    def is_not_local(model):
        """
        Return True if the given model is NOT local to the current IDA.
        """
        # If it _was_ local to this IDA repository, it should be defined somewhere
        # under sys.prefix + '/src/' or in a path that points to the current
        # checked-out code. "site-packages" is a dead giveaway that the source
        # code came from far away.
        not_local = inspect.getsourcefile(model).startswith(
            sys.prefix + '/local/lib/python2.7/site-packages/'  # this would only work for python2.7
        )
        return not_local


    for app in apps.get_app_configs():
        for root_model in app.get_models():
            # getmro() includes the _entire_ inheritance closure, not just the direct inherited classes.
            hierarchy = inspect.getmro(root_model)
            for model in hierarchy:
                if enforceable(model) and is_not_local(model):
                    model_path = inspect.getsourcefile(model).split('site-packages/')[-1]
                    # model._meta.app_label is the lowercase snake_case representation of the app.
                    # model._meta.object_name is the CamelCase representation of the model.
                    print('{}.{}: # via {}'.format(
                        model._meta.app_label,
                        model._meta.object_name,
                        model_path,
                    ))

which generates the following safelist (.pii_safe_list.yaml) starting-point for edx-platform. Note that it is valid yaml, but will fail the OEP-30 checker tooling since there are no annotations for any of the 3rd party models.
auth.Permission: # via django/contrib/auth/models.py auth.Group: # via django/contrib/auth/models.py auth.User: # via django/contrib/auth/models.py contenttypes.ContentType: # via django/contrib/contenttypes/models.py redirects.Redirect: # via django/contrib/redirects/models.py sessions.Session: # via django/contrib/sessions/models.py sites.Site: # via django/contrib/sites/models.py djcelery.TaskMeta: # via djcelery/models.py djcelery.TaskSetMeta: # via djcelery/models.py djcelery.IntervalSchedule: # via djcelery/models.py djcelery.CrontabSchedule: # via djcelery/models.py djcelery.PeriodicTasks: # via djcelery/models.py djcelery.PeriodicTask: # via djcelery/models.py djcelery.WorkerState: # via djcelery/models.py djcelery.TaskState: # via djcelery/models.py waffle.Flag: # via waffle/models.py waffle.Switch: # via waffle/models.py waffle.Sample: # via waffle/models.py django_openid_auth.Nonce: # via django_openid_auth/models.py django_openid_auth.Association: # via django_openid_auth/models.py django_openid_auth.UserOpenID: # via django_openid_auth/models.py oauth2.Client: # via provider/oauth2/models.py oauth2.Grant: # via provider/oauth2/models.py oauth2.AccessToken: # via provider/oauth2/models.py oauth2.RefreshToken: # via provider/oauth2/models.py edx_oauth2_provider.TrustedClient: # via edx_oauth2_provider/models.py oauth2_provider.Application: # via oauth2_provider/models.py oauth2_provider.Grant: # via oauth2_provider/models.py oauth2_provider.AccessToken: # via oauth2_provider/models.py oauth2_provider.RefreshToken: # via oauth2_provider/models.py oauth_provider.Nonce: # via oauth_provider/models.py oauth_provider.Scope: # via oauth_provider/models.py oauth_provider.Scope: # via oauth_provider/models.py oauth_provider.Consumer: # via oauth_provider/models.py oauth_provider.Token: # via oauth_provider/models.py admin.LogEntry: # via django/contrib/admin/models.py splash.SplashConfig: # via splash/models.py social_django.UserSocialAuth: # via 
social_django/models.py social_django.Nonce: # via social_django/models.py social_django.Association: # via social_django/models.py social_django.Code: # via social_django/models.py social_django.Partial: # via social_django/models.py problem_builder.Answer: # via problem_builder/models.py problem_builder.Share: # via problem_builder/models.py submissions.StudentItem: # via submissions/models.py submissions.Submission: # via submissions/models.py submissions.Score: # via submissions/models.py submissions.ScoreSummary: # via submissions/models.py submissions.ScoreAnnotation: # via submissions/models.py assessment.Rubric: # via openassessment/assessment/models/base.py assessment.Criterion: # via openassessment/assessment/models/base.py assessment.CriterionOption: # via openassessment/assessment/models/base.py assessment.Assessment: # via openassessment/assessment/models/base.py assessment.AssessmentPart: # via openassessment/assessment/models/base.py assessment.AssessmentFeedbackOption: # via openassessment/assessment/models/peer.py assessment.AssessmentFeedback: # via openassessment/assessment/models/peer.py assessment.PeerWorkflow: # via openassessment/assessment/models/peer.py assessment.PeerWorkflowItem: # via openassessment/assessment/models/peer.py assessment.TrainingExample: # via openassessment/assessment/models/training.py assessment.StudentTrainingWorkflow: # via openassessment/assessment/models/student_training.py assessment.StudentTrainingWorkflowItem: # via openassessment/assessment/models/student_training.py assessment.StaffWorkflow: # via openassessment/assessment/models/staff.py workflow.AssessmentWorkflow: # via openassessment/workflow/models.py workflow.AssessmentWorkflowStep: # via openassessment/workflow/models.py workflow.AssessmentWorkflowCancellation: # via openassessment/workflow/models.py edxval.Profile: # via edxval/models.py edxval.Video: # via edxval/models.py edxval.CourseVideo: # via edxval/models.py edxval.EncodedVideo: # via 
edxval/models.py edxval.VideoImage: # via edxval/models.py edxval.VideoTranscript: # via edxval/models.py edxval.TranscriptPreference: # via edxval/models.py edxval.ThirdPartyTranscriptCredentialsState: # via edxval/models.py thumbnail.KVStore: # via sorl/thumbnail/models.py milestones.Milestone: # via milestones/models.py milestones.MilestoneRelationshipType: # via milestones/models.py milestones.CourseMilestone: # via milestones/models.py milestones.CourseContentMilestone: # via milestones/models.py milestones.UserMilestone: # via milestones/models.py celery_utils.FailedTask: # via celery_utils/models.py celery_utils.ChordData: # via celery_utils/models.py edx_proctoring.ProctoredExam: # via edx_proctoring/models.py edx_proctoring.ProctoredExamReviewPolicy: # via edx_proctoring/models.py edx_proctoring.ProctoredExamReviewPolicyHistory: # via edx_proctoring/models.py edx_proctoring.ProctoredExamStudentAttempt: # via edx_proctoring/models.py edx_proctoring.ProctoredExamStudentAttemptHistory: # via edx_proctoring/models.py edx_proctoring.ProctoredExamStudentAllowance: # via edx_proctoring/models.py edx_proctoring.ProctoredExamStudentAllowanceHistory: # via edx_proctoring/models.py edx_proctoring.ProctoredExamSoftwareSecureReview: # via edx_proctoring/models.py edx_proctoring.ProctoredExamSoftwareSecureReviewHistory: # via edx_proctoring/models.py edx_proctoring.ProctoredExamSoftwareSecureComment: # via edx_proctoring/models.py organizations.Organization: # via organizations/models.py organizations.OrganizationCourse: # via organizations/models.py enterprise.HistoricalEnterpriseCustomer: # via enterprise/models.py enterprise.EnterpriseCustomer: # via enterprise/models.py enterprise.EnterpriseCustomerUser: # via enterprise/models.py enterprise.PendingEnterpriseCustomerUser: # via enterprise/models.py enterprise.PendingEnrollment: # via enterprise/models.py enterprise.EnterpriseCustomerBrandingConfiguration: # via enterprise/models.py 
enterprise.EnterpriseCustomerIdentityProvider: # via enterprise/models.py enterprise.HistoricalEnterpriseCustomerEntitlement: # via enterprise/models.py enterprise.EnterpriseCustomerEntitlement: # via enterprise/models.py enterprise.HistoricalEnterpriseCourseEnrollment: # via enterprise/models.py enterprise.EnterpriseCourseEnrollment: # via enterprise/models.py enterprise.HistoricalEnterpriseCustomerCatalog: # via enterprise/models.py enterprise.EnterpriseCustomerCatalog: # via enterprise/models.py enterprise.HistoricalEnrollmentNotificationEmailTemplate: # via enterprise/models.py enterprise.EnrollmentNotificationEmailTemplate: # via enterprise/models.py enterprise.EnterpriseCustomerReportingConfiguration: # via enterprise/models.py consent.HistoricalDataSharingConsent: # via consent/models.py consent.DataSharingConsent: # via consent/models.py consent.DataSharingConsentTextOverrides: # via consent/models.py integrated_channel.LearnerDataTransmissionAudit: # via integrated_channels/integrated_channel/models.py integrated_channel.ContentMetadataItemTransmission: # via integrated_channels/integrated_channel/models.py degreed.DegreedGlobalConfiguration: # via integrated_channels/degreed/models.py degreed.HistoricalDegreedEnterpriseCustomerConfiguration: # via integrated_channels/degreed/__init__.py degreed.DegreedEnterpriseCustomerConfiguration: # via integrated_channels/degreed/models.py degreed.DegreedLearnerDataTransmissionAudit: # via integrated_channels/degreed/models.py sap_success_factors.SAPSuccessFactorsGlobalConfiguration: # via integrated_channels/sap_success_factors/models.py sap_success_factors.SAPSuccessFactorsEnterpriseCustomerConfiguration: # via integrated_channels/sap_success_factors/models.py sap_success_factors.SapSuccessFactorsLearnerDataTransmissionAudit: # via integrated_channels/sap_success_factors/models.py xapi.XAPILRSConfiguration: # via integrated_channels/xapi/models.py completion.BlockCompletion: # via completion/models.py |
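The seeding script hardcodes a Python 2.7 site-packages path. A version-independent way to make the same local/non-local call is sketched below, working on the source file path alone. It assumes that pip-installed code always lives under a directory named site-packages; `is_non_local` and `safelist_stub` are hypothetical helper names.

```python
import os


def is_non_local(source_file):
    """
    Guess whether a source file belongs to a pip-installed package.

    Assumption: installed packages live under a directory named
    "site-packages", whatever the Python version or virtualenv layout.
    """
    parts = os.path.normpath(source_file).split(os.sep)
    return 'site-packages' in parts


def safelist_stub(app_label, model_name, source_file):
    """Format one un-annotated safelist entry in the style of the seeding script."""
    relative = source_file.replace(os.sep, '/').split('site-packages/')[-1]
    return '{}.{}: # via {}'.format(app_label, model_name, relative)
```

In the real script the path would come from `inspect.getsourcefile(model)`. Editable installs that are checked out elsewhere would be classified as local by this rule, which may or may not be what an IDA wants.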
Forked repos are a grey area for annotations: on the one hand, we may freely annotate them because we have merge rights, but on the other hand, the more we deviate from upstream, the harder it will be to merge upstream changes back in. For this reason, we have come to the following decision:
Implementation Decision: Treat forked repositories as 3rd party. Do not annotate models in them directly, but rather use the 3rd party annotation mechanism (the safelist).
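As a rough illustration of how a checker might combine in-code annotations with the safelist, the sketch below reports models that are covered by neither. The function name `find_uncovered_models`, the shape of the safelist (a dict keyed by `app_label.ModelName`, with `None` for entries that are listed but not yet annotated), and the model IDs used in the example are assumptions made for this example, not the decided safelist format.

```python
def find_uncovered_models(required_models, annotated_in_code, safelist):
    """
    Return the sorted model IDs that have neither an in-code annotation
    nor a non-empty safelist entry.

    All three argument shapes are hypothetical:
    - required_models: set of 'app_label.ModelName' strings needing annotations
    - annotated_in_code: set of model IDs annotated directly in source
    - safelist: dict of model ID -> annotation data (None if not yet filled in)
    """
    uncovered = []
    for model_id in sorted(required_models):
        if model_id in annotated_in_code:
            continue
        if safelist.get(model_id):
            # Third-party or forked model annotated via the safelist
            continue
        uncovered.append(model_id)
    return uncovered


safelist = {
    'edxval.VideoImage': {'.. no_pii::': 'No PII'},
    'milestones.Milestone': None,  # listed but not yet annotated
}
required = {'edxval.VideoImage', 'milestones.Milestone', 'student.UserProfile'}
uncovered = find_uncovered_models(required, {'student.UserProfile'}, safelist)
print(uncovered)
```

Under this assumption, a model from a forked repo only counts as covered once its safelist entry carries annotation data.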
In short, the Python libraries offering RST generation are very basic, and none of them individually offers even the basic features we need for annotation reports. Fortunately, raw RST is not painful to read (it is meant to be human-readable), so we should focus on constructing raw RST ourselves for annotations reporting. See PLAT-2346 for more details.
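As a minimal sketch of what constructing raw RST by hand could look like, the function below renders a title, one section per file, and a bullet per annotation. The name `render_rst_report` and the input shape (a dict mapping filenames to lists of `(line_number, token, text)` tuples) are assumptions for illustration only, not a decided report layout.

```python
def render_rst_report(title, annotations_by_file):
    """
    Build a raw RST document as a string.

    annotations_by_file is a hypothetical shape: a dict mapping a filename
    to a list of (line_number, token, text) tuples.
    """
    lines = [title, '=' * len(title), '']
    for filename in sorted(annotations_by_file):
        # RST section titles need an underline at least as long as the text
        lines.extend([filename, '-' * len(filename), ''])
        for line_number, token, text in annotations_by_file[filename]:
            # Double backticks mark the token as an RST inline literal
            lines.append('* line {}: ``{}`` {}'.format(line_number, token, text))
        lines.append('')
    return '\n'.join(lines)


report = render_rst_report(
    'PII Annotations',
    {'app/models.py': [(2, '.. no_pii::', 'No PII is stored here')]},
)
print(report)
```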
There are several languages that may need to be searched, each with their own unique comment style and challenges. Not every repository will need every language, so our goals are to:
To allow for simple adoption, ease of development, and consistency with other edX projects, we've settled on Stevedore to manage the extensions. We will use Stevedore's NamedExtensionManager to find installed plugins, and a YAML configuration section to map the desired filename extensions to the installed extensions. Extensions will inherit from an "abstract" base class that explicitly shows the necessary interface, and should only need to implement perhaps 2 methods:
validate: to examine the passed-in file and return any annotation formatting errors
search: to actually search a passed-in file and return a list of found annotations

    class AnnotationExtension(object):
        """
        Abstract base class that annotation extensions will inherit from
        """
        def __init__(self, annotation_tokens):
            self.annotation_tokens = annotation_tokens

        def validate(self, file_handle):
            """
            Validates that any annotations in the given file are properly formatted
            """
            raise NotImplementedError('validate called on base class!')

        def search(self, file_handle):
            """
            Does the actual annotation search for the given file
            """
            raise NotImplementedError('search called on base class!')


    class PythonAnnotationExtension(AnnotationExtension):
        """
        Annotation extension for Python source files
        """
        def validate(self, file_handle):
            results = []  # formatting errors found in the file
            # ... (validation logic elided)
            return results

        def search(self, file_handle):
            results = []  # annotations found in the file
            # ... (search logic elided)
            return results
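To make the search method more concrete, here is one possible sketch of an extension that looks for annotation tokens on `#` comment lines. The class name `CommentAnnotationSearcher`, the flat list of tokens passed to the constructor, and the keys of the returned dicts are assumptions for this example; a real extension would also need to handle docstrings and multi-line annotations.

```python
import io
import re


class CommentAnnotationSearcher(object):
    """
    Hypothetical extension: finds annotation tokens on '#' comment lines.
    """
    def __init__(self, annotation_tokens):
        # Assumed shape: a flat list of tokens, e.g. ['.. pii::', '.. no_pii::']
        self.annotation_tokens = annotation_tokens
        alternatives = '|'.join(re.escape(token) for token in annotation_tokens)
        self.pattern = re.compile(
            r'#\s*(?P<token>{})\s*(?P<data>.*)$'.format(alternatives)
        )

    def search(self, file_handle):
        """
        Returns a list of dicts, one per annotation found in the file.
        """
        found = []
        for line_number, line in enumerate(file_handle, start=1):
            match = self.pattern.search(line)
            if match:
                found.append({
                    'annotation_token': match.group('token'),
                    'annotation_data': match.group('data').strip(),
                    'line_number': line_number,
                })
        return found


# An in-memory file stands in for a real source file
source = io.StringIO(
    u"class Profile(object):\n"
    u"    # .. pii:: Stores the learner's name\n"
    u"    # .. no_pii:: nothing here\n"
)
results = CommentAnnotationSearcher(['.. pii::', '.. no_pii::']).search(source)
print(results)
```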
Plugins will need to be defined as entry points in the annotation_finder.searchers namespace in setup.py or setup.cfg.
    entry_points={
        'annotation_finder.searchers': [
            'python = extensions.python_extension:PythonAnnotationExtension',
            'javascript = plugins.javascript_extension:JavascriptAnnotationExtension',
        ],
    },
This script shows how we could find, validate configuration, and execute a search on all of the plugins:
    import os

    import yaml
    from stevedore import named

    test_config = """
    annotations:
        pii:
            - ".. pii::"
            - ".. pii_types::":
                - id
                - name
                - other
            - ".. pii_retirement::":
                - retained
                - local_api
                - consumer_api
                - third_party
        nopii: ".. no_pii::"
    extensions:
        python:
            - py
        javascript:
            - js
            - jsx
    """


    def load_failed_handler(*args, **kwargs):
        """
        Callback for when we fail to load an extension, otherwise it fails silently
        """
        print(args)
        print(kwargs)


    def search(ext, file_handle, file_extensions_map, filename_extension):
        """
        Executes a search on the given file, only if it is configured for this extension
        """
        if filename_extension not in file_extensions_map[ext.name]:
            print('{} does not support {}. Skipping.'.format(ext.name, filename_extension))
            return (ext.name, [])

        return (ext.name, ext.obj.search(file_handle))


    if __name__ == '__main__':
        # safe_load avoids constructing arbitrary Python objects from the YAML
        config = yaml.safe_load(test_config)
        print(config)

        # These are the names of all of our configured extensions
        configured_extension_names = list(config['extensions'].keys())
        print(configured_extension_names)

        # Load Stevedore extensions that we are configured for (and only those)
        mgr = named.NamedExtensionManager(
            names=configured_extension_names,
            namespace='annotation_finder.searchers',
            invoke_on_load=True,
            on_load_failure_callback=load_failed_handler,
            invoke_args=(config['annotations'],),  # This is temporary
        )

        # Output all found extension entry points (whether or not they were loaded)
        print(mgr.list_entry_points())

        # Output all extensions that were actually able to load
        for extension in mgr.extensions:
            print(extension)

        # Index the configured filename extensions by Stevedore extension name
        file_extensions_map = {}
        known_extensions = set()
        for extension_name in config['extensions']:
            file_extensions_map[extension_name] = config['extensions'][extension_name]
            known_extensions.update(config['extensions'][extension_name])

        source_path = '/foo/bar/'

        # From here we could begin the actual file searching and reporting...
        # This is not optimized, but without the prints or doing any actual searching
        # runs all of edx-platform in 1.18 seconds.
        for root, dirs, files in os.walk(source_path):
            for filename in files:
                filename_extension = os.path.splitext(filename)[1][1:]

                if filename_extension not in known_extensions:
                    print("{} is not a known extension, skipping.".format(filename_extension))
                    continue

                full_name = os.path.join(root, filename)
                print(full_name)

                with open(full_name, 'r') as file_handle:
                    try:
                        # Call search on all loaded extensions
                        results = mgr.map(search, file_handle, file_extensions_map, filename_extension)
                        print(results)
                    except IndexError:
                        # Should we define a catchall in config?
                        print("No file extension in {}, skipping.".format(full_name))
Configuration for the annotation tooling needs to handle the following things:
We may also want to add options for the things that are currently spec'd as command line options (in / out files and paths, safelist filename) but presumably those are simple enough to add at the top level if we choose to, and don't need exposition here.
    # This section describes the known annotations
    annotations:
        # An annotation can be a single statement that stands alone
        nopii: ".. no_pii::"
        # Or it can describe a group of statements, in which case
        # the statements must appear in the same order as listed here
        pii:
            # A statement can be a simple value, in which case the
            # text that follows it will be captured
            - ".. pii::"
            # Or it can be an enum list, in which case only the values
            # included will be allowed. In this case a ".. pii::"
            # annotation must be followed immediately by a
            # ".. pii_types::" statement which must then be followed
            # immediately by a ".. pii_retirement::" statement.
            - ".. pii_types::":
                # Multiple enum values can be given on an annotation
                # as long as they are separated by spaces such as:
                # .. pii_types:: name username ip
                # An enum annotation must include at least one enum
                # value
                - id
                - name
                - username
                - password
                - location
                - phone_number
                - email_address
                - birth_date
                - ip
                - external_service
                - biography
                - gender
                - sex
                - image
                - video
                - other
            - ".. pii_retirement::":
                - retained
                - local_api
                - consumer_api
                - third_party

    # This section is for extension configuration, each
    # sub-section is the name of a Stevedore extension
    # that must be installed. Under each extension name
    # is a list of file extensions that it will be used
    # for.
    extensions:
        python:
            - py
            - py3
            - pyw
            - rpy
            - pyt
        javascript:
            - js
            - jsx
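One way the annotations section could be consumed is sketched below: it flattens the parsed configuration into a lookup of token to allowed enum values, then checks a found enum annotation against it. The helper names `build_token_rules` and `invalid_enum_values` are hypothetical, and the input is assumed to be the already-parsed `annotations` mapping (shortened here) rather than raw YAML.

```python
def build_token_rules(annotations_config):
    """
    Flatten the 'annotations' config section into {token: allowed_values}.

    allowed_values is None for free-text tokens and a list for enum tokens.
    """
    rules = {}
    for definition in annotations_config.values():
        if isinstance(definition, str):
            # Stand-alone annotation such as nopii
            rules[definition] = None
            continue
        for statement in definition:
            if isinstance(statement, dict):
                # Enum statement: a single-key mapping of token -> choices
                for token, choices in statement.items():
                    rules[token] = list(choices)
            else:
                # Simple statement whose trailing text is captured as-is
                rules[statement] = None
    return rules


def invalid_enum_values(rules, token, data):
    """
    Return the space-separated values in data that the enum does not allow.
    """
    allowed = rules[token]
    return [value for value in data.split() if value not in allowed]


# Shortened, already-parsed version of the configuration above
parsed_annotations = {
    'nopii': '.. no_pii::',
    'pii': [
        '.. pii::',
        {'.. pii_types::': ['id', 'name', 'other']},
        {'.. pii_retirement::': ['retained', 'local_api', 'consumer_api', 'third_party']},
    ],
}
rules = build_token_rules(parsed_annotations)
bad_values = invalid_enum_values(rules, '.. pii_types::', 'name shoe_size')
print(bad_values)
```

This sketch does not enforce statement ordering within a group or the requirement that an enum annotation carries at least one value; those checks would sit on top of the same lookup.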
Reporting output from the tools should match the following format:
    # Top level is a dict, keys are filenames relative to the search path
    {
        '/openedx/core/djangoapps/pii_enforcer/pii_searcher.py':
            # Underneath the keys are a list of annotations
            [
                # Stand-alone annotations are formatted as follows:
                {
                    'annotation_data': 'No PII is stored here',
                    'annotation_token': '.. no_pii::',
                    'line_number': 2,
                    'found_by': ['python']  # These are the names of the extensions or scripts that found this annotation
                },
                {
                    'annotation_data': 'We do not store PII in this model',
                    'annotation_token': '.. no_pii::',
                    'line_number': 17,
                    'found_by': ['python']
                }
            ],
        '/openedx/core/djangoapps/user_api/legacy_urls.py':
            [
                # Annotation groups are represented differently
                {
                    'annotation_group': 'pii',  # This is the name given to the group in configuration
                    'annotations': [
                        {
                            'annotation_data': 'This model stores user addresses and phone numbers',
                            'annotation_token': '.. pii::',
                            'line_number': 16,
                            'found_by': ['python']
                        },
                        {
                            # In cases where the annotation type is an enum, "annotation_data" becomes a list
                            'annotation_data': ['address', 'phone_number'],
                            'annotation_token': '.. pii_types::',
                            'line_number': 17,
                            'found_by': ['python']
                        },
                        {
                            'annotation_data': ['local_api', 'consumer_api'],
                            'annotation_token': '.. pii_retirement::',
                            'line_number': 18,
                            'found_by': ['python']
                        }
                    ]
                }
            ]
    }
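Because stand-alone annotations and annotation groups are nested differently, a consumer of this report would likely want to flatten them first. The generator below is a sketch of that idea; the name `iter_annotations` is hypothetical, and the sample data is a shortened copy of the format above.

```python
def iter_annotations(report):
    """
    Yield (filename, group_name, annotation) for every annotation in a report.

    group_name is None for stand-alone annotations.
    """
    for filename in sorted(report):
        for entry in report[filename]:
            if 'annotation_group' in entry:
                # Grouped annotations are nested one level deeper
                for annotation in entry['annotations']:
                    yield filename, entry['annotation_group'], annotation
            else:
                yield filename, None, entry


sample_report = {
    '/openedx/core/djangoapps/pii_enforcer/pii_searcher.py': [
        {
            'annotation_data': 'No PII is stored here',
            'annotation_token': '.. no_pii::',
            'line_number': 2,
            'found_by': ['python'],
        },
    ],
    '/openedx/core/djangoapps/user_api/legacy_urls.py': [
        {
            'annotation_group': 'pii',
            'annotations': [
                {
                    'annotation_data': 'This model stores user addresses and phone numbers',
                    'annotation_token': '.. pii::',
                    'line_number': 16,
                    'found_by': ['python'],
                },
                {
                    'annotation_data': ['address', 'phone_number'],
                    'annotation_token': '.. pii_types::',
                    'line_number': 17,
                    'found_by': ['python'],
                },
            ],
        },
    ],
}

rows = list(iter_annotations(sample_report))
for filename, group_name, annotation in rows:
    print(filename, group_name, annotation['annotation_token'], annotation['line_number'])
```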