...
Annotating 3rd Party Django Models (PLAT-2344)
Model Inheritance and Mixins
...
Implementation Decision: Treat forked repositories as 3rd party. Do not annotate models in them directly, but rather use the 3rd party annotation mechanism (the safelist).
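To make the decision concrete: annotations for a 3rd-party (or forked) model would live in a safelist keyed by model, not in the model's own source. The sketch below shows a lookup against such a safelist, plus a check for 3rd-party models that have no entry yet. The safelist shape (`app_label.ModelName` mapped to annotation tokens and their text), the example entries such as `forked_app.Profile`, and the helper names are all hypothetical. The real safelist format is not specified in this section.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical safelist shape: keys are "app_label.ModelName"; values map an
# annotation token to the text that would otherwise live in the model docstring.
Safelist = Dict[str, Dict[str, str]]

EXAMPLE_SAFELIST: Safelist = {
    "social_django.Association": {".. no_pii::": "OAuth association data, no PII"},
    "forked_app.Profile": {".. pii::": "Stores a display name", ".. pii_types::": "name"},
}


def model_key(app_label: str, model_name: str) -> str:
    """Build the safelist key for a model, e.g. 'forked_app.Profile'."""
    return "{}.{}".format(app_label, model_name)


def safelist_annotations(app_label: str, model_name: str, safelist: Safelist) -> Dict[str, str]:
    """Return the annotations recorded for a third-party model, or {} if it has none."""
    return dict(safelist.get(model_key(app_label, model_name), {}))


def missing_from_safelist(models: Iterable[Tuple[str, str]], safelist: Safelist) -> List[str]:
    """List third-party models (as 'app_label.ModelName') that have no safelist entry."""
    keys = (model_key(app_label, model_name) for app_label, model_name in models)
    return sorted(key for key in keys if key not in safelist)


if __name__ == "__main__":
    # In practice the (app_label, model_name) pairs would come from Django's app
    # registry, filtered to models that live in 3rd-party or forked packages.
    third_party_models = [
        ("social_django", "Association"),
        ("forked_app", "Profile"),
        ("forked_app", "Avatar"),
    ]
    print(safelist_annotations("forked_app", "Profile", EXAMPLE_SAFELIST))
    print(missing_from_safelist(third_party_models, EXAMPLE_SAFELIST))
```

Because forked code is never edited, the "missing" list is the natural thing to report on: it shows which 3rd-party models still need a safelist entry.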
Generating RST docs (PLAT-2346)
In short, the Python libraries that offer RST generation are very basic, and none of them individually covers even the basic features we need for annotation reports. Fortunately, RST is meant to be human-readable and is not painful to read raw, so we should focus on constructing raw RST for annotation reporting ourselves. See PLAT-2346 for more details.
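Since the plan is to emit raw RST directly, here is a minimal sketch of what that might look like. It renders a report shaped like the one described under Reporting Output (a dict of filename to a list of annotations) as one RST section per file, with one bullet per annotation. The function names and the layout are illustrative assumptions, not a settled report format.

```python
from typing import Any, Dict, List


def rst_heading(title: str, underline_char: str = "=") -> str:
    """Return an RST section title; the underline must be at least as long as the title."""
    return "{}\n{}\n".format(title, underline_char * len(title))


def rst_bullet(text: str) -> str:
    """Return a single RST bullet list item."""
    return "* {}".format(text)


def format_annotation(annotation: Dict[str, Any]) -> str:
    """Render one annotation; enum data (a list) is joined with commas."""
    data = annotation["annotation_data"]
    if isinstance(data, list):
        data = ", ".join(data)
    return "Line {}: ``{}`` {}".format(annotation["line_number"], annotation["annotation_token"], data)


def report_to_rst(report: Dict[str, List[Dict[str, Any]]], title: str = "Annotation Report") -> str:
    """Build a raw RST document: one section per file, one bullet per annotation."""
    parts = [rst_heading(title, "=")]
    for filename in sorted(report):
        parts.append(rst_heading(filename, "-"))
        for entry in report[filename]:
            # Annotation groups keep their member annotations under "annotations"
            for annotation in entry.get("annotations", [entry]):
                parts.append(rst_bullet(format_annotation(annotation)))
        parts.append("")  # a blank line ends the bullet list
    return "\n".join(parts)
```

Plain string assembly keeps generation dependency-free; the output can be read as-is or passed to docutils or Sphinx later.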
Extensions (PLAT-2365)
There are several languages that may need to be searched, each with its own comment style and challenges. Not every repository will need every language, so our goals are to:
...
```python
import os

import yaml
from stevedore import named

test_config = """
annotations:
    pii:
        - ".. pii::"
        - ".. pii_types::":
            - id
            - name
            - other
        - ".. pii_retirement::":
            - retained
            - local_api
            - consumer_api
            - third_party
    nopii: ".. no_pii::"
extensions:
    python:
        - py
    javascript:
        - js
        - jsx
"""


def load_failed_handler(*args, **kwargs):
    """
    Callback for when we fail to load an extension, otherwise it fails silently
    """
    print(args)
    print(kwargs)


def search(ext, file_handle, file_extensions_map, filename_extension):
    """
    Executes a search on the given file, only if it is configured for this extension
    """
    if filename_extension not in file_extensions_map[ext.name]:
        print('{} does not support {}. Skipping.'.format(ext.name, filename_extension))
        return ext.name, []
    # Rewind so every extension reads the file from the beginning
    file_handle.seek(0)
    return ext.name, ext.obj.search(file_handle)


if __name__ == '__main__':
    # safe_load: yaml.load() without an explicit Loader is deprecated and unsafe
    config = yaml.safe_load(test_config)
    print(config)

    # These are the names of all of our configured extensions
    configured_extension_names = list(config['extensions'].keys())
    print(configured_extension_names)

    # Load Stevedore extensions that we are configured for (and only those)
    mgr = named.NamedExtensionManager(
        names=configured_extension_names,
        namespace='annotation_finder.searchers',
        invoke_on_load=True,
        on_load_failure_callback=load_failed_handler,
        invoke_args=(config['annotations'],),  # This is temporary
    )

    # Output all found extension entry points (whether or not they were loaded)
    print(mgr.list_entry_points())

    # Output all extensions that were actually able to load
    for extension in mgr.extensions:
        print(extension)

    # Map each extension name to the file extensions it handles
    file_extensions_map = {}
    known_extensions = set()
    for extension_name in config['extensions']:
        file_extensions_map[extension_name] = config['extensions'][extension_name]
        known_extensions.update(config['extensions'][extension_name])

    source_path = '/foo/bar/'

    # From here we could begin the actual file searching and reporting...
    # This is not optimized, but without the prints or doing any actual searching
    # it runs over all of edx-platform in 1.18 seconds.
    for root, dirs, files in os.walk(source_path):
        for filename in files:
            # splitext() returns '' (it does not raise) when there is no file extension
            filename_extension = os.path.splitext(filename)[1][1:]

            if not filename_extension:
                # Should we define a catchall in config?
                print("No file extension in {}, skipping.".format(filename))
                continue

            if filename_extension not in known_extensions:
                print("{} is not a known extension, skipping.".format(filename_extension))
                continue

            full_name = os.path.join(root, filename)
            print(full_name)

            with open(full_name, 'r') as file_handle:
                # Call search() once per loaded extension; each returns (name, results)
                results = mgr.map(search, file_handle, file_extensions_map, filename_extension)
                print(results)
```
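The loader above expects each Stevedore extension to be constructed with the `annotations` config (through `invoke_args`) and to expose a `search(file_handle)` method. Below is a minimal sketch of such a searcher for languages with `#` line comments. The class name, the result shape, and the line-by-line matching are assumptions; it ignores inline comments after code, docstrings, and block comments, which a real searcher would need to handle. A class like this could be registered under the `annotation_finder.searchers` entry point namespace in the package's `setup.py`.

```python
import io
from typing import Any, Dict, List


class HashCommentSearcher:
    """
    Hypothetical searcher for languages with '#' line comments (e.g. Python).
    Stevedore would instantiate it with config['annotations'].
    """

    comment_prefix = "#"

    def __init__(self, annotations_config: Dict[str, Any]):
        self.tokens: List[str] = []
        self.enum_tokens = set()
        for value in annotations_config.values():
            if isinstance(value, str):
                self.tokens.append(value)  # stand-alone, e.g. ".. no_pii::"
                continue
            for statement in value:  # a group is a list of statements
                if isinstance(statement, dict):
                    # enum statement, e.g. {".. pii_types::": ["id", "name", ...]}
                    self.tokens.extend(statement)
                    self.enum_tokens.update(statement)
                else:
                    self.tokens.append(statement)

    def search(self, file_handle) -> List[Dict[str, Any]]:
        """Return one result per full-line comment that starts with a known token."""
        results = []
        for line_number, line in enumerate(file_handle, start=1):
            stripped = line.strip()
            if not stripped.startswith(self.comment_prefix):
                continue
            comment = stripped[len(self.comment_prefix):].strip()
            for token in self.tokens:
                if comment.startswith(token):
                    data: Any = comment[len(token):].strip()
                    if token in self.enum_tokens:
                        data = data.split()  # enum values are space separated
                    results.append({
                        "annotation_token": token,
                        "annotation_data": data,
                        "line_number": line_number,
                    })
                    break
        return results


if __name__ == "__main__":
    annotations = {
        "nopii": ".. no_pii::",
        "pii": [".. pii::", {".. pii_types::": ["id", "name"]}, {".. pii_retirement::": ["retained"]}],
    }
    source = io.StringIO("x = 1\n# .. pii:: Stores names\n# .. pii_types:: id name\n")
    print(HashCommentSearcher(annotations).search(source))
```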
Configuration (PLAT-2361)
Configuration for the annotation tooling needs to handle the following things:
...
```yaml
# This section describes the known annotations
annotations:
    # An annotation can be a single statement that stands alone
    nopii: ".. no_pii::"
    # Or it can describe a group of statements, in which case
    # the statements must appear in the same order as listed here
    pii:
        # A statement can be a simple value, in which case the
        # text that follows it will be captured
        - ".. pii::"
        # Or it can be an enum list, in which case only the values
        # included will be allowed. In this case a ".. pii::"
        # annotation must be followed immediately by a
        # ".. pii_types::" statement which must then be followed
        # immediately by a ".. pii_retirement::" statement.
        - ".. pii_types::":
            # Multiple enum values can be given on an annotation
            # as long as they are separated by spaces such as:
            # .. pii_types:: name username ip
            # An enum annotation must include at least one enum
            # value
            - id
            - name
            - username
            - password
            - location
            - phone_number
            - email_address
            - birth_date
            - ip
            - external_service
            - biography
            - gender
            - sex
            - image
            - video
            - other
        - ".. pii_retirement::":
            - retained
            - local_api
            - consumer_api
            - third_party
# This section is for extension configuration. Each
# sub-section is the name of a Stevedore extension
# that must be installed. Under each extension name
# is a list of file extensions that it will be used
# for.
extensions:
    python:
        - py
        - py3
        - pyw
        - rpy
        - pyt
    javascript:
        - js
        - jsx
```
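The ordering and enum rules described in the configuration comments can be checked mechanically. The sketch below validates one group of found statements, given as `(token, data)` pairs in source order, against that group's entry in the parsed `annotations` config. The function name, the input shape, and the error messages are assumptions for illustration. It takes the already-parsed config as a plain Python structure, so it does not depend on a YAML library.

```python
from typing import Dict, List, Tuple, Union

# One group from the parsed config: plain tokens or {token: [allowed enum values]}
GroupConfig = List[Union[str, Dict[str, List[str]]]]


def validate_group(group_config: GroupConfig, found: List[Tuple[str, str]]) -> List[str]:
    """
    Check found (token, data) pairs against a group definition.
    Returns a list of error messages; an empty list means the group is valid.
    """
    errors = []
    if len(found) != len(group_config):
        errors.append("expected {} statements, found {}".format(len(group_config), len(found)))
    for statement, (token, data) in zip(group_config, found):
        if isinstance(statement, dict):
            expected_token, allowed = next(iter(statement.items()))
        else:
            expected_token, allowed = statement, None
        # Statements must appear in the configured order
        if token != expected_token:
            errors.append("expected {}, found {}".format(expected_token, token))
            continue
        if allowed is None:
            continue  # simple statement: any following text is captured
        values = data.split()  # enum values are separated by spaces
        if not values:
            errors.append("{} needs at least one value".format(token))
        for value in values:
            if value not in allowed:
                errors.append("{} is not a valid value for {}".format(value, token))
    return errors


if __name__ == "__main__":
    pii_group = [".. pii::", {".. pii_types::": ["id", "name", "ip"]}, {".. pii_retirement::": ["retained"]}]
    print(validate_group(pii_group, [(".. pii::", "Stores names"), (".. pii_types::", "name ip"), (".. pii_retirement::", "retained")]))
```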
Reporting Output (PLAT-2350)
Reporting output from the tools should match the following format:
```python
# Top level is a dict, keys are filenames relative to the search path
{
    '/openedx/core/djangoapps/pii_enforcer/pii_searcher.py':
    # Underneath the keys are a list of annotations
    [
        # Stand-alone annotations are formatted as follows:
        {
            'annotation_data': 'No PII is stored here',
            'annotation_token': '.. no_pii::',
            'line_number': 2,
            'plugin_names': ['python']  # These are the names of the extensions or scripts that found this annotation
        },
        {
            'annotation_data': 'We do not store PII in this model',
            'annotation_token': '.. no_pii::',
            'line_number': 17,
            'plugin_names': ['python']
        }
    ],
    '/openedx/core/djangoapps/user_api/legacy_urls.py':
    [
        # Annotation groups are represented differently
        {
            'annotation_group': 'pii',  # This is the name given to the group in configuration
            'annotations':
            [
                {
                    'annotation_data': 'This model stores user addresses and phone numbers',
                    'annotation_token': '.. pii::',
                    'line_number': 16,
                    'plugin_names': ['python']
                },
                {
                    # In cases where the annotation type is an enum, "annotation_data" becomes a list
                    'annotation_data': ['address', 'phone_number'],
                    'annotation_token': '.. pii_types::',
                    'line_number': 17,
                    'plugin_names': ['python']
                },
                {
                    'annotation_data': ['local_api', 'consumer_api'],
                    'annotation_token': '.. pii_retirement::',
                    'line_number': 18,
                    'plugin_names': ['python']
                }
            ]
        }
    ]
}
```
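One way to build this structure is to accumulate results per file and merge duplicates, so that an annotation found by more than one extension or script is listed once, with every finder named in its `plugin_names` list. The sketch below assumes each raw result arrives as a flat dict with the keys shown above plus a single `plugin_name` string; that input shape and the helper names are hypothetical. It only handles stand-alone annotations; wrapping members into `annotation_group` entries would be a further step.

```python
from typing import Any, Dict, List

Report = Dict[str, List[Dict[str, Any]]]


def add_result(report: Report, filename: str, result: Dict[str, Any]) -> None:
    """
    Add one search result to the report, merging it with an existing entry for the
    same token on the same line instead of duplicating it.
    """
    entries = report.setdefault(filename, [])
    for entry in entries:
        same_annotation = (entry["annotation_token"], entry["line_number"]) == (
            result["annotation_token"],
            result["line_number"],
        )
        if same_annotation:
            if result["plugin_name"] not in entry["plugin_names"]:
                entry["plugin_names"].append(result["plugin_name"])
            return
    entries.append({
        "annotation_data": result["annotation_data"],
        "annotation_token": result["annotation_token"],
        "line_number": result["line_number"],
        "plugin_names": [result["plugin_name"]],
    })


def finalize(report: Report) -> Report:
    """Return a copy with each file's annotations sorted by line number."""
    return {name: sorted(entries, key=lambda e: e["line_number"]) for name, entries in report.items()}
```

Keying on token plus line number is a simplifying assumption; two finders that disagree on the captured text would be merged under whichever result arrived first.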