Asset Compilation Audit 2017-11-01

This page describes the edx-platform static asset compilation as it exists today. A list of items being actively worked on (based on these findings) is at edx-platform Static Asset Work Log.

Production Mode

To get this mode, you have to make the following changes to devstack:

  • Comment out the overrides in the PIPELINE section of lms/envs/devstack.py and cms/envs/devstack.py
  • Set the following values in /edx/app/edxapp/lms.env.json and /edx/app/edxapp/cms.env.json
    • "COMPREHENSIVE_THEME_DIRS": ["/edx/app/edxapp/edx-platform/themes"]
    • "ENABLE_COMPREHENSIVE_THEMING": true
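
If you'd rather not hand-edit the JSON, the two values above can be merged in with a few lines of Python. This is only a sketch: `enable_theming` and `THEMING_OVERRIDES` are names I made up for illustration, and it assumes each env file is a plain JSON object that can safely be rewritten.

```python
import json
from pathlib import Path

# The two values listed above; nothing else in the file is changed.
THEMING_OVERRIDES = {
    "COMPREHENSIVE_THEME_DIRS": ["/edx/app/edxapp/edx-platform/themes"],
    "ENABLE_COMPREHENSIVE_THEMING": True,
}


def enable_theming(env_path):
    """Merge the theming overrides into one env JSON file and return the result.

    Hypothetical helper, not part of edx-platform or devstack.
    """
    path = Path(env_path)
    config = json.loads(path.read_text())
    config.update(THEMING_OVERRIDES)
    # Rewrite the whole file; key order and whitespace may differ from the original.
    path.write_text(json.dumps(config, indent=4, sort_keys=True) + "\n")
    return config
```

Calling `enable_theming("/edx/app/edxapp/lms.env.json")` and then the same for `cms.env.json` should cover the second bullet. The PIPELINE overrides in the devstack settings files still have to be commented out by hand.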

I'm examining this first because dev mode is a subset of what happens here.

What Happens

Command: paver update_assets --settings=devstack_docker

  1. pavelib/assets.py::update_assets executed. This does not incur Django startup costs.
  2. update_assets calls pavelib/prereqs.py::install_node_prereqs to ensure npm dependencies are up to date.
    1. This is < 1s when dependencies are up to date.
    2. TODO: measure how long this takes when dependencies have to be installed from scratch.
  3. (4s) Exec of "xmodule_assets common/static/xmodule" – xmodule_assets is an entrypoint defined in xmodule's setup.py and points to /common/lib/xmodule/xmodule/static_content.py::main. The output dir is common/static/xmodule.
    1. xmodule_assets inspects all XModules and XModuleDescriptors and looks for JS (JS + CoffeeScript) and CSS (CSS + SCSS) declared in class attributes. It does this in the following order:
      1. XModuleDescriptor JavaScript
      2. XModuleDescriptor CSS
      3. XModule JavaScript
      4. XModule CSS
      5. It then generates _module-styles.scss for both the descriptors/css and modules/css dirs. This is the file that imports all the copied SCSS files. It's prepended with imports for bourbon/bourbon and lms/theme/variables (theming support).
    2. Important notes:
      1. The end goal of XModule asset compilation is to generate one big bundled file for each of the directories (modules/js, modules/css, descriptors/js, descriptors/css).
      2. However, the xmodule_assets command does not compile Sass, CoffeeScript, or do the bundling. That happens later. This script just extracts the files from XModule-specified locations and puts them in common/static/xmodule.
      3. Despite the js and css directory names, the extracted source files include CoffeeScript as well as Sass partials.
      4. The files are ordered (they're declared in a list). They're given a prefix based on that ordering.
      5. The files are renamed with md5 hashes, in an effort to de-dupe shared dependencies.
      6. Random note: This is the one script in edx-platform that uses docopt for CLI parsing.
  4. (< 1s) Copy NPM installed vendor assets to common/static/common/js/vendor
  5. (5s) CoffeeScript compilation:
    1. node_modules/.bin/coffee --compile `find /edx/app/edxapp/edx-platform/lms /edx/app/edxapp/edx-platform/cms /edx/app/edxapp/edx-platform/common -type f -name "*.coffee"`
    2. This is what actually compiles any XModule CoffeeScript from the fragments generated by xmodule_assets.
  6. (36s) Webpack configuration
    1. Webpack needs to grab certain settings from LMS and Studio such as the STATIC_ROOT directory. This is particularly aggravating for Studio because that value is determined by the current git hash (e.g. /edx/var/edxapp/staticfiles/b57f144724).
    2. The Python script therefore makes three separate calls to the print_setting management command to grab this information:
      1. python manage.py lms --settings=devstack_docker print_setting STATIC_ROOT 2>/dev/null
      2. python manage.py cms --settings=devstack_docker print_setting STATIC_ROOT 2>/dev/null
      3. python manage.py lms --settings=devstack_docker print_setting WEBPACK_CONFIG_PATH 2>/dev/null
    3. Even though this is basically grabbing three config values, it takes roughly 36 seconds to run on my devstack because of high edx-platform startup costs.
  7. (38s) Webpack execution
    1. NODE_ENV=development STATIC_ROOT_LMS=/edx/var/edxapp/staticfiles STATIC_ROOT_CMS=/edx/var/edxapp/staticfiles/b57f144724 $(npm bin)/webpack --config=webpack.prod.config.js
    2. Output files go in common/static/bundles
    3. Transpiles JS and creates optimized versions with hashes in the filenames, plus source map files for debugging.
    4. Also seems to create hash-named woff2, eot, and svg files – these are font-awesome fonts being used by the Studio front end.
    5. The time here is spent in CPU processing JS files. I'm not clear on where the bottleneck is within this processing, though I suspect it's in JS minification (based on earlier profiling results). Needs more investigation.
    6. This is where all new features should be developed, so the importance of this part of the execution will only grow.
  8. (3m 3s) Sass Compilation
    1. Commands:
      1. python manage.py lms --settings=devstack_docker compile_sass lms
      2. python manage.py cms --settings=devstack_docker compile_sass cms
    2. A lot of the work here is replicated – there's a lot of overlap between LMS and Studio CSS, so we're overwriting a lot of files with the same values.
    3. The individual themes being compiled are independent, so could be parallelized.
    4. LMS themes are more than 2X as expensive as Studio themes to compile.
    5. Sass compilation is initiated in Python, using libsass.
  9. (8m 5s) Django collectstatic
    1. Commands:
      1. python manage.py lms --settings=devstack_docker collectstatic --noinput > logs/lms-collectstatic.log
      2. python manage.py cms --settings=devstack_docker collectstatic --noinput > logs/studio-collectstatic.log
    2. This is the main place where JavaScript and CSS are bundled together in our current system, according to config in lms/envs/common.py and cms/envs/common.py – the XModule fragments extracted in step 3 and partly processed in step 5 get stitched together here.
    3. We define a couple of custom STATICFILES_FINDERS in our config file so that we can find files that are in themes or need to be detected via XBlock entrypoints.
    4. Most of these mappings are completely static, however, and we should be able to port them over to webpack once we sort out any dependency issues.
    5. The majority of the time here is spent optimizing the JS. Running collectstatic to copy files without optimizations enabled takes around 50 seconds. The other 7+ minutes are spent in post-processing.
      1. This is potentially a place where we could see significant gains:
        1. Many large vendor JS files are being needlessly post-processed each time, despite never actually changing.
        2. Many JavaScript assets are replicated across the different themes, despite being identical.
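
To see where the time goes overall, the per-step timings quoted above can be added up. This is just arithmetic on the numbers from my devstack run; the two "< 1s" steps are left out, and the dictionary keys are my own shorthand labels.

```python
# Durations in seconds, copied from the step headings above.
STEP_SECONDS = {
    "xmodule_assets": 4,
    "CoffeeScript compilation": 5,
    "Webpack configuration": 36,
    "Webpack execution": 38,
    "Sass compilation": 3 * 60 + 3,
    "collectstatic": 8 * 60 + 5,
}


def summarize(step_seconds):
    """Return the total runtime and each step's share of it."""
    total = sum(step_seconds.values())
    shares = {name: secs / total for name, secs in step_seconds.items()}
    return total, shares


total, shares = summarize(STEP_SECONDS)
minutes, seconds = divmod(total, 60)
print(f"total: {minutes}m {seconds}s")
# Largest contributors first.
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:26s} {share:6.1%}")
```

With these numbers the run comes to about twelve and a half minutes, and collectstatic plus Sass compilation together account for close to nine tenths of it.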

Files Produced

444M of files are output to the STATIC_ROOT_LMS (/edx/var/edxapp/staticfiles)

Note that for all assets we output to this directory, we have both the original asset as well as the md5-hash-named copy.
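
For reference, here is a minimal sketch of how a hash-named copy can be derived from an asset. It assumes the stock Django hashed-storage convention (the first 12 hex characters of the md5 of the file content, inserted before the extension). Our storage classes may customize this, so treat `hashed_name` as an illustrative helper rather than the actual implementation.

```python
import hashlib
import posixpath


def hashed_name(name, content):
    """Return `name` with a 12-character md5 digest of `content` before the extension.

    Illustrative only: mirrors the default Django naming scheme as I understand it.
    """
    digest = hashlib.md5(content).hexdigest()[:12]
    root, ext = posixpath.splitext(name)
    return f"{root}.{digest}{ext}"
```

Because the digest depends only on content, two identical files in different directories get the same suffix, but they are still stored (and post-processed) as separate files.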

LMS Files

Original Size | Directory | Overview | Current Size | Changes
137M
/xmodule_js

This is the most confusing one because it contains within it outright copies of many of the top-level directories. xmodule_js/common_static appears to be copied over to the root static dir with folders like fonts, images, etc. That being said, xmodule_js/common_static/js is just a subset of js, so it looks like the contents of xmodule_js/common_static are copied first, and then more things are added on top of that.

Most of this is xmodule_js/common_static at 96M, but there's also 35M here for xmodule_js/fixtures/hls (test video files we made). Talk to Greg Martin about how we might address the video sizes next week.



63M
/js

The biggest items here are vendor files (25M) such as tinymce, pdfjs, ova (Open Video Annotation), and CodeMirror. After that, 11M is CapaModule related JavaScript, most prominently 8M of jsme and the closely related jsmolcalc (see additional notes about jsme in the row for /vendor).

We have a number of bundled application-specific files that weigh hundreds of KB (e.g. the 421K discovery_factory.js), each of which has its own copy of moment and moment.js and locale information.



31M
/xblock

These contain static assets that are copied over from XBlocks (XBlocks can specify their static assets in their setup.py). These files are also namespaced by XBlock tag name. This is problematic for the problem_builder family of XBlocks because the dozen or so separate XBlocks from that family all have copies of the same 1.2M of static assets. We could address this by allowing XBlocks to specify their own static resource namespace, or defaulting the namespace to something tied to the package setup file rather than to each tag name. This kind of duplication happens to a lesser extent with school_yourself and google_drive.

The largest individual block is edx_sga, mostly because it has its own copy of sinon.js (a 1.8M test framework), and copies of other third-party dependencies such as jQuery, URIjs, underscore, and requirejs. Only about 84K out of the 3.7M of JavaScript for that XBlock represents non-redundant code specific to SGA. This is mostly because the XBlock framework doesn't provide a good way to specify or manage shared dependencies.



20M
/vendor
Despite the name, the only thing in here is static assets for edx-jsme, which provides the molecule editor for Capa. This tool is no longer supported, but has not been removed. Our actual vendor files are sprinkled everywhere in the source tree, usually multiple times at slightly different versions.

20M
/css
More than half of this is the large bundled CSS files generated by our v1 Sass files (~800K each), and the smaller per-app CSS files generated by our v2 pattern lib Sass (~160K each). We also have about 7M worth of vendor CSS, the most notable of which is for pdfjs (4M). There's also the somewhat mysterious 1.3M css/vendor/fonts binary file (not directory), which appears to be an accidental check-in of someone's OS X alias (with a terrifying amount of metadata).

17M
/stanford-style

These are theme directories. 12M of this is CSS. Both lms-course.css and lms-main-v1.css are over 700K, and assorted v2 pattern lib CSS files weigh between 160K and 216K each. There's also a 4X multiplier at work – each file has an RTL translated version and md5-hash named copies.

The other 5MB is JavaScript. Our bundled JS files are copied into each theme.

Note: These directories should really be namespaced so that they are built to places like /themes/edx.org instead of the top-level name of /edx.org.



17M
/red-theme


17M
/edx.org


17M
/edge.edx.org


17M
/dark-theme


16M
/common
Again, vendor files dominate here, such as common/js/vendor/sinon.js at 2.1M. 13M of this is the common/js/vendor directory. Over 700K comes from spec files and helpers.

15M
/open-edx
This is actually a theme directory, but one that has 9.7M of CSS instead of the 12M other themes here have. I'm not completely sure of the root cause, but it appears that the open-edx theme is not compiling in Bootstrap, meaning both that there is no open-edx/css/bootstrap directory and that the open-edx/css/discussion directory is smaller.

14M
/bundles
Source map files are the largest items here, with commons.js.map topping out at 892K. Currency.js is dominated by a third party dependency (which-country requires point-in-polygon, which is > 500K). It's also worth noting that JS files here are getting post-processed into hash-names twice – once by webpack, and once by Django.

9.7M
/images

About half of this is two hilariously large copies of a placeholder image that could be replaced with something far smaller.

The other half is images from vsepr, which is part of chemtools, a capa problem type.



4.8M
/templates
This is actually a little wacky, because these are mostly Django and Mako templates that shouldn't be compiled out at all, much less post-processed as publicly accessible static assets. However, a small portion of these are underscore templates.

4.5M
/edx-pattern-library
This is basically all fonts, both font-awesome and OpenSans. This is a different font-awesome font from the one that's named by just its hash in bundles.

4.0M
/rest_framework
Mostly documentation for the Django REST framework, and the fonts and JS needed to make that work. It has yet another copy of fontawesome fonts, in four different formats.

3.7M
/data
This is GeoIP data used for embargo code. We also post-process an MD5 hashed version of this, despite the fact that it's only used by Python code.

3.5M
/sass
608K of this is bourbon, but most of this is of our own making. Like templates, this doesn't seem to be something that belongs in the post-processed asset bundle.

3.4M
/fonts
Yet another copy of Open Sans and FontAwesome fonts, with a tiny Creative Commons font as well.

2.4M
/xmodule
The assets compiled out by xmodule_assets (step 3 in the first part of this wiki doc). Most of this is JavaScript, with the circuit simulator being the largest individual contributor at around 400K.

2.3M
/flags

(I'm deferring investigation of /flags and the remaining directories below, since they're on the long tail right now.)

2.2M
/certificates


1.8M
/teams


1.8M
/learner_profile


1.6M
/wiki


1.4M
/admin


872K
/applets


768K
/support


756K
/proctoring


676K
/course_bookmarks


580K
/course_experience


496K
/coffee


400K
/discussion


388K
/audio


280K
/course_search


240K
/edx-ui-toolkit


216K
/lms


208K
/debug_toolbar


164K
/enterprise


164K
/django_extensions


68K
/mptt



20K
/text



12K
/djcelery
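
Several of the entries above come down to identical files being copied into multiple theme or XBlock directories. A quick way to put a number on that is to group files by content hash. The sketch below is my own throwaway helper (`find_duplicates` and `wasted_bytes` are hypothetical names, not existing tooling), and it reads each file fully into memory, which is fine for a one-off audit but not for huge files.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicates(root):
    """Group files under `root` by md5 of their content.

    Returns {digest: [paths]} for every digest shared by more than one file.
    """
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if os.path.islink(path):
                continue  # skip symlinks so a file is not counted against itself
            with open(path, "rb") as handle:
                digest = hashlib.md5(handle.read()).hexdigest()
            by_digest[digest].append(path)
    return {d: sorted(paths) for d, paths in by_digest.items() if len(paths) > 1}


def wasted_bytes(duplicates):
    """Bytes that would be saved if each duplicate group kept a single copy."""
    return sum(
        os.path.getsize(paths[0]) * (len(paths) - 1)
        for paths in duplicates.values()
    )
```

Pointing `find_duplicates` at /edx/var/edxapp/staticfiles would also flag cases where two differently named files happen to have the same content, so the total from `wasted_bytes` is an upper bound on what de-duplication could save, not a precise figure.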