Asset Compilation Audit 2017-11-01
This page describes the edx-platform static asset compilation as it exists today. A list of items being actively worked on (based on these findings) is at edx-platform Static Asset Work Log.
Production Mode
To get this mode, you have to make the following changes to devstack:
- Comment out the overrides in the PIPELINE section of lms/envs/devstack.py and cms/envs/devstack.py
- Set the following values in /edx/app/edxapp/lms.env.json and /edx/app/edxapp/cms.env.json:
"COMPREHENSIVE_THEME_DIRS": ["/edx/app/edxapp/edx-platform/themes"]
"ENABLE_COMPREHENSIVE_THEMING": true
I'm examining this first because dev mode is a subset of what happens here.
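The two env file values listed above could be applied with a small script rather than by hand. This is a minimal sketch, not something that ships with edx-platform: the helper name `enable_theming` is hypothetical, and it assumes each env file is a single JSON object.

```python
import json
from pathlib import Path

# Values copied from the devstack changes listed above.
THEMING_OVERRIDES = {
    "COMPREHENSIVE_THEME_DIRS": ["/edx/app/edxapp/edx-platform/themes"],
    "ENABLE_COMPREHENSIVE_THEMING": True,
}


def enable_theming(env_path):
    """Merge the theming overrides into one env JSON file (hypothetical helper)."""
    path = Path(env_path)
    settings = json.loads(path.read_text())
    # Existing keys are kept; only the two theming keys are added or replaced.
    settings.update(THEMING_OVERRIDES)
    path.write_text(json.dumps(settings, indent=4, sort_keys=True))
    return settings
```

Calling `enable_theming("/edx/app/edxapp/lms.env.json")` and then the same for cms.env.json would cover both files. Commenting out the PIPELINE overrides still has to be done separately.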
What Happens
Command: paver update_assets --settings=devstack_docker
- pavelib/assets.py::update_assets is executed. This does not incur Django startup costs.
- update_assets calls pavelib/prereqs.py::install_node_prereqs to ensure npm dependencies are up to date.
- This is < 1s when dependencies are up to date.
- Open question: how long does this take when we have to install from scratch?
- (4s) Exec of "xmodule_assets common/static/xmodule" – xmodule_assets is an entrypoint defined in xmodule's setup.py and points to /common/lib/xmodule/xmodule/static_content.py::main. The output dir is common/static/xmodule.
- xmodule_assets inspects all XModules and XModuleDescriptors and looks for JS (JS + CoffeeScript) and CSS (CSS + SCSS) declared in class attributes. It does this in the following order:
- XModuleDescriptor JavaScript
- XModuleDescriptor CSS
- XModule JavaScript
- XModule CSS
- It then generates _module-styles.scss for both the descriptors/css and modules/css dirs. This is the file that imports all the copied SCSS files. It's prepended with imports for bourbon/bourbon and lms/theme/variables (theming support).
- Important notes:
- The end goal of XModule asset compilation is to generate one big bundled file for each of the directories (modules/js, modules/css, descriptors/js, descriptors/css).
- However, the xmodule_assets command does not compile Sass or CoffeeScript, and it does not do the bundling. That happens later. This script just extracts the files from XModule-specified locations and puts them in common/static/xmodule.
- Despite the naming, the source files include CoffeeScript as well as Sass partials.
- The files are ordered (they're declared in a list). They're given a prefix based on that ordering.
- The files are renamed with md5 hashes, in an effort to de-dupe shared dependencies.
- Random note: This is the one script in edx-platform that uses docopt for CLI parsing.
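The ordering prefix, the md5 renaming, and the generated _module-styles.scss described above can be sketched roughly as follows. This illustrates the idea under assumptions and is not the code in static_content.py: the function names, the three-digit prefix, and the exact file name layout are all hypothetical.

```python
import hashlib


def fragment_filename(index, content, extension):
    """Name a fragment by its position in the declared list plus an md5 of its bytes."""
    digest = hashlib.md5(content).hexdigest()
    return "{:03d}-{}.{}".format(index, digest, extension)


def plan_fragments(per_class_fragments, extension):
    """Map output file name -> content for several classes' ordered fragment lists.

    Two classes that declare the same content at the same position produce
    the same file name, so the shared dependency is written only once.
    """
    planned = {}
    for fragments in per_class_fragments:
        for index, content in enumerate(fragments):
            planned[fragment_filename(index, content, extension)] = content
    return planned


def module_styles(scss_filenames):
    """Build the text of _module-styles.scss: the two fixed imports, then each copied file."""
    lines = ['@import "bourbon/bourbon";', '@import "lms/theme/variables";']
    # Sorting by name preserves the declared order because of the numeric prefix.
    lines += ['@import "{}";'.format(name) for name in sorted(scss_filenames)]
    return "\n".join(lines) + "\n"
```

Note that nothing here compiles anything. As in the real command, the output is just renamed source files plus an import manifest for a later step to consume.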
- (< 1s) Copy NPM installed vendor assets to common/static/common/js/vendor
- (5s) CoffeeScript compilation:
node_modules/.bin/coffee --compile `find /edx/app/edxapp/edx-platform/lms /edx/app/edxapp/edx-platform/cms /edx/app/edxapp/edx-platform/common -type f -name "*.coffee"`
- This is what actually compiles any XModule CoffeeScript from the fragments generated by xmodule_assets.
- (36s) Webpack configuration
- Webpack needs to grab certain settings from LMS and Studio such as the STATIC_ROOT directory. This is particularly aggravating for Studio because that value is determined by the current git hash (e.g. /edx/var/edxapp/staticfiles/b57f144724).
- The python script therefore makes three separate calls to the print_setting management command to grab this information:
python manage.py lms --settings=devstack_docker print_setting STATIC_ROOT 2>/dev/null
python manage.py cms --settings=devstack_docker print_setting STATIC_ROOT 2>/dev/null
python manage.py lms --settings=devstack_docker print_setting WEBPACK_CONFIG_PATH 2>/dev/null
- Even though this is basically grabbing three config values, it takes roughly 36 seconds to run on my devstack because of high edx-platform startup costs.
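The three lookups amount to something like the sketch below. The function names and the injectable `run` parameter are hypothetical, added so the command construction can be exercised without starting Django. The command lines themselves are the ones listed above.

```python
import subprocess

# (system, setting) pairs that are looked up before webpack can run.
LOOKUPS = [
    ("lms", "STATIC_ROOT"),
    ("cms", "STATIC_ROOT"),
    ("lms", "WEBPACK_CONFIG_PATH"),
]


def print_setting_command(system, setting, settings_module="devstack_docker"):
    """Build the argv for one print_setting call."""
    return [
        "python", "manage.py", system,
        "--settings={}".format(settings_module),
        "print_setting", setting,
    ]


def run_command(argv):
    """Run one command, discard stderr, and return its stripped stdout."""
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


def collect_webpack_settings(run=run_command):
    """Look up each setting in its own process, so each call pays full startup cost."""
    return {
        (system, setting): run(print_setting_command(system, setting))
        for system, setting in LOOKUPS
    }
```

At roughly 36 seconds for three calls, each process start averages about 12 seconds, which is why collapsing these lookups looks attractive.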
- (38s) Webpack execution
NODE_ENV=development STATIC_ROOT_LMS=/edx/var/edxapp/staticfiles STATIC_ROOT_CMS=/edx/var/edxapp/staticfiles/b57f144724 $(npm bin)/webpack --config=webpack.prod.config.js
- Output files go in common/static/bundles
- Transpiles JS, creates optimized versions with hashes in the filenames, and mapping files for debugging.
- Also seems to create hash-named woff2, eot, and svg files – these are font-awesome fonts being used by the Studio front end.
- The time here is spent in CPU processing JS files. I'm not clear on where the bottleneck is within this processing, though I suspect it's in JS minification (based on earlier profiling results). Needs more investigation.
- This is where all new features should be developed, so the importance of this part of the execution will only grow.
- (3m 3s) Sass Compilation
Commands:
python manage.py lms --settings=devstack_docker compile_sass lms
python manage.py cms --settings=devstack_docker compile_sass cms
- A lot of the work here is replicated – there's a lot of overlap between LMS and Studio CSS, so we're overwriting a lot of files with the same values.
- The individual themes being compiled are independent, so could be parallelized.
- LMS themes are more than 2X as expensive as Studio themes to compile.
- The sass compilation is initiated in Python, using libsass.
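Since the themes are independent, the parallelization idea could look roughly like this. It is a sketch under assumptions, not how compile_sass works today: `compile_themes` and the caller-supplied `compile_one` function are hypothetical names, and the worker count is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def compile_themes(theme_names, compile_one, max_workers=4):
    """Compile each theme independently and return {theme name: result}.

    compile_one is a caller-supplied function (hypothetical here) that
    compiles a single theme and returns something describing its output.
    """
    themes = list(theme_names)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with themes.
        results = list(pool.map(compile_one, themes))
    return dict(zip(themes, results))
```

Threads only help if the compile step releases the GIL or shells out to another process, which has not been checked here for libsass. For CPU-bound work done in Python itself, swapping in ProcessPoolExecutor would be the safer choice.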
- (8m 5s) Django collectstatic
- Commands:
python manage.py lms --settings=devstack_docker collectstatic --noinput > logs/lms-collectstatic.log
python manage.py cms --settings=devstack_docker collectstatic --noinput > logs/studio-collectstatic.log
- This is the main place where JavaScript and CSS are bundled together in our current system, according to config in lms/envs/common.py and cms/envs/common.py – the XModule fragments compiled at step #3 and partly processed in step #5 get stitched together here.
- We define a couple of custom STATICFILES_FINDERS in our config file so that we can find files that are in themes or need to be detected via XBlock entrypoints.
- Most of these mappings are completely static, however, and we should be able to port these over to webpack once we sort out any dependency issues.
- The majority of the time here is spent in optimizing the JS. Running collectstatic to copy files without optimizations enabled takes around 50 seconds. The other 7+ minutes is spent in post processing.
- This is potentially a place where we could see significant gains:
- Many large vendor JS files are being needlessly post-processed each time, despite never actually changing.
- Many JavaScript assets are replicated across the different themes, despite being identical.
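One way to avoid re-processing vendor files that never change is to remember a content hash per file and only hand changed files to the optimizer. The sketch below is an assumption about how that could work, not a description of what collectstatic or our pipeline config does. The function names and the JSON cache file are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """Return the md5 hex digest of a file's bytes."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def files_needing_processing(paths, cache_file):
    """Return the paths whose content changed since the last run, then update the cache."""
    cache_path = Path(cache_file)
    seen = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    changed = []
    for path in paths:
        digest = file_digest(path)
        if seen.get(str(path)) != digest:
            changed.append(path)
            seen[str(path)] = digest
    cache_path.write_text(json.dumps(seen, indent=2, sort_keys=True))
    return changed
```

On a second run with no source changes this returns an empty list, so in principle only the roughly 50 second copy phase would remain. Keeping the previously optimized output around to reuse is a separate problem that this sketch does not address.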
Files Produced
444M of files are output to the STATIC_ROOT_LMS (/edx/var/edxapp/staticfiles).
Note that for all assets we output to this directory, we have both the original asset as well as the md5-hash-named copy.
LMS Files
Original Size | Directory | Overview | Current Size | Changes |
---|---|---|---|---|
137M | /xmodule_js | This is the most confusing one because it contains within it outright copies of many of the top level directories. Most of this is | ||
63M | /js | The biggest items here are vendor files (25M) such as tinymce, pdfjs, ova (Open Video Annotation), and CodeMirror. After that, 11M is CapaModule related JavaScript, most prominently 8M of We have a number of bundled application-specific files that weigh hundreds of KB (e.g. the 421K | ||
31M | /xblock | These contain static assets that are copied over from XBlocks (XBlocks can specify their static assets in their setup.py). These files are also namespaced by XBlock tag name. This is problematic for the The largest individual block is | ||
20M | /vendor | Despite the name, the only thing in here is static assets for edx-jsme, which provides the molecule editor for Capa. This tool is no longer supported, but has not been removed. Our actual vendor files are sprinkled everywhere in the source tree, usually multiple times at slightly different versions. | ||
20M | /css | More than half of this is the large bundled CSS generated by our v1 sass files (~800K each), and the smaller per-app CSS generated by our v2 pattern lib sass (~160K each). We also have about 7M worth of vendor CSS, the most notable of which is for pdfjs (4M). There's also the somewhat mysterious 1.3M css/vendor/fonts binary file (not directory), which appears to be an accidental check-in of someone's OS X alias (with a terrifying amount of metadata). | ||
17M | /stanford-style | These are theme directories. 12M of this is CSS. Both lms-course.css and lms-main-v1.css are over 700K, and assorted v2 pattern lib CSS files weigh between 160K and 216K each. There's also a 4X multiplier at work – each file has an RTL translated version and md5-hash named copies. The other 5MB is JavaScript. Our bundled JS files are copied into each theme. Note: These directories should really be namespaced so that they are built to places like | ||
17M | /red-theme | |||
17M | /edx.org | |||
17M | /edge.edx.org | |||
17M | /dark-theme | |||
16M | /common | Again, vendor files dominate here, such as common/js/vendor/sinon.js at 2.1M. 13M of this is the common/js/vendor directory. Over 700K comes from spec files and helpers. | ||
15M | /open-edx | This is actually a theme directory, but one that has 9.7M of CSS instead of the 12M other themes here have. I'm not completely sure on the root cause for this, but it appears that the open-edx theme is not compiling in Bootstrap, meaning both that there is no open-edx/css/bootstrap directory, and the open-edx/css/discussion is smaller. | ||
14M | /bundles | Source map files are the largest items here, with commons.js.map topping out at 892K. Currency.js is dominated by a third party dependency (which-country requires point-in-polygon, which is > 500K). It's also worth noting that JS files here are getting post-processed into hash-names twice – once by webpack, and once by Django. | ||
9.7M | /images | About half of this is two hilariously large copies of a placeholder image that could be replaced with something far smaller. The other half is images from vsepr, which is part of chemtools, a capa problem type. | ||
4.8M | /templates | This is actually a little wacky, because these are mostly Django and Mako templates that shouldn't be compiled out at all, much less post-processed as publicly accessible static assets. However, a small portion of these are underscore templates. | ||
4.5M | /edx-pattern-library | This is basically all fonts, both font-awesome and OpenSans. This is a different font-awesome font from the one that's named by just its hash in bundles . | ||
4.0M | /rest_framework | Mostly documentation for the Django REST framework, and the fonts and JS needed to make that work. It has yet another copy of fontawesome fonts, in four different formats. | ||
3.7M | /data | This is GeoIP data used for embargo code. We also post-process an MD5 hashed version of this, despite the fact that it's only used by Python code. | ||
3.5M | /sass | 608K of this is bourbon, but most of this is of our own making. Like templates , this doesn't seem to be something that belongs in the post-processed asset bundle. | ||
3.4M | /fonts | Yet another copy of Open Sans and FontAwesome fonts, with a tiny Creative Commons font as well. | ||
2.4M | /xmodule | The assets compiled out by xmodule_assets (step 3 in the first part of this wiki doc). Most of this is JavaScript, with the circuit simulator being the largest individual contributor at around 400K. | ||
2.3M | /flags | (I'm punting investigating these until later, since they're really on the long tail right now.) | ||
2.2M | /certificates | |||
1.8M | /teams | |||
1.8M | /learner_profile | |||
1.6M | /wiki | |||
1.4M | /admin | |||
872K | /applets | |||
768K | /support | |||
756K | /proctoring | |||
676K | /course_bookmarks | |||
580K | /course_experience | |||
496K | /coffee | |||
400K | /discussion | |||
388K | /audio | |||
280K | /course_search | |||
240K | /edx-ui-toolkit | |||
216K | /lms | |||
208K | /debug_toolbar | |||
164K | /enterprise | |||
164K | /django_extensions | |||
68K | /mptt | |||
20K | /text | |||
12K | /djcelery |
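To re-run this breakdown after changes (for the Current Size column), something like the following sketch could regenerate the per-directory totals. It is not the tool used for the numbers above. The function names are hypothetical, and because it sums file sizes rather than disk blocks, the totals will differ somewhat from what du reports.

```python
import os


def directory_sizes(root):
    """Return {top-level directory name: total bytes of regular files beneath it}."""
    totals = {}
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                full_path = os.path.join(dirpath, name)
                # Skip symlinks so linked files are not counted twice.
                if not os.path.islink(full_path):
                    total += os.path.getsize(full_path)
        totals[entry.name] = total
    return totals


def format_report(totals):
    """Render rows sorted largest first, in the same shape as the table above."""
    rows = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ["{:.1f}M | /{}".format(size / (1024 * 1024), name) for name, size in rows]
```

Running `format_report(directory_sizes("/edx/var/edxapp/staticfiles"))` before and after a change would give a quick diff of which directories shrank.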