As part of PLAT-1903 I did some preliminary setup and testing with Artifactory against Devstack. The primary goals were to get some hands-on experience working with Artifactory and its docs, find any unexpected issues or work that would be required to move to Artifactory (or another package cache solution), and see what kinds of up-front benefits we might get from implementing a Python package cache.
Setup
Initially I ran the Docker image from their site, using Docker Compose and selecting the Open Source version:
https://www.jfrog.com/confluence/display/RTF/Installing+with+Docker
https://github.com/JFrogDev/artifactory-docker-examples
Sadly, the OSS version did not support PyPI; I'm not sure whether this is a trial limitation or whether they simply don't support it in OSS at all. I then had to sign up for a free trial and install the more complicated Artifactory Pro using their Docker examples repo:
https://github.com/JFrogDev/artifactory-docker-examples/tree/master/docker-compose#artifactory-pro-and-ha
This required adding additional shared directories to Docker for Mac in the Docker settings, but otherwise worked fine according to the (pretty good) documentation. After these steps I had Artifactory running locally at http://localhost:8081/artifactory/
Configuration
After some confusion about how to access the admin vs. nginx front end, I was able to sign in as admin/password. There is a short configuration wizard, and it was very easy to select PyPI and get a pass-through cache running. In order for my Devstack to use the Artifactory cache I had to add a ~/.pip/pip.conf:

[global]
trusted-host = 192.168.254.11
index-url = http://192.168.254.11:8081/artifactory/api/pypi/pypi/simple
NOTE: I had to replace localhost with my host machine IP due to connecting from another Docker container.
I hand installed some packages using pip and saw them populate the Artifactory cache.
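For example, with that pip.conf in place a hand install is just an ordinary pip invocation (the package name here is an arbitrary illustration):

pip install six

The first install of a given package is fetched from PyPI through the pass-through cache and stored in Artifactory; later installs of the same package are served from the cache.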
Package pushing to Artifactory
A large part of what we're looking for is the ability to push our own packages to a cache without sending them to PyPI, for various reasons (mostly they are edX-specific forks that we don't want to pollute PyPI with). I wanted to run some tests putting as many packages as possible from edx-platform's github.txt into Artifactory. I wrote a script that would clone each package from GitHub, check out the given commit (if one was specified), force setup.py to use the version number we pin, build and upload the package to Artifactory, and output a new requirements file for the packages that were built (a rough sketch of this appears at the end of this section). With a few hours' work I was able to get something that successfully built all but the following packages: django-celery, django-wiki, django-openid-auth, django-debug-toolbar-mongo, and edx-proctoring. Mostly these seemed to be setup.py issues or problems with versions of setuptools / distutils; django-wiki has an issue with setup.py not including its README, and pystache_custom had a bad name (pystache_custom-dev). The other 29 packages were cached in Artifactory. In order to upload to Artifactory I had to create a ~/.pypirc:

[distutils]
index-servers = local

[local]
repository: http://192.168.254.11:8081/artifactory/api/pypi/pypi-local
username: admin
password: password
NOTE: I had to replace localhost with my host machine IP due to connecting from another Docker container.
Official docs are here: https://www.jfrog.com/confluence/display/RTF/PyPI+Repositories
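For reference, here is a rough sketch of the build-and-upload flow described above. It is not the exact script I ran: the package list, the version-forcing regex, and the requirements handling are simplified placeholders, and the upload step relies on the [local] entry from the ~/.pypirc above.

#!/usr/bin/env python
"""Sketch: clone forked packages, pin their versions, and upload them to Artifactory."""
import os
import re
import subprocess
import tempfile

# (repo URL, commit-ish or None, version we pin in our requirements) -- placeholder entry
PACKAGES = [
    ("https://github.com/edx/example-fork.git", "abc1234", "1.0"),
]

def build_and_upload(repo_url, commit, version):
    workdir = tempfile.mkdtemp()
    subprocess.check_call(["git", "clone", repo_url, workdir])
    if commit:
        subprocess.check_call(["git", "checkout", commit], cwd=workdir)
    # Crudely force setup.py to use the version number our requirements pin.
    setup_path = os.path.join(workdir, "setup.py")
    with open(setup_path) as f:
        setup_py = f.read()
    setup_py = re.sub(r"version\s*=\s*['\"][^'\"]*['\"]", "version='%s'" % version, setup_py)
    with open(setup_path, "w") as f:
        f.write(setup_py)
    # Build an sdist and upload it to the 'local' index defined in ~/.pypirc.
    subprocess.check_call(["python", "setup.py", "sdist", "upload", "-r", "local"], cwd=workdir)

if __name__ == "__main__":
    for repo_url, commit, version in PACKAGES:
        build_and_upload(repo_url, commit, version)
    # The real script also wrote out a new requirements file covering the built packages.

The corresponding requirements change is that a github.txt line roughly like git+https://github.com/edx/example-fork.git@<commit>#egg=example-fork==1.0 (hypothetical package) becomes a plain pinned line like example-fork==1.0, which pip then resolves from the Artifactory index.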
Timing results
All tests measured only the setup time of a clean Devstack Django 1.11 tox environment (.tox entirely removed between tests), triggered by running: tox -e py27-django111 -- pytest. Times were measured from hitting enter on the command to when the line "py27-django111 installed:" appeared. NOTE: These tests were run from my home connection, so the uncached runs are likely a fair amount slower than they would be in the office due to lower bandwidth.
Using current github.txt
- Default current setup: 10:33
- Empty Artifactory (caching all dependencies for the first time): 11:33
- Populated Artifactory (all dependencies cached that it can): 7:30
Using github.txt with our 29 non-pip packages built to Artifactory (5 still using github requirements)
- Empty Artifactory (github.txt packages cached, nothing else): 6:46
- Populated Artifactory: 4:47
I did not collect specific timing data, but found the expected several-minute speedup on
Conclusions
- Artifactory does what it says on the tin; it was well documented and easy to set up
- Having a local Python package cache on Devstacks would save a lot of pain on tox tests, ~2-4 minutes
- Pre-building our current github dependencies to PyPI or a cache would save even more time, ~4-5 minutes
- We should look more into what it can offer for other caches (NPM, Docker, etc.)
- Having a local cache is a huge win for devstack, and could allow for offline testing / use. Whether or not we use Artifactory, we should look into using DevPI in Docker as a local package cache for Devstack.