Artifactory local setup investigation

As part of PLAT-1903, I did some preliminary setup and testing with Artifactory against Devstack. The primary goals were to get some hands-on experience working with Artifactory and its docs, find any unexpected issues or work that moving to Artifactory (or another package cache solution) would require, and see what kinds of up-front benefits we might get from implementing a Python package cache.

Setup

Initially I ran the Docker image from their site, using Docker Compose and selecting the Open Source version:

https://www.jfrog.com/confluence/display/RTF/Installing+with+Docker
https://github.com/JFrogDev/artifactory-docker-examples

Sadly, the OSS version did not support PyPI. I'm not sure if this is a trial limitation or if they simply don't support it in OSS at all. I then had to sign up for a free trial and install the more complicated Artifactory Pro using their Docker examples repo:

https://github.com/JFrogDev/artifactory-docker-examples/tree/master/docker-compose#artifactory-pro-and-ha
https://github.com/JFrogDev/artifactory-docker-examples

This required adding additional shared directories to Docker for Mac in the Docker settings, but otherwise worked fine according to the (pretty good) documentation. After these steps I had Artifactory running locally at http://localhost:8081/artifactory/

Configuration

After some confusion about how to access the admin vs. nginx front-end I was able to sign in as admin/password. There is a short configuration wizard. It was very easy to select PyPI and get a pass-through cache running. In order for my Devstack to use the Artifactory cache I had to add a ~/.pip/pip.conf:

[global]
trusted-host = 192.168.254.11
index-url = http://192.168.254.11:8081/artifactory/api/pypi/pypi/simple

NOTE: I had to replace localhost with my host machine IP due to connecting from another Docker container.
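
The pip.conf above can also be generated rather than hand-edited, which helps when the host IP changes between machines. This is a minimal sketch, not part of the original setup: the function name render_pip_conf and its parameters are hypothetical, and the defaults simply mirror the port and repository key used above.

```python
import configparser
import io


def render_pip_conf(host, port=8081, repo_key="pypi"):
    """Return pip.conf text pointing pip at an Artifactory PyPI repository.

    host must be an address the Devstack container can reach (not localhost).
    port and repo_key are assumptions that default to the values used above.
    """
    index_url = "http://{}:{}/artifactory/api/pypi/{}/simple".format(
        host, port, repo_key
    )
    parser = configparser.ConfigParser(interpolation=None)
    parser["global"] = {
        # The index is served over plain HTTP, so pip needs the host trusted.
        "trusted-host": host,
        "index-url": index_url,
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Write this output to ~/.pip/pip.conf inside the container.
    print(render_pip_conf("192.168.254.11"))
```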

I hand installed some packages using pip and saw them populate the Artifactory cache.

Package pushing to Artifactory

A large part of what we're looking for is the ability to push our own packages to a cache without sending them to PyPI, mostly because they are forks specific to edX that we don't want to pollute PyPI with. I wanted to run some tests putting as many packages as possible from edx-platform's github.txt into Artifactory.

I wrote a script that would clone each package from GitHub, check out the given commit (if given), force setup.py to use the version number we specify, build and upload the package to Artifactory, and output a new requirements file for the packages that were built. With a few hours' work I was able to get something that successfully built all but the following packages: django-celery, django-wiki, django-openid-auth, django-debug-toolbar-mongo, edx-proctoring. Mostly these seemed to be setup.py issues or problems with versions of setuptools / distutils: django-wiki has an issue with setup.py not including its README, and pystache_custom had a bad name (pystache_custom-dev). 29 other packages were cached in Artifactory.

In order to build to Artifactory I had to create a ~/.pypirc:

[distutils]
index-servers =
    local

[local]
repository: http://192.168.254.11:8081/artifactory/api/pypi/pypi-local
username: admin
password: password

NOTE: I had to replace localhost with my host machine IP due to connecting from another Docker container.

Official docs are here: https://www.jfrog.com/confluence/display/RTF/PyPI+Repositories
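
The original build script is not reproduced here. The following is only a rough sketch of the flow described above (parse a github.txt line, clone, check out, upload, emit a pinned requirement), under several assumptions: requirement lines have the shape `[-e ]git+<url>[@<ref>]#egg=<name>[==<version>]`, upload uses the legacy `setup.py sdist upload -r local` command against the "local" entry in ~/.pypirc, and the step that forces the version in setup.py is left out because its details are not given. All function and class names are hypothetical, and the commands are only constructed, not executed.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class GitRequirement(NamedTuple):
    repo_url: str
    ref: Optional[str]      # commit, tag, or branch after "@", if any
    name: str
    version: Optional[str]  # version after "==" in the egg fragment, if any


# Assumed line shape: [-e ]git+<url>[@<ref>]#egg=<name>[==<version>]
_GIT_LINE = re.compile(
    r"^(?:-e\s+)?git\+(?P<url>[^@#\s]+)"
    r"(?:@(?P<ref>[^#\s]+))?"
    r"#egg=(?P<name>[A-Za-z0-9_.\-]+)"
    r"(?:==(?P<version>\S+))?\s*$"
)


def parse_github_requirement(line: str) -> Optional[GitRequirement]:
    """Parse one requirements line; return None for anything else."""
    match = _GIT_LINE.match(line.strip())
    if match is None:
        return None
    return GitRequirement(
        match.group("url"),
        match.group("ref"),
        match.group("name"),
        match.group("version"),
    )


def build_commands(
    req: GitRequirement, index_server: str = "local", workdir: str = "build"
) -> List[Tuple[List[str], str]]:
    """Return (argv, cwd) pairs that would clone, check out, and upload."""
    checkout_dir = "{}/{}".format(workdir, req.name)
    commands = [(["git", "clone", req.repo_url, checkout_dir], ".")]
    if req.ref:
        commands.append((["git", "checkout", req.ref], checkout_dir))
    # The version-forcing step on setup.py would go here; it is omitted.
    commands.append(
        (["python", "setup.py", "sdist", "upload", "-r", index_server], checkout_dir)
    )
    return commands


def pinned_line(req: GitRequirement) -> str:
    """Requirement line to write to the new requirements file."""
    return "{}=={}".format(req.name, req.version) if req.version else req.name
```

Each (argv, cwd) pair could then be passed to subprocess.check_call(argv, cwd=cwd), with failures collected per package so that one broken setup.py does not stop the rest of the run.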

Timing results

All tests measured only the setup time of a clean tox environment (.tox entirely removed between tests) on a Devstack with Django 1.11, triggered by running: tox -e py27-django111 -- pytest . Times were measured from hitting enter on the command to when the line "py27-django111 installed:" appeared. NOTE: These tests were run from my home connection, so uncached runs are likely to be a fair amount slower than they would be in the office, due to lower bandwidth.

Using current github.txt

Default current setup: 10:33
Empty Artifactory (caching all dependencies for the first time): 11:33
Populated Artifactory (all dependencies cached that it can): 7:30

Using github.txt with our 29 non-pip packages built to Artifactory (5 still using github requirements)

Empty Artifactory (github.txt packages cached, nothing else): 6:46
Populated Artifactory: 4:47
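
To make the comparison explicit, the differences against the 10:33 baseline can be computed directly from the times above. This is a small illustrative calculation; the helper names and labels are hypothetical, and the only data used are the five measurements listed.

```python
def to_seconds(mmss):
    """Convert an 'M:SS' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def fmt(seconds):
    """Format seconds as 'M:SS', with a leading '-' for negative values."""
    sign = "-" if seconds < 0 else ""
    minutes, rest = divmod(abs(seconds), 60)
    return "{}{}:{:02d}".format(sign, minutes, rest)


BASELINE = "10:33"  # default current setup

RUNS = [
    ("current github.txt, empty Artifactory", "11:33"),
    ("current github.txt, populated Artifactory", "7:30"),
    ("prebuilt packages, empty Artifactory", "6:46"),
    ("prebuilt packages, populated Artifactory", "4:47"),
]

for label, elapsed in RUNS:
    saved = to_seconds(BASELINE) - to_seconds(elapsed)
    print("{}: {} saved".format(label, fmt(saved)))
# current github.txt, empty Artifactory: -1:00 saved
# current github.txt, populated Artifactory: 3:03 saved
# prebuilt packages, empty Artifactory: 3:47 saved
# prebuilt packages, populated Artifactory: 5:46 saved
```

On these single runs, the first pass through an empty cache costs about a minute, a populated cache alone saves about three minutes, and pre-building the github.txt packages on top of a populated cache saves close to six.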

Conclusions

Artifactory does what it says on the tin, was well documented and easy to set up
Having a local Python package cache on Devstacks would save a lot of pain from tox tests, ~2-4 minutes
Pre-building our current github dependencies to PyPI or a cache would save even more time, ~4-5 minutes
We should look more into what it can offer for other caches (NPM, Docker, etc)
Having a local cache is a huge win for Devstack, and could allow for offline testing / use. Whether or not we use Artifactory, we should look into using devpi in Docker as a local package cache for Devstack.
It is worth noting that in our other investigations Julia Eskew found that we could publish several of our github.txt packages to PyPI or merge back our forks. Doing those things would greatly lower the impact of using a Python package cache. Details here: PLAT-1907
