This initiative investigates and, if they prove useful, helps implement one or more package caching solutions for (at least) edx-platform. We are looking at options like Artifactory and DevPI to speed up Python-related builds and testing, gain some potential security benefits, and solve problems with forks that we depend on but don't want to push to PyPI. Additional gains are possible for Node packages, Docker, etc., but those are not the primary focus of this investigation (yet!).
The work for this investigation and details can be found in these Jira epics:
- PLAT-1696
- PLAT-1901
Findings
- A write-up on which Python packages in github.txt could move to a package cache, and which could be handled by other means, is in PLAT-1907
- Notes on setting up Artifactory, along with some timing and test-run information, are here
- Notes on setting up DevPI and timing results are here
DevPI vs. Artifactory
Criterion | DevPI | Artifactory |
---|---|---|
Cost | OSS (MIT license) | The OSS version does not seem to support PyPI. A paid version is probably in the $50 to $100 per month range, based on competitor costs |
Devstack local cache? | Yes | Probably not due to licensing |
Speed | Test times were slightly faster, probably because DevPI ran on the host machine rather than in Docker, not because of the package itself. | Likely comparable to DevPI speed-wise. |
UI | Web, did not try. | Full featured, easy to browse packages, user permissions, etc |
Command line | Full featured | Seems pretty limited to pip functionality, did not dig into it though |
Ease of Setup | Easy for a local setup | Easy for a local setup |
High Availability / Global cache | Provides single-writer, multiple-reader replication functionality, seems pretty new but probably robust enough for our use cases. Designed for geographically distributed systems, so we could place servers in different locations. | Provides localized cluster functionality. Requires Enterprise licensing. All servers need to be on the same LAN and share the same database server. |
Database | sqlite3 | MySQL, Oracle, MS SQL, PostgreSQL |
Filesystem | local, PostgreSQL (pls no), other plugins? | local (synchronizable in HA configuration), S3, NFS |
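
As a rough illustration of what pointing a build at either cache could look like, here is a minimal sketch that assembles a pip command line using a cache index instead of PyPI. The index URL assumes a local devpi-server on its default port (3141) with its default `root/pypi` mirror index. The helper name `pip_install_args` and the `requirements.txt` path are hypothetical and only for illustration. An Artifactory PyPI repository would use a different URL, which we have not filled in here.

```python
from urllib.parse import urlparse

# Assumption: a devpi-server running locally on its default port (3141),
# exposing its default PyPI mirror index at root/pypi.
DEFAULT_CACHE_INDEX = "http://localhost:3141/root/pypi/+simple/"


def pip_install_args(requirements_file, index_url=DEFAULT_CACHE_INDEX):
    """Build a pip command line that resolves packages through a cache index.

    Hypothetical helper for illustration; not part of any existing tooling.
    """
    args = ["pip", "install", "--index-url", index_url]
    parsed = urlparse(index_url)
    # For plain-HTTP indexes, mark the host as trusted so pip does not
    # reject it as insecure (harmless when the host is localhost).
    if parsed.scheme == "http" and parsed.hostname:
        args += ["--trusted-host", parsed.hostname]
    args += ["-r", requirements_file]
    return args


if __name__ == "__main__":
    # requirements.txt is a placeholder path.
    print(" ".join(pip_install_args("requirements.txt")))
```

The same index URL could instead be set once through the `PIP_INDEX_URL` environment variable or a pip config file, which is probably the better fit for devstack and CI workers since it avoids touching every install command.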