Artifact Storage
Overview
Artifact storage is an ongoing issue that we are working to address. This page is a starting point for that work.
Goals
- Stability against outages in upstream artifact stores (e.g., PyPI, GitHub)
- Have a reliable, intuitive, and consistent storage mechanism for the things that we produce and others would like to consume
Open questions:
- What SLA are we striving for out of a given artifact store?
- What level of support/availability do we give to the community?
Requirements for a given artifact tool
- self-service
- Intuitive to find given artifacts
- audit trail/logging
- auth
- publicly readable?
- Storage approach for what we produce vs what we consume does not need to be the same
Things we consume that we would like to protect against upstream failures with a pull-through cache
Artifact | Default storage location | Impact of outage | Usage | Observed uptime/bad-events |
---|---|---|---|---|
Python libraries | pypi | dev, build | V High | |
Custom wheels | S3 | EMR/analytics jobs can't run | High | |
npm packages | npm | dev, build, production | Med-high | Partial outage 7/7/16, roughly 5 hours |
rubygems | rubygems.org | cs-comments-service builds can't run | Low | |
debian packages (apt) | | can't build new AMIs | High | |
debian packages (ppas) | various ppas | can't build new AMIs | High | |
docker files/images | dockerhub | can't run tests against new IDAs | ?? | |
maven repository bits/jars | maven central | can't build android | ?? | |
Github repos | github | dev, build | V High | |
PEAR install bits | PEAR | marketing | Med | |
keys (data czars) | private repo/github | data czars can't get their data (can't run analytics exports) | Low | |
ssh keys | github | among others, impacts ability to run analytics | Med | |
S3-stored artifacts (excluding anything run in edx-platform) | AWS | video pipeline analytics can't be run | V High | |
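As a rough illustration of the pull-through idea behind the table above, here is a minimal Python sketch. It assumes an in-memory dict as a stand-in for durable storage, and the names `PullThroughCache` and `FakeUpstream` are hypothetical, not part of any tool we have chosen. It shows only the core behaviour: an artifact that has been fetched once keeps being served when the upstream store is down.

```python
from typing import Callable, Dict


class FakeUpstream:
    """Hypothetical stand-in for an upstream store such as pypi or npm."""

    def __init__(self) -> None:
        self.available = True

    def fetch(self, name: str) -> bytes:
        if not self.available:
            raise ConnectionError("upstream unavailable")
        return f"contents of {name}".encode()


class PullThroughCache:
    """Serve artifacts locally, pulling from upstream only on a cache miss."""

    def __init__(self, fetch_upstream: Callable[[str], bytes]) -> None:
        self._fetch_upstream = fetch_upstream
        # Assumption: a dict stands in for real durable storage.
        self._store: Dict[str, bytes] = {}

    def get(self, name: str) -> bytes:
        if name in self._store:
            # Cache hit: upstream is not contacted at all.
            return self._store[name]
        # Cache miss: this raises if upstream is down, so only artifacts
        # that were requested at least once survive an outage.
        data = self._fetch_upstream(name)
        self._store[name] = data
        return data
```

One consequence of this design is that the cache protects only artifacts that have already been requested through it. Anything never fetched before an outage still fails, which may matter for the "Packages no longer exist" case.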
Things we produce that we want to store (Not focusing on this right now)
Artifact | Default storage location | Impact if it's not available | Observed uptime/bad-events |
---|---|---|---|
iOS applications | S3, hockeyapp | ||
Android apps | |||
AMIs | |||
.box files | |||
custom-built wheels | |||
test/product-feature artifacts | |||
Problems we may be trying to solve
- We want a consistent view of our artifacts
- Packages that no longer exist upstream
- e.g., Python 2.7.10
- Higher availability than external services
- Unified view of package management
Artifacts we're calling out of scope right now
- our own PPAs
- docker AMIs
---archive---
- iOS applications
- Android applications
- AMIs
- Various IDAs + Platform
- Build workers
- devstack .box files
- custom built wheels
- test or product-feature artifacts
- e.g., screenshots of product under test (for example, Netflix has a way to see an app under different languages, etc)
---------------
- Pypi libraries
- libraries
- wheels
- npm packages
- rubygems
- debian packages
- docker files/images
- maven repository bits (e.g., for android installs)
- Github repositories used in installs
- PEAR install bits (e.g., for our marketing repo/drupal/php)
- keys
- data-czars' keys
- secrets needed for deployments
- keys for mobile apps
- S3-stored artifacts