Artifact Storage

Overview

Artifact storage is an ongoing issue that we are working to address. This page is a starting point.

Goals

  • Stability against outages in upstream artifact stores (e.g., pypi, github)

  • A reliable, intuitive, and consistent storage mechanism for the things that we produce and that others would like to consume

Open questions:

  • What SLA are we striving for from a given artifact store?

  • What level of support/availability do we give to the community?
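One way to ground the SLA question is to convert between an availability target and the downtime it allows. The sketch below is illustrative only: the one-year window and the candidate targets are assumptions rather than decisions recorded on this page, and the function names are hypothetical. The 5-hour figure is the partial npm outage of 7/7/16 noted further down this page, treated here as a full outage to give an upper bound.

```python
HOURS_PER_YEAR = 365 * 24  # assumed measurement window: one non-leap year


def allowed_downtime_hours(target_pct: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime permitted by an availability target over the period."""
    return period_hours * (1 - target_pct / 100)


def availability_pct(downtime_hours: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Availability (in percent) implied by a given amount of downtime."""
    return 100 * (1 - downtime_hours / period_hours)


if __name__ == "__main__":
    # Hypothetical candidate targets, not agreed values.
    for target in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(target)
        print(f"{target}% allows {hours:.2f} hours of downtime per year")

    # The roughly 5 hour npm outage, counted as fully down (upper bound).
    print(f"5 hours down in a year is {availability_pct(5):.3f}% availability")
```

For scale, a 99.9% target over a year leaves about 8.76 hours of downtime, so a single 5-hour upstream outage would use more than half of that budget.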

Requirements for a given artifact tool

  • Self-service

  • Intuitive to find given artifacts

  • Audit trail/logging

  • Auth

  • Publicly readable?

  • The storage approach for what we produce does not need to be the same as for what we consume

Things that we consume and would like to protect against upstream failures with a pull-through cache

Artifact | Default storage location | Impact of outage | Usage | Observed uptime/bad-events
--- | --- | --- | --- | ---
Python libraries | pypi | dev, build | V High |
Custom wheels | S3 | EMR/analytics jobs can't run | High |
npm packages | npm | dev, build, production | Med-high | Partial outage 7/7/16, roughly 5 hours
rubygems | rubygems.org | cs-comments-service builds can't run | Low |
debian packages (apt) | | can't build new AMIs | High |
debian packages (ppas) | various ppas | can't build new AMIs | High |
docker files/images | dockerhub | can't run tests against new IDAs | ?? |
maven repository bits/jars | maven central | can't build android | ?? |
Github repos | github | dev, build | V High |
PEAR install bits | PEAR | marketing | Med |
keys (data czars) | private repo/github | data czars can't get their data (can't run analytics exports) | Low |
ssh keys | github | among others, impacts ability to run analytics | Med |
S3-stored artifacts (excluding anything run in edx-platform) | AWS | video pipeline; analytics can't be run | V High |
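The pull-through idea for the consumed artifacts listed above can be sketched as follows. This is a minimal illustration, not a proposed implementation: `PullThroughCache`, `UpstreamUnavailable`, and the injected `fetch_upstream` callable are hypothetical names, and the sketch leaves out eviction, integrity checks, and concurrent access. An artifact that has been requested once stays available from local storage even if the upstream store later goes down.

```python
import hashlib
from pathlib import Path
from typing import Callable


class UpstreamUnavailable(Exception):
    """Raised when upstream cannot be reached and nothing is cached."""


class PullThroughCache:
    """Serve artifacts from local storage, fetching from upstream on a miss."""

    def __init__(self, cache_dir: Path, fetch_upstream: Callable[[str], bytes]):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        # fetch_upstream is a stand-in for a real client (pypi, npm, ...).
        self.fetch_upstream = fetch_upstream

    def _path_for(self, key: str) -> Path:
        # Hash the key so arbitrary artifact names map to safe file names.
        return self.cache_dir / hashlib.sha256(key.encode("utf-8")).hexdigest()

    def get(self, key: str) -> bytes:
        path = self._path_for(key)
        if path.exists():
            # Cache hit: upstream is not contacted at all.
            return path.read_bytes()
        try:
            data = self.fetch_upstream(key)
        except Exception as exc:
            raise UpstreamUnavailable(key) from exc
        # Cache miss: store the artifact so later requests survive an outage.
        path.write_bytes(data)
        return data
```

Injecting `fetch_upstream` keeps the sketch independent of any one store, which matches the note above that the approach for what we consume does not have to be uniform across artifact types.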

 

 

Things we produce that we want to store (not focusing on this right now)

Artifact | Default storage location | Impact if it's not available | Observed uptime/bad-events
--- | --- | --- | ---
iOS applications | S3, hockeyapp | |
Android apps | | |
AMIs | | |
.box files | | |
custom-built wheels | | |
test/product-feature artifacts | | |

Problems we may be trying to solve

  • We want a consistent view of our artifacts

    • Upstream packages sometimes no longer exist

      • e.g., python 2.7.10

  • Higher availability than external services

  • Unified view of package management

 

Artifacts we're calling out of scope right now

  • our own PPAs

  • docker AMIs

 

---archive---

  • iOS applications

  • Android applications

  • AMIs

    • Various IDAs + Platform

    • Build workers

  • devstack .box files

  • custom built wheels

  • test or product-feature artifacts

    • e.g., screenshots of the product under test (for example, Netflix has a way to view an app in different languages)

 

---------------

  • Pypi libraries

    • libraries

    • wheels

  • npm packages

  • rubygems

  • debian packages

  • docker files/images

  • maven repository bits (e.g., for android installs)

  • Github repositories used in installs

  • PEAR install bits (e.g., for our marketing repo/drupal/php)

  • keys

    • data-czars' keys

    • secrets needed for deployments

    • keys for mobile apps

  • S3-stored artifacts