Artifact Storage

Overview

Artifact storage is an ongoing issue that we are working to address. This page is a starting point.

 

Goals

  • Stability against outages in upstream artifact stores (e.g., pypi, github)
  • Have a reliable, intuitive, and consistent storage mechanism for the things that we produce and others would like to consume

Open questions:

  • What SLA are we striving for out of a given artifact store?
  • What level of support/availability do we give to the community?
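
To make the SLA question concrete, the arithmetic below converts an availability target into a downtime budget and converts observed downtime back into an availability figure. It is only a sketch: the function names and the 30-day window are assumptions chosen for illustration, not targets anyone has agreed on.

```python
HOURS_PER_30_DAYS = 30 * 24  # assumed measurement window, not an agreed-upon period


def allowed_downtime_hours(availability_pct: float,
                           period_hours: float = HOURS_PER_30_DAYS) -> float:
    """Hours of downtime permitted in the period at a given availability target."""
    return period_hours * (1 - availability_pct / 100)


def observed_availability_pct(downtime_hours: float,
                              period_hours: float = HOURS_PER_30_DAYS) -> float:
    """Availability actually achieved, given the downtime seen in the period."""
    return 100 * (1 - downtime_hours / period_hours)
```

For example, a 99.9% target leaves roughly 0.72 hours (about 43 minutes) of downtime in a 30-day window, while a single 5-hour outage in that window, if counted as full downtime, works out to roughly 99.3% availability.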

Requirements for a given artifact tool

  • self-service
  • Intuitive to find given artifacts
  • audit trail/logging
  • auth
  • publicly readable?
  • Storage approach for what we produce vs what we consume does not need to be the same

Things we consume and would like to protect against upstream failures with a pull-through cache

Artifact | Default storage location | Impact of outage | Usage | Observed uptime/bad-events
Python libraries | pypi | dev, build | V High |
Custom wheels | S3 | EMR/analytics jobs can't run | High |
npm packages | npm | dev, build, production | Med-high | Partial outage 7/7/16, roughly 5 hours
rubygems | rubygems.org | cs-comments-service builds can't run | Low |
debian packages (apt) |  | can't build new AMIs | High |
debian packages (ppas) | various ppas | can't build new AMIs | High |
docker files/images | dockerhub | can't run tests against new IDAs | ?? |
maven repository bits/jars | maven central | can't build android | ?? |
Github repos | github | dev, build | V High |
PEAR install bits | PEAR | marketing | Med |
keys (data czars) | private repo/github | data czars can't get their data (can't run analytics exports) | Low |
ssh keys | github | among others, impacts ability to run analytics | Med |
S3-stored artifacts (excluding anything run in edx-platform) | AWS | video pipeline; analytics can't be run | V High |
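
As a rough illustration of the pull-through cache idea, the sketch below serves an artifact from local storage when it has been seen before and contacts the upstream store only on a miss. Everything here is hypothetical: the class name `PullThroughCache`, the `fetch_upstream` callable, and the on-disk layout are assumptions for illustration, not a description of any tool we have chosen.

```python
import hashlib
from pathlib import Path
from typing import Callable


class PullThroughCache:
    """Serve artifacts from local disk, falling back to upstream on a miss."""

    def __init__(self, cache_dir: Path, fetch_upstream: Callable[[str], bytes]):
        # fetch_upstream is a hypothetical callable that downloads one artifact
        # by name from the upstream store (pypi, npm, etc.) and returns its bytes.
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch_upstream = fetch_upstream

    def _path_for(self, name: str) -> Path:
        # Hash the artifact name so any string maps to a safe filename.
        digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
        return self.cache_dir / digest

    def get(self, name: str) -> bytes:
        path = self._path_for(name)
        if path.exists():
            # Cache hit: upstream is not contacted, so an upstream outage is invisible.
            return path.read_bytes()
        # Cache miss: this call raises if upstream is down and we have no copy.
        data = self.fetch_upstream(name)
        path.write_bytes(data)
        return data
```

Note the limitation this makes visible: the cache only protects artifacts that have already been requested at least once. An artifact fetched for the first time during an upstream outage still fails.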

 

Things we produce that we want to store (Not focusing on this right now)

Artifact | Default storage location | Impact if it's not available | Observed uptime/bad-events
iOS applications | S3, hockeyapp |  |
Android apps |  |  |
AMIs |  |  |
.box files |  |  |
custom-built wheels |  |  |
test/product-feature artifacts |  |  |

Problem we may be trying to solve

  • We want a consistent view of our artifacts
    • Upstream packages sometimes no longer exist
      • e.g., python 2.7.10
  • Higher availability than external services
  • Unified view of package management 

 

Artifacts we're calling out of scope right now

  • our own PPAs
  • docker AMIs

 

---archive---

  • iOS applications
  • Android applications
  • AMIs
    • Various IDAs + Platform
    • Build workers
  • devstack .box files
  • custom built wheels
  • test or product-feature artifacts
    • e.g., screenshots of product under test (for example, Netflix has a way to see an app under different languages, etc)

 

---------------

  • Pypi libraries
    • libraries
    • wheels
  • npm packages
  • rubygems
  • debian packages
  • docker files/images
  • maven repository bits (e.g., for android installs)
  • Github repositories used in installs
  • PEAR install bits (e.g., for our marketing repo/drupal/php)
  • keys
    • data-czars' keys
    • secrets needed for deployments
    • keys for mobile apps
  • S3-stored artifacts