Braindump on Configuration: Today and Future
Meeting Info
Date: Dec 4, 2020
Recording: https://discuss.openedx.org/t/lets-talk-about-the-native-installation/3269/14
Presenter: Cory Lee
Facilitator: (former user)
Notes
Rationale
Open edX Architecture moving from a centralized model to a decentralized model.
Similarly for the edX SRE team
Many of our teams own backend and frontend components, within and outside of the monolith
Each may want to do things differently - their pace, their testing, their definition of done, etc.
Hence, our centralized Configuration repo has become problematic and a bottleneck
Changes to repo
Need to work with the last two Open edX releases, multiple OSes, and multiple environments (sandboxes, etc.)
Only the latest community release is supported, but some internal edX systems use older releases
In the past, we have accepted OSPR changes without testing them ourselves, which has added to our maintenance burden
Also realized that any change to configuration needed a long deprecation process
Configuration state in production
Many teams just couldn’t figure out what the state of a configuration was in production or in other environments
It was difficult to parse this from the code since the settings were tangled across Ansible files, YAML files, etc.
So… moved to detangling the settings (De-DRY)
Current State
Moved to using a single YAML file for each App
Asym-Crypto-YAML: https://github.com/edx/asym-crypto-yaml
Inline encryption of our YAML - so secrets can be encrypted inline
Allows devs to see where the secrets live inline, while keeping them protected by encryption
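A conceptual sketch of the inline-encryption idea, shown with the `cryptography` package rather than asym-crypto-yaml's actual API (the `!Encrypted` tag, key handling, and function names here are illustrative assumptions):

```python
# Conceptual sketch only: NOT the asym-crypto-yaml API. A value is
# RSA-encrypted with a public key and the base64 ciphertext is embedded
# inline in the YAML, so only holders of the private key (e.g. the deploy
# host) can recover it, while devs can still see where each secret lives.
import base64

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# In practice the keypair is generated once and the private key kept out
# of the repo; it is generated here so the sketch is self-contained.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

def encrypt_value(plaintext: str) -> str:
    """Encrypt one config value for inline embedding in YAML."""
    ciphertext = public_key.encrypt(
        plaintext.encode(),
        padding.OAEP(mgf=padding.MGF1(hashes.SHA256()),
                     algorithm=hashes.SHA256(), label=None),
    )
    return base64.b64encode(ciphertext).decode()

# The blob then sits inline in the app's YAML, e.g. (tag is illustrative):
print(f"DATABASE_PASSWORD: !Encrypted {encrypt_value('hunter2')}")
```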
Remote Config
Hermes: https://github.com/edx/hermes
Intended to speed up config deployments
Whenever the watched file's ETag changes, it runs a configured command
Config files are completely separate from our IDA deployments
Downside today: can't tell exactly when the config has been applied; could be fixed with further tooling at some point
Have been doing this for about a year perhaps
Made refactoring configuration a lot easier
Config Flow changes today
Configuration repo runs once in the “beginning”
We run Ansible to create the initial AMI
After that, we no longer use the configuration repo
Dev
declares the Python configuration setting in the app
sets the values in remote config
only sets it in Configuration if the setting is needed by Nginx
Remote config changes are merged
CI verifies decryption of secrets
CI verifies formatting (see the sketch after this list)
Hermes automatically deploys the config change
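A minimal sketch of the kind of CI formatting check described above, assuming one YAML config file per app; the round-trip-parse approach and the script name are assumptions, not the actual edX CI code:

```python
# ci_check_config.py — sketch: fail CI if any app config file is not
# well-formed YAML (the "CI verifies formatting" step). The decryption
# check would be analogous: attempt to decrypt each inline-encrypted
# value and fail on error.
import sys

import yaml

def check(path: str) -> bool:
    """Return True if the file parses as YAML, else report and return False."""
    try:
        with open(path) as f:
            yaml.safe_load(f)
    except yaml.YAMLError as exc:
        print(f"{path}: invalid YAML: {exc}")
        return False
    return True

if __name__ == "__main__":
    results = [check(p) for p in sys.argv[1:]]
    sys.exit(0 if all(results) else 1)
```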
Future with Containers
Background
Remote config today
Remote config is Hermes with a few lines of python code
Hermes is currently configured with Ansible code
edX Secrets today
Stored in Vault
Using open-source HashiCorp tech today
Moving to decentralized model
Defaults are in the app
Dockerfile in each repo
Open edX version of the application, without anything edX-specific
Example YAML config file as well
Dockerfile buildable by running
docker build .
→ without any additional params, etc. Just build the image and mount the YAML
Clearer delineation between Open edX and edX.org
So edX.org can run at a different pace and a different scale
edX.org: Docker images → using Helm → k8s
Would continue to support Blue/Green Deployments
Helm charts are currently edX-specific in order to provide clearer directions to edX engineers
Could provide Open edX versions of Helm charts and k8s manifests as examples, if they would be useful
Private requirements for edX.org → would like to move this out of the public repos
So edX.org can dogfood this design pattern
You may notice new files with -newrelic tags - these are the start of these edX-specific files
Discussion
OEP-45 follow-up
NGinx configuration - what are the plans for this?
Would this be with sample Helm charts?
edX would be using nginx ingress
Possible Options
A: Provide an example chart
B: Provide an open chart that edX also consumes
Myth: Entire configuration for an IDA can be stored in a given repo
Example: Authentication is coupled across repos
the operational side of this coupling lived in the Ansible playbooks
Note: decentralized devstack has a similar issue
In addition to Helm charts, you are also using Consul Templates to pull in secrets from Vault?
Yes, those are rendered as values into the Helm charts.
Timeline for deprecation?
Let’s discuss this together and work together on accelerating and converging on this effort
Currently evaluating Tutor as a community-supported deployment mechanism
Usual questions: testing, Devstack, Dockerfiles, etc.
It seems there are similar problems underneath, no?
For example, should Tutor be the next Devstack?
Tutor - make it easy to install
Devstack is trying to address the same issue
Currently, not a hard blocker since we are pushing images in our current Devstack.
The edX Architecture team is the newly designated “owner” of Devstack and plans to look into short-term issues as well as long-term plans. Nim will ask the team to watch the recording from today and follow up with this group in order to converge paths.
Cory’s speaking notes (rough transcript):
*******************
10m edx.org context
*******************
As most of you already know, edX has over the past few years been transitioning from a monolithic architecture to a microservices architecture with separate micro-frontends.
Something you may or may not know: we have also been transitioning internally from a centralized DevOps model, where most of the infrastructure work was done by a core team as a service to other teams, to a decentralized model where teams own their own applications and infrastructure as much as possible and a core team of SREs facilitates this ownership.
While the centralized model worked well for us early on, as the number of teams and projects grew it became overwhelming, since the DevOps team was frequently not the people closest to the problem with the most context. We found that cross-team dependencies were a frequent source of friction with the centralized model: team A would need to inject work into the DevOps backlog and wait for it to be completed before they could proceed, and it was difficult to plan for this type of work in advance because it wasn't always apparent at the outset whether you would need DevOps assistance or not.
In order to minimize these dependencies, the (now) SRE team is tasked with enabling each team to own their applications from top to bottom: frontends, backends, databases, alerting, passwords, everything. This is somewhat aspirational at the moment, but it is the guiding star we are driving towards.
Internally at edX, many of our teams own a frontend or two, a Django backend application or two, and some components of the edx-platform monolith. They work in these repos most days and are very familiar with them, but edX has something like 450 repos in our GitHub, so it is impossible for anyone to REALLY know how everything works.
As you might imagine, these teams, now mostly decoupled, sometimes want to do very different things with their deployment processes, testing, and automation, and to use different technologies, etc.
This is where our centralized configuration repo becomes problematic. Since all the applications are coupled together, sharing code, changes to this repo need to work for all applications in all contexts; but the people making these changes, the developers on the application teams, don't necessarily have all this context, as they mostly work in silos on their own applications.
We also accept open source pull requests to this repo, but any changes need to work for the past two Open edX releases and the current master branch; they need to work in our sandboxes, in devstacks, and on the past few versions of Ubuntu. Ideally they would not make any breaking changes for the Open edX community.
Some of the components are developed entirely by the Open edX community and we don't have the capacity to test changes to them (e.g. we might receive pull requests for Google Cloud services, but we don't use Google Cloud ourselves). Reviewing and testing changes to this repo, and having confidence that they won't break something, is therefore really hard.
*******************
10m simplifying configuration asym-crypto and remote config (Hermes)
*******************
Some of our past attempts at solving this include simplifying configuration, adding asym-crypto-yaml, and adding remote config.
*******************
simplifying configuration and adding asym-crypto-yaml
*******************
A frequent complaint we received from our developers was that it was difficult to determine what their configuration settings were, as we used a complex multi-tiered merging of YAML configs in the configuration repo to build a single .yaml file (or multiple JSON files, in the case of edx-platform) for all our apps. Additionally, applications had their own defaults, Ansible had its own per-app and global defaults, and then we layered multiple files on top of it to generate a single file (or multiple files for the LMS) for each application. This was to facilitate config sharing across applications. We decided to simplify this to just every app having a single YAML file, period. No JSON, and all of the 'default' config values would be moved into the application instead of living in the Ansible repository.
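A minimal sketch of the resulting pattern, assuming Django-style settings; the env var name, file path, and setting names are illustrative:

```python
# production.py — sketch of "one YAML file per app": defaults live in the
# application, and a single externally-provided YAML file overrides them.
import os

import yaml

# Defaults now live in the application itself...
DEBUG = False
LANGUAGE_CODE = "en"

# ...and one YAML file, rendered outside the app, overrides them.
with open(os.environ.get("APP_CONFIG", "/edx/etc/app.yml")) as f:
    overrides = yaml.safe_load(f) or {}

# Every top-level key in the YAML replaces the matching in-app default.
globals().update(overrides)
```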
To make this single-file approach work, we needed to be able to store our secrets in our config files, and to do this we developed asym-crypto-yaml, which lets us do inline encryption in our YAML files. We were successful in reducing all the apps to using a single YAML file, but we were not able to remove all of the old configurations from the configuration repo, nor were we able to remove the JSON configurations, because they were needed for old edx-platform named releases.
This was where I really began to appreciate that any significant refactors of the configuration repo require us to go through a lengthy deprecation process.
*******************
Remote config (aka Hermes)
*******************
Once we had all our applications pulling from a single YAML file, we developed remote config, AKA Hermes, in an effort to bypass the configuration repo for most settings changes a developer might want to make. This was intended to speed up config deployments, as running the entire Ansible play is slow for simple config changes.
Hermes is a simple program, really: it monitors a remote file, and whenever its ETag changes it runs a shell command. In our case, we modified the configuration repo so that, when Hermes is enabled, it modifies sudoers to allow Hermes to restart the application. We then configure the shell command that Hermes runs to download the remote file, decrypt the inline secrets using asym-crypto-yaml, and restart gunicorn.
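A sketch of that mechanism (poll a remote file's ETag, run a command when it changes); this illustrates the idea rather than Hermes's actual code, and the URL and command are placeholders:

```python
# watcher.py — Hermes-style loop: watch a remote config file's ETag and
# run a refresh command whenever it changes.
import subprocess
import time

import requests

CONFIG_URL = "https://config.example.com/app.yml"  # placeholder
# Placeholder command: in Hermes's case this downloads the file, decrypts
# the inline secrets with asym-crypto-yaml, and restarts gunicorn.
ON_CHANGE = ["/edx/bin/refresh-config.sh"]

last_etag = None
while True:
    etag = requests.head(CONFIG_URL).headers.get("ETag")
    if etag is not None and etag != last_etag:
        last_etag = etag
        subprocess.run(ON_CHANGE, check=False)  # apply the new config
    time.sleep(30)  # poll interval
```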
This setup enables config changes to our app.yaml file to bypass our deployment pipelines (and therefore our Ansible runs and complex config merging) for most of the settings our developers care about. However, again we weren't able to factor all of the old config out of configuration, and the nested configuration is still used for configuring things like nginx and the machine images that we are deploying, so this still leads to a lot of confusion, though the answers are sometimes much simpler.
*******************
Retirement of the configuration repo
*******************
This inability to make changes to the way we ship edx.org without undergoing a significantly time-consuming deprecation process to support Open edX is essentially what has caused us to begin an internal deprecation of the configuration repo, with the intent of *eventually* deprecating and ceasing to support it externally. With that said, we are aware that we need to follow the formal deprecation process and cannot just pull the rug out from under the Open edX community.
*******************
Not using asym-crypto-yaml & remote config
*******************
It is important to note that remote config, or "Hermes", is configured by the Ansible code in configuration, which is being deprecated. Our new Kubernetes deployments don't actually leverage Hermes at all, so in all likelihood we will not be using it in the future either. In our Kubernetes deployments we are storing secrets in Vault and using consul-template to render secrets into our application YAML files, and thus we no longer have a need for either of these pieces of software, as we can get the same functionality from these free HashiCorp products.
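For illustration, here is the same idea sketched in Python with the `hvac` Vault client standing in for consul-template (which is what edX actually uses); the Vault address, token, secret path, and key names are placeholders:

```python
# Sketch: read a secret from Vault (KV v2) and render it into the app's
# YAML config, mimicking what consul-template does at deploy time.
import hvac
import yaml

client = hvac.Client(url="https://vault.example.com", token="...")  # placeholders
secret = client.secrets.kv.v2.read_secret_version(path="apps/myapp")
values = secret["data"]["data"]  # e.g. {"DATABASE_PASSWORD": "..."}

with open("app.yml") as f:
    config = yaml.safe_load(f)

config.update(values)  # merge secrets into the application config

with open("app.yml", "w") as f:
    yaml.safe_dump(config, f)
```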
*******************
So what is next?
*******************
We are moving to a decentralized model where each application mostly contains everything it needs in order to run: default configuration values, any libraries it needs to install, etc.
The plan will probably be pretty familiar and boring to you if you have spent much time working with Docker. We are containerizing our applications by putting a Dockerfile in each repo. That Dockerfile should be Open edX-centric and should include nothing about edx.org.
The repo will contain everything needed to run that microservice except the YAML config file. I would personally like all of the defaults to be production-centric, and for all dev and sandbox environments to configure everything by manipulating the YAML file. The Dockerfile in the repo will be buildable by simply doing `docker build .` at the root of the directory, and will produce the Open edX image. I like to think of this as "Just add YAML": you do `docker build .`, mount your YAML file into the container, and you are good to go.
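A sketch of what that contract could imply inside the image: an entrypoint that refuses to start without the mounted YAML. The entrypoint script, env var name, and gunicorn invocation are illustrative assumptions, not the actual images' contents:

```python
# entrypoint.py — "just add YAML": code and defaults are baked into the
# image; the operator mounts a single config file and the app points at it.
import os
import subprocess

config = os.environ.get("APP_CONFIG", "/edx/etc/app.yml")
if not os.path.exists(config):
    raise SystemExit(f"No config found; mount a YAML file at {config}")

# Hand off to the application server with the config location exported.
subprocess.run(
    ["gunicorn", "app.wsgi:application"],
    env={**os.environ, "APP_CONFIG": config},
    check=True,
)
```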
This approach allows both our developers and open source developers to compartmentalize when working on a microservice, so they don't have to ingest the entire edX ecosystem to make a few changes to one microservice. Everything you need should be in that repo.
One thing that we really want to do is create a clear delineation between what is Open edX and what is edx.org, so that edx.org can move fast and break stuff, and so that Open edX can be a stable product for your consumption. The reality is that edx.org has very different needs from most operators of Open edX. We operate at a scale that isn't reasonable for most installs, and most organizations that operate Open edX at a similar scale have their own infrastructure.
As such, we are deploying our Docker images to Kubernetes using Helm, with our config files rendered by consul-template, which loads secrets from Vault at deploy time. But Kubernetes is too large for many small installs of Open edX, so we don't want to force this technology choice on everyone.
Our Helm charts are currently very edx.org-specific and make many assumptions about our underlying infrastructure in order to provide a clean experience for our developers working at edx.org. There is currently something of an outstanding question of whether we should attempt to provide an open source Helm chart, or provide example manifests for running Open edX on Kubernetes, but our current approach is to keep our edx.org manifests private and to publish public Docker images.
So the parts of the configuration repo that matter to most installs (the installed libraries required to run the application, the default config values, etc.) will live in the repo in question. Then, for installing private requirements and edx.org-specific artifact customizations, we plan to ourselves become a consumer of the Open edX Docker images by building private images that are derivatives of the open source images.
Currently many of our Dockerfiles build both a public and a '-newrelic' image, which is essentially the beginnings of our private images, but some of our applications, like the platform, need to install many plugins, plus additional plugins for the edx.org install. The current images in the repos are also derivatives of the Ubuntu upstream images; we would eventually like these to be based on the Python base images. They are only Ubuntu now for parity with the configuration repo package names.