Devstack Survey Results (2020-October)

The Arch squad sent a survey to edX and Arbisoft engineers in October 2020 to gain insight into devstack usage and pain points. This document summarizes the results of the survey.

 

Who responded (theme and expertise)?

40 people responded, of whom 38 classified themselves into a theme.
Fairly even distribution of responses across themes (slightly Platform heavy).
What percentage of each theme replied to the survey?

Self-reported expertise with devstack follows a roughly normal (bell-curve) distribution.
A higher number of Platform engineers self-report as knowledgeable (“4”) about devstack.

Few people know how to add test data (but many think they could figure it out).
Nobody from Platform claimed they were proficient at setting up test data.

How to change configuration is a mystery to about ⅓ of respondents.


How often are developers impacted?

Most folks use devstack often (a few times a week or more).

About ⅔ of respondents run into devstack issues multiple times a month or more.

People don’t use “make destroy” very often (a few times a year or never).

Most people don’t re-provision very often.
People in Platform tend to do this more often.
Do folks not re-provision because it blows away local test data?


How long do things take?

The following chart is ordered from slowest to fastest.

All respondents consider dev.provision to be slow.
Almost all respondents consider dev.pull to be slow.
Devs do not run dev.provision very often (per chart above). It could be because they don’t need it, or it’s too slow, or it takes too long to recreate their local test data, or some other reason.
We don’t have similar information on how frequently devs run dev.pull.

It is still difficult to determine which commands to run.
Maybe the docs are in the wrong place or not clear enough.

Now that we have a clear owner of devstack, perhaps getting help will be faster.


Free-form Comments

Commands Know-How

There is uncertainty around which commands to run when. We have big hammers (like reprovisioning all of devstack), but no clear smaller hammers for regular needs.

  • I don't think I've gotten `make dev.static` to work locally. Also, can never remember the difference between `make dev.down` vs `make dev.stop` but I always do the wrong one.

  • "figuring out the right combination of commands to run" and "fixing an error", particularly one in a part of the system I may depend on but don't directly work on, are some of the biggest issues.

  • I usually use the docker commands over the make commands (pull / up) so I'm less likely to pull all the stuff I don't need to.

  • I don't know when "make dev.provision" is needed

  • It's unclear when refreshing devstack if a set of commands needs to be run

  • And apparently `make dev.down` is not the opposite of `make dev.up`.

  • Not knowing what the correct way to update devstack is (e.g. do I just pull one repo master, or all of them each time, and do I pull images every time I update the git repos for one or more repos?). Also not knowing what to do post-updates, such as do I have to re-run make requirements, migrations, etc etc. Perhaps we can have some consistent post-repo-pull or post-image-pull steps that always run to keep everyone up to date. Database state and code state could go out of date, so remembering to do migrations, static asset update etc could in the least be indicated in the log output? so one can remember to do those (if not automated)

  • Having to run migrations or rebuild requirements in LMS is really annoying

  • Make commands not listed when you just type `make`. I'd love if there could just be a single location with all commands so I don't have to go `grep`ping

  • I think we should try hard to simplify our tooling and get back to standard docker commands for the containers, python/django commands as much as possible for backend, and standard npm/node scripts for frontend. It feels like we have too many layers, and it makes it difficult to know what the most efficient way to perform a task is at any given time. The layers are ostensibly there to make things easier/more push button, but I think they're actually just masking what's really going on, and trying to do too much at the expense of making developers just wait. Paver, for instance, feels very, very, very heavy. I'm also not really clear on why we need Makefiles. That feels very... old school. Until I came here, I hadn't used Makefiles since C/C++ classes in college.
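
One comment above suggests that post-pull steps (requirements, migrations, static assets) could at least be indicated in the log output. A minimal Python sketch of that idea follows. The function name, the file patterns, and the reminder texts are all hypothetical illustrations, not existing devstack behavior, and a real version would need patterns that match each repository's layout.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping from changed-file patterns to follow-up reminders.
# Patterns and wording are illustrative assumptions, not devstack behavior.
FOLLOW_UPS = [
    ("requirements/*", "requirements changed: re-install requirements"),
    ("*/migrations/*.py", "migrations changed: run database migrations"),
    ("*.scss", "static assets changed: rebuild static assets"),
    ("package-lock.json", "npm dependencies changed: re-run npm install"),
]


def follow_up_reminders(changed_paths):
    """Return one reminder per pattern that matches any changed path."""
    reminders = []
    for pattern, reminder in FOLLOW_UPS:
        # fnmatchcase's "*" also matches "/" so patterns can span directories.
        if any(fnmatchcase(path, pattern) for path in changed_paths):
            reminders.append(reminder)
    return reminders


if __name__ == "__main__":
    example = [
        "requirements/edx/base.txt",
        "lms/djangoapps/grades/migrations/0001_initial.py",
    ]
    for line in follow_up_reminders(example):
        print(line)
```

The list of changed paths could come from something like `git diff --name-only ORIG_HEAD HEAD` run right after a pull, so the reminders would appear only when a pull actually touched the relevant files.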

Out-of-the-box Consistent Experience

Desire to have services work together smoothly out of the box.

  • I am still relatively new but as I touch more areas I get confused sometimes which service belongs in devstack and which I have to manually setup. Ideally each repo has a similar command like make dev.up, and 'it just works' :)

  • I've been working here since June and I've lost *at least* a month's worth of work due to problems with devstack. I touch a lot of different repos and services and I want them to work together smoothly and out of the box.

  • We really need to integrate frontend apps into the future of devstack; more precisely, I want a consistent framework for managing a complete, working, development environment for edX projects independent of the technologies used.

  • I often test changes in sandboxes, more driven by the inability to integrate with external systems from a development environment than by dissatisfaction with devstack. I am, however, often surprised at how positive the experience is in contrast to devstack.

Troubleshooting

Folks are not empowered to solve their local problems. They resort to big hammers and are blocked waiting on external help.

  • Thanks for slack :) That's my greatest resources when I hit something weird in devstack

  • Finding answers when issues come up; searching and asking in Slack doesn't even really scale internally.

  • Also, we need to work out instructions for setting up a case-sensitive volume for devstack on macOS. Ecommerce now has dependencies that are confused by macOS's case-insensitive default ("No module named 'Crypto'"), and git behaves quite badly when there are case-only differentiated files.

  • Breaking changes/incompatibilities across repos are relatively frequent and poorly communicated

  • The thing about devstack frustration is that it's a slightly different broken thing every month. Maybe there's some npm install weirdness one time. The next time it'll be that MongoDB doesn't start up correctly unless you run the upgrade script every time, etc.

  • Getting help is slow, and the help I get doesn't usually empower me to solve my own problems in the future.

  • I often run into inscrutable front end errors building static assets that I'm not sure how to troubleshoot.
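
The macOS comment above hinges on whether the volume holding the devstack checkout is case-sensitive. A minimal Python sketch for checking a given directory is shown here; the function name is hypothetical and not part of devstack, and it assumes the directory is writable.

```python
import os
import tempfile


def is_case_sensitive(directory):
    """Return True if the filesystem holding `directory` distinguishes case."""
    # Create a temporary file with a lowercase name, then look it up under the
    # uppercased name. A case-insensitive volume will report that it exists.
    with tempfile.NamedTemporaryFile(prefix="casecheck_", dir=directory) as handle:
        name = os.path.basename(handle.name)
        folded = os.path.join(directory, name.upper())
        return not os.path.exists(folded)


if __name__ == "__main__":
    here = os.getcwd()
    print(here, "is case-sensitive:", is_case_sensitive(here))
```

Running it from the directory that contains the repositories shows whether errors such as "No module named 'Crypto'" could plausibly be caused by case folding on that volume.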

Data Provisioning

Developers need the ability to provision their devstacks with “basic” test data for their most popular workflows.

  • Would be nice if devstack came out of the box with all the data/setup needed to complete the most popular workflows in the UI (e.g. checkout a course, search for a course/program, create a course, etc.)

  • I often need example courses. It would be really great if I could run a management command to create a course that has all the correct data so that I can then use it with an enterprise front end.

  • I know how to change test data via discovery but nothing else. For website we only ever really need discovery so it works ok.

  • For local test data: definitely depends on which service I need to dust off and remember about.

  • The biggest issue with devstack in my opinion is that it is not always easy setting up a devstack to accurately represent a production environment. This comes in many forms, but some examples are setting up data for scale testing (sure, it works with my 3 courses locally, but will it be fine with 15,000 on prod?), setting up "weird" content (like special exams or proctored exams), my note below about having a verified learner, and setting up flags to try and mirror the state for edX. I'm sure there are others, but these are just the ones that first popped into my head. THERE IS NO VERIFIED USER. Yes, I know we have verified@example.com, BUT that user isn't even enrolled in the Demo course as a verified learner. In order to set up a verified user in a course, I have to create a course, go into ecommerce, reduce the price to 0, and then purchase a free verified seat. It's a bit crazy to me how difficult it is to do anything but audit a course locally when most of the features we develop have different experiences based on audit/paid.

  • relatively hard to set up mock data/services for development.

Service Dependencies & Resource Consumption

Developers seek decoupled services with consistent inter-service communications, and smaller footprints.

  • This is more specific to enterprise - having to manage a bunch of different services at one time - catalog, license manager, the LMS, and then 1 or 2 MFEs. When not on enterprise teams, my biggest pain point was library development.

  • Also would be nice if services were either more decoupled or kept more in sync in local environments - if I only really work on ecommerce, I'm probably not going to go out of my way to keep the forums repo up to date, but I do expect to be able to navigate around the course pages locally and not see random errors on the discussions tab.

  • Every service requiring LMS and LMS being broken. And not always "broken" broken, just things like master getting pulled, then the code not matching the requirements, etc. I'd like there to be a stable LMS for when I need to develop other services and LMS doesn't matter to me other than auth.

  • Needing to use the lms for login when I want to work in publisher or discovery. I rarely work in the lms itself, so can easily get stuck if the lms container refuses to come up for any reason.

  • Missing configurations between services.

  • Getting different services to talk to each other is a problem. I'm always having to figure out how services are talking to each other and what needs to have permissions for what, etc. (CORS permissions, worker permissions, etc.)

  • Have had docker use all of available memory and lock up my computer, Docker quickly fills up on old images and requires a prune,

  • Services I don't care about that spin up to 100% CPU forever, that I just shut down instead of debugging what's wrong with them.

  • How much memory and compute resources devstack uses on my laptop, I think this is going to continue to get worse until we get machines with 32GB of RAM, or take some measures to use fewer resources with our MySQL and LMS containers. Maybe every service can actually use the same mysql container?

  • Pulling images takes too long, LMS has too many dependencies (chrome and firefox? really?),

Devstack Speed

Developers avoid dev.pull, although it’s a reliable way to sync with the latest code. Using Paver as an abstraction layer slows tasks.

  • Some things like `make dev.pull` are quite slow, but I very rarely have to do them so it's not that big of a deal.

  • So with the timing of different things, yes, some of them take a while. But they should be done rarely. I provision my devstack maybe 1 or 2 times per year. Same with making static. For the vast majority of the time, I will individually update my images where its pulling git, and then if it says the container is missing a package, I go in and make requirements. I think a lot of people struggle with solutions like that and think that if anything is wrong, it requires a reset when 99 times out of 100, it can be fixed much more easily.

  • A significant problem is network. Any dev.pull takes about an hour for me. Because I am now an infrequent user I have to pretty much always re-provision whenever I need to do a devstack task, which is often closer to a 4 hr ordeal.

  • Studio/LMS commands take way too long because they are all wrapped in Paver

  • The time it takes for code to update is sometimes pretty slow (as in Watchman triggering the reload for your application). Also running tests if reuse-db is not properly set up/used.

Devstack Workflow

Devstack usage patterns vary significantly across squads and developer seniority.

  • There are a lot of common devstack problems that once you've encountered enough times you can easily work through by habit. Inherently, it's not very friendly for beginners.

  • I’m new-ish and most of my tickets have been in Prospectus; I haven’t touched devstack in months and didn’t get much time with it in before that.

  • My workflow involves very rarely updating the containers. I mostly use stop and up. I mostly use LMS and a local dependency to a library I am working on. If I need newer edx-platform code I usually run git pull and then run whatever paver commands I need, which is often none, and sometimes just updating requirements and more rarely updating db.

  • Sometimes it can be confusing when it's right to work on a repository/package/service within devstack, or directly on your local machine (ie, outside of devstack, in a venv).

Devstack Complexity

Multiple layers of technology add to devstack’s complexity.

  • Devstack itself is too sprawling and complex, so very few engineers know how to jump in and make good changes to it

  • Our layering of configurations is very confusing, and it's not clear what layers are for what (defaults, fallbacks, devstack values, prod, etc.)

  • The structure of Django projects is new to me and very different from microfrontends.

  • Ah, another frustration is when a redirect might go to something like edx.devstack.lms (which doesn't work) so then you have to manually go in and switch it to localhost:18000. We just need to fix all uses that are broken so our app can just work.

  • Dealing with static assets is a nightmare in LMS, particularly if you need to test theming.

Decentralized Devstack: Mixed Feelings

Decentralized devstack can be a viable solution, but underlying problems like provisioning and complexity remain.

  • There seems to be a focus on 'decentralizing' devstack? or running services separately? tbh i don't care about this and it just makes me think I'm gonna have to relearn how everything works.

  • I'm very on board with DD, but I still don't fully understand how test data gets propagated and where we store things (scripts, sql dumps) for things like "run this to have all the data populated for a full purchase flow" or "run this to test Masters stuff".

  • love the decentralized work! have not started using it yet but happy to try when I can

Other Feedback

  • I would just love to see the results of this survey. I'm especially curious about the numbers surrounding how often people are provisioning and starting from scratch.

  • I've been using devstack for many years and while I know it's a constant work in progress it has really come a long way since the "old days"

  • Keep iterating and evolving!