Devstack Survey Results (2020-October)
- Nimisha Asthagiri
The Arch squad sent a survey to edX and Arbisoft engineers in October 2020 to gain insight into devstack usage and pain points. This document summarizes the results of the survey.
Who responded (theme and expertise)?
40 people responded, 38 of whom were classified into a theme.
Fairly even distribution of responses across themes (slightly Platform heavy).
What percentage of each theme replied to the survey?
There is a normal distribution (bell-curve) of self-reported expertise with devstack.
A higher number of Platform engineers self-report as knowledgeable (“4”) about devstack.
Few people know how to add test data (but many think they could figure it out).
Nobody from Platform claimed they were proficient at setting up test data.
How often are developers impacted?
How long do things take?
Free-form Comments
Commands Know-How
I don't think I've gotten `make dev.static` to work locally. Also, can never remember the difference between `make dev.down` vs `make dev.stop` but I always do the wrong one.
"figuring out the right combination of commands to run" and "fixing an error", particularly one in a part of the system I may depend on but don't directly work on, are some of the biggest issues.
I usually use the docker commands over the make commands (pull / up) so I'm less likely to pull all the stuff I don't need to.
I don't know when "make dev.provision" is needed
It's unclear when refreshing devstack if a set of commands needs to be run
And apparently `make dev.down` is not the opposite of `make dev.up`.
Not knowing what the correct way to update devstack is (e.g. do I just pull one repo master, or all of them each time, and do I pull images every time I update the git repos for one or more repos?). Also not knowing what to do post-updates, such as whether I have to re-run make requirements, migrations, etc. Perhaps we can have some consistent post-repo-pull or post-image-pull steps that always run to keep everyone up to date. Database state and code state could go out of date, so could the need to do migrations, static asset updates, etc. at the least be indicated in the log output, so one can remember to do those (if not automated)?
Having to run migrations or rebuild requirements in LMS is really annoying
Make commands not listed when you just type `make`. I'd love if there could just be a single location with all commands so I don't have to go `grep`ping
I think we should try hard to simplify our tooling and get back to standard docker commands for the containers, python/django commands as much as possible for backend, and standard npm/node scripts for frontend. It feels like we have too many layers, and it makes it difficult to know what the most efficient way to perform a task is at any given time. The layers are ostensibly there to make things easier/more push button, but I think they're actually just masking what's really going on, and trying to do too much at the expense of making developers just wait. Paver, for instance, feels very, very, very heavy. I'm also not really clear on why we need Makefiles. That feels very... old school. Until I came here, I hadn't used Makefiles since C/C++ classes in college.
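One comment above asks for a single place that lists every make command instead of `grep`ping for them. A minimal sketch of one way to do that is below, assuming targets are written as simple `name: deps` rules and that any help text follows a `##` marker on the same line. The function name, the `##` convention, and the sample descriptions are illustrative assumptions, not something the devstack Makefile is known to follow.

```python
import re

# Matches simple rule lines such as "dev.up: dev.pull ## some description".
# Assumptions: one target per line, names made of letters, digits, "_", "."
# and "-". Variable assignments (":=") are skipped, as are special targets
# that start with "." (e.g. ".PHONY") and pattern rules containing "%".
TARGET_RE = re.compile(
    r"^(?P<target>[A-Za-z0-9][A-Za-z0-9_.-]*)\s*:(?!=).*?(?:##\s*(?P<doc>.*))?$",
    re.MULTILINE,
)


def list_make_targets(makefile_text):
    """Return {target_name: description} sorted by target name.

    The description is the text after "##" on the rule line, or "" if absent.
    """
    targets = {}
    for match in TARGET_RE.finditer(makefile_text):
        name = match.group("target")
        doc = (match.group("doc") or "").strip()
        # Keep the first non-empty description seen for a target.
        if name not in targets or (doc and not targets[name]):
            targets[name] = doc
    return dict(sorted(targets.items()))


if __name__ == "__main__":
    # Hypothetical Makefile content, used only to show the output format.
    sample = (
        ".PHONY: dev.up dev.down\n"
        "COMPOSE := docker-compose\n"
        "dev.up: dev.pull ## placeholder description\n"
        "\tdocker-compose up -d\n"
        "dev.down:\n"
    )
    for name, doc in list_make_targets(sample).items():
        print(f"{name:<20} {doc}")
```

To try it against a real checkout, read the Makefile from disk and pass its contents to `list_make_targets`. Rules with several targets on one line and targets pulled in through `include` are not handled by this sketch.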
Out-of-the-box Consistent Experience
I am still relatively new but as I touch more areas I get confused sometimes which service belongs in devstack and which I have to manually set up. Ideally each repo has a similar command like make dev.up, and 'it just works' :)
I've been working here since June and I've lost *at least* a month's worth of work due to problems with devstack. I touch a lot of different repos and services and I want them to work together smoothly and out of the box.
We really need to integrate frontend apps into the future of devstack; more precisely, I want a consistent framework for managing a complete, working, development environment for edX projects independent of the technologies used.
I often test changes in sandboxes, more driven by the inability to integrate with external systems from a development environment than by dissatisfaction with devstack. I am, however, often surprised at how positive the experience is in contrast to devstack.
Troubleshooting
Thanks for Slack :) That's my greatest resource when I hit something weird in devstack
Finding answers when issues come up; searching and asking in Slack doesn't even really scale internally.
Also, we need to work out instructions for setting up a case-sensitive volume for devstack on macOS. Ecommerce now has dependencies that are confused by macOS's case-insensitive default ("No module named 'Crypto'"), and git behaves quite badly when there are case-only differentiated files.
Breaking changes/incompatibilities across repos are relatively frequent and poorly communicated
The thing about devstack frustration is that it's a slightly different broken thing every month. Maybe there's some npm install weirdness one time. The next time it'll be that MongoDB doesn't start up correctly unless you run the upgrade script every time, etc.
Getting help is slow, and the help I get doesn't usually empower me to solve my own problems in the future.
I often run into inscrutable front end errors building static assets that I'm not sure how to troubleshoot.
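The macOS comment above traces a "No module named 'Crypto'" error to the case-insensitive default filesystem. As a first troubleshooting step, a developer could check whether the directory holding their devstack checkout is case-sensitive. The sketch below is one way to do that. The helper name is hypothetical, and it only detects the condition; it does not create a case-sensitive volume.

```python
import os
import tempfile


def is_case_sensitive(directory):
    """Return True if `directory` sits on a case-sensitive filesystem.

    Creates a temporary file whose name contains uppercase letters, then
    checks whether the all-lowercase spelling of that name also resolves.
    On a case-insensitive filesystem both spellings point to the same file.
    """
    with tempfile.NamedTemporaryFile(prefix="CaseProbe_", dir=directory) as probe:
        folder, name = os.path.split(probe.name)
        lowercase_path = os.path.join(folder, name.lower())
        return not os.path.exists(lowercase_path)
```

Calling `is_case_sensitive(".")` from the workspace root needs write permission there, because the probe file is created in that directory and removed when the check finishes.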
Data Provisioning
Would be nice if devstack came out of the box with all the data/setup needed to complete the most popular workflows in the UI (e.g. checkout a course, search for a course/program, create a course, etc.)
I often need example courses. It would be really great if I could run a management command to create a course that has all the correct data so that I can then use it with an enterprise front end.
I know how to change test data via discovery but nothing else. For website we only ever really need discovery so it works ok.
For local test data: definitely depends on which service I need to dust off and remember about.
The biggest issue with devstack in my opinion is that it is not always easy setting up a devstack to accurately represent a production environment. This comes in many forms, but some examples are setting up data for scale testing (sure, it works with my 3 courses locally, but will it be fine with 15,000 on prod?), setting up "weird" content (like special exams or proctored exams), my note below about having a verified learner, and setting up flags to try and mirror the state for edX. I'm sure there are others, but these are just the ones that first popped into my head. THERE IS NO VERIFIED USER. Yes, I know we have verified@example.com, BUT that user isn't even enrolled in the Demo course as a verified learner. In order to set up a verified user in a course, I have to create a course, go into ecommerce, reduce the price to 0, and then purchase a free verified seat. It's a bit crazy to me how difficult it is to do anything but audit a course locally when most of the features we develop have different experiences based on audit/paid.
Relatively hard to set up mock data/services for development.
Service Dependencies & Resource Consumption
This is more specific to enterprise - having to manage a bunch of different services at one time - catalog, license manager, the LMS, and then 1 or 2 MFEs. When not on enterprise teams, my biggest pain point was library development.
Also would be nice if services were either more decoupled or kept more in sync in local environments - if I only really work on ecommerce, I'm probably not going to go out of my way to keep the forums repo up to date, but I do expect to be able to navigate around the course pages locally and not see random errors on the discussions tab.
Every service requiring LMS and LMS being broken. And not always "broken" broken, just things like master getting pulled, then the code not matching the requirements, etc. I'd like there to be a stable LMS for when I need to develop other services and LMS doesn't matter to me other than auth.
Needing to use the lms for login when I want to work in publisher or discovery. I rarely work in the lms itself, so can easily get stuck if the lms container refuses to come up for any reason.
Missing configurations between services.
Getting different services to talk to each other is a problem. I'm always having to figure out how services are talking to each other and what needs to have permissions for what, etc. (CORS permissions, worker permissions, etc.)