Tutor troubleshooting notes

This is not an official guide! This is just a log of my (@Kyle McCormick's) experiences running Tutor locally, both for my own reference and for the community’s reference as we work to improve Tutor and its official docs (see https://openedx.atlassian.net/wiki/spaces/COMM/pages/3315335223).

Task 1: Set up Tutor for local development

Goal

  • Using a branch B, which is based on openedx/edx-platform:master, I want to:

    • Run LMS and CMS with local code

    • Run static analysis & tests

    • Run a management command

Relevant Tutor Docs

Issues Encountered

  • When working out of a repo generated by tutor dev bindmount:

    • git state was confusing as a newcomer

      • no branch information

      • patches applied on top of master - wasn’t clear where they came from at first

      • copied static assets and custom settings files made git state dirty

    • git remote protocol needed to be switched from https to ssh to allow push

    • git had to be configured to pull relevant branches

  • When using tutor dev start ... or tutor dev exec ... :

    • settings weren’t set correctly? Regis mentioned that exec doesn’t set DJANGO_SETTINGS_MODULE properly…

  • When setting COMMON_OPENEDX_VERSION to my feature branch and trying tutor images build openedx :

    • If branch has updates, needed to use --no-cache, which took a very long time.

    • Had to override openedx-dockerfile-git-patches-default to be empty, otherwise cherry-picking would conflict with the state of my branch (because said commits are already on master, it seems?)

  • Volume mounting requires full path; ~ for home didn’t work.

  • Had to start from scratch due to < insert requirements problem that carlos had >
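Two of the issues above (the https remote and the ~ in mount paths) come down to small string rewrites. Here is a minimal sketch, assuming a GitHub remote and a POSIX shell. The function names https_to_ssh and abs_mount_path are hypothetical helpers of my own, not part of Tutor or git.

```shell
# Rewrite an HTTPS GitHub remote URL into its SSH form so that pushes use SSH keys.
# Any URL that does not start with https://github.com/ is printed unchanged.
https_to_ssh() {
    case "$1" in
        https://github.com/*)
            printf 'git@github.com:%s\n' "${1#https://github.com/}" ;;
        *)
            printf '%s\n' "$1" ;;
    esac
}

# Expand a leading "~" to $HOME so the result is a full path,
# which is what the volume mounts needed in my case.
abs_mount_path() {
    case "$1" in
        '~')
            printf '%s\n' "$HOME" ;;
        '~/'*)
            # ${1#??} drops the first two characters, i.e. the literal "~/".
            printf '%s/%s\n' "$HOME" "${1#??}" ;;
        *)
            printf '%s\n' "$1" ;;
    esac
}

# Possible usage inside the bind-mounted repo (not run here):
#   git remote set-url origin "$(https_to_ssh "$(git remote get-url origin)")"
#   abs_mount_path '~/openedx/edx-platform'
```

Both helpers only print strings, so they are safe to try before touching the repo or the Tutor config.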

Successful Approach

# Clone edx-platform and switch to my branch.
$ git clone git@github.com:openedx/edx-platform
$ (cd edx-platform && git checkout $B)

# Install Tutor Nightly.
$ virtualenv tutor-venv
$ source tutor-venv/bin/activate
$ git clone --branch nightly git@github.com:overhangio/tutor
$ cd tutor
$ pip install -r requirements/dev.txt
$ pip install -e .

# Provision Tutor.
$ tutor local quickstart
$ tutor local stop

# Copy edx-platform virtual environment to host.
$ tutor dev start lms
$ tutor dev bindmount lms /openedx/venv
$ tutor dev stop lms
$ tutor dev dc rm

# Configure Tutor to mount your local edx-platform and virtual environment
# by writing a env/dev/docker-compose.override.yml file to your Tutor env.
# The syntax for each mount is $HOST_PATH:$DOCKER_PATH.
# Be sure to substitute in the appropriate $HOST_PATH for each mount.
$ cat > $(tutor config printroot)/env/dev/docker-compose.override.yml <<- EOF
version: "3.7"
services:
  lms:
    volumes:
      - /home/kyle/openedx/edx-platform:/openedx/edx-platform
      - /home/kyle/.local/share/tutor-nightly/volumes/venv:/openedx/venv
  cms:
    volumes:
      - /home/kyle/openedx/edx-platform:/openedx/edx-platform
      - /home/kyle/.local/share/tutor-nightly/volumes/venv:/openedx/venv
  lms-worker:
    volumes:
      - /home/kyle/openedx/edx-platform:/openedx/edx-platform
      - /home/kyle/.local/share/tutor-nightly/volumes/venv:/openedx/venv
  cms-worker:
    volumes:
      - /home/kyle/openedx/edx-platform:/openedx/edx-platform
      - /home/kyle/.local/share/tutor-nightly/volumes/venv:/openedx/venv
EOF

# Install requirements, provision demo course, admin user and static assets.
$ tutor dev run lms make requirements
$ tutor dev run lms npm install
$ tutor dev run lms openedx-assets build --env=dev
$ tutor dev createuser admin admin@example.com --password admin --staff --superuser
$ tutor dev importdemocourse

# (NOT REQUIRED - JUST EXAMPLES) Run tests, linting, and a management command.
$ tutor dev run lms pytest path/to/some/code
$ tutor dev run lms pylint path/to/some/code
$ tutor dev run lms ./manage.py lms run_some_management_command

# Run LMS and CMS.
$ tutor dev start -d lms cms
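The override file above repeats the same two mounts for four services, which makes it easy to mistype one of the host paths. As a sketch only, the file could be generated instead. The function name print_override is hypothetical, and the service names and container paths are simply the ones used in the steps above.

```shell
# Print a docker-compose override that mounts the same two host directories
# into each of the four edx-platform services.
#   $1: full host path to the edx-platform checkout
#   $2: full host path to the copied virtual environment
print_override() {
    platform_dir=$1
    venv_dir=$2
    printf 'version: "3.7"\n'
    printf 'services:\n'
    for service in lms cms lms-worker cms-worker; do
        printf '  %s:\n' "$service"
        printf '    volumes:\n'
        printf '      - %s:/openedx/edx-platform\n' "$platform_dir"
        printf '      - %s:/openedx/venv\n' "$venv_dir"
    done
}

# Possible usage (not run here):
#   print_override /home/kyle/openedx/edx-platform \
#       /home/kyle/.local/share/tutor-nightly/volumes/venv \
#       > "$(tutor config printroot)/env/dev/docker-compose.override.yml"
```

Printing to stdout first makes it possible to inspect the YAML before overwriting anything in the Tutor env.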

Suggested Improvements

This was just a first pass at suggested improvements. We’ve been iterating on these ideas in some GitHub issues:

# Clone edx-platform and switch to my branch.
$ git clone git@github.com:openedx/edx-platform
$ (cd edx-platform && git checkout $B)

# Install Tutor Nightly.
$ virtualenv tutor-venv
$ source tutor-venv/bin/activate
# Either:
$ git clone --branch nightly git@github.com:overhangio/tutor
$ cd tutor
$ make requirements
# OR:
# tutor-nightly could be a metapackage depending on the latest nightly (`N.dev`) tutor release
$ pip install tutor-nightly

# Configure mounting:
# * from my edx-platform to the container's /openedx/edx-platform, and
# * from the default location in tutor-nightly config to /openedx/venv.
$ tutor config save --set OPENEDX_MOUNTS=/home/kyle/openedx/edx-platform:/openedx/edx-platform,/openedx/venv

# Provision Tutor, to include a default user as well as static assets.
$ tutor dev quickinit

# Run tests, linting, and a management command.
$ tutor dev run bash
app@lms$ pytest path/to/some/code
app@lms$ pylint path/to/some/code
app@lms$ ./manage.py lms run_some_management_command
app@lms$ exit

# Run LMS, with virtual environment and application code from host.
$ tutor dev runserver -d lms cms

Task 2: Set up Tutor in a local Kubernetes (k8s) cluster

Goal

Run LMS, CMS and the Learning MFE in a k8s cluster on my local machine using Tutor

Relevant Tutor Docs

Notes

Stream of consciousness… follow-up items are denoted with TODO.

  • Docs recommend Minikube for trying things out. Great.

    • Minikube setup was easy.

  • tutor k8s start: Failing. Unable to connect to MySQL.

    • Tried tutor k8s exec 'mysql mysql --username=... --password=...', could not connect.

    • Also, tutor k8s exec command arg1 arg2: this doesn't work. Needs to be tutor k8s exec 'command arg1 arg2'.

      • Wrote up

    • Went back to dev mode, confirmed that I can connect to MySQL.

    • Brain: These are two different databases, so I need to run quickstart again.

  • tutor k8s quickstart:

    • Hung when it got to discovery.

    • Disabled discovery for now.

    • Without discovery plugin enabled, quickstart succeeds 👍🏻

  • How do I view LMS in the browser?

    • local.overhang.io : unable to connect.

    • docs tell me to look at caddy's external IP and configure my DNS server with it.

    • kubectl --namespace openedx get services/caddy says that EXTERNAL_IP is <pending>

    • StackOverflow tells me that I need to run minikube tunnel in order for the load balancer to work within minikube

    • Now, EXTERNAL_IP shows up.

    • Putting that IP address in the browser hangs in an encouraging way but doesn't show anything.

    • Caddy's logs show that I am successfully making requests to Caddy.

    • I'm looking at the Caddyfile. Caddyfile is defined in terms of the hostnames I have from config.yaml. That is, local.overhang.io.

    • Brain: Caddy is expecting URLs to be in the form of *.local.overhang.io/*

    • Edit /etc/hosts, pointing local.overhang.io at the value of EXTERNAL_IP

    • Go to https://local.overhang.io in the browser... hangs still

    • Brain: Right, I don't have TLS set up

    • Go to http://local.overhang.io: LMS loads!

    • Takeaway: Would be good to note minikube tunnel and /etc/hosts changes that were necessary for this to work locally.

  • Trying to log in...

    • Need to make a superuser

    • tutor k8s exec lms bash ... ./manage.py lms manage_user ... --superuser

    • Could not load django_debug_toolbar

    • DJANGO_SETTINGS_MODULE=lms.envs.production ./manage.py lms manage_user ... --superuser

      • TODO: File an issue for DJANGO_SETTINGS_MODULE being mis-set?

    • Log in works!

  • Trying to enroll...

    • Whoops, no courses.

    • tutor k8s importdemocourse.

    • That worked! Enrolled.

  • Trying to start the learning mfe...

    • tutor plugins enable mfe

    • tutor config save

    • tutor k8s init --limit=mfe -> exited successfully

    • MFE app isn't running, logs say Error from server (BadRequest): container "mfe" in pod "mfe-7757f78d77-qcpmr" is waiting to start: image can't be pulled

      • according to deployments.yml, the assigned image is docker.io/overhangio/openedx-mfe:13.0.2

      • docker pull docker.io/overhangio/openedx-mfe:13.0.2 yields manifest for overhangio/openedx-mfe:13.0.2 not found: manifest unknown: manifest unknown

      • Went to openedx-mfe on DockerHub, latest image is 12.01.

      • Brain: Oh, maybe I'm supposed to build this myself?

      • tutor images build mfe

        • Was going to file an issue about the missing 13.x dockerhub image, but then I found

        • Image built successfully!

        • Tutor still wants to use the remote image, which doesn’t exist…

        • Pushed custom-built image to https://hub.docker.com/r/kdmccormick96/openedx-mfe, and set MFE_DOCKER_IMAGE to point to it.

        • We should either:

          • address the lack of dynamic config overrides for MFEs (on frontend-wg’s radar with , not currently being worked on, though)

          • add docs explaining how to work around the MFE issue by building a custom image, and somehow remove the messaging about the MFE image being missing from dockerhub… which is technically correct but doesn’t lead the user to the solution.

    • tutor k8s stop mfe && tutor k8s start mfe -> The Deployment "mfe" is invalid.

      • I think it's failing because there are old pods running still. See 1: Old pods aren't getting destroyed

      • Went to dashboard (minikube dashboard) and selected the openedx namespace.

      • There are still pods running, but their instance ID is different than the instance ID that tutor k8s stop is using.

      • Find the instance ID from the dashboard.

      • Run kubectl delete --namespace openedx --selector=app.kubernetes.io/instance=openedx-INSTANCE_ID_FROM_DASHBOARD,app.kubernetes.io/component!=loadbalancer deployments,services,configmaps,jobs. Resources are deleted.

      • TODO: what was happening here?

    • Run tutor k8s start again.

      • local.overhang.io hangs. Ran tutor k8s logs lms. Get django.db.utils.OperationalError: (1045, "Access denied for user 'openedx'@'172.18.0.1' (using password: YES)").

      • Ran tutor k8s quickstart again.

      • Job failed; access denied still.

      • Stop containers: tutor k8s stop.

      • Ran tutor k8s quickstart again. Still access denied.

      • Brain:

        • This must be an issue with the mysql database itself. Where is mysql data stored?

        • checking volumes.yml...

        • it's stored using the default storage class

        • kubectl get storageclass -> minikube hostPath is the default -> it's somewhere in the minikube container.

        • maybe restarting minikube will start me off with a fresh database?

        • minikube stop && minikube start && tutor k8s quickstart -> same error as before

        • ugh, let's try minikube delete && minikube start && tutor k8s quickstart

        • TODO: what was happening here?

    • After destroying and starting over…

      • Minor issue: External IP had changed. Had to update /etc/hosts

      • Another issue: Celery 5 seems to break lms-worker and cms-worker. This isn’t specific to k8s, and seems to have come up in the past week.

      • With the Celery fix in place, Studio and LMS work!

      • and with my custom image pushed to DockerHub, the course outline in the Learning MFE works!

      • Unfortunately, courseware in the Learning MFE shows “An error has occurred” with no other explanation. No JS error logs in the console or 5XXs/4XXs in the network log.

        • Workaround: Toggle on the courseware.use_legacy_frontend waffle flag in order to use legacy courseware

        • TODO: figure out why this didn’t work

      • Now, using legacy, courseware works.
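The browser-access takeaway from the notes above (minikube tunnel plus an /etc/hosts entry) can be sketched as follows. This is only an illustration of what worked for me, not an official procedure. The helper name hosts_entry is hypothetical, and any hostname beyond local.overhang.io is an assumption about what config.yaml contains.

```shell
# Build an /etc/hosts line that points one or more Tutor hostnames
# at the external IP reported for the caddy service.
#   $1: external IP; remaining arguments: hostnames
hosts_entry() {
    ip=$1
    shift
    printf '%s' "$ip"
    for host in "$@"; do
        printf ' %s' "$host"
    done
    printf '\n'
}

# Possible usage (not run here):
#   minikube tunnel        # in a separate terminal; it has to keep running
#   ip=$(kubectl --namespace openedx get services/caddy \
#          --output jsonpath='{.status.loadBalancer.ingress[0].ip}')
#   hosts_entry "$ip" local.overhang.io | sudo tee -a /etc/hosts
#   # Then browse to http://local.overhang.io (plain HTTP, since TLS is not set up).
```

Because the external IP changed after I recreated the cluster, the stale /etc/hosts line has to be replaced rather than appended to in that case.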

References

1: Old pods aren't getting destroyed

$ tutor k8s start mfe
kubectl get namespaces openedx
NAME      STATUS   AGE
openedx   Active   12h
Namespace already exists: skipping creation.
kubectl apply --kustomize /home/kyle/.local/share/tutor-nightly/env --selector app.kubernetes.io/name=mfe
The Deployment "mfe" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/instance":"openedx-KXV0satwwOtuGRz7lwgslwU7", "app.kubernetes.io/managed-by":"tutor", "app.kubernetes.io/name":"mfe", "app.kubernetes.io/part-of":"openedx"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable
Error: Command failed with status 1: kubectl apply --kustomize /home/kyle/.local/share/tutor-nightly/env --selector app.kubernetes.io/name=mfe

$ tutor k8s stop
kubectl delete --namespace openedx --selector=app.kubernetes.io/instance=openedx-KXV0satwwOtuGRz7lwgslwU7,app.kubernetes.io/component!=loadbalancer deployments,services,configmaps,jobs
service "cms" deleted
service "elasticsearch" deleted
service "lms" deleted
service "mfe" deleted
service "mongodb" deleted
service "mysql" deleted
service "redis" deleted
service "smtp" deleted
configmap "caddy-config-b9gk6kf847" deleted
configmap "openedx-config-kkhkt28hth" deleted
configmap "openedx-settings-cms-bkft5k2b4h" deleted
configmap "openedx-settings-lms-h25dtdkdh2" deleted
configmap "redis-config-fccm65mh4m" deleted

$ kubectl --namespace=openedx get pods
NAME                            READY   STATUS             RESTARTS      AGE
caddy-85589d6669-dnf65          1/1     Running            1 (11m ago)   52m
cms-7dd67c5f55-b4rvb            1/1     Running            1 (11m ago)   52m
cms-job-20220303113051-lrzh5    0/1     Completed          0             51m
cms-worker-67846c7d7b-t2djr     1/1     Running            1 (11m ago)   52m
elasticsearch-d8b6859f7-7rfmr   1/1     Running            1 (11m ago)   52m
lms-788dfd9669-spgdw            1/1     Running            1 (11m ago)   52m
lms-job-20220303113217-gwj82    0/1     Completed          0             50m
lms-worker-794866f475-zhzzb     1/1     Running            1 (11m ago)   52m
mfe-7757f78d77-qcpmr            0/1     ImagePullBackOff   0             49m
mongodb-f4f6dc446-jn7ck         1/1     Running            1 (11m ago)   52m
mysql-7cb5f98d7-tlf76           1/1     Running            1 (11m ago)   52m
redis-7ddff4c6dd-hcxb6          1/1     Running            1 (11m ago)   52m
smtp-7454bf587b-6cl9l           1/1     Running            1 (11m ago)   52m

$ tutor k8s stop
kubectl delete --namespace openedx --selector=app.kubernetes.io/instance=openedx-KXV0satwwOtuGRz7lwgslwU7,app.kubernetes.io/component!=loadbalancer deployments,services,configmaps,jobs
No resources found
$
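In the log above, tutor k8s stop selects resources by one instance ID while the pods that are still running carry a different one. A rough sketch of the manual cleanup I ended up doing, assuming the stale ID has been read off the dashboard; the function names are hypothetical, and the selector format is copied from the kubectl command that tutor k8s stop prints.

```shell
# Build the label selector used by `tutor k8s stop`, but for an arbitrary
# instance ID (the part after "openedx-" in the instance label).
instance_selector() {
    printf 'app.kubernetes.io/instance=openedx-%s,app.kubernetes.io/component!=loadbalancer\n' "$1"
}

# Print (do not run) the kubectl command that deletes the resources
# belonging to that instance, so it can be reviewed first.
delete_command() {
    printf 'kubectl delete --namespace openedx --selector=%s deployments,services,configmaps,jobs\n' \
        "$(instance_selector "$1")"
}

# Possible usage (not run here):
#   # Show which instance label each deployment actually carries:
#   kubectl --namespace openedx get deployments --label-columns app.kubernetes.io/instance
#   # Review, then run, the delete command for the stale ID:
#   delete_command INSTANCE_ID_FROM_DASHBOARD
```

Printing the command instead of executing it is a deliberate choice here, since a wrong selector would delete resources from the wrong instance.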