Course content exports failing to find draft children

Description

As part of exporting data for the Research Data Exchange (RDX), we reran data exports for a list of courses from organizations that are participating in RDX. Part of the data export process is dumping the course content for each course, using an LMS management command. In the latest run (on 5/11/2016), eight course exports failed with a similar error at the same location in the code. All eight had succeeded in earlier runs back in December (12/21/2015). The courses are:

  • course-v1:BUx+Math226.3x+3T2015

  • course-v1:EPFLx+PHYS-209x+3T2015

  • course-v1:EPFLx+TrigoExpX+1T2015

  • course-v1:GeorgetownX+PHLX101-02x+1T2015

  • course-v1:RiceX+AdvBio.2x+2015T3

  • course-v1:RiceX+AdvENSCI.1x+2015T3

  • course-v1:RiceX+AdvPHY2.1x+2015T3

  • course-v1:RiceX+BIOC300.2x+2T2015

An example of a failing command line, as reported in the job log, is:

17:03:55 CalledProcessError: Command 'sudo -E -u jenkins CONFIG_ROOT=${WORKSPACE}/config/baked-config-secure/prod-edx SERVICE_VARIANT=lms /var/lib/jenkins/shiningpanda/jobs/79fcf488/virtualenvs/d41d8cd9/bin/django-admin.py export_course --settings=lms.envs.aws --pythonpath=/var/lib/jenkins/shiningpanda/jobs/79fcf488/virtualenvs/d41d8cd9/edx-platform course-v1:RiceX+AdvBio.2x+2015T3 - > /mnt/ephemeral-01/analytics-exporter/course-data/ricex_1dhPtX/ricex-2016-05-08/RiceX-AdvBio.2x-2015T3-course-prod-analytics.xml.tar.gz' returned non-zero exit status 1

The traceback and error message for all eight courses are similar to those produced by the command above:

The only difference is the particular vertical that is not found:

Steps to Reproduce

None

Current Behavior

None

Expected Behavior

None

Reason for Variance

None

Release Notes

None

User Impact Summary

None

Activity

Brian Wilson
June 9, 2016, 6:11 PM

Yes, as of last week (5 June 2016), this was the only course that was encountering this problem.

Awais Jibran
June 9, 2016, 9:40 AM

Just to confirm: the only course that is causing the export problem is https://studio.edge.edx.org/course/MITx/8.MReV/summer2014. And the traceback is

Please confirm.

Awais Jibran
June 9, 2016, 9:33 AM

Adam Palay
June 7, 2016, 8:45 PM

Hey, here are my answers:

  1. Use the read replica for Mongo to check what's going on with that vertical: https://openedx.atlassian.net/wiki/display/EdxOps/How+to+Access+a+Read+Replica

  2. I'm not really sure, off the top of my head. If you can't reproduce it organically for a test, just mock parent.children so that it does not contain the child location.
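A rough sketch of what the mocking suggestion could look like in a test. It assumes the export fails with a ValueError when it looks up the child's position in parent.children; the helper `child_index` and the location strings are hypothetical stand-ins, not the actual edx-platform code.

```python
from unittest import mock


def child_index(parent, child_location):
    """Hypothetical stand-in for the export step that finds a draft child's position."""
    return parent.children.index(child_location)


# Mock a parent whose children list does NOT contain the draft vertical
# (hypothetical location strings, used only for illustration).
parent = mock.Mock()
parent.children = ["vertical_a", "vertical_b"]
orphan_location = "vertical_orphan"

try:
    child_index(parent, orphan_location)
    reproduced = False
except ValueError:
    # list.index raises ValueError when the item is missing.
    reproduced = True
```

With the parent mocked this way, the lookup fails without needing a course that is actually in the broken state.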

Awais Jibran
June 7, 2016, 10:39 AM
Edited

I am trying to reproduce this locally, but so far I have not succeeded.

After discussion with ,
1. We need to check in the database whether the `Orbits_Find_Boost_randxyz2TKQQVVF` vertical exists.

2. We need to make the code more robust so that it handles this scenario.

I cannot reproduce this locally; can you please guide me on how to reproduce it? The scenario is one where the parent does not have the vertical in its list of children (so the vertical is an orphan), but the vertical still has parent information. The fix seems straightforward, but this is the scenario that is causing confusion.
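As a minimal sketch of point 2, assuming the failure comes from looking up the vertical's position in the parent's children list: the lookup could tolerate a missing child instead of aborting the whole export. The function name `safe_child_index` and the logging behavior are assumptions for illustration, not the change that was actually merged.

```python
import logging

log = logging.getLogger(__name__)


def safe_child_index(parent_children, child_location):
    """Return the child's position in the parent's children, or None if it is absent.

    Hypothetical helper: a draft whose parent no longer lists it is treated
    as an orphan and reported, rather than raising and failing the export.
    """
    try:
        return parent_children.index(child_location)
    except ValueError:
        log.warning(
            "Draft %s is not listed in its parent's children; treating it as an orphan.",
            child_location,
        )
        return None
```

A caller could then skip the draft when the result is None, so that one orphaned vertical does not stop the rest of the course from being exported.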

Fixed

Assignee

Awais Jibran

Reporter

Brian Wilson

Reach

None

Impact

None

Platform Area

None

Customer

None

Partner Manager

None

URL

None

Contributor Name

None

Groups with Read-Only Access

None

Actual Points

None

Category of Work

None

Platform Map Area (Levels 1 & 2)

None

Platform Map Area (Levels 3 & 4)

None

Priority

CAT-2