Submission Archive - Fix handling of ghost files

Description

TL;DR - Ghost files can appear in the submission archive for students who experience upload failures. Update the submission archive to omit missing files and warn about them.

Why does this happen?

  • Attempting to upload a file saves file info in student state before it uploads it to S3. On failure, the failed upload is not cleared from the student state.

  • This can create a mismatch: file pointers exist in student state that don't exist in S3. We hide these on the frontend so they are invisible to both the student and reviewers but the data is still there.

  • When we generate the submission archive, we get the file path (edx-ora2/openassessment/data.py) by combining the LMS_ROOT_URL and the path-to-file. Naturally, if the path does not exist, we end up with just the LMS_ROOT_URL!

  • These "ghost files" have the name of the failed file upload but contain the HTML of the edX home page.

  • This causes instructors to see extra files in the submission archive. These appear as broken files due to the filetype mismatch (a PDF or PNG extension on HTML content) and can make it hard to identify the correctly uploaded file.
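
The URL mismatch described above can be reproduced in isolation. This is a minimal sketch, assuming the path is combined with the root using something like Python's urllib.parse.urljoin; the actual code in data.py may differ. The root URL below is a placeholder and build_file_url is a hypothetical helper, not a function from edx-ora2.

```python
from urllib.parse import urljoin

# Placeholder value; the real LMS_ROOT_URL comes from platform settings.
LMS_ROOT_URL = "https://lms.example.com"


def build_file_url(download_path):
    """Hypothetical helper: combine the LMS root with a file's download path."""
    return urljoin(LMS_ROOT_URL, download_path)


# A file that exists in storage resolves to a real file URL.
print(build_file_url("/media/submissions/essay.pdf"))

# A missing file yields an empty path, so the join silently
# falls back to the LMS root (the edX home page).
print(build_file_url(""))
```

Fetching the second URL returns the home page HTML, which is then saved under the failed upload's filename.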

AC:

  • If a file is not found (hint: get_download_url() raises or returns empty), the submission archive should omit the file.

  • The Download CSV contains all files, noting the ones that failed to upload.

  • Log events for failed file retrievals so we can make more informed decisions on a fix moving forward.
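
One possible shape for these criteria is sketched below. It is not the edx-ora2 implementation: partition_files, csv_rows, and the "key"/"name" fields are hypothetical, and get_download_url is passed in as a callable that is assumed to take a file key and to either raise or return an empty value when the file is missing.

```python
import logging

logger = logging.getLogger(__name__)


def partition_files(file_entries, get_download_url):
    """Split uploaded-file entries into (found, missing).

    file_entries: iterable of dicts with hypothetical "key" and "name" fields.
    get_download_url: callable taking a file key; may raise or return an
    empty value when the file is not in storage.
    """
    found, missing = [], []
    for entry in file_entries:
        try:
            url = get_download_url(entry["key"])
        except Exception:  # any retrieval failure is treated as a missing file
            logger.exception("File retrieval raised for key %s", entry["key"])
            missing.append(entry)
            continue
        if url:
            found.append({**entry, "url": url})
        else:
            logger.warning("No download URL for key %s", entry["key"])
            missing.append(entry)
    return found, missing


def csv_rows(found, missing):
    """Build CSV rows listing every file, flagging the ones that failed."""
    rows = [(f["name"], "ok") for f in found]
    rows += [(m["name"], "upload failed") for m in missing]
    return rows
```

Only the found entries would be written into the archive, while csv_rows keeps every file visible to instructors along with its status.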

Activity

Nathan Sprenkle
February 16, 2021, 3:28 PM

Here’s a related CR for some file submission failures, with some comment threads about the submission archive.

Assignee

Unassigned

Reporter

Nathan Sprenkle

Story Points

2

Priority

Unset