DISCOVERY: Allow for more storage for file uploads

Description

  • Model the candidate file-storage caps: 10MB (current), 50MB, 100MB, 500MB, or 1GB.

    • Use the read replica to figure out how many submission records exist with uploaded files (a query sketch appears after this list).

    • Group those records by month to get a rough idea of the growth rate of file uploads.

    • Multiply the counts by each candidate cap (10MB, 100MB, etc.) to estimate total storage at each level.

    • Understand what our file-retention policy is for ORA uploads, if any.

      • Incorporate this into the rough model developed above.

  • Verify the truth of the statements below:

    • We have a restriction at the nginx layer that files uploaded to edX systems cannot be larger than 20MB.

    • We may need to discuss how to support multipart file upload in the LMS if we are to support much larger file uploads (e.g. > 50MB); a multipart sketch appears after this list.

    • The current ORA2 upload workflow seems to be:

      • The client side asks the XBlock/Python backend for a URL to which it can upload the file contents.

      • The XBlock method produces such a URL; in production this is a generated (pre-signed) S3 URL. (An upload-flow sketch appears after this list.)

      • The client takes this URL and submits a PUT request to it, along with the file contents.

      • Notice that nowhere in the above steps do we upload a file through an edX nginx instance, so that hard, system-wide cap may not actually apply in this use case.

  • If we want to support “additive uploads” (as opposed to the “one-shot” uploads we support today), we’ll need a way to query the file-storage backend (S3 in production) for key metadata so we can determine cumulative storage use for a given (student, ORA block) pair. Provide definitive answers to the following questions:

    • Can we fetch S3 key metadata efficiently? That is, if a student has uploaded 17 items, do we need to make 17 different queries for S3 keys, or can we do one query on an appropriate key prefix? (A prefix-listing sketch appears after this list.)

    • Is there a compelling reason that we would store file metadata, including size, in a model outside of S3, so that we don’t have to ask S3 for this metadata? (An answer to this question is a nice-to-have, provided that the answer to the first question is “yes, we can efficiently query S3 key metadata”).
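
The growth-rate model above lends itself to a quick script. Below is a minimal sketch, assuming the read replica is configured under a Django alias named "read_replica", that submissions live in a submissions_submission table with a created_at timestamp, and that a file upload leaves a recognizable file key in the raw_answer blob; all of these are assumptions to verify against the actual schema.

    from django.db import connections

    # Hypothetical query: the table name, column names, and LIKE filter are
    # assumptions to check against the real submissions schema.
    MONTHLY_UPLOAD_COUNTS = """
        SELECT DATE_FORMAT(created_at, '%Y-%m') AS month, COUNT(*) AS uploads
        FROM submissions_submission
        WHERE raw_answer LIKE '%file_key%'
        GROUP BY month
        ORDER BY month;
    """

    # Candidate caps from the description, in bytes.
    CAPS = {"10MB": 10 * 2**20, "50MB": 50 * 2**20, "100MB": 100 * 2**20,
            "500MB": 500 * 2**20, "1GB": 2**30}

    with connections["read_replica"].cursor() as cursor:
        cursor.execute(MONTHLY_UPLOAD_COUNTS)  # no params passed, so literal '%' is safe
        rows = cursor.fetchall()

    total_uploads = sum(uploads for _month, uploads in rows)
    for label, cap_bytes in CAPS.items():
        # Worst case: every upload consumes the full per-file cap.
        print(f"{label}: worst-case total {total_uploads * cap_bytes / 2**40:.2f} TiB")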
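
On multipart upload: if we do end up supporting files well past 50MB, the standard approach against S3 is the multipart-upload API. A minimal server-side sketch with boto3 follows; the bucket and key are illustrative, and in a browser-driven flow each part would more likely be uploaded via a per-part pre-signed URL.

    import boto3

    s3 = boto3.client("s3")
    BUCKET, KEY = "ora2-uploads", "course/block/student/big-file.mp4"  # hypothetical

    def iter_chunks(path, chunk_size=8 * 2**20):
        # S3 requires every part except the last to be at least 5MB.
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                yield chunk

    mpu = s3.create_multipart_upload(Bucket=BUCKET, Key=KEY)
    parts = []
    for number, chunk in enumerate(iter_chunks("big-file.mp4"), start=1):
        part = s3.upload_part(Bucket=BUCKET, Key=KEY, PartNumber=number,
                              UploadId=mpu["UploadId"], Body=chunk)
        parts.append({"ETag": part["ETag"], "PartNumber": number})
    s3.complete_multipart_upload(Bucket=BUCKET, Key=KEY, UploadId=mpu["UploadId"],
                                 MultipartUpload={"Parts": parts})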
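
The presumed ORA2 upload flow, in miniature: the backend hands out a pre-signed PUT URL and the client uploads directly to S3, which is why no edX nginx instance sits in the upload path. This is an illustration with boto3, not ORA2's actual code; the bucket name, key layout, and expiry are assumptions.

    import boto3
    import requests

    s3 = boto3.client("s3")

    def get_upload_url(bucket, key, expires_in=3600):
        # Server side (the XBlock handler): generate a pre-signed PUT URL.
        return s3.generate_presigned_url(
            "put_object",
            Params={"Bucket": bucket, "Key": key},
            ExpiresIn=expires_in,
        )

    # Client side: PUT the file contents straight to S3, bypassing edX nginx.
    url = get_upload_url("ora2-uploads", "course/block/student/essay.pdf")
    with open("essay.pdf", "rb") as f:
        response = requests.put(url, data=f)
    response.raise_for_status()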
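
On the key-metadata question: S3's ListObjectsV2 returns keys together with their sizes, up to 1,000 per request, filtered by key prefix. So if all uploads for one (student, ORA block) pair share a common prefix, cumulative usage is a single listing call (or a few pages), not one request per file. A minimal sketch; the key layout is a hypothetical assumption.

    import boto3

    s3 = boto3.client("s3")

    def cumulative_upload_bytes(bucket, block_id, student_id):
        # Hypothetical key layout: <block_id>/<student_id>/<filename>
        prefix = f"{block_id}/{student_id}/"
        paginator = s3.get_paginator("list_objects_v2")
        total = 0
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            # Each listed object already carries its size; no per-object HEAD needed.
            total += sum(obj["Size"] for obj in page.get("Contents", []))
        return total

If prefix listing proves efficient enough, the separate metadata model in the second question stays a nice-to-have, as noted above.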


Steps to Reproduce

None

Current Behavior

None

Expected Behavior

None

Reason for Variance

None

Release Notes

None

User Impact Summary

None

Status

Assignee

Andrew Tesker

Reporter

Simon Chen

Labels

Reach

None

Impact

None

Customer

None

Partner Manager

None

URL

None

Contributor Name

None

Groups with Read-Only Access

None

Actual Points

None

Category of Work

None

Stakeholders

None

Story Points

2

Epic Link

Priority

Unset