The VEDA SLA is currently 24 hours. Video encoding is often non-deterministic: a single video can fail encoding once and succeed on the following run without any change to the code or encoding strategy. However, if a video consistently fails to complete, this likely points to a deeper problem with VEDA itself.

Splunk

VEDA is currently logging to Splunk under index=prod-edx-veda.

...

  1. Discovery
  2. Ingest
    1. Enqueue
  3. Encode
  4. Deliver
  5. HEAL

Splunk Alerts

A Splunk alert listing video IDs with uncompleted encodes older than 25 hours is sent to veda-dev@edx.org.
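The alert condition above can be sketched in Python for anyone who wants to reproduce the check by hand. This is only an illustration, not VEDA's or Splunk's actual query: the record layout (dicts with `video_id`, `created`, and `complete` keys) and the function name are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Threshold taken from the alert description: encodes older than 25 hours.
ALERT_THRESHOLD = timedelta(hours=25)


def overdue_video_ids(videos, now=None):
    """Return IDs of videos with incomplete encodes older than the threshold.

    `videos` is a hypothetical iterable of dicts with the keys
    'video_id' (str), 'created' (timezone-aware datetime), and
    'complete' (bool). This layout is assumed, not VEDA's real model.
    """
    now = now or datetime.now(timezone.utc)
    return [
        v["video_id"]
        for v in videos
        if not v["complete"] and now - v["created"] > ALERT_THRESHOLD
    ]
```

Passing `now` explicitly makes the check reproducible when investigating a past alert.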


Runsheets:

I. "A single video isn't completing"

Search for the studio ID in Splunk, or log into the VEDA Django admin and search for the provided studio ID. It should correspond to a VEDA-generated ID that is human-parsable:

...

MOST COMMON: If YouTube is the only encode missing, then there is an issue with the YouTube version of the file. If the file hasn't shown up as a "youtube duplicate", reading the YouTube logs may reveal more. It's possible that the file is a duplicate but the status is not being caught. You can check the logs in Splunk under the prod-edx-veda index to understand where the code broke.
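As a rough aid for this triage step, the comparison between expected and completed encodes might look like the sketch below. The profile names other than "youtube" and both function names are hypothetical; the real list of encode profiles comes from VEDA itself.

```python
def missing_encodes(expected, completed):
    """Return the sorted encode profile names that have not completed."""
    return sorted(set(expected) - set(completed))


def only_youtube_missing(expected, completed):
    """True when 'youtube' is the single encode that has not completed."""
    return missing_encodes(expected, completed) == ["youtube"]


# Hypothetical profile names, for illustration only.
expected = ["desktop_mp4", "hls", "youtube"]
completed = ["desktop_mp4", "hls"]
print(only_youtube_missing(expected, completed))
```

If this returns true, the YouTube-specific checks described above are the place to start; if several encodes are missing, the problem is more likely upstream.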

II. "My whole course isn't completed"

Find an ID associated with this course using the steps above and check to see if the file is uploaded and has a few successful encodes.

...

Reprocess the course once the issue is resolved via HEAL.

III. "Nothing is completing"

Most likely a service is broken and hasn't paged.

...

Ingest and the YouTube callback are the most fragile, so check them first (they both live on the same node, so this is relatively easy). The logs on each machine can provide some context, since a recent code change might be breaking something.

Alerting:

NewRelic

New Relic records are available via the 'veda_production' app on the edX New Relic site. There are currently no push alerts set up.

https://rpm.newrelic.com/accounts/88178/applications/32434455

Email Alerts

An email alert is sent out via AWS SES if a process crashes. This is sent to the list at veda-dev@edx.org.

...