...

  • Make better use of New Relic’s new expected error functionality.

    • See https://openedx.atlassian.net/browse/ARCHBOM-1898

    • Maybe replace/remove our custom functionality for expected errors?

      • Our custom expected-error functionality misses some errors.

        • Use _nr_exc_info in new middleware. See New Relic code.

        • Test on DRF permission failures locally.

    • Also, New Relic agent v7.2.1.168, released on Oct 12, presumably fixes the Ignored Error problem.

      • How much of this functionality do we want to keep in place?

  • A New Relic alert policy that notifies the warroom of rare events that many people might care about:

    • memcached lost all its data?

    • ignored errors warning

  • Add user_id to error message in frontend-platform for login_refresh:

  • Fix/Rethink RequestCustomAttributesMiddleware

    • Note: currently this observability data goes missing for certain exceptions.

    • Most of it applies to any request, so the middleware (or part of it) would probably fit better in edx-django-utils monitoring.

    • Splitting user and authentication monitoring into separate middleware would let the existing middleware move higher in the middleware list, so we don't lose useful data when other middleware raises exceptions.

      • Note: would the auth exception monitoring middleware need to sit higher than the auth middleware? Needs thought.

  • Ideas for deployment metadata in New Relic:

  • Exclude the edx-platform healthcheck from Apdex

  • Proposal: OpsGenie team template with example configurations

  • Improve and track GoCD failures

  • Simplify New Relic onboarding and OneLogin SSO (SRE Support)

  • Simplify alert creation for new services. (SRE Support)

  • Simplify how someone can determine what version of a service is deployed where. (SRE Support)

  • Docs answering common questions about New Relic. (SRE Support)

    • How do I find x in New Relic?

      • Note: the GitHub repo is frontend-app-admin-portal, but in New Relic (NR) it's just prod-edx-portal. Can this be fixed?

    • Why is this red in New Relic?

    • How do I do x in New Relic?

  • Ensure all applications follow best practices and have New Relic configured from the start, rather than waiting until there is a fire. (SRE Support)

  • See Discovery: Recent Deployers (WIP)
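The ordering concern in the RequestCustomAttributesMiddleware item above can be sketched in plain Python. This is a hypothetical illustration, not the edx-platform or edx-django-utils code: it mimics Django's convention that the first entry in the MIDDLEWARE list is the outermost layer, the middleware names are made up, and a module-level dict stands in for the New Relic agent's custom-attribute call. When the attributes middleware sits below a middleware that raises, nothing is recorded; moved above it, the request path survives the exception.

```python
"""Hypothetical sketch: why middleware order matters for observability.

Mimics Django's middleware pattern (each middleware wraps get_response)
without depending on Django or the New Relic agent. All names here are
illustrative, not real edx-platform or edx-django-utils classes.
"""
from typing import Callable, Dict, List

Request = Dict[str, str]
Handler = Callable[[Request], str]
MiddlewareFactory = Callable[[Handler], Handler]

# Stand-in for the New Relic agent: collects attributes for the current request.
recorded_attributes: Dict[str, str] = {}


def request_attributes_middleware(get_response: Handler) -> Handler:
    """Records request-level attributes before delegating inward."""
    def middleware(request: Request) -> str:
        recorded_attributes["request_path"] = request["path"]
        return get_response(request)
    return middleware


def failing_auth_middleware(get_response: Handler) -> Handler:
    """Simulates another middleware that raises before calling inward."""
    def middleware(request: Request) -> str:
        raise PermissionError("simulated auth failure")
    return middleware


def view(request: Request) -> str:
    return "ok"


def build_chain(middleware: List[MiddlewareFactory]) -> Handler:
    """Wraps the view so the first list entry becomes the outermost layer."""
    handler: Handler = view
    for factory in reversed(middleware):
        handler = factory(handler)
    return handler


def run(order: List[MiddlewareFactory]) -> Dict[str, str]:
    """Sends one request through the chain and returns what was recorded."""
    recorded_attributes.clear()
    try:
        build_chain(order)({"path": "/api/example"})
    except PermissionError:
        pass  # swallowed here only so we can inspect what was recorded
    return dict(recorded_attributes)


# Attributes middleware below the failing one: nothing gets recorded.
print(run([failing_auth_middleware, request_attributes_middleware]))
# Attributes middleware above the failing one: the path is recorded.
print(run([request_attributes_middleware, failing_auth_middleware]))
```

Real Django also wraps each layer so that exceptions are converted into error responses, so the details differ, but the ordering point is the same: only middleware that runs before the failure gets to record its attributes.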

Organize Hnycon (Honeycomb) Notes

...