MFE JS Error and Info Logging

Summary

This document explains strategies for cleaning up your team’s MFE JS errors, so that low-noise alerts can notify your team when serious bugs or problems occur.

Introduction

All MFEs log their JS errors to New Relic. Those errors can then be viewed and alerted on within New Relic by selecting “JS Errors” within the desired Browser Application.

Not all JS errors are caused by the Open edX codebase. Here’s a partial list of reasons why a JS error could be logged:

  • Third-party JS throws an error.

  • Authentication errors, due to:

    • network issues

    • rate limiting

    • attempts to access a forbidden resource

    • JWT renewal failed

    • etc.

  • React component download failed

  • Course content throws a JS error

  • Browser plug-in injections into the DOM cause a JS error

  • Malicious user attempts XSS exploit

  • Bug in Open edX MFE React code

  • Browser/iframe re-sizing bugs

  • etc. (the list goes on and on…)

A micro-frontend ultimately becomes JavaScript that operates on the DOM within a user’s browser. While the application server environment is in edX’s control, the client’s browser environment is ultimately controlled by the end user. The implications:

  • JS errors will never be zero (over time).

  • Sometimes a large burst of JS errors will occur that obscures other JS errors.

  • JS error types will change over time - it’s a dynamic picture.

Actual Examples

Example #1

We’ve seen bursty MFE JS errors in frontend-app-learning at certain moments, where the overall frequency of the error greatly outweighs all other errors. For example, here’s a particular segment of frontend-app-learning JS errors:

https://one.nr/0kERzzJqzRr

Screenshot showing large magnitude of ResizeObserver JS errors

 

The “ResizeObserver loop limit exceeded” error overshadowed all other errors during this time, even though the error had no functional impact on the courseware MFE.

Example #2

We’ve also seen user-caused errors, such as the “Blocked a frame with origin "https://learning.edx.org" from accessing a cross-origin frame.” error, shown tailing off here:

https://one.nr/0eqwym0Lxwn

Screenshot showing cross-origin frame errors with high, constant frequency, then tailing off.

 

This error was apparently caused by a user (or users) from a single country probing for XSS vulnerabilities in a single course. While we do want to know when exploits like this one are attempted, such errors again outweigh and obscure other JS errors on which we’d need to take action.

Strategies

So what tools are available to tame the dynamic picture of an MFE’s JS errors?

Fix MFE bugs

An obvious strategy: Look through your MFE’s JS errors at regular intervals and proactively fix both serious and minor errors caused by MFE code when they are logged as JS errors in New Relic. Removing the JS error noise which is under the team’s control will be an all-around help.

Shift JS errors to New Relic page actions

If the MFE has errors on which your team will likely not take immediate action, you can consider not treating them as JS errors at all - while still logging and tracking them as New Relic page actions. If the team chooses to use this strategy, there are a couple of options.

NOTE: All the features discussed below are in frontend-platform v1.10.3+ only. Upgrade your MFE!

Change logError() to logInfo()

JS errors that are caught by MFE code can be sent to New Relic either as JS errors or as page actions, using logError() or logInfo() respectively.

logError

logError() sends the error to New Relic as a JS error. The error then joins all other JS errors counted towards alerts. JS errors can be viewed using the “JS errors” option in the New Relic Browser application’s sidebar, and can be queried via NRQL with queries like:

SELECT uniques(pageUrl) FROM JavaScriptError WHERE errorMessage LIKE 'Script error%' AND appName = 'prod-frontend-app-learning'

logInfo

logInfo() sends the error to New Relic as a page action. The error is then not counted towards error alerting thresholds. The error can then be queried from page actions using NRQL like:

SELECT * FROM PageAction WHERE appName = 'prod-frontend-app-learning'

It’s important to note that an error object can be sent to logInfo() and it will be handled properly. For example, both these calls are possible:

logInfo('Just so you know...');
logInfo(new Error('Unimportant error'));

Custom Attributes

To help filter errors and page actions, both logInfo() and logError() support the addition of custom attributes. The added attributes can:

  • capture more important information about the error or page action

  • enable an alert on a specific error or group of errors that were redirected from errors to page actions

Custom attributes can be added to the calls above. For example, additional fields named what, a, and b passed with those calls will be saved to the page action as top-level fields queryable using NRQL.
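As a minimal sketch of such calls: the snippet below assumes the two-argument form logInfo(message, customAttributes). The attribute names what, a, and b come from the text; their values are made up. makeRecordingLogger is a hypothetical stand-in for the frontend-platform logging function so the example runs outside an MFE; in real code you would import logInfo from frontend-platform instead.

```javascript
// Hypothetical stand-in for frontend-platform's logInfo(), so this sketch
// runs on its own. It records what each call would send as a page action.
function makeRecordingLogger() {
  const calls = [];
  const logInfo = (messageOrError, customAttributes = {}) => {
    // Accept either a plain string or an Error object, as the text describes.
    const message = messageOrError instanceof Error
      ? messageOrError.message
      : String(messageOrError);
    // Custom attributes sit next to the message as top-level fields.
    calls.push({ message, ...customAttributes });
  };
  return { calls, logInfo };
}

const { calls, logInfo } = makeRecordingLogger();

// Attribute names come from the text; the values are illustrative only.
logInfo('Just so you know...', { what: 'course-outline' });
logInfo(new Error('Unimportant error'), { a: 1, b: 'two' });

console.log(calls);
```

With the attributes stored as top-level fields, a NRQL query against PageAction could filter on them directly, for example with a WHERE clause on what.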

Example

Here’s an example of explicitly shifting a JS error to be recorded as a page action. Suppose your MFE code catches a failed request and reports it with logError(). Suppose you then decide that a 403 (forbidden) error response is only happening in certain known circumstances and that you’d prefer not to clutter your JS errors/alerts with that particular error. In this case, you’d change your code to call logInfo() for that case.
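A minimal sketch of that change, under stated assumptions: reportRequestFailure is a hypothetical helper, the error is assumed to expose the HTTP status at error.response.status (the shape Axios uses for failed responses), and the httpStatus attribute name is illustrative. The "before" version is assumed to call logError() unconditionally. The logging functions are injected so the sketch runs on its own; in an MFE they would be the logError and logInfo functions from frontend-platform.

```javascript
// Hypothetical catch handler for a failed request.
function reportRequestFailure(error, logging) {
  // Assumption: the HTTP status, if any, lives at error.response.status.
  const status = error.response ? error.response.status : undefined;
  if (status === 403) {
    // Known, expected case: record as a page action, not a JS error.
    logging.logInfo(error, { httpStatus: status });
  } else {
    // Everything else still counts towards JS error alerts.
    logging.logError(error);
  }
}

// Stub logging functions that record where each error would be sent.
const sent = [];
const stubLogging = {
  logInfo: (err, attrs) => sent.push({ as: 'pageAction', message: err.message, ...attrs }),
  logError: (err) => sent.push({ as: 'jsError', message: err.message }),
};

const forbidden = Object.assign(new Error('Request failed with status code 403'), {
  response: { status: 403 },
});
const serverError = Object.assign(new Error('Request failed with status code 500'), {
  response: { status: 500 },
});

reportRequestFailure(forbidden, stubLogging);
reportRequestFailure(serverError, stubLogging);

console.log(sent.map((entry) => entry.as)); // [ 'pageAction', 'jsError' ]
```

Branching on the status keeps the unexpected failures visible as JS errors while the known 403 case remains queryable as a page action.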

Use IGNORED_ERROR_REGEX

Some JS errors originate in other parts of the browser code and are caught generically and sent to New Relic via logError(). In this case, it’s not possible to direct the error to New Relic as a page action using logInfo(). However, a new feature was recently added to frontend-platform to enable this redirection.

A new optional config parameter called IGNORED_ERROR_REGEX is now available in frontend-platform. If this new parameter is defined:

  • Whenever logError() is called, the error’s message is checked against the regular expression defined in IGNORED_ERROR_REGEX.

  • If the error’s message matches the regular expression, the error is sent to New Relic as a page action instead of a JS error.

This redirection via regular expression matching is a powerful way to filter errors between JS errors and page actions.
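The two steps above can be modeled roughly as follows. This is a simplified sketch, not the frontend-platform source: buildErrorRouter and its return values are hypothetical names, the example pattern is invented, and unanchored, case-sensitive matching is an assumption.

```javascript
// Simplified model of the redirection described above.
function buildErrorRouter(ignoredErrorRegex) {
  // When the parameter is not defined, nothing is redirected.
  const pattern = ignoredErrorRegex ? new RegExp(ignoredErrorRegex) : null;
  return function route(errorOrMessage) {
    const message = errorOrMessage instanceof Error
      ? errorOrMessage.message
      : String(errorOrMessage);
    // A match means "page action"; anything else stays a JS error.
    return pattern && pattern.test(message) ? 'pageAction' : 'jsError';
  };
}

// Invented pattern with two alternatives, for illustration only.
const route = buildErrorRouter('^Known noisy error|non-critical error #[\\d]+');

const routedNoisy = route(new Error('Known noisy error in widget'));
const routedNumbered = route('Specific non-critical error #42');
const routedOther = route('TypeError: x is undefined');

console.log(routedNoisy, routedNumbered, routedOther);
// pageAction pageAction jsError
```

Note that backslashes must be doubled when the pattern is written as a JavaScript string literal, as in the example pattern.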

NOTE: Some JS errors are thrown outside the React error boundary - and do not flow through the frontend-platform New Relic logging code. For example, Tag Manager script errors and 3rd party generic “Script error.” errors are thrown outside of React. Adding regular expressions for these errors will not redirect them to page actions. If this becomes an issue, we can attempt an enhancement using https://docs.newrelic.com/docs/browser/new-relic-browser/browser-agent-spa-api/set-error-handler/.
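One possible shape for that enhancement is sketched below; it is not something frontend-platform is known to do today. The sketch relies on the New Relic Browser agent's setErrorHandler(), whose callback returns true to make the agent ignore an error, and on addPageAction(). installIgnoredErrorHandler and the 'IGNORED_ERROR' action name are hypothetical, and the agent is injected (it would be window.newrelic in a browser) so the example runs on its own.

```javascript
// Filter errors that never reach frontend-platform's logError().
function installIgnoredErrorHandler(agent, ignoredErrorRegex) {
  const pattern = new RegExp(ignoredErrorRegex);
  agent.setErrorHandler((err) => {
    const message = err && err.message ? err.message : String(err);
    if (!pattern.test(message)) {
      return false; // not ignored: the agent records it as a JS error
    }
    // Keep a trace of the ignored error as a page action.
    agent.addPageAction('IGNORED_ERROR', { message });
    return true; // ignored: the agent does not record a JS error
  });
}

// Fake agent that captures the handler and any page actions.
const fakeAgent = {
  handler: null,
  pageActions: [],
  setErrorHandler(fn) { this.handler = fn; },
  addPageAction(name, attributes) { this.pageActions.push({ name, ...attributes }); },
};

installIgnoredErrorHandler(fakeAgent, '^Script error');
const scriptErrorIgnored = fakeAgent.handler(new Error('Script error.'));
const realBugIgnored = fakeAgent.handler(new Error('x is not a function'));

console.log(scriptErrorIgnored, realBugIgnored); // true false
```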

Examples

ResizeObserver

When rolling out the new courseware MFE, a handful of courses emitted a large volume of these errors:

ResizeObserver loop limit exceeded

We’ve since determined that the errors are caused by underlying code and have no lasting functional impact. So an IGNORED_ERROR_REGEX configuration has been added to edx-internal that directs all errors matching that message to page actions instead of JS errors.

Other examples, as well as how to test your regular expressions, can be found in this docstring.
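As a quick local check before deploying a pattern, a candidate value can be run against sample messages. This is a minimal sketch: messagesMatching is a hypothetical helper, the candidate pattern is simply the error message quoted above (the actual edx-internal value is not reproduced here), and unanchored, case-sensitive matching is an assumption.

```javascript
// Return the sample messages that a candidate IGNORED_ERROR_REGEX value
// would redirect to page actions.
function messagesMatching(ignoredErrorRegex, messages) {
  const pattern = new RegExp(ignoredErrorRegex);
  return messages.filter((message) => pattern.test(message));
}

const candidate = 'ResizeObserver loop limit exceeded';
const samples = [
  'ResizeObserver loop limit exceeded',
  'Blocked a frame with origin "https://learning.edx.org" from accessing a cross-origin frame.',
  "Cannot read property 'foo' of undefined",
];

const redirected = messagesMatching(candidate, samples);
console.log(redirected); // [ 'ResizeObserver loop limit exceeded' ]
```

Including messages that must not match in the sample list helps catch a pattern that is broader than intended.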

Axios Errors

Axios is a Promise-based HTTP client for the browser. The frontend-platform code uses it to make various HTTP/S requests. Axios errors that occur are logged using logError() and are a source of some noise in an MFE’s JS errors. If an MFE owner decides they’d rather handle and alert on the Axios errors related to connectivity issues using page actions, they could do so by simply using this regular expression:

and then using NRQL alerts on the page actions with Axios Error messages.

Consider Adding Page Action Alerts

If the errors redirected by the strategies above could still indicate an exploit or some other problem, consider adding New Relic alerts on those page actions. To create a page action alert, make your NRQL query in the “Query builder”, then select “Create alert condition” from the top right menu.

From there, you can set warning/critical thresholds, associate the alert with a policy, and save/activate the alert.