Page MenuHomePhabricator

Set up and test Sentry on Labs for JS error logging
Closed, ResolvedPublic

Description

Per T525, Sentry seems the most mature FOSS error logging software. We should set up a test instance - as unlikely as it is to scale up to the traffic of WMF sites, it would spare us a lot of development time if it still does. And even if it does not, it's commercial-quality error logging code and reviewing it will help us figure out how to build our own error logger.

Also, while we are at it, we could probably try to use it to log PHP errors (and whatever else there is) as well.

Related Objects

StatusSubtypeAssignedTask
DeclinedNone
ResolvedTgr
Resolved Gilles
OpenNone
DeclinedNone
DeclinedTgr
ResolvedTgr
ResolvedTgr
Resolved csteipp
ResolvedTgr
ResolvedTgr
ResolvedAklapper
ResolvedTgr
ResolvedTgr
Resolved jlinehan
ResolvedTgr
DeclinedTgr
DeclinedTgr
DeclinedTgr
ResolvedTgr
DeclinedTgr
ResolvedTgr
ResolvedTgr
ResolvedKrinkle
DeclinedNone
OpenNone
ResolvedTgr
DeclinedNone
DeclinedNone
InvalidNone
DeclinedTgr
ResolvedTgr
Resolvedjcrespo
ResolvedTgr

Event Timeline

Tgr raised the priority of this task from to Medium.
Tgr updated the task description. (Show Details)
Tgr added a project: Multimedia.
Tgr changed Security from none to None.
Tgr added subscribers: Tgr, He7d3r, Gilles.

@Legoktm mentioned that there's already an unused Sentry server running on labs.

For other uses of Sentry, see T70820.

I set up raven.js on http://multimedia-alpha.wmflabs.org/ to log to the Sentry instance Bryan set up earlier: http://sentry-beta.wmflabs.org/jserrors/jserrors/ (I'll push the integration code to mediawiki/extensions/Sentry once it is created).

Sentry uses a js logger called Raven, which is basically TraceKit (an error catching/browser compatibility library) plus remote logging plus a plugin system.

TraceKit's browser compatibility seems decent:

Supports:

  • Firefox: full stack trace with line numbers, plus column number on top frame; column number is not guaranteed
  • Opera: full stack trace with line and column numbers
  • Chrome: full stack trace with line and column numbers
  • Safari: line and column number for the top frame only; some frames may be missing, and column number is not guaranteed
  • IE: line and column number for the top frame only; some frames may be missing, and column number is not guaranteed

In theory, TraceKit should work on all of the following versions:

  • IE5.5+ (only 8.0 tested)
  • Firefox 0.9+ (only 3.5+ tested)
  • Opera 7+ (only 10.50 tested; versions 9 and earlier may require Exceptions Have Stacktrace to be enabled in opera:config)
  • Safari 3+ (only 4+ tested)
  • Chrome 1+ (only 5+ tested)
  • Konqueror 3.5+ (untested)

Raven adds support for automatic wrapping of native funcions (like setTimeout) and jQuery event and AJAX handlers. Generates event ids (on the client side, without any deduping). Client-side throttling/samplig is supported via callbacks. The error reporting mechanism is not configurable but is done via an invisible pixel integration with Varnish should be possible. There are all sorts of nice features for adding additional information.

Sentry claims to support source maps. There is a Phabricator plugin for easy ticket creation.

Did some dummy testing: exception and custom message reporting works; onerror logging works but seems flawed (the message is not logged). Stack traces don't show up for either case. This could just be a side effect of me triggering the errors from the debug console - need to test with more realistic errors. Deduping should also be tested (works for the trivial case of generating the same error in the same browser multiple time, but should be checked for different browsers / languages).

On the whole, seems promising for small MediaWiki installations, assuming stacktraces work. I doubt it scales to WMF traffic but maybe that can be handled by channelling through Varnish and logstash and deduping there.

In T1345#847274, @Tgr wrote:

I doubt it scales to WMF traffic

It does, but with a very different server-side setup than the default one. They provide guidelines on how to make it run on a very large traffic environments. The default setup should be fine for beta, though.

As for deduping, they don't do much there but the little they do is more than the alternatives. According to them, nobody has bothered solving the deduping issue at large (eg. across languages), be it open or closed source.

Tested by adding a nonexistent function call to a MMV file; the results are not so great so far.

Change 179987 had a related patch set uploaded (by Gergő Tisza):
Initial commit

https://gerrit.wikimedia.org/r/179987

Patch-For-Review

Judging from the comments in https://github.com/getsentry/sentry/issues/889 Sentry dedupes strictly based on the stack trace, so it should be able to merge errors across browsers/languages.

TraceKit by the way seems abandoned, with no contributions in 2014 and key pull requests like support for exceptions in Chrome window.onerror pending since several months.

Raven.js has a forked version of TraceKit, and that seems well-maintained.

In T1345#848946, @Tgr wrote:

This is apparently the interplay of three different bugs. Reported upstream:
https://github.com/getsentry/raven-js/issues/300
https://github.com/getsentry/sentry/issues/1334

Logging of errors which happen on clicking on a link and which do not block navigation seems to suffer from the same problem as EventLogging for such clicks (T44815). Will be solved eventually via https://github.com/getsentry/raven-js/issues/293

Wrapping all JS modules via Raven.wrap (i.e. not relying on window.onerror) makes error logging completely reliable (it still misidentifies the file though). Doing that might require a patch to ResourceLoader, and it would mean that raven.js is loaded as a startup module, blocking page rendering (that's about 25K which is a 14% increase to the startup module size).

To sum it up, blockers for using Sentry:

  • wrap scripts in error reporting code (T513) if it is deemed acceptable (the code is already provided by raven.js)
  • source maps for non-debug mode (T47514)
  • raven.js#300 (could be fixed in at least two different ways, one of which sounds trivial)
  • figure out scaling

None of those seem extremely hard to solve, while redeveloping the same functionality would be a huge effort (just the browser compatibility logic in TraceKit seems like it would take months to figure out) so this should definitely be the way to go, even if we don't consider the value of possibly tracking all kind of client- and server-side errors in the same application.

I guess the next step would be to ask Roan or Timo about the ResourceLoader changes, and then get the extension deployed on the beta cluster? We could more easily test issues like CORS and handling of multiple wikis there.

In T1345#849752, @Tgr wrote:

Wrapping all JS modules via Raven.wrap (i.e. not relying on window.onerror) makes error logging completely reliable

Indeed, the sentry guys told me that it's the only way to have reliable reporting, hitting onerror should be avoided since it provides insufficient/unreliable information. This isn't specific to their library, but a larger issue with onerror.

You should join Sentry on freenode, they're very responsive there if you have any questions.

Indeed, the sentry guys told me that it's the only way to have reliable reporting, hitting onerror should be avoided since it provides insufficient/unreliable information. This isn't specific to their library, but a larger issue with onerror.

onerror should be reliable on modern browsers, the problem with it is that older ones don't provide the stack trace. TraceKit does something complicated for IE8 compatibility, with rethrowing exceptions and using timeout to see if they hit onerror, and I think that messes up logging on Chrome.

You should join Sentry on freenode, they're very responsive there if you have any questions.

Yeah, I did (and indeed they are quick to answer) but it's still nice to try things out.

Change 180309 had a related patch set uploaded (by Gergő Tisza):
Add jobs for Sentry

https://gerrit.wikimedia.org/r/180309

Patch-For-Review

Change 180309 merged by jenkins-bot:
Add jobs for Sentry

https://gerrit.wikimedia.org/r/180309

Sentry is now live on the beta cluster and collecting JS errors (well, some of them - it will be more useful once T78809 is merged). Pinging WMF-Legal so they are aware as this includes private visitor data (IP addresses, user agents). The (temporary) sentry instance is at http://sentry-beta.wmflabs.org/jserrors/jserrors/ ; for now access is given manually, and only to WMF employees and volunteers with NDAs. Ping me if interested.

Aklapper removed Tgr as the assignee of this task.Jun 19 2020, 4:27 PM

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips how to manage individual work in Phabricator (noisy notifications, lists of task, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the records, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)

Can this be declined?
We now have client error logging.

This has been done a long time ago (and undone later), as described in T1345#1022228. Not sure why I left the task open.

Tgr claimed this task.