Performance review for PageNotice extension
Open, Needs TriagePublic
Actions

Assigned To

Authored By

	TTO
	Jan 30 2024, 8:34 AM

Description

@Ladsgroup intends to do a performance review to make sure the additional messages that the extension loads do not adversely affect performance.

This review blocks further deployment of the extension.

Related Objects
Search...

Status	Subtype	Assigned	Task
Open		None	T151658 Add some new MediaWiki system messages inside #mw-content-text
Open		None	T138652 [Story] Add a new MediaWiki system message at the footer of content pages
Open		None	T151682 Add a new MediaWiki system message as a content header inside #mw-content-text
In Progress	Feature	None	T6469 Provide per-namespace site notices
Stalled		None	T61245 Review the PageNotice extension for deployment
Open		Ladsgroup	T356159 Performance review for PageNotice extension

Event Timeline

TTO created this task.Jan 30 2024, 8:34 AM

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJan 30 2024, 8:34 AM

I hope it is okay to directly assign this task to you, @Ladsgroup .

Sure!

SD0001 subscribed.Feb 1 2024, 8:08 AM

Generally, looks good. I think I found one potential issue:

Maybe I'm missing something super obvious but it looks like Message::parse, doesn't cache the parsed result. It caches the value but parsing is expensive on its own. Ironically, MessageCache::parse is the part that's missing the cache. Maybe it puts them in PC? I haven't seen that and still making db calls is expensive, unless it's backed by memcached somewhere in the parser. I think I need to run this in debug and look.
- If true, it's not fault of this extension but needs to be fixed regardless.

One note: This really should be in core. It'd be great if just that fifty lines moved to core and undeploy this extension.

Yup, the wikitext of the message gets cached by not parsing it... https://performance.wikimedia.org/excimer/profile/4cef32d5fdd53209

Jdforrester-WMF subscribed.Feb 1 2024, 6:23 PM

@Ladsgroup thanks for taking a look.

In T356159#9506670, @Ladsgroup wrote:

it looks like Message::parse, doesn't cache the parsed result.

If templates or modules are called from the PageNotice message, the parsed result will vary per page, so it would make sense to cache the parsed result in the same place the page content itself is cached. I don't know what that place is, or how to do it, though.

In T356159#9506670, @Ladsgroup wrote:

If true, it's not fault of this extension but needs to be fixed regardless.

Anything I can do to assist?

In T356159#9508281, @TTO wrote:

@Ladsgroup thanks for taking a look.

In T356159#9506670, @Ladsgroup wrote:

it looks like Message::parse, doesn't cache the parsed result.

If templates or modules are called from the PageNotice message, the parsed result will vary per page, so it would make sense to cache the parsed result in the same place the page content itself is cached. I don't know what that place is, or how to do it, though.

It should be in ParserCache. This should be basically and parser cache value. The same way language links are. I think it's called extension options. @daniel would know for sure

In T356159#9506670, @Ladsgroup wrote:

If true, it's not fault of this extension but needs to be fixed regardless.

Anything I can do to assist?

Parsing is...complicated. I'd say I'll have to bother CTT on this. Give me a bit. This is more work than usual. We can unblock this extension by simply putting that into PC but for other cases, I'll need to find a solution.

In T356159#9508915, @Ladsgroup wrote:

It should be in ParserCache. This should be basically and parser cache value. The same way language links are. I think it's called extension options. @daniel would know for sure

You can attach data to the ParserOutput using the setExtensionData() emthod. The data can be a nested array, but must not contain objects.

But this must happen before the output gets written to the parser cache, probably in ParserAfterParseHook. It is currently not easily possible to update a ParserCache entry once it has been saved.

In T356159#9509041, @daniel wrote:

In T356159#9508915, @Ladsgroup wrote:

It should be in ParserCache. This should be basically and parser cache value. The same way language links are. I think it's called extension options. @daniel would know for sure

You can attach data to the ParserOutput using the setExtensionData() emthod. The data can be a nested array, but must not contain objects.

Yup, that was it.

But this must happen before the output gets written to the parser cache, probably in ParserAfterParseHook.

Yeah, the deployment is blocked unless this happens. Someone needs to do it but not me.

It is currently not easily possible to update a ParserCache entry once it has been saved.

That's an acceptable compromise for the sake of performance

TTO mentioned this in T61245: Review the PageNotice extension for deployment.Feb 9 2024, 2:49 AM

Novem_Linguae subscribed.Feb 9 2024, 2:55 AM

@daniel @Ladsgroup Does this mean we should switch PageNotices to be in the page content language? Not sure if it's possible to put userlang stuff in PC. Right now they are rendered in user language but should be fine to switch to content language as it's a new feature.

Yeah it should be page language. I don't think it's reasonable to make this be parsed on the fly on every page view :(

In T356159#9527878, @SD0001 wrote:

@daniel @Ladsgroup Does this mean we should switch PageNotices to be in the page content language? Not sure if it's possible to put userlang stuff in PC. Right now they are rendered in user language but should be fine to switch to content language as it's a new feature.

It's possibel to split the parser cache by user language. All you have to do is call getUserLangObj() on the ParserOptions object. But we'd be duplicating the actual page content in the cache... For a small-ish number of pages, that would be ok I guess.

This extension is going to be showing a notice over every page in the wiki (mw message per ns)

The part of PageNotice extension that provides page-specific notices is not planned to be used in WMF prod, but only the namespace-level notices. They can use variables like {{PAGENAME}} so a cache strategy like that of core sitenotices (which are parsed once and reused across all pages) doesn't work. @daniel @Ladsgroup Can you confirm if parser caching is still workable? I am at a loss to understand how cache invalidation would work:

When a namespace notice is updated, we can't purge the PC of all pages in the namespace as that could be in the millions.
We could instead store the rev_id of the notice in the page's PC, and in the ArticleViewHeaderHook, check if the cached rev_id is still the latest. If not, ignore the cached notice and parse the notice now. In a deferred update, purge the page so that a cached notice would be available the next time.
- However, this only works for logged-in views. For anonymous views, the entire HTML which includes the notices are served from the CDN cache, so users would see stale notices on pages which haven't been edited/purged after the notice update.

In T356159#9529575, @SD0001 wrote:

The part of PageNotice extension that provides page-specific notices is not planned to be used in WMF prod, but only the namespace-level notices. They can use variables like {{PAGENAME}} so a cache strategy like that of core sitenotices (which are parsed once and reused across all pages) doesn't work. @daniel @Ladsgroup Can you confirm if parser caching is still workable? I am at a loss to understand how cache invalidation would work:

When a namespace notice is updated, we can't purge the PC of all pages in the namespace as that could be in the millions.

Yeah exactly, the idea is that we just don't and let it get regenerated with refreshlinks/edits/expiry. I don't think page notices are time sensitive (similar to templates). I could be wrong though.

Parsing the message in every pageview (status quo) is also quite taxing and that's why needs to be addressed.

We could instead store the rev_id of the notice in the page's PC, and in the ArticleViewHeaderHook, check if the cached rev_id is still the latest. If not, ignore the cached notice and parse the notice now. In a deferred update, purge the page so that a cached notice would be available the next time.

However, this only works for logged-in views. For anonymous views, the entire HTML which includes the notices are served from the CDN cache, so users would see stale notices on pages which haven't been edited/purged after the notice update.

That could work but templates used in the page notice won't trigger an update but maybe an acceptable loss.

In T356159#9533353, @Ladsgroup wrote:

I don't think page notices are time sensitive (similar to templates).

I don't think they are either. What is the expiry duration of the parser cache?

In T356159#9537149, @TTO wrote:

I don't think they are either. What is the expiry duration of the parser cache?

Three weeks, I think.

Very annoying when the notice is wrong for some reasons.

Can we take a couple of steps back?

This is injecting a system message into the page output, depending on the namespace.
We inject dozens of system messages in the user's language into the output on every request to build the skin.

I don't see why this is a problem, or why it needs caching. I mean, if the message is slow to parse, then we'd run into problems.

I am not very familiar with MessageCache. I was assuming that it would also cache parsed messages. But i'm not seeting the relevant code. Perhaps @Nikerabbit or @abi_ can shed some light.

daniel added subscribers: abi_, Nikerabbit.Feb 13 2024, 11:26 AM

In T356159#9533353, @Ladsgroup wrote:

Yeah exactly, the idea is that we just don't and let it get regenerated with refreshlinks/edits/expiry. I don't think page notices are time sensitive (similar to templates). I could be wrong though.

Well, when a namespace notice is created (or updated), one would expect to see it on all pages – not just on newly edited ones. I think it will be very confusing otherwise.

In T356159#9537164, @daniel wrote:

This is injecting a system message into the page output, depending on the namespace.
We inject dozens of system messages in the user's language into the output on every request to build the skin.

I don't see why this is a problem, or why it needs caching.

I concur with this, although Ladsgroup seems to disagree :) We already have a couple of system messages that act like page notices: Scribunto-doc-page-show (shown on module pages) and Clearyourcache (shown on user/site level CSS/JS/JSON pages), that wikis customise heavily. I understand those pages are back-areas, and other namespaces like ns0 get a lot more traffic, but most of those hits would be anonymous and served from Varnish. I would prefer to see some data showing performance impact before we make premature optimizations. It seems clear now that using the parser cache would be a hacky solution.

In T356159#9537164, @daniel wrote:

Can we take a couple of steps back?

This is injecting a system message into the page output, depending on the namespace.
We inject dozens of system messages in the user's language into the output on every request to build the skin.

They are from software and thus don't have complex templates with lots of lookups and such, and also should be cached (to the degree possible) but currently aren't.

I don't see why this is a problem, or why it needs caching. I mean, if the message is slow to parse, then we'd run into problems.

This message would come from user input. The chance of users not knowing the impact of this and accidentally bringing down the site is quite high.

I am not very familiar with MessageCache. I was assuming that it would also cache parsed messages. But i'm not seeting the relevant code. Perhaps @Nikerabbit or @abi_ can shed some light.

Nope, I checked multiple times, we cache the content of message but not the parsed value. We should :D (in cases that it doesn't involve fancy stuff like user lang etc.)

They can use variables like {{PAGENAME}} so a cache strategy like that of core sitenotices (which are parsed once and reused across all pages) doesn't work.

Note: Since we can use such variables, some wikis may configure it more heavily once it is available. One example (copied from T138652) is to automatically show content from Wikidata (e.g. the table for authority control), so it will have completely different (and not cheap to get) content in different pages.

In T356159#9527878, @SD0001 wrote:

@daniel @Ladsgroup Does this mean we should switch PageNotices to be in the page content language? Not sure if it's possible to put userlang stuff in PC. Right now they are rendered in user language but should be fine to switch to content language as it's a new feature.

I think content language actually even makes more sense from a user point of view. Since this is some kind of content, maintained by wiki administrators, they probably won’t want to maintain multiple subpages. On monolingual wikis, admins won’t bother maintaining it in any language other than the content language. On multilingual wikis, probably an {{int:lang}}-based hack will be used instead of duplicating the whole notice in multiple languages – the extension should be prepared for the case that the message content splits the cache by language, but it doesn’t have to worry about the user language on its own.

One thing to consider is that this is page notice (header and footer) and not site notice. So it makes sense to be dependent on the page. Whether it's authority control data or the message's language.

If there is a need for a general purpose message, that should go to the sitenotice.

In T356159#9539031, @Ladsgroup wrote:

So it makes sense to be dependent on the page. Whether it's […] the message's language.

MediaWiki has three notions of languages:

Content language – this is defined in LocalSettings.php and the same for all users and on all pages.
Interface language – this defaults to the content language but can be changed in the preferences or using the uselang URL parameter.
Page language – this also defaults to the content language, can be changed with Special:PageLanguage or by hooks. It’s sometimes changed to the interface language (e.g. on special pages), though this is not well supported and can lead to cache pollution.

Language category	Specific to	Accessible in wikitext with
Content	wiki	`{{CONTENTLANGUAGE}}` (supported)
Interface	user (mostly)	`{{int:lang}}` (hack, only if set up, e.g. on Commons; splits the parser cache by interface language if used)
Page	page (ideally)	`{{PAGELANGUAGE}}` (supported)

Currently the extension uses the interface language, which is dependent on the user, not the page. I propose that it uses the content language, and if a notice needs to do things differently based on the page language – which won’t be common I think –, it can use {{PAGELANGUAGE}} (of course, the notice needs to be parsed in the context of the page).

Exactly, that's what I'm saying :)

Then it’s okay. Your message sounded like you want to contradict me.

Ivi104 subscribed.Sep 5 2024, 8:12 PM

	F41740410: grafik.png
	Feb 1 2024, 4:29 PM

Performance review for PageNotice extensionOpen, Needs TriagePublicActions

Description

Related ObjectsSearch...

Event Timeline

Performance review for PageNotice extension
Open, Needs TriagePublic
Actions

Related Objects
Search...