https://en.wikipedia.org/w/index.php?title=N%C3%A9lida_Sifuentes_Cueto&action=edit&oldid=678515032
was created with an unnecessary segment span left in the wikitext:
<span class="cx-segment cx-highlight" data-segmentid="78"></span>
To measure how often this happens, I requested a new AbuseFilter on the English Wikipedia: https://en.wikipedia.org/wiki/Wikipedia:Edit_filter/Requested#Pages_created_by_ContentTranslation_with_unnecessary_syntax
OK, the AbuseFilter now works and you can see the results here:
https://en.wikipedia.org/w/index.php?title=Special:AbuseLog&wpSearchFilter=765
From analyzing some pages that had these unnecessary tags over the last few days, I can see that they tend to appear at the beginning or the end of a paragraph, but I don't know whether that is always the case.
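A check of the kind the AbuseFilter performs can be sketched in JavaScript. The actual conditions of filter 765 are not shown in this task, so findLeftoverSegments below is a hypothetical helper, not the filter's rule: it scans wikitext for span tags whose class list contains the cx-segment token and records each one's data-segmentid, whether it is empty, and its character offset. The offset is what one could use to check whether leftovers really cluster at paragraph boundaries.

```javascript
// Hypothetical helper, NOT the rule used by AbuseFilter 765: it only
// illustrates the kind of pattern such a check looks for in wikitext.
function findLeftoverSegments(wikitext) {
  const results = [];
  // Match every opening <span ...> tag and capture its attribute string.
  const openTag = /<span\b([^>]*)>/g;
  let match;
  while ((match = openTag.exec(wikitext)) !== null) {
    const attrs = match[1];
    // Keep only spans whose class list contains the cx-segment token.
    const classMatch = /(?:^|\s)class="([^"]*)"/.exec(attrs);
    if (!classMatch || !classMatch[1].split(/\s+/).includes('cx-segment')) {
      continue;
    }
    const idMatch = /(?:^|\s)data-segmentid="([^"]*)"/.exec(attrs);
    // Text immediately following the opening tag.
    const afterTag = wikitext.slice(openTag.lastIndex);
    results.push({
      segmentId: idMatch ? idMatch[1] : null,
      // "Empty" means the closing tag follows the opening tag directly.
      empty: afterTag.startsWith('</span>'),
      offset: match.index,
    });
  }
  return results;
}

// Usage: the leftover span from the first example in this task.
const sample = 'Text.<span class="cx-segment cx-highlight" data-segmentid="78"></span>';
console.log(findLeftoverSegments(sample));
// → one entry: segmentId '78', empty: true, offset 5
```

Matching on the class token rather than on the exact attribute string means both forms seen in this task (with and without cx-highlight) are caught.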
Change 294030 had a related patch set uploaded (by Santhosh):
Improve the HTML preparation before publishing
This is still happening: https://en.wikipedia.org/w/index.php?title=%C3%89lizabeth_Teissier&action=edit&oldid=726956309
The translation of this article definitely began after the deployment of https://gerrit.wikimedia.org/r/#/c/294030/ .
You can find more examples at https://en.wikipedia.org/w/index.php?title=Special:AbuseLog&wpSearchFilter=765 .
cx-segment elements are removed by replacing them with their HTML content:
$section.find( '.cx-segment' ).replaceWith( function () { return $( this ).html(); } );
Recently, when I find articles that have the <span class="cx-segment"> after publishing, the spans usually appear at the ends of paragraphs and are empty. Perhaps replaceWith() does nothing when the returned HTML is empty?
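The intended behaviour, replacing each segment wrapper with its own children so that an empty wrapper simply disappears, can be sketched on a simplified node model. This is not the ContentTranslation code and not the fix in any patch: the plain-object node shape and the unwrapByClass and serialize helpers are hypothetical, chosen only so the example runs without a browser DOM.

```javascript
// Hypothetical, simplified stand-in for DOM nodes:
//   { type: 'text', text: '...' }
//   { type: 'element', tag: 'span', classes: ['cx-segment'], children: [...] }

function hasClass(node, cls) {
  return node.type === 'element' && (node.classes || []).includes(cls);
}

// Replace every element carrying `cls` with its (recursively processed)
// children. An empty wrapper contributes zero children, so it vanishes
// instead of being left behind in the output.
function unwrapByClass(nodes, cls) {
  const out = [];
  for (const node of nodes) {
    if (node.type !== 'element') {
      out.push(node);
      continue;
    }
    const children = unwrapByClass(node.children || [], cls);
    if (hasClass(node, cls)) {
      out.push(...children);
    } else {
      out.push({ ...node, children });
    }
  }
  return out;
}

// Minimal serializer so the result can be inspected as HTML.
function serialize(nodes) {
  return nodes.map((n) => {
    if (n.type === 'text') return n.text;
    const cls = n.classes && n.classes.length ? ` class="${n.classes.join(' ')}"` : '';
    return `<${n.tag}${cls}>${serialize(n.children || [])}</${n.tag}>`;
  }).join('');
}

// Usage: a paragraph ending in an empty segment span.
const paragraph = [{
  type: 'element', tag: 'p', children: [
    { type: 'element', tag: 'span', classes: ['cx-segment'], children: [{ type: 'text', text: 'First sentence.' }] },
    { type: 'text', text: ' ' },
    { type: 'element', tag: 'span', classes: ['cx-segment', 'cx-highlight'], children: [] },
  ],
}];
console.log(serialize(unwrapByClass(paragraph, 'cx-segment')));
// → <p>First sentence. </p>
```

Working on children rather than on an HTML string avoids depending on how an empty replacement string is handled.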
Another curiosity, which may or may not be related: if you open https://en.wikipedia.org/w/index.php?title=%C3%89lizabeth_Teissier&action=edit&oldid=726956309 , preview the page and inspect the second paragraph, you will see that a <span class="cx-segment" data-segmentid="87"></span> element is present in the wikitext but absent from the DOM of the previewed content.
Change 296216 had a related patch set uploaded (by Santhosh):
Really remove empty segment spans
I cannot see any new occurrences of this on the English Wikipedia during the last week, so this appears to be resolved.
This appears to be happening again:
https://en.wikipedia.org/w/index.php?title=University_of_Dhaka&diff=prev&oldid=880832213
The above may have been copied from some version of:
Also happened here:
https://en.wikipedia.org/w/index.php?title=Batalo&diff=prev&oldid=844455310
and here:
https://en.wikipedia.org/w/index.php?title=Miyu_Kubota&diff=prev&oldid=852629666
This article was not created using CX (if it had been, it would have CX-related tags, a different edit summary, etc.). So perhaps the user copied the content from the CX editor and pasted it into VE in another browser tab? That would be a case of T220495: Content copied from Content Translation into Visual Editor exposes internal attributes
@santhosh: Hi! This task was assigned to you a while ago. Do you still plan to work on this task? Thanks! :)
This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!
For tips on how to manage individual work in Phabricator (noisy notifications, lists of tasks, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the records, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)