Parent task for the New Pages Feed work to allow pages in the feed to be filtered by an automated prediction of their likelihood to contain copyright violations ("copyvio").
Allowing pages to be filtered by copyvio prediction is one of three major work items for T193782. Please see that epic or this task's subtasks for more information.
The following are acceptance criteria for this copyvio work:
- All the criteria below apply equally to the NPP and AfC sides of the New Pages Feed.
- All pages in the New Pages Feed that are in the Article and Draft spaces should be scanned according to the rules of CopyPatrol. User space pages do not need to be scanned. If CopyPatrol's rules exclude a page from scanning (for instance, because it is too small), that is acceptable.
- When this feature is put into production, the several thousand legacy pages that are in the New Pages Feed at that time should also have scan results. This is ticketed in T203207.
- When CopyPatrol identifies any diff as having over 50% copyvio (which means the diff appears in the CopyPatrol interface), the page that diff belongs to should have an indicator in the New Pages Feed that says "Copyvio". This applies whether the diff is the initial revision of the page or a subsequent edit.
- That "Copyvio" indicator should be a bold blue link alongside "Potential issues:" as shown in the image below. If there are other issues found by the ORES draftquality model, those should be listed first and separated with a dot. The implementation shown in that image is the intended evolution of the original mockup on T202161.
- That bold blue link should open a new tab with a permanent-link CopyPatrol page that lists all the CopyPatrol results for revisions of that page. The ticket for testing this is T203120.
- "Copyvio" should be added as a checkbox option under a "Potential issues" heading in the "Set filters" menu of the New Pages Feed. In the checkbox list, "Copyvio" should come second to last, immediately before "None". It should also be listed in the same way as the ORES issues in the "Showing:" component of the interface above the "Set filters" menu.
- The ORES and copyvio "Potential issues" filters should be combined with a logical OR. For instance, if a user selects "Vandalism", "Attack", and "Copyvio", the feed should list all pages that have any of those three potential issues.
- Copyvio checking should populate the New Pages Feed in under a minute. If it takes longer than a minute, we will need to consider a way to indicate which pages have not been checked yet. This is being investigated in T202914.
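
As a rough illustration of the indicator and filter criteria above, the following Python sketch models the "over 50%" rule, the ordering of issues, and the OR behavior of the filter. It is not the New Pages Feed implementation: `FeedPage`, `diff_copyvio_scores`, `potential_issues`, `format_issues`, and `filter_feed` are all hypothetical names, the scores are assumed to be fractions between 0 and 1, and the " · " separator is only a guess at the "dot" mentioned above. The "None" filter option is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# "over 50% copyvio" from the criteria above, expressed as a fraction (assumption).
COPYVIO_THRESHOLD = 0.5


@dataclass
class FeedPage:
    """Hypothetical stand-in for a page in the New Pages Feed."""
    title: str
    # Issues reported by the ORES draftquality model, e.g. ["Vandalism"].
    ores_issues: list[str] = field(default_factory=list)
    # Hypothetical: one CopyPatrol score per scanned diff of this page.
    diff_copyvio_scores: list[float] = field(default_factory=list)


def potential_issues(page: FeedPage) -> list[str]:
    """ORES issues first, then "Copyvio" if any diff is over the threshold."""
    issues = list(page.ores_issues)
    if any(score > COPYVIO_THRESHOLD for score in page.diff_copyvio_scores):
        issues.append("Copyvio")
    return issues


def format_issues(page: FeedPage) -> str:
    """Join the issues with a dot separator (separator character assumed)."""
    return " · ".join(potential_issues(page))


def filter_feed(pages: list[FeedPage], selected: list[str]) -> list[FeedPage]:
    """Keep pages that have ANY of the selected potential issues (logical OR)."""
    wanted = set(selected)
    return [p for p in pages if wanted & set(potential_issues(p))]


pages = [
    FeedPage("A", ["Vandalism"], [0.2]),
    FeedPage("B", [], [0.1, 0.8]),   # a later edit crosses the threshold
    FeedPage("C", ["Spam"], []),
    FeedPage("D", ["Attack"], [0.9]),
]
matching = filter_feed(pages, ["Vandalism", "Attack", "Copyvio"])
print([p.title for p in matching])
```

In this sketch a page is flagged if any one of its diffs is over the threshold, matching the criterion that the indicator applies to the initial revision or a subsequent edit. Selecting several issues widens the result set rather than narrowing it.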