When the import fails, we're probably left with a corrupted file page that has either the text revisions but no file, or only a partial history. This page may not be properly marked as produced by FileImporter, since we do that in the final or penultimate step. We should try to delete the bad page after a failed import, or otherwise mark it for deletion if the importing user doesn't have the rights to delete.
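A minimal sketch of the "delete, or otherwise mark for deletion" fallback described above. It is written in Python for illustration only and is not FileImporter's actual code. The function name, the `"delete"` right string, and the two callbacks are hypothetical stand-ins for whatever the extension would really call.

```python
from typing import Callable, Set


def cleanup_failed_import(
    title: str,
    user_rights: Set[str],
    delete_page: Callable[[str], None],
    mark_for_deletion: Callable[[str], None],
) -> str:
    """Clean up the page left behind by a failed import.

    Deletes the page when the importing user may delete (assumed here to be
    a right named "delete"); otherwise falls back to marking the page for
    deletion so that someone with the right can follow up.
    """
    if "delete" in user_rights:
        delete_page(title)
        return "deleted"
    mark_for_deletion(title)
    return "marked"
```

Passing the two actions in as callbacks keeps the decision logic testable without a wiki; in practice they would wrap the real page deletion and the template (or tag) that marks the page.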
If we tag every imported revision, it will be possible to detect incomplete imports, as well as build a comprehensive list of all edits made by FileImporter on the target wiki.
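To illustrate how tagged revisions could make incomplete imports detectable, here is a rough Python sketch. It assumes a simplified data model (`Page`, with the tags of each text revision and a flag for whether a file exists) and a hypothetical tag name; neither is taken from FileImporter. It only covers the "text revisions but no file" case. Detecting a partial history would additionally need a comparison against the source wiki.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Set

# Hypothetical tag name, used only for this illustration.
IMPORT_TAG = "fileimporter-imported"


@dataclass
class Page:
    title: str
    revision_tags: List[Set[str]]  # one set of change tags per text revision
    has_file: bool                 # whether an uploaded file exists for the page


def find_zombie_pages(pages: List[Page]) -> List[str]:
    """Return titles of pages with imported text revisions but no file."""
    zombies = []
    for page in pages:
        was_imported = any(IMPORT_TAG in tags for tags in page.revision_tags)
        if was_imported and not page.has_file:
            zombies.append(page.title)
    return zombies
```

The same tag lookup, without the `has_file` check, would give the list of all pages touched by FileImporter on the target wiki.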
Acceptance criteria
[] Always delete all zombie file page revisions (and related file revisions) that were created by a failed FileImporter import
- Alternatively, add a template (or tag) to find all imports, and make it possible to mark zombie files for deletion
Demo instructions
- Perform an import on the beta cluster.
- Look at the imported page's history to verify that the earliest imported revision has a FileImporter tag.
Follow-up
- Write a maintenance script which cleans up zombie revisions.