The outlink topic model is able to do topic detection in all languages. We should use this data to possibly drop the need to import the ORES articletopics model predictions. Predictions should be available in their own stream thanks to the work done in T315994.
The goal of this ticket is to stop using the ORES articletopics predictions in favor of the new outlink topic model, which supports all languages out of the box.
In its current shape this ticket only lists the few questions that I could think of, but @EBernhardson, as the author of this data pipeline, might have more ideas/questions on how to achieve this.
Questions:
- Do we want to replace and reuse the existing articletopics weighted_tags prefix with the data obtained from this model? Existing ORES predictions will stay until the page is edited and the new predictions are pushed.
-- If not, we will have to backfill and also clean up the ORES predictions from the production index. Do we have a clear procedure for this?
- The articletopics model requires some thresholds to be computed to better filter the predictions (https://gist.github.com/halfak/630dc3fd811995c2a0260d43da462645). Does the outlink topic model require something similar, or can we simplify this by using a static threshold for all predictions (e.g. hardcode 0.5)?
- We also import the ORES drafttopics model predictions. Can they be replaced with the outlink topic model ones?
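To make the static-threshold option from the questions above concrete, here is a rough sketch of what the filtering step could look like. Everything in it is an assumption for discussion: the function name, the `example.prefix` placeholder, the shape of the input (a topic-to-score mapping), and the `prefix/topic|weight` encoding with the score scaled to an integer in 1..1000. It is not the existing pipeline code.

```python
from typing import Dict, List

# Hypothetical static cutoff applied to every topic (the "hardcode 0.5" option).
STATIC_THRESHOLD = 0.5


def to_weighted_tags(
    predictions: Dict[str, float],
    prefix: str = "example.prefix",  # placeholder, not a real weighted_tags prefix
    threshold: float = STATIC_THRESHOLD,
) -> List[str]:
    """Keep predictions at or above the threshold and encode them as tags.

    Assumes scores are probabilities in [0, 1] and that a tag is encoded as
    "prefix/topic|weight", with weight being the score scaled to 1..1000.
    """
    tags = []
    for topic, score in sorted(predictions.items()):
        if score >= threshold:
            # Clamp so that the weight always stays within 1..1000.
            weight = max(1, min(1000, round(score * 1000)))
            tags.append(f"{prefix}/{topic}|{weight}")
    return tags


if __name__ == "__main__":
    sample = {"Culture.Media.Music": 0.789, "STEM.Physics": 0.12}
    print(to_weighted_tags(sample))
```

With a single cutoff there is no per-topic threshold table to compute or maintain, at the cost of possibly different precision/recall per topic compared to the tuned ORES thresholds.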
AC:
- outlink topic model predictions are pushed to the CirrusSearch indices and are queryable (via the existing articletopics keyword or a new one)
- have migration steps for the Growth team if a new keyword is required or if the set of predictable topics is different
- possibly stop using the ORES drafttopics model in favor of this one