The latest image suggestions pipeline run produced no ALIS (article-level image suggestions) data.
Context
- the 2023-07 snapshot of wmf_raw.mediawiki_revision, an upstream dependency of ALIS, is missing, see thread 1 and thread 2
- a 2023-07_backup snapshot is available
- two pipeline runs, 2023-07-31 and 2023-08-07, timed out waiting for the 2023-07 snapshot and were skipped
- forced the 2023-08-14 run to execute against the 2023-07_backup snapshot
- the pipeline succeeded
- but it produced no ALIS data
Tasks
- rename all 2023-08-14 partitions to no_alis
- set the previous_weekly_snapshot DAG property to no_alis
- point the wmf_raw.mediawiki_revision sensor to the 2023-06 snapshot
- clear the last execution so that it reruns
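The partition rename in the first step above could be sketched as Hive/Spark SQL statements generated from Python. The table names and the `snapshot` partition column below are placeholders for illustration only, not the pipeline's actual schema.

```python
# Hypothetical table names: replace with the tables the pipeline really writes to.
TABLES = ["example_db.table_a", "example_db.table_b"]


def rename_partition_statements(tables, old_value, new_value, partition_column="snapshot"):
    """Build one ALTER TABLE ... RENAME TO PARTITION statement per table.

    The partition column name is an assumption; adjust it to the real schema.
    """
    return [
        f"ALTER TABLE {table} PARTITION ({partition_column}='{old_value}') "
        f"RENAME TO PARTITION ({partition_column}='{new_value}')"
        for table in tables
    ]


statements = rename_partition_statements(TABLES, "2023-08-14", "no_alis")
for statement in statements:
    print(statement)
```

Generating the statements first, rather than executing them directly, leaves room to review them before they touch any data.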
The rerun still produced no ALIS data, so:
- reproduce the same execution in an Airflow test instance
- add a breakpoint where wmf_raw.mediawiki_revision is read
- debug the execution
Outcome
- the upstream dependency issue doesn't seem to be the cause
- there is no ALIS data even with the older 2023-06 snapshot
- in the debug session, the join with that dependency returned non-empty data
- running the single ALIS task manually did produce suggestions!
Recovery plan
- recompute a proper full index by manually running the pipeline in the Airflow test instance
- copy the output to the production DB
- let 2023-08-21's production run compute the proper delta
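Why the full index has to be in place before the next run can be illustrated with a toy delta computation. This is a minimal sketch, assuming suggestions can be compared as (page, image) pairs and that a delta consists of additions and removals relative to the previous run. The pipeline's real delta logic may differ, and all data below is made up.

```python
def compute_delta(previous, current):
    """Return the suggestions added and removed between two runs."""
    previous, current = set(previous), set(current)
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
    }


# Made-up example data, not real suggestions.
full_index = {("Page A", "a.jpg"), ("Page B", "b.jpg")}
next_run = {("Page B", "b.jpg"), ("Page C", "c.jpg")}

# With a proper full index in place, only the real changes show up.
proper_delta = compute_delta(full_index, next_run)

# With an empty previous run, every suggestion looks new.
inflated_delta = compute_delta(set(), next_run)

print(proper_delta)
print(inflated_delta)
```

Under these assumptions, a delta computed against an empty previous run reports every suggestion as an addition and misses removals, which is why the full index is restored first.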