A training technique where selected tokens in an input sequence are obscured, requiring the model to predict them based on the remaining contextual information.
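
As a rough illustration of the masking step only, the sketch below hides a random subset of tokens and records the originals as prediction targets. The names `mask_tokens` and `MASK`, the `[MASK]` placeholder string, and the 0.15 default rate are assumptions made for this example and are not taken from the definition above; no model or training loop is shown.

```python
import random

# Hypothetical placeholder symbol; real tokenizers define their own mask token.
MASK = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Return (masked_tokens, targets) for a list of string tokens.

    Each token is independently replaced by MASK with probability mask_prob
    (an assumed illustrative rate). `targets` maps each masked position to
    the original token the model would be asked to predict.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    targets = {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked[i] = MASK
            targets[i] = tok
    return masked, targets


if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    masked, targets = mask_tokens(sentence, mask_prob=0.3, seed=0)
    print(masked)   # input the model would see
    print(targets)  # positions and tokens it would have to recover
```

The unmasked tokens in `masked` are the "remaining contextual information"; a model trained this way would be scored only on the positions listed in `targets`.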