Authors:
Reemt Hinrichs, Angelo Sitcheu and Jörn Ostermann
Affiliation:
Institut für Informationsverarbeitung/L3S Research Center, Leibniz University Hannover, Appelstr. 9a, Hannover, Germany
Keyword(s):
Transformer, Continuous Sign-Language Recognition, Pose Estimation.
Abstract:
Sign language is used by deaf people to communicate with other humans. It consists not only of hand signs or gestures but also encompasses facial expressions and further body movements. To make human-machine interaction accessible for deaf people, automatic sign language recognition has to be implemented, which allows a machine to understand the signs and gestures of deaf signers. For this purpose, continuous sign-language recognition, the mapping of a (visual) sequence of signs forming a (sign) sentence to a sequence of (text) words, has to be developed. In this work, continuous sign-language recognition using transformers is proposed. Using additional pose estimation, body markers are extracted, augmented through data imputation and velocity-like features, and then used together with a transformer network for continuous sign-language recognition. Using the proposed method, better than state-of-the-art results were obtained on the RWTH-PHOENIX-Weather 2014 dataset, achieving 19.2%/19.5% dev/test word error rate (WER) on the signer-independent subset and 16.9%/17.4% dev/test WER on the simpler multi-signer subset. The feature augmentation was found to improve the baseline word error rate by about 2.7%/2.9% dev/test.
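The abstract mentions augmenting pose-estimation body markers through data imputation and velocity-like features. The paper does not specify the exact scheme here, so the following is only a minimal sketch of one plausible interpretation: forward-fill imputation of missing keypoints plus first-order temporal differences as velocity-like features. The function name, the NaN convention for missing detections, and the concrete imputation rule are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def augment_pose_features(keypoints):
    """keypoints: (T, K, 2) array of per-frame 2D body markers,
    with missing detections marked as NaN (assumed convention).
    Returns a (T, K, 4) array: imputed positions concatenated
    with velocity-like first-order temporal differences."""
    kp = keypoints.copy()
    # Data imputation: forward-fill each missing keypoint from the
    # last valid frame; zero if never observed so far.
    for t in range(kp.shape[0]):
        mask = np.isnan(kp[t])
        if t == 0:
            kp[t][mask] = 0.0
        else:
            kp[t][mask] = kp[t - 1][mask]
    # Velocity-like features: frame-to-frame differences of the
    # imputed positions (zero velocity for the first frame).
    vel = np.zeros_like(kp)
    vel[1:] = kp[1:] - kp[:-1]
    return np.concatenate([kp, vel], axis=-1)
```

The augmented (T, K, 4) feature sequence could then be flattened per frame and fed to a transformer encoder as the input tokens of the recognition network.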