fastHan: A BERT-based Multi-Task Toolkit for Chinese NLP

Geng, Zhichao; Yan, Hang; Qiu, Xipeng; Huang, Xuanjing

arXiv:2009.08633 (cs)

[Submitted on 18 Sep 2020 (v1), last revised 31 May 2021 (this version, v2)]

Abstract: We present fastHan, an open-source toolkit for four basic tasks in Chinese natural language processing: Chinese word segmentation (CWS), part-of-speech (POS) tagging, named entity recognition (NER), and dependency parsing. The backbone of fastHan is a multi-task model based on a pruned BERT that uses only the first 8 layers of BERT. We also provide a 4-layer base model compressed from the 8-layer model. The joint model is trained and evaluated on 13 corpora across the four tasks, yielding near state-of-the-art (SOTA) performance in dependency parsing and NER, and SOTA performance in CWS and POS tagging. fastHan also transfers well, performing much better than popular segmentation tools on a corpus not seen in training. To better meet the needs of practical applications, users can further fine-tune fastHan on their own labeled data. In addition to its small size and excellent performance, fastHan is user-friendly: implemented as a Python package, it isolates users from the internal technical details and is convenient to use. The project is released on GitHub.
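
The layer pruning mentioned in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the fastHan implementation: the layers are toy functions, the 12-layer starting stack is an assumption (the depth of BERT-base), and the names `make_layer`, `prune_layers`, and `encode` are hypothetical. It only shows the idea of keeping the first 8 layers of an encoder and sharing that truncated encoder across tasks.

```python
from typing import Callable, List

# A toy stand-in for a Transformer layer: maps hidden values to hidden values.
Layer = Callable[[List[float]], List[float]]


def make_layer(index: int) -> Layer:
    """Return a toy layer that adds its 1-based index to every value."""
    def layer(hidden: List[float]) -> List[float]:
        return [h + index for h in hidden]
    return layer


def prune_layers(layers: List[Layer], keep: int) -> List[Layer]:
    """Keep only the first `keep` layers of the stack (layer truncation)."""
    if not 0 < keep <= len(layers):
        raise ValueError("keep must be between 1 and len(layers)")
    return layers[:keep]


def encode(layers: List[Layer], hidden: List[float]) -> List[float]:
    """Run the input through every layer in order."""
    for layer in layers:
        hidden = layer(hidden)
    return hidden


# Assumption: a 12-layer stack, as in BERT-base.
full_stack = [make_layer(i) for i in range(1, 13)]

# Keep the first 8 layers, as the abstract describes for the backbone.
pruned = prune_layers(full_stack, keep=8)

# In a multi-task setup, all four tasks would read this one shared encoding
# and differ only in their task-specific output layers (not modeled here).
shared_encoding = encode(pruned, [0.0, 1.0])
print(len(pruned), shared_encoding)
```

Truncating the stack reduces both parameter count and per-sentence compute roughly in proportion to the layers removed, which is the trade-off the abstract points to when it mentions the toolkit's small size.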

Comments:	ACL2021 Demo Track
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2009.08633 [cs.CL]
	(or arXiv:2009.08633v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2009.08633

Submission history

From: Zhichao Geng
[v1] Fri, 18 Sep 2020 05:41:52 UTC (169 KB)
[v2] Mon, 31 May 2021 03:54:02 UTC (5,461 KB)
