Revealing hidden structures behind sentences｜Exhibition Program｜NTT Communication Science Laboratories OPEN HOUSE 2021

Exhibition Program

Science of Communication and Computation

10 Revealing hidden structures behind sentences

Neural rhetorical structure parsing with pseudo-labeled data

Abstract

This poster presents a method for identifying the hidden structure of documents. Every document has a rhetorical structure, which expresses the relations among its clauses. Rhetorical structure parsers are built with supervised learning, so accurate parsing requires large amounts of manually annotated training data. Manual annotation is labor intensive, however, so conventional methods lack sufficient training data and perform poorly as a result. To tackle this problem, we propose a method that uses silver data, that is, automatically annotated pseudo-labeled data. We pre-trained the parser on silver data and then fine-tuned it on gold data, that is, manually annotated data. In our experiments, the method achieved the best performance. The new parser will contribute to various natural language processing applications, such as machine translation and automatic summarization.
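
The two-stage schedule described in the abstract (pre-train on silver data, then fine-tune on gold data) can be illustrated with a toy sketch. This is not the authors' parser or training setup: it assumes a tiny logistic-regression classifier on synthetic 2-D points, where the "silver" set is large but has 20% of its labels flipped and the "gold" set is small and clean. All names (`train`, `accuracy`, `make_data`), the noise rate, the data sizes, and the learning rates are hypothetical choices made only for illustration.

```python
import math
import random


def train(weights, data, epochs, lr):
    """Run plain SGD for logistic regression, starting from `weights`."""
    w = list(weights)  # copy so the caller's starting point is not modified
    for _ in range(epochs):
        for features, label in data:
            z = sum(wi * xi for wi, xi in zip(w, features))
            pred = 1.0 / (1.0 + math.exp(-z))
            for i, xi in enumerate(features):
                w[i] -= lr * (pred - label) * xi
    return w


def accuracy(weights, data):
    """Fraction of examples whose predicted class matches the label."""
    correct = 0
    for features, label in data:
        z = sum(wi * xi for wi, xi in zip(weights, features))
        correct += int((z > 0) == (label == 1))
    return correct / len(data)


def make_data(n, noise, rng):
    """Toy task (an assumption): label is 1 when x0 + x1 > 0.

    `noise` is the probability that a label is flipped, which stands in
    for the errors of automatic (pseudo-label) annotation.
    """
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1), 1.0]  # last entry is the bias input
        y = 1 if x[0] + x[1] > 0 else 0
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data


rng = random.Random(0)
silver = make_data(2000, noise=0.2, rng=rng)  # large, automatically labeled, noisy
gold = make_data(50, noise=0.0, rng=rng)      # small, manually labeled, clean
test = make_data(500, noise=0.0, rng=rng)

init = [0.0, 0.0, 0.0]
pretrained = train(init, silver, epochs=3, lr=0.1)       # stage 1: pre-train on silver
finetuned = train(pretrained, gold, epochs=10, lr=0.05)  # stage 2: fine-tune on gold

print("after pre-training:", accuracy(pretrained, test))
print("after fine-tuning: ", accuracy(finetuned, test))
```

The only point of the sketch is the ordering: the second call to `train` starts from the weights produced by the first, so the small gold set adjusts a model that has already been shaped by the much larger silver set.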

References

[1] N. Kobayashi, T. Hirao, H. Kamigaito, M. Okumura, M. Nagata, “Improving Neural RST Parsing Model with Silver Agreement Subtrees,” in Proc. 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2021.

Contact

Tsutomu Hirao / Linguistic Intelligence Research Group, Innovative Communication Laboratory
Email: cs-openhouse-ml@hco.ntt.co.jp
