Construction methodology of entity corpus for special diseases electronic medical records

Chen Sixu1, Liu Duyu1*, Tan Xiaoqin1, Qi Xing1, Luo Bin2

1. College of Electrical Engineering, Southwest Minzu University, Chengdu 610041

2. Sichuan Huhui Software Co., Ltd., Mianyang 621000


[About the author] Chen Sixu, master's student. E-mail: chen1296647721@163.com

* Corresponding author. Tel: 15828397145, E-mail: liuduyu10000@163.com

[Received] 2024-07-07 [Accepted] 2024-09-02 [Published] 2024-09-30

[Funding] Supported by the National Key Research and Development Program (2021YFF0704100) and the Fundamental Research Funds for the Central Universities, Southwest Minzu University (2023NYXXS016).


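Ethics statement: None.

Conflict of interest: All authors declare that they have no conflicts of interest.

Author contributions: Chen Sixu: topic selection and drafting of the manuscript; Liu Duyu, Tan Xiaoqin, Qi Xing, Luo Bin: revision of the manuscript.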

DOI:https://doi.org/10.61189/409428oucija

Abstract

To address the scarcity of resources for named entity recognition on electronic medical records, this study formulated a unified annotation methodology for special-disease entity corpora under the guidance of medical experts and used it to construct two such corpora: the Pediatric Bronchopneumonia Entity Corpus and the Diabetes Entity Corpus. To verify the effectiveness of the annotation methodology, the Pediatric Bronchopneumonia Entity Corpus was first compared with publicly available data using the BERT-BiLSTM-CRF and ERNIE-BiLSTM-CRF models. The methodology was then reapplied to diabetes electronic medical records to evaluate robustness. Both self-built corpora achieved higher F1 scores than the publicly available data, which suggests that the proposed annotation methodology is robust.

Keywords: electronic medical record; named entity recognition; corpus construction; Pediatric Bronchopneumonia Entity Corpus; Diabetes Entity Corpus

CHEN SX, LIU DY, TAN XQ, et al. Construction methodology of entity corpus for special diseases electronic medical records[J]. Metaverse Med, 2024, 1(3):41-46.
