Named entity recognition in Chinese electronic medical records based on large language models - ZENTIME PUBLISHING CORPORATION LIMITED

Metaverse in Medicine, Integration of IUR (industry, university, research) section

Named entity recognition in Chinese electronic medical records based on large language models

程洁，刘独玉*，陈思旭，钱姝妤

College of Electrical Engineering, Southwest Minzu University, Chengdu, Sichuan 610041

[About the author] 程洁, master's student. E-mail: 19822909065@163.com

* Corresponding author. Tel: 15828397145, E-mail: liuduyu10000@163.com

[Received] 2025-08-14 [Accepted] 2025-09-16 [Published] 2025-09-30

Ethics statement: None.

Conflict of interest: All authors declare that they have no conflict of interest.

Author contributions: 程洁: topic selection and drafting; 刘独玉: revision; 陈思旭: revision; 钱姝妤: joint data annotation.

DOI: https://doi.org/10.61189/502047ilpumd

Abstract

Named entity recognition (NER), a core task in natural language processing (NLP), identifies medical entities such as diseases and symptoms in electronic medical records (EMRs) and is important for clinical decision support and the construction of medical knowledge bases. Traditional methods, however, rely on large amounts of annotated data and complex models, which makes training and inference costly. This paper proposes a generative medical NER method for large language models that combines semantic retrieval with prompt learning. First, a sentence-level vector database is built by semantically encoding EMR sentences so that they can be retrieved. Next, for each input sentence, semantically similar examples are retrieved and dynamically injected into a prompt template that guides the model through entity extraction. Finally, the model generates entity type annotations through structured special markers, so the result can be decoded directly from its output. Experiments show that the method performs well on both a self-built EMR dataset and the Ruijin Hospital diabetes dataset, and that it is robust and transferable, particularly in low-resource settings.
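The three steps in the abstract (index annotated sentences, retrieve similar ones for each input, and decode entities from marked-up output) can be illustrated with a minimal sketch. It is not the authors' implementation. Everything concrete here is an assumption made for illustration: the English example sentences, the `<dis>`/`<sym>` marker format, the function names, and above all `embed`, which uses character trigram counts as a toy stand-in for a real sentence encoder. The call to the language model itself is left out; `parse_entities` is applied to a hand-written string that plays the role of the model's output.

```python
import math
import re
from collections import Counter

# Hypothetical annotated pool. The <dis>/<sym> markers are an assumed format,
# not the special tokens used in the paper.
EXAMPLES = [
    ("Patient reports chest pain and shortness of breath.",
     "Patient reports <sym>chest pain</sym> and <sym>shortness of breath</sym>."),
    ("History of type 2 diabetes for ten years.",
     "History of <dis>type 2 diabetes</dis> for ten years."),
    ("No fever or cough was observed.",
     "No <sym>fever</sym> or <sym>cough</sym> was observed."),
]


def embed(text):
    """Toy stand-in for a sentence encoder: counts of character trigrams."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Step 1: a sentence-level "vector database" (here just a list in memory).
INDEX = [(embed(src), src, tagged) for src, tagged in EXAMPLES]


def retrieve(query, k=2):
    """Step 2a: return the k annotated examples most similar to the query."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[0]), reverse=True)
    return [(src, tagged) for _, src, tagged in ranked[:k]]


def build_prompt(query, k=2):
    """Step 2b: inject the retrieved examples into a few-shot prompt template."""
    lines = ["Mark diseases with <dis>...</dis> and symptoms with <sym>...</sym>.", ""]
    for src, tagged in retrieve(query, k):
        lines += [f"Input: {src}", f"Output: {tagged}", ""]
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)


TAG_RE = re.compile(r"<(dis|sym)>(.*?)</\1>")


def parse_entities(generated):
    """Step 3: decode (entity text, entity type) pairs from marked-up output."""
    return [(m.group(2), m.group(1)) for m in TAG_RE.finditer(generated)]


if __name__ == "__main__":
    query = "History of type 1 diabetes for five years."
    print(build_prompt(query, k=1))
    # Hand-written stand-in for what a model might generate:
    print(parse_entities("History of <dis>type 1 diabetes</dis> for five years."))
```

In a realistic setup, `embed` would be replaced by a trained sentence encoder and `INDEX` by a proper vector store; the retrieval, prompt assembly, and marker parsing would keep the same shape. Because the few-shot examples change with every input sentence, the prompt can adapt to new text without any retraining, which is one plausible reading of why the abstract highlights low-resource settings.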

Keywords: named entity recognition; electronic medical records; large language models
