Named Entity Recognition (NER), a core task in Natural Language Processing (NLP), identifies medical entities such as diseases and symptoms in Electronic Medical Records (EMRs), which makes it important for clinical decision support and for building medical knowledge bases. However, traditional methods depend on large amounts of annotated data and complex models, so training and inference are costly. This paper proposes a generative medical NER method based on large language models that combines semantic retrieval with prompt learning. First, a sentence-level vector database is built by semantically encoding EMR sentences so that they can be retrieved. Next, semantic similarity retrieval is run for each input sentence, and the most similar examples are dynamically injected into a prompt template to guide the model in extracting entities. Finally, the model generates entity type annotations delimited by structured special markers, which can be decoded directly into the output. Experiments show that the method performs well on both a self-constructed EMR dataset and the Ruijin Hospital diabetes dataset, and that it is robust and transfers well, particularly in low-resource settings.
Keywords: named entity recognition; electronic medical records; large language models
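The three-stage pipeline summarized in the abstract (retrieve similar annotated sentences, inject them into a prompt, decode marker-delimited output) can be illustrated with a minimal sketch. Several parts are assumptions not stated in the abstract. The character-bigram `encode` function is a toy stand-in for whatever sentence encoder the method actually uses. The `<Type>text</Type>` marker format is a hypothetical choice of "structured special markers". The names `VectorStore`, `build_prompt`, and `decode` are illustrative only. No language model is called: the `generated` string stands in for model output.

```python
import math
import re
from collections import Counter

def encode(sentence: str) -> Counter:
    """Toy sentence 'embedding' (hypothetical): counts of character bigrams."""
    return Counter(sentence[i:i + 2] for i in range(len(sentence) - 1))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    """Sentence-level index of annotated examples (sentence, tagged sentence)."""

    def __init__(self, examples):
        self.items = [(encode(s), s, tagged) for s, tagged in examples]

    def search(self, query: str, k: int = 2):
        """Return the k stored examples most similar to the query sentence."""
        q = encode(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[0]), reverse=True)
        return [(s, tagged) for _, s, tagged in ranked[:k]]

def build_prompt(query: str, demos) -> str:
    """Inject retrieved demonstrations into a fixed prompt template."""
    parts = ["Mark each medical entity as <Type>text</Type>."]
    for s, tagged in demos:
        parts.append(f"Input: {s}\nOutput: {tagged}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

# Matches <Type>text</Type>; \1 forces the closing tag to match the opening one.
TAG = re.compile(r"<(\w+)>(.*?)</\1>")

def decode(output: str):
    """Turn marker-delimited model output into (entity text, entity type) pairs."""
    return [(m.group(2), m.group(1)) for m in TAG.finditer(output)]

store = VectorStore([
    ("Patient has type 2 diabetes with polyuria.",
     "Patient has <Disease>type 2 diabetes</Disease> with <Symptom>polyuria</Symptom>."),
    ("Complains of chest pain and shortness of breath.",
     "Complains of <Symptom>chest pain</Symptom> and <Symptom>shortness of breath</Symptom>."),
    ("History of hypertension for ten years.",
     "History of <Disease>hypertension</Disease> for ten years."),
])

query = "Patient has type 1 diabetes with thirst."
demos = store.search(query, k=1)
prompt = build_prompt(query, demos)
# Stand-in for the LLM's generated text (hypothetical; no model is called here).
generated = "Patient has <Disease>type 1 diabetes</Disease> with <Symptom>thirst</Symptom>."
print(decode(generated))  # [('type 1 diabetes', 'Disease'), ('thirst', 'Symptom')]
```

Because the markers carry the entity type inline, decoding reduces to a single regular-expression pass. No separate label sequence has to be aligned with the input tokens, which is one plausible reason a generative formulation can output entities directly.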