Paper Title

Data-Driven Regular Expressions Evolution for Medical Text Classification Using Genetic Programming

Authors

Liu, J.; Bai, R.; Lu, Z.; Ge, P.; Liu, D.; Aickelin, Uwe

Abstract

In the medical field, text classification is one of the most important tasks: through structured information digitization and intelligent decision support, it can significantly reduce human workload. Despite the popularity of learning-based text classification techniques, the black-box nature of learning makes it hard for humans to understand the classification results or to fine-tune them manually for better precision and recall. This study proposes a novel regular expression-based text classification method that uses genetic programming (GP) to evolve regular expressions. The evolved expressions can classify a given medical text inquiry with satisfactory precision and recall, while still allowing humans to read the classifier and fine-tune it if necessary. Given a seed population of regular expressions (which can be randomly initialized or manually constructed by experts), our method evolves the population according to a chosen fitness function, using a novel regular expression syntax and a series of carefully chosen reproduction operators. Our method is evaluated on real-life medical text inquiries from an online healthcare provider and shows promising performance. More importantly, our method generates classifiers that can be fully understood, checked, and updated by medical doctors, which is crucial for medical practice.
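The evolutionary loop the abstract describes (seed population, fitness function, reproduction operators) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the authors' method: the toy inquiries in `DATA`, the keyword list `VOCAB`, the genome encoding (a list of keywords rendered as an alternation), F1 as the fitness function, and the simple `mutate` and `crossover` operators. The paper's own regular expression syntax and operators are not given in the abstract and are not reproduced here.

```python
import random
import re

# Hypothetical toy data: (inquiry text, belongs to the target class?)
DATA = [
    ("I have a fever and a bad cough", True),
    ("persistent cough for two weeks", True),
    ("high fever since last night", True),
    ("how do I book an appointment", False),
    ("what are your opening hours", False),
    ("need a refill for my prescription", False),
]
# Hypothetical keyword vocabulary the operators may draw from.
VOCAB = ["fever", "cough", "book", "hours", "refill", "night", "weeks"]


def to_regex(genome):
    """Render a genome (non-empty list of keywords) as a readable alternation."""
    return "|".join(re.escape(word) for word in genome)


def fitness(genome):
    """F1 score of the regex used as a binary classifier over DATA."""
    pattern = re.compile(to_regex(genome))
    tp = fp = fn = 0
    for text, label in DATA:
        hit = pattern.search(text) is not None
        if hit and label:
            tp += 1
        elif hit and not label:
            fp += 1
        elif label:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mutate(genome, rng):
    """Add, drop, or swap one keyword; the genome never becomes empty."""
    g = list(genome)
    op = rng.choice(["add", "drop", "swap"])
    if op == "add" or len(g) == 1:
        g.append(rng.choice(VOCAB))
    elif op == "drop":
        g.pop(rng.randrange(len(g)))
    else:
        g[rng.randrange(len(g))] = rng.choice(VOCAB)
    return sorted(set(g))


def crossover(a, b, rng):
    """Child is a random non-empty subset of the parents' combined keywords."""
    pool = sorted(set(a) | set(b))
    return sorted(rng.sample(pool, rng.randint(1, len(pool))))


def evolve(generations=30, pop_size=12, seed=0):
    """Evolve keyword regexes with elitism; returns the fittest genome found."""
    rng = random.Random(seed)
    population = [[rng.choice(VOCAB)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = elite + children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print(to_regex(best), fitness(best))
```

Because the result is an ordinary regular expression, a reviewer can read it directly and edit it by hand, for example by deleting a keyword that causes false positives, which is the interpretability property the abstract emphasizes.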
