AI Knows Which Words Will Appear in Next Year's Korean CSAT

Paper title

AI Knows Which Words Will Appear in Next Year's Korean CSAT

Authors

Ban, Byunghyun; Lee, Jejong; Hwang, Hyeonmok

Abstract

This paper introduces a text-mining-based word class categorization method and an LSTM-based vocabulary pattern prediction method. A preprocessing method based on simple analysis of word appearance frequency in text is described first. It was developed as a data screening tool, yet it scored 4.35 to 6.21 times higher than previous works. An LSTM deep learning method is then proposed for predicting vocabulary appearance patterns. The AI performs regression over data windows of various sizes drawn from previous exams to predict the probability that each word appears in the next exam. The values predicted over the different data windows are combined as a weighted sum into a single score, called the "AI-Score", which represents the probability that a word appears in next year's exam. The proposed method showed 100% accuracy in the 100-point score range and a prediction error of only 1.7% in the section where scores were over 60 points. All source code is freely available at the authors' GitHub repository (https://github.com/needleworm/bigdata_voca).
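
The weighted-sum step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `ai_score`, the window sizes, the weights, the example probabilities, and the normalization to a 0-100 scale are all assumptions made for illustration. The abstract only states that predictions over several data windows are combined as a weighted sum into one score.

```python
from typing import Mapping


def ai_score(predictions: Mapping[int, float], weights: Mapping[int, float]) -> float:
    """Combine per-window appearance probabilities into one 0-100 score.

    predictions: window size (number of past exams) -> predicted probability in [0, 1]
    weights:     window size -> non-negative weight (hypothetical values)
    """
    if not predictions:
        raise ValueError("no predictions given")
    # Only the weights of windows that actually produced a prediction are used.
    total_weight = sum(weights[w] for w in predictions)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(weights[w] * p for w, p in predictions.items())
    # Normalizing by the total weight and scaling to 100 is an assumption.
    return 100.0 * weighted / total_weight


# Hypothetical example: three window sizes with made-up model outputs and weights.
example_predictions = {3: 0.9, 5: 0.8, 10: 0.6}
example_weights = {3: 1.0, 5: 2.0, 10: 1.0}
example_score = ai_score(example_predictions, example_weights)
```

Dividing by the total weight keeps the score within 0 to 100 whatever weights are chosen, which matches the point-based ranges mentioned in the abstract. The actual weighting scheme is in the authors' repository.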
