Paper title
A New Clustering Neural Network for Chinese Word Segmentation
Authors
Abstract
In this article I propose a new model for Chinese word segmentation (CWS), which may have the potential to be applied in other domains in the future. In contrast to previous work, it treats CWS as a clustering problem rather than a labeling problem. In this model, LSTM and self-attention structures are used in every layer to collect context and sentence-level features, and after several layers a clustering model is applied to split the characters into groups, which are the final segmentation results. I call this model CLNN. The algorithm reaches an F score of 98 percent (without out-of-vocabulary (OOV) words) and 85 to 95 percent (with OOV words) on the training data sets. Error analysis shows that OOV words greatly reduce performance, which needs deeper research in the future.
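To make the clustering view concrete, the following is a minimal sketch and not the CLNN model itself. It assumes that some model has already assigned each character a cluster id, and that adjacent characters sharing an id form one word. The function names (`clusters_to_words`, `word_spans`, `f_score`) and the example sentence are hypothetical, and the F score shown is the usual word-span precision/recall measure, which may differ in detail from the evaluation used in the paper.

```python
from typing import List, Set, Tuple


def clusters_to_words(chars: str, cluster_ids: List[int]) -> List[str]:
    """Group adjacent characters that share a cluster id into words."""
    if len(chars) != len(cluster_ids):
        raise ValueError("need exactly one cluster id per character")
    words = []
    start = 0
    for i in range(1, len(chars) + 1):
        # Close the current word at the end of the sentence or when the id changes.
        if i == len(chars) or cluster_ids[i] != cluster_ids[i - 1]:
            words.append(chars[start:i])
            start = i
    return words


def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Represent a segmentation as a set of (start, end) character offsets."""
    spans = set()
    pos = 0
    for w in words:
        spans.add((pos, pos + len(w)))
        pos += len(w)
    return spans


def f_score(gold: List[str], pred: List[str]) -> float:
    """Word-level F score: a predicted word counts only if both boundaries match."""
    g, p = word_spans(gold), word_spans(pred)
    correct = len(g & p)
    if correct == 0:
        return 0.0
    precision = correct / len(p)
    recall = correct / len(g)
    return 2 * precision * recall / (precision + recall)


sentence = "我爱北京天安门"
gold = clusters_to_words(sentence, [0, 1, 2, 2, 3, 3, 3])  # hypothetical gold clusters
pred = clusters_to_words(sentence, [0, 1, 2, 2, 3, 3, 4])  # hypothetical model output
print(gold, pred, round(f_score(gold, pred), 4))
```

In this view a labeling model would predict a tag per character, whereas a clustering model only has to decide which neighbouring characters belong together; the conversion to words is the same simple grouping step in either case.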