Title

Methodological proposal to identify the nationality of Twitter users through Random Forests

Authors

Quijano, Damián; Gil-Herrera, Richard

Abstract

We present a methodology for identifying which participants in social network discussions, and which of their contributions, have a local relationship (e.g., nationality), providing a certain level of trust and efficiency in the process. This is a challenge that has prompted several studies and some recent approximate solutions. The study addresses the problem of identifying the nationality of Twitter users before requesting their opinions (on political matters and social participation). The methodology uses machine learning to classify the nationality of Twitter users in order to carry out opinion studies in three Central American countries. The Random Forests algorithm is used to generate classification models from small training samples, using exclusively numerical features based on the number of times that different interactions among users occur. Averaged across the countries, the inferred proportion of nationals was 77.40% in the initial data, compared with 91.60% after applying the automatic classification model, an average increase of 14.20 percentage points. In conclusion, the suggested set of methods provides a reasonable and efficient approach to opinion studies.
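The classification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list the actual features, countries, or data, so the example below assumes scikit-learn's `RandomForestClassifier`, three placeholder country labels, and synthetic interaction counts in which each user interacts more often with accounts from their own country.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
COUNTRIES = ["A", "B", "C"]  # hypothetical placeholders, not the studied countries


def make_users(n_per_country):
    """Build synthetic users (an assumption, not real Twitter data).

    Each row holds three numerical features: how many times the user
    interacted with accounts from country A, B, and C respectively.
    """
    features, labels = [], []
    for i, country in enumerate(COUNTRIES):
        rates = np.full(len(COUNTRIES), 2.0)
        rates[i] = 10.0  # assumed: more interactions with the home country
        features.append(rng.poisson(rates, size=(n_per_country, len(COUNTRIES))))
        labels += [country] * n_per_country
    return np.vstack(features), np.array(labels)


# Small labeled training sample, larger set of users to classify.
X_train, y_train = make_users(30)
X_new, y_new = make_users(200)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

predicted = model.predict(X_new)
accuracy = float((predicted == y_new).mean())
print(f"share of users assigned to the correct country: {accuracy:.2%}")
```

In an opinion study, `predicted` could then be used to keep only the users assigned to the target country before requesting opinions; the figures this sketch prints come from synthetic data and are not comparable to the percentages reported in the abstract.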
