Title
Are We All in a Truman Show? Spotting Instagram Crowdturfing through Self-Training
Authors
Abstract
Influencer marketing generated $16 billion in 2022. Usually, more popular influencers are paid more for their collaborations. Thus, many services were created to boost profiles' popularity metrics through bots or fake accounts. However, real people recently started participating in such boosting activities using their real accounts for monetary rewards, generating ungenuine content that is extremely difficult to detect. To date, no work has attempted to detect this new phenomenon, known as crowdturfing (CT), on Instagram. In this work, we propose the first Instagram CT engagement detector. Our algorithm leverages profiles' characteristics through semi-supervised learning to spot accounts involved in CT activities. Compared to the supervised approaches used so far to identify fake accounts, semi-supervised models can exploit huge quantities of unlabeled data to increase performance. We purchased and studied 1293 CT profiles from 11 providers to build our self-training classifier, which reached a 95% F1-score. We tested our model in the wild by detecting and analyzing CT engagement from 20 mega-influencers (i.e., with more than one million followers), and discovered that more than 20% was artificial. We analyzed the CT profiles and comments, showing that it is difficult to detect these activities based solely on their generated content.
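As a rough illustration of the self-training idea mentioned in the abstract, the sketch below fits a classifier on a few labeled profiles, pseudo-labels the unlabeled profiles it is confident about, and refits. It is not the authors' implementation: the nearest-centroid base learner, the two-dimensional feature vectors, the confidence formula, and the 0.8 threshold are all assumptions chosen to keep the example small.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def fit(labeled):
    """Nearest-centroid 'model': one centroid per class label."""
    by_class = {}
    for x, y in labeled:
        by_class.setdefault(y, []).append(x)
    return {y: centroid(xs) for y, xs in by_class.items()}

def predict_with_confidence(x, centroids):
    """Return the nearest class and a confidence in [0.5, 1] (assumed formula)."""
    dists = {label: math.dist(x, c) for label, c in centroids.items()}
    ranked = sorted(dists, key=dists.get)
    best, runner_up = ranked[0], ranked[1]
    total = dists[best] + dists[runner_up]
    confidence = 1.0 if total == 0 else dists[runner_up] / total
    return best, confidence

def self_train(labeled, unlabeled, threshold=0.8, max_rounds=10):
    """Grow the labeled set with confident pseudo-labels, then refit."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        centroids = fit(labeled)
        newly_labeled, remaining = [], []
        for x in pool:
            label, conf = predict_with_confidence(x, centroids)
            if conf >= threshold:
                newly_labeled.append((x, label))  # accept the pseudo-label
            else:
                remaining.append(x)               # keep for a later round
        if not newly_labeled:
            break  # nothing confident left to add
        labeled.extend(newly_labeled)
        pool = remaining
    return fit(labeled)

# Hypothetical, already-normalized profile features (not from the paper).
labeled = [([0.1, 0.2], "ct"), ([0.9, 0.8], "genuine")]
unlabeled = [[0.15, 0.1], [0.2, 0.25], [0.85, 0.9], [0.8, 0.7], [0.5, 0.5]]

model = self_train(labeled, unlabeled)
print(predict_with_confidence([0.12, 0.18], model)[0])
```

In this toy run the ambiguous profile `[0.5, 0.5]` never clears the threshold, so it stays unlabeled and does not influence the final centroids. A real detector would use a stronger base classifier and far more unlabeled data.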