Detecting coordination in disinformation campaigns

Coordination is one of the central features of information operations and disinformation campaigns, which can be defined as concerted efforts to target people with false or misleading information, often with some strategic objective (political, social, financial), while using the affordances provided by social media platforms.

Since 2017, platforms like Facebook have developed new framings, policies, and entire teams to counter bad actors who use their platforms to interfere in elections, drive political polarization, or manipulate public opinion. After several iterations, Facebook has settled on describing malicious activity like this as “coordinated inauthentic behavior,” referring to a group of actors “working in concert to engage in inauthentic behavior…where the use of fake accounts is central to the operation.”

More broadly, Twitter uses the phrase “platform manipulation,” which includes a range of actions, but also focuses on “coordinated activity…that attempts to artificially influence conversations through the use of multiple accounts, fake accounts, automation and/or scripting.”

Unfortunately, open source researchers have limited data at their disposal to assess some of the characteristics of disinformation campaigns. Figuring out the authenticity of an account requires far more information than is publicly available (data such as browser usage, IP logs, device IDs, account e-mails, etc.). And while we might be able to figure out when a group of accounts is using stock photos, we can’t say anything about the origin of the people using these accounts.

Even the terms “coordinated” and “inauthentic” are not straightforward or neutral judgments, a problem highlighted by scholars like Kate Starbird and Evelyn Douek. Starbird has used multiple case studies to show that there are no clear distinctions between “orchestrated” and “organic” activity:

In particular, our work reveals entanglements between orchestrated action and organic activity, including the proliferation of authentic accounts (real people, sincerely participating) within activities that are guided by and/or integrated into disinformation campaigns.

In this post, I focus specifically on the digital artifacts of “coordination” to show how this behavior can be used to identify accounts that may be part of a disinformation campaign. But I also show how coming to this conclusion always requires additional analysis, especially to distinguish between “orchestrated action and organic activity.”

We’ll look at original data I collected prior to Facebook’s takedown of pro-Trump groups associated with the media site “The Beauty of Life” (TheBL), reportedly tied to the Epoch Media Group. The removal of these accounts came with great fanfare in late 2019, in part because some of the accounts used AI-generated photos to populate their profiles. But another behavioral trait of the accounts, and one more visible in digital traces, was their coordinated amplification of URLs pointing to thebl.com and to other assets within the network of accounts. We’ll also look at data from QAnon accounts on Instagram to get a better sense of what coordination can look like within a highly active online community.

Note: I’ll be using the programming language R in my analysis and things get a bit technical from here.

Pairwise correlations, clustering, and {tidygraph}

One method I’ve used consistently to help identify coordination comes from David Robinson’s {widyr} package, which has functions that allow you to compute pairwise correlations, distance, cosine similarity, and counts, as well as k-means and hierarchical clustering (in development).

Using some of the functions from {widyr}, we can answer a question like: which accounts have a tendency to link to the same domains? This kind of question can be applied to other features as well, like sequences of hashtags (an example we’ll explore later), mentions, shared posts, URLs, text, and even content.

Here, I’ve extracted the domains from URLs shared by the Facebook groups in the TheBL takedown, filtered out domains that occurred fewer than 20 times, removed social media links, and put it all into a dataframe called domains_shared.
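
The extraction and filtering code is not shown here, so the following is only a minimal base-R sketch under stated assumptions. It assumes a data frame `posts` with hypothetical columns `group_name` and `url`. It also assumes a hand-picked list of social media domains to exclude, and that the 20-occurrence threshold is counted across the whole dataset.

```r
# Sketch only: `posts`, its columns, and `social_domains` are assumptions.
social_domains <- c("facebook.com", "fb.me", "instagram.com",
                    "twitter.com", "youtube.com", "youtu.be")

# Reduce a URL to its lower-cased host name, without any leading "www.".
extract_domain <- function(url) {
  host <- sub("^[a-zA-Z][a-zA-Z0-9+.-]*://", "", url)  # drop the scheme
  host <- tolower(sub("[/?#:].*$", "", host))           # drop port, path, query
  sub("^www\\.", "", host)
}

# TRUE when a domain is, or is a subdomain of, a listed social media domain.
is_social <- function(domain, social = social_domains) {
  vapply(domain,
         function(d) any(d == social | endsWith(d, paste0(".", social))),
         logical(1), USE.NAMES = FALSE)
}

# One row per shared link (group_name, domain), without social media links
# and without domains seen fewer than `min_count` times in the whole dataset.
build_domains_shared <- function(posts, min_count = 20) {
  d <- data.frame(group_name = posts$group_name,
                  domain = extract_domain(posts$url),
                  stringsAsFactors = FALSE)
  d <- d[!is.na(d$domain) & d$domain != "" & !is_social(d$domain), ]
  counts <- table(d$domain)
  d[d$domain %in% names(counts)[counts >= min_count], ]
}
```

Matching subdomains (for example `m.facebook.com`) is a design choice. It keeps mobile and regional variants of a social platform out of the domain list.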

Using {tidygraph} and {ggraph}, we can then visualize the relationships between these groups. In the graph below, I’ve filtered the data to show relationships with a correlation above .5 and highlighted the groups (nodes) that I know were removed for inauthentic behavior. The graph shows that the strongest correlations (phi coefficients) are between the groups engaged in coordinated behavior, because they had a much higher tendency to share URLs pointing to content on thebl.com. An analysis focusing on the groups’ page-sharing behavior would yield a similar graph, showing which groups were more likely to share posts from the TheBLcom page and other assets.
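
The phi coefficient between two groups is the Pearson correlation of their binary "did this group ever link to this domain" vectors. The base-R sketch below computes it and keeps pairs above the 0.5 cut-off. It assumes `domains_shared` has columns `group_name` and `domain`. With {widyr}, a comparable result should come from `pairwise_cor(dplyr::distinct(domains_shared, group_name, domain), group_name, domain)`. The function below is an illustration, not the code behind the graph.

```r
# Phi coefficients between groups from a binary group-by-domain matrix.
# Assumes `domains_shared` has columns group_name and domain.
domain_phi <- function(domains_shared, threshold = 0.5) {
  m <- unclass(table(domains_shared$group_name, domains_shared$domain))
  m <- (m > 0) * 1                    # 1 = group linked to the domain at least once
  phi <- suppressWarnings(cor(t(m)))  # Pearson on 0/1 vectors is the phi coefficient;
                                      # constant rows yield NA and are dropped below
  idx <- which(upper.tri(phi) & phi > threshold, arr.ind = TRUE)
  data.frame(item1 = rownames(phi)[idx[, 1]],
             item2 = colnames(phi)[idx[, 2]],
             correlation = phi[idx],
             stringsAsFactors = FALSE)
}
```

The resulting edge list can be turned into a graph with tidygraph's `as_tbl_graph()` and drawn with {ggraph} layers such as `geom_edge_link()` and `geom_node_point()`.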

This method can quickly help an investigator cluster accounts by behavior (domain-sharing, in this case), but it still requires further examination of the accounts and the content they shared. This may sound familiar to disinformation researchers because it is essentially the ABCs of Disinformation, by Camille François, which focuses on manipulative actors, behaviors, and content.

In the graph, you’ll see many other groups (unlabeled, in gray) that are linked by their domain-sharing behavior, but with lower correlations. For each group, an analyst would need to examine other details, such as: the administrators and their associated information (profile photos, friends, timelines, tagged photos, etc.); other groups managed by the same administrators; the groups’ creation dates; which specific domains linked the groups together; and shared posting patterns within the groups. All of this is needed to determine whether they too could have been inauthentic.

In the TheBL takedown, group administrators showed many signals of inauthenticity, but almost none of those signals were available in digital trace data. (This is well illustrated in the independent analyses by Graphika and the Atlantic Council’s Digital Forensic Research Lab (DFRLab), both of which were able to review a list of accounts prior to Facebook’s takedown.)

Post timestamps, however, are in the data, and using those we can visualize the groups’ temporal “signatures” for posts linking to thebl.com. The graph below shows that some groups had distinct hourly signatures, with some variation in their frequency of posting:

We can also use k-means clustering to try to distinguish groups better by their posting patterns. The {widyr} package has a function, widely_kmeans, that makes this straightforward. First, we start with a dataframe, shown here as trump_frequencies, in which I’ve aggregated the number of hourly posts linking to thebl.com for all groups.
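
Below is one way to build a table shaped like trump_frequencies (group, hour of day, number of posts) from raw posts, sketched in base R. It assumes hypothetical columns `group_name`, `url`, and a POSIXct `posted_at`, and it counts hours in UTC. The time zone used for the original graphs is not stated, and changing it shifts every signature.

```r
# Hourly posting "signature" per group for posts linking to thebl.com.
# Column names and the UTC default are assumptions, not the original code.
hourly_signature <- function(posts, domain_pattern = "thebl\\.com", tz = "UTC") {
  bl <- posts[grepl(domain_pattern, posts$url, ignore.case = TRUE), ]
  # keep all 24 hours as levels so quiet hours appear as zero counts
  hour <- factor(as.integer(format(bl$posted_at, "%H", tz = tz)), levels = 0:23)
  counts <- as.data.frame(table(group_name = bl$group_name, hour = hour),
                          responseName = "n", stringsAsFactors = FALSE)
  counts$hour <- as.integer(counts$hour)  # "0".."23" back to integers
  counts
}
```

Keeping zero-count hours is deliberate. A group's signature is as much about when it is silent as about when it posts.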

We then scale n and use widely_kmeans(), where group_name is our item, hour is our feature, and scaled_n our value. We can inspect one of the clusters and see how well it grouped inauthentic accounts together.
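
The text does not say exactly how n was scaled. This sketch assumes a z-score within each group, so that clusters reflect the shape of a group's daily rhythm rather than its volume. It then clusters the group-by-hour matrix with base R's `kmeans()`, which is the kind of computation `widely_kmeans()` wraps. In {widyr}, the call would look like `widely_kmeans(trump_frequencies, group_name, hour, scaled_n, k = 5)`. The default `k = 5` is an assumption; the text only implies there were at least five clusters.

```r
# Cluster groups by hourly signature; `freq` has columns group_name, hour, n.
cluster_signatures <- function(freq, k = 5, seed = 1) {
  # assumption: z-score n within each group (flat signatures become all zeros)
  freq$scaled_n <- ave(as.numeric(freq$n), freq$group_name, FUN = function(x) {
    s <- sd(x)
    if (is.na(s) || s == 0) x * 0 else (x - mean(x)) / s
  })
  wide <- xtabs(scaled_n ~ group_name + hour, data = freq)  # groups x hours
  set.seed(seed)                      # k-means starts are random
  km <- kmeans(unclass(wide), centers = k, nstart = 25)
  data.frame(group_name = rownames(wide), cluster = factor(km$cluster),
             stringsAsFactors = FALSE)
}
```

Using `nstart = 25` runs several random starts and keeps the best one. This makes the assignment less sensitive to an unlucky initialization.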

We can visualize the temporal signatures again, this time faceted by cluster. This graph shows the signatures of inauthentic groups in red and authentic groups in gray. We can clearly see the similarity of temporal signatures within each cluster, and the inauthentic groups have very distinct signatures that set them apart from the rest. Even so, some inauthentic groups are clustered with many authentic groups (Cluster 5), illustrating the need for manual verification.

Coordinated link sharing behavior

Another method I’ve used comes from {CooRnet}, an R library created by researchers at the University of Urbino Carlo Bo and the IT University of Copenhagen. This method focuses entirely on “coordinated link sharing behavior,” which “refers to a specific coordinated activity performed by a network of Facebook pages, groups and verified public profiles (Facebook public entities) that repeatedly shared the same news articles in a very short time from each other.” The package uses an algorithm to determine the time period in which coordinated link sharing is occurring (or you can specify it yourself) and groups accounts together based on this behavior. It produces some false positives, especially for link-sharing in groups, but it’s a useful tool, and it can be used to look at other kinds of coordinated behavior.
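
To make the idea of coordinated link sharing concrete, here is a deliberately simplified base-R illustration. It is not CooRnet's implementation. It uses a fixed, user-supplied interval instead of estimating one from the data, and a plain repeat count instead of CooRnet's own filtering. The columns `account`, `url`, and a POSIXct `shared_at` are assumptions.

```r
# Pairs of accounts that shared the same URL within `interval` seconds of each
# other, on at least `min_repeats` distinct URLs. Illustration only.
coordinated_pairs <- function(shares, interval = 60, min_repeats = 2) {
  hits <- list()
  for (u in unique(shares$url)) {
    s <- shares[shares$url == u, ]
    if (nrow(s) < 2) next
    secs <- as.numeric(s$shared_at)
    # index pairs (i < j) of shares of this URL that fall within the interval
    near <- which(abs(outer(secs, secs, "-")) <= interval, arr.ind = TRUE)
    near <- near[near[, 1] < near[, 2], , drop = FALSE]
    a <- s$account[near[, 1]]
    b <- s$account[near[, 2]]
    keep <- a != b                     # ignore an account re-sharing its own link
    if (!any(keep)) next
    hits[[length(hits) + 1]] <- data.frame(account1 = pmin(a, b)[keep],
                                           account2 = pmax(a, b)[keep],
                                           url = u, stringsAsFactors = FALSE)
  }
  if (length(hits) == 0)
    return(data.frame(account1 = character(), account2 = character(), n = integer()))
  pairs <- unique(do.call(rbind, hits))  # each URL counts once per pair
  counts <- aggregate(url ~ account1 + account2, data = pairs, FUN = length)
  names(counts)[names(counts) == "url"] <- "n"
  counts[counts$n >= min_repeats, ]
}
```

Requiring repeats across distinct URLs is what separates a pattern from a coincidence. Two accounts posting the same breaking story once within a minute says little; doing it again and again is the signal the method looks for.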

Recently, I adapted code to use {CooRnet}’s get_coord_shares function, the primary way to detect “networks of entities” engaged in coordination, on data from Instagram. A working paper from the Center for Complex Networks and Systems Research lays out a network-based framework for uncovering accounts engaged in coordination, relying on data from content, temporal activity, handle-sharing, and other digital traces.

In one case study the authors propose identifying coordinated accounts using highly similar sequences of hashtags across messages (see Figure 5, from their paper); they theorize that while assets may try to obfuscate their coordination by paraphrasing similar text in messages, “even paraphrased text is likely to include the same hashtags based on the targets of a coordinated campaign.”

Given the content of Instagram messages, I thought this would be a good opportunity to test this method with {CooRnet}. Using CrowdTangle, I retrieved 166,808 Instagram messages mentioning “QAnon” or “wg1wga” since January 1, 2020. I then extracted the sequence of hashtags used in each message of the dataset and removed sequences that had been used 20 times or fewer. This resulted in hashtag sequences that look like this (the QAnon community isn’t exactly known for its brevity):

QAnon WWG1WGA UnitedNotDivided TheGreatAwakening Spexit España EspañaViva VivaEspaña ArribaEspaña NWO Bilderberg Rothschild MakeSpainGreatAgain AnteTodoEspaña Comunismo Marxismo Feminismo Socialismo PSOE UnidasPodemos FaseLibertad Masones Satanismo ObamaGate Pizzagate Pedogate QAnonEspaña DV1VTq qanon wwg1wga darktolight panicindc sheepnomore patriotshavenoskincolor secretspaceprogram thegreatawakeningasleepnomore savethechildren itsallaboutthekids protectthechildren momlife familyiseverything wqke wakeupamerica wakeupsheeple wwg1wga qanon
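
Here is a minimal sketch of the hashtag-sequence extraction described above. It assumes a data frame `messages` with hypothetical `account` and `text` columns, keeps hashtags in their order of appearance, and drops sequences used 20 times or fewer. Whether `[[:alnum:]]` matches accented letters such as the ñ in España depends on the locale R runs in.

```r
# Ordered hashtag sequence per message; keep sequences used more than
# `min_uses` times. Column names are assumptions.
hashtag_sequences <- function(messages, min_uses = 20) {
  tags <- regmatches(messages$text, gregexpr("#[[:alnum:]_]+", messages$text))
  seqs <- vapply(tags, function(x) paste(sub("^#", "", x), collapse = " "),
                 character(1))
  out <- data.frame(account = messages$account, hashtag_seq = seqs,
                    stringsAsFactors = FALSE)
  out <- out[out$hashtag_seq != "", ]         # messages without hashtags
  uses <- table(out$hashtag_seq)
  out[out$hashtag_seq %in% names(uses)[uses > min_uses], ]
}
```

The resulting `hashtag_seq` column can stand in for a shared URL when looking for accounts that post the same sequence close together in time.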
