How Artificial Intelligence and distributed ledgers can help us solve fake news

In this article, I discuss why we should create a global, immutable registry of labeled ‘fake news’ powered by Artificial Intelligence, distributed ledgers, and a global community.

Fake News is a major problem in our connected world. Although misinformation and propaganda have been around for ages, ‘Fake News’ is now becoming a real threat, partly due to the ease of creating, diffusing and consuming content online.

What makes Fake News a hard problem to solve is the difficulty in identifying, tracking and controlling unreliable content. Even when there is early evidence of a fake story being circulated online, removing it or preventing people from sharing it could be perceived as an attempt at intervention and censorship.

People, websites, blogs, and social media are all part of the problem to some extent, intentionally or not. False or misleading stories can be easily created and spread across global online networks with a few clicks, in many cases silently influencing public opinion.

With so-called ‘deepfakes’, it is already extremely difficult to tell whether what you see is true: the latest technologies make it possible to manipulate real videos or create artificial ones, presenting people saying things they never said, in a very realistic way. Moreover, synthesized speech matching the voice of a known person can be used to attribute to them statements they never made.

The times when something was perceived as true just because it was ‘seen on TV’, in a photo, or in a video are gone.

Fake News influences or even shapes public opinion and (re)sets the agenda. It is distributed by platforms and users, both intentionally and unintentionally. ‘Unintentional sharing’ stems from a general lack of awareness of the problem: people do not realize how often they are exposed to Fake News. They don’t know if they are influenced by misleading content, or if they are part of the problem itself by unintentionally sharing Fake News and influencing others.

There are ongoing efforts within news corporations and social media companies to mitigate the problem, and some of them may prove to be somewhat effective. But the fake news problem is bigger: it goes beyond corporate boundaries.

This article describes a global, immutable registry of labeled ‘fake news’ as the basis of a universal solution to the problem, one that sits on top of news organizations, social media and search engines. Utilizing technologies such as Blockchain, IPFS, and Natural Language Processing, the described platform can empower a global network of evaluators who establish continuous feedback, and who label and evaluate a representative random sample of our global content.

The objective of the system is to quantify the problem and raise global awareness by systematically taking snapshots of online content, which are then assessed and labeled by humans. Furthermore, the platform can offer specialized APIs that expose the patterns and knowledge extracted from the ongoing analysis of content, enabling third parties to predict the trustworthiness of new content at ‘publish time’ or ‘share time’.

Fake news: Viral by design

Fake content is designed to be viral. Its creators want it to spread organically and rapidly. Fake stories are engineered to attract attention and trigger emotional reactions so users instantly share the ‘news’ with their social networks. With the right tricks and timing, a false story can go viral in hours. The ‘fake news industry’ takes advantage of the following ‘flaws’ of our online reality:

1. Our online world is set up for ‘attention and instant sharing’: The performance of the global ‘news distribution network’, including social media, news corporations, opinion leaders and influencers, is measured in terms of ‘attention’ and ‘user engagement’, in many cases taking the oversimplified form of a CTR (Click-Through Rate) and sharing statistics. A piece of content with a high CTR will probably make it to the top of social feeds or stay longer on the home page of a news site, regardless of how informative, trustworthy or useful it is.

With this approach to measuring performance, content with fancy photos and ‘over-promising titles’ does extremely well, regardless of the quality of the underlying story (if there is one). Very frequently, a fancy ‘promo card’ for an article with an impressive title is enough for people to start sharing with their friends and networks. This behavior can then lead to viral effects for content with no substance or, even worse, for content with false information and misleading messages.

Instead of the quality and trustworthiness of the content, 'expected performance' is what attracts attention and drives sharing on social media: websites and other online entities rush to reproduce stories that appear to be potentially viral. Then they promote them so they get more traffic and serve more ads, to achieve their ambitious monetization goals.

Content quality is rarely part of KPIs — at least not among the important ones: popular websites set goals on CTRs, page views, social sharing, and related metrics; and when there are complaints about poor content, they simply edit it or remove it.

2. Online users tend to ‘share a lot, easily’: Another aspect of the problem is the massive group of online users who act primarily as distributors and re-sharers of content, without the necessary understanding of, or even a genuine interest in, what they share.

It is sad to realize that in an era characterized by instant access to the world’s knowledge, the majority of online users are ‘passive re-sharers’: they don’t create original content, they just recycle whatever appears to be trendy or likable, with little or no judgment or critical thinking.

Users of this class may consume and circulate fake news — and other types of poor content — and unintentionally become part of the fake news distribution mechanism.

A problem of quantification and awareness

Obviously, there are entities who intentionally drive fake news - to achieve certain political, commercial or other goals. As mentioned above, there is also a massive group of online users (acting as individuals or on behalf of companies) who unintentionally participate in the exponential spread of false stories. In fact, due to a lack of understanding and awareness, many users will probably never realize that they are part of the ‘fake news system’.

Raising the global awareness around Fake News should be a key objective in every serious attempt to tackle the problem: instead of silently removing fake stories when identified and deactivating fake social media accounts, we must systematically measure the circulation of fake news across the globe, and the degree of unintentional participation of media sites and online users.

We need to understand the patterns and share the knowledge we get by continuously analyzing a representative sample of the world’s digital content. We need to create a global registry of enriched content, analyzed and labeled by both humans and AI agents. It will be a global community powered by Artificial Intelligence on top of an immutable, unified content store: a platform powered by a genuine spirit of human collaboration, assisted by our most advanced intelligent technology.

The solution: A global registry of labelled Fake News

The proposed ‘Fake News Evaluation Network’ takes a different approach – it focuses less on the real-time classification of new content and more on a retroactive, large-scale ‘fake news’ analysis with the intent to quantify the problem, extract patterns, and share the derived knowledge. It puts emphasis on measuring the level of responsibility of each of the involved parties with the objective to educate, raise global awareness and influence the Corporate Social Responsibility strategies of online corporations.

Imagine a ‘content sampling’ process running on a daily basis — sampling the global content publishing and sharing activity. Powered by special crawlers, this process ‘listens’ for stories and ‘news’ across a representative set of major websites, social media, and popular blogs. It discovers and organizes ‘fresh content’ and ‘new content references’ into a unified, de-duplicated and immutable content store – specially designed to handle stories, facts, and their associations.
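
As a rough illustration of the ‘unified, de-duplicated and immutable content store’ described above, the following Python sketch derives each item’s identifier from its own content, so identical copies found on different sites collapse into one master copy with several references. The class name `ContentStore`, the normalization rule, and the example URLs are assumptions made for this sketch; a real deployment on IPFS or a similar system would rely on that system’s own content addressing.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial copies hash identically."""
    return re.sub(r"\s+", " ", text).strip().lower()


def content_id(text: str) -> str:
    """Derive an identifier from the content itself (SHA-256 of the normalized text)."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


class ContentStore:
    """Hypothetical append-only store: items are added and referenced, never edited or deleted."""

    def __init__(self) -> None:
        self._items = {}        # content id -> master copy
        self._references = {}   # content id -> list of URLs where it was seen

    def add(self, text: str, source_url: str) -> str:
        cid = content_id(text)
        # The first sighting becomes the master copy; later sightings only add a reference.
        self._items.setdefault(cid, text)
        self._references.setdefault(cid, []).append(source_url)
        return cid

    def references(self, cid: str) -> list:
        return list(self._references[cid])

    def __len__(self) -> int:
        return len(self._items)


store = ContentStore()
a = store.add("Breaking:  Aliens land in Paris", "https://site-a.example/1")
b = store.add("breaking: aliens land in paris", "https://site-b.example/9")
```

Here both sightings resolve to the same identifier, so the store holds one item with two references. Exact hashing only catches near-verbatim copies; rewritten variations of a story need a similarity measure rather than a hash.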

Newly identified content is unified and linked to its ‘master copy’, related ‘stories’ and relevant factual information. It is then compared against the already labeled content, with the objective of estimating the ‘degree of deviation from reality’ using ‘fact-checked versions of the same story’ and known patterns.

Artificial Intelligence adds significant value by identifying the story in the content (the elements of a story such as the named entities, the events, the occasion, the timeline, etc.) and matching the variations found in a large pool of noisy content of various sources and levels of quality.
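
Matching variations of the same story would, in practice, call for proper Natural Language Processing (entity and event extraction, for example). As a deliberately simple stand-in, the sketch below compares two texts by the Jaccard similarity of their word triples. The function names and the choice of three-word shingles are assumptions for illustration only.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word sequences in the text (lowercased, punctuation ignored)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two shingle sets: |intersection| / |union|."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


original = "The mayor announced a new bridge over the river on Monday"
variation = "The mayor announced a new bridge over the river, officials said"
unrelated = "Scientists discover a new species of frog in Peru"

close = jaccard(original, variation)
far = jaccard(original, unrelated)
```

The variation shares most of its word triples with the original, while the unrelated story shares none, so a threshold on this score could be one cheap signal for grouping candidate variations before heavier analysis.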

It then creates lists of stories that need to be evaluated and simplifies the assessment process through intelligent suggestions and recommendations (specifically what needs to be checked within each story).

The global community of professionals and ‘active digital citizens’ discovers, evaluates and votes for/against certain aspects of the story – with proper justification, inline references, and annotations.
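
One possible way to aggregate such votes is sketched below. The `Vote` record, the one-vote-per-evaluator rule, and the minimum of three votes are assumptions chosen for the example, not part of the proposal itself.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    evaluator: str
    claim: str          # the aspect of the story being judged
    supports: bool      # True: the claim holds; False: it does not
    justification: str  # reference or annotation backing the vote


def tally(votes, min_votes=3):
    """Return, per claim, the share of supporting votes, or None if too few votes exist."""
    by_claim = defaultdict(dict)
    for v in votes:
        # One ballot per evaluator per claim; a later vote replaces an earlier one.
        by_claim[v.claim][v.evaluator] = v.supports
    result = {}
    for claim, ballots in by_claim.items():
        if len(ballots) < min_votes:
            result[claim] = None  # not enough evidence yet
        else:
            result[claim] = sum(ballots.values()) / len(ballots)
    return result


votes = [
    Vote("ann", "quote is authentic", False, "speech transcript"),
    Vote("bob", "quote is authentic", False, "video archive"),
    Vote("eve", "quote is authentic", True, "secondary source"),
    Vote("ann", "event took place", True, "official record"),
    Vote("bob", "event took place", True, "press release"),
]
result = tally(votes)
```

In this example the first claim has three ballots and a support share of one third, while the second claim stays undecided because only two evaluators have voted on it.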

As soon as a story gets enough votes and factual checks, AI generalizes the findings to all known variations of the story and different types of coverage, allowing quantification of the reliability of both the core story and its different instances. AI components pick up the patterns and keep monitoring each core story for new facts and events that need to be checked. All of this lives in the immutable content store: labeled content, assessments, publisher scores, and metadata are permanently stored as part of a global history, with no deletions and no ‘phantom’ fake news.
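
The ‘no deletions’ property can be illustrated with a hash-chained log, the basic idea behind distributed ledgers: each entry’s hash covers the previous entry’s hash, so altering or removing an earlier record invalidates everything after it. This is a single-machine sketch under assumed names (`Ledger`, `entry_hash`); it omits the distribution and consensus a real blockchain would add.

```python
import hashlib
import json


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the payload together with the previous entry's hash."""
    body = json.dumps(payload, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class Ledger:
    """Hypothetical append-only log of labels and assessments."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries = []  # list of (payload, hash) pairs

    def append(self, payload: dict) -> str:
        prev = self.entries[-1][1] if self.entries else self.GENESIS
        h = entry_hash(prev, payload)
        self.entries.append((payload, h))
        return h

    def verify(self) -> bool:
        """Recompute the chain; any edited or missing entry breaks it."""
        prev = self.GENESIS
        for payload, h in self.entries:
            if entry_hash(prev, payload) != h:
                return False
            prev = h
        return True


ledger = Ledger()
ledger.append({"story": "aliens-in-paris", "label": "fake"})
ledger.append({"story": "new-bridge", "label": "verified"})
intact = ledger.verify()

# Tampering with an earlier record is detectable.
ledger.entries[0][0]["label"] = "verified"
tampered_ok = ledger.verify()
```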

Social media, news corporations, blogs, and other entities consume the APIs of this platform to self-assess their compliance and progress towards a ‘better content for the world’ mission.

As content is being evaluated in terms of trustworthiness, the reliability of those who produce it, promote it, or distribute it is also affected: a ‘publisher evaluation system’ quantifies how website A or social media B or news corporation C is part of the global fake news problem.
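
A very simple form of such a ‘publisher evaluation system’ might score each publisher by the share of its evaluated stories that were labeled fake. The function below is an assumed, minimal formulation; a real score would likely also weigh reach, corrections, and sample size.

```python
from collections import Counter


def publisher_scores(observations):
    """observations: iterable of (publisher, is_fake) pairs for evaluated stories.

    Returns the fraction of each publisher's evaluated stories labeled fake.
    """
    total, fake = Counter(), Counter()
    for publisher, is_fake in observations:
        total[publisher] += 1
        if is_fake:
            fake[publisher] += 1
    return {p: fake[p] / total[p] for p in total}


observations = [
    ("site-a", True),
    ("site-a", False),
    ("site-b", False),
    ("site-a", True),
]
scores = publisher_scores(observations)
```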

Having this information, media entities can take action, learn and measure the level of their responsibility in spreading fake stories. They can let their users know that certain stories they have shared proved to be false and misleading. They can help the global effort by educating their users and demonstrating real social responsibility and meaningful actions towards a better-informed society.

Companies could also integrate special APIs in order to cross-check content at ‘share time’ and notify their users if the content is already flagged or there are signals for limited trustworthiness (while leaving the sharing decision to the user). Social media could notify users who have already engaged with ‘verified fake news’ stories (liked, shared, saved, commented on, or simply consumed) and explain how to avoid such content in the future.
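
A ‘share time’ check along these lines could look like the sketch below. The registry is modeled as a plain dictionary from content fingerprint to label; the label names, the messages, and the function `share_time_notice` are all hypothetical, since the article does not define the actual API. Note that the function only returns a notice and never blocks the share, in keeping with leaving the decision to the user.

```python
import hashlib
import re
from typing import Optional


def fingerprint(text: str) -> str:
    """SHA-256 of the whitespace-collapsed, lowercased text."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def share_time_notice(text: str, registry: dict) -> Optional[str]:
    """Return a warning for flagged content, or None when nothing is known."""
    label = registry.get(fingerprint(text))
    if label == "verified_fake":
        return "This story has been evaluated and labeled as false. Share anyway?"
    if label == "disputed":
        return "Evaluators have raised doubts about this story. Share anyway?"
    return None


# Hypothetical snapshot of the registry, as a client might cache it.
registry = {fingerprint("Aliens land in Paris"): "verified_fake"}

flagged = share_time_notice("aliens  land in PARIS", registry)
unknown = share_time_notice("Local bakery wins award", registry)
```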

There are countless interesting use cases — including measurements of additional aspects of content quality, global trend analysis and articulation of the dynamics of the phenomenon.

This could be based on a unified content store on top of IPFS or Swarm: an immutable system hosting samples of the world’s content, unified, labeled and scored in terms of trustworthiness and other qualities of digital content.

Comments, thoughts and suggestions on particular technologies that could add value to this solution are welcome. Based on an idea posted on ideachain

References

https://en.wikipedia.org/wiki/1%25_rule_(Internet_culture)

https://en.wikipedia.org/wiki/Deepfake

https://www.youtube.com/watch?v=gLoI9hAX9dw

https://www.wired.com/2017/03/google-and-facebook-cant-just-make-fake-news-disappear/

Images

https://pixabay.com/vectors/george-orwell-eric-blair-1984-3176732/

https://pixabay.com/vectors/fake-news-propaganda-deceit-3801637/

Translated from: https://www.freecodecamp.org/news/a-global-immutable-registry-of-labeled-fake-news/