The Pulse of the Nation: A Twitter Analysis of the 2020 U.S. Presidential Election

Co-authored with Aadit Barua

The low down on 3rd November, 2020

What are the top issues on Twitter users’ minds for the 2020 U.S. presidential election? How do they associate the Democratic nominee, former Vice President Biden, and President Trump with these issues? Is there any difference between the battleground states (Michigan, Pennsylvania and Wisconsin) and the rest of the country in terms of what is important to voters, and how they feel about the candidates? How does this election differ from the one in 2016? What should each candidate focus on between now and 3rd November? These are the questions we addressed in our study using 47,901 tweets from 18,656 unique users in the U.S.A. between July 1 and August 16, 2020 (the day before the start of the Democratic National Convention), and 14,000 tweets from 10,838 unique users in 2016.

The top issues according to our 2020 data are USPS, Russia, rigging/stealing the election, COVID-19, Black Lives Matter (BLM), deaths from COVID-19, democracy, misinformation, racism, and China. While people are associating President Trump quite strongly with many of these issues, the troubling news for him is that the public sentiments are all negative. Curiously, though, the only two issues which people are relating to Mr. Biden are BLM and China, with positive sentiments for BLM, and negative for China. Our analysis supports the line of thinking that this election is not so much about Mr. Biden; rather, as some political pundits have speculated, it is a referendum on Mr. Trump. While Mr. Biden has used healthcare and character traits such as honesty, integrity, empathy, etc. in his campaign ads and speeches, and while Mr. Trump has heavily emphasized law and order, and the performance of the pre-COVID-19 economy under his watch, these are not among the top-10 issues on the minds of Twitter users right now.

Categorizing tweets as coming either from the battleground states of Michigan, Pennsylvania and Wisconsin or from the rest of the country, we find that people in the battleground states mention USPS and BLM as key issues more often than people in the rest of the country. Furthermore, while Vice President Pence is conspicuously absent in the tweets we collected, Democratic Vice-Presidential nominee Senator Kamala Harris is more likely to be mentioned in the battleground states relative to the rest of the country, though user sentiments about her are weakly negative. Sentiments in the battleground states are moderately negative toward Mr. Trump and neutral toward Mr. Biden.

Comparing the 2020 data with data from the same time period in the 2016 presidential election, we find that in 2016 Twitter users were not yet focusing on a well-defined set of issues. Only when we collected tweets from 1st October until 7th November, 2016, did we find a clear set of issues such as FBI, Latino, emails, ISIS (terrorism) and rigging. It is apparent that issues have taken shape quite early for the 2020 election, partly because the pandemic, the BLM movement, the Russia story and others started months ago.

While politics can be fickle, with positions shifting in a very short time, our analysis suggests that, as of today, Mr. Biden can maintain the status quo and not venture into anything new, other than working to improve upon a weakly negative association with China. Mr. Trump, while he has gained the attention of the nation, must overcome the strong negative sentiments associated with him. Mr. Biden’s campaign can also interpret our results as an opportunity for him to make inroads with voters on multiple issues with which he is currently not associated in the public’s eye. Mr. Trump, however, has to carefully weigh the pros and cons of many of the ideas he is promoting, especially the ones related to USPS and mail-in ballots. He also needs to take a hard look at his messages related to BLM in the three crucial battleground states he hopes to win once again in November.

Why Twitter (or more generally, social media)?

Given that multiple national polls are being conducted virtually every week, can tweets provide us with any additional insights? We think so. A poll asks very specific questions the poll designer considers important; e.g., which candidate has stronger moral character? Do you feel President Trump is doing a good job with the economy? However, polls typically do not check if these issues are considered important by the public. Similarly, are there other issues that people care about, but which were not included in the survey? With social media data, we let the public write what is on their mind, and select issues that have been mentioned frequently. There is also a response bias in polls, whereby respondents may provide answers that are socially more acceptable. With 75–80% of Americans using social media, and with ¾ of them also depending on social media for news, analyzing Twitter data can provide additional insights above and beyond traditional polling.

Social media data has its own limitations as well. We don’t know important user attributes such as gender, education, race, age, and political affiliations. Though these can be predicted from what they write, such an exercise will require extensive training data. Our goal is not to predict the outcome of the November election. Rather, we aim to uncover the major issues that are on the public’s mind, and how they associate such issues with the candidates. We believe that our findings will provide inputs to both candidates regarding the strategies and messages they may want to craft during the time leading up to 3rd November. We provide our detailed methodology at the end of the article. Now, onto our main findings.

Share of mentions

Not unexpectedly, Mr. Trump rules the Twitterverse in terms of how frequently he is mentioned by Twitter users, as shown in Figure 1. Vice President Pence, on the other hand, has a negligible share of mentions, while the Democratic VP nominee Senator Kamala Harris appears in a large number of tweets. The recent finalization of her VP nomination could contribute to her visibility, and her team will want to track this number over time.

Figure 1: Share of mentions of candidates and other politicians (nationwide)

Issues on the public’s mind: Nationwide analysis

Figure 2 shows the issues with the highest frequency of mentions in the tweets we collected.

Figure 2: Top issues in 2020 (nationwide)

We chose to keep COVID-19 and COVID-19 deaths as separate issues because it was apparent from the tweets that while most users did not blame President Trump for COVID-19, they had a lot to say about his handling of the pandemic. Based on the strongly negative sentiment we found, Twitter users appear to be upset over the large number of Americans who have died from the disease. The surprise for us in Figure 2 was a relatively low frequency of mentions of certain issues emphasized by the two candidates. These include jobs and the economy, touted by Mr. Trump, or character traits like honesty, integrity and empathy, attributes that Mr. Biden has been promoting. Similarly, the stock market and clean energy, favorite topics of Mr. Trump and Mr. Biden respectively, hardly got any mentions. Next, we analyzed how Twitter users are relating these issues to the two candidates using a data analytics metric called lift (see methodology section at the end of the article). When there is some association between a candidate and an issue, we conducted a sentiment analysis as well. The results are shown in Table 1 and Figure 3.

Table 1: Associations and sentiments for election issues and candidates

A blank cell in Table 1 means that users do not associate a candidate with an issue (e.g., Biden and COVID-19). Figure 3 provides a visualization of the results in Table 1.

Figure 3: Issues, candidates and their associations in the public’s minds

Clipart was taken from pixabay.com (royalty free and commercial usage is allowed), while the candidate photos were obtained from Wikipedia.com and Wikimedia.com (labeled for reuse). The USPS logo from USPS.com is being used under the assumption of Fair Use.

What stands out in Table 1 and Figure 3 for Mr. Biden is that Twitter users only associate him with BLM and China. While the Biden campaign will be happy to see no negative association between him and issues like Russia, rigging, misinformation, death and racism, he and his team have the opportunity to work on issues such as USPS, COVID-19, and democracy, where he can project himself as the leader who can guide the country through this time of crisis. In contrast to Mr. Biden, Mr. Trump has a high association with many of the issues, but the sentiments range from weakly to strongly negative.

The battle of the three swing states

It has been argued that Michigan, Pennsylvania and Wisconsin will, as they did in 2016, play a crucial role again this November. We found no difference between Mr. Biden and Mr. Trump in terms of who is more likely to be mentioned in a tweet from these states compared to the rest of the country. However, Democratic VP nominee Kamala Harris is significantly more likely to be mentioned in the battleground states relative to the rest of the country. That being said, the sentiment associated with Senator Harris in the battleground states is weakly negative. We also note that the strong likelihood of her being mentioned is probably due to her recent selection as Mr. Biden’s running mate. According to our analysis, Twitter users from the battleground states have a moderately negative sentiment toward Mr. Trump, while they are neutral toward Mr. Biden, as shown in Figure 4. As with the rest of the country, Vice President Pence has an extremely low frequency of mentions in tweets from the battleground states.

The major issues that battleground state users are likely to emphasize are also shown in Figure 4. Given the importance of these states in the election, both campaigns will want to study these issues, assess what users are saying, and devise messages and strategies accordingly.

Figure 4: The scene in the three battleground states

Throwback to 2016

What was it like at this time in 2016? We initially collected tweets from July 1, 2016 until August 16, 2016; however, we did not find a set of well-defined issues. Eventually we got tweets from October 1, 2016 until 7th November, 2016, to obtain a set of election issues, as shown in Figure 5. It is evident that FBI and Latinos were key issues, along with Democratic nominee Hillary Clinton’s emails, ISIS and concerns of rigging.

Figure 5: Issues from the 2016 U.S. Presidential election

Our 2 cents worth of advice for the candidates

The lack of association we found between Mr. Biden and most of the election issues, along with the high association of the same issues with Mr. Trump, supports the idea that this election is more of the public’s verdict on Mr. Trump than about Mr. Biden. However, Mr. Biden and his team can put in the effort to obtain a larger mindshare of multiple key issues, to contrast himself with Mr. Trump. Given that healthcare is not getting much traction among the public, Mr. Biden can emphasize its importance and his approach to dealing with rising healthcare costs to distinguish himself.

Mr. Trump appears to have quite a challenge ahead of him in overcoming negative sentiments in every issue he is associated with in the public’s mind. He and his team should take note that the battleground states consider USPS and BLM to be even more important compared to the rest of America, and carefully craft and communicate his messages regarding these issues. USPS has quickly come into the limelight; Mr. Trump is not faring well in this matter, and needs to rethink his strategy. Vice President Pence has very few mentions, and the Trump campaign has to decide whether Mr. Pence should be more visible, especially given the high profile of Senator Harris.

The 2020 election appears to be a totally different ballgame compared to that in 2016, with top issues getting defined in the public’s mind well in advance. As we have noted, issues and opinions can change rapidly in a volatile political environment, and therefore we urge both campaigns to replicate our analysis on a regular basis to keep checking the pulse of the nation in Twitter space and other social media.

Methodology

Getting tweets

We obtained tweets from July 1 until August 16, 2020, as well as those from 2016, using GetOldTweets3 in Python, searching for “2020 (or 2016) U.S. Presidential Election”. After removing duplicates, we got 33,119 tweets (nationwide) for 2020, and 14,000 for 2016. GetOldTweets3 does not provide the registered location of the users. While it has a feature which allows a location and distance to be specified (e.g., --near “Michigan” --within 200mi), we considered it to be unreliable. For location data, we used the Twitter API, which provides a maximum of 3000 tweets per search and only reaches back 3–4 days. We used the API multiple times, and obtained 14,782 tweets for 2020 after removing duplicates.

Real users or bots?

It is reported that Twitter has been actively deleting or blocking accounts it considers to be bots. Still, we ran the list of users in our data through a free online bot detection service. Approximately 6% of the users were strongly predicted to be bots, which we deleted from the data set.

Word replacements

Since different words can be used to mean the same thing, we performed a word frequency analysis, and did a set of word replacements. For instance, post office, DeJoy, mailin, mail, etc. were replaced by USPS, while Covid, coronavirus and pandemic were replaced by COVID-19. This step makes it easier to reduce the redundancy between issues.
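The replacement step can be sketched in a few lines of Python. This is only an illustration, not the code used in the study: the map below contains the examples named above plus a few spelling variants ("mail-in", "covid-19", "covid19") that we assume would also need handling, and the names REPLACEMENTS and normalize are hypothetical.

```python
import re

# Hypothetical replacement map. The variants come from the examples in the
# text, plus assumed spellings ("mail-in", "covid-19", "covid19") so that
# already-canonical text is not mangled. The study's full list is not given.
REPLACEMENTS = {
    "USPS": ["post office", "dejoy", "mail-in", "mailin", "mail"],
    "COVID-19": ["coronavirus", "covid-19", "covid19", "pandemic", "covid"],
}


def build_pattern(variants):
    """Compile one case-insensitive, whole-word pattern for a list of variants."""
    # Longest variants first, so "mail-in" is tried before "mail".
    ordered = sorted(variants, key=len, reverse=True)
    alternatives = "|".join(re.escape(v) for v in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


PATTERNS = {canonical: build_pattern(v) for canonical, v in REPLACEMENTS.items()}


def normalize(tweet):
    """Replace every known variant in the tweet with its canonical issue name."""
    for canonical, pattern in PATTERNS.items():
        tweet = pattern.sub(canonical, tweet)
    return tweet


print(normalize("DeJoy slows the post office during the coronavirus pandemic"))
```

Matching on word boundaries keeps unrelated words such as "email" untouched, which matters here because emails were a separate issue in the 2016 data.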

Lift: A metric for word associations

While percentages are easily understood and widely reported, we use a metric from data analytics called lift, through which we capture how strongly the public relates or associates two things such as an issue and a candidate. The formula for lift (or association in this case) is given by

lift(issue, candidate) = P(issue and candidate mentioned together) / (P(issue mentioned) * P(candidate mentioned))

where each probability is the share of tweets in which that mention occurs.

A tweet, “President Trump has managed the economy well” is an example of both an issue (economy) and candidate (Trump) being mentioned in a post. By contrast, a tweet, “in this election, it is the economy that matters” will count toward the mention of an issue (economy), but not a candidate. So, let us say we collect 10k tweets, and that Mr. Trump and the economy appear together in 4k tweets. Furthermore, Mr. Trump is mentioned in 7k tweets, and the economy gets mentioned in 5k tweets. Now we can do the arithmetic:

Probability of economy and Trump being co-mentioned = 4k/10k = .4

Probability of economy being mentioned = 5k/10k = .5

Probability of Trump being mentioned = 7k/10k = .7

So lift(economy, Trump) = .4 / (.5*.7) = 1.14
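The same arithmetic can be written as a small Python sketch. The function names lift and lift_from_tweets are our own, and the simple substring matching in lift_from_tweets is an assumption made for illustration; it is not necessarily how mentions were counted in the study.

```python
def lift(n_total, n_issue, n_candidate, n_both):
    """Lift between an issue and a candidate, computed from tweet counts."""
    if n_total <= 0 or n_issue <= 0 or n_candidate <= 0:
        raise ValueError("counts must be positive to compute lift")
    p_both = n_both / n_total            # share of tweets mentioning both
    p_issue = n_issue / n_total          # share mentioning the issue
    p_candidate = n_candidate / n_total  # share mentioning the candidate
    return p_both / (p_issue * p_candidate)


def lift_from_tweets(tweets, issue, candidate):
    """Count mentions by case-insensitive substring match, then compute lift."""
    issue, candidate = issue.lower(), candidate.lower()
    has_issue = [issue in t.lower() for t in tweets]
    has_candidate = [candidate in t.lower() for t in tweets]
    n_both = sum(i and c for i, c in zip(has_issue, has_candidate))
    return lift(len(tweets), sum(has_issue), sum(has_candidate), n_both)


# The worked example from the text: 10k tweets, economy in 5k, Trump in 7k,
# both together in 4k.
print(round(lift(10_000, 5_000, 7_000, 4_000), 2))  # 1.14
```

Working from raw counts rather than pre-computed percentages keeps the three probabilities on the same denominator, which is what the formula requires.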

A lift value greater than 1 means that when people think about the economy, they also think of Mr. Trump, and vice versa. That is, the economy and Mr. Trump are associated in their minds; the higher the lift value, the stronger the association. To make it easy to interpret our results, we use the following ranges:

· Weak association: Lift value greater than 1 but less than 1.4.

· Moderate association: Lift value equal to or greater than 1.4, but less than 1.75.

· Strong association: Lift value equal to or greater than 1.75
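These ranges translate directly into a lookup helper. The sketch below uses a hypothetical function name, association_strength, and returns "none" for lift values of 1 or less, which matches how we treat such values later in this section.

```python
def association_strength(lift_value):
    """Map a lift value to the association label used in this article."""
    if lift_value >= 1.75:
        return "strong"
    if lift_value >= 1.4:
        return "moderate"
    if lift_value > 1:
        return "weak"
    return "none"  # lift of 1 or less: no association


for value in (1.0, 1.14, 1.4, 1.75):
    print(value, association_strength(value))
```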

A somewhat nerdy but statistically correct way of thinking about lift is that it tells us how strong the association is relative to the case where people do not connect the issue and the candidate (independent events). For example, a lift value of 1.14 between the economy and Mr. Trump should be interpreted as follows: There is a 14% higher chance — a significant, but weak association — that we will see economy and Mr. Trump being mentioned in a tweet relative to what we would have expected if Twitter users did not consider them as related.

Why lift?

So why use lift values, which some may find difficult to understand or interpret? To appreciate the importance of lift over simple percentages, let us keep all but one of the numbers in the example above the same. Let us change the number of tweets mentioning Mr. Trump to 8k. Now the Lift(economy, Trump) becomes .4/(.5*.8) = 1. So, in this case, even though 40% of the tweets mentioning both economy and Mr. Trump may look like an impressive number, there is actually NO association between the economy and Mr. Trump in people’s minds. If we are interested in answering the question — when voters think of the economy, do they think of Mr. Trump (and vice versa) — only lift, and NOT percentages, can give us the correct answer.

A lift value of 1 or less means that there is no association between a state and a candidate. Of course, lift values do not tell us if users are talking positively or negatively about a candidate, which has to be determined by sentiment analysis. A high lift value along with a positive sentiment is the best thing a candidate can hope for, and a high lift value with a negative sentiment is what a candidate would fear the most.

Sentiment analysis

Automated (unsupervised) sentiment analysis is widely used in natural language processing and text analytics, but is tricky, and can lead to misleading results. Consider two tweets in our data, which received a lot of retweets:

“China will help Biden win the 2020 election.”

“To win the election Trump is spreading misinformation about peaceful black protestors.”

Both tweets have a negative sentiment, the first one involving Mr. Biden and China, and the second linking Mr. Trump to misinformation. However, passing these tweets through a sentiment analyzer incorrectly results in positive sentiment scores. After all, help and win in the first tweet are positive words, when considered without this specific context. In the second tweet, the words win and peaceful are positive words, which dominate the negative word misinformation. To deal with these issues, we carefully adjusted the weights of frequently occurring words for each candidate and each issue. Then we manually checked the sentiment scores for most retweeted tweets to ensure that the automated sentiment analyzer’s scores were accurate and consistent.
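To make the problem and the fix concrete, here is a toy lexicon-based scorer in Python. It is a sketch under strong assumptions and not the analyzer we used: the word weights, the override values, and the names BASE_LEXICON, CONTEXT_OVERRIDES and sentiment_score are all hypothetical, chosen only to reproduce the behavior described above on the two example tweets.

```python
import re

# Hypothetical generic weights: positive words > 0, negative words < 0.
BASE_LEXICON = {"help": 1.0, "win": 1.0, "peaceful": 1.0, "misinformation": -1.0}

# Hypothetical per-(candidate, issue) weight adjustments.
CONTEXT_OVERRIDES = {
    ("biden", "china"): {"help": -1.0, "win": 0.0},
    ("trump", "misinformation"): {"win": 0.0, "peaceful": 0.0, "misinformation": -2.0},
}


def sentiment_score(tweet, candidate=None, issue=None):
    """Sum word weights; apply context overrides when a candidate and issue are given."""
    lexicon = dict(BASE_LEXICON)
    if candidate and issue:
        key = (candidate.lower(), issue.lower())
        lexicon.update(CONTEXT_OVERRIDES.get(key, {}))
    words = re.findall(r"[a-z]+", tweet.lower())
    return sum(lexicon.get(word, 0.0) for word in words)


tweet_1 = "China will help Biden win the 2020 election."
tweet_2 = "To win the election Trump is spreading misinformation about peaceful black protestors."

print(sentiment_score(tweet_1))                           # generic weights
print(sentiment_score(tweet_1, "Biden", "China"))         # adjusted weights
print(sentiment_score(tweet_2))                           # generic weights
print(sentiment_score(tweet_2, "Trump", "misinformation"))  # adjusted weights
```

With the generic weights both tweets score above zero; with the context-specific weights both score below zero, which is the direction a human reader would assign.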

Translated from: https://medium.com/swlh/the-pulse-of-the-nation-a-twitter-analysis-of-the-2020-u-s-presidential-election-15f9afa86ec7
