Emotion, Event detection

Emotion Detection
- Second order (derived) emotions:
- Sentiment vs emotion:
- Emotion detection
  - Problems in data collection
  - Hashtag-based data collection
  - Emoticons
- Overall pipeline
Event Detection
- Supplementary
- Usefulness of event detection
- Topic Detection and Tracking (TDT)
  - Basic TDT Clustering Approach
- Event Detection
  - Event
  - Current approaches
  - ENTITY-BASED Event detection
    - Entity
  - Evaluation of ENTITY-BASED Event detection

Emotion Detection

What triggers emotions?
stimulus event

external
- natural phenomena
- other people’s behavior
internal
- Neuroendocrine or Physiological Changes
- memories

Universal emotion categories:
anger, disgust, fear, happiness, sadness and surprise.
These are also the basic (primary) emotions.
They can be reduced to 4 categories:

happiness, sadness, fear/surprise, and anger/disgust

Second order (derived) emotions:

Emotional states that are not so basic, like chagrin, irritation

Sentiment vs emotion:

Sentiments can be formed and retained for a longer time
Emotions last for a shorter time
Sentiments are target-centred, hence directed
Emotions are not target-centred
A text can have multiple emotions

Emotion detection

e.g. '7 dead in apartment building fire'
Anger 10%, disgust 1%, fear 30%, joy 0%, sadness 50%, surprise 5%

Problems in data collection

Uncertainty, incompleteness and even mistakes in the ground-truth labels, caused by the annotators' level of expertise or the difficulty of the task

Hashtag-based data collection

Direct access to the user's intent
List emotion hashtags for 28 affect words, or extend the list with WordNet synsets
Collect tweets that contain one or more hashtags from the defined list of emotion hashtags
Consider only tweets with hashtags
Score each tweet based on its matching hashtags
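
The collection steps above can be sketched roughly as follows. This is only an illustration: the `EMOTION_HASHTAGS` seed list, the function names `score_tweet` and `collect`, and the "one point per matching hashtag" scoring are all assumptions, since the notes do not give the actual hashtag list or scoring rule.

```python
import re

# Hypothetical seed list (emotion -> hashtags). A real list would be built
# from the 28 affect words and optionally extended with WordNet synsets.
EMOTION_HASHTAGS = {
    "joy": {"#happy", "#joy"},
    "sadness": {"#sad", "#heartbroken"},
    "anger": {"#angry", "#furious"},
    "fear": {"#scared", "#afraid"},
}

HASHTAG_RE = re.compile(r"#\w+")


def score_tweet(text):
    """Return {emotion: number of matching hashtags} for one tweet."""
    tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
    scores = {}
    for emotion, seeds in EMOTION_HASHTAGS.items():
        hits = len(tags & seeds)
        if hits:
            scores[emotion] = hits
    return scores


def collect(tweets):
    """Keep only tweets with at least one emotion hashtag, with their scores."""
    labelled = []
    for text in tweets:
        scores = score_tweet(text)
        if scores:  # tweets without a listed hashtag are discarded
            labelled.append((text, scores))
    return labelled
```

For example, `collect(["So #happy today! #joy", "No tags here"])` keeps only the first tweet, with a joy score of 2.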

Emoticons

Symbols that express mood, for example:

https://emojipedia.org/twitter/

List of emoticons and their associations to eight emotions
Annotate a tweet into a category if the matching emoticon appears at the end of the tweet
Sometimes there are conflicts (e.g. happy + sad). In that case, compare the scores from the emotion word lexicon, the hashtag lexicon and the emoticon lexicon, and assign the tweet to the category with the highest score
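
A minimal sketch of the emoticon rule and the conflict resolution, under stated assumptions: the `EMOTICON_EMOTION` table is a made-up four-entry stand-in for the real emoticon list, and `resolve` reads "highest score wins" as the single highest emotion score across the three lexicons, which is one possible interpretation of the notes.

```python
# Hypothetical emoticon table; the real list maps emoticons to eight emotions.
EMOTICON_EMOTION = {":)": "joy", ":D": "joy", ":(": "sadness", ">:(": "anger"}


def emoticon_label(text):
    """Return the emotion of the emoticon at the end of the tweet, or None."""
    stripped = text.rstrip()
    # Try longer emoticons first so that '>:(' is not mistaken for ':('.
    for emoticon in sorted(EMOTICON_EMOTION, key=len, reverse=True):
        if stripped.endswith(emoticon):
            return EMOTICON_EMOTION[emoticon]
    return None


def resolve(word_scores, hashtag_scores, emoticon_scores):
    """Pick the emotion with the highest score across the three lexicons."""
    best_emotion, best_score = None, 0.0
    for scores in (word_scores, hashtag_scores, emoticon_scores):
        for emotion, score in scores.items():
            if score > best_score:
                best_emotion, best_score = emotion, score
    return best_emotion
```

With `resolve({"joy": 0.4}, {"sadness": 0.7}, {"joy": 0.5})` the hashtag lexicon has the highest score, so the tweet is labelled sadness.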

Overall pipeline

Event Detection

Supplementary

Tokenization, lemmatization, stopword removal
Named Entity Recognition
i) Detect a named entity
ii) Categorize the entity
- Person
- Organization
- Time
- Location
Part-of-Speech Tagging
Word sense disambiguation
Textual Entailment
Extract a directional relation between text fragments
If you help the needy, God will reward you →
- Giving money to a poor man has good consequences
- Giving money to a poor man has no consequences
- Giving money to a poor man will make you a better person
Automatic summarization
Extract a readable summary from text (news, articles, documents…)
Sentiment Analysis
Extract subjective polarity from documents: positive, negative, or neutral
Vector-Space Representation of Documents
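Term-document matrix (TDM) or document-term matrix (DTM)
Wikipedia
https://zh.wikipedia.org/wiki/%E5%80%92%E6%8E%92%E7%B4%A2%E5%BC%95

"For the same texts, we get the following full inverted index, made up of pairs of document number and word position within that document. Both document numbers and word positions start from zero. So "banana": {(2, 3)} means that "banana" is in the third document ($T_2$), and that it is the fourth word in that document (position 3)."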
Inverted File / Index

An Inverted File (or Inverted Index) is a document-term matrix representation that has been "inverted", so that rows become columns and columns become rows
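
As an illustration, a full (positional) inverted index can be built in a few lines. The three sample texts are an assumption, chosen so that the result reproduces the "banana": {(2, 3)} entry from the Wikipedia quote; `build_inverted_index` is a name made up for this sketch, and tokenization is plain whitespace splitting.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to a list of (document number, word position) pairs."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):          # document numbers start at 0
        for position, word in enumerate(text.lower().split()):
            index[word].append((doc_id, position))  # positions start at 0
    return dict(index)


# Assumed sample texts T0, T1, T2.
docs = ["it is what it is", "what is it", "it is a banana"]
index = build_inverted_index(docs)
print(index["banana"])  # "banana" is the 4th word of the 3rd document
print(index["what"])
```

Looking up a word is then a single dictionary access rather than a scan over every document.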

TF-IDF
Importance of a term to a document in a collection (corpus):
- Term Frequency (TF): the number of times a term appears in a document.
- Inverse Document Frequency (IDF): measures whether a term is common or rare across all documents.
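
A small sketch of how the two parts combine. It assumes the common weighting tf × log(N / df), where N is the number of documents and df is the number of documents containing the term; the notes do not say which TF-IDF variant the course uses, and `tf_idf` is a name chosen for this example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw count of each term in this document
        weights.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return weights
```

A term that appears in every document gets weight 0 (log 1), while a term unique to one document gets the largest weight.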

Usefulness of event detection

There is a delay between a new event appearing on Twitter and the same event being updated on Wikipedia

On average, Twitter is about 2 hours ahead of Wikipedia
Twitter's real-time nature has allowed it to be used as a co-ordination tool by protestors and demonstrators
Event detection & stock movements
A tool that automatically detects, tracks and organizes these events would be valuable to journalists, finance (equities, forex, commodities, even cryptocurrencies), and security and intelligence services

Topic Detection and Tracking (TDT)

Monitoring broadcast news
Newswire documents
Almost all systems used a version of online, nearest neighbor clustering

Basic TDT Clustering Approach

For each new article:
Compare it to every news article that has been seen before, using cosine similarity
If a similar article was found (i.e. they discuss the same event):
- Add the new article to the same cluster as the most similar article
If no similar article was found (i.e. it's a new event):
- Create a new cluster and add the new article to it

Cosine Similarity
$$sim(A,B) = \cos(\theta) = \frac{A \cdot B}{||A||\,||B||}$$
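
The basic TDT approach, together with the cosine similarity formula, can be sketched as below. This is a simplified illustration rather than any particular TDT system: documents are raw term-count vectors, and the similarity `threshold` of 0.5 and the function names are assumptions.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two sparse term-count vectors (dicts)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def online_cluster(articles, threshold=0.5):
    """Online nearest-neighbor clustering; returns lists of article indices."""
    seen = []      # (vector, cluster id) for every article processed so far
    clusters = []  # clusters[cid] = indices of the articles in that cluster
    for i, text in enumerate(articles):
        vec = Counter(text.lower().split())
        best_sim, best_cluster = 0.0, None
        for other_vec, cid in seen:  # compare to every earlier article
            sim = cosine(vec, other_vec)
            if sim > best_sim:
                best_sim, best_cluster = sim, cid
        if best_cluster is not None and best_sim >= threshold:
            cid = best_cluster       # same event as the most similar article
            clusters[cid].append(i)
        else:
            clusters.append([i])     # no similar article: a new event
            cid = len(clusters) - 1
        seen.append((vec, cid))
    return clusters
```

Every new article is compared against all earlier ones, so the cost per article grows with the size of the collection.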

Issues with the TDT approach

Clustering
- Clustering is slow
- Cluster quality is uneven: some clusters are bad, others are not that bad
- Clusters keep growing
- Results depend on the order of documents
Assumes all content is newsworthy

Event Detection

Event

An event is a significant thing that happens at some specific time and place.
Issues

Insignificant and mundane events
- Not newsworthy
- Fragmented events

How to solve
Entity-based approach (McMinn et al.)

Aggressive filtering of mundane events
A more structured approach to event detection
An event can contain several entities and topics
Reduces the likelihood that a single real-world event is detected as several separate events

Current approaches

Monitor events in real time
Locality Sensitive Hashing (LSH): scalable, real-time event detection, proposed by Petrovic et al.
- Places similar documents into the same buckets of a hash table
- Finds the nearest neighbor with high probability
- Clustering can then be done in real time with a variance reduction technique
- LSH can be thought of as replacing the inverted index shown in the pseudocode from earlier: it does the same job as the inverted index, but is much more efficient because it reduces the number of comparisons that need to be made
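
To make the bucketing idea concrete, here is a small sketch using random-hyperplane hashing, one common LSH family for cosine similarity. Whether this matches the cited system in detail is not stated in these notes, so treat it as an assumption; the names `hyperplane_component`, `signature` and `LSHIndex`, the single hash table and the 8-bit signature are all choices made for this example.

```python
import random
from collections import Counter, defaultdict


def hyperplane_component(term, bit, seed=0):
    """Deterministic Gaussian weight of `term` on random hyperplane `bit`."""
    return random.Random(f"{seed}:{bit}:{term}").gauss(0.0, 1.0)


def signature(vec, n_bits=8, seed=0):
    """One bit per hyperplane: which side of it the document vector falls on."""
    bits = []
    for bit in range(n_bits):
        dot = sum(w * hyperplane_component(t, bit, seed) for t, w in vec.items())
        bits.append("1" if dot >= 0 else "0")
    return "".join(bits)


class LSHIndex:
    """Single hash table; documents with equal signatures share a bucket."""

    def __init__(self, n_bits=8, seed=0):
        self.n_bits = n_bits
        self.seed = seed
        self.buckets = defaultdict(list)

    def add(self, doc_id, text):
        """Insert a document and return the earlier documents in its bucket."""
        vec = Counter(text.lower().split())
        key = signature(vec, self.n_bits, self.seed)
        candidates = list(self.buckets[key])
        self.buckets[key].append(doc_id)
        return candidates
```

Only the candidates returned by `add` need an exact cosine comparison, which is where the saving over comparing against every earlier document comes from. Using several tables with different seeds would raise the chance that true neighbors collide at least once.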

ENTITY-BASED Event detection

Entity

A thing with distinct and independent existence.
An entity is any singular, identifiable and separate object. For example, a particular person, organization or location.

Entities have previously been used for:

Sports event detection
Domain knowledge exploited

Pipeline

Pre-processing
- Remove noise and redundant tweets (filtering)
- Filter out unwanted tweets such as advertisements
- Parsing and tagging: Part-of-Speech (POS) tagging and Named Entity Recognition (NER)
- Remove retweets
Clustering
- The most similar tweets will always discuss the same entities
- Each tweet is added to an inverted index for every named entity it contains, so for each entity we can pull out the tweets that mention it
- Tweets are clustered on a per-entity basis
Burst detection
- Monitor entity frequency over windows of 5, 10, 20, 40, 80, 160 and 320 minutes
- Drop tweets older than 320 minutes (just over five hours): "A ~6 hour old tweet isn't much use in a breaking news situation."
- Use the three-sigma rule to detect positive outliers (bursts): a frequency more than three standard deviations above the mean
Event creation
- Once a burst is detected, stop updating the entity's frequency statistics
- Keep them frozen until the entity's frequency drops back below the burst value (+1 and 1/2 the window length)
Cluster selection
- Find clusters that represent a new topic or a change in topic around the entity, likely related to whatever caused the burst
- Use centroid times (the mean time of all documents in a cluster) to identify event clusters and add them to events
- Filter out older clusters (likely to be noise or background topics) and smaller clusters (fewer than 10 tweets)
- If no clusters can be found, the burst was probably caused by random noise or a background topic
Event merging
- Many events are about multiple entities, so we need to identify these links and combine the separate events into one
- Only a one-way relationship is needed: small events can link themselves to larger events
- Split person names to improve effectiveness: "Barack Obama" → "barack" and "obama"
- Check for possible merges after every tweet, and merge recursively
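
The burst detection step can be illustrated with a short sketch of the three-sigma rule over per-window mention counts for one entity. The names `is_burst` and `detect_bursts`, the `warmup` of five windows and the use of the population standard deviation are assumptions made for this example; freezing the statistics during a burst follows the event creation step, but the exact resume condition in the notes is not modelled here.

```python
import statistics


def is_burst(history, current):
    """Three-sigma rule: is `current` above mean + 3 * std of `history`?"""
    if len(history) < 2:
        return False
    mean = statistics.mean(history)
    std = statistics.pstdev(history)
    return current > mean + 3 * std


def detect_bursts(counts, warmup=5):
    """Return the indices of windows whose entity count is a positive outlier."""
    bursts = []
    history = []  # counts used to estimate the entity's normal frequency
    for i, count in enumerate(counts):
        if len(history) >= warmup and is_burst(history, count):
            bursts.append(i)
            # Statistics stay frozen: bursting windows are not added to history.
        else:
            history.append(count)
    return bursts
```

For the counts `[10, 12, 11, 9, 10, 11, 50, 48, 10]`, windows 6 and 7 are flagged and the final window is treated as normal again.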

Evaluation of ENTITY-BASED Event detection

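Precision
The fraction of retrieved documents that are relevant to the query
$$precision = \frac{A}{B}$$
where A is the number of relevant documents retrieved and B is the total number of documents retrieved
Recall
The fraction of the relevant documents that are successfully retrieved
$$Recall = \frac{A}{R}$$
where R is the total number of relevant documents
F Measure
Weighted harmonic mean of precision and recall
$$f = \frac{1}{\alpha(1/P) + (1-\alpha)(1/R)}$$
In this formula P and R stand for precision and recall (R here is not the document count used in the recall formula); α = 1/2 weights the two equally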
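
The three measures can be computed directly from sets of retrieved and relevant items. This is a small sketch; the function name `precision_recall_f` and the choice to return 0 for the F measure when precision or recall is 0 are assumptions made for this example.

```python
def precision_recall_f(retrieved, relevant, alpha=0.5):
    """Return (precision, recall, F) for sets of retrieved and relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    a = len(retrieved & relevant)  # relevant items that were retrieved
    precision = a / len(retrieved) if retrieved else 0.0
    recall = a / len(relevant) if relevant else 0.0
    if precision == 0.0 or recall == 0.0:
        return precision, recall, 0.0  # avoid dividing by zero
    f = 1.0 / (alpha / precision + (1 - alpha) / recall)
    return precision, recall, f


# 4 items retrieved, 3 relevant, 2 in common.
p, r, f = precision_recall_f({1, 2, 3, 4}, {2, 3, 5})
```

With α = 1/2 the formula reduces to the familiar balanced form 2PR / (P + R).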

LSH: Locality Sensitive Hashing
CS: Cluster Summarization
