诸神缄默不语: personal CSDN blog post index

Paper title: Gender and Racial Stereotype Detection in Legal Opinion Word Embeddings
ArXiv download link: https://arxiv.org/abs/2203.13369
Official AAAI preprint download link: https://www.aaai.org/AAAI22Papers/AISI-10870.MatthewsS.pdf
Official presentation video: https://aaai-2022.virtualchair.net/poster_aisi10870 (the author speaks English at a rapid-fire pace)

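This is an AAAI 2022 paper on fairness in machine learning. It tests for gender and racial stereotypes (bias) in word embeddings trained on judicial opinions (legal opinions) from U.S. case law.
The paper focuses on historical and representation bias. Its experiments show that the historical factor (the era of the cases used to train the embeddings) is not the main driver of the bias.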

Detecting fairness problems feels odd to me: it takes more manual work than machine work.

Contents

1. Background
2. Challenges and corresponding solutions
3. Code reproduction
4. Other fairness-related practices

1. Background

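The Implicit Association Test (IAT) measures human participants' reaction times when they sort target words (flowers or insects) together with attribute terms (pleasant or unpleasant).

The Word Embedding Association Test (WEAT) measures how strongly groups of target words (e.g., split by gender) are associated with attribute terms (e.g., positive or negative sentiment), for instance whether the embeddings of male-related words lie closer to the embeddings of positive-sentiment words. Concretely, it measures the association (cosine similarity) between the embeddings of two sets of target words (e.g., typical male first names and typical female first names) and the embeddings of attribute terms (e.g., pleasant ones such as "love" and "peace", or unpleasant ones such as "ugly" and "hatred").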
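To make the WEAT measurement concrete, here is a minimal sketch of the standard WEAT effect size (difference of mean associations, divided by the standard deviation over all target words). The 2-D vectors and the word lists are made up for illustration and say nothing about real names; the function names are my own, not from the paper.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(word, attrs_a, attrs_b, emb):
    """s(w, A, B): mean similarity to attribute set A minus mean similarity to B."""
    sim_a = mean(cosine(emb[word], emb[a]) for a in attrs_a)
    sim_b = mean(cosine(emb[word], emb[b]) for b in attrs_b)
    return sim_a - sim_b


def weat_effect_size(targets_x, targets_y, attrs_a, attrs_b, emb):
    """Difference of mean associations of X and Y, scaled by the sample
    standard deviation of the associations over all target words."""
    s_x = [association(x, attrs_a, attrs_b, emb) for x in targets_x]
    s_y = [association(y, attrs_a, attrs_b, emb) for y in targets_y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Made-up toy vectors (assumption): deliberately skewed so the test has
# something to detect. Real use would load trained word2vec vectors.
TOY_EMB = {
    "john": (1.0, 0.1), "paul": (0.9, 0.3),
    "amy": (0.2, 1.0), "lisa": (0.1, 0.8),
    "love": (1.0, 0.0), "peace": (0.9, 0.2),
    "ugly": (0.0, 1.0), "hatred": (0.2, 0.9),
}
MALE, FEMALE = ["john", "paul"], ["amy", "lisa"]
PLEASANT, UNPLEASANT = ["love", "peace"], ["ugly", "hatred"]

effect = weat_effect_size(MALE, FEMALE, PLEASANT, UNPLEASANT, TOY_EMB)
print(f"WEAT effect size on toy data: {effect:.3f}")
```

A positive value means the first target set sits closer to the first attribute set; swapping the two target sets flips the sign.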

Categories of bias: historical, representation, measurement, aggregation, evaluation, and deployment bias.

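2. Challenges and corresponding solutions

Legal text is relatively formal and repeatedly uses formal personal pronouns. First names, surnames, and personal pronouns can all carry bias, so testing first names alone would miss the rest. → Use surnames associated with race.
Women are underrepresented among legal professionals to begin with, which can give rise to gender-occupational stereotypes.
Open-domain sentiment word lists cannot be applied directly to the legal domain. → The WEAT attribute lists combine general-purpose lists with domain-specific and expanded word lists: a few representative words are chosen as seed terms, word embeddings are then used to generate the expanded word lists (a positive word is one with high cosine similarity to the difference vector between the existing positive words and the existing negative words; negative words are the opposite), and finally human reviewers delete words with obvious racial or gender connotations.
The IAT mainly considers whether an attribute is positive or negative, but in legal matters the outcome carries more weight. → Use indicator words for whether a legal opinion grants or denies the requested outcome to measure the polarity of the outcome.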
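The seed-based expansion of the attribute lists can be sketched roughly as follows. This is only an illustration under assumptions: the direction is taken as the mean positive seed vector minus the mean negative seed vector, `top_k` is a hypothetical parameter, and the toy vectors are invented. The manual review step described above is not automated here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_vector(words, emb):
    """Component-wise mean of the embeddings of the given words."""
    dim = len(emb[words[0]])
    return [sum(emb[w][i] for w in words) / len(words) for i in range(dim)]


def expand_word_lists(emb, pos_seeds, neg_seeds, top_k):
    """Rank non-seed words by cosine similarity to (positive - negative).

    Returns (positive candidates, negative candidates). Both lists still
    need human review to drop words with obvious gender or race signal.
    """
    pos_c = mean_vector(pos_seeds, emb)
    neg_c = mean_vector(neg_seeds, emb)
    direction = [p - n for p, n in zip(pos_c, neg_c)]
    seeds = set(pos_seeds) | set(neg_seeds)
    candidates = [w for w in emb if w not in seeds]
    ranked = sorted(candidates, key=lambda w: cosine(emb[w], direction), reverse=True)
    return ranked[:top_k], ranked[::-1][:top_k]


# Invented toy vectors (assumption), using grant/deny style outcome words.
TOY_EMB = {
    "grant": (1.0, 0.0), "deny": (-1.0, 0.0),
    "affirm": (0.9, 0.1), "reject": (-0.8, 0.2), "court": (0.0, 1.0),
}
positive, negative = expand_word_lists(TOY_EMB, ["grant"], ["deny"], top_k=1)
print(positive, negative)
```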

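Pipeline: idiomatic phrase extraction → train skip-gram word2vec embeddings (separately on the full corpus and on sub-corpora split by time period or legal topic) → run WEAT for gender and race.
Gender: first names and other typical gender-indicating words such as pronouns
Race: surnames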
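The embedding-training step of this pipeline could be configured along these lines, using the hyperparameters reported in the experimental setup (dimension 300, minimum frequency 30, sampling threshold $10^{-4}$, learning rate 0.05, window 10, 10 negative samples). Using gensim is my assumption; the notes do not say which toolkit the authors used, and `train_embeddings` and `train_per_slice` are hypothetical helper names.

```python
# Keyword arguments for gensim's Word2Vec (gensim 4.x parameter names).
W2V_PARAMS = {
    "sg": 1,             # 1 = skip-gram (0 would be CBOW)
    "vector_size": 300,  # embedding dimension
    "min_count": 30,     # ignore words seen fewer than 30 times
    "sample": 1e-4,      # down-sampling threshold for frequent words
    "alpha": 0.05,       # initial learning rate
    "window": 10,        # context window size
    "negative": 10,      # number of negative samples
}


def train_embeddings(sentences, params=W2V_PARAMS):
    """Train skip-gram vectors on an iterable of token lists.

    gensim is a third-party package and is imported lazily, so this
    module can be imported without it.
    """
    from gensim.models import Word2Vec
    return Word2Vec(sentences=sentences, **params).wv


def train_per_slice(corpus_slices, params=W2V_PARAMS):
    """Train one embedding per sub-corpus (e.g. per time period or topic).

    `corpus_slices` maps a slice label to its tokenized sentences.
    """
    return {label: train_embeddings(sents, params)
            for label, sents in corpus_slices.items()}
```

Each resulting set of vectors would then be passed to the WEAT test separately.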

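Refinements:

Idiomatic phrase extraction: to keep the n-gram dictionaries from growing too large, only phrases whose words co-occur frequently are considered, and the Normalized Pointwise Mutual Information (NPMI) score decides which phrases enter the dictionary.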
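As a rough illustration of NPMI-based phrase selection, the sketch below scores adjacent word pairs and keeps those above a threshold. It is not the authors' code: probabilities are estimated by dividing all counts by the total token count (my assumption, which keeps scores within [-1, 1]), `min_count` is a hypothetical extra filter, and the four-sentence corpus is invented.

```python
import math
from collections import Counter


def npmi_scores(sentences):
    """NPMI for every adjacent word pair: PMI(x, y) / -log p(x, y)."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    total = sum(unigrams.values())  # same denominator for all estimates
    scores = {}
    for (x, y), count in bigrams.items():
        p_xy = count / total
        p_x, p_y = unigrams[x] / total, unigrams[y] / total
        pmi = math.log(p_xy / (p_x * p_y))
        scores[(x, y)] = pmi / -math.log(p_xy)
    return scores, bigrams


def extract_phrases(sentences, threshold=0.5, min_count=2):
    """Keep bigrams that are frequent enough and score at least `threshold`."""
    scores, bigrams = npmi_scores(sentences)
    return {pair for pair, s in scores.items()
            if s >= threshold and bigrams[pair] >= min_count}


# Invented toy corpus (assumption); a real run would use the opinion texts.
TOY_SENTENCES = [
    ["the", "supreme", "court", "held"],
    ["the", "supreme", "court", "denied", "the", "motion"],
    ["the", "trial", "court", "held"],
    ["the", "motion", "was", "denied"],
]
SCORES, _ = npmi_scores(TOY_SENTENCES)
PHRASES = extract_phrases(TOY_SENTENCES)
print(sorted(PHRASES))
```

On a corpus this small almost every repeated pair clears the threshold, so the output only shows the mechanics. For real corpora, gensim's `Phrases` class offers an NPMI scorer (`scoring="npmi"`) that does the same job at scale.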
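Surnames may coincide with company names and the like. The paper handles this in three ways:
1. Title cased the surnames to target proper nouns.
2. Idiomatic phrase extraction filters out some names that do not refer to people.
3. Centroid-based filtering to remove multi-sense words: compute the embedding of every surname, compute each surname's cosine similarity to the centroid of all surname embeddings, and delete the 20% of surnames with the lowest similarity. (First names are handled in a similar way.)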
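The centroid-based filter in step 3 might look like the sketch below. Assumptions: the centroid is the plain mean of the raw (unnormalized) vectors, and the toy vectors are invented so that one surname behaves like a multi-sense outlier.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def centroid_filter(words, emb, drop_fraction=0.2):
    """Drop the `drop_fraction` of words least similar to the list centroid."""
    vectors = [emb[w] for w in words]
    dim = len(vectors[0])
    centroid = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    # Most centroid-like words first, so the outliers end up at the tail.
    ranked = sorted(words, key=lambda w: cosine(emb[w], centroid), reverse=True)
    n_drop = int(len(words) * drop_fraction)
    return ranked[:len(ranked) - n_drop]


# Invented toy vectors (assumption): "Ford" points elsewhere, as a surname
# that is mostly used as a company name might.
TOY_EMB = {
    "Smith": (1.0, 0.1), "Johnson": (0.9, 0.2), "Garcia": (1.0, 0.0),
    "Nguyen": (0.8, 0.1), "Ford": (0.1, 1.0),
}
KEPT = centroid_filter(list(TOY_EMB), TOY_EMB)
print(KEPT)
```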

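Experimental setup:

The NPMI threshold in the phrase-extraction stage is 0.5.
The embedding dimension is 300, the minimum word frequency is 30, the sampling threshold is $10^{-4}$, the learning rate is 0.05, the window size is 10, and the number of negative samples is 10.
The WEAT evaluation also reports a standard deviation (by sub-sampling the word lists with a simple bootstrapping procedure).
Because discrimination was more severe earlier in U.S. history, the authors control for the temporal effect; the unfairness remains even so.
Method: split the corpus by time period, train word embeddings on each period, and run the WEAT test.
Gender stereotypes: tested using different sets of target words.
To account for different legal topics, the corpus is split by topic. (To limit the effect of low-frequency words, attribute words that occur fewer than 30 times are removed.)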
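The bootstrapped standard deviation of the WEAT score can be sketched as below. The notes only say the word lists are sub-sampled, so the details here are assumptions: each round draws a random 75% subset of every list without replacement, for 200 rounds, with a fixed seed. The embeddings are synthetic 2-D vectors with placeholder word names.

```python
import math
import random
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def weat_effect_size(xs, ys, attrs_a, attrs_b, emb):
    """Standard WEAT effect size over target sets xs, ys."""
    def s(w):
        return (mean(cosine(emb[w], emb[a]) for a in attrs_a)
                - mean(cosine(emb[w], emb[b]) for b in attrs_b))
    s_x, s_y = [s(x) for x in xs], [s(y) for y in ys]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


def bootstrap_weat(xs, ys, attrs_a, attrs_b, emb,
                   n_rounds=200, fraction=0.75, seed=0):
    """Mean and standard deviation of WEAT over random sub-sampled lists."""
    rng = random.Random(seed)

    def subsample(words):
        k = max(2, round(len(words) * fraction))
        return rng.sample(words, k)

    scores = [weat_effect_size(subsample(xs), subsample(ys),
                               subsample(attrs_a), subsample(attrs_b), emb)
              for _ in range(n_rounds)]
    return mean(scores), stdev(scores)


def make_toy_embeddings(seed=0):
    """Synthetic data (assumption): x/a cluster near (1, 0), y/b near (0, 1)."""
    rng = random.Random(seed)
    centers = {"x": (1.0, 0.0), "y": (0.0, 1.0), "a": (1.0, 0.0), "b": (0.0, 1.0)}
    emb, lists = {}, {}
    for name, (cx, cy) in centers.items():
        lists[name] = [f"{name}{i}" for i in range(6)]
        for w in lists[name]:
            emb[w] = (cx + rng.gauss(0, 0.1), cy + rng.gauss(0, 0.1))
    return emb, lists


EMB, LISTS = make_toy_embeddings()
MEAN_D, STD_D = bootstrap_weat(LISTS["x"], LISTS["y"], LISTS["a"], LISTS["b"], EMB)
print(f"effect size {MEAN_D:.3f} +/- {STD_D:.3f}")
```

The same routine would be run once per embedding, for example once per time period or per legal topic, so that scores across slices can be compared with their spread.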

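3. Code reproduction

The paper does not release public code, but reproducing it does not look hard (as long as you can get hold of the dataset). I'll write a version once my server is back up and I have the time!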

4. Other fairness-related practices

LeSICiN¹ and ECHR² mask named entities to reduce demographic bias.

¹ Re6: Paper reading: LeSICiN: A Heterogeneous Graph-based Approach for Automatic Legal Statute Identification fro
² Neural Legal Judgment Prediction in English
