
by Amber Thomas


Women only said 27% of the words in 2016’s biggest movies

Movie trailers in 2016 promised viewers so many strong female characters. Jyn Erso. Dory. Harley Quinn. Judy Hopps. Wonder Woman. I felt like this could be the year for gender equality in Hollywood’s biggest films.

I was wrong.


And I don’t make this statement lightly.


As a scientist, I turn to data to answer questions I have about the world. And I’ve got the data to back up my claim. In fact, you can have the data, code, and resulting data visualization that I made trying to better understand this topic. But first, let me tell you how I became so interested.

It all started when I went to see Rogue One: A Star Wars Story. All promotional materials for the movie indicated that Jyn Erso (played by Felicity Jones) was the main character. I mean, just look at the poster.

When your picture is several times larger than everyone else’s, you’re probably the main character.


What I didn’t notice at first was that Jyn is the only woman on that poster.


I went into the movie theater expecting to see men and women fighting side by side. I left feeling certain that I could count every female character from the movie on one hand. While Jyn was the main character, I was profoundly aware that she was often the only woman in any scene.

It felt strangely familiar to have a lead female character be so outnumbered. Then I realized that Jyn and Princess Leia suffered the same inequality 39 years apart. I was overwhelmed with a need to know exactly how female representation in Star Wars movies has changed. But it seemed unfair to compare movies made today with movies made decades ago.

So instead, I decided to look for female equality across the Top 10 Worldwide Highest Grossing Films of 2016. They were:


  • Captain America: Civil War


  • Finding Dory


  • Zootopia


  • The Jungle Book


  • The Secret Life of Pets


  • Batman V. Superman: Dawn of Justice


  • Rogue One: A Star Wars Story


  • Deadpool


  • Fantastic Beasts and Where to Find Them


  • Suicide Squad


With so many powerful women in these films, some of them must be gender-equal, right?


The Data

Now that I had decided what I wanted to investigate, I needed to figure out how to do it. Similar data exploration projects have focused on dialogue or screen-time equality. Both seemed like good options, but I wanted the ability to report on equality at the movie and character level.

In the end, I decided to explore the movies’ dialogue. This choice gave me the ability to focus on characters with an active role in the story and to cut non-speaking characters from my analysis.

Luckily for me, dedicated movie fans often transcribe a movie’s dialogue and make it freely available online. If I couldn’t find a transcript, I used closed-caption files instead. For those, I re-watched the movie and manually assigned characters to their spoken lines.

This process was a labor of love. It was time-consuming, but I have no regrets.

Analysis

Once I had all of the transcripts, I just needed to read the .txt files into R and separate the characters from their lines. For the Rogue One transcript, that process looked like this:
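
The original snippet is not reproduced here. What follows is only a minimal sketch of this kind of step, assuming each spoken line in the transcript has the form `NAME: dialogue`. The sample lines are invented for illustration, and the stringi calls are one possible choice rather than a reconstruction of the original code.

```r
library(stringi)

# Invented sample lines; a real transcript would be loaded with
# something like readLines("transcript.txt") (hypothetical file name).
raw <- c(
  "JYN: We have to go now.",
  "[Alarm sounds]",
  "CASSIAN: Follow me.",
  "",
  "K-2SO: I will wait here."
)

# Keep only lines that look like "NAME: dialogue".
spoken <- raw[stri_detect_regex(raw, "^[^:]+:")]

# Split each line at the first colon into a name and the spoken words.
parts <- stri_split_fixed(spoken, ":", n = 2, simplify = TRUE)

dialogue <- data.frame(
  Character = stri_trim_both(parts[, 1]),
  Words     = stri_trim_both(parts[, 2]),
  stringsAsFactors = FALSE
)

print(dialogue)
```

Real transcripts are rarely this tidy, so the pattern used to detect speaker names would likely need adjusting for each file.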

Now that I had a data frame with both Character and Words columns, I had to assign genders to each Character. To remain consistent with my categorizations, I came up with a few simple rules:

  1. When possible, assign gender according to the pronouns that other characters use. For example, if a character is referred to by others as “he” or “him”, then he is categorized as “male”.
  2. If there is no pronoun used throughout the movie but the character is named or credited (on IMDB), use the gender of the actor or actress. Note that the gender of an actor or actress was assumed based on publicly available information as of January 2017.
  3. If no pronoun is used for the character and the character is not named or credited, refer to the closed captions. Sometimes they will identify the character that spoke.
  4. If all else fails, make an educated guess based on the character’s voice.
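
The rules above were applied by hand, but their priority order can be sketched in code. The example below assumes a hypothetical table with one column per source of evidence (all column names and values are invented) and uses `dplyr::coalesce()` to take the first non-missing value in rule order.

```r
library(dplyr)

# Invented evidence table: NA means that source gave no answer.
characters <- data.frame(
  Character      = c("A", "B", "C", "D"),
  pronoun_gender = c("female", NA, NA, NA),      # rule 1
  actor_gender   = c("male", "male", NA, NA),    # rule 2
  caption_gender = c(NA, NA, "female", NA),      # rule 3
  voice_guess    = c(NA, NA, NA, "male"),        # rule 4
  stringsAsFactors = FALSE
)

# Take the first available answer, in the order of the rules.
characters <- characters %>%
  mutate(Gender = coalesce(pronoun_gender, actor_gender,
                           caption_gender, voice_guess))

print(characters[, c("Character", "Gender")])
```

Character A shows why the order matters: the pronouns used in the film take precedence over the gender of the performer.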

I’ll be the first to say that these methods are not perfect. In fact, here are some caveats:

  1. If a male character was voiced by an actress (or vice versa) and no other character ever referred to that character with a pronoun, the character may be incorrectly labelled. (I don’t think this happened, but anything is possible.)
  2. Voices that are not associated with a physical embodiment of a character (e.g., the voice of a computer) were categorized according to the gender of their voice actor/actress.
  3. I can never really know the gender of any character, but I’m using the cues and information that I have at my disposal.


Again, I am far from infallible, so if you caught a mistake on my part, please let me know.

So now I just needed to count the number of words spoken by each character. Again, I was able to do this in R using the dplyr and stringi packages.
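
A minimal sketch of that counting step, assuming a data frame with the `Character` and `Words` columns described earlier. The rows are invented sample data, and the pipeline is one plausible way to combine dplyr and stringi rather than the original code.

```r
library(dplyr)
library(stringi)

# Invented sample rows, one per spoken line.
dialogue <- data.frame(
  Character = c("JYN", "CASSIAN", "JYN", "STORMTROOPER"),
  Words     = c("We have to go now.", "Follow me.",
                "I am not leaving.", "Wait, stop!"),
  stringsAsFactors = FALSE
)

word_counts <- dialogue %>%
  mutate(n_words = stri_count_words(Words)) %>%          # words per line
  group_by(Character) %>%
  summarise(total_words = sum(n_words), .groups = "drop") %>%
  arrange(desc(total_words))

print(word_counts)
```

`stri_count_words()` ignores punctuation, so “Wait, stop!” counts as two words.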

It’s worth noting that I included every speaking character in this analysis. So yes, every stormtrooper who shouts a simple “Wait, stop!” before getting shot is included.

Data Visualization

I had my data. Unfortunately, tables upon tables of word counts and character names don’t give anyone much insight. Like any good data exploration project, it was time to visualize my results. I had to work through a few iterations before I found the best one.

Scatterplots and bar charts both masked characters with small roles.


A simple bubble chart was better but it became difficult to identify individual characters. It was also challenging to understand movie-level statistics.

In the end, I decided to learn enough d3.js to make an interactive graphic. Here, each bubble represents a character, and the bubble’s area is scaled based on the number of words spoken. Female and male bubbles can be separated for better insight. The stacked bars below indicate movie-level information.
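
Scaling by area rather than radius matters: if the radius were proportional to word count, a character with four times the words would get a bubble sixteen times as large. A small sketch of area-based scaling, using invented word counts and an arbitrary maximum radius (the interactive graphic itself was built in d3.js, not R):

```r
# Invented word counts for three hypothetical characters.
words <- c(lead = 4000, support = 1000, minor = 250)

max_radius <- 40  # pixels; an arbitrary choice for this sketch

# Area is proportional to words, so radius grows with the square root.
radius <- max_radius * sqrt(words / max(words))

print(radius)
```

Here a character with a quarter of the lead’s words gets half the radius, and therefore a quarter of the area.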

Go ahead, check out the full interactive version.

Interested in exploring the raw word-count data for yourself? I’ve made all of the data and code used to generate these visualizations open source. It’s available here:

ProQuestionAsker/2016MovieDialogue (github.com)

Takeaways

Ok, so the analysis is done. I’ve got a fancy (and fun-to-play-with) visualization. What did I find?

I recommend taking a quick second to look at something “a-Dory-ble” before going on, because this post is about to get real depressing real fast.

Aw, so cute. Feeling good?

All right, here we go.


This is a static version of what the visualization for all 10 movies looks like:


(If you’d like to check out the interactive visualization, go here.)

There are a couple of things here that I need to point out:


Not one of the top 10 movies of 2016 had a speaking cast that was 50% female.


Finding Dory was the closest to this level of equality, with 43% female characters. To be equal, the movie would have needed 8 more female speaking roles.
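
The arithmetic behind a figure like this is simple: if the male roles stay fixed, parity is reached when the number of female roles equals the number of male roles. A small sketch with toy counts (invented for illustration, not taken from the dataset):

```r
# Additional female speaking roles needed for a 50/50 split,
# assuming the number of male roles stays the same.
roles_needed_for_parity <- function(female, male) {
  max(male - female, 0)
}

# Toy counts, chosen only to illustrate the calculation.
female <- 6
male   <- 8

share <- female / (female + male)
round(100 * share)                      # 43 (percent female)
roles_needed_for_parity(female, male)   # 2
```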

Rogue One was the worst. Only 9% of its speaking characters were female. Of those 10 characters, 1 was a computer voice, 1 appeared on screen for no more than 5 seconds, and 1 was a CGI cameo that said 1 word.

Only 1 of 2016’s top 10 movies had at least 50% of its dialogue spoken by female characters.


Finding Dory comes out on top here too with 53% female dialogue. But, 76% of that dialogue came from Dory alone.

Trailing at the end was The Jungle Book with only 10% of its dialogue spoken by a female character. Keep in mind, this is after casting Scarlett Johansson as the voice of the historically-male snake, Kaa.

Here are a few more:


  • Finding Dory and Zootopia were the only 2 movies in 2016’s top 10 in which a female character had the most dialogue.
  • Female characters were outnumbered in Captain America: Civil War’s final battle 5:1. Throughout the movie, they only contributed 16% of the dialogue.
  • Batman spoke 2.4 times more than Superman and 6 times more than Wonder Woman in Batman V. Superman.
  • 78% of the female-spoken lines in Rogue One came from Jyn Erso.
  • While Harley Quinn was a highly advertised character in Suicide Squad, she only spoke 42% as many words as Floyd/Deadshot (played by Will Smith). Notably, Amanda Waller (played by Viola Davis) spoke frequently, totaling just 222 words (16%) short of Deadshot’s word count.

I started this project because I had a feeling that Rogue One’s cast and dialogue were not equally divided between male and female characters. I was shocked (and saddened) to find that almost none of the top 10 movies from last year were gender equal.

We can do better.


Added: If you’re looking for more studies and data explorations like this, check out:

  • Inequality in 800 popular films from 2007–2015 (includes gender, race/ethnicity, sexual orientation, and disability)

  • This exploration of 2000 randomly selected movie scripts from the 1980s to the 2010s


  • This research on 200 biggest movies from 2014 & 2015


  • Female representations in 2014’s biggest movies


  • This Twitter thread about gender equality in 2016’s animated films


TL;DR Version: Women represent (on average) 30–35% of speaking roles across each of these investigations.

Added: Have questions or comments about my methodology or conclusions? Check out my follow-up article featuring the most frequently asked questions.

I analyzed the dialogue in 2016’s biggest movies and it started a lot of conversations. A few weeks ago I published a story about my analysis of the dialogue in 2016’s 10 Highest Grossing Films. I am so… (medium.com)

If you liked this article and want to see more like it, please click the green heart below and share away on your social media network of choice.


I am currently spending my time working on personal projects and data visualizations like this while I look for a data science job. So, if you have a fun project idea (or a job inquiry) you’d like to discuss with me, please reach out to me on Twitter or by email.

Thank you!


Translated from: https://www.freecodecamp.org/news/women-only-said-27-of-the-words-in-2016s-biggest-movies-955cb480c3c4/


