User-Agent Strings

A very long time ago (read: ten years ago), we were in between the so-called First and Second Browser Wars. Microsoft had killed Netscape Navigator by taking advantage of its desktop monopoly and Scrooge McDuck-like financial reserves to install a free copy of Internet Explorer on every single computer in the world (basically). Internet Explorer 6 was the dominant browser, and Netscape as a company was over.

Netscape, before its demise, had embarked on a project to totally rewrite its web browser. The new code was open-sourced and given to the Mozilla Foundation. In hindsight, this was a stunningly successful move, with the ever-awesome Mozilla Foundation going from strength to strength now, nearly ten years after its founding.

The second browser war was initially a festering cold war between the reborn Netscape Navigator (now called Mozilla Firefox) and the dormant Internet Explorer 6 (eventually updated to IE 7 after a six-year development freeze). Later, other contenders like Google Chrome joined the party. Oh, and Safari and Opera were kinda floating around in this war too, but honestly they’re not that important to the story I’m trying to tell.

Anyway, long story still kinda long, as part of these two browser wars, browsers felt the need to compete with each other on features. However, to use these features you needed to get web developers to build web sites that used them. The problem is that your new feature would only work on your browser. This meant that, when some poor soul came along trying to view your super-awesome ActiveX powered web page, and they had the misfortune to be using Netscape Navigator, your website would at best look awful, and at worst explode in several mysterious ways.


These people would then go away and tell their friends about your crappy website that wouldn’t even render properly! And they’d say that their friends should use your competitor’s website, even though your competitor can’t even spell ActiveX! And you’d go out of business and your children would have to go to a state school, and it would just be horrible.


So you needed some way to tell what features a browser had. There was a way to do that, of course: JavaScript. Unfortunately, some features couldn’t be easily detected in JavaScript, and writing JavaScript was, well, weird, and JavaScript was slow, and so lots of websites didn’t want to do that (or didn’t know they should). What would they do instead?

Well, RFC 1945 and RFC 2616 (the HTTP/1.0 and HTTP/1.1 specifications) stated that all browsers, web crawlers and other tools that interact with web servers should identify themselves using a special header in the HTTP requests they send: the User-Agent header. This header should be (as much as possible) unique to a specific type of agent. This means that Internet Explorer should send a User-Agent header that is different from those of all other browsers and of all other versions of IE.
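To make that concrete, here is a minimal sketch of where the header sits in a request. The helper name build_get_request and the host example.com are made up for illustration; the UA value is the one Requests sends, quoted later in this post.

```python
def build_get_request(host, path, user_agent):
    """Return the text of a minimal HTTP/1.1 GET request.

    Hypothetical helper for illustration only; a real client sends
    more headers than this.
    """
    lines = [
        "GET {} HTTP/1.1".format(path),
        "Host: {}".format(host),
        # The agent identifies itself here, one header among the others.
        "User-Agent: {}".format(user_agent),
        "",  # blank line ends the header block
        "",
    ]
    return "\r\n".join(lines)


request_text = build_get_request(
    "example.com", "/", "python-requests/1.2.0 CPython/2.7.2 Darwin/12.2.0"
)
print(request_text)
```

The server sees that one line of text and nothing else about the client, which is why everything that follows is guesswork on the server's part.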

“Perfect!” cry the web developers. “Our servers can check for this string.” And so begins the trouble.

The Trouble

You see, the problem with using the User-Agent string to check for features is that the User-Agent string tells you nothing about what features a given User-Agent has. After all, that’s not what it’s for! So you, naïve late-1990s web programmer, might write your site when only Mozilla Firefox has support for the hot new Twiddlor feature (note: not a real feature). So you only serve Twiddlor-enabled pages to people whose User-Agent strings identify them as being a version of Firefox.
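In code, that kind of check might look roughly like the sketch below. Everything in it is hypothetical: Twiddlor is not a real feature, supports_twiddlor is a made-up function, and the two UA strings are deliberately simplified stand-ins rather than real browser strings.

```python
def supports_twiddlor(user_agent):
    """Naive sniffing: assume only Firefox has the (fictional) Twiddlor feature."""
    return "Firefox" in user_agent


# Simplified, made-up UA strings for illustration only.
firefox_ua = "Firefox/1.0"
other_browser_ua = "OtherBrowser/7.0"  # imagine this one ships Twiddlor too

print(supports_twiddlor(firefox_ua))        # True
print(supports_twiddlor(other_browser_ua))  # False, even if the browser can do it
```

The check answers "is this Firefox?", not "can this browser do Twiddlor?", so it goes stale the moment anyone else implements the feature.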

The problem is, six months later the guys in Redmond get around to adding Twiddlor support to Internet Explorer. But all their users are still complaining that none of their favourite websites will let them use Twiddlor, instead claiming that the website is “Best used in Mozilla Firefox” or some such nonsense.


How does Microsoft get you to show them the Twiddlor-enabled page? Simple: they change their User-Agent string! Sadly, I’m not even joking: this is actually what happened. To prove it, I’m going to show you a few modern browser UA strings.


Here’s the UA string sent by Google Chrome version 27.0.1453.47 beta (yeah), running on my Mac:


Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.47 Safari/537.36

“What is all that crap?”, I hear you ask, quite rightly. Why does it say it’s Mozilla? It’s not Mozilla! You’re quite right. But enough people have tested for Firefox by just checking that the word ‘Mozilla’ is in the UA string that everyone puts it there. And I mean everyone. Check out Safari, also on my Mac:


Notice that both Safari and Chrome claim to be versions of Safari. That’s pretty damn weird.


What about Internet Explorer 10, on my Windows machine?


Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0)

At least it’s not claiming to be Safari! In fact, this is the best UA string I’ve seen, being a fairly honest representation of the browser.

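To see how badly simple substring checks fare against these strings, here is a small sketch that lists which browser-ish tokens each UA string contains. The helper naive_claims and its token list are my own invention for illustration; the two UA strings are the Chrome and IE 10 ones quoted above.

```python
CHROME_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/27.0.1453.47 Safari/537.36"
)
IE10_UA = "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0)"


def naive_claims(user_agent, tokens=("Mozilla", "Safari", "Gecko", "Chrome", "MSIE")):
    """Return every token a naive substring check would 'detect' in the UA."""
    return [token for token in tokens if token in user_agent]


print(naive_claims(CHROME_UA))  # ['Mozilla', 'Safari', 'Gecko', 'Chrome']
print(naive_claims(IE10_UA))    # ['Mozilla', 'MSIE']
```

A site that checks for any one of those words gets a "yes" from browsers that are nothing of the sort, which is exactly why the words are in there.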

Finally, let’s check Firefox, also on my Windows box.


What Should A User-Agent String Look Like?

To see an example of how these were supposed to look when the standard was originally proposed, we can look at what Requests sends.

python-requests/1.2.0 CPython/2.7.2 Darwin/12.2.0

Short and to the point. The ‘browser’ and its version, the ‘platform’ and its version, and the OS (sort of) and its version.

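A string in that shape is also trivial to take apart. The sketch below splits it into (name, version) pairs; parse_products is a hypothetical helper, and it deliberately assumes the simple space-separated name/version form with no parenthesised comments, so it would not cope with the browser strings shown earlier.

```python
def parse_products(user_agent):
    """Split a simple 'name/version name/version ...' UA string into pairs.

    Assumes no parenthesised comments; this is a sketch, not a full parser.
    """
    products = []
    for token in user_agent.split():
        name, _, version = token.partition("/")
        products.append((name, version))
    return products


print(parse_products("python-requests/1.2.0 CPython/2.7.2 Darwin/12.2.0"))
# [('python-requests', '1.2.0'), ('CPython', '2.7.2'), ('Darwin', '12.2.0')]
```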

Why Does This Matter?

In principle, the new JavaScript-heavy world should have cured us of this problem. People should write JS that tests for features and then uses them, serving a less interesting version of the web page if your browser doesn’t support them. And, mostly, this is what happens! Libraries like jQuery have taken a lot of the hard work out of doing this, so most websites you’ll encounter nowadays do the right thing.

The problem is, sometimes they don’t. And when they don’t, you can encounter strange and confusing bugs. These bugs then tie up developer time and generally make everyone’s life worse. To provide an example, I’m going to briefly walk you through a bug that appeared on the Requests GitHub page a few days ago.


An Example

A user reported that, when he accessed a specific web page by doing a simple GET with no complicated stuff, he was getting an httplib.IncompleteRead exception thrown into his face.

This was odd in itself. This exception is only ever thrown when either the user or the remote server is using chunked encoding, but the user reported that he didn’t think either party was doing so. He also kindly provided the URL, so that I could reproduce the bug locally. (This is excellent practice, by the way: I’m far more likely to help out if I can easily reproduce your bug on my machine.)


When I made the same request, I also got the IncompleteRead exception thrown in my face. Further investigation showed that the web server claimed to be serving using chunked encoding, but in fact was just sending the page as normal. This is pretty bad, and there’s not much Requests can do about this: the web server is simply doing the wrong thing. First note for website developers: do NOT claim to be using chunked encoding when you are not!

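For anyone who hasn’t looked at chunked encoding up close, here is a simplified sketch of why a lying server breaks the client. The function decode_chunked is my own illustration, not the code in httplib, and it raises ValueError where the real library raised IncompleteRead; it also ignores trailers.

```python
def decode_chunked(body):
    """Decode an HTTP/1.1 chunked body given as bytes (simplified sketch)."""
    decoded = b""
    pos = 0
    while True:
        # Each chunk starts with its size in hex, terminated by CRLF.
        line_end = body.find(b"\r\n", pos)
        if line_end == -1:
            raise ValueError("missing chunk-size line")
        size_text = body[pos:line_end].split(b";")[0].strip()
        size = int(size_text.decode("ascii"), 16)  # ValueError if not hex
        if size == 0:
            return decoded  # a zero-length chunk marks the end
        start = line_end + 2
        chunk = body[start:start + size]
        if len(chunk) < size:
            raise ValueError("incomplete chunk")
        decoded += chunk
        pos = start + size + 2  # skip the chunk data and its trailing CRLF


print(decode_chunked(b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n"))  # b'hello world'

# A server that promises chunked encoding but sends a plain page:
try:
    decode_chunked(b"<html><body>hi</body></html>\r\n")
except ValueError as exc:
    print("decode failed:", exc)
```

The client has been told to expect a hex length first, so an ordinary page of HTML is simply unparseable as a chunk-size line.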

I was interested to see if we could get the page data anyway, so I patched my local copy of the standard library to see what we got when I returned the data instead of throwing an exception. What I saw was the second unpleasant thing this web site had done. The HTML for this page was about 20 lines long. All it did was embed, at full size, a frame containing another page, or a warning if your browser doesn’t support frames.


This is pretty obnoxious: why not just serve the other page? Why require frames? You aren’t even doing anything with them, you’re just using them for the sake of using them! Second note for website developers: do not use frames when you don’t need them! They are awkward for anything that isn’t a browser.

In an attempt to be helpful, I pulled the URL being framed out of the HTML and suggested the user hit that instead. Out of sheer curiosity, I then did a Requests GET on the URL.


Requests threw an exception again.


I was pretty surprised here: the page rendered fine in my browser. So I looked at the exception. “Connection Reset By Peer”, read the socket error text. For those who don’t know their network protocols, this indicates that the TCP connection to the web server was closed while we were expecting data on it.

This is very odd. Requests sent a totally compliant, basic HTTP GET request, and the remote server was shutting the connection in response to it. Doing this is totally against the HTTP specification. Any compliant server is required to respond with an HTTP error code and a Connection: close header if it wants to tear the connection down. Additionally, why did it work fine in Chrome but fail in Requests?


There’s really only one obvious thing to do. I grabbed Chrome’s User-Agent string and got Requests to send that instead of its own UA string. (For those who want to spoof their UA string, Requests allows you to pass it as a header. We only set one ourselves if you don’t provide one for us.)

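If you want to try the same trick, the sketch below shows the idea. To keep it self-contained it uses the standard library’s urllib.request rather than Requests, and it only builds the request without sending it; the helper name build_spoofed_request and the URL are made up. With Requests, the equivalent is passing the same dictionary as the headers argument, as noted in the comment.

```python
import urllib.request

CHROME_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/27.0.1453.47 Safari/537.36"
)


def build_spoofed_request(url, user_agent=CHROME_UA):
    """Build (but do not send) a GET request carrying a borrowed UA string."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# With Requests, roughly: requests.get(url, headers={"User-Agent": CHROME_UA})
req = build_spoofed_request("http://example.com/")

# urllib stores header names capitalised as "User-agent".
print(req.get_header("User-agent"))
```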

Success! The page rendered and returned to us.


For those who want a summary, what was happening here is that the remote site was sniffing the User-Agent header. Instead of checking for features, however, what it was doing was using the header as a gatekeeper! If you don’t have the right User-Agent, you don’t just get a less feature-filled site: you get nothing. Not even an HTTP error page.


This is probably the worst example of User-Agent sniffing I’ve ever seen. This was a website developer using a bad practice to violate the HTTP specification. In addition to simply being rude, this is also a genuine cost for many developers. And crap like this leads to stupid UA strings like the ones I showed above.


This is also the third note for website developers: always send HTTP error codes, don’t just close connections.

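For the avoidance of doubt, sending an error is not hard. Here is a minimal sketch of the bytes a server could write before hanging up; build_error_response is a hypothetical helper, and the choice of 403 Forbidden is only an example, not a claim about what this particular site should have sent.

```python
def build_error_response(status, reason, message):
    """Return the raw bytes of a small HTTP/1.1 error response (sketch)."""
    body = message.encode("utf-8")
    head = (
        "HTTP/1.1 {} {}\r\n"
        "Content-Type: text/plain; charset=utf-8\r\n"
        "Content-Length: {}\r\n"
        "Connection: close\r\n"  # tell the client the connection is going away
        "\r\n"
    ).format(status, reason, len(body))
    return head.encode("ascii") + body


response = build_error_response(403, "Forbidden", "Unsupported client\n")
print(response.decode("utf-8"))
```

A client that receives this at least knows what happened and can report it, instead of staring at a reset socket.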

The Moral Of The Story

The most important lesson, however, is this.


Ignore the User-Agent string unless you absolutely have to.


Detecting browser features is not what the User-Agent string is for, so please don’t use it for that. And if you do (which I’m sure you will, because no-one listens to me anyway), make sure that you don’t refuse service based on the User-Agent. If you want to render a slightly different page, fine, I get that. But don’t refuse to render it at all. It’s obnoxious, it’s brittle, and it’s so 1990s. And besides, as I showed above, all modern User-Agents can lie in their User-Agent string! You can set it in Firefox, and in Chrome, and (probably) in IE, Safari and Opera as well. So not only are you misusing it, you’re not even getting accurate information!

Translated from: https://www.pybloggers.com/2013/04/user-agent-strings-or-dont-make-me-come-after-you/
