What is RSS?

by Mark Pilgrim
December 18, 2002

marginwidth="0" marginheight="0" src="http://ad.doubleclick.net/adi/xml.ds/art;pos=_art;sz=336x280;ord=1915013365?" frameborder="0" width="336" scrolling="no" height="280">

RSS is a format for syndicating news and the content of news-like sites, including major news sites like Wired, news-oriented community sites like Slashdot, and personal weblogs. But it's not just for news. Pretty much anything that can be broken down into discrete items can be syndicated via RSS: the "recent changes" page of a wiki, a changelog of CVS checkins, even the revision history of a book. Once information about each item is in RSS format, an RSS-aware program can check the feed for changes and react to the changes in an appropriate way.

RSS-aware programs called news aggregators are popular in the weblogging community. Many weblogs make content available in RSS. A news aggregator can help you keep up with all your favorite weblogs by checking their RSS feeds and displaying new items from each of them.

A brief history

Related Reading

Content Syndication with RSS
By Ben Hammersley

Table of Contents
Index
Sample Chapter

Read Online--Safari Search this book on Safari:  
Only This BookAll of Safari
Code Fragments only

But coders beware. The name "RSS" is an umbrella term for a format that spans several different versions of at least two different (but parallel) formats. The original RSS, version 0.90, was designed by Netscape as a format for building portals of headlines to mainstream news sites. It was deemed overly complex for its goals; a simpler version, 0.91, was proposed and subsequently dropped when Netscape lost interest in the portal-making business. But 0.91 was picked up by another vendor, UserLand Software, which intended to use it as the basis of its weblogging products and other web-based writing software.

In the meantime, a third, non-commercial group split off and designed a new format based on what they perceived as the original guiding principles of RSS 0.90 (before it got simplified into 0.91). This format, which is based on RDF, is called RSS 1.0. But UserLand was not involved in designing this new format, and, as an advocate of simplifying 0.90, it was not happy when RSS 1.0 was announced. Instead of accepting RSS 1.0, UserLand continued to evolve the 0.9x branch, through versions 0.92, 0.93, 0.94, and finally 2.0.

What a mess.

So which one do I use?

That's 7 -- count 'em, 7! -- different formats, all called "RSS". As a coder of RSS-aware programs, you'll need to be liberal enough to handle all the variations. But as a content producer who wants to make your content available via syndication, which format should you choose?

RSS versions and recommendations
Version Owner Pros Status Recommendation
0.90 Netscape   Obsoleted by 1.0 Don't use
0.91 UserLand Drop dead simple Officially obsoleted by 2.0, but still quite popular Use for basic syndication. Easy migration path to 2.0 if you need more flexibility
0.92, 0.93, 0.94 UserLand Allows richer metadata than 0.91 Obsoleted by 2.0 Use 2.0 instead
1.0 RSS-DEV Working Group RDF-based, extensibility via modules, not controlled by a single vendor Stable core, active module development Use for RDF-based applications or if you need advanced RDF-specific modules
2.0 UserLand Extensibility via modules, easy migration path from 0.9x branch Stable core, active module development Use for general-purpose, metadata-rich syndication

What does RSS look like?

Imagine you want to write a program that reads RSS feeds, so that you can publish headlines on your site, build your own portal or homegrown news aggregator, or whatever. What does an RSS feed look like? That depends on which version of RSS you're talking about. Here's a sample RSS 0.91 feed (adapted from XML.com's RSS feed):

<rss version="0.91">
  <channel>
    <title>XML.com</title>
    <link>http://www.xml.com/</link>
    <description>XML.com features a rich mix of information and services for the XML community.</description>
    <language>en-us</language>
    <item>
      <title>Normalizing XML, Part 2</title>
      <link>http://www.xml.com/pub/a/2002/12/04/normalizing.html</link>
      <description>In this second and final look at applying relational normalization techniques to W3C XML Schema data modeling, Will Provost discusses when not to normalize, the scope of uniqueness and the fourth and fifth normal forms.</description>
    </item>
    <item>
      <title>The .NET Schema Object Model</title>
      <link>http://www.xml.com/pub/a/2002/12/04/som.html</link>
      <description>Priya Lakshminarayanan describes in detail the use of the .NET Schema Object Model for programmatic manipulation of W3C XML Schemas.</description>
    </item>
    <item>
      <title>SVG's Past and Promising Future</title>
      <link>http://www.xml.com/pub/a/2002/12/04/svg.html</link>
      <description>In this month's SVG column, Antoine Quint looks back at SVG's journey through 2002 and looks forward to 2003.</description>
    </item>
  </channel>
</rss>

Simple, right? A feed comprises a channel, which has a title, link, description, and (optional) language, followed by a series of items, each of which have a title, link, and description.

Now look at the RSS 1.0 version of the same information:

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns="http://purl.org/rss/1.0/"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
>
  <channel rdf:about="http://www.xml.com/cs/xml/query/q/19">
    <title>XML.com</title>
    <link>http://www.xml.com/</link>
    <description>XML.com features a rich mix of information and services for the XML community.</description>
    <language>en-us</language>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://www.xml.com/pub/a/2002/12/04/normalizing.html"/>
        <rdf:li rdf:resource="http://www.xml.com/pub/a/2002/12/04/som.html"/>
        <rdf:li rdf:resource="http://www.xml.com/pub/a/2002/12/04/svg.html"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://www.xml.com/pub/a/2002/12/04/normalizing.html">
    <title>Normalizing XML, Part 2</title>
    <link>http://www.xml.com/pub/a/2002/12/04/normalizing.html</link>
    <description>In this second and final look at applying relational normalization techniques to W3C XML Schema data modeling, Will Provost discusses when not to normalize, the scope of uniqueness and the fourth and fifth normal forms.</description>
    <dc:creator>Will Provost</dc:creator>
    <dc:date>2002-12-04</dc:date>    
  </item>
  <item rdf:about="http://www.xml.com/pub/a/2002/12/04/som.html">
    <title>The .NET Schema Object Model</title>
    <link>http://www.xml.com/pub/a/2002/12/04/som.html</link>
    <description>Priya Lakshminarayanan describes in detail the use of the .NET Schema Object Model for programmatic manipulation of W3C XML Schemas.</description>
    <dc:creator>Priya Lakshminarayanan</dc:creator>
    <dc:date>2002-12-04</dc:date>    
  </item>
  <item rdf:about="http://www.xml.com/pub/a/2002/12/04/svg.html">
    <title>SVG's Past and Promising Future</title>
    <link>http://www.xml.com/pub/a/2002/12/04/svg.html</link>
    <description>In this month's SVG column, Antoine Quint looks back at SVG's journey through 2002 and looks forward to 2003.</description>
    <dc:creator>Antoine Quint</dc:creator>
    <dc:date>2002-12-04</dc:date>    
  </item>
</rdf:RDF>

Quite a bit more verbose. People familiar with RDF will recognize this as an XML serialization of an RDF document; the rest of the world will at least recognize that we're syndicating essentially the same information. In fact, we're including a bit more information: item-level authors and publishing dates, which RSS 0.91 does not support.

Despite being RDF/XML, RSS 1.0 is structurally similar to previous versions of RSS -- similar enough that we can simply treat it as XML and write a single function to extract information out of either an RSS 0.91 or RSS 1.0 feed. However, there are some significant differences that our code will need to be aware of:

  1. The root element is rdf:RDF instead of rss. We'll either need to handle both explicitly or just ignore the name of the root element altogether and blindly look for useful information inside it.

  2. RSS 1.0 uses namespaces extensively. The RSS 1.0 namespace is http://purl.org/rss/1.0/, and it's defined as the default namespace. The feed also uses http://www.w3.org/1999/02/22-rdf-syntax-ns# for the RDF-specific elements (which we'll simply be ignoring for our purposes) and http://purl.org/dc/elements/1.1/ (Dublin Core) for the additional metadata of article authors and publishing dates.

    We can go in one of two ways here: if we don't have a namespace-aware XML parser, we can blindly assume that the feed uses the standard prefixes and default namespace and look for item elements and dc:creator elements within them. This will actually work in a large number of real-world cases; most RSS feeds use the default namespace and the same prefixes for common modules like Dublin Core. This is a horrible hack, though. There's no guarantee that a feed won't use a different prefix for a namespace (which would be perfectly valid XML and RDF). If or when it does, we'll miss it.

    If we have a namespace-aware XML parser at our disposal, we can construct a more elegant solution that handles both RSS 0.91 and 1.0 feeds. We can look for items in no namespace; if that fails, we can look for items in the RSS 1.0 namespace. (Not shown, but RSS 0.90 feeds also use a namespace, but not the same one as RSS 1.0. So what we really need is a list of namespaces to search.)

  3. Less obvious but still important, the item elements are outside the channel element. (In RSS 0.91, the item elements were inside the channel. In RSS 0.90, they were outside; in RSS 2.0, they're inside. Whee.) So we can't be picky about where we look for items.

  4. Finally, you'll notice there is an extra items element within the channel. It's only useful to RDF parsers, and we're going to ignore it and assume that the order of the items within the RSS feed is given by their order of the item elements.

But what about RSS 2.0? Luckily, once we've written code to handle RSS 0.91 and 1.0, RSS 2.0 is a piece of cake. Here's the RSS 2.0 version of the same feed:

<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>XML.com</title>
    <link>http://www.xml.com/</link>
    <description>XML.com features a rich mix of information and services for the XML community.</description>
    <language>en-us</language>
    <item>
      <title>Normalizing XML, Part 2</title>
      <link>http://www.xml.com/pub/a/2002/12/04/normalizing.html</link>
      <description>In this second and final look at applying relational normalization techniques to W3C XML Schema data modeling, Will Provost discusses when not to normalize, the scope of uniqueness and the fourth and fifth normal forms.</description>
      <dc:creator>Will Provost</dc:creator>
      <dc:date>2002-12-04</dc:date>    
    </item>
    <item>
      <title>The .NET Schema Object Model</title>
      <link>http://www.xml.com/pub/a/2002/12/04/som.html</link>
      <description>Priya Lakshminarayanan describes in detail the use of the .NET Schema Object Model for programmatic manipulation of W3C XML Schemas.</description>
      <dc:creator>Priya Lakshminarayanan</dc:creator>
      <dc:date>2002-12-04</dc:date>    
    </item>
    <item>
      <title>SVG's Past and Promising Future</title>
      <link>http://www.xml.com/pub/a/2002/12/04/svg.html</link>
      <description>In this month's SVG column, Antoine Quint looks back at SVG's journey through 2002 and looks forward to 2003.</description>
      <dc:creator>Antoine Quint</dc:creator>
      <dc:date>2002-12-04</dc:date>    
    </item>
  </channel>
</rss>

As this example shows, RSS 2.0 uses namespaces like RSS 1.0, but it's not RDF. Like RSS 0.91, there is no default namespace and items are back inside the channel. If our code is liberal enough to handle the differences between RSS 0.91 and 1.0, RSS 2.0 should not present any additional wrinkles.

How can I read RSS?

Now let's get down to actually reading these sample RSS feeds from Python. The first thing we'll need to do is download some RSS feeds. This is simple in Python; most distributions come with both a URL retrieval library and an XML parser. (Note to Mac OS X 10.2 users: your copy of Python does not come with an XML parser; you will need to install PyXML first.)

from xml.dom import minidom
import urllib

def load(rssURL):
  return minidom.parse(urllib.urlopen(rssURL))

This takes the URL of an RSS feed and returns a parsed representation of the DOM, as native Python objects.

The next bit is the tricky part. To compensate for the differences in RSS formats, we'll need a function that searches for specific elements in any number of namespaces. Python's XML library includes a getElementsByTagNameNS which takes a namespace and a tag name, so we'll use that to make our code general enough to handle RSS 0.9x/2.0 (which has no default namespace), RSS 1.0 and even RSS 0.90. This function will find all elements with a given name, anywhere within a node. That's a good thing; it means that we can search for item elements within the root node and always find them, whether they are inside or outside the channel element.

DEFAULT_NAMESPACES = /
  (None, # RSS 0.91, 0.92, 0.93, 0.94, 2.0
  'http://purl.org/rss/1.0/', # RSS 1.0
  'http://my.netscape.com/rdf/simple/0.9/' # RSS 0.90
  )

def getElementsByTagName(node, tagName, possibleNamespaces=DEFAULT_NAMESPACES):
  for namespace in possibleNamespaces:
    children = node.getElementsByTagNameNS(namespace, tagName)
    if len(children): return children
  return []

Finally, we need two utility functions to make our lives easier. First, our getElementsByTagName function will return a list of elements, but most of the time we know there's only going to be one. An item only has one title, one link, one description, and so on. We'll define a first function that returns the first element of a given name (again, searching across several different namespaces). Second, Python's XML libraries are great at parsing an XML document into nodes, but not that helpful at putting the data back together again. We'll define a textOf function that returns the entire text of a particular XML element.

def first(node, tagName, possibleNamespaces=DEFAULT_NAMESPACES):
  children = getElementsByTagName(node, tagName, possibleNamespaces)
  return len(children) and children[0] or None

def textOf(node):
  return node and "".join([child.data for child in node.childNodes]) or ""

That's it. The actual parsing is easy. We'll take a URL on the command line, download it, parse it, get the list of items, and then get some useful information from each item:

DUBLIN_CORE = ('http://purl.org/dc/elements/1.1/',)

if __name__ == '__main__':
  import sys
  rssDocument = load(sys.argv[1])
  for item in getElementsByTagName(rssDocument, 'item'):
    print 'title:', textOf(first(item, 'title'))
    print 'link:', textOf(first(item, 'link'))
    print 'description:', textOf(first(item, 'description'))
    print 'date:', textOf(first(item, 'date', DUBLIN_CORE))
    print 'author:', textOf(first(item, 'creator', DUBLIN_CORE))
    print

Running it with our sample RSS 0.91 feed prints only title, link, and description (since the feed didn't include any other information on dates or authors):

$ python rss1.py http://www.xml.com/2002/12/18/examples/rss091.xml.txt
title: Normalizing XML, Part 2
link: http://www.xml.com/pub/a/2002/12/04/normalizing.html
description: In this second and final look at applying relational normalization techniques to W3C XML Schema data modeling, Will Provost discusses when not to normalize, the scope of uniqueness and the fourth and fifth normal forms.
date:
author:

title: The .NET Schema Object Model
link: http://www.xml.com/pub/a/2002/12/04/som.html
description: Priya Lakshminarayanan describes in detail the use of the .NET Schema Object Model for programmatic manipulation of W3C XML Schemas.
date:
author:

title: SVG's Past and Promising Future
link: http://www.xml.com/pub/a/2002/12/04/svg.html
description: In this month's SVG column, Antoine Quint looks back at SVG's journey through 2002 and looks forward to 2003.
date:
author:

For both the sample RSS 1.0 feed and sample RSS 2.0 feed, we also get dates and authors for each item. We reuse our custom getElementsByTagName function, but pass in the Dublin Core namespace and appropriate tag name. We could reuse this same function to extract information from any of the basic RSS modules. (There are a few advanced modules specific to RSS 1.0 that would require a full RDF parser, but they are not widely deployed in public RSS feeds.)

Here's the output against our sample RSS 1.0 feed:

$ python rss1.py http://www.xml.com/2002/12/18/examples/rss10.xml.txt
title: Normalizing XML, Part 2
link: http://www.xml.com/pub/a/2002/12/04/normalizing.html
description: In this second and final look at applying relational normalization techniques to W3C XML Schema data modeling, Will Provost discusses when not to normalize, the scope of uniqueness and the fourth and fifth normal forms.
date: 2002-12-04
author: Will Provost

title: The .NET Schema Object Model
link: http://www.xml.com/pub/a/2002/12/04/som.html
description: Priya Lakshminarayanan describes in detail the use of the .NET Schema Object Model for programmatic manipulation of W3C XML Schemas.
date: 2002-12-04
author: Priya Lakshminarayanan

title: SVG's Past and Promising Future
link: http://www.xml.com/pub/a/2002/12/04/svg.html
description: In this month's SVG column, Antoine Quint looks back at SVG's journey through 2002 and looks forward to 2003.
date: 2002-12-04
author: Antoine Quint

Running against our sample RSS 2.0 feed produces the same results.

This technique will handle about 90% of the RSS feeds out there; the rest are ill-formed in a variety of interesting ways, mostly caused by non-XML-aware publishing tools building feeds out of templates and not respecting basic XML well-formedness rules. Next month we'll tackle the thorny problem of how to handle RSS feeds that are almost, but not quite, well-formed XML.

What is RSS?相关推荐

  1. C#与RSS亲密接触

    讲述动态生成RSS文件的方法. 动态生成RSS文件也基本有两种方法,一种是用字符串累加的方法,另一种是使用xml文档生成的方法.字符串累加的方法也比较简单,我也就不多说了,这里着重说一下生成XmlDo ...

  2. 博客 rss 如何使用_如何使用RSS从您的GatsbyJS博客自动交叉发布

    博客 rss 如何使用 With the recent exodus from Medium many developers are now creating their own GatsbyJS B ...

  3. Linux RSS/RPS/RFS/XPS对比

    RSS适合于多队列网卡,把不同的流分散的不同的网卡多列中,至于网卡队列由哪个cpu处理还需要绑定网卡队列中断与cpu RPS:适合于单队列网卡或者虚拟网卡,把该网卡上的数据流让多个cpu处理 RFS: ...

  4. 用ASP.NET建立一个在线RSS新闻聚合器(3)

    显示特定聚合摘要的新闻项 我们面临的下一个任务是创建 DisplayNewsItems.aspx 页面.这个页面会以链接的形式显示所选聚合摘要的新闻项标题,当点击标题时,新闻的内容就会显示在右下部分的 ...

  5. 顶级生物信息学 RSS 订阅源

    早在 2018 年的时候我在"生信草堂"的公众号上写过一篇关于 RSS 的文章<使用 RSS 打造你的科研资讯头条>,介绍了关于 RSS 的一些内容和如何使用 inor ...

  6. rss阅读器保存html文件,轻量级RSS阅读器网页版:selfoss安装教程

    说明:关于RSS阅读器,我们知道的有Feedbin.FreshRSS等,功能都挺强大的,这里就再介绍个轻量级的RSS阅读器selfoss,使用起来是非常简单的,界面颜值也还不错,支持很多种订阅和网站, ...

  7. 新浪微博RSS Feed实现中的问题

    下载代码: http://code.google.com/p/rss-feed/ 把三个文件上传到支持php的空间.(文件没做修改) 在web上访问: http://sinojelly.20x.cc/ ...

  8. Emacs中的RSS阅读器--newsticker

    1 简介 ------- newsticker是一个RSS阅读器,它支持以下几种格式 * RSS 0.91 * RSS 0.92 * RSS 1.0 * RSS 2.0 * Atom 0.3 * At ...

  9. 21 个HTML网页转RSS Feeds的工具

    如果你拥有一个html静态网站,或你喜欢的某个网站不支持RSS Feeds输出,你可以使用本文介绍的这些工具,将HTML网页转换为RSS Feeds. The RSS Wizard 这是一款可以让你创 ...

  10. GraphQL 配合 JWT 使用 —— Laravel RSS (二)

    我们了解了 jwt 和 GraphQL 的使用,那接下来看看他们如何结合使用. 小试牛刀 创建 myProfile query <?php /*** User: yemeishu* Date: ...

最新文章

  1. 2021年春季学期-信号与系统-第一次作业参考答案-第三题
  2. virtualBox linux操作系统centos 挂载光盘
  3. js创建对象的高级模式
  4. 力扣: 88. 合并两个有序数组
  5. Java 集合系列06: Vector深入解析
  6. C语言荣获2019年度最佳编程语言
  7. 什么原因成就了一位优秀的程序员?(转)
  8. ES6中关于set数据结构详解
  9. 真会玩!竟然可以这样用IDEA通过数据库生成lombok版的POJO...
  10. YYH的球盒游戏(NOIP模拟赛Round 6)
  11. java短信发送接口代码示例demo分享
  12. memcpy、memmove、memcmp、memset函数的使用说明和模拟实现
  13. Ubuntu10.04 硬盘安装
  14. 分布式网页爬虫系统 设计和实现
  15. 华为暑期实习一面凉经
  16. 双向链表的一个简单的例子
  17. 立创eda学习笔记九:图层
  18. 简单的闪避游戏的c语言,谁有一些简单小游戏的C语言程序?
  19. MIMX8MD6CVAHZAB I.MX 8MDUAL Cortex-A53 - 微处理器
  20. 数据挖掘与matlab,用MATLAB实现数据挖掘的一种算法

热门文章

  1. 如何将邮件导出为PDF
  2. 蔡司镜头的魅力:vivo X60 Pro评测体验
  3. Linux 探索之旅 | 第五部分第八课:用 Shell 做统计练习
  4. python课程小作业之桌面小工具系统
  5. EXCEL里的Trend函数如何做到数据预测?
  6. 第十届蓝桥杯省赛再现(编程部分)
  7. 【上市啦】“Python 之父” 力荐的蓝皮书,你知道是哪本吗?
  8. 海气耦合模态--学习笔记
  9. Java运算符(1)
  10. 基于GC - MS的代谢组学研究揭示:SD大鼠和Wistar大鼠之间存在系统的代谢差异及乙醇灌胃反应差异