What Is XML?

XML (Extensible Markup Language) is a buzzword you will see everywhere on the Internet, but it’s

also a rapidly maturing technology with powerful real-world applications, particularly for the

management, display, and organization of data. Together with its many related technologies,

which are covered in later chapters, XML is an essential technology for anyone working with data,

whether publicly on the web or privately within your own organization. This chapter introduces

you to some XML basics and begins to show you why learning about it is so important.

This chapter covers the following:

❑ The two major categories of computer file types (binary files and text files) and the advantages and disadvantages of each

❑ The history behind XML, including other markup languages such as SGML and HTML

❑ How XML documents are structured as hierarchies of information

❑ A brief introduction to some of the other technologies surrounding XML, which you will work with throughout the book

❑ A quick look at some areas where XML is useful


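On the Internet, XML is a buzzword, but it is also maturing quickly through powerful real-world applications, especially in the management, display, and organization of data. Whether the data is published on the web or used inside an organization, XML is a valuable technology. The technologies related to XML are covered in later chapters; this chapter covers some XML basics and why XML matters, as follows:

1. The two main file types, binary files and text files, and their advantages and disadvantages

2. The history of XML

3. A brief introduction to the technologies related to XML

4. The main areas where XML is used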

Of Data, Files, and Text

XML is a technology concerned with the description and structuring of data, so before you can

really delve into the concepts behind XML, you need to understand how computers store and

access data. For our purposes, computers understand two kinds of data files: binary files and

text files.

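On data, files, and text

XML is a technology concerned with describing and structuring data, so before learning XML you need to know how computers store and read data. For our purposes, computers read two kinds of files: binary files and text files.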

Binary Files

A binary file, at its simplest, is just a stream of bits (1s and 0s). It’s up to the application that created a

binary file to understand what all of the bits mean. That’s why binary files can only be read and produced

by certain computer programs, which have been specifically written to understand them.
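To make this concrete, here is a minimal Python sketch showing that the same bytes mean different things depending on how a program chooses to interpret them. The four bytes are arbitrary values picked for this illustration, and the little-endian layouts are assumptions, not part of any real file format.

```python
import struct

# Four arbitrary bytes, chosen only for this illustration.
raw = b"\x00\x00\x80\x3f"

# Interpretation 1: a little-endian unsigned 32-bit integer.
as_uint = struct.unpack("<I", raw)[0]

# Interpretation 2: a little-endian 32-bit floating-point number.
as_float = struct.unpack("<f", raw)[0]

# Interpretation 3: four separate one-byte values.
as_bytes = list(raw)

print(as_uint)   # 1065353216
print(as_float)  # 1.0
print(as_bytes)  # [0, 0, 128, 63]
```

Nothing in the bytes themselves says which reading is correct; only the program that wrote them knows.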

For instance, when a document is created with Microsoft Word, the program creates a binary file with an

extension of “doc,” in its own proprietary format. The programmers who wrote Word decided to insert

certain binary codes into the document to denote bold text, codes to denote page breaks, and other codes

for all of the information that needs to go into a “doc” file. When you open a document in Word, it interprets

those codes and displays the properly formatted text or prints it to the printer.

The codes inserted into the document are meta data, or information about information. Examples could

be “this word should be in bold,” “that paragraph should be centered,” and so on. This meta data is

really what differentiates one file type from another; the different types of files use different kinds of

meta data. For example, a word processing document has different meta data than a spreadsheet document,

because they are describing different things. Not so obviously, documents from different word

processing applications, such as Microsoft Word and WordPerfect, also have different meta data, because

the applications were written differently (see Figure 1-1).

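Binary files:

A binary file is just a stream of bits, 1s and 0s. What those bits mean is up to the application that created the file. That is why only programs written specifically to understand a binary file can read it, while other programs cannot.

For example, when we create a document with Microsoft Word, Word creates a binary file with the extension “doc” in Word's own format. When we make text bold, insert a page break, or apply other formatting, Word writes the corresponding codes into the binary file, so that the document is displayed or printed with the right formatting the next time it is opened.

These binary formatting codes are the document's metadata, which can also be called information about information, such as making text bold or starting a paragraph. Metadata differs from one kind of application to another; a Word document and a spreadsheet, for example, use different codes. Different word processors, such as Microsoft Word and WordPerfect, also use different metadata (see Figure 1-1).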

Figure 1-1

You can’t assume that a document created with one word processor will be readable by another, because

the companies who write word processors all have their own proprietary formats for their data files.

Word documents open in Microsoft Word, and WordPerfect documents open in WordPerfect.

Luckily, most word processors come with translators or import utilities, which can translate documents

from other word processors into formats that can be understood natively. If I have Microsoft Word

installed on my computer and someone gives me a WordPerfect document, I might be able to import it

into Word so that I can read the document. Of course, many of us have seen the garbage that sometimes

occurs as a result of this translation; sometimes applications are not as good as we’d like them to be at

converting the information.

Binary file formats are advantageous because it is easy for computers to understand these binary codes—

meaning that they can be processed much faster than nonbinary formats—and they are very efficient for

storing this meta data. There is also a disadvantage, as you’ve seen, in that binary files are proprietary.

You might not be able to open binary files created by one application in another application, or even in the

same application running on another platform.

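Do not expect files produced by different word processors to be readable by one another, because each uses its own binary format. Fortunately, most word processors now come with translators or other import utilities that convert other formats into one they understand; for example, Microsoft Word may be able to open a WordPerfect document. The conversion does not always work as well as we would like, and errors sometimes occur.

Binary files have these advantages: they are easy for computers to understand, so the same data is processed faster in binary form, and they store metadata efficiently. Their major weakness is that they are tied to the application that created them, so another application may not be able to open them, and even the same application running on a different platform may not.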

Text Files

Like binary files, text files are also streams of bits. However, in a text file these bits are grouped together

in standardized ways, so that they always form numbers. These numbers are then further mapped to

characters. For example, a text file might contain the following bits:

1100001

This group of bits would be translated as the number 97, which could then be further translated into the

letter a.

This example makes a number of assumptions. A better description of how numbers are represented in

text files is given in the “Encoding” section in Chapter 2.
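The bits-to-number-to-character mapping described above can be sketched in a few lines of Python. This sketch assumes the ASCII mapping, in which 97 corresponds to the lowercase letter a; other encodings are outside its scope.

```python
# The bit pattern from the text, written as a string of 1s and 0s.
bits = "1100001"

# Step 1: read the bits as a base-2 number.
number = int(bits, 2)

# Step 2: map the number to a character (ASCII assumed).
character = chr(number)

print(number)     # 97
print(character)  # a

# Going the other way: character -> number -> bits.
round_trip = format(ord("a"), "b")
print(round_trip)  # 1100001
```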

Because of these standards, text files can be read by many applications, and can even be read by

humans, using a simple text editor. If I create a text document, anyone in the world can read it (as long

as they understand English, of course) in any text editor they wish. Some issues still exist, such as the

fact that different operating systems treat line-ending characters differently, but it is much easier to share

information when it’s contained in a text file than when the information is in a binary format.
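The line-ending issue mentioned above can be seen directly in the bytes. The following is a small sketch with two made-up samples of the same two-line text, one using the Unix convention (a single line feed) and one using the Windows convention (carriage return plus line feed).

```python
# The same two lines of text, saved with two different line-ending conventions.
unix_text = b"first line\nsecond line\n"          # LF only
windows_text = b"first line\r\nsecond line\r\n"   # CR followed by LF

# Byte for byte, the two files are not identical.
same_bytes = unix_text == windows_text
print(same_bytes)                          # False
print(len(unix_text), len(windows_text))   # 23 25

# A reader that understands both conventions recovers the same lines.
unix_lines = unix_text.decode("ascii").splitlines()
windows_lines = windows_text.decode("ascii").splitlines()
print(unix_lines == windows_lines)         # True
```

A program that expects only one convention may show stray characters or run lines together, which is the kind of remaining issue the text refers to.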

Figure 1-2 shows some of the applications on my machine that are capable of opening text files. Some of

these programs only allow me to view the text, while others will let me edit it as well.


Figure 1-2

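Text files:

Like binary files, text files are streams of bits, but the bits in a text file are grouped according to standard rules so that they form numbers, and those numbers map to characters. For example, a text file might contain the following bit sequence:

1100001

This sequence represents the number 97, and 97 in turn maps to the character “a”.

Because of these standards, text files can be read by many applications, and we can read them ourselves with a simple text editor. If we create a text file, anyone can read it in whatever text editor they like (provided they can read English). Some problems remain, such as different operating systems representing line endings differently, but sharing information is still easier than with binary files.

Figure 1-2 shows the applications on my computer that can open text files; some of them can edit the files as well as display them.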

In its early days, the Internet was almost completely text-based, which enabled people to communicate

with relative ease. This contributed to the explosive rate at which the Internet was adopted, and to the

ubiquity of applications such as e-mail, the World Wide Web, newsgroups, and so on.

The disadvantage of text files is that adding other information—our meta data, in other words—is

more difficult and bulky. For example, most word processors enable you to save documents in text form,

but if you do, you can’t mark a section of text as bold or insert a binary picture file. You will simply get

the words with none of the formatting.
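As a rough illustration of why metadata is bulkier in text form, the sketch below marks one word as bold in two ways. Both the one-byte control codes and the bracketed text markers are hypothetical, invented for this example; they do not belong to Word or to any real format.

```python
# Hypothetical one-byte control codes, invented for this illustration only.
BOLD_ON = b"\x01"
BOLD_OFF = b"\x02"

# The same instruction spelled out as readable text markers (also made up).
TEXT_BOLD_ON = "[bold]"
TEXT_BOLD_OFF = "[/bold]"

word = "important"

# Binary-style encoding: compact, but meaningless without the program that defined the codes.
binary_form = BOLD_ON + word.encode("ascii") + BOLD_OFF

# Text-style encoding: readable in any text editor, but noticeably longer.
text_form = (TEXT_BOLD_ON + word + TEXT_BOLD_OFF).encode("ascii")

print(len(binary_form))  # 11
print(len(text_form))    # 22
```

The text version is twice the size for this one word, but anyone can open it and see what was intended.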

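In its early days the Internet was almost entirely text-based, which made it easy for people to communicate. This helped e-mail, the World Wide Web, newsgroups, and similar applications grow explosively.

Text files have a weakness of their own: adding metadata of the kind found in binary files is difficult and makes the file larger. For example, many word processors let you save a document as plain text, but then you cannot store part of the text as bold or insert a binary picture; all you get is the text without any formatting.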

Reposted from: https://www.cnblogs.com/yunhuasheng/archive/2007/10/17/927953.html
