

1. About Background

The World Wide Web (Web) is a popular and interactive medium for disseminating information today. The Web is huge, diverse, and dynamic.

With the explosion of the World Wide Web, a wealth of data on many different subjects has become available online.

The explosive growth and popularity of the World Wide Web have resulted in a huge number of information sources on the Internet.

Data on the web continues to grow at a torrid pace.

As the data on the Web grows at explosive rates, a tremendous research effort has been initiated to make such data available.

The World Wide Web is becoming the dominant medium for information delivery and electronic commerce. The number of users who routinely use the web to buy goods and services continues to increase at a rapid pace.

With the Web, computer users have gained access to a large variety of comprehensive information repositories.

The explosion in the use and availability of wireless devices and the ability they give people to access information anytime and anywhere hold great promise.

The Web is a medium for accessing a great variety of information stored in different parts of the world. The rapid expansion of the web is causing the constant growth of this information.

While search engines provide some help in locating information of interest to users on the World Wide Web, a large number of the web pages returned by filling in search forms are not indexable by most search engines today as they are generated dynamically by querying a back-end (relational or object-relational) database. The set of such web pages, referred to as the Deep Web or Hidden Web, is estimated to be around 500 times the size of the “surface web”.

The rapid expansion of the Internet has made the WWW a popular place for disseminating and collecting information.

A significant portion of the data on the World Wide Web is in the form of HTML pages.

With the rapid expansion of the Web, the content of the Web is becoming richer and richer. People are increasingly using the Web to learn an unfamiliar topic because of the Web’s convenience and its abundance of information and knowledge.

The World Wide Web has become one of the most important connections of various information sources. A large proportion of the Web data is embedded in HTML documents.

In the last few years, the number of people who use the Internet has increased enormously.

Nowadays the Web poses itself as the largest data repository ever available in the history of humankind. Major efforts have been made in order to provide efficient access to relevant information within this huge repository.

The Web today contains documents that are highly volatile, distributed, and heterogeneous. The content of a web page is usually much more diverse than that of a traditional plain-text document and encompasses multiple regions with unrelated topics.

Recently, the development of the web, web-based applications, and web-based access to databases has triggered an interest in information extraction from the web.

The Deep Web is an important yet largely unexplored frontier for information search.

With the phenomenal growth of the Web, there is an ever-increasing volume of data and information published in numerous Web pages.

Throughout the day, many people keep track of a large sea of information from a variety of data sources.

Sources such as news headlines, weblogs, stock quotes, peers' web sites, organizational intranets, etc., all provide information that can enrich the lives of the individual and can support the productivity of the worker.

At all times, people are using the Internet for making purchases, doing research, seeking out entertainment, and building their own web sites. All of this behavior can be monitored and used to derive information without ever having to interrupt the user by asking questions.

2. About Related work

Many recent works in the literature have presented approaches for doing sth.

Sth. has been a hot topic for several years.

Several ** approaches have been reported in the literature for doing sth, e.g. ***.

Ever since the inception of the Web, ** has been an active research area. So far, many ** techniques have been proposed and some of them are also widely used in practice.

Many researchers have considered using **.

Although ** is an important problem, relatively little work has been done to deal with it.

To our knowledge, there is little work directly addressing the problem we consider in this paper. Below we review some works that are considered relevant to ours.

** has been a subject of intense research for several years. However, in the context of the World Wide Web and web services, research in ** has received renewed attention.

At least two broad views of this problem have evolved recently. The first one is **. The second one is **.

3. About Figure

as Figure 1 conceptually illustrates.

Figure *: Distribution of Sth. over ** (where ** is the quantity on the horizontal axis)

Figure 1 shows ...


