ASP.NET: A Simple Site Search Engine

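As everyone knows, building a search engine is tedious and time-consuming. An enterprise-grade search engine needs a well-behaved spider (crawler), a search index that is refreshed on a schedule, continual tuning of search speed and methods (full-text search, for example), and a way to keep junk pages out of the results. It is a topic well worth studying in depth.

We are of course not going to teach you how to build something as powerful as Google (one person can only do so much), nor will we simply call Google's API. What this article provides is a way to filter and query web page content, so that we can build a simple page-search feature and put it on a personal home page as a site search tool.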

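Now, down to business. Let's walk through how the feature is implemented.

First, we create a set of pages for the site, named WebPage0 through WebPage9, each containing some simple content.

Here is a sample page: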

<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server"><title>Onecode</title>
</head>
<body><form id="form1" runat="server"><div>Hi, Onecode team.</div></form>
</body>
</html>

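Next, we create a web page named SearchEngine. It is the main interface of the program and holds a TextBox, a Button, and a GridView control; it accepts the keywords the user types and returns the matching pages.

The markup is as follows: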

<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server"><title></title>
</head>
<body>
    <form id="form1" runat="server">
    <div>
        Key word:
        <asp:TextBox ID="tbKeyWord" runat="server"></asp:TextBox>
        <br />
        <asp:Button ID="btnSearchPage" runat="server" Text="Search your web page" onclick="btnSearchPage_Click" />
        <asp:GridView ID="gvwResource" runat="server" AutoGenerateColumns="False">
            <Columns>
                <asp:BoundField DataField="Title" HeaderText="Page Name" />
                <asp:HyperLinkField DataNavigateUrlFields="Link" DataTextField="Link" HeaderText="Page URL" />
            </Columns>
            <EmptyDataTemplate>No result</EmptyDataTemplate>
        </asp:GridView>
    </div>
    </form>
</body>
</html>

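We need a WebPage entity class to store the information about each page, which also makes LINQ queries and data binding easier. Create a class file named WebPageEntity.cs.

The C# code stores only the most basic information: page name, content (HTML), link, title, and body (plain text):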

    /// <summary>
    /// Web page entity class; contains a page's basic information,
    /// such as name, content, link, title, body text.
    /// </summary>
    [Serializable]
    public class WebPageEntity
    {
        private string name;
        private string content;
        private string link;
        private string title;
        private string body;

        public string Name
        {
            get { return name; }
            set { name = value; }
        }

        public string Content
        {
            get { return content; }
            set { content = value; }
        }

        public string Link
        {
            get { return link; }
            set { link = value; }
        }

        public string Title
        {
            get { return title; }
            set { title = value; }
        }

        public string Body
        {
            get { return body; }
            set { body = value; }
        }
    }

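Create a RegexMethod class containing the methods that extract a page's title and body text. You can extend this class to build your own search and ranking methods.

Code (RegexMethod.cs):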

    using System.Text;
    using System.Text.RegularExpressions;

    public class RegexMethod
    {
        // Matches any HTML tag so it can be stripped from the extracted text.
        private const string TagClean = @"<[^>]*>";

        /// <summary>
        /// Retrieves the title text of a page.
        /// </summary>
        /// <param name="htmlCode">Raw HTML of the page.</param>
        /// <returns>The text inside the title element, or an empty string.</returns>
        public string GetTitleString(string htmlCode)
        {
            string regexTitle = @"<title>([^<]*)</title>";
            Match match = Regex.Match(htmlCode, regexTitle, RegexOptions.IgnoreCase);

            // Group 1 holds the text between the tags; it is empty when nothing matched.
            return match.Groups[1].Value.Trim();
        }

        /// <summary>
        /// Retrieves the body text of a page with all tags removed.
        /// </summary>
        /// <param name="htmlCode">Raw HTML of the page.</param>
        /// <returns>The plain text of the body element, or an empty string.</returns>
        public string GetBodyString(string htmlCode)
        {
            // [\s\S]*? matches any character, including newlines, lazily.
            string regexBody = @"<body[^>]*>[\s\S]*?</body[^>]*>";
            MatchCollection matches = Regex.Matches(htmlCode, regexBody, RegexOptions.IgnoreCase);
            StringBuilder strPureText = new StringBuilder();
            foreach (Match match in matches)
            {
                string text = Regex.Replace(match.Value, TagClean, string.Empty);
                strPureText.Append(text);
            }
            return strPureText.ToString();
        }
    }
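To see what the two extraction steps produce, here is a minimal stand-alone sketch that can be run as a console program. The class and method names (HtmlTextDemo, ExtractTitle, ExtractBodyText) are hypothetical and not part of the sample project; the patterns follow the same idea as RegexMethod, with the assumption that tags are replaced by a space and whitespace is collapsed so that adjacent words do not run together.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; not part of the article's sample project.
public static class HtmlTextDemo
{
    // Returns the text between <title> and </title>, or "" if there is none.
    public static string ExtractTitle(string html)
    {
        Match match = Regex.Match(html, @"<title>([^<]*)</title>", RegexOptions.IgnoreCase);
        return match.Success ? match.Groups[1].Value.Trim() : string.Empty;
    }

    // Returns the body content with tags stripped and whitespace collapsed.
    public static string ExtractBodyText(string html)
    {
        Match match = Regex.Match(
            html,
            @"<body[^>]*>(.*?)</body\s*>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        if (!match.Success)
        {
            return string.Empty;
        }

        // Assumption: replace each tag with a space, then collapse runs of whitespace.
        string text = Regex.Replace(match.Groups[1].Value, @"<[^>]*>", " ");
        return Regex.Replace(text, @"\s+", " ").Trim();
    }

    public static void Main()
    {
        string html = "<html><head><title>Onecode</title></head>" +
                      "<body><form id=\"form1\"><div>Hi, Onecode team.</div></form></body></html>";

        Console.WriteLine(ExtractTitle(html));    // prints "Onecode"
        Console.WriteLine(ExtractBodyText(html)); // prints "Hi, Onecode team."
    }
}
```

Regular expressions are adequate for simple, well-formed pages like the samples here; pages with scripts, comments, or malformed markup would likely need a real HTML parser.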

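With the groundwork done, we can write the main logic in SearchEngine.aspx.cs. The idea is as follows. First, create a List<T> instance that holds the resource information for each page; to preserve it across postbacks, keep this list in ViewState. The pages are fetched with the HttpWebRequest and HttpWebResponse classes, and the lock keyword marks the loading code as a mutually exclusive section. In the entity class, Name and Link are used to display the page name and URL so that the user can click through to the page; Title and Body are the fields that are searched; and Content holds the raw HTML from which the RegexMethod class extracts Title and Body. Note that we can also collect content from external sites and search it in the same way; here I add the Bing home page, www.bing.com. Once the page entities have been collected, the filtering stage begins: inside a LINQ query, the string Contains method checks whether the title or the body text contains each keyword, and every page that matches is added to the list of selected pages and displayed.

The complete code (SearchEngine.aspx.cs) is as follows: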
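The filtering stage described above can be tried in isolation, without any web page or network access. The following is only a sketch under stated assumptions: PageInfo and KeywordFilterDemo are hypothetical names, the pages are hard-coded, and the comparison uses IndexOf with StringComparison.OrdinalIgnoreCase in place of lower-casing both strings. A page is selected when any one of the space-separated keywords appears in its title or body.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the article's WebPageEntity.
public class PageInfo
{
    public string Title { get; set; }
    public string Body { get; set; }
    public string Link { get; set; }
}

public static class KeywordFilterDemo
{
    // Returns the pages whose title or body contains at least one keyword.
    public static List<PageInfo> Search(IEnumerable<PageInfo> pages, string input)
    {
        // Drop empty entries so repeated spaces do not produce a key that matches everything.
        string[] keys = (input ?? string.Empty)
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        return pages
            .Where(p => keys.Any(k =>
                (p.Title ?? string.Empty).IndexOf(k, StringComparison.OrdinalIgnoreCase) >= 0 ||
                (p.Body ?? string.Empty).IndexOf(k, StringComparison.OrdinalIgnoreCase) >= 0))
            .ToList();
    }

    public static void Main()
    {
        var pages = new List<PageInfo>
        {
            new PageInfo { Title = "Onecode", Body = "Hi, Onecode team.", Link = "WebPage0.aspx" },
            new PageInfo { Title = "Bing", Body = "Search the web", Link = "http://www.bing.com/" }
        };

        foreach (PageInfo page in Search(pages, "bing azure"))
        {
            Console.WriteLine(page.Link); // prints "http://www.bing.com/"
        }
    }
}
```

Because each page is tested once against all keywords, no duplicate check is needed when building the result list. With an empty input this sketch returns no pages, whereas the article's page lists every page in that case.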

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.Net;
using System.IO;
using System.Text;

namespace CSASPNETDisplayDataStreamResource
{
    public partial class SearchEngine : System.Web.UI.Page
    {
        private List<WebPageEntity> webResources;

        // True when the most recent call to LoadResource succeeded.
        private bool isLoad = true;

        protected void Page_Load(object sender, EventArgs e)
        {
            if (!IsPostBack)
            {
                this.LoadList();
            }
        }

        /// <summary>
        /// Web resources stored in a ViewState variable.
        /// </summary>
        public List<WebPageEntity> WebResources
        {
            get
            {
                // Load the pages only when nothing is stored in ViewState yet.
                if (ViewState["Resource"] == null)
                {
                    this.LoadList();
                }
                return (List<WebPageEntity>)ViewState["Resource"];
            }
        }

        /// <summary>
        /// Loads the resources of the specified web pages.
        /// </summary>
        public void LoadList()
        {
            RegexMethod method = new RegexMethod();
            webResources = new List<WebPageEntity>();
            lock (this)
            {
                for (int i = 0; i < 10; i++)
                {
                    string url = Page.Request.Url.ToString().Replace("SearchEngine", string.Format("WebPage{0}", i));
                    this.AddPage(url, method);
                }

                // External pages can be collected in the same way.
                this.AddPage("http://www.bing.com/", method);
                ViewState["Resource"] = webResources;
            }
        }

        /// <summary>
        /// Fetches one page and, if that succeeds, adds its entity to the list.
        /// </summary>
        private void AddPage(string url, RegexMethod method)
        {
            string result = this.LoadResource(url);
            if (!isLoad)
            {
                // Skip pages that could not be fetched.
                return;
            }

            WebPageEntity webEntity = new WebPageEntity();
            webEntity.Name = Path.GetFileName(url);
            webEntity.Link = url;
            webEntity.Content = result;
            webEntity.Title = method.GetTitleString(result);
            webEntity.Body = method.GetBodyString(result);
            webResources.Add(webEntity);
        }

        /// <summary>
        /// Uses HttpWebRequest, HttpWebResponse and StreamReader to retrieve
        /// the HTML of a page. Sets isLoad to false when the request fails.
        /// </summary>
        /// <param name="url">Address of the page to fetch.</param>
        /// <returns>The page HTML, or an empty string on failure.</returns>
        public string LoadResource(string url)
        {
            this.isLoad = true;
            try
            {
                HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(url);
                webRequest.Timeout = 30000;
                using (HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse())
                {
                    if (webResponse.StatusCode != HttpStatusCode.OK)
                    {
                        this.isLoad = false;
                        return string.Empty;
                    }

                    using (StreamReader reader = new StreamReader(webResponse.GetResponseStream(), Encoding.UTF8))
                    {
                        return reader.ReadToEnd();
                    }
                }
            }
            catch (Exception)
            {
                this.isLoad = false;
                return string.Empty;
            }
        }

        /// <summary>
        /// The search button click event compares the keywords with the
        /// page resources and selects the related pages.
        /// </summary>
        protected void btnSearchPage_Click(object sender, EventArgs e)
        {
            List<WebPageEntity> resources = this.WebResources;
            if (resources == null || resources.Count == 0)
            {
                Response.Write("Resource file load failed, please refresh your page.");
                return;
            }

            // Ignore empty entries so that extra spaces do not match every page.
            string[] keys = tbKeyWord.Text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            if (keys.Length > 0)
            {
                List<WebPageEntity> allSelectedResources = new List<WebPageEntity>();
                foreach (string key in keys)
                {
                    string oneKey = key.ToLower();
                    var webSelectedResources = from entity in resources
                                               where entity.Body.ToLower().Contains(oneKey)
                                                  || entity.Title.ToLower().Contains(oneKey)
                                               select entity;
                    foreach (WebPageEntity entity in webSelectedResources)
                    {
                        if (!allSelectedResources.Contains(entity))
                        {
                            allSelectedResources.Add(entity);
                        }
                    }
                }
                gvwResource.DataSource = allSelectedResources;
                gvwResource.DataBind();
            }
            else
            {
                // No keyword entered: list every page.
                var webSelectedResource = from entity in resources
                                          select new
                                          {
                                              entity.Title,
                                              entity.Link,
                                          };
                gvwResource.DataSource = webSelectedResource;
                gvwResource.DataBind();
            }
        }
    }
}

Press Ctrl+F5 to run your site, then type a keyword and start searching; try onecode, bing, azure, hotmail, and so on.
