News System (3): Exploring Content Protection

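In recent years, scraper sites (junk sites built from copied content) have grown into a very large industry. As the saying goes, every article under heaven is copied from another, and the problem is especially severe in China, where copyright awareness is weak. When material that a site has painstakingly compiled gets reposted everywhere, the original site may itself end up being treated as a junk site. Search engines used to dislike forum and blog content, perhaps because they considered it mostly filler of little value, but they now clearly favor blogs and forums as sources of original material.
  We always want search engines to index more of our content to bring in traffic; that is what SEO is about. But content that is easy for search engines to crawl is just as easy for scraper sites to crawl, which leaves us in a dilemma. Many sites have started to lean toward protecting their copyright, even if that means less of their content is indexed by Baidu.
  A common approach is to turn the content into images. There is plenty of software that converts text to images, so we won't go into that here. Our main question is whether this can be done programmatically.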
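  To draw HTML as an image, you first have to parse the HTML, which is hard: it amounts to writing a browser engine. So the simplest approach is to use IE's engine directly.
  In .NET there is the WebBrowser control. We can build a screen-capture effect on top of it: first render the HTML at a web address, then load that page with our WebBrowser, and finally turn the content into an image.
  There isn't much code; see below. A disclaimer first: much of this code comes from the web, and I have only extended parts of it.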
  using System;
  using System.Drawing;
  using System.Drawing.Drawing2D;
  using System.Drawing.Imaging;
  using System.Windows.Forms;
  using mshtml; // COM interop assembly: add a reference to Microsoft.mshtml

  namespace webabc
  {
      // Renders a web page to a bitmap through the WinForms WebBrowser control (IE engine).
      public class HtmlToImage
      {
          public int ScreenHeight { get; set; }
          public int ScreenWidth { get; set; }
          public int ImageHeight { get; set; }
          public int ImageWidth { get; set; }
          public string WebSite { get; set; }

          public HtmlToImage(string WebSite, int ScreenWidth, int ScreenHeight, int ImageWidth, int ImageHeight)
          {
              this.WebSite = WebSite;
              this.ScreenHeight = ScreenHeight;
              this.ScreenWidth = ScreenWidth;
              this.ImageHeight = ImageHeight;
              this.ImageWidth = ImageWidth;
          }

          // Must be called on an STA thread, because WebBrowser is an ActiveX control.
          public Bitmap GetBitmap()
          {
              WebPageBitmap Shot = new WebPageBitmap(this.WebSite, this.ScreenWidth, this.ScreenHeight);
              Shot.GetIt();
              return Shot.DrawBitmap(this.ImageHeight, this.ImageWidth);
          }
      }

      public class WebPageBitmap
      {
          WebBrowser MyBrowser;
          string URL;
          int Height;
          int Width;

          public WebPageBitmap(string url, int width, int height)
          {
              this.URL = url;
              this.Width = width;
              this.Height = height;
              MyBrowser = new WebBrowser();
              MyBrowser.ScrollBarsEnabled = false;
              MyBrowser.Size = new Size(this.Width, this.Height);
          }

          // Loads the page, then resizes the browser to the full document size.
          public void GetIt()
          {
              MyBrowser.Navigate(this.URL);
              // Pump the message loop until the page has finished loading.
              while (MyBrowser.ReadyState != WebBrowserReadyState.Complete)
              {
                  Application.DoEvents();
              }
              IHTMLDocument2 doc2 = (IHTMLDocument2)MyBrowser.Document.DomDocument;
              IHTMLDocument3 doc3 = (IHTMLDocument3)MyBrowser.Document.DomDocument;
              IHTMLElement2 body2 = (IHTMLElement2)doc2.body;
              IHTMLElement2 root2 = (IHTMLElement2)doc3.documentElement;
              // Determine dimensions for the image; we could add minWidth here
              // to ensure that we get closer to the minimal width (the width
              // computed might be a few pixels less than what we want).
              int __width = Math.Max(body2.scrollWidth, root2.scrollWidth);
              int __height = Math.Max(root2.scrollHeight, body2.scrollHeight);
              this.Height = __height;
              this.Width = __width;
              MyBrowser.Size = new Size(__width, __height);
          }

          // Draws the loaded page into a new bitmap and disposes the browser.
          // Note: theight and twidth are not used; the result keeps the full document size.
          public Bitmap DrawBitmap(int theight, int twidth)
          {
              Rectangle DrawRect = new Rectangle(0, 0, this.Width, this.Height);
              try
              {
                  using (Bitmap myBitmap = new Bitmap(this.Width, this.Height))
                  {
                      MyBrowser.DrawToBitmap(myBitmap, DrawRect);
                      Bitmap oThumbNail = new Bitmap(this.Width, this.Height, myBitmap.PixelFormat);
                      using (Graphics g = Graphics.FromImage(oThumbNail))
                      using (ImageAttributes attr = new ImageAttributes())
                      {
                          g.CompositingQuality = CompositingQuality.HighQuality;
                          g.SmoothingMode = SmoothingMode.HighQuality;
                          g.InterpolationMode = InterpolationMode.HighQualityBicubic;
                          g.DrawImage(myBitmap, DrawRect, 0, 0, myBitmap.Width, myBitmap.Height, GraphicsUnit.Pixel, attr);
                      }
                      return oThumbNail;
                  }
              }
              finally
              {
                  // The browser is single-use: release it whether or not drawing succeeded.
                  MyBrowser.Dispose();
                  MyBrowser = null;
              }
          }
      }
  }
  Cnblogs (博客园) also has quite a few in-depth technical articles on this; search for their code. There are a few points we want to explain.
  IHTMLDocument2 doc2 = (IHTMLDocument2)MyBrowser.Document.DomDocument;
  IHTMLDocument3 doc3 = (IHTMLDocument3)MyBrowser.Document.DomDocument;
  IHTMLElement2 body2 = (IHTMLElement2)doc2.body; //doc2.body;
  IHTMLElement2 root2 = (IHTMLElement2)doc3.documentElement;//doc3.documentElement;
  // Determine dimensions for the image; we could add minWidth here
  // to ensure that we get closer to the minimal width (the width
  // computed might be a few pixels less than what we want).
  int __width = Math.Max(body2.scrollWidth, root2.scrollWidth);
  int __height = Math.Max(root2.scrollHeight, body2.scrollHeight);
  this.Height = __height;
  this.Width = __width;
  MyBrowser.Size = new Size(__width, __height);
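  In this part, by inspecting the content, we obtain the height of the whole document rather than the height of one screen. This can be extended further: insert the HTML directly and let WebBrowser render it, without browsing to a URL. Give it a try.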
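  The extension suggested above could be sketched roughly as follows. This is a minimal sketch, not the code used on the site: HtmlStringRenderer and Render are hypothetical names, the output size is fixed by the caller instead of being measured from the document, and the method is assumed to run on an STA thread like the rest of the WebBrowser code.

```csharp
using System;
using System.Drawing;
using System.Windows.Forms;

public static class HtmlStringRenderer
{
    // Hypothetical helper: renders an HTML string into a bitmap of the given size.
    // Assumption: the caller runs this on an STA thread.
    public static Bitmap Render(string html, int width, int height)
    {
        using (WebBrowser browser = new WebBrowser())
        {
            bool loaded = false;
            browser.ScrollBarsEnabled = false;
            browser.Size = new Size(width, height);
            browser.DocumentCompleted += delegate { loaded = true; };

            // Assigning DocumentText loads the markup without navigating to a URL.
            browser.DocumentText = html;

            // The load is asynchronous, so pump messages until it completes.
            while (!loaded)
            {
                Application.DoEvents();
            }

            Bitmap result = new Bitmap(width, height);
            browser.DrawToBitmap(result, new Rectangle(0, 0, width, height));
            return result;
        }
    }
}
```

  The scrollWidth and scrollHeight measurement shown earlier could be applied after loading here as well, if the image should cover the whole document.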
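  The code that calls this from the web application follows. Pay attention to this line: newThread.SetApartmentState(ApartmentState.STA); WebBrowser is an ActiveX control, so it only works on a single-threaded apartment (STA) thread.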
  using System;
  using System.Drawing.Imaging;
  using System.Threading;

  /// <summary>
  /// Renders a news page to a GIF file on a background STA thread.
  /// </summary>
  public class My_html_to_img
  {
      public string nid = "";      // news id appended to the URL
      public string oinfo = "E:";  // status text; exception details are appended on failure
      public string url = string.Empty;
      // Output file path. Set it before calling NewsContentToImages, for example with
      // Server.MapPath("../uploads/newscontentimages/") + nid + ".gif" on the request thread;
      // HttpContext.Current is not available inside dd().
      public string path = "";

      public string NewsContentToImages()
      {
          try
          {
              url = "http://www.21nm.net/NewsContentToImages.aspx?id=" + nid;
              Thread newThread = new Thread(new ThreadStart(dd));
              newThread.Name = "a88";
              // WebBrowser requires a single-threaded apartment.
              newThread.SetApartmentState(ApartmentState.STA);
              // Set the status before starting, so it cannot overwrite what dd() appends.
              oinfo = "1";
              newThread.Start();
          }
          catch (Exception ex)
          {
              oinfo = ex.ToString();
          }
          // Returns right away; the image is written later by the worker thread.
          return oinfo;
      }

      // Worker: renders the page and saves it. Uses only plain fields, no web objects.
      void dd()
      {
          try
          {
              webabc.HtmlToImage thumb = new webabc.HtmlToImage(url, 1024, 768, 320, 240);
              using (System.Drawing.Bitmap x = thumb.GetBitmap())
              {
                  x.Save(path, ImageFormat.Gif);
              }
              oinfo += "ok";
          }
          catch (Exception ex)
          {
              oinfo += ex.ToString();
          }
      }
  }
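  One more thing: never use web objects (such as HttpContext.Current, Request, or Response) inside a thread you start from the web application. When you do, VS reports no error; you just get symptoms resembling an infinite loop, and tracking down the cause is painful.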
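  One way to keep web objects out of the worker thread is to read everything needed from the request first and hand only plain values to the thread. The helper below is a sketch under that assumption; StaRunner and Run are hypothetical names, not part of the original code. Unlike the fire-and-forget call above, it blocks until the work finishes, so it fits batch generation better than a live request.

```csharp
using System;
using System.Threading;

public static class StaRunner
{
    // Hypothetical helper: runs work on a dedicated STA thread and waits for its result.
    // The work delegate should capture plain values only (strings, numbers),
    // never HttpContext, Request, or Response.
    public static T Run<T>(Func<T> work)
    {
        T result = default(T);
        Exception error = null;

        Thread thread = new Thread(delegate()
        {
            try { result = work(); }
            catch (Exception ex) { error = ex; }
        });
        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join();

        // Surface worker failures on the calling thread instead of losing them.
        if (error != null)
        {
            throw new InvalidOperationException("STA work failed", error);
        }
        return result;
    }
}
```

  In a page, the output path would be resolved with Server.MapPath before calling StaRunner.Run, and the delegate would use only that string and the URL.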
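  An address where this is used in practice: http://www.21nm.net/html/c61/267230p1.html
  This program has a drawback as well: rendering with the IE engine is fairly resource-intensive, so it is not suited to generating images in real time. It is best to generate the image once when a news item is added, or in batches when server load is low.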
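  There is another way to generate the image. If an article is mainly text, with no tables or pictures, we can draw the image directly, which is faster. The downside is that the program becomes more complex and takes more work: it has to handle line breaks, bold text, and so on, and showing pictures or tables is harder still. But for something like a novel site, where the content almost never contains pictures, it is quite feasible.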
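  For the plain-text case, GDI+ can already do the line wrapping. The sketch below handles only a single font and plain text; bold text and other markup would still need the parsing described above. TextToImage and Render are hypothetical names.

```csharp
using System;
using System.Drawing;
using System.Drawing.Text;

public static class TextToImage
{
    // Hypothetical helper: draws plain text, wrapped to a fixed width, on a white bitmap.
    public static Bitmap Render(string text, int width, Font font)
    {
        int height;
        // Measure first: a throwaway 1x1 bitmap supplies a Graphics for MeasureString.
        using (Bitmap probe = new Bitmap(1, 1))
        using (Graphics g = Graphics.FromImage(probe))
        {
            SizeF size = g.MeasureString(text, font, width);
            height = Math.Max(1, (int)Math.Ceiling(size.Height));
        }

        Bitmap result = new Bitmap(width, height);
        using (Graphics g = Graphics.FromImage(result))
        {
            g.Clear(Color.White);
            g.TextRenderingHint = TextRenderingHint.AntiAlias;
            // DrawString wraps lines inside the layout rectangle and honors "\n".
            g.DrawString(text, font, Brushes.Black, new RectangleF(0, 0, width, height));
        }
        return result;
    }
}
```

  The caller owns the returned bitmap and the font, and would typically save the result with Bitmap.Save and then dispose both.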
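  As for encrypting article content, one approach I thought of and tried is to DES-encrypt the content and have a Flash client decrypt and display it. In theory this should work, but in practice I hit a problem: there was no ready-made ActionScript DES decryption that interoperated with .NET's, mainly because the padding modes differ, so you have to write an algorithm both sides can share. That takes considerable time too.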
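  On the .NET side, the cipher mode and padding can at least be fixed explicitly rather than left to defaults, so the client has an exact scheme to match. The sketch below assumes CBC mode with PKCS7 padding, where N bytes of value N are appended to fill the last block; that choice is an assumption for illustration, not what the original attempt used. DesInterop is a hypothetical name, and the ActionScript side would still have to implement the same mode and padding.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class DesInterop
{
    // Encrypts UTF-8 text with DES, with mode and padding set explicitly.
    // key and iv must each be 8 bytes long.
    public static byte[] Encrypt(string plainText, byte[] key, byte[] iv)
    {
        using (DES des = DES.Create())
        {
            des.Mode = CipherMode.CBC;
            des.Padding = PaddingMode.PKCS7;
            using (ICryptoTransform encryptor = des.CreateEncryptor(key, iv))
            {
                byte[] data = Encoding.UTF8.GetBytes(plainText);
                return encryptor.TransformFinalBlock(data, 0, data.Length);
            }
        }
    }

    // Reverses Encrypt, using the same mode, padding, key, and iv.
    public static string Decrypt(byte[] cipherText, byte[] key, byte[] iv)
    {
        using (DES des = DES.Create())
        {
            des.Mode = CipherMode.CBC;
            des.Padding = PaddingMode.PKCS7;
            using (ICryptoTransform decryptor = des.CreateDecryptor(key, iv))
            {
                byte[] data = decryptor.TransformFinalBlock(cipherText, 0, cipherText.Length);
                return Encoding.UTF8.GetString(data);
            }
        }
    }
}
```

  With PKCS7, the ciphertext length is always the next multiple of 8 bytes above the plaintext length, which gives a quick check when comparing output from the two sides.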
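  If anyone can find or has written a solution for either of these two approaches, or has a better one, please share.
  Whether article content is worth encrypting is not up for discussion here: even if it isn't, as long as the client needs it, we have to do it.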
