Some interesting facts about SharePoint 2007 Search

Published 14 November 08 06:21 PM | harikumh

Can we search in languages other than English? Do we need a language pack for that?

Language packs have nothing to do with searching in languages other than English or the language in which SharePoint is deployed. Out of the box, MOSS already ships with the major word breakers and stemmers, although their quality is very poor for some languages, such as Chinese and German.

Regardless of the quality of the word breakers, you may encounter two problems by design:

1. At index time, the IFilters should emit the correct LCID for the document's language. However, this is not possible for some file types. For example, when processing TXT, XLS (XLSX) and RTF files, the IFilters return 1033 (en-US) instead of the correct LCID. As a result, for Japanese or Chinese content in these files, a search for any longer word may return nothing, and only single-character queries match. Other languages may have the same problem, though less obviously.

2. At query time, when a user submits a keyword through the browser, MOSS detects the browser's language setting and uses that value to call the corresponding word breaker. If this word breaker does not match the one used at index time, you are in trouble again. For example, if you search for Chinese text from an English client without changing the browser language setting to Chinese, you will still get no results for a word, even if the files were indexed in the right language.
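The mismatch described in the two points above can be illustrated with a toy model. This is only a sketch under loud assumptions: `break_per_character` and `break_with_dictionary` are hypothetical stand-ins, not the MOSS word breakers, and the word list is made up. It only shows why tokens produced by one breaker at index time may never match tokens produced by a different breaker at query time.

```python
# Toy model only: these functions are hypothetical stand-ins for real word breakers.

def break_per_character(text):
    """Assumed fallback behavior: emit one token per non-space character."""
    return [ch for ch in text if not ch.isspace()]


def break_with_dictionary(text, dictionary):
    """Assumed language-aware behavior: greedy longest match against a word list."""
    tokens, i = [], 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        # Try the longest candidate first; fall back to a single character.
        for length in range(len(text) - i, 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens


def matches(index_tokens, query_tokens):
    """A query hits only if every query token exists in the index."""
    return all(token in index_tokens for token in query_tokens)


dictionary = {"全文", "搜索", "引擎"}   # made-up word list
document = "全文搜索引擎"

# Index built with the wrong (per-character) breaker, query with the right one.
wrong_index = set(break_per_character(document))
word_hit = matches(wrong_index, break_with_dictionary("搜索", dictionary))
char_hit = matches(wrong_index, break_with_dictionary("搜", dictionary))

# Index and query both use the same breaker.
right_index = set(break_with_dictionary(document, dictionary))
consistent_hit = matches(right_index, break_with_dictionary("搜索", dictionary))

print(word_hit, char_hit, consistent_hit)
```

In this toy, the two-character word misses the per-character index while the single character hits, and the word hits again once both sides use the same breaker.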

The space needed for the index on the query machine is approximately 2.8 times the size of the actual index. What is the logic behind this?

Let's say the index size is X.

During crawls, shadow indexes accumulate as new items are indexed. When these shadow indexes reach about 0.1 times X (10% of X), we do a master merge.

A master merge takes the 1.1X (X + 0.1X) of data and creates a new index from it in the same location as the old index. The size of the new index is roughly 1.1X. So before the old index is deleted, at least 2.2X is required (for both indexes).

However, since query servers are expected to be online at all times, the master merge should have minimal impact on query latencies. To achieve this, the merge creates temp files, which use additional space beyond the 2.2X needed for the two indexes.

This leads to the worst-case number of 2.8X, which lets master merges succeed without impacting query latencies.

The old index is then deleted on both the indexer and the query machines as soon as the master merge is complete.
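The arithmetic above can be checked with a small calculation. This is a sketch, not a sizing tool: the function name and the 100 GB figure are made up for illustration, and the temp-file share is simply what is left after subtracting the two index copies (2.2X) from the 2.8X worst case quoted above.

```python
def peak_index_space(index_size_gb, merge_threshold=0.1, worst_case_factor=2.8):
    """Break down the worst-case disk space needed during a master merge.

    merge_threshold and worst_case_factor use the figures quoted in the text;
    the temp-file share is inferred as the remainder, not a documented number.
    """
    merged = index_size_gb * (1 + merge_threshold)   # X + 0.1X = 1.1X
    two_copies = 2 * merged                          # old + new index = 2.2X
    worst_case = index_size_gb * worst_case_factor   # 2.8X
    return {
        "merged_index_gb": round(merged, 2),
        "two_copies_gb": round(two_copies, 2),
        "temp_files_gb": round(worst_case - two_copies, 2),
        "worst_case_gb": round(worst_case, 2),
    }


# Hypothetical example: a 100 GB index on the query server.
space = peak_index_space(100)
for name, value in space.items():
    print(f"{name}: {value}")
```

For a hypothetical 100 GB index this gives roughly 110 GB for the merged index, 220 GB while both copies exist, and 280 GB in the worst case, leaving about 60 GB for temp files.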

How are duplicate documents identified when we do a search?

Document similarity for the purpose of identifying duplicates is based only on a hash of the document's content. No file properties (e.g. file name, type, author, create and modify dates) are input to this hash. The MSSDuplicateHashes table in the SSP's search database holds, for each document, all the 64-bit hashes necessary to determine whether one document is a near-duplicate of another. This table is read at search time if duplicate collapsing is enabled.
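The idea of content-only hashing can be sketched as follows. This is not the algorithm MOSS uses, which is not described here. It is an assumed, simplified scheme in which each document body is reduced to a few 64-bit hashes of word shingles and two documents are compared by the overlap of those hashes. The function names, shingle size and sample documents are all hypothetical.

```python
import hashlib


def content_hashes(body, shingle_size=3, keep=4):
    """Return a small set of 64-bit hashes computed from the body text only."""
    words = body.lower().split()
    shingles = {
        " ".join(words[i:i + shingle_size])
        for i in range(max(1, len(words) - shingle_size + 1))
    }
    hashes = sorted(
        int.from_bytes(
            hashlib.blake2b(s.encode("utf-8"), digest_size=8).digest(), "big"
        )
        for s in shingles
    )
    return set(hashes[:keep])  # keep only the smallest few hashes


def similarity(doc_a, doc_b):
    """Jaccard overlap of the content hashes; file properties are never read."""
    a, b = content_hashes(doc_a["body"]), content_hashes(doc_b["body"])
    return len(a & b) / len(a | b)


# Hypothetical documents: same content, different file properties.
original = {"name": "spec-v1.doc", "author": "alice",
            "body": "The search index is rebuilt during a master merge"}
copy = {"name": "copy of spec.doc", "author": "bob",
        "body": "The search index is rebuilt during a master merge"}
other = {"name": "notes.txt", "author": "alice",
         "body": "Language packs do not control which word breaker is used"}

same_content = similarity(original, copy)
different_content = similarity(original, other)
print(same_content, different_content)
```

Because only the `body` field feeds the hash, the renamed copy with a different author scores as a full match, while the unrelated document does not.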

What are discovered definitions and how does search find those?

Discovered Definitions are a feature in MOSS that can be enabled/disabled in the properties of the SearchCoreResults webpart.  When enabled, the results web part will display not only document matches for a term, but also any definitions it has discovered for that word during crawling.

Definition extraction in MOSS 2007 is a feature that extracts definitions of terms from indexed text.

Definition extraction is done during the crawl. The crawler looks for linking phrases such as ‘is a’ or ‘is the’ and then, when an undocumented threshold is reached, it extracts the definition of the related word for later use in the search results display, under the words “What people are saying about <term>”.

At query time, the search token that was passed in is compared with the existing entries in the definitions database. If a match is found, the definitions link is populated at the bottom of the search results page. Collapsing the link shows the number of definitions.
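The crawl-time extraction and query-time lookup described above can be sketched roughly like this. It is an illustration under assumptions, not the MOSS implementation: the regular expression, the sentence splitting and the in-memory dictionary standing in for the definitions database are all hypothetical, and the threshold mentioned above is not modeled.

```python
import re
from collections import defaultdict

# Assumed pattern: "<term> is a ..." or "<term> is the ..." within one sentence.
PATTERN = re.compile(r"^(?P<term>.+?)\s+(?:is a|is the)\s+(?P<body>.+)$")


def extract_definitions(text):
    """Crawl-time step: collect candidate definition sentences per term."""
    definitions = defaultdict(list)
    for sentence in re.split(r"[.!?]\s*", text):
        sentence = sentence.strip()
        match = PATTERN.match(sentence)
        if match:
            # Normalize the term: lowercase and drop a leading article.
            term = re.sub(r"^(?:a|an|the)\s+", "", match.group("term").lower())
            definitions[term].append(sentence)
    return definitions


def lookup(definitions, query):
    """Query-time step: compare the search token with the stored terms."""
    return definitions.get(query.strip().lower(), [])


crawled_text = (
    "MOSS is a server product. The crawler builds the index. "
    "A word breaker is the component that splits text into tokens."
)
definitions = extract_definitions(crawled_text)
print(lookup(definitions, "Word Breaker"))
print(lookup(definitions, "crawler"))
```

A real extractor would need far more than two phrases and some scoring to decide which sentences are worth showing; the sketch only shows the split between work done during the crawl and the cheap lookup at query time.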

Pasted from <http://blogs.technet.com/harikumh/archive/2008/11/14/some-interesting-facts-about-sharepoint-2007-search.aspx>

Reposted from: https://www.cnblogs.com/wenjielee/archive/2010/12/29/1921154.html
