[Recommended] A perfect LSI (latent semantic indexing) tutorial

"instead of lecturing about SVD I want to show you how things work -- step by step"

-- If you agree with this statement, then this tutorial by Dr. E. Garcia is the LSI/LSA tutorial best suited to you.

The original is fairly long, so I'll just post the link:

http://www.miislita.com/information-retrieval-tutorial/svd-lsi-tutorial-1-understanding.html

If the original feels too long, Garcia has also written condensed versions:

Latent Semantic Indexing (LSI) Fast Track Tutorial
Singular Value Decomposition (SVD) Fast Track Tutorial

Some excerpts:

I. Common misconceptions about LSI. It is a myth that LSI:

1) is theming (analysis of themes).

2) is used by search engines to find all the nouns and verbs, and then associate them with related (substitution-useful) nouns and verbs.

3) allows search engines to "learn" which words are related and which noun concepts relate to one another.

4) is a form of on-topic analysis (term scope/subject analysis).

5) can be applied to collections of any size.

6) has no problem addressing polysemy (terms with different meanings).

Source: http://www.miislita.com/information-retrieval-tutorial/svd-lsi-tutorial-1-understanding.html

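II. In essence, LSI identifies words that exhibit second-order co-occurrence at the document level and groups them into the same subspace. Consequently:

1) Words that fall into the same subspace are not necessarily synonyms, and may not even be words that occur in the same context; this is especially true for long documents.

2) LSI fundamentally cannot handle polysemous words (words with multiple meanings); polysemy makes LSI perform worse.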

A persistent myth in search marketing circles is that LSI grants contextuality; i.e., terms occurring in the same context. This is not always the case. Consider two documents X and Y and three terms A, B and C and wherein:

A and B do not co-occur.

X mentions terms A and C

Y mentions terms B and C.

∴ A---C---B

The common denominator is C, so we define this relation as an in-transit co-occurrence since both A and B occur while in transit with C. This is called second-order co-occurrence and is a special case of high-order co-occurrence.
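To make the in-transit relation concrete, here is a minimal sketch (an illustration only, not part of Garcia's text) on the X/Y example above. It assumes binary term occurrence and counts second-order links as length-2 paths in the first-order co-occurrence matrix, which is one common way to express the idea; the names `first`, `second` and `in_transit_terms` are hypothetical.

```python
import numpy as np

# Garcia's example: document X mentions A and C, document Y mentions B and C.
terms = ["A", "B", "C"]
M = np.array([
    [1, 0],  # A: appears in X only
    [0, 1],  # B: appears in Y only
    [1, 1],  # C: appears in both X and Y
])

# First-order co-occurrence: number of documents in which two terms appear together.
first = M @ M.T
np.fill_diagonal(first, 0)  # ignore a term co-occurring with itself

# Second-order co-occurrence: number of paths term -> intermediate term -> term.
second = first @ first
np.fill_diagonal(second, 0)

def in_transit_terms(a, b):
    """Terms that co-occur directly with both a and b (the 'in-transit' links)."""
    i, j = terms.index(a), terms.index(b)
    return [t for k, t in enumerate(terms) if first[i, k] > 0 and first[k, j] > 0]

print("first-order  A-B:", first[0, 1])
print("second-order A-B:", second[0, 1])
print("A and B linked through:", in_transit_terms("A", "B"))
```

Here A and B have a first-order count of 0 but a second-order count of 1, and C is the only in-transit term, matching A---C---B.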

However, the fact that terms A and B are in-transit with C does not by itself grant contextuality, as the terms can be mentioned in different contexts in documents X and Y. For example, this would be the case if X and Y discussed different topics. Long documents are more prone to this.
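To see why LSI still pulls A and B together regardless of context, here is a minimal rank-k LSI sketch using NumPy's `np.linalg.svd` (an illustration, not Garcia's code). Documents X and Y follow the example above; document Z (terms D and E) is a hypothetical extra topic added for contrast, and k = 2 is an assumed choice for this toy matrix. Term coordinates are taken as the rows of U_k S_k.

```python
import numpy as np

# Rows = terms, columns = documents X, Y and a hypothetical contrast document Z.
terms = ["A", "B", "C", "D", "E"]
M = np.array([
    [1, 0, 0],  # A: only in X
    [0, 1, 0],  # B: only in Y
    [1, 1, 0],  # C: in X and Y (the in-transit term)
    [0, 0, 1],  # D: only in Z
    [0, 0, 1],  # E: only in Z
], dtype=float)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def lsi_term_vectors(matrix, k):
    """Rank-k LSI term coordinates: rows of U_k * S_k from the SVD."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k] * s[:k]

T = lsi_term_vectors(M, k=2)
idx = {t: i for i, t in enumerate(terms)}

print("raw cos(A, B) =", round(cosine(M[idx["A"]], M[idx["B"]]), 3))
print("LSI cos(A, B) =", round(cosine(T[idx["A"]], T[idx["B"]]), 3))
print("LSI cos(A, D) =", round(cosine(T[idx["A"]], T[idx["D"]]), 3))
```

In the raw term space cos(A, B) is 0 because A and B never share a document; in the rank-2 space it becomes approximately 1, while cos(A, D) stays approximately 0. Nothing in the computation looks at whether X and Y are about the same topic; only the shared term C matters.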

Even if X and Y are monotopic, these might be discussing different subjects. Thus, it would be fallacious to assume that high-order co-occurrence between A and B while in-transit with C equates to a contextuality relationship between terms. Add polysemy to this and the scenario worsens, as LSI can fail to address polysemy.

Source: http://www.miislita.com/information-retrieval-tutorial/svd-lsi-tutorial-1-understanding.html
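The polysemy problem follows from the fact that LSI assigns exactly one vector to each term, whatever sense a given occurrence carries. The sketch below uses a hypothetical four-document corpus in which "bank" appears once in a finance document and once in a river document; the corpus, the term names and k = 2 are all assumptions for illustration.

```python
import numpy as np

# Hypothetical corpus: "bank" is used in a finance sense (D1) and a river sense (D3).
terms = ["bank", "money", "loan", "river", "water"]
#              D1 D2 D3 D4
M = np.array([
    [1, 0, 1, 0],  # bank
    [1, 1, 0, 0],  # money
    [1, 1, 0, 0],  # loan
    [0, 0, 1, 1],  # river
    [0, 0, 1, 1],  # water
], dtype=float)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 2
T = U[:, :k] * s[:k]     # one rank-k vector per term, regardless of sense
vec = dict(zip(terms, T))

print("cos(bank, money)  =", round(cosine(vec["bank"], vec["money"]), 3))
print("cos(bank, river)  =", round(cosine(vec["bank"], vec["river"]), 3))
print("cos(money, river) =", round(cosine(vec["money"], vec["river"]), 3))
```

Because "bank" gets a single vector, it lands midway between its two senses: its cosine with "money" and with "river" comes out equal (about 0.7), while "money" and "river" are nearly orthogonal. LSI by itself has no way to tell which sense a particular occurrence of "bank" carries.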
