[Recommended] A Perfect Tutorial on LSI (Latent Semantic Indexing)
"instead of lecturing about SVD I want to show you how things work --step by step"
-- If you agree with this statement, this tutorial by Dr. E. Garcia is the LSI / LSA tutorial best suited to you.
The original is fairly long, so here is the link:
http://www.miislita.com/information-retrieval-tutorial/svd-lsi-tutorial-1-understanding.html
If the original feels too long, you can also read the condensed versions written by Garcia:
Latent Semantic Indexing (LSI) Fast Track Tutorial
Singular Value Decomposition (SVD) Fast Track Tutorial
Some excerpts:
I. Common misconceptions about LSI:
1) is theming (analysis of themes).
2) is used by search engines to find all the nouns and verbs, and then associate them with related (substitution-useful) nouns and verbs.
3) allows search engines to "learn" which words are related and which noun concepts relate to one another.
4) is a form of on-topic analysis (term scope/subject analysis).
5) can be applied to collections of any size.
6) has no problem addressing polysemy (terms with different meanings).
Pasted from <http://www.miislita.com/information-retrieval-tutorial/svd-lsi-tutorial-1-understanding.html>
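II. In essence, LSI identifies words linked by document-level second-order co-occurrence and places them in the same subspace. Therefore:
1) Words that fall in the same subspace are not necessarily synonyms, nor even words that appear in the same context; this is especially true for long documents.
2) LSI cannot handle polysemous words (words with multiple meanings) at all, and polysemy degrades LSI's results.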
A persistent myth in search marketing circles is that LSI grants contextuality; i.e., terms occurring in the same context. This is not always the case. Consider two documents X and Y and three terms A, B and C and wherein:
A and B do not co-occur.
X mentions terms A and C
Y mentions terms B and C.
:. A---C---B
The common denominator is C, so we define this relation as an in-transit co-occurrence since both A and B occur while in transit with C. This is called second-order co-occurrence and is a special case of high-order co-occurrence.
However, only because terms A and B are in-transit with C this does not grant contextuality, as the terms can be mentioned in different contexts in documents X and Y. For example, this would be the case of X and Y discussing different topics. Long documents are more prone to this.
Even if X and Y are monotopic these might be discussing different subjects. Thus, it would be fallacious to assume that high-order co-occurrence between A and B while in-transit with C equates to a contextuality relationship between terms. Add polysemy to this and the scenario worsens, as LSI can fail to address polysemy.
Pasted from <http://www.miislita.com/information-retrieval-tutorial/svd-lsi-tutorial-1-understanding.html>
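The A---C---B example above can be checked numerically. The following is a minimal sketch, not taken from Garcia's tutorial: it assumes raw term counts of 1 with no weighting, a rank-1 truncation, and a hand-written power iteration in place of a library SVD. The names `COUNTS`, `cosine`, and `rank1_approximation` are hypothetical helpers introduced only for illustration.

```python
import math

# Term-document counts for the example (rows: terms A, B, C; columns: documents X, Y).
COUNTS = [
    [1, 0],  # A occurs only in X
    [0, 1],  # B occurs only in Y
    [1, 1],  # C occurs in both X and Y
]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def rank1_approximation(matrix, iterations=100):
    """Best rank-1 approximation of a non-negative count matrix.

    Uses power iteration on M^T M to find the dominant right singular
    vector v, then returns (M v) v^T, i.e. sigma_1 * u_1 * v_1^T.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0] * n_cols  # all-ones start; assumed adequate for count matrices
    for _ in range(iterations):
        u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        v = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # With v normalised, M v equals sigma_1 * u_1.
    u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    return [[ui * vj for vj in v] for ui in u]


reduced = rank1_approximation(COUNTS)
sim_before = cosine(COUNTS[0], COUNTS[1])    # A vs B in the raw matrix
sim_after = cosine(reduced[0], reduced[1])   # A vs B after truncation

print(f"cos(A, B) in the raw matrix: {sim_before:.3f}")
print(f"cos(A, B) after rank-1 LSI:  {sim_after:.3f}")
```

In the raw matrix A and B have cosine similarity 0 because they never share a document. After the rank-1 truncation their rows coincide and the similarity is 1 (up to rounding), purely because both co-occur with C. Nothing in this computation knows whether X and Y use A and B in the same context, which is the point the excerpt makes. For real collections a library routine such as `numpy.linalg.svd` would normally replace the power iteration.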