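诸神缄默不语: personal CSDN blog post index

Official sent2vec GitHub project: epfml/sent2vec: General purpose unsupervised sentence representations
Besides usage instructions and the installation package, the project also provides download links for general-purpose pretrained models.

Original paper: Unsupervised Learning of Sentence Embeddings Using Compositional n-Gram Features

sent2vec is a Python package for unsupervised learning of representations of words, short texts, and sentences. This post is a brief sent2vec tutorial covering installation and basic usage. English text is used in the examples.
Other sent2vec features beyond those covered here may be added to this post later.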

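Contents

  • 1. Installing the sent2vec package
  • 2. Sample code for calling a pretrained model

1. Installing the sent2vec package

Clone the project with git clone, then run this command from the project's root directory: pip install .
Output: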

Processing github_projects/sent2vec
  DEPRECATION: A future pip version will change local packages to be built in-place without first copying to a temporary directory. We recommend you use --use-feature=in-tree-build to test your packages with this new behavior before it becomes the default.
  pip 21.3 will remove support for this functionality. You can find discussion regarding this at https://github.com/pypa/pip/issues/7555.
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing wheel metadata ... done
Requirement already satisfied: numpy>=1.17.1 in anaconda3/envs/env_name/lib/python3.8/site-packages (from sent2vec==0.0.0) (1.22.3)
Collecting Cython>=0.29.13
  Using cached Cython-0.29.30-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (1.9 MB)
Building wheels for collected packages: sent2vec
  Building wheel for sent2vec (PEP 517) ... done
  Created wheel for sent2vec: filename=sent2vec-0.0.0-cp38-cp38-linux_x86_64.whl size=1215260 sha256=92203c2fccb24b035e50f8e4f565c88824bcaffd645736fc09b210828f571294
  Stored in directory: /tmp/pip-ephem-wheel-cache-71hx4xbd/wheels/ba/85/af/f6f4bec5757dbc954716659c47962b0dae150da5d5a7e8fb4d
Successfully built sent2vec
Installing collected packages: Cython, sent2vec
Successfully installed Cython-0.29.30 sent2vec-0.0.0

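Note that the output shows my installed numpy was version 1.22.3. With that version, running sent2vec code failed with: ImportError: numpy.core.multiarray failed to import
Following the Stack Overflow thread "python - ImportError: numpy.core.multiarray failed to import", upgrading numpy with pip install -U numpy fixed the problem (pip's output warns that the installed scipy does not support the new numpy version, but I saw no way around that).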
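Before moving on, it can help to confirm that both packages actually import in the current environment. The snippet below is a minimal sketch, not part of sent2vec: the helper name check_install is hypothetical, and it only uses the standard library to report whether each package is missing, fails on import (as with the numpy error above), or loads cleanly.

```python
import importlib
import importlib.util


def check_install(package_names=("numpy", "sent2vec")):
    """Return {package name: status string} without raising on failure."""
    report = {}
    for name in package_names:
        # find_spec returns None when a top-level package is not installed.
        if importlib.util.find_spec(name) is None:
            report[name] = "not installed"
            continue
        try:
            module = importlib.import_module(name)
        except ImportError as exc:
            # e.g. "numpy.core.multiarray failed to import"
            report[name] = f"import failed: {exc}"
        else:
            version = getattr(module, "__version__", "unknown")
            report[name] = f"ok (version {version})"
    return report


if __name__ == "__main__":
    for name, status in check_install().items():
        print(f"{name}: {status}")
```

If sent2vec shows "import failed", upgrading numpy as described above is the first thing to try.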

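2. Sample code for calling a pretrained model

The sent2vec module does not preprocess text for you, so you have to do it yourself: (1) remove punctuation, (2) tokenize, (3) lowercase.
The pretrained model used here comes from LeSICiN [1]. The sample text is the first example in its training set, and the preprocessing code follows the corresponding official code.
(If you use sent2vec/get_sentence_embeddings_from_pre-trained_models.ipynb instead, you can feed in raw English sentences.)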
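The three preprocessing steps can be sketched as a small helper. This is an illustration under assumptions rather than the LeSICiN code: the function names preprocess and preprocess_all are hypothetical, tokenization is a naive whitespace split, and only ASCII punctuation from string.punctuation is removed.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(sentence):
    """Lowercase, strip punctuation, and whitespace-tokenize one sentence."""
    cleaned = sentence.strip().lower().translate(_PUNCT_TABLE)
    tokens = cleaned.split()  # naive whitespace tokenizer (assumption)
    return " ".join(tokens)   # tokens rejoined with single spaces


def preprocess_all(sentences):
    """Preprocess a list of sentences, dropping any that end up empty."""
    processed = (preprocess(s) for s in sentences)
    return [s for s in processed if s]


if __name__ == "__main__":
    print(preprocess_all(["Hello, World!", "   ", "Section 5 r/w 27."]))
```

Dropping empty results guards against inputs that consist only of punctuation or whitespace, which would otherwise be embedded as empty strings.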

import string

import sent2vec

model = sent2vec.Sent2vecModel()
model.load_model('pretrained/lesicin/ils2v.bin')

texts = [
    '(a), Section 5 r/w 27 of the Arms Act. The gist of theprosecution case relevant for the purpose of this proceedingmay be stated thus: With the growth of industry, commerceand trade in and around the city of Mumbai which generatessubstantial quantity of wealth, there has been increase oforganised activities by gangs of anti-socials to extractmoney from affluent sections of society like developers,hoteliers and other businessmen by putting them in fear ofdeath and then to demand substantial sums of money commonlyknown as "Khadani" i.e. protection money.',
    'One such gangwas operating in the city under Amar Naik @ Bhai, who dieda couple of years before the decision in the case at anencounter with the police.',
    'The prosecution alleged that inpursuance of a criminal conspiracy between 15.1.1994 to16.5.1994 the accused persons and others of the gangembarked upon preparatory acts like procuring theinformation about the names of the builders of M/s KalpataruConstruction Company which was engaged in developing aproperty at Pali Hill, named Nakshatra Building.',
    'PW-7Sudhir Tambe was the Senior Vice-President of the companywith its head office at Nariman Point.',
    'He used to sit inthe head- office.',
    'PW 6 Pachapur, Civil Engineer, was anemployee of the company who used to remain at the site tosupervise the construction.',
    'As the prosecution story runs,on 15.4.1994 between 11.30 a.m. and 12.00 noon while PW 6was on duty at the construction site, accused no.3, NitinVasant Venugurlekar armed with revolver and accused No.4Rajindera @ Rajan Mahadeo Margaj armed with a chopper andaccused no.5 Jayendra @ Jai Anandrao Jadhav also armed witha chopper visited the site of Nakshatra Building; theythreatened the workers at the site, forcibly brought PW 6Pachapur in a room on the ground floor and man-handled him.',
    'Accused no.3, pointing a revolver at him demanded the name,address and telephone number of the builders.',
    'PW 6disclosed the name of PW 7 Tambe and gave his telephonenumber to them.',
    'The accused then asked him to go to theoffice of the builders at Nariman Point and make thearrangement for a telephonic talk with Tambe.',
    'PW 6 rushedto the office and told Tambe of what had happened at theconstruction site.',
    'This was followed by telephonic callsfrom the accused who wanted to speak to Tambe.',
    'Attemptswere made by PW 6 and PW 7 to avoid any discussion with thegangsters.',
    'Two or three days thereafter when the accusedgot Tambe on the telephone he (Tambe) gave them some othertelephone numbers and asked them to contact those personsincluding one D.N.Ghosh, the Security Contractor.',
    'Eight/tendays thereafter again a telephone call was made to theoffice of Tambe which was received by PW 6 who was informedby the person making the call that they could not get D. N.Ghosh on the telephone numbers furnished by Tambe.',
    'Thereafter PW 6 handed over the receiver to Tambe.',
    'Thisincident was followed by several threats given by thegangsters to workers and also repeated telephone calls madeto the Head Office of the company to contact Tambe.',
    'Thestaff of the site office absented from work resulting invirtual closure of construction activity.',
    'On 11.5.1994 thedeceased Sanjay Patil telephoned to Tambe and warned himthat he is wasting time and should meet him without furtherdelay.',
    'After some days there was one more similar call fromSanjay Patil and he asked Tambe that he should talk to Bhaiand saying so he handed over the receiver to another personwho gave his identity as Amar Naik (since deceased), whotold Tambe that he should pay Rs.10 lacs.',
    'The later pleadedhis inability to pay such a heavy sum and after somediscussion agreed to pay Rs.5 lacs.',
    'He was asked to come toNakshatra Building site on 16.5.1994 along with money.',
    'Inthe meantime Tambe informed all the happenings to the Addl.',
    'Commissioner of Police Mr.',
    'Sanjeev Dayal and the then Dy.',
    'Commissioner of Police of Zone VII Mr. Rajanish Shethwithin whose jurisdiction Khar Police Station fell.',
    'On 16.5.1994 at about 12.00 noon the deceased SanjayPatil telephoned Tambe and inquired from him as to what hewas going to do about the payment and then Tambe repliedthat he will be leaving office at about 2.00 p.m. for PaliHill.',
    'Sanjay Patil cautioned him that he should not makeany haste and he should wait for his call so that he willtake necessary instructions from his boss i.e. Amar Naik.',
    'At about 2.00 p.m. on that day there was a telephone callfrom Sanjay Patil telling that Tambe should not meet him atthe Nakshatra Building site but instead he should meet himnear the Ceaser Palace Hotel.',
    'This telephonic conversationwas tape-recorded.',
    'Tambe was instructed on telephone thathis man shall carry a white plastic bag containing theamount of Rs.5 lacs and shall wait near the entrance gate ofCeaser Palace Hotel and the person coming to collect thesaid bag will introduce himself as Me Rawanacha Manus Hai.',
    'Tambe informed to the DCP all these happenings and handedover the tape in which the telephonic conversation wasrecorded by him.',
    'The DCP had made the arrangements to keepa regular watch near the building site.',
    'PW 1 Sunil Deshmukhwas deployed to wait in cognito near the gate of the CeaserPalace Hotel and to carry the white plastic bag containingbundles of papers which would give an appearance like thebundles of currency notes.',
    'The other officers, who werealso in cognito, had taken their position at strategicpoints near the hotel.',
    'At about 4.05 p.m. Sunil Deshmukhnoticed that one red coloured Maruti van halted in front ofthe Ceaser Palace Hotel.',
    'He noticed three persons gettingdown from the said van.',
    'Those three persons were coming inhis direction, and the van went ahead 50 to 60 feets andhalted there.',
    'The deceased Sanjay Patil and the accusedno.7 Bapu Sidhram Gaikwad got down from the said van andaccused no.6 Mohamed Ismail was sitting on the driver seatin the van.',
    'Heenquired from PW1 about his identity and when PW 1 repliedthat he has been sent by Tambe Sahib.',
    'PW 1 Sunil Deshmukhthen asked that person who are you (Tum Kaun Hai) and thenthe accused no.2 Umesh Bhatt told him that Hum Rawan KeAadmi Hai.',
    'L.....I.........T.......T.......T.......T.......T.......T..J J U D G M E N T D.P. MOHAPATRA,J This appeal, filed by accused no.1 Babu KuttanRamkrishna Pillai and accused no.2 Umesh @ Babu PurshottamBhatt of TADA ACT Spl.',
    'Thereafter accused no.1 Babu Kuttan extendedhis hand towards PW 1 who delivered the bag to him.',
    'At thisjuncture the police officers who were standing nearby incognito rushed to the place and surrounded the threepersons.',
    'When the police officers were trying to overpowerthem the deceased Sanjay Patil @ Avinash Amanna and theaccused no .7 Bapu Sidhram Gaikwad came forward withrevolvers in their hands and threatened the police party bysaying they should leave their men or else the policemenwill be killed.',
    'Saying so they fired in the direction ofthe police party.',
    'At this point PW 1 took out his revolverand pointed it in the direction of the accused and told themwe are all policemen and you should throw away yourrevolvers else we will fire.',
    'Even then the accused personsfired some rounds in the direction of the police party, thenPW 1 and one other officer tried to rush towards them butthey sat in the said Maruti van and sped away from theplace.',
    'After the situation calmed down, the police drew thepanchnamas Ex.22 in presence of some witnesses andconducted personal search of the three culprits.',
    'On suchsearch accused no.1 Babu Kuttan Pillai was found to possessthe plastic bag containing the paper bundles (Art.1),accused no.2 Umesh Bhatt was found to possess a big Rampuriknife which was hidden at the waist under the pant by leftside.',
    'After completion of investigation the police submittedthe charge-sheet.',
    'The three persons at the spot wereremanded to the police custody.',
    'Subsequently, the otheraccused persons were also arrested.',
    'They were put to testidentification parade.',
    'The learned Trial Judge onappreciation of the evidence on record convicted accusedno.1 Babu Kuttan Ramkrishna Pillai and the accused no.2Umesh @ Babu Purshottam Bhatt for the offence punishableunder section 395 of the Indian Penal Code and sentencedeach of them to suffer rigorous imprisonment of 5 years andto pay a fine of Rs.500, in default of payment of fine toundergo further Rigorous Imprisonment for 6 months.',
    'Theywere also convicted under Section 120 B of the IPC but noseparate sentence was passed.',
    'They were acquitted of theother offences with which they were charged.',
    'The remainingaccused persons i.e. accused nos. 3,4,5,6 and 7 wereacquitted of all the charges framed against them.',
    '1 and 2, have filed this appeal assailing the judgmentpassed by the Designated Court at Brihan Mumbai,convicting/sentencing them as above.',
    'On a reading of the judgment under challenge, we findthat the learned trial Judge has considered the entire caseled by the prosecution in great detail and after discussingthe charges framed against the appellants under sections3(2), 3(3) and 3(5) of TADA Act, rejected the prosecutioncase on that count.',
    'Thereafter the learned trial Judge inparagraph 17 onwards considered the question of what offencewas made out against the appellants.',
    'After a detaileddiscussion of the relevant evidence placed by theprosecution and after examining it in the light of thecontentions on behalf of the defence, the learned trialJudge believed the testimony of PW 1- Sunil Deshmukh, PW 7 -Tambe and PW 9 - L.J. Kamble and came to hold that theappellants are guilty of the offence of criminal conspiracypunishable under section 120-B and the offence of dacoitypunishable under section 395 IPC and convicted themthereunder and imposed the punishment as noted earlier.',
    'We have perused the evidence of these witnesses.',
]

# Lowercase and strip punctuation. The reference code also drops empty
# sentences, but as far as I can tell there are none in this sample.
texts = [sent.strip().lower().translate(str.maketrans('', '', string.punctuation)) for sent in texts]
print(len(texts))

# Embed a single string.
emb = model.embed_sentence(texts[0])
print(type(emb))
print(emb.shape)

# Embed a list of strings.
embs = model.embed_sentences(texts)
print(type(embs))
print(embs.shape)

Output:

63
<class 'numpy.ndarray'>
(1, 200)
<class 'numpy.ndarray'>
(63, 200)

As the output shows, sent2vec turns each string into one fixed-length vector (200 dimensions for this model): a single string comes back as a 1×200 array, and a list of strings as an array with one row per string.
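Because every sentence maps to a vector of the same length, two sentences can be compared directly. The following is a minimal sketch of cosine similarity, one common choice that is assumed here rather than taken from the sent2vec or LeSICiN code. The function name cosine_similarity is hypothetical, and it is written in plain Python so that it runs without the model. It accepts lists or 1-D arrays.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (lists or 1-D arrays)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)
```

With the arrays from the example, cosine_similarity(embs[0], embs[1]) would compare the first two sentences, since each row of the embed_sentences result is one sentence vector.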

Passing inference_mode=True to load_model() saves memory.
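A minimal sketch of how that argument might be passed, assuming the same Sent2vecModel API used above. The wrapper name load_for_inference is hypothetical, and the import is deferred so the function can be defined even where sent2vec is not installed.

```python
def load_for_inference(model_path):
    """Load a sent2vec model with inference_mode=True and return it."""
    # Imported lazily so this helper can be defined without the package.
    import sent2vec

    model = sent2vec.Sent2vecModel()
    model.load_model(model_path, inference_mode=True)
    return model
```

For example, load_for_inference('pretrained/lesicin/ils2v.bin') would replace the two loading lines in the sample code.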


  1. LeSICiN: A Heterogeneous Graph-Based Approach for Automatic Legal Statute Identification from Indian Legal Documents
