Search is the foundation of many applications. Once data starts to pile up, users want to be able to find it. Search is also the foundation of the internet, and an ever-growing challenge that is never fully solved.

The field of Natural Language Processing (NLP) is evolving rapidly. Large-scale general language models are an exciting new capability, allowing us to add powerful functionality quickly with limited compute and people. Innovation continues, with new models and advancements arriving on what seems like a weekly basis.

This article introduces txtai, an AI-powered search engine that enables Natural Language Understanding (NLU) based search in any application.

Introducing txtai

txtai builds an AI-powered index over sections of text. It supports building text indices to perform similarity searches and to create extractive question-answering systems. txtai is open source and available on GitHub.
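
To make "similarity search" concrete, here is a minimal sketch of the general idea, assuming each text is represented by an embedding vector and that vectors are compared with cosine similarity. The three-dimensional vectors and the helper names (`cosine`, `best_match`) are invented for illustration and are not part of txtai; real sentence embeddings have hundreds of dimensions and come from a trained model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(query_vector, section_vectors):
    """Return the position of the vector most similar to the query."""
    scores = [cosine(query_vector, v) for v in section_vectors]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical toy "embeddings", made up for this sketch
toy_sections = {
    "ice shelf collapses": [0.9, 0.1, 0.0],
    "lottery winner": [0.0, 0.2, 0.9],
}
toy_query = [0.8, 0.3, 0.1]  # stands in for a query such as "climate change"

texts = list(toy_sections)
index = best_match(toy_query, [toy_sections[t] for t in texts])
print(texts[index])  # prints "ice shelf collapses"
```

The query vector points in nearly the same direction as the first section's vector, so that section wins even though no words are compared at any point.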

txtai is built on the following stack:

Sentence Transformers
Transformers
Faiss, Annoy, Hnswlib
Python 3.6+

txtai and/or the concepts behind it have already been used to power the Natural Language Processing (NLP) applications listed below:

cord19q: COVID-19 literature analysis
paperai: AI-powered literature discovery and review engine for medical/scientific papers
neuspo: a fact-driven, real-time sports event and news site
codequestion: ask coding questions directly from the terminal

Install and run txtai

The following code snippet shows how to install txtai and create an embeddings model.

pip install txtai

Next, we can create a simple in-memory model with a couple of sample records to try txtai out.

import numpy as np

from txtai.embeddings import Embeddings

# Create embeddings model, backed by sentence-transformers & transformers
embeddings = Embeddings({"method": "transformers", "path": "sentence-transformers/bert-base-nli-mean-tokens"})

sections = ["US tops 5 million confirmed virus cases",
            "Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg",
            "Beijing mobilises invasion craft along coast as Taiwan tensions escalate",
            "The National Park Service warns against sacrificing slower friends in a bear attack",
            "Maine man wins $1M from $25 lottery ticket",
            "Make huge profits without work, earn up to $100,000 a day"]

print("%-20s %s" % ("Query", "Best Match"))
print("-" * 50)

for query in ("feel good story", "climate change", "health", "war", "wildlife", "asia", "north america", "dishonest junk"):
    # Get index of the section that best matches the query
    uid = np.argmax(embeddings.similarity(query, sections))

    print("%-20s %s" % (query, sections[uid]))

Running the code above will print the following:

[Image: embeddings query output]

The example above shows that for almost all of the queries, the query text itself doesn't appear anywhere in the list of text sections. This is the true power of transformer models over token-based search. What you get out of the box is
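
For contrast, here is a rough sketch of what a purely token-based search would find for the same queries. It uses only the Python standard library and a deliberately naive whole-word matcher; the helper names `tokens` and `token_matches` are invented for this illustration and are not part of txtai or of any particular search engine.

```python
import re


def tokens(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def token_matches(query, sections):
    """Return the sections that share at least one whole word with the query."""
    query_tokens = tokens(query)
    return [section for section in sections if query_tokens & tokens(section)]


sections = ["US tops 5 million confirmed virus cases",
            "Canada's last fully intact ice shelf has suddenly collapsed, forming a Manhattan-sized iceberg",
            "Beijing mobilises invasion craft along coast as Taiwan tensions escalate",
            "The National Park Service warns against sacrificing slower friends in a bear attack",
            "Maine man wins $1M from $25 lottery ticket",
            "Make huge profits without work, earn up to $100,000 a day"]

queries = ("feel good story", "climate change", "health", "war", "wildlife",
           "asia", "north america", "dishonest junk")

for query in queries:
    # Count how many sections share a whole word with the query
    print("%-20s %d" % (query, len(token_matches(query, sections))))
```

With this naive matcher, every query reports zero matching sections, because none of them shares a whole word with any section ("war" is not the same token as "warns"). Real keyword engines add stemming, synonyms and ranking, but they still depend on overlapping vocabulary, which is the gap an embeddings-based comparison is meant to close.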
