BERT-Based Semantic Matching

If you read my previous article on Towards Data Science you’ll know I’m a bit of a Star Trek nerd. There’s only one thing I like more than Star Trek, and that’s building cool new stuff with AI. So I thought I’d combine the two yet again!

In this tutorial we’re going to build our own search engine to search all the lines from Star Trek: The Next Generation. We’ll be using Jina, a neural search framework which uses deep learning to power our NLP search, though we could easily use it for image, audio or video search if we wanted to.

We’ll cover:

  • Basic setup
  • Running a demo of our app (yes, even before we code it)
  • Using cookiecutter to create the project and boilerplate code
  • Downloading our Star Trek dataset
  • Loading, indexing, and searching our dataset
  • A deeper look behind the scenes
  • What to do if things go wrong

If you’re new to AI or search, don’t worry. As long as you have some knowledge of Python and the command line you’ll be fine. If it helps, think of yourself as Lieutenant Commander Data Science.

Try It Out

Before going through the trouble of downloading, configuring and testing your search engine, let’s get an idea of the finished product. In this case, it’s exactly the same as what we’re building, but with lines from South Park instead of Star Trek.

Jina has a pre-built Docker image with indexed data from South Park. You can run it with:

docker run -p 45678:45678 jinaai/hub.app.distilbert-southpark

After getting Docker up and running, you can start searching for those South Park lines.

Query with Jinabox

Jinabox is a simple web-based front-end for neural search. You can see it in the graphic at the top of this tutorial.

  1. Go to jinabox in your browser
  2. Ensure you have the server endpoint set to http://localhost:45678/api/search
  3. Type a phrase into the search bar and see which South Park lines come up

Note: If it times out the first time, that’s because the query system is still warming up. Try again in a few seconds!

Query with curl

Alternatively, you can open your shell and check the results via the RESTful API. The matched results are stored in topkResults.

curl --request POST -d '{"top_k": 10, "mode": "search", "data": ["text:hey, dude"]}' -H 'Content-Type: application/json' 'http://0.0.0.0:45678/api/search'
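
If you'd rather send the same request from Python, a minimal sketch using only the standard library might look like this. The endpoint, port, and payload fields are taken from the curl command above; the function names build_search_request and search are purely illustrative, and calling search assumes the Docker container is still running.

```python
import json
import urllib.request


def build_search_request(query, top_k=10, host="0.0.0.0", port=45678):
    """Build (but do not send) the POST request used in the curl example."""
    payload = {"top_k": top_k, "mode": "search", "data": [f"text:{query}"]}
    return urllib.request.Request(
        url=f"http://{host}:{port}/api/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query, **kwargs):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_search_request(query, **kwargs)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling search("hey, dude") would then return the response as a Python dictionary instead of raw text.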

You’ll see the results output in JSON format. Each result looks like:

Now go back to the terminal running Docker and hit Ctrl-C a few times to make sure you've stopped everything.

Engage!

Now that you know what we’re building, let’s get started!

You will need:

  • A basic knowledge of Python
  • Python 3.7 or higher, with pip installed
  • A Mac or Linux computer (Jina doesn’t currently support Windows)
  • 8 gigabytes or more of RAM

Clone the Repo

Let’s get the basic files we need to get moving:

git clone git@github.com:alexcg1/my-first-jina-app.git
cd my-first-jina-app

Run Cookiecutter

pip install -U cookiecutter
cookiecutter gh:jina-ai/cookiecutter-jina

We use cookiecutter to spin up a basic Jina app and save you having to do a lot of typing and setup.

For our Star Trek example, use the following settings:

  • project_name: Star Trek
  • project_slug: star_trek (default value)
  • task_type: nlp
  • index_type: strings
  • public_port: 65481 (default value)

Just use the defaults for all other fields. After cookiecutter has finished, let’s have a look at the files it created:

cd star_trek
ls

You should see a bunch of files:

  • app.py - The main Python script where you initialize and pass data into your Flow
  • Dockerfile - Lets you spin up a Docker instance running your app
  • flows/ - Folder to hold your Flows
  • pods/ - Folder to hold your Pods
  • README.md - An auto-generated README file
  • requirements.txt - A list of required Python packages

In the flows/ folder we can see index.yml and query.yml - these define the indexing and querying Flows for your app.

In pods/ we see chunk.yml, craft.yml, doc.yml, and encode.yml - these Pods are called from the Flows to process data for indexing or querying.

More on Flows and Pods later!

Install Requirements

In your terminal run this command to download and install all the required Python packages:

pip install -r requirements.txt

Download Dataset

Our goal is to find out who said what in Star Trek episodes when a user queries a phrase. The Star Trek dataset from Kaggle contains all the scripts and individual character lines from Star Trek: The Original Series all the way through Star Trek: Enterprise.

We’re just using a subset in this example, containing the characters and lines from Star Trek: The Next Generation. This has also been converted from JSON to CSV format, which is more suitable for Jina to process.

Now let’s ensure we’re back in our base folder and download the dataset by running:

Once that’s finished downloading, let’s get back into the star_trek directory and make sure our dataset has everything we want:

cd star_trek
head data/startrek_tng.csv

You should see output consisting of a character name (like MCCOY), a separator (!), and the line spoken by that character (What about my age?):

BAILIFF!The prisoners will all stand.
BAILIFF!All present, stand and make respectful attention to honouredJudge.
BAILIFF!Before this gracious court now appear these prisoners to answer for the multiple and grievous savageries of their species. How plead you, criminal?
BAILIFF!Criminals keep silence!
BAILIFF!You will answer the charges, criminals.
BAILIFF!Criminal, you will read the charges to the court.
BAILIFF!All present, respectfully stand. Q
BAILIFF!This honourable court is adjourned. Stand respectfully. Q
MCCOY!Hold it right there, boy.
MCCOY!What about my age?

Note: Your character lines may be a little different. That’s okay!
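
To make the file format concrete, here is a small sketch of how one such line could be split into a character name and a spoken line in Python. The helper name parse_line is hypothetical and is not part of the generated app; the one thing to watch for is that the spoken text can itself contain exclamation marks, so we only split on the first one.

```python
def parse_line(raw):
    """Split 'CHARACTER!spoken line' on the first '!' only."""
    character, _, spoken = raw.rstrip("\n").partition("!")
    return character, spoken


sample = [
    "BAILIFF!Criminals keep silence!",
    "MCCOY!What about my age?",
]
for character, spoken in map(parse_line, sample):
    print(f"{character}: {spoken}")
```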

Load Data

Now we need to pass startrek_tng.csv into app.py so we can index it. app.py is a little too simple out of the box, so let's make some changes.

Open app.py in your editor and look at the index function. We currently have:

As you can see, this indexes just 3 strings. Let’s load up our Star Trek file instead with the filepath parameter. Just replace the last line of the function:

Index Fewer Documents

While we’re here, let’s reduce the number of documents we’re indexing, just to speed things up while we’re testing. We don’t want to spend ages indexing only to have issues later on!

In the section above the config function, let's change:

to:

That should speed up our testing by a factor of 100! Once we’ve verified everything works we can set it back to 50000 to index more of our dataset.
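
The template's own setting isn't reproduced here, but the general idea of capping how many lines get indexed can be sketched in plain Python. The helper name first_n_lines is an illustrative assumption, not code from app.py, and a cap of 500 is simply 50000 divided by the factor of 100 mentioned above.

```python
from itertools import islice


def first_n_lines(path, max_docs):
    """Yield at most max_docs lines from a text file, without reading it all."""
    with open(path, encoding="utf-8") as handle:
        for line in islice(handle, max_docs):
            yield line.rstrip("\n")
```

For example, list(first_n_lines("data/startrek_tng.csv", 500)) would give you only the first 500 lines to experiment with.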

Now that we’ve got the code to load our data, we’re going to dive into writing our app and running our Flows! Flows are the different tasks our app performs, like indexing or searching the data.

Indexing

First up we need to build up an index of our file. We’ll search through this index when we use the query Flow later.

python app.py index

Your app will show a lot of output in the terminal, but you’ll know it’s finished when you see the line:

Flow@133216[S]:flow is closed and all resources should be released already, current build level is 0

This may take a little while the first time, since Jina needs to download the language model and tokenizer to process the data. You can think of these as the brains behind the neural network that powers the search.

Searching

To start search mode run:

python app.py search

After a while you should see the terminal stop scrolling and display output like:

Flow@85144[S]:flow is started at 0.0.0.0:65481, you can now use client to send request!

⚠️ Be sure to note down the port number. We’ll need it for curl and jinabox! In our case we'll assume it's 65481, and we use that in the below examples. If your port number is different, be sure to use that instead.

ℹ️ python app.py search doesn't pop up a search interface - for that you'll need to connect via curl, Jinabox, or another client.

Search with Jinabox

  1. Go to jinabox in your browser
  2. Ensure you have the server endpoint set to http://localhost:65481/api/search
  3. Type a phrase into the search bar and see which Star Trek lines come up

Search with curl

curl --request POST -d '{"top_k": 10, "mode": "search", "data": ["text:picard to riker"]}' -H 'Content-Type: application/json' 'http://0.0.0.0:65481/api/search'

curl will spit out a lot of information in JSON format - not just the lines you're searching for, but all sorts of metadata about the search and the lines it returns. Look for the lines starting with "matchDoc" to find the matches, like:

Congratulations! You’ve just built your very own search engine!
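
If you want to pick the matches out of that JSON programmatically rather than by eye, one possible approach is to walk the parsed response and collect everything stored under a "matchDoc" key. This sketch assumes only that such a key exists somewhere in the response, as described above; the exact layout of the response is not shown here, and the function name collect_matches is made up for illustration.

```python
def collect_matches(node):
    """Recursively gather every value stored under a 'matchDoc' key."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "matchDoc":
                found.append(value)
            else:
                found.extend(collect_matches(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_matches(item))
    return found
```

You would pass it the result of json.loads on the text that curl prints.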

How Does It Actually Work?

For a more general overview of what neural search is and how it works, see one of my previous articles. Jina itself is just one way to build a neural search engine, and it has a couple of important concepts, Flows and Pods:

  • The Flow tells Jina what tasks to perform on the dataset, like indexing or searching. Each Flow is built from individual Pods.
  • The Pods make up the Flow and tell Jina how to perform each task step by step, like breaking text into chunks, indexing it, and so on. They define the actual neural networks we use in neural search, namely language models like distilbert-base-cased (which we can see in pods/encode.yml).

Flows

Just as a plant manages nutrient flow and growth rate for its branches, a Flow manages the states and context of a group of Pods, orchestrating them to accomplish one task. Whether a Pod is remote or running in Docker, one Flow rules them all!

We define Flows in app.py to index and query the content in our Star Trek dataset.

In this case our Flows are written in YAML format and loaded into app.py with:

It really is that simple! Alternatively you can build Flows in app.py itself without specifying them in YAML.

No matter whether you’re dealing with text, graphics, sound, or video, all datasets need to be indexed and queried, and the steps for doing each (chunking, vector encoding) are more or less the same, even if how you perform each step differs. That’s where Pods come in!

Indexing

Every Flow has, well, a flow to it. Different Pods pass data along the Flow, with one Pod’s output becoming another Pod’s input. Look at our indexing Flow as an example.

If you look at startrek_tng.csv you'll see it's just one big text file. Our Flow processes it into something more suitable for Jina, which is handled by the Pods in the Flow. Each Pod performs a different task.
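
To make the idea of one Pod's output feeding the next Pod concrete, here is a toy sketch in plain Python. It does not use Jina's API at all: the functions are hypothetical stand-ins for crafting, encoding, and indexing steps, and the "encoding" is a fake length-based vector rather than a neural embedding.

```python
def craft(document):
    """Stand-in crafter: split a document into one chunk per line."""
    return [line for line in document.splitlines() if line]


def encode(chunks):
    """Stand-in encoder: pair each chunk with a fake two-number 'vector'."""
    return [(chunk, [len(chunk), chunk.count(" ") + 1]) for chunk in chunks]


def index(encoded):
    """Stand-in indexer: store text and vector by position for later lookup."""
    return {i: {"text": text, "vector": vec} for i, (text, vec) in enumerate(encoded)}


def run_flow(data, steps):
    """Pass data through each step in order, like Pods along a Flow."""
    for step in steps:
        data = step(data)
    return data


toy_index = run_flow(
    "MCCOY!What about my age?\nMCCOY!Hold it right there, boy.",
    [craft, encode, index],
)
print(toy_index[0])
```

The real Flow does the same kind of hand-off, except that each step is a Pod configured in YAML and the vectors come from a language model.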

You can see the following Pods in flows/index.yml:

  • crafter - Split the Document into Chunks
  • encoder - Encode each Chunk into a vector
  • chunk_idx - Build an index of Chunks
  • doc_idx - Store the Document content
  • join_all - Join the chunk_idx and doc_idx pathways

The full file is essentially just a list of Pods with parameters and some setup at the top of the file:

Luckily, YAML is pretty human-readable. I regularly thank the Great Bird of the Galaxy it’s not in Klingon, or even worse, XML!

  • The first couple of lines initialize the Flow and enable the logserver (which we’re not using in this tutorial).
  • After that we can see the list of Pods, each with its own YAML path and extra parameters being passed to it.

So, is that all of the Pods? Not quite! We always have another Pod working in silence — the gateway pod. Most of the time we can safely ignore it because it basically does all the dirty orchestration work for the Flow.

Searching

In the query Flow we’ve got the following Pods:

  • chunk_seg - Segment the user query into meaningful Chunks
  • tf_encode - Encode each word of the query into a vector
  • chunk_idx - Build an index for the Chunks for fast lookup
  • ranker - Sort the results list
  • doc_idx - Store the Document content

Again, flows/query.yml gives some setup options and lists the Pods in order of use:

When we were indexing we broke the Document into Chunks to index it. For querying we do the same, but this time the Document is the query the user types in, not the Star Trek dataset. We’ll use many of the same Pods, but there are a few differences to bear in mind. In the big picture:

  • Index has a two-pathway design which deals with both Document and Chunk indexing in parallel, which speeds up message passing
  • Query has a single pipeline

Digging into flows/query.yml, we can see it has an extra Pod and some more parameters compared to flows/index.yml:

  • rest_api: true - Use Jina's REST API, allowing clients like jinabox and curl to connect
  • port_expose: $JINA_PORT - The port for connecting to Jina's API
  • polling: all - Setting polling to all ensures all workers poll the message
  • reducing_yaml_path: _merge_topk_chunks - Use _merge_topk_chunks to reduce results from all replicas
  • ranker: - Rank results by relevance

How does Jina know whether it should be indexing or searching? In our RESTful API we set the mode field in the JSON body and send the request to the corresponding API:

  • api/index - {"mode": "index"}
  • api/search - {"mode": "search"}

Pods

As we discussed above, a Flow tells Jina what task to perform and is made up of Pods. A Pod tells Jina how to perform that task (i.e. what the right tool for the job is). Both Pods and Flows are written in YAML.

Let’s start by looking at a Pod in our indexing Flow, flows/index.yml. Instead of the first Pod crafter, let's look at encoder which is a bit simpler:

As we can see in the code above, the encoder Pod’s YAML file is stored in pods/encode.yml, and looks like:

The Pod uses the built-in TransformerTorchEncoder as its Executor. Each Pod has a different Executor based on its task, and an Executor represents an algorithm, in this case encoding. The Executor differs based on what's being encoded. For video or audio you'd use a different one. The with field specifies the parameters passed to TransformerTorchEncoder.

  • pooling_strategy - Strategy to merge word embeddings into a chunk embedding
  • model_name - Name of the model we're using
  • max_length - Maximum length to truncate tokenized sequences to

When the Pod runs, data is passed in from the previous Pod, TransformerTorchEncoder encodes the data, and the Pod passes the data to the next Pod in the Flow.
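
As a rough illustration of what pooling_strategy and max_length control, the sketch below truncates a token sequence and averages its per-token vectors into a single chunk vector. Mean pooling is only one possible strategy and is assumed here purely for illustration; this tutorial does not say which strategy pods/encode.yml uses, and the function name and toy vectors are made up.

```python
def mean_pool(token_vectors, max_length):
    """Truncate to max_length tokens, then average the vectors element-wise."""
    kept = token_vectors[:max_length]
    if not kept:
        raise ValueError("need at least one token vector")
    dims = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dims)]


# Three made-up 2-dimensional "word embeddings"; only the first two are kept.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
print(mean_pool(tokens, max_length=2))
```

The real Executor does this with the language model's own embeddings, which have hundreds of dimensions rather than two.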

For a deeper dive on Pods, Flows, Executors and everything else, you can refer to Jina 101.

Troubleshooting

Module not found

Be sure to run pip install -r requirements.txt before beginning, and ensure you have lots of RAM/swap and space in your tmp partition (see below issues). This may take a while since there are a lot of prerequisites to install.

If this error keeps popping up, look into the errors that were output onto the terminal to try to find which module is missing, and then run:

pip install <module_name>
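
If the error message is hard to read, a quick way to check which modules Python can actually find is sketched below. The function name missing_modules is hypothetical, and the module names passed in are just examples; note that a module's import name is not always the same as the package name you give to pip.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on the Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print(missing_modules(["json", "jina"]))
```

An empty list means every module in the list is importable from the current environment.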

My computer hangs

Machine learning requires a lot of resources, and if your machine hangs this is often due to running out of memory. To fix this, try creating a swap file if you use Linux. This isn’t such an issue on macOS, since it allocates swap automatically.

ERROR: Could not install packages due to an EnvironmentError: [Errno 28] No space left on device

This is often due to your /tmp partition running out of space so you'll need to increase its size.
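
Before resizing anything, it can help to check how much space is actually free. A minimal sketch using only the standard library follows; the helper name free_gib is made up, and tempfile.gettempdir() usually resolves to /tmp on Linux unless an environment variable such as TMPDIR points elsewhere.

```python
import shutil
import tempfile


def free_gib(path):
    """Return the free space on the filesystem containing path, in GiB."""
    return shutil.disk_usage(path).free / 2**30


tmp_dir = tempfile.gettempdir()
print(f"{tmp_dir}: {free_gib(tmp_dir):.1f} GiB free")
```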

command not found

For this error you’ll need to install the relevant software package onto your system. In Ubuntu this can be done with:

sudo apt-get install <package_name>

Congratulations! We Did It!

In this tutorial you’ve learned:

  • How to install the Jina neural search framework
  • How to load and index text data from files
  • How to query data with curl and Jinabox
  • The nitty-gritty behind Jina Flows and Pods
  • What to do if it all goes wrong

Now that you have a broad understanding of how things work, you can try out some more of our example tutorials to build image or video search, or stay tuned for our next set of tutorials that build upon your Star Trek app.

Got an idea for a tutorial covering Star Trek and/or neural search? My commbadge is out of order right now, but you can leave a comment or note on this article for me to assimilate!

Alex C-G is the Open Source Evangelist at Jina AI, and a massive Star Trek geek.

Original article: https://towardsdatascience.com/build-a-bert-based-semantic-search-system-for-star-trek-7d7d28414cd8
