torchtext.data.Field

Class interface

class torchtext.data.Field(sequential=True, use_vocab=True, init_token=None, eos_token=None, fix_length=None, dtype=torch.int64, preprocessing=None, postprocessing=None, lower=False, tokenize=None, tokenizer_language='en', include_lengths=False, batch_first=False, pad_token='<pad>', unk_token='<unk>', pad_first=False, truncate_first=False, stop_words=None, is_target=False)

Description

Defines a datatype together with instructions for converting it to a Tensor.

The Field class models common text processing datatypes that can be represented by tensors. It holds a Vocab object that defines the set of possible values for elements of the field and their corresponding numerical representations. The Field object also holds other parameters that control how the datatype is numericalized, such as the tokenization method and the kind of Tensor that should be produced.
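The idea of numericalization can be sketched in plain Python. This is only an illustration of the concept, not torchtext's implementation: the helper names build_stoi and numericalize are hypothetical, and the index ordering (special tokens first, then tokens by descending frequency) is an assumption made for the example.

```python
from collections import Counter

def build_stoi(token_lists, specials=("<unk>", "<pad>")):
    """Hypothetical helper: map each token to an integer index."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    # Assumed ordering: specials first, then by descending frequency,
    # with ties broken alphabetically so the result is deterministic.
    ordered = sorted(counts, key=lambda t: (-counts[t], t))
    itos = list(specials) + ordered
    return {tok: i for i, tok in enumerate(itos)}

def numericalize(tokens, stoi, unk_token="<unk>"):
    """Hypothetical helper: replace tokens with indices, OOV -> unk index."""
    return [stoi.get(tok, stoi[unk_token]) for tok in tokens]

corpus = ["the cat sat", "the dog sat down"]
tokenized = [text.split() for text in corpus]  # whitespace tokenization
stoi = build_stoi(tokenized)

# "bird" never appeared in the corpus, so it maps to the <unk> index.
ids = numericalize("the bird sat".split(), stoi)
print(ids)
```

A real Field would additionally pad the index lists to a common length and stack them into a Tensor of the configured dtype.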

If a single Field is shared between two columns in a dataset (e.g., question and answer in a QA dataset), then the two columns will have a shared vocabulary.
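The effect of sharing a vocabulary can be illustrated without torchtext. The sketch below uses a hypothetical build_stoi helper and made-up question/answer data; it only shows that a vocabulary built over both columns gives a word the same index wherever it appears, which per-column vocabularies do not guarantee.

```python
from collections import Counter

def build_stoi(*columns):
    """Hypothetical helper: one token-to-index map over all given columns."""
    counts = Counter(
        tok for column in columns for text in column for tok in text.split()
    )
    itos = ["<unk>", "<pad>"] + sorted(counts, key=lambda t: (-counts[t], t))
    return {tok: i for i, tok in enumerate(itos)}

# Made-up data standing in for two columns of a QA dataset.
questions = ["where is paris", "who wrote hamlet"]
answers = ["paris is in france", "shakespeare wrote hamlet"]

shared = build_stoi(questions, answers)   # like one Field used for both columns
question_only = build_stoi(questions)     # like a separate Field per column
answer_only = build_stoi(answers)

# With a shared vocabulary every word from either column has one index.
print(shared["paris"], "france" in shared, "where" in shared)
# With separate vocabularies, answer words are unknown on the question side.
print("france" in question_only, "where" in answer_only)
```

Sharing is usually what you want when both columns feed the same embedding layer; separate Fields make sense when the columns are in different languages or use different tokenization.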

Parameters

sequential – Whether the datatype represents sequential data. If False, no tokenization is applied. Default: True.
use_vocab – Whether to use a Vocab object. If False, the data in this field should already be numerical. Default: True.
init_token – A token that will be prepended to every example using this field, or None for no initial token. Default: None.
eos_token – A token that will be appended to every example using this field, or None for no end-of-sentence token. Default: None.
fix_length – A fixed length that all examples using this field will be padded to, or None for flexible sequence lengths. Default: None.
dtype – The torch.dtype class that represents a batch of examples of this kind of data. Default: torch.long (an alias of torch.int64).
preprocessing – The Pipeline that will be applied to examples using this field after tokenizing but before numericalizing. Many Datasets replace this attribute with a custom preprocessor. Default: None.
postprocessing – A Pipeline that will be applied to examples using this field after numericalizing but before the numbers are turned into a Tensor. The pipeline function takes the batch as a list, and the field’s Vocab. Default: None.
lower – Whether to lowercase the text in this field. Default: False.
tokenize – The function used to tokenize strings using this field into sequential examples. If "spacy", the SpaCy tokenizer is used. If a non-serializable function is passed as an argument, the field will not be able to be serialized. Default: string.split (used when tokenize is None).
tokenizer_language – The language of the tokenizer to be constructed. Languages other than English are currently supported only by the SpaCy tokenizer. Default: 'en'.
include_lengths – Whether to return a tuple of a padded minibatch and a list containing the length of each example, or just a padded minibatch. Default: False.
batch_first – Whether to produce tensors with the batch dimension first. Default: False.
pad_token – The string token used as padding. Default: "<pad>".
unk_token – The string token used to represent OOV (out-of-vocabulary) words. Default: "<unk>".
pad_first – Whether to pad the sequence at the beginning instead of the end. Default: False.
truncate_first – Whether to truncate the sequence at the beginning instead of the end. Default: False.
stop_words – Tokens to discard during the preprocessing step. Default: None.
is_target – Whether this field is a target variable. Affects iteration over batches. Default: False.
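How init_token, eos_token, fix_length, pad_first and truncate_first interact is easier to see on a concrete sequence. The function below is a minimal pure-Python sketch, not the library's code: pad_example is a hypothetical name, and it assumes that fix_length counts the init and eos tokens, so the room left for ordinary tokens shrinks when they are set.

```python
def pad_example(tokens, fix_length, init_token=None, eos_token=None,
                pad_token="<pad>", pad_first=False, truncate_first=False):
    """Hypothetical sketch of padding/truncating one tokenized example."""
    specials = sum(tok is not None for tok in (init_token, eos_token))
    # Assumption: fix_length includes the init/eos tokens.
    max_len = fix_length - specials
    if max_len <= 0:
        raise ValueError("fix_length leaves no room for ordinary tokens")

    # Keep the last max_len tokens (truncate_first) or the first max_len.
    body = tokens[-max_len:] if truncate_first else tokens[:max_len]

    seq = []
    if init_token is not None:
        seq.append(init_token)
    seq.extend(body)
    if eos_token is not None:
        seq.append(eos_token)

    padding = [pad_token] * (fix_length - len(seq))
    return padding + seq if pad_first else seq + padding

short = ["a", "b"]
long = ["a", "b", "c", "d", "e"]

print(pad_example(short, 5, "<s>", "</s>"))                      # pad at end
print(pad_example(short, 5, "<s>", "</s>", pad_first=True))      # pad at start
print(pad_example(long, 4, "<s>", "</s>"))                       # drop the tail
print(pad_example(long, 4, "<s>", "</s>", truncate_first=True))  # drop the head
```

Padding at the start can be convenient for models that read the final time step, while truncating at the start keeps the most recent tokens of a long sequence.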
