torchtext.data.Field

Class signature

class torchtext.data.Field(sequential=True, use_vocab=True, init_token=None, eos_token=None, fix_length=None, dtype=torch.int64, preprocessing=None, postprocessing=None, lower=False, tokenize=None, tokenizer_language='en', include_lengths=False, batch_first=False, pad_token='<pad>', unk_token='<unk>', pad_first=False, truncate_first=False, stop_words=None, is_target=False)

Description

Defines a datatype together with instructions for converting to Tensor.

The Field class models common text processing datatypes that can be represented by tensors. It holds a Vocab object that defines the set of possible values for elements of the field and their corresponding numerical representations. The Field object also holds other parameters relating to how a datatype should be numericalized, such as a tokenization method and the kind of Tensor that should be produced.

If a Field is shared between two columns in a dataset (e.g., question and answer in a QA dataset), then they will have a shared vocabulary.
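
The shared-vocabulary behaviour can be illustrated without torchtext itself. The following is a minimal plain-Python sketch, not the library's implementation: `build_vocab`, `numericalize`, and the sample question/answer data are hypothetical names introduced here for illustration. It assumes special tokens get the lowest indices and the remaining tokens are ordered by frequency.

```python
from collections import Counter


def build_vocab(*columns, specials=("<unk>", "<pad>")):
    """Build one token-to-index mapping from any number of tokenized columns.

    Hypothetical helper: it only mimics the idea of a vocabulary shared
    by several columns, not torchtext's actual Vocab class.
    """
    counts = Counter(
        token for column in columns for example in column for token in example
    )
    # Special tokens first, then tokens by descending frequency
    # (ties broken alphabetically so the result is deterministic).
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    itos = list(specials) + [token for token, _ in ordered]
    return {token: index for index, token in enumerate(itos)}


def numericalize(tokens, stoi, unk_token="<unk>"):
    """Map tokens to indices, sending out-of-vocabulary tokens to unk_token."""
    return [stoi.get(token, stoi[unk_token]) for token in tokens]


# Two columns of a toy QA dataset, already tokenized.
questions = [["who", "wrote", "hamlet"], ["who", "painted", "guernica"]]
answers = [["shakespeare", "wrote", "hamlet"], ["picasso"]]

# One vocabulary built over both columns, as if one field served both.
stoi = build_vocab(questions, answers)
q_ids = numericalize(questions[0], stoi)
a_ids = numericalize(answers[0], stoi)
```

Because both columns are numericalized against the same mapping, a token such as "hamlet" receives the same index whether it appears in a question or in an answer.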

Parameters

  • sequential – Whether the datatype represents sequential data. If False, no tokenization is applied. Default: True.
  • use_vocab – Whether to use a Vocab object. If False, the data in this field should already be numerical. Default: True.
  • init_token – A token that will be prepended to every example using this field, or None for no initial token. Default: None.
  • eos_token – A token that will be appended to every example using this field, or None for no end-of-sentence token. Default: None.
  • fix_length – A fixed length that all examples using this field will be padded to, or None for flexible sequence lengths. Default: None.
  • dtype – The torch.dtype class that represents a batch of examples of this kind of data. Default: torch.long (an alias of torch.int64).
  • preprocessing – The Pipeline that will be applied to examples using this field after tokenizing but before numericalizing. Many Datasets replace this attribute with a custom preprocessor. Default: None.
  • postprocessing – A Pipeline that will be applied to examples using this field after numericalizing but before the numbers are turned into a Tensor. The pipeline function takes the batch as a list, and the field’s Vocab. Default: None.
  • lower – Whether to lowercase the text in this field. Default: False.
  • tokenize – The function used to tokenize strings using this field into sequential examples. If “spacy”, the SpaCy tokenizer is used. If a non-serializable function is passed as an argument, the field cannot be serialized. Default: string.split (used when the argument is left as None).
  • tokenizer_language – The language of the tokenizer to be constructed. Multiple languages are currently supported only by the SpaCy tokenizer. Default: “en”.
  • include_lengths – Whether to return a tuple of a padded minibatch and a list containing the length of each example, or just a padded minibatch. Default: False.
  • batch_first – Whether to produce tensors with the batch dimension first. Default: False.
  • pad_token – The string token used as padding. Default: “<pad>”.
  • unk_token – The string token used to represent OOV words. Default: “<unk>”.
  • pad_first – Do the padding of the sequence at the beginning. Default: False.
  • truncate_first – Do the truncating of the sequence at the beginning. Default: False.
  • stop_words – Tokens to discard during the preprocessing step. Default: None.
  • is_target – Whether this field is a target variable. Affects iteration over batches. Default: False.
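
The padding-related parameters above (fix_length, init_token, eos_token, pad_token, pad_first, truncate_first, include_lengths) interact with each other, which is easier to see in code. The following is a plain-Python sketch under stated assumptions, not torchtext's own code: `pad_batch` is a hypothetical helper, and it assumes that fix_length counts the init and eos tokens, so the room left for ordinary tokens shrinks when they are set.

```python
def pad_batch(batch, fix_length=None, init_token=None, eos_token=None,
              pad_token="<pad>", pad_first=False, truncate_first=False,
              include_lengths=False):
    """Pad (and, if needed, truncate) a list of token lists to one length.

    Hypothetical helper illustrating the documented parameters.
    """
    batch = [list(example) for example in batch]
    extras = [token for token in (init_token, eos_token) if token is not None]
    if fix_length is None:
        # Flexible length: pad everything to the longest example in the batch.
        max_len = max(len(example) for example in batch)
    else:
        # Assumption: fix_length includes the init/eos tokens.
        max_len = max(0, fix_length - len(extras))

    padded, lengths = [], []
    for example in batch:
        if len(example) > max_len:
            # truncate_first drops tokens from the start, otherwise from the end.
            if truncate_first:
                example = example[len(example) - max_len:]
            else:
                example = example[:max_len]
        core = (
            ([init_token] if init_token is not None else [])
            + example
            + ([eos_token] if eos_token is not None else [])
        )
        filler = [pad_token] * (max_len - len(example))
        # pad_first puts the padding before the sequence instead of after it.
        padded.append(filler + core if pad_first else core + filler)
        lengths.append(len(core))  # length without padding
    return (padded, lengths) if include_lengths else padded


batch = [["a", "b", "c", "d", "e"], ["x", "y"]]
padded, lengths = pad_batch(
    batch, fix_length=5, init_token="<sos>", eos_token="<eos>",
    include_lengths=True,
)
# padded  -> [['<sos>', 'a', 'b', 'c', '<eos>'], ['<sos>', 'x', 'y', '<eos>', '<pad>']]
# lengths -> [5, 4]
```

With pad_first=True and truncate_first=True the same call would instead keep the last three tokens of the long example and place the padding of the short one at the front.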

