UNIMO learns from different modalities of data, including images, texts and image-text
pairs, thus achieving more robust and generalizable
representations for both textual and visual input.
From the perspective of human perception, so-called intelligence means imitating human intelligence; innovative model architectures are generally inspired by how humans think.
Humans perceive the world through many modalities, such as sound, vision and language.
Images, text, and image-text pairs.


UNIMO learns visual
representations and textual representations simultaneously, and unifies them into the same semantic
space via cross-modal contrastive learning (CMCL)
based on a large-scale corpus of image collections,
text corpus and image-text pairs.
In effect, three models fused through contrastive learning: image, text, and image-text pair.

class UNIMOEmbeddings(nn.Layer):
    """Include embeddings from word, position and token_type."""

    def __init__(self,
                 vocab_size,
                 hidden_size=768,
                 hidden_dropout_prob=0.1,
                 max_position_embeddings=512,
                 type_vocab_size=4):
        super(UNIMOEmbeddings, self).__init__()
        self.word_embeddings = nn.Embedding(vocab_size, hidden_size)
        self.position_embeddings = nn.Embedding(max_position_embeddings, hidden_size)
        self.token_type_embeddings = nn.Embedding(type_vocab_size, hidden_size)

    def forward(self, input_ids, token_type_ids, position_ids):
        input_embeddings = self.word_embeddings(input_ids)
        position_embeddings = self.position_embeddings(position_ids)
        token_type_embeddings = self.token_type_embeddings(token_type_ids)
        embeddings = input_embeddings + position_embeddings + token_type_embeddings
        return embeddings

Encoding across the Transformer family works the same way: the sum of three embedding terms.
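As a framework-free sketch of that three-term sum (pure Python with hypothetical toy sizes; the real model uses hidden_size=768), each token's final embedding is the element-wise sum of its word, position, and token-type vectors:

```python
import random

random.seed(0)  # reproducible toy tables

HIDDEN = 4  # hypothetical hidden size (the real model uses 768)

def make_table(rows):
    # A toy embedding table: one HIDDEN-dim vector per row id.
    return [[random.random() for _ in range(HIDDEN)] for _ in range(rows)]

word_emb = make_table(100)  # toy vocab_size = 100
pos_emb = make_table(16)    # toy max_position_embeddings = 16
type_emb = make_table(4)    # type_vocab_size = 4

def embed(input_ids, token_type_ids, position_ids):
    # Per token: element-wise sum of the three looked-up vectors.
    return [
        [w + p + t for w, p, t in zip(word_emb[i], pos_emb[pos], type_emb[tt])]
        for i, tt, pos in zip(input_ids, token_type_ids, position_ids)
    ]

out = embed([5, 7, 9], [0, 0, 1], [0, 1, 2])
print(len(out), len(out[0]))  # 3 4
```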

UNIMOLMHeadModel(
  (unimo): UNIMOModel(
    (embeddings): UNIMOEmbeddings(
      (word_embeddings): Embedding(18000, 768, sparse=False)
      (position_embeddings): Embedding(513, 768, sparse=False)
      (token_type_embeddings): Embedding(4, 768, sparse=False)
    )
    (encoder_norm): LayerNorm(normalized_shape=[768], epsilon=1e-05)
    (dropout): Dropout(p=0.1, axis=None, mode=upscale_in_train)
    (encoder): TransformerEncoder(
      (layers): LayerList(
        (0): TransformerEncoderLayer(
          (self_attn): MultiHeadAttention(
            (q_proj): Linear(in_features=768, out_features=768, dtype=float32)
            (k_proj): Linear(in_features=768, out_features=768, dtype=float32)
            (v_proj): Linear(in_features=768, out_features=768, dtype=float32)
            (out_proj): Linear(in_features=768, out_features=768, dtype=float32)
          )
          (linear1): Linear(in_features=768, out_features=3072, dtype=float32)
          (dropout): Dropout(p=0, axis=None, mode=upscale_in_train)
          (linear2): Linear(in_features=3072, out_features=768, dtype=float32)
          (norm1): LayerNorm(normalized_shape=[768], epsilon=1e-05)
          (norm2): LayerNorm(normalized_shape=[768], epsilon=1e-05)
          (dropout1): Dropout(p=0.1, axis=None, mode=upscale_in_train)
          (dropout2): Dropout(p=0.1, axis=None, mode=upscale_in_train)
        )
        (1)-(11): TransformerEncoderLayer — eleven further layers, identical to layer (0)
      )
    )
  )
  (lm_head): UNIMOLMHead(
    (transform): Linear(in_features=768, out_features=768, dtype=float32)
    (layer_norm): LayerNorm(normalized_shape=[768], epsilon=1e-05)
  )
)

Stepping through the code in a debugger is the clearest way to see how a model is defined: what each layer is, its structure, its parameters, and its inputs and outputs. Clearer than the documentation.



https://blog.csdn.net/qq_15821487/article/details/120035220 — a reference on how Python function parameters are defined and passed.

import paddle

input = paddle.to_tensor([[1, 2], [3, 4], [5, 6]])
index = paddle.to_tensor([0, 1])
output = paddle.gather(input, index, axis=0)
# expected output: [[1, 2], [3, 4]]
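To make the semantics concrete: gathering along axis 0 is just row selection by index, in index order. A framework-free equivalent of the snippet above (plain Python lists, no Paddle required):

```python
def gather_axis0(x, index):
    # Select the rows of x named by index, preserving index order.
    return [x[i] for i in index]

x = [[1, 2], [3, 4], [5, 6]]
print(gather_axis0(x, [0, 1]))  # [[1, 2], [3, 4]]
print(gather_axis0(x, [2, 0]))  # [[5, 6], [1, 2]]
```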

When a function is unclear, just jump into its source code and take a look; after reading enough of it, you naturally remember how it works.

When you can't find the original function in the file, hold Ctrl and click, then match the parameters against the pop-up list to locate the right definition.

A general-purpose generation function; chatbots use this same interface.

   The interface for the generation task. This method generates sequences using a decoding strategy. Currently, three decoding strategies are supported: "greedy_search", "sampling" and "beam_search".
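A rough sketch of the first two strategies over a single decoding step (pure Python with a toy next-token distribution — not the PaddleNLP implementation, which operates on logits over the full vocabulary):

```python
import random

def greedy_search(probs):
    # Always pick the token id with the highest probability.
    return max(range(len(probs)), key=lambda i: probs[i])

def sampling(probs, rng=None):
    # Draw a token id in proportion to its probability.
    rng = rng or random.Random(0)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

step_probs = [0.1, 0.6, 0.3]  # toy next-token distribution
print(greedy_search(step_probs))  # 1
```

Beam search extends this by keeping the k highest-scoring partial sequences at each step instead of a single one.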

Text Retrieval

Cross-Modal Contrastive Learning

Joint contrastive learning of images and text.
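A minimal sketch of such a contrastive objective (pure Python: cosine similarity plus an InfoNCE-style loss over a toy batch; UNIMO's actual CMCL additionally builds positives and negatives via text rewriting, so this is only the core idea):

```python
import math

def cosine(u, v):
    # Cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(image_vecs, text_vecs, temperature=0.1):
    # For each image i, text i is the positive; all other texts are negatives.
    loss = 0.0
    for i, img in enumerate(image_vecs):
        logits = [cosine(img, txt) / temperature for txt in text_vecs]
        log_denom = math.log(sum(math.exp(l) for l in logits))
        loss += -(logits[i] - log_denom)  # -log softmax of the positive pair
    return loss / len(image_vecs)

imgs = [[1.0, 0.0], [0.0, 1.0]]
txts = [[0.9, 0.1], [0.1, 0.9]]  # aligned pairs sit close in the shared space
print(info_nce(imgs, txts) < info_nce(imgs, list(reversed(txts))))  # True
```

The loss is minimized exactly when each image is closer to its paired text than to any other text, which is what pulls both modalities into the same semantic space.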

Text Rewriting

Text rewriting and augmentation.

Text Enhance Vision

Text and vision mutually fuse and enhance each other across shared semantic feature spaces.
