A Novel Transformer based Semantic Segmentation Scheme for Fine-Resolution Remote Sensing Images (semantic segmentation task)

Self-Supervised Learning with Swin Transformers (model name: MoBY; uses contrastive learning)

Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation (medical image semantic segmentation)


Rethinking Training from Scratch for Object Detection (haven't fully understood this one yet)

Demystifying Local Vision Transformer: Sparse Connectivity, Weight Sharing, and Dynamic Weight


DS-TransUNet: Dual Swin Transformer U-Net for Medical Image Segmentation (medical image semantic segmentation)

Long-Short Temporal Contrastive Learning of Video Transformers

Video Swin Transformer



PVTv2: Improved Baselines with Pyramid Vision Transformer


CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows



What Makes for Hierarchical Vision Transformer?

CycleMLP: A MLP-like Architecture for Dense Prediction


Congested Crowd Instance Localization with Dilated Convolutional Swin Transformer


ConvNets vs. Transformers: Whose Visual Representations are More Transferable?

Vision transformers have attracted much attention from computer vision researchers because they are not restricted to the spatial inductive bias of ConvNets. However, although Transformer-based backbones have made considerable progress on ImageNet classification, it is still unclear whether the learned representations are as transferable as, or even more transferable than, ConvNets' features. To address this point, we systematically investigate the transfer learning ability of ConvNets and vision transformers across 15 single-task and multi-task performance evaluations. Given the strong correlation between pretraining performance and transfer learning performance, we include two residual ConvNets (i.e., R-101×3 and R-152×4) and three Transformer-based visual backbones (i.e., ViT-B, ViT-L and Swin-B) that have comparable error rates on ImageNet and would therefore be expected to transfer similarly to downstream datasets. We observe consistent advantages for the Transformer-based backbones on 13 of the 15 downstream tasks, including but not limited to fine-grained classification, scene recognition (classification, segmentation and depth estimation), open-domain classification, and face recognition. More specifically, we find that the two ViT models rely heavily on whole-network fine-tuning to achieve performance gains, while Swin Transformer has no such requirement. Moreover, vision transformers behave more robustly in multi-task learning: they bring larger improvements when handling mutually beneficial tasks and suffer smaller performance losses when tackling irrelevant tasks. We hope these findings can facilitate the exploration and exploitation of vision transformers in the future.
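The distinction the abstract draws between whole-network fine-tuning and keeping the backbone frozen is easy to make concrete. Below is a minimal PyTorch sketch (my own illustration, not code from the paper) of the two transfer regimes; it assumes the timm library, and the model names, class count, and learning rates are illustrative assumptions only.

import timm   # assumed dependencies: pip install timm torch
import torch

def build_for_transfer(model_name, num_classes, linear_probe):
    # Load an ImageNet-pretrained backbone with a freshly initialized head.
    model = timm.create_model(model_name, pretrained=True, num_classes=num_classes)
    if linear_probe:
        # Freeze the whole network, then unfreeze only the classifier head.
        for p in model.parameters():
            p.requires_grad = False
        for p in model.get_classifier().parameters():
            p.requires_grad = True
    trainable = [p for p in model.parameters() if p.requires_grad]
    # Typical choice: larger LR for a linear probe, smaller LR for full fine-tuning.
    optimizer = torch.optim.AdamW(trainable, lr=1e-3 if linear_probe else 1e-5)
    return model, optimizer

# Per the abstract: ViT tends to need the full fine-tuning regime...
vit_model, vit_opt = build_for_transfer('vit_base_patch16_224', num_classes=100, linear_probe=False)
# ...while Swin also transfers well in the cheaper frozen-backbone regime.
swin_model, swin_opt = build_for_transfer('swin_base_patch4_window7_224', num_classes=100, linear_probe=True)

In the linear-probe setting only the head's parameters reach the optimizer, so the comparison isolates how linearly separable the frozen pretrained features already are.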

SwinIR: Image Restoration Using Swin Transformer (key point: residual connections)

Sparse MLP for Image Recognition: Is Self-Attention Really Necessary?




Semi-Supervised Wide-Angle Portraits Correction by Multi-Scale Transformer



3rd Place Scheme on Instance Segmentation Track of ICCV 2021 VIPriors Challenges

ViDT: An Efficient and Effective Fully Transformer-based Object Detector


Satellite Image Semantic Segmentation (satellite imagery; manuscript)
COVID-19 Detection in Chest X-ray Images Using Swin Transformer and Transformer in Transformer

HRFormer: High-Resolution Transformer for Dense Prediction


Vis-TOP: Visual Transformer Overlay Processor





Hepatic vessel segmentation based on 3D swin-transformer with inductive biased multi-head self-attention


Transformer-based Image Compression (image compression)



Swin Transformer V2: Scaling Up Capacity and Resolution

Vision Transformer with Deformable Attention

Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images

Swin-Pose: Swin Transformer Based Human Pose Estimation



