
A curated list of resources dedicated to table recognition


1. Papers

  • *CODE means official code and **CODE means unofficial code
| Conf. | Date | Title | Highlight | Code |
| --- | --- | --- | --- | --- |
| arXiv | 2021/12/2 | Flexible Table Recognition and Semantic Interpretation System | Others | *CODE |
| arXiv | 2021/11/18 | PubTables-1M: Towards comprehensive table extraction from unstructured documents | Dataset | *CODE |
| arXiv | 2021/5/23 | Multi-Type-TD-TSR – Extracting Tables from Document Images using a Multi-stage Pipeline for Table Detection and Table Structure Recognition: from OCR to Structured Table Representations | Others | **CODE |
| ICCV | 2021 | Parsing Table Structures in the Wild | Detection | No |
| ICCV | 2021 | TGRNet: A Table Graph Reconstruction Network for Table Structure Recognition | GNN | *CODE |
| ICDAR Competition | 2021 | ICDAR 2021 Competition on Scientific Literature Parsing | Dataset | *CODE |
| ICDAR Competition | 2021 | PingAn-VCGroup’s Solution for ICDAR 2021 Competition on Scientific Literature Parsing Task B: Table Recognition to HTML Sequence | | *CODE |
| ICDAR Competition | 2021 | LGPMA: Complicated Table Structure Recognition with Local and Global Pyramid Mask Alignment | Others | *CODE |
| WACV | 2021 | Global Table Extractor (GTE): A framework for joint table identification and cell structure recognition using visual context | Others | No |
| CVPR Workshop | 2020 | CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents | Others | *CODE |
| ECCV | 2020 | Image-based table recognition: data, model, and evaluation | Dataset | *CODE |
| ECCV | 2020 | Table structure recognition using top-down and bottom-up cues | Others | *CODE |
| LREC | 2020 | TableBank: A Benchmark Dataset for Table Detection and Recognition | Dataset | *CODE |
| arXiv | 2019/8/28 | Complicated table structure recognition | Others | *CODE |
| ICDAR | 2019 | Rethinking Table Recognition using Graph Neural Networks | GNN | *CODE |
| ICDAR | 2019 | TableNet: Deep learning model for end-to-end table detection and tabular data extraction from scanned document images | Others | No |
| ICDAR | 2019 | ReS2TIM: Reconstruct syntactic structures from table images | Others | *CODE |
| ICDAR | 2017 | DeepDeSRT: Deep learning for detection and structure recognition of tables in document images | Others | No |

2. Datasets

| Dataset | Language | Description | Examples | Dataset link |
| --- | --- | --- | --- | --- |
| TableBank | English | TableBank is a new image-based table detection and recognition dataset built with novel weak supervision from Word and LaTeX documents on the internet. It contains 417K high-quality labeled tables. It provides only cell topology ground truth. | TableBank | TableBank |
| SciTSR | English | SciTSR is a large-scale table structure recognition dataset, which contains 15,000 tables in PDF format and their corresponding structure labels obtained from LaTeX source files. It provides cell topology and cell content ground truth. | SciTSR | SciTSR |
| PubTabNet | English | PubTabNet is a large dataset for image-based table recognition, containing 568k+ images of tabular data annotated with the corresponding HTML representation of the tables. It provides cell topology, cell content, and non-blank cell location ground truth. | PubTabNet | PubTabNet |
| FinTabNet | English | This dataset contains complex tables from the annual reports of S&P 500 companies with detailed table structure annotations to help train and test structure recognition. | FinTabNet | FinTabNet |
| PubTables-1M | English | A large, detailed, high-quality dataset for training and evaluating a wide variety of models for the tasks of table detection, table structure recognition, and functional analysis. | PubTables-1M | PubTables-1M |
| WTW | English | WTW-Dataset is the first wild table dataset for table detection and table structure recognition tasks. It is constructed from photographs, scans, and web pages, and covers 7 challenging cases in table structure recognition: (1) inclined tables, (2) curved tables, (3) occluded or blurred tables, (4) extreme aspect ratio tables, (5) overlaid tables, (6) multi-color tables, and (7) irregular tables. | WTW | WTW |
| TNCR | English | A new table dataset with varying image quality collected from open-access websites. TNCR contains 9428 labeled tables in approximately 6621 images, classified into 5 classes (Full Lined, Merged Cells, No Lines, Partial Lined, Partial Lined Merged Cells). | TNCR | TNCR |
| TAL_OCR_TABLE | Chinese | The TAL_OCR_TABLE dataset comes from the TAL Form Recognition Technology Challenge. The data comes from real student homework and test papers in educational settings. It contains 16k training images and 4k test images, and provides cell topology, cell content, and all cell location ground truth. | TAL_OCR_TABLE | TAL_OCR_TABLE |
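
Several of the datasets above describe their labels as cell topology plus cell content, and PubTabNet pairs each image with an HTML representation of the table. To make the relationship between the two concrete, here is a minimal sketch that renders a list of cells with row and column spans as an HTML table string. The `Cell` fields and the `cells_to_html` helper are hypothetical names chosen for illustration; they are not the annotation schema of any dataset listed here, and each dataset's own documentation should be consulted for its real format.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Cell:
    # Hypothetical fields for illustration; real datasets define their own schemas.
    row: int           # index of the top-most row the cell occupies
    col: int           # index of the left-most column the cell occupies
    rowspan: int = 1   # number of rows the cell covers
    colspan: int = 1   # number of columns the cell covers
    text: str = ""     # cell content


def cells_to_html(cells):
    """Render cells (topology + content) as a compact HTML table string."""
    # Total row count, including rows covered only by a rowspan from above.
    n_rows = max((c.row + c.rowspan for c in cells), default=0)
    parts = ["<table>"]
    for r in range(n_rows):
        parts.append("<tr>")
        # Emit only the cells that start in this row, left to right.
        for cell in sorted((c for c in cells if c.row == r), key=lambda c: c.col):
            attrs = ""
            if cell.rowspan > 1:
                attrs += f' rowspan="{cell.rowspan}"'
            if cell.colspan > 1:
                attrs += f' colspan="{cell.colspan}"'
            parts.append(f"<td{attrs}>{escape(cell.text)}</td>")
        parts.append("</tr>")
    parts.append("</table>")
    return "".join(parts)


# A 2x2 grid whose first row is a single merged header cell.
example = [
    Cell(row=0, col=0, colspan=2, text="Results"),
    Cell(row=1, col=0, text="Precision"),
    Cell(row=1, col=1, text="Recall"),
]
html_table = cells_to_html(example)
print(html_table)
```

Cell location ground truth, where a dataset provides it, would add a bounding box per cell on top of this; the sketch leaves that out to keep the topology-to-HTML mapping visible.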



