TGMS: An End-to-End Framework for Recognizing Table Images as Markup Sequences

LI Shiqi; JIN Dahai; GONG Yunzhan

doi:10.20009/j.cnki.21-1106/TP.2025-0253

Journal of Chinese Computer Systems, 2026, Vol. 47, Issue 5: 1175-1181

Computer Graphics and Images

TGMS: An End-to-End Framework for Table Graph to Markup Sequence

LI Shiqi, JIN Dahai, GONG Yunzhan

Abstract

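Because table styles and layouts vary widely, recognizing tables with a two-dimensional structure in document images is a complex task. Tables present data in a compact form, which makes information transfer and human comprehension more efficient. Unlike humans, however, machines must explicitly understand the relationship between the two-dimensional structure and the content, so automatic table recognition remains highly challenging. To address this task, this paper proposes TGMS (An End-to-End Framework for Table Graph to Markup Sequence), an end-to-end framework that recognizes table images as markup sequences. The framework first extracts visual features with a convolutional neural network. It then locates the spatial positions of cells with a segmentation-based method, builds a table graph, and infers logical relationships with a graph convolutional network and an attention mechanism. Finally, it recognizes the text within each region and combines it with the logical relationships to generate the table markup sequence. Experimental results on three widely used table recognition datasets, ICDAR-2013, SciTSR, and PubTabNet, show that the proposed TGMS can effectively accomplish the table recognition task.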
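The abstract does not specify how the table graph is built or which GCN and attention variants TGMS uses. The sketch below is a minimal illustration under stated assumptions. Cells are nodes given as bounding boxes (x0, y0, x1, y1), and each cell is linked to its k nearest neighbours by centre distance. One layer follows the widely used symmetric-normalized GCN rule ReLU(D^-1/2 (A + I) D^-1/2 H W). Scaled dot-product attention weights between node embeddings stand in for pairwise relationship cues. The function names (`knn_adjacency`, `gcn_layer`, `attention_weights`) and the toy node features are hypothetical. In TGMS, node features would come from the CNN backbone rather than hand-made vectors.

```python
import math

def knn_adjacency(boxes, k):
    """Symmetric 0/1 adjacency linking each cell to its k nearest cells (by box centre)."""
    centres = [((x0 + x1) / 2, (y0 + y1) / 2) for x0, y0, x1, y1 in boxes]
    n = len(boxes)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Sort the other cells by Euclidean distance between box centres.
        nearest = sorted((math.dist(centres[i], centres[j]), j) for j in range(n) if j != i)
        for _, j in nearest[:k]:
            adj[i][j] = adj[j][i] = 1.0
    return adj

def matmul(a, b):
    """Plain-Python matrix product of nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(adj, h, w):
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # degrees including the self-loop
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    return [[max(0.0, v) for v in row] for row in matmul(matmul(norm, h), w)]

def attention_weights(h):
    """Row-wise softmax of scaled dot products between node embeddings."""
    d = len(h[0])
    weights = []
    for hi in h:
        logits = [sum(a * b for a, b in zip(hi, hj)) / math.sqrt(d) for hj in h]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

if __name__ == "__main__":
    # Toy 2x2 grid of cells; features are hypothetical one-hot "column" indicators.
    boxes = [(0, 0, 10, 5), (10, 0, 20, 5), (0, 5, 10, 10), (10, 5, 20, 10)]
    h0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
    w = [[1.0, 0.0], [0.0, 1.0]]  # identity weights keep the example readable
    adj = knn_adjacency(boxes, k=2)
    h1 = gcn_layer(adj, h0, w)
    print(adj)
    print(h1)
    print(attention_weights(h1))
```

On this toy grid the k = 2 graph links only horizontally and vertically adjacent cells. After one propagation step, cells in the same column end up with identical embeddings, so the attention weights favour same-column pairs over same-row pairs.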

Abstract

Owing to the variety of table styles and layouts, recognizing tables with a two-dimensional structure in document images is a complex task. Tables express data in a compact form that makes information transfer and human comprehension more efficient. Machines, however, must understand the relationship between the two-dimensional structure and the content, which makes automatic table recognition challenging. To address these issues, this paper proposes TGMS, an end-to-end framework for Table Graph to Markup Sequence. The framework first uses a convolutional neural network to extract visual features and then employs a segmentation-based approach to locate the cells spatially. Next, it uses the spatial location information to recognize the text in each region, constructs a graph, and infers logical relationships with a graph convolutional network and an attention mechanism. Finally, it generates a sequence of table markup tokens by combining the logical relationships with the cell text. Experimental results on three widely used table recognition datasets, ICDAR-2013, SciTSR, and PubTabNet, show that the proposed TGMS can effectively accomplish the table recognition task.
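The final step, turning cells with known logical positions and recognized text into markup, can be illustrated with a short sketch. It assumes each cell has already been assigned a logical start row and column plus row and column spans; the abstract does not say how TGMS encodes these. The sketch emits an HTML-style token sequence in which spanning cells are split into `<td`, a span attribute, and `>`, similar in spirit to the HTML structure tokens used with PubTabNet. The `Cell` class and `to_markup_tokens` function are hypothetical names, and the exact token vocabulary used by TGMS is an assumption here.

```python
from dataclasses import dataclass
from html import escape

@dataclass
class Cell:
    row: int          # logical start row
    col: int          # logical start column
    text: str         # recognized text content of the cell
    rowspan: int = 1
    colspan: int = 1

def to_markup_tokens(cells):
    """Emit an HTML-style token sequence; each cell appears only in the row where it starts."""
    n_rows = max(c.row + c.rowspan for c in cells)
    tokens = ["<table>"]
    for r in range(n_rows):
        tokens.append("<tr>")
        for cell in sorted((c for c in cells if c.row == r), key=lambda c: c.col):
            if cell.rowspan == 1 and cell.colspan == 1:
                tokens.append("<td>")
            else:
                # Spanning cells: opening fragment, span attribute(s), then the closing ">".
                tokens.append("<td")
                if cell.rowspan > 1:
                    tokens.append(f' rowspan="{cell.rowspan}"')
                if cell.colspan > 1:
                    tokens.append(f' colspan="{cell.colspan}"')
                tokens.append(">")
            tokens.append(escape(cell.text))  # escape <, >, & and quotes in cell text
            tokens.append("</td>")
        tokens.append("</tr>")
    tokens.append("</table>")
    return tokens

if __name__ == "__main__":
    cells = [
        Cell(0, 0, "Name", rowspan=2),
        Cell(0, 1, "Value", colspan=2),
        Cell(1, 1, "x"),
        Cell(1, 2, "y"),
    ]
    print("".join(to_markup_tokens(cells)))
```

Running it prints `<table><tr><td rowspan="2">Name</td><td colspan="2">Value</td></tr><tr><td>x</td><td>y</td></tr></table>`. Emitting each cell only in its starting row keeps positions covered by a span from being written twice.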

Keywords

table structure recognition / table recognition / end-to-end / graph convolutional network / attention mechanism

Key words

table structure recognition / table recognition / end-to-end / graph convolutional network / attention mechanism

Cite this article

LI Shiqi, JIN Dahai, GONG Yunzhan. TGMS: An End-to-End Framework for Table Graph to Markup Sequence[J]. Journal of Chinese Computer Systems, 2026, 47(5): 1175-1181. DOI: 10.20009/j.cnki.21-1106/TP.2025-0253
