摘要
在水利工程历史建设过程中,受到文本信息化水平的限制,积累了大量以纸质文本和扫描图像形式保存的水工混凝土材料不可编辑文档,难以直接有效利用材料数据,极大增加了材料知识应用的难度。提出一种基于机器视觉和深度学习的文档解析方法,准确高效地将水工混凝土材料文本信息和表格数据转化为可编辑形式。进一步,基于已解译的表格信息,构建了水工混凝土材料表格数据库,实现了混凝土材料数据的高效查询和统一管理。以实际工程的水工混凝土材料文档为例验证新方法的可行性,结果表明,文档解析方法各项子任务的准确率均达90%以上,有助于混凝土材料不可编辑资源的自动化再利用。
Abstract
Over the long history of water conservancy construction, limited text digitization has left a large body of hydraulic concrete material documents in non-editable form, as paper texts and scanned images. The material data in these documents cannot be used directly, which greatly increases the difficulty of applying material knowledge. A document parsing method based on machine vision and deep learning is proposed that accurately and efficiently converts the text and table data of hydraulic concrete material documents into editable form. Based on the parsed table information, a table database of hydraulic concrete materials is then constructed, enabling efficient querying and unified management of concrete material data. The feasibility of the new method is verified on hydraulic concrete material documents from an actual project. The results show that the accuracy of each subtask of the document parsing method exceeds 90%, which supports the automated reuse of non-editable concrete material resources and improves data service capability in the field of water conservancy engineering.
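As a rough illustration of the table database step, the sketch below loads already-parsed table rows into SQLite and queries them by material and property. It is not the authors' implementation: the schema, the column names, the function names `build_database` and `query_property`, and the sample rows are all assumptions made for the example, and the abstract does not say which database engine was used.

```python
import sqlite3

# Hypothetical rows, as a table-parsing step might emit them.
# File names, property names, and values are illustrative only.
parsed_rows = [
    {"source_doc": "report_A.pdf", "table_no": 1, "material": "cement",
     "prop_name": "compressive_strength_28d", "prop_value": 42.5, "unit": "MPa"},
    {"source_doc": "report_A.pdf", "table_no": 1, "material": "fly_ash",
     "prop_name": "fineness", "prop_value": 12.0, "unit": "%"},
    {"source_doc": "report_B.pdf", "table_no": 3, "material": "cement",
     "prop_name": "fineness", "prop_value": 3.2, "unit": "%"},
]


def build_database(rows, path=":memory:"):
    """Create the (assumed) table and insert the parsed rows."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS material_table (
            id         INTEGER PRIMARY KEY AUTOINCREMENT,
            source_doc TEXT NOT NULL,
            table_no   INTEGER NOT NULL,
            material   TEXT NOT NULL,
            prop_name  TEXT NOT NULL,
            prop_value REAL,
            unit       TEXT
        )
        """
    )
    # Named placeholders are filled from the keys of each row dict.
    conn.executemany(
        """
        INSERT INTO material_table
            (source_doc, table_no, material, prop_name, prop_value, unit)
        VALUES
            (:source_doc, :table_no, :material, :prop_name, :prop_value, :unit)
        """,
        rows,
    )
    conn.commit()
    return conn


def query_property(conn, material, prop_name):
    """Return (source_doc, prop_value, unit) for one material property."""
    cur = conn.execute(
        """
        SELECT source_doc, prop_value, unit
        FROM material_table
        WHERE material = ? AND prop_name = ?
        ORDER BY source_doc
        """,
        (material, prop_name),
    )
    return cur.fetchall()


conn = build_database(parsed_rows)
cement_fineness = query_property(conn, "cement", "fineness")
```

Keeping one row per (document, table, material, property) is only one possible layout; it makes cross-document queries simple at the cost of losing the original row and column positions of each cell.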
关键词
水工混凝土材料 / 版面结构划分 / 文本检测与识别 / 表格数据库
Key words
hydraulic concrete materials / layout structure division / text detection and recognition / table database
Author summary
Huamei YANG (杨华美) 1, Leping LIU (刘乐平) 2, Wenwei LI (李文伟) 1, Xufang DENG (邓旭方) 3, Shuguang LI (李曙光) 1, Zhenghu CHEN (陈正虎) 3, Lun DENG (邓伦) 3
1. China Three Gorges Corporation, Beijing 100038
2. State Key Laboratory of Hydraulic Engineering Intelligent Construction and Operation, Tianjin University, Tianjin 300350
3. China Yangtze Power Co., Ltd., Wuhan 430014, Hubei, China
E-mail: huamei@ctg.com.cn (Huamei YANG); liuleping@tju.edu.cn (Leping LIU)
Huamei YANG (b. 1986), female, associate professor, Ph.D., mainly engaged in research on the performance of hydraulic concrete. E-mail: yang_ huamei@ctg.com.cn
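YANG Huamei, LIU Leping, LI Wenwei, DENG Xufang, LI Shuguang, CHEN Zhenghu, DENG Lun.
Unstructured text parsing and table database construction for hydraulic concrete materials[J].
Water Resources and Hydropower Engineering (水利水电技术(中英文)), 2025, 56(S2): 66-70 DOI:10.13928/j.cnki.wrahe.2025.S2.016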
Funding
Supported by the Scientific Research Project of China Yangtze Power Co., Ltd. (Z212302036)