Workshop on

Visual Text Generation and Text Image Processing

ICDAR 2025 Workshop

September 21, 2025 @ Wuhan, Hubei, China


Introduction

Visual text generation and text image preprocessing are two fundamental areas that play crucial roles in modern visual text analysis systems and directly impact the performance of downstream tasks such as OCR, information extraction, and visual text understanding. Visual text generation addresses the critical challenge of data scarcity by creating diverse, high-quality synthetic datasets [1, 2]. This not only reduces the cost and time of manual data collection but also enables the creation of comprehensive training sets that cover various visual text types and edge cases [3, 4]. Text image preprocessing [5], the foundation of reliable visual text analysis pipelines, tackles real-world challenges such as low resolution [6, 7], uneven illumination [8, 9], and geometric distortion [10, 11]. These techniques are essential for handling visual text in a wide range of conditions, from scanned documents to scene images. The synergy between these two areas is becoming increasingly important for visual text analysis: generation techniques produce better training data, while advanced preprocessing techniques improve text quality under real-world conditions, making recognition easier for downstream models. This workshop aims to bring together researchers and practitioners working on these two topics and to foster innovation in visual text analysis.


Schedule

All times in Beijing Time (UTC+08:00)

Time            Event
09:00 - 09:10   Opening Remarks
09:10 - 09:45   Invited Talk 1: Intelligent Document Image Restoration and Understanding (Prof. ZHOU, Wengang)
09:45 - 10:20   Invited Talk 2: Font Synthesis via Deep Generative Models (Prof. LIAN, Zhouhui)
10:20 - 10:40   Coffee Break
10:40 - 11:40   Oral Session
11:40 - 12:00   Spotlight Session
12:00 - 12:10   Announcement of Best Paper Awards (no distinction in order):
  • Modular OCR Using Web Scraping Data
  • DAA-Net: Dynamic Adaptive Aggregation Network for Document Image Rectification


Call for Papers

Topics of interest include, but are not limited to:

  • GAN- and diffusion-based models for text image synthesis
  • Layout-aware document image generation
  • Real-synthetic domain gap analysis
  • Text generation model benchmarking
  • Text removal, editing, and style transfer for text images
  • Shadow, ink, and watermark removal for text images
  • Illumination correction, deblurring, and binarization of text images
  • Text image super-resolution
  • Document image dewarping
  • Text segmentation
  • Tampered text detection


Submission

This workshop invites original contributions in both theoretical and applied research. All submissions must adhere to the formatting guidelines specified on the ICDAR 2025 official website. Papers are limited to 15 pages (excluding references) and must comply with our double-blind review requirements:

  • Remove all author identifiers (names, affiliations, etc.) from the manuscript
  • Cite your own previous work in the third person to avoid revealing your identity
  • Omit the acknowledgments section from initial submissions

Submissions will be accepted through the workshop's CMT submission portal. At least one author of each accepted paper must complete workshop registration to present the work. Detailed submission procedures are available on the ICDAR 2025 guidelines portal.


Important Dates

  • Submission Deadline: June 30, 2025
  • Decisions Announced: July 20, 2025
  • Camera Ready Deadline: July 31, 2025
  • Workshop: September 21, 2025


Publication

Accepted papers will be published in the ICDAR 2025 workshop proceedings.


Invited Speakers


Prof. ZHOU, Wengang. Wengang Zhou is a Professor at the School of Information Science and Technology, University of Science and Technology of China (USTC), and a recipient of the National Science Fund for Excellent Young Scholars. His main research interests include computer vision, machine gaming, and embodied artificial intelligence. He has published over 100 papers in IEEE Transactions and CCF Rank-A journals and conferences, with more than 23,000 citations and an H-index of 69 on Google Scholar. He serves as an editorial board member and lead guest editor for IEEE Transactions on Multimedia. He has led major research projects, including projects under the National Key R&D Program of China and key joint funds of the National Natural Science Foundation of China. His awards include the CAS Outstanding Doctoral Dissertation Award, the Young Scientist Award of the China Society of Image and Graphics (CSIG), the Wu Wenjun Artificial Intelligence Science and Technology Progress Award (First Prize), the CSIG Natural Science Award (Second Prize), and the CAS Outstanding Supervisor Award. He has been ranked among Stanford University's Top 2% of Scientists Worldwide for five consecutive years (2020-2024). He was selected as an Outstanding Member of the CAS Youth Innovation Promotion Association and was supported by the Young Talent Support Project of the China Association for Science and Technology. Under his supervision, his doctoral students have twice received the CAS Outstanding Doctoral Dissertation Award, as well as the CSIG Outstanding Doctoral Dissertation Award.

Title: Intelligent Document Image Restoration and Understanding

Abstract: The restoration and understanding of document images are two core tasks in intelligent document processing systems. Document restoration aims to transform deformed or curved captured documents into flat, regular images, while document understanding focuses on the automated analysis and semantic extraction of document content. For document restoration, we propose two innovative approaches: the first is based on a progressive learning strategy, achieving high-precision deformation correction through iterative optimization; the second addresses the limitation of existing methods in restoring documents with incomplete boundaries by modeling the deformation process and incorporating a multi-scale attention mechanism, significantly enhancing the model's generalization in complex scenarios. For document understanding, tackling challenges such as high resolution, multi-page structures, and complex tables commonly found in document images, we have developed a series of document understanding models based on multimodal large-model architectures: DocPedia, DocR1, and TabPedia. These models enable end-to-end semantic parsing of document images, moving beyond the traditional "recognize first, analyze later" pipeline and achieving more efficient and accurate document structure understanding and information extraction.


Prof. LIAN, Zhouhui. Zhouhui Lian is an Associate Professor at the Wangxuan Institute of Computer Technology, Peking University, and Deputy Director of the Center for Chinese Font Design and Research. His research fields include computer graphics, computer vision, and artificial intelligence, focusing mainly on graphic and image generation as well as 3D vision. He has published over 100 papers in leading journals (e.g., TOG, TPAMI, and IJCV) and conferences (e.g., SIGGRAPH/SIGGRAPH Asia, CVPR, and NeurIPS) in the field. He has served as Area Chair for international conferences including NeurIPS, CVPR, ICCV, and ICML, and as a member of the SIGGRAPH Asia Technical Papers Committee. He is also an Editorial Board Member for several journals, including Pattern Recognition and the Journal of Computer-Aided Design & Computer Graphics. Additionally, he is Secretary-General of the 3D Vision Specialized Committee of the China Society of Image and Graphics. His awards and honors include the Second Prize of the Beijing Technological Invention Award (Rank 1), the China Excellent Patent Award (Rank 1), Finalist for the Best Paper Award in Service Robotics at ICRA 2024, the Wu Wenjun Outstanding Young Award in Artificial Intelligence, and the Peking University-China Optics Valley Achievement Transformation Award.

Title: Font Synthesis via Deep Generative Models

Abstract: Glyphs are the core element that enables the preservation and inheritance of human civilization, as well as the progress and development of human society. Since the advent of the information age, how to use and disseminate glyphs in computers has become a top priority that must be addressed. Especially with the emergence of graphical user interfaces, font-related technologies (such as font rendering, analysis, and production) have gradually become a classic and important research direction in the field of computer graphics. Chinese characters, with their large quantity, complex structure, and varied shapes, posed significant challenges to their use and dissemination in early low-performance computers. With the development of computer hardware and software technologies, particularly the popularization of the Internet and mobile intelligent devices, people’s demand for fonts with specific styles and personalized forms has grown increasingly intense. The traditional labor-intensive production process of Chinese font libraries has become a major bottleneck restricting the development of the Chinese font industry and the expansion of the font library consumer market. In recent years, with the proposal, application, and rapid development of artificial intelligence technologies—especially graphics and image generation models such as deep learning-based diffusion models and generative adversarial networks—technologies for font design, production, and generation have also made considerable progress, gradually moving from the digital stage to the intelligent era. 
This talk first provides an overview of the research background and challenges of font generation technology, reviews our team's decade-long transformation of Chinese font generation from digital to intelligent, highlights some of our latest research progress in recent years, and concludes with an outlook on future trends in Chinese font generation technology.



Workshop Chairs

  • ZHOU, Yu, Nankai University, China
  • ZENG, Gangyan, Nanjing University of Science and Technology, China
  • XIE, Hongtao, University of Science and Technology of China, China
  • YIN, Xu-Cheng, University of Science and Technology Beijing, China


Program Committee Members (Alphabetical Order)

  • CHEN, Zhineng, Fudan University, China
  • CHENG, Wentao, BNU-HKBU United International College, China
  • FANG, Shancheng, Beijing Yuanshi Technology Company, China
  • GAO, Liangcai, Peking University, China
  • LIAN, Zhouhui, Peking University, China
  • LIU, Juhua, Wuhan University, China
  • YANG, Chun, University of Science and Technology Beijing, China
  • WANG, Yaxing, Nankai University, China
  • WANG, Yuxin, University of Science and Technology of China, China


Short CV of the Workshop Chairs



Prof. ZHOU, Yu. Yu Zhou received the B.Sc., M.Sc., and Ph.D. degrees in computer science from Harbin Institute of Technology. He is a professor and Ph.D. supervisor in the College of Computer Science, Nankai University. His research interests include computer vision and deep learning, with a special interest in visual text processing, detection, recognition, and understanding. He has served as an area chair, senior program committee member, and program committee member for conferences including CVPR, ICCV, ECCV, NeurIPS, and ICDAR, and as a reviewer for journals including TPAMI and TIP. He has published over 80 papers in peer-reviewed journals and conferences, including CVPR, ICCV, NeurIPS, TMM, and TNNLS; his paper PIMNet was selected as a best paper candidate at ACM MM 2021.



Prof. ZENG, Gangyan. Gangyan Zeng is with the School of Cyber Science and Engineering, Nanjing University of Science and Technology. She received the B.S. and Ph.D. degrees in information and communication engineering from the Communication University of China in 2018 and 2023, respectively. From 2020 to 2023, she was a visiting student at the Institute of Information Engineering, Chinese Academy of Sciences. Her primary research interests include computer vision and multimodal intelligence, with a particular focus on the analysis and understanding of scene text. She has published over 20 papers in journals and conferences including CVPR, AAAI, ACM MM, and Pattern Recognition. She has also received several awards in the field of document analysis, such as second runner-up in the 2020 CVPR DocVQA Challenge.



Prof. XIE, Hongtao. Hongtao Xie is currently a Professor at the School of Information Science and Technology, University of Science and Technology of China. He is a recipient of the National Science Fund for Excellent Young Scholars and an outstanding member of the Youth Innovation Promotion Association of the Chinese Academy of Sciences. He is also a member of the Chinese Academy of Sciences' Cyberspace Security Expert Group. His research covers artificial intelligence and multimedia content security, including visual content recognition and inference, cross-modal recognition of scene text and images, cross-modal content analysis and understanding, intelligent content generation, and security. He has published over 80 academic papers as first or corresponding author in top international journals and conferences, including IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), IEEE Transactions on Image Processing (TIP), IEEE Transactions on Knowledge and Data Engineering (TKDE), NeurIPS, the International Conference on Computer Vision (ICCV), and the Conference on Computer Vision and Pattern Recognition (CVPR). Among these publications, four are ESI highly cited papers, three are ESI hot papers, and two have received conference Best Paper Awards.



Prof. YIN, Xu-Cheng. Xu-Cheng Yin is a full professor in the Department of Computer Science and Technology and the dean of the School of Computer and Communication Engineering, University of Science and Technology Beijing, China. He received the B.Sc. and M.Sc. degrees in computer science from the University of Science and Technology Beijing in 1999 and 2002, respectively, and the Ph.D. degree in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences, in 2006. He was a visiting professor at the College of Information and Computer Sciences, University of Massachusetts Amherst, USA, three times (Jan 2013 to Jan 2014, Jul 2014 to Aug 2014, and Jul 2016 to Sep 2016). His research interests include pattern recognition, document analysis and recognition, and computer vision. He has published more than 100 academic papers (IEEE T-PAMI, IEEE T-IP, CVPR, ICDAR, etc.). From 2013 to 2019, his team won first place 15 times across a series of text detection and recognition tasks in the ICDAR Robust Reading Competitions.



References

[1] Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei, TextDiffuser: Diffusion Models as Text Painters, NeurIPS, 2023.

[2] Zhenhang Li, Yan Shu, Weichao Zeng, Dongbao Yang, Yu Zhou, First Creating Backgrounds Then Rendering Texts: A New Paradigm for Visual Text Blending, ECAI, 2024.

[3] Yuxiang Tuo, Wangmeng Xiang, Jun-Yan He, Yifeng Geng, Xuansong Xie, AnyText: Multilingual Visual Text Generation And Editing, ICLR, 2024.

[4] Weichao Zeng, Yan Shu, Zhenhang Li, Dongbao Yang, Yu Zhou, TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control, NeurIPS, 2024.

[5] Yan Shu, Weichao Zeng, Zhenhang Li, Fangmin Zhao, Yu Zhou, Visual Text Meets Low-level Vision: A Comprehensive Survey on Visual Text Processing, arXiv, 2024.

[6] Yuzhe Zhang, Jiawei Zhang, Hao Li, Zhouxia Wang, Luwei Hou, Dongqing Zou, Liheng Bian, Diffusion-based Blind Text Image Super-Resolution, CVPR, 2024.

[7] Xiaoming Li, Wangmeng Zuo, Chen Change Loy, Learning Generative Structure Prior for Blind Text Image Super-resolution, CVPR, 2023.

[8] Yonghui Wang, Wengang Zhou, Zhenbo Lu, Houqiang Li, UDoc-GAN: Unpaired Document Illumination Correction with Background Light Prior, ACM MM, 2022.

[9] Ling Zhang, Yinghao He, Qing Zhang, Zheng Liu, Xiaolong Zhang, Chunxia Xiao, Document Image Shadow Removal Guided by Color-Aware Background, CVPR, 2023.

[10] Floor Verhoeven, Tanguy Magne, Olga Sorkine-Hornung, UVDoc: Neural Grid-based Document Unwarping, SIGGRAPH, 2023.

[11] Pu Li, Weize Quan, Jianwei Guo, Dong-Ming Yan, Layout-aware Single-image Document Flattening, ACM TOG, 2023.