Workshop on Visual Text Generation and Text Image Processing
ICDAR 2025 Workshop
Introduction
Visual text generation and text image preprocessing are two fundamental areas that play crucial roles in modern visual text analysis systems and directly affect the performance of downstream tasks such as OCR, information extraction, and visual text understanding. Visual text generation addresses the critical challenge of data scarcity by creating diverse, high-quality synthetic datasets [1, 2]. This not only reduces the cost and time of manual data collection but also enables comprehensive training sets that cover various visual text types and edge cases [3, 4]. Text image preprocessing [5], the foundation of reliable visual text analysis pipelines, tackles real-world challenges such as low resolution [6, 7], uneven illumination [8, 9], and geometric distortion [10, 11]. These techniques are essential for handling diverse visual text conditions, from documents to scene images. The synergy between these two topics is becoming increasingly important for visual text analysis: visual text generation helps create better training data, while advanced text image preprocessing improves text quality under real-world conditions and thereby makes recognition easier for models. This workshop aims to bring together researchers and practitioners working on these two topics to foster innovation in visual text analysis.
Schedule
All times are in Beijing Time (UTC+08:00).
Time | Event
13:50 - 14:00 | Opening Remarks
14:00 - 14:40 | Invited Talk 1: Prof. ZHOU, Wengang
14:40 - 15:20 | Invited Talk 2: Prof. LIAN, Zhouhui, "Font Synthesis via Deep Generative Models"
15:20 - 16:00 | Contributed Talks (best paper and runner-up paper talks)
16:00 - 16:20 | Coffee Break
16:20 - 17:40 | Poster or Oral Session
Call for Papers
Topics of interest include, but are not limited to:
- GAN-based and diffusion-based models for text image synthesis
- Layout-aware document image generation
- Real-synthetic domain gap analysis
- Text generation model benchmarking
- Text removal, editing, and style transfer in images
- Shadow, ink, and watermark removal from text images
- Illumination correction, deblurring, and binarization of text images
- Text image super-resolution
- Document image dewarping
- Text segmentation
- Tampered text detection
Submission
This workshop invites original contributions in both theoretical and applied research domains. All submissions must adhere to the formatting guidelines specified on the ICDAR 2025 official website. Papers are limited to 15 pages (excluding references) and must comply with our double-blind review requirements:
- Remove all author identifiers (names, affiliations, etc.) from the manuscript
- Cite your own previous work in the third person to avoid disclosing your identity
- Omit acknowledgments section in initial submissions
Submissions will be accepted through the workshop's CMT submission portal. At least one author of each accepted paper must complete workshop registration to present the work. Detailed submission procedures are available on the ICDAR 2025 guidelines portal.
Important Dates
- Submission Deadline: May 28, 2025
- Decisions Announced: June 13, 2025
- Camera Ready Deadline: June 20, 2025
- Workshop: September 20, 2025
Publication
Accepted papers will be published in the ICDAR 2025 workshop proceedings.
Workshop Chairs
- ZHOU, Yu, Nankai University, China
- ZENG, Gangyan, Nanjing University of Science and Technology, China
- XIE, Hongtao, University of Science and Technology of China, China
- YIN, Xu-Cheng, University of Science and Technology Beijing, China
Program Committee Members (Alphabetical Order)
- CHEN, Zhineng, Fudan University, China
- CHENG, Wentao, BNU-HKBU United International College, China
- FANG, Shancheng, Beijing Yuanshi Technology Company, China
- GAO, Liangcai, Peking University, China
- LIAN, Zhouhui, Peking University, China
- LIU, Juhua, Wuhan University, China
- WANG, Yaxing, Nankai University, China
- WANG, Yuxin, University of Science and Technology of China, China
- YANG, Chun, University of Science and Technology Beijing, China
Short CVs of the Workshop Chairs

Prof. ZHOU, Yu. Yu Zhou holds the BSc, MSc, and PhD degrees in computer science from Harbin Institute of Technology. He is a professor and PhD supervisor in the College of Computer Science, Nankai University. His research interests include computer vision and deep learning, with a special interest in visual text processing, detection, recognition, and understanding. He has served as an area chair, senior program committee member, or program committee member for CVPR, ICCV, ECCV, NeurIPS, ICDAR, and other venues, and as a reviewer for TPAMI, TIP, and other journals. He has published over 80 papers in peer-reviewed journals and conferences, including CVPR, ICCV, NeurIPS, TMM, and TNNLS, and his paper PIMNet was selected as a best paper candidate at ACM MM 2021.

Prof. ZENG, Gangyan. Gangyan Zeng is with the School of Cyber Science and Engineering, Nanjing University of Science and Technology. She received the B.S. and Ph.D. degrees in information and communication engineering from the Communication University of China in 2018 and 2023, respectively. From 2020 to 2023, she was a visiting student at the Institute of Information Engineering, Chinese Academy of Sciences. Her primary research interests include computer vision and multimodal intelligence, with a particular focus on the analysis and understanding of scene text. She has published over 10 papers in journals and conferences including ACM MM, AAAI, and Pattern Recognition. She has also received several awards in the field of document analysis, including second runner-up in the 2020 CVPR DocVQA Challenge.

Prof. XIE, Hongtao. Hongtao Xie is a Professor at the School of Information Science and Technology, University of Science and Technology of China. He is a recipient of the National Excellent Science Fund and an outstanding member of the Youth Innovation Promotion Association of the Chinese Academy of Sciences. He is also a member of the Chinese Academy of Sciences’ Network Space Security Expert Group. His research focuses on artificial intelligence and multimedia content security, including visual content recognition and inference, cross-modal recognition of scene text and images, cross-modal content analysis and understanding, and intelligent content generation and security. He has published over 80 academic papers as first or corresponding author in top international journals and conferences, including IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), IEEE Transactions on Image Processing (TIP), IEEE Transactions on Knowledge and Data Engineering (TKDE), NeurIPS, the International Conference on Computer Vision (ICCV), and the Conference on Computer Vision and Pattern Recognition (CVPR). Of these publications, four are ESI highly cited papers, three are hot papers, and two received Best Paper Awards at conferences.

Prof. YIN, Xu-Cheng. Xu-Cheng Yin is a full professor in the Department of Computer Science and Technology and the dean of the School of Computer and Communication Engineering, University of Science and Technology Beijing, China. He received the B.Sc. and M.Sc. degrees in computer science from the University of Science and Technology Beijing, China, in 1999 and 2002, respectively, and the Ph.D. degree in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences, in 2006. He was a visiting professor in the College of Information and Computer Sciences, University of Massachusetts Amherst, USA, three times (Jan 2013 to Jan 2014, Jul 2014 to Aug 2014, and Jul 2016 to Sep 2016). His research interests include pattern recognition, document analysis and recognition, and computer vision. He has published more than 100 academic papers in venues such as IEEE T-PAMI, IEEE T-IP, CVPR, and ICDAR. From 2013 to 2019, his team won first place 15 times in text detection and recognition tasks of the ICDAR Robust Reading Competition.
Invited Speakers

Prof. ZHOU, Wengang. Wengang Zhou is a Professor and Ph.D. Supervisor at the School of Information Science and Technology, University of Science and Technology of China. His research interests include multimedia information retrieval, computer vision, and machine gaming. He has published over 100 papers in IEEE Transactions and CCF-A ranked conferences, with over 20,000 Google Scholar citations and an h-index of 64. He received the Excellent Young Scholars grant from the National Natural Science Foundation of China and serves as an Editorial Board Member and Lead Guest Editor for IEEE Transactions on Multimedia. He has led several major research projects, including a Major Project of the Ministry of Science and Technology and a Key Project of the National Natural Science Foundation of China Joint Fund. His awards include the First Prize of the Wu Wenjun Artificial Intelligence Science and Technology Progress Award, the Second Prize of the Natural Science Award from the CCIG, and the Outstanding Mentor Award from the Chinese Academy of Sciences. He has been selected as an Outstanding Member of the CAS Youth Innovation Promotion Association and for the Youth Talent Support Program of the China Association for Science and Technology. Ph.D. students under his supervision have received the CAS Outstanding Doctoral Dissertation Award and the CCIG Outstanding Doctoral Dissertation Award.
Title:
Abstract:

Prof. LIAN, Zhouhui. Zhouhui Lian is an associate professor at the Wangxuan Institute of Computer Technology (WICT), Peking University, China.
Title: Font Synthesis via Deep Generative Models
Abstract: Font Synthesis via Deep Generative Models
References
[1] Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, Furu Wei, TextDiffuser: Diffusion Models as Text Painters, NeurIPS, 2023.
[2] Zhenhang Li, Yan Shu, Weichao Zeng, Dongbao Yang, Yu Zhou, First Creating Backgrounds Then Rendering Texts: A New Paradigm for Visual Text Blending, ECAI, 2024.
[3] Yuxiang Tuo, Wangmeng Xiang, Jun-Yan He, Yifeng Geng, Xuansong Xie, AnyText: Multilingual Visual Text Generation And Editing, ICLR, 2024.
[4] Weichao Zeng, Yan Shu, Zhenhang Li, Dongbao Yang, Yu Zhou, TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control, NeurIPS, 2024.
[5] Yan Shu, Weichao Zeng, Zhenhang Li, Fangmin Zhao, Yu Zhou, Visual Text Meets Low-level Vision: A Comprehensive Survey on Visual Text Processing, arXiv, 2024.
[6] Yuzhe Zhang, Jiawei Zhang, Hao Li, Zhouxia Wang, Luwei Hou, Dongqing Zou, Liheng Bian, Diffusion-based Blind Text Image Super-Resolution, CVPR, 2024.
[7] Xiaoming Li, Wangmeng Zuo, Chen Change Loy, Learning Generative Structure Prior for Blind Text Image Super-resolution, CVPR, 2023.
[8] Yonghui Wang, Wengang Zhou, Zhenbo Lu, Houqiang Li, UDoc-GAN: Unpaired Document Illumination Correction with Background Light Prior, ACM MM, 2022.
[9] Ling Zhang, Yinghao He, Qing Zhang, Zheng Liu, Xiaolong Zhang, Chunxia Xiao, Document Image Shadow Removal Guided by Color-Aware Background, CVPR, 2023.
[10] Floor Verhoeven, Tanguy Magne, Olga Sorkine-Hornung, UVDoc: Neural Grid-based Document Unwarping, SIGGRAPH, 2023.
[11] Pu Li, Weize Quan, Jianwei Guo, Dong-Ming Yan, Layout-aware Single-image Document Flattening, ACM TOG, 2023.