Enhancing YOLO through Multi-Task Learning: Joint Detection, Reconstruction, and Classification of Distorted Text Images

Shaji, Reshma

dc.contributor.author	Shaji, Reshma
dc.date.accessioned	2025-06-12T18:48:20Z
dc.date.available	2025-06-12T18:48:20Z
dc.date.issued	2025-06-12
dc.date.submitted	2025-06-11
dc.description.abstract	Robust recognition of alphanumeric text mounted on vehicle surfaces presents significant challenges under real-world conditions such as motion blur, out-of-focus imagery, variation in illumination, and compression artifacts. Existing automatic license plate recognition (ALPR) pipelines usually separate detection, enhancement, and recognition into distinct stages and rely either on explicit deblurring networks or on extensive augmentation for generalization; these designs incur latency, error propagation, or a performance ceiling on severely degraded inputs. This study introduces YOLO CRNet, a unified end-to-end multi-task framework built upon the YOLO object detector, designed to simultaneously localize characters, enhance text regions, and perform optical character recognition (OCR) within a single network. We integrate two specialized heads into the YOLO backbone: a reconstruction head that restores degraded text regions, and a classification head that directly recognizes alphanumeric characters. Shared feature representations are extracted from multiple depths of the core YOLO network for synergistic learning across complementary tasks. To inform feature selection for the classifier head, we extract per-character embeddings from five different layer combinations of the YOLO network (ranging from early backbone to deep neck layers) and visualize class separability via t-SNE. This analysis reveals that Configuration A, which combines early backbone layers (1, 2, 4) with neck layers (10, 13, 16), yields the most distinct clusters for the alphanumeric character classes. The YOLO CRNet classifier head trained on Configuration A achieves 95.2% accuracy and a 94.97% F1-score on a held-out set of 10,100 sharp character crops, outperforming alternative layer configurations by up to 18%.
Experiments on blurred text datasets show that YOLO CRNet's combined configuration, reconstruction followed by classification, significantly outperforms both the baseline YOLO detector and the YOLO CRNet classification head on its own. In particular, the combined configuration improves classification accuracy by 23.5 percentage points (from 44.5% to 68.0%) and F1-score by 0.155 (from 0.550 to 0.705). By integrating detection, enhancement, and recognition into a single network guided by t-SNE-based feature selection, YOLO CRNet reduces latency, mitigates error propagation, and explicitly handles image distortions. This work lays a foundation for real-time, robust vehicle text detection and illustrates the power of multi-task learning and data-driven feature analysis in fine-grained text recognition tasks.
dc.identifier.uri	https://hdl.handle.net/10012/21854
dc.language.iso	en
dc.pending	false
dc.publisher	University of Waterloo	en
dc.subject	reconstruction
dc.subject	classification
dc.subject	YOLO
dc.subject	multi-task learning
dc.subject	t-SNE
dc.title	Enhancing YOLO through Multi-Task Learning: Joint Detection, Reconstruction, and Classification of Distorted Text Images
dc.type	Master Thesis
uws-etd.degree	Master of Applied Science
uws-etd.degree.department	Electrical and Computer Engineering
uws-etd.degree.discipline	Electrical and Computer Engineering
uws-etd.degree.grantor	University of Waterloo	en
uws-etd.embargo.terms	0
uws.contributor.advisor	Naik, Kshirasagar
uws.contributor.affiliation1	Faculty of Engineering
uws.peerReviewStatus	Unreviewed	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en
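
The accuracy and F1 figures quoted in the abstract (44.5% to 68.0%, and 0.550 to 0.705) can be read either as an absolute gain or as a gain relative to the starting value. The minimal Python sketch below shows the arithmetic for both readings; the helper name `gains` is hypothetical and is not taken from the thesis.

```python
def gains(before, after):
    """Return (absolute gain, relative gain) for a metric moving from before to after."""
    absolute = after - before
    relative = absolute / before  # gain expressed as a fraction of the starting value
    return absolute, relative

# Figures as quoted in the abstract; `gains` itself is an illustrative helper.
acc_abs, acc_rel = gains(44.5, 68.0)    # accuracy, in percent
f1_abs, f1_rel = gains(0.550, 0.705)    # F1-score, on a 0-1 scale

print(f"accuracy: +{acc_abs:.1f} points absolute, {acc_rel:.1%} relative")
print(f"F1-score: +{f1_abs:.3f} absolute, {f1_rel:.1%} relative")
```

Under this reading, 23.5 is the absolute difference in accuracy (percentage points), while the gain relative to the 44.5% starting point is about 52.8%; for F1, the absolute difference is 0.155 and the relative gain is about 28.2%.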

Files

Original bundle

Name: Shaji_Reshma.pdf
Size: 41.13 MB
Format: Adobe Portable Document Format

Collections

Theses
Electrical and Computer Engineering