Multistroke Character Recognition Using Orthogonal Polynomial Representations

Cheriakara Joseph, Arun

dc.contributor.author	Cheriakara Joseph, Arun
dc.date.accessioned	2026-06-30T19:14:30Z
dc.date.available	2026-06-30T19:14:30Z
dc.date.issued	2026-06-30
dc.date.submitted	2026-06-26
dc.description.abstract	This thesis studies stroke grouping for online word-level handwriting recognition of Latin letters and digits using orthogonal polynomial representations of pen strokes. A word arrives as an ordered sequence of pen-down strokes, and the system has to decide which strokes belong to which character before it can decide what each character is. At the word level the problem is harder than for isolated characters: the right grouping of strokes depends on what the characters turn out to be, and the right characters depend on how the strokes are grouped. Most existing systems commit to one segmentation and use whatever that segmentation outputs, so any grouping error carries through to the recognized characters. The difficulty is sharpened by characters drawn with multiple strokes, by variation in stroke order between writers, and by several letter pairs and letter/digit pairs that share the same shape. This thesis describes an online word-level recognition pipeline built on orthogonal polynomial representations of multistroke characters. Each pen stroke is re-parameterized by arc length, and its coordinate functions are projected onto an orthogonal Legendre basis of degree eleven, giving a fixed-length coefficient vector per stroke. For multistroke characters, the per-stroke vectors are concatenated into a single feature vector. Because all strokes in a character are normalized together against a shared bounding box, this block-concatenated representation captures the relative position and scale of the strokes within the character, but it does not directly encode every pairwise relationship between strokes. A probabilistic gap model generates up to six candidate groupings per word, and each candidate character group is normalized in a common bounding box before projection. The resulting vectors are matched against a reference database of 76,428 samples across 62 character labels, organized into 3,237 classes.
Classification runs in two stages: a centroid-and-radius heuristic prunes the candidate pool to fifty classes, and a label-pooled k-nearest-neighbour stage then ranks the seven closest samples per label by distance to the convex hull of those samples. The pipeline is evaluated on the UniPen word collection drawn from the 62-character Latin-plus-digits alphabet.
dc.identifier.uri	https://hdl.handle.net/10012/23680
dc.language.iso	en
dc.pending	false
dc.publisher	University of Waterloo	en
dc.subject	multi-stroke recognition
dc.subject	online handwriting recognition
dc.subject	orthogonal polynomial representation
dc.subject	Legendre coefficients
dc.subject	convex-hull KNN
dc.subject	trace grouping
dc.subject	UniPen
dc.title	Multistroke Character Recognition Using Orthogonal Polynomial Representations
dc.type	Master Thesis
uws-etd.degree	Master of Mathematics
uws-etd.degree.department	David R. Cheriton School of Computer Science
uws-etd.degree.discipline	Computer Science
uws-etd.degree.grantor	University of Waterloo	en
uws-etd.embargo.terms	0
uws.contributor.advisor	Watt, Stephen
uws.contributor.affiliation1	Faculty of Mathematics
uws.peerReviewStatus	Unreviewed	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en
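
The stroke representation described in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the function names, the least-squares fit via NumPy's `legfit` (standing in for whatever projection the thesis uses), the scaling by the longer bounding-box side, and the layout of the output vector (x coefficients followed by y coefficients) are all assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial import legendre


def stroke_to_legendre(points, degree=11):
    """Approximate x(s) and y(s) of one stroke by Legendre series.

    Hypothetical helper: returns the x coefficients followed by the
    y coefficients, i.e. 2 * (degree + 1) values per stroke.
    """
    pts = np.asarray(points, dtype=float)
    # Cumulative arc length along the sampled polyline, starting at 0.
    seg = np.hypot(np.diff(pts[:, 0]), np.diff(pts[:, 1]))
    s = np.concatenate(([0.0], np.cumsum(seg)))
    if s[-1] == 0.0:
        raise ValueError("stroke has zero length")
    # Map arc length onto [-1, 1], the domain of the Legendre basis.
    t = 2.0 * s / s[-1] - 1.0
    # Least-squares fit (an assumption; the thesis may project differently).
    cx = legendre.legfit(t, pts[:, 0], degree)
    cy = legendre.legfit(t, pts[:, 1], degree)
    return np.concatenate((cx, cy))


def character_to_vector(strokes, degree=11):
    """Normalize all strokes in one shared bounding box, then concatenate.

    Scaling by the longer side of the box is an assumed convention.
    """
    arrays = [np.asarray(s, dtype=float) for s in strokes]
    all_pts = np.vstack(arrays)
    lo = all_pts.min(axis=0)
    scale = (all_pts.max(axis=0) - lo).max()
    if scale == 0.0:
        raise ValueError("character has zero extent")
    parts = [stroke_to_legendre((a - lo) / scale, degree) for a in arrays]
    return np.concatenate(parts)
```

Because every stroke is shifted and scaled by the same bounding box before fitting, the low-order coefficients of each block retain where that stroke sits inside the character, which is the property the abstract attributes to the block-concatenated representation. Strokes should be sampled with more points than the degree, otherwise the fit is underdetermined.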

Files

Original bundle

Name: CheriakaraJoseph_Arun.pdf
Size: 862.43 KB
Format: Adobe Portable Document Format

Collections

Theses
Computer Science