Multistroke Character Recognition Using Orthogonal Polynomial Representations

dc.contributor.authorCheriakara Joseph, Arun
dc.date.accessioned2026-06-30T19:14:30Z
dc.date.available2026-06-30T19:14:30Z
dc.date.issued2026-06-30
dc.date.submitted2026-06-26
dc.description.abstractThis thesis studies stroke grouping for online word-level handwriting recognition of Latin letters and digits using orthogonal polynomial representations of pen strokes. A word arrives as an ordered sequence of pen-down strokes, and the system has to decide which strokes belong to which character before it can decide what each character is. At the word level the problem is harder than for isolated characters: the right grouping of strokes depends on what the characters turn out to be, and the right characters depend on how the strokes are grouped. Most existing systems commit to a single segmentation and classify whatever it produces, so a grouping error propagates directly into the recognized text. The difficulty is sharpened by characters drawn with multiple strokes, by variation in stroke order between writers, and by several letter pairs and letter/digit pairs that share the same shape. This thesis describes an online word-level recognition pipeline built on orthogonal polynomial representations of multistroke characters. Each pen stroke is re-parameterized by arc length, and its coordinate functions are projected onto an orthogonal Legendre basis of degree eleven, giving a fixed-length coefficient vector per stroke. For multistroke characters, the per-stroke vectors are concatenated into a single feature vector. Because all strokes in a character are normalized together against a shared bounding box, this block-concatenated representation captures the relative position and scale of the strokes within the character, but it does not directly encode every pairwise relationship between strokes. A probabilistic gap model generates up to six candidate groupings per word, and each candidate character group is normalized in a common bounding box before projection. The resulting vectors are matched against a reference database of 76,428 samples across 62 character labels, organized into 3,237 classes.
Classification runs in two stages: a centroid-and-radius heuristic prunes the candidate pool to fifty classes, and a label-pooled k-nearest-neighbour stage then takes the seven closest samples for each label and ranks the labels by distance to the convex hull of those samples. The pipeline is evaluated on the UniPen word collection drawn from the 62-character Latin-plus-digits alphabet.
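The per-stroke representation described in the abstract (arc-length re-parameterization followed by projection onto Legendre polynomials up to degree eleven) can be illustrated with a minimal sketch. This is not the thesis code: the function names, the trapezoid-rule integration, the sample count of 256, and the choice to list the x coefficients before the y coefficients are all assumptions made for illustration, and the shared bounding-box normalization is left out.

```python
import math


def resample_by_arc_length(points, n=256):
    """Return n points equally spaced in arc length along a polyline."""
    # Cumulative arc length at each input point.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    if total == 0.0:
        return [points[0]] * n  # degenerate stroke: a single dot
    out, j = [], 0
    for i in range(n):
        s = total * i / (n - 1)
        # Advance to the segment that contains arc length s.
        while j < len(cum) - 2 and cum[j + 1] < s:
            j += 1
        seg = cum[j + 1] - cum[j]
        t = 0.0 if seg == 0.0 else (s - cum[j]) / seg
        (x0, y0), (x1, y1) = points[j], points[j + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def legendre_values(degree, u):
    """Evaluate P_0(u), ..., P_degree(u) with Bonnet's recurrence."""
    p = [1.0, u]
    for k in range(1, degree):
        p.append(((2 * k + 1) * u * p[k] - k * p[k - 1]) / (k + 1))
    return p[: degree + 1]


def legendre_coefficients(values, degree=11):
    """Project uniformly sampled values on [-1, 1] onto P_0..P_degree.

    c_k = (2k + 1) / 2 * integral of f(u) * P_k(u) du, approximated
    here with the trapezoid rule (an assumption of this sketch).
    """
    n = len(values)
    h = 2.0 / (n - 1)
    coeffs = [0.0] * (degree + 1)
    for i, f in enumerate(values):
        u = -1.0 + i * h
        w = h / 2 if i in (0, n - 1) else h  # trapezoid end weights
        for k, pk in enumerate(legendre_values(degree, u)):
            coeffs[k] += (2 * k + 1) / 2 * w * f * pk
    return coeffs


def stroke_features(points, degree=11, n=256):
    """Fixed-length vector: x coefficients followed by y coefficients."""
    pts = resample_by_arc_length(points, n)
    xs = legendre_coefficients([p[0] for p in pts], degree)
    ys = legendre_coefficients([p[1] for p in pts], degree)
    return xs + ys
```

Under these assumptions a degree-eleven projection yields 12 coefficients per coordinate, so `stroke_features` returns 24 numbers for any stroke regardless of how many pen samples it contains. A multistroke character would then be represented by concatenating the vectors of its strokes, as the abstract describes.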
dc.identifier.urihttps://hdl.handle.net/10012/23680
dc.language.isoen
dc.pendingfalse
dc.publisherUniversity of Waterloo
dc.subjectmulti-stroke recognition
dc.subjectonline handwriting recognition
dc.subjectorthogonal polynomial representation
dc.subjectLegendre coefficients
dc.subjectconvex-hull KNN
dc.subjecttrace grouping
dc.subjectUniPen
dc.titleMultistroke Character Recognition Using Orthogonal Polynomial Representations
dc.typeMaster Thesis
uws-etd.degreeMaster of Mathematics
uws-etd.degree.departmentDavid R. Cheriton School of Computer Science
uws-etd.degree.disciplineComputer Science
uws-etd.degree.grantorUniversity of Waterloo
uws-etd.embargo.terms0
uws.contributor.advisorWatt, Stephen
uws.contributor.affiliation1Faculty of Mathematics
uws.peerReviewStatusUnreviewed
uws.published.cityWaterloo
uws.published.countryCanada
uws.published.provinceOntario
uws.scholarLevelGraduate
uws.typeOfResourceText

Files

Original bundle

Name: CheriakaraJoseph_Arun.pdf
Size: 862.43 KB
Format: Adobe Portable Document Format
