Towards Private Biometric Authentication and Identification

Date

2023-09-05

Authors

Gold, Jonathan

Advisor

Menezes, Alfred
Karabina, Koray

Publisher

University of Waterloo

Abstract

Handwriting and speech are important parts of our everyday lives. Handwriting recognition is the task of recognizing written text, whether letters, words, or equations, from given data. Handwriting can be analyzed either from static images or from sensor recordings of the writing process. Handwriting recognition algorithms can be used in many applications, including signature verification, electronic document processing, and e-security and e-health related tasks. The OnHW datasets are a collection of datasets that use various sensors to capture the writing of characters, words, symbols, and equations, recorded as multivariate time series. We begin by developing character recognition models, targeting letters (and later symbols), trained and tested on the OnHW-chars dataset (and later the split OnHW-equations dataset). Our models improve upon the accuracy of the previous best results on both datasets. Our machine learning (ML) models provide 11.3%-23.56% improvements over the previous best ML models. Using deep learning (DL) together with ensemble techniques, we improve on the best previous models by 3.08%-7.01%. In addition to the accuracy improvements, we aim to provide some level of explainability, using a version of LIME specialized for time series data. These explanations offer some rationale for why the models make sense for the data, and why ensemble methods may help improve accuracy on this task. To verify the robustness of our models trained on the OnHW-chars dataset, we trained our DL models with the same model parameters on the more recently published OnHW-equations dataset. Our DL models with ensemble learning provide 0.05%-4.75% improvements over the previous best DL models.
While the character recognition task has many applications, a service built on it must consider user privacy, since handwriting is biometric data and contains private information. We therefore design a framework that uses multiparty computation (MPC) to keep users' handwritten data private when providing a character recognition service. We implement this framework in the CrypTen MPC framework, using models trained on public data to perform private inference on hidden user data. We report the accuracy difference of the models when making inference using MPC, as well as the costs associated with performing this inference. We found a 0.55%-1.42% accuracy difference between plaintext inference and inference with MPC. Next, we pivot to writer identification, which involves identifying the writer of some handwritten text. We use the OnHW-equations dataset for our analysis, which, at the time of writing, had not previously been used for this task. We first analyze and reformat the data to fit the writer identification task and to remove bias. Using DL models, we obtain accuracies of up to 91.57% in identifying writers from their handwriting. As with private inference in the character recognition task, it is important to account for user privacy when training writer identification models and making inference. We design and implement a framework for private training and inference for the writer recognition task, using the CrypTen MPC framework. Since training these models is very costly, we use simpler CNNs for private writer recognition. The chosen CNN, trained privately in MPC, obtained an accuracy of 77.45%. We then analyze the costs associated with privately training this CNN and other CNNs with altered model architectures. Finally, we switch to explore voice as a biometric in the speaker verification task.
As with handwriting, a person's voice contains unique characteristics which can be used to determine the speaker. Voice can be analyzed similarly to handwriting, in that we can explore the speech recognition and speaker identification tasks, and it also comes with similar privacy risks for users. We design and implement a unique framework for private speaker verification using the MP-SPDZ MPC framework. We analyze the costs associated with training the model and making inferences, with our main goal being to determine the time it takes to make private inference. We then used these times as part of a survey conducted to determine how much people value the privacy of their biometrics and how long they are willing to wait for the increased privacy. We found that people were willing to tolerate significant time delays in order to privately authenticate themselves when primed with the benefits of using MPC for privacy.
