Towards Human-Centered AI-Powered Assistants for the Visually Impaired

Wang, Linda

dc.contributor.advisor	Wong, Alexander
dc.contributor.author	Wang, Linda
dc.date.accessioned	2020-09-02T17:49:28Z
dc.date.available	2020-09-02T17:49:28Z
dc.date.issued	2020-09-02
dc.date.submitted	2020-08-27
dc.description.abstract	Artificial intelligence has become ubiquitous in today's society, aiding us in many everyday tasks. Given the particular prowess of today's AI technologies in visual perception and speech recognition, one area where AI can have tremendous societal impact is assistive technology for the visually impaired. Although assisting the visually impaired with tasks such as environment navigation and item localization improves their independence and autonomy, it also raises privacy concerns. Taking the privacy of personal data into consideration, we present the design of a human-centered AI-powered assistant for object localization for impaired vision (OLIV). OLIV integrates multi-modal perception (custom-designed visual scene understanding, plus speech recognition and synthesis) to help the visually impaired locate misplaced items in indoor environments. OLIV comprises three main components: speech recognition, custom-designed visual scene understanding, and synthesis. Speech recognition allows users to query and interact with the system on their own, increasing their level of independence. Visual scene understanding performs on-device object detection and depth estimation to build a representation of the surrounding 3D scene. Synthesis then combines the detected objects, their locations and depths, and the user's intent to construct a semantic description that is conveyed verbally via speech synthesis. Scene understanding is a key component of OLIV. Current state-of-the-art deep neural networks for its two tasks, object detection and depth estimation, achieve superior performance but require high computation and memory, making them cost-prohibitive for on-device operation. On-device operation is necessary to address privacy concerns related to the misuse of personal data: by performing scene understanding on the device, data captured by the camera remains on the device.
To address the challenge of high computation and memory requirements, two architecture design exploration approaches, micro-architecture exploration and a human-machine collaborative design strategy, are used to design efficient neural networks with an optimal trade-off between accuracy, speed, and size. The micro-architecture exploration approach resulted in a highly compact single-shot network architecture for object detection. The human-machine collaborative design strategy resulted in a highly compact densely-connected encoder-decoder network architecture for monocular depth estimation. In experiments on two indoor datasets selected to simulate the environments OLIV operates in, the object detection network and the depth estimation network achieved CPU speeds of 17 FPS and 9.35 FPS and sizes of 6.99 million and 3.46 million parameters, respectively, while maintaining comparable accuracy. Small size and high speed are important for on-device scene understanding, allowing OLIV to provide more private assistance to the visually impaired.	en
dc.identifier.uri	http://hdl.handle.net/10012/16227
dc.language.iso	en	en
dc.pending	false
dc.publisher	University of Waterloo	en
dc.subject	deep learning	en
dc.subject	human-centered artificial intelligence	en
dc.subject	efficient neural networks	en
dc.subject	perception	en
dc.title	Towards Human-Centered AI-Powered Assistants for the Visually Impaired	en
dc.type	Master Thesis	en
uws-etd.degree	Master of Applied Science	en
uws-etd.degree.department	Systems Design Engineering	en
uws-etd.degree.discipline	Systems Design Engineering	en
uws-etd.degree.grantor	University of Waterloo	en
uws.contributor.advisor	Wong, Alexander
uws.contributor.affiliation1	Faculty of Engineering	en
uws.peerReviewStatus	Unreviewed	en
uws.published.city	Waterloo	en
uws.published.country	Canada	en
uws.published.province	Ontario	en
uws.scholarLevel	Graduate	en
uws.typeOfResource	Text	en
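
The abstract describes the synthesis step as combining detected objects, their locations and depths, and the user's intent into a verbal description. The sketch below illustrates that idea under stated assumptions; it is not OLIV's implementation. The `Detection` record, the `horizontal_position` and `describe` helpers, the split of the image width into thirds, and the choice of the nearest matching object are all hypothetical, and depth is assumed to be in metres.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    """Hypothetical fused output of detection + depth estimation."""
    label: str        # object class predicted by the detector
    center_x: float   # horizontal center of the bounding box, in pixels (assumed)
    depth_m: float    # estimated depth of the object, in metres (assumed)


def horizontal_position(center_x: float, image_width: float) -> str:
    """Map a box center to a coarse direction by splitting the image into thirds."""
    ratio = center_x / image_width
    if ratio < 1 / 3:
        return "to your left"
    if ratio > 2 / 3:
        return "to your right"
    return "straight ahead"


def describe(query_label: str, detections: List[Detection], image_width: float) -> str:
    """Build a sentence for the queried object, choosing the nearest match if several exist."""
    matches = [d for d in detections if d.label == query_label]
    if not matches:
        return f"I could not find the {query_label}."
    nearest = min(matches, key=lambda d: d.depth_m)
    direction = horizontal_position(nearest.center_x, image_width)
    return f"The {query_label} is {direction}, about {nearest.depth_m:.1f} metres away."


if __name__ == "__main__":
    scene = [Detection("cup", 100.0, 1.5), Detection("phone", 600.0, 2.0)]
    # The query label stands in for the user's intent after speech recognition.
    print(describe("cup", scene, 640.0))   # The cup is to your left, about 1.5 metres away.
    print(describe("keys", scene, 640.0))  # I could not find the keys.
```

In a full system the returned string would be passed to a speech synthesizer; a coarse left/center/right split keeps the spoken output short, at the cost of angular precision.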

Files

Original bundle

Name: Wang_Linda.pdf
Size: 5.34 MB
Format: Adobe Portable Document Format

Collections

Theses
Systems Design Engineering