Towards Human-Centered AI-Powered Assistants for the Visually Impaired

dc.contributor.author: Wang, Linda
dc.date.accessioned: 2020-09-02T17:49:28Z
dc.date.available: 2020-09-02T17:49:28Z
dc.date.issued: 2020-09-02
dc.date.submitted: 2020-08-27
dc.description.abstract: Artificial intelligence has become ubiquitous in today's society, aiding us in many everyday tasks. Given the particular prowess of today's AI technologies in visual perception and speech recognition, one area where AI can have tremendous societal impact is assistive technology for the visually impaired. Although assisting the visually impaired with tasks such as environment navigation and item localization improves their independence and autonomy, it also raises concerns over privacy. Taking the privacy of personal data into consideration, we present the design of a human-centered AI-powered assistant for object localization for impaired vision (OLIV). OLIV integrates multi-modal perception (custom-designed visual scene understanding, together with speech recognition and synthesis) to assist the visually impaired in locating misplaced items in indoor environments. OLIV consists of three main components: speech recognition, custom-designed visual scene understanding, and synthesis. Speech recognition allows users to query and interact with the system on their own, increasing their level of independence. Visual scene understanding performs on-device object detection and depth estimation to build a representation of the surrounding 3D scene. Synthesis then combines the detected objects, their locations and depths, and the user's intent to construct a semantic description that is conveyed to the user through speech synthesis. Scene understanding is a particularly important component of OLIV. Current state-of-the-art deep neural networks for object detection and depth estimation achieve superior performance, but they require high computation and memory, making them cost-prohibitive for on-device operation. On-device operation is necessary to address privacy concerns related to the misuse of personal data: by performing scene understanding on the device, data captured by the camera remains on the device.
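The three-stage flow described above (speech recognition, scene understanding, synthesis) can be illustrated with a minimal Python sketch of the synthesis stage. It assumes that speech recognition has already produced a text transcript and that scene understanding has already produced labelled bounding boxes and a per-pixel depth map in metres. The names `Detection`, `parse_intent`, `object_depth`, `direction`, and `describe`, the keyword-matching intent parser, the median-depth rule, and the left/ahead/right split by image thirds are all hypothetical choices for illustration, not OLIV's actual implementation.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels, upper bounds exclusive


@dataclass
class Detection:
    """Hypothetical detector output: a class label and its bounding box."""
    label: str
    box: Box


def parse_intent(query: str, known_labels: List[str]) -> Optional[str]:
    """Hypothetical intent step: return the first single-word known label in the transcript."""
    words = query.lower().replace("?", "").split()
    for label in known_labels:
        if label in words:
            return label
    return None


def object_depth(depth_map: List[List[float]], box: Box) -> float:
    """Median depth (metres) over the depth-map pixels inside the box."""
    x0, y0, x1, y1 = box
    values = [depth_map[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return median(values)


def direction(box: Box, image_width: int) -> str:
    """Coarse direction from the box centre: left, middle, or right third of the frame."""
    cx = (box[0] + box[2]) / 2
    if cx < image_width / 3:
        return "to your left"
    if cx > 2 * image_width / 3:
        return "to your right"
    return "straight ahead"


def describe(query: str, detections: List[Detection],
             depth_map: List[List[float]], known_labels: List[str]) -> str:
    """Combine user intent, detections, and depth into a sentence for speech synthesis."""
    target = parse_intent(query, known_labels)
    if target is None:
        return "Sorry, I did not understand which item you are looking for."
    width = len(depth_map[0])
    for det in detections:
        if det.label == target:
            dist = object_depth(depth_map, det.box)
            return f"The {target} is {direction(det.box, width)}, about {dist:.1f} metres away."
    return f"I could not find the {target}."
```

For example, with a 4×6 depth map filled with 1.5 m and a `Detection("cup", (4, 0, 6, 2))`, the query "Where is my cup?" yields a sentence placing the cup to the user's right at about 1.5 metres. A real system would replace the keyword matcher with a proper intent model and handle multi-word labels.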
To address the challenge of high computation and memory requirements, two architecture design exploration approaches, micro-architecture exploration and a human-machine collaborative design strategy, are used to design efficient neural networks with an optimal trade-off between accuracy, speed, and size. The micro-architecture exploration approach resulted in a highly compact single-shot network architecture for object detection. The human-machine collaborative design strategy resulted in a highly compact densely-connected encoder-decoder network architecture for monocular depth estimation. In experiments on two indoor datasets chosen to simulate the environments OLIV operates in, the object detection and depth estimation networks achieved CPU speeds of 17 FPS and 9.35 FPS and sizes of 6.99 million and 3.46 million parameters, respectively, while maintaining comparable accuracy. Small size and high speed are important for on-device scene understanding in OLIV, which provides more private assistance to the visually impaired.
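The reported speeds and sizes can be turned into per-frame latency and approximate weight storage with simple arithmetic. The sketch below assumes 32-bit floating-point weights (4 bytes per parameter) and, for the combined figure, that both networks run one after the other on the same CPU for every frame; neither assumption is stated in the abstract, and an actual deployment may schedule the networks differently. Under these assumptions the networks need about 59 ms and 107 ms per frame, hold roughly 28 MB and 14 MB of weights, and together run at about 6 FPS. The helper names are hypothetical.

```python
from typing import Iterable

# Figures reported in the abstract (CPU speed, parameter count).
MODELS = {
    "object detection": {"fps": 17.0, "params_millions": 6.99},
    "depth estimation": {"fps": 9.35, "params_millions": 3.46},
}
BYTES_PER_PARAM = 4  # assumption: weights stored as 32-bit floats


def latency_ms(fps: float) -> float:
    """Time spent per frame, in milliseconds."""
    return 1000.0 / fps


def weight_size_mb(params_millions: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Millions of parameters times bytes per parameter gives decimal megabytes."""
    return params_millions * bytes_per_param


def sequential_fps(fps_values: Iterable[float]) -> float:
    """Throughput if the networks run back to back on each frame (assumption)."""
    total_ms = sum(latency_ms(f) for f in fps_values)
    return 1000.0 / total_ms


if __name__ == "__main__":
    for name, m in MODELS.items():
        print(f"{name}: {latency_ms(m['fps']):.1f} ms/frame, "
              f"~{weight_size_mb(m['params_millions']):.2f} MB of weights")
    combined = sequential_fps(m["fps"] for m in MODELS.values())
    print(f"both, run sequentially: ~{combined:.2f} FPS")
```

Lower-precision storage (for example 8-bit quantized weights) would shrink the size estimate proportionally; the per-frame latencies come directly from the reported FPS.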
dc.identifier.uri: http://hdl.handle.net/10012/16227
dc.language.iso: en
dc.pending: false
dc.publisher: University of Waterloo
dc.subject: deep learning
dc.subject: human-centered artificial intelligence
dc.subject: efficient neural networks
dc.subject: perception
dc.title: Towards Human-Centered AI-Powered Assistants for the Visually Impaired
dc.type: Master Thesis
uws-etd.degree: Master of Applied Science
uws-etd.degree.department: Systems Design Engineering
uws-etd.degree.discipline: System Design Engineering
uws-etd.degree.grantor: University of Waterloo
uws.contributor.advisor: Wong, Alexander
uws.contributor.affiliation1: Faculty of Engineering
uws.peerReviewStatus: Unreviewed
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.scholarLevel: Graduate
uws.typeOfResource: Text

Files

Original bundle

Name: Wang_Linda.pdf
Size: 5.34 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 6.4 KB
Format: Item-specific license agreed to upon submission