Website Fingerprinting: Attacks and Defenses

Wang, Tao

Files

Wang_Tao.pdf (1015.56 KB)

Date

2016-01-13

Authors

Wang, Tao

Advisor

Goldberg, Ian

Publisher

University of Waterloo

Abstract

Website fingerprinting attacks allow a local, passive eavesdropper to determine a client's web activity by leveraging features from her packet sequence. These attacks break the privacy expected by users of privacy technologies, including low-latency anonymity networks such as proxies, VPNs, or Tor. As a discipline, website fingerprinting is an application of machine learning techniques to the diverse field of privacy. To perform a website fingerprinting attack, the eavesdropping attacker passively records the time, direction, and size of the client's packets. Then, he uses a machine learning algorithm to classify the packet sequence so as to determine the web page it came from.

In this work we construct and evaluate three new website fingerprinting attacks: Wa-OSAD, an attack using a modified edit distance as the kernel of a Support Vector Machine, achieving greater accuracy than attacks before it; Wa-FLev, an attack that quickly approximates an edit distance computation, allowing a low-resource attacker to deanonymize many clients at once; and Wa-kNN, the current state-of-the-art attack, which is effective and fast, with a very low false positive rate in the open-world scenario. While our new attacks perform well in theoretical scenarios, there are significant differences between the situation in the wild and in the laboratory. Specifically, we tackle concerns regarding the freshness of the training set, splitting packet sequences so that each part corresponds to one web page access (for easy classification), and removing misleading noise from the packet sequence.

To defend ourselves against such attacks, we need defenses that are both efficient and provable. We rigorously define and motivate the notion of a provable defense in this work, and we present three new provable defenses: Tamaraw, which is a relatively efficient way to flood the channel with fixed-rate packet scheduling; Supersequence, which uses smallest common supersequences to save on bandwidth overhead; and Walkie-Talkie, which uses half-duplex communication to significantly reduce both bandwidth and time overhead, allowing a truly efficient yet provable defense.
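
The attack pipeline the abstract describes (record the time, direction, and size of each packet, then classify the resulting sequence) can be illustrated with a minimal sketch. This is not Wa-OSAD, Wa-FLev, or Wa-kNN: it assumes a plain Levenshtein distance and a 1-nearest-neighbor rule in place of the thesis's modified edit distance, SVM kernel, and weighted k-NN. The packet layout, the function names, and the page labels and traces are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

# Assumed packet layout: (timestamp_seconds, direction, size_bytes).
# direction is +1 for outgoing (client to server) and -1 for incoming.
Packet = Tuple[float, int, int]


def to_symbols(trace: List[Packet]) -> List[int]:
    """Drop timestamps and encode each packet as a signed size."""
    return [direction * size for _, direction, size in trace]


def edit_distance(a: List[int], b: List[int]) -> int:
    """Plain Levenshtein distance between two symbol sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def classify(trace: List[Packet],
             training: List[Tuple[str, List[Packet]]]) -> Optional[str]:
    """Return the label of the closest training trace (1-nearest neighbor)."""
    symbols = to_symbols(trace)
    best_label, best_dist = None, None
    for label, example in training:
        dist = edit_distance(symbols, to_symbols(example))
        if best_dist is None or dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Hypothetical labeled traces an attacker might have collected in advance.
TRAINING = [
    ("page-a", [(0.00, 1, 512), (0.05, -1, 1500),
                (0.06, -1, 1500), (0.10, 1, 512)]),
    ("page-b", [(0.00, 1, 512), (0.04, -1, 1500), (0.09, 1, 512),
                (0.12, -1, 600), (0.13, -1, 1500)]),
]

# Hypothetical trace observed from the client.
OBSERVED = [(0.00, 1, 512), (0.07, -1, 1500), (0.08, -1, 1500),
            (0.11, 1, 512), (0.15, -1, 1500)]

print(classify(OBSERVED, TRAINING))
```

The sketch discards timing and compares only direction and size, which is a simplification. It also ignores the open-world scenario and the practical concerns the abstract raises, such as training-set freshness, splitting sequences by page access, and noise removal.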