Understanding and Efficiently Servicing HTTP Streaming Video Workloads
MetadataShow full item record
Live and on-demand video streaming has emerged as the most popular application for the Internet. One reason for this success is the pragmatic decision to use HTTP to deliver video content. However, while all web servers are capable of servicing HTTP streaming video workloads, web servers were not originally designed or optimized for video workloads. Web server research has concentrated on requests for small items that exhibit high locality, while video files are much larger and have a popularity distribution with a long tail of less popular content. Given the large number of servers needed to service millions of streaming video clients, there are large potential benefits from even small improvements in servicing HTTP streaming video workloads. To investigate how web server implementations can be improved, we require a benchmark to analyze existing web servers and test alternate implementations, but no such HTTP streaming video benchmark exists. One reason for the lack of a benchmark is that video delivery is undergoing rapid evolution, so we devise a flexible methodology and tools for creating benchmarks that can be readily adapted to changes in HTTP video streaming methods. Using our methodology, we characterize YouTube traffic from early 2011 using several published studies and implement a benchmark to replicate this workload. We then demonstrate that three different widely-used web servers (Apache, nginx and the userver) are all poorly suited to servicing streaming video workloads. We modify the userver to use asynchronous serialized aggressive prefetching (ASAP). Aggressive prefetching uses a single large disk access to service multiple small sequential requests, and serialization prevents the kernel from interleaving disk accesses, which together greatly increase throughput. Using the modified userver, we show that characteristics of the workload and server affect the best prefetch size to use and we provide an algorithm that automatically finds a good prefetch size for a variety of workloads and server configurations. We conduct our own characterization of an HTTP streaming video workload, using server logs obtained from Netflix. We study this workload because, in 2015, Netflix alone accounted for 37% of peak period North American Internet traffic. Netflix clients employ DASH (Dynamic Adaptive Streaming over HTTP) to switch between different bit rates based on changes in network and server conditions. We introduce the notion of chains of sequential requests to represent the spatial locality of workloads and find that even with DASH clients, the majority of bytes are requested sequentially. We characterize rate adaptation by separating sessions into transient, stable and inactive phases, each with distinct patterns of requests. We find that playback sessions are surprisingly stable; in aggregate, 5% of total session duration is spent in transient phases, 79% in stable and 16% in inactive phases. Finally we evaluate prefetch algorithms that exploit knowledge about workload characteristics by simulating the servicing of the Netflix workload. We show that the workload can be serviced with either 13% lower hard drive utilization or 48% less system memory than a prefetch algorithm that makes no use of workload characteristics.