Reducing Data Copying Overhead in Web Servers

Yeung, Gary

Reducing Data Copying Overhead in Web Servers

Files

gy-thesis.pdf (589.65 KB)

Date

2010-07-13T13:39:55Z

Authors

Yeung, Gary

Publisher

University of Waterloo

Abstract

Web servers that generate dynamic content are widely used in the development of Internet applications. With the Internet highly connected to people’s lifestyles, the service requirements of Internet applications have increased significantly. This increasing trend intensifies the need to improve server performance in dynamic content generation. In this thesis, we describe the opportunity to improve server performance by co-locating the web server and the application server on the same machine. We identify related work and discuss their respective advantages and deficiencies. We then introduce and explain our technique that passes the client socket’s file descriptor from the web server process to the application server. This allows the application server to reply to the client directly, reducing the amount of data copied and improving server performance. Experiments were designed to evaluate the performance of this technique and provide a detailed analysis of processor time and data copying during response delivery. A performance comparison against alternative approaches has been performed. We analyze the results to understand factors in data copying efficiency and determine that cache misses are an important factor in server performance. There are four major contributions in this thesis. First, we show that in multiprocessor environments, co-locating web servers and application servers can take advantage of faster communication. Second, we introduce a new technique that reduces the amount of data copied by two-thirds. This technique requires no modifications to the application server code (other existing techniques do), and it is also applicable in a variety of systems, allowing easy adoption in production environments. Third, we provide a performance comparison against other approaches and raise questions regarding data copying efficiency. Our technique attains an average peak throughput of 1.27 times the FastCGI with Unix domain sockets in both uniprocessor and multiprocessor environments. Finally, our analysis on the effect of cache misses on server performance provides valuable insights into why these benefits are obtained.