From b5b54f34157608652bb52de081baa6e19f1a3e6d Mon Sep 17 00:00:00 2001
From: Brian Pane
Date: Sun, 16 Jun 2002 22:55:11 +0000
Subject: [PATCH] More changes to bring the performance tuning guide closer to
 compatibility with httpd-2.0:
 - Updated the sample system call trace with a 2.0/worker example
 - Removed the section on the preforking model

git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@95719 13f79535-47bb-0310-9956-ffa450edef68
---
 docs/manual/misc/perf-tuning.html | 444 ++++++++----------------------
 1 file changed, 118 insertions(+), 326 deletions(-)

diff --git a/docs/manual/misc/perf-tuning.html b/docs/manual/misc/perf-tuning.html
index f0864f75bf..fbe23faef1 100644
--- a/docs/manual/misc/perf-tuning.html
+++ b/docs/manual/misc/perf-tuning.html
@@ -41,8 +41,6 @@
@@ -705,358 +703,152 @@ DirectoryIndex index.cgi index.pl index.shtml index.html

Appendix: Detailed Analysis of a Trace

- Here is a system call trace of Apache 1.3 running on Linux. The - run-time configuration file is essentially the default plus: +

Here is a system call trace of Apache 2.0.38 with the worker MPM + on Solaris 8. This trace was collected using:

+
+truss -l -p httpd_child_pid. +
+

The -l option tells truss to log the ID of the + LWP (lightweight process--Solaris's form of kernel-level thread) + that invokes each system call.

+ +

Other systems may have different system call tracing utilities + such as strace, ktrace, or par. + They all produce similar output.

+ +

In this trace, a client has requested a 10KB static file + from the httpd. Traces of non-static requests or requests + with content negotiation look wildly different (and quite ugly + in some cases).

-<Directory />
-    AllowOverride none
-    Options FollowSymLinks
-</Directory>
+/67:    accept(3, 0x00200BEC, 0x00200C0C, 1) (sleeping...)
+/67:    accept(3, 0x00200BEC, 0x00200C0C, 1)            = 9
 
-
- The file being requested is a static 6K file of no particular - content. Traces of non-static requests or requests with content - negotiation look wildly different (and quite ugly in some - cases). First the entire trace, then we'll examine details. - (This was generated by the strace program, other - similar programs include truss, - ktrace, and par.) - -
+
+

In this trace, the listener thread is running within LWP #67.

+

Note the lack of accept(2) serialization. On this particular +platform, the worker MPM uses an unserialized accept by default +unless it is listening on multiple ports.

+
-accept(15, {sin_family=AF_INET, sin_port=htons(22283), sin_addr=inet_addr("127.0.0.1")}, [16]) = 3
-flock(18, LOCK_UN)                      = 0
-sigaction(SIGUSR1, {SIG_IGN}, {0x8059954, [], SA_INTERRUPT}) = 0
-getsockname(3, {sin_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
-setsockopt(3, IPPROTO_TCP1, [1], 4)     = 0
-read(3, "GET /6k HTTP/1.0\r\nUser-Agent: "..., 4096) = 60
-sigaction(SIGUSR1, {SIG_IGN}, {SIG_IGN}) = 0
-time(NULL)                              = 873959960
-gettimeofday({873959960, 404935}, NULL) = 0
-stat("/home/dgaudet/ap/apachen/htdocs/6k", {st_mode=S_IFREG|0644, st_size=6144, ...}) = 0
-open("/home/dgaudet/ap/apachen/htdocs/6k", O_RDONLY) = 4
-mmap(0, 6144, PROT_READ, MAP_PRIVATE, 4, 0) = 0x400ee000
-writev(3, [{"HTTP/1.1 200 OK\r\nDate: Thu, 11"..., 245}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 6144}], 2) = 6389
-close(4)                                = 0
-time(NULL)                              = 873959960
-write(17, "127.0.0.1 - - [10/Sep/1997:23:39"..., 71) = 71
-gettimeofday({873959960, 417742}, NULL) = 0
-times({tms_utime=5, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 446747
-shutdown(3, 1 /* send */)               = 0
-oldselect(4, [3], NULL, [3], {2, 0})    = 1 (in [3], left {2, 0})
-read(3, "", 2048)                       = 0
-close(3)                                = 0
-sigaction(SIGUSR1, {0x8059954, [], SA_INTERRUPT}, {SIG_IGN}) = 0
-munmap(0x400ee000, 6144)                = 0
-flock(18, LOCK_EX)                      = 0
+/65:    lwp_park(0x00000000, 0)                         = 0
+/67:    lwp_unpark(65, 1)                               = 0
 
-
- -

Notice the accept serialization:

- -
+
+

Upon accepting the connection, the listener thread wakes up +a worker thread to do the request processing. In this trace, +the worker thread that handles the request is mapped to LWP #65.

+
-flock(18, LOCK_UN)                      = 0
-...
-flock(18, LOCK_EX)                      = 0
+/65:    getsockname(9, 0x00200BA4, 0x00200BC4, 1)       = 0
 
-
- These two calls can be removed by defining - SINGLE_LISTEN_UNSERIALIZED_ACCEPT as described - earlier. - -

Notice the SIGUSR1 manipulation:

- -
+
+

In order to implement virtual hosts, Apache needs to know +the local socket address used to accept the connection. It +is possible to eliminate this call in many situations (such +as when there are no virtual hosts, or when Listen +directives are used which do not have wildcard addresses). But +no effort has yet been made to do these optimizations.
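As a rough illustration of why getsockname(2) is needed, the following Python sketch binds a listener to a wildcard address and then asks the accepted socket which local address the client actually reached. It is not httpd code; the function names `local_address_of_accepted` and `demo_getsockname` are hypothetical, and the loopback client exists only to make the example self-contained.

```python
import socket


def local_address_of_accepted(listener: socket.socket) -> tuple:
    """Accept one connection and return the local address it arrived on."""
    conn, _peer = listener.accept()
    try:
        # The listener is bound to a wildcard address, so only the accepted
        # socket knows which concrete local address the client connected to.
        return conn.getsockname()
    finally:
        conn.close()


def demo_getsockname() -> tuple:
    """Hypothetical demo: wildcard listener plus a loopback client."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("0.0.0.0", 0))  # wildcard address; the kernel picks a port
    listener.listen(1)
    port = listener.getsockname()[1]
    client = socket.create_connection(("127.0.0.1", port))
    try:
        return local_address_of_accepted(listener), port
    finally:
        client.close()
        listener.close()
```

A listener bound to a single concrete address already knows the answer, which is why the call could in principle be skipped in that configuration.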

+
-sigaction(SIGUSR1, {SIG_IGN}, {0x8059954, [], SA_INTERRUPT}) = 0
-...
-sigaction(SIGUSR1, {SIG_IGN}, {SIG_IGN}) = 0
-...
-sigaction(SIGUSR1, {0x8059954, [], SA_INTERRUPT}, {SIG_IGN}) = 0
+/65:    brk(0x002170E8)                                 = 0
+/65:    brk(0x002190E8)                                 = 0
 
-
- This is caused by the implementation of graceful restarts. When - the parent receives a SIGUSR1 it sends a - SIGUSR1 to all of its children (and it also - increments a "generation counter" in shared memory). Any - children that are idle (between connections) will immediately - die off when they receive the signal. Any children that are in - keep-alive connections, but are in between requests will die - off immediately. But any children that have a connection and - are still waiting for the first request will not die off - immediately. - -

To see why this is necessary, consider how a browser reacts - to a closed connection. If the connection was a keep-alive - connection and the request being serviced was not the first - request then the browser will quietly reissue the request on a - new connection. It has to do this because the server is always - free to close a keep-alive connection in between requests - (i.e., due to a timeout or because of a maximum number - of requests). But, if the connection is closed before the first - response has been received the typical browser will display a - "document contains no data" dialogue (or a broken image icon). - This is done on the assumption that the server is broken in - some way (or maybe too overloaded to respond at all). So Apache - tries to avoid ever deliberately closing the connection before - it has sent a single response. This is the cause of those - SIGUSR1 manipulations.

- -

Note that it is theoretically possible to eliminate all - three of these calls. But in rough tests the gain proved to be - almost unnoticeable.

- -

In order to implement virtual hosts, Apache needs to know - the local socket address used to accept the connection:

- -
+
+

The brk(2) calls allocate memory from the heap. It is rare +to see these in a system call trace, because the httpd uses +custom memory allocators (apr_pool and +apr_bucket_alloc) for most request processing. +In this trace, the httpd has just been started, so it must +call malloc(3) to get the blocks of raw memory with which +to create the custom memory allocators. +

-getsockname(3, {sin_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
+/65:    fcntl(9, F_GETFL, 0x00000000)                   = 2
+/65:    fstat64(9, 0xFAF7B818)                          = 0
+/65:    getsockopt(9, 65535, 8192, 0xFAF7B918, 0xFAF7B910, 2190656) = 0
+/65:    fstat64(9, 0xFAF7B818)                          = 0
+/65:    getsockopt(9, 65535, 8192, 0xFAF7B918, 0xFAF7B914, 2190656) = 0
+/65:    setsockopt(9, 65535, 8192, 0xFAF7B918, 4, 2190656) = 0
+/65:    fcntl(9, F_SETFL, 0x00000082)                   = 0
 
-
- It is possible to eliminate this call in many situations (such - as when there are no virtual hosts, or when Listen - directives are used which do not have wildcard addresses). But - no effort has yet been made to do these optimizations. - -

Apache turns off the Nagle algorithm:

- -
-
-setsockopt(3, IPPROTO_TCP1, [1], 4)     = 0
-
-
- because of problems described in a - paper by John Heidemann. - -

Notice the two time calls:

- -
+
+

Next, the worker thread puts the connection to the client (file +descriptor 9) in non-blocking mode. The setsockopt(2) and getsockopt(2) +calls are a side-effect of how Solaris's libc handles fcntl(2) on sockets.
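The F_GETFL / F_SETFL pair in the trace is the usual read-modify-write of the descriptor's status flags. A minimal Python sketch of the same pattern follows, assuming a Unix system; `set_nonblocking` and `demo_nonblocking` are illustrative names, and a socket pair stands in for the client connection.

```python
import fcntl
import os
import socket


def set_nonblocking(fd: int) -> int:
    """Add O_NONBLOCK to fd's status flags and return the resulting flags."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)                  # read current flags
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)   # write them back
    return fcntl.fcntl(fd, fcntl.F_GETFL)


def demo_nonblocking() -> bool:
    """Return True if a read on an empty non-blocking socket fails fast."""
    a, b = socket.socketpair()
    try:
        new_flags = set_nonblocking(a.fileno())
        try:
            os.read(a.fileno(), 1)   # nothing has been sent, so no data yet
        except BlockingIOError:
            would_block = True       # EAGAIN/EWOULDBLOCK instead of sleeping
        else:
            would_block = False
        return bool(new_flags & os.O_NONBLOCK) and would_block
    finally:
        a.close()
        b.close()
```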

+
-time(NULL)                              = 873959960
-...
-time(NULL)                              = 873959960
+/65:    read(9, " G E T   / 1 0 k . h t m".., 8000)     = 97
 
-
- One of these occurs at the beginning of the request, and the - other occurs as a result of writing the log. At least one of - these is required to properly implement the HTTP protocol. The - second occurs because the Common Log Format dictates that the - log record include a timestamp of the end of the request. A - custom logging module could eliminate one of the calls. Or you - can use a method which moves the time into shared memory, see - the patches section below. - -

As described earlier, ExtendedStatus On causes - two gettimeofday calls and a call to - times:

- -
+
+

The worker thread reads the request from the client.

+
-gettimeofday({873959960, 404935}, NULL) = 0
-...
-gettimeofday({873959960, 417742}, NULL) = 0
-times({tms_utime=5, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 446747
+/65:    stat("/var/httpd/apache/httpd-8999/htdocs/10k.html", 0xFAF7B978) = 0
+/65:    open("/var/httpd/apache/httpd-8999/htdocs/10k.html", O_RDONLY) = 10
 
-
- These can be removed by setting ExtendedStatus Off - (which is the default). - -

It might seem odd to call stat:

- -
+
+

This httpd has been configured with Options FollowSymLinks +and AllowOverride None. Thus it doesn't need to lstat(2) +each directory in the path leading up to the requested file, nor +check for .htaccess files. It simply calls stat(2) to +verify that the file: 1) exists, and 2) is a regular file, not a +directory. +
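The two checks described above can be sketched with a single stat call. This is only an approximation of the idea, not the httpd's own logic; `is_servable_file` is a hypothetical helper name.

```python
import os
import stat


def is_servable_file(path: str) -> bool:
    """Return True if path exists and is a regular file (not a directory)."""
    try:
        st = os.stat(path)  # follows symlinks, like stat(2) rather than lstat(2)
    except OSError:
        return False        # missing file, unreadable path component, etc.
    return stat.S_ISREG(st.st_mode)
```

The same stat result also carries the size and modification time, so one call can serve both the existence check and later header generation.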

-stat("/home/dgaudet/ap/apachen/htdocs/6k", {st_mode=S_IFREG|0644, st_size=6144, ...}) = 0
+/65:    sendfilev(0, 9, 0x00200F90, 2, 0xFAF7B53C)      = 10269
 
-
- This is part of the algorithm which calculates the - PATH_INFO for use by CGIs. In fact if the request - had been for the URI /cgi-bin/printenv/foobar then - there would be two calls to stat. The first for - /home/dgaudet/ap/apachen/cgi-bin/printenv/foobar - which does not exist, and the second for - /home/dgaudet/ap/apachen/cgi-bin/printenv, which - does exist. Regardless, at least one stat call is - necessary when serving static files because the file size and - modification times are used to generate HTTP headers (such as - Content-Length, Last-Modified) and - implement protocol features (such as - If-Modified-Since). A somewhat more clever server - could avoid the stat when serving non-static - files, however doing so in Apache is very difficult given the - modular structure. - -

All static files are served using mmap:

- -
+
+

In this example, the httpd is able to send the HTTP response +header and the requested file with a single sendfilev(2) system call. +Sendfile semantics vary among operating systems. On some other +systems, it is necessary to do a write(2) or writev(2) call to +send the headers before calling sendfile(2).
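For platforms without a combined call like sendfilev(2), the two-step variant mentioned above might look like the following Python sketch: one write for the headers, then a sendfile-style transfer for the body. The names `send_response` and `demo_send` are hypothetical, the header is a made-up minimal response, and a socket pair with a reader thread stands in for the client.

```python
import os
import socket
import tempfile
import threading


def send_response(conn: socket.socket, header: bytes, path: str) -> int:
    """Send the header with a plain write, then the file body via sendfile."""
    conn.sendall(header)
    with open(path, "rb") as body:
        # socket.sendfile() uses os.sendfile() where available and falls
        # back to ordinary send() calls otherwise.
        sent = conn.sendfile(body)
    return len(header) + sent


def demo_send(body_size: int = 10240):
    """Hypothetical demo: returns (bytes sent, bytes received, header)."""
    header = b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n" % body_size
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"x" * body_size)
        path = f.name
    server, client = socket.socketpair()
    received = bytearray()

    def drain():
        # Read until the sending side closes its end of the pair.
        while True:
            chunk = client.recv(65536)
            if not chunk:
                break
            received.extend(chunk)

    reader = threading.Thread(target=drain)
    reader.start()
    try:
        total = send_response(server, header, path)
    finally:
        server.close()
        reader.join()
        client.close()
        os.unlink(path)
    return total, bytes(received), header
```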

+
-mmap(0, 6144, PROT_READ, MAP_PRIVATE, 4, 0) = 0x400ee000
-...
-munmap(0x400ee000, 6144)                = 0
+/65:    write(4, " 1 2 7 . 0 . 0 . 1   -  ".., 78)      = 78
 
-
- On some architectures it's slower to mmap small - files than it is to simply read them. The define - MMAP_THRESHOLD can be set to the minimum size - required before using mmap. By default it's set to - 0 (except on SunOS4 where experimentation has shown 8192 to be - a better value). Using a tool such as lmbench you can - determine the optimal setting for your environment. - -

You may also wish to experiment with - MMAP_SEGMENT_SIZE (default 32768) which determines - the maximum number of bytes that will be written at a time from - mmap()d files. Apache only resets the client's - Timeout in between write()s. So setting this large - may lock out low bandwidth clients unless you also increase the - Timeout.

- -

It may even be the case that mmap isn't used on - your architecture; if so then defining - USE_MMAP_FILES and HAVE_MMAP might - work (if it works then report back to us).

- -

Apache does its best to avoid copying bytes around in - memory. The first write of any request typically is turned into - a writev which combines both the headers and the - first hunk of data:

- -
+
+

This write(2) call records the request in the access log. +Note that one thing missing from this trace is a time(2) call. +Unlike Apache 1.3, Apache 2.0 uses gettimeofday(3) to look up +the time. On some operating systems, like Linux or Solaris, +gettimeofday has an optimized implementation that doesn't require +as much overhead as a typical system call.

+
-writev(3, [{"HTTP/1.1 200 OK\r\nDate: Thu, 11"..., 245}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 6144}], 2) = 6389
+/65:    shutdown(9, 1, 1)                               = 0
+/65:    poll(0xFAF7B980, 1, 2000)                       = 1
+/65:    read(9, 0xFAF7BC20, 512)                        = 0
+/65:    close(9)                                        = 0
 
-
- When doing HTTP/1.1 chunked encoding Apache will generate up to - four element writevs. The goal is to push the byte - copying into the kernel, where it typically has to happen - anyhow (to assemble network packets). On testing, various - Unixes (BSDI 2.x, Solaris 2.5, Linux 2.0.31+) properly combine - the elements into network packets. Pre-2.0.31 Linux will not - combine, and will create a packet for each element, so - upgrading is a good idea. Defining NO_WRITEV will - disable this combining, but result in very poor chunked - encoding performance. - -

The log write:

- -
+
+

The worker thread does a lingering close of the connection.
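The four calls in the trace (shutdown, poll, read, close) follow a common half-close pattern. A minimal Python sketch under stated assumptions: a Unix system with poll(2), a 2000 ms timeout and 512-byte reads taken from the trace, and a hypothetical `max_rounds` cap that is not from the source.

```python
import select
import socket


def lingering_close(conn: socket.socket, timeout_ms: int = 2000,
                    max_rounds: int = 10) -> int:
    """Half-close conn, drain what the peer still sends, then close.

    Returns the number of bytes discarded while lingering.
    """
    conn.shutdown(socket.SHUT_WR)      # send FIN but keep the read side open
    poller = select.poll()
    poller.register(conn.fileno(), select.POLLIN)
    drained = 0
    try:
        for _ in range(max_rounds):    # hypothetical cap on drain iterations
            if not poller.poll(timeout_ms):
                break                  # peer stayed silent; give up waiting
            data = conn.recv(512)
            if not data:
                break                  # read returned 0: peer has closed
            drained += len(data)
    finally:
        conn.close()
    return drained
```

In the trace the peer had nothing further to send, so the first read returned 0 and the loop would exit after a single round.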

+
-write(17, "127.0.0.1 - - [10/Sep/1997:23:39"..., 71) = 71
+/65:    close(10)                                       = 0
+/65:    lwp_park(0x00000000, 0)         (sleeping...)
 
-
- can be deferred by defining BUFFERED_LOGS. In this - case up to PIPE_BUF bytes (a POSIX defined - constant) of log entries are buffered before writing. At no - time does it split a log entry across a PIPE_BUF - boundary because those writes may not be atomic. - (i.e., entries from multiple children could become - mixed together). The code does its best to flush this buffer - when a child dies. - -

The lingering close code causes four system calls:

- -
+
+

Finally the worker thread closes the file that it has just delivered +and blocks until the listener assigns it another connection.

+
-shutdown(3, 1 /* send */)               = 0
-oldselect(4, [3], NULL, [3], {2, 0})    = 1 (in [3], left {2, 0})
-read(3, "", 2048)                       = 0
-close(3)                                = 0
+/67:    accept(3, 0x001FEB74, 0x001FEB94, 1) (sleeping...)
+
+

Meanwhile, the listener thread is able to accept another connection +as soon as it has dispatched this connection to a worker thread (subject +to some flow-control logic in the worker MPM that throttles the listener +if all the available workers are busy). Though it isn't apparent from +this trace, the next accept(2) can (and usually does, under high load +conditions) occur in parallel with the worker thread's handling of the +just-accepted connection.
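The hand-off between listener and workers can be approximated with a bounded queue. This Python sketch is only an analogy for the structure described above and does not reflect the worker MPM's actual data structures: the bounded `queue.Queue` is a crude stand-in for the flow-control logic, and `worker`, `serve`, and the upper-casing "request handler" are all hypothetical.

```python
import queue
import socket
import threading


def worker(jobs: queue.Queue, handled_by: list) -> None:
    """Block until the listener hands over a connection, then serve it."""
    while True:
        conn = jobs.get()          # analogous to the parked worker thread
        if conn is None:
            break                  # shutdown marker from the listener
        with conn:
            data = conn.recv(1024)
            conn.sendall(data.upper())   # placeholder for request processing
        handled_by.append(threading.current_thread().name)


def serve(listener: socket.socket, n_connections: int,
          n_workers: int = 2) -> list:
    """Accept n_connections and dispatch each one to a worker thread."""
    # put() blocks when the queue is full, which throttles the listener
    # while all workers are busy.
    jobs = queue.Queue(maxsize=n_workers)
    handled_by = []
    threads = [
        threading.Thread(target=worker, args=(jobs, handled_by),
                         name=f"worker-{i}")
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for _ in range(n_connections):
        conn, _peer = listener.accept()
        jobs.put(conn)             # listener returns to accept() right away
    for _ in threads:
        jobs.put(None)
    for t in threads:
        t.join()
    return handled_by
```

Because the listener only enqueues the connection, its next accept can overlap with a worker still processing the previous request.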

+
- which were described earlier. -

Let's apply some of these optimizations: - -DSINGLE_LISTEN_UNSERIALIZED_ACCEPT - -DBUFFERED_LOGS and ExtendedStatus Off. - Here's the final trace:

- -
-
-accept(15, {sin_family=AF_INET, sin_port=htons(22286), sin_addr=inet_addr("127.0.0.1")}, [16]) = 3
-sigaction(SIGUSR1, {SIG_IGN}, {0x8058c98, [], SA_INTERRUPT}) = 0
-getsockname(3, {sin_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
-setsockopt(3, IPPROTO_TCP1, [1], 4)     = 0
-read(3, "GET /6k HTTP/1.0\r\nUser-Agent: "..., 4096) = 60
-sigaction(SIGUSR1, {SIG_IGN}, {SIG_IGN}) = 0
-time(NULL)                              = 873961916
-stat("/home/dgaudet/ap/apachen/htdocs/6k", {st_mode=S_IFREG|0644, st_size=6144, ...}) = 0
-open("/home/dgaudet/ap/apachen/htdocs/6k", O_RDONLY) = 4
-mmap(0, 6144, PROT_READ, MAP_PRIVATE, 4, 0) = 0x400e3000
-writev(3, [{"HTTP/1.1 200 OK\r\nDate: Thu, 11"..., 245}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 6144}], 2) = 6389
-close(4)                                = 0
-time(NULL)                              = 873961916
-shutdown(3, 1 /* send */)               = 0
-oldselect(4, [3], NULL, [3], {2, 0})    = 1 (in [3], left {2, 0})
-read(3, "", 2048)                       = 0
-close(3)                                = 0
-sigaction(SIGUSR1, {0x8058c98, [], SA_INTERRUPT}, {SIG_IGN}) = 0
-munmap(0x400e3000, 6144)                = 0
-
-
- That's 19 system calls, of which 4 remain relatively easy to - remove, but don't seem worth the effort. - -

Appendix: The - Pre-Forking Model

- -

Apache (on Unix) is a pre-forking model server. The - parent process is responsible only for forking - child processes, it does not serve any requests or - service any network sockets. The child processes actually - process connections, they serve multiple connections (one at a - time) before dying. The parent spawns new or kills off old - children in response to changes in the load on the server (it - does so by monitoring a scoreboard which the children keep up - to date).

- -

This model for servers offers a robustness that other models - do not. In particular, the parent code is very simple, and with - a high degree of confidence the parent will continue to do its - job without error. The children are complex, and when you add - in third party code via modules, you risk segmentation faults - and other forms of corruption. Even should such a thing happen, - it only affects one connection and the server continues serving - requests. The parent quickly replaces the dead child.

- -

Pre-forking is also very portable across dialects of Unix. - Historically this has been an important goal for Apache, and it - continues to remain so.

- -

The pre-forking model comes under criticism for various - performance aspects. Of particular concern are the overhead of - forking a process, the overhead of context switches between - processes, and the memory overhead of having multiple - processes. Furthermore it does not offer as many opportunities - for data-caching between requests (such as a pool of - mmapped files). Various other models exist and - extensive analysis can be found in the papers - of the JAWS project. In practice all of these costs vary - drastically depending on the operating system.

- -

Apache's core code is already multithread aware, and Apache - version 1.3 is multithreaded on NT. There have been at least - two other experimental implementations of threaded Apache, one - using the 1.3 code base on DCE, and one using a custom - user-level threads package and the 1.0 code base; neither is - publicly available. There is also an experimental port of - Apache 1.3 to Netscape's - Portable Run Time, which is - available (but you're encouraged to join the new-httpd mailing - list if you intend to use it). Part of our redesign for - version 2.0 of Apache will include abstractions of the server - model so that we can continue to support the pre-forking model, - and also support various threaded models. - -

-- 2.40.0