<ul>
<li><a href="#trace">Detailed Analysis of a
Trace</a></li>
-
- <li><a href="#preforking">The Pre-Forking Model</a></li>
</ul>
</li>
</ul>
<h3><a id="trace" name="trace">Appendix: Detailed Analysis of a
Trace</a></h3>
- Here is a system call trace of Apache 1.3 running on Linux. The
- run-time configuration file is essentially the default plus:
+ <p>Here is a system call trace of Apache 2.0.38 with the worker MPM
+ on Solaris 8. This trace was collected using:</p>
+<blockquote>
+<code>truss -l -p <i>httpd_child_pid</i></code>
+</blockquote>
+ <p>The <code>-l</code> option tells truss to log the ID of the
+ LWP (lightweight process, Solaris's form of kernel-level thread)
+ that invokes each system call.</p>
+
+ <p>Other systems may have different system call tracing utilities
+ such as <code>strace</code>, <code>ktrace</code>, or <code>par</code>.
+ They all produce similar output.</p>
+
+ <p>In this trace, a client has requested a 10KB static file
+ from the httpd. Traces of non-static requests or requests
+ with content negotiation look wildly different (and quite ugly
+ in some cases).</p>
<blockquote>
<pre>
-<Directory />
- AllowOverride none
- Options FollowSymLinks
-</Directory>
+/67: accept(3, 0x00200BEC, 0x00200C0C, 1) (sleeping...)
+/67: accept(3, 0x00200BEC, 0x00200C0C, 1) = 9
</pre>
- </blockquote>
- The file being requested is a static 6K file of no particular
- content. Traces of non-static requests or requests with content
- negotiation look wildly different (and quite ugly in some
- cases). First the entire trace, then we'll examine details.
- (This was generated by the <code>strace</code> program, other
- similar programs include <code>truss</code>,
- <code>ktrace</code>, and <code>par</code>.)
-
- <blockquote>
+<blockquote>
+<p>In this trace, the listener thread is running within LWP #67.</p>
+<p>Note the lack of accept(2) serialization. On this particular
+platform, the worker MPM uses an unserialized accept by default
+unless it is listening on multiple ports.</p>
+</blockquote>
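<p>The following C sketch contrasts the two modes. It is a hypothetical helper, not code from the worker MPM:
<ul>
<li>With no lock it calls accept(2) directly, as the listener in this trace does.</li>
<li>Given a lock file descriptor, it holds an flock(2) lock around the call. This is one way several child processes can take turns waiting for connections.</li>
</ul>
Using flock(2) as the serialization mechanism is an assumption made for illustration.</p>

```c
#include <errno.h>
#include <stddef.h>
#include <sys/file.h>
#include <sys/socket.h>

/* Hypothetical helper: accept one connection on listen_fd.
 * lock_fd < 0 : unserialized accept, the case shown in the trace
 *               (one listening socket, listener calls accept directly).
 * lock_fd >= 0: hold an flock(2) lock on a shared lock file around the
 *               accept so only one process waits for a connection at a
 *               time. Simplified: servers with several listening sockets
 *               usually hold such a lock around the poll over all of
 *               them, not just around accept(2). */
int accept_connection(int listen_fd, int lock_fd)
{
    int fd, saved_errno;

    if (lock_fd >= 0 && flock(lock_fd, LOCK_EX) != 0)
        return -1;
    do {
        fd = accept(listen_fd, NULL, NULL);  /* blocks until a client connects */
    } while (fd < 0 && errno == EINTR);      /* retry if a signal interrupted us */
    saved_errno = errno;
    if (lock_fd >= 0)
        flock(lock_fd, LOCK_UN);
    errno = saved_errno;                     /* report accept's error, not flock's */
    return fd;
}
```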
<pre>
-accept(15, {sin_family=AF_INET, sin_port=htons(22283), sin_addr=inet_addr("127.0.0.1")}, [16]) = 3
-flock(18, LOCK_UN) = 0
-sigaction(SIGUSR1, {SIG_IGN}, {0x8059954, [], SA_INTERRUPT}) = 0
-getsockname(3, {sin_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
-setsockopt(3, IPPROTO_TCP1, [1], 4) = 0
-read(3, "GET /6k HTTP/1.0\r\nUser-Agent: "..., 4096) = 60
-sigaction(SIGUSR1, {SIG_IGN}, {SIG_IGN}) = 0
-time(NULL) = 873959960
-gettimeofday({873959960, 404935}, NULL) = 0
-stat("/home/dgaudet/ap/apachen/htdocs/6k", {st_mode=S_IFREG|0644, st_size=6144, ...}) = 0
-open("/home/dgaudet/ap/apachen/htdocs/6k", O_RDONLY) = 4
-mmap(0, 6144, PROT_READ, MAP_PRIVATE, 4, 0) = 0x400ee000
-writev(3, [{"HTTP/1.1 200 OK\r\nDate: Thu, 11"..., 245}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 6144}], 2) = 6389
-close(4) = 0
-time(NULL) = 873959960
-write(17, "127.0.0.1 - - [10/Sep/1997:23:39"..., 71) = 71
-gettimeofday({873959960, 417742}, NULL) = 0
-times({tms_utime=5, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 446747
-shutdown(3, 1 /* send */) = 0
-oldselect(4, [3], NULL, [3], {2, 0}) = 1 (in [3], left {2, 0})
-read(3, "", 2048) = 0
-close(3) = 0
-sigaction(SIGUSR1, {0x8059954, [], SA_INTERRUPT}, {SIG_IGN}) = 0
-munmap(0x400ee000, 6144) = 0
-flock(18, LOCK_EX) = 0
+/65: lwp_park(0x00000000, 0) = 0
+/67: lwp_unpark(65, 1) = 0
</pre>
- </blockquote>
-
- <p>Notice the accept serialization:</p>
-
- <blockquote>
+<blockquote>
+<p>Upon accepting the connection, the listener thread wakes up
+a worker thread to do the request processing. In this trace,
+the worker thread that handles the request is mapped to LWP #65.</p>
+</blockquote>
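<p>A minimal sketch of this hand-off, assuming POSIX threads:
<ul>
<li>The listener pushes the accepted descriptor onto a queue and signals a condition variable.</li>
<li>An idle worker sleeping in pthread_cond_wait() picks it up.</li>
</ul>
On Solaris, the threads library appears to implement this kind of sleep and wake-up with the lwp_park/lwp_unpark calls visible in the trace. The queue type, its fixed capacity and the function names are hypothetical. The worker MPM's own queue is more elaborate.</p>

```c
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAP 64   /* arbitrary capacity chosen for this sketch */

/* Hypothetical connection queue shared by the listener and the workers. */
struct fd_queue {
    int fds[QUEUE_CAP];
    int head, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
};

void fd_queue_init(struct fd_queue *q)
{
    q->head = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
}

/* Listener side: enqueue a connection and wake one sleeping worker.
 * Returns -1 if the queue is full. */
int fd_queue_push(struct fd_queue *q, int fd)
{
    int rc = -1;

    pthread_mutex_lock(&q->lock);
    if (q->count < QUEUE_CAP) {
        q->fds[(q->head + q->count) % QUEUE_CAP] = fd;
        q->count++;
        pthread_cond_signal(&q->not_empty);   /* analogous to lwp_unpark */
        rc = 0;
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}

/* Worker side: sleep until a connection is queued, then take it (FIFO). */
int fd_queue_pop(struct fd_queue *q)
{
    int fd;

    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);   /* analogous to lwp_park */
    fd = q->fds[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_mutex_unlock(&q->lock);
    return fd;
}
```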
<pre>
-flock(18, LOCK_UN) = 0
-...
-flock(18, LOCK_EX) = 0
+/65: getsockname(9, 0x00200BA4, 0x00200BC4, 1) = 0
</pre>
- </blockquote>
- These two calls can be removed by defining
- <code>SINGLE_LISTEN_UNSERIALIZED_ACCEPT</code> as described
- earlier.
-
- <p>Notice the <code>SIGUSR1</code> manipulation:</p>
-
- <blockquote>
+<blockquote>
+<p>In order to implement virtual hosts, Apache needs to know
+the local socket address used to accept the connection. It
+is possible to eliminate this call in many situations (such
+as when there are no virtual hosts, or when every <code>Listen</code>
+directive specifies an explicit address rather than a wildcard),
+but these optimizations have not yet been implemented.</p>
+</blockquote>
<pre>
-sigaction(SIGUSR1, {SIG_IGN}, {0x8059954, [], SA_INTERRUPT}) = 0
-...
-sigaction(SIGUSR1, {SIG_IGN}, {SIG_IGN}) = 0
-...
-sigaction(SIGUSR1, {0x8059954, [], SA_INTERRUPT}, {SIG_IGN}) = 0
+/65: brk(0x002170E8) = 0
+/65: brk(0x002190E8) = 0
</pre>
- </blockquote>
- This is caused by the implementation of graceful restarts. When
- the parent receives a <code>SIGUSR1</code> it sends a
- <code>SIGUSR1</code> to all of its children (and it also
- increments a "generation counter" in shared memory). Any
- children that are idle (between connections) will immediately
- die off when they receive the signal. Any children that are in
- keep-alive connections, but are in between requests will die
- off immediately. But any children that have a connection and
- are still waiting for the first request will not die off
- immediately.
-
- <p>To see why this is necessary, consider how a browser reacts
- to a closed connection. If the connection was a keep-alive
- connection and the request being serviced was not the first
- request then the browser will quietly reissue the request on a
- new connection. It has to do this because the server is always
- free to close a keep-alive connection in between requests
- (<em>i.e.</em>, due to a timeout or because of a maximum number
- of requests). But, if the connection is closed before the first
- response has been received the typical browser will display a
- "document contains no data" dialogue (or a broken image icon).
- This is done on the assumption that the server is broken in
- some way (or maybe too overloaded to respond at all). So Apache
- tries to avoid ever deliberately closing the connection before
- it has sent a single response. This is the cause of those
- <code>SIGUSR1</code> manipulations.</p>
-
- <p>Note that it is theoretically possible to eliminate all
- three of these calls. But in rough tests the gain proved to be
- almost unnoticeable.</p>
-
- <p>In order to implement virtual hosts, Apache needs to know
- the local socket address used to accept the connection:</p>
-
- <blockquote>
+<blockquote>
+<p>The brk(2) calls allocate memory from the heap. It is rare
+to see these in a system call trace, because the httpd uses
+custom memory allocators (<code>apr_pool</code> and
+<code>apr_bucket_alloc</code>) for most request processing.
+In this trace, the httpd has just been started, so it must
+call malloc(3) to get the blocks of raw memory with which
+to create the custom memory allocators.</p>
+</blockquote>
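<p>The following is a deliberately tiny bump-pointer pool. It only shows why allocation stops producing system calls once the pools exist. It is not the APR API: <code>struct pool</code>, <code>pool_alloc()</code> and the other names are hypothetical. A real pool would chain additional blocks rather than fail when one fills up.</p>

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bump-pointer pool: one malloc(3) up front, then
 * allocations are just pointer arithmetic with no system calls. */
struct pool {
    char *base;
    size_t size;
    size_t used;
};

int pool_init(struct pool *p, size_t size)
{
    p->base = malloc(size);      /* may cause brk(2) the first time */
    p->size = size;
    p->used = 0;
    return p->base ? 0 : -1;
}

void *pool_alloc(struct pool *p, size_t n)
{
    const size_t align = _Alignof(max_align_t);
    size_t start = (p->used + align - 1) & ~(align - 1);   /* round up */

    if (start > p->size || n > p->size - start)
        return NULL;             /* block exhausted; a real pool chains more */
    p->used = start + n;
    return p->base + start;
}

/* Release everything at once, e.g. at the end of a request. */
void pool_clear(struct pool *p) { p->used = 0; }

void pool_destroy(struct pool *p) { free(p->base); p->base = NULL; }
```

<p>Because <code>pool_clear()</code> only resets an offset, the same memory can serve the next request without any further calls to malloc(3) or free(3).</p>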
<pre>
-getsockname(3, {sin_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
+/65: fcntl(9, F_GETFL, 0x00000000) = 2
+/65: fstat64(9, 0xFAF7B818) = 0
+/65: getsockopt(9, 65535, 8192, 0xFAF7B918, 0xFAF7B910, 2190656) = 0
+/65: fstat64(9, 0xFAF7B818) = 0
+/65: getsockopt(9, 65535, 8192, 0xFAF7B918, 0xFAF7B914, 2190656) = 0
+/65: setsockopt(9, 65535, 8192, 0xFAF7B918, 4, 2190656) = 0
+/65: fcntl(9, F_SETFL, 0x00000082) = 0
</pre>
- </blockquote>
- It is possible to eliminate this call in many situations (such
- as when there are no virtual hosts, or when <code>Listen</code>
- directives are used which do not have wildcard addresses). But
- no effort has yet been made to do these optimizations.
-
- <p>Apache turns off the Nagle algorithm:</p>
-
- <blockquote>
-<pre>
-setsockopt(3, IPPROTO_TCP1, [1], 4) = 0
-</pre>
- </blockquote>
- because of problems described in <a
- href="http://www.isi.edu/~johnh/PAPERS/Heidemann97a.html">a
- paper by John Heidemann</a>.
-
- <p>Notice the two <code>time</code> calls:</p>
-
- <blockquote>
+<blockquote>
+<p>Next, the worker thread puts the connection to the client (file
+descriptor 9) in non-blocking mode. The setsockopt(2) and getsockopt(2)
+calls are a side-effect of how Solaris's libc handles fcntl(2) on sockets.</p>
+</blockquote>
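<p>The usual read-modify-write idiom for this looks like the sketch below; the helper name is hypothetical. In the trace, F_GETFL returned 2 (O_RDWR) and F_SETFL was passed 0x82. That is consistent with O_RDWR | O_NONBLOCK if Solaris defines O_NONBLOCK as 0x80.</p>

```c
#include <fcntl.h>

/* Hypothetical helper: put fd into non-blocking mode by reading the
 * current file status flags and adding O_NONBLOCK, so other flags such
 * as the access mode bits are preserved. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    if (flags & O_NONBLOCK)
        return 0;                                   /* already non-blocking */
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}
```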
<pre>
-time(NULL) = 873959960
-...
-time(NULL) = 873959960
+/65: read(9, " G E T / 1 0 k . h t m".., 8000) = 97
</pre>
- </blockquote>
- One of these occurs at the beginning of the request, and the
- other occurs as a result of writing the log. At least one of
- these is required to properly implement the HTTP protocol. The
- second occurs because the Common Log Format dictates that the
- log record include a timestamp of the end of the request. A
- custom logging module could eliminate one of the calls. Or you
- can use a method which moves the time into shared memory, see
- the <a href="#patches">patches section below</a>.
-
- <p>As described earlier, <code>ExtendedStatus On</code> causes
- two <code>gettimeofday</code> calls and a call to
- <code>times</code>:</p>
-
- <blockquote>
+<blockquote>
+<p>The worker thread reads the request from the client.</p>
+</blockquote>
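<p>Because the socket is now non-blocking, a read may find no data yet. Below is a sketch of a header reader under that assumption. It reads until the blank line that ends the request header, and waits with poll(2) whenever read(2) reports EAGAIN. The function name and the timeout parameter are hypothetical. Apache's input filters actually parse the request line by line rather than this way.</p>

```c
#include <errno.h>
#include <poll.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read from fd until "\r\n\r\n" (end of the HTTP
 * request header) has arrived. buf is kept NUL-terminated for strstr.
 * Returns the number of bytes read, or -1 on error, timeout, early EOF,
 * or a header that does not fit in cap - 1 bytes. */
ssize_t read_request_header(int fd, char *buf, size_t cap, int timeout_ms)
{
    size_t len = 0;

    if (cap < 2)
        return -1;
    while (len < cap - 1) {
        ssize_t n = read(fd, buf + len, cap - 1 - len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            if (errno == EAGAIN || errno == EWOULDBLOCK) {
                /* Non-blocking socket with no data yet: wait for input. */
                struct pollfd pfd = { .fd = fd, .events = POLLIN };
                if (poll(&pfd, 1, timeout_ms) <= 0)
                    return -1;                 /* timed out or poll failed */
                continue;
            }
            return -1;
        }
        if (n == 0)
            return -1;                         /* client closed mid-header */
        len += (size_t)n;
        buf[len] = '\0';
        if (strstr(buf, "\r\n\r\n") != NULL)
            return (ssize_t)len;
    }
    return -1;                                 /* header too large */
}
```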
<pre>
-gettimeofday({873959960, 404935}, NULL) = 0
-...
-gettimeofday({873959960, 417742}, NULL) = 0
-times({tms_utime=5, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 446747
+/65: stat("/var/httpd/apache/httpd-8999/htdocs/10k.html", 0xFAF7B978) = 0
+/65: open("/var/httpd/apache/httpd-8999/htdocs/10k.html", O_RDONLY) = 10
</pre>
- </blockquote>
- These can be removed by setting <code>ExtendedStatus Off</code>
- (which is the default).
-
- <p>It might seem odd to call <code>stat</code>:</p>
-
- <blockquote>
+<blockquote>
+<p>This httpd has been configured with <code>Options FollowSymLinks</code>
+and <code>AllowOverride None</code>. Thus it doesn't need to lstat(2)
+each directory in the path leading up to the requested file, nor
+check for <code>.htaccess</code> files. It simply calls stat(2) to
+verify that the file: 1) exists, and 2) is a regular file, not a
+directory.</p>
+</blockquote>
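<p>Here is a sketch of that check in C, using a hypothetical helper that returns an open descriptor. It makes one stat(2) call to confirm the path names a regular file, then calls open(2) read-only. The size reported by stat(2) is also what a server needs for the Content-Length header.</p>

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>

/* Hypothetical helper: stat(2) the path, reject anything that is not a
 * regular file, then open it read-only and report its size.
 * Returns an fd, or -1 with errno set (ENOENT if missing, EISDIR for a
 * directory, EACCES, chosen arbitrarily here, for other file types). */
int open_static_file(const char *path, off_t *size)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;                       /* errno set by stat, e.g. ENOENT */
    if (!S_ISREG(st.st_mode)) {
        errno = S_ISDIR(st.st_mode) ? EISDIR : EACCES;
        return -1;
    }
    *size = st.st_size;
    return open(path, O_RDONLY);
}
```

<p>A production server might call fstat(2) on the descriptor after opening instead. That avoids the small window in which the file could be replaced between the two calls.</p>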
<pre>
-stat("/home/dgaudet/ap/apachen/htdocs/6k", {st_mode=S_IFREG|0644, st_size=6144, ...}) = 0
+/65: sendfilev(0, 9, 0x00200F90, 2, 0xFAF7B53C) = 10269
</pre>
- </blockquote>
- This is part of the algorithm which calculates the
- <code>PATH_INFO</code> for use by CGIs. In fact if the request
- had been for the URI <code>/cgi-bin/printenv/foobar</code> then
- there would be two calls to <code>stat</code>. The first for
- <code>/home/dgaudet/ap/apachen/cgi-bin/printenv/foobar</code>
- which does not exist, and the second for
- <code>/home/dgaudet/ap/apachen/cgi-bin/printenv</code>, which
- does exist. Regardless, at least one <code>stat</code> call is
- necessary when serving static files because the file size and
- modification times are used to generate HTTP headers (such as
- <code>Content-Length</code>, <code>Last-Modified</code>) and
- implement protocol features (such as
- <code>If-Modified-Since</code>). A somewhat more clever server
- could avoid the <code>stat</code> when serving non-static
- files, however doing so in Apache is very difficult given the
- modular structure.
-
- <p>All static files are served using <code>mmap</code>:</p>
-
- <blockquote>
+<blockquote>
+<p>In this example, the httpd is able to send the HTTP response
+header and the requested file with a single sendfilev(2) system call.
+Sendfile semantics vary among operating systems. On some other
+systems, it is necessary to do a write(2) or writev(2) call to
+send the headers before calling sendfile(2).</p>
+</blockquote>
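<p>Linux has no sendfilev(2). A Linux equivalent of this step typically writes the header first and then calls sendfile(2) for the body, as in the sketch below. The helper names are hypothetical, and the output descriptor is assumed to be blocking. The sketch is Linux-specific because it uses <code>&lt;sys/sendfile.h&gt;</code>.</p>

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write all of buf, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: send the response header with write(2), then let
 * the kernel copy the file at path to out_fd with sendfile(2), so the
 * body never passes through user-space buffers.
 * Returns the total bytes sent, or -1 on error. */
ssize_t send_header_and_file(int out_fd, const char *hdr, size_t hdr_len,
                             const char *path)
{
    struct stat st;
    off_t off = 0;
    int in_fd = open(path, O_RDONLY);

    if (in_fd < 0)
        return -1;
    if (fstat(in_fd, &st) != 0 || write_all(out_fd, hdr, hdr_len) != 0)
        goto fail;
    while (off < st.st_size) {
        /* sendfile advances off by the number of bytes it sent */
        ssize_t n = sendfile(out_fd, in_fd, &off, (size_t)(st.st_size - off));
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            goto fail;           /* error, or the file shrank underneath us */
    }
    close(in_fd);
    return (ssize_t)hdr_len + (ssize_t)off;
fail:
    close(in_fd);
    return -1;
}
```

<p>On Linux, setting TCP_CORK on the socket around these calls is a common way to keep the header and the start of the body in the same packets.</p>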
<pre>
-mmap(0, 6144, PROT_READ, MAP_PRIVATE, 4, 0) = 0x400ee000
-...
-munmap(0x400ee000, 6144) = 0
+/65: write(4, " 1 2 7 . 0 . 0 . 1 - ".., 78) = 78
</pre>
- </blockquote>
- On some architectures it's slower to <code>mmap</code> small
- files than it is to simply <code>read</code> them. The define
- <code>MMAP_THRESHOLD</code> can be set to the minimum size
- required before using <code>mmap</code>. By default it's set to
- 0 (except on SunOS4 where experimentation has shown 8192 to be
- a better value). Using a tool such as <a
- href="http://www.bitmover.com/lmbench/">lmbench</a> you can
- determine the optimal setting for your environment.
-
- <p>You may also wish to experiment with
- <code>MMAP_SEGMENT_SIZE</code> (default 32768) which determines
- the maximum number of bytes that will be written at a time from
- mmap()d files. Apache only resets the client's
- <code>Timeout</code> in between write()s. So setting this large
- may lock out low bandwidth clients unless you also increase the
- <code>Timeout</code>.</p>
-
- <p>It may even be the case that <code>mmap</code> isn't used on
- your architecture; if so then defining
- <code>USE_MMAP_FILES</code> and <code>HAVE_MMAP</code> might
- work (if it works then report back to us).</p>
-
- <p>Apache does its best to avoid copying bytes around in
- memory. The first write of any request typically is turned into
- a <code>writev</code> which combines both the headers and the
- first hunk of data:</p>
-
- <blockquote>
+<blockquote>
+<p>This write(2) call records the request in the access log.
+Note that one thing missing from this trace is a time(2) call.
+Unlike Apache 1.3, Apache 2.0 uses gettimeofday(3) to look up
+the time. On some operating systems, like Linux or Solaris,
+gettimeofday has an optimized implementation that doesn't require
+as much overhead as a typical system call.</p>
+</blockquote>
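<p>Here is a sketch of producing one such entry, assuming the Common Log Format and a hypothetical helper name. It takes the time from gettimeofday(), formats it with strftime(), and emits the whole line with a single write(2), as the 78-byte write in the trace does.</p>

```c
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: format one Common Log Format entry and write it
 * to log_fd with a single write(2). Writing the whole entry in one call
 * helps keep lines from different threads or processes from
 * interleaving in a shared log. Returns bytes written, or -1. */
ssize_t log_request(int log_fd, const char *client_ip,
                    const char *request_line, int status, long bytes)
{
    struct timeval tv;
    struct tm tm;
    char when[64], line[512];
    int len;

    gettimeofday(&tv, NULL);
    time_t secs = tv.tv_sec;
    localtime_r(&secs, &tm);
    strftime(when, sizeof when, "%d/%b/%Y:%H:%M:%S %z", &tm);

    len = snprintf(line, sizeof line, "%s - - [%s] \"%s\" %d %ld\n",
                   client_ip, when, request_line, status, bytes);
    if (len < 0 || (size_t)len >= sizeof line)
        return -1;                     /* entry would have been truncated */
    return write(log_fd, line, (size_t)len);
}
```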
<pre>
-writev(3, [{"HTTP/1.1 200 OK\r\nDate: Thu, 11"..., 245}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 6144}], 2) = 6389
+/65: shutdown(9, 1, 1) = 0
+/65: poll(0xFAF7B980, 1, 2000) = 1
+/65: read(9, 0xFAF7BC20, 512) = 0
+/65: close(9) = 0
</pre>
- </blockquote>
- When doing HTTP/1.1 chunked encoding Apache will generate up to
- four element <code>writev</code>s. The goal is to push the byte
- copying into the kernel, where it typically has to happen
- anyhow (to assemble network packets). On testing, various
- Unixes (BSDI 2.x, Solaris 2.5, Linux 2.0.31+) properly combine
- the elements into network packets. Pre-2.0.31 Linux will not
- combine, and will create a packet for each element, so
- upgrading is a good idea. Defining <code>NO_WRITEV</code> will
- disable this combining, but result in very poor chunked
- encoding performance.
-
- <p>The log write:</p>
-
- <blockquote>
+<blockquote>
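+<p>The worker thread does a lingering close of the connection:
+it shuts down the sending side of the socket, waits up to two
+seconds (the 2000 ms poll(2) timeout) for the client to finish,
+reads and discards any remaining input (here read(2) immediately
+returns 0, meaning the client has already closed its end), and
+only then calls close(2).</p>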
+</blockquote>
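<p>The same sequence as a C sketch; the helper name is hypothetical, and the 512-byte drain buffer and the timeout parameter mirror the values in the trace. Draining the client's remaining input before closing matters because unread data can make the kernel send a TCP reset. That reset can cause the client to discard the tail of a response it has not read yet; draining first reduces that risk.</p>

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: lingering close of a connected socket.
 * 1. shutdown(SHUT_WR): send FIN so the client sees end of response.
 * 2. poll/read: discard client input until EOF or timeout.
 * 3. close(2).
 * Simplified: the timeout restarts on every poll, so a client that keeps
 * trickling data could hold the socket open longer; a real server would
 * also cap the total time. Returns close()'s result. */
int lingering_close(int fd, int timeout_ms)
{
    char junk[512];

    if (shutdown(fd, SHUT_WR) != 0)
        return close(fd);
    for (;;) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int r = poll(&pfd, 1, timeout_ms);
        if (r < 0 && errno == EINTR)
            continue;
        if (r <= 0)
            break;                           /* timed out or poll error */
        ssize_t n = read(fd, junk, sizeof junk);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            break;                           /* EOF (client closed) or error */
    }
    return close(fd);
}
```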
<pre>
-write(17, "127.0.0.1 - - [10/Sep/1997:23:39"..., 71) = 71
+/65: close(10) = 0
+/65: lwp_park(0x00000000, 0) (sleeping...)
</pre>
- </blockquote>
- can be deferred by defining <code>BUFFERED_LOGS</code>. In this
- case up to <code>PIPE_BUF</code> bytes (a POSIX defined
- constant) of log entries are buffered before writing. At no
- time does it split a log entry across a <code>PIPE_BUF</code>
- boundary because those writes may not be atomic.
- (<em>i.e.</em>, entries from multiple children could become
- mixed together). The code does its best to flush this buffer
- when a child dies.
-
- <p>The lingering close code causes four system calls:</p>
-
- <blockquote>
+<blockquote>
+<p>Finally the worker thread closes the file that it has just delivered
+and blocks until the listener assigns it another connection.</p>
+</blockquote>
<pre>
-shutdown(3, 1 /* send */) = 0
-oldselect(4, [3], NULL, [3], {2, 0}) = 1 (in [3], left {2, 0})
-read(3, "", 2048) = 0
-close(3) = 0
+/67: accept(3, 0x001FEB74, 0x001FEB94, 1) (sleeping...)
</pre>
+<blockquote>
+<p>Meanwhile, the listener thread is able to accept another connection
+as soon as it has dispatched this connection to a worker thread (subject
+to some flow-control logic in the worker MPM that throttles the listener
+if all the available workers are busy). Though it isn't apparent from
+this trace, the next accept(2) can (and usually does, under high load
+conditions) occur in parallel with the worker thread's handling of the
+just-accepted connection.</p>
+</blockquote>
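<p>This document does not show the worker MPM's flow-control code, so the following is only a hypothetical sketch of the idea. It keeps a count of idle workers protected by a mutex, and the listener waits on that count before calling accept(2). All names here are invented for illustration.</p>

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical idle-worker counter used to throttle the listener. */
struct idle_count {
    int idle;
    pthread_mutex_t lock;
    pthread_cond_t cond;
};

void idle_init(struct idle_count *c)
{
    c->idle = 0;
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
}

/* Worker: announce that it is free to take another connection. */
void idle_worker_ready(struct idle_count *c)
{
    pthread_mutex_lock(&c->lock);
    c->idle++;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* Listener: block until some worker is idle, then reserve it, so it
 * never accepts a connection that no worker can serve yet. */
void idle_reserve_worker(struct idle_count *c)
{
    pthread_mutex_lock(&c->lock);
    while (c->idle == 0)
        pthread_cond_wait(&c->cond, &c->lock);
    c->idle--;
    pthread_mutex_unlock(&c->lock);
}
```

<p>In this sketch, a worker would call <code>idle_worker_ready()</code> before waiting for its next connection, which corresponds to the lwp_park at the end of the trace. The listener would call <code>idle_reserve_worker()</code> before each accept(2).</p>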
</blockquote>
- which were described earlier.
- <p>Let's apply some of these optimizations:
- <code>-DSINGLE_LISTEN_UNSERIALIZED_ACCEPT
- -DBUFFERED_LOGS</code> and <code>ExtendedStatus Off</code>.
- Here's the final trace:</p>
-
- <blockquote>
-<pre>
-accept(15, {sin_family=AF_INET, sin_port=htons(22286), sin_addr=inet_addr("127.0.0.1")}, [16]) = 3
-sigaction(SIGUSR1, {SIG_IGN}, {0x8058c98, [], SA_INTERRUPT}) = 0
-getsockname(3, {sin_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
-setsockopt(3, IPPROTO_TCP1, [1], 4) = 0
-read(3, "GET /6k HTTP/1.0\r\nUser-Agent: "..., 4096) = 60
-sigaction(SIGUSR1, {SIG_IGN}, {SIG_IGN}) = 0
-time(NULL) = 873961916
-stat("/home/dgaudet/ap/apachen/htdocs/6k", {st_mode=S_IFREG|0644, st_size=6144, ...}) = 0
-open("/home/dgaudet/ap/apachen/htdocs/6k", O_RDONLY) = 4
-mmap(0, 6144, PROT_READ, MAP_PRIVATE, 4, 0) = 0x400e3000
-writev(3, [{"HTTP/1.1 200 OK\r\nDate: Thu, 11"..., 245}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 6144}], 2) = 6389
-close(4) = 0
-time(NULL) = 873961916
-shutdown(3, 1 /* send */) = 0
-oldselect(4, [3], NULL, [3], {2, 0}) = 1 (in [3], left {2, 0})
-read(3, "", 2048) = 0
-close(3) = 0
-sigaction(SIGUSR1, {0x8058c98, [], SA_INTERRUPT}, {SIG_IGN}) = 0
-munmap(0x400e3000, 6144) = 0
-</pre>
- </blockquote>
- That's 19 system calls, of which 4 remain relatively easy to
- remove, but don't seem worth the effort.
-
- <h3><a id="preforking" name="preforking">Appendix: The
- Pre-Forking Model</a></h3>
-
- <p>Apache (on Unix) is a <em>pre-forking</em> model server. The
- <em>parent</em> process is responsible only for forking
- <em>child</em> processes, it does not serve any requests or
- service any network sockets. The child processes actually
- process connections, they serve multiple connections (one at a
- time) before dying. The parent spawns new or kills off old
- children in response to changes in the load on the server (it
- does so by monitoring a scoreboard which the children keep up
- to date).</p>
-
- <p>This model for servers offers a robustness that other models
- do not. In particular, the parent code is very simple, and with
- a high degree of confidence the parent will continue to do its
- job without error. The children are complex, and when you add
- in third party code via modules, you risk segmentation faults
- and other forms of corruption. Even should such a thing happen,
- it only affects one connection and the server continues serving
- requests. The parent quickly replaces the dead child.</p>
-
- <p>Pre-forking is also very portable across dialects of Unix.
- Historically this has been an important goal for Apache, and it
- continues to remain so.</p>
-
- <p>The pre-forking model comes under criticism for various
- performance aspects. Of particular concern are the overhead of
- forking a process, the overhead of context switches between
- processes, and the memory overhead of having multiple
- processes. Furthermore it does not offer as many opportunities
- for data-caching between requests (such as a pool of
- <code>mmapped</code> files). Various other models exist and
- extensive analysis can be found in the <a
- href="http://www.cs.wustl.edu/~jxh/research/research.html">papers
- of the JAWS project</a>. In practice all of these costs vary
- drastically depending on the operating system.</p>
-
- <p>Apache's core code is already multithread aware, and Apache
- version 1.3 is multithreaded on NT. There have been at least
- two other experimental implementations of threaded Apache, one
- using the 1.3 code base on DCE, and one using a custom
- user-level threads package and the 1.0 code base; neither is
- publicly available. There is also an experimental port of
- Apache 1.3 to <a
- href="http://www.mozilla.org/docs/refList/refNSPR/">Netscape's
- Portable Run Time</a>, which <a
- href="http://www.arctic.org/~dgaudet/apache/2.0/">is
- available</a> (but you're encouraged to join the <a
- href="http://dev.apache.org/mailing-lists">new-httpd mailing
- list</a> if you intend to use it). Part of our redesign for
- version 2.0 of Apache will include abstractions of the server
- model so that we can continue to support the pre-forking model,
- and also support various threaded models.
- <!--#include virtual="footer.html" -->
- </p>
</body>
</html>