More changes to bring the performance tuning guide closer to compatibility

author Brian Pane <brianp@apache.org>

Sun, 16 Jun 2002 22:55:11 +0000 (22:55 +0000)

committer Brian Pane <brianp@apache.org>

Sun, 16 Jun 2002 22:55:11 +0000 (22:55 +0000)
diff --git a/docs/manual/misc/perf-tuning.html b/docs/manual/misc/perf-tuning.html

index f0864f75bfdfc00e95959fde53888fa881a05ffa..fbe23faef184214d5918c2889036a0c68fd3e0cb 100644 (file)
--- a/docs/manual/misc/perf-tuning.html
+++ b/docs/manual/misc/perf-tuning.html
@@ -41,8 +41,6 @@
          <ul>
            <li><a href="#trace">Detailed Analysis of a
            Trace</a></li>
-
-          <li><a href="#preforking">The Pre-Forking Model</a></li>
          </ul>
        </li>
      </ul>
@@ -705,358 +703,152 @@ DirectoryIndex index.cgi index.pl index.shtml index.html
  
      <h3><a id="trace" name="trace">Appendix: Detailed Analysis of a
      Trace</a></h3>
-    Here is a system call trace of Apache 1.3 running on Linux. The
-    run-time configuration file is essentially the default plus: 
+    <p>Here is a system call trace of Apache 2.0.38 with the worker MPM
+    on Solaris 8.  This trace was collected using:</p>
+<blockquote>
+<code>truss -l -p <i>httpd_child_pid</i></code>
+</blockquote>
+    <p>The <code>-l</code> option tells truss to log the ID of the
+    LWP (lightweight process--Solaris's form of kernel-level thread)
+    that invokes each system call.</p>
+
+    <p>Other systems may have different system call tracing utilities
+    such as <code>strace</code>, <code>ktrace</code>, or <code>par</code>.
+    They all produce similar output.</p>
+
+    <p>In this trace, a client has requested a 10KB static file
+    from the httpd.  Traces of non-static requests or requests
+    with content negotiation look wildly different (and quite ugly
+    in some cases).</p>
  
      <blockquote>
  <pre>
-&lt;Directory /&gt;
-    AllowOverride none
-    Options FollowSymLinks
-&lt;/Directory&gt;
+/67:    accept(3, 0x00200BEC, 0x00200C0C, 1) (sleeping...)
+/67:    accept(3, 0x00200BEC, 0x00200C0C, 1)            = 9
  </pre>
-    </blockquote>
-    The file being requested is a static 6K file of no particular
-    content. Traces of non-static requests or requests with content
-    negotiation look wildly different (and quite ugly in some
-    cases). First the entire trace, then we'll examine details.
-    (This was generated by the <code>strace</code> program, other
-    similar programs include <code>truss</code>,
-    <code>ktrace</code>, and <code>par</code>.) 
-
-    <blockquote>
+<blockquote>
+<p>In this trace, the listener thread is running within LWP #67.</p>
+<p>Note the lack of accept(2) serialization.  On this particular
+platform, the worker MPM uses an unserialized accept by default
+unless it is listening on multiple ports.</p>
+</blockquote>
  <pre>
-accept(15, {sin_family=AF_INET, sin_port=htons(22283), sin_addr=inet_addr("127.0.0.1")}, [16]) = 3
-flock(18, LOCK_UN)                      = 0
-sigaction(SIGUSR1, {SIG_IGN}, {0x8059954, [], SA_INTERRUPT}) = 0
-getsockname(3, {sin_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
-setsockopt(3, IPPROTO_TCP1, [1], 4)     = 0
-read(3, "GET /6k HTTP/1.0\r\nUser-Agent: "..., 4096) = 60
-sigaction(SIGUSR1, {SIG_IGN}, {SIG_IGN}) = 0
-time(NULL)                              = 873959960
-gettimeofday({873959960, 404935}, NULL) = 0
-stat("/home/dgaudet/ap/apachen/htdocs/6k", {st_mode=S_IFREG|0644, st_size=6144, ...}) = 0
-open("/home/dgaudet/ap/apachen/htdocs/6k", O_RDONLY) = 4
-mmap(0, 6144, PROT_READ, MAP_PRIVATE, 4, 0) = 0x400ee000
-writev(3, [{"HTTP/1.1 200 OK\r\nDate: Thu, 11"..., 245}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 6144}], 2) = 6389
-close(4)                                = 0
-time(NULL)                              = 873959960
-write(17, "127.0.0.1 - - [10/Sep/1997:23:39"..., 71) = 71
-gettimeofday({873959960, 417742}, NULL) = 0
-times({tms_utime=5, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 446747
-shutdown(3, 1 /* send */)               = 0
-oldselect(4, [3], NULL, [3], {2, 0})    = 1 (in [3], left {2, 0})
-read(3, "", 2048)                       = 0
-close(3)                                = 0
-sigaction(SIGUSR1, {0x8059954, [], SA_INTERRUPT}, {SIG_IGN}) = 0
-munmap(0x400ee000, 6144)                = 0
-flock(18, LOCK_EX)                      = 0
+/65:    lwp_park(0x00000000, 0)                         = 0
+/67:    lwp_unpark(65, 1)                               = 0
  </pre>
-    </blockquote>
-
-    <p>Notice the accept serialization:</p>
-
-    <blockquote>
+<blockquote>
+<p>Upon accepting the connection, the listener thread wakes up
+a worker thread to do the request processing.  In this trace,
+the worker thread that handles the request is mapped to LWP #65.</p>
+</blockquote>
  <pre>
-flock(18, LOCK_UN)                      = 0
-...
-flock(18, LOCK_EX)                      = 0
+/65:    getsockname(9, 0x00200BA4, 0x00200BC4, 1)       = 0
  </pre>
-    </blockquote>
-    These two calls can be removed by defining
-    <code>SINGLE_LISTEN_UNSERIALIZED_ACCEPT</code> as described
-    earlier. 
-
-    <p>Notice the <code>SIGUSR1</code> manipulation:</p>
-
-    <blockquote>
+<blockquote>
+<p>In order to implement virtual hosts, Apache needs to know
+the local socket address used to accept the connection.  It
+is possible to eliminate this call in many situations (such
+as when there are no virtual hosts, or when <code>Listen</code>
+directives are used which do not have wildcard addresses). But
+no effort has yet been made to do these optimizations. </p>
+</blockquote>
  <pre>
-sigaction(SIGUSR1, {SIG_IGN}, {0x8059954, [], SA_INTERRUPT}) = 0
-...
-sigaction(SIGUSR1, {SIG_IGN}, {SIG_IGN}) = 0
-...
-sigaction(SIGUSR1, {0x8059954, [], SA_INTERRUPT}, {SIG_IGN}) = 0
+/65:    brk(0x002170E8)                                 = 0
+/65:    brk(0x002190E8)                                 = 0
  </pre>
-    </blockquote>
-    This is caused by the implementation of graceful restarts. When
-    the parent receives a <code>SIGUSR1</code> it sends a
-    <code>SIGUSR1</code> to all of its children (and it also
-    increments a "generation counter" in shared memory). Any
-    children that are idle (between connections) will immediately
-    die off when they receive the signal. Any children that are in
-    keep-alive connections, but are in between requests will die
-    off immediately. But any children that have a connection and
-    are still waiting for the first request will not die off
-    immediately. 
-
-    <p>To see why this is necessary, consider how a browser reacts
-    to a closed connection. If the connection was a keep-alive
-    connection and the request being serviced was not the first
-    request then the browser will quietly reissue the request on a
-    new connection. It has to do this because the server is always
-    free to close a keep-alive connection in between requests
-    (<em>i.e.</em>, due to a timeout or because of a maximum number
-    of requests). But, if the connection is closed before the first
-    response has been received the typical browser will display a
-    "document contains no data" dialogue (or a broken image icon).
-    This is done on the assumption that the server is broken in
-    some way (or maybe too overloaded to respond at all). So Apache
-    tries to avoid ever deliberately closing the connection before
-    it has sent a single response. This is the cause of those
-    <code>SIGUSR1</code> manipulations.</p>
-
-    <p>Note that it is theoretically possible to eliminate all
-    three of these calls. But in rough tests the gain proved to be
-    almost unnoticeable.</p>
-
-    <p>In order to implement virtual hosts, Apache needs to know
-    the local socket address used to accept the connection:</p>
-
-    <blockquote>
+<blockquote>
+<p>The brk(2) calls allocate memory from the heap.  It is rare
+to see these in a system call trace, because the httpd uses
+custom memory allocators (<code>apr_pool</code> and
+<code>apr_bucket_alloc</code>) for most request processing.
+In this trace, the httpd has just been started, so it must
+call malloc(3) to get the blocks of raw memory with which
+to create the custom memory allocators.</p>
+</blockquote>
  <pre>
-getsockname(3, {sin_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
+/65:    fcntl(9, F_GETFL, 0x00000000)                   = 2
+/65:    fstat64(9, 0xFAF7B818)                          = 0
+/65:    getsockopt(9, 65535, 8192, 0xFAF7B918, 0xFAF7B910, 2190656) = 0
+/65:    fstat64(9, 0xFAF7B818)                          = 0
+/65:    getsockopt(9, 65535, 8192, 0xFAF7B918, 0xFAF7B914, 2190656) = 0
+/65:    setsockopt(9, 65535, 8192, 0xFAF7B918, 4, 2190656) = 0
+/65:    fcntl(9, F_SETFL, 0x00000082)                   = 0
  </pre>
-    </blockquote>
-    It is possible to eliminate this call in many situations (such
-    as when there are no virtual hosts, or when <code>Listen</code>
-    directives are used which do not have wildcard addresses). But
-    no effort has yet been made to do these optimizations. 
-
-    <p>Apache turns off the Nagle algorithm:</p>
-
-    <blockquote>
-<pre>
-setsockopt(3, IPPROTO_TCP1, [1], 4)     = 0
-</pre>
-    </blockquote>
-    because of problems described in <a
-    href="http://www.isi.edu/~johnh/PAPERS/Heidemann97a.html">a
-    paper by John Heidemann</a>. 
-
-    <p>Notice the two <code>time</code> calls:</p>
-
-    <blockquote>
+<blockquote>
+<p>Next, the worker thread puts the connection to the client (file
+descriptor 9) in non-blocking mode.  The setsockopt(2) and getsockopt(2)
+calls are a side-effect of how Solaris's libc handles fcntl(2) on sockets.</p>
+</blockquote>
  <pre>
-time(NULL)                              = 873959960
-...
-time(NULL)                              = 873959960
+/65:    read(9, " G E T   / 1 0 k . h t m".., 8000)     = 97
  </pre>
-    </blockquote>
-    One of these occurs at the beginning of the request, and the
-    other occurs as a result of writing the log. At least one of
-    these is required to properly implement the HTTP protocol. The
-    second occurs because the Common Log Format dictates that the
-    log record include a timestamp of the end of the request. A
-    custom logging module could eliminate one of the calls. Or you
-    can use a method which moves the time into shared memory, see
-    the <a href="#patches">patches section below</a>. 
-
-    <p>As described earlier, <code>ExtendedStatus On</code> causes
-    two <code>gettimeofday</code> calls and a call to
-    <code>times</code>:</p>
-
-    <blockquote>
+<blockquote>
+<p>The worker thread reads the request from the client.</p>
+</blockquote>
  <pre>
-gettimeofday({873959960, 404935}, NULL) = 0
-...
-gettimeofday({873959960, 417742}, NULL) = 0
-times({tms_utime=5, tms_stime=0, tms_cutime=0, tms_cstime=0}) = 446747
+/65:    stat("/var/httpd/apache/httpd-8999/htdocs/10k.html", 0xFAF7B978) = 0
+/65:    open("/var/httpd/apache/httpd-8999/htdocs/10k.html", O_RDONLY) = 10
  </pre>
-    </blockquote>
-    These can be removed by setting <code>ExtendedStatus Off</code>
-    (which is the default). 
-
-    <p>It might seem odd to call <code>stat</code>:</p>
-
-    <blockquote>
+<blockquote>
+<p>This httpd has been configured with <code>Options FollowSymLinks</code>
+and <code>AllowOverride None</code>.  Thus it doesn't need to lstat(2)
+each directory in the path leading up to the requested file, nor
+check for <code>.htaccess</code> files.  It simply calls stat(2) to
+verify that the file: 1) exists, and 2) is a regular file, not a
+directory.</p>
+</blockquote>
  <pre>
-stat("/home/dgaudet/ap/apachen/htdocs/6k", {st_mode=S_IFREG|0644, st_size=6144, ...}) = 0
+/65:    sendfilev(0, 9, 0x00200F90, 2, 0xFAF7B53C)      = 10269
  </pre>
-    </blockquote>
-    This is part of the algorithm which calculates the
-    <code>PATH_INFO</code> for use by CGIs. In fact if the request
-    had been for the URI <code>/cgi-bin/printenv/foobar</code> then
-    there would be two calls to <code>stat</code>. The first for
-    <code>/home/dgaudet/ap/apachen/cgi-bin/printenv/foobar</code>
-    which does not exist, and the second for
-    <code>/home/dgaudet/ap/apachen/cgi-bin/printenv</code>, which
-    does exist. Regardless, at least one <code>stat</code> call is
-    necessary when serving static files because the file size and
-    modification times are used to generate HTTP headers (such as
-    <code>Content-Length</code>, <code>Last-Modified</code>) and
-    implement protocol features (such as
-    <code>If-Modified-Since</code>). A somewhat more clever server
-    could avoid the <code>stat</code> when serving non-static
-    files, however doing so in Apache is very difficult given the
-    modular structure. 
-
-    <p>All static files are served using <code>mmap</code>:</p>
-
-    <blockquote>
+<blockquote>
+<p>In this example, the httpd is able to send the HTTP response
+header and the requested file with a single sendfilev(2) system call.
+Sendfile semantics vary among operating systems.  On some other
+systems, it is necessary to do a write(2) or writev(2) call to
+send the headers before calling sendfile(2).</p>
+</blockquote>
  <pre>
-mmap(0, 6144, PROT_READ, MAP_PRIVATE, 4, 0) = 0x400ee000
-...
-munmap(0x400ee000, 6144)                = 0
+/65:    write(4, " 1 2 7 . 0 . 0 . 1   -  ".., 78)      = 78
  </pre>
-    </blockquote>
-    On some architectures it's slower to <code>mmap</code> small
-    files than it is to simply <code>read</code> them. The define
-    <code>MMAP_THRESHOLD</code> can be set to the minimum size
-    required before using <code>mmap</code>. By default it's set to
-    0 (except on SunOS4 where experimentation has shown 8192 to be
-    a better value). Using a tool such as <a
-    href="http://www.bitmover.com/lmbench/">lmbench</a> you can
-    determine the optimal setting for your environment. 
-
-    <p>You may also wish to experiment with
-    <code>MMAP_SEGMENT_SIZE</code> (default 32768) which determines
-    the maximum number of bytes that will be written at a time from
-    mmap()d files. Apache only resets the client's
-    <code>Timeout</code> in between write()s. So setting this large
-    may lock out low bandwidth clients unless you also increase the
-    <code>Timeout</code>.</p>
-
-    <p>It may even be the case that <code>mmap</code> isn't used on
-    your architecture; if so then defining
-    <code>USE_MMAP_FILES</code> and <code>HAVE_MMAP</code> might
-    work (if it works then report back to us).</p>
-
-    <p>Apache does its best to avoid copying bytes around in
-    memory. The first write of any request typically is turned into
-    a <code>writev</code> which combines both the headers and the
-    first hunk of data:</p>
-
-    <blockquote>
+<blockquote>
+<p>This write(2) call records the request in the access log.
+Note that one thing missing from this trace is a time(2) call.
+Unlike Apache 1.3, Apache 2.0 uses gettimeofday(3) to look up
+the time.  On some operating systems, like Linux or Solaris,
+gettimeofday has an optimized implementation that doesn't require
+as much overhead as a typical system call.</p>
+</blockquote>
  <pre>
-writev(3, [{"HTTP/1.1 200 OK\r\nDate: Thu, 11"..., 245}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 6144}], 2) = 6389
+/65:    shutdown(9, 1, 1)                               = 0
+/65:    poll(0xFAF7B980, 1, 2000)                       = 1
+/65:    read(9, 0xFAF7BC20, 512)                        = 0
+/65:    close(9)                                        = 0
  </pre>
-    </blockquote>
-    When doing HTTP/1.1 chunked encoding Apache will generate up to
-    four element <code>writev</code>s. The goal is to push the byte
-    copying into the kernel, where it typically has to happen
-    anyhow (to assemble network packets). On testing, various
-    Unixes (BSDI 2.x, Solaris 2.5, Linux 2.0.31+) properly combine
-    the elements into network packets. Pre-2.0.31 Linux will not
-    combine, and will create a packet for each element, so
-    upgrading is a good idea. Defining <code>NO_WRITEV</code> will
-    disable this combining, but result in very poor chunked
-    encoding performance. 
-
-    <p>The log write:</p>
-
-    <blockquote>
+<blockquote>
+<p>The worker thread does a lingering close of the connection.</p>
+</blockquote>
  <pre>
-write(17, "127.0.0.1 - - [10/Sep/1997:23:39"..., 71) = 71
+/65:    close(10)                                       = 0
+/65:    lwp_park(0x00000000, 0)         (sleeping...)
  </pre>
-    </blockquote>
-    can be deferred by defining <code>BUFFERED_LOGS</code>. In this
-    case up to <code>PIPE_BUF</code> bytes (a POSIX defined
-    constant) of log entries are buffered before writing. At no
-    time does it split a log entry across a <code>PIPE_BUF</code>
-    boundary because those writes may not be atomic.
-    (<em>i.e.</em>, entries from multiple children could become
-    mixed together). The code does its best to flush this buffer
-    when a child dies. 
-
-    <p>The lingering close code causes four system calls:</p>
-
-    <blockquote>
+<blockquote>
+<p>Finally the worker thread closes the file that it has just delivered
+and blocks until the listener assigns it another connection.</p>
+</blockquote>
  <pre>
-shutdown(3, 1 /* send */)               = 0
-oldselect(4, [3], NULL, [3], {2, 0})    = 1 (in [3], left {2, 0})
-read(3, "", 2048)                       = 0
-close(3)                                = 0
+/67:    accept(3, 0x001FEB74, 0x001FEB94, 1) (sleeping...)
  </pre>
+<blockquote>
+<p>Meanwhile, the listener thread is able to accept another connection
+as soon as it has dispatched this connection to a worker thread (subject
+to some flow-control logic in the worker MPM that throttles the listener
+if all the available workers are busy).  Though it isn't apparent from
+this trace, the next accept(2) can (and usually does, under high load
+conditions) occur in parallel with the worker thread's handling of the
+just-accepted connection.</p>
+</blockquote>
      </blockquote>
-    which were described earlier. 
  
-    <p>Let's apply some of these optimizations:
-    <code>-DSINGLE_LISTEN_UNSERIALIZED_ACCEPT
-    -DBUFFERED_LOGS</code> and <code>ExtendedStatus Off</code>.
-    Here's the final trace:</p>
-
-    <blockquote>
-<pre>
-accept(15, {sin_family=AF_INET, sin_port=htons(22286), sin_addr=inet_addr("127.0.0.1")}, [16]) = 3
-sigaction(SIGUSR1, {SIG_IGN}, {0x8058c98, [], SA_INTERRUPT}) = 0
-getsockname(3, {sin_family=AF_INET, sin_port=htons(8080), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
-setsockopt(3, IPPROTO_TCP1, [1], 4)     = 0
-read(3, "GET /6k HTTP/1.0\r\nUser-Agent: "..., 4096) = 60
-sigaction(SIGUSR1, {SIG_IGN}, {SIG_IGN}) = 0
-time(NULL)                              = 873961916
-stat("/home/dgaudet/ap/apachen/htdocs/6k", {st_mode=S_IFREG|0644, st_size=6144, ...}) = 0
-open("/home/dgaudet/ap/apachen/htdocs/6k", O_RDONLY) = 4
-mmap(0, 6144, PROT_READ, MAP_PRIVATE, 4, 0) = 0x400e3000
-writev(3, [{"HTTP/1.1 200 OK\r\nDate: Thu, 11"..., 245}, {"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 6144}], 2) = 6389
-close(4)                                = 0
-time(NULL)                              = 873961916
-shutdown(3, 1 /* send */)               = 0
-oldselect(4, [3], NULL, [3], {2, 0})    = 1 (in [3], left {2, 0})
-read(3, "", 2048)                       = 0
-close(3)                                = 0
-sigaction(SIGUSR1, {0x8058c98, [], SA_INTERRUPT}, {SIG_IGN}) = 0
-munmap(0x400e3000, 6144)                = 0
-</pre>
-    </blockquote>
-    That's 19 system calls, of which 4 remain relatively easy to
-    remove, but don't seem worth the effort. 
-
-    <h3><a id="preforking" name="preforking">Appendix: The
-    Pre-Forking Model</a></h3>
-
-    <p>Apache (on Unix) is a <em>pre-forking</em> model server. The
-    <em>parent</em> process is responsible only for forking
-    <em>child</em> processes, it does not serve any requests or
-    service any network sockets. The child processes actually
-    process connections, they serve multiple connections (one at a
-    time) before dying. The parent spawns new or kills off old
-    children in response to changes in the load on the server (it
-    does so by monitoring a scoreboard which the children keep up
-    to date).</p>
-
-    <p>This model for servers offers a robustness that other models
-    do not. In particular, the parent code is very simple, and with
-    a high degree of confidence the parent will continue to do its
-    job without error. The children are complex, and when you add
-    in third party code via modules, you risk segmentation faults
-    and other forms of corruption. Even should such a thing happen,
-    it only affects one connection and the server continues serving
-    requests. The parent quickly replaces the dead child.</p>
-
-    <p>Pre-forking is also very portable across dialects of Unix.
-    Historically this has been an important goal for Apache, and it
-    continues to remain so.</p>
-
-    <p>The pre-forking model comes under criticism for various
-    performance aspects. Of particular concern are the overhead of
-    forking a process, the overhead of context switches between
-    processes, and the memory overhead of having multiple
-    processes. Furthermore it does not offer as many opportunities
-    for data-caching between requests (such as a pool of
-    <code>mmapped</code> files). Various other models exist and
-    extensive analysis can be found in the <a
-    href="http://www.cs.wustl.edu/~jxh/research/research.html">papers
-    of the JAWS project</a>. In practice all of these costs vary
-    drastically depending on the operating system.</p>
-
-    <p>Apache's core code is already multithread aware, and Apache
-    version 1.3 is multithreaded on NT. There have been at least
-    two other experimental implementations of threaded Apache, one
-    using the 1.3 code base on DCE, and one using a custom
-    user-level threads package and the 1.0 code base; neither is
-    publicly available. There is also an experimental port of
-    Apache 1.3 to <a
-    href="http://www.mozilla.org/docs/refList/refNSPR/">Netscape's
-    Portable Run Time</a>, which <a
-    href="http://www.arctic.org/~dgaudet/apache/2.0/">is
-    available</a> (but you're encouraged to join the <a
-    href="http://dev.apache.org/mailing-lists">new-httpd mailing
-    list</a> if you intend to use it). Part of our redesign for
-    version 2.0 of Apache will include abstractions of the server
-    model so that we can continue to support the pre-forking model,
-    and also support various threaded models. 
-    <!--#include virtual="footer.html" -->
-    </p>
    </body>
  </html>
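
The lingering close in the new trace (shutdown(9, 1, 1), poll(..., 1, 2000), a 512-byte read(), then close(9)) can be illustrated with a short Python sketch. This is not httpd's code: the function name `lingering_close` and its parameters are hypothetical, and only the 2000 ms timeout and 512-byte buffer come from the trace. It also simplifies by restarting the timeout on every loop iteration, and it relies on `select.poll`, which is not available on Windows.

```python
import select
import socket


def lingering_close(conn: socket.socket, timeout_ms: int = 2000,
                    bufsize: int = 512) -> int:
    """Half-close conn, discard what the peer still sends, then close.

    Returns the number of bytes discarded while draining.
    """
    drained = 0
    try:
        # Like shutdown(fd, 1): stop sending, keep the read side open.
        conn.shutdown(socket.SHUT_WR)
        poller = select.poll()
        poller.register(conn.fileno(), select.POLLIN)
        # Wait up to timeout_ms for the peer to send data or close.
        while poller.poll(timeout_ms):
            data = conn.recv(bufsize)
            if not data:  # zero-byte read: the peer has closed its side
                break
            drained += len(data)
    except OSError:
        # The peer may already have reset the connection; just close.
        pass
    finally:
        conn.close()
    return drained
```

In the trace the read() returns 0 right away because the client has already closed its side, so the loop body would run once. Half-closing first lets the client receive the complete response before the socket goes away.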