From: Marc Slemko
Date: Tue, 28 Jan 1997 04:23:08 +0000 (+0000)
Subject: Add more information on the FIN_WAIT_2 problem.
X-Git-Tag: APACHE_1_2b7~23
X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=9ec927d0c6542046bf721d8cca21f724f5d5e6ce;p=apache

Add more information on the FIN_WAIT_2 problem.

git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@77523 13f79535-47bb-0310-9956-ffa450edef68
---

diff --git a/docs/manual/misc/fin_wait_2.html b/docs/manual/misc/fin_wait_2.html
new file mode 100644
index 0000000000..d4aa20e478
--- /dev/null
+++ b/docs/manual/misc/fin_wait_2.html
@@ -0,0 +1,268 @@

Connections in FIN_WAIT_2 and Apache

Connections in the FIN_WAIT_2 state and Apache

What is the FIN_WAIT_2 state?

netstat

TCP RFC

The FIN_WAIT_2 state is somewhat unusual in that there is no timeout defined in the standard for it. This means that on many operating systems, a connection in the FIN_WAIT_2 state will stay around until the system is rebooted. If the system does not have a timeout and too many FIN_WAIT_2 connections build up, it can fill up the space allocated for storing information about the connections and crash the kernel. The connections in FIN_WAIT_2 do not tie up an httpd process.
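To see why a missing timeout matters, the close sequence for the side that closes first can be sketched as a small state table. The C fragment below is only an illustration, not kernel or Apache code: the enum and function names are hypothetical, and it deliberately omits CLOSING and the other simultaneous-close paths described in RFC 793. It shows that, in the standard state machine, the only event that moves a connection out of FIN_WAIT_2 is a FIN from the peer.

```c
/* Subset of the TCP states visited by the side that closes first. */
enum tcp_state { ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT, CLOSED };

/* Events relevant to this simplified active close (names are illustrative). */
enum tcp_event { APP_CLOSE, RECV_ACK_OF_FIN, RECV_FIN, TIMER_2MSL };

/* Return the next state; any other combination leaves the state unchanged. */
enum tcp_state next_state(enum tcp_state s, enum tcp_event e)
{
    switch (s) {
    case ESTABLISHED: return e == APP_CLOSE       ? FIN_WAIT_1 : s;
    case FIN_WAIT_1:  return e == RECV_ACK_OF_FIN ? FIN_WAIT_2 : s;
    case FIN_WAIT_2:  return e == RECV_FIN        ? TIME_WAIT  : s; /* no timer exit */
    case TIME_WAIT:   return e == TIMER_2MSL      ? CLOSED     : s;
    default:          return s;
    }
}
```

In this model TIME_WAIT is left when its timer expires, while FIN_WAIT_2 has no timer transition at all, which is why a peer that never sends its FIN can leave the connection there indefinitely.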


But why does it happen?


Buggy clients and persistent connections

persistent connections

+KeepAliveTimeout

The buggy client opens a new connection to the same or a different site, which causes it to fully close the connection.
The user exits the client which, on some (most?) clients, causes the OS to fully shut down the connection.
The FIN_WAIT_2 times out, on servers that have a timeout for this state.

If you are lucky, this means that the buggy client will fully close the connection and release the resources on your server. However, there are many cases, such as a dialup client disconnecting from its provider before closing the client, in which the connection remains open. This is a bug in the browser.

The clients on which this problem has been verified to exist:

Mozilla/3.01 (X11; I; FreeBSD 2.1.5-RELEASE i386)
Mozilla/2.02 (X11; I; FreeBSD 2.1.5-RELEASE i386)
Mozilla/3.01Gold (X11; I; SunOS 5.5 sun4m)
MSIE 3.01 on the Macintosh

It is expected that many other clients have the same problem.

Apache can NOT do anything to avoid this other than disabling persistent connections for all buggy clients, just like we recommend doing for Navigator 2.x clients due to other bugs in Navigator 2.x. As far as we know, this happens with all servers that support persistent connections, including Apache 1.1.x and 1.2.


Something is broken

lingering_close()

appendix

We have not yet tracked down the exact reason why lingering_close() causes problems. Its code has been thoroughly reviewed. It is possible there is some problem in the BSD TCP stack which is causing this. Unfortunately, we are not able to easily replicate the problem on test servers so it is difficult to debug. We are still working on the problem.


What can I do about it?

Add a timeout for FIN_WAIT_2

FreeBSD versions starting at 2.0 or possibly earlier.
NetBSD version 1.2(?)
OpenBSD all versions(?)
BSD/OS 2.1, with the K210-027 patch installed.
Solaris as of around version 2.2. The timeout may need tuning by using ndd to modify tcp_fin_wait_2_flush_interval.
SCO TCP/IP Release 1.2.1 can be modified to have a timeout by following SCO's instructions.
Linux 2.0.x and earlier(?)

The following systems are known to not have a timeout:

SunOS 4.x does not and almost certainly never will have one because it is at the very end of its development cycle for Sun. If you have kernel source, it should be easy to patch.
IRIX does not have a timeout and, according to our information, its vendor has stated that it will not add one unless it is specified in the RFC.

There is a patch available for adding a timeout to the FIN_WAIT_2 state; it was originally intended for BSD/OS, but should be adaptable to most systems using BSD networking code. You need kernel source code to be able to use it. If you do adapt it to work for any other systems, please drop me a note at marc@apache.org.

Compile without using `lingering_close()`

lingering_close()

To compile without the lingering_close() function, add -DNO_LINGCLOSE to the end of the EXTRA_CFLAGS line in your Configuration file, rerun Configure and rebuild the server.

Use `SO_LINGER` as an alternative to `lingering_close()`

SO_LINGER

setsockopt(2)

lingering_close()

lingering_close

To try it, add -DUSE_SO_LINGER to the end of the EXTRA_CFLAGS line in your Configuration file, rerun Configure and rebuild the server.

NOTE: Attempting to use SO_LINGER and lingering_close() at the same time is very likely to do very bad things, so don't.
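For reference, setting SO_LINGER through setsockopt(2) usually looks roughly like the sketch below. This is not the code Apache compiles in under -DUSE_SO_LINGER; the helper name enable_so_linger is hypothetical, and the 30-second value in the usage note is an assumption chosen only for illustration.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Ask the kernel to linger on close() for up to `seconds` seconds.
 * Returns 0 on success, or -1 with errno set by setsockopt(). */
int enable_so_linger(int fd, int seconds)
{
    struct linger li;

    li.l_onoff = 1;          /* turn lingering on */
    li.l_linger = seconds;   /* linger interval requested from the kernel */
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &li, sizeof(li));
}
```

A caller would invoke something like enable_so_linger(client_fd, 30) once on the accepted socket, before the final close(). How the kernel honors the interval varies by system, which is the portability caveat raised in the appendix.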


Increase the amount of memory used for storing connection state

BSD based networking code:

BSD stores network data such as connection states in something called an mbuf. When you get so many connections that the kernel does not have enough mbufs to put them all in, your kernel will likely crash. You can reduce the effects of the problem by increasing the number of mbufs that are available; this will not prevent the problem, it will just make the server go longer before crashing.

The exact way to increase them may depend on your OS; look for some reference to the number of "mbufs" or "mbuf clusters". On many systems, this can be done by adding the line NMBCLUSTERS="n", where n is the number of mbuf clusters you want, to your kernel config file and rebuilding your kernel.

Feedback

marc@apache.org


Appendix

Below is a message from Roy Fielding that details some of the reasons why a function with the functionality of lingering_close() is necessary.

+Date: Tue, 21 Jan 1997 01:15:38 -0800
+From: "Roy T. Fielding" <fielding@liege.ICS.UCI.EDU>
+Subject: Re: lingering_close() 
+
+Sorry, I thought everyone was up to speed on this problem (and I just
+managed to catch up on my apache mail, finally).  This is noted a couple
+times in the HTTP specs, but most of the discussion was between myself,
+Henrik, rst, and Dave Raggett in the hallways of MIT (which is why it
+doesn't appear in our archives).
+
+If a server closes the input side of the connection while the client
+is sending data (or is planning to send data), then the server's TCP
+stack will signal an RST (reset, not Robert) back to the client.  Upon
+receipt of the RST, the client will flush its own incoming TCP buffer
+back to the un-ACKed packet indicated by the RST packet argument.
+If the server has sent a message, usually an error response, to the
+client just before the close, and the client receives the RST packet
+before its application code has read the error message from its incoming
+TCP buffer, then the RST will flush the error message before the client
+application has a chance to see it, and thus the client is left thinking
+that the connection failed for no apparent reason.
+
+There are two conditions under which this is likely to occur:
+  1) sending POST or PUT data without proper authorization
+  2) sending multiple requests before each response (pipelining) 
+     and one of the middle requests resulting in an error or
+     other break-the-connection result.
+
+The solution in all cases is to send the response, close only the
+write half of the connection (what shutdown is supposed to do), and
+continue reading on the socket until it is either closed by the
+client (signifying it has finally read the response) or a timeout occurs.
+That is what the kernel is supposed to do if SO_LINGER is set.
+Unfortunately, SO_LINGER has no effect on some systems; on some other
+systems, it does not have its own timeout and thus the TCP memory
+segments just pile-up until the next reboot (planned or not).
+
+That is why rst coded-up a linger replacement.  As I recall, he said at
+the time that it needed further testing, which we never got around to
+doing.  From the descriptions I have read, it sounds like the lingering
+close code is doing something wrong when it is timed-out, since that
+is what happens if a client does not close its connection.
+
+Please note that simply removing the linger code will not solve the
+problem -- it only moves it to a different and much harder to detect one.
+
+.....Roy
+
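The approach described in the message above (send the response, close only the write half, then keep reading until the client closes or a timeout occurs) can be sketched in C as follows. This is a minimal illustration under stated assumptions, not Apache's lingering_close(): the function name linger_then_close, the 512-byte scratch buffer, and the return codes are hypothetical, the timeout applies to each wait rather than to the whole linger, and an interrupted select() is simply treated as an error.

```c
#include <sys/types.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Half-close fd, then discard input until the peer closes, an error
 * occurs, or no data arrives for timeout_secs seconds.
 * Returns 0 if the peer closed, 1 on timeout, -1 on error.
 * The descriptor is always closed before returning. */
int linger_then_close(int fd, int timeout_secs)
{
    char buf[512];
    int result = -1;

    /* Send our FIN but keep the read half open. */
    if (shutdown(fd, SHUT_WR) == 0) {
        for (;;) {
            fd_set rfds;
            struct timeval tv;
            ssize_t got;
            int n;

            FD_ZERO(&rfds);
            FD_SET(fd, &rfds);
            tv.tv_sec = timeout_secs;
            tv.tv_usec = 0;

            n = select(fd + 1, &rfds, NULL, NULL, &tv);
            if (n == 0) { result = 1; break; }   /* nothing arrived in time */
            if (n < 0)  { break; }               /* select() failed */

            got = read(fd, buf, sizeof(buf));
            if (got == 0) { result = 0; break; } /* peer sent its FIN */
            if (got < 0)  { break; }             /* read() failed */
            /* Otherwise discard the data and keep waiting. */
        }
    }
    close(fd);
    return result;
}
```

Draining the input instead of closing immediately avoids closing a socket with unread data pending, which is the situation the message identifies as triggering an RST. The timeout bounds how long a server process is held by a client that never closes.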

diff --git a/docs/manual/platform/perf-bsd44.html b/docs/manual/platform/perf-bsd44.html
index c22c982326..495579eca3 100644
--- a/docs/manual/platform/perf-bsd44.html
+++ b/docs/manual/platform/perf-bsd44.html
@@ -126,9 +126,8 @@
 connection ends up in the TIME_WAIT state for several minutes, during which
 time its mbufs are not yet freed. Another reason is that, on server timeouts,
 some connections end up in FIN_WAIT_2 state forever, because this state
 doesn't time out on the server, and the browser never sent
-a final FIN. An example patch for BSDI is available
-here.
+a final FIN. For more details see the
+FIN_WAIT_2 page.
