From: Sander Striker Date: Mon, 11 Mar 2002 12:03:44 +0000 (+0000) Subject: Add the "How filters work" section. Cut 'n paste job from Ryans X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=ba0b35828e1ed222efd08b5b9ebab1416f97ad55;p=apache Add the "How filters work" section. Cut 'n paste job from Ryans email <022501c1c529$f63a9550$7f00000a@KOJ>, needs formatting. Submitted by: Ryan Bloom git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@93839 13f79535-47bb-0310-9956-ffa450edef68 --- diff --git a/docs/manual/developer/filters.html b/docs/manual/developer/filters.html new file mode 100644 index 0000000000..b8ac610025 --- /dev/null +++ b/docs/manual/developer/filters.html @@ -0,0 +1,168 @@ + + + + + + + Request Processing in Apache 2.0 + + + + + + +

How filters work in Apache 2.0

+ +

Warning - this is a cut 'n paste job from an email: + <022501c1c529$f63a9550$7f00000a@KOJ>

+ +
+There are three basic filter types (each of these is actually broken
+down into two categories, but that comes later).
+
+CONNECTION:  Filters of this type are valid for the lifetime of this
+             connection.
+
+PROTOCOL:    Filters of this type are valid for the lifetime of this
+             request from the point of view of the client, this means
+             that the request is valid from the time that the request
+             is sent until the time that the response is received.
+
+RESOURCE:    Filters of this type are valid for the time that this
+             content is used to satisfy a request.  For simple
+             requests, this is identical to PROTOCOL, but internal redirects
+             and sub-requests can change the content without ending
+             the request.
+
+It is important to make the distinction between a protocol and a
+resource filter.  A resource filter is tied to a specific resource, it
+may also be tied to header information, but the main binding is to a
+resource.  If you are writing a filter and you want to know if it is
+resource or protocol, the correct question to ask is:  "Can this filter
+be removed if the request is redirected to a different resource?"  If
+the answer is yes, then it is a resource filter.  If it is no, then it
+is most likely a protocol or connection filter.  I won't go into
+connection filters, because they seem to be well understood.
+
+With this definition, a few examples might help:
+Byterange:  We have coded it to be inserted for all
+requests, and it is removed if not used.  Because this filter is active
+at the beginning of all requests, it can not be removed if it is
+redirected, so this is a protocol filter.
+
+http_header:  This filter actually writes the headers to the
+network.  This is obviously a required filter (except in the asis case
+which is special and will be dealt with below) and so it is a protocol
+filter.
+
+Deflate:  The administrator configures this filter based on
+which file has been requested.  If we do an internal redirect from an
+autoindex page to an index.html page, the deflate filter may be added or
+removed based on config, so this is a resource filter.
+
+The further breakdown of each category into two more filter types is
+strictly for ordering.  We could remove it, and only allow for one
+filter type, but the order would tend to be wrong, and we would need to
+hack things to make it work.  Currently, the RESOURCE filters only have
+one filter type, but that should change.
+
+How are filters inserted?
+This is actually rather simple in theory, but the code is
+complex.  First of all, it is important that everybody realize that
+there are three filter lists for each request, but they are all
+concatenated together.  So, the first list is r->output_filters, then
+r->proto_output_filters, and finally r->connection->output_filters.
+These correspond to the RESOURCE, PROTOCOL, and CONNECTION filters
+respectively.  The problem previously, was that we used a singly linked
+list to create the filter stack, and we started from the "correct"
+location.  This means that if I had a RESOURCE filter on the stack, and
+I added a CONNECTION filter, the CONNECTION filter would be ignored.
+This should make sense, because we would insert the connection filter at
+the top of the c->output_filters list, but the end of r->output_filters
+pointed to the filter that used to be at the front of c->output_filters.
+This is obviously wrong.  The new insertion code uses a doubly linked
+list.  This has the advantage that we never lose a filter that has been
+inserted.  Unfortunately, it comes with a separate set of headaches.
+
+The problem is that we have two different cases were we use subrequests.
+The first is to insert more data into a response.  The second is to
+replace the existing response with an internal redirect.  These are two
+different cases and need to be treated as such.
+
+In the first case, we are creating the subrequest from within a handler
+or filter.  This means that the next filter should be passed to
+make_sub_request function, and the last resource filter in the
+sub-request will point to the next filter in the main request.  This
+makes sense, because the sub-request's data needs to flow through the
+same set of filters as the main request.  A graphical representation
+might help:
+
+Default_handler --> includes_filter --> byterange --> content_length ->
+etc
+
+If the includes filter creates a sub request, then we don't want the
+data from that sub-request to go through the includes filter, because it
+might not be SSI data.  So, the subrequest adds the following:
+
+Default_handler --> includes_filter -/-> byterange --> content_length -> etc
+                                    /
+Default_handler --> sub_request_core
+
+What happens if the subrequest is SSI data?  Well, that's easy, the
+includes_filter is a resource filter, so it will be added to the sub
+request in between the Default_handler and the sub_request_core filter.
+
+The second case for sub-requests is when one sub-request is going to
+become the real request.  This happens whenever a sub-request is created
+outside of a handler or filter, and NULL is passed as the next filter to
+the make_sub_request function.
+
+In this case, the resource filters no longer make sense for the new
+request, because the resource has changed.  So, instead of starting from
+scratch, we simply point the front of the resource filters for the
+sub-request to the front of the protocol filters for the old request.
+This means that we won't lose any of the protocol filters, neither will
+we try to send this data through a filter that shouldn't see it.
+
+The problem is that we are using a doubly-linked list for our filter
+stacks now. But, you should notice that it is possible for two lists to
+intersect in this model.  So, you do you handle the previous pointer?
+This is a very difficult question to answer, because there is no "right"
+answer, either method is equally valid.  I looked at why we use the
+previous pointer.  The only reason for it is to allow for easier
+addition of new servers.  With that being said, the solution I chose was
+to make the previous pointer always stay on the original request.
+
+This causes some more complex logic, but it works for all cases.  My
+concern in having it move to the sub-request, is that for the more
+common case (where a sub-request is used to add data to a response), the
+main filter chain would be wrong.  That didn't seem like a good idea to
+me.
+
+asis:
+The final topic.  :-)  Mod_Asis is a bit of a hack, but the
+handler needs to remove all filters except for connection filters, and
+send the data.  If you are using mod_asis, all other bets are off.
+
+The absolutely last point is that the reason this code was so hard to
+get right, was because we had hacked so much to force it to work.  I
+wrote most of the hacks originally, so I am very much to blame.
+However, now that the code is right, I have started to remove some
+hacks.  Most people should have seen that the reset_filters and
+add_required_filters functions are gone.  Those inserted protocol level
+filters for error conditions, in fact, both functions did the same
+thing, one after the other, it was really strange.  Because we don't
+lose protocol filters for error cases any more, those hacks went away.
+The HTTP_HEADER, Content-length, and Byterange filters are all added in
+the insert_filters phase, because if they were added earlier, we had
+some interesting interactions.  Now, those could all be moved to be
+inserted with the HTTP_IN, CORE, and CORE_IN filters.  That would make
+the code easier to follow.
+
+ + + + + diff --git a/docs/manual/developer/index.html b/docs/manual/developer/index.html index c825f96948..f0555ecb8e 100644 --- a/docs/manual/developer/index.html +++ b/docs/manual/developer/index.html @@ -39,6 +39,8 @@ +
Apache Filters
+
Porting Apache 1.3 Modules