<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE manualpage SYSTEM "../style/manualpage.dtd">
<?xml-stylesheet type="text/xsl" href="../style/manual.en.xsl"?>
<!-- $LastChangedRevision$ -->

<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<manualpage metafile="filters.xml.meta">
<parentdocument href="./">Developer Documentation</parentdocument>

<title>How filters work in Apache 2.0</title>

<summary>
    <note type="warning"><title>Warning</title>
      <p>This is a cut 'n paste job from an email
      (&lt;022501c1c529$f63a9550$7f00000a@KOJ&gt;) and only reformatted for
      better readability. It's not up to date but may be a good start for
      further research.</p>
    </note>
</summary>

<section id="types"><title>Filter Types</title>
    <p>There are three basic filter types (each of these is actually broken
    down into two categories, but that comes later).</p>

    <dl>
    <dt><code>CONNECTION</code></dt>
    <dd>Filters of this type are valid for the lifetime of this connection.
    (<code>AP_FTYPE_CONNECTION</code>, <code>AP_FTYPE_NETWORK</code>)</dd>

    <dt><code>PROTOCOL</code></dt>
    <dd>Filters of this type are valid for the lifetime of this request from
    the point of view of the client. That is, the request is valid from the
    time that the request is sent until the time that the response is
    received. (<code>AP_FTYPE_PROTOCOL</code>,
    <code>AP_FTYPE_TRANSCODE</code>)</dd>

    <dt><code>RESOURCE</code></dt>
    <dd>Filters of this type are valid for the time that this content is used
    to satisfy a request.  For simple requests, this is identical to
    <code>PROTOCOL</code>, but internal redirects and sub-requests can change
    the content without ending the request. (<code>AP_FTYPE_RESOURCE</code>,
    <code>AP_FTYPE_CONTENT_SET</code>)</dd>
    </dl>

    <p>It is important to make the distinction between a protocol and a
    resource filter.  A resource filter is tied to a specific resource. It
    may also be tied to header information, but the main binding is to a
    resource.  If you are writing a filter and you want to know whether it
    is a resource or a protocol filter, the correct question to ask is:
    "Can this filter be removed if the request is redirected to a different
    resource?"  If the answer is yes, then it is a resource filter.  If it
    is no, then it is most likely a protocol or connection filter.  I won't
    go into connection filters, because they seem to be well understood.
    With this definition, a few examples might help:</p>

    <dl>
    <dt>Byterange</dt>
    <dd>We have coded it to be inserted for all requests, and it is removed
    if not used.  Because this filter is active at the beginning of all
    requests, it cannot be removed if the request is redirected, so this is
    a protocol filter.</dd>

    <dt>http_header</dt>
    <dd>This filter actually writes the headers to the network.  This is
    obviously a required filter (except in the asis case which is special
    and will be dealt with below) and so it is a protocol filter.</dd>

    <dt>Deflate</dt>
    <dd>The administrator configures this filter based on which file has been
    requested.  If we do an internal redirect from an autoindex page to an
    index.html page, the deflate filter may be added or removed based on
    config, so this is a resource filter.</dd>
    </dl>

    <p>The further breakdown of each category into two more filter types is
    strictly for ordering.  We could remove it, and only allow for one
    filter type, but the order would tend to be wrong, and we would need to
    hack things to make it work.  Currently, the <code>RESOURCE</code> filters
    only have one filter type, but that should change.</p>
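
    <p>A minimal C sketch of how a type can double as an ordering key. This
    is a toy model, not the httpd implementation: <code>struct filter</code>,
    the <code>FT_*</code> ranks and <code>insert_by_type</code> are
    hypothetical names made up for illustration. It assumes that each type
    carries a numeric rank and that a new filter is placed in front of the
    first filter whose rank is not lower than its own.</p>

<example>
<pre><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Illustrative ranks only: lower values sit earlier in the output chain. */
enum ftype {
    FT_RESOURCE    = 10,
    FT_CONTENT_SET = 20,
    FT_PROTOCOL    = 30,
    FT_TRANSCODE   = 40,
    FT_CONNECTION  = 50,
    FT_NETWORK     = 60
};

/* Hypothetical filter node; not the real ap_filter_t. */
struct filter {
    const char *name;
    enum ftype type;
    struct filter *next;
};

/* Insert f so that the chain stays sorted by type. A new filter goes in
 * front of existing filters of the same type. Returns the new head. */
struct filter *insert_by_type(struct filter *head, struct filter *f)
{
    struct filter **link = &head;

    assert(f != NULL);
    /* Skip every filter that must run before f. */
    while (*link != NULL && (*link)->type < f->type)
        link = &(*link)->next;
    f->next = *link;
    *link = f;
    return head;
}
```
]]></pre>
</example>

    <p>With this model, a content-set filter, a network filter and a
    protocol filter end up in that relative order (content set, protocol,
    network) no matter which one is inserted first. With a single type per
    category there would be no key to tell, say, a transcoding filter from
    the header filter.</p>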
</section>

<section id="howinserted"><title>How are filters inserted?</title>
    <p>This is actually rather simple in theory, but the code is
    complex.  First of all, it is important that everybody realize that
    there are three filter lists for each request, but they are all
    concatenated together.  So, the first list is
    <code>r->output_filters</code>, then <code>r->proto_output_filters</code>,
    and finally <code>r->connection->output_filters</code>. These correspond
    to the <code>RESOURCE</code>, <code>PROTOCOL</code>, and
    <code>CONNECTION</code> filters respectively. The problem previously was
    that we used a singly linked list to create the filter stack, and we
    started from the "correct" location.  This means that if I had a
    <code>RESOURCE</code> filter on the stack, and I added a
    <code>CONNECTION</code> filter, the <code>CONNECTION</code> filter would
    be ignored. This should make sense, because we would insert the connection
    filter at the top of the <code>c->output_filters</code> list, but the end
    of <code>r->output_filters</code> pointed to the filter that used to be
    at the front of <code>c->output_filters</code>. This is obviously wrong.
    The new insertion code uses a doubly linked list. This has the advantage
    that we never lose a filter that has been inserted. Unfortunately, it comes
    with a separate set of headaches.</p>
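
    <p>The three list heads and the old singly linked problem can be
    modelled in a few lines of C. This is only a sketch: the structs borrow
    the field names used above, but they are simplified stand-ins, and
    <code>build_chain</code>, <code>chain_length</code> and
    <code>push_front_singly</code> are hypothetical helpers, not httpd
    functions.</p>

<example>
<pre><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the filter, connection and request records. */
struct filter {
    const char *name;
    struct filter *next;
};

struct conn {
    struct filter *output_filters;        /* CONNECTION filters */
};

struct request {
    struct filter *output_filters;        /* RESOURCE filters come first */
    struct filter *proto_output_filters;  /* then the PROTOCOL filters */
    struct conn *connection;
};

/* Count the filters reachable from start by following next pointers. */
size_t chain_length(const struct filter *start)
{
    size_t n = 0;
    for (; start != NULL; start = start->next)
        n++;
    return n;
}

/* Build resource -> protocol -> connection as one chain, with each list
 * head pointing at the first filter of its own kind. */
void build_chain(struct request *r, struct conn *c,
                 struct filter *resource, struct filter *protocol,
                 struct filter *connection)
{
    assert(r && c && resource && protocol && connection);
    resource->next = protocol;
    protocol->next = connection;
    connection->next = NULL;
    c->output_filters = connection;
    r->connection = c;
    r->proto_output_filters = protocol;
    r->output_filters = resource;
}

/* The old behaviour: pushing onto the front of the connection list moves
 * only the list head. The protocol filter still points at the old front,
 * so the request's chain never reaches the new filter. */
void push_front_singly(struct conn *c, struct filter *f)
{
    assert(c != NULL && f != NULL);
    f->next = c->output_filters;
    c->output_filters = f;
}
```
]]></pre>
</example>

    <p>After <code>build_chain</code>, walking from
    <code>r->output_filters</code> visits all three filters. After
    <code>push_front_singly</code>, the connection list holds two filters,
    but a walk from <code>r->output_filters</code> still visits only the
    original three, which is the "ignored" connection filter described
    above.</p>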

    <p>The problem is that we have two different cases where we use
    subrequests. The first is to insert more data into a response. The
    second is to replace the existing response with an internal redirect.
    These are two different cases and need to be treated as such.</p>

    <p>In the first case, we are creating the subrequest from within a handler
    or filter.  This means that the next filter should be passed to the
    <code>make_sub_request</code> function, and the last resource filter in
    the sub-request will point to the next filter in the main request.  This
    makes sense, because the sub-request's data needs to flow through the
    same set of filters as the main request.  A graphical representation
    might help:</p>

<example>
<pre>
Default_handler --> includes_filter --> byterange --> ...
</pre>
</example>

    <p>If the includes filter creates a sub request, then we don't want the
    data from that sub-request to go through the includes filter, because it
    might not be SSI data.  So, the subrequest adds the following:</p>

<example>
<pre>
Default_handler --> includes_filter -/-> byterange --> ...
                                    /
Default_handler --> sub_request_core
</pre>
</example>

    <p>What happens if the subrequest is SSI data?  Well, that's easy: the
    <code>includes_filter</code> is a resource filter, so it will be added to
    the sub request in between the <code>Default_handler</code> and the
    <code>sub_request_core</code> filter.</p>
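
    <p>The first sub-request case can be sketched in C as follows. The
    structs are simplified stand-ins, and
    <code>sketch_make_sub_request</code> and
    <code>sketch_add_resource_filter</code> are hypothetical names. They
    model only the pointer wiring described above, not the real
    <code>make_sub_request</code> function.</p>

<example>
<pre><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; not the real ap_filter_t or request_rec. */
struct filter {
    const char *name;
    struct filter *next;
};

struct request {
    struct filter *output_filters;
};

/* First case: the caller passes next, the filter that follows it in the
 * main request. The sub-request chain ends in sub_request_core, which
 * hands its data on to next. */
void sketch_make_sub_request(struct request *sub,
                             struct filter *sub_request_core,
                             struct filter *next)
{
    assert(sub != NULL && sub_request_core != NULL && next != NULL);
    sub_request_core->next = next;
    sub->output_filters = sub_request_core;
}

/* Add a resource filter at the front of a request's own chain, so that it
 * sits between the handler and whatever was there before. */
void sketch_add_resource_filter(struct request *r, struct filter *f)
{
    assert(r != NULL && f != NULL);
    f->next = r->output_filters;
    r->output_filters = f;
}
```
]]></pre>
</example>

    <p>If the includes filter passes its own <code>next</code> pointer
    (byterange in the diagram), the sub-request's data skips the main
    request's includes filter. If the sub-request turns out to be SSI data,
    a second includes filter added with
    <code>sketch_add_resource_filter</code> lands in front of
    <code>sub_request_core</code>, and the main request's chain is left
    untouched.</p>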

    <p>The second case for sub-requests is when one sub-request is going to
    become the real request.  This happens whenever a sub-request is created
    outside of a handler or filter, and NULL is passed as the next filter to
    the <code>make_sub_request</code> function.</p>

    <p>In this case, the resource filters no longer make sense for the new
    request, because the resource has changed.  So, instead of starting from
    scratch, we simply point the front of the resource filters for the
    sub-request to the front of the protocol filters for the old request.
    This means that we won't lose any of the protocol filters, nor will
    we try to send this data through a filter that shouldn't see it.</p>
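
    <p>A sketch of the second case in C, again with simplified stand-in
    structs. <code>sketch_redirect_filters</code> and
    <code>chain_contains</code> are hypothetical helpers that model only
    the "point the front of the resource filters at the protocol filters"
    step described above.</p>

<example>
<pre><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; not the real ap_filter_t or request_rec. */
struct filter {
    const char *name;
    struct filter *next;
};

struct request {
    struct filter *output_filters;        /* front of the RESOURCE filters */
    struct filter *proto_output_filters;  /* front of the PROTOCOL filters */
};

/* Second case: the new request keeps the old protocol filters and starts
 * with no resource filters of its own. */
void sketch_redirect_filters(struct request *new_r,
                             const struct request *old_r)
{
    assert(new_r != NULL && old_r != NULL);
    new_r->proto_output_filters = old_r->proto_output_filters;
    new_r->output_filters = old_r->proto_output_filters;
}

/* Return 1 if target is reachable from start, otherwise 0. */
int chain_contains(const struct filter *start, const struct filter *target)
{
    for (; start != NULL; start = start->next)
        if (start == target)
            return 1;
    return 0;
}
```
]]></pre>
</example>

    <p>Given an old chain of deflate, http_header and core, the new
    request's chain starts at http_header: the protocol and connection
    filters are still reachable, and the old resource filter (deflate) is
    not.</p>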

    <p>The problem is that we are using a doubly-linked list for our filter
    stacks now. But, you should notice that it is possible for two lists to
    intersect in this model.  So, how do you handle the previous pointer?
    This is a very difficult question to answer, because there is no "right"
    answer; either method is equally valid.  I looked at why we use the
    previous pointer.  The only reason for it is to allow for easier
    addition of new filters.  With that being said, the solution I chose was
    to make the previous pointer always stay on the original request.</p>

    <p>This causes some more complex logic, but it works for all cases.  My
    concern in having it move to the sub-request is that for the more
    common case (where a sub-request is used to add data to a response), the
    main filter chain would be wrong.  That didn't seem like a good idea to
    me.</p>
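
    <p>The choice about the previous pointer can be illustrated with a small
    C model. The node layout and both helper functions are hypothetical.
    The sketch assumes only what is stated above: when a sub-request chain
    joins a filter owned by the original request, that filter's previous
    pointer is left alone.</p>

<example>
<pre><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical doubly linked filter node; not the real ap_filter_t. */
struct filter {
    const char *name;
    struct filter *next;
    struct filter *prev;
};

/* Link two filters that belong to the same request, in both directions. */
void link_same_request(struct filter *a, struct filter *b)
{
    assert(a != NULL && b != NULL);
    a->next = b;
    b->prev = a;
}

/* Join the tail of a sub-request chain to a filter owned by the original
 * request. Only the forward pointer is set: shared->prev is not touched,
 * so walking backwards from shared stays on the original request. */
void join_sub_request(struct filter *sub_tail, struct filter *shared)
{
    assert(sub_tail != NULL && shared != NULL);
    sub_tail->next = shared;
}
```
]]></pre>
</example>

    <p>In the diagram's terms, <code>sub_request_core</code> points forward
    to byterange, while byterange's previous pointer still names the main
    request's includes filter.</p>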
</section>

<section id="asis"><title>Asis</title>
    <p>The final topic.  :-)  <module>mod_asis</module> is a bit of a hack,
    but the handler needs to remove all filters except for connection
    filters, and send the data.  If you are using
    <module>mod_asis</module>, all other bets are off.</p>
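
    <p>In terms of the three list heads, "remove all filters except for
    connection filters" could look like the following C sketch. It
    illustrates the sentence above with simplified stand-in structs and a
    hypothetical function name. It is not taken from the
    <module>mod_asis</module> source.</p>

<example>
<pre><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the filter, connection and request records. */
struct filter {
    const char *name;
    struct filter *next;
};

struct conn {
    struct filter *output_filters;        /* CONNECTION filters */
};

struct request {
    struct filter *output_filters;        /* RESOURCE filters */
    struct filter *proto_output_filters;  /* PROTOCOL filters */
    struct conn *connection;
};

/* Make the request's chain start at the connection filters, so that data
 * passed to r->output_filters skips every resource and protocol filter. */
void sketch_asis_strip_filters(struct request *r)
{
    assert(r != NULL && r->connection != NULL);
    r->proto_output_filters = r->connection->output_filters;
    r->output_filters = r->connection->output_filters;
}
```
]]></pre>
</example>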
</section>

<section id="conclusion"><title>Explanations</title>
    <p>The absolutely last point is that the reason this code was so hard to
    get right was that we had hacked so much to force it to work.  I
    wrote most of the hacks originally, so I am very much to blame.
    However, now that the code is right, I have started to remove some
    hacks.  Most people should have seen that the <code>reset_filters</code>
    and <code>add_required_filters</code> functions are gone.  Those inserted
    protocol level filters for error conditions. In fact, both functions did
    the same thing, one after the other, which was really strange. Because we
    don't lose protocol filters for error cases any more, those hacks went away.
    The <code>HTTP_HEADER</code>, <code>Content-length</code>, and
    <code>Byterange</code> filters are all added in the
    <code>insert_filters</code> phase, because if they were added earlier, we
    had some interesting interactions.  Now, those could all be moved to be
    inserted with the <code>HTTP_IN</code>, <code>CORE</code>, and
    <code>CORE_IN</code> filters.  That would make the code easier to
    follow.</p>
</section>
</manualpage>