From: Joe Orton

There are a number of common pitfalls encountered when writing output filters; this page aims to document best practice for authors of new or existing filters.

This document is applicable to both version 2.0 and version 2.2 of the Apache HTTP Server; it specifically targets RESOURCE-level or CONTENT_SET-level filters, though some advice is generic to all types of filter.

Each time a filter is invoked, it is passed a bucket brigade, containing a sequence of buckets which represent both data content and metadata. Every bucket has a bucket type; a number of bucket types are defined and used by the httpd core modules (and the apr-util library which provides the bucket brigade interface), but modules are free to define their own types.

A filter can tell whether a bucket represents data or metadata using the APR_BUCKET_IS_METADATA macro. Generally, all metadata buckets should be passed up the filter chain by an output filter. Filters may transform, delete, and insert data buckets as appropriate.

There are two metadata bucket types which all filters must pay attention to: the EOS bucket type, and the FLUSH bucket type. An EOS bucket indicates that the end of the response has been reached and no further buckets need be processed. A FLUSH bucket indicates that the filter should flush any buffered buckets (if applicable) up the filter chain immediately.

FLUSH buckets are sent when the content generator (or a downstream filter) knows that there may be a delay before more content can be sent. By passing FLUSH buckets up the filter chain immediately, filters ensure that the client is not kept waiting for pending data longer than necessary.

Filters can create FLUSH buckets and pass these up the filter chain if desired. Generating FLUSH buckets unnecessarily, or too frequently, can harm network utilisation since it may force large numbers of small packets to be sent, rather than a small number of larger packets. The section on Non-blocking bucket reads covers a case where filters are encouraged to generate FLUSH buckets.

HEAP FLUSH FILE EOS

This shows a bucket brigade which may be passed to a filter; it contains two metadata buckets (FLUSH and EOS), and two data buckets (HEAP and FILE).

For any given request, an output filter might be invoked only once and be given a single brigade representing the entire response. It is also possible that the number of times a filter is invoked for a single response is proportional to the size of the content being filtered, with the filter being passed a brigade containing a single bucket each time. Filters must operate correctly in either case.

An output filter can distinguish the final invocation for a given response by the presence of an EOS bucket in the brigade. Any buckets in the brigade after an EOS should be ignored.

An output filter should never pass an empty brigade up the filter chain. As a matter of defensive programming, however, filters should be prepared to accept an empty brigade and simply return without doing anything:

apr_status_t dummy_filter(ap_filter_t *f, apr_bucket_brigade *bb)
{
    if (APR_BRIGADE_EMPTY(bb)) {
        return APR_SUCCESS;
    }
    ....

A bucket brigade is a doubly-linked list of buckets. The list is terminated (at both ends) by a sentinel which can be distinguished from a normal bucket by comparing it with the pointer returned by APR_BRIGADE_SENTINEL. The list sentinel is in fact not a valid bucket structure; any attempt to call normal bucket functions (such as apr_bucket_read) on the sentinel will have undefined behaviour, and will typically crash the process.

There are a variety of functions and macros for traversing and manipulating bucket brigades; see the apr_buckets.h header for complete coverage. Commonly used macros include:

APR_BRIGADE_FIRST(bb) returns the first bucket in brigade bb
APR_BRIGADE_LAST(bb) returns the last bucket in brigade bb
APR_BUCKET_NEXT(e) gives the bucket after bucket e
APR_BUCKET_PREV(e) gives the bucket before bucket e
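The list structure described above can be sketched in plain C. This is a self-contained toy, not the APR API. The names toy_bucket, toy_brigade, toy_first, toy_is_metadata and toy_scan are hypothetical stand-ins. Together they mimic four things: a sentinel-terminated doubly-linked brigade, a metadata test in the spirit of APR_BUCKET_IS_METADATA, the "accept an empty brigade and do nothing" rule, and ignoring buckets after EOS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bucket kinds; not the real apr_bucket_type_t. */
typedef enum { TOY_HEAP, TOY_FILE, TOY_FLUSH, TOY_EOS } toy_type;

typedef struct toy_bucket {
    struct toy_bucket *prev, *next;
    toy_type type;
} toy_bucket;

/* The brigade owns a sentinel node that closes the ring at both ends.
 * The sentinel is not a real bucket: its type field is never read. */
typedef struct {
    toy_bucket sentinel;
} toy_brigade;

static void toy_brigade_init(toy_brigade *bb) {
    bb->sentinel.prev = bb->sentinel.next = &bb->sentinel;
}

static toy_bucket *toy_sentinel(toy_brigade *bb) { return &bb->sentinel; }
static toy_bucket *toy_first(toy_brigade *bb) { return bb->sentinel.next; }
static int toy_empty(toy_brigade *bb) { return toy_first(bb) == toy_sentinel(bb); }

static void toy_insert_tail(toy_brigade *bb, toy_bucket *e) {
    e->prev = bb->sentinel.prev;
    e->next = &bb->sentinel;
    bb->sentinel.prev->next = e;
    bb->sentinel.prev = e;
}

/* Analogue of APR_BUCKET_IS_METADATA: FLUSH and EOS carry no data. */
static int toy_is_metadata(const toy_bucket *e) {
    return e->type == TOY_FLUSH || e->type == TOY_EOS;
}

/* Walks from the first bucket up to (never into) the sentinel, counting
 * data and metadata buckets and stopping at EOS. An empty brigade is
 * accepted and nothing is done: the function just returns 0. */
static int toy_scan(toy_brigade *bb, int *data, int *meta) {
    *data = *meta = 0;
    if (toy_empty(bb))
        return 0;
    for (toy_bucket *e = toy_first(bb); e != toy_sentinel(bb); e = e->next) {
        if (toy_is_metadata(e))
            (*meta)++;
        else
            (*data)++;
        if (e->type == TOY_EOS)
            break; /* buckets after EOS are ignored */
    }
    return 1;
}
```

As a usage example, a brigade of HEAP FLUSH FILE EOS followed by a stray HEAP bucket scans as two data buckets and two metadata buckets. The trailing bucket is never examined.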
The apr_bucket_brigade structure itself is allocated out of a pool, so if a filter creates a new brigade, it must ensure that memory use is correctly bounded. For example, a filter which allocates a new brigade out of the request pool (r->pool) on every invocation will consume memory in proportion to the number of times it is invoked. That number may itself be proportional to the size of the response, and the memory is not released until the request pool is destroyed. Such a filter should instead create a brigade on the first invocation per request, and store that brigade in its state structure.
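The difference between allocating a brigade per invocation and once per request can be illustrated with a toy pool. The pool never frees individual allocations, only the whole pool at once. toy_pool, toy_palloc, toy_brigade, invoke_naive and invoke_reuse are hypothetical names for this sketch; they are not the APR pool or brigade API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy pool: like a request pool, memory is only released
 * when the whole pool is destroyed. */
typedef struct toy_block { struct toy_block *next; } toy_block;
typedef struct { toy_block *blocks; size_t nallocs; } toy_pool;

static void *toy_palloc(toy_pool *p, size_t size) {
    toy_block *b = malloc(sizeof *b + size);
    if (b == NULL)
        return NULL;
    b->next = p->blocks;
    p->blocks = b;
    p->nallocs++;
    return b + 1; /* usable memory follows the header */
}

static void toy_pool_destroy(toy_pool *p) {
    while (p->blocks != NULL) {
        toy_block *next = p->blocks->next;
        free(p->blocks);
        p->blocks = next;
    }
    p->nallocs = 0;
}

typedef struct { void *first, *last; } toy_brigade; /* placeholder */

/* Anti-pattern: a fresh brigade from the request pool on every call. */
static toy_brigade *invoke_naive(toy_pool *req) {
    return toy_palloc(req, sizeof(toy_brigade));
}

/* Preferred: create once per request, keep it in filter state, reuse. */
static toy_brigade *invoke_reuse(toy_pool *req, toy_brigade **saved) {
    if (*saved == NULL)
        *saved = toy_palloc(req, sizeof(toy_brigade));
    return *saved;
}
```

After 1000 invocations the naive pattern has made 1000 pool allocations that all live until the pool is destroyed. The reuse pattern has made one.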
Output filters should never use apr_brigade_destroy to "destroy" a brigade. The memory used by the brigade structure will not be released by calling this function (since it comes from a pool), but the associated pool cleanup is unregistered. Using apr_brigade_destroy can in fact cause memory leaks: if a "destroyed" brigade still contains buckets when its containing pool is destroyed, those buckets will not be immediately destroyed. To empty a brigade so that it can be reused, call apr_brigade_cleanup instead.

When dealing with non-metadata buckets, it is important to understand that the "apr_bucket *" object is an abstract representation of data:

1. The amount of data represented by a bucket may not be known in advance. For a bucket which represents data of indeterminate length, the ->length field is set to the value (apr_size_t)-1. The PIPE bucket type is an example of a bucket type with an indeterminate length; it represents the output from a pipe.

2. The data represented by a bucket may or may not be mapped into memory. The FILE bucket type, for example, represents data stored in a file on disk.

Filters read the data from a bucket using the apr_bucket_read function. When this function is invoked, the bucket may morph into a different bucket type, and may also insert a new bucket into the bucket brigade. This must happen for buckets which represent data not mapped into memory.
To give an example, consider a bucket brigade containing a single FILE bucket representing an entire file, 24 kilobytes in size:

FILE(0K-24K)

When this bucket is read, it will read a block of data from the file, morph into a HEAP bucket to represent that data, and return the data to the caller. It also inserts a new FILE bucket representing the remainder of the file; after the apr_bucket_read call, the brigade looks like:

HEAP(8K) FILE(8K-24K)
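The morphing behaviour in this example can be modelled with a toy bucket. Reading a FILE bucket turns it in place into a HEAP bucket holding one block, and links a new FILE bucket for the remainder after it. All names here (toy_bucket, toy_bucket_read, TOY_READ_BLOCK) are hypothetical. The 8 KB block size is an assumption chosen to match the example above, and the "file" is just a byte array. This is not how apr_bucket_read is implemented internally.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_READ_BLOCK 8192 /* assumed block size, matching the example */

typedef enum { TOY_HEAP, TOY_FILE } toy_type;

/* Hypothetical bucket: a FILE bucket refers to [start, start+length) of
 * a backing byte array; a HEAP bucket owns a copy of its bytes. */
typedef struct toy_bucket {
    struct toy_bucket *next;
    toy_type type;
    const unsigned char *file; /* FILE: backing storage */
    size_t start, length;
    unsigned char *heap;       /* HEAP: owned copy */
} toy_bucket;

/* Reads bucket e. A FILE bucket morphs into a HEAP bucket holding at most
 * one block; if data remains, a new FILE bucket for the remainder is
 * linked after it. Returns 0 on success, -1 on allocation failure. */
static int toy_bucket_read(toy_bucket *e, const unsigned char **data,
                           size_t *len) {
    if (e->type == TOY_FILE) {
        size_t n = e->length < TOY_READ_BLOCK ? e->length : TOY_READ_BLOCK;
        unsigned char *copy = malloc(n ? n : 1); /* avoid malloc(0) */
        if (copy == NULL)
            return -1;
        memcpy(copy, e->file + e->start, n);
        if (n < e->length) {
            toy_bucket *rest = malloc(sizeof *rest);
            if (rest == NULL) {
                free(copy);
                return -1;
            }
            *rest = *e; /* still a FILE bucket */
            rest->start = e->start + n;
            rest->length = e->length - n;
            rest->next = e->next;
            e->next = rest;
        }
        e->type = TOY_HEAP; /* morph in place */
        e->heap = copy;
        e->length = n;
    }
    *data = e->heap;
    *len = e->length;
    return 0;
}
```

Reading a 24 KB FILE bucket once leaves HEAP(8K) FILE(8K-24K). Reading the new FILE bucket in turn leaves HEAP(8K) HEAP(8K) FILE(16K-24K). Every read therefore pulls another block into memory.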
The basic function of any output filter will be to iterate through the passed-in brigade and transform (or simply examine) the content in some manner. The implementation of the iteration loop is critical to producing a well-behaved output filter.

Consider an example which loops through the entire brigade as follows:

/* Bad example: do not imitate! */
apr_bucket *e = APR_BRIGADE_FIRST(bb);
const char *data;
apr_size_t length;

while (e != APR_BRIGADE_SENTINEL(bb)) {
    apr_bucket_read(e, &data, &length, APR_BLOCK_READ);
    e = APR_BUCKET_NEXT(e);
}

return ap_pass_brigade(f->next, bb);

The above implementation would consume memory proportional to the size of the content. If passed a FILE bucket, for example, the entire file contents would be read into memory as each apr_bucket_read call morphed a FILE bucket into a HEAP bucket.
In contrast, the implementation below will consume a fixed amount of memory to filter any brigade; a temporary brigade is needed and must be allocated only once per response (see the Maintaining state section).

/* tmpbb: a temporary brigade, created once per response. */
apr_bucket *e;
const char *data;
apr_size_t length;
apr_status_t rv;

while ((e = APR_BRIGADE_FIRST(bb)) != APR_BRIGADE_SENTINEL(bb)) {
    rv = apr_bucket_read(e, &data, &length, APR_BLOCK_READ);
    if (rv) ...;
    /* Remove bucket e from bb. */
    APR_BUCKET_REMOVE(e);
    /* Insert it into temporary brigade. */
    APR_BRIGADE_INSERT_HEAD(tmpbb, e);
    /* Pass brigade upstream. */
    rv = ap_pass_brigade(f->next, tmpbb);
    if (rv) ...;
    apr_brigade_cleanup(tmpbb);
}
A filter which needs to maintain state over multiple invocations per response can use the ->ctx field of its ap_filter_t structure. It is typical to store a temporary brigade in such a structure, to avoid having to allocate a new brigade per invocation as described in the Brigade structure section.

struct dummy_state {
    apr_bucket_brigade *tmpbb;
    int filter_state;
    ....
};

apr_status_t dummy_filter(ap_filter_t *f, apr_bucket_brigade *bb)
{
    struct dummy_state *state;

    state = f->ctx;
    if (state == NULL) {
        /* First invocation for this response: initialise state structure. */
        f->ctx = state = apr_palloc(f->r->pool, sizeof *state);

        state->tmpbb = apr_brigade_create(f->r->pool, f->c->bucket_alloc);
        state->filter_state = ...;
    }
    ...
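The lazy-initialisation pattern above can be exercised outside httpd with a toy filter that has only a ctx pointer. The names toy_filter, toy_state and toy_filter_invoke are hypothetical. The sketch also uses calloc where httpd would allocate from the request pool, so here the caller frees the state at the end.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for ap_filter_t: only ->ctx matters here. */
typedef struct { void *ctx; } toy_filter;

struct toy_state {
    size_t bytes_seen; /* example per-response state */
    int invocations;
};

static int toy_states_created; /* instrumentation for the sketch */

/* One invocation of a filter that receives `len` bytes of data. State is
 * created on the first call and found again through f->ctx afterwards. */
static int toy_filter_invoke(toy_filter *f, size_t len) {
    struct toy_state *state = f->ctx;
    if (state == NULL) {
        /* First invocation for this response: initialise state. */
        state = calloc(1, sizeof *state);
        if (state == NULL)
            return -1;
        f->ctx = state;
        toy_states_created++;
    }
    state->bytes_seen += len;
    state->invocations++;
    return 0;
}
```

Three invocations with 100, 50 and 0 bytes create the state exactly once, and the state accumulates 150 bytes across the calls.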
If a filter decides to store buckets beyond the duration of a single filter function invocation (for example storing them in its ->ctx state structure), those buckets must be set aside. This is necessary because some bucket types provide buckets which represent temporary resources (such as stack memory) which will fall out of scope as soon as the filter chain completes processing the brigade.

To set aside a bucket, the apr_bucket_setaside function can be called. Not all bucket types can be set aside, but if the call succeeds, the bucket will have morphed to ensure it has a lifetime at least as long as the pool given as an argument to the apr_bucket_setaside function.

Alternatively, the ap_save_brigade function can be used, which will create a new brigade containing buckets with a lifetime as long as the given pool argument. This function must be used with great care, however: on return it guarantees that all the buckets in the returned brigade will represent data mapped into memory. If given an input brigade containing, for example, a PIPE bucket, ap_save_brigade will consume an arbitrary amount of memory to store the entire output of the pipe.
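Why setting aside matters can be shown with a toy bucket that initially points at a caller's stack buffer. Setting it aside copies the bytes into memory the bucket owns, so the bucket stays valid after the caller returns. The names toy_bucket, toy_setaside and make_and_keep are hypothetical. Copying to the heap is only one way a real bucket type might achieve this; apr_bucket_setaside instead ties the lifetime to a pool.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bucket that may point at transient memory. */
typedef struct {
    const char *data;
    size_t len;
    int owns_data; /* 1 once the bytes live in owned memory */
} toy_bucket;

/* Make the bucket's data outlive the memory it currently points at by
 * copying it to the heap. Returns 0 on success, -1 on failure. */
static int toy_setaside(toy_bucket *b) {
    if (b->owns_data)
        return 0; /* already long-lived */
    char *copy = malloc(b->len ? b->len : 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, b->data, b->len);
    b->data = copy;
    b->owns_data = 1;
    return 0;
}

/* A "generator" whose bucket first refers to its own stack buffer; the
 * bucket is only safe to return after toy_setaside has succeeded. */
static toy_bucket make_and_keep(const char *text) {
    char stack_buf[64];
    size_t n = strlen(text);
    if (n > sizeof stack_buf)
        n = sizeof stack_buf;
    memcpy(stack_buf, text, n);
    toy_bucket b = { stack_buf, n, 0 };
    if (toy_setaside(&b) != 0)
        abort();
    return b; /* data now points at heap memory, not stack_buf */
}
```

Without the toy_setaside call, the returned bucket would point into a stack frame that no longer exists.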
The apr_bucket_read function takes an apr_read_type_e argument which determines whether a blocking or non-blocking read will be performed from the data source. A good filter will first attempt to read from every data bucket using a non-blocking read; if that fails with APR_EAGAIN, it should send a FLUSH bucket up the filter chain, and then retry using a blocking read.

This mode of operation ensures that any filters further up the filter chain will flush any buffered buckets if a slow content source is being used.

A CGI script is an example of a slow content source which is implemented as a bucket type. The output of the script is passed through the filter chain as PIPE buckets, and reading from such a bucket will block while waiting for the CGI script to produce more output.

/* tmpbb: a temporary brigade kept in the filter's state structure. */
apr_bucket *e;
apr_read_type_e mode = APR_NONBLOCK_READ;
const char *data;
apr_size_t length;

while ((e = APR_BRIGADE_FIRST(bb)) != APR_BRIGADE_SENTINEL(bb)) {
    apr_status_t rv;

    rv = apr_bucket_read(e, &data, &length, mode);
    if (rv == APR_EAGAIN && mode == APR_NONBLOCK_READ) {
        /* Pass up a brigade containing a flush bucket: */
        APR_BRIGADE_INSERT_TAIL(tmpbb, apr_bucket_flush_create(...));
        rv = ap_pass_brigade(f->next, tmpbb);
        apr_brigade_cleanup(tmpbb);
        if (rv != APR_SUCCESS) return rv;

        /* Retry, using a blocking read. */
        mode = APR_BLOCK_READ;
        continue;
    } else if (rv != APR_SUCCESS) {
        /* handle errors */
    }

    /* Next time, try a non-blocking read first. */
    mode = APR_NONBLOCK_READ;
    ...
}
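The control flow of this read strategy can be checked against a simulated slow source. The sketch records 'F' whenever the filter would send a FLUSH bucket and 'D' whenever it obtains a chunk of data. The status codes, read modes, toy_pipe, toy_read and toy_filter_loop are all hypothetical stand-ins for APR_SUCCESS, APR_EAGAIN, APR_NONBLOCK_READ, APR_BLOCK_READ and a PIPE bucket. The simulated blocking read is assumed always to succeed.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical status codes and read modes. */
enum { TOY_SUCCESS = 0, TOY_EAGAIN = 11 };
typedef enum { TOY_NONBLOCK, TOY_BLOCK } toy_mode;

/* A simulated slow source: when no data is ready, a non-blocking read
 * fails with TOY_EAGAIN, while a blocking read waits and succeeds. */
typedef struct { int chunks_left; int ready; } toy_pipe;

static int toy_read(toy_pipe *p, toy_mode mode) {
    if (!p->ready && mode == TOY_NONBLOCK)
        return TOY_EAGAIN; /* data not available yet */
    p->ready = 0;          /* consume a chunk; the next is not ready */
    p->chunks_left--;
    return TOY_SUCCESS;
}

/* Non-blocking read first; on EAGAIN, "flush" and retry blocking.
 * Appends 'F' per FLUSH sent and 'D' per chunk read to trace. */
static void toy_filter_loop(toy_pipe *p, char *trace, size_t tracesz) {
    toy_mode mode = TOY_NONBLOCK;
    size_t n = 0;
    while (p->chunks_left > 0 && n + 1 < tracesz) {
        int rv = toy_read(p, mode);
        if (rv == TOY_EAGAIN && mode == TOY_NONBLOCK) {
            trace[n++] = 'F'; /* pass a FLUSH bucket up the chain */
            mode = TOY_BLOCK; /* then retry with a blocking read */
            continue;
        }
        trace[n++] = 'D';
        mode = TOY_NONBLOCK;  /* next time, try non-blocking first */
    }
    trace[n] = '\0';
}
```

For a source whose data is never ready in advance, the trace is "FDFD": every chunk is preceded by a flush. For a source that already has data ready, no flush is sent.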
In summary, here is a set of rules for all output filters to follow:

1. Output filters must pass all metadata buckets up the filter chain; FLUSH buckets should be respected by passing any pending or buffered buckets up the filter chain.

2. Output filters should ignore any buckets following an EOS bucket.

3. After calling ap_pass_brigade to pass a brigade up the filter chain, output filters should call apr_brigade_cleanup to ensure the brigade is empty before reusing that brigade structure; output filters should never use apr_brigade_destroy to "destroy" brigades.

4. Output filters must not ignore the return value of ap_pass_brigade, and must return appropriate errors back down the filter chain.

5. Output filters should first attempt non-blocking reads from each data bucket, and send a FLUSH bucket up the filter chain if the read blocks, before retrying with a blocking read.