From 65acc78a15dc895f8167b613d31d1566ae738892 Mon Sep 17 00:00:00 2001 From: Joe Orton Date: Mon, 19 Mar 2007 14:15:07 +0000 Subject: [PATCH] - add initial version of output filters guide (as sent to dev@) git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@519952 13f79535-47bb-0310-9956-ffa450edef68 --- docs/manual/developer/output-filters.xml | 458 ++++++++++++++++++ docs/manual/developer/output-filters.xml.meta | 11 + 2 files changed, 469 insertions(+) create mode 100644 docs/manual/developer/output-filters.xml create mode 100644 docs/manual/developer/output-filters.xml.meta diff --git a/docs/manual/developer/output-filters.xml b/docs/manual/developer/output-filters.xml new file mode 100644 index 0000000000..892d98d98d --- /dev/null +++ b/docs/manual/developer/output-filters.xml @@ -0,0 +1,458 @@ + + + + + + + + Developer Documentation + + Guide to writing output filters + + +

There are a number of common pitfalls encountered when writing + output filters; this page aims to document best practice for + authors of new or existing filters.

+ +

This document is applicable to both version 2.0 and version 2.2 + of the Apache HTTP Server; it specifically targets + RESOURCE-level or CONTENT_SET-level + filters though some advice is generic to all types of filter.

+
+ +
+ Filters and bucket brigades + +

Each time a filter is invoked, it is passed a bucket + brigade, containing a sequence of buckets which + represent both data content and metadata. Every bucket has a + bucket type; a number of bucket types are defined and + used by the httpd core modules (and the + apr-util library which provides the bucket brigade + interface), but modules are free to define their own types.

+ + Output filters must be prepared to process + buckets of non-standard types; with a few exceptions, a filter + need not care about the types of buckets being filtered. + +

A filter can tell whether a bucket represents either data or + metadata using the APR_BUCKET_IS_METADATA macro. + Generally, all metadata buckets should be passed up the filter + chain by an output filter. Filters may transform, delete, and + insert data buckets as appropriate.

+ +

There are two metadata bucket types which all filters must pay + attention to: the EOS bucket type, and the + FLUSH bucket type. An EOS bucket + indicates that the end of the response has been reached and no + further buckets need be processed. A FLUSH bucket + indicates that the filter should flush any buffered buckets (if + applicable) up the filter chain immediately.

+ + FLUSH buckets are sent when the + content generator (or a downstream filter) knows that there may be + a delay before more content can be sent. By passing + FLUSH buckets up the filter chain immediately, + filters ensure that the client is not kept waiting for pending + data longer than necessary. + +

Filters can create FLUSH buckets and pass these up + the filter chain if desired. Generating FLUSH + buckets unnecessarily, or too frequently, can harm network + utilisation since it may force large numbers of small packets to + be sent, rather than a small number of larger packets. The + section on Non-blocking bucket reads + covers a case where filters are encouraged to generate + FLUSH buckets.

+ + Example bucket brigade +
HEAP FLUSH FILE EOS
+ +

This shows a bucket brigade which may be passed to a filter; it + contains two metadata buckets (FLUSH and + EOS), and two data buckets (HEAP and + FILE).
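
As a rough illustration of the brigade above, the following stdlib-only C sketch models bucket types as an enum and separates metadata buckets from data buckets. All `toy_` names are hypothetical and are not part of APR; a real filter would use the APR_BUCKET_IS_METADATA macro instead.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for bucket types; these are NOT the APR definitions. */
typedef enum { TOY_HEAP, TOY_FILE, TOY_PIPE, TOY_FLUSH, TOY_EOS } toy_type;

/* In this model only FLUSH and EOS carry metadata. */
static int toy_is_metadata(toy_type t)
{
    return t == TOY_FLUSH || t == TOY_EOS;
}

/* Count the metadata buckets in a brigade modelled as an array. */
static size_t toy_count_metadata(const toy_type *bb, size_t n)
{
    size_t i, count = 0;

    assert(bb != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (toy_is_metadata(bb[i])) {
            count++;
        }
    }
    return count;
}
```

Applied to the example brigade (HEAP, FLUSH, FILE, EOS), the count is two, matching the description above.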

+ +
+ +
+ Filter invocation + +

For any given request, an output filter might be invoked only + once and be given a single brigade representing the entire response. + It is also possible that the number of times a filter is invoked + for a single response is proportional to the size of the content + being filtered, with the filter being passed a brigade containing + a single bucket each time. Filters must operate correctly in + either case.

+ + An output filter which allocates long-lived + memory every time it is invoked may consume memory proportional to + response size. Output filters which need to allocate memory + should do so once per response; see Maintaining + state below. + +

An output filter can distinguish the final invocation for a + given response by the presence of an EOS bucket in + the brigade. Any buckets in the brigade after an EOS should be + ignored.
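
The EOS rule can be sketched with a toy model in plain C. The brigade is represented as an array and the `ex_` names are invented for illustration; this is not how APR represents brigades, only a way to show that processing stops at the first EOS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bucket kinds for this sketch; not APR's types. */
typedef enum { EX_DATA, EX_FLUSH, EX_EOS } ex_kind;

/*
 * Return how many leading buckets of the brigade should be processed:
 * everything up to and including the first EOS.  *last_call is set to 1
 * when an EOS was found, meaning this is the final invocation.
 */
static size_t ex_process_limit(const ex_kind *bb, size_t n, int *last_call)
{
    size_t i;

    assert(last_call != NULL);
    *last_call = 0;
    for (i = 0; i < n; i++) {
        if (bb[i] == EX_EOS) {
            *last_call = 1;
            return i + 1;   /* buckets after EOS are ignored */
        }
    }
    return n;               /* no EOS: more invocations will follow */
}
```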

+ +

An output filter should never pass an empty brigade up the + filter chain. But, for good defensive programming, filters should + be prepared to accept an empty brigade, and do nothing.

+ + How to handle an empty brigade + +
apr_status_t dummy_filter(ap_filter_t *f, apr_bucket_brigade *bb)
+{
+    if (APR_BRIGADE_EMPTY(bb)) {
+        return APR_SUCCESS;
+    }
+    ....
+ +
+ +
+ Brigade structure + +

A bucket brigade is a doubly-linked list of buckets. The list + is terminated (at both ends) by a sentinel which can be + distinguished from a normal bucket by comparing it with the + pointer returned by APR_BRIGADE_SENTINEL. The list + sentinel is in fact not a valid bucket structure; any attempt to + call normal bucket functions (such as + apr_bucket_read) on the sentinel will have undefined + behaviour (in practice, it will usually crash the process).

+ +

There are a variety of functions and macros for traversing and + manipulating bucket brigades; see the apr_bucket.h + header for complete coverage. Commonly used macros include: + +

+
APR_BRIGADE_FIRST(bb)
+
returns the first bucket in brigade bb
+ +
APR_BRIGADE_LAST(bb)
+
returns the last bucket in brigade bb
+ +
APR_BUCKET_NEXT(e)
+
gives the next bucket after bucket e
+ +
APR_BUCKET_PREV(e)
+
gives the bucket before bucket e
+ +
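
To make the traversal pattern concrete, here is a minimal sketch of a sentinel-terminated ring in plain C. It is a toy: the `ring_` names are hypothetical, and unlike APR (where the sentinel is not a valid bucket) the sentinel here is an ordinary node kept only for simplicity. The point is the loop condition, which compares against the sentinel rather than against NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Toy ring node; APR's real structures and macros differ in detail. */
typedef struct ring_node {
    struct ring_node *next, *prev;
    int payload;             /* stands in for bucket data */
} ring_node;

/* The brigade owns a sentinel node that is linked but holds no data. */
typedef struct { ring_node sentinel; } ring_brigade;

static void ring_init(ring_brigade *bb)
{
    assert(bb != NULL);
    bb->sentinel.next = bb->sentinel.prev = &bb->sentinel;
    bb->sentinel.payload = 0;
}

static void ring_insert_tail(ring_brigade *bb, ring_node *e)
{
    e->prev = bb->sentinel.prev;
    e->next = &bb->sentinel;
    bb->sentinel.prev->next = e;
    bb->sentinel.prev = e;
}

/* Walk from the first node until the sentinel is reached again. */
static int ring_sum(ring_brigade *bb)
{
    int sum = 0;
    ring_node *e;

    for (e = bb->sentinel.next; e != &bb->sentinel; e = e->next) {
        sum += e->payload;
    }
    return sum;
}
```

An empty ring is one whose first node is the sentinel itself, which is why the same loop handles the empty case without a special check.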

+ +

The apr_bucket_brigade structure itself is + allocated out of a pool, so if a filter creates a new brigade, it + must ensure that memory use is correctly bounded. A filter which + allocates a new brigade out of the request pool + (r->pool) on every invocation, for example, will fall + foul of the warning above concerning + memory use. Such a filter should instead create a brigade on the + first invocation per request, and store that brigade in its state structure.

+ + It is almost never advisable to use + apr_brigade_destroy to "destroy" a brigade. The + memory used by the brigade structure will not be released by + calling this function (since it comes from a pool), but the + associated pool cleanup is unregistered. Using + apr_brigade_destroy can in fact cause memory leaks; + if a "destroyed" brigade still contains buckets when its + containing pool is destroyed, those buckets will not be + immediately destroyed. + +
+ +
+ + Processing buckets + +

When dealing with non-metadata buckets, it is important to + understand that the "apr_bucket *" object is an + abstract representation of data: + +

    +
  1. The amount of data represented by the bucket may or may not + have a determinate length; for a bucket which represents data of + indeterminate length, the ->length field is set to + the value (apr_size_t)-1. The PIPE + bucket type is an example of a bucket type which has an indeterminate + length; it represents the output from a pipe.
  2. The data represented by a bucket may or may not be mapped + into memory. The FILE bucket type, for example, + represents data stored in a file on disk.
+ + Filters read the data from a bucket using the + apr_bucket_read function. When this function is + invoked, the bucket may morph into a different bucket + type, and may also insert a new bucket into the bucket brigade. + This must happen for buckets which represent data not mapped into + memory.

+ +

To give an example, consider a bucket brigade containing a + single FILE bucket representing an entire file, 24 + kilobytes in size:

+ +
FILE(0K-24K)
+ +

When this bucket is read, it will read a block of data from the + file, morph into a HEAP bucket to represent that + data, and return the data to the caller. It also inserts a new + FILE bucket representing the remainder of the file; + after the apr_bucket_read call, the brigade looks + like:

+ +
HEAP(8K) FILE(8K-24K)
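
The morphing step can be modelled in a few lines of plain C. This is only a sketch: the `morph_` names are hypothetical, the 8K block size is taken from the example above rather than from APR, and no file I/O is performed. It shows how one read turns FILE(0K-24K) into HEAP(8K) followed by FILE(8K-24K).

```c
#include <assert.h>
#include <stddef.h>

#define MORPH_BLOCK 8192u   /* assumed read block size for this sketch */

typedef enum { MORPH_FILE, MORPH_HEAP } morph_kind;

/* Toy bucket: a kind plus the byte range of the file it covers. */
typedef struct {
    morph_kind kind;
    size_t start;
    size_t length;
} morph_bucket;

/*
 * Model a read of the FILE bucket *e.  *e becomes a HEAP bucket covering
 * the first block, and *rest receives a FILE bucket for what remains.
 * Returns 1 if a non-empty remainder bucket was produced, 0 otherwise.
 */
static int morph_read(morph_bucket *e, morph_bucket *rest)
{
    size_t chunk;

    assert(e != NULL && rest != NULL);
    if (e->kind != MORPH_FILE) {
        return 0;                     /* already in memory: nothing to do */
    }
    chunk = e->length < MORPH_BLOCK ? e->length : MORPH_BLOCK;
    rest->kind = MORPH_FILE;
    rest->start = e->start + chunk;
    rest->length = e->length - chunk;
    e->kind = MORPH_HEAP;
    e->length = chunk;
    return rest->length > 0;
}
```

Calling `morph_read` repeatedly on the remainder would walk through the file one block at a time, which is what a filter relies on to keep memory use bounded.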
+ +
+ +
+ Filtering brigades + +

The basic function of any output filter will be to iterate + through the passed-in brigade and transform (or simply examine) + the content in some manner. The implementation of the iteration + loop is critical to producing a well-behaved output filter.

+ +

Consider an example which loops through the entire brigade as + follows: + + Bad output filter -- do not imitate! +

apr_bucket *e = APR_BRIGADE_FIRST(bb);
+const char *data;
+apr_size_t length;
+
+while (e != APR_BRIGADE_SENTINEL(bb)) {
+   apr_bucket_read(e, &data, &length, APR_BLOCK_READ);
+   e = APR_BUCKET_NEXT(e);
+}
+
+return ap_pass_brigade(f->next, bb);
+ + The above implementation would consume memory proportional to + content size. If passed a FILE bucket, for example, + the entire file contents would be read into memory as each + apr_bucket_read call morphed a FILE + bucket into a HEAP bucket.

+ +

In contrast, the implementation below will consume a fixed + amount of memory to filter any brigade; a temporary brigade is + needed and must be allocated only once per response, see the Maintaining state section.

+ + Better output filter + +
apr_bucket *e;
+const char *data;
+apr_size_t length;
+apr_status_t rv;
+
+while ((e = APR_BRIGADE_FIRST(bb)) != APR_BRIGADE_SENTINEL(bb)) {
+   rv = apr_bucket_read(e, &data, &length, APR_BLOCK_READ);
+   if (rv) ...;
+   /* Remove bucket e from bb. */
+   APR_BUCKET_REMOVE(e);
+   /* Insert it into the temporary brigade. */
+   APR_BRIGADE_INSERT_HEAD(tmpbb, e);
+   /* Pass brigade upstream. */
+   rv = ap_pass_brigade(f->next, tmpbb);
+   if (rv) ...;
+   apr_brigade_cleanup(tmpbb);
+}
+ +
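
The difference in memory behaviour between the two loops can be shown with a small arithmetic model in plain C. It is a sketch under stated assumptions: the `peak_` functions are hypothetical, reads are assumed to produce 8K blocks as in the FILE example, and nothing is actually allocated. The first function models reading every block before passing anything on; the second models passing and cleaning up each block before the next read.

```c
#include <assert.h>
#include <stddef.h>

#define PEAK_BLOCK 8192u   /* assumed size of one read, as in the 8K example */

/* Peak bytes held if every block is read before anything is passed on. */
static size_t peak_read_all(size_t file_size)
{
    size_t held = 0, remaining = file_size;

    while (remaining > 0) {
        size_t chunk = remaining < PEAK_BLOCK ? remaining : PEAK_BLOCK;
        held += chunk;          /* block stays in the brigade */
        remaining -= chunk;
        assert(held <= file_size);
    }
    return held;                /* held only grows, so this is the peak */
}

/* Peak bytes held if each block is passed on and cleaned up in turn. */
static size_t peak_streaming(size_t file_size)
{
    size_t peak = 0, remaining = file_size;

    while (remaining > 0) {
        size_t chunk = remaining < PEAK_BLOCK ? remaining : PEAK_BLOCK;
        if (chunk > peak) {
            peak = chunk;       /* only one block is alive at a time */
        }
        remaining -= chunk;     /* block released after being passed on */
    }
    return peak;
}
```

In this model the first strategy grows with the file size while the second never exceeds one block, which is the property the better filter is designed to have.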
+ +
+ + Maintaining state + +

A filter which needs to maintain state over multiple + invocations per response can use the ->ctx field of + its ap_filter_t structure. It is typical to store a + temporary brigade in such a structure, to avoid having to allocate + a new brigade per invocation as described in the Brigade structure section.

+ + Example code to maintain filter state + +
struct dummy_state {
+   apr_bucket_brigade *tmpbb;
+   int filter_state;
+   ....
+};
+
+apr_status_t dummy_filter(ap_filter_t *f, apr_bucket_brigade *bb)
+{
+    struct dummy_state *state;
+
+    state = f->ctx;
+    if (state == NULL) {
+       /* First invocation for this response: initialise state structure. */
+       f->ctx = state = apr_palloc(f->r->pool, sizeof *state);
+       
+       state->tmpbb = apr_brigade_create(f->r->pool, f->c->bucket_alloc);
+       state->filter_state = ...;
+    }
+    ...
+ +
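
The allocate-once pattern can be exercised outside httpd with a toy model. The sketch below uses plain C with `calloc` in place of a pool, and the `st_` names are hypothetical; it only demonstrates that state hung off a `ctx` pointer is created on the first invocation and reused afterwards.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-response state; loosely mirrors dummy_state above. */
typedef struct {
    int invocations;
} st_state;

/* Toy filter handle with a ctx slot, loosely modelled on ap_filter_t. */
typedef struct {
    void *ctx;
    int allocations;     /* counts how often state was allocated */
} st_filter;

/* One filter invocation: allocate state lazily, then reuse it. */
static int st_invoke(st_filter *f)
{
    st_state *state;

    assert(f != NULL);
    state = f->ctx;
    if (state == NULL) {
        /* First invocation for this response. */
        state = calloc(1, sizeof *state);
        if (state == NULL) {
            return -1;
        }
        f->ctx = state;
        f->allocations++;
    }
    state->invocations++;
    return state->invocations;
}

/* Release the state at the end of the response (a pool does this in httpd). */
static void st_finish(st_filter *f)
{
    free(f->ctx);
    f->ctx = NULL;
}
```

However many times `st_invoke` runs for one response, `allocations` stays at one, so memory use does not grow with the number of invocations.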
+ +
+ Buffering buckets + +

If a filter decides to store buckets beyond the duration of a + single filter function invocation (for example storing them in its + ->ctx state structure), those buckets must be set + aside. This is necessary because some bucket types provide + buckets which represent temporary resources (such as stack memory) + which will fall out of scope as soon as the filter chain completes + processing the brigade.

+ +

To set aside a bucket, the apr_bucket_setaside + function can be called. Not all bucket types can be set aside, but + if successful, the bucket will have morphed to ensure it has a + lifetime at least as long as the pool given as an argument to the + apr_bucket_setaside function.
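
The idea behind setting a bucket aside can be sketched in plain C. This is a toy model, not the APR implementation: the `sa_` names are hypothetical and `malloc` stands in for the pool. A bucket that merely borrows short-lived memory is converted into one that owns a copy, so the data survives after the original buffer is reused.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy bucket that either borrows short-lived memory or owns a heap copy. */
typedef struct {
    const char *data;
    size_t length;
    int owned;           /* 1 once the data has been copied to the heap */
} sa_bucket;

/* Copy borrowed data so it outlives the caller's buffer. 0 on success. */
static int sa_setaside(sa_bucket *e)
{
    char *copy;

    assert(e != NULL);
    if (e->owned) {
        return 0;                     /* already long-lived */
    }
    copy = malloc(e->length ? e->length : 1);
    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, e->data, e->length);
    e->data = copy;
    e->owned = 1;
    return 0;
}

/* Free the copy, if any, and reset the bucket. */
static void sa_destroy(sa_bucket *e)
{
    if (e->owned) {
        free((void *)e->data);
    }
    e->data = NULL;
    e->length = 0;
    e->owned = 0;
}
```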

+ +

Alternatively, the ap_save_brigade function can be + used, which will create a new brigade containing buckets with a + lifetime as long as the given pool argument. This function must + be used with great care, however: on return it guarantees that all + the buckets in the returned brigade will represent data mapped + into memory. If given an input brigade containing, for example, a + PIPE bucket, ap_save_brigade will consume an + arbitrary amount of memory to store the entire output of the + pipe.

+ + Filters must ensure that any buffered data is + processed and passed up the filter chain during the last + invocation for a given response (a brigade containing an EOS + bucket). Otherwise such data will be lost. + +
+ +
+ Non-blocking bucket reads + +

The apr_bucket_read function takes an + apr_read_type_e argument which determines whether a + blocking or non-blocking read will be performed + from the data source. A good filter will first attempt to read + from every data bucket using a non-blocking read; if that fails + with APR_EAGAIN, then send a FLUSH + bucket up the filter chain, and retry using a blocking read.

+ +

This mode of operation ensures that any filters further up the + filter chain will flush any buffered buckets if a slow content + source is being used.

+ +

A CGI script is an example of a slow content source which is + implemented as a bucket type. mod_cgi will send + PIPE buckets which represent the output from a CGI + script; reading from such a bucket will block when waiting for the + CGI script to produce more output.

+ + + Example code using non-blocking bucket reads + +
apr_bucket *e;
+const char *data;
+apr_size_t length;
+apr_read_type_e mode = APR_NONBLOCK_READ;
+
+while ((e = APR_BRIGADE_FIRST(bb)) != APR_BRIGADE_SENTINEL(bb)) {
+    apr_status_t rv;
+
+    rv = apr_bucket_read(e, &data, &length, mode);
+    if (APR_STATUS_IS_EAGAIN(rv) && mode == APR_NONBLOCK_READ) {
+        /* Pass up a brigade containing a flush bucket: */
+        APR_BRIGADE_INSERT_TAIL(tmpbb, apr_bucket_flush_create(...));
+        rv = ap_pass_brigade(f->next, tmpbb);
+        apr_brigade_cleanup(tmpbb);
+        if (rv != APR_SUCCESS) return rv;
+
+        /* Retry, using a blocking read. */
+        mode = APR_BLOCK_READ;
+        continue;
+    } else if (rv != APR_SUCCESS) { 
+        /* handle errors */
+    }
+
+    /* Next time, try a non-blocking read first. */
+    mode = APR_NONBLOCK_READ;
+    ...
+}
+ +
+ +
+ Ten rules for output filters + +

In summary, here is a set of rules for all output filters to + follow:

+ +
    +
  1. Output filters should not pass empty brigades up the filter + chain, but should be tolerant of being passed empty + brigades.
  2. Output filters must pass all metadata buckets up the filter + chain; FLUSH buckets should be respected by passing + any pending or buffered buckets up the filter chain.
  3. Output filters should ignore any buckets following an + EOS bucket.
  4. Output filters which read all the buckets in a brigade must + process a fixed number of buckets (or amount of data) at a time, + to ensure that memory consumption is not proportional to the + size of the content being filtered.
  5. Output filters should be agnostic with respect to bucket + types, and must be able to process buckets of unfamiliar + type.
  6. After calling ap_pass_brigade to pass a brigade + up the filter chain, output filters should call + apr_brigade_cleanup to ensure the brigade is empty + before reusing that brigade structure; output filters should + never use apr_brigade_destroy to "destroy" + brigades.
  7. Output filters must set aside any buckets which are + preserved beyond the duration of the filter function.
  8. Output filters must not ignore the return value of + ap_pass_brigade, and must return appropriate errors + back down the filter chain.
  9. Output filters must only create a fixed number of bucket + brigades for each response, rather than one per invocation.
  10. Output filters should first attempt non-blocking reads from + each data bucket, and send a FLUSH bucket up the + filter chain if the read would block, before retrying with a blocking + read.
+ +
+ +
diff --git a/docs/manual/developer/output-filters.xml.meta b/docs/manual/developer/output-filters.xml.meta new file mode 100644 index 0000000000..75e376066d --- /dev/null +++ b/docs/manual/developer/output-filters.xml.meta @@ -0,0 +1,11 @@ + + + + output-filters + /developer/ + .. + + + en + + -- 2.40.0