<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content="HTML Tidy, see www.w3.org" />
<title>Apache Content Negotiation</title>
</head>
<!-- Background white, links blue (unvisited), navy (visited), red (active) -->
<body bgcolor="#FFFFFF" text="#000000" link="#0000FF"
vlink="#000080" alink="#FF0000">
<!--#include virtual="header.html" -->
<h1 align="CENTER">Content Negotiation</h1>
<p>Apache's support for content negotiation has been updated to
meet the HTTP/1.1 specification. It can choose the best
representation of a resource based on the browser-supplied
preferences for media type, languages, character set and
encoding. It also implements a couple of features to give
more intelligent handling of requests from browsers which send
incomplete negotiation information.</p>

<p>Content negotiation is provided by the <a
href="mod/mod_negotiation.html">mod_negotiation</a> module,
which is compiled in by default.</p>
<h2>About Content Negotiation</h2>

<p>A resource may be available in several different
representations. For example, it might be available in
different languages or different media types, or a combination.
One way of selecting the most appropriate choice is to give the
user an index page, and let them select. However, it is often
possible for the server to choose automatically. This works
because browsers can send, as part of each request, information
about which representations they prefer. For example, a browser
could indicate that it would like to see information in French
if possible, and otherwise in English. Browsers indicate their
preferences with headers in the request. To request only French
representations, the browser would send</p>
<pre>
  Accept-Language: fr
</pre>

<p>Note that this preference will only be applied when there is
a choice of representations and they vary by language.</p>
<p>As an example of a more complex request, this browser has
been configured to accept French and English, but prefer
French, and to accept various media types, preferring HTML over
plain text or other text types, and preferring GIF or JPEG over
other media types, but also allowing any other media type as a
last resort:</p>
<pre>
  Accept-Language: fr; q=1.0, en; q=0.5
  Accept: text/html; q=1.0, text/*; q=0.8, image/gif; q=0.6,
        image/jpeg; q=0.6, image/*; q=0.5, */*; q=0.1
</pre>
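<p>To make the header format concrete, the following Python sketch
splits an Accept-style header into values and quality factors. It is
only an illustration: the function name <code>parse_accept</code> is
hypothetical, and the sketch ignores media type parameters other than
<code>q</code>. It is not how mod_negotiation itself is written.</p>

```python
def parse_accept(header):
    """Parse an Accept-style header into (value, q) pairs, best first."""
    items = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        value = fields[0]
        if not value:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            name, _, raw = param.partition("=")
            if name.strip().lower() == "q":
                q = float(raw)
        items.append((value, q))
    # sorted() is stable, so items with equal q keep their header order
    return sorted(items, key=lambda item: item[1], reverse=True)


langs = parse_accept("fr; q=1.0, en; q=0.5")
types = parse_accept(
    "text/html; q=1.0, text/*; q=0.8, image/gif; q=0.6, "
    "image/jpeg; q=0.6, image/*; q=0.5, */*; q=0.1"
)
```

<p>Here <code>langs</code> lists French before English, and
<code>types</code> ends with the <code>*/*</code> wildcard, which
carries the lowest quality factor.</p>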
<p>Apache 1.2 supports 'server driven' content negotiation, as
defined in the HTTP/1.1 specification. It fully supports the
Accept, Accept-Language, Accept-Charset and Accept-Encoding
request headers. Apache 1.3.4 also supports 'transparent'
content negotiation, which is an experimental negotiation
protocol defined in RFC 2295 and RFC 2296. It does not offer
support for 'feature negotiation' as defined in these RFCs.</p>
<p>A <strong>resource</strong> is a conceptual entity
identified by a URI (RFC 2396). An HTTP server like Apache
provides access to <strong>representations</strong> of the
resource(s) within its namespace, with each representation in
the form of a sequence of bytes with a defined media type,
character set, encoding, etc. Each resource may be associated
with zero, one, or more than one representation at any given
time. If multiple representations are available, the resource
is referred to as <strong>negotiable</strong> and each of its
representations is termed a <strong>variant</strong>. The ways
in which the variants for a negotiable resource vary are called
the <strong>dimensions</strong> of negotiation.</p>
<h2>Negotiation in Apache</h2>

<p>In order to negotiate a resource, the server needs to be
given information about each of the variants. This is done in
one of two ways:</p>

<ul>
<li>Using a type map (<em>i.e.</em>, a <code>*.var</code>
file) which names the files containing the variants, or</li>

<li>Using a 'MultiViews' search, where the server does an
implicit filename pattern match and chooses from among the
results.</li>
</ul>
<h3>Using a type-map file</h3>

<p>A type map is a document which is associated with the
handler named <code>type-map</code> (or, for
backwards-compatibility with older Apache configurations, the
MIME type <code>application/x-type-map</code>). Note that to
use this feature, you must have a handler set in the
configuration that defines a file suffix as
<code>type-map</code>; this is best done with the following
line in the server configuration file:</p>
<pre>
  AddHandler type-map .var
</pre>
<p>Type map files should have the same name as the resource
which they are describing, and have an entry for each available
variant; these entries consist of contiguous HTTP-format header
lines. Entries for different variants are separated by blank
lines. Blank lines are illegal within an entry. It is
conventional to begin a map file with an entry for the combined
entity as a whole (although this is not required, and if
present will be ignored). An example map file is shown below.
This file would be named <code>foo.var</code>, as it describes
a resource named <code>foo</code>.</p>
<pre>
  Content-type: text/html

  Content-type: text/html;charset=iso-8859-2
  Content-language: fr, de
</pre>
<p>Note also that a typemap file will take precedence over the
filename's extension, even when Multiviews is on. If the
variants have different source qualities, that may be indicated
by the "qs" parameter to the media type, as in this picture
(available as JPEG, GIF, or ASCII art):</p>
<pre>
  Content-type: image/jpeg; qs=0.8

  Content-type: image/gif; qs=0.5

  Content-type: text/plain; qs=0.01
</pre>
<p>qs values can vary in the range 0.000 to 1.000. Note that
any variant with a qs value of 0.000 will never be chosen.
Variants with no 'qs' parameter value are given a qs factor of
1.0. The qs parameter indicates the relative 'quality' of this
variant compared to the other available variants, independent
of the client's capabilities. For example, a JPEG file is
usually of higher source quality than an ASCII file if it is
attempting to represent a photograph. However, if the resource
being represented is an original piece of ASCII art, then an ASCII
representation would have a higher source quality than a JPEG
representation. A qs value is therefore specific to a given
variant depending on the nature of the resource it
represents.</p>
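<p>As a rough illustration of how a source quality combines with a
client preference, the Python sketch below multiplies each client
<code>q</code> value by the variant's <code>qs</code> value and picks
the highest product. The qs values are those of the picture example;
the function name <code>best_media_type</code> and the client
preferences are assumptions made for the sketch, and wildcard matching
is left out.</p>

```python
# Source quality (qs) values taken from the picture example
variants = {"image/jpeg": 0.8, "image/gif": 0.5, "text/plain": 0.01}


def best_media_type(client_q, variants):
    """Return the media type with the highest q * qs product, or None."""
    scores = {
        media_type: client_q.get(media_type, 0.0) * qs
        for media_type, qs in variants.items()
    }
    best = max(scores, key=scores.get)
    # A product of zero means the type is unacceptable or never chosen
    return best if scores[best] > 0 else None


# Hypothetical client that slightly prefers plain text to images
choice = best_media_type(
    {"image/jpeg": 0.6, "image/gif": 0.6, "text/plain": 0.8}, variants
)
```

<p>Even though this client rates plain text highest, the low qs of the
ASCII-art variant means the JPEG variant wins.</p>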
<p>The full list of headers recognized is:</p>

<dl>
<dt><code>URI:</code></dt>

<dd>URI of the file containing the variant (of the given
media type, encoded with the given content encoding). These
are interpreted as URLs relative to the map file; they must
be on the same server (!), and they must refer to files to
which the client would be granted access if they were to be
requested directly.</dd>

<dt><code>Content-Type:</code></dt>

<dd>The media type; charset, level and "qs" parameters may be
given. These are often referred to as MIME types; typical
media types are <code>image/gif</code>,
<code>text/plain</code>, or
<code>text/html; level=3</code>.</dd>

<dt><code>Content-Language:</code></dt>

<dd>The languages of the variant, specified as an Internet
standard language tag from RFC 1766 (<em>e.g.</em>,
<code>en</code> for English, <code>ko</code> for Korean,
<em>etc.</em>).</dd>

<dt><code>Content-Encoding:</code></dt>

<dd>If the file is compressed, or otherwise encoded, rather
than containing the actual raw data, this says how that was
done. Apache only recognizes encodings that are defined by an
<a href="mod/mod_mime.html#addencoding">AddEncoding</a>
directive. This normally includes the encodings
<code>x-compress</code> for compress'd files, and
<code>x-gzip</code> for gzip'd files. The <code>x-</code>
prefix is ignored for encoding comparisons.</dd>

<dt><code>Content-Length:</code></dt>

<dd>The size of the file in bytes. Specifying content lengths
in the type-map allows the server to compare file sizes
without checking the actual files.</dd>

<dt><code>Description:</code></dt>

<dd>A human-readable textual description of the variant. If
Apache cannot find any appropriate variant to return, it will
return an error response which lists all available variants
instead. Such a variant list will include the human-readable
variant descriptions.</dd>
</dl>
<p>Using a type map file is preferred over <code>MultiViews</code>
because it requires less CPU time, and less file access, to
parse a file explicitly listing the various resource variants
than to look at every matching file and parse its file
extensions.</p>

<h3>MultiViews</h3>

<p><code>MultiViews</code> is a per-directory option, meaning
it can be set with an <code>Options</code> directive within a
<code>&lt;Directory&gt;</code>, <code>&lt;Location&gt;</code>
or <code>&lt;Files&gt;</code> section in
<code>access.conf</code>, or (if <code>AllowOverride</code> is
properly set) in <code>.htaccess</code> files. Note that
<code>Options All</code> does not set <code>MultiViews</code>;
you have to ask for it by name.</p>
<p>The effect of <code>MultiViews</code> is as follows: if the
server receives a request for <code>/some/dir/foo</code>, if
<code>/some/dir</code> has <code>MultiViews</code> enabled, and
<code>/some/dir/foo</code> does <em>not</em> exist, then the
server reads the directory looking for files named
<code>foo.*</code>, and effectively fakes up a type map which
names all those files, assigning them the same media types and
content-encodings it would have if the client had asked for one
of them by name. It then chooses the best match to the client's
requirements.</p>
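<p>The directory search can be pictured with the following Python
sketch. It is a simplified model, not Apache's implementation: the
function name <code>multiviews_candidates</code> is hypothetical, and
the sketch only collects the candidate file names without assigning
media types or encodings to them.</p>

```python
import os


def multiviews_candidates(directory, name):
    """List the files called name.* in directory, as a MultiViews search would."""
    if os.path.exists(os.path.join(directory, name)):
        return [name]  # the requested file exists, so no search is needed
    prefix = name + "."
    return sorted(f for f in os.listdir(directory) if f.startswith(prefix))
```

<p>For a directory holding <code>foo.en.html</code> and
<code>foo.fr.html</code>, a request for <code>foo</code> would yield
both names as candidates for negotiation.</p>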
<p><code>MultiViews</code> may also apply to searches for the
file named by the <code>DirectoryIndex</code> directive, if the
server is trying to index a directory. If the configuration
files specify</p>
<pre>
  DirectoryIndex index
</pre>

<p>then the server will arbitrate between <code>index.html</code>
and <code>index.html3</code> if both are present. If neither
is present, and <code>index.cgi</code> is there, the server
will run it.</p>
<p>If one of the files found when reading the directory is a
CGI script, it's not obvious what should happen. The code gives
that case special treatment: if the request was a POST, or a
GET with QUERY_ARGS or PATH_INFO, the script is given an
extremely high quality rating, and generally invoked; otherwise
it is given an extremely low quality rating, which generally
causes one of the other views (if any) to be retrieved.</p>
<h2>The Negotiation Methods</h2>

<p>After Apache has obtained a list of the variants for a given
resource, either from a type-map file or from the filenames in
the directory, it invokes one of two methods to decide on the
'best' variant to return, if any. It is not necessary to know
any of the details of how negotiation actually takes place in
order to use Apache's content negotiation features. However, the
rest of this document explains the methods used for those
interested.</p>
<p>There are two negotiation methods:</p>

<ol>
<li><strong>Server driven negotiation with the Apache
algorithm</strong> is used in the normal case. The Apache
algorithm is explained in more detail below. When this
algorithm is used, Apache can sometimes 'fiddle' the quality
factor of a particular dimension to achieve a better result.
The ways Apache can fiddle quality factors are explained in
more detail below.</li>

<li><strong>Transparent content negotiation</strong> is used
when the browser specifically requests this through the
mechanism defined in RFC 2295. This negotiation method gives
the browser full control over deciding on the 'best' variant;
the result is therefore dependent on the specific algorithms
used by the browser. As part of the transparent negotiation
process, the browser can ask Apache to run the 'remote
variant selection algorithm' defined in RFC 2296.</li>
</ol>
<h3>Dimensions of Negotiation</h3>

<table>
<tr>
<th>Dimension</th>
<th>Notes</th>
</tr>
<tr>
<td>Media Type</td>
<td>Browser indicates preferences with the Accept header
field. Each item can have an associated quality factor.
Variant description can also have a quality factor (the
"qs" parameter).</td>
</tr>
<tr>
<td>Language</td>
<td>Browser indicates preferences with the Accept-Language
header field. Each item can have a quality factor. Variants
can be associated with none, one or more than one
language.</td>
</tr>
<tr>
<td>Encoding</td>
<td>Browser indicates preference with the Accept-Encoding
header field. Each item can have a quality factor.</td>
</tr>
<tr>
<td>Charset</td>
<td>Browser indicates preference with the Accept-Charset
header field. Each item can have a quality factor. Variants
can indicate a charset as a parameter of the media
type.</td>
</tr>
</table>
<h3>Apache Negotiation Algorithm</h3>

<p>Apache can use the following algorithm to select the 'best'
variant (if any) to return to the browser. This algorithm is
not further configurable. It operates as follows:</p>

<ol>
<li>First, for each dimension of the negotiation, check the
appropriate <em>Accept*</em> header field and assign a
quality to each variant. If the <em>Accept*</em> header for
any dimension implies that this variant is not acceptable,
eliminate it. If no variants remain, go to step 4.</li>

<li>
Select the 'best' variant by a process of elimination. Each
of the following tests is applied in order. Any variants
not selected at each test are eliminated. After each test,
if only one variant remains, select it as the best match
and proceed to step 3. If more than one variant remains,
move on to the next test.

<ol>
<li>Multiply the quality factor from the Accept header
with the quality-of-source factor for this variant's
media type, and select the variants with the highest
value.</li>

<li>Select the variants with the highest language quality
factor.</li>

<li>Select the variants with the best language match,
using either the order of languages in the
Accept-Language header (if present), or else the order of
languages in the <code>LanguagePriority</code> directive
(if present).</li>

<li>Select the variants with the highest 'level' media
parameter (used to give the version of text/html media
types).</li>

<li>Select variants with the best charset media
parameters, as given on the Accept-Charset header line.
Charset ISO-8859-1 is acceptable unless explicitly
excluded. Variants with a <code>text/*</code> media type
but not explicitly associated with a particular charset
are assumed to be in ISO-8859-1.</li>

<li>Select those variants which have associated charset
media parameters that are <em>not</em> ISO-8859-1. If
there are no such variants, select all variants
instead.</li>

<li>Select the variants with the best encoding. If there
are variants with an encoding that is acceptable to the
user-agent, select only these variants. Otherwise, if
there is a mix of encoded and non-encoded variants,
select only the unencoded variants. If either all
variants are encoded or all variants are not encoded,
select all variants.</li>

<li>Select the variants with the smallest content
length.</li>

<li>Select the first variant of those remaining. This
will be either the first listed in the type-map file, or
when variants are read from the directory, the one whose
file name comes first when sorted using ASCII code
order.</li>
</ol>
</li>

<li>The algorithm has now selected one 'best' variant, so
return it as the response. The HTTP response header Vary is
set to indicate the dimensions of negotiation (browsers and
caches can use this information when caching the resource).
End.</li>

<li>To get here means no variant was selected (because none
are acceptable to the browser). Return a 406 status (meaning
"No acceptable representation") with a response body
consisting of an HTML document listing the available
variants. Also set the HTTP Vary header to indicate the
dimensions of variance.</li>
</ol>
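<p>The process of elimination in step 2 can be sketched in Python as
follows. This is a simplified model under stated assumptions: the
names <code>keep_best</code> and <code>negotiate</code> and the variant
fields are hypothetical, only three of the tests are shown, and
unacceptable variants are assumed to have been removed already
(step 1).</p>

```python
def keep_best(variants, score):
    """Keep only the variants that share the highest score."""
    top = max(score(v) for v in variants)
    return [v for v in variants if score(v) == top]


def negotiate(variants, tests):
    """Apply each test in order until a single variant remains."""
    remaining = list(variants)
    if not remaining:
        return None  # step 4: nothing acceptable, answer with a 406
    for score in tests:
        remaining = keep_best(remaining, score)
        if len(remaining) == 1:
            break
    return remaining[0]  # final test: the first of those remaining


# Illustrative tests, in the order the algorithm applies them
tests = [
    lambda v: v["accept_q"] * v["qs"],  # media type quality times qs
    lambda v: v["lang_q"],              # language quality factor
    lambda v: -v["length"],             # smallest content length
]

variants = [
    {"name": "foo.en.html", "accept_q": 1.0, "qs": 1.0, "lang_q": 0.5, "length": 1200},
    {"name": "foo.fr.html", "accept_q": 1.0, "qs": 1.0, "lang_q": 1.0, "length": 1500},
    {"name": "foo.fr.txt", "accept_q": 0.8, "qs": 1.0, "lang_q": 1.0, "length": 900},
]

best = negotiate(variants, tests)
```

<p>In this made-up example the plain text variant drops out at the
first test, and the language test then selects
<code>foo.fr.html</code>.</p>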
<h2><a id="better" name="better">Fiddling with Quality
Values</a></h2>

<p>Apache sometimes changes the quality values from what would
be expected by a strict interpretation of the Apache
negotiation algorithm above. This is to get a better result
from the algorithm for browsers which do not send full or
accurate information. Some of the most popular browsers send
Accept header information which would otherwise result in the
selection of the wrong variant in many cases. If a browser
sends full and correct information these fiddles will not be
applied.</p>
<h3>Media Types and Wildcards</h3>

<p>The Accept: request header indicates preferences for media
types. It can also include 'wildcard' media types, such as
"image/*" or "*/*" where the * matches any string. So a request
including:</p>
<pre>
  Accept: image/*, */*
</pre>

<p>would indicate that any type starting "image/" is acceptable,
as is any other type (so the first "image/*" is redundant).
Some browsers routinely send wildcards in addition to explicit
types they can handle. For example:</p>
<pre>
  Accept: text/html, text/plain, image/gif, image/jpeg, */*
</pre>

<p>The intention of this is to indicate that the explicitly listed
types are preferred, but if a different representation is
available, that is ok too. However, under the basic algorithm
as given above, the */* wildcard has exactly equal preference
to all the other types, so they are not being preferred. The
browser should really have sent a request with a lower quality
(preference) value for */*, such as:</p>
<pre>
  Accept: text/html, text/plain, image/gif, image/jpeg, */*; q=0.01
</pre>

<p>The explicit types have no quality factor, so they default to a
preference of 1.0 (the highest). The wildcard */* is given a
low preference of 0.01, so other types will only be returned if
no variant matches an explicitly listed type.</p>
<p>If the Accept: header contains <em>no</em> q factors at all,
Apache sets the q value of "*/*", if present, to 0.01 to
emulate the desired behavior. It also sets the q value of
wildcards of the format "type/*" to 0.02 (so these are
preferred over matches against "*/*"). If any media type on the
Accept: header contains a q factor, these special values are
<em>not</em> applied, so requests from browsers which send the
correct information to start with work as expected.</p>
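<p>The wildcard adjustment can be modelled with the short Python
sketch below. The function name <code>fiddle_wildcards</code> is
hypothetical and the sketch drops media type parameters other than
<code>q</code>; it illustrates the rule as described here rather than
reproducing Apache's code.</p>

```python
def fiddle_wildcards(accept):
    """Return {media type: q}, lowering wildcards if no q factor was sent."""
    entries = []
    for part in accept.split(","):
        media_type, _, params = part.strip().partition(";")
        q = None
        for param in params.split(";"):
            name, _, raw = param.partition("=")
            if name.strip().lower() == "q":
                q = float(raw)
        entries.append((media_type.strip(), q))

    # The special values apply only when the header has no q factor at all
    explicit_q = any(q is not None for _, q in entries)
    result = {}
    for media_type, q in entries:
        if q is None:
            q = 1.0
            if not explicit_q:
                if media_type == "*/*":
                    q = 0.01
                elif media_type.endswith("/*"):
                    q = 0.02
        result[media_type] = q
    return result


fiddled = fiddle_wildcards("text/html, text/plain, image/gif, image/jpeg, */*")
```

<p>With this header the explicit types keep a quality of 1.0 while
<code>*/*</code> is lowered to 0.01. A header such as
<code>text/html, */*; q=0.5</code> is left as sent, because it already
carries a q factor.</p>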
<h3>Variants with no Language</h3>

<p>If some of the variants for a particular resource have a
language attribute, and some do not, those variants with no
language are given a very low language quality factor of
0.001.</p>

<p>The reason for setting this language quality factor for
variants with no language to a very low value is to allow for a
default variant which can be supplied if none of the other
variants match the browser's language preferences. For example,
consider the situation with three variants:</p>

<ul>
<li>foo.en.html, language en</li>

<li>foo.fr.html, language fr</li>

<li>foo.html, no language</li>
</ul>

<p>The meaning of a variant with no language is that it is
always acceptable to the browser. If the request
Accept-Language header includes either en or fr (or both), one
of foo.en.html or foo.fr.html will be returned. If the browser
does not list either en or fr as acceptable, foo.html will be
returned instead.</p>
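<p>The fallback behaviour for the three variants can be illustrated
with a small Python sketch. The names <code>language_quality</code> and
<code>pick</code> are hypothetical, the low factor is set to 0.001 as
an assumption of the sketch, and languages the browser does not list
are treated as having a quality of zero.</p>

```python
NO_LANGUAGE_Q = 0.001  # assumed low factor for variants with no language

# The three variants from the example: file name and language (or None)
variants = {"foo.en.html": "en", "foo.fr.html": "fr", "foo.html": None}


def language_quality(language, accepted):
    """Quality of one variant given the browser's {language: q} preferences."""
    if language is None:
        return NO_LANGUAGE_Q
    return accepted.get(language, 0.0)


def pick(variants, accepted):
    """Return the file name whose language quality is highest."""
    return max(variants, key=lambda name: language_quality(variants[name], accepted))


french_first = pick(variants, {"fr": 1.0, "en": 0.5})
german_only = pick(variants, {"de": 1.0})
```

<p>A browser preferring French gets <code>foo.fr.html</code>, while a
browser that lists only German falls back to
<code>foo.html</code>.</p>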
<h2>Extensions to Transparent Content Negotiation</h2>

<p>Apache extends the transparent content negotiation protocol
(RFC 2295) as follows. A new <code>{encoding ..}</code> element
is used in variant lists to label variants which are available
with a specific content-encoding only. The implementation of
the RVSA/1.0 algorithm (RFC 2296) is extended to recognize
encoded variants in the list, and to use them as candidate
variants whenever their encodings are acceptable according to
the Accept-Encoding request header. The RVSA/1.0 implementation
does not round computed quality factors to 5 decimal places
before choosing the best variant.</p>
<h2>Note on hyperlinks and naming conventions</h2>

<p>If you are using language negotiation you can choose between
different naming conventions, because files can have more than
one extension, and the order of the extensions is normally
irrelevant (see the <a
href="mod/mod_mime.html#multipleext">mod_mime</a> documentation
for details).</p>

<p>A typical file has a MIME-type extension (<em>e.g.</em>,
<samp>html</samp>), maybe an encoding extension (<em>e.g.</em>,
<samp>gz</samp>), and of course a language extension
(<em>e.g.</em>, <samp>en</samp>) when we have different
language variants of this file.</p>

<ul>
<li>foo.en.html.gz</li>
</ul>

<p>Here are some more examples of filenames together with valid and
invalid hyperlinks:</p>
<table border="1" cellpadding="8" cellspacing="0">
<tr>
<th>Filename</th>
<th>Valid hyperlink</th>
<th>Invalid hyperlink</th>
</tr>
<tr>
<td><em>foo.html.en</em></td>
</tr>
<tr>
<td><em>foo.en.html</em></td>
</tr>
<tr>
<td><em>foo.html.en.gz</em></td>
</tr>
<tr>
<td><em>foo.en.html.gz</em></td>
</tr>
<tr>
<td><em>foo.gz.html.en</em></td>
</tr>
<tr>
<td><em>foo.html.gz.en</em></td>
</tr>
</table>
<p>Looking at the table above you will notice that it is always
possible to use the name without any extensions in a hyperlink
(<em>e.g.</em>, <samp>foo</samp>). The advantage is that you
can hide the actual type of a document or file and can change
it later, <em>e.g.</em>, from <samp>html</samp> to
<samp>shtml</samp> or <samp>cgi</samp>, without changing any
hyperlink references.</p>

<p>If you want to continue to use a MIME-type in your
hyperlinks (<em>e.g.</em>, <samp>foo.html</samp>), the language
extension (including an encoding extension if there is one)
must be on the right hand side of the MIME-type extension
(<em>e.g.</em>, <samp>foo.html.en</samp>).</p>
<h2>Note on Caching</h2>

<p>When a cache stores a representation, it associates it with
the request URL. The next time that URL is requested, the cache
can use the stored representation. But if the resource is
negotiable at the server, this might result in only the first
requested variant being cached, and subsequent cache hits might
return the wrong response. To prevent this, Apache normally
marks all responses that are returned after content negotiation
as non-cacheable by HTTP/1.0 clients. Apache also supports the
HTTP/1.1 protocol features to allow caching of negotiated
responses.</p>

<p>For requests which come from an HTTP/1.0 compliant client
(either a browser or a cache), the directive
<tt>CacheNegotiatedDocs</tt> can be used to allow caching of
responses which were subject to negotiation. This directive can
be given in the server config or virtual host, and takes no
arguments. It has no effect on requests from HTTP/1.1 clients.</p>
<!--#include virtual="footer.html" -->
</body>
</html>