<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator" content="HTML Tidy, see www.w3.org" />
<title>Apache Content Negotiation</title>
</head>
<!-- Background white, links blue (unvisited), navy (visited), red (active) -->
<body bgcolor="#FFFFFF" text="#000000" link="#0000FF"
vlink="#000080" alink="#FF0000">
<!--#include virtual="header.html" -->
<h1 align="CENTER">Content Negotiation</h1>
<p>Apache supports content negotiation as described in
the HTTP/1.1 specification. It can choose the best
representation of a resource based on the browser-supplied
preferences for media type, language, character set and
encoding. It also implements a couple of features to give
more intelligent handling of requests from browsers that send
incomplete negotiation information.</p>
<p>Content negotiation is provided by the <a
href="mod/mod_negotiation.html">mod_negotiation</a> module,
which is compiled in by default.</p>
<h2>About Content Negotiation</h2>
<p>A resource may be available in several different
representations. For example, it might be available in
different languages, in different media types, or in a combination
of the two. One way of selecting the most appropriate choice is to give the
user an index page and let them select. However, it is often
possible for the server to choose automatically. This works
because browsers can send, as part of each request, information
about which representations they prefer. For example, a browser
could indicate that it would like to see information in French
if possible, but that English will do otherwise. Browsers indicate their
preferences through headers in the request. To request only French
representations, the browser would send</p>
<pre>
Accept-Language: fr
</pre>
<p>Note that this preference will only be applied when there is
a choice of representations and they vary by language.</p>
<p>As an example of a more complex request, this browser has
been configured to accept French and English but prefer
French. It accepts various media types, preferring HTML over
plain text or other text types and preferring GIF or JPEG over
other media types, but it also allows any other media type as a
last resort:</p>
<pre>
Accept-Language: fr; q=1.0, en; q=0.5
Accept: text/html; q=1.0, text/*; q=0.8, image/gif; q=0.6,
       image/jpeg; q=0.6, image/*; q=0.5, */*; q=0.1
</pre>
<p>Apache supports 'server driven' content negotiation, as
defined in the HTTP/1.1 specification. It fully supports the
Accept, Accept-Language, Accept-Charset and Accept-Encoding
request headers. Apache also supports 'transparent'
content negotiation, which is an experimental negotiation
protocol defined in RFC 2295 and RFC 2296. It does not offer
support for 'feature negotiation' as defined in these RFCs.</p>
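<p>The q-value syntax used in the <code>Accept</code> and <code>Accept-Language</code> examples above can be illustrated with a short Python sketch. This is not Apache's code. It splits a header into ranges and q values, with q defaulting to 1.0, ignores all other media parameters, and returns the q of the most specific matching range. That last step follows the HTTP/1.1 rule that a more specific range overrides a wildcard. The function names <code>parse_accept</code>, <code>specificity</code> and <code>quality</code> are hypothetical.</p>

```python
def parse_accept(header):
    """Split an Accept-style header into (range, q) pairs; q defaults to 1.0."""
    items = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue  # skip empty items such as a trailing comma
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        items.append((fields[0].lower(), q))
    return items


def specificity(media_range, media_type):
    """2 for an exact match, 1 for type/*, 0 for */*, None if no match."""
    if media_range == "*/*":
        return 0
    major, _, minor = media_range.partition("/")
    t_major, _, t_minor = media_type.partition("/")
    if major != t_major:
        return None
    if minor == "*":
        return 1
    return 2 if minor == t_minor else None


def quality(items, media_type):
    """q of the most specific range matching media_type, or 0.0 if none does."""
    best_spec, best_q = -1, 0.0
    for media_range, q in items:
        spec = specificity(media_range, media_type.lower())
        if spec is not None and spec > best_spec:
            best_spec, best_q = spec, q
    return best_q
```

<p>For the example browser, a <code>text/plain</code> variant gets 0.8 from the <code>text/*</code> range. A type that is not listed at all, such as <code>application/pdf</code>, only matches <code>*/*</code> and gets 0.1.</p>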
<p>A <strong>resource</strong> is a conceptual entity
identified by a URI (RFC 2396). An HTTP server like Apache
provides access to <strong>representations</strong> of the
resource(s) within its namespace, with each representation in
the form of a sequence of bytes with a defined media type,
character set, encoding, etc. Each resource may be associated
with zero, one, or more than one representation at any given
time. If multiple representations are available, the resource
is referred to as <strong>negotiable</strong> and each of its
representations is termed a <strong>variant</strong>. The ways
in which the variants for a negotiable resource vary are called
the <strong>dimensions</strong> of negotiation.</p>
<h2>Negotiation in Apache</h2>
<p>In order to negotiate a resource, the server needs to be
given information about each of the variants. This is done in
one of two ways:</p>
<ul>
<li>Using a type map (<em>i.e.</em>, a <code>*.var</code>
file) which names the files containing the variants
explicitly, or</li>
<li>Using a 'MultiViews' search, where the server does an
implicit filename pattern match and chooses from among the
results.</li>
</ul>
<h3>Using a type-map file</h3>
<p>A type map is a document which is associated with the
handler named <code>type-map</code> (or, for
backwards-compatibility with older Apache configurations, the
MIME type <code>application/x-type-map</code>). Note that to
use this feature, you must have a handler set in the
configuration that defines a file suffix as
<code>type-map</code>. This is best done with an
<code>AddHandler</code> directive in the server configuration
file:</p>
<pre>
AddHandler type-map .var
</pre>
<p>Type map files should have the same name as the resource
which they are describing, and have an entry for each available
variant. These entries consist of contiguous HTTP-format header
lines. Entries for different variants are separated by blank
lines, and blank lines are illegal within an entry. It is
conventional to begin a map file with an entry for the combined
entity as a whole (although this is not required, and if
present it will be ignored). An example map file is shown below.
This file would be named <code>foo.var</code>, as it describes
a resource named <code>foo</code>.</p>
<pre>
URI: foo

URI: foo.en.html
Content-type: text/html
Content-language: en

URI: foo.fr.de.html
Content-type: text/html;charset=iso-8859-2
Content-language: fr, de
</pre>
<p>Note also that a type-map file will take precedence over the
filename's extension, even when MultiViews is on. If the
variants have different source qualities, this may be indicated
by the "qs" parameter to the media type, as in this picture
(available as JPEG, GIF, or ASCII art):</p>
<pre>
URI: foo

URI: foo.jpeg
Content-type: image/jpeg; qs=0.8

URI: foo.gif
Content-type: image/gif; qs=0.5

URI: foo.txt
Content-type: text/plain; qs=0.01
</pre>
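<p>The type-map format described above can be read with a short Python sketch. Header lines are grouped into entries, entries are separated by blank lines, and the media type may carry an optional <code>qs</code> parameter. This is a simplified reader written for illustration, not mod_negotiation's parser. It assumes every header fits on one line, lower-cases header names, and treats a missing <code>qs</code> as 1.0. The names <code>parse_type_map</code> and <code>source_quality</code> are hypothetical.</p>

```python
def parse_type_map(text):
    """Return a list of entries, each a dict keyed by lower-cased header name."""
    entries, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line ends the current entry.
            if current:
                entries.append(current)
                current = {}
            continue
        name, _, value = line.partition(":")
        current[name.strip().lower()] = value.strip()
    if current:
        entries.append(current)  # the last entry needs no trailing blank line
    return entries


def source_quality(entry):
    """The qs parameter of the Content-type header, or 1.0 if absent."""
    params = entry.get("content-type", "").split(";")[1:]
    for param in params:
        name, _, value = param.partition("=")
        if name.strip().lower() == "qs":
            return float(value)
    return 1.0
```

<p>The caller would still need to skip an entry with no <code>Content-type</code> header, such as a leading entry for the combined entity.</p>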
<p>qs values can vary in the range 0.000 to 1.000. Note that
any variant with a qs value of 0.000 will never be chosen.
Variants with no 'qs' parameter value are given a qs factor of
1.0. The qs parameter indicates the relative 'quality' of this
variant compared to the other available variants, independent
of the client's capabilities. For example, a JPEG file is
usually of higher source quality than an ASCII file if it is
attempting to represent a photograph. However, if the resource
being represented is an original piece of ASCII art, then an ASCII
representation would have a higher source quality than a JPEG
representation. A qs value is therefore specific to a given
variant, depending on the nature of the resource it
represents.</p>
<p>The full list of headers recognized is available in the <a
href="mod/mod_negotiation.html#typemaps">mod_negotiation</a>
documentation.</p>
<h3>Multiviews</h3>
<p><code>MultiViews</code> is a per-directory option, meaning
it can be set with an <code>Options</code> directive within a
<code>&lt;Directory&gt;</code>, <code>&lt;Location&gt;</code>
or <code>&lt;Files&gt;</code> section in
<code>access.conf</code>, or (if <code>AllowOverride</code> is
properly set) in <code>.htaccess</code> files. Note that
<code>Options All</code> does not set <code>MultiViews</code>;
you have to ask for it by name.</p>
<p>The effect of <code>MultiViews</code> is as follows: if the
server receives a request for <code>/some/dir/foo</code>, if
<code>/some/dir</code> has <code>MultiViews</code> enabled, and
<code>/some/dir/foo</code> does <em>not</em> exist, then the
server reads the directory looking for files named <code>foo.*</code>, and
effectively fakes up a type map which names all those files,
assigning them the same media types and content-encodings it
would have if the client had asked for one of them by name. It
then chooses the best match to the client's requirements.</p>
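<p>The "fake type map" step can be roughly illustrated with the Python sketch below. It collects the files named <code>foo.*</code> and assigns each one a media type, language and encoding from its extensions, whatever order they appear in. The <code>EXTENSIONS</code> table is a hypothetical stand-in for the mappings that <code>mod_mime</code> would take from directives such as <code>AddType</code>, <code>AddLanguage</code> and <code>AddEncoding</code>. The function name <code>multiviews_variants</code> is also hypothetical.</p>

```python
# Hypothetical extension mappings; a real server takes these from mod_mime
# directives (AddType, AddLanguage, AddEncoding) or the mime.types file.
EXTENSIONS = {
    "html": ("type", "text/html"),
    "txt": ("type", "text/plain"),
    "en": ("language", "en"),
    "fr": ("language", "fr"),
    "de": ("language", "de"),
    "gz": ("encoding", "gzip"),
}


def multiviews_variants(requested, filenames):
    """Build type-map-like entries for the files matching requested.*"""
    variants = []
    for name in sorted(filenames):  # code-point order, i.e. ASCII for ASCII names
        if not name.startswith(requested + "."):
            continue
        entry = {"uri": name}
        # Every extension after the base name counts, in any order.
        for ext in name.split(".")[1:]:
            if ext in EXTENSIONS:
                dimension, value = EXTENSIONS[ext]
                entry[dimension] = value
        variants.append(entry)
    return variants
```

<p>A request for <code>foo.html</code> only considers names that begin with <code>foo.html.</code>. This is why the order of extensions matters once a hyperlink includes one of them.</p>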
<p><code>MultiViews</code> may also apply to searches for the
file named by the <code>DirectoryIndex</code> directive, if the
server is trying to index a directory. If the configuration
files specify</p>
<pre>
DirectoryIndex index
</pre>
<p>then the server will arbitrate between <code>index.html</code>
and <code>index.html3</code> if both are present. If neither
is present and <code>index.cgi</code> is there, the server
will run it.</p>
<p>If one of the files found when reading the directory does not
have an extension recognized by <code>mod_mime</code> to designate
its Charset, Content-Type, Language, or Encoding, then the result
depends on the setting of the <a
href="mod/mod_mime.html#multiviewsmatch">MultiViewsMatch</a>
directive. This directive determines whether handlers, filters,
and other extension types can participate in MultiViews
negotiation.</p>
<h2>The Negotiation Methods</h2>
<p>After Apache has obtained a list of the variants for a given
resource, either from a type-map file or from the filenames in
the directory, it invokes one of two methods to decide on the
'best' variant to return, if any. It is not necessary to know
any of the details of how negotiation actually takes place in
order to use Apache's content negotiation features. However, the
rest of this document explains the methods used, for those who
are interested.</p>
<p>There are two negotiation methods:</p>
<ol>
<li><strong>Server driven negotiation with the Apache
algorithm</strong> is used in the normal case. The Apache
algorithm is explained in more detail below. When this
algorithm is used, Apache can sometimes 'fiddle' the quality
factor of a particular dimension to achieve a better result.
The ways Apache can fiddle quality factors are explained in
more detail below.</li>
<li><strong>Transparent content negotiation</strong> is used
when the browser specifically requests it through the
mechanism defined in RFC 2295. This negotiation method gives
the browser full control over deciding on the 'best' variant,
so the result depends on the specific algorithms
used by the browser. As part of the transparent negotiation
process, the browser can ask Apache to run the 'remote
variant selection algorithm' defined in RFC 2296.</li>
</ol>
<h3>Dimensions of Negotiation</h3>
<table border="1" cellpadding="8" cellspacing="0">
<tr valign="top">
<th>Dimension</th>
<th>Notes</th>
</tr>
<tr valign="top">
<td>Media Type</td>
<td>Browser indicates preferences with the Accept header
field. Each item can have an associated quality factor.
Variant descriptions can also have a quality factor (the
"qs" parameter).</td>
</tr>
<tr valign="top">
<td>Language</td>
<td>Browser indicates preferences with the Accept-Language
header field. Each item can have a quality factor. Variants
can be associated with none, one or more than one
language.</td>
</tr>
<tr valign="top">
<td>Encoding</td>
<td>Browser indicates preference with the Accept-Encoding
header field. Each item can have a quality factor.</td>
</tr>
<tr valign="top">
<td>Charset</td>
<td>Browser indicates preference with the Accept-Charset
header field. Each item can have a quality factor. Variants
can indicate a charset as a parameter of the media
type.</td>
</tr>
</table>
<h3>Apache Negotiation Algorithm</h3>
<p>Apache can use the following algorithm to select the 'best'
variant (if any) to return to the browser. This algorithm is
not further configurable. It operates as follows:</p>
<ol>
<li>First, for each dimension of the negotiation, check the
appropriate <em>Accept*</em> header field and assign a
quality to each variant. If the <em>Accept*</em> header for
any dimension implies that this variant is not acceptable,
eliminate it. If no variants remain, go to step 4.</li>
<li>
Select the 'best' variant by a process of elimination. Each
of the following tests is applied in order. Any variants
not selected at each test are eliminated. After each test,
if only one variant remains, select it as the best match
and proceed to step 3. If more than one variant remains,
move on to the next test.
<ol>
<li>Multiply the quality factor from the Accept header
with the quality-of-source factor for this variant's
media type, and select the variants with the highest
value.</li>
<li>Select the variants with the highest language quality
factor.</li>
<li>Select the variants with the best language match,
using either the order of languages in the
Accept-Language header (if present), or else the order of
languages in the <code>LanguagePriority</code> directive
(if present).</li>
<li>Select the variants with the highest 'level' media
parameter (used to give the version of text/html media
types).</li>
<li>Select variants with the best charset media
parameters, as given on the Accept-Charset header line.
Charset ISO-8859-1 is acceptable unless explicitly
excluded. Variants with a <code>text/*</code> media type
but not explicitly associated with a particular charset
are assumed to be in ISO-8859-1.</li>
<li>Select those variants which have associated charset
media parameters that are <em>not</em> ISO-8859-1. If
there are no such variants, select all variants
instead.</li>
<li>Select the variants with the best encoding. If there
are variants with an encoding that is acceptable to the
user-agent, select only these variants. Otherwise, if
there is a mix of encoded and non-encoded variants,
select only the unencoded variants. If either all
variants are encoded or all variants are not encoded,
select all variants.</li>
<li>Select the variants with the smallest content
length.</li>
<li>Select the first variant of those remaining. This
will be either the first listed in the type-map file or,
when variants are read from the directory, the one whose
file name comes first when sorted using ASCII code
order.</li>
</ol>
</li>
<li>The algorithm has now selected one 'best' variant, so
return it as the response. The HTTP response header Vary is
set to indicate the dimensions of negotiation (browsers and
caches can use this information when caching the resource).</li>
<li>Reaching this step means no variant was selected (because none
are acceptable to the browser). Return a 406 status (meaning
"No acceptable representation") with a response body
consisting of an HTML document listing the available
variants. Also set the HTTP Vary header to indicate the
dimensions of variance.</li>
</ol>
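<p>The elimination cascade in step 2 can be sketched in Python as below. This is a simplified model, not Apache's source. Each variant's per-dimension values are assumed to have been computed already from the <em>Accept*</em> headers. Step 1 is reduced to dropping variants with a zero quality, and the level and charset tests are omitted. The <code>Variant</code> fields and all function names are hypothetical.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    name: str
    type_q: float              # q from Accept for this variant's media type
    qs: float = 1.0            # source quality from the type map
    lang_q: float = 1.0        # q from Accept-Language
    lang_rank: int = 0         # position in Accept-Language/LanguagePriority (0 = first)
    encoding: Optional[str] = None
    encoding_ok: bool = False  # whether Accept-Encoding allows the encoding
    length: int = 0            # content length in bytes


def keep_best(variants, key):
    """Keep only the variants whose key is maximal, preserving order."""
    best = max(key(v) for v in variants)
    return [v for v in variants if key(v) == best]


def encoding_test(variants):
    acceptable = [v for v in variants if v.encoding and v.encoding_ok]
    if acceptable:
        return acceptable
    unencoded = [v for v in variants if v.encoding is None]
    if unencoded and len(unencoded) != len(variants):
        return unencoded  # a mix of encoded and unencoded: prefer unencoded
    return variants       # all encoded or none encoded: keep them all


TESTS = [
    lambda vs: keep_best(vs, lambda v: v.type_q * v.qs),  # media type times qs
    lambda vs: keep_best(vs, lambda v: v.lang_q),         # language quality
    lambda vs: keep_best(vs, lambda v: -v.lang_rank),     # language order
    # (level and charset tests omitted in this sketch)
    encoding_test,                                         # encoding
    lambda vs: keep_best(vs, lambda v: -v.length),        # smallest length
]


def select_variant(variants):
    """Return the chosen variant, or None where Apache would answer 406."""
    # Step 1, simplified: a zero quality in any dimension rules a variant out.
    remaining = [v for v in variants if v.type_q * v.qs * v.lang_q != 0]
    if not remaining:
        return None
    for test in TESTS:
        remaining = test(remaining)
        if len(remaining) == 1:
            break
    # Final test: the first of the remaining variants, in listing order.
    return remaining[0]
```

<p>Take the earlier browser that sends <code>Accept-Language: fr; q=1.0, en; q=0.5</code>. A French and an English HTML variant tie on the media-type test, and the language-quality test then keeps only the French one.</p>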
<h2><a id="better" name="better">Fiddling with Quality
Values</a></h2>
<p>Apache sometimes changes the quality values from what would
be expected by a strict interpretation of the Apache
negotiation algorithm above. This is to get a better result
from the algorithm for browsers which do not send full or
accurate information. Some of the most popular browsers send
Accept header information which would otherwise result in the
selection of the wrong variant in many cases. If a browser
sends full and correct information, these fiddles will not be
applied.</p>
<h3>Media Types and Wildcards</h3>
<p>The Accept: request header indicates preferences for media
types. It can also include 'wildcard' media types, such as
"image/*" or "*/*", where the * matches any string. So a request
including</p>
<pre>
Accept: image/*, */*
</pre>
<p>would indicate that any type starting with "image/" is acceptable,
as is any other type (so the first "image/*" is redundant).
Some browsers routinely send wildcards in addition to explicit
types they can handle. For example:</p>
<pre>
Accept: text/html, text/plain, image/gif, image/jpeg, */*
</pre>
<p>The intention of this is to indicate that the explicitly listed
types are preferred, but if a different representation is
available, that is OK too. However, under the basic algorithm
given above, the */* wildcard has exactly the same preference
as all the other types, so the listed types are not actually preferred. The
browser should really have sent a request with a lower quality
(preference) value for */*, such as:</p>
<pre>
Accept: text/html, text/plain, image/gif, image/jpeg, */*; q=0.01
</pre>
<p>The explicit types have no quality factor, so they default to a
preference of 1.0 (the highest). The wildcard */* is given a
low preference of 0.01, so other types will only be returned if
no variant matches an explicitly listed type.</p>
<p>If the Accept: header contains <em>no</em> q factors at all,
Apache sets the q value of "*/*", if present, to 0.01 to
emulate the desired behavior. It also sets the q value of
wildcards of the format "type/*" to 0.02 (so these are
preferred over matches against "*/*"). If any media type on the
Accept: header contains a q factor, these special values are
<em>not</em> applied, so requests from browsers which send the
correct information to start with work as expected.</p>
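<p>The wildcard adjustment just described can be sketched in Python as follows. The sketch works on an already-parsed list of <code>(media_range, q)</code> pairs, where <code>q</code> is <code>None</code> if the client sent no q parameter for that item. That representation and the name <code>fiddle_wildcards</code> are assumptions of this example, not Apache internals.</p>

```python
def fiddle_wildcards(accept_items):
    """Apply the wildcard adjustment to (media_range, q-or-None) pairs."""
    if any(q is not None for _, q in accept_items):
        # The client sent at least one explicit q factor: use its values as-is.
        return [(r, 1.0 if q is None else q) for r, q in accept_items]
    adjusted = []
    for media_range, _ in accept_items:
        if media_range == "*/*":
            adjusted.append((media_range, 0.01))
        elif media_range.endswith("/*"):
            adjusted.append((media_range, 0.02))  # beats */*, loses to explicit types
        else:
            adjusted.append((media_range, 1.0))
    return adjusted
```

<p>For the browser in the earlier example that sends a bare <code>*/*</code>, the explicit types keep 1.0 and <code>*/*</code> drops to 0.01. As a result, that browser gets the behaviour it intended.</p>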
<h3>Language Negotiation Exceptions</h3>
<p>New in Apache 2.0, some exceptions have been added to the
negotiation algorithm to allow graceful fallback when language
negotiation fails to find a match.</p>
<p>When a client requests a page on your server, but the server
cannot find a single page that matches the Accept-Language sent by
the browser, the server will return either a "No Acceptable
Variant" or "Multiple Choices" response to the client. To avoid
these error messages, it is possible to configure Apache to ignore
the Accept-Language in these cases and provide a document that
does not explicitly match the client's request. The <a
href="mod/mod_negotiation.html#forcelanguagepriority">ForceLanguagePriority</a>
directive can be used to override one or both of these error
messages and substitute the server's judgement in the form of the <a
href="mod/mod_negotiation.html#languagepriority">LanguagePriority</a>
directive.</p>
<p>The server will also attempt to match language subsets when no
other match can be found. For example, if a client requests
documents with the language <code>en-GB</code> for British
English, the server is not normally allowed by the HTTP/1.1
standard to match that against a document that is marked as simply
<code>en</code>. (Note that it is almost certainly a configuration
error to include <code>en-GB</code> and not <code>en</code> in the
Accept-Language header, since it is very unlikely that a reader
understands British English but doesn't understand English in
general. Unfortunately, many current clients have default
configurations that resemble this.) However, if no other language
match is possible and the server is about to return a "No
Acceptable Variant" error or fall back to the
<code>LanguagePriority</code>, the server will ignore the subset
specification and match <code>en-GB</code> against <code>en</code>
documents. Implicitly, Apache will add the parent language to
the client's acceptable language list with a very low quality
value. Note, however, that if the client requests "en-GB; q=0.9, fr;
q=0.8" and the server has documents designated "en" and "fr",
then the "fr" document will be returned. This is necessary to
maintain compliance with the HTTP/1.1 specification and to work
effectively with properly configured clients.</p>
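<p>This fallback can be sketched in Python, assuming lower-case language tags and already-parsed <code>(tag, q)</code> preferences. The text above does not say exactly how low the quality of the implied parent language is, so <code>PARENT_Q</code> is a placeholder, and the function names are hypothetical. A range such as <code>en</code> matching a tag such as <code>en-gb</code> is the ordinary HTTP/1.1 prefix rule. In this sketch the reverse only happens through the fallback.</p>

```python
# Placeholder for the "very low quality value" given to implied parent
# languages; the exact value Apache uses is an assumption here.
PARENT_Q = 0.001


def language_q(prefs, lang):
    """Best q for a variant language; a range like 'en' also covers 'en-gb'."""
    best = 0.0
    for tag, q in prefs:
        if lang == tag or lang.startswith(tag + "-"):
            best = max(best, q)
    return best


def with_parents(prefs):
    """Append each tag's parent (e.g. 'en' for 'en-gb') at PARENT_Q if unlisted."""
    listed = {tag for tag, _ in prefs}
    extra = []
    for tag, _ in prefs:
        parent = tag.split("-")[0]
        if parent not in listed:
            extra.append((parent, PARENT_Q))
            listed.add(parent)
    return prefs + extra


def choose_language(prefs, variant_langs):
    """Pick a variant language, using parent languages only as a last resort."""
    scores = [(language_q(prefs, lang), lang) for lang in variant_langs]
    if max(q for q, _ in scores) == 0:
        widened = with_parents(prefs)
        scores = [(language_q(widened, lang), lang) for lang in variant_langs]
    best = max(q for q, _ in scores)
    if best == 0:
        return None  # no match: a 406, or ForceLanguagePriority handling
    return next(lang for q, lang in scores if q == best)
```

<p>With <code>en-GB; q=0.9, fr; q=0.8</code> and documents in "en" and "fr", the French document already matches, so the fallback never runs and "fr" is returned. This matches the behaviour described above.</p>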
<h2>Extensions to Transparent Content Negotiation</h2>
<p>Apache extends the transparent content negotiation protocol
(RFC 2295) as follows. A new <code>{encoding ..}</code> element
is used in variant lists to label variants which are available
with a specific content-encoding only. The implementation of
the RVSA/1.0 algorithm (RFC 2296) is extended to recognize
encoded variants in the list, and to use them as candidate
variants whenever their encodings are acceptable according to
the Accept-Encoding request header. The RVSA/1.0 implementation
does not round computed quality factors to 5 decimal places
before choosing the best variant.</p>
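<p>A small Python sketch shows the practical effect of skipping the rounding step. As a simplification, it assumes that a variant's overall quality is the product of its source quality and its per-dimension quality factors (type, charset, language, features), in the spirit of RFC 2296. It also assumes that ties go to the first listed variant. The parameter names and the tie-break are assumptions of this example, not a restatement of the RFC.</p>

```python
def overall_quality(qs, qt=1.0, qc=1.0, ql=1.0, qf=1.0, places=None):
    """Product of source quality and type/charset/language/feature factors.

    If places is given, the result is rounded to that many decimals.
    """
    q = qs * qt * qc * ql * qf
    return q if places is None else round(q, places)


def pick(variants, places=None):
    """Name of the first variant with the highest overall quality."""
    scored = [(overall_quality(places=places, **f), name) for name, f in variants]
    best = max(q for q, _ in scored)
    return next(name for q, name in scored if q == best)
```

<p>Two variants scoring 0.123449 and 0.123451 both round to 0.12345, so with rounding the tie-break decides. Without rounding, which is what Apache does, the second variant wins outright.</p>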
<h2>Note on hyperlinks and naming conventions</h2>
<p>If you are using language negotiation, you can choose between
different naming conventions, because files can have more than
one extension, and the order of the extensions is normally
irrelevant (see the <a
href="mod/mod_mime.html#multipleext">mod_mime</a> documentation
for details).</p>
<p>A typical file has a MIME-type extension (<em>e.g.</em>,
<samp>html</samp>), maybe an encoding extension (<em>e.g.</em>,
<samp>gz</samp>), and, when there are different language
variants of the file, a language extension
(<em>e.g.</em>, <samp>en</samp>).</p>
<p>For example:</p>
<ul>
<li>foo.en.html.gz</li>
</ul>
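<p>Here are some more examples of filenames together with valid and
invalid hyperlinks:</p>
<table border="1" cellpadding="8" cellspacing="0">
<tr>
<th>Filename</th>
<th>Valid hyperlink</th>
<th>Invalid hyperlink</th>
</tr>
<tr>
<td><em>foo.html.en</em></td>
<td>foo<br />
foo.html</td>
<td>-</td>
</tr>
<tr>
<td><em>foo.en.html</em></td>
<td>foo</td>
<td>foo.html</td>
</tr>
<tr>
<td><em>foo.html.en.gz</em></td>
<td>foo<br />
foo.html</td>
<td>foo.gz<br />
foo.html.gz</td>
</tr>
<tr>
<td><em>foo.en.html.gz</em></td>
<td>foo</td>
<td>foo.html<br />
foo.html.gz<br />
foo.gz</td>
</tr>
<tr>
<td><em>foo.gz.html.en</em></td>
<td>foo<br />
foo.gz<br />
foo.gz.html</td>
<td>foo.html</td>
</tr>
<tr>
<td><em>foo.html.gz.en</em></td>
<td>foo<br />
foo.html<br />
foo.html.gz</td>
<td>foo.gz</td>
</tr>
</table>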
<p>Looking at the table above, you will notice that it is always
possible to use the name without any extensions in a hyperlink
(<em>e.g.</em>, <samp>foo</samp>). The advantage is that you
can hide the actual type of a document or file and can change
it later, <em>e.g.</em>, from <samp>html</samp> to
<samp>shtml</samp> or <samp>cgi</samp>, without changing any
hyperlink references.</p>
<p>If you want to continue to use a MIME-type in your
hyperlinks (<em>e.g.</em>, <samp>foo.html</samp>), the language
extension (including an encoding extension if there is one)
must be on the right-hand side of the MIME-type extension
(<em>e.g.</em>, <samp>foo.html.en</samp>).</p>
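<p>The rule behind these hyperlink examples can be checked with a short Python sketch. It makes a simplifying assumption: a hyperlink reaches a file either by naming it exactly or, through MultiViews, when the file name starts with the link followed by a dot. The helpers <code>link_resolves</code> and <code>classify</code> are hypothetical. They ignore everything else that affects MultiViews, such as directory options, <code>MultiViewsMatch</code> and unrecognized extensions.</p>

```python
def link_resolves(link, filename):
    """Simplified check: can a request for `link` be served by `filename`?"""
    # Either the exact file, or one of the link.* candidates MultiViews scans.
    return filename == link or filename.startswith(link + ".")


def classify(filename, links):
    """Split candidate hyperlinks into (valid, invalid) lists for filename."""
    valid = [link for link in links if link_resolves(link, filename)]
    invalid = [link for link in links if not link_resolves(link, filename)]
    return valid, invalid
```

<p>Putting the language extension to the right of <samp>html</samp> keeps <samp>foo.html</samp> as a prefix of the real filename. That prefix is all this simplified check needs.</p>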
<h2>Note on Caching</h2>
<p>When a cache stores a representation, it associates it with
the request URL. The next time that URL is requested, the cache
can use the stored representation. However, if the resource is
negotiable at the server, this might result in only the first
requested variant being cached, and subsequent cache hits might
return the wrong response. To prevent this, Apache normally
marks all responses that are returned after content negotiation
as non-cacheable by HTTP/1.0 clients. Apache also supports the
HTTP/1.1 protocol features that allow caching of negotiated
responses.</p>
<p>For requests which come from an HTTP/1.0 compliant client
(either a browser or a cache), the directive
<tt>CacheNegotiatedDocs</tt> can be used to allow caching of
responses which were subject to negotiation. This directive can
be given in the server config or virtual host, and takes a
single <tt>On</tt> or <tt>Off</tt> argument. It has no effect on
requests from HTTP/1.1 clients.</p>
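<p>Here is a toy Python cache that shows why the Vary header lets HTTP/1.1 caches store negotiated responses safely. It keys each stored response on the URL plus the values of the request headers named in Vary. The sketch rests on simplifying assumptions: lower-case header names, exact string comparison of header values, and one Vary list per URL. It does not describe any particular cache implementation, and the class name <code>VaryCache</code> is hypothetical.</p>

```python
class VaryCache:
    """Toy cache: one stored response per URL and per set of varying headers."""

    def __init__(self):
        self._vary = {}     # URL to the request-header names listed in Vary
        self._entries = {}  # (URL, header values) to stored response body

    def _key(self, url, request_headers):
        names = self._vary.get(url, ())
        return (url, tuple(request_headers.get(n, "") for n in names))

    def store(self, url, request_headers, vary, body):
        self._vary[url] = tuple(h.strip().lower() for h in vary.split(","))
        self._entries[self._key(url, request_headers)] = body

    def lookup(self, url, request_headers):
        """Return a stored body only if every varying header matches."""
        if url not in self._vary:
            return None
        return self._entries.get(self._key(url, request_headers))
```

<p>Without the Vary information, a cache could only key on the URL. That is exactly the problem described at the start of this section.</p>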
<h2>More Information</h2>
<p>For more information about content negotiation, see Alan
Flavell's <a
href="http://ppewww.ph.gla.ac.uk/~flavell/www/lang-neg.html">Language
Negotiation Notes</a>. Note, however, that this document may not have been
updated to include changes in Apache 2.0.</p>
<!--#include virtual="footer.html" -->
</body>
</html>