1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
4 <TITLE>Apache Content Negotiation</TITLE>
7 <!-- Background white, links blue (unvisited), navy (visited), red (active) -->
15 <!--#include virtual="header.html" -->
16 <H1 ALIGN="CENTER">Content Negotiation</H1>
19 Apache's support for content negotiation has been updated to meet the
20 HTTP/1.1 specification. It can choose the best representation of a
21 resource based on the browser-supplied preferences for media type,
22 languages, character set and encoding. It also implements a
23 couple of features to give more intelligent handling of requests from
24 browsers which send incomplete negotiation information. <P>
26 Content negotiation is provided by the
27 <A HREF="mod/mod_negotiation.html">mod_negotiation</A> module,
28 which is compiled in by default.
32 <H2>About Content Negotiation</H2>
35 A resource may be available in several different representations. For
36 example, it might be available in different languages or different
37 media types, or a combination. One way of selecting the most
38 appropriate choice is to give the user an index page, and let them
39 select. However it is often possible for the server to choose
40 automatically. This works because browsers can send as part of each
41 request information about what representations they prefer. For
42 example, a browser could indicate that it would like to see
43 information in French, if possible, else English will do. Browsers
44 indicate their preferences by headers in the request. To request only
45 French representations, the browser would send
52 Note that this preference will only be applied when there is a choice
53 of representations and they vary by language.
56 As an example of a more complex request, this browser has been
57 configured to accept French and English, but prefer French, and to
58 accept various media types, preferring HTML over plain text or other
59 text types, and preferring GIF or JPEG over other media types, but also
60 allowing any other media type as a last resort:
63 Accept-Language: fr; q=1.0, en; q=0.5
64 Accept: text/html; q=1.0, text/*; q=0.8, image/gif; q=0.6,
65 image/jpeg; q=0.6, image/*; q=0.5, */*; q=0.1
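<P>
As a rough illustration of how such headers can be read, the following Python sketch splits an Accept-style header into values and quality factors. It is not Apache's parser; the function name <CODE>parse_accept</CODE> is an assumption made for this example, and only the <CODE>q</CODE> parameter is interpreted.

```python
def parse_accept(header):
    """Parse an Accept-style header into (value, q) pairs, highest q first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        value = parts[0]
        if not value:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            name, _, raw = param.partition("=")
            if name.strip().lower() == "q":
                q = float(raw.strip())
        prefs.append((value, q))
    # sorted() is stable, so items with equal q keep their header order
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


languages = parse_accept("fr; q=1.0, en; q=0.5")
media = parse_accept(
    "text/html; q=1.0, text/*; q=0.8, image/gif; q=0.6, "
    "image/jpeg; q=0.6, image/*; q=0.5, */*; q=0.1"
)
```

Here <CODE>languages</CODE> lists French before English, and <CODE>media</CODE> ends with the <CODE>*/*</CODE> wildcard as the last resort.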
68 Apache 1.2 supports 'server driven' content negotiation, as defined in
69 the HTTP/1.1 specification. It fully supports the Accept,
70 Accept-Language, Accept-Charset and Accept-Encoding request headers.
71 Apache 1.3.4 also supports 'transparent' content negotiation, which is
72 an experimental negotiation protocol defined in RFC 2295 and RFC 2296.
73 It does not offer support for 'feature negotiation' as defined in
77 A <STRONG>resource</STRONG> is a conceptual entity identified by a URI
78 (RFC 2396). An HTTP server like Apache provides access to
79 <STRONG>representations</STRONG> of the resource(s) within its namespace,
80 with each representation in the form of a sequence of bytes with a
81 defined media type, character set, encoding, etc. Each resource may be
82 associated with zero, one, or more than one representation
83 at any given time. If multiple representations are available,
84 the resource is referred to as <STRONG>negotiable</STRONG> and each of its
85 representations is termed a <STRONG>variant</STRONG>. The ways in which the
86 variants for a negotiable resource vary are called the
87 <STRONG>dimensions</STRONG> of negotiation.
89 <H2>Negotiation in Apache</H2>
92 In order to negotiate a resource, the server needs to be given
93 information about each of the variants. This is done in one of two
97 <LI> Using a type map (<EM>i.e.</EM>, a <CODE>*.var</CODE> file) which
98 names the files containing the variants explicitly, or
99 <LI> Using a 'MultiViews' search, where the server does an implicit
100 filename pattern match and chooses from among the results.
103 <H3>Using a type-map file</H3>
106 A type map is a document which is associated with the handler
107 named <CODE>type-map</CODE> (or, for backwards-compatibility with
108 older Apache configurations, the MIME type
109 <CODE>application/x-type-map</CODE>). Note that to use this feature,
110 you must have a handler set in the configuration that defines a
111 file suffix as <CODE>type-map</CODE>; this is best done with a
114 AddHandler type-map .var
117 in the server configuration file. See the comments in the sample config
118 file for more details. <P>
120 Type map files have an entry for each available variant; these entries
121 consist of contiguous HTTP-format header lines. Entries for
122 different variants are separated by blank lines. Blank lines are
123 illegal within an entry. It is conventional to begin a map file with
124 an entry for the combined entity as a whole (although this
125 is not required, and if present will be ignored). An example
132 Content-type: text/html
136 Content-type: text/html;charset=iso-8859-2
137 Content-language: fr, de
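<P>
The entry format described above (header lines grouped into entries, with blank lines between entries) can be sketched in Python as follows. This is only an illustration, not the parser used by mod_negotiation: the function name <CODE>parse_type_map</CODE> and the file names in the sample are hypothetical, and continuation lines are not handled.

```python
def parse_type_map(text):
    """Split type-map text into one {header: value} dict per entry."""
    entries, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current entry.
            if current:
                entries.append(current)
                current = {}
            continue
        name, _, value = line.partition(":")
        current[name.strip().lower()] = value.strip()
    if current:
        entries.append(current)
    return entries


# Hypothetical map with two variants; the URI values are made up.
SAMPLE = """\
URI: foo.en.html
Content-type: text/html
Content-language: en

URI: foo.fr.de.html
Content-type: text/html;charset=iso-8859-2
Content-language: fr, de
"""
variants = parse_type_map(SAMPLE)
```

Header names are lower-cased so that <CODE>Content-type</CODE> and <CODE>Content-Type</CODE> are treated alike.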
140 If the variants have different source qualities, that may be indicated
141 by the "qs" parameter to the media type, as in this picture (available
142 as jpeg, gif, or ASCII-art):
148 Content-type: image/jpeg; qs=0.8
151 Content-type: image/gif; qs=0.5
154 Content-type: text/plain; qs=0.01
158 qs values can vary in the range 0.000 to 1.000. Note that any variant with
159 a qs value of 0.000 will never be chosen. Variants with no 'qs'
160 parameter value are given a qs factor of 1.0. The qs parameter indicates
161 the relative 'quality' of this variant compared to the other available
162 variants, independent of the client's capabilities. For example, a JPEG
163 file is usually of higher source quality than an ASCII file if it is
164 attempting to represent a photograph. However, if the resource being
165 represented is original ASCII art, then an ASCII representation would
166 have a higher source quality than a JPEG representation. A qs value
167 is therefore specific to a given variant depending on the nature of
168 the resource it represents.
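<P>
To see how qs interacts with the client's own preferences, the sketch below multiplies each variant's qs value from the picture example by a q value taken from the Accept header and picks the highest product. The client preferences and the function name <CODE>best_media_type</CODE> are assumptions for illustration only.

```python
# Source qualities from the picture example above.
QS = {"image/jpeg": 0.8, "image/gif": 0.5, "text/plain": 0.01}


def best_media_type(accept_q, qs=QS):
    """Return the media type with the highest q * qs, or None if all are 0."""
    scores = {mtype: accept_q.get(mtype, 0.0) * factor
              for mtype, factor in qs.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# A hypothetical client that prefers GIF but also takes JPEG and plain text.
choice = best_media_type({"image/gif": 1.0, "image/jpeg": 0.5, "text/plain": 0.5})
```

With these assumed preferences the GIF wins (1.0 * 0.5 against 0.5 * 0.8 for the JPEG), while a client that accepts only <CODE>text/plain</CODE> would still receive the ASCII-art variant.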
171 The full list of headers recognized is:
174 <DT> <CODE>URI:</CODE>
175 <DD> uri of the file containing the variant (of the given media
176 type, encoded with the given content encoding). These are
177 interpreted as URLs relative to the map file; they must be on
178 the same server (!), and they must refer to files to which the
179 client would be granted access if they were to be requested
181 <DT> <CODE>Content-Type:</CODE>
182 <DD> media type --- charset, level and "qs" parameters may be given. These
183 are often referred to as MIME types; typical media types are
184 <CODE>image/gif</CODE>, <CODE>text/plain</CODE>, or
185 <CODE>text/html; level=3</CODE>.
186 <DT> <CODE>Content-Language:</CODE>
187 <DD> The languages of the variant, specified as an Internet standard
188 language tag from RFC 1766 (<EM>e.g.</EM>, <CODE>en</CODE> for English,
189 <CODE>ko</CODE> for Korean, <EM>etc.</EM>).
190 <DT> <CODE>Content-Encoding:</CODE>
191 <DD> If the file is compressed, or otherwise encoded, rather than
192 containing the actual raw data, this says how that was done.
193 Apache only recognizes encodings that are defined by an
194 <A HREF="mod/mod_mime.html#addencoding">AddEncoding</A> directive.
195 This normally includes the encodings <CODE>x-compress</CODE>
196 for compress'd files, and <CODE>x-gzip</CODE> for gzip'd files.
197 The <CODE>x-</CODE> prefix is ignored for encoding comparisons.
198 <DT> <CODE>Content-Length:</CODE>
199 <DD> The size of the file. Specifying content
200 lengths in the type-map allows the server to compare file sizes
201 without checking the actual files.
202 <DT> <CODE>Description:</CODE>
203 <DD> A human-readable textual description of the variant. If Apache cannot
204 find any appropriate variant to return, it will return an error
205 response which lists all available variants instead. Such a variant
206 list will include the human-readable variant descriptions.
212 <CODE>MultiViews</CODE> is a per-directory option, meaning it can be set with
213 an <CODE>Options</CODE> directive within a <CODE>&lt;Directory&gt;</CODE>,
214 <CODE>&lt;Location&gt;</CODE> or <CODE>&lt;Files&gt;</CODE>
215 section in <CODE>access.conf</CODE>, or (if <CODE>AllowOverride</CODE>
216 is properly set) in <CODE>.htaccess</CODE> files. Note that
217 <CODE>Options All</CODE> does not set <CODE>MultiViews</CODE>; you
218 have to ask for it by name.
221 The effect of <CODE>MultiViews</CODE> is as follows: if the server
222 receives a request for <CODE>/some/dir/foo</CODE>, if
223 <CODE>/some/dir</CODE> has <CODE>MultiViews</CODE> enabled, and
224 <CODE>/some/dir/foo</CODE> does <EM>not</EM> exist, then the server reads the
225 directory looking for files named foo.*, and effectively fakes up a
226 type map which names all those files, assigning them the same media
227 types and content-encodings it would have if the client had asked for
228 one of them by name. It then chooses the best match to the client's
232 <CODE>MultiViews</CODE> may also apply to searches for the file named by the
233 <CODE>DirectoryIndex</CODE> directive, if the server is trying to
234 index a directory. If the configuration files specify
240 then the server will arbitrate between <CODE>index.html</CODE>
241 and <CODE>index.html3</CODE> if both are present. If neither is
242 present, and <CODE>index.cgi</CODE> is there, the server will run it.
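<P>
The filename search that <CODE>MultiViews</CODE> performs can be pictured with the following Python sketch. It works on a plain list of file names rather than a real directory, and the function name <CODE>multiviews_candidates</CODE> is hypothetical; it only shows which names would become candidates, not how Apache then chooses among them.

```python
def multiviews_candidates(filenames, name):
    """Return the files a MultiViews search for `name` would consider."""
    if name in filenames:
        return [name]  # the requested file exists, so no search is needed
    prefix = name + "."
    return sorted(f for f in filenames if f.startswith(prefix))


listing = ["index.html", "index.html3", "index.cgi", "indexes.txt"]
candidates = multiviews_candidates(listing, "index")
```

Only names of the form <CODE>index.*</CODE> are collected, so <CODE>indexes.txt</CODE> is left out.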
245 If one of the files found when reading the directory is a CGI script,
246 it's not obvious what should happen. The code gives that case
247 special treatment --- if the request was a POST, or a GET with
248 QUERY_ARGS or PATH_INFO, the script is given an extremely high quality
249 rating, and generally invoked; otherwise it is given an extremely low
250 quality rating, which generally causes one of the other views (if any)
253 <H2>The Negotiation Methods</H2>
255 After Apache has obtained a list of the variants for a given resource,
256 either from a type-map file or from the filenames in the directory, it
257 invokes one of two methods to decide on the 'best' variant to
258 return, if any. It is not necessary to know any of the details of how
259 negotiation actually takes place in order to use Apache's content
260 negotiation features. However the rest of this document explains the
261 methods used for those interested.
264 There are two negotiation methods:
268 <LI><STRONG>Server driven negotiation with the Apache
269 algorithm</STRONG> is used in the normal case. The Apache algorithm is
270 explained in more detail below. When this algorithm is used, Apache
271 can sometimes 'fiddle' the quality factor of a particular dimension to
272 achieve a better result. The ways Apache can fiddle quality factors are
273 explained in more detail below.
275 <LI><STRONG>Transparent content negotiation</STRONG> is used when the
276 browser specifically requests this through the mechanism defined in RFC
277 2295. This negotiation method gives the browser full control over
278 deciding on the 'best' variant; the result is therefore dependent on
279 the specific algorithms used by the browser. As part of the
280 transparent negotiation process, the browser can ask Apache to run the
281 'remote variant selection algorithm' defined in RFC 2296. </UL>
284 <H3>Dimensions of Negotiation</H3>
292 <TD>Browser indicates preferences with the Accept header field. Each item
293 can have an associated quality factor. Variant description can also
294 have a quality factor (the "qs" parameter).
297 <TD>Browser indicates preferences with the Accept-Language header field.
298 Each item can have a quality factor. Variants can be associated with none, one
299 or more than one language.
302 <TD>Browser indicates preference with the Accept-Encoding header field.
303 Each item can have a quality factor.
306 <TD>Browser indicates preference with the Accept-Charset header field.
307 Each item can have a quality factor.
308 Variants can indicate a charset as a parameter of the media type.
311 <H3>Apache Negotiation Algorithm</H3>
314 Apache can use the following algorithm to select the 'best' variant
315 (if any) to return to the browser. This algorithm is not
316 further configurable. It operates as follows:
319 <LI>First, for each dimension of the negotiation, check the appropriate
320 <EM>Accept*</EM> header field and assign a quality to each
321 variant. If the <EM>Accept*</EM> header for any dimension implies that this
322 variant is not acceptable, eliminate it. If no variants remain, go
325 <LI>Select the 'best' variant by a process of elimination. Each of the
326 following tests is applied in order. Any variants not selected at each
327 test are eliminated. After each test, if only one variant remains,
328 select it as the best match and proceed to step 3. If more than one
329 variant remains, move on to the next test.
332 <LI>Multiply the quality factor from the Accept header with the
333 quality-of-source factor for this variant's media type, and select
334 the variants with the highest value.
336 <LI>Select the variants with the highest language quality factor.
338 <LI>Select the variants with the best language match, using either the
339 order of languages in the Accept-Language header (if present), or else
340 the order of languages in the <CODE>LanguagePriority</CODE>
341 directive (if present).
343 <LI>Select the variants with the highest 'level' media parameter
344 (used to give the version of text/html media types).
346 <LI>Select variants with the best charset media parameters,
347 as given on the Accept-Charset header line. Charset ISO-8859-1
348 is acceptable unless explicitly excluded. Variants with a
349 <CODE>text/*</CODE> media type but not explicitly associated
350 with a particular charset are assumed to be in ISO-8859-1.
352 <LI>Select those variants which have associated
353 charset media parameters that are <EM>not</EM> ISO-8859-1.
354 If there are no such variants, select all variants instead.
356 <LI>Select the variants with the best encoding. If there are
357 variants with an encoding that is acceptable to the user-agent,
358 select only these variants. Otherwise if there is a mix of encoded
359 and non-encoded variants, select only the unencoded variants.
360 If either all variants are encoded or all variants are not encoded,
363 <LI>Select the variants with the smallest content length.
365 <LI>Select the first variant of those remaining. This will be either the
366 first listed in the type-map file, or when variants are read from
367 the directory, the one whose file name comes first when sorted using
372 <LI>The algorithm has now selected one 'best' variant, so return
373 it as the response. The HTTP response header Vary is set to indicate the
374 dimensions of negotiation (browsers and caches can use this
375 information when caching the resource). End.
377 <LI>To get here means no variant was selected (because none are acceptable
378 to the browser). Return a 406 status (meaning "No acceptable representation")
379 with a response body consisting of an HTML document listing the
380 available variants. Also set the HTTP Vary header to indicate the
381 dimensions of variance.
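<P>
The process of elimination in step 2 can be sketched in Python as below. This is a simplified illustration under stated assumptions, not Apache's implementation: each variant is a dict whose fields (<CODE>q</CODE>, <CODE>qs</CODE>, <CODE>lang_q</CODE>, <CODE>length</CODE>) are assumed to have been computed already, and only three of the tests (media quality, language quality, content length) are modelled before the final "first remaining" rule.

```python
def select_best(variants, tests):
    """Apply each test in order, keeping only the top-scoring variants."""
    remaining = list(variants)
    for score in tests:
        if len(remaining) <= 1:
            break
        top = max(score(v) for v in remaining)
        remaining = [v for v in remaining if score(v) == top]
    # Final tie-break: the first variant still in the list.
    return remaining[0] if remaining else None


TESTS = [
    lambda v: v["q"] * v["qs"],  # Accept quality times quality-of-source
    lambda v: v["lang_q"],       # language quality factor
    lambda v: -v["length"],      # smaller content length scores higher
]


def negotiate(variants):
    """Step 1: drop unacceptable variants; step 2: run the elimination."""
    acceptable = [v for v in variants if v["q"] > 0 and v["lang_q"] > 0]
    return select_best(acceptable, TESTS)  # None corresponds to a 406 reply


# Hypothetical variants with precomputed quality values.
variants = [
    {"uri": "foo.en.html", "q": 1.0, "qs": 1.0, "lang_q": 0.5, "length": 1200},
    {"uri": "foo.fr.html", "q": 1.0, "qs": 1.0, "lang_q": 1.0, "length": 1500},
    {"uri": "foo.fr.txt", "q": 0.8, "qs": 1.0, "lang_q": 1.0, "length": 900},
]
chosen = negotiate(variants)
```

In this example the plain-text variant drops out at the media type test and the English variant at the language test, leaving <CODE>foo.fr.html</CODE>.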
385 <H2><A NAME="better">Fiddling with Quality Values</A></H2>
388 Apache sometimes changes the quality values from what would be
389 expected by a strict interpretation of the Apache negotiation
390 algorithm above. This is to get a better result from the algorithm for
391 browsers which do not send full or accurate information. Some of the
392 most popular browsers send Accept header information which would
393 otherwise result in the selection of the wrong variant in many
394 cases. If a browser sends full and correct information these fiddles
398 <H3>Media Types and Wildcards</H3>
401 The Accept: request header indicates preferences for media types. It
402 can also include 'wildcard' media types, such as "image/*" or "*/*"
403 where the * matches any string. So a request including:
408 would indicate that any type starting "image/" is acceptable,
409 as is any other type (so the first "image/*" is redundant). Some
410 browsers routinely send wildcards in addition to explicit types they
411 can handle. For example:
413 Accept: text/html, text/plain, image/gif, image/jpeg, */*
416 The intention of this is to indicate that the explicitly
417 listed types are preferred, but if a different representation is
418 available, that is ok too. However under the basic algorithm, as given
419 above, the */* wildcard has exactly equal preference to all the other
420 types, so they are not being preferred. The browser should really have
421 sent a request with a lower quality (preference) value for */*, such
424 Accept: text/html, text/plain, image/gif, image/jpeg, */*; q=0.01
427 The explicit types have no quality factor, so they default to a
428 preference of 1.0 (the highest). The wildcard */* is given
429 a low preference of 0.01, so other types will only be returned if
430 no variant matches an explicitly listed type.
433 If the Accept: header contains <EM>no</EM> q factors at all, Apache sets
434 the q value of "*/*", if present, to 0.01 to emulate the desired
435 behavior. It also sets the q value of wildcards of the format
436 "type/*" to 0.02 (so these are preferred over matches against
437 "*/*"). If any media type on the Accept: header contains a q factor,
438 these special values are <EM>not</EM> applied, so requests from browsers
439 which send the correct information to start with work as expected.
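<P>
The wildcard adjustment described above can be written as a short Python sketch. The function name <CODE>fiddle_wildcards</CODE> and the input shape (a list of media types paired with a q value, or <CODE>None</CODE> when no q was sent) are assumptions made for this example.

```python
def fiddle_wildcards(items):
    """items: list of (media_type, q or None) parsed from an Accept header."""
    if any(q is not None for _, q in items):
        # The client sent explicit q factors: only fill in the 1.0 default.
        return [(m, 1.0 if q is None else q) for m, q in items]
    adjusted = []
    for mtype, _ in items:
        if mtype == "*/*":
            adjusted.append((mtype, 0.01))
        elif mtype.endswith("/*"):
            adjusted.append((mtype, 0.02))
        else:
            adjusted.append((mtype, 1.0))
    return adjusted


no_q = fiddle_wildcards([("text/html", None), ("image/*", None), ("*/*", None)])
with_q = fiddle_wildcards([("text/html", None), ("*/*", 0.5)])
```

In the first call the wildcards are demoted to 0.02 and 0.01; in the second call the client's own value of 0.5 for <CODE>*/*</CODE> is kept unchanged.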
441 <H3>Variants with no Language</H3>
444 If some of the variants for a particular resource have a language
445 attribute, and some do not, those variants with no language
446 are given a very low language quality factor of 0.001.<P>
448 The reason for setting this language quality factor for
449 variants with no language to a very low value is to allow
450 for a default variant which can be supplied if none of the
451 other variants match the browser's language preferences.
453 For example, consider the situation with three variants:
456 <LI>foo.en.html, language en
457 <LI>foo.fr.html, language fr
458 <LI>foo.html, no language
462 The meaning of a variant with no language is that it is
463 always acceptable to the browser. If the request Accept-Language
464 header includes either en or fr (or both) one of foo.en.html
465 or foo.fr.html will be returned. If the browser does not list
466 either en or fr as acceptable, foo.html will be returned instead.
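<P>
A small Python sketch of this fallback behaviour follows. It assumes <CODE>foo.fr.html</CODE> is the French variant and looks only at the language dimension; the function name <CODE>pick_by_language</CODE> is hypothetical.

```python
NO_LANGUAGE_Q = 0.001  # quality given to variants that carry no language


def pick_by_language(variants, accepted):
    """variants: {filename: language or None}; accepted: {language: q}."""
    def quality(lang):
        if lang is None:
            return NO_LANGUAGE_Q
        return accepted.get(lang, 0.0)

    scored = {name: quality(lang) for name, lang in variants.items()}
    best = max(scored, key=scored.get)
    return best if scored[best] > 0 else None


VARIANTS = {"foo.en.html": "en", "foo.fr.html": "fr", "foo.html": None}
french_first = pick_by_language(VARIANTS, {"fr": 1.0, "en": 0.5})
german_only = pick_by_language(VARIANTS, {"de": 1.0})
```

A browser that prefers French gets <CODE>foo.fr.html</CODE>, while one that lists only German falls back to <CODE>foo.html</CODE>, because 0.001 still beats a quality of zero.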
468 <H2>Extensions to Transparent Content Negotiation</H2>
470 Apache extends the transparent content negotiation protocol (RFC 2295)
471 as follows. A new <CODE> {encoding ..}</CODE> element is used in
472 variant lists to label variants which are available with a specific
473 content-encoding only. The implementation of the
474 RVSA/1.0 algorithm (RFC 2296) is extended to recognize encoded
475 variants in the list, and to use them as candidate variants whenever
476 their encodings are acceptable according to the Accept-Encoding
477 request header. The RVSA/1.0 implementation does not round computed
478 quality factors to 5 decimal places before choosing the best variant.
480 <H2>Note on hyperlinks and naming conventions</H2>
483 If you are using language negotiation you can choose between
484 different naming conventions, because files can have more than one
485 extension, and the order of the extensions is normally irrelevant
486 (see <A HREF="mod/mod_mime.html">mod_mime</A> documentation for details).
488 A typical file has a MIME-type extension (<EM>e.g.</EM>, <SAMP>html</SAMP>),
489 maybe an encoding extension (<EM>e.g.</EM>, <SAMP>gz</SAMP>), and of course a
490 language extension (<EM>e.g.</EM>, <SAMP>en</SAMP>) when we have different
491 language variants of this file.
502 Here are some more examples of filenames together with valid and invalid
506 <TABLE BORDER=1 CELLPADDING=8 CELLSPACING=0>
509 <TH>Valid hyperlink</TH>
510 <TH>Invalid hyperlink</TH>
513 <TD><EM>foo.html.en</EM></TD>
519 <TD><EM>foo.en.html</EM></TD>
524 <TD><EM>foo.html.en.gz</EM></TD>
531 <TD><EM>foo.en.html.gz</EM></TD>
538 <TD><EM>foo.gz.html.en</EM></TD>
545 <TD><EM>foo.html.gz.en</EM></TD>
554 Looking at the table above you will notice that it is always possible to
555 use the name without any extensions in a hyperlink (<EM>e.g.</EM>, <SAMP>foo</SAMP>).
556 The advantage is that you can hide the actual type of a
557 document or file and can change it later, <EM>e.g.</EM>, from <SAMP>html</SAMP>
558 to <SAMP>shtml</SAMP> or <SAMP>cgi</SAMP> without changing any
559 hyperlink references.
562 If you want to continue to use a MIME-type in your hyperlinks (<EM>e.g.</EM>,
563 <SAMP>foo.html</SAMP>) the language extension (including an encoding extension
564 if there is one) must be on the right hand side of the MIME-type extension
565 (<EM>e.g.</EM>, <SAMP>foo.html.en</SAMP>).
568 <H2>Note on Caching</H2>
571 When a cache stores a representation, it associates it with the request URL.
572 The next time that URL is requested, the cache can use the stored
573 representation. But, if the resource is negotiable at the server,
574 this might result in only the first requested variant being cached and
575 subsequent cache hits might return the wrong response. To prevent this,
576 Apache normally marks all responses that are returned after content negotiation
577 as non-cacheable by HTTP/1.0 clients. Apache also supports the HTTP/1.1
578 protocol features to allow caching of negotiated responses. <P>
580 For requests which come from an HTTP/1.0 compliant client (either a
581 browser or a cache), the directive <TT>CacheNegotiatedDocs</TT> can be
582 used to allow caching of responses which were subject to negotiation.
583 This directive can be given in the server config or virtual host, and
584 takes no arguments. It has no effect on requests from HTTP/1.1 clients.
586 <!--#include virtual="footer.html" -->