Introduce StrictURI|UnsafeURI for RFC3986 enforcement

author William A. Rowe Jr <wrowe@apache.org>

Fri, 19 Aug 2016 19:48:58 +0000 (19:48 +0000)

committer William A. Rowe Jr <wrowe@apache.org>

Fri, 19 Aug 2016 19:48:58 +0000 (19:48 +0000)
author William A. Rowe Jr <wrowe@apache.org>
Fri, 19 Aug 2016 19:48:58 +0000 (19:48 +0000)
committer William A. Rowe Jr <wrowe@apache.org>
Fri, 19 Aug 2016 19:48:58 +0000 (19:48 +0000)
diff --git a/docs/log-message-tags/next-number b/docs/log-message-tags/next-number

index 94b82247fd1ae8fa9bdf945474ee4ef7705a28c9..bbd257fd6f0dc5ced0f2f680291a549049fc260c 100644 (file)
--- a/docs/log-message-tags/next-number
+++ b/docs/log-message-tags/next-number
@@ -1 +1 @@
-3454
+3455
diff --git a/docs/manual/mod/core.xml b/docs/manual/mod/core.xml

index 67c86f14998abb8d1e7345c1ee6538f8613a574e..5be625a447bd8b55777f659237e42df288041b21 100644 (file)
--- a/docs/manual/mod/core.xml
+++ b/docs/manual/mod/core.xml
@@ -1252,10 +1252,11 @@ EnableSendfile On
  <directivesynopsis>
  <name>HttpProtocolOptions</name>
  <description>Modify restrictions on HTTP Request Messages</description>
-<syntax>HttpProtocolOptions [Strict|Unsafe] [Allow0.9|Require1.0] 
-[StrictWhitespace|LenientWhitespace] [RegisteredMethods|LenientMethods]</syntax>
-<default>HttpProtocolOptions Strict Allow0.9 LenientWhitespace 
-LenientMethods</default>
+<syntax>HttpProtocolOptions [Strict|Unsafe] [StrictURL|UnsafeURL]
+ [StrictWhitespace|LenientWhitespace] [RegisteredMethods|LenientMethods]
+ [Allow0.9|Require1.0]</syntax>
+<default>HttpProtocolOptions Strict StrictURL LenientWhitespace 
+LenientMethods Allow0.9</default>
  <contextlist><context>server config</context>
  <context>virtual host</context></contextlist>
  <compatibility>2.2.32 or 2.4.24 and later</compatibility>
@@ -1267,11 +1268,11 @@ LenientMethods</default>
      (<a href="https://tools.ietf.org/html/rfc7230#section-3.2"
        >RFC 7230 &sect;3.2</a>), which are now applied by default or using
      the <code>Strict</code> option. Due to legacy modules, applications or
-    custom user-agents which must be deperecated, an <code>Unsafe</code>
-    option has been added to revert to the legacy behavior. These rules are
-    applied prior to request processing, so must be configured at the global
-    or default (first) matching virtual host section, by interface and not
-    by name, to be honored.</p>
+    custom user-agents which must be deperecated, <code>Unsafe</code>
+    and <code>UnsafeURL</code> options have been added to revert to the legacy
+    behaviors. These rules are applied prior to request processing, so must be
+    configured at the global or default (first) matching virtual host section,
+    by IP/port interface and not by name, to be honored.</p>
  
      <p>Prior to the introduction of this directive, the Apache HTTP Server
      request message parsers were tolerant of a number of forms of input
@@ -1284,25 +1285,27 @@ LenientMethods</default>
      of this directive, all grammer rules of the specification are enforced in
      the default <code>Strict</code> operating mode.</p>
  
+    <p><a href="https://tools.ietf.org/html/rfc3986#section-2.2"
+         >RFC 7230 &sect;2.2 and 2.3</a> define "Reserved Characters" and
+    "Unreserved Characters". All other character octets are required to
+    be %XX encoded under this spec, and RFC7230 defers to these requirements.
+    By default the <code>StrictURI</code> option will reject all requests 
+    containing invalid characters. This rule can be relaxed with the
+    <code>UnsafeURI</code> option to support badly written user-agents.</p>
+    
      <p>Users are strongly cautioned against toggling the <code>Unsafe</code>
-    mode of operation for these reasons, most especially on outward-facing,
-    publicly accessible server deployments. Reviewing the messages within the
-    <directive>ErrorLog</directive>, configured with
-    <directive>LogLevel</directive> <code>info</code> level or below,
+    and <code>UnsafeURI</code> modes of operation, most especially on 
+    outward-facing, publicly accessible server deployments. If an interface
+    is required of faulty monitoring or other custom software running only
+    on an intranet, users should consider toggling these only for a specific
+    virtual host configured on their private subnet.</p>
+
+    <p>Reviewing the messages logged to the <directive>ErrorLog</directive>,
+    configured with <directive>LogLevel</directive> <code>info</code> level,
      can help identify such faulty requests along with their origin.
      Users should pay particular attention to any 400 responses in the access
      log for indiciations that valid requests are unexpectedly rejected.</p>
  
-    <p><a href="https://tools.ietf.org/html/rfc2616#section-19.6"
-         >RFC 2616 &sect;19.6</a> "Compatibility With Previous Versions" had
-    encouraged HTTP servers to support legacy HTTP/0.9 requests. RFC 7230
-    superceeds this with "The expectation to support HTTP/0.9 requests has
-    been removed" and offers additional comments in 
-    <a href="https://tools.ietf.org/html/rfc7230#appendix-A"
-      >RFC 2616 Appendix A</a>. The <code>Require1.0</code> option allows
-    the user to remove support of the default <code>Allow0.9</code> option's
-    behavior.</p>
-
      <p><a href="https://tools.ietf.org/html/rfc7230#section-3.5"
           >RFC 7230 &sect;3.5</a> "Message Parsing Robustness" permits, and
      identifies potential risks of parsing messages containing non-space
@@ -1321,12 +1324,22 @@ LenientMethods</default>
      origin servers shall respond with an error when an unsupported method
      is encountered in the request line. This already happens when the
      <code>LenientMethods</code> option is used, but administrators may wish
-    to toggle the <code>RegisteredMethods</code> option and register all
-    permitted method tokens using the <directive>RegisterHttpMethod</directive>
+    to toggle the <code>RegisteredMethods</code> option and register any
+    non-standard methods using the <directive>RegisterHttpMethod</directive>
      directive, particularly if the <code>Unsafe</code> option has been toggled.
      The <code>RegisteredMethods</code> option should <strong>not</strong>
      be toggled for forward proxy hosts, as the methods supported by the
      origin servers are unknown to the proxy server.</p>
+
+    <p><a href="https://tools.ietf.org/html/rfc2616#section-19.6"
+         >RFC 2616 &sect;19.6</a> "Compatibility With Previous Versions" had
+    encouraged HTTP servers to support legacy HTTP/0.9 requests. RFC 7230
+    superceeds this with "The expectation to support HTTP/0.9 requests has
+    been removed" and offers additional comments in 
+    <a href="https://tools.ietf.org/html/rfc7230#appendix-A"
+      >RFC 2616 Appendix A</a>. The <code>Require1.0</code> option allows
+    the user to remove support of the default <code>Allow0.9</code> option's
+    behavior.</p>
  </usage>
  </directivesynopsis>
  
diff --git a/include/http_core.h b/include/http_core.h

index 1f3207e820ab4db508df2cb43806e1880c370b3c..7bbdde51dbaa65b052a080005fd0125be1246c8c 100644 (file)
--- a/include/http_core.h
+++ b/include/http_core.h
@@ -749,6 +749,11 @@ typedef struct {
  #define AP_HTTP_METHODS_REGISTERED    2
      char http_methods;
  
+#define AP_HTTP_URI_UNSET             0
+#define AP_HTTP_URI_UNSAFE            1
+#define AP_HTTP_URI_STRICT            2
+    char http_stricturi;
+
  #define AP_HTTP_CL_HEAD_ZERO_UNSET    0
  #define AP_HTTP_CL_HEAD_ZERO_ENABLE   1
  #define AP_HTTP_CL_HEAD_ZERO_DISABLE  2
diff --git a/include/httpd.h b/include/httpd.h

index b4f5ea7626485cf91bc3b4dcbbc3e27c1c622c04..4defe1b51328a6a4a1aad9baeacfa8556b166d85 100644 (file)
--- a/include/httpd.h
+++ b/include/httpd.h
@@ -1647,6 +1647,13 @@ AP_DECLARE(const char *) ap_scan_http_field_content(const char *ptr);
   */
  AP_DECLARE(const char *) ap_scan_http_token(const char *ptr);
  
+/* Scan a string for valid URI characters per RFC3986, and 
+ * return a pointer to the first non-URI character encountered.
+ * @param ptr The string to scan
+ * @return A pointer to the first non-token character.
+ */
+AP_DECLARE(const char *) ap_scan_http_uri_safe(const char *ptr);
+
  /* Retrieve a token, advancing the pointer to the first non-token character
   * and returning a copy of the token string.
   * @param ptr The string to scan. On return, this points to the first non-token
diff --git a/server/core.c b/server/core.c

index 943474ff20663ceffa0bc5468b36a1da922f1cf9..0af04c461b9d905a223ab32e3707969ca403c138 100644 (file)
--- a/server/core.c
+++ b/server/core.c
@@ -533,6 +533,9 @@ static void *merge_core_server_configs(apr_pool_t *p, void *basev, void *virtv)
      if (virt->http_conformance != AP_HTTP_CONFORMANCE_UNSET)
          conf->http_conformance = virt->http_conformance;
  
+    if (virt->http_stricturi != AP_HTTP_URI_UNSET)
+        conf->http_stricturi = virt->http_stricturi;
+
      if (virt->http_whitespace != AP_HTTP_WHITESPACE_UNSET)
          conf->http_whitespace = virt->http_whitespace;
  
@@ -4031,6 +4034,10 @@ static const char *set_http_protocol_options(cmd_parms *cmd, void *dummy,
          conf->http_conformance |= AP_HTTP_CONFORMANCE_STRICT;
      else if (strcasecmp(arg, "unsafe") == 0)
          conf->http_conformance |= AP_HTTP_CONFORMANCE_UNSAFE;
+    else if (strcasecmp(arg, "stricturi") == 0)
+        conf->http_stricturi |= AP_HTTP_URI_STRICT;
+    else if (strcasecmp(arg, "unsafeuri") == 0)
+        conf->http_stricturi |= AP_HTTP_URI_UNSAFE;
      else if (strcasecmp(arg, "strictwhitespace") == 0)
          conf->http_whitespace |= AP_HTTP_WHITESPACE_STRICT;
      else if (strcasecmp(arg, "lenientwhitespace") == 0)
@@ -4040,8 +4047,10 @@ static const char *set_http_protocol_options(cmd_parms *cmd, void *dummy,
      else if (strcasecmp(arg, "lenientmethods") == 0)
          conf->http_methods |= AP_HTTP_METHODS_LENIENT;
      else
-        return "HttpProtocolOptions accepts 'Allow0.9' (default) or "
-               "'Require1.0', 'Unsafe' or 'Strict' (default), "
+        return "HttpProtocolOptions accepts " 
+               "'Unsafe' or 'Strict' (default), "
+               "'UnsafeURI' or 'StrictURI' (default), "
+               "'Require1.0' or 'Allow0.9' (default), "
                 "'StrictWhitespace' or 'LenientWhitespace (default), and "
                 "'RegisteredMethods' or 'LenientMethods (default)";
  
@@ -4050,6 +4059,11 @@ static const char *set_http_protocol_options(cmd_parms *cmd, void *dummy,
          return "HttpProtocolOptions 'Allow0.9' and 'Require1.0'"
                 " are mutually exclusive";
  
+    if ((conf->http_stricturi & AP_HTTP_URI_STRICT)
+            && (conf->http_stricturi & AP_HTTP_URI_UNSAFE))
+        return "HttpProtocolOptions 'StrictURI' and 'UnsafeURI'"
+               " are mutually exclusive";
+
      if ((conf->http_conformance & AP_HTTP_CONFORMANCE_STRICT)
              && (conf->http_conformance & AP_HTTP_CONFORMANCE_UNSAFE))
          return "HttpProtocolOptions 'Strict' and 'Unsafe'"
diff --git a/server/gen_test_char.c b/server/gen_test_char.c

index 046f47b51bbd7ac41c986381fc549eafa1fae6dc..ed9620fe40d6a3d7a11f091dff3066286ee46cdf 100644 (file)
--- a/server/gen_test_char.c
+++ b/server/gen_test_char.c
@@ -53,11 +53,12 @@
  #define T_ESCAPE_FORENSIC     (0x20)
  #define T_ESCAPE_URLENCODED   (0x40)
  #define T_HTTP_CTRLS          (0x80)
+#define T_URI_RFC3986        (0x100)
  
  int main(int argc, char *argv[])
  {
      unsigned c;
-    unsigned char flags;
+    unsigned short flags;
  
      printf("/* this file is automatically generated by gen_test_char, "
             "do not edit */\n"
@@ -69,8 +70,9 @@ int main(int argc, char *argv[])
             "#define T_ESCAPE_FORENSIC      (%u)\n"
             "#define T_ESCAPE_URLENCODED    (%u)\n"
             "#define T_HTTP_CTRLS           (%u)\n"
+           "#define T_URI_RFC3986          (%u)\n"
             "\n"
-           "static const unsigned char test_char_table[256] = {",
+           "static const unsigned short test_char_table[256] = {",
             T_ESCAPE_SHELL_CMD,
             T_ESCAPE_PATH_SEGMENT,
             T_OS_ESCAPE_PATH,
@@ -78,7 +80,8 @@ int main(int argc, char *argv[])
             T_ESCAPE_LOGITEM,
             T_ESCAPE_FORENSIC,
             T_ESCAPE_URLENCODED,
-           T_HTTP_CTRLS);
+           T_HTTP_CTRLS,
+           T_URI_RFC3986);
  
      for (c = 0; c < 256; ++c) {
          flags = 0;
@@ -122,7 +125,7 @@ int main(int argc, char *argv[])
           * and "tspecials" (RFC2068) a.k.a. "separators" (RFC2616), which
           * is easer to express as characters remaining in the ASCII token set
           */
-        if (!(apr_isalnum(c) || strchr("!#$%&'*+-.^_`|~", c))) {
+        if (!c || !(apr_isalnum(c) || strchr("!#$%&'*+-.^_`|~", c))) {
              flags |= T_HTTP_TOKEN_STOP;
          }
  
@@ -136,6 +139,16 @@ int main(int argc, char *argv[])
              flags |= T_HTTP_CTRLS;
          }
  
+        /* From RFC3986, the specific sets of gen-delims, sub-delims (2.2),
+         * and unreserved (2.3) that are possible somewhere within a URI.
+         * Spec requires all others to be %XX encoded, including obs-text.
+         */
+        if (c && strchr(":/?#[]@"                       /* gen-delims */ 
+                        "!$&'()*+,;="                   /* sub-delims */
+                        "-._~", c) || apr_isalnum(c)) { /* unreserved */
+            flags |= T_URI_RFC3986;
+        }
+
          /* For logging, escape all control characters,
           * double quotes (because they delimit the request in the log file)
           * backslashes (because we use backslash for escaping)
@@ -153,7 +166,7 @@ int main(int argc, char *argv[])
              flags |= T_ESCAPE_FORENSIC;
          }
  
-        printf("0x%02x%c", flags, (c < 255) ? ',' : ' ');
+        printf("0x%03x%c", flags, (c < 255) ? ',' : ' ');
      }
  
      printf("\n};\n");
diff --git a/server/protocol.c b/server/protocol.c

index 094dc88a40c4c815e8a9f448d3d07721b74dddee..e642db118d4c0aa953f7aac9d3d073cbf1e3780b 100644 (file)
--- a/server/protocol.c
+++ b/server/protocol.c
@@ -573,7 +573,7 @@ static int read_request_line(request_rec *r, apr_bucket_brigade *bb)
  {
      enum {
          rrl_none, rrl_badmethod, rrl_badwhitespace, rrl_excesswhitespace,
-        rrl_missinguri, rrl_badprotocol, rrl_trailingtext
+        rrl_missinguri, rrl_baduri, rrl_badprotocol, rrl_trailingtext
      } deferred_error = rrl_none;
      char *ll;
      char *uri;
@@ -583,6 +583,7 @@ static int read_request_line(request_rec *r, apr_bucket_brigade *bb)
      int strict = (conf->http_conformance != AP_HTTP_CONFORMANCE_UNSAFE);
      const char *badwhitespace = strict ? "\t\n\v\f\r" : "\n\v\f\r";
      int strictspaces = (conf->http_whitespace == AP_HTTP_WHITESPACE_STRICT);
+    int stricturi = (conf->http_stricturi != AP_HTTP_URI_UNSAFE);
  
      /* Read past empty lines until we get a real request line,
       * a read error, the connection closes (EOF), or we timeout.
@@ -674,10 +675,20 @@ static int read_request_line(request_rec *r, apr_bucket_brigade *bb)
      if (!*uri && deferred_error == rrl_none)
          deferred_error = rrl_missinguri;
  
-    if (!(ll = strpbrk(uri, " \t\n\v\f\r"))) {
-        r->protocol = "";
-        len = 0;
-        goto rrl_done;
+    if (stricturi) {
+        ll = (char*) ap_scan_http_uri_safe(uri);
+        if (ll == uri || (*ll && !apr_isspace(*ll))) {
+            deferred_error = rrl_baduri;
+            ll = strpbrk(ll, "\t\n\v\f\r ");
+        }
+    }
+    else {
+        ll = strpbrk(uri, "\t\n\v\f\r ");
+    }
+    if (!ll) {
+            r->protocol = "";
+            len = 0;
+            goto rrl_done;
      }
      for (r->protocol = ll; apr_isspace(*r->protocol); ++r->protocol) 
          if (ap_strchr_c(badwhitespace, *r->protocol) && deferred_error == rrl_none)
@@ -784,6 +795,10 @@ rrl_done:
          else if (deferred_error == rrl_missinguri)
              ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, r, APLOGNO(03446)
                            "HTTP Request Line; Missing URI");
+        else if (deferred_error == rrl_baduri)
+            ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, r, APLOGNO(03454)
+                          "HTTP Request Line; URI incorrectly encoded: '%.*s'",
+                          field_name_len(r->method), r->method);
          else if (deferred_error == rrl_badwhitespace)
              ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, r, APLOGNO(03447)
                            "HTTP Request Line; Invalid whitespace");
@@ -824,7 +839,8 @@ rrl_done:
      }
  
      if (strict) {
-        if (ap_has_cntrl(r->the_request)) {
+        /* No sense re-testing here for what was evaulated above */
+        if (!stricturi && ap_has_cntrl(r->the_request)) {
              ap_log_rerror(APLOG_MARK, APLOG_INFO, 0, r, APLOGNO(02420)
                            "HTTP Request Line; URI must not contain control"
                            " characters");
diff --git a/server/util.c b/server/util.c

index 9fb31e4562b036c2036eae631faaf4f99a2fe4e6..c6533cd9b5c76914187624f739ed10c555c4a767 100644 (file)
--- a/server/util.c
+++ b/server/util.c
@@ -1634,6 +1634,16 @@ AP_DECLARE(char *) ap_get_http_token(apr_pool_t *p, const char **ptr)
      return tok;
  }
  
+/* Scan a string for valid URI characters per RFC3986, and 
+ * return a pointer to the first non-URI character encountered.
+ */
+AP_DECLARE(const char *) ap_scan_http_uri_safe(const char *ptr)
+{
+    for ( ; TEST_CHAR(*ptr, T_URI_RFC3986); ++ptr) ;
+
+    return ptr;
+}
+
  /* Retrieve a token, spacing over it and returning a pointer to
   * the first non-white byte afterwards.  Note that these tokens
   * are delimited by semis and commas; and can also be delimited
author	William A. Rowe Jr <wrowe@apache.org>
	Fri, 19 Aug 2016 19:48:58 +0000 (19:48 +0000)
committer	William A. Rowe Jr <wrowe@apache.org>
	Fri, 19 Aug 2016 19:48:58 +0000 (19:48 +0000)
docs/log-message-tags/next-number		patch \| blob \| history
docs/manual/mod/core.xml		patch \| blob \| history
include/http_core.h		patch \| blob \| history
include/httpd.h		patch \| blob \| history
server/core.c		patch \| blob \| history
server/gen_test_char.c		patch \| blob \| history
server/protocol.c		patch \| blob \| history
server/util.c		patch \| blob \| history