Fix detection of unfinished Unicode surrogate pair at end of string.
author Tom Lane <tgl@sss.pgh.pa.us>
Wed, 21 Dec 2016 22:39:32 +0000 (17:39 -0500)
committer Tom Lane <tgl@sss.pgh.pa.us>
Wed, 21 Dec 2016 22:39:32 +0000 (17:39 -0500)
The U&'...' and U&"..." syntaxes silently discarded a surrogate pair
start (that is, a code between U+D800 and U+DBFF) if it occurred at
the very end of the string.  This seems like an obvious oversight,
since we throw an error for every other invalid combination of surrogate
characters, including the very same situation in E'...' syntax.
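A minimal sketch of the decoding rule involved, written as standalone C rather than flex scanner code. The function name `decode_escapes`, its array-of-escape-values input, and the -1 error return are hypothetical illustrations, not PostgreSQL's code; the real logic lives in `litbuf_udeescape` and reports errors through `yyerror`. The check after the loop plays the role of the `if (pair_first)` block this commit adds.

```c
#include <stddef.h>
#include <stdint.h>

/* Surrogate ranges: high (pair start) U+D800..U+DBFF, low U+DC00..U+DFFF. */
static int is_high_surrogate(uint32_t c) { return c >= 0xD800 && c <= 0xDBFF; }
static int is_low_surrogate(uint32_t c)  { return c >= 0xDC00 && c <= 0xDFFF; }

/*
 * Hypothetical helper: combine escape values (as parsed from \XXXX escapes)
 * into code points.  "out" must have room for at least n entries.
 * Returns the number of code points written, or -1 on any invalid
 * surrogate combination.
 */
static int decode_escapes(const uint32_t *in, size_t n, uint32_t *out)
{
    uint32_t pair_first = 0;    /* pending high surrogate; 0 means none */
    int nout = 0;

    for (size_t i = 0; i < n; i++)
    {
        uint32_t c = in[i];

        if (pair_first)
        {
            if (!is_low_surrogate(c))
                return -1;      /* high surrogate not followed by a low one */
            out[nout++] = (((pair_first & 0x3FF) << 10) | (c & 0x3FF)) + 0x10000;
            pair_first = 0;
        }
        else if (is_high_surrogate(c))
            pair_first = c;     /* remember it and wait for the second half */
        else if (is_low_surrogate(c))
            return -1;          /* low surrogate with no preceding high one */
        else
            out[nout++] = c;
    }

    /* Unfinished surrogate pair at end of input: the case the fix catches. */
    if (pair_first)
        return -1;

    return nout;
}
```

For example, the escapes `\D83D\DE00` combine into the single code point U+1F600, while input ending in a lone `\D83D` is rejected instead of having its last unit silently dropped.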

This has been wrong since the pair processing was added (in 9.0),
so back-patch to all supported branches.

Discussion: https://postgr.es/m/19113.1482337898@sss.pgh.pa.us

src/backend/parser/scan.l

index 998349d7421a49bd692f7c8a206525055da52eab..acd92698057928aeccfd68b4461c35981b31f0a0 100644
@@ -1435,6 +1435,13 @@ litbuf_udeescape(unsigned char escape, core_yyscan_t yyscanner)
                }
        }
 
+       /* unfinished surrogate pair? */
+       if (pair_first)
+       {
+               ADVANCE_YYLLOC(in - litbuf + 3);                                /* 3 for U&" */
+               yyerror("invalid Unicode surrogate pair");
+       }
+
        *out = '\0';
 
        /*