From: Tom Lane Date: Wed, 5 Aug 2015 01:09:12 +0000 (-0400) Subject: Docs: add an explicit example about controlling overall greediness of REs. X-Git-Tag: REL9_6_BETA1~1552 X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=1b5d34ca6244a9296215325a9f82fb805e739f9e;p=postgresql Docs: add an explicit example about controlling overall greediness of REs. Per discussion of bug #13538. --- diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index fd82ea4f4e..59121da536 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -5203,10 +5203,37 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})'); The quantifiers {1,1} and {1,1}? can be used to force greediness or non-greediness, respectively, on a subexpression or a whole RE. + This is useful when you need the whole RE to have a greediness attribute + different from what's deduced from its elements. As an example, + suppose that we are trying to separate a string containing some digits + into the digits and the parts before and after them. We might try to + do that like this: + +SELECT regexp_matches('abc01234xyz', '(.*)(\d+)(.*)'); +Result: {abc0123,4,xyz} + + That didn't work: the first .* is greedy so + it eats as much as it can, leaving the \d+ to + match at the last possible place, the last digit. We might try to fix + that by making it non-greedy: + +SELECT regexp_matches('abc01234xyz', '(.*?)(\d+)(.*)'); +Result: {abc,0,""} + + That didn't work either, because now the RE as a whole is non-greedy + and so it ends the overall match as soon as possible. We can get what + we want by forcing the RE as a whole to be greedy: + +SELECT regexp_matches('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}'); +Result: {abc,01234,xyz} + + Controlling the RE's overall greediness separately from its components' + greediness allows great flexibility in handling variable-length patterns. - Match lengths are measured in characters, not collating elements. + When deciding what is a longer or shorter match, + match lengths are measured in characters, not collating elements. An empty string is considered longer than no match at all. For example: bb*