From: Andy Heninger
Date: Sat, 26 Aug 2017 00:44:28 +0000 (+0000)
Subject: ICU-13318 RBBITest, remove obsolete tests, move remaining test data to rbbitst.txt
X-Git-Tag: release-60-rc~172
X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=bc779765285779f65494d51b488d0747887676e6;p=icu

ICU-13318 RBBITest, remove obsolete tests, move remaining test data to rbbitst.txt

X-SVN-Rev: 40356
---

diff --git a/icu4c/source/test/testdata/rbbitst.txt b/icu4c/source/test/testdata/rbbitst.txt
index f07107bdfb0..0757bdf7dbc 100644
--- a/icu4c/source/test/testdata/rbbitst.txt
+++ b/icu4c/source/test/testdata/rbbitst.txt
@@ -14,6 +14,7 @@
 # any following data is for sentence break testing
 # any following data is for line break testing
 # any following data is for char break testing
+# any following data is for title break testing
 # <rules> rules ... </rules> following data is tested against these rules.
 # Applies until a following occurence of <word>, <sent>, etc. or another <rules>
 # <locale locale_name> Switch to the named locale at the next occurence of <word>, <sent>, etc.
@@ -148,6 +149,9 @@
 # Treat Japanese Half Width voicing marks as combining
 <data>•A\uff9e•B\uff9f\uff9e\uff9f•C•</data>
+# Test data originally from Java BreakIteratorTest.TestCharcterBreak()
+<data>•S\u0300•i\u0317•m•p•l•e\u0301• •s•a\u0302•m•p•l•e\u0303•.•w•a\u0302•w•a•f•q•\n•\r•\r\n•\n•</data>
+
 ########################################################################################
 #
 #
@@ -446,9 +450,12 @@ What is the proper use of the abbreviation pp.? •Yes, I am definatelly 12" tal
 <data>•No breaks when . is followed by a lower, with possible intervening punct .,a .$a .)a. •</data>
 #
-# Sentence Breaks: no break at the boundary between CJK and other letters
+# Sentence Breaks: no break at the boundary between CJK and other letters. TestBug4111338
 #
-<data>•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165:"JAVA\u821c\u8165\u7fc8\u51ce\u306d,\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46".\u2029•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165\u9de8\u97e4JAVA\u821c\u8165\u7fc8\u51ce\u306d\ue30b\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46\u97e5\u7751\u3002•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165\u9de8\u97e4\u6470\u8790JAVA\u821c\u8165\u7fc8\u51ce\u306d\ue30b\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46\u97e5\u7751\u2048•He said, "I can go there."\u2029•Bye, now.•</data>
+<data>•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165:"JAVA\u821c\u8165\u7fc8\u51ce\u306d,\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46".\u2029\
+•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165\u9de8\u97e4JAVA\u821c\u8165\u7fc8\u51ce\u306d\ue30b\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46\u97e5\u7751\u3002\
+•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165\u9de8\u97e4\u6470\u8790JAVA\u821c\u8165\u7fc8\u51ce\u306d\ue30b\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46\u97e5\u7751\u2048\
+•He said, "I can go there."\u2029•Bye, now.•</data>
 #
 # Treat fullwidth variants of .!? the same as their
@@ -499,22 +506,28 @@ What is the proper use of the abbreviation pp.? •Yes, I am definatelly 12" tal
 # test for bug #4152416: Make sure sentences ending with a capital
 # letter are treated correctly
 #
-<data>•The type of all primitive \<code>boolean\</code> values accessed in the target VM. •Calls to xxx will return an implementor of this interface. \u2029•</data>
+<data>•The type of all primitive \<code>boolean\</code> values accessed in the target VM. •\
+Calls to xxx will return an implementor of this interface. \u2029•</data>
 # test for bug #4152117: Make sure sentence breaking is handling
 # punctuation correctly [COULD NOT REPRODUCE THIS BUG, BUT TEST IS
 # HERE TO MAKE SURE IT DOESN'T CROP UP]
 #
-<data>•Constructs a randomly generated BigInteger, uniformly distributed over the range \<tt>0\</tt> to \<tt>(2\<sup>numBits\</sup> - 1\)\</tt>, inclusive. •The uniformity of the distribution assumes that a fair source of random bits is provided in \<tt>rnd\</tt>. •Note that this constructor always constructs a non-negative biginteger. \n•Ahh abc.
-•</data>
+<data>•Constructs a randomly generated BigInteger, uniformly distributed over the range \<tt>0\</tt> to\
+ \<tt>(2\<sup>numBits\</sup> - 1\)\</tt>, inclusive. \
+ •The uniformity of the distribution assumes that a fair source of random bits is provided in \<tt>rnd\</tt>. \
+ •Note that this constructor always constructs a non-negative biginteger. \n•Ahh abc.•</data>
 # sentence breaks for hindi which used Devanagari script
 # make sure there is sentence break after ?,danda(hindi phrase separator),
 # fullstop followed by space. (VERY old test)
 #
-<data>•\u0928\u092e\u0938\u094d\u200d\u0924\u0947 \u0930\u092e\u0947\u0936\u0905\u093e\u092a\u0915\u0948\u0938\u0947 \u0939\u0948?•\u092e\u0948 \u0905\u091a\u094d\u200d \u091b\u093e \u0939\u0942\u0901\u0964 •\u0905\u093e\u092a\r\n<100>\
+<data>•\u0928\u092e\u0938\u094d\u200d\u0924\u0947 \u0930\u092e\u0947\u0936\u0905\u093e\u092a\u0915\u0948\u0938\u0947 \u0939\u0948?\
+•\u092e\u0948 \u0905\u091a\u094d\u200d \u091b\u093e \u0939\u0942\u0901\u0964 •\u0905\u093e\u092a\r\n<100>\
 \u0915\u0948\u0938\u0947 \u0939\u0948?•\u0935\u0939 \u0915\u094d\u200d\u092f\u093e\n\
-<100>\u0939\u0948?•\u092f\u0939 \u0905\u093e\u092e \u0939\u0948. •\u092f\u0939 means "this". •"\u092a\u095d\u093e\u0908" meaning "education" or "studies". •\u0905\u093e\u091c(\u0938\u094d\u200d\u0935\u0924\u0902\u0924\u094d\u0930 \u0926\u093f\u0935\u093e\u0938) \u0939\u0948\u0964 •Let's end here. •</data>
+<100>\u0939\u0948?•\u092f\u0939 \u0905\u093e\u092e \u0939\u0948. •\u092f\u0939 means "this". \
+•"\u092a\u095d\u093e\u0908" meaning "education" or "studies". \
+•\u0905\u093e\u091c(\u0938\u094d\u200d\u0935\u0924\u0902\u0924\u094d\u0930 \u0926\u093f\u0935\u093e\u0938) \u0939\u0948\u0964 •Let's end here. •</data>
 # Regression test for bug #1984, Sentence break in Arabic text.
@@ -685,6 +698,12 @@ What is the proper use of the abbreviation pp.? •Yes, I am definatelly 12" tal
 #
 <data>•\uc0c1•\ud56d •\ud55c•\uc778 •\uc5f0•\ud569 •\uc7a5•\ub85c•\uad50•\ud68c•</data>
+# Bug 4450804 estLineBreakContractions
+#
+<line>
+<data>•These •are •'foobles'. •Don't •you •like •them?•</data>
+
+
 # conjoining jamo...
 <data>•\u1109\u1161\u11bc•\u1112\u1161\u11bc •\u1112\u1161\u11ab•\u110b\u1175\u11ab •\u110b\u1167\u11ab•\u1112\u1161\u11b8 •\u110c\u1161\u11bc•\u1105\u1169•\u1100\u116d•\u1112\u116c•</data>
@@ -711,6 +730,10 @@ What is the proper use of the abbreviation pp.? •Yes, I am definatelly 12" tal
 #
 <data>•abc\ud801xyz•</data>
+# a character sequence such as "X11" or "30F3" or "native2ascii" should
+# be kept together as a single word.
+<data>•X11 •30F3 •native2ascii•</data>
+
 #
 # Regression tests for failures that originally came from the monkey test.
 # Monkey test failure lines can, with slight reformatting, be copied into this section
@@ -732,6 +755,14 @@ What is the proper use of the abbreviation pp.? •Yes, I am definatelly 12" tal
 <line>
 <data>•R$ •JP¥ •a9 •3a •H% •CA$ •Travi$ •Scott •Ke$ha •Curren$y •A$AP •Rocky•</data>
+# Test Bug 4146175 Lines
+# the fullwidth comma should stick to the preceding Japanese character
+<line>
+<data>•\u7d42\uff0c•\u308f•</data>
+
+# Empty String
+<line>
+<data>•</data>
 ########################################################################################
diff --git a/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/BreakIteratorTest.java b/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/BreakIteratorTest.java
index ad237f824b9..3e497ecf23b 100644
--- a/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/BreakIteratorTest.java
+++ b/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/BreakIteratorTest.java
@@ -48,49 +48,6 @@ public class BreakIteratorTest extends TestFmwk
     // general test subroutines
     //=========================================================================
-    private void generalIteratorTest(BreakIterator bi, List<String> expectedResult) {
-        StringBuffer buffer = new StringBuffer();
-        String text;
-        for (int i = 0; i < expectedResult.size(); i++) {
-            text = expectedResult.get(i);
-            buffer.append(text);
-        }
-        text = buffer.toString();
-
-        bi.setText(text);
-
-        List<String> nextResults = _testFirstAndNext(bi, text);
-        List<String> previousResults = _testLastAndPrevious(bi, text);
-
-        logln("comparing forward and backward...");
-        // TODO(#13318): As part of clean-up, permanently remove the error count check.
-        //int errs = getErrorCount();
-        compareFragmentLists("forward iteration", "backward iteration", nextResults,
-                previousResults);
-        //if (getErrorCount() == errs) {
-        logln("comparing expected and actual...");
-        compareFragmentLists("expected result", "actual result", expectedResult,
-                nextResults);
-        logln("comparing expected and actual...");
-        compareFragmentLists("expected result", "actual result", expectedResult,
-                nextResults);
-        //}
-
-        int[] boundaries = new int[expectedResult.size() + 3];
-        boundaries[0] = BreakIterator.DONE;
-        boundaries[1] = 0;
-        for (int i = 0; i < expectedResult.size(); i++)
-            boundaries[i + 2] = boundaries[i + 1] + (expectedResult.get(i)).
-                    length();
-        boundaries[boundaries.length - 1] = BreakIterator.DONE;
-
-        _testFollowing(bi, text, boundaries);
-        _testPreceding(bi, text, boundaries);
-        _testIsBoundary(bi, text, boundaries);
-
-        doMultipleSelectionTest(bi, text);
-    }
-
     private List<String> _testFirstAndNext(BreakIterator bi, String text) {
         int p = bi.first();
         int lastP = p;
@@ -247,46 +204,6 @@ public class BreakIteratorTest extends TestFmwk
         }
     }
-    private void doMultipleSelectionTest(BreakIterator iterator, String testText)
-    {
-        logln("Multiple selection test...");
-        BreakIterator testIterator = (BreakIterator)iterator.clone();
-        int offset = iterator.first();
-        int testOffset;
-        int count = 0;
-
-        do {
-            testOffset = testIterator.first();
-            testOffset = testIterator.next(count);
-            logln("next(" + count + ") -> " + testOffset);
-            if (offset != testOffset)
-                errln("next(n) and next() not returning consistent results: for step " + count + ", next(n) returned " + testOffset + " and next() had " + offset);
-
-            if (offset != BreakIterator.DONE) {
-                count++;
-                offset = iterator.next();
-            }
-        } while (offset != BreakIterator.DONE);
-
-        // now do it backwards...
-        offset = iterator.last();
-        count = 0;
-
-        do {
-            testOffset = testIterator.last();
-            testOffset = testIterator.next(count);
-            logln("next(" + count + ") -> " + testOffset);
-            if (offset != testOffset)
-                errln("next(n) and next() not returning consistent results: for step " + count + ", next(n) returned " + testOffset + " and next() had " + offset);
-
-            if (offset != BreakIterator.DONE) {
-                count--;
-                offset = iterator.previous();
-            }
-        } while (offset != BreakIterator.DONE);
-    }
-
-
     private void doOtherInvariantTest(BreakIterator tb, String testChars)
     {
         StringBuffer work = new StringBuffer("a\r\na");
@@ -361,344 +278,6 @@ public class BreakIteratorTest extends TestFmwk
     //=========================================================================
-    /**
-     * @bug 4097779
-     */
-    @Test
-    public void TestBug4097779() {
-        List<String> wordSelectionData = new ArrayList<String>(2);
-
-        wordSelectionData.add("aa\u0300a");
-        wordSelectionData.add(" ");
-
-        generalIteratorTest(wordBreak, wordSelectionData);
-    }
-
-    /**
-     * @bug 4098467
-     */
-    @Test
-    public void TestBug4098467Words() {
-        List<String> wordSelectionData = new ArrayList<String>();
-
-        // What follows is a string of Korean characters (I found it in the Yellow Pages
-        // ad for the Korean Presbyterian Church of San Francisco, and I hope I transcribed
-        // it correctly), first as precomposed syllables, and then as conjoining jamo.
-        // Both sequences should be semantically identical and break the same way.
-        // precomposed syllables...
-        wordSelectionData.add("\uc0c1\ud56d");
-        wordSelectionData.add(" ");
-        wordSelectionData.add("\ud55c\uc778");
-        wordSelectionData.add(" ");
-        wordSelectionData.add("\uc5f0\ud569");
-        wordSelectionData.add(" ");
-        wordSelectionData.add("\uc7a5\ub85c\uad50\ud68c");
-        wordSelectionData.add(" ");
-        // conjoining jamo...
-        wordSelectionData.add("\u1109\u1161\u11bc\u1112\u1161\u11bc");
-        wordSelectionData.add(" ");
-        wordSelectionData.add("\u1112\u1161\u11ab\u110b\u1175\u11ab");
-        wordSelectionData.add(" ");
-        wordSelectionData.add("\u110b\u1167\u11ab\u1112\u1161\u11b8");
-        wordSelectionData.add(" ");
-        wordSelectionData.add("\u110c\u1161\u11bc\u1105\u1169\u1100\u116d\u1112\u116c");
-        wordSelectionData.add(" ");
-
-        generalIteratorTest(wordBreak, wordSelectionData);
-    }
-
-
-    /**
-     * @bug 4111338
-     */
-    @Test
-    public void TestBug4111338() {
-        List<String> sentenceSelectionData = new ArrayList<String>();
-
-        // test for bug #4111338: Don't break sentences at the boundary between CJK
-        // and other letters
-        sentenceSelectionData.add("\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165:\"JAVA\u821c"
-                + "\u8165\u7fc8\u51ce\u306d,\u2494\u56d8\u4ec0\u60b1\u8560\u51ba"
-                + "\u611d\u57b6\u2510\u5d46\".\u2029");
-        sentenceSelectionData.add("\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165\u9de8"
-                + "\u97e4JAVA\u821c\u8165\u7fc8\u51ce\u306d\ue30b\u2494\u56d8\u4ec0"
-                + "\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46\u97e5\u7751\u2029");
-        sentenceSelectionData.add("\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165\u9de8\u97e4"
-                + "\u6470\u8790JAVA\u821c\u8165\u7fc8\u51ce\u306d\ue30b\u2494\u56d8"
-                + "\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46\u97e5\u7751\u2029");
-        sentenceSelectionData.add("He said, \"I can go there.\"\u2029");
-
-        generalIteratorTest(sentenceBreak, sentenceSelectionData);
-    }
-
-
-    /**
-     * @bug 4143071
-     */
-    @Test
-    public void TestBug4143071() {
-        List<String> sentenceSelectionData = new ArrayList<String>(3);
-
-        // Make sure sentences that end with digits work right
-        sentenceSelectionData.add("Today is the 27th of May, 1998. ");
-        sentenceSelectionData.add("Tomorrow will be 28 May 1998. ");
-        sentenceSelectionData.add("The day after will be the 30th.\u2029");
-
-        generalIteratorTest(sentenceBreak, sentenceSelectionData);
-    }
-
-    /**
-     * @bug 4152416
-     */
-    @Test
-    public void TestBug4152416() {
-        List<String> sentenceSelectionData = new ArrayList<String>(2);
-
-        // Make sure sentences ending with a capital letter are treated correctly
-        sentenceSelectionData.add("The type of all primitive "
-                + "<code>boolean</code> values accessed in the target VM. ");
-        sentenceSelectionData.add("Calls to xxx will return an "
-                + "implementor of this interface.\u2029");
-
-        generalIteratorTest(sentenceBreak, sentenceSelectionData);
-    }
-
-    /**
-     * @bug 4152117
-     */
-    @Test
-    public void TestBug4152117() {
-        List<String> sentenceSelectionData = new ArrayList<String>(3);
-
-        // Make sure sentence breaking is handling punctuation correctly
-        // [COULD NOT REPRODUCE THIS BUG, BUT TEST IS HERE TO MAKE SURE
-        // IT DOESN'T CROP UP]
-        sentenceSelectionData.add("Constructs a randomly generated "
-                + "BigInteger, uniformly distributed over the range <tt>0</tt> "
-                + "to <tt>(2<sup>numBits</sup> - 1)</tt>, inclusive. ");
-        sentenceSelectionData.add("The uniformity of the distribution "
-                + "assumes that a fair source of random bits is provided in "
-                + "<tt>rnd</tt>. ");
-        sentenceSelectionData.add("Note that this constructor always "
-                + "constructs a non-negative BigInteger.\u2029");
-
-        generalIteratorTest(sentenceBreak, sentenceSelectionData);
-    }
-
-    @Test
-    public void TestLineBreak() {
-        List<String> lineSelectionData = new ArrayList<String>();
-
-        lineSelectionData.add("Multi-");
-        lineSelectionData.add("Level ");
-        lineSelectionData.add("example ");
-        lineSelectionData.add("of ");
-        lineSelectionData.add("a ");
-        lineSelectionData.add("semi-");
-        lineSelectionData.add("idiotic ");
-        lineSelectionData.add("non-");
-        lineSelectionData.add("sensical ");
-        lineSelectionData.add("(non-");
-        lineSelectionData.add("important) ");
-        lineSelectionData.add("sentence. ");
-
-        lineSelectionData.add("Hi ");
-        lineSelectionData.add("Hello ");
-        lineSelectionData.add("How\n");
-        lineSelectionData.add("are\r");
-        lineSelectionData.add("you\u2028");
-        lineSelectionData.add("fine.\t");
-        lineSelectionData.add("good. ");
-
-        lineSelectionData.add("Now\r");
-        lineSelectionData.add("is\n");
-        lineSelectionData.add("the\r\n");
-        lineSelectionData.add("time\n");
-        lineSelectionData.add("\r");
-        lineSelectionData.add("for\r");
-        lineSelectionData.add("\r");
-        lineSelectionData.add("all");
-
-        generalIteratorTest(lineBreak, lineSelectionData);
-    }
-
-    /**
-     * @bug 4068133
-     */
-    @Test
-    public void TestBug4068133() {
-        List<String> lineSelectionData = new ArrayList<String>(9);
-
-        lineSelectionData.add("\u96f6");
-        lineSelectionData.add("\u4e00\u3002");
-        lineSelectionData.add("\u4e8c\u3001");
-        lineSelectionData.add("\u4e09\u3002\u3001");
-        lineSelectionData.add("\u56db\u3001\u3002\u3001");
-        lineSelectionData.add("\u4e94,");
-        lineSelectionData.add("\u516d.");
-        lineSelectionData.add("\u4e03.\u3001,\u3002");
-        lineSelectionData.add("\u516b");
-
-        generalIteratorTest(lineBreak, lineSelectionData);
-    }
-
-    /**
-     * @bug 4086052
-     */
-    @Test
-    public void TestBug4086052() {
-        List<String> lineSelectionData = new ArrayList<String>(1);
-
-        lineSelectionData.add("foo\u00a0bar ");
-//        lineSelectionData.addElement("foo\ufeffbar");
-
-        generalIteratorTest(lineBreak, lineSelectionData);
-    }
-
-    /**
-     * @bug 4097920
-     */
-    @Test
-    public void TestBug4097920() {
-        List<String> lineSelectionData = new ArrayList<String>(3);
-
-        lineSelectionData.add("dog,cat,mouse ");
-        lineSelectionData.add("(one)");
-        lineSelectionData.add("(two)\n");
-        generalIteratorTest(lineBreak, lineSelectionData);
-    }
-
-
-
-    /**
-     * @bug 4117554
-     */
-    @Test
-    public void TestBug4117554Lines() {
-        List<String> lineSelectionData = new ArrayList<String>(3);
-
-        // Fullwidth .!? should be treated as postJwrd
-        lineSelectionData.add("\u4e01\uff0e");
-        lineSelectionData.add("\u4e02\uff01");
-        lineSelectionData.add("\u4e03\uff1f");
-
-        generalIteratorTest(lineBreak, lineSelectionData);
-    }
-
-    @Test
-    public void TestLettersAndDigits() {
-        // a character sequence such as "X11" or "30F3" or "native2ascii" should
-        // be kept together as a single word
-        List<String> lineSelectionData = new ArrayList<String>(3);
-
-        lineSelectionData.add("X11 ");
-        lineSelectionData.add("30F3 ");
-        lineSelectionData.add("native2ascii");
-
-        generalIteratorTest(lineBreak, lineSelectionData);
-    }
-
-
-    private static final String graveS = "S\u0300";
-    private static final String acuteBelowI = "i\u0317";
-    private static final String acuteE = "e\u0301";
-    private static final String circumflexA = "a\u0302";
-    private static final String tildeE = "e\u0303";
-
-    @Test
-    public void TestCharacterBreak() {
-        List<String> characterSelectionData = new ArrayList<String>();
-
-        characterSelectionData.add(graveS);
-        characterSelectionData.add(acuteBelowI);
-        characterSelectionData.add("m");
-        characterSelectionData.add("p");
-        characterSelectionData.add("l");
-        characterSelectionData.add(acuteE);
-        characterSelectionData.add(" ");
-        characterSelectionData.add("s");
-        characterSelectionData.add(circumflexA);
-        characterSelectionData.add("m");
-        characterSelectionData.add("p");
-        characterSelectionData.add("l");
-        characterSelectionData.add(tildeE);
-        characterSelectionData.add(".");
-        characterSelectionData.add("w");
-        characterSelectionData.add(circumflexA);
-        characterSelectionData.add("w");
-        characterSelectionData.add("a");
-        characterSelectionData.add("f");
-        characterSelectionData.add("q");
-        characterSelectionData.add("\n");
-        characterSelectionData.add("\r");
-        characterSelectionData.add("\r\n");
-        characterSelectionData.add("\n");
-
-        generalIteratorTest(characterBreak, characterSelectionData);
-    }
-
-    /**
-     * @bug 4098467
-     */
-    @Test
-    public void TestBug4098467Characters() {
-        List<String> characterSelectionData = new ArrayList<String>();
-
-        // What follows is a string of Korean characters (I found it in the Yellow Pages
-        // ad for the Korean Presbyterian Church of San Francisco, and I hope I transcribed
-        // it correctly), first as precomposed syllables, and then as conjoining jamo.
-        // Both sequences should be semantically identical and break the same way.
-        // precomposed syllables...
-        characterSelectionData.add("\uc0c1");
-        characterSelectionData.add("\ud56d");
-        characterSelectionData.add(" ");
-        characterSelectionData.add("\ud55c");
-        characterSelectionData.add("\uc778");
-        characterSelectionData.add(" ");
-        characterSelectionData.add("\uc5f0");
-        characterSelectionData.add("\ud569");
-        characterSelectionData.add(" ");
-        characterSelectionData.add("\uc7a5");
-        characterSelectionData.add("\ub85c");
-        characterSelectionData.add("\uad50");
-        characterSelectionData.add("\ud68c");
-        characterSelectionData.add(" ");
-        // conjoining jamo...
-        characterSelectionData.add("\u1109\u1161\u11bc");
-        characterSelectionData.add("\u1112\u1161\u11bc");
-        characterSelectionData.add(" ");
-        characterSelectionData.add("\u1112\u1161\u11ab");
-        characterSelectionData.add("\u110b\u1175\u11ab");
-        characterSelectionData.add(" ");
-        characterSelectionData.add("\u110b\u1167\u11ab");
-        characterSelectionData.add("\u1112\u1161\u11b8");
-        characterSelectionData.add(" ");
-        characterSelectionData.add("\u110c\u1161\u11bc");
-        characterSelectionData.add("\u1105\u1169");
-        characterSelectionData.add("\u1100\u116d");
-        characterSelectionData.add("\u1112\u116c");
-
-        generalIteratorTest(characterBreak, characterSelectionData);
-    }
-
-    @Test
-    public void TestTitleBreak()
-    {
-        List<String> titleData = new ArrayList<String>();
-        titleData.add(" ");
-        titleData.add("This ");
-        titleData.add("is ");
-        titleData.add("a ");
-        titleData.add("simple ");
-        titleData.add("sample ");
-        titleData.add("sentence. ");
-        titleData.add("This ");
-
-        generalIteratorTest(titleBreak, titleData);
-    }
-
-
     /*
      * @bug 4153072
      */
@@ -728,17 +307,6 @@ public class BreakIteratorTest extends TestFmwk
     }
-    @Test
-    public void TestBug4146175Lines() {
-        List<String> lineSelectionData = new ArrayList<String>(2);
-
-        // the fullwidth comma should stick to the preceding Japanese character
-        lineSelectionData.add("\u7d42\uff0c");
-        lineSelectionData.add("\u308f");
-
-        generalIteratorTest(lineBreak, lineSelectionData);
-    }
-
     private static final String cannedTestChars =
         "\u0000\u0001\u0002\u0003\u0004 !\"#$%&()+-01234<=>ABCDE[]^_`abcde{}|\u00a0\u00a2"
         + "\u00a3\u00a4\u00a5\u00a6\u00a7\u00a8\u00a9\u00ab\u00ad\u00ae\u00af\u00b0\u00b2\u00b3"
@@ -754,16 +322,6 @@ public class BreakIteratorTest extends TestFmwk
         doOtherInvariantTest(e, cannedTestChars + ".,\u3001\u3002\u3041\u3042\u3043\ufeff");
     }
-    @Test
-    public void TestEmptyString()
-    {
-        String text = "";
-        List<String> x = new ArrayList<String>(1);
-        x.add(text);
-
-        generalIteratorTest(lineBreak, x);
-    }
-
     @Test
     public void TestGetAvailableLocales()
     {
@@ -838,23 +396,6 @@ public class BreakIteratorTest extends TestFmwk
         }
     }
-
-    /**
-     * Bug 4450804
-     */
-    @Test
-    public void TestLineBreakContractions() {
-        List<String> expected = new ArrayList<String>(7);
-        expected.add("These ");
-        expected.add("are ");
-        expected.add("'foobles'. ");
-        expected.add("Don't ");
-        expected.add("you ");
-        expected.add("like ");
-        expected.add("them?");
-        generalIteratorTest(lineBreak, expected);
-    }
-
     /**
      * Ticket#5615
      */
diff --git a/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/RBBITest.java b/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/RBBITest.java
index bcc105bb969..b3f6f45cf3d 100644
--- a/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/RBBITest.java
+++ b/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/RBBITest.java
@@ -31,495 +31,7 @@ public class RBBITest extends TestFmwk {
     public RBBITest() {
     }
-    private static final String halfNA = "\u0928\u094d\u200d"; /*
-                                                                * halfform NA = devanigiri NA + virama(supresses
-                                                                * inherent vowel)+ zero width joiner
-                                                                */
-
-    // tests default rules based character iteration.
-    // Builds a new iterator from the source rules in the default (prebuilt) iterator.
-    //
-    @Test
-    public void TestDefaultRuleBasedCharacterIteration() {
-        RuleBasedBreakIterator rbbi = (RuleBasedBreakIterator) BreakIterator.getCharacterInstance();
-        logln("Testing the RBBI for character iteration by using default rules");
-
-        // fetch the rules used to create the above RuleBasedBreakIterator
-        String defaultRules = rbbi.toString();
-
-        RuleBasedBreakIterator charIterDefault = null;
-        try {
-            charIterDefault = new RuleBasedBreakIterator(defaultRules);
-        } catch (IllegalArgumentException iae) {
-            errln("ERROR: failed construction in TestDefaultRuleBasedCharacterIteration()" + iae.toString());
-        }
-
-        List<String> chardata = new ArrayList<String>();
-        chardata.add("H");
-        chardata.add("e");
-        chardata.add("l");
-        chardata.add("l");
-        chardata.add("o");
-        chardata.add("e\u0301"); // acuteE
-        chardata.add("&");
-        chardata.add("e\u0303"); // tildaE
-        // devanagiri characters for Hindi support
-        chardata.add("\u0906"); // devanagiri AA
-        // chardata.add("\u093e\u0901"); //devanagiri vowelsign AA+ chandrabindhu
-        chardata.add("\u0916\u0947"); // devanagiri KHA+vowelsign E
-        chardata.add("\u0938\u0941\u0902"); // devanagiri SA+vowelsign U + anusvara(bindu)
-        chardata.add("\u0926"); // devanagiri consonant DA
-        chardata.add("\u0930"); // devanagiri consonant RA
-        // chardata.add("\u0939\u094c"); //devanagiri HA+vowel sign AI
-        chardata.add("\u0964"); // devanagiri danda
-        // end hindi characters
-        chardata.add("A\u0302"); // circumflexA
-        chardata.add("i\u0301"); // acuteBelowI
-        // conjoining jamo...
-        chardata.add("\u1109\u1161\u11bc");
-        chardata.add("\u1112\u1161\u11bc");
-        chardata.add("\n");
-        chardata.add("\r\n"); // keep CRLF sequences together
-        chardata.add("S\u0300"); // graveS
-        chardata.add("i\u0301"); // acuteBelowI
-        chardata.add("!");
-
-        // What follows is a string of Korean characters (I found it in the Yellow Pages
-        // ad for the Korean Presbyterian Church of San Francisco, and I hope I transcribed
-        // it correctly), first as precomposed syllables, and then as conjoining jamo.
-        // Both sequences should be semantically identical and break the same way.
-        // precomposed syllables...
-        chardata.add("\uc0c1");
-        chardata.add("\ud56d");
-        chardata.add(" ");
-        chardata.add("\ud55c");
-        chardata.add("\uc778");
-        chardata.add(" ");
-        chardata.add("\uc5f0");
-        chardata.add("\ud569");
-        chardata.add(" ");
-        chardata.add("\uc7a5");
-        chardata.add("\ub85c");
-        chardata.add("\uad50");
-        chardata.add("\ud68c");
-        chardata.add(" ");
-        // conjoining jamo...
-        chardata.add("\u1109\u1161\u11bc");
-        chardata.add("\u1112\u1161\u11bc");
-        chardata.add(" ");
-        chardata.add("\u1112\u1161\u11ab");
-        chardata.add("\u110b\u1175\u11ab");
-        chardata.add(" ");
-        chardata.add("\u110b\u1167\u11ab");
-        chardata.add("\u1112\u1161\u11b8");
-        chardata.add(" ");
-        chardata.add("\u110c\u1161\u11bc");
-        chardata.add("\u1105\u1169");
-        chardata.add("\u1100\u116d");
-        chardata.add("\u1112\u116c");
-
-        generalIteratorTest(charIterDefault, chardata);
-
-    }
-
-    @Test
-    public void TestDefaultRuleBasedWordIteration() {
-        logln("Testing the RBBI for word iteration using default rules");
-        RuleBasedBreakIterator rbbi = (RuleBasedBreakIterator) BreakIterator.getWordInstance();
-        // fetch the rules used to create the above RuleBasedBreakIterator
-        String defaultRules = rbbi.toString();
-
-        RuleBasedBreakIterator wordIterDefault = null;
-        try {
-            wordIterDefault = new RuleBasedBreakIterator(defaultRules);
-        } catch (IllegalArgumentException iae) {
-            errln("ERROR: failed construction in TestDefaultRuleBasedWordIteration() -- custom rules" + iae.toString());
-        }
-
-        List<String> worddata = new ArrayList<String>();
-        worddata.add("Write");
-        worddata.add(" ");
-        worddata.add("wordrules");
-        worddata.add(".");
-        worddata.add(" ");
-        // worddata.add("alpha-beta-gamma");
-        worddata.add(" ");
-        worddata.add("\u092f\u0939");
-        worddata.add(" ");
-        worddata.add("\u0939\u093f" + halfNA + "\u0926\u0940");
-        worddata.add(" ");
-        worddata.add("\u0939\u0948");
-        // worddata.add("\u0964"); //danda followed by a space
-        worddata.add(" ");
-        worddata.add("\u0905\u093e\u092a");
-        worddata.add(" ");
-        worddata.add("\u0938\u093f\u0916\u094b\u0917\u0947");
-        worddata.add("?");
-        worddata.add(" ");
-        worddata.add("\r");
-        worddata.add("It's");
-        worddata.add(" ");
-        // worddata.add("$30.10");
-        worddata.add(" ");
-        worddata.add(" ");
-        worddata.add("Badges");
-        worddata.add("?");
-        worddata.add(" ");
-        worddata.add("BADGES");
-        worddata.add("!");
-        worddata.add("1000,233,456.000");
-        worddata.add(" ");
-
-        generalIteratorTest(wordIterDefault, worddata);
-    }
-//    private static final String kParagraphSeparator = "\u2029";
-    private static final String kLineSeparator = "\u2028";
-
-    @Test
-    public void TestDefaultRuleBasedSentenceIteration() {
-        logln("Testing the RBBI for sentence iteration using default rules");
-        RuleBasedBreakIterator rbbi = (RuleBasedBreakIterator) BreakIterator.getSentenceInstance();
-
-        // fetch the rules used to create the above RuleBasedBreakIterator
-        String defaultRules = rbbi.toString();
-        RuleBasedBreakIterator sentIterDefault = null;
-        try {
-            sentIterDefault = new RuleBasedBreakIterator(defaultRules);
-        } catch (IllegalArgumentException iae) {
-            errln("ERROR: failed construction in TestDefaultRuleBasedSentenceIteration()" + iae.toString());
-        }
-
-        List<String> sentdata = new ArrayList<String>();
-        sentdata.add("(This is it.) ");
-        sentdata.add("Testing the sentence iterator. ");
-        sentdata.add("\"This isn\'t it.\" ");
-        sentdata.add("Hi! ");
-        sentdata.add("This is a simple sample sentence. ");
-        sentdata.add("(This is it.) ");
-        sentdata.add("This is a simple sample sentence. ");
-        sentdata.add("\"This isn\'t it.\" ");
-        sentdata.add("Hi! ");
-        sentdata.add("This is a simple sample sentence. ");
-        sentdata.add("It does not have to make any sense as you can see. ");
-        sentdata.add("Nel mezzo del cammin di nostra vita, mi ritrovai in una selva oscura. ");
-        sentdata.add("Che la dritta via aveo smarrita. ");
-
-        generalIteratorTest(sentIterDefault, sentdata);
-    }
-
-    @Test
-    public void TestDefaultRuleBasedLineIteration() {
-        logln("Testing the RBBI for line iteration using default rules");
-        RuleBasedBreakIterator rbbi = (RuleBasedBreakIterator) RuleBasedBreakIterator.getLineInstance();
-        // fetch the rules used to create the above RuleBasedBreakIterator
-        String defaultRules = rbbi.toString();
-        RuleBasedBreakIterator lineIterDefault = null;
-        try {
-            lineIterDefault = new RuleBasedBreakIterator(defaultRules);
-        } catch (IllegalArgumentException iae) {
-            errln("ERROR: failed construction in TestDefaultRuleBasedLineIteration()" + iae.toString());
-        }
-
-        List<String> linedata = new ArrayList<String>();
-        linedata.add("Multi-");
-        linedata.add("Level ");
-        linedata.add("example ");
-        linedata.add("of ");
-        linedata.add("a ");
-        linedata.add("semi-");
-        linedata.add("idiotic ");
-        linedata.add("non-");
-        linedata.add("sensical ");
-        linedata.add("(non-");
-        linedata.add("important) ");
-        linedata.add("sentence. ");
-
-        linedata.add("Hi ");
-        linedata.add("Hello ");
-        linedata.add("How\n");
-        linedata.add("are\r");
-        linedata.add("you" + kLineSeparator);
-        linedata.add("fine.\t");
-        linedata.add("good. ");
-
-        linedata.add("Now\r");
-        linedata.add("is\n");
-        linedata.add("the\r\n");
-        linedata.add("time\n");
-        linedata.add("\r");
-        linedata.add("for\r");
-        linedata.add("\r");
-        linedata.add("all");
-
-        generalIteratorTest(lineIterDefault, linedata);
-
-    }
-
-    // =========================================================================
-    // general test subroutines
-    // =========================================================================
-
-    private void generalIteratorTest(RuleBasedBreakIterator rbbi, List<String> expectedResult) {
-        StringBuffer buffer = new StringBuffer();
-        String text;
-        for (int i = 0; i < expectedResult.size(); i++) {
-            text = expectedResult.get(i);
-            buffer.append(text);
-        }
-        text = buffer.toString();
-        if (rbbi == null) {
-            errln("null iterator, test skipped.");
-            return;
-        }
-
-        rbbi.setText(text);
-
-        List<String> nextResults = _testFirstAndNext(rbbi, text);
-        List<String> previousResults = _testLastAndPrevious(rbbi, text);
-
-        logln("comparing forward and backward...");
-        // TODO(#13318): As part of clean-up, permanently remove the error count check.
- //int errs = getErrorCount(); - compareFragmentLists("forward iteration", "backward iteration", nextResults, previousResults); - //if (getErrorCount() == errs) { - logln("comparing expected and actual..."); - compareFragmentLists("expected result", "actual result", expectedResult, nextResults); - logln("comparing expected and actual..."); - compareFragmentLists("expected result", "actual result", expectedResult, nextResults); - //} - - int[] boundaries = new int[expectedResult.size() + 3]; - boundaries[0] = RuleBasedBreakIterator.DONE; - boundaries[1] = 0; - for (int i = 0; i < expectedResult.size(); i++) { - boundaries[i + 2] = boundaries[i + 1] + (expectedResult.get(i).length()); - } - - boundaries[boundaries.length - 1] = RuleBasedBreakIterator.DONE; - - _testFollowing(rbbi, text, boundaries); - _testPreceding(rbbi, text, boundaries); - _testIsBoundary(rbbi, text, boundaries); - - doMultipleSelectionTest(rbbi, text); - } - - private List<String> _testFirstAndNext(RuleBasedBreakIterator rbbi, String text) { - int p = rbbi.first(); - int lastP = p; - List<String> result = new ArrayList<String>(); - - if (p != 0) { - errln("first() returned " + p + " instead of 0"); - } - - while (p != RuleBasedBreakIterator.DONE) { - p = rbbi.next(); - if (p != RuleBasedBreakIterator.DONE) { - if (p <= lastP) { - errln("next() failed to move forward: next() on position " - + lastP + " yielded " + p); - } - result.add(text.substring(lastP, p)); - } - else { - if (lastP != text.length()) { - errln("next() returned DONE prematurely: offset was " - + lastP + " instead of " + text.length()); - } - } - lastP = p; - } - return result; - } - - private List<String> _testLastAndPrevious(RuleBasedBreakIterator rbbi, String text) { - int p = rbbi.last(); - int lastP = p; - List<String> result = new ArrayList<String>(); - - if (p != text.length()) { - errln("last() returned " + p + " instead of " + text.length()); - } - - while (p != RuleBasedBreakIterator.DONE) { - p = rbbi.previous(); - if 
(p != RuleBasedBreakIterator.DONE) { - if (p >= lastP) { - errln("previous() failed to move backward: previous() on position " - + lastP + " yielded " + p); - } - - result.add(0, text.substring(p, lastP)); - } - else { - if (lastP != 0) { - errln("previous() returned DONE prematurely: offset was " - + lastP + " instead of 0"); - } - } - lastP = p; - } - return result; - } - - private void compareFragmentLists(String f1Name, String f2Name, List<String> f1, List<String> f2) { - int p1 = 0; - int p2 = 0; - String s1; - String s2; - int t1 = 0; - int t2 = 0; - - while (p1 < f1.size() && p2 < f2.size()) { - s1 = f1.get(p1); - s2 = f2.get(p2); - t1 += s1.length(); - t2 += s2.length(); - - if (s1.equals(s2)) { - debugLogln(" >" + s1 + "<"); - ++p1; - ++p2; - } - else { - int tempT1 = t1; - int tempT2 = t2; - int tempP1 = p1; - int tempP2 = p2; - - while (tempT1 != tempT2 && tempP1 < f1.size() && tempP2 < f2.size()) { - while (tempT1 < tempT2 && tempP1 < f1.size()) { - tempT1 += (f1.get(tempP1)).length(); - ++tempP1; - } - while (tempT2 < tempT1 && tempP2 < f2.size()) { - tempT2 += (f2.get(tempP2)).length(); - ++tempP2; - } - } - logln("*** " + f1Name + " has:"); - while (p1 <= tempP1 && p1 < f1.size()) { - s1 = f1.get(p1); - t1 += s1.length(); - debugLogln(" *** >" + s1 + "<"); - ++p1; - } - logln("***** " + f2Name + " has:"); - while (p2 <= tempP2 && p2 < f2.size()) { - s2 = f2.get(p2); - t2 += s2.length(); - debugLogln(" ***** >" + s2 + "<"); - ++p2; - } - errln("Discrepancy between " + f1Name + " and " + f2Name); - } - } - } - - private void _testFollowing(RuleBasedBreakIterator rbbi, String text, int[] boundaries) { - logln("testFollowing():"); - int p = 2; - for(int i = 0; i <= text.length(); i++) { - if (i == boundaries[p]) - ++p; - int b = rbbi.following(i); - logln("rbbi.following(" + i + ") -> " + b); - if (b != boundaries[p]) - errln("Wrong result from following() for " + i + ": expected " + boundaries[p] - + ", got " + b); - } - } - - private void 
_testPreceding(RuleBasedBreakIterator rbbi, String text, int[] boundaries) { - logln("testPreceding():"); - int p = 0; - for(int i = 0; i <= text.length(); i++) { - int b = rbbi.preceding(i); - logln("rbbi.preceding(" + i + ") -> " + b); - if (b != boundaries[p]) - errln("Wrong result from preceding() for " + i + ": expected " + boundaries[p] - + ", got " + b); - if (i == boundaries[p + 1]) - ++p; - } - } - - private void _testIsBoundary(RuleBasedBreakIterator rbbi, String text, int[] boundaries) { - logln("testIsBoundary():"); - int p = 1; - boolean isB; - for(int i = 0; i <= text.length(); i++) { - isB = rbbi.isBoundary(i); - logln("rbbi.isBoundary(" + i + ") -> " + isB); - if(i == boundaries[p]) { - if (!isB) - errln("Wrong result from isBoundary() for " + i + ": expected true, got false"); - ++p; - } - else { - if(isB) - errln("Wrong result from isBoundary() for " + i + ": expected false, got true"); - } - } - } - private void doMultipleSelectionTest(RuleBasedBreakIterator iterator, String testText) - { - logln("Multiple selection test..."); - RuleBasedBreakIterator testIterator = (RuleBasedBreakIterator)iterator.clone(); - int offset = iterator.first(); - int testOffset; - int count = 0; - - do { - testOffset = testIterator.first(); - testOffset = testIterator.next(count); - logln("next(" + count + ") -> " + testOffset); - if (offset != testOffset) - errln("next(n) and next() not returning consistent results: for step " + count + ", next(n) returned " + testOffset + " and next() had " + offset); - - if (offset != RuleBasedBreakIterator.DONE) { - count++; - offset = iterator.next(); - } - } while (offset != RuleBasedBreakIterator.DONE); - - // now do it backwards... 
- offset = iterator.last(); - count = 0; - - do { - testOffset = testIterator.last(); - testOffset = testIterator.next(count); - logln("next(" + count + ") -> " + testOffset); - if (offset != testOffset) - errln("next(n) and next() not returning consistent results: for step " + count + ", next(n) returned " + testOffset + " and next() had " + offset); - - if (offset != RuleBasedBreakIterator.DONE) { - count--; - offset = iterator.previous(); - } - } while (offset != RuleBasedBreakIterator.DONE); - } - - private void debugLogln(String s) { - final String zeros = "0000"; - String temp; - StringBuffer out = new StringBuffer(); - for (int i = 0; i < s.length(); i++) { - char c = s.charAt(i); - if (c >= ' ' && c < '\u007f') - out.append(c); - else { - out.append("\\u"); - temp = Integer.toHexString(c); - out.append(zeros.substring(0, 4 - temp.length())); - out.append(temp); - } - } - logln(out.toString()); - } @Test public void TestThaiDictionaryBreakIterator() { @@ -629,7 +141,7 @@ public class RBBITest extends TestFmwk { } return buildString.toString(); } - @Test + public void doTest() { BreakIterator brkIter; switch( type ) { diff --git a/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/RBBITestExtended.java b/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/RBBITestExtended.java index 3dfe7d9e4a7..422770ded1f 100644 --- a/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/RBBITestExtended.java +++ b/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/RBBITestExtended.java @@ -413,6 +413,8 @@ public void TestExtended() { } void executeTest(TestParams t) { + // TODO: also rerun tests with a break iterator re-created from bi.getRules() + // and from bi.clone(). If in exhaustive mode only. 
int bp; int prevBP; int i; diff --git a/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/rbbitst.txt b/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/rbbitst.txt index f07107bdfb0..0757bdf7dbc 100644 --- a/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/rbbitst.txt +++ b/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/rbbitst.txt @@ -14,6 +14,7 @@ # <sent> any following data is for sentence break testing # <line> any following data is for line break testing # <char> any following data is for char break testing +# <title> any following data is for title break testing # <rules> rules ... </rules> following data is tested against these rules. # Applies until a following occurence of <word>, <sent>, etc. or another <rules> # <locale locale_name> Switch to the named locale at the next occurence of <word>, <sent>, etc. @@ -148,6 +149,9 @@ # Treat Japanese Half Width voicing marks as combining <data>•A\uff9e•B\uff9f\uff9e\uff9f•C•</data> +# Test data originally from Java BreakIteratorTest.TestCharcterBreak() +<data>•S\u0300•i\u0317•m•p•l•e\u0301• •s•a\u0302•m•p•l•e\u0303•.•w•a\u0302•w•a•f•q•\n•\r•\r\n•\n•</data> + ######################################################################################## # # @@ -446,9 +450,12 @@ What is the proper use of the abbreviation pp.? •Yes, I am definatelly 12" tal <data>•No breaks when . is followed by a lower, with possible intervening punct .,a .$a .)a. •</data> # -# Sentence Breaks: no break at the boundary between CJK and other letters +# Sentence Breaks: no break at the boundary between CJK and other letters. 
TestBug4111338 # -<data>•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165:"JAVA\u821c\u8165\u7fc8\u51ce\u306d,\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46".\u2029•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165\u9de8\u97e4JAVA\u821c\u8165\u7fc8\u51ce\u306d\ue30b\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46\u97e5\u7751\u3002•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165\u9de8\u97e4\u6470\u8790JAVA\u821c\u8165\u7fc8\u51ce\u306d\ue30b\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46\u97e5\u7751\u2048•He said, "I can go there."\u2029•Bye, now.•</data> +<data>•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165:"JAVA\u821c\u8165\u7fc8\u51ce\u306d,\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46".\u2029\ +•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165\u9de8\u97e4JAVA\u821c\u8165\u7fc8\u51ce\u306d\ue30b\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46\u97e5\u7751\u3002\ +•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165\u9de8\u97e4\u6470\u8790JAVA\u821c\u8165\u7fc8\u51ce\u306d\ue30b\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46\u97e5\u7751\u2048\ +•He said, "I can go there."\u2029•Bye, now.•</data> # # Treat fullwidth variants of .!? the same as their @@ -499,22 +506,28 @@ What is the proper use of the abbreviation pp.? •Yes, I am definatelly 12" tal # test for bug #4152416: Make sure sentences ending with a capital # letter are treated correctly # -<data>•The type of all primitive \<code>boolean\</code> values accessed in the target VM. •Calls to xxx will return an implementor of this interface. \u2029•</data> +<data>•The type of all primitive \<code>boolean\</code> values accessed in the target VM. •\ +Calls to xxx will return an implementor of this interface. 
\u2029•</data> # test for bug #4152117: Make sure sentence breaking is handling # punctuation correctly [COULD NOT REPRODUCE THIS BUG, BUT TEST IS # HERE TO MAKE SURE IT DOESN'T CROP UP] # -<data>•Constructs a randomly generated BigInteger, uniformly distributed over the range \<tt>0\</tt> to \<tt>(2\<sup>numBits\</sup> - 1\)\</tt>, inclusive. •The uniformity of the distribution assumes that a fair source of random bits is provided in \<tt>rnd\</tt>. •Note that this constructor always constructs a non-negative biginteger. \n•Ahh abc. -•</data> +<data>•Constructs a randomly generated BigInteger, uniformly distributed over the range \<tt>0\</tt> to\ + \<tt>(2\<sup>numBits\</sup> - 1\)\</tt>, inclusive. \ + •The uniformity of the distribution assumes that a fair source of random bits is provided in \<tt>rnd\</tt>. \ + •Note that this constructor always constructs a non-negative biginteger. \n•Ahh abc.•</data> # sentence breaks for hindi which used Devanagari script # make sure there is sentence break after ?,danda(hindi phrase separator), # fullstop followed by space. (VERY old test) # -<data>•\u0928\u092e\u0938\u094d\u200d\u0924\u0947 \u0930\u092e\u0947\u0936\u0905\u093e\u092a\u0915\u0948\u0938\u0947 \u0939\u0948?•\u092e\u0948 \u0905\u091a\u094d\u200d \u091b\u093e \u0939\u0942\u0901\u0964 •\u0905\u093e\u092a\r\n<100>\ +<data>•\u0928\u092e\u0938\u094d\u200d\u0924\u0947 \u0930\u092e\u0947\u0936\u0905\u093e\u092a\u0915\u0948\u0938\u0947 \u0939\u0948?\ +•\u092e\u0948 \u0905\u091a\u094d\u200d \u091b\u093e \u0939\u0942\u0901\u0964 •\u0905\u093e\u092a\r\n<100>\ \u0915\u0948\u0938\u0947 \u0939\u0948?•\u0935\u0939 \u0915\u094d\u200d\u092f\u093e\n\ -<100>\u0939\u0948?•\u092f\u0939 \u0905\u093e\u092e \u0939\u0948. •\u092f\u0939 means "this". •"\u092a\u095d\u093e\u0908" meaning "education" or "studies". •\u0905\u093e\u091c(\u0938\u094d\u200d\u0935\u0924\u0902\u0924\u094d\u0930 \u0926\u093f\u0935\u093e\u0938) \u0939\u0948\u0964 •Let's end here. 
•</data> +<100>\u0939\u0948?•\u092f\u0939 \u0905\u093e\u092e \u0939\u0948. •\u092f\u0939 means "this". \ +•"\u092a\u095d\u093e\u0908" meaning "education" or "studies". \ +•\u0905\u093e\u091c(\u0938\u094d\u200d\u0935\u0924\u0902\u0924\u094d\u0930 \u0926\u093f\u0935\u093e\u0938) \u0939\u0948\u0964 •Let's end here. •</data> # Regression test for bug #1984, Sentence break in Arabic text. @@ -685,6 +698,12 @@ What is the proper use of the abbreviation pp.? •Yes, I am definatelly 12" tal # <data>•\uc0c1•\ud56d •\ud55c•\uc778 •\uc5f0•\ud569 •\uc7a5•\ub85c•\uad50•\ud68c•</data> +# Bug 4450804 estLineBreakContractions +# +<line> +<data>•These •are •'foobles'. •Don't •you •like •them?•</data> + + # conjoining jamo... <data>•\u1109\u1161\u11bc•\u1112\u1161\u11bc •\u1112\u1161\u11ab•\u110b\u1175\u11ab •\u110b\u1167\u11ab•\u1112\u1161\u11b8 •\u110c\u1161\u11bc•\u1105\u1169•\u1100\u116d•\u1112\u116c•</data> @@ -711,6 +730,10 @@ What is the proper use of the abbreviation pp.? •Yes, I am definatelly 12" tal # <data>•abc\ud801xyz•</data> +# a character sequence such as "X11" or "30F3" or "native2ascii" should +# be kept together as a single word. +<data>•X11 •30F3 •native2ascii•</data> + # # Regression tests for failures that originally came from the monkey test. # Monkey test failure lines can, with slight reformatting, be copied into this section @@ -732,6 +755,14 @@ What is the proper use of the abbreviation pp.? •Yes, I am definatelly 12" tal <line> <data>•R$ •JP¥ •a9 •3a •H% •CA$ •Travi$ •Scott •Ke$ha •Curren$y •A$AP •Rocky•</data> +# Test Bug 4146175 Lines +# the fullwidth comma should stick to the preceding Japanese character +<line> +<data>•\u7d42\uff0c•\u308f•</data> + +# Empty String +<line> +<data>•</data> ########################################################################################