ICU-13318 RBBITest, remove obsolete tests, move remaining test data to rbbitst.txt

author Andy Heninger <andy.heninger@gmail.com>

Sat, 26 Aug 2017 00:44:28 +0000 (00:44 +0000)

committer Andy Heninger <andy.heninger@gmail.com>

Sat, 26 Aug 2017 00:44:28 +0000 (00:44 +0000)
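
The data lines this commit adds to rbbitst.txt encode expected boundaries with a • marker instead of a Java list of fragments. As a rough illustration only, the sketch below turns one such marked string into plain text plus expected offsets and collects the offsets an iterator reports. It is not the ICU test harness: the class and method names are hypothetical, it uses the JDK's java.text.BreakIterator rather than ICU's, and it ignores \uXXXX escapes, trailing-backslash line continuations, and status tags such as <100>.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ICU: a simplified reading of the
// "•"-marked <data> lines in rbbitst.txt.
public class BreakDataSketch {
    // U+2022 BULLET marks an expected boundary in the test data.
    static final char BREAK_MARK = '\u2022';

    // Remove the markers to recover the text handed to the iterator.
    static String stripMarks(String marked) {
        return marked.replace(String.valueOf(BREAK_MARK), "");
    }

    // Offsets (in UTF-16 code units of the stripped text) where a marker appeared.
    static List<Integer> expectedBoundaries(String marked) {
        List<Integer> out = new ArrayList<>();
        int offset = 0;
        for (int i = 0; i < marked.length(); i++) {
            if (marked.charAt(i) == BREAK_MARK) {
                out.add(offset);   // marker: record position, consume no text
            } else {
                offset++;          // ordinary character: advance in stripped text
            }
        }
        return out;
    }

    // Walk the iterator forward with first()/next() and record every boundary.
    static List<Integer> actualBoundaries(BreakIterator bi, String text) {
        bi.setText(text);
        List<Integer> out = new ArrayList<>();
        for (int p = bi.first(); p != BreakIterator.DONE; p = bi.next()) {
            out.add(p);
        }
        return out;
    }

    public static void main(String[] args) {
        // Prefix of the TestLineBreakContractions data moved by this commit.
        String marked = "\u2022These \u2022are \u2022'foobles'. \u2022";
        String text = stripMarks(marked);
        System.out.println("expected: " + expectedBoundaries(marked));
        // JDK rules may differ from ICU's, so the two lists need not agree.
        System.out.println("actual:   "
                + actualBoundaries(BreakIterator.getLineInstance(), text));
    }
}
```

Keeping the expected boundaries inline with the text is what lets one data file drive both the C++ and Java tests; the removed generalIteratorTest() rebuilt the same information from a list of fragments.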
diff --git a/icu4c/source/test/testdata/rbbitst.txt b/icu4c/source/test/testdata/rbbitst.txt
index f07107bdfb03d3808d8c000bdd4bcd7350fc762e..0757bdf7dbca2158e4144dd577f9df2b64071808 100644
--- a/icu4c/source/test/testdata/rbbitst.txt
+++ b/icu4c/source/test/testdata/rbbitst.txt
@@ -14,6 +14,7 @@
  #   <sent>    any following data is for sentence break testing
  #   <line>    any following data is for line break testing
  #   <char>    any following data is for char break testing
+#   <title>   any following data is for title break testing
  #   <rules> rules ... </rules>  following data is tested against these rules.
  #                               Applies until a following occurence of <word>, <sent>, etc. or another <rules>
  #   <locale locale_name>  Switch to the named locale at the next occurence of <word>, <sent>, etc.
@@ -148,6 +149,9 @@
  #  Treat Japanese Half Width voicing marks as combining
  <data>•A\uff9e•B\uff9f\uff9e\uff9f•C•</data>
  
+# Test data originally from Java BreakIteratorTest.TestCharcterBreak()
+<data>•S\u0300•i\u0317•m•p•l•e\u0301• •s•a\u0302•m•p•l•e\u0303•.•w•a\u0302•w•a•f•q•\n•\r•\r\n•\n•</data>
+
  ########################################################################################
  #
  #
@@ -446,9 +450,12 @@ What is the proper use of the abbreviation pp.? •Yes, I am definatelly 12" tal
  <data>•No breaks when . is followed by a lower, with possible intervening punct .,a .$a .)a. •</data>
  
  #
-#  Sentence Breaks: no break at the boundary between CJK and other letters
+#  Sentence Breaks: no break at the boundary between CJK and other letters. TestBug4111338
  #
-<data>•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165:"JAVA\u821c\u8165\u7fc8\u51ce\u306d,\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46".\u2029•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165\u9de8\u97e4JAVA\u821c\u8165\u7fc8\u51ce\u306d\ue30b\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46\u97e5\u7751\u3002•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165\u9de8\u97e4\u6470\u8790JAVA\u821c\u8165\u7fc8\u51ce\u306d\ue30b\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46\u97e5\u7751\u2048•He said, "I can go there."\u2029•Bye, now.•</data>
+<data>•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165:"JAVA\u821c\u8165\u7fc8\u51ce\u306d,\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46".\u2029\
+•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165\u9de8\u97e4JAVA\u821c\u8165\u7fc8\u51ce\u306d\ue30b\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46\u97e5\u7751\u3002\
+•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165\u9de8\u97e4\u6470\u8790JAVA\u821c\u8165\u7fc8\u51ce\u306d\ue30b\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46\u97e5\u7751\u2048\
+•He said, "I can go there."\u2029•Bye, now.•</data>
  
  #
  #      Treat fullwidth variants of .!? the same as their
@@ -499,22 +506,28 @@ What is the proper use of the abbreviation pp.? •Yes, I am definatelly 12" tal
  #        test for bug #4152416: Make sure sentences ending with a capital
  #        letter are treated correctly
  #
-<data>•The type of all primitive \<code>boolean\</code> values accessed in the target VM.  •Calls to xxx will return an implementor of this interface.  \u2029•</data>
+<data>•The type of all primitive \<code>boolean\</code> values accessed in the target VM.  •\
+Calls to xxx will return an implementor of this interface.  \u2029•</data>
  
  #        test for bug #4152117: Make sure sentence breaking is handling
  #        punctuation correctly [COULD NOT REPRODUCE THIS BUG, BUT TEST IS
  #        HERE TO MAKE SURE IT DOESN'T CROP UP]
  #
-<data>•Constructs a randomly generated BigInteger, uniformly distributed over the range \<tt>0\</tt> to \<tt>(2\<sup>numBits\</sup> - 1\)\</tt>, inclusive.  •The uniformity of the distribution assumes that a fair source of random bits is provided in \<tt>rnd\</tt>.  •Note that this constructor always constructs a non-negative biginteger.  \n•Ahh abc.
-•</data>
+<data>•Constructs a randomly generated BigInteger, uniformly distributed over the range \<tt>0\</tt> to\
+ \<tt>(2\<sup>numBits\</sup> - 1\)\</tt>, inclusive.  \
+ •The uniformity of the distribution assumes that a fair source of random bits is provided in \<tt>rnd\</tt>.  \
+ •Note that this constructor always constructs a non-negative biginteger.  \n•Ahh abc.•</data>
  
  #        sentence breaks for hindi which used Devanagari script
  #        make sure there is sentence break after ?,danda(hindi phrase separator),
  #        fullstop followed by space.  (VERY old test)
  #
-<data>•\u0928\u092e\u0938\u094d\u200d\u0924\u0947 \u0930\u092e\u0947\u0936\u0905\u093e\u092a\u0915\u0948\u0938\u0947 \u0939\u0948?•\u092e\u0948 \u0905\u091a\u094d\u200d \u091b\u093e \u0939\u0942\u0901\u0964 •\u0905\u093e\u092a\r\n<100>\
+<data>•\u0928\u092e\u0938\u094d\u200d\u0924\u0947 \u0930\u092e\u0947\u0936\u0905\u093e\u092a\u0915\u0948\u0938\u0947 \u0939\u0948?\
+•\u092e\u0948 \u0905\u091a\u094d\u200d \u091b\u093e \u0939\u0942\u0901\u0964 •\u0905\u093e\u092a\r\n<100>\
  \u0915\u0948\u0938\u0947 \u0939\u0948?•\u0935\u0939 \u0915\u094d\u200d\u092f\u093e\n\
-<100>\u0939\u0948?•\u092f\u0939 \u0905\u093e\u092e \u0939\u0948. •\u092f\u0939 means "this". •"\u092a\u095d\u093e\u0908" meaning "education" or "studies". •\u0905\u093e\u091c(\u0938\u094d\u200d\u0935\u0924\u0902\u0924\u094d\u0930 \u0926\u093f\u0935\u093e\u0938) \u0939\u0948\u0964 •Let's end here. •</data>
+<100>\u0939\u0948?•\u092f\u0939 \u0905\u093e\u092e \u0939\u0948. •\u092f\u0939 means "this". \
+•"\u092a\u095d\u093e\u0908" meaning "education" or "studies". \
+•\u0905\u093e\u091c(\u0938\u094d\u200d\u0935\u0924\u0902\u0924\u094d\u0930 \u0926\u093f\u0935\u093e\u0938) \u0939\u0948\u0964 •Let's end here. •</data>
  
  #         Regression test for bug #1984, Sentence break in Arabic text.
  
@@ -685,6 +698,12 @@ What is the proper use of the abbreviation pp.? •Yes, I am definatelly 12" tal
  #
  <data>•\uc0c1•\ud56d •\ud55c•\uc778 •\uc5f0•\ud569 •\uc7a5•\ub85c•\uad50•\ud68c•</data>
  
+#      Bug 4450804 estLineBreakContractions
+#
+<line>
+<data>•These •are •'foobles'. •Don't •you •like •them?•</data>
+
+
  #      conjoining jamo...
  <data>•\u1109\u1161\u11bc•\u1112\u1161\u11bc •\u1112\u1161\u11ab•\u110b\u1175\u11ab •\u110b\u1167\u11ab•\u1112\u1161\u11b8 •\u110c\u1161\u11bc•\u1105\u1169•\u1100\u116d•\u1112\u116c•</data>
  
@@ -711,6 +730,10 @@ What is the proper use of the abbreviation pp.? •Yes, I am definatelly 12" tal
  #
  <data>•abc\ud801xyz•</data>
  
+#   a character sequence such as "X11" or "30F3" or "native2ascii" should
+#   be kept together as a single word.
+<data>•X11 •30F3 •native2ascii•</data>
+
  #
  #     Regression tests for failures that originally came from the monkey test.
  #     Monkey test failure lines can, with slight reformatting, be copied into this section
@@ -732,6 +755,14 @@ What is the proper use of the abbreviation pp.? •Yes, I am definatelly 12" tal
  <line>
  <data>•R$ •JP¥ •a9 •3a •H% •CA$ •Travi$ •Scott •Ke$ha •Curren$y •A$AP •Rocky•</data>
  
+# Test Bug 4146175 Lines
+# the fullwidth comma should stick to the preceding Japanese character
+<line>
+<data>•\u7d42\uff0c•\u308f•</data>
+
+# Empty String
+<line>
+<data>•</data>
  
  
  ########################################################################################
diff --git a/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/BreakIteratorTest.java b/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/BreakIteratorTest.java
index ad237f824b95280b47c567f26f85137ec4b5307c..3e497ecf23bd2e63b4091023775375ddbcea41af 100644
--- a/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/BreakIteratorTest.java
+++ b/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/BreakIteratorTest.java
@@ -48,49 +48,6 @@ public class BreakIteratorTest extends TestFmwk
      // general test subroutines
      //=========================================================================
  
-    private void generalIteratorTest(BreakIterator bi, List<String> expectedResult) {
-        StringBuffer buffer = new StringBuffer();
-        String text;
-        for (int i = 0; i < expectedResult.size(); i++) {
-            text = expectedResult.get(i);
-            buffer.append(text);
-        }
-        text = buffer.toString();
-
-        bi.setText(text);
-
-        List<String> nextResults = _testFirstAndNext(bi, text);
-        List<String> previousResults = _testLastAndPrevious(bi, text);
-
-        logln("comparing forward and backward...");
-        // TODO(#13318): As part of clean-up, permanently remove the error count check.
-        //int errs = getErrorCount();
-        compareFragmentLists("forward iteration", "backward iteration", nextResults,
-                        previousResults);
-        //if (getErrorCount() == errs) {
-        logln("comparing expected and actual...");
-        compareFragmentLists("expected result", "actual result", expectedResult,
-                        nextResults);
-        logln("comparing expected and actual...");
-        compareFragmentLists("expected result", "actual result", expectedResult,
-                            nextResults);
-        //}
-
-        int[] boundaries = new int[expectedResult.size() + 3];
-        boundaries[0] = BreakIterator.DONE;
-        boundaries[1] = 0;
-        for (int i = 0; i < expectedResult.size(); i++)
-            boundaries[i + 2] = boundaries[i + 1] + (expectedResult.get(i)).
-                            length();
-        boundaries[boundaries.length - 1] = BreakIterator.DONE;
-
-        _testFollowing(bi, text, boundaries);
-        _testPreceding(bi, text, boundaries);
-        _testIsBoundary(bi, text, boundaries);
-
-        doMultipleSelectionTest(bi, text);
-    }
-
      private List<String> _testFirstAndNext(BreakIterator bi, String text) {
          int p = bi.first();
          int lastP = p;
@@ -247,46 +204,6 @@ public class BreakIteratorTest extends TestFmwk
          }
      }
  
-    private void doMultipleSelectionTest(BreakIterator iterator, String testText)
-    {
-        logln("Multiple selection test...");
-        BreakIterator testIterator = (BreakIterator)iterator.clone();
-        int offset = iterator.first();
-        int testOffset;
-        int count = 0;
-
-        do {
-            testOffset = testIterator.first();
-            testOffset = testIterator.next(count);
-            logln("next(" + count + ") -> " + testOffset);
-            if (offset != testOffset)
-                errln("next(n) and next() not returning consistent results: for step " + count + ", next(n) returned " + testOffset + " and next() had " + offset);
-
-            if (offset != BreakIterator.DONE) {
-                count++;
-                offset = iterator.next();
-            }
-        } while (offset != BreakIterator.DONE);
-
-        // now do it backwards...
-        offset = iterator.last();
-        count = 0;
-
-        do {
-            testOffset = testIterator.last();
-            testOffset = testIterator.next(count);
-            logln("next(" + count + ") -> " + testOffset);
-            if (offset != testOffset)
-                errln("next(n) and next() not returning consistent results: for step " + count + ", next(n) returned " + testOffset + " and next() had " + offset);
-
-            if (offset != BreakIterator.DONE) {
-                count--;
-                offset = iterator.previous();
-            }
-        } while (offset != BreakIterator.DONE);
-    }
-
-
      private void doOtherInvariantTest(BreakIterator tb, String testChars)
      {
          StringBuffer work = new StringBuffer("a\r\na");
@@ -361,344 +278,6 @@ public class BreakIteratorTest extends TestFmwk
      //=========================================================================
  
  
-    /**
-     * @bug 4097779
-     */
-    @Test
-    public void TestBug4097779() {
-        List<String> wordSelectionData = new ArrayList<String>(2);
-
-        wordSelectionData.add("aa\u0300a");
-        wordSelectionData.add(" ");
-
-        generalIteratorTest(wordBreak, wordSelectionData);
-    }
-
-    /**
-     * @bug 4098467
-     */
-    @Test
-    public void TestBug4098467Words() {
-        List<String> wordSelectionData = new ArrayList<String>();
-
-        // What follows is a string of Korean characters (I found it in the Yellow Pages
-        // ad for the Korean Presbyterian Church of San Francisco, and I hope I transcribed
-        // it correctly), first as precomposed syllables, and then as conjoining jamo.
-        // Both sequences should be semantically identical and break the same way.
-        // precomposed syllables...
-        wordSelectionData.add("\uc0c1\ud56d");
-        wordSelectionData.add(" ");
-        wordSelectionData.add("\ud55c\uc778");
-        wordSelectionData.add(" ");
-        wordSelectionData.add("\uc5f0\ud569");
-        wordSelectionData.add(" ");
-        wordSelectionData.add("\uc7a5\ub85c\uad50\ud68c");
-        wordSelectionData.add(" ");
-        // conjoining jamo...
-        wordSelectionData.add("\u1109\u1161\u11bc\u1112\u1161\u11bc");
-        wordSelectionData.add(" ");
-        wordSelectionData.add("\u1112\u1161\u11ab\u110b\u1175\u11ab");
-        wordSelectionData.add(" ");
-        wordSelectionData.add("\u110b\u1167\u11ab\u1112\u1161\u11b8");
-        wordSelectionData.add(" ");
-        wordSelectionData.add("\u110c\u1161\u11bc\u1105\u1169\u1100\u116d\u1112\u116c");
-        wordSelectionData.add(" ");
-
-        generalIteratorTest(wordBreak, wordSelectionData);
-    }
-
-
-    /**
-     * @bug 4111338
-     */
-    @Test
-    public void TestBug4111338() {
-        List<String> sentenceSelectionData = new ArrayList<String>();
-
-        // test for bug #4111338: Don't break sentences at the boundary between CJK
-        // and other letters
-        sentenceSelectionData.add("\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165:\"JAVA\u821c"
-                + "\u8165\u7fc8\u51ce\u306d,\u2494\u56d8\u4ec0\u60b1\u8560\u51ba"
-                + "\u611d\u57b6\u2510\u5d46\".\u2029");
-        sentenceSelectionData.add("\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165\u9de8"
-                + "\u97e4JAVA\u821c\u8165\u7fc8\u51ce\u306d\ue30b\u2494\u56d8\u4ec0"
-                + "\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46\u97e5\u7751\u2029");
-        sentenceSelectionData.add("\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165\u9de8\u97e4"
-                + "\u6470\u8790JAVA\u821c\u8165\u7fc8\u51ce\u306d\ue30b\u2494\u56d8"
-                + "\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46\u97e5\u7751\u2029");
-        sentenceSelectionData.add("He said, \"I can go there.\"\u2029");
-
-        generalIteratorTest(sentenceBreak, sentenceSelectionData);
-    }
-
-
-    /**
-     * @bug 4143071
-     */
-    @Test
-    public void TestBug4143071() {
-        List<String> sentenceSelectionData = new ArrayList<String>(3);
-
-        // Make sure sentences that end with digits work right
-        sentenceSelectionData.add("Today is the 27th of May, 1998.  ");
-        sentenceSelectionData.add("Tomorrow will be 28 May 1998.  ");
-        sentenceSelectionData.add("The day after will be the 30th.\u2029");
-
-        generalIteratorTest(sentenceBreak, sentenceSelectionData);
-    }
-
-    /**
-     * @bug 4152416
-     */
-    @Test
-    public void TestBug4152416() {
-        List<String> sentenceSelectionData = new ArrayList<String>(2);
-
-        // Make sure sentences ending with a capital letter are treated correctly
-        sentenceSelectionData.add("The type of all primitive "
-                + "<code>boolean</code> values accessed in the target VM.  ");
-        sentenceSelectionData.add("Calls to xxx will return an "
-                + "implementor of this interface.\u2029");
-
-        generalIteratorTest(sentenceBreak, sentenceSelectionData);
-    }
-
-    /**
-     * @bug 4152117
-     */
-    @Test
-    public void TestBug4152117() {
-        List<String> sentenceSelectionData = new ArrayList<String>(3);
-
-        // Make sure sentence breaking is handling punctuation correctly
-        // [COULD NOT REPRODUCE THIS BUG, BUT TEST IS HERE TO MAKE SURE
-        // IT DOESN'T CROP UP]
-        sentenceSelectionData.add("Constructs a randomly generated "
-                + "BigInteger, uniformly distributed over the range <tt>0</tt> "
-                + "to <tt>(2<sup>numBits</sup> - 1)</tt>, inclusive.  ");
-        sentenceSelectionData.add("The uniformity of the distribution "
-                + "assumes that a fair source of random bits is provided in "
-                + "<tt>rnd</tt>.  ");
-        sentenceSelectionData.add("Note that this constructor always "
-                + "constructs a non-negative BigInteger.\u2029");
-
-        generalIteratorTest(sentenceBreak, sentenceSelectionData);
-    }
-
-    @Test
-    public void TestLineBreak() {
-        List<String> lineSelectionData = new ArrayList<String>();
-
-        lineSelectionData.add("Multi-");
-        lineSelectionData.add("Level ");
-        lineSelectionData.add("example ");
-        lineSelectionData.add("of ");
-        lineSelectionData.add("a ");
-        lineSelectionData.add("semi-");
-        lineSelectionData.add("idiotic ");
-        lineSelectionData.add("non-");
-        lineSelectionData.add("sensical ");
-        lineSelectionData.add("(non-");
-        lineSelectionData.add("important) ");
-        lineSelectionData.add("sentence. ");
-
-        lineSelectionData.add("Hi  ");
-        lineSelectionData.add("Hello ");
-        lineSelectionData.add("How\n");
-        lineSelectionData.add("are\r");
-        lineSelectionData.add("you\u2028");
-        lineSelectionData.add("fine.\t");
-        lineSelectionData.add("good.  ");
-
-        lineSelectionData.add("Now\r");
-        lineSelectionData.add("is\n");
-        lineSelectionData.add("the\r\n");
-        lineSelectionData.add("time\n");
-        lineSelectionData.add("\r");
-        lineSelectionData.add("for\r");
-        lineSelectionData.add("\r");
-        lineSelectionData.add("all");
-
-        generalIteratorTest(lineBreak, lineSelectionData);
-    }
-
-    /**
-     * @bug 4068133
-     */
-    @Test
-    public void TestBug4068133() {
-        List<String> lineSelectionData = new ArrayList<String>(9);
-
-        lineSelectionData.add("\u96f6");
-        lineSelectionData.add("\u4e00\u3002");
-        lineSelectionData.add("\u4e8c\u3001");
-        lineSelectionData.add("\u4e09\u3002\u3001");
-        lineSelectionData.add("\u56db\u3001\u3002\u3001");
-        lineSelectionData.add("\u4e94,");
-        lineSelectionData.add("\u516d.");
-        lineSelectionData.add("\u4e03.\u3001,\u3002");
-        lineSelectionData.add("\u516b");
-
-        generalIteratorTest(lineBreak, lineSelectionData);
-    }
-
-    /**
-     * @bug 4086052
-     */
-    @Test
-    public void TestBug4086052() {
-        List<String> lineSelectionData = new ArrayList<String>(1);
-
-        lineSelectionData.add("foo\u00a0bar ");
-//        lineSelectionData.addElement("foo\ufeffbar");
-
-        generalIteratorTest(lineBreak, lineSelectionData);
-    }
-
-    /**
-     * @bug 4097920
-     */
-    @Test
-    public void TestBug4097920() {
-        List<String> lineSelectionData = new ArrayList<String>(3);
-
-        lineSelectionData.add("dog,cat,mouse ");
-        lineSelectionData.add("(one)");
-        lineSelectionData.add("(two)\n");
-        generalIteratorTest(lineBreak, lineSelectionData);
-    }
-
-
-
-    /**
-     * @bug 4117554
-     */
-    @Test
-    public void TestBug4117554Lines() {
-        List<String> lineSelectionData = new ArrayList<String>(3);
-
-        // Fullwidth .!? should be treated as postJwrd
-        lineSelectionData.add("\u4e01\uff0e");
-        lineSelectionData.add("\u4e02\uff01");
-        lineSelectionData.add("\u4e03\uff1f");
-
-        generalIteratorTest(lineBreak, lineSelectionData);
-    }
-
-    @Test
-    public void TestLettersAndDigits() {
-        // a character sequence such as "X11" or "30F3" or "native2ascii" should
-        // be kept together as a single word
-        List<String> lineSelectionData = new ArrayList<String>(3);
-
-        lineSelectionData.add("X11 ");
-        lineSelectionData.add("30F3 ");
-        lineSelectionData.add("native2ascii");
-
-        generalIteratorTest(lineBreak, lineSelectionData);
-    }
-
-
-    private static final String graveS = "S\u0300";
-    private static final String acuteBelowI = "i\u0317";
-    private static final String acuteE = "e\u0301";
-    private static final String circumflexA = "a\u0302";
-    private static final String tildeE = "e\u0303";
-
-    @Test
-    public void TestCharacterBreak() {
-        List<String> characterSelectionData = new ArrayList<String>();
-
-        characterSelectionData.add(graveS);
-        characterSelectionData.add(acuteBelowI);
-        characterSelectionData.add("m");
-        characterSelectionData.add("p");
-        characterSelectionData.add("l");
-        characterSelectionData.add(acuteE);
-        characterSelectionData.add(" ");
-        characterSelectionData.add("s");
-        characterSelectionData.add(circumflexA);
-        characterSelectionData.add("m");
-        characterSelectionData.add("p");
-        characterSelectionData.add("l");
-        characterSelectionData.add(tildeE);
-        characterSelectionData.add(".");
-        characterSelectionData.add("w");
-        characterSelectionData.add(circumflexA);
-        characterSelectionData.add("w");
-        characterSelectionData.add("a");
-        characterSelectionData.add("f");
-        characterSelectionData.add("q");
-        characterSelectionData.add("\n");
-        characterSelectionData.add("\r");
-        characterSelectionData.add("\r\n");
-        characterSelectionData.add("\n");
-
-        generalIteratorTest(characterBreak, characterSelectionData);
-    }
-
-    /**
-     * @bug 4098467
-     */
-    @Test
-    public void TestBug4098467Characters() {
-        List<String> characterSelectionData = new ArrayList<String>();
-
-        // What follows is a string of Korean characters (I found it in the Yellow Pages
-        // ad for the Korean Presbyterian Church of San Francisco, and I hope I transcribed
-        // it correctly), first as precomposed syllables, and then as conjoining jamo.
-        // Both sequences should be semantically identical and break the same way.
-        // precomposed syllables...
-        characterSelectionData.add("\uc0c1");
-        characterSelectionData.add("\ud56d");
-        characterSelectionData.add(" ");
-        characterSelectionData.add("\ud55c");
-        characterSelectionData.add("\uc778");
-        characterSelectionData.add(" ");
-        characterSelectionData.add("\uc5f0");
-        characterSelectionData.add("\ud569");
-        characterSelectionData.add(" ");
-        characterSelectionData.add("\uc7a5");
-        characterSelectionData.add("\ub85c");
-        characterSelectionData.add("\uad50");
-        characterSelectionData.add("\ud68c");
-        characterSelectionData.add(" ");
-        // conjoining jamo...
-        characterSelectionData.add("\u1109\u1161\u11bc");
-        characterSelectionData.add("\u1112\u1161\u11bc");
-        characterSelectionData.add(" ");
-        characterSelectionData.add("\u1112\u1161\u11ab");
-        characterSelectionData.add("\u110b\u1175\u11ab");
-        characterSelectionData.add(" ");
-        characterSelectionData.add("\u110b\u1167\u11ab");
-        characterSelectionData.add("\u1112\u1161\u11b8");
-        characterSelectionData.add(" ");
-        characterSelectionData.add("\u110c\u1161\u11bc");
-        characterSelectionData.add("\u1105\u1169");
-        characterSelectionData.add("\u1100\u116d");
-        characterSelectionData.add("\u1112\u116c");
-
-        generalIteratorTest(characterBreak, characterSelectionData);
-    }
-
-    @Test
-    public void TestTitleBreak()
-    {
-        List<String> titleData = new ArrayList<String>();
-        titleData.add("   ");
-        titleData.add("This ");
-        titleData.add("is ");
-        titleData.add("a ");
-        titleData.add("simple ");
-        titleData.add("sample ");
-        titleData.add("sentence. ");
-        titleData.add("This ");
-
-        generalIteratorTest(titleBreak, titleData);
-    }
-
-
-
      /*
       * @bug 4153072
       */
@@ -728,17 +307,6 @@ public class BreakIteratorTest extends TestFmwk
      }
  
  
-    @Test
-    public void TestBug4146175Lines() {
-        List<String> lineSelectionData = new ArrayList<String>(2);
-
-        // the fullwidth comma should stick to the preceding Japanese character
-        lineSelectionData.add("\u7d42\uff0c");
-        lineSelectionData.add("\u308f");
-
-        generalIteratorTest(lineBreak, lineSelectionData);
-    }
-
      private static final String cannedTestChars
          = "\u0000\u0001\u0002\u0003\u0004 !\"#$%&()+-01234<=>ABCDE[]^_`abcde{}|\u00a0\u00a2"
          + "\u00a3\u00a4\u00a5\u00a6\u00a7\u00a8\u00a9\u00ab\u00ad\u00ae\u00af\u00b0\u00b2\u00b3"
@@ -754,16 +322,6 @@ public class BreakIteratorTest extends TestFmwk
          doOtherInvariantTest(e, cannedTestChars + ".,\u3001\u3002\u3041\u3042\u3043\ufeff");
      }
  
-    @Test
-    public void TestEmptyString()
-    {
-        String text = "";
-        List<String> x = new ArrayList<String>(1);
-        x.add(text);
-
-        generalIteratorTest(lineBreak, x);
-    }
-
      @Test
      public void TestGetAvailableLocales()
      {
@@ -838,23 +396,6 @@ public class BreakIteratorTest extends TestFmwk
          }
      }
  
-
-    /**
-     * Bug 4450804
-     */
-    @Test
-    public void TestLineBreakContractions() {
-        List<String> expected = new ArrayList<String>(7);
-        expected.add("These ");
-        expected.add("are ");
-        expected.add("'foobles'. ");
-        expected.add("Don't ");
-        expected.add("you ");
-        expected.add("like ");
-        expected.add("them?");
-        generalIteratorTest(lineBreak, expected);
-    }
-
      /**
       * Ticket#5615
       */
diff --git a/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/RBBITest.java b/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/RBBITest.java
index bcc105bb96986c529573d327fb0d5420a88b3027..b3f6f45cf3d46d11c12a781a93ab8f87ade9c07d 100644
--- a/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/RBBITest.java
+++ b/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/RBBITest.java
@@ -31,495 +31,7 @@ public class RBBITest extends TestFmwk {
      public RBBITest() {
      }
  
-    private static final String halfNA = "\u0928\u094d\u200d"; /*
-                                                                * halfform NA = devanigiri NA + virama(supresses
-                                                                * inherent vowel)+ zero width joiner
-                                                                */
-
-    // tests default rules based character iteration.
-    // Builds a new iterator from the source rules in the default (prebuilt) iterator.
-    //
-    @Test
-    public void TestDefaultRuleBasedCharacterIteration() {
-        RuleBasedBreakIterator rbbi = (RuleBasedBreakIterator) BreakIterator.getCharacterInstance();
-        logln("Testing the RBBI for character iteration by using default rules");
-
-        // fetch the rules used to create the above RuleBasedBreakIterator
-        String defaultRules = rbbi.toString();
-
-        RuleBasedBreakIterator charIterDefault = null;
-        try {
-            charIterDefault = new RuleBasedBreakIterator(defaultRules);
-        } catch (IllegalArgumentException iae) {
-            errln("ERROR: failed construction in TestDefaultRuleBasedCharacterIteration()" + iae.toString());
-        }
-
-        List<String> chardata = new ArrayList<String>();
-        chardata.add("H");
-        chardata.add("e");
-        chardata.add("l");
-        chardata.add("l");
-        chardata.add("o");
-        chardata.add("e\u0301"); // acuteE
-        chardata.add("&");
-        chardata.add("e\u0303"); // tildaE
-        // devanagiri characters for Hindi support
-        chardata.add("\u0906"); // devanagiri AA
-        // chardata.add("\u093e\u0901"); //devanagiri vowelsign AA+ chandrabindhu
-        chardata.add("\u0916\u0947"); // devanagiri KHA+vowelsign E
-        chardata.add("\u0938\u0941\u0902"); // devanagiri SA+vowelsign U + anusvara(bindu)
-        chardata.add("\u0926"); // devanagiri consonant DA
-        chardata.add("\u0930"); // devanagiri consonant RA
-        // chardata.add("\u0939\u094c"); //devanagiri HA+vowel sign AI
-        chardata.add("\u0964"); // devanagiri danda
-        // end hindi characters
-        chardata.add("A\u0302"); // circumflexA
-        chardata.add("i\u0301"); // acuteBelowI
-        // conjoining jamo...
-        chardata.add("\u1109\u1161\u11bc");
-        chardata.add("\u1112\u1161\u11bc");
-        chardata.add("\n");
-        chardata.add("\r\n"); // keep CRLF sequences together
-        chardata.add("S\u0300"); // graveS
-        chardata.add("i\u0301"); // acuteBelowI
-        chardata.add("!");
-
-        // What follows is a string of Korean characters (I found it in the Yellow Pages
-        // ad for the Korean Presbyterian Church of San Francisco, and I hope I transcribed
-        // it correctly), first as precomposed syllables, and then as conjoining jamo.
-        // Both sequences should be semantically identical and break the same way.
-        // precomposed syllables...
-        chardata.add("\uc0c1");
-        chardata.add("\ud56d");
-        chardata.add(" ");
-        chardata.add("\ud55c");
-        chardata.add("\uc778");
-        chardata.add(" ");
-        chardata.add("\uc5f0");
-        chardata.add("\ud569");
-        chardata.add(" ");
-        chardata.add("\uc7a5");
-        chardata.add("\ub85c");
-        chardata.add("\uad50");
-        chardata.add("\ud68c");
-        chardata.add(" ");
-        // conjoining jamo...
-        chardata.add("\u1109\u1161\u11bc");
-        chardata.add("\u1112\u1161\u11bc");
-        chardata.add(" ");
-        chardata.add("\u1112\u1161\u11ab");
-        chardata.add("\u110b\u1175\u11ab");
-        chardata.add(" ");
-        chardata.add("\u110b\u1167\u11ab");
-        chardata.add("\u1112\u1161\u11b8");
-        chardata.add(" ");
-        chardata.add("\u110c\u1161\u11bc");
-        chardata.add("\u1105\u1169");
-        chardata.add("\u1100\u116d");
-        chardata.add("\u1112\u116c");
-
-        generalIteratorTest(charIterDefault, chardata);
-
-    }
-
-    @Test
-    public void TestDefaultRuleBasedWordIteration() {
-        logln("Testing the RBBI for word iteration using default rules");
-        RuleBasedBreakIterator rbbi = (RuleBasedBreakIterator) BreakIterator.getWordInstance();
-        // fetch the rules used to create the above RuleBasedBreakIterator
-        String defaultRules = rbbi.toString();
-
-        RuleBasedBreakIterator wordIterDefault = null;
-        try {
-            wordIterDefault = new RuleBasedBreakIterator(defaultRules);
-        } catch (IllegalArgumentException iae) {
-            errln("ERROR: failed construction in TestDefaultRuleBasedWordIteration() -- custom rules" + iae.toString());
-        }
-
-        List<String> worddata = new ArrayList<String>();
-        worddata.add("Write");
-        worddata.add(" ");
-        worddata.add("wordrules");
-        worddata.add(".");
-        worddata.add(" ");
-        // worddata.add("alpha-beta-gamma");
-        worddata.add(" ");
-        worddata.add("\u092f\u0939");
-        worddata.add(" ");
-        worddata.add("\u0939\u093f" + halfNA + "\u0926\u0940");
-        worddata.add(" ");
-        worddata.add("\u0939\u0948");
-        // worddata.add("\u0964"); //danda followed by a space
-        worddata.add(" ");
-        worddata.add("\u0905\u093e\u092a");
-        worddata.add(" ");
-        worddata.add("\u0938\u093f\u0916\u094b\u0917\u0947");
-        worddata.add("?");
-        worddata.add(" ");
-        worddata.add("\r");
-        worddata.add("It's");
-        worddata.add(" ");
-        // worddata.add("$30.10");
-        worddata.add(" ");
-        worddata.add(" ");
-        worddata.add("Badges");
-        worddata.add("?");
-        worddata.add(" ");
-        worddata.add("BADGES");
-        worddata.add("!");
-        worddata.add("1000,233,456.000");
-        worddata.add(" ");
-
-        generalIteratorTest(wordIterDefault, worddata);
-    }
  
-//    private static final String kParagraphSeparator = "\u2029";
-    private static final String kLineSeparator      = "\u2028";
-
-    @Test
-    public void TestDefaultRuleBasedSentenceIteration() {
-        logln("Testing the RBBI for sentence iteration using default rules");
-        RuleBasedBreakIterator rbbi = (RuleBasedBreakIterator) BreakIterator.getSentenceInstance();
-
-        // fetch the rules used to create the above RuleBasedBreakIterator
-        String defaultRules = rbbi.toString();
-        RuleBasedBreakIterator sentIterDefault = null;
-        try {
-            sentIterDefault = new RuleBasedBreakIterator(defaultRules);
-        } catch (IllegalArgumentException iae) {
-            errln("ERROR: failed construction in TestDefaultRuleBasedSentenceIteration()" + iae.toString());
-        }
-
-        List<String> sentdata = new ArrayList<String>();
-        sentdata.add("(This is it.) ");
-        sentdata.add("Testing the sentence iterator. ");
-        sentdata.add("\"This isn\'t it.\" ");
-        sentdata.add("Hi! ");
-        sentdata.add("This is a simple sample sentence. ");
-        sentdata.add("(This is it.) ");
-        sentdata.add("This is a simple sample sentence. ");
-        sentdata.add("\"This isn\'t it.\" ");
-        sentdata.add("Hi! ");
-        sentdata.add("This is a simple sample sentence. ");
-        sentdata.add("It does not have to make any sense as you can see. ");
-        sentdata.add("Nel mezzo del cammin di nostra vita, mi ritrovai in una selva oscura. ");
-        sentdata.add("Che la dritta via aveo smarrita. ");
-
-        generalIteratorTest(sentIterDefault, sentdata);
-    }
-
-    @Test
-    public void TestDefaultRuleBasedLineIteration() {
-        logln("Testing the RBBI for line iteration using default rules");
-        RuleBasedBreakIterator rbbi = (RuleBasedBreakIterator) RuleBasedBreakIterator.getLineInstance();
-        // fetch the rules used to create the above RuleBasedBreakIterator
-        String defaultRules = rbbi.toString();
-        RuleBasedBreakIterator lineIterDefault = null;
-        try {
-            lineIterDefault = new RuleBasedBreakIterator(defaultRules);
-        } catch (IllegalArgumentException iae) {
-            errln("ERROR: failed construction in TestDefaultRuleBasedLineIteration()" + iae.toString());
-        }
-
-        List<String> linedata = new ArrayList<String>();
-        linedata.add("Multi-");
-        linedata.add("Level ");
-        linedata.add("example ");
-        linedata.add("of ");
-        linedata.add("a ");
-        linedata.add("semi-");
-        linedata.add("idiotic ");
-        linedata.add("non-");
-        linedata.add("sensical ");
-        linedata.add("(non-");
-        linedata.add("important) ");
-        linedata.add("sentence. ");
-
-        linedata.add("Hi  ");
-        linedata.add("Hello ");
-        linedata.add("How\n");
-        linedata.add("are\r");
-        linedata.add("you" + kLineSeparator);
-        linedata.add("fine.\t");
-        linedata.add("good.  ");
-
-        linedata.add("Now\r");
-        linedata.add("is\n");
-        linedata.add("the\r\n");
-        linedata.add("time\n");
-        linedata.add("\r");
-        linedata.add("for\r");
-        linedata.add("\r");
-        linedata.add("all");
-
-        generalIteratorTest(lineIterDefault, linedata);
-
-    }
-
-    // =========================================================================
-    // general test subroutines
-    // =========================================================================
-
-    private void generalIteratorTest(RuleBasedBreakIterator rbbi, List<String> expectedResult) {
-        StringBuffer buffer = new StringBuffer();
-        String text;
-        for (int i = 0; i < expectedResult.size(); i++) {
-            text = expectedResult.get(i);
-            buffer.append(text);
-        }
-        text = buffer.toString();
-        if (rbbi == null) {
-            errln("null iterator, test skipped.");
-            return;
-        }
-
-        rbbi.setText(text);
-
-        List<String> nextResults = _testFirstAndNext(rbbi, text);
-        List<String> previousResults = _testLastAndPrevious(rbbi, text);
-
-        logln("comparing forward and backward...");
-        // TODO(#13318): As part of clean-up, permanently remove the error count check.
-        //int errs = getErrorCount();
-        compareFragmentLists("forward iteration", "backward iteration", nextResults, previousResults);
-        //if (getErrorCount() == errs) {
-        logln("comparing expected and actual...");
-        compareFragmentLists("expected result", "actual result", expectedResult, nextResults);
-            logln("comparing expected and actual...");
-            compareFragmentLists("expected result", "actual result", expectedResult, nextResults);
-        //}
-
-        int[] boundaries = new int[expectedResult.size() + 3];
-        boundaries[0] = RuleBasedBreakIterator.DONE;
-        boundaries[1] = 0;
-        for (int i = 0; i < expectedResult.size(); i++) {
-            boundaries[i + 2] = boundaries[i + 1] + (expectedResult.get(i).length());
-        }
-
-        boundaries[boundaries.length - 1] = RuleBasedBreakIterator.DONE;
-
-        _testFollowing(rbbi, text, boundaries);
-        _testPreceding(rbbi, text, boundaries);
-        _testIsBoundary(rbbi, text, boundaries);
-
-        doMultipleSelectionTest(rbbi, text);
-    }
-
-     private List<String> _testFirstAndNext(RuleBasedBreakIterator rbbi, String text) {
-         int p = rbbi.first();
-         int lastP = p;
-         List<String> result = new ArrayList<String>();
-
-         if (p != 0) {
-             errln("first() returned " + p + " instead of 0");
-         }
-
-         while (p != RuleBasedBreakIterator.DONE) {
-             p = rbbi.next();
-             if (p != RuleBasedBreakIterator.DONE) {
-                 if (p <= lastP) {
-                     errln("next() failed to move forward: next() on position "
-                                     + lastP + " yielded " + p);
-                 }
-                 result.add(text.substring(lastP, p));
-             }
-             else {
-                 if (lastP != text.length()) {
-                     errln("next() returned DONE prematurely: offset was "
-                                     + lastP + " instead of " + text.length());
-                 }
-             }
-             lastP = p;
-         }
-         return result;
-     }
-
-     private List<String> _testLastAndPrevious(RuleBasedBreakIterator rbbi, String text) {
-         int p = rbbi.last();
-         int lastP = p;
-         List<String> result = new ArrayList<String>();
-
-         if (p != text.length()) {
-             errln("last() returned " + p + " instead of " + text.length());
-         }
-
-         while (p != RuleBasedBreakIterator.DONE) {
-             p = rbbi.previous();
-             if (p != RuleBasedBreakIterator.DONE) {
-                 if (p >= lastP) {
-                     errln("previous() failed to move backward: previous() on position "
-                                     + lastP + " yielded " + p);
-                 }
-
-                 result.add(0, text.substring(p, lastP));
-             }
-             else {
-                 if (lastP != 0) {
-                     errln("previous() returned DONE prematurely: offset was "
-                                     + lastP + " instead of 0");
-                 }
-             }
-             lastP = p;
-         }
-         return result;
-     }
-
-     private void compareFragmentLists(String f1Name, String f2Name, List<String> f1, List<String> f2) {
-         int p1 = 0;
-         int p2 = 0;
-         String s1;
-         String s2;
-         int t1 = 0;
-         int t2 = 0;
-
-         while (p1 < f1.size() && p2 < f2.size()) {
-             s1 = f1.get(p1);
-             s2 = f2.get(p2);
-             t1 += s1.length();
-             t2 += s2.length();
-
-             if (s1.equals(s2)) {
-                 debugLogln("   >" + s1 + "<");
-                 ++p1;
-                 ++p2;
-             }
-             else {
-                 int tempT1 = t1;
-                 int tempT2 = t2;
-                 int tempP1 = p1;
-                 int tempP2 = p2;
-
-                 while (tempT1 != tempT2 && tempP1 < f1.size() && tempP2 < f2.size()) {
-                     while (tempT1 < tempT2 && tempP1 < f1.size()) {
-                         tempT1 += (f1.get(tempP1)).length();
-                         ++tempP1;
-                     }
-                     while (tempT2 < tempT1 && tempP2 < f2.size()) {
-                         tempT2 += (f2.get(tempP2)).length();
-                         ++tempP2;
-                     }
-                 }
-                 logln("*** " + f1Name + " has:");
-                 while (p1 <= tempP1 && p1 < f1.size()) {
-                     s1 = f1.get(p1);
-                     t1 += s1.length();
-                     debugLogln(" *** >" + s1 + "<");
-                     ++p1;
-                 }
-                 logln("***** " + f2Name + " has:");
-                 while (p2 <= tempP2 && p2 < f2.size()) {
-                     s2 = f2.get(p2);
-                     t2 += s2.length();
-                     debugLogln(" ***** >" + s2 + "<");
-                     ++p2;
-                 }
-                 errln("Discrepancy between " + f1Name + " and " + f2Name);
-             }
-         }
-     }
-
-    private void _testFollowing(RuleBasedBreakIterator rbbi, String text, int[] boundaries) {
-       logln("testFollowing():");
-       int p = 2;
-       for(int i = 0; i <= text.length(); i++) {
-           if (i == boundaries[p])
-               ++p;
-           int b = rbbi.following(i);
-           logln("rbbi.following(" + i + ") -> " + b);
-           if (b != boundaries[p])
-               errln("Wrong result from following() for " + i + ": expected " + boundaries[p]
-                               + ", got " + b);
-       }
-   }
-
-   private void _testPreceding(RuleBasedBreakIterator rbbi, String text, int[] boundaries) {
-       logln("testPreceding():");
-       int p = 0;
-       for(int i = 0; i <= text.length(); i++) {
-           int b = rbbi.preceding(i);
-           logln("rbbi.preceding(" + i + ") -> " + b);
-           if (b != boundaries[p])
-               errln("Wrong result from preceding() for " + i + ": expected " + boundaries[p]
-                              + ", got " + b);
-           if (i == boundaries[p + 1])
-               ++p;
-       }
-   }
-
-   private void _testIsBoundary(RuleBasedBreakIterator rbbi, String text, int[] boundaries) {
-       logln("testIsBoundary():");
-       int p = 1;
-       boolean isB;
-       for(int i = 0; i <= text.length(); i++) {
-           isB = rbbi.isBoundary(i);
-           logln("rbbi.isBoundary(" + i + ") -> " + isB);
-           if(i == boundaries[p]) {
-               if (!isB)
-                   errln("Wrong result from isBoundary() for " + i + ": expected true, got false");
-               ++p;
-           }
-           else {
-               if(isB)
-                   errln("Wrong result from isBoundary() for " + i + ": expected false, got true");
-           }
-       }
-   }
-   private void doMultipleSelectionTest(RuleBasedBreakIterator iterator, String testText)
-   {
-       logln("Multiple selection test...");
-       RuleBasedBreakIterator testIterator = (RuleBasedBreakIterator)iterator.clone();
-       int offset = iterator.first();
-       int testOffset;
-       int count = 0;
-
-       do {
-           testOffset = testIterator.first();
-           testOffset = testIterator.next(count);
-           logln("next(" + count + ") -> " + testOffset);
-           if (offset != testOffset)
-               errln("next(n) and next() not returning consistent results: for step " + count + ", next(n) returned " + testOffset + " and next() had " + offset);
-
-           if (offset != RuleBasedBreakIterator.DONE) {
-               count++;
-               offset = iterator.next();
-           }
-       } while (offset != RuleBasedBreakIterator.DONE);
-
-       // now do it backwards...
-       offset = iterator.last();
-       count = 0;
-
-       do {
-           testOffset = testIterator.last();
-           testOffset = testIterator.next(count);
-           logln("next(" + count + ") -> " + testOffset);
-           if (offset != testOffset)
-               errln("next(n) and next() not returning consistent results: for step " + count + ", next(n) returned " + testOffset + " and next() had " + offset);
-
-           if (offset != RuleBasedBreakIterator.DONE) {
-               count--;
-               offset = iterator.previous();
-           }
-       } while (offset != RuleBasedBreakIterator.DONE);
-   }
-
-   private void debugLogln(String s) {
-        final String zeros = "0000";
-        String temp;
-        StringBuffer out = new StringBuffer();
-        for (int i = 0; i < s.length(); i++) {
-            char c = s.charAt(i);
-            if (c >= ' ' && c < '\u007f')
-                out.append(c);
-            else {
-                out.append("\\u");
-                temp = Integer.toHexString(c);
-                out.append(zeros.substring(0, 4 - temp.length()));
-                out.append(temp);
-            }
-        }
-         logln(out.toString());
-    }
  
      @Test
     public void TestThaiDictionaryBreakIterator() {
@@ -629,7 +141,7 @@ public class RBBITest extends TestFmwk {
                  }
                  return buildString.toString();
              }
-    @Test
+
              public void doTest() {
                  BreakIterator brkIter;
                  switch( type ) {
diff --git a/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/RBBITestExtended.java b/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/RBBITestExtended.java

index 3dfe7d9e4a79817b41801157d060a2837812fd86..422770ded1f6245f4c23021a80cce53018c7bab3 100644
--- a/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/RBBITestExtended.java
+++ b/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/RBBITestExtended.java
@@ -413,6 +413,8 @@ public void TestExtended() {
  }
  
  void executeTest(TestParams t) {
+    // TODO: also rerun tests with a break iterator re-created from bi.getRules()
+    //       and from bi.clone(). If in exhaustive mode only.
      int    bp;
      int    prevBP;
      int    i;
diff --git a/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/rbbitst.txt b/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/rbbitst.txt

index f07107bdfb03d3808d8c000bdd4bcd7350fc762e..0757bdf7dbca2158e4144dd577f9df2b64071808 100644
--- a/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/rbbitst.txt
+++ b/icu4j/main/tests/core/src/com/ibm/icu/dev/test/rbbi/rbbitst.txt
@@ -14,6 +14,7 @@
  #   <sent>    any following data is for sentence break testing
  #   <line>    any following data is for line break testing
  #   <char>    any following data is for char break testing
+#   <title>   any following data is for title break testing
  #   <rules> rules ... </rules>  following data is tested against these rules.
  #                               Applies until a following occurence of <word>, <sent>, etc. or another <rules>
  #   <locale locale_name>  Switch to the named locale at the next occurence of <word>, <sent>, etc.
@@ -148,6 +149,9 @@
  #  Treat Japanese Half Width voicing marks as combining
  <data>•A\uff9e•B\uff9f\uff9e\uff9f•C•</data>
  
+# Test data originally from Java BreakIteratorTest.TestCharcterBreak()
+<data>•S\u0300•i\u0317•m•p•l•e\u0301• •s•a\u0302•m•p•l•e\u0303•.•w•a\u0302•w•a•f•q•\n•\r•\r\n•\n•</data>
+
  ########################################################################################
  #
  #
@@ -446,9 +450,12 @@ What is the proper use of the abbreviation pp.? •Yes, I am definatelly 12" tal
  <data>•No breaks when . is followed by a lower, with possible intervening punct .,a .$a .)a. •</data>
  
  #
-#  Sentence Breaks: no break at the boundary between CJK and other letters
+#  Sentence Breaks: no break at the boundary between CJK and other letters. TestBug4111338
  #
-<data>•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165:"JAVA\u821c\u8165\u7fc8\u51ce\u306d,\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46".\u2029•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165\u9de8\u97e4JAVA\u821c\u8165\u7fc8\u51ce\u306d\ue30b\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46\u97e5\u7751\u3002•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165\u9de8\u97e4\u6470\u8790JAVA\u821c\u8165\u7fc8\u51ce\u306d\ue30b\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46\u97e5\u7751\u2048•He said, "I can go there."\u2029•Bye, now.•</data>
+<data>•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165:"JAVA\u821c\u8165\u7fc8\u51ce\u306d,\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46".\u2029\
+•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165\u9de8\u97e4JAVA\u821c\u8165\u7fc8\u51ce\u306d\ue30b\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46\u97e5\u7751\u3002\
+•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165\u9de8\u97e4\u6470\u8790JAVA\u821c\u8165\u7fc8\u51ce\u306d\ue30b\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46\u97e5\u7751\u2048\
+•He said, "I can go there."\u2029•Bye, now.•</data>
  
  #
  #      Treat fullwidth variants of .!? the same as their
@@ -499,22 +506,28 @@ What is the proper use of the abbreviation pp.? •Yes, I am definatelly 12" tal
  #        test for bug #4152416: Make sure sentences ending with a capital
  #        letter are treated correctly
  #
-<data>•The type of all primitive \<code>boolean\</code> values accessed in the target VM.  •Calls to xxx will return an implementor of this interface.  \u2029•</data>
+<data>•The type of all primitive \<code>boolean\</code> values accessed in the target VM.  •\
+Calls to xxx will return an implementor of this interface.  \u2029•</data>
  
  #        test for bug #4152117: Make sure sentence breaking is handling
  #        punctuation correctly [COULD NOT REPRODUCE THIS BUG, BUT TEST IS
  #        HERE TO MAKE SURE IT DOESN'T CROP UP]
  #
-<data>•Constructs a randomly generated BigInteger, uniformly distributed over the range \<tt>0\</tt> to \<tt>(2\<sup>numBits\</sup> - 1\)\</tt>, inclusive.  •The uniformity of the distribution assumes that a fair source of random bits is provided in \<tt>rnd\</tt>.  •Note that this constructor always constructs a non-negative biginteger.  \n•Ahh abc.
-•</data>
+<data>•Constructs a randomly generated BigInteger, uniformly distributed over the range \<tt>0\</tt> to\
+ \<tt>(2\<sup>numBits\</sup> - 1\)\</tt>, inclusive.  \
+ •The uniformity of the distribution assumes that a fair source of random bits is provided in \<tt>rnd\</tt>.  \
+ •Note that this constructor always constructs a non-negative biginteger.  \n•Ahh abc.•</data>
  
  #        sentence breaks for hindi which used Devanagari script
  #        make sure there is sentence break after ?,danda(hindi phrase separator),
  #        fullstop followed by space.  (VERY old test)
  #
-<data>•\u0928\u092e\u0938\u094d\u200d\u0924\u0947 \u0930\u092e\u0947\u0936\u0905\u093e\u092a\u0915\u0948\u0938\u0947 \u0939\u0948?•\u092e\u0948 \u0905\u091a\u094d\u200d \u091b\u093e \u0939\u0942\u0901\u0964 •\u0905\u093e\u092a\r\n<100>\
+<data>•\u0928\u092e\u0938\u094d\u200d\u0924\u0947 \u0930\u092e\u0947\u0936\u0905\u093e\u092a\u0915\u0948\u0938\u0947 \u0939\u0948?\
+•\u092e\u0948 \u0905\u091a\u094d\u200d \u091b\u093e \u0939\u0942\u0901\u0964 •\u0905\u093e\u092a\r\n<100>\
  \u0915\u0948\u0938\u0947 \u0939\u0948?•\u0935\u0939 \u0915\u094d\u200d\u092f\u093e\n\
-<100>\u0939\u0948?•\u092f\u0939 \u0905\u093e\u092e \u0939\u0948. •\u092f\u0939 means "this". •"\u092a\u095d\u093e\u0908" meaning "education" or "studies". •\u0905\u093e\u091c(\u0938\u094d\u200d\u0935\u0924\u0902\u0924\u094d\u0930 \u0926\u093f\u0935\u093e\u0938) \u0939\u0948\u0964 •Let's end here. •</data>
+<100>\u0939\u0948?•\u092f\u0939 \u0905\u093e\u092e \u0939\u0948. •\u092f\u0939 means "this". \
+•"\u092a\u095d\u093e\u0908" meaning "education" or "studies". \
+•\u0905\u093e\u091c(\u0938\u094d\u200d\u0935\u0924\u0902\u0924\u094d\u0930 \u0926\u093f\u0935\u093e\u0938) \u0939\u0948\u0964 •Let's end here. •</data>
  
  #         Regression test for bug #1984, Sentence break in Arabic text.
  
@@ -685,6 +698,12 @@ What is the proper use of the abbreviation pp.? •Yes, I am definatelly 12" tal
  #
  <data>•\uc0c1•\ud56d •\ud55c•\uc778 •\uc5f0•\ud569 •\uc7a5•\ub85c•\uad50•\ud68c•</data>
  
+#      Bug 4450804 estLineBreakContractions
+#
+<line>
+<data>•These •are •'foobles'. •Don't •you •like •them?•</data>
+
+
  #      conjoining jamo...
  <data>•\u1109\u1161\u11bc•\u1112\u1161\u11bc •\u1112\u1161\u11ab•\u110b\u1175\u11ab •\u110b\u1167\u11ab•\u1112\u1161\u11b8 •\u110c\u1161\u11bc•\u1105\u1169•\u1100\u116d•\u1112\u116c•</data>
  
@@ -711,6 +730,10 @@ What is the proper use of the abbreviation pp.? •Yes, I am definatelly 12" tal
  #
  <data>•abc\ud801xyz•</data>
  
+#   a character sequence such as "X11" or "30F3" or "native2ascii" should
+#   be kept together as a single word.
+<data>•X11 •30F3 •native2ascii•</data>
+
  #
  #     Regression tests for failures that originally came from the monkey test.
  #     Monkey test failure lines can, with slight reformatting, be copied into this section
@@ -732,6 +755,14 @@ What is the proper use of the abbreviation pp.? •Yes, I am definatelly 12" tal
  <line>
  <data>•R$ •JP¥ •a9 •3a •H% •CA$ •Travi$ •Scott •Ke$ha •Curren$y •A$AP •Rocky•</data>
  
+# Test Bug 4146175 Lines
+# the fullwidth comma should stick to the preceding Japanese character
+<line>
+<data>•\u7d42\uff0c•\u308f•</data>
+
+# Empty String
+<line>
+<data>•</data>
  
  
  ########################################################################################