# <sent> any following data is for sentence break testing
# <line> any following data is for line break testing
# <char> any following data is for char break testing
+# <title> any following data is for title break testing
# <rules> rules ... </rules> following data is tested against these rules.
# Applies until a following occurence of <word>, <sent>, etc. or another <rules>
# <locale locale_name> Switch to the named locale at the next occurence of <word>, <sent>, etc.
# Treat Japanese Half Width voicing marks as combining
<data>•A\uff9e•B\uff9f\uff9e\uff9f•C•</data>
+# Test data originally from Java BreakIteratorTest.TestCharacterBreak()
+<data>•S\u0300•i\u0317•m•p•l•e\u0301• •s•a\u0302•m•p•l•e\u0303•.•w•a\u0302•w•a•f•q•\n•\r•\r\n•\n•</data>
+
########################################################################################
#
#
<data>•No breaks when . is followed by a lower, with possible intervening punct .,a .$a .)a. •</data>
#
-# Sentence Breaks: no break at the boundary between CJK and other letters
+# Sentence Breaks: no break at the boundary between CJK and other letters. TestBug4111338
#
-<data>•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165:"JAVA\u821c\u8165\u7fc8\u51ce\u306d,\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46".\u2029•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165\u9de8\u97e4JAVA\u821c\u8165\u7fc8\u51ce\u306d\ue30b\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46\u97e5\u7751\u3002•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165\u9de8\u97e4\u6470\u8790JAVA\u821c\u8165\u7fc8\u51ce\u306d\ue30b\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46\u97e5\u7751\u2048•He said, "I can go there."\u2029•Bye, now.•</data>
+<data>•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165:"JAVA\u821c\u8165\u7fc8\u51ce\u306d,\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46".\u2029\
+•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165\u9de8\u97e4JAVA\u821c\u8165\u7fc8\u51ce\u306d\ue30b\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46\u97e5\u7751\u3002\
+•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165\u9de8\u97e4\u6470\u8790JAVA\u821c\u8165\u7fc8\u51ce\u306d\ue30b\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46\u97e5\u7751\u2048\
+•He said, "I can go there."\u2029•Bye, now.•</data>
#
# Treat fullwidth variants of .!? the same as their
# test for bug #4152416: Make sure sentences ending with a capital
# letter are treated correctly
#
-<data>•The type of all primitive \<code>boolean\</code> values accessed in the target VM. •Calls to xxx will return an implementor of this interface. \u2029•</data>
+<data>•The type of all primitive \<code>boolean\</code> values accessed in the target VM. •\
+Calls to xxx will return an implementor of this interface. \u2029•</data>
# test for bug #4152117: Make sure sentence breaking is handling
# punctuation correctly [COULD NOT REPRODUCE THIS BUG, BUT TEST IS
# HERE TO MAKE SURE IT DOESN'T CROP UP]
#
-<data>•Constructs a randomly generated BigInteger, uniformly distributed over the range \<tt>0\</tt> to \<tt>(2\<sup>numBits\</sup> - 1\)\</tt>, inclusive. •The uniformity of the distribution assumes that a fair source of random bits is provided in \<tt>rnd\</tt>. •Note that this constructor always constructs a non-negative biginteger. \n•Ahh abc.
-•</data>
+<data>•Constructs a randomly generated BigInteger, uniformly distributed over the range \<tt>0\</tt> to\
+ \<tt>(2\<sup>numBits\</sup> - 1\)\</tt>, inclusive. \
+ •The uniformity of the distribution assumes that a fair source of random bits is provided in \<tt>rnd\</tt>. \
+ •Note that this constructor always constructs a non-negative biginteger. \n•Ahh abc.•</data>
# sentence breaks for hindi which used Devanagari script
# make sure there is sentence break after ?,danda(hindi phrase separator),
# fullstop followed by space. (VERY old test)
#
-<data>•\u0928\u092e\u0938\u094d\u200d\u0924\u0947 \u0930\u092e\u0947\u0936\u0905\u093e\u092a\u0915\u0948\u0938\u0947 \u0939\u0948?•\u092e\u0948 \u0905\u091a\u094d\u200d \u091b\u093e \u0939\u0942\u0901\u0964 •\u0905\u093e\u092a\r\n<100>\
+<data>•\u0928\u092e\u0938\u094d\u200d\u0924\u0947 \u0930\u092e\u0947\u0936\u0905\u093e\u092a\u0915\u0948\u0938\u0947 \u0939\u0948?\
+•\u092e\u0948 \u0905\u091a\u094d\u200d \u091b\u093e \u0939\u0942\u0901\u0964 •\u0905\u093e\u092a\r\n<100>\
\u0915\u0948\u0938\u0947 \u0939\u0948?•\u0935\u0939 \u0915\u094d\u200d\u092f\u093e\n\
-<100>\u0939\u0948?•\u092f\u0939 \u0905\u093e\u092e \u0939\u0948. •\u092f\u0939 means "this". •"\u092a\u095d\u093e\u0908" meaning "education" or "studies". •\u0905\u093e\u091c(\u0938\u094d\u200d\u0935\u0924\u0902\u0924\u094d\u0930 \u0926\u093f\u0935\u093e\u0938) \u0939\u0948\u0964 •Let's end here. •</data>
+<100>\u0939\u0948?•\u092f\u0939 \u0905\u093e\u092e \u0939\u0948. •\u092f\u0939 means "this". \
+•"\u092a\u095d\u093e\u0908" meaning "education" or "studies". \
+•\u0905\u093e\u091c(\u0938\u094d\u200d\u0935\u0924\u0902\u0924\u094d\u0930 \u0926\u093f\u0935\u093e\u0938) \u0939\u0948\u0964 •Let's end here. •</data>
# Regression test for bug #1984, Sentence break in Arabic text.
#
<data>•\uc0c1•\ud56d •\ud55c•\uc778 •\uc5f0•\ud569 •\uc7a5•\ub85c•\uad50•\ud68c•</data>
+# Bug 4450804 TestLineBreakContractions
+#
+<line>
+<data>•These •are •'foobles'. •Don't •you •like •them?•</data>
+
+
# conjoining jamo...
<data>•\u1109\u1161\u11bc•\u1112\u1161\u11bc •\u1112\u1161\u11ab•\u110b\u1175\u11ab •\u110b\u1167\u11ab•\u1112\u1161\u11b8 •\u110c\u1161\u11bc•\u1105\u1169•\u1100\u116d•\u1112\u116c•</data>
#
<data>•abc\ud801xyz•</data>
+# a character sequence such as "X11" or "30F3" or "native2ascii" should
+# be kept together as a single word.
+<data>•X11 •30F3 •native2ascii•</data>
+
#
# Regression tests for failures that originally came from the monkey test.
# Monkey test failure lines can, with slight reformatting, be copied into this section
<line>
<data>•R$ •JP¥ •a9 •3a •H% •CA$ •Travi$ •Scott •Ke$ha •Curren$y •A$AP •Rocky•</data>
+# Bug 4146175 TestBug4146175Lines
+# the fullwidth comma should stick to the preceding Japanese character
+<line>
+<data>•\u7d42\uff0c•\u308f•</data>
+
+# Empty String
+<line>
+<data>•</data>
########################################################################################
// general test subroutines
//=========================================================================
- private void generalIteratorTest(BreakIterator bi, List<String> expectedResult) {
- StringBuffer buffer = new StringBuffer();
- String text;
- for (int i = 0; i < expectedResult.size(); i++) {
- text = expectedResult.get(i);
- buffer.append(text);
- }
- text = buffer.toString();
-
- bi.setText(text);
-
- List<String> nextResults = _testFirstAndNext(bi, text);
- List<String> previousResults = _testLastAndPrevious(bi, text);
-
- logln("comparing forward and backward...");
- // TODO(#13318): As part of clean-up, permanently remove the error count check.
- //int errs = getErrorCount();
- compareFragmentLists("forward iteration", "backward iteration", nextResults,
- previousResults);
- //if (getErrorCount() == errs) {
- logln("comparing expected and actual...");
- compareFragmentLists("expected result", "actual result", expectedResult,
- nextResults);
- logln("comparing expected and actual...");
- compareFragmentLists("expected result", "actual result", expectedResult,
- nextResults);
- //}
-
- int[] boundaries = new int[expectedResult.size() + 3];
- boundaries[0] = BreakIterator.DONE;
- boundaries[1] = 0;
- for (int i = 0; i < expectedResult.size(); i++)
- boundaries[i + 2] = boundaries[i + 1] + (expectedResult.get(i)).
- length();
- boundaries[boundaries.length - 1] = BreakIterator.DONE;
-
- _testFollowing(bi, text, boundaries);
- _testPreceding(bi, text, boundaries);
- _testIsBoundary(bi, text, boundaries);
-
- doMultipleSelectionTest(bi, text);
- }
-
private List<String> _testFirstAndNext(BreakIterator bi, String text) {
int p = bi.first();
int lastP = p;
}
}
- private void doMultipleSelectionTest(BreakIterator iterator, String testText)
- {
- logln("Multiple selection test...");
- BreakIterator testIterator = (BreakIterator)iterator.clone();
- int offset = iterator.first();
- int testOffset;
- int count = 0;
-
- do {
- testOffset = testIterator.first();
- testOffset = testIterator.next(count);
- logln("next(" + count + ") -> " + testOffset);
- if (offset != testOffset)
- errln("next(n) and next() not returning consistent results: for step " + count + ", next(n) returned " + testOffset + " and next() had " + offset);
-
- if (offset != BreakIterator.DONE) {
- count++;
- offset = iterator.next();
- }
- } while (offset != BreakIterator.DONE);
-
- // now do it backwards...
- offset = iterator.last();
- count = 0;
-
- do {
- testOffset = testIterator.last();
- testOffset = testIterator.next(count);
- logln("next(" + count + ") -> " + testOffset);
- if (offset != testOffset)
- errln("next(n) and next() not returning consistent results: for step " + count + ", next(n) returned " + testOffset + " and next() had " + offset);
-
- if (offset != BreakIterator.DONE) {
- count--;
- offset = iterator.previous();
- }
- } while (offset != BreakIterator.DONE);
- }
-
-
private void doOtherInvariantTest(BreakIterator tb, String testChars)
{
StringBuffer work = new StringBuffer("a\r\na");
//=========================================================================
- /**
- * @bug 4097779
- */
- @Test
- public void TestBug4097779() {
- List<String> wordSelectionData = new ArrayList<String>(2);
-
- wordSelectionData.add("aa\u0300a");
- wordSelectionData.add(" ");
-
- generalIteratorTest(wordBreak, wordSelectionData);
- }
-
- /**
- * @bug 4098467
- */
- @Test
- public void TestBug4098467Words() {
- List<String> wordSelectionData = new ArrayList<String>();
-
- // What follows is a string of Korean characters (I found it in the Yellow Pages
- // ad for the Korean Presbyterian Church of San Francisco, and I hope I transcribed
- // it correctly), first as precomposed syllables, and then as conjoining jamo.
- // Both sequences should be semantically identical and break the same way.
- // precomposed syllables...
- wordSelectionData.add("\uc0c1\ud56d");
- wordSelectionData.add(" ");
- wordSelectionData.add("\ud55c\uc778");
- wordSelectionData.add(" ");
- wordSelectionData.add("\uc5f0\ud569");
- wordSelectionData.add(" ");
- wordSelectionData.add("\uc7a5\ub85c\uad50\ud68c");
- wordSelectionData.add(" ");
- // conjoining jamo...
- wordSelectionData.add("\u1109\u1161\u11bc\u1112\u1161\u11bc");
- wordSelectionData.add(" ");
- wordSelectionData.add("\u1112\u1161\u11ab\u110b\u1175\u11ab");
- wordSelectionData.add(" ");
- wordSelectionData.add("\u110b\u1167\u11ab\u1112\u1161\u11b8");
- wordSelectionData.add(" ");
- wordSelectionData.add("\u110c\u1161\u11bc\u1105\u1169\u1100\u116d\u1112\u116c");
- wordSelectionData.add(" ");
-
- generalIteratorTest(wordBreak, wordSelectionData);
- }
-
-
- /**
- * @bug 4111338
- */
- @Test
- public void TestBug4111338() {
- List<String> sentenceSelectionData = new ArrayList<String>();
-
- // test for bug #4111338: Don't break sentences at the boundary between CJK
- // and other letters
- sentenceSelectionData.add("\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165:\"JAVA\u821c"
- + "\u8165\u7fc8\u51ce\u306d,\u2494\u56d8\u4ec0\u60b1\u8560\u51ba"
- + "\u611d\u57b6\u2510\u5d46\".\u2029");
- sentenceSelectionData.add("\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165\u9de8"
- + "\u97e4JAVA\u821c\u8165\u7fc8\u51ce\u306d\ue30b\u2494\u56d8\u4ec0"
- + "\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46\u97e5\u7751\u2029");
- sentenceSelectionData.add("\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165\u9de8\u97e4"
- + "\u6470\u8790JAVA\u821c\u8165\u7fc8\u51ce\u306d\ue30b\u2494\u56d8"
- + "\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46\u97e5\u7751\u2029");
- sentenceSelectionData.add("He said, \"I can go there.\"\u2029");
-
- generalIteratorTest(sentenceBreak, sentenceSelectionData);
- }
-
-
- /**
- * @bug 4143071
- */
- @Test
- public void TestBug4143071() {
- List<String> sentenceSelectionData = new ArrayList<String>(3);
-
- // Make sure sentences that end with digits work right
- sentenceSelectionData.add("Today is the 27th of May, 1998. ");
- sentenceSelectionData.add("Tomorrow will be 28 May 1998. ");
- sentenceSelectionData.add("The day after will be the 30th.\u2029");
-
- generalIteratorTest(sentenceBreak, sentenceSelectionData);
- }
-
- /**
- * @bug 4152416
- */
- @Test
- public void TestBug4152416() {
- List<String> sentenceSelectionData = new ArrayList<String>(2);
-
- // Make sure sentences ending with a capital letter are treated correctly
- sentenceSelectionData.add("The type of all primitive "
- + "<code>boolean</code> values accessed in the target VM. ");
- sentenceSelectionData.add("Calls to xxx will return an "
- + "implementor of this interface.\u2029");
-
- generalIteratorTest(sentenceBreak, sentenceSelectionData);
- }
-
- /**
- * @bug 4152117
- */
- @Test
- public void TestBug4152117() {
- List<String> sentenceSelectionData = new ArrayList<String>(3);
-
- // Make sure sentence breaking is handling punctuation correctly
- // [COULD NOT REPRODUCE THIS BUG, BUT TEST IS HERE TO MAKE SURE
- // IT DOESN'T CROP UP]
- sentenceSelectionData.add("Constructs a randomly generated "
- + "BigInteger, uniformly distributed over the range <tt>0</tt> "
- + "to <tt>(2<sup>numBits</sup> - 1)</tt>, inclusive. ");
- sentenceSelectionData.add("The uniformity of the distribution "
- + "assumes that a fair source of random bits is provided in "
- + "<tt>rnd</tt>. ");
- sentenceSelectionData.add("Note that this constructor always "
- + "constructs a non-negative BigInteger.\u2029");
-
- generalIteratorTest(sentenceBreak, sentenceSelectionData);
- }
-
- @Test
- public void TestLineBreak() {
- List<String> lineSelectionData = new ArrayList<String>();
-
- lineSelectionData.add("Multi-");
- lineSelectionData.add("Level ");
- lineSelectionData.add("example ");
- lineSelectionData.add("of ");
- lineSelectionData.add("a ");
- lineSelectionData.add("semi-");
- lineSelectionData.add("idiotic ");
- lineSelectionData.add("non-");
- lineSelectionData.add("sensical ");
- lineSelectionData.add("(non-");
- lineSelectionData.add("important) ");
- lineSelectionData.add("sentence. ");
-
- lineSelectionData.add("Hi ");
- lineSelectionData.add("Hello ");
- lineSelectionData.add("How\n");
- lineSelectionData.add("are\r");
- lineSelectionData.add("you\u2028");
- lineSelectionData.add("fine.\t");
- lineSelectionData.add("good. ");
-
- lineSelectionData.add("Now\r");
- lineSelectionData.add("is\n");
- lineSelectionData.add("the\r\n");
- lineSelectionData.add("time\n");
- lineSelectionData.add("\r");
- lineSelectionData.add("for\r");
- lineSelectionData.add("\r");
- lineSelectionData.add("all");
-
- generalIteratorTest(lineBreak, lineSelectionData);
- }
-
- /**
- * @bug 4068133
- */
- @Test
- public void TestBug4068133() {
- List<String> lineSelectionData = new ArrayList<String>(9);
-
- lineSelectionData.add("\u96f6");
- lineSelectionData.add("\u4e00\u3002");
- lineSelectionData.add("\u4e8c\u3001");
- lineSelectionData.add("\u4e09\u3002\u3001");
- lineSelectionData.add("\u56db\u3001\u3002\u3001");
- lineSelectionData.add("\u4e94,");
- lineSelectionData.add("\u516d.");
- lineSelectionData.add("\u4e03.\u3001,\u3002");
- lineSelectionData.add("\u516b");
-
- generalIteratorTest(lineBreak, lineSelectionData);
- }
-
- /**
- * @bug 4086052
- */
- @Test
- public void TestBug4086052() {
- List<String> lineSelectionData = new ArrayList<String>(1);
-
- lineSelectionData.add("foo\u00a0bar ");
-// lineSelectionData.addElement("foo\ufeffbar");
-
- generalIteratorTest(lineBreak, lineSelectionData);
- }
-
- /**
- * @bug 4097920
- */
- @Test
- public void TestBug4097920() {
- List<String> lineSelectionData = new ArrayList<String>(3);
-
- lineSelectionData.add("dog,cat,mouse ");
- lineSelectionData.add("(one)");
- lineSelectionData.add("(two)\n");
- generalIteratorTest(lineBreak, lineSelectionData);
- }
-
-
-
- /**
- * @bug 4117554
- */
- @Test
- public void TestBug4117554Lines() {
- List<String> lineSelectionData = new ArrayList<String>(3);
-
- // Fullwidth .!? should be treated as postJwrd
- lineSelectionData.add("\u4e01\uff0e");
- lineSelectionData.add("\u4e02\uff01");
- lineSelectionData.add("\u4e03\uff1f");
-
- generalIteratorTest(lineBreak, lineSelectionData);
- }
-
- @Test
- public void TestLettersAndDigits() {
- // a character sequence such as "X11" or "30F3" or "native2ascii" should
- // be kept together as a single word
- List<String> lineSelectionData = new ArrayList<String>(3);
-
- lineSelectionData.add("X11 ");
- lineSelectionData.add("30F3 ");
- lineSelectionData.add("native2ascii");
-
- generalIteratorTest(lineBreak, lineSelectionData);
- }
-
-
- private static final String graveS = "S\u0300";
- private static final String acuteBelowI = "i\u0317";
- private static final String acuteE = "e\u0301";
- private static final String circumflexA = "a\u0302";
- private static final String tildeE = "e\u0303";
-
- @Test
- public void TestCharacterBreak() {
- List<String> characterSelectionData = new ArrayList<String>();
-
- characterSelectionData.add(graveS);
- characterSelectionData.add(acuteBelowI);
- characterSelectionData.add("m");
- characterSelectionData.add("p");
- characterSelectionData.add("l");
- characterSelectionData.add(acuteE);
- characterSelectionData.add(" ");
- characterSelectionData.add("s");
- characterSelectionData.add(circumflexA);
- characterSelectionData.add("m");
- characterSelectionData.add("p");
- characterSelectionData.add("l");
- characterSelectionData.add(tildeE);
- characterSelectionData.add(".");
- characterSelectionData.add("w");
- characterSelectionData.add(circumflexA);
- characterSelectionData.add("w");
- characterSelectionData.add("a");
- characterSelectionData.add("f");
- characterSelectionData.add("q");
- characterSelectionData.add("\n");
- characterSelectionData.add("\r");
- characterSelectionData.add("\r\n");
- characterSelectionData.add("\n");
-
- generalIteratorTest(characterBreak, characterSelectionData);
- }
-
- /**
- * @bug 4098467
- */
- @Test
- public void TestBug4098467Characters() {
- List<String> characterSelectionData = new ArrayList<String>();
-
- // What follows is a string of Korean characters (I found it in the Yellow Pages
- // ad for the Korean Presbyterian Church of San Francisco, and I hope I transcribed
- // it correctly), first as precomposed syllables, and then as conjoining jamo.
- // Both sequences should be semantically identical and break the same way.
- // precomposed syllables...
- characterSelectionData.add("\uc0c1");
- characterSelectionData.add("\ud56d");
- characterSelectionData.add(" ");
- characterSelectionData.add("\ud55c");
- characterSelectionData.add("\uc778");
- characterSelectionData.add(" ");
- characterSelectionData.add("\uc5f0");
- characterSelectionData.add("\ud569");
- characterSelectionData.add(" ");
- characterSelectionData.add("\uc7a5");
- characterSelectionData.add("\ub85c");
- characterSelectionData.add("\uad50");
- characterSelectionData.add("\ud68c");
- characterSelectionData.add(" ");
- // conjoining jamo...
- characterSelectionData.add("\u1109\u1161\u11bc");
- characterSelectionData.add("\u1112\u1161\u11bc");
- characterSelectionData.add(" ");
- characterSelectionData.add("\u1112\u1161\u11ab");
- characterSelectionData.add("\u110b\u1175\u11ab");
- characterSelectionData.add(" ");
- characterSelectionData.add("\u110b\u1167\u11ab");
- characterSelectionData.add("\u1112\u1161\u11b8");
- characterSelectionData.add(" ");
- characterSelectionData.add("\u110c\u1161\u11bc");
- characterSelectionData.add("\u1105\u1169");
- characterSelectionData.add("\u1100\u116d");
- characterSelectionData.add("\u1112\u116c");
-
- generalIteratorTest(characterBreak, characterSelectionData);
- }
-
- @Test
- public void TestTitleBreak()
- {
- List<String> titleData = new ArrayList<String>();
- titleData.add(" ");
- titleData.add("This ");
- titleData.add("is ");
- titleData.add("a ");
- titleData.add("simple ");
- titleData.add("sample ");
- titleData.add("sentence. ");
- titleData.add("This ");
-
- generalIteratorTest(titleBreak, titleData);
- }
-
-
-
/*
* @bug 4153072
*/
}
- @Test
- public void TestBug4146175Lines() {
- List<String> lineSelectionData = new ArrayList<String>(2);
-
- // the fullwidth comma should stick to the preceding Japanese character
- lineSelectionData.add("\u7d42\uff0c");
- lineSelectionData.add("\u308f");
-
- generalIteratorTest(lineBreak, lineSelectionData);
- }
-
private static final String cannedTestChars
= "\u0000\u0001\u0002\u0003\u0004 !\"#$%&()+-01234<=>ABCDE[]^_`abcde{}|\u00a0\u00a2"
+ "\u00a3\u00a4\u00a5\u00a6\u00a7\u00a8\u00a9\u00ab\u00ad\u00ae\u00af\u00b0\u00b2\u00b3"
doOtherInvariantTest(e, cannedTestChars + ".,\u3001\u3002\u3041\u3042\u3043\ufeff");
}
- @Test
- public void TestEmptyString()
- {
- String text = "";
- List<String> x = new ArrayList<String>(1);
- x.add(text);
-
- generalIteratorTest(lineBreak, x);
- }
-
@Test
public void TestGetAvailableLocales()
{
}
}
-
- /**
- * Bug 4450804
- */
- @Test
- public void TestLineBreakContractions() {
- List<String> expected = new ArrayList<String>(7);
- expected.add("These ");
- expected.add("are ");
- expected.add("'foobles'. ");
- expected.add("Don't ");
- expected.add("you ");
- expected.add("like ");
- expected.add("them?");
- generalIteratorTest(lineBreak, expected);
- }
-
/**
* Ticket#5615
*/
public RBBITest() {
}
- private static final String halfNA = "\u0928\u094d\u200d"; /*
- * halfform NA = devanigiri NA + virama(supresses
- * inherent vowel)+ zero width joiner
- */
-
- // tests default rules based character iteration.
- // Builds a new iterator from the source rules in the default (prebuilt) iterator.
- //
- @Test
- public void TestDefaultRuleBasedCharacterIteration() {
- RuleBasedBreakIterator rbbi = (RuleBasedBreakIterator) BreakIterator.getCharacterInstance();
- logln("Testing the RBBI for character iteration by using default rules");
-
- // fetch the rules used to create the above RuleBasedBreakIterator
- String defaultRules = rbbi.toString();
-
- RuleBasedBreakIterator charIterDefault = null;
- try {
- charIterDefault = new RuleBasedBreakIterator(defaultRules);
- } catch (IllegalArgumentException iae) {
- errln("ERROR: failed construction in TestDefaultRuleBasedCharacterIteration()" + iae.toString());
- }
-
- List<String> chardata = new ArrayList<String>();
- chardata.add("H");
- chardata.add("e");
- chardata.add("l");
- chardata.add("l");
- chardata.add("o");
- chardata.add("e\u0301"); // acuteE
- chardata.add("&");
- chardata.add("e\u0303"); // tildaE
- // devanagiri characters for Hindi support
- chardata.add("\u0906"); // devanagiri AA
- // chardata.add("\u093e\u0901"); //devanagiri vowelsign AA+ chandrabindhu
- chardata.add("\u0916\u0947"); // devanagiri KHA+vowelsign E
- chardata.add("\u0938\u0941\u0902"); // devanagiri SA+vowelsign U + anusvara(bindu)
- chardata.add("\u0926"); // devanagiri consonant DA
- chardata.add("\u0930"); // devanagiri consonant RA
- // chardata.add("\u0939\u094c"); //devanagiri HA+vowel sign AI
- chardata.add("\u0964"); // devanagiri danda
- // end hindi characters
- chardata.add("A\u0302"); // circumflexA
- chardata.add("i\u0301"); // acuteBelowI
- // conjoining jamo...
- chardata.add("\u1109\u1161\u11bc");
- chardata.add("\u1112\u1161\u11bc");
- chardata.add("\n");
- chardata.add("\r\n"); // keep CRLF sequences together
- chardata.add("S\u0300"); // graveS
- chardata.add("i\u0301"); // acuteBelowI
- chardata.add("!");
-
- // What follows is a string of Korean characters (I found it in the Yellow Pages
- // ad for the Korean Presbyterian Church of San Francisco, and I hope I transcribed
- // it correctly), first as precomposed syllables, and then as conjoining jamo.
- // Both sequences should be semantically identical and break the same way.
- // precomposed syllables...
- chardata.add("\uc0c1");
- chardata.add("\ud56d");
- chardata.add(" ");
- chardata.add("\ud55c");
- chardata.add("\uc778");
- chardata.add(" ");
- chardata.add("\uc5f0");
- chardata.add("\ud569");
- chardata.add(" ");
- chardata.add("\uc7a5");
- chardata.add("\ub85c");
- chardata.add("\uad50");
- chardata.add("\ud68c");
- chardata.add(" ");
- // conjoining jamo...
- chardata.add("\u1109\u1161\u11bc");
- chardata.add("\u1112\u1161\u11bc");
- chardata.add(" ");
- chardata.add("\u1112\u1161\u11ab");
- chardata.add("\u110b\u1175\u11ab");
- chardata.add(" ");
- chardata.add("\u110b\u1167\u11ab");
- chardata.add("\u1112\u1161\u11b8");
- chardata.add(" ");
- chardata.add("\u110c\u1161\u11bc");
- chardata.add("\u1105\u1169");
- chardata.add("\u1100\u116d");
- chardata.add("\u1112\u116c");
-
- generalIteratorTest(charIterDefault, chardata);
-
- }
-
- @Test
- public void TestDefaultRuleBasedWordIteration() {
- logln("Testing the RBBI for word iteration using default rules");
- RuleBasedBreakIterator rbbi = (RuleBasedBreakIterator) BreakIterator.getWordInstance();
- // fetch the rules used to create the above RuleBasedBreakIterator
- String defaultRules = rbbi.toString();
-
- RuleBasedBreakIterator wordIterDefault = null;
- try {
- wordIterDefault = new RuleBasedBreakIterator(defaultRules);
- } catch (IllegalArgumentException iae) {
- errln("ERROR: failed construction in TestDefaultRuleBasedWordIteration() -- custom rules" + iae.toString());
- }
-
- List<String> worddata = new ArrayList<String>();
- worddata.add("Write");
- worddata.add(" ");
- worddata.add("wordrules");
- worddata.add(".");
- worddata.add(" ");
- // worddata.add("alpha-beta-gamma");
- worddata.add(" ");
- worddata.add("\u092f\u0939");
- worddata.add(" ");
- worddata.add("\u0939\u093f" + halfNA + "\u0926\u0940");
- worddata.add(" ");
- worddata.add("\u0939\u0948");
- // worddata.add("\u0964"); //danda followed by a space
- worddata.add(" ");
- worddata.add("\u0905\u093e\u092a");
- worddata.add(" ");
- worddata.add("\u0938\u093f\u0916\u094b\u0917\u0947");
- worddata.add("?");
- worddata.add(" ");
- worddata.add("\r");
- worddata.add("It's");
- worddata.add(" ");
- // worddata.add("$30.10");
- worddata.add(" ");
- worddata.add(" ");
- worddata.add("Badges");
- worddata.add("?");
- worddata.add(" ");
- worddata.add("BADGES");
- worddata.add("!");
- worddata.add("1000,233,456.000");
- worddata.add(" ");
-
- generalIteratorTest(wordIterDefault, worddata);
- }
-// private static final String kParagraphSeparator = "\u2029";
- private static final String kLineSeparator = "\u2028";
-
- @Test
- public void TestDefaultRuleBasedSentenceIteration() {
- logln("Testing the RBBI for sentence iteration using default rules");
- RuleBasedBreakIterator rbbi = (RuleBasedBreakIterator) BreakIterator.getSentenceInstance();
-
- // fetch the rules used to create the above RuleBasedBreakIterator
- String defaultRules = rbbi.toString();
- RuleBasedBreakIterator sentIterDefault = null;
- try {
- sentIterDefault = new RuleBasedBreakIterator(defaultRules);
- } catch (IllegalArgumentException iae) {
- errln("ERROR: failed construction in TestDefaultRuleBasedSentenceIteration()" + iae.toString());
- }
-
- List<String> sentdata = new ArrayList<String>();
- sentdata.add("(This is it.) ");
- sentdata.add("Testing the sentence iterator. ");
- sentdata.add("\"This isn\'t it.\" ");
- sentdata.add("Hi! ");
- sentdata.add("This is a simple sample sentence. ");
- sentdata.add("(This is it.) ");
- sentdata.add("This is a simple sample sentence. ");
- sentdata.add("\"This isn\'t it.\" ");
- sentdata.add("Hi! ");
- sentdata.add("This is a simple sample sentence. ");
- sentdata.add("It does not have to make any sense as you can see. ");
- sentdata.add("Nel mezzo del cammin di nostra vita, mi ritrovai in una selva oscura. ");
- sentdata.add("Che la dritta via aveo smarrita. ");
-
- generalIteratorTest(sentIterDefault, sentdata);
- }
-
- @Test
- public void TestDefaultRuleBasedLineIteration() {
- logln("Testing the RBBI for line iteration using default rules");
- RuleBasedBreakIterator rbbi = (RuleBasedBreakIterator) RuleBasedBreakIterator.getLineInstance();
- // fetch the rules used to create the above RuleBasedBreakIterator
- String defaultRules = rbbi.toString();
- RuleBasedBreakIterator lineIterDefault = null;
- try {
- lineIterDefault = new RuleBasedBreakIterator(defaultRules);
- } catch (IllegalArgumentException iae) {
- errln("ERROR: failed construction in TestDefaultRuleBasedLineIteration()" + iae.toString());
- }
-
- List<String> linedata = new ArrayList<String>();
- linedata.add("Multi-");
- linedata.add("Level ");
- linedata.add("example ");
- linedata.add("of ");
- linedata.add("a ");
- linedata.add("semi-");
- linedata.add("idiotic ");
- linedata.add("non-");
- linedata.add("sensical ");
- linedata.add("(non-");
- linedata.add("important) ");
- linedata.add("sentence. ");
-
- linedata.add("Hi ");
- linedata.add("Hello ");
- linedata.add("How\n");
- linedata.add("are\r");
- linedata.add("you" + kLineSeparator);
- linedata.add("fine.\t");
- linedata.add("good. ");
-
- linedata.add("Now\r");
- linedata.add("is\n");
- linedata.add("the\r\n");
- linedata.add("time\n");
- linedata.add("\r");
- linedata.add("for\r");
- linedata.add("\r");
- linedata.add("all");
-
- generalIteratorTest(lineIterDefault, linedata);
-
- }
-
- // =========================================================================
- // general test subroutines
- // =========================================================================
-
- private void generalIteratorTest(RuleBasedBreakIterator rbbi, List<String> expectedResult) {
- StringBuffer buffer = new StringBuffer();
- String text;
- for (int i = 0; i < expectedResult.size(); i++) {
- text = expectedResult.get(i);
- buffer.append(text);
- }
- text = buffer.toString();
- if (rbbi == null) {
- errln("null iterator, test skipped.");
- return;
- }
-
- rbbi.setText(text);
-
- List<String> nextResults = _testFirstAndNext(rbbi, text);
- List<String> previousResults = _testLastAndPrevious(rbbi, text);
-
- logln("comparing forward and backward...");
- // TODO(#13318): As part of clean-up, permanently remove the error count check.
- //int errs = getErrorCount();
- compareFragmentLists("forward iteration", "backward iteration", nextResults, previousResults);
- //if (getErrorCount() == errs) {
- logln("comparing expected and actual...");
- compareFragmentLists("expected result", "actual result", expectedResult, nextResults);
- logln("comparing expected and actual...");
- compareFragmentLists("expected result", "actual result", expectedResult, nextResults);
- //}
-
- int[] boundaries = new int[expectedResult.size() + 3];
- boundaries[0] = RuleBasedBreakIterator.DONE;
- boundaries[1] = 0;
- for (int i = 0; i < expectedResult.size(); i++) {
- boundaries[i + 2] = boundaries[i + 1] + (expectedResult.get(i).length());
- }
-
- boundaries[boundaries.length - 1] = RuleBasedBreakIterator.DONE;
-
- _testFollowing(rbbi, text, boundaries);
- _testPreceding(rbbi, text, boundaries);
- _testIsBoundary(rbbi, text, boundaries);
-
- doMultipleSelectionTest(rbbi, text);
- }
-
- private List<String> _testFirstAndNext(RuleBasedBreakIterator rbbi, String text) {
- int p = rbbi.first();
- int lastP = p;
- List<String> result = new ArrayList<String>();
-
- if (p != 0) {
- errln("first() returned " + p + " instead of 0");
- }
-
- while (p != RuleBasedBreakIterator.DONE) {
- p = rbbi.next();
- if (p != RuleBasedBreakIterator.DONE) {
- if (p <= lastP) {
- errln("next() failed to move forward: next() on position "
- + lastP + " yielded " + p);
- }
- result.add(text.substring(lastP, p));
- }
- else {
- if (lastP != text.length()) {
- errln("next() returned DONE prematurely: offset was "
- + lastP + " instead of " + text.length());
- }
- }
- lastP = p;
- }
- return result;
- }
-
- private List<String> _testLastAndPrevious(RuleBasedBreakIterator rbbi, String text) {
- int p = rbbi.last();
- int lastP = p;
- List<String> result = new ArrayList<String>();
-
- if (p != text.length()) {
- errln("last() returned " + p + " instead of " + text.length());
- }
-
- while (p != RuleBasedBreakIterator.DONE) {
- p = rbbi.previous();
- if (p != RuleBasedBreakIterator.DONE) {
- if (p >= lastP) {
- errln("previous() failed to move backward: previous() on position "
- + lastP + " yielded " + p);
- }
-
- result.add(0, text.substring(p, lastP));
- }
- else {
- if (lastP != 0) {
- errln("previous() returned DONE prematurely: offset was "
- + lastP + " instead of 0");
- }
- }
- lastP = p;
- }
- return result;
- }
-
- private void compareFragmentLists(String f1Name, String f2Name, List<String> f1, List<String> f2) {
- int p1 = 0;
- int p2 = 0;
- String s1;
- String s2;
- int t1 = 0;
- int t2 = 0;
-
- while (p1 < f1.size() && p2 < f2.size()) {
- s1 = f1.get(p1);
- s2 = f2.get(p2);
- t1 += s1.length();
- t2 += s2.length();
-
- if (s1.equals(s2)) {
- debugLogln(" >" + s1 + "<");
- ++p1;
- ++p2;
- }
- else {
- int tempT1 = t1;
- int tempT2 = t2;
- int tempP1 = p1;
- int tempP2 = p2;
-
- while (tempT1 != tempT2 && tempP1 < f1.size() && tempP2 < f2.size()) {
- while (tempT1 < tempT2 && tempP1 < f1.size()) {
- tempT1 += (f1.get(tempP1)).length();
- ++tempP1;
- }
- while (tempT2 < tempT1 && tempP2 < f2.size()) {
- tempT2 += (f2.get(tempP2)).length();
- ++tempP2;
- }
- }
- logln("*** " + f1Name + " has:");
- while (p1 <= tempP1 && p1 < f1.size()) {
- s1 = f1.get(p1);
- t1 += s1.length();
- debugLogln(" *** >" + s1 + "<");
- ++p1;
- }
- logln("***** " + f2Name + " has:");
- while (p2 <= tempP2 && p2 < f2.size()) {
- s2 = f2.get(p2);
- t2 += s2.length();
- debugLogln(" ***** >" + s2 + "<");
- ++p2;
- }
- errln("Discrepancy between " + f1Name + " and " + f2Name);
- }
- }
- }
-
- private void _testFollowing(RuleBasedBreakIterator rbbi, String text, int[] boundaries) {
- logln("testFollowing():");
- int p = 2;
- for(int i = 0; i <= text.length(); i++) {
- if (i == boundaries[p])
- ++p;
- int b = rbbi.following(i);
- logln("rbbi.following(" + i + ") -> " + b);
- if (b != boundaries[p])
- errln("Wrong result from following() for " + i + ": expected " + boundaries[p]
- + ", got " + b);
- }
- }
-
- private void _testPreceding(RuleBasedBreakIterator rbbi, String text, int[] boundaries) {
- logln("testPreceding():");
- int p = 0;
- for(int i = 0; i <= text.length(); i++) {
- int b = rbbi.preceding(i);
- logln("rbbi.preceding(" + i + ") -> " + b);
- if (b != boundaries[p])
- errln("Wrong result from preceding() for " + i + ": expected " + boundaries[p]
- + ", got " + b);
- if (i == boundaries[p + 1])
- ++p;
- }
- }
-
- private void _testIsBoundary(RuleBasedBreakIterator rbbi, String text, int[] boundaries) {
- logln("testIsBoundary():");
- int p = 1;
- boolean isB;
- for(int i = 0; i <= text.length(); i++) {
- isB = rbbi.isBoundary(i);
- logln("rbbi.isBoundary(" + i + ") -> " + isB);
- if(i == boundaries[p]) {
- if (!isB)
- errln("Wrong result from isBoundary() for " + i + ": expected true, got false");
- ++p;
- }
- else {
- if(isB)
- errln("Wrong result from isBoundary() for " + i + ": expected false, got true");
- }
- }
- }
- private void doMultipleSelectionTest(RuleBasedBreakIterator iterator, String testText)
- {
- logln("Multiple selection test...");
- RuleBasedBreakIterator testIterator = (RuleBasedBreakIterator)iterator.clone();
- int offset = iterator.first();
- int testOffset;
- int count = 0;
-
- do {
- testOffset = testIterator.first();
- testOffset = testIterator.next(count);
- logln("next(" + count + ") -> " + testOffset);
- if (offset != testOffset)
- errln("next(n) and next() not returning consistent results: for step " + count + ", next(n) returned " + testOffset + " and next() had " + offset);
-
- if (offset != RuleBasedBreakIterator.DONE) {
- count++;
- offset = iterator.next();
- }
- } while (offset != RuleBasedBreakIterator.DONE);
-
- // now do it backwards...
- offset = iterator.last();
- count = 0;
-
- do {
- testOffset = testIterator.last();
- testOffset = testIterator.next(count);
- logln("next(" + count + ") -> " + testOffset);
- if (offset != testOffset)
- errln("next(n) and next() not returning consistent results: for step " + count + ", next(n) returned " + testOffset + " and next() had " + offset);
-
- if (offset != RuleBasedBreakIterator.DONE) {
- count--;
- offset = iterator.previous();
- }
- } while (offset != RuleBasedBreakIterator.DONE);
- }
-
- private void debugLogln(String s) {
- final String zeros = "0000";
- String temp;
- StringBuffer out = new StringBuffer();
- for (int i = 0; i < s.length(); i++) {
- char c = s.charAt(i);
- if (c >= ' ' && c < '\u007f')
- out.append(c);
- else {
- out.append("\\u");
- temp = Integer.toHexString(c);
- out.append(zeros.substring(0, 4 - temp.length()));
- out.append(temp);
- }
- }
- logln(out.toString());
- }
@Test
public void TestThaiDictionaryBreakIterator() {
}
return buildString.toString();
}
- @Test
+
public void doTest() {
BreakIterator brkIter;
switch( type ) {
}
void executeTest(TestParams t) {
+ // TODO: in exhaustive mode only, also rerun tests with a break iterator
+ // re-created from bi.getRules() and from bi.clone().
int bp;
int prevBP;
int i;
# <sent> any following data is for sentence break testing
# <line> any following data is for line break testing
# <char> any following data is for char break testing
+# <title> any following data is for title break testing
# <rules> rules ... </rules> following data is tested against these rules.
# Applies until a following occurence of <word>, <sent>, etc. or another <rules>
# <locale locale_name> Switch to the named locale at the next occurence of <word>, <sent>, etc.
# Treat Japanese Half Width voicing marks as combining
<data>•A\uff9e•B\uff9f\uff9e\uff9f•C•</data>
+# Test data originally from Java BreakIteratorTest.TestCharacterBreak()
+<data>•S\u0300•i\u0317•m•p•l•e\u0301• •s•a\u0302•m•p•l•e\u0303•.•w•a\u0302•w•a•f•q•\n•\r•\r\n•\n•</data>
+
########################################################################################
#
#
<data>•No breaks when . is followed by a lower, with possible intervening punct .,a .$a .)a. •</data>
#
-# Sentence Breaks: no break at the boundary between CJK and other letters
+# Sentence Breaks: no break at the boundary between CJK and other letters. TestBug4111338
#
-<data>•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165:"JAVA\u821c\u8165\u7fc8\u51ce\u306d,\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46".\u2029•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165\u9de8\u97e4JAVA\u821c\u8165\u7fc8\u51ce\u306d\ue30b\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46\u97e5\u7751\u3002•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165\u9de8\u97e4\u6470\u8790JAVA\u821c\u8165\u7fc8\u51ce\u306d\ue30b\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46\u97e5\u7751\u2048•He said, "I can go there."\u2029•Bye, now.•</data>
+<data>•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165:"JAVA\u821c\u8165\u7fc8\u51ce\u306d,\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46".\u2029\
+•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165\u9de8\u97e4JAVA\u821c\u8165\u7fc8\u51ce\u306d\ue30b\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46\u97e5\u7751\u3002\
+•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165\u9de8\u97e4\u6470\u8790JAVA\u821c\u8165\u7fc8\u51ce\u306d\ue30b\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46\u97e5\u7751\u2048\
+•He said, "I can go there."\u2029•Bye, now.•</data>
#
# Treat fullwidth variants of .!? the same as their
# test for bug #4152416: Make sure sentences ending with a capital
# letter are treated correctly
#
-<data>•The type of all primitive \<code>boolean\</code> values accessed in the target VM. •Calls to xxx will return an implementor of this interface. \u2029•</data>
+<data>•The type of all primitive \<code>boolean\</code> values accessed in the target VM. •\
+Calls to xxx will return an implementor of this interface. \u2029•</data>
# test for bug #4152117: Make sure sentence breaking is handling
# punctuation correctly [COULD NOT REPRODUCE THIS BUG, BUT TEST IS
# HERE TO MAKE SURE IT DOESN'T CROP UP]
#
-<data>•Constructs a randomly generated BigInteger, uniformly distributed over the range \<tt>0\</tt> to \<tt>(2\<sup>numBits\</sup> - 1\)\</tt>, inclusive. •The uniformity of the distribution assumes that a fair source of random bits is provided in \<tt>rnd\</tt>. •Note that this constructor always constructs a non-negative biginteger. \n•Ahh abc.
-•</data>
+<data>•Constructs a randomly generated BigInteger, uniformly distributed over the range \<tt>0\</tt> to\
+ \<tt>(2\<sup>numBits\</sup> - 1\)\</tt>, inclusive. \
+ •The uniformity of the distribution assumes that a fair source of random bits is provided in \<tt>rnd\</tt>. \
+ •Note that this constructor always constructs a non-negative biginteger. \n•Ahh abc.•</data>
# sentence breaks for hindi which used Devanagari script
# make sure there is sentence break after ?,danda(hindi phrase separator),
# fullstop followed by space. (VERY old test)
#
-<data>•\u0928\u092e\u0938\u094d\u200d\u0924\u0947 \u0930\u092e\u0947\u0936\u0905\u093e\u092a\u0915\u0948\u0938\u0947 \u0939\u0948?•\u092e\u0948 \u0905\u091a\u094d\u200d \u091b\u093e \u0939\u0942\u0901\u0964 •\u0905\u093e\u092a\r\n<100>\
+<data>•\u0928\u092e\u0938\u094d\u200d\u0924\u0947 \u0930\u092e\u0947\u0936\u0905\u093e\u092a\u0915\u0948\u0938\u0947 \u0939\u0948?\
+•\u092e\u0948 \u0905\u091a\u094d\u200d \u091b\u093e \u0939\u0942\u0901\u0964 •\u0905\u093e\u092a\r\n<100>\
\u0915\u0948\u0938\u0947 \u0939\u0948?•\u0935\u0939 \u0915\u094d\u200d\u092f\u093e\n\
-<100>\u0939\u0948?•\u092f\u0939 \u0905\u093e\u092e \u0939\u0948. •\u092f\u0939 means "this". •"\u092a\u095d\u093e\u0908" meaning "education" or "studies". •\u0905\u093e\u091c(\u0938\u094d\u200d\u0935\u0924\u0902\u0924\u094d\u0930 \u0926\u093f\u0935\u093e\u0938) \u0939\u0948\u0964 •Let's end here. •</data>
+<100>\u0939\u0948?•\u092f\u0939 \u0905\u093e\u092e \u0939\u0948. •\u092f\u0939 means "this". \
+•"\u092a\u095d\u093e\u0908" meaning "education" or "studies". \
+•\u0905\u093e\u091c(\u0938\u094d\u200d\u0935\u0924\u0902\u0924\u094d\u0930 \u0926\u093f\u0935\u093e\u0938) \u0939\u0948\u0964 •Let's end here. •</data>
# Regression test for bug #1984, Sentence break in Arabic text.
#
<data>•\uc0c1•\ud56d •\ud55c•\uc778 •\uc5f0•\ud569 •\uc7a5•\ub85c•\uad50•\ud68c•</data>
+# Bug 4450804 TestLineBreakContractions
+#
+<line>
+<data>•These •are •'foobles'. •Don't •you •like •them?•</data>
+
+
# conjoining jamo...
<data>•\u1109\u1161\u11bc•\u1112\u1161\u11bc •\u1112\u1161\u11ab•\u110b\u1175\u11ab •\u110b\u1167\u11ab•\u1112\u1161\u11b8 •\u110c\u1161\u11bc•\u1105\u1169•\u1100\u116d•\u1112\u116c•</data>
#
<data>•abc\ud801xyz•</data>
+# a character sequence such as "X11" or "30F3" or "native2ascii" should
+# be kept together as a single word.
+<data>•X11 •30F3 •native2ascii•</data>
+
#
# Regression tests for failures that originally came from the monkey test.
# Monkey test failure lines can, with slight reformatting, be copied into this section
<line>
<data>•R$ •JP¥ •a9 •3a •H% •CA$ •Travi$ •Scott •Ke$ha •Curren$y •A$AP •Rocky•</data>
+# Test Bug 4146175 Lines
+# the fullwidth comma should stick to the preceding Japanese character
+<line>
+<data>•\u7d42\uff0c•\u308f•</data>
+
+# Empty String
+<line>
+<data>•</data>
########################################################################################