From: Andrei Zmievski
Date: Fri, 23 Sep 2005 21:24:31 +0000 (+0000)
Subject: substr() sample case
X-Git-Tag: RELEASE_0_9_0~130
X-Git-Url: https://granicus.if.org/sourcecode?a=commitdiff_plain;h=cd43b7dda796853ebd14e5334abadd61468db05f;p=php

substr() sample case
---

diff --git a/README.UNICODE-UPGRADES b/README.UNICODE-UPGRADES
index a663163399..8a637082c7 100644
--- a/README.UNICODE-UPGRADES
+++ b/README.UNICODE-UPGRADES
@@ -262,6 +262,66 @@ Unicode strings:
+Upgrading Functions
+===================
+
+Let's take a look at a couple of functions that have been upgraded to
+support new string types.
+
+substr()
+--------
+
+This function returns part of a string based on offset and length
+parameters.
+
+    void *str;
+    int32_t str_len, cp_len;
+    zend_uchar str_type;
+
+    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "tl|l", &str, &str_len, &str_type, &f, &l) == FAILURE) {
+        return;
+    }
+
+The first thing we notice is that the incoming string specifier is 't',
+which means that we can accept all 3 string types. The 'str' variable is
+declared as void*, because it can point to either UChar* or char*.
+The actual type of the incoming string is stored in the 'str_type' variable.
+
+    if (str_type == IS_UNICODE) {
+        cp_len = u_countChar32(str, str_len);
+    } else {
+        cp_len = str_len;
+    }
+
+If the string is a Unicode one, we cannot rely on the str_len value to tell
+us the number of characters in it. Instead, we call u_countChar32() to
+obtain it.
+
+The next several lines normalize start and length parameters to fit within the
+string. Nothing new here. Then we locate the appropriate segment.
+
+    if (str_type == IS_UNICODE) {
+        int32_t start = 0, end = 0;
+        U16_FWD_N((UChar*)str, end, str_len, f);
+        start = end;
+        U16_FWD_N((UChar*)str, end, str_len, l);
+        RETURN_UNICODEL((UChar*)str + start, end-start, 1);
+
+Since codepoint (character) #n is not necessarily at offset #n in Unicode
+strings, we start at the beginning and iterate forward until we have gone
+through the required number of codepoints to reach the start of the segment.
+Then we save the location in 'start' and continue iterating through the number
+of codepoints specified by the length. Once that's done, we can return the
+segment as a Unicode string.
+
+    } else {
+        RETURN_STRINGL((char*)str + f, l, 1);
+    }
+
+For native and binary types, we can return the segment directly.
+
+
+
 References
 ==========