Objective 3.5 - Tokenizing with Pattern and Matcher
Let's say we have the pattern 'a?' and would like to tokenize the string 'aba' using the following Java class.
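import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Tokenize { // class name is arbitrary
    public static void main(String[] args) {
        Pattern p = Pattern.compile("a?"); // compile the pattern
        Matcher m = p.matcher("aba");      // bind it to the source data
        while (m.find()) {                 // advance to the next match
            // print the start index and the matched text between > and <
            System.out.println(m.start() + " >" + m.group() + "<");
        }
    }
}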
What I expected out of this listing was something like this.
0 >a<
2 >a<
But what it actually returned was this.
0 >a<
1 ><
2 >a<
3 ><
Mhm. According to my study book (from which I took this example), this comes down to zero-length matches, which can occur when using the greedy quantifiers '*' or '?'. The book says that zero-length matches can appear under the following circumstances.
- After the last character of source data
- In between characters after a match has been found
- At the beginning of source data (if the first character is not a match; try tokenizing the string '2aba')
- At the beginning of zero-length source data
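To see these cases side by side, here is a minimal sketch that runs the same find() loop over a few inputs. The class name ZeroLengthDemo and the helper findAll are my own, hypothetical names, not something from the book. The last call uses the plain pattern 'a' without a quantifier, which is one way (my assumption about what was wanted) to get the output I originally expected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroLengthDemo { // hypothetical class name

    // Hypothetical helper: collects every match as "start >text<",
    // the same format the listing above prints.
    public static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.start() + " >" + m.group() + "<");
        }
        return hits;
    }

    public static void main(String[] args) {
        // zero-length matches at the 'b' and after the last character
        System.out.println(findAll("a?", "aba"));  // [0 >a<, 1 ><, 2 >a<, 3 ><]
        // first character is not a match: zero-length match at index 0
        System.out.println(findAll("a?", "2aba")); // [0 ><, 1 >a<, 2 ><, 3 >a<, 4 ><]
        // zero-length source data: one zero-length match at index 0
        System.out.println(findAll("a?", ""));     // [0 ><]
        // no quantifier: the pattern cannot match the empty string
        System.out.println(findAll("a", "aba"));   // [0 >a<, 2 >a<]
    }
}
```

Because 'a?' means "zero or one a", it succeeds at every position where no 'a' is present by matching nothing at all, and that is where the extra empty lines come from.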
1 comment:
How nothings are handled by an unphilosophical computer never makes much sense.