Java: Miscellaneous: java.util.regex
String sells its soul to the regex package (although it is far from the most general solution) by having methods that implicitly use it (without mentioning it in the method names, and without any advantage)
PatternSyntaxException unchecked? (cf. ParseException, MalformedURLException, URISyntaxException)
Pattern.split - unnessary and confusing role of limit, why not an extra boolean "removeTrailing", also no way just to iterate (instead of having all the parts at once). Why can the Pattern split (and internally use a Matcher), but only the Matcher replace? (The replace methods are also unrelated to the matcher and destroy its state).
Why is the matcher obtained from the pattern (and not constructed directly)?
Who matches (what)?:
Does a method (Pattern.matches, Matcher.matches) (attempt to) match the input against a regular expression?
Does a Matcher match the input against a regular expression (or a Pattern)?
Do regular-expression constructs match (parts) of the input? Do quantifiers match?
Does the input match a pattern?
Does a Pattern (or a regular expression) match? (intransitive use)
(Eiffelesque syntax)
public class Matcher -- silly name?
{
-- silly return values (itself, StringBuffer argument) not shown
pattern: Pattern
reset alias reset(input)
reset(newInput: CharSequence)
ensures !hasMatched, appendPosition == 0
ensures input == newInput
-- groups
start
requires hasMatched
end
requires hasMatched
group: String -- no better name? 'matched'
requires hasMatched
ensures result.equals(input.subSequence(start, end).toString)
groupCount: Int -- property of the pattern alone
start(group: Int)
requires hasMatched
requires 0 <= group <= groupCount
end(group: Int)
requires hasMatched
requires 0 <= group <= groupCount
group(group: Int): String/null
requires hasMatched
requires 0 <= group <= groupCount
ensures start(i) == -1 => result == null
start(i) != -1 => result.equals(input.subSequence(start(group), end(group)).toString)
-- command-query-separation-violation match methods:
-- the two 'find' methods are no more similar than the all four to each other.
-- find(end) is not equivalent to 'find' (it resets appendPosition)
--
-- Are there differences between
-- find(0) && start == 0 and lookingAt? -- no!
-- lookingAt && end == input.length and matches?
-- (assuming the left also unsets hasMatched) -- yes, the former may a different match
-- that doesn't match the whole input
-- so 'matches' adds to the expressibility, 'lookingAt' not (only optimizes because only
-- matches at the beginning have to be considered.)
-- Of course, one can also reach either by recompiling the pattern with an extra
-- "^(?:" at the beginning and ")" or ")$" at the end.
--
matches: Boolean
ensures result == hasMatched, result => start == 0 && end == input.length
-- implementation ensures appendPosition == 0 (one would hardly use it)
lookingAt: Boolean -- method name even worse than the others
ensures result == hasMatched, result => start == 0
-- implementation ensures appendPosition == 0
find: Boolean
ensures result == hasMatched
ensures old hasMatched => start >= old end -- more (not match the empty string twice)
ensures appendPosition == old appendPostion -- implicitly
find(from: Int): Boolean
requires: 0 <= from <= input.length
ensures result == hasMatched, result => start >= from
ensures appendPosition == 0 -- why not 'from'?
-- replacement by appending
appendReplacement(into: StringBuffer, with: String)
requires hasMatched
requires validReplacement(with)
-- ensures first appended characters are input.subSequence(appendPosition, start).toString
ensures appendPosition == end
appendTail(into: StringBuffer)
-- does not update appendPosition?
-- there should be a feature to set the append position (at least to the end of the current match)
-- state-destructing (CQS-violating) replacement features (simple application of appendReplacement, appendTail)
replaceFirst(with: String): String
requires validReplacement(with)
ensures nothing
replaceAll(with: String): String
requires validReplacement(with)
ensures nothing
invariant
groupCount >= 0
0 <= appendPosition < input.length
hasMatched =>
input.length >= end >= start >= 0
start == start(0), end == end(0)
-- proper ranges:
forall 0 < i <= groupCount:
end(i) >= start(i)
input.length >= end(i) >= start(i) >= 0 || (start(i) == -1 && end(i) == -1)
-- properly nested:
forall 0 < i < groupCount:
start(i + 1) != -1 && start(i) != -1 => -- because of this, incomplete
start(i + 1) >= start(i) -- each group is started by a bracket, which follow other
start(i + 1) < end(i) => end(i + 1) <= end(i) -- if started before the end, must also close before it
sneaky
input: CharSequence -- no way to get the input back?
hasMatched: Boolean -- by lack of CQS, introduces sneaky preconditions
appendPosition: Int -- useful because it occurs in postconditions and changes the behaviour
validReplacement(s: String): Boolean
}
Change of pattern (while keeping appendPosition/groups/end) (#4526572)
All matches (#4888230), also, why can't the Matcher also split (iteratively)?
Pattern.pattern should be named 'expression' or ' regex' (#4633112) - it's uncritical to change it!
Pattern.compile should have a functional name ("compiled", if nothing better)
Methods that take a String should take a CharSequence, not a String
(C) 2001-2009 Christian Kaufhold (java@chka.de)