Class RegExp
Regular Expression extension to Automaton.
Regular expressions are built from the following abstract syntax:
regexp        ::=  unionexp
               |
unionexp      ::=  interexp | unionexp          (union)
               |   interexp
interexp      ::=  concatexp & interexp         (intersection)  [OPTIONAL]
               |   concatexp
concatexp     ::=  repeatexp concatexp          (concatenation)
               |   repeatexp
repeatexp     ::=  repeatexp ?                  (zero or one occurrence)
               |   repeatexp *                  (zero or more occurrences)
               |   repeatexp +                  (one or more occurrences)
               |   repeatexp {n}                (n occurrences)
               |   repeatexp {n,}               (n or more occurrences)
               |   repeatexp {n,m}              (n to m occurrences, including both)
               |   complexp
complexp      ::=  ~ complexp                   (complement)  [OPTIONAL]
               |   charclassexp
charclassexp  ::=  [ charclasses ]              (character class)
               |   [^ charclasses ]             (negated character class)
               |   simpleexp
charclasses   ::=  charclass charclasses
               |   charclass
charclass     ::=  charexp - charexp            (character range, including end-points)
               |   charexp
simpleexp     ::=  charexp
               |   .                            (any single character)
               |   #                            (the empty language)  [OPTIONAL]
               |   @                            (any string)  [OPTIONAL]
               |   " <Unicode string without double-quotes> "    (a string)
               |   ( )                          (the empty string)
               |   ( unionexp )                 (precedence override)
               |   < <identifier> >             (named automaton)  [OPTIONAL]
               |   <n-m>                        (numerical interval)  [OPTIONAL]
charexp       ::=  <Unicode character>          (a single non-reserved character)
               |   \ <Unicode character>        (a single character)
The productions marked [OPTIONAL] are only allowed if
specified by the syntax flags passed to the RegExp
constructor.
The reserved characters used in the (enabled) syntax must be escaped with
backslash (\) or double-quotes ("..."). (In
contrast to other regexp syntaxes, this is required also in character
classes.) Be aware that dash (-) has a special meaning in
charclass expressions. An identifier is a string not containing right
angle bracket (>) or dash (-). Numerical
intervals are specified by non-negative decimal integers and include both end
points, and if n and m have the same number
of digits, then the conforming strings must have that length (i.e. prefixed
by 0's).
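The escaping rule above can be sketched in a few lines of Java. The helper below is hypothetical (the class name RegExpLiteral and the method escape are not part of this API). It relies only on the charexp production, which allows a backslash before any Unicode character, and therefore escapes every character that is not an ASCII letter or digit instead of tracking which reserved characters are currently enabled.

```java
/**
 * Illustrative helper, not part of the RegExp class: turns an arbitrary
 * string into a regexp that denotes exactly that string.
 */
final class RegExpLiteral {
    private RegExpLiteral() {}

    /** Returns true for ASCII letters and digits, which are never reserved. */
    private static boolean isPlain(int cp) {
        return (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z') || (cp >= '0' && cp <= '9');
    }

    /** Backslash-escapes every code point that is not an ASCII letter or digit. */
    static String escape(String literal) {
        StringBuilder sb = new StringBuilder(literal.length() * 2);
        for (int i = 0; i < literal.length(); ) {
            int cp = literal.codePointAt(i);
            if (!isPlain(cp)) {
                sb.append('\\');
            }
            sb.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a-b.txt")); // prints a\-b\.txt
        System.out.println(escape("<1-10>"));  // prints \<1\-10\>
    }
}
```

Escaping more than strictly necessary keeps the helper independent of the syntax flags in use, and it also covers the dash, which is only special inside character classes.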
Field Summary
Fields
Modifier and Type   Field          Description
static final int    ALL            Syntax flag, enables all optional regexp syntax.
static final int    ANYSTRING      Syntax flag, enables anystring (@).
static final int    AUTOMATON      Syntax flag, enables named automata (<identifier>).
static final int    COMPLEMENT     Syntax flag, enables complement (~).
static final int    EMPTY          Syntax flag, enables empty language (#).
static final int    INTERSECTION   Syntax flag, enables intersection (&).
static final int    INTERVAL       Syntax flag, enables numerical intervals (<n-m>).
static final int    NONE           Syntax flag, enables no optional regexp syntax.
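Constructor Summary
Constructors
Constructor                          Description
RegExp(String s)                     Constructs new RegExp from a string.
RegExp(String s, int syntax_flags)   Constructs new RegExp from a string.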
Method Summary
Modifier and Type   Method                                              Description
Set<String>         getIdentifiers()                                    Returns set of automaton identifiers that occur in this regular expression.
boolean             setAllowMutate(boolean flag)                        Sets or resets allow mutate flag.
Automaton           toAutomaton()                                       Constructs new Automaton from this RegExp.
Automaton           toAutomaton(Map<String, Automaton> automata)        Constructs new Automaton from this RegExp.
Automaton           toAutomaton(AutomatonProvider automaton_provider)   Constructs new Automaton from this RegExp.
String              toString()                                          Constructs string from parsed regular expression.
Field Details

INTERSECTION
public static final int INTERSECTION
Syntax flag, enables intersection (&).

COMPLEMENT
public static final int COMPLEMENT
Syntax flag, enables complement (~).

EMPTY
public static final int EMPTY
Syntax flag, enables empty language (#).

ANYSTRING
public static final int ANYSTRING
Syntax flag, enables anystring (@).

AUTOMATON
public static final int AUTOMATON
Syntax flag, enables named automata (<identifier>).

INTERVAL
public static final int INTERVAL
Syntax flag, enables numerical intervals (<n-m>).

ALL
public static final int ALL
Syntax flag, enables all optional regexp syntax.

NONE
public static final int NONE
Syntax flag, enables no optional regexp syntax.
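The numerical interval rule enabled by INTERVAL (both end points included, fixed length when n and m have the same number of digits) can be illustrated with a small stand-alone check. This is only a sketch of the rule as described in the class comment, not the automaton construction this class performs. The class IntervalRule and its method are hypothetical, and the treatment of leading zeros when n and m differ in length is an assumption, since the description does not cover that case.

```java
import java.math.BigInteger;

/** Illustrative check of the <n-m> rule; not part of the RegExp API. */
final class IntervalRule {
    private IntervalRule() {}

    /**
     * Returns true if s is a decimal string whose value lies in [n, m].
     * If n and m have the same number of digits, s must have exactly that length.
     * Assumption: when the digit counts differ, leading zeros in s are accepted.
     */
    static boolean matches(String s, String n, String m) {
        if (s.isEmpty() || !s.chars().allMatch(c -> c >= '0' && c <= '9')) {
            return false; // only non-negative decimal integers qualify
        }
        if (n.length() == m.length() && s.length() != n.length()) {
            return false; // fixed width: shorter values must be padded with 0's
        }
        BigInteger value = new BigInteger(s);
        return value.compareTo(new BigInteger(n)) >= 0
            && value.compareTo(new BigInteger(m)) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(matches("07", "00", "15")); // prints true
        System.out.println(matches("7", "00", "15"));  // prints false (wrong length)
        System.out.println(matches("16", "00", "15")); // prints false (out of range)
    }
}
```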
Constructor Details

RegExp
public RegExp(String s) throws IllegalArgumentException
Constructs new RegExp from a string. Same as RegExp(s, ALL).
Parameters:
s - regexp string
Throws:
IllegalArgumentException - if an error occurred while parsing the regular expression

RegExp
public RegExp(String s, int syntax_flags) throws IllegalArgumentException
Constructs new RegExp from a string.
Parameters:
s - regexp string
syntax_flags - boolean 'or' of optional syntax constructs to be enabled
Throws:
IllegalArgumentException - if an error occurred while parsing the regular expression
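The "boolean 'or'" in the syntax_flags parameter is a bitwise OR of the flag constants. The sketch below shows how such a mask is combined and queried. The constant values are placeholders chosen for illustration, because this page names the flags but does not list their values, and the class SyntaxFlagsDemo is hypothetical.

```java
/** Illustrative bitmask handling; values are placeholders, not the real constants. */
final class SyntaxFlagsDemo {
    private SyntaxFlagsDemo() {}

    // Hypothetical values: each flag occupies its own bit so flags can be OR-ed together.
    static final int NONE         = 0x00;
    static final int INTERSECTION = 0x01;
    static final int COMPLEMENT   = 0x02;
    static final int INTERVAL     = 0x20;

    /** Returns true if the given flag bit is present in the combined mask. */
    static boolean isEnabled(int syntaxFlags, int flag) {
        return (syntaxFlags & flag) != 0;
    }

    public static void main(String[] args) {
        // Enable intersection and numerical intervals, nothing else.
        int flags = INTERSECTION | INTERVAL;
        System.out.println(isEnabled(flags, INTERSECTION)); // prints true
        System.out.println(isEnabled(flags, COMPLEMENT));   // prints false
    }
}
```

With the real class, the corresponding call would presumably take the form new RegExp(s, RegExp.INTERSECTION | RegExp.INTERVAL), using the constants from the Field Summary.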
Method Details

toAutomaton
public Automaton toAutomaton()
Constructs new Automaton from this RegExp. Same as toAutomaton(null) (empty automaton map).

toAutomaton
public Automaton toAutomaton(AutomatonProvider automaton_provider) throws IllegalArgumentException
Constructs new Automaton from this RegExp. The constructed automaton is minimal and deterministic and has no transitions to dead states.
Parameters:
automaton_provider - provider of automata for named identifiers
Throws:
IllegalArgumentException - if this regular expression uses a named identifier that is not available from the automaton provider

toAutomaton
public Automaton toAutomaton(Map<String, Automaton> automata) throws IllegalArgumentException
Constructs new Automaton from this RegExp. The constructed automaton is minimal and deterministic and has no transitions to dead states.
Parameters:
automata - a map from automaton identifiers to automata (of type Automaton)
Throws:
IllegalArgumentException - if this regular expression uses a named identifier that does not occur in the automaton map

setAllowMutate
public boolean setAllowMutate(boolean flag)
Sets or resets allow mutate flag. If this flag is set, then automata construction uses mutable automata, which is slightly faster but not thread safe. By default, the flag is not set.
Parameters:
flag - if true, the flag is set
Returns:
previous value of the flag

toString
public String toString()
Constructs string from parsed regular expression.

getIdentifiers
public Set<String> getIdentifiers()
Returns set of automaton identifiers that occur in this regular expression.