Ways in which the command line interface to ANTLR differs from version 2 to version 3.
The 'Tool' class now lives in a different package:
ANTLR 2 | ANTLR 3 |
---|---|
java antlr.Tool ... | java org.antlr.Tool ... |
See also ANTLR 3 Command line options.
Some tips for migrating a grammar developed with ANTLR 2 over to ANTLR 3 syntax:
You don't need separate sections defining the parser and lexer; ANTLR 3 puts each rule in the appropriate place based on the case of the rule name's initial letter ('foo' is a parser production, 'Foo' is a token definition):
ANTLR 2 | ANTLR 3 |
---|---|
class FooParser extends Parser; foo : FOO ; class FooLexer extends Lexer; FOO : "foo" ; | grammar FooParser; foo : FOO ; FOO : 'foo' ; |
(ANTLR 3 actually creates FooParser.lexer.g for you behind the scenes)
'protected' lexer rules are rules that do not produce a separate token and are only called from other lexer rules. They are now called 'fragment' rules, since they represent a fragment of a token.
ANTLR 2 | ANTLR 3 |
---|---|
protected DIGIT : '0'..'9' ; | fragment DIGIT : '0'..'9' ; |
ANTLR 3 generally prefixes the label on named code sections with an '@'.
ANTLR 2 | ANTLR 3 |
---|---|
header { package foo; import java.util.*; } | @header { package foo; import java.util.*; } |
@header is only used by the parser, so in a combined parser/lexer definition, you are likely to need to duplicate some of the above in a @lexer::header section.
The way to skip a token has changed from a SKIP token type to a more generic system that allows multiple 'channels' of tokens within the token stream. The parser normally sees only tokens on the 'default' channel, so changing a token's channel to anything else hides it from the parser. When not playing tricks with multiple token channels, hide tokens by putting them on channel 99, for which ANTLR provides the constant 'HIDDEN':
ANTLR 2 | ANTLR 3 |
---|---|
$setType(Token.SKIP); | $channel=HIDDEN; |
If you are accessing the token stream directly, or the 'channel' mechanism is otherwise insufficient, it's also possible in ANTLR 3 to drop tokens entirely from the token stream by using skip() in a lexer action:
WS : (' '|'\t')+ {skip();} ;
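The difference between hiding and dropping a token can be modelled in a few lines of plain Java. This is only a sketch under stated assumptions: the class ChannelSketch and its members are hypothetical and are not ANTLR runtime classes, the value 0 for the default channel is an assumption of the model, and 99 is the hidden channel number quoted above.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of channel filtering. Hypothetical names, NOT the ANTLR runtime.
class ChannelSketch {
    static final int DEFAULT_CHANNEL = 0; // assumed number of the channel the parser reads
    static final int HIDDEN = 99;         // hidden channel number quoted in the text

    static final class Tok {
        final String text;
        final int channel;

        Tok(String text, int channel) {
            this.text = text;
            this.channel = channel;
        }
    }

    // Return the texts of the tokens a consumer tuned to `channel` would see.
    static List<String> visibleOn(List<Tok> all, int channel) {
        List<String> seen = new ArrayList<>();
        for (Tok t : all) {
            if (t.channel == channel) {
                seen.add(t.text);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        List<Tok> stream = new ArrayList<>();
        stream.add(new Tok("a", DEFAULT_CHANNEL));
        stream.add(new Tok(" ", HIDDEN)); // hidden, but still stored in the stream
        stream.add(new Tok("b", DEFAULT_CHANNEL));

        System.out.println(visibleOn(stream, DEFAULT_CHANNEL)); // [a, b]
        System.out.println(stream.size());                      // 3
    }
}
```

In this model a hidden token stays in the list and is merely filtered out for the parser, whereas a token removed with skip() would never be added to the list at all.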
In ANTLR 2, code surrounded by curly braces preceding the parser productions was added to the body of the parser class, allowing the grammar to define member fields and functions in the parser. In ANTLR 3, this section must be labelled '@members':
ANTLR 2 | ANTLR 3 |
---|---|
class FooParser extends Parser; { int i; } foo : FOO ; | grammar FooParser; @members { int i; } foo : FOO ; |
To inject members into the lexer of a combined lexer-parser, use @lexer::members {}.
In ANTLR 2 you could write initialization code for a rule directly after the rule name; this section must now be labelled '@init':
ANTLR 2 | ANTLR 3 |
---|---|
foo { int i; } : FOO ; | foo @init { int i; } : FOO ; |
Literals in the parser must use single, not double quotes:
ANTLR 2 | ANTLR 3 |
---|---|
x : y \| z ; y : "class" ; z : "package" ; | x : y \| z ; y : 'class' ; z : 'package' ; |
Labels on elements within a production are denoted with an equals-sign, not a colon:
ANTLR 2 | ANTLR 3 |
---|---|
LPAREN a:arguments RPAREN | LPAREN a=arguments RPAREN |
In ANTLR 2, it was necessary to give elements in a production unique label names. ANTLR 3 allows several elements to share the same label (TODO: some description about what value ends up being assigned).
ANTLR 2 | ANTLR 3 |
---|---|
statement : (declaration e1:SEMI \| assignExpr e2:SEMI ) {print(#e1==null?#e2:#e1);} ; | statement : (declaration e=SEMI \| assignExpr e=SEMI ) {print($e);} ; |
When a single element has '?', '+' or '*' in a production, you don't have to put () around it, as was required in ANTLR 2:
ANTLR 2 | ANTLR 3 |
---|---|
compilationUnit : (annotations)? (packageDeclaration)? (importDeclaration)* (typeDeclaration)+ ; | compilationUnit : annotations? packageDeclaration? importDeclaration* typeDeclaration+ ; |
The option which turns on AST building code has changed:
ANTLR 2 | ANTLR 3 |
---|---|
options { buildAST = true; } | options { output = AST; } |
Where ANTLR 2 used a name with a '#' prefix to refer to a labelled AST node, ANTLR 3 uses a '$'.
ANTLR 2 | ANTLR 3 |
---|---|
typeBlock : LCURLY ( m:modifiers ( variableDefinition[#m] \| methodDefinition[#m] ) )* RCURLY | typeBlock : LCURLY ( m=modifiers ( variableDefinition[$m.tree] \| methodDefinition[$m.tree] ) )* RCURLY |
There is an entirely new syntax for tree construction that avoids the special syntax used in ANTLR 2 actions:
ANTLR 2 | ANTLR 3 |
---|---|
arrayLiteral : LBRACK! (elementList)? RBRACK! {## = #([ARRAY_LITERAL, "ARRAY_LITERAL"],##);} ; | arrayLiteral : LBRACK (elementList)? RBRACK -> ^(ARRAY_LITERAL elementList?) ; |
Within a rewrite rule, there is a new syntax to replace ANTLR 2's setType() method call in an action:
ANTLR 2 | ANTLR 3 |
---|---|
in:INC^ {#in.setType(POST_INC);} | in=INC -> ^(POST_INC[$in]) |
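The POST_INC[$in] constructs a new POST_INC node, and copies the text, line/col, etc. from the node labeled 'in'.
In an action that sets the type of the current token, the $setType() call is replaced by an assignment to $type: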
ANTLR 2 | ANTLR 3 |
---|---|
{ $setType(TOKEN); } | { $type = TOKEN; } |
Within a tree parsing rule, subtrees are indicated by ^ instead of #.
ANTLR 2 | ANTLR 3 |
---|---|
expr : #(PLUS expr expr); | expr : ^(PLUS expr expr); |
To disable generation of the standard exception handling code in the parser:
ANTLR 2 | ANTLR 3 |
---|---|
options { defaultErrorHandler=false; } | @rulecatch { } |
Further, in ANTLR 3, to cause an exception to be raised on mismatched tokens in the middle of an alternative, the parser must override the mismatch() method of BaseRecognizer. The default implementation looks like this:
protected void mismatch(IntStream input, int ttype, BitSet follow)
    throws RecognitionException
{
    MismatchedTokenException mte = new MismatchedTokenException(ttype, input);
    recoverFromMismatchedToken(input, mte, ttype, follow);
}
To immediately fail on error, override this with code that constructs an exception as above, but then throws it, rather than calling the recoverFromMismatchedToken() method:
@members {
    // raise exception, rather than recovering, on mismatched token within alt
    protected void mismatch(IntStream input, int ttype, BitSet follow)
        throws RecognitionException
    {
        throw new MismatchedTokenException(ttype, input);
    }
}
Lexer equivalent: ?
ANTLR 2 | ANTLR 3 |
---|---|
options { caseSensitive=false; } | No equivalent option, but see How do I get case insensitivity? |
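The linked FAQ entry is not reproduced here. As a rough illustration of one possible approach (an assumption, not necessarily what the FAQ recommends), the lexer can be given lower-cased lookahead characters while the original spelling is kept for the token text, so that the grammar only needs lower-case literals. The class NoCaseLookahead below is a hypothetical, standalone model and does not use the ANTLR runtime; in a real project the same idea would presumably go into a character stream subclass that overrides its lookahead method.

```java
// Standalone model of case-insensitive lookahead. Hypothetical names, NOT the ANTLR runtime.
class NoCaseLookahead {
    static final int EOF = -1;

    private final char[] data;
    private int p = 0; // index of the next unconsumed character

    NoCaseLookahead(String input) {
        this.data = input.toCharArray();
    }

    // 1-based lookahead: the lexer sees the lower-cased character.
    int LA(int i) {
        int idx = p + i - 1;
        if (i < 1 || idx >= data.length) {
            return EOF;
        }
        return Character.toLowerCase(data[idx]);
    }

    void consume() {
        if (p < data.length) {
            p++;
        }
    }

    // Token text is taken from the untouched buffer, so the original case survives.
    String substring(int start, int stop) {
        return new String(data, start, stop - start + 1);
    }

    public static void main(String[] args) {
        NoCaseLookahead in = new NoCaseLookahead("SeLeCT");
        System.out.println((char) in.LA(1));     // s
        System.out.println(in.substring(0, 5));  // SeLeCT
    }
}
```

The design choice here is to normalise only what the matcher inspects, not the stored input, so error messages and token text still show what the user typed.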
Code in the @finally{...} action executes in the finally block (Java target) after everything else, such as rule memoization. Example:
foo @finally { i=j; } : FOO ;
Runtime classes are now all under org.antlr.runtime.
ANTLR 2 Class | ANTLR 3 Class |
---|---|
antlr.collections.AST | org.antlr.runtime.tree.Tree |
antlr.Token | org.antlr.runtime.Token |
If your actions or semantic predicates used the LT() or LA() methods of ANTLR 2, these need to be prefixed with 'input.' in ANTLR 3, as the methods are no longer defined by the parser class.
ANTLR 2 | ANTLR 3 |
---|---|
{LA(1)==LCURLY}? (block) | {input.LA(1)==LCURLY}? (block) |
{LT(1).getText().equals("namespace")}? IDENT | {input.LT(1).getText().equals("namespace")}? IDENT |
// in lexer, { LA(2)!='/' }? '*' | // in lexer, { input.LA(2)!='/' }? '*' |
ANTLR 3 tracks newlines by itself, so if your ANTLR 2 lexical actions included calls to 'newline()', these must be removed (the method no longer exists).
ANTLR 3 no longer generates the XXXTokenTypes interface for grammar 'XXX'. The constants are now generated directly in both the parser and lexer implementation classes.
ANTLR 2 | ANTLR 3 |
---|---|
MyGrammarTokenTypes.LBRACK | MyGrammar.LBRACK |
Stuff that existed in ANTLR 2 which has no equivalent in ANTLR 3 yet (or which Terr just hasn't explained enough times on the mailing list for it to sink in):
To produce more comprehensible error messages, ANTLR 2 let you set a paraphrase for a token:
RIGHT_PAREN options { paraphrase = "a closing parenthesis ')'"; } : ')' ;
ANTLR 2 allowed the grammar to specify an AST implementation class per token type.
tokens { COMPILATION_UNIT; TYPE_BLOCK<AST=uk.co.badgersinfoil.metaas.impl.ParentheticAST>; "import"<AST=uk.co.badgersinfoil.metaas.impl.ExprStmtAST>; }
A workaround in ANTLR 3 might be to implement this 'by hand' in a custom TreeAdaptor implementation.
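A minimal sketch of the 'by hand' idea, assuming a lookup table from token type to a node constructor. All class and method names here are hypothetical and the model is independent of the ANTLR runtime; in a real grammar the same dispatch would sit inside the node-creation method of the custom TreeAdaptor.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Standalone model: chooses a node class per token type.
// Hypothetical names, NOT ANTLR runtime classes.
class PerTypeNodeFactory {
    static class Node {
        final String text;

        Node(String text) {
            this.text = text;
        }
    }

    // Hypothetical specialised node, standing in for a class such as ParentheticAST.
    static class ParentheticNode extends Node {
        ParentheticNode(String text) {
            super(text);
        }
    }

    private final Map<Integer, Function<String, Node>> byType = new HashMap<>();

    // Associate a token type with the constructor of its node class.
    void register(int tokenType, Function<String, Node> constructor) {
        byType.put(tokenType, constructor);
    }

    // Build a node, falling back to the generic Node for unregistered types.
    Node create(int tokenType, String text) {
        Function<String, Node> constructor = byType.get(tokenType);
        return constructor != null ? constructor.apply(text) : new Node(text);
    }

    public static void main(String[] args) {
        final int TYPE_BLOCK = 5; // arbitrary token type number for the demo
        PerTypeNodeFactory factory = new PerTypeNodeFactory();
        factory.register(TYPE_BLOCK, ParentheticNode::new);

        System.out.println(factory.create(TYPE_BLOCK, "{").getClass().getSimpleName()); // ParentheticNode
        System.out.println(factory.create(6, "x").getClass().getSimpleName());          // Node
    }
}
```

Keeping the mapping in a table rather than in a chain of conditionals mirrors the declarative per-token annotations that the ANTLR 2 tokens section provided.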