Migrating from ANTLR 2 to ANTLR 3 - ANTLR 3 wiki

ANTLR Tool Changes

Ways in which the command line interface to ANTLR differs from version 2 to version 3.

Tool Invocation

The package in which the 'Tool' class is located is now different:

ANTLR 2:
java antlr.Tool ...

ANTLR 3:
java org.antlr.Tool ...

See also ANTLR 3 Command line options.

Changes in ANTLR Syntax

Some tips for migrating a grammar developed with ANTLR 2 over to ANTLR 3 syntax:

Parser and Lexer in One Definition

You don't need separate sections defining the parser and lexer. ANTLR 3 puts each rule in the appropriate place based on the case of the rule name's initial letter ('foo' is a parser production, 'Foo' is a token definition):

ANTLR 2:
class FooParser extends Parser;
foo : FOO ;

class FooLexer extends Lexer;
FOO : "foo" ;

ANTLR 3:
grammar FooParser;
foo : FOO ;

FOO : 'foo' ;

(ANTLR 3 actually creates FooParser.lexer.g for you behind the scenes) 

'protected' lexer rules are now called 'fragment'

'protected' lexer rules are rules that do not produce a separate token and are only called from other lexer rules. They are now called 'fragment' rules, since they represent a 'fragment' of a token.

ANTLR 2:
protected
DIGIT : '0'..'9' ;

ANTLR 3:
fragment
DIGIT : '0'..'9' ;

Renamed 'header' to '@header'

ANTLR 3 generally prefixes the label on named code sections with an '@'.

ANTLR 2:
header {
package foo;

import java.util.*;
}

ANTLR 3:
@header {
package foo;

import java.util.*;
}

@header is only used by the parser, so in a combined parser/lexer definition, you are likely to need to duplicate some of the above in a @lexer::header section.

Token Skipping / Hiding

The way to skip a token has changed. ANTLR 2 used a SKIP token type; ANTLR 3 has a more general system that allows multiple 'channels' of tokens within the token stream. The parser normally sees only the tokens on the default channel, so moving a token to any other channel hides it from the parser. When you are not playing tricks with multiple token channels, hide tokens by putting them on channel 99, for which ANTLR provides the constant 'HIDDEN':

ANTLR 2:
$setType(Token.SKIP);

ANTLR 3:
$channel=HIDDEN;

You may be accessing the token stream directly, or the 'channel' mechanism may be insufficient for some other reason. In either case, ANTLR 3 also lets you drop tokens from the token stream entirely by calling skip() in a lexer action:

WS  :  (' '|'\t')+ {skip();} ;
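A short Java model shows the difference between hiding a token and skipping it. It is a toy sketch, not ANTLR runtime code. The class, record and method names are hypothetical, and only the channel numbers (0 for the default channel, 99 for HIDDEN) follow ANTLR 3. A hidden token stays in the stream, out of the parser's sight. A skipped token is never added to the stream at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Toy model of ANTLR 3 token channels (hypothetical names, not the real runtime).
class ChannelDemo {
    static final int DEFAULT_CHANNEL = 0;  // same value as ANTLR 3's default channel
    static final int HIDDEN_CHANNEL = 99;  // same value as ANTLR 3's HIDDEN

    record Tok(String text, int channel) {}

    // Splits input into runs of spaces and runs of non-space characters.
    // Space runs are either hidden (kept on channel 99) or skipped (never added).
    static List<Tok> lex(String input, boolean skipSpaces) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int start = i;
            boolean space = input.charAt(i) == ' ';
            while (i < input.length() && (input.charAt(i) == ' ') == space) {
                i++;
            }
            String text = input.substring(start, i);
            if (!space) {
                out.add(new Tok(text, DEFAULT_CHANNEL));
            } else if (!skipSpaces) {
                out.add(new Tok(text, HIDDEN_CHANNEL)); // like $channel=HIDDEN;
            } // else: dropped entirely, like {skip();}
        }
        return out;
    }

    // What the parser sees: only tokens on the default channel.
    static List<String> parserView(List<Tok> tokens) {
        return tokens.stream()
                .filter(t -> t.channel() == DEFAULT_CHANNEL)
                .map(Tok::text)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Tok> hidden = lex("int  x", false);
        List<Tok> skipped = lex("int  x", true);
        System.out.println(hidden.size() + " " + parserView(hidden));   // 3 [int, x]
        System.out.println(skipped.size() + " " + parserView(skipped)); // 2 [int, x]
    }
}
```

Either way the parser sees the same tokens. Only code that reads the raw stream, for example to recover comments or whitespace, can tell the two approaches apart.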

Code section for members must now be labelled

In ANTLR 2, code surrounded by curly braces preceding the parser productions would be added to the body of the parser class, allowing the grammar to define member fields and functions in the parser. In ANTLR 3, this section must be labelled '@members':

ANTLR 2:
class FooParser extends Parser;

{
    int i;
}

foo : FOO ;

ANTLR 3:
grammar FooParser;

@members {
    int i;
}

foo : FOO ;

To inject members into the lexer of a combined lexer-parser, use @lexer::members {}.

Code sections for rules must now be labelled

In ANTLR 2 you could write initialization code for a rule directly after the rule name. In ANTLR 3 this section has to be labelled '@init':

ANTLR 2:
foo
{
    int i;
}
    : FOO ;

ANTLR 3:
foo
@init {
    int i;
}
    : FOO ;

Literals

Literals in the parser must use single, not double quotes:

ANTLR 2:
x : y | z ;
y : "class" ;
z : "package" ;

ANTLR 3:
x : y | z ;
y : 'class' ;
z : 'package' ;

Labels

Labels on elements within a production are denoted with an equals sign, not a colon:

ANTLR 2:
LPAREN a:arguments RPAREN

ANTLR 3:
LPAREN a=arguments RPAREN

Multiple Elements Sharing a Label Name

In ANTLR 2, elements in a production had to have unique label names. ANTLR 3 allows several elements to share the same label. The label then refers to whichever of those elements was matched most recently. In the example below, that is the SEMI from the alternative actually taken.

ANTLR 2:
statement
    :    (declaration e1:SEMI
         |assignExpr e2:SEMI
         )
         {print(#e1==null?#e2:#e1);}
    ;

ANTLR 3:
statement
    :    (declaration e=SEMI
         |assignExpr e=SEMI
         )
         {print($e);}
    ;

Parentheses no Longer Mandatory With Cardinality Operators

When a single element in a production has '?', '+' or '*', you don't have to put parentheses around it, as was required in ANTLR 2:

ANTLR 2:
compilationUnit
    :   (annotations)?
        (packageDeclaration)?
        (importDeclaration)*
        (typeDeclaration)+
    ;

ANTLR 3:
compilationUnit
    :   annotations?
        packageDeclaration?
        importDeclaration*
        typeDeclaration+
    ;

Tree Building

The option which turns on AST building code has changed:

ANTLR 2:
options
{
buildAST = true;
}

ANTLR 3:
options
{
output = AST;
}

AST References

Where ANTLR 2 used a name with a '#' prefix to refer to a labelled AST node, ANTLR 3 uses a '$'. For a label on a rule reference, the AST built by that rule is available as the label's 'tree' attribute:

ANTLR 2:
typeBlock
    :    LCURLY
        (    m:modifiers
            (    variableDefinition[#m]
            |    methodDefinition[#m]
            )
        )*
        RCURLY
    ;

ANTLR 3:
typeBlock
    :    LCURLY
        (    m=modifiers
            (    variableDefinition[$m.tree]
            |    methodDefinition[$m.tree]
            )
        )*
        RCURLY
    ;

Tree Rewrite Rules Replace Rewrite Actions

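There is an entirely new syntax for tree construction, which avoids the special syntax used in ANTLR 2 actions:

ANTLR 2:
arrayLiteral
    :    LBRACK! (elementList)? RBRACK!
        {## = #([ARRAY_LITERAL, "ARRAY_LITERAL"],##);}
    ;

ANTLR 3:
arrayLiteral
    :    LBRACK elementList? RBRACK
        -> ^(ARRAY_LITERAL elementList?)
    ;

Because elementList is optional, the rewrite marks it optional too. A rewrite that references an unmatched element without the '?' throws a RewriteEmptyStreamException at runtime.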
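A hand-written Java analogue makes the shape of the rewrite result concrete. The ToyTree class is hypothetical and is not CommonTree, but its toStringTree() mimics the "(root child ...)" notation that ANTLR 3 trees print. As a simplification, elementList's result is modelled as a plain list of nodes.

```java
import java.util.ArrayList;
import java.util.List;

class ArrayLiteralRewrite {
    // Hand-written analogue of
    //   arrayLiteral : LBRACK elementList? RBRACK -> ^(ARRAY_LITERAL elementList?) ;
    // LBRACK and RBRACK are matched but not mentioned in the rewrite, so no
    // nodes are created for them; the elements become children of a new root.
    static ToyTree arrayLiteral(List<ToyTree> elements) {
        ToyTree root = new ToyTree("ARRAY_LITERAL"); // imaginary token as the root
        for (ToyTree e : elements) {
            root.addChild(e);
        }
        return root;
    }

    public static void main(String[] args) {
        List<ToyTree> elems = List.of(new ToyTree("1"), new ToyTree("2"));
        System.out.println(arrayLiteral(elems).toStringTree());     // (ARRAY_LITERAL 1 2)
        System.out.println(arrayLiteral(List.of()).toStringTree()); // ARRAY_LITERAL
    }
}

// Toy AST node (hypothetical, not org.antlr.runtime.tree.CommonTree).
class ToyTree {
    final String text;
    final List<ToyTree> children = new ArrayList<>();

    ToyTree(String text) { this.text = text; }

    ToyTree addChild(ToyTree child) {
        children.add(child);
        return this;
    }

    // Leaf: just the text. Otherwise: "(root child1 child2 ...)".
    String toStringTree() {
        if (children.isEmpty()) return text;
        StringBuilder sb = new StringBuilder("(").append(text);
        for (ToyTree c : children) sb.append(' ').append(c.toStringTree());
        return sb.append(')').toString();
    }
}
```

When elementList is absent, the rewrite still produces a lone ARRAY_LITERAL node. That is the case the '?' in the rewrite allows.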

Changing the Type of AST Nodes

Within a rewrite rule, there is a new syntax that replaces ANTLR 2's setType() method call in an action:

ANTLR 2:
in:INC^ {#in.setType(POST_INC);}

ANTLR 3:
in=INC -> ^(POST_INC[$in])

POST_INC[$in] constructs a new POST_INC node and copies the text, line/column, etc. from the token labelled 'in'.
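Here is a minimal sketch of what "copy the token, change the type" means. It uses a hypothetical ToyToken record rather than the real runtime classes, and the token type numbers are made up for the example.

```java
class RetypeDemo {
    // Hypothetical token type numbers; real ones are assigned by ANTLR.
    static final int INC = 17;
    static final int POST_INC = 42;

    public static void main(String[] args) {
        ToyToken in = new ToyToken(INC, "++", 3, 8);
        ToyToken postInc = in.retyped(POST_INC);
        // Prints "42 ++ 3:8": new type, original text and position.
        System.out.println(postInc.type() + " " + postInc.text() + " "
                + postInc.line() + ":" + postInc.charPositionInLine());
    }
}

// Toy token; fields loosely modelled on type, text, line and column.
record ToyToken(int type, String text, int line, int charPositionInLine) {

    // Rough analogue of POST_INC[$in]: a new token of the given type that
    // reuses the text and source position of this one.
    ToyToken retyped(int newType) {
        return new ToyToken(newType, text, line, charPositionInLine);
    }
}
```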

Changing the Type of Tokens in the Lexer

ANTLR 2:
{ $setType(TOKEN); }

ANTLR 3:
{ $type = TOKEN; }

Tree parser uses ^ instead of #

Within a tree parsing rule, subtrees are indicated by ^ instead of #.

ANTLR 2:
expr : #(PLUS expr expr) ;

ANTLR 3:
expr : ^(PLUS expr expr) ;
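A ^(PLUS expr expr) pattern matches a PLUS node with exactly two subtrees, each matched by expr again. The sketch below walks such trees by hand in Java. The Ast record, the INT alternative and the evaluation actions are assumptions added for illustration; they are not part of the original grammar.

```java
import java.util.List;

class TreeWalker {
    // Hand-written analogue of a tree-grammar rule such as
    //   expr returns [int value]
    //       : ^(PLUS a=expr b=expr) { $value = $a.value + $b.value; }
    //       | INT                   { $value = Integer.parseInt($INT.text); }
    //       ;
    // (the INT alternative and the actions are hypothetical additions)
    static int expr(Ast t) {
        switch (t.type()) {
            case "PLUS":
                // ^(PLUS expr expr): root PLUS with exactly two subtrees
                if (t.kids().size() != 2) {
                    throw new IllegalArgumentException("PLUS needs two children");
                }
                return expr(t.kids().get(0)) + expr(t.kids().get(1));
            case "INT":
                return Integer.parseInt(t.text());
            default:
                throw new IllegalArgumentException("unexpected node " + t.type());
        }
    }

    public static void main(String[] args) {
        // Tree for 1 + (2 + 3), i.e. ^(PLUS 1 ^(PLUS 2 3))
        Ast tree = new Ast("PLUS", "+", List.of(
                Ast.leaf("INT", "1"),
                new Ast("PLUS", "+", List.of(Ast.leaf("INT", "2"), Ast.leaf("INT", "3")))));
        System.out.println(expr(tree)); // 6
    }
}

// Toy tree node: token type name, token text and child nodes (hypothetical).
record Ast(String type, String text, List<Ast> kids) {
    static Ast leaf(String type, String text) {
        return new Ast(type, text, List.of());
    }
}
```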

Error handling

To disable generation of the standard exception handling code in the parser:

ANTLR 2:
options
{ defaultErrorHandler=false; }

ANTLR 3:
@rulecatch { }

Furthermore, in ANTLR 3 the parser does not raise an exception by default when a token is mismatched in the middle of an alternative. To make it raise one, the parser must override the mismatch() method of BaseRecognizer. The default implementation looks like this:

protected void mismatch(IntStream input, int ttype, BitSet follow)
    throws RecognitionException
{
    MismatchedTokenException mte = new MismatchedTokenException(ttype, input);
    recoverFromMismatchedToken(input, mte, ttype, follow);
}

To fail immediately on error, override this method. The override should construct the exception as above and throw it, rather than calling recoverFromMismatchedToken():

@members {
    // raise exception, rather than recovering, on mismatched token within alt
    protected void mismatch(IntStream input, int ttype, BitSet follow)
        throws RecognitionException
    {
        throw new MismatchedTokenException(ttype, input);
    }
}

Lexer equivalent: ?
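The override relies on a single overridable hook that either records the error and continues, or throws. That design can be sketched without the ANTLR runtime. Everything below (ToyRecognizer, FailFastRecognizer, ToyMismatchException) is hypothetical, and the lenient recovery is deliberately simplistic compared with ANTLR's own.

```java
import java.util.List;

class MismatchDemo {
    // Parses "ID = ID ;" with the given recognizer.
    static void assignment(ToyRecognizer r) throws ToyMismatchException {
        r.match("ID");
        r.match("=");
        r.match("ID");
        r.match(";");
    }

    public static void main(String[] args) {
        List<String> bad = List.of("ID", "ID", ";"); // the "=" is missing
        ToyRecognizer lenient = new ToyRecognizer(bad);
        try {
            assignment(lenient);
        } catch (ToyMismatchException e) {
            throw new AssertionError("lenient recognizer should not throw");
        }
        System.out.println("lenient errors: " + lenient.errorCount); // 1
        try {
            assignment(new FailFastRecognizer(bad));
        } catch (ToyMismatchException e) {
            System.out.println("fail-fast: " + e.getMessage()); // expected = at token 1
        }
    }
}

// Stand-in for RecognitionException / MismatchedTokenException.
class ToyMismatchException extends Exception {
    ToyMismatchException(String expected, int index) {
        super("expected " + expected + " at token " + index);
    }
}

// Toy recognizer: match() hands failures to an overridable mismatch() hook.
class ToyRecognizer {
    protected final List<String> tokens;
    protected int p = 0;
    int errorCount = 0;

    ToyRecognizer(List<String> tokens) { this.tokens = tokens; }

    void match(String expected) throws ToyMismatchException {
        if (p < tokens.size() && tokens.get(p).equals(expected)) {
            p++;
            return;
        }
        mismatch(expected);
    }

    // Default: count the error and carry on as if the token had been present.
    protected void mismatch(String expected) throws ToyMismatchException {
        errorCount++;
    }
}

// Fail-fast variant: build the exception and throw it instead of recovering.
class FailFastRecognizer extends ToyRecognizer {
    FailFastRecognizer(List<String> tokens) { super(tokens); }

    @Override
    protected void mismatch(String expected) throws ToyMismatchException {
        throw new ToyMismatchException(expected, p);
    }
}
```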

Case-Insensitivity

ANTLR 2:
options {
    caseSensitive=false;
}

ANTLR 3:
No equivalent option, but see the FAQ entry "How do I get case insensitivity?"
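One commonly used approach assumes you control the character stream. The stream's LA() folds case, while the token text keeps the original spelling, so the lexer rules can be written with lower-case literals only. NoCaseStream and matchKeyword are hypothetical names, modelled loosely on the CharStream methods LA(), consume(), index() and substring().

```java
class NoCaseDemo {
    // Tries to match a lower-case keyword at the current position.
    // Returns the matched text in its original case, or null on failure.
    static String matchKeyword(NoCaseStream in, String keyword) {
        int start = in.index();
        for (int k = 0; k < keyword.length(); k++) {
            if (in.LA(k + 1) != keyword.charAt(k)) return null;
        }
        for (int k = 0; k < keyword.length(); k++) in.consume();
        return in.substring(start, in.index() - 1);
    }

    public static void main(String[] args) {
        System.out.println(matchKeyword(new NoCaseStream("SeLeCt *"), "select")); // SeLeCt
        System.out.println(matchKeyword(new NoCaseStream("insert"), "select"));   // null
    }
}

// Hypothetical character stream: LA() returns lower-cased characters for
// matching, while substring() returns the text exactly as written.
class NoCaseStream {
    static final int EOF = -1;
    private final String data;
    private int p = 0;

    NoCaseStream(String data) { this.data = data; }

    // 1-based lookahead: LA(1) is the next character.
    int LA(int i) {
        int idx = p + i - 1;
        if (idx < 0 || idx >= data.length()) return EOF;
        return Character.toLowerCase(data.charAt(idx));
    }

    void consume() { if (p < data.length()) p++; }

    int index() { return p; }

    // Stop index is inclusive, as in ANTLR's substring(start, stop).
    String substring(int start, int stop) { return data.substring(start, stop + 1); }
}
```

Folding to upper case works just as well. The only requirement is that every literal in the lexer uses the case that LA() returns.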

New in ANTLR 3

Finally action

Code in the @finally{...} action executes in the finally block (Java target), after everything else in the rule, including rule memoization. Example:

foo
@finally { i=j; }
    : FOO ;
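The following plain Java gives a rough picture of where the @finally code ends up. The class, the token list and ToyRecognitionException are hypothetical stand-ins, and the catch block only approximates ANTLR's default error reporting and recovery.

```java
import java.util.List;

class FinallyRuleDemo {
    int i = 0;
    int j = 7;
    int reportedErrors = 0;
    private final List<String> tokens;
    private int p = 0;

    FinallyRuleDemo(List<String> tokens) { this.tokens = tokens; }

    private void match(String type) throws ToyRecognitionException {
        if (p < tokens.size() && tokens.get(p).equals(type)) {
            p++;
            return;
        }
        throw new ToyRecognitionException("expected " + type);
    }

    // Rough shape of a Java rule method for
    //   foo @finally { i=j; } : FOO ;
    void foo() {
        try {
            match("FOO");
        } catch (ToyRecognitionException re) {
            reportedErrors++; // stands in for default report-and-recover code
        } finally {
            i = j; // the @finally action: runs whether or not the rule matched
        }
    }

    public static void main(String[] args) {
        FinallyRuleDemo ok = new FinallyRuleDemo(List.of("FOO"));
        ok.foo();
        FinallyRuleDemo bad = new FinallyRuleDemo(List.of("BAR"));
        bad.foo();
        System.out.println(ok.i + " " + bad.i + " " + bad.reportedErrors); // 7 7 1
    }
}

// Stand-in for org.antlr.runtime.RecognitionException.
class ToyRecognitionException extends Exception {
    ToyRecognitionException(String msg) { super(msg); }
}
```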

Changes in ANTLR Runtime Support Code

Java

General API Reorganisation

Runtime classes are now all under org.antlr.runtime. 

ANTLR 2 class            ANTLR 3 class
antlr.collections.AST    org.antlr.runtime.tree.Tree
antlr.Token              org.antlr.runtime.Token

Lookahead in Actions and Semantic Predicates

If your actions or semantic predicates used the LT() or LA() methods of ANTLR 2, these will need to be prefixed with 'input.' in ANTLR 3, as the methods are no longer defined by the parser class.

ANTLR 2:
{LA(1)==LCURLY}? (block)
{LT(1).getText().equals("namespace")}? IDENT
// in lexer
{ LA(2)!='/' }? '*'

ANTLR 3:
{input.LA(1)==LCURLY}? (block)
{input.LT(1).getText().equals("namespace")}? IDENT
// in lexer
{ input.LA(2)!='/' }? '*'
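The sketch below shows where LA() and LT() now live. It builds a toy token stream and a parser that evaluates the two parser predicates above through its 'input' field. The classes and token type numbers are hypothetical; only the 1-based LA()/LT() convention follows ANTLR.

```java
import java.util.List;

class PredicateDemo {
    // Hypothetical token type numbers.
    static final int IDENT = 4;
    static final int LCURLY = 5;

    final LookStream input; // lookahead now goes through the stream, not the parser

    PredicateDemo(LookStream input) { this.input = input; }

    // {input.LA(1)==LCURLY}?
    boolean atBlock() {
        return input.LA(1) == LCURLY;
    }

    // {input.LT(1).getText().equals("namespace")}? IDENT
    boolean atNamespaceKeyword() {
        return input.LA(1) == IDENT && input.LT(1).getText().equals("namespace");
    }

    public static void main(String[] args) {
        PredicateDemo parser = new PredicateDemo(new LookStream(List.of(
                new LookToken(IDENT, "namespace"), new LookToken(LCURLY, "{"))));
        System.out.println(parser.atNamespaceKeyword() + " " + parser.atBlock()); // true false
        parser.input.consume();
        System.out.println(parser.atNamespaceKeyword() + " " + parser.atBlock()); // false true
    }
}

// Toy token with a type and text.
class LookToken {
    final int type;
    final String text;

    LookToken(int type, String text) { this.type = type; this.text = text; }

    String getText() { return text; }
}

// Toy token stream: LT(1) is the next token, LT(2) the one after it.
class LookStream {
    static final int EOF = -1;
    private final List<LookToken> tokens;
    private int p = 0;

    LookStream(List<LookToken> tokens) { this.tokens = tokens; }

    LookToken LT(int i) {
        int idx = p + i - 1;
        return (idx >= 0 && idx < tokens.size()) ? tokens.get(idx) : new LookToken(EOF, "<EOF>");
    }

    int LA(int i) { return LT(i).type; }

    void consume() { p++; }
}
```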

Newline Tracking in Lexical Actions

ANTLR 3 tracks newlines by itself. If your ANTLR 2 lexical actions included calls to 'newline()', these must be removed, because the method no longer exists.
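In ANTLR 3 the bookkeeping that newline() used to do by hand happens as characters are consumed. Here is a minimal sketch using a hypothetical LineTrackingStream class. As in ANTLR 3's getLine() and getCharPositionInLine(), lines count from 1 and columns from 0.

```java
// Toy character stream (hypothetical) that updates line and column itself.
class LineTrackingStream {
    private final String data;
    private int p = 0;
    private int line = 1;               // lines are numbered from 1
    private int charPositionInLine = 0; // columns are numbered from 0

    LineTrackingStream(String data) { this.data = data; }

    void consume() {
        if (p >= data.length()) return;
        if (data.charAt(p) == '\n') {
            line++;                 // what newline() used to do in ANTLR 2
            charPositionInLine = 0;
        } else {
            charPositionInLine++;
        }
        p++;
    }

    int getLine() { return line; }

    int getCharPositionInLine() { return charPositionInLine; }

    public static void main(String[] args) {
        LineTrackingStream s = new LineTrackingStream("ab\ncd\ne");
        for (int k = 0; k < 7; k++) s.consume();
        System.out.println(s.getLine() + ":" + s.getCharPositionInLine()); // 3:1
    }
}
```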

No More XXXTokenTypes Interface

ANTLR 3 no longer generates the XXXTokenTypes interface for a grammar 'XXX'. The constants are now generated directly in both the parser and lexer implementation classes.

ANTLR 2:
MyGrammarTokenTypes.LBRACK

ANTLR 3:
MyGrammar.LBRACK

AWOL

Stuff that existed in ANTLR 2 which has no equivalent in ANTLR 3 yet (or which [Terr] just hasn't explained enough times on the mailing list for it to sink in):

Paraphrase

In ANTLR 2 it was possible to set a paraphrase on a token, which gave more comprehensible error messages:

RIGHT_PAREN
options { paraphrase = "a closing parenthesis ')'"; }
    : ')'
    ;

Per-Token AST Type Specs

ANTLR 2 allowed the grammar to specify an AST implementation class per token type.

tokens {
    COMPILATION_UNIT;
    TYPE_BLOCK<AST=uk.co.badgersinfoil.metaas.impl.ParentheticAST>;
    "import"<AST=uk.co.badgersinfoil.metaas.impl.ExprStmtAST>;
}

A workaround in ANTLR 3 might be to implement this 'by hand' in a custom TreeAdaptor implementation.
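Here is a sketch of that workaround: an adaptor that chooses a node class from the token type and falls back to a generic node. All classes here are hypothetical stand-ins. In the real runtime, this logic would presumably go in a subclass of CommonTreeAdaptor that overrides its node-creation method; treat that as an assumption rather than a recipe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

class AdaptorDemo {
    // Hypothetical token type numbers.
    static final int COMPILATION_UNIT = 4;
    static final int TYPE_BLOCK = 5;
    static final int IMPORT = 6;

    public static void main(String[] args) {
        // Mirrors TYPE_BLOCK<AST=...ParentheticAST> and "import"<AST=...ExprStmtAST>.
        PerTypeAdaptor adaptor = new PerTypeAdaptor()
                .register(TYPE_BLOCK, ParentheticNode::new)
                .register(IMPORT, ExprStmtNode::new);
        System.out.println(adaptor.create(TYPE_BLOCK, "{").kind());        // ParentheticNode
        System.out.println(adaptor.create(IMPORT, "import").kind());      // ExprStmtNode
        System.out.println(adaptor.create(COMPILATION_UNIT, "CU").kind()); // TypedNode
    }
}

// Adaptor that picks a node class per token type, defaulting to TypedNode.
class PerTypeAdaptor {
    private final Map<Integer, BiFunction<Integer, String, TypedNode>> factories = new HashMap<>();

    PerTypeAdaptor register(int type, BiFunction<Integer, String, TypedNode> factory) {
        factories.put(type, factory);
        return this;
    }

    TypedNode create(int type, String text) {
        return factories.getOrDefault(type, TypedNode::new).apply(type, text);
    }
}

// Generic node plus two specialised node classes (all hypothetical).
class TypedNode {
    final int type;
    final String text;

    TypedNode(int type, String text) { this.type = type; this.text = text; }

    String kind() { return "TypedNode"; }
}

class ParentheticNode extends TypedNode {
    ParentheticNode(int type, String text) { super(type, text); }

    @Override String kind() { return "ParentheticNode"; }
}

class ExprStmtNode extends TypedNode {
    ExprStmtNode(int type, String text) { super(type, text); }

    @Override String kind() { return "ExprStmtNode"; }
}
```

A map of factories keeps the per-type choice in one place. That roughly matches what the ANTLR 2 tokens{} annotations expressed declaratively.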
