The Abstract Syntax Tree (AST) is the base framework for many powerful tools of the Eclipse IDE, including Semantic Highlighting, Refactoring, Quick Fix and Quick Assist. The Abstract Syntax Tree maps plain PHP source code into a tree form. This tree is more convenient and reliable to analyze and modify programmatically than text-based source. This part of the article shows how you can use the Abstract Syntax Tree to extend the Eclipse PHP Development Tools (PDT) for your own applications. This article is based on the article "Abstract Syntax Tree" (JDT) by Thomas Kuhn and Olivier Thomann.
The AST is comparable to the DOM tree model of an XML file. Just like with DOM, the AST allows you to modify the tree model and reflects these modifications in the PHP source code.
This part of the article refers to an example application which covers most of the interesting AST-related topics. Let us have a look at the application that was built to illustrate this article:
According to PHP Practices [4], you should not declare local variables before using them. The goal of our application is to detect variable declarations that contradict this practice and to move them to their correct place. There are three cases our application has to deal with:
1. Removal of unnecessary declaration. If a variable is declared and initialized, only to be overridden by another assignment later on, the first declaration of the variable is an unnecessary declaration.
2. Move of declaration. If a variable is declared, and not immediately referenced within the following statement, this variable declaration has to be moved. The correct place for the declaration is the line before it is first referenced.
3. Move of declaration of a variable that is referred to from within different blocks. This is a subcase of case 2. Imagine that a variable is used in both a try clause and a catch clause. Here the declaration cannot be moved right before the first reference in the try clause, since then it would not be declared in the catch clause. Our application has to deal with that and has to move the declaration to the best possible place, which in this case is one line above the try clause.
You can import the example application into your workspace [1] or install the plug-in using the Eclipse Update Manager [2].
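A typical workflow of an application using AST looks like this:
1. PHP source: the source code to parse, supplied either as a PHP file or as a char[] that contains PHP source.
2. Parse: the source code of step 1 is parsed by the ASTParser. See the section called "Parsing source code".
3. The Abstract Syntax Tree is the result of step 2.
4. Manipulating the AST: If the AST of point 3 needs to be changed, this can be done in two ways: by modifying the tree directly, or by recording the modifications in a separate protocol that is managed by an instance of ASTRewrite.
5. Writing changes back: the modifications of point 4 are written back into the source code of step 1.
6. IDocument: Is a wrapper for the source code of step 1 and is needed at point 5.
As mentioned, the Abstract Syntax Tree is the way that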
Eclipse looks at your source code: every PHP source file is entirely represented as a tree of AST nodes. These nodes are all subclasses of ASTNode. Every subclass is specialized for an element of the PHP programming language. E.g. there are nodes for method declarations (MethodDeclaration), class declarations (ClassDeclaration), assignments and so on. One very frequently used node is Identifier. An Identifier is any string of PHP source that is not a keyword or a scalar (Scalar). For example, in $i = 6 + $j;, $i and $j are represented by Identifier nodes.
All AST-relevant classes are located in the package org.eclipse.php.core.dom of the org.eclipse.php.core plug-in.
To discover how code is represented as AST, the AST Viewer plug-in [4] is a big help: Once installed you can simply mark source code in the editor and let it be displayed in a tree form in the AST Viewer view.
Most of the time, an AST is not created from scratch, but rather parsed from existing PHP code. This is done using the ASTParser. It processes whole PHP files as well as portions of PHP code. In the example application the method Program parse(ISourceModule lwUnit) of the class AbstractASTArticle parses the source code stored in the file that lwUnit points to:
protected Program parse(ISourceModule lwUnit) {
    ASTParser parser = ASTParser.newParser(ASTParser.VERSION_PHP5, lwUnit);
    try {
        return (Program) parser.createAST(null /* IProgressMonitor */);
    } catch (Exception e) {
        // parsing failed; callers have to handle a null result
        return null;
    }
}
With ASTParser.newParser(ASTParser.VERSION_PHP5, lwUnit), we advise the parser to parse the code according to the PHP language specification, including all syntax up to that introduced in PHP 5. An ISourceModule is a pointer to a PHP file and is used to resolve the binding information of this script. The parser supports five kinds of input:
Entire source file: The parser expects the source either as a pointer to a PHP file (which means as an ISourceModule, see the section called "PHP Model") or as char[].
The PHP Model is a whole different story. Diving deep into its details is out of the scope of this article. The parts looked at here are the ones that intersect with the AST. The motivation for discussing it is to use it as an entry point for building an Abstract Syntax Tree of a source file. Remember, the ISourceModule is one of the possible parameters for the AST parser.
The PHP Model represents a PHP Project in a tree structure, which is visualized by the well known "Package Explorer" view:
The nodes of the PHP Model implement one of the following interfaces:
IScriptProject: Is the node of the PHP Model that represents a PHP project. It contains IProjectFragment as child nodes.
IProjectFragment: Represents a project fragment, and maps the contents to an underlying resource which is either a folder, JAR, or ZIP file.
IScriptFolder: Represents a folder containing script files.
ISourceModule: Represents a PHP source file.
IType: Represents a class or interface in a source file.
IField: Represents a field or constant in an IType.
IMethod: Represents a function in a source file or a method in a class or interface.
In contrast to the AST, these nodes are lightweight handles. It costs much less to rebuild a portion of the PHP Model than to rebuild an AST. That is also one reason why the PHP Model is not only defined down to the level of ISourceModule. There are many cases where complete information, like that provided by the AST, is not needed. One example is the Outline view: this view does not need to know the contents of a method body. It is more important that it can be rebuilt fast, to keep in sync with its source code.
There are different ways to get an ISourceModule. The example applications are launched as actions from the package tree view. This is quite convenient: just add an objectContribution extension to the extension point org.eclipse.ui.popupMenus. By choosing org.eclipse.dltk.core.ISourceModule as objectClass, the action is only displayed in the context menu of a source module. Have a look at the example application's plugin.xml. The source module can then be retrieved from the ISelection that is passed to the action's delegate (in the example, this is ASTArticleActionDelegate).
Another, programmatic, approach is to get the project handle from the IDE and to look for the source module. This can be done either by stepping down the PHP Model tree to collect the desired ISourceModules, or by calling the findType() method of the PHP project:
IWorkspaceRoot root = ResourcesPlugin.getWorkspace().getRoot();
IProject project = root.getProject("somePHPProject");
project.open(null /* IProgressMonitor */);
IScriptProject PHPProject = DLTKCore.create(project);
IType lwType = PHPProject.findType("MyClass");
ISourceModule lwSourceModule = lwType.getSourceModule();
Even a simple "Hello world" program results in a quite complex tree. How does one get the FunctionInvocation of that println("Hello World")? Scanning all the levels is possible, but not very convenient. There is a better solution: every ASTNode allows querying for a child node by using a visitor (visitor pattern [5]).
Have a look at AbstractVisitor. There you'll find two methods for every subclass of ASTNode, one called visit(), the other called endVisit(). Further, the AbstractVisitor declares these two methods: preVisit(ASTNode node) and postVisit(ASTNode node). The subclass of AbstractVisitor is passed to any node of the AST. The AST will recursively step through the tree, calling the mentioned methods of the visitor for every AST node in this order (for the example of a MethodInvocation):
1. preVisit(ASTNode node)
2. visit(MethodInvocation node)
3. endVisit(MethodInvocation node)
4. postVisit(ASTNode node)
In our example application, the LocalVariableDetector is a subclass of AbstractVisitor. It is used, amongst other things, to collect all local variable declarations of a compilation unit:
public boolean visit(VariableDeclarationStatement node) {
    for (Iterator iter = node.fragments().iterator(); iter.hasNext();) {
        VariableDeclarationFragment fragment = (VariableDeclarationFragment) iter.next();
        // ... store these fragments somewhere
    }
    return false; // prevent the declared name from being interpreted as a reference
}
If false is returned from visit(), the subtree of the visited node will not be considered. This makes it possible to ignore parts of the AST.
In the example, process(Program program) is called from the outside to start visiting the program. The function is fairly simple:
public void process(Program program) {
    program.accept(this);
}
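The call order and the effect of returning false from visit() can be tried out without Eclipse. The following is a minimal sketch using a toy node hierarchy; ToyNode, ToyBlock, ToyInvocation, ToyVisitor and InvocationCollector are hypothetical names invented for this illustration and are not part of the PDT API.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-ins for AST nodes (hypothetical, not the PDT classes).
abstract class ToyNode {
    final List<ToyNode> children = new ArrayList<>();

    ToyNode add(ToyNode child) {
        children.add(child);
        return this;
    }

    abstract void accept(ToyVisitor visitor);

    void acceptChildren(ToyVisitor visitor) {
        for (ToyNode child : children) {
            child.accept(visitor);
        }
    }
}

class ToyBlock extends ToyNode {
    @Override
    void accept(ToyVisitor v) {
        v.preVisit(this);
        if (v.visit(this)) {        // children are skipped when visit() returns false
            acceptChildren(v);
        }
        v.endVisit(this);
        v.postVisit(this);
    }
}

class ToyInvocation extends ToyNode {
    final String name;

    ToyInvocation(String name) {
        this.name = name;
    }

    @Override
    void accept(ToyVisitor v) {
        v.preVisit(this);
        if (v.visit(this)) {
            acceptChildren(v);
        }
        v.endVisit(this);
        v.postVisit(this);
    }
}

// Default visitor: visits everything and does nothing.
class ToyVisitor {
    void preVisit(ToyNode node) {}
    void postVisit(ToyNode node) {}
    boolean visit(ToyBlock node) { return true; }
    void endVisit(ToyBlock node) {}
    boolean visit(ToyInvocation node) { return true; }
    void endVisit(ToyInvocation node) {}
}

// Collects invocation names but does not descend into an invocation's subtree.
class InvocationCollector extends ToyVisitor {
    final List<String> names = new ArrayList<>();

    @Override
    boolean visit(ToyInvocation node) {
        names.add(node.name);
        return false;
    }
}

class VisitorDemo {
    static List<String> collect() {
        ToyNode root = new ToyBlock()
                .add(new ToyInvocation("println").add(new ToyInvocation("hidden")))
                .add(new ToyBlock().add(new ToyInvocation("strlen")));
        InvocationCollector collector = new InvocationCollector();
        root.accept(collector);
        return collector.names;
    }

    public static void main(String[] args) {
        System.out.println(collect());
    }
}
```

The invocation named "hidden" sits below "println" and is never reached, because the collector returns false for its parent; the nested block is still traversed because the default visit(ToyBlock) returns true.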
Every subclass of ASTNode contains specific information for the PHP element it represents. E.g. a FunctionDeclaration will contain information about the name, return type, parameters, etc. The information of a node is referred to as structural properties. Let us have a closer look at the characteristics of the structural properties. Below you see the properties of this function declaration:
function println($content) {
    echo $content . '<BR/>' ;
}
Access to the values of a node's structural properties can be made using static or generic methods:
static methods: every node offers methods to access its properties: e.g. getName(), etc.
generic method: ask for a property value using the getStructuralProperty(StructuralPropertyDescriptor property) method. Every AST subclass defines a set of StructuralPropertyDescriptors, one for every structural property. The StructuralPropertyDescriptors can be accessed directly on the class to which they belong: e.g. FunctionDeclaration.NAME_PROPERTY. A list of all available StructuralPropertyDescriptors of a node can be retrieved by calling the method structuralPropertiesForType() on any instance of ASTNode.
The structural properties are grouped into three different kinds: properties that hold simple values, properties which contain a single child AST node and properties which contain a list of child AST nodes.
SimplePropertyDescriptor: The value will be a String, a primitive value wrapper for either Integer or Boolean, or a basic AST constant. For a list of all possible value classes of a simple property, see Appendix C, Simple properties value classes.
ChildPropertyDescriptor: The value will be a node, an instance of an ASTNode subclass.
ChildListPropertyDescriptor: The value will be a List of AST nodes.
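The difference between the typed accessor and the generic, descriptor-based accessor can be sketched with a toy class. Everything below (ToyPropertyDescriptor, ToyFunctionDeclaration and their members) is a hypothetical illustration of the idea, not the PDT classes; in particular, the name is stored as a plain string here only to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical descriptor: identifies one structural property and its kind.
final class ToyPropertyDescriptor {
    enum Kind { SIMPLE, CHILD, CHILD_LIST }

    final String id;
    final Kind kind;

    ToyPropertyDescriptor(String id, Kind kind) {
        this.id = id;
        this.kind = kind;
    }
}

// Hypothetical node with one simple property and one list property.
class ToyFunctionDeclaration {
    static final ToyPropertyDescriptor NAME_PROPERTY =
            new ToyPropertyDescriptor("name", ToyPropertyDescriptor.Kind.SIMPLE);
    static final ToyPropertyDescriptor PARAMETERS_PROPERTY =
            new ToyPropertyDescriptor("parameters", ToyPropertyDescriptor.Kind.CHILD_LIST);

    private final Map<ToyPropertyDescriptor, Object> values = new LinkedHashMap<>();

    ToyFunctionDeclaration(String name, List<String> parameters) {
        values.put(NAME_PROPERTY, name);
        values.put(PARAMETERS_PROPERTY, new ArrayList<>(parameters));
    }

    // Typed accessor, comparable to getName() on a real node.
    String getName() {
        return (String) values.get(NAME_PROPERTY);
    }

    // Generic accessor: the caller only needs a descriptor.
    Object getStructuralProperty(ToyPropertyDescriptor property) {
        if (!values.containsKey(property)) {
            throw new IllegalArgumentException("Unknown property: " + property.id);
        }
        return values.get(property);
    }

    // All descriptors this node type knows about.
    List<ToyPropertyDescriptor> structuralProperties() {
        return new ArrayList<>(values.keySet());
    }
}

class StructuralPropertyDemo {
    static ToyFunctionDeclaration sample() {
        return new ToyFunctionDeclaration("println", Arrays.asList("$content"));
    }

    public static void main(String[] args) {
        ToyFunctionDeclaration fn = sample();
        for (ToyPropertyDescriptor d : fn.structuralProperties()) {
            System.out.println(d.id + " (" + d.kind + ") = " + fn.getStructuralProperty(d));
        }
    }
}
```

The generic accessor is what makes tools such as a tree viewer possible: they can walk any node by iterating over its descriptors without knowing the concrete node class.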
The AST, as far as we know it, is just a tree-form representation of source code. Every element of the source code is mapped to a node or a subtree. A reference to a variable, let's say $i, is represented by an instance of Identifier with "i" as the value of its IDENTIFIER property. Bindings go one step further: they provide extended resolved information for several elements of the AST. About the Identifier above they tell us that it is a reference to a local variable of type int.
Various subclasses of ASTNode have binding information. It is retrieved by calling resolveBinding() on these classes. There are cases where more than one binding is available: e.g. the class MethodInvocation returns a binding to the method that is invoked (resolveMethodBinding()) as well as a binding to the return type of the method (resolveTypeBinding()).
Since evaluating bindings is costly, the binding service has to be explicitly requested at parse time. This is done by passing the relevant ISourceModule to the method ASTParser.newParser() before the source is parsed.
In the following code, the reference to the variable i is represented by an Identifier:
$i = 7;
echo 'Hello!';
$x = $i * 2;
Without bindings you would know nothing more than that. Bindings provide more information.
Bindings allow you to comfortably find out to which declaration a reference belongs, as well as to detect whether two elements are references to the same element: if they are, the bindings returned by reference nodes and declaration nodes are identical. For example, all Identifiers that represent a reference to a local variable i return the same instance of IVariableBinding from Identifier.resolveBinding(). The declaration node returns the same instance of IVariableBinding from Identifier.resolveBinding(), too. If there is another usage of a local variable i (within another method or block), another instance of IVariableBinding is returned. Confusion caused by equally named elements is avoided if bindings are used to identify an element (variable, method, type, etc.).
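The identity guarantee described above can be illustrated with a small stand-alone sketch. ToyVariableBinding and ToyBindingResolver are hypothetical names made up for this example; the real resolution logic in PDT is far more involved, and only the "same variable, same instance" idea is modelled here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical binding: one instance per declared variable.
final class ToyVariableBinding {
    final String scope;
    final String name;

    ToyVariableBinding(String scope, String name) {
        this.scope = scope;
        this.name = name;
    }
}

// Hypothetical resolver: hands out exactly one binding per (scope, name) pair.
class ToyBindingResolver {
    private final Map<String, Map<String, ToyVariableBinding>> scopes = new HashMap<>();

    ToyVariableBinding resolve(String scope, String name) {
        return scopes
                .computeIfAbsent(scope, s -> new HashMap<>())
                .computeIfAbsent(name, n -> new ToyVariableBinding(scope, n));
    }
}

class BindingDemo {
    // Two references denote the same variable exactly when their bindings are identical.
    static boolean sameVariable(ToyVariableBinding a, ToyVariableBinding b) {
        return a == b;
    }

    public static void main(String[] args) {
        ToyBindingResolver resolver = new ToyBindingResolver();
        ToyVariableBinding first = resolver.resolve("foo", "i");
        ToyVariableBinding second = resolver.resolve("foo", "i");
        ToyVariableBinding other = resolver.resolve("bar", "i");
        System.out.println(sameVariable(first, second)); // same function, same variable
        System.out.println(sameVariable(first, other));  // equally named, different function
    }
}
```

Comparing bindings by identity rather than comparing names is what keeps an equally named variable in another function from being mistaken for the one under analysis.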
This section will show how to modify an AST and how to store these modifications back into PHP source code.
New AST nodes may have to be created. New nodes are created by using the class org.eclipse.php.core.dom.AST (here AST is the name of an actual class; do not confuse it with the abbreviation "AST" used within this article). Have a look at this class: it offers methods to create every AST node type. An instance of AST is created when source code is parsed. This instance can be obtained from every node of the tree by calling the method getAST(). The newly created nodes can only be added to the tree that the AST instance was retrieved from.
Often it is convenient to reuse an existing subtree of an AST and maybe just change some details. AST nodes cannot be re-parented: once connected to an AST, they cannot be attached to a different place of the tree. It is easy, though, to create a copy of a subtree: (Expression) ASTNode.copySubtree(ast, node). The parameter ast is the target AST. This instance will be used to create the new nodes. That allows copying nodes from another AST (established by another parser run) into the current AST domain.
There are two APIs to track modifications on an AST: either you can directly modify the tree or you can make use of a separate protocol, managed by an instance of ASTRewrite. The latter, using the ASTRewrite, is the more sophisticated and preferable way. The changes are noted by an instance of ASTRewrite; the original AST is left untouched. It is possible to create more than one instance of ASTRewrite for the same AST, which means that different change logs can be set up. "Quick Fix" makes use of this API: this is how a preview is created for every Quick Fix proposal.
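The idea of several independent change logs over one untouched original can be sketched as follows. ToyListRewrite is a hypothetical, heavily simplified stand-in that works on a list of statement strings; it is not the ASTRewrite API and only mirrors the "record now, apply later" principle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical change log over a list of statements; the original list is never modified.
class ToyListRewrite {
    private final List<String> original;
    private final List<String> removed = new ArrayList<>();
    private final List<String> inserted = new ArrayList<>();

    ToyListRewrite(List<String> original) {
        this.original = original;
    }

    // Record that a statement should be appended.
    ToyListRewrite insertLast(String statement) {
        inserted.add(statement);
        return this;
    }

    // Record that a statement should be removed.
    ToyListRewrite remove(String statement) {
        removed.add(statement);
        return this;
    }

    // Compute the rewritten list on demand, leaving the original as it was.
    List<String> rewrittenList() {
        List<String> result = new ArrayList<>(original);
        result.removeAll(removed);
        result.addAll(inserted);
        return result;
    }
}

class RewriteDemo {
    static List<String> original() {
        return Collections.unmodifiableList(Arrays.asList("$i = 7;", "echo 'Hello!';"));
    }

    public static void main(String[] args) {
        List<String> statements = original();
        // Two independent proposals for the same original, as a Quick Fix preview would need.
        ToyListRewrite proposalA = new ToyListRewrite(statements).insertLast("$x = $i * 2;");
        ToyListRewrite proposalB = new ToyListRewrite(statements).remove("echo 'Hello!';");
        System.out.println(proposalA.rewrittenList());
        System.out.println(proposalB.rewrittenList());
        System.out.println(statements);
    }
}
```

Because each proposal only records its own changes, any number of previews can be computed from the same original, and discarding a proposal costs nothing.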
Example 1. Recording changes to an AST by using ASTRewrite.
MethodDeclaration md = ast.newMethodDeclaration();
md.setName(ast.newName("foo"));
ASTRewrite rewriter = ASTRewrite.create(ast);
ClassDeclaration td = (ClassDeclaration) cu.statements().get(0);
ITrackedNodePosition tdLocation = rewriter.track(td);
ListRewrite lrw = rewriter.getListRewrite(cu, Program.METHODS_PROPERTY);
lrw.insertLast(md, null);
To change the name of a MethodInvocation, for example, the code would look like this:
rewrite.set(methodInvocation, MethodInvocation.NAME_PROPERTY, newName, null);
or
rewrite.replace(methodInvocation.getName() /* old name node */, newName, null);
To set a simple property value, call set() as shown above.
Let us have a look at the second way to change an AST. Instead of tracking the modifications in separate protocols, we directly modify the AST. The only thing that has to be done before modifying the first node is to turn on the change recording by calling recordModifications() on the root of the AST, the Program. Internally, changes are logged to an ASTRewrite as well, but this is hidden from you.
Example 2. Modifying an AST directly.
program.recordModifications();
AST ast = program.getAST();
EchoStatement echo = ast.newEchoStatement();
echo.setExpression(ast.newScalar("Hello World"));
program.statements().add(echo);
The next section will tell how to write the modifications back into PHP source code.
Once you have tracked changes, either by using ASTRewrite or by modifying the tree nodes directly, these changes can be written back into PHP source code. To do so, a TextEdit object has to be created. Here we leave the code-related area of the AST and enter a text-based environment. The TextEdit object contains character-based modification information. It is part of the org.eclipse.text plug-in.
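What "character-based modification information" means can be shown with a small sketch that applies offset-based edits to a string. ToyReplaceEdit and ToyEdits are hypothetical names for this illustration and do not reproduce the TextEdit classes; the sketch assumes the edits do not overlap and that all offsets refer to the unmodified source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical edit: replace `length` characters at `offset` with `text`.
// A length of 0 is a pure insertion, an empty text is a pure deletion.
final class ToyReplaceEdit {
    final int offset;
    final int length;
    final String text;

    ToyReplaceEdit(int offset, int length, String text) {
        this.offset = offset;
        this.length = length;
        this.text = text;
    }
}

class ToyEdits {
    // Applies non-overlapping edits whose offsets all refer to the original source.
    static String apply(String source, List<ToyReplaceEdit> edits) {
        List<ToyReplaceEdit> sorted = new ArrayList<>(edits);
        // Work from the end of the text backwards so earlier offsets stay valid.
        sorted.sort(Comparator.comparingInt((ToyReplaceEdit e) -> e.offset).reversed());
        StringBuilder result = new StringBuilder(source);
        for (ToyReplaceEdit edit : sorted) {
            result.replace(edit.offset, edit.offset + edit.length, edit.text);
        }
        return result.toString();
    }
}

class TextEditDemo {
    public static void main(String[] args) {
        String source = "$i = 7;\n$x = $i * 2;\n";
        List<ToyReplaceEdit> edits = Arrays.asList(
                // change the literal 7 to 42
                new ToyReplaceEdit(source.indexOf("7"), 1, "42"),
                // insert a statement in front of the second line
                new ToyReplaceEdit(source.indexOf("$x"), 0, "echo 'Hello!';\n"));
        System.out.print(ToyEdits.apply(source, edits));
    }
}
```

Applying the edits from the highest offset downwards is one simple way to keep offsets stable; it is a design choice of this sketch, not a statement about how the Eclipse text infrastructure works internally.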
How to obtain the TextEdit object differs only slightly between the two mentioned ways:
If you used ASTRewrite, ask the ASTRewrite instance for the desired TextEdit object by calling rewriteAST(IDocument, Map).
If you changed the tree nodes directly, the TextEdit object is created by calling rewrite(IDocument document, Map options) on the Program.
The first parameter, document, contains the source code that will be modified. The content of this container is the same code that you fed into the ASTParser. The second parameter is a map of options for the source code formatter. To use the default options, pass null.
Obtaining an IDocument if you parsed source code from a String is easy: create an object of the class org.eclipse.jface.text.Document and pass the code string as constructor parameter.
If you initially parsed an existing PHP source file and would like to store the changes back into this file, things get a little bit more tricky. You should not directly write into this file, since you might not be the only editor that is manipulating this source file. Within Eclipse, PHP editors do not write directly on a file resource, but on a shared working copy instead.
ITextFileBufferManager bufferManager = FileBuffers.getTextFileBufferManager(); // get the buffer manager
IPath path = unit.getPHPElement().getPath(); // unit: instance of CompilationUnit
try {
    bufferManager.connect(path, null); // (1)
    // retrieve the buffer
    ITextFileBuffer textFileBuffer = bufferManager.getTextFileBuffer(path);
    IDocument document = textFileBuffer.getDocument(); // (2)
    // ... edit the document here ...
    // commit changes to underlying file
    textFileBuffer.commit(null /* ProgressMonitor */, false /* Overwrite */); // (3)
} finally {
    bufferManager.disconnect(path, null); // (4)
}
(1) Connect the path to the buffer manager. After this call, the document for the file described by path can be obtained.
(2) Ask the buffer manager for the buffer by calling getTextFileBuffer. From the ITextFileBuffer we get the IDocument instance we need.
(3) Commit the changes to the underlying file.
(4) Disconnect the path from the buffer manager.
One of the most frustrating parts of modifying an AST is the comment handling. The method CompilationUnit#getCommentList() returns the list of comments located in the compilation unit in ascending order. Unfortunately, this list cannot be modified. This means that even if the AST Rewriter is used to add a comment inside a compilation unit, the new comment will not appear in the comment list.
In order to add a comment the following code snippet can be used:
Program astRoot = ... ; // get the current program
ASTRewrite rewrite = ASTRewrite.create(astRoot.getAST());
// the first statement is expected to be a type declaration; take its body
Block block = ((TypeDeclaration) astRoot.statements().get(0)).getBody();
ListRewrite listRewrite = rewrite.getListRewrite(block, Block.STATEMENTS_PROPERTY);
Statement placeHolder = (Statement) rewrite.createStringPlaceholder("//mycomment", ASTNode.EMPTY_STATEMENT);
listRewrite.insertFirst(placeHolder, null);
TextEdit textEdits = rewrite.rewriteAST(document, null);
textEdits.apply(document);
The methods Program#getExtendedLength(ASTNode) and Program#getExtendedStartPosition(ASTNode) can be used to retrieve the range of a node including its preceding and trailing comments and whitespace.
This article has shown how to use the Eclipse AST for static code analysis and code manipulation issues. It touched the PHP Model, explained Bindings and showed how to store changes made to the AST back into PHP source code.
For remarks, questions, etc. enter a comment in the bugzilla entry of this article [6].
[1] Download the Packed Example Project. Use the option "Existing Projects into Workspace" from the "Import" Wizard to add it to your workspace.
[2] To install the plug-in, use the Eclipse Update Manager. Update Site: http://earticleast.sourceforge.net/update.
[4] AST Viewer Plug-in.
In the introduction, three typical cases for our example application have been presented (see the section called "Example Application"). The before/after code snippets below further clarify these cases.
Removal of unnecessary declaration.
Before:
int x = 0;
...
x = 2 * 3;
After:
...
int x = 2 * 3;
Move of declaration.
Before:
int x = 0;
...
System.out.println(x);
...
x = 2 * 3;
After:
...
int x = 0;
System.out.println(x);
...
x = 2 * 3;
Move of a declaration of a variable, that is used within different blocks.
Before:
int x = 0;
...
try {
x = 2 * 3;
} catch (...) {
System.out.println(x);
}
After:
...
int x = 0;
try {
x = 2 * 3;
} catch (...) {
System.out.println(x);
}
IAnnotationBinding
IMemberValuePairBinding
IMethodBinding
IPackageBinding
ITypeBinding
IVariableBinding
Below is the list of all classes that simple property values can be instances of (in Eclipse version 3.2).
boolean
int
String
Modifier.ModifierKeyword
Assignment.Operator
InfixExpression.Operator
PostfixExpression.Operator
PrefixExpression.Operator
PrimitiveType.Code