Sunday, May 13, 2007

querying source code and other artifacts

Now that there is agreement on the parsed representation of Java source code (the JDT ASTs), the lack of standardization has shifted to the languages used to query such ASTs. What is querying ASTs good for? Some examples:

  • checking coding styles (such as naming conventions)
  • fault detection (to discover bugs at development time)
  • refactoring (to detect code smells, optimise and improve code design)
  • metrics (for measuring the complexity of the code)
  • aspect weaving (to identify join point shadows of interest)
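
To make the first use case concrete, here is a minimal sketch of a naming-convention query. It runs over a toy tree rather than a real JDT AST: the Node record, its "kind" strings and the badMethodNames helper are all made up for illustration and do not belong to any of the tools discussed here.

```java
import java.util.ArrayList;
import java.util.List;

class NamingQuery {
    // Hypothetical, minimal stand-in for an AST node; real JDT nodes are far richer.
    record Node(String kind, String name, List<Node> children) {}

    // The "query": names of all method nodes that do not start with a lower-case letter.
    static List<String> badMethodNames(Node root) {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    // Depth-first traversal that applies the naming check to every node.
    private static void collect(Node n, List<String> out) {
        boolean isMethod = n.kind().equals("method");
        if (isMethod && !n.name().isEmpty() && !Character.isLowerCase(n.name().charAt(0))) {
            out.add(n.name());
        }
        for (Node child : n.children()) {
            collect(child, out);
        }
    }

    public static void main(String[] args) {
        Node cls = new Node("class", "Account", List.of(
                new Node("method", "getBalance", List.of()),
                new Node("method", "Deposit", List.of())));
        System.out.println(badMethodNames(cls)); // prints [Deposit]
    }
}
```

Even in this tiny form the query is tangled up with the traversal code, which is the kind of boilerplate a dedicated query language is meant to remove.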

Let's compare some Eclipse plugins that address this field. Each alternative below comes with an example to convey the flavor of its language. (I haven’t been able to find the same query expressed in all of the languages, which would have made comparison easier.)

a) SCL
"Constructors must not invoke overridable methods, directly or indirectly, that access a (potentially uninitialized) instance variable"


b) PMD
"While loops must use braces"

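// Package names as in the PMD releases of the time (3.x/4.x); later versions moved these classes.
import net.sourceforge.pmd.AbstractRule;
import net.sourceforge.pmd.ast.ASTBlock;
import net.sourceforge.pmd.ast.ASTWhileStatement;
import net.sourceforge.pmd.ast.SimpleNode;

public class WhileLoopsMustUseBracesRule extends AbstractRule {

    // Child 0 of a while statement is its condition; child 1 is the loop body.
    public Object visit(ASTWhileStatement node, Object data) {
        SimpleNode firstStmt = (SimpleNode) node.jjtGetChild(1);
        if (!hasBlockAsFirstChild(firstStmt)) {
            addViolation(data, node);
        }
        return super.visit(node, data);
    }

    // True if the statement wraps a block, i.e. the body is enclosed in braces.
    private boolean hasBlockAsFirstChild(SimpleNode node) {
        return node.jjtGetNumChildren() != 0 && (node.jjtGetChild(0) instanceof ASTBlock);
    }
}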



c) JQuery
"List all public Getters"



d) CodeQuest
"Lookup all implementations of an abstract method"


?query3(M1,M2) :- hasStrModifier(M1,'abstract'), overrides(M2,M1), NOT(hasStrModifier(M2, 'abstract')).

overrides(M1,M2) :- strongLikeThis(M1,M2), hasChild(C1,M1),
hasChild(C2,M2), inheritableMethod(M2),
hasSubtypePlus(C2,C1).
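
For readers who don't speak Datalog, the query above can be read roughly as the following Java sketch. It is only an approximation under loud assumptions: the Method record and the superOf map are hypothetical stand-ins for CodeQuest's fact relations, strongLikeThis is reduced to plain signature equality, and inheritableMethod is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class AbstractImplQuery {
    // Hypothetical fact: a method declared in class `owner` (plays the role of hasChild).
    record Method(String owner, String signature, boolean isAbstract) {}

    // hasSubtypePlus(sup, sub): sub is a direct or indirect subclass of sup.
    // superOf maps each class to its direct superclass; the hierarchy is assumed acyclic.
    static boolean hasSubtypePlus(Map<String, String> superOf, String sup, String sub) {
        String cur = superOf.get(sub);
        while (cur != null) {
            if (cur.equals(sup)) return true;
            cur = superOf.get(cur);
        }
        return false;
    }

    // overrides(m1, m2): same signature (simplified strongLikeThis) and
    // m1 is declared in a transitive subclass of the class declaring m2.
    static boolean overrides(Map<String, String> superOf, Method m1, Method m2) {
        return m1.signature().equals(m2.signature())
                && hasSubtypePlus(superOf, m2.owner(), m1.owner());
    }

    // query3: pairs (abstract m1, non-abstract m2) where m2 overrides m1.
    static List<String> query3(List<Method> methods, Map<String, String> superOf) {
        List<String> out = new ArrayList<>();
        for (Method m1 : methods) {
            if (!m1.isAbstract()) continue;
            for (Method m2 : methods) {
                if (!m2.isAbstract() && overrides(superOf, m2, m1)) {
                    out.add(m2.owner() + "." + m2.signature()
                            + " implements " + m1.owner() + "." + m1.signature());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> superOf = Map.of("Circle", "Shape", "UnitCircle", "Circle");
        List<Method> methods = List.of(
                new Method("Shape", "area()", true),
                new Method("Circle", "area()", false),
                new Method("UnitCircle", "area()", false),
                new Method("Circle", "radius()", false));
        query3(methods, superOf).forEach(System.out::println);
    }
}
```

The nested loops amount to a naive join over the method facts. A Datalog engine backed by a database is free to pick a better evaluation plan, which the hand-written version cannot do.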


The above list is far from complete, and the number of tools is itself a sign of duplicated work: queries to detect particular code smells are being expressed (slightly differently?) in different notations.

In a few words, the strengths of the different approaches are:
  • SCL: reconciles formality (SCL has a formal definition) with readability
  • PMD: lets users add new queries (if they are familiar with the detailed structure of the ASTs)
  • JQuery: provides Eclipse-integrated structured viewers to display (and thus navigate) results
  • CodeQuest: efficient query answering (Datalog queries can be optimized quite effectively by industrial-strength RDBMSs)

So each product has its stronghold. In the next blog entry, I’ll comment on yet another approach to querying software artifacts and compare how it fares against those mentioned here. We’re dealing with conflicting requirements (for example, expressiveness vs. efficiency), so any choice will involve some trade-offs.

References

[1] SCL, http://people.clarkson.edu/%7Edhou/projects/SCL.html
[2] PMD, http://pmd.sourceforge.net/
[3] JQuery, http://jquery.cs.ubc.ca
[4] CodeQuest, http://progtools.comlab.ox.ac.uk/projects/codequest

Additional projects similar to all of the above are listed at:
http://pmd.sourceforge.net/similar-projects.html
