Sunday, 13 May 2007

cross-artifact queries

While extremely useful in themselves, the tools discussed in the previous entry are limited to querying source-code artifacts. Configuration files, plugin.xml, etc. are left out, and those are precisely the artifacts for which checks such as referential integrity would be most useful. Examples of referential-integrity constraints for Struts and for Eclipse Workbench plugins are provided by Michał Antkiewicz here (not in OCL, unfortunately, although OCL is expressive enough to encode the consistency constraints he considers).
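To make the idea of a cross-artifact check concrete, here is a minimal sketch of one: every `class` attribute in a plugin.xml should name a class that actually exists in the plugin's sources. It assumes the Java side has already been reduced to a set of fully qualified class names (in practice that set would come from a parsed Java model). The class `PluginXmlIntegrity` and its method are hypothetical, the check is not one of Antkiewicz's constraints, and it is plain Java rather than OCL; it uses only the JDK's DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class PluginXmlIntegrity {

    /** Returns every "class" attribute value in pluginXml that names no known Java class. */
    static List<String> danglingClassRefs(String pluginXml, Set<String> knownClasses) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(pluginXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("not a well-formed plugin.xml", ex);
        }
        List<String> dangling = new ArrayList<>();
        // "*" selects every element in document order, whatever extension point it belongs to.
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element element = (Element) all.item(i);
            // getAttribute returns "" when the attribute is absent.
            String ref = element.getAttribute("class");
            if (!ref.isEmpty() && !knownClasses.contains(ref)) {
                dangling.add(ref);
            }
        }
        return dangling;
    }

    public static void main(String[] args) {
        String pluginXml =
                "<plugin>"
                + "<extension point=\"org.eclipse.ui.views\">"
                + "<view id=\"demo.view\" class=\"demo.DemoView\"/>"
                + "<view id=\"demo.other\" class=\"demo.MissingView\"/>"
                + "</extension>"
                + "</plugin>";
        // Stand-in for the class names found by parsing the plugin's Java sources.
        Set<String> knownClasses = Set.of("demo.DemoView");
        System.out.println(danglingClassRefs(pluginXml, knownClasses)); // [demo.MissingView]
    }
}
```

The two halves of the check live in different "worlds" (XML and Java), which is exactly why a common metamodel for both would make such constraints easier to state.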

Software artifacts, once available as instances of an Ecore-based metamodel, can be queried with OCL. Some useful links for this task:
  1. an Ecore-based metamodel of Java (without OCL well-formedness rules) has been contributed by Mikael Barbero (INRIA) and can be found here (for a tree-based visualization follow this link)

  2. the MM above can be used in conjunction with this plugin, which instantiates that metamodel for a compilation unit of your choice (using the JDT infrastructure)

  3. more sample code for querying the “raw” AST offered by the JDT can be found in ASTView. In order to understand in detail the type information of a given Java file, these tutorial slides prove extremely useful.

Querying a Java AST is usually cumbersome given its detailed nature. As in databases, having a stock of parameterized queries (to filter only those items useful for the task at hand) goes a long way towards improving usability.
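A minimal sketch of such a stock of parameterized queries, assuming method declarations have already been flattened out of the AST into simple records. The `MethodDecl` record and all query names below are hypothetical and not part of the JDT API. Each query is a factory that takes parameters and returns a `Predicate`, so queries compose with `and`/`or` much like the clauses of a SQL `WHERE`.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

final class MethodQueries {

    /** Hypothetical, flattened view of one method declaration extracted from an AST. */
    record MethodDecl(String owner, String name, Set<String> modifiers, int paramCount) {}

    // Each factory method below is a reusable, parameterized query.
    static Predicate<MethodDecl> declaredIn(String owner) {
        return m -> m.owner().equals(owner);
    }

    static Predicate<MethodDecl> hasModifier(String modifier) {
        return m -> m.modifiers().contains(modifier);
    }

    static Predicate<MethodDecl> moreParamsThan(int n) {
        return m -> m.paramCount() > n;
    }

    /** Runs a query over all method declarations and returns the names of the matches. */
    static List<String> select(List<MethodDecl> methods, Predicate<MethodDecl> query) {
        return methods.stream().filter(query).map(MethodDecl::name).toList();
    }

    public static void main(String[] args) {
        List<MethodDecl> methods = List.of(
                new MethodDecl("Order", "addItem", Set.of("public"), 1),
                new MethodDecl("Order", "reprice", Set.of("public"), 5),
                new MethodDecl("Order", "audit", Set.of("private"), 6),
                new MethodDecl("Invoice", "render", Set.of("public"), 4));
        // "Public methods of Order taking more than 3 parameters" (a long-parameter-list smell).
        Predicate<MethodDecl> query =
                declaredIn("Order").and(hasModifier("public")).and(moreParamsThan(3));
        System.out.println(select(methods, query)); // [reprice]
    }
}
```

The user only picks and parameterizes queries; the details of AST traversal stay hidden behind the extraction step.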

It seems to me that the following critical mass is needed:
  • libraries of metamodels (with WFRs)
  • plugins to parse software artifacts into those metamodels
  • pre-defined queries for repetitive tasks on those metamodels
  • efficient mechanisms to evaluate queries, author them, and to visualize their results
in order to realize the whole potential of software repositories. Wow, that sounds like a "research agenda" ... spooky!

"Watch this space!" :)

querying source code and other artifacts

Now that there is agreement on the parsed representation of Java source code (the JDT ASTs), the lack of standardization has moved to the languages used to query such ASTs. What is querying over ASTs good for? Examples:

  • checking coding styles (such as naming conventions)
  • fault detection (to discover bugs at development time)
  • refactoring (to detect code smells, optimise and improve code design)
  • metrics (for measuring the complexity of the code)
  • aspect weaving (to identify join point shadows of interest)

Let's compare some Eclipse plugins addressing this field. For each alternative below, an example conveys the flavor of the language (I haven't been able to find a single query expressed in all of these languages, which would have eased comparison):

a) SCL
"Constructors must not invoke overridable methods, directly or indirectly, that access a (potentially uninitialized) instance variable"
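The SCL formulation of this rule is not reproduced here. As a rough illustration of what the check has to compute, the sketch below walks the call graph from a constructor and reports every overridable method it reaches, directly or indirectly, that accesses an instance field. It assumes the per-method facts (overridable or not, callees, fields accessed) have already been extracted from the AST; the `MethodInfo` record, the class `ConstructorCallCheck` and the `"<init>"` key are hypothetical, and treating any field access as suspicious is coarser than SCL's "potentially uninitialized".

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ConstructorCallCheck {

    /** Hypothetical facts about one method, assumed to be extracted from the AST beforehand. */
    record MethodInfo(boolean overridable, List<String> calls, Set<String> fieldsAccessed) {}

    /**
     * Overridable methods reachable from the constructor, directly or indirectly,
     * that access at least one instance field.
     */
    static Set<String> violations(String constructor, Map<String, MethodInfo> methods) {
        // Worklist traversal of the call graph; "reached" also guards against recursion cycles.
        Set<String> reached = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(methods.get(constructor).calls());
        while (!work.isEmpty()) {
            String name = work.pop();
            if (reached.add(name) && methods.containsKey(name)) {
                work.addAll(methods.get(name).calls());
            }
        }
        Set<String> offending = new TreeSet<>(); // sorted for stable output
        for (String name : reached) {
            MethodInfo m = methods.get(name); // null for calls into code we have no facts for
            if (m != null && m.overridable() && !m.fieldsAccessed().isEmpty()) {
                offending.add(name);
            }
        }
        return offending;
    }

    public static void main(String[] args) {
        Map<String, MethodInfo> methods = Map.of(
                "<init>", new MethodInfo(false, List.of("init"), Set.of()),
                "init", new MethodInfo(false, List.of("configure"), Set.of()),   // private helper
                "configure", new MethodInfo(true, List.of(), Set.of("size")),    // overridable, reads a field
                "log", new MethodInfo(true, List.of(), Set.of()));               // never called from <init>
        System.out.println(violations("<init>", methods)); // [configure]
    }
}
```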


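b) PMD
"While loops must use braces"

// Package names as in PMD 3.x-era releases (current in 2007); later PMD versions moved these classes.
import net.sourceforge.pmd.AbstractRule;
import net.sourceforge.pmd.ast.ASTBlock;
import net.sourceforge.pmd.ast.ASTWhileStatement;
import net.sourceforge.pmd.ast.SimpleNode;

public class WhileLoopsMustUseBracesRule extends AbstractRule {

    public Object visit(ASTWhileStatement node, Object data) {
        // Child 0 is the loop condition; child 1 is the statement forming the loop body.
        SimpleNode firstStmt = (SimpleNode) node.jjtGetChild(1);
        if (!hasBlockAsFirstChild(firstStmt)) {
            addViolation(data, node);
        }
        // Keep visiting so that nested while loops are checked too.
        return super.visit(node, data);
    }

    // A braced body shows up as a Statement whose first child is a Block.
    private boolean hasBlockAsFirstChild(SimpleNode node) {
        return node.jjtGetNumChildren() != 0 && (node.jjtGetChild(0) instanceof ASTBlock);
    }
}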



c) JQuery
"List all public getters"
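JQuery's own formulation is not reproduced here. To pin down what this query asks for, here is a sketch that answers it with plain Java reflection over a compiled class, which is a different technique from JQuery's source-level queries. A "getter" is assumed to mean a public, non-static, parameterless method named `getX` with a non-void return type (boolean `isX` getters are ignored). The class `PublicGetters` and the sample `Person` are hypothetical.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.List;

final class PublicGetters {

    /** Names of public, non-static, parameterless get*() methods that return a value. */
    static List<String> publicGetters(Class<?> type) {
        return Arrays.stream(type.getDeclaredMethods())
                .filter(m -> Modifier.isPublic(m.getModifiers()))
                .filter(m -> !Modifier.isStatic(m.getModifiers()))
                .filter(m -> m.getName().startsWith("get") && m.getName().length() > 3)
                .filter(m -> m.getParameterCount() == 0 && m.getReturnType() != void.class)
                .map(Method::getName)
                .sorted() // reflection does not guarantee declaration order
                .toList();
    }

    // Sample class to query.
    static class Person {
        private String name;
        private int age;

        public String getName() { return name; }
        public int getAge() { return age; }
        public void setName(String name) { this.name = name; }
        String getNickname() { return name; }                              // not public
        public String getGreeting(String other) { return "hi " + other; } // takes a parameter
    }

    public static void main(String[] args) {
        System.out.println(publicGetters(Person.class)); // [getAge, getName]
    }
}
```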



d) CodeQuest
"Lookup all implementations of an abstract method"


?query3(M1,M2) :- hasStrModifier(M1,'abstract'), overrides(M2,M1), NOT(hasStrModifier(M2, 'abstract')).

overrides(M1,M2) :- strongLikeThis(M1,M2), hasChild(C1,M1),
hasChild(C2,M2), inheritableMethod(M2),
hasSubtypePlus(C2,C1).


The list above is far from complete, and that in itself is a sign of duplicated work: queries to detect particular code smells are being expressed (perhaps slightly differently?) in different notations.

In a nutshell, the strengths of the different approaches are:
  • SCL: harmonizing formality (the definition of SCL is formal) and readability
  • PMD: letting users add new queries (provided they are familiar with the detailed structure of ASTs)
  • JQuery: providing Eclipse-integrated structured viewers to display (and thus navigate) results
  • CodeQuest: efficient query answering (Datalog queries are translated into SQL and then optimized by industrial-strength RDBMSs)

So each product has its stronghold. In the next blog entry, I'll comment on yet another approach to querying software artifacts and compare how it fares against those mentioned here. We're dealing with conflicting requirements (for example, expressiveness vs. efficiency), so any choice involves trade-offs.

References

[1] SCL, http://people.clarkson.edu/%7Edhou/projects/SCL.html
[2] PMD, http://pmd.sourceforge.net/
[3] JQuery, http://jquery.cs.ubc.ca
[4] CodeQuest, http://progtools.comlab.ox.ac.uk/projects/codequest

Additional projects similar to all of the above are listed at:
http://pmd.sourceforge.net/similar-projects.html

Wednesday, 9 May 2007

an OCL bibliography

Some time ago (two years ago, in fact) I compiled a bibliography of papers, focusing on best practices in the use of OCL. Surprisingly, most references are still relevant. I hope this list can serve as a starting point for further exploration. The categories are listed in no particular order, each with a selection of papers.

Configuration Management

OCL and AspectJ


Refactoring OCL, OCL rewriting

OCL for software artifacts


OCL for modeling of Software Architecture

ORDBMS-based software repositories


OCL codifying business rules

Translation of OCL to XQuery


Generation of proof conditions

verbalizing OCL

Tools that "read aloud" a formal specification belong to the basic repertoire of the (Controlled) Natural Language research community. "Reading aloud" is really more like paraphrasing, e.g.

∀x. ((Number(x) ∧ Prime(x) ∧ LessThan(x, 3)) ⇒ Even(x))

can be rewritten into
Every prime number less than 3 is even

A tool along these lines for OCL is OCLNL. Given this OCL as input:

context Copy

inv: Copy.allInstances()->forAll(c1, c2 |
  not (c1 = c2) implies not (c1.barCode = c2.barCode))


The verbalization computed by OCLNL is:

for the class Copy the following invariants hold :

  • for all copies c2 , c1 in the set of all instances of Copy

if it is not the case that c1 is equal to c2 then this implies that it is not the case that the bar code of c1 is equal to the bar code of c2

OCLNL has been applied to the JavaCard APIs (mostly their security aspects) and to similar real-world case studies. Papers and theses can be found on the OCLNL homepage, and the technical documentation points out that the core of OCLNL is implemented in Haskell (but hey, if you've mastered Java's type system, you already know more than enough to learn Haskell). The input models are not Ecore-based.

One quick way to get your hands dirty in this playground (verbalization of Ecore + OCL models) is to detect patterns in OCL Abstract Syntax Trees (ASTs) for which there's an (empirically convenient) English paraphrase. For example, one such pattern underlies the invariant above, allowing it to be translated into the more compact:


Two different copies have different bar codes
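As a rough illustration of such pattern detection, the sketch below matches the shape `T.allInstances()->forAll(a, b | not (a = b) implies not (a.p = b.p))` and emits the compact paraphrase. It is a minimal sketch under stated assumptions: the tiny expression classes (`Var`, `Prop`, `Eq`, `Not`, `Implies`, `ForAll2`) are hypothetical stand-ins, not the AST of OCLNL or of any Eclipse OCL implementation, and the camelCase splitting and pluralization are deliberately naive.

```java
import java.util.Optional;

final class UniquenessVerbalizer {

    // Hypothetical, minimal OCL expression AST.
    interface Expr {}
    record Var(String name) implements Expr {}
    record Prop(Expr source, String property) implements Expr {}
    record Eq(Expr left, Expr right) implements Expr {}
    record Not(Expr operand) implements Expr {}
    record Implies(Expr premise, Expr conclusion) implements Expr {}
    /** type.allInstances()->forAll(v1, v2 | body) */
    record ForAll2(String type, String v1, String v2, Expr body) implements Expr {}

    /** Paraphrases forAll(a, b | not (a = b) implies not (a.p = b.p)); empty if the shape differs. */
    static Optional<String> verbalize(Expr e) {
        if (e instanceof ForAll2 f
                && f.body() instanceof Implies imp
                // premise: not (v1 = v2)
                && imp.premise() instanceof Not notSame
                && notSame.operand().equals(new Eq(new Var(f.v1()), new Var(f.v2())))
                // conclusion: not (v1.p = v2.p), with the same property p on both sides
                && imp.conclusion() instanceof Not notEq
                && notEq.operand() instanceof Eq eq
                && eq.left() instanceof Prop p1
                && eq.right() instanceof Prop p2
                && p1.source().equals(new Var(f.v1()))
                && p2.source().equals(new Var(f.v2()))
                && p1.property().equals(p2.property())) {
            return Optional.of("Two different " + plural(words(f.type()))
                    + " have different " + plural(words(p1.property())));
        }
        return Optional.empty(); // not this pattern: a literal verbalization would be needed
    }

    /** "barCode" -> "bar code" (naive camelCase splitting). */
    static String words(String identifier) {
        return identifier.replaceAll("([a-z])([A-Z])", "$1 $2").toLowerCase();
    }

    /** Naive English plural: "copy" -> "copies", "bar code" -> "bar codes". */
    static String plural(String noun) {
        return noun.matches(".*[^aeiou]y") ? noun.substring(0, noun.length() - 1) + "ies" : noun + "s";
    }

    public static void main(String[] args) {
        Expr inv = new ForAll2("Copy", "c1", "c2",
                new Implies(new Not(new Eq(new Var("c1"), new Var("c2"))),
                        new Not(new Eq(new Prop(new Var("c1"), "barCode"),
                                new Prop(new Var("c2"), "barCode")))));
        System.out.println(verbalize(inv).orElse("(no pattern matched)"));
        // Two different copies have different bar codes
    }
}
```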


In a way, that's not playing fair: covering only some of the patterns that may show up in valid OCL ASTs does not get the whole job done. Still, it's a good exercise if you want to try your hand at processing OCL ASTs ... which brings me to the real plug I wanted to make: take a look at my article on that very subject, How to process OCL Abstract Syntax Trees, and its accompanying Eclipse-based plugin. With it, you can do cool stuff such as:



'nuff said! Actually, a big advantage of turning a formal spec into controlled natural language is the larger audience that can review it (and thus find errors). This aspect is highlighted in David Burke's master's thesis; an excerpt follows:


Maybe it's just me, but I think it would be great if someone took up the task of performing similar OCL AST processing for Eclipse!

Monday, 7 May 2007

OCL in Topcased 1.0.0M3, very first impressions

It all started with exploring the OCL editor:



(including the context-menu entry "Evaluate rule on model", which appears after right-clicking an OCL expression)

Then I noticed other use cases:
  • OCL Model Statistics
  • OCL Model Checker
which can be accessed through



Hopefully more details will follow once I've gone through the source code.



My conclusion so far: hey, this kind of blogging is not that time-consuming!