Wednesday, July 22, 2009

A guide to pork, part 2

Last time, I covered the very basics of using pork. In this part of the guide, I will cover enough for you to be able to write a small patch.

Since I wrote the first part of the guide, Chris Jones has committed some tool wrappers known collectively as porky, which may necessitate updates to the first steps.

In summary:
Step 1: Building and running your tool
Step 1.1: Running the patcher

Step 2: Using the patcher

The patcher works internally (more or less) by keeping a list of ranges and their replacement text, which it eventually uses to build up hunks that it then spits out to an output stream. The public API it provides (as of the current tip, in any case) comes in two sections: some file utility functions and text replacement functions.
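
To make that bookkeeping concrete, here is a minimal sketch of the idea. It is not pork's Patcher: ToyPatcher, Edit, replace, and apply are hypothetical names, it works on byte offsets into a single in-memory string, and it returns the rewritten text instead of emitting diff hunks.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the patcher's bookkeeping; not pork's real API.
struct Edit {
  std::size_t begin;  // first byte replaced (inclusive)
  std::size_t end;    // one past the last byte replaced (exclusive)
  std::string text;   // replacement text; empty means "delete"
};

class ToyPatcher {
public:
  // Record a replacement of [begin, end) with text. Nothing is applied yet.
  void replace(std::size_t begin, std::size_t end, const std::string &text) {
    edits.push_back({begin, end, text});
  }

  // Apply every recorded edit to source and return the result.
  // Assumes the edits are in bounds and do not overlap.
  std::string apply(const std::string &source) const {
    std::vector<Edit> sorted = edits;
    std::sort(sorted.begin(), sorted.end(),
              [](const Edit &a, const Edit &b) { return a.begin < b.begin; });
    std::string out;
    std::size_t pos = 0;
    for (const Edit &e : sorted) {
      out.append(source, pos, e.begin - pos);  // untouched text before the edit
      out += e.text;                           // the replacement itself
      pos = e.end;                             // skip the replaced range
    }
    out.append(source, pos, std::string::npos);  // untouched tail
    return out;
  }

private:
  std::vector<Edit> edits;
};
```

In this sketch an insertion is a replacement of an empty range, and a deletion is a replacement with an empty string, so one list of edits covers all three cases.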

Locations can be represented by one of three different types. The first is SourceLoc, which is a bit-packed integer that the elsa AST nodes give you. Then there is CPPSourceLoc, which is only slightly more manageable. The final form is UnboxedLoc, which is the easiest one to work with.

As I mentioned earlier, the patcher actually works with pairs of these objects. PairLoc and UnboxedPairLoc are pairs of CPPSourceLoc and UnboxedLoc, respectively. The two are constructed in a pretty intuitive manner (although note that as UnboxedLocs do not store the file, you need to pass that into its pair type). Note that ranges include the left but not the right endpoint.
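
The half-open convention is easy to get wrong by one, so here is a small sketch of what it means for line/column positions. Loc and Range are hypothetical types written for this illustration; they are not pork's UnboxedLoc or UnboxedPairLoc, which may differ in detail.

```cpp
#include <tuple>

// Hypothetical line/column position, used only for this illustration.
struct Loc {
  int line;
  int col;
};

// Order positions by line first, then by column.
inline bool operator<(const Loc &a, const Loc &b) {
  return std::tie(a.line, a.col) < std::tie(b.line, b.col);
}

// Half-open range [begin, end): begin is included, end is not.
struct Range {
  Loc begin;
  Loc end;

  bool contains(const Loc &p) const { return !(p < begin) && p < end; }
  bool empty() const { return !(begin < end); }
};
```

With this convention, a range ending at line 3, column 9 and a range starting at line 3, column 9 share no position, so adjacent edits never overlap.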

The class Patcher itself contains two methods for patching stuff: printPatch, which replaces text, and insertBefore, which inserts the text before a location. If you want to delete text, the answer is to replace a range with the empty string.

While this is nice, the patcher does suffer from a few flaws. The biggest one I've found is really a flaw in elsa: not every node has both a start and an end location (only statements and expressions do), so I had to roll my own search functions. Fortunately, the file API of the patcher helps here.

The other big flaw is the difficulty of coping with visually important but semantically meaningless clues, namely comments and whitespace. If you naïvely delete text, you may end up with comments whose referents no longer exist or blocks of whitespace where code once was. Inserted text may violate local code conventions. I have not yet expended the effort to get this to work; you will either have to do this yourself, bug taras to do it, or possibly both.

Now, if you want to see some code in action:


// Here, func is a pointer to an elsa AST expression node
// And type a string representing its replacement
// patcher is of course a Patcher object.
patcher.printPatch(type, PairLoc(func->loc, func->endloc));

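// Elsewhere: build a range that starts at loc and ends just past the
// first occurrence of toFind on or after loc's line.
UnboxedPairLoc findAndMakePair(Patcher &patcher, const SourceLoc &loc,
    char toFind) {
  int lLine, lCol;
  StringRef file;
  sourceLocManager->decodeLineCol(loc, file, lLine, lCol);
  int lineNo = lLine;
  std::string::size_type col;
  // This loops until toFind turns up, so only call it when you know the
  // character is there. The first line is searched from its start, not
  // from lCol.
  do {
    std::string line = patcher.getLine(lineNo++, file);
    col = line.find(toFind);
  } while (col == std::string::npos);

  // find() is 0-based: +1 for a 1-based column (assumed), and +1 more
  // because the right endpoint of a range is excluded.
  return UnboxedPairLoc(file, UnboxedLoc(lLine, lCol),
    UnboxedLoc(lineNo - 1, static_cast<int>(col) + 2));
}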

Step 3: The structure of the Elsa AST

The core of pork is the ability to parse AST nodes. In general, these fall under three categories: top-level declarations (possibly within classes or namespaces), statement and expression nodes, and utility nodes.

The basic structure of an AST node class is like this:


// A typical node type
class TypeSpecifier {
public:
  // Almost all nodes have these
  // Those that don't wouldn't make sense
  SourceLoc loc;

  // These methods are for nodes with subtypes
  // ifFoo() returns null if the node isn't a Foo; asFoo() throws
  char const *kindName() const;
  TS_name const *ifTS_nameC() const;
  TS_name *ifTS_name();
  TS_name const *asTS_nameC() const;
  TS_name *asTS_name();
  bool isTS_name() const;

  // There's another parameter that you'll never use
  void debugPrint(std::ostream &, int indent);
  void traverse(ASTVisitor &vis);
};
class TS_name: public TypeSpecifier {
public:
  // Typically has some more data nodes
  PQName *name;
  bool typenameUsed;
};
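
If the is/if/as naming is unfamiliar, the following sketch shows the intended behaviour of each accessor on a toy hierarchy. Node, Name, and Other are hypothetical classes, and the sketch uses dynamic_cast and std::logic_error for brevity; elsa's generated code may well implement the checks differently.

```cpp
#include <stdexcept>

struct Name;  // the subtype we want to downcast to

// Hypothetical base node; it mirrors only the naming of the accessors.
struct Node {
  virtual ~Node() {}
  Name *ifName();  // null when this node is not a Name
  Name *asName();  // throws when this node is not a Name
  bool isName() { return ifName() != nullptr; }
};

struct Name : Node {
  const char *text = "";
};

struct Other : Node {};  // some unrelated subtype

inline Name *Node::ifName() { return dynamic_cast<Name *>(this); }

inline Name *Node::asName() {
  if (Name *n = ifName()) return n;
  throw std::logic_error("node is not a Name");
}
```

The usual idiom is to test and bind in one step, as in `if (Name *n = node->ifName()) { ... }`, and to keep the as form for places where any other type would be a bug.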

To use these nodes, pork follows a typical visitor pattern. The class ASTVisitor will visit all of the node types; ExpressionVisitor subtypes have individual methods for visiting subtypes of statements or expressions. You can choose to look at nodes in either a preorder or a postorder traversal. A previsit function has the form:
virtual bool visitTypeSpecifier(TypeSpecifier *);
(where the return value says whether to descend into the node's children), and a postvisit function has the form:
virtual void postvisitTypeSpecifier(TypeSpecifier *);
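
To show how the two callbacks interleave, here is a self-contained sketch of the pattern on a toy tree. Node, Visitor, and TraceVisitor are hypothetical and have a single node type; the sketch also assumes that the postvisit callback is skipped when the previsit callback returns false, which you should check against elsa before relying on it.

```cpp
#include <string>
#include <vector>

struct Visitor;

// Hypothetical tree node; a stand-in for an elsa AST node.
struct Node {
  std::string name;
  std::vector<Node *> children;
  void traverse(Visitor &vis);
};

struct Visitor {
  virtual ~Visitor() {}
  // Called before the children; return false to skip the subtree.
  virtual bool visitNode(Node *) { return true; }
  // Called after the children (assumed: only if visitNode returned true).
  virtual void postvisitNode(Node *) {}
};

inline void Node::traverse(Visitor &vis) {
  if (!vis.visitNode(this)) return;
  for (Node *child : children) child->traverse(vis);
  vis.postvisitNode(this);
}

// Records the order of the callbacks and prunes any node named "skip".
struct TraceVisitor : Visitor {
  std::string trace;
  bool visitNode(Node *n) override {
    trace += "<" + n->name;
    return n->name != "skip";
  }
  void postvisitNode(Node *n) override { trace += n->name + ">"; }
};
```

Overriding only the callbacks you care about and leaving the defaults alone is what keeps a tool small: a pass that rewrites one kind of node needs one method.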

Hopefully, this is enough to get you started on being able to use pork. In my next part, I will cover the AST nodes in more detail.
