Friday, June 10, 2011

DXR updates

It's been almost a week since I last discussed DXR, but, if you look at the commit log, you can see that I've not exactly been laying back and doing nothing. No, the main reason is because most of the changes I've been doing so far aren't exactly ground-breaking.

In terms of UI, the only thing that's changed is that I know actually link the variable declarations correctly, a result of discovering the typo that was causing it not to work properly in the midst of the other changes. From a user's perspective, I've also radically altered the invocation of DXR. Whereas before it involved a long series of steps consisting of "build here, export these variables, run this script, build there, run that script, then run this file, and I hope you did everything correctly otherwise you'll wait half an hour to find you didn't" (most of which takes place in different directories), it's now down to . dxr/setup-env.sh (if you invoke that incorrectly, you get a loud error message telling you how to do it correctly. Unfortunately, the nature of shells prohibits me from just doing the right thing in the first place), build your code, and then python dxr/dxr-index.py in the directory with the proper dxr.config. Vastly improved, then. But most of my changes are merely in cleaning up the code.

The first major change I made was replacing the libclang API with a real clang plugin. There's only one thing I've found harder (figuring out inheritance, not that I'm sure I've had it right in the first place), and there are a lot of things I've found easier. Given how crazy people get with build systems, latching onto a compiler makes things go a lot smoother. The biggest downside I've seen with clang is that it's documentation is lacking. RecursiveASTVisitor is the main visitor API I'm using, but the documentation for that class is complete and utter crap, a result of doxygen failing to handle the hurdle of comprehending macros. I ended up using the libindex-based dxr to dump a database of all of the functions in the class, of which there are something like 1289, or around 60 times the functions listed in the documentation. Another example is Decl, where the inheritance picture is actually helpful. It, however, manages to have failed to document DeclCXX.h, which is a rather glaring omission if what you are working with is C++ source.

The last set of changes I did was rearchitecting the code to make it hackable by other people. I have started on a basic pluggable architecture for actually implementing new language modules, although most of the information is still hardcoded to just use cxx-clang. In addition, I've begun work on ripping out SQL as the exchange medium of choice: the sidebar list is now directly generated using the post-processing ball of information, and linkification is now set up so that it can be generated independent of tokenization. In the process, I've greatly sped up HTML generation times only to regress a lot of it by tokenizing the file twice (the latter part will unfortunately remain until I get around to changing tokenization API). It shouldn't take that much longer for me to rip SQL fully out of the HTML builder and shove SQL generation into a parallel process for improved end-to-end time.

Some things I've discovered about python along the way. First, python closures don't let you mutate closed-over-variables. That breaks things. Second, if you have a very large python object, don't pass it around as an argument to multiprocessing: just make it a global and let the kernel's copy-on-write code make cloning cheap. Third, apparently the fastest way to concatenate strings in python is to use join with a list comprehension. Finally, python dynamic module importing sucks big time.

Where am I going from here? I'll have HTMLification (i.e., remove SQL queries and replace with ball lookups) fixed up tomorrow; after that, I'll make the cgi scripts work again. The next major change after that is getting inheritance and references in good shape, and then making sure that I can do namespaces correctly. At that point in time, I'll think I'll be able to make a prerelease of DXR.

3 comments:

jmdesp said...

Did you get a look at Sam Price's fork on github ? Unfortunetly his changes on the database are quite diverging from yours although they seem to correct interesting things

Joshua Cranmer said...

I've been following it haphazardly; while those changes are interesting and I plan to accommodate them in the mid-term, I don't have an easy way to generate said data in the near future.

Jim Rhodes said...
This comment has been removed by the author.