Heisenbug

Ever since upgrading to Fedora 20 “Heisenbug,” I have had the OpenJDK 1.8.0 pre-release selected, on a whim, as the default Java alternative. LibreOffice builds and runs fine with it (after some minor tweaks, at least). However, when I recently wanted to run one of LibreOffice’s CppUnit tests under Valgrind, I was surprised.

This was one of our “bigger” CppUnit tests, the kind that instantiates half of the LibreOffice code base behind your back. Among other things, it instantiates an in-process JVM. Now, even with the JVM forced into interpreting-only mode, it is notorious for causing a few false positives in Valgrind, which you need to silence with a hand-crafted suppressions file. That file mentions the full path of libjvm.so (at least, mine does), so I knew I still had to adapt it to the exact path of the new OpenJDK 1.8.0 installation.

But looking at the Valgrind output, the (expected) failure stack traces all mentioned OpenJDK 1.7.0 instead of 1.8.0. What was going wrong there? I ran the test outside Valgrind and verified that it did use the OpenJDK 1.8.0 libjvm.so. I even considered that Valgrind might be confused by the multiple Java debuginfo packages and might be resolving the stack trace symbols to the wrong source files, so I removed all the relevant debuginfos, but (of course) to no avail.

When LibreOffice has no information yet about which JRE to use, it searches various places to find the “best” JRE on the system. To do that, it runs the java -version command for each JRE it finds. Now, even when Valgrind reports no errors, it still prints a few informational lines to each traced process’s stderr. And as we run Valgrind with --trace-children=yes, it did so for each java -version process. LibreOffice’s jvmfwk code inspects the stdout and stderr output of those processes, and it got so confused by the unexpected stderr lines that it decided to go with the tried-and-trusted 1.7.0 JRE rather than the cutting-edge 1.8.0 (“internal”) one…
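To make the failure mode concrete, here is a minimal sketch in Python. It is not the jvmfwk code: the function names, the regular expression, and the sample output strings are all assumptions made for illustration. It only shows how a parser that expects the version on the first line of the captured output can be thrown off by a Valgrind banner line in front of it.

```python
import re

# Hypothetical pattern for a version line such as: openjdk version "1.8.0-internal"
VERSION_LINE = re.compile(r'^\S+ version "([^"]+)"')

# Valgrind prefixes its own messages with ==PID==.
VALGRIND_LINE = re.compile(r'^==\d+==')


def parse_version_strict(output):
    """Naive parser: expects the version on the very first line."""
    lines = output.splitlines()
    if not lines:
        return None
    match = VERSION_LINE.match(lines[0])
    return match.group(1) if match else None


def parse_version_tolerant(output):
    """Skips Valgrind's ==PID== lines before looking for the version."""
    for line in output.splitlines():
        if VALGRIND_LINE.match(line):
            continue
        match = VERSION_LINE.match(line)
        if match:
            return match.group(1)
    return None


# Illustrative captured output, not copied from a real run.
PLAIN = 'openjdk version "1.8.0-internal"\n'
TRACED = '==12345== Memcheck, a memory error detector\n' + PLAIN

print(parse_version_strict(PLAIN))
print(parse_version_strict(TRACED))
print(parse_version_tolerant(TRACED))
```

The strict parser recognizes the plain output but gives up on the traced one, which is roughly the kind of confusion that could make a detection routine discard an otherwise perfectly good JRE.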

This only hit when running one of the CppUnit tests, not when running LibreOffice itself under Valgrind. For one, unlike those “stateless” unit tests, LibreOffice remembers any choice of JRE in its configuration data. For another, there is some --trace-children-skip=*/java,*/gij in place to not trace into any java -version processes when LibreOffice is run under Valgrind, and thus avoid the whole hassle in case LibreOffice does run the JRE detection code.
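As a rough illustration of that option, the following Python sketch assembles a Valgrind command line with the two flags mentioned here and approximates which child executables would be left untraced. The helper names and the test binary path are hypothetical, and fnmatch is only a stand-in for Valgrind's own wildcard matching (it also accepts bracket expressions, for example).

```python
from fnmatch import fnmatchcase

# Patterns taken from the option quoted in the text.
SKIP_PATTERNS = ["*/java", "*/gij"]


def valgrind_argv(test_argv):
    """Builds an argv list that traces children except the skipped ones."""
    return [
        "valgrind",
        "--trace-children=yes",
        "--trace-children-skip=" + ",".join(SKIP_PATTERNS),
    ] + list(test_argv)


def would_skip(child_path):
    """Approximates whether a child executable matches a skip pattern."""
    return any(fnmatchcase(child_path, p) for p in SKIP_PATTERNS)


# "./some-cppunit-test" is a placeholder, not a real LibreOffice binary name.
print(" ".join(valgrind_argv(["./some-cppunit-test"])))
print(would_skip("/usr/bin/java"))
print(would_skip("/usr/bin/javac"))
```

With such patterns in place, the java -version child processes run natively, so nothing from Valgrind ends up in the output that the JRE detection reads.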

So, once the phenomenon was understood, the fix was easy.
