Saturday, July 21, 2012

Two Java Gotchas that Keep Tripping Me Up

I have a principle that if I make the same mistake twice, I put it on my blog to remind myself. The two that tripped me up recently are:

1. Java Regex

If you try to search for some text that's broken over two lines, you need to tell java.util.regex.Pattern to let the wildcard match line breaks. For instance, given the text:


This is a piece of text
with multiple lines and this text:
string to find

You might expect a regular expression looking for the pattern text:.string (note the wildcard after the colon) to match across the end of the second line and the beginning of the third. This is not the default behaviour: unless told otherwise, the dot does not match line terminators. In fact, this test fails:


        String toFind = "text:.string";
        String toSearch = "This is a piece of text\nwith multiple lines and this text:\nstring to find\n";
        Pattern pattern = Pattern.compile(toFind);
        Matcher matcher = pattern.matcher(toSearch);
        Assert.assertTrue(String.format("Can't find [%s] in [%s]", toFind, toSearch), matcher.find());

You need to enable DOTALL mode with:

        Pattern pattern = Pattern.compile(toFind, Pattern.DOTALL);

Now, your regex will match.
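For future reference, here is a minimal sketch that puts the two behaviours side by side. The class name DotAllDemo and the helper found are made up for illustration; only Pattern, Pattern.DOTALL and the embedded (?s) flag come from java.util.regex.

```java
import java.util.regex.Pattern;

public class DotAllDemo {
    // Same sample text as above, with Unix-style line breaks.
    static final String TEXT =
            "This is a piece of text\nwith multiple lines and this text:\nstring to find\n";

    // Hypothetical helper: true if the regex is found anywhere in the input.
    static boolean found(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        // Default mode: '.' does not match the '\n' after the colon.
        System.out.println(found("text:.string", 0, TEXT));
        // DOTALL mode: '.' matches any character, including '\n'.
        System.out.println(found("text:.string", Pattern.DOTALL, TEXT));
        // The embedded flag (?s) switches DOTALL on from inside the pattern.
        System.out.println(found("(?s)text:.string", 0, TEXT));
    }
}
```

The embedded (?s) form is handy when the pattern comes from a config file and you can't change the call to Pattern.compile. Note that a single dot only matches one character, so text with Windows-style \r\n line endings would still need a different pattern.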

2. XML Namespace and Java Parsers

I wasted most of a day this week wondering what was wrong with my XML when I tried to validate it against what looked like a perfectly good piece of XSD. It turns out that a SAXParserFactory produces parsers that are not namespace-aware by default. I don't know why. Presumably it's due to historical reasons. Anyway, you need to add:

aSAXParserFactory.setNamespaceAware(true);

to your code (see the JavaDocs).
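Here is a rough sketch of the kind of setup where this bites. It is not my original code: the schema, the document, the urn:example namespace and the class name NamespaceAwareValidation are all invented for illustration, and it assumes the schema is attached with SAXParserFactory.setSchema. Validation problems are collected in a list rather than thrown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class NamespaceAwareValidation {
    // Made-up schema: one string element in the (hypothetical) urn:example namespace.
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'"
            + " targetNamespace='urn:example' elementFormDefault='qualified'>"
            + "<xs:element name='greeting' type='xs:string'/>"
            + "</xs:schema>";

    // Made-up document that should be valid against the schema above.
    static final String XML = "<greeting xmlns='urn:example'>hello</greeting>";

    // Parses XML with schema validation and returns the validation error messages.
    static List<String> validate(boolean namespaceAware) {
        final List<String> errors = new ArrayList<String>();
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(namespaceAware); // the line this section is about
            factory.setSchema(schema);
            SAXParser parser = factory.newSAXParser();
            try {
                parser.parse(new InputSource(new StringReader(XML)), new DefaultHandler() {
                    @Override
                    public void error(SAXParseException e) {
                        errors.add(e.getMessage()); // recoverable validation error
                    }
                });
            } catch (SAXException e) {
                errors.add(e.getMessage()); // fatal errors are thrown by DefaultHandler
            }
        } catch (Exception e) {
            throw new IllegalStateException("Could not set up the parser", e);
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println("namespace-aware errors: " + validate(true));
        System.out.println("default (unaware) errors: " + validate(false));
    }
}
```

With the namespace-aware parser the document should validate cleanly. With the default setting I would expect the validator to complain that it cannot find a declaration for the element, which is the sort of baffling message that cost me most of a day.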

It was a colleague who told me about this. When I asked how he knew that, he said I had told him a few months before. Hence this blog entry.
