Monday, April 12, 2010

"SIMPLE".toLowerCase() is simple, right?

It turns out that "SIMPLE".toLowerCase().equals("simple") evaluates to false if your default locale is Turkish, even though your code is written in English. Turkish has two "i" characters, one with a dot and one without. In the Turkish locale the uppercase "I" lowercases to the dotless "ı" (U+0131), so "SIMPLE".toLowerCase() returns "sımple", which throws the above code off balance. The fix is to write the expression either as "SIMPLE".toLowerCase(Locale.ENGLISH).equals("simple") or, even better, as "SIMPLE".equalsIgnoreCase("simple").
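Here is a minimal, self-contained sketch that reproduces the problem and both fixes. It assumes you can simulate a Turkish machine by calling Locale.setDefault with "tr-TR" inside the JVM. The class and method names are made up for illustration and do not come from Tika.

```java
import java.util.Locale;

// Hypothetical demo class: reproduces the Turkish dotless-i pitfall.
public class TurkishCaseDemo {

    // Locale-sensitive: toLowerCase() uses the JVM's default locale.
    static boolean defaultLocaleMatch(String s) {
        return s.toLowerCase().equals("simple");
    }

    // Fix 1: pin the locale so the result no longer depends on the JVM default.
    static boolean englishLocaleMatch(String s) {
        return s.toLowerCase(Locale.ENGLISH).equals("simple");
    }

    // Fix 2: equalsIgnoreCase compares char by char using Character's
    // case mapping, which does not consult the default locale.
    static boolean ignoreCaseMatch(String s) {
        return s.equalsIgnoreCase("simple");
    }

    public static void main(String[] args) {
        Locale original = Locale.getDefault();
        try {
            // Simulate a machine whose default locale is Turkish.
            Locale.setDefault(Locale.forLanguageTag("tr-TR"));
            // Code point of the second character: 305, i.e. U+0131 (dotless i).
            System.out.println((int) "SIMPLE".toLowerCase().charAt(1));
            System.out.println(defaultLocaleMatch("SIMPLE"));  // false
            System.out.println(englishLocaleMatch("SIMPLE"));  // true
            System.out.println(ignoreCaseMatch("SIMPLE"));     // true
        } finally {
            // Restore the original default so other code is unaffected.
            Locale.setDefault(original);
        }
    }
}
```

Switching the default locale is only a convenient way to reproduce the bug in a test. In real code, the point is to never rely on the default for case conversions of identifiers, keywords, or other non-user-facing text.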

I just stumbled on this issue with Apache Tika (see TIKA-404), and it seems like I'm not the only one.

2 comments:

  1. Oh, everyone hits that once; from then on you learn to use commons-lang for your case work, and to point out the problems to others:
    https://issues.apache.org/jira/browse/HADOOP-6657

    IntelliJ IDEA can flag any suspect case conversions; it would be nice if there were a compiler option to disallow case conversion without an explicit locale.

  2. Uuhhm, I guess that's the reason why they all teach you not to use lower/upperCase conversion methods without a locale. Either Checkstyle or FindBugs even reports this with a warning if I'm not mistaken.