A Fun Little Issue with ICU4J MessageFormat

June 1st, 2015

The International Components for Unicode (ICU) project provides localization tools for C/C++ (ICU4C) and Java (ICU4J). While working with ICU4J 55.1 in a Scala project, I ran into an interesting (and briefly very aggravating) issue, outlined herein.

Among the many items in the ICU4J toolkit is an improved replacement for the standard java.text.MessageFormat class. MessageFormat lets you define patterns into which variables are substituted. Among these are plural patterns, which handle strings containing a numeric variable whose value determines whether singular or plural wording is needed. This is a common situation when localizing content. A simple example, suitable for languages that follow English-like rules for singular versus plural, is as follows:

{count, plural, one {{count} thing} other {{count} things} }

In Scala, the formatted string can then be produced as follows; with a count of 15 it returns "15 things":

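import com.ibm.icu.text.MessageFormat
import java.util.Locale
import scala.collection.JavaConverters._

// Named arguments are passed to ICU4J's MessageFormat as a java.util.Map.
val formattedString = new MessageFormat(
  "{count, plural, one {{count} thing} other {{count} things} }",
  Locale.US
).format(Map(
  "count" -> 15
).asJava)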
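Which sub-message gets chosen depends on the plural category of the number, which ICU4J's PluralRules derives from CLDR data. For English, `one` applies when the value is exactly 1 with no visible fraction digits, and everything else is `other`. The following is a simplified, English-only sketch of that selection step, not ICU4J's actual implementation; `PluralSketch` and its methods are hypothetical names, and only a bare `{name}` placeholder is substituted.

```scala
// Simplified, English-only sketch of plural sub-message selection.
// NOT ICU4J's implementation: real rules come from CLDR via PluralRules,
// and real MessageFormat parses nested arguments rather than doing string replace.
object PluralSketch {
  // CLDR English rule "one: i = 1 and v = 0": the absolute value is 1
  // and there are no visible fraction digits (so "1.0" is "other").
  def englishPluralCategory(n: BigDecimal): String =
    if (n.scale <= 0 && n.abs == BigDecimal(1)) "one" else "other"

  // Pick the sub-message for the category, falling back to "other",
  // which every ICU plural pattern must define.
  def selectSubMessage(n: BigDecimal, subMessages: Map[String, String]): String =
    subMessages.getOrElse(englishPluralCategory(n), subMessages("other"))

  // Substitute a bare {argName} placeholder with the number's string form.
  def render(n: BigDecimal, argName: String, subMessages: Map[String, String]): String =
    selectSubMessage(n, subMessages).replace("{" + argName + "}", n.toString)
}
```

For example, `PluralSketch.render(BigDecimal(15), "count", Map("one" -> "{count} thing", "other" -> "{count} things"))` yields "15 things", the same result as the ICU4J call above.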

The Issue

Most of the ICU4J code relating to the plural format can be found in the PluralFormat and PluralRules classes. Somewhere in there, however, is an algorithm that is easily confused by similar variable names in a pattern. The issue I encountered was prompted by a pattern of the following form:

{messageCount, plural, one {{message} (one)} other {{message} (other)} }

We were trying to switch between two different messages with substantially similar content, so that common content was pulled out into a separate variable. The original author had made the unfortunate choice of naming that variable "message", which is a substring of the numeric variable "messageCount". As a result, PluralFormat appears to reach an error condition in which it writes out the number rather than the formatted string. The fallback that does this can be seen in the source code:

    private String format(Number numberObject, double number) {
        // If no pattern was applied, return the formatted number.
        if (msgPattern == null || msgPattern.countParts() == 0) {
            return numberFormat.format(numberObject);
        }
        // ... (remainder of the method omitted)
    }

An Example

You can try this out for yourself in the Scala REPL:

# Install Scala.
sudo apt-get install scala
# Obtain a copy of ICU4J.
wget http://download.icu-project.org/files/icu4j/55.1/icu4j-55_1.jar
# Run the scala REPL.
scala -classpath ./icu4j-55_1.jar

Paste in the following code. The resulting value should be "A common message for all options. Selected: (other)", but is in fact "900 (other)".

import com.ibm.icu.text.MessageFormat
import java.util.Locale
import scala.collection.JavaConverters._

// "message" is a substring of "messageCount", which triggers the issue in ICU4J 55.1.
val formatted = new MessageFormat(
  "{messageCount, plural, one {{message} (one)} other {{message} (other)} }",
  Locale.US
).format(Map(
  "messageCount" -> 900,
  "message" -> "A common message for all options. Selected:"
).asJava)

The Workaround

Since we were under a deadline, we didn't explore much further than stumbling upon the right workaround, which happened fairly quickly. At the time, that was a better use of our effort than setting up ICU4J source code and a debugger to characterize the issue. The workaround is to avoid naming your plural format variables such that one name is a substring of another. For example, the following works just fine:

// "aMessage" is not a substring of "messageCount", so this formats as expected.
val formatted = new MessageFormat(
  "{messageCount, plural, one {{aMessage} (one)} other {{aMessage} (other)} }",
  Locale.US
).format(Map(
  "messageCount" -> 900,
  "aMessage" -> "A common message for all options. Selected:"
).asJava)
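Because the workaround comes down to a naming rule, it can be checked mechanically before a pattern ships. What follows is a minimal sketch, assuming the argument names are taken from the keys of the Map passed to format; `ArgNameCheck` and `substringCollisions` are hypothetical names, not part of ICU4J. It flags any name contained in another, which is deliberately broad, since the exact condition ICU4J trips over was not pinned down here.

```scala
// Hypothetical lint helper (not part of ICU4J): flags pairs of argument
// names where one name occurs inside the other, the naming pattern that
// triggered the issue described above.
object ArgNameCheck {
  /** Returns (contained, containing) pairs, e.g. ("message", "messageCount"). */
  def substringCollisions(names: Iterable[String]): Seq[(String, String)] = {
    val distinct = names.toSeq.distinct
    for {
      a <- distinct
      b <- distinct
      if a != b && b.contains(a) // case-sensitive, like ICU argument names
    } yield (a, b)
  }
}
```

Run over the argument names from the failing example, `ArgNameCheck.substringCollisions(Seq("messageCount", "message"))` flags the pair `("message", "messageCount")`, while the renamed `Seq("messageCount", "aMessage")` version returns no collisions.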

The Bug Ticket

I didn't have time to chase this down to its root cause, but the ICU4J maintainers will. I opened ticket #11716 to cover the problem, and if we're all lucky it will be fixed in the next update.