Splitting a string with boost::tokenizer

As promised, let's see how to replace sscanf with a safer C++ Boost way of splitting a string. Since the case in question is very simple, it is not worth pulling in the powerful Spirit library; better to use tokenizer, expressly designed for cases like this.

Here is the offending original function that we want to refactor:
int getValue(char* buffer)
{
    int code, value, index;
    sscanf(buffer, "%d:%d:%d",&code, &value, &index);
    std::cout << code << " [" << index << "]: " << value << std::endl;

    return value;
}
It could be unsafe, but it is simple to write and understand. The Boost tokenizer version is much more flexible and safe, and a bit more complicated:
int getValue(char* raw)
{
    std::string buffer(raw);
    boost::char_separator<char> separator(":"); // 1

    typedef boost::tokenizer<boost::char_separator<char> > MyTokenizer; // 2
    MyTokenizer tok(buffer, separator);
    std::vector<std::string> tokens; // 3
    std::copy(tok.begin(), tok.end(), std::back_inserter(tokens)); // 4
    if(tokens.size() != 3) // 5
        return 0;

    std::cout << tokens[0] << " [" << tokens[2] << "]: " << tokens[1] << std::endl;
    return atoi(tokens[1].c_str()); // 6
}
1. The separators are passed to the tokenizer in a string; each character passed is considered a valid separator. In this case we need only the colon.
2. A typedef makes the rest of the code more readable. boost::tokenizer is a template class that could be used "as is". Here we specify the separator, so that we can pass an instance of it to the actual tokenizer we are going to use.
3. A vector is used to keep the tokens resulting from the split.
4. Remember that you have to go through a back inserter, so that memory is allocated for each new element in the vector.
5. Usually, when something unexpected happens, it is a good idea to throw an exception. Here I assume returning zero is enough.
6. Only the "value" token is converted to int, just before returning it to the caller.

The resulting code is longish, but mainly because I aimed to make it as readable as I could.
