
From CSV to sum

A common, albeit dated, way of storing information in a file is the CSV (Comma Separated Values) format. Say that you have lists of numbers stored in this way, and you have to provide a function that converts a string containing an unknown number of elements into their sum.

The first idea that came to my mind involved std::accumulate(). It works on a range of iterators, typically taken from a standard container, so the input should be prepared by converting the original string (of characters) to a vector (of doubles).
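A minimal sketch of that first idea, using only the standard library, could look like the following. The names csvToVector() and naiveCsvSum() are made up for this illustration, and the conversion does no real validation: a field that does not start with a number is silently skipped.

```cpp
#include <numeric>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split the input on commas and convert each field.
// No validation: fields that fail to convert are silently skipped.
std::vector<double> csvToVector(const std::string& input)
{
    std::vector<double> values;
    std::istringstream fields(input);
    std::string field;
    while (std::getline(fields, field, ',')) {
        std::istringstream converter(field);
        double value;
        if (converter >> value)   // operator>> skips leading white space
            values.push_back(value);
    }
    return values;
}

// Hypothetical name: sum the converted values with std::accumulate().
double naiveCsvSum(const std::string& input)
{
    const std::vector<double> values = csvToVector(input);
    // 0.0 (not 0) so that the accumulation is done on doubles
    return std::accumulate(values.begin(), values.end(), 0.0);
}
```

Calling naiveCsvSum("1, 2.5, 3") adds up the three values, but there is no way to tell the caller that the input was malformed.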

Not a bad try, but just by spelling out the idea, I saw that the trickiest part of the job is not summing the values, but parsing the input string to extract the actual values to work with. And if parsing it has to be, better to use Spirit and Phoenix.

From a parser perspective, the focus is on the grammar that determines whether we should accept the input or not. In our case, we can think of the input as a sequence of numbers containing at least one element, with a comma as mandatory separator, and white space as skipped elements. The first element in the sequence triggers a semantic action that initializes the result; every other element triggers a semantic action that increases the sum.

So, the grammar should be something like:
double_[setSum] >> *(',' >> double_[incrSum])
Once we have worked out the grammar, a large part of the job is done.
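To see what this grammar asks for, here is a hand-written sketch of the same rule that uses only the standard library instead of Spirit. The names skipSpaces(), parseDouble() and handWrittenCsvSum() are invented for this illustration, and std::strtod() is assumed to be close enough to Spirit's double_ for simple decimal input; the two are not guaranteed to accept exactly the same set of strings.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical helper: advance pos past any white space (the skipped elements).
void skipSpaces(const std::string& s, std::size_t& pos)
{
    while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos])))
        ++pos;
}

// Hypothetical helper: parse one number starting at pos.
// On success stores it in value and advances pos past it.
bool parseDouble(const std::string& s, std::size_t& pos, double& value)
{
    skipSpaces(s, pos);
    const char* start = s.c_str() + pos;
    char* end = nullptr;
    value = std::strtod(start, &end);
    if (end == start)
        return false;   // no number at this position
    pos += static_cast<std::size_t>(end - start);
    return true;
}

// Hand-written counterpart of: number >> *(',' >> number)
bool handWrittenCsvSum(const std::string& input, double& sum)
{
    std::size_t pos = 0;
    double value = 0.0;

    if (!parseDouble(input, pos, value))
        return false;       // at least one element is mandatory
    sum = value;            // first element: initialize the result

    for (;;) {
        skipSpaces(input, pos);
        if (pos == input.size())
            return true;    // the whole input has been consumed
        if (input[pos] != ',')
            return false;   // only a comma may follow a number
        ++pos;
        if (!parseDouble(input, pos, value))
            return false;   // a comma must be followed by a number
        sum += value;       // any other element: increase the sum
    }
}
```

The two assignments to sum play the role of the two semantic actions. Everything else is bookkeeping that a parser library takes care of, which is the reason to state the grammar once and let Spirit handle the rest.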

Here is a possible implementation of a function that checks a string for values in the expected format, and returns a success flag and the sum of the values:
// ...
#include <string>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>

bool csvSum(const std::string& input, double& sum)
{
    using boost::spirit::qi::phrase_parse;
    using boost::spirit::qi::double_;
    using boost::phoenix::ref;
    using boost::spirit::qi::_1;
    using boost::spirit::ascii::space;

    // phrase_parse() advances beg as far as the grammar matches
    std::string::const_iterator beg = input.begin();
    bool r = phrase_parse(beg, input.end(),
        double_[ref(sum) = _1] >> *(',' >> double_[ref(sum) += _1]), space);

    // a partial match is a failure: the whole input must be consumed
    if(beg != input.end())
        return false;
    return r;
}

If it is not clear to you what is going on in the code, I suggest checking the previous post on Spirit and Phoenix.

The first semantic action gets the sum by reference, so that it can change it, and assigns to it the double value parsed by Spirit. The second semantic action is called for each subsequent element that Spirit finds in the sequence; it receives the parsed double value and adds it to sum. Note that sum is updated as the parsing proceeds, so it may hold a partial result when the function returns false.

I based this post on a C++ source file provided with the original Boost Spirit Qi documentation.
