Test hello rapidxml

This header-only C++ library is a bit outdated, still it is commonly used when the job has to be done fast and the requirements are not too sophisticated. You can get the source code from sourceforge, where you can also find a slim technical manual.

There are at least a couple of rapidxml characteristics you should be aware of before you start working with it.

Rapidxml parsing is destructive. The xml_document::parse() method takes as input a non-constant C-string, which it uses as its own internal buffer. If you want to keep your XML as it is, you'd better pass in a copy of it.
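One way to follow this advice, sketched with the standard library only, is to copy the XML text into a std::vector<char> that ends with a terminating zero. The helper name makeWritableCopy is mine, it is not part of rapidxml.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of rapidxml): builds a writable,
// zero-terminated copy of the XML text, so that a destructive parser
// can modify the copy while the original string stays intact.
std::vector<char> makeWritableCopy(const std::string& xml)
{
    std::vector<char> buffer(xml.begin(), xml.end());
    buffer.push_back('\0'); // the parser expects a C-string
    return buffer;
}
```

The copy would then be passed to the parser as buffer.data(). It has to stay alive as long as the document is in use, because the parsed nodes point into it.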

Preconditions are usually checked with assertions. Exceptions are thrown from the xml_document::parse() method only. So be careful to check what you are passing to an asserting function (for instance, xml_node::last_node() requires the node to have at least one child: it asserts that its first_node is not NULL), and remember to try/catch the parse call.

I have written a test case (using the Google Test framework) that shows how to parse a simple XML and read the information in it. Notice that I just read the document, without performing any editing on it. This keeps the example simple enough.
#include "rapidxml/rapidxml.hpp"
#include <gtest/gtest.h>
#include <cstring>  // strcmp
#include <iostream> // std::cout

TEST(RapidXml, simple)
{
  char buffer[] = "<root><first>one</first><second>two</second><third>whatever</third></root>"; // 1

  rapidxml::xml_document<char> doc; // 2
  ASSERT_NO_THROW(doc.parse<0>(buffer)); // 3

  rapidxml::xml_node<char>* root = doc.first_node(); // 4
  ASSERT_TRUE(root);
  ASSERT_STREQ("root", root->name()); // 5

  bool fields[4] {}; // 6
  for(rapidxml::xml_node<char>* node = root->first_node(); node != NULL; node = node->next_sibling()) // 7
  {
    if(strcmp(node->name(), "first") == 0) // 8
    {
      ASSERT_STREQ("one", node->value());
      fields[0] = true;
    }
    else if(strcmp(node->name(), "second") == 0)
    {
      ASSERT_STREQ("two", node->value());
      fields[1] = true;
    }
    else if(strcmp(node->name(), "third") == 0) // 9
    {
      fields[2] = true;
    }
    else // 10
    {
      fields[3] = true; // unexpected!
      std::cout << "Unexpected node: " << node->name() << std::endl;
    }
  }

  EXPECT_TRUE(fields[0]); // 11
  EXPECT_TRUE(fields[1]);
  EXPECT_TRUE(fields[2]);
  EXPECT_FALSE(fields[3]);
}
1. Remember that rapidxml is going to change this C-string (NULL-terminated array of characters) for its own purposes.
2. The xml_document template class has a template parameter that defaults to char. If you want to save some typing, you can rewrite this line without specifying the parameter, relying on the char default:
rapidxml::xml_document<> doc;
3. xml_document::parse() expects an int as template parameter, pass zero to get the default behavior. In your code you should try/catch this call for the rapidxml::parse_error exception (it extends std::exception). Here I assert that it should not throw.
4. xml_document IS-A xml_node, so I call the xml_node::first_node() method on doc to get the first document child. If doc has no child, first_node() returns a NULL pointer, otherwise we get a pointer to that node.
5. I expect the root to be there, so I assert that it is not zero (AKA false), then I get its name and I assert it is as expected. xml_node IS-A xml_base, where we can see that the name() method never returns NULL: if the node has no name, an empty C-string is returned instead.
6. Root has three children. I want to ensure I see all of them and nothing more. This bunch of booleans keeps track of them. They are all initialized to false (through the handy C++ empty list initializer) and then, in the following loop, when I see one of them I set the related flag to true. There are four booleans, and not three, because I also want to flag the case of an unexpected child.
7. The for-loop is initialized by getting the first root child, then we get the next sibling, until we reach the end of the family (a NULL is returned). We should pay attention when using xml_node::next_sibling(), since it asserts when the current node has no parent. But here we call next_sibling() on a node that is surely a child of another node.
8. For first and second node, we want to ensure it has a specific value, hence the assertion.
9. The third node could have any value, I just set the flag when I see it.
10. In case an unexpected node is detected, I keep track of this anomaly by setting the related flag.
11. Check if the expectations are confirmed.


Commuting Engineer CodeEval problem

We have the coordinates of a bunch of places that we want to visit. Being nerds, we are not happy if we don't write an algorithm to determine our path. But the nature of the problem is such that we are ready to settle for a loose approximation of a good solution. Frankly speaking, we are aiming for what looks, at least at first sight, like a not-so-bad solution.

You can get a more detailed description of the problem on the CodeEval blog. There you can even enter your solution that would be used (at least at the time when I am writing this post) as a first screening for some job interview.

You can't use the solution I am proposing here, for a few good reasons. Firstly, it is not a good idea to use a piece of code written by someone else to represent you. Secondly, I have written and tested my code for C++11 on GCC 4.8, which is a bit too modern for the current CodeEval requirements. And thirdly, come on, you don't want to give away a good chance of having some fun writing your own solution.

As I said, I am not aiming for the best possible solution. This is not caused by sloppiness, there are good reasons for it. Let's see a couple of them.

NP-hard

If you have some knowledge of the theory of computation, you should have recognized the problem as an instance of the well-known Traveling Salesman Problem, commonly called just TSP. It is an interesting problem because it can be stated in a few words and looks very easy indeed, but it turns out to be NP-hard (Non-deterministic Polynomial-time hard). That means, forget about coming up with a solution that is both exact and fast on large inputs.

In real life, what I would do to solve a problem like this is look for a library providing an adequate algorithm. For instance, you could have a look at BGL, the Boost Graph Library.

But here we can't use external libraries, we have to rely on the standard ones only. So, what I'll do is implement a greedy algorithm, choosing each time the locally best option. It is easy to show that such a strategy is heavily flawed, but at least it gives us a solution that usually is not the worst one, and it does so in a reasonable time.
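To see the flaw on a tiny case, here is a sketch that compares the greedy choice with an exhaustive search over all the visiting orders. To keep it short I assume that the places lie on a straight line, so a position is a single double. The function names are mine and are not part of the solution discussed in this post.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// Length of the path that starts at 'start' and visits the stops in the given order.
double pathLength(double start, const std::vector<double>& stops)
{
    double total = 0.0;
    double current = start;
    for(double stop : stops)
    {
        total += std::abs(stop - current);
        current = stop;
    }
    return total;
}

// Greedy strategy: always move to the closest stop not visited yet.
double greedyLength(double start, std::vector<double> stops)
{
    double total = 0.0;
    double current = start;
    while(!stops.empty())
    {
        auto next = std::min_element(stops.begin(), stops.end(),
            [current](double a, double b) {
                return std::abs(a - current) < std::abs(b - current);
            });
        total += std::abs(*next - current);
        current = *next;
        stops.erase(next);
    }
    return total;
}

// Exhaustive search: try every visiting order. Feasible only for a handful of stops.
double bestLength(double start, std::vector<double> stops)
{
    std::sort(stops.begin(), stops.end()); // next_permutation needs a sorted start
    double best = std::numeric_limits<double>::max();
    do
    {
        best = std::min(best, pathLength(start, stops));
    } while(std::next_permutation(stops.begin(), stops.end()));
    return best;
}
```

Starting from 0 with stops at 1, -2 and 3, greedyLength() walks 0, 1, 3, -2 for a total of 8, while bestLength() finds 0, -2, 1, 3, which is only 7 long.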

Geography

As you should know, Earth is not flat. We usually think of it as a sort of sphere, but even this is nothing more than a rough approximation. This implies that we shouldn't treat the coordinates of a place as if they were on a two-dimensional surface.

Besides, we are going to compute the distance as if we could go in a straight line from one point to the other, and this is usually not the case.
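If a bit more accuracy were needed, one common option is the haversine formula, which gives the great-circle distance on a sphere. This is only a sketch, not what I use in the solution. It assumes that a position is stored as (latitude, longitude) in degrees, and it takes 6371 km as mean Earth radius. The name haversineKm is mine.

```cpp
#include <algorithm>
#include <cmath>
#include <utility>

// Assumption for this sketch: (latitude, longitude), both in degrees.
using Position = std::pair<double, double>;

// Great-circle distance in kilometers on a spherical Earth.
double haversineKm(const Position& beg, const Position& end)
{
    const double pi = std::acos(-1.0);
    const double earthRadiusKm = 6371.0; // mean radius, itself an approximation
    const double toRad = pi / 180.0;

    double lat1 = beg.first * toRad;
    double lat2 = end.first * toRad;
    double dLat = lat2 - lat1;
    double dLon = (end.second - beg.second) * toRad;

    double a = std::pow(std::sin(dLat / 2), 2)
        + std::cos(lat1) * std::cos(lat2) * std::pow(std::sin(dLon / 2), 2);

    // Clamp to 1 to protect asin() from rounding errors on antipodal points.
    return 2 * earthRadiusKm * std::asin(std::min(1.0, std::sqrt(a)));
}
```

Even so, the result is still the length of an ideal straight flight, not of an actual commute.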

Splitting the problem

I have reduced the original problem to something that I can actually solve with a relatively simple piece of code. Now I split it into a few simpler problems, and for each of them I write a function that solves it.

Parsing the input

Our input is a number of strings, each of them like this one:
1 | CodeEval 1355 Market St, SF (37.7768016, -122.4169151)
We are interested in its ordinal number, and in the latitude and longitude of the place (in this order, as they appear between the brackets).

I decided to organize my data using STL pairs and a vector, and I gave them names that hopefully help to better understand what is going on in the code:
using Position = std::pair<double, double>;
using Location = std::pair<int, Position>;
using Locations = std::vector<Location>;
Given that, what I want is to extract a Location from each input string, that is going to be pushed in a Locations container. This is the declaration of the function I am thinking of:
Location parse(const std::string& input);
This test case (written for Google Test) shows how I expect it to behave:
TEST(CommEng, Parse1)
{
    std::string input("1 | CodeEval 1355 Market St, SF (37.7768016, -122.4169151)");
    Location loc = parse(input);

    EXPECT_EQ(1, loc.first);
    EXPECT_DOUBLE_EQ(37.7768016, loc.second.first);
    EXPECT_DOUBLE_EQ(-122.4169151, loc.second.second);
}
Notice I use the EXPECT_DOUBLE_EQ() gtest macro, instead of a plain EXPECT_EQ(), to check the values extracted from the input string. This avoids, or at least reduces, rounding problems. Using this macro I delegate to Google Test the job of deciding when two compared values are close enough to be considered equal (it accepts a difference of up to 4 ULPs, units in the last place).
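To give an idea of what comparing "within a few ULPs" means, here is a minimal sketch based on std::nextafter(). It is not how Google Test implements its check, and the name withinUlps is mine. It just walks from one value toward the other, one representable double at a time.

```cpp
#include <cmath>

// Hypothetical helper: true if 'b' can be reached from 'a' in at most
// maxUlps steps between adjacent representable doubles.
bool withinUlps(double a, double b, int maxUlps)
{
    if(std::isnan(a) || std::isnan(b))
        return false; // NaN is never close to anything

    for(int i = 0; i < maxUlps && a != b; ++i)
        a = std::nextafter(a, b); // next representable value in the direction of b

    return a == b;
}
```

With this helper, 0.1 + 0.2 and 0.3 are considered equal when four steps are allowed, while 1.0 and 1.0001 are not.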

Here is my function implementation:
Location parse(const std::string& input)
{
    int nr = std::stoi(input); // 1

    std::string::size_type bracket = input.find('('); // 2
    if(bracket == std::string::npos)
        return {}; // 3

    std::string::size_type comma = input.find(',', bracket);
    if(comma == std::string::npos)
        return {};

    double lat = std::stod(input.substr(bracket + 1)); // 4
    double lon = std::stod(input.substr(comma + 1));
    return { nr, {lat, lon}}; // 5
}
1. stoi() is the standard C++11 function similar to the old atoi(), but it takes an STL string as input parameter instead of a C-string. Here I am extracting the place ordinal number, which is expected to be right at the beginning of the string. Real code should be more robust, and be ready to handle the exception that stoi() throws when no conversion can be performed.
2. Just minimal error checking: if no open bracket followed by a comma is found, an empty Location is returned.
3. It may be worth remembering that this C++11 notation means "call the default ctor for the expected object". So, what I am doing here is building an "empty" Location.
4. Extract the input substring from the expected position (again, more error handling is required in production code), then convert it to double using the C++11 stod() function.
5. Construct a Location object and return it to the caller.

Approximated distance

Just apply the Pythagorean theorem to calculate the distance between two positions. For what I have said above, the result should be considered just an approximation:
double distance(const Position& beg, const Position& end)
{
    return std::sqrt(std::pow(beg.first - end.first, 2) + std::pow(beg.second - end.second, 2));
}
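As a side note, C++11 provides std::hypot() in <cmath>, which computes the same square root of the sum of squares while avoiding intermediate overflow and underflow. A sketch of an equivalent function follows. I named it flatDistance, a name of mine, to keep it apart from the function above.

```cpp
#include <cmath>
#include <utility>

using Position = std::pair<double, double>;

// Same flat-plane approximation as distance(), delegated to std::hypot().
double flatDistance(const Position& beg, const Position& end)
{
    return std::hypot(beg.first - end.first, beg.second - end.second);
}
```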
The closest point

My greedy algorithm needs to identify the closest Location to a specific Position:
Locations::iterator findClosest(Position& beg, Locations& others)
{
    double closest = std::numeric_limits<double>::max(); // 1
    Locations::iterator pos = others.end();
    for(Locations::iterator it = others.begin(); it != others.end(); ++it) // 2
    {
        double current = distance(beg, it->second); // 3
        if(current < closest)
        {
            closest = current;
            pos = it;
        }
    }

    return pos; // 4
}
1. Initially I set the solution to "nothing sensible": the closest distance found is set to the biggest double available, and the found position to invalid, the end() iterator.
2. Loop on all the other available points.
3. Calculate the current distance. If it is smaller than the best one found so far, I mark the current element as the new candidate, then I keep checking if there is anything better.
4. Return the iterator to the best solution I found.
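The same search could also be written with std::min_element() and a lambda, which removes the explicit bookkeeping. This is an alternative sketch, not the code I submitted. The names findClosestAlt and flatDistance are mine, chosen to keep the snippet self-contained.

```cpp
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

using Position = std::pair<double, double>;
using Location = std::pair<int, Position>;
using Locations = std::vector<Location>;

// Flat-plane distance, same approximation used in the post.
double flatDistance(const Position& beg, const Position& end)
{
    return std::hypot(beg.first - end.first, beg.second - end.second);
}

// Same contract as findClosest(): iterator to the nearest Location,
// or end() when the container is empty.
Locations::iterator findClosestAlt(const Position& beg, Locations& others)
{
    return std::min_element(others.begin(), others.end(),
        [&beg](const Location& lhs, const Location& rhs) {
            return flatDistance(beg, lhs.second) < flatDistance(beg, rhs.second);
        });
}
```

The price to pay is that each distance may be computed more than once, which is irrelevant for inputs as small as the ones in this problem.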

Generating a path

The core of my algorithm is a function that takes as input a container of Locations and gives back a vector containing the path, where each step is identified by the Location descriptor.
std::vector<int> getPath(const Locations& input)
{
    if(input.empty()) // 1
        return {};

    Locations locations(input); // 2
    std::vector<int> results;
    Locations::iterator current = locations.begin(); // 3
    do { // 4
        Position curPos = current->second; // 5
        results.push_back(current->first);
        locations.erase(current);

        current = findClosest(curPos, locations); // 6

    } while(!locations.empty());

    return results;
}
1. Trivial case, nothing in input, nothing in output.
2. Create an input local copy, since I am about to modify it.
3. As per the requirements, the path should start from the first element in the provided list.
4. Loop until all the input is consumed.
5. Erase the current position from the list of Locations that I haven't visited yet, but before that, push its descriptor in the output vector.
6. Find the next "current" element.

Full C++11 source code, and some more test cases, on github.


CURLOPT_WRITEFUNCTION and C++

Even though I am developing in C++, I don't use the curlpp wrapper for libcurl, the well-known file transfer library, preferring to access its bare C interface. I feel more comfortable this way, still there are a few low-level details that have to be dealt with explicitly. For instance, the CURLOPT_WRITEFUNCTION option accepts as parameter only the address of a free function (or of a static member function). When we want to use a (non-static) member function, we have to be prepared to deal with a certain amount of ugliness.

I'd like Curl to put the data it fetches (for more details, please have a look at the previous post, where I talked about the plain Curl setup) in a (non-static) member variable of the same class from which curl_easy_perform() is called. Something like this:
class CurledClass
{
  // ...

private:
  std::string data_; // 1

  CURLcode curling(/* ... */)
  {
    CURL* curl = curl_easy_init();

    // ...

    CURLcode code = curl_easy_perform(curl); // 2
    curl_easy_cleanup(curl);

    return code;
  }
};
1. I want to store the answer I get from Curl in this data member.
2. Calling Curl to perform the job.

As I said before, we can't pass a pointer to a non-static member function to CURLOPT_WRITEFUNCTION, only a pointer to a static (or free) function. However, we can ask Curl to pass an extra parameter to that function. We set that parameter, through the CURLOPT_WRITEDATA option, to the pointer to this object, so that our static function can in turn call a non-static member function:
CURL* curl = curl_easy_init();

// ...
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, &CurledClass::write); // 1
curl_easy_setopt(curl, CURLOPT_WRITEDATA, this); // 2
// ...

data_.clear(); // 3
// ...
curl_easy_perform(curl);
1. I specify the static member function I want to be called.
2. I put in the CURLOPT_WRITEDATA option the pointer to the current CurledClass object.
3. It is usually a good idea to clean up the buffer before using it.

The static function specified as CURLOPT_WRITEFUNCTION now just acts like an adapter to the member function I want to use instead:
class CurledClass
{
  // ...
  static size_t write(void* buf, size_t size, size_t nr, void* self) // 1
  {
    return static_cast<CurledClass*>(self)->mWrite(static_cast<char*>(buf), size, nr); // 2
  }

  size_t mWrite(char* buf, size_t size, size_t nr) // 3
  {
    data_.append(buf, size * nr); // 4
    return size * nr;
  }
};
1. Notice the last parameter, that I called "self". It carries the value we have set for CURLOPT_WRITEDATA.
2. The static function write() simply calls the member function mWrite(). To do that it (unsafely) casts the self parameter back to a pointer to the object, and the first parameter to a pointer to char.
3. Here is the member function that does the job of copying the data from the Curl buffer to the local one.
4. Remember that Curl could call the CURLOPT_WRITEFUNCTION many times, and we are expected to join the passed chunks to build the actual result. I am using a standard STL string as buffer, so I just append the new data as I get it.
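To try the adapter pattern without a network connection, here is a self-contained sketch where a fake function plays the role of curl_easy_perform(). Both fakePerform() and the Collector class are hypothetical names of mine. Only the shape of the callback mirrors what libcurl expects.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Same shape as the callback libcurl expects for CURLOPT_WRITEFUNCTION.
using WriteCallback = std::size_t (*)(char* buf, std::size_t size, std::size_t nr, void* userdata);

// Hypothetical stand-in for curl_easy_perform(): it delivers the payload
// in two chunks, as a real transfer may do.
void fakePerform(WriteCallback callback, void* userdata)
{
    char first[] = "Hello, ";
    char second[] = "world";
    callback(first, 1, std::strlen(first), userdata);
    callback(second, 1, std::strlen(second), userdata);
}

class Collector
{
public:
    const std::string& fetch()
    {
        data_.clear(); // start from an empty buffer
        fakePerform(&Collector::write, this); // 'this' plays the role of CURLOPT_WRITEDATA
        return data_;
    }

private:
    std::string data_;

    // Static adapter: recovers the object and forwards to the member function.
    static std::size_t write(char* buf, std::size_t size, std::size_t nr, void* self)
    {
        return static_cast<Collector*>(self)->mWrite(buf, size, nr);
    }

    // Appends the current chunk and reports how many bytes were consumed.
    std::size_t mWrite(const char* buf, std::size_t size, std::size_t nr)
    {
        data_.append(buf, size * nr);
        return size * nr;
    }
};
```

Calling fetch() on a Collector object gives back the two chunks joined together, and calling it again does not duplicate them, thanks to the clear() at the beginning.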


Assert death and deduction on Osmos

The Code Jam Round 1B 2013 Problem A is nicknamed Osmos. Writing googletest test cases for it, I found a good occasion to use, for once, the ASSERT_DEATH macro. Implementing the function, I also had to deal with template argument deduction. Hence the fancy name of this post.

As input we have an integer representing the weight of our mote (think of a mote as a greedy little sort-of-living organism), and a bunch of integers representing all the other motes in the neighborhood. We want our mote to survive. To do that, it has to absorb all the other motes, and it can absorb only motes smaller than itself (growing by their weight in the process). We can help it in two ways: adding motes that it can swallow to increase its size, and removing motes that are too big for it. Besides, we are lazy. We want to minimize the number of our adding/removing actions.

Test cases would help to clarify the matter. I am using C++11 by GCC 4.8 on a Linux machine. Test cases are written in the xUnit way, using the gtest framework.
unsigned osmos(unsigned myMote, std::vector<unsigned>& motes); // 1

TEST(TestOsmos, CaseSample1) // 2
{
  const unsigned myMote = 2;
  std::vector<unsigned> motes { 2, 1 };

  ASSERT_EQ(0, osmos(myMote, motes));
}

TEST(TestOsmos, CaseSample2) // 3
{
  const unsigned myMote = 2;
  std::vector<unsigned> motes { 2, 1, 1, 6 };

  ASSERT_EQ(1, osmos(myMote, motes));
}

TEST(TestOsmos, CaseSample3) // 4
{
  const unsigned myMote = 10;
  std::vector<unsigned> motes { 25, 20, 9, 100 };

  ASSERT_EQ(2, osmos(myMote, motes));
}

TEST(TestOsmos, CaseSample4) // 5
{
  const unsigned myMote = 1;
  std::vector<unsigned> motes { 1, 1, 1, 1 };

  ASSERT_EQ(4, osmos(myMote, motes));
}

TEST(TestOsmos, CaseBigger) // 6
{
  const unsigned myMote = 42;
  std::vector<unsigned> motes { 1, 40, 12, 7 };

  ASSERT_EQ(0, osmos(myMote, motes));
}

TEST(TestOsmos, CaseMixed) // 7
{
  const unsigned myMote = 2;
  std::vector<unsigned> motes { 8, 15, 30, 40, 60 };

  ASSERT_EQ(3, osmos(myMote, motes));
}

TEST(TestOsmos, CaseBadMote) // 8
{
  const unsigned myMote = 0;
  std::vector<unsigned> motes { 8, 15, 30, 40, 60 };

  ASSERT_DEATH(osmos(myMote, motes),"Assertion .* failed.");
}
1. My osmos function takes as input my mote's weight and the weights of all the other motes. It gives back the number of adjustments I have to make to the environment to help my mote. It is not clearly stated in the problem, but no mote weight should be less than one, and we can assume each of them fits in an int.
2. Four examples are provided with the original problem, and I have converted them into test cases. In the first one we expect our mote (sized two) to happily eat its smaller brother (one), getting big enough (three) to eat the other one (sized two). No help is required from us, so the function should return zero.
3. My mote has no problem (not even an ethical one) eating the tiny twins weighing one each, getting in this way big enough to eat the mote sized two, but it can't eat its brother sized six. We need to interfere once, either to remove six from the list, or to add a smaller mote so that our mote can absorb it, growing big enough to complete its job.
4. Eat nine, become nineteen. That is not enough to eat twenty, so I provide an eighteen mote, which is immediately eaten by my mote, growing to a spectacular thirty-seven. Eat twenty and twenty-five, become eighty-two. Again not enough to assimilate one hundred, so I need to help a second time, removing the big fish or adding another small mote to the list.
5. My mote is so small that it can't eat any other mote. I can only remove all the competitors.
6. I added a few more tests for cases that jumped to my mind, here is the first one. If my mote is the biggest mote in town, there is no game for the other ones.
7. Similar to CaseSample3.
8. What if the user inserts an ethereal mote with no weight? That should never occur, so I think it is right to assert on the mote value. That means that I am so unhappy with such an input that I terminate the program as soon as I spot it. You know how an assertion works: if it fails, it terminates the execution, outputting a string. ASSERT_DEATH ensures that the program terminates, and tests the emitted string against the passed regular expression. Here I'm saying that I expect "Assertion (whatever) failed.", with anything instead of "(whatever)". Depending on the compiler you are using, you could get a different message.

There is a tiny nuisance here. On many platforms googletest (at least up to version 1.6) issues a warning when calling ASSERT_DEATH, saying that it can't detect the current number of threads. You can easily get rid of this message, at least if you are working on Linux, following a suggestion you can find on stackoverflow.

And here is how I have implemented the function:
unsigned osmos(unsigned myMote, std::vector<unsigned>& motes)
{
  assert(myMote); // 1

  if(myMote == 1) // 2
    return motes.size();

  std::sort(motes.begin(), motes.end()); // 3

  if(motes.empty() || myMote > motes.back()) // 4
    return 0;

  unsigned removing = motes.size(); // 5
  unsigned adding = 0; // 6
  for(unsigned i = 0; i < motes.size(); ++i) // 7
  {
    if(myMote <= motes[i]) // 8
    {
      removing = std::min<unsigned>(removing, adding + motes.size() - i); // 9
      while(myMote <= motes[i]) // 10
      {
        myMote += myMote - 1;
        ++adding;
      }
    }
    myMote += motes[i]; // 11
  }

  return std::min(removing, adding); // 12
}
1. It just can't happen that my mote is zero. If I detect it, something completely crazy is happening here, I don't know what to do anymore, and I do not expect the caller, its caller, or any grand-caller to know what to do. I have no alternative but to terminate the program execution here. If you see this as a bit of an overreaction, you would probably throw an exception instead.
2. My mote is so small that it can't ever eat anything. I have only one way of completing the job, removing all the motes from the passed collection. So I return the collection size.
3. Having the motes ordered by size, I can determine more efficiently which ones my mote can eat. It costs O(N log N) time, but it is worth it.
4. Check the rightmost mote in the collection (after sorting, the biggest one) against my mote. If my mote is bigger, it will surely eat all of them in a whiff, with no need of any help from my side.
5. Worst case scenario, I need to remove all the motes to let my mote win.
6. Best case scenario, I don't have to add anything.
7. Let's loop on all the motes.
8. My mote is not big enough to eat the current guy. We need to check if it is cheaper to add smaller motes or to remove it.
9. The motes that I still have to process are the size of the collection minus the current position. In the worst case I should assume I have to remove all of them. Besides, I should remember that I could have already added a few motes to get to this position. I am interested in the least expensive solution, so I compare the previous worst case with the current one, and I choose the smaller.
To make my choice, I use the STL min() template function. Usually I don't need to specify the parameter type, because the compiler is smart enough to deduce it automatically. Here it can't: removing is an unsigned, while adding + motes.size() - i has the type returned by size(), which is usually a different (wider) unsigned type. The two arguments disagree, deduction fails, and so I need to explicitly pass the type I want to use.
10. Calculate how many motes I have to feed to my mote to make it big enough. The most nutritious mote it can eat weighs one less than itself.
11. In any case my mote eats the current mote, growing up.
12. Let's check if it is cheaper to remove or to add motes, and return the winner. In this case the call to min() doesn't need any hint, since both arguments are unsigned.
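The deduction issue described in note 9 can be isolated in a few lines. In this sketch the function cheapest() is a hypothetical name of mine, and I assume a platform where std::size_t is not the same type as unsigned, as is typical on 64-bit systems.

```cpp
#include <algorithm>
#include <cstddef>

// 'removing' is an unsigned, 'candidate' is a std::size_t, as happens
// when motes.size() takes part in the expression.
unsigned cheapest(unsigned removing, std::size_t candidate)
{
    // return std::min(removing, candidate); // does not compile when the two types differ

    // Naming the type explicitly converts 'candidate' to unsigned before comparing.
    return std::min<unsigned>(removing, candidate);
}
```

An alternative with the same effect would be an explicit static_cast<unsigned>() on the second argument, leaving the deduction to the compiler.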

Full C++ source code on github.


Skipping missing element with RapidJson

In a previous post we have seen how RapidJson by default checks data coherence with C-style assertions, and how we can change this behavior to let it throw an exception instead.

Sometimes throwing an exception is overkill, and asserting should just be avoided. Think about the case of accessing an element of the JSON document that is not mandatory. If it is there, fine, we can use it, otherwise we should simply skip it. Its absence is neither exceptional nor catastrophic.

The official RapidJson User Guide shows how to use the FindMember() method to get an element, or a nullptr in case it is missing. But this is the way to go in RapidJson 0.2, which is not currently available for download. If you are using RapidJson 0.1x, you have to think of something different.

In my JSON there is an array element named ds, which may contain a few elements. Each of them could have a member named "a", which should be a string.

Here is my assertion/exception-free code for this case:
const rapidjson::Value& ds; // 1

// ...

for(rapidjson::SizeType i = 0; i < ds.Size(); ++i) // 2
{
  if(ds[i]["a"].IsNull() || !ds[i]["a"].IsString()) // 3
  {
    // 4
  }
  else
  {
    std::string a(ds[i]["a"].GetString()); // 5
    // ...
  }
}
1. I'll fetch in the variable ds the content of the JSON "ds" element, ensuring it is an array.
2. Loop on all the ds elements.
3. In RapidJson 0.1x, when we try to access a non-existing element we get back a reference to a null-value singleton. If this is the case, the IsNull() method returns true. Since I expect my "a" element to be a string, I also call IsString() to ensure that.
4. If the "a" element is not there, or if it is not a string, I can take some alternative action, maybe logging a message, if I think the user should be aware of that.
5. Otherwise it is safe to access the "a" element, get its value as a C-string, and use it.
