Codility equi

Interviews for software developer positions are a sort of Russian roulette, you never know which kind of question could be shoot at you.
Codility is a site claiming to "filtering out job candidates who cannot write correct programs". I'd say they have a too high opinion of themselves, they just filter candidates who are not accustomed to their tests. In any case, it is worthy to check them out.

There is a training page on Codility, where a few sample problem are listed, and you can try to solve them. I have checked out their Equi one or, as they call it, the mother of all tasks. As I already noted, they are on the overstating side of life.

My solution is for C++ but, being a conceptually simple problem, could be easily ported to many other programming languages.

In half an hour, we have to write a function that takes in input a STL vector of int and returns, again as an int, its "equilibrium index", if such a beast exists, otherwise -1.

That index is the position of an element such that the sum of the elements on its left would match the sum of the ones on its right. So, for instance, passing { 1, 2, 42, 1, 1, 1 } we should get back 2, since 1 + 2 equals 1 + 1 + 1. Simple, isn't it? There are a couple of things we should pay attention to, though. Most interestingly, they say that the "expected worst-case time complexity is O(N)", alluding to the fact that we can get in input a vector with a size in the order of millions, and that would lead a Big-Oh N Square algorithm to fail miserably.

As any TDD developer, I started writing a fake implementation for the required function, and a few test cases. Something like this:
int equi(const std::vector<int>& input)
{
  if(input.empty()) // 1
    return -1;

  // 2

  return -1;
}

TEST(TestEqui, CaseSimple) // 3
{
  std::vector<int> input = { 1, 2, 42, 1, 1, 1 };
  int pivot = equi(input);
  ASSERT_EQ(2, pivot);
}

TEST(TestEqui, CasePlain) // 4
{
  std::vector<int> input = { -7, 1, 5, 2, -4, 3, 0 };
  int pivot = equi(input);
  ASSERT_EQ(3, pivot);
}

TEST(TestEqui, CaseNot) // 5
{
  std::vector<int> input = { 7, 1, 5, 2, 4, 3, 0 };
  int pivot = equi(input);
  ASSERT_EQ(-1, pivot);
}

TEST(TestEqui, CaseBigNum) // 6
{
  std::vector<int> input = { INT_MAX, 1, 0, INT_MIN };
  int pivot = equi(input);
  ASSERT_EQ(-1, pivot);
}
1. If I get in input an empty vector, there is nothing to do, return the "NOT FOUND" value.
2. Here will go the meat.
3. The test case I have informally described above.
4. Test case gently provided by the Codility guys.
5. Let's check also a case where the result should be "NOT FOUND".
6. I have found out that Codility problems are often fond of paying attention to limit cases. Here I check the case of overflow.

It is difficult to test the time complexity for this problem, I should have written a test case where the input vector had a few millions elements, and I should have put a time constrain on it. Kind of an overkill in this situation. Considering that we are given a mere half an hour to provide a solution, we have no time for it. We should rely instead on our software designer common sense.

Firstly, I wrote an intuitive solution, that would work fine in normal cases, but would fail if we push a bit on it:
typedef std::vector<int>::size_type SIZE;

int equi(const std::vector<int>& input) // !!! SLOPPY VERSION !!!
{
  if(input.empty())
    return -1;

  for(SIZE index = 0; index < input.size(); ++index) // 1
  {
    int left = 0; // 2
    for(SIZE i = 0; i < index; ++i)
      left += input[i];

    int right = 0; // 3
    for(SIZE i = index + 1; i < input.size(); ++i)
      right += input[i];

    if(left == right) // 4
      return index;
  }
  return -1;
}
1. We are looking an the index of the element in the vector. So I loop on all of them.
2. Let's sum up all the element on the left of the candidate index, then (on point 3) I'll get the right sum, and finally (on point 4) I'll compare them.
3. I have two for loops inside another for loop. The external loop has time complexity O(N), both internal one O(N/2). The result is a Big Oh N Square time complexity algorithm. No problem at all for small N's, but we can foresee a catastrophe for large N's.
4. Notices that I stored the left and right values in plain int's. This is alright for normal cases, but what if I play with big numbers? As the test CaseBigNum will show, overflow is lurking behind the internal loops.

One could ask, why bother with such a puny solution? Wouldn't be better to go straight to a more robust one? Well, a better solution could not exist, or just we could come up with it in a reasonable time. And a solution that works in a fair number of cases is usually considered better than nothing at all. Besides, this first puny solution helped me to write a few test cases, leading to a better knowledge of the problem.

We need to refactor our solution. First issue is easily solved, instead of int, we should use larger data type to store the sums of elements. The biggest standard C/C++ integer type is "long long", let's go for it.

Second issue is tougher. We don't want a loop in a loop, that's for sure. That means that we need to change approach, we need to be smarter. Instead of thinking the vector like a fixed single collection of elements, let's think it as two changing collections. Initially, lefty is empty and righty contains all the elements but the first. We compare them, if they are not matching, we pass considering the next element:
int equi(const std::vector<int>& input)
{
  if(input.empty())
    return -1;

  long long sum = 0; // 1
  for(SIZE i = 0; i < input.size(); ++i) // 2
    sum += input[i];

  long long left = 0;
  long long right = sum;
  for(SIZE pivot = 0; pivot < input.size(); ++pivot) // 3
  {
    right -= input[pivot]; // 3
    if(left == right)
      return pivot;
    left += input[pivot]; // 4
  }

  return -1;
}
1. Notice that I'm using long long.
2. First O(N) loop, I get the sum of all the elements in the collection.
3. Another O(N) loop. We could expect to usually find the pivot in the middle of our collection, so we can estimate an average O(N/2) time complexity for this step.
4. Initially, right is the total sum, each time I check a pivot, I remove its value from the right collection.
5. I'm shifting to the right, let's insert the previous candidate pivot to the left collection.

We still have two loops, in the worst case both O(N), but now they are on the same level, we have to sum, and not multiply them to get the function complexity. Meaning that we can officially enjoy an O(N) solution.

2 comments: