This Thread: December 2013

Dominator by golden leader

I want to check whether a collection of integers contains a value that occurs in more than half of its elements. I have already given a solution to this problem in the previous post, but I need something smarter, linear in time and constant in space complexity.

To achieve this result I am about to implement an algorithm based on the discussion that you can read in a paper available on Codility. Look at the end of that document for a Python function named goldenLeader.

In a few words, the idea is that if a dominator exists, we can discard pairs of different elements and, in the end, we should be left with at least one dominator element. If no dominator exists, we may find out that we have thrown away all the elements while looking for it.

However, our function should not return just the dominant value, but an index of the original vector that refers to an element with such a value. So we need to tweak that algorithm a bit.

#include <algorithm> // std::count
#include <utility>   // std::pair
#include <vector>

typedef std::pair<int, unsigned> ValInd; // 1

int solution(const std::vector<int>& input)
{
    int count = 0; // 2
    ValInd candidate; // 3
    for (unsigned i = 0; i < input.size(); ++i)
    {
        if (count == 0) // 4
        {
            count = 1;
            candidate.first = input[i];
            candidate.second = i;
        }
        else // 5
        {
            count += (candidate.first == input[i]) ? 1 : -1;
        }
    }

    if (count == 0) // 6
        return -1;

    if (std::count(input.begin(), input.end(), candidate.first) > (int) input.size() / 2)
        return candidate.second;

    return -1;
}

1. I need to remember the current value and the index where I have found it.
2. I am going to count how many elements, supposedly dominators, I have found that are not yet matched with a non-dominator. I could have pushed them on a stack, but a plain counter is obviously cheaper.
3. Here I am going to store the current candidate.
4. No previous candidate survived the looping, so the current element is chosen.
5. I have a candidate, so I compare it against the current element. If they have the same value, I increase the counter, otherwise I decrease it.
6. If there is no pending candidate, I am sure there is no dominant value. Otherwise I count how many elements with the candidate value are in the input collection. If they are more than half of the total, bingo.

Dominator

We have a collection of integers (any 32 bit signed int is possible, and there could be up to one million of them), and we want to know if there is a dominant value in it, meaning a value that occurs in more than half of the elements. If so, the index of one such item (any one would do) should be returned. Otherwise, -1 means "no dominant value".

To make things a bit more spicy, linear time and constant space complexity are required.

I found this problem in the Codility train page, section Leader, under the nickname Dominator. There you will find more details and the plain-English description of the test case that I have translated here to C++ and GoogleTest:

int solution(const std::vector<int>& input);

TEST(Domi, Given)
{
    int h[] = { 3, 4, 3, 2, 3, -1, 3, 3 };
    std::vector<int> input(h, h + 8);

    ASSERT_EQ(3, input[solution(input)]);
}

More than half the elements in the collection have value 3, so I expect as output the zero-based index of an element whose value is 3.

Not considering the space complexity requirement, it is not difficult to find an (almost) suitable solution.

What I would do is create a map where each value of the input vector is associated to a counter that keeps track of its occurrences. Or better, since we also want to know the index of one such element, besides the counter I am going to store the first index where I see the value.

Then it would be just a matter of scanning the map.

Here is a possible C++98 implementation:

typedef std::pair<unsigned, unsigned> CountInd; // 1
typedef std::map<int, CountInd> Counter; // 2

int solution(const std::vector<int>& input)
{
    Counter counter;
    for (unsigned i = 0; i < input.size(); ++i)
    {
        Counter::iterator pos = counter.find(input[i]);
        if (pos != counter.end()) // 3
            ++pos->second.first;
        else
            counter[input[i]] = std::pair<unsigned, unsigned>(1, i);
    }

    for (Counter::const_iterator it = counter.begin(); it != counter.end(); ++it)
    {
        if (it->second.first > input.size() / 2)
        {
            return it->second.second;
        }
    }

    return -1;
}

1. The value in the map would be a pair of frequency and index in the original vector.
2. I don't care about keeping the elements in the map ordered. If I were allowed to use C++11, the STL unordered_map would have been a much better choice. I am paying in terms of time complexity for something that I am not using. On the other hand, defining a custom hash map for this problem would be overkill.
3. If I have already added an element in the map for the current value, I have just to increase its counter. Otherwise I add a new element where the key is the current input value, and the value is a pair having as first the current counter for this element (one), and as second the index to the current element in the input vector.
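
As a rough illustration of the remark in note 2, here is how the same counting idea might look with a C++11 unordered_map. This is only a sketch: the name dominatorIndexHashed is made up for this example, and the space complexity is still linear, so it does not satisfy the original requirement either.

```cpp
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical C++11 variant of the map-based solution.
// Expected linear time thanks to hashing, but still linear extra space.
int dominatorIndexHashed(const std::vector<int>& input)
{
    // value -> (occurrences so far, index of the first occurrence)
    std::unordered_map<int, std::pair<std::size_t, std::size_t>> seen;

    for (std::size_t i = 0; i < input.size(); ++i)
    {
        auto it = seen.find(input[i]);
        if (it == seen.end())
            it = seen.emplace(input[i], std::make_pair(std::size_t(0), i)).first;

        // As soon as a value covers more than half the input, we are done.
        if (++it->second.first > input.size() / 2)
            return static_cast<int>(it->second.second);
    }

    return -1; // no dominant value
}
```

Checking the counter while scanning avoids the second loop over the map, since a dominant value must cross the half-size threshold at some point during the scan.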

This solution scores full marks on Codility, but we know that it is not the right answer. Time complexity is so close to linear that their tests didn't detect the logarithmic factor caused by the STL map. The real issue is the space complexity (not detected either), which is linear instead of constant.

To get constant space, we could apply an astute algorithm described in this linked document from Codility, where it is given as a Python function called goldenLeader.

I am going to use this hint to refactor my solution.

Fish eat fish - linear solution

A bunch of cannibal fishes move upstream and downstream in a narrow river, following the rules that I explained in the previous post. There I also showed a few test cases (C++ and GoogleTest) and a suboptimal first solution. Here I present a better algorithm that achieves an asymptotic linear time complexity even in the worst case.

The previously shown solution has the issue of repeating the same checks over and over on the same pairs of fishes. We can avoid this by delaying each comparison until it becomes strictly necessary.

The trick that I use here is to do nothing with a fish going downstream except storing it in a stack. The real job is done for each fish going upstream. If there is no one coming from the other direction, it survives for sure. Otherwise its fate is decided by comparing it against the fishes previously stored in the stack, starting from the one closest to it (that's why a stack is a good choice as container). Each fish is pushed and popped at most once, hence the linear time.

Here is how I implemented this algorithm:

int solution(std::vector<int>& sizes, std::vector<int>& directions)
{
    int count = 0; // 1
    std::stack<int> goingDown; // 2
    for (unsigned i = 0; i < sizes.size(); ++i)
    {
        if (directions[i] == 1) // 3
        {
            goingDown.push(sizes[i]);
            continue;
        }

        if (goingDown.empty()) // 4
        {
            ++count;
            continue;
        }

        while (!goingDown.empty()) // 5
        {
            if (sizes[i] > goingDown.top()) // 6
                goingDown.pop();
            else
                break;
        }
        if (goingDown.empty()) // 7
            ++count;
    }
    return count + goingDown.size(); // 8
}

1. I no longer need to store the state of each fish, I can just keep track of the number of fishes going upstream that survived.
2. Here I push each fish that is going downstream.
3. The current fish is going downstream, I just have to push it in the stack and go to the next iteration.
4. The current fish is going upstream. If there is no previous fish moving in the other direction, it made it. I increase the counter and go to check the next one.
5. Otherwise, the current fish has to prove to be bigger than any fish in the stack to survive.
6. If the current fish is bigger than the current top of the stack, pop it (that is, eat it). Otherwise, it is the one to be eaten, and I should stop looping on the stack.
7. If the stack of downwards-going fishes is empty, this means that the current fish has eaten them all up. So I can increase the counter.
8. In "count" I have the number of all the upwards-going fishes that made it. I just have to add to it the number of downwards-going fishes still in the stack to get the total number of survivors.

The C++ source code for both solutions and the related test cases is on GitHub.

Fish eat fish

We have a bunch of fishes (there could be up to 100,000) in a very narrow river, each of them with a different weight (represented by a non-negative integer up to one billion). Each fish moves upstream or downstream (0 means up, 1 down). If two fishes meet (one moving up, the other moving down) the smaller one is eaten by the bigger. How many fishes are going to survive?

You can find this problem in the Codility train page, under the Stacks and Queues section, nickname Fish.

To better think about a solution, I have taken the test case proposed in the problem description and converted it to a GoogleTest for C++, then I have added a few more that looked interesting to me. Here is a selection of them:

int solution(std::vector<int>& sizes, std::vector<int>& directions);

TEST(Fish, Given) // 1
{
    int a[] = { 4, 3, 2, 1, 5 };
    std::vector<int> sizes(a, a + 5);

    int b[] = { 0, 1, 0, 0, 0 };
    std::vector<int> directions(b, b + 5);

    ASSERT_EQ(2, solution(sizes, directions));
}

TEST(Fish, SameDirection) // 2
{
    int a[] = { 4, 3, 2, 1, 5 };
    std::vector<int> sizes(a, a + 5);

    std::vector<int> directions(5);

    ASSERT_EQ(5, solution(sizes, directions));
}

TEST(Fish, Mixed) // 3
{
    int a[] = { 4, 3, 2, 1, 5 };
    std::vector<int> sizes(a, a + 5);
    int b[] = { 0, 1, 0, 1, 0 };
    std::vector<int> directions(b, b + 5);

    ASSERT_EQ(2, solution(sizes, directions));
}

TEST(Fish, NoMeeting) // 4
{
    int a[] = { 4, 3, 2, 1, 5 };
    std::vector<int> sizes(a, a + 5);
    int b[] = { 0, 0, 0, 1, 1 };
    std::vector<int> directions(b, b + 5);

    ASSERT_EQ(5, solution(sizes, directions));
}

TEST(Fish, Pilot) // 5
{
    int a[] = { 1, 5, 3, 4, 2 };
    std::vector<int> sizes(a, a + 5);
    int b[] = { 1, 1, 0, 0, 0 };
    std::vector<int> directions(b, b + 5);

    ASSERT_EQ(2, solution(sizes, directions));
}

1. All the fishes are going upstream but the second one, which weighs 3. It is going to eat the third and fourth, but it is going to be eaten by the fifth. In the end just two fishes are going to survive, the first and the last one.
2. The fishes are compactly moving upstream, all of them should survive.
3. Mixed situation, only two fishes are going to survive.
4. The school is parted in two groups that are not meant to cross.
5. This is the one I found most interesting. The first two fishes are moving downwards, against the other three. The first one is so thin that it would usually have no chance to survive, but it has just after it a big brother that is going to shield it from any attack. So, against all odds, it survives too.

Then I have written my first solution, based on the idea of checking each fish against any other fish it is going to meet, keeping track of their status in a vector of booleans, and then summing up all the surviving fishes.

int solution(std::vector<int>& sizes, std::vector<int>& directions)
{
    std::vector<bool> alive(sizes.size(), true); // 1

    for (unsigned i = 0; i < sizes.size(); ++i) // 2
    {
        if (directions[i] == 0) // 3
        {
            for (int j = i - 1; j >= 0; --j)
            {
                if (directions[j] == 1)
                    alive[sizes[i] < sizes[j] ? i : j] = false;
                if (sizes[i] < sizes[j])
                    break;
            }
        }
        else // 4
        {
            for (unsigned j = i + 1; j < sizes.size(); ++j)
            {
                if (directions[j] == 0)
                    alive[sizes[i] < sizes[j] ? i : j] = false;
                if (sizes[i] < sizes[j])
                    break;
            }
        }
    }

    return std::accumulate(alive.begin(), alive.end(), 0); // 5
}

1. Initially, all the fishes are alive.
2. I am checking each fish against the ones that are upstream or downstream of it, according to its moving direction.
3. My current fish is moving upstream. Let's compare it against each fish on its left, starting from the closest one. I break the loop in case it finds a bigger fish.
4. Same as above, but the current fish has to fight against all the other guys downstream.
5. Finally it is just a matter of counting how many of them are still alive.

Normally this solution works fine, however in the worst case scenario the internal loop involves a number of fishes in the order of the collection size, leading to an O(N**2) time complexity that makes this algorithm unacceptably slow.

Think for instance of this innocuous-looking test case:

TEST(Fish, Big)
{
    std::vector<int> sizes(100000);
    for(unsigned i = 0; i < sizes.size(); ++i)
        sizes[i] = i + 1;
    std::vector<int> directions(100000);

    ASSERT_EQ(100000, solution(sizes, directions));
}

One hundred thousand fishes, all of them going in the same direction. A human being could give the answer in the blink of an eye. The above solution, however, painfully checks each fish in the school against all the elements on its left.

Can we do better than this? Yes. As you will see in the next post, it is just a matter of adding a bit of intelligence to the algorithm to get, even in the worst cases, a linear(ish) time complexity.

Narcissistic numbers

You can find a thorough description of narcissistic numbers, also known as Armstrong numbers, perfect digital invariants, or plus perfect numbers, on Wolfram MathWorld. In a few words, a narcissistic number is an n-digit number that equals the sum of its digits, each raised to the nth power. For instance, 153 = 1^3 + 5^3 + 3^3.

We want to write a function that checks if a given number respects this definition.

It is easy to see that any positive one-digit number is narcissistic, and the Wolfram page I linked above provides a list of other narcissistic numbers. I used them to write these test cases (in C++ for GoogleTest):

bool isNarcissistic(const std::string& input);

TEST(Narcissus, Simple)
{
  ASSERT_TRUE(isNarcissistic("2"));
  ASSERT_FALSE(isNarcissistic("42"));
}

TEST(Narcissus, Three)
{
  ASSERT_TRUE(isNarcissistic("153"));
  ASSERT_TRUE(isNarcissistic("370"));
  ASSERT_TRUE(isNarcissistic("371"));
  ASSERT_TRUE(isNarcissistic("407"));
  ASSERT_FALSE(isNarcissistic("497"));
}

TEST(Narcissus, Eight)
{
  ASSERT_TRUE(isNarcissistic("24678050"));
  ASSERT_TRUE(isNarcissistic("24678051"));
  ASSERT_TRUE(isNarcissistic("88593477"));
  ASSERT_FALSE(isNarcissistic("88583475"));
}

Here is a possible implementation of such a function:

bool isNarcissistic(const std::string& input)
{
  assert(!input.empty()); // 1
  if(input.size() == 1) // 2
    return input[0] != '0';

  int powered = 0; // 3
  for(unsigned i = 0; i < input.size(); ++i)
    powered += std::pow((input[i] - '0'), input.size());

  return powered == std::atoi(input.c_str()); // 4
}

1. This should never happen. It is the caller's responsibility to ensure that the passed input parameter is not empty. Production code should also ensure that the input string contains only decimal digits.
2. All one-digit numbers are accepted, with the noteworthy exception of zero.
3. Each digit in the number is raised to the expected power (third, if there are three digits in the number) and the results are summed up.
4. Compare the original value with the calculated one, and return true if they are the same.
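
Note 1 mentions that production code should validate its input. A minimal sketch of what that could look like follows. The names ipow and isNarcissisticChecked are invented for this example, the 18-digit limit is an arbitrary guard against overflow, and integer multiplication replaces std::pow only to stay clear of floating-point rounding.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: integer power by repeated multiplication.
unsigned long long ipow(unsigned base, std::size_t exponent)
{
    unsigned long long result = 1;
    for (std::size_t i = 0; i < exponent; ++i)
        result *= base;
    return result;
}

// Hypothetical checked variant: rejects bad input instead of asserting.
bool isNarcissisticChecked(const std::string& input)
{
    // Assumption: inputs longer than 18 digits are refused to avoid overflow.
    if (input.empty() || input.size() > 18)
        return false;

    unsigned long long value = 0;   // the number represented by the string
    unsigned long long powered = 0; // sum of the digits raised to the nth power
    for (std::size_t i = 0; i < input.size(); ++i)
    {
        if (!std::isdigit(static_cast<unsigned char>(input[i])))
            return false;

        unsigned digit = input[i] - '0';
        value = value * 10 + digit;
        powered += ipow(digit, input.size());
    }

    // Zero is excluded, as in the function above.
    return value != 0 && powered == value;
}
```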

Self-descriptive numbers

Write a function that checks if its input represents a self-descriptive base-10 integer.

A number is said to be self-descriptive if its digit at position i (counting from zero on the left) is the count of how many times the digit i occurs in the number itself.
21200 is self-descriptive, since it has 2 zeros, 1 one, 2 twos, 0 threes, and 0 fours.

Here is a possible C++ implementation:

bool isSelfDescr(const std::string& input)
{
  if(input.empty() || input.size() > 10) // 1
    return false;

  for(unsigned i = 0; i < input.size(); ++i) // 2
    if(std::count(input.begin(), input.end(), '0' + i) != input[i] - '0')
      return false;

  return true;
}

1. I am expecting a base-10 number. Each position describes one digit, so it can't have more than 10 digits, and it shouldn't be empty.
2. For each position i in the number, I count how many times the digit i occurs in the string, using the STL algorithm count(), and I compare the result against the digit stored in that position.
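
Calling count() once per position scans the string up to ten times. That is irrelevant for such short inputs, but as an illustration the digits could also be tallied in a single pass. The following is just a sketch of that idea. The name isSelfDescrCounted is hypothetical and, differently from the function above, it explicitly rejects non-digit characters.

```cpp
#include <cstddef>
#include <string>

// Hypothetical single-pass variant: tally the digits first, then compare.
bool isSelfDescrCounted(const std::string& input)
{
    if (input.empty() || input.size() > 10)
        return false;

    int seen[10] = { 0 }; // seen[d] is the number of occurrences of digit d
    for (std::size_t i = 0; i < input.size(); ++i)
    {
        if (input[i] < '0' || input[i] > '9')
            return false; // assumption: anything but a decimal digit is refused
        ++seen[input[i] - '0'];
    }

    // The digit at position i must be the number of occurrences of digit i.
    for (std::size_t i = 0; i < input.size(); ++i)
        if (seen[i] != input[i] - '0')
            return false;

    return true;
}
```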

Happy numbers

In the Doctor Who episode "42" (the seventh of the third modern series), the tenth Doctor explains happy numbers: take a positive natural number, replace it with the sum of the squares of its digits, and keep iterating. If the process eventually yields one, the number is said to be happy. If this description doesn't look so descriptive to you, there is a page about happy numbers on Wikipedia that looks interesting.

We can check if a number is happy with a C++ function like this one:

bool isHappy(unsigned value)
{
  std::set<unsigned> numbers; // 1
  numbers.insert(1); // 2
  while(numbers.find(value) == numbers.end()) // 3
  {
    numbers.insert(value); // 4
    int next = 0;
    while(value > 0)
    {
      int digit = value % 10;
      next += std::pow(digit, 2); // 5
      value /= 10;
    }
    value = next;
  }

  return value == 1; // 6
}

1. A sad number would enter a hopeless infinite loop. To detect it, we should keep memory of the states it has already passed through, so that we can check them.
2. We can stop transforming the input number when we get a value that we have already seen (sad), or when we get 1 (happy). Putting 1 in the set from the start lets a single check cover both cases.
3. If the current value is not in the buffering set, we iterate once again.
4. Add the current value to the "already seen" set, and calculate the next state.
5. Each digit is squared and added.
6. If value has reached 1, we have a happy number.
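
The set of visited values could be avoided altogether. As a sketch of a different technique (not the one used above), Floyd's "tortoise and hare" cycle detection advances two cursors at different speeds and needs only constant extra space. The names squareDigitSum and isHappyFloyd are made up for this example.

```cpp
// Hypothetical helper: one step of the happy-number transformation.
unsigned squareDigitSum(unsigned value)
{
    unsigned sum = 0;
    while (value > 0)
    {
        unsigned digit = value % 10;
        sum += digit * digit;
        value /= 10;
    }
    return sum;
}

// Hypothetical variant based on Floyd's cycle detection.
bool isHappyFloyd(unsigned value)
{
    unsigned slow = value;                 // moves one step at a time
    unsigned fast = squareDigitSum(value); // moves two steps at a time

    // Stop when the fast cursor reaches 1 (happy) or catches the slow one (loop).
    while (fast != 1 && slow != fast)
    {
        slow = squareDigitSum(slow);
        fast = squareDigitSum(squareDigitSum(fast));
    }

    return fast == 1;
}
```

Once a cursor reaches 1 it stays there, so the fast cursor can't jump over it.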

String tolower()

The problem here is converting an entire C++ string to lower (or upper) case.

The raw part of the job is done by the toupper/tolower ctype functions in the standard library. The point is that those functions work on a single character, so we should find a way of iterating over the entire string.

We could just loop from the beginning to the end of the string, applying tolower() to each element in the sequence. Something like this:

for(std::string::iterator it = line.begin(); it != line.end(); ++it)
  *it = std::tolower(*it);

This implementation is alright, still we could avoid explicitly doing some work that adds no value to our code, such as declaring the iterator and the for-loop block.

We could use instead the STL transform() algorithm, which hides the for-block and its loop variable. It asks for the begin-end iterators of the input sequence, the begin iterator of the output one, and the operation it has to apply to each element.

Here we are doing an in-place transformation, so input and output are the same sequence, and the operation is tolower.

Nice and plain. There is a tiny nuisance, though. There could be a name clash on tolower/toupper, since besides the ctype version there is also a localized version, defined in the locale include. So we can't simply tell the compiler that we want to use the standard tolower function, we need to specify which tolower is needed. In our case, the one that expects an int as both input parameter and return value:

std::transform(line.begin(), line.end(), line.begin(), static_cast<int(*)(int)>(std::tolower));
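
Another way around the name clash is to wrap the call in a small function of our own, which is then the only candidate that transform() sees. The sketch below does that; lowerChar and toLowerCopy are names invented for this example. The cast to unsigned char is there because passing a negative char value to the ctype tolower is undefined behavior.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical wrapper: a single, unambiguous char-to-char function.
char lowerChar(char c)
{
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// Hypothetical helper: returns a lowercase copy of its argument.
std::string toLowerCopy(std::string line)
{
    std::transform(line.begin(), line.end(), line.begin(), lowerChar);
    return line;
}
```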

Bit check

Given an unsigned int and two numbers representing the one-based index of bits (from the least significant one) in that word, write a function that returns true if those bits have the same value.

This problem is based on the CodeEval Bit Positions challenge.

A test case (C++ for GoogleTest) should be enough to clarify the requirements:

TEST(BiPo, CaseGiven)
{
  EXPECT_TRUE(solution(86, 2, 3)); // 1
  EXPECT_FALSE(solution(125, 1, 2)); // 2
}

1. 86 base ten is 1010110 base two. Its second and third bits from the right are both set to 1. We expect our function to return true.
2. 125 base ten is 1111101 base two. Its first bit from the right is 1 and the second is 0. Therefore the function should return false.

Here is my C++ implementation:

bool solution(unsigned word, int p1, int p2)
{
  bool one = word & (1u << (p1 - 1)); // 1
  bool two = word & (1u << (p2 - 1));

  return !((one) ^ (two)); // 2
}

1. I convert the passed index to the associated binary value. If I had 3 in input, this means that I want to check the third bit, whose binary value is 100, a decimal four. Then I apply a bitwise-and (the operator &) to the input word and the converted bit value. Its result will be true if the input word has that bit up.
2. I want to return true if the selected bits (one and two) have the same value. To do that I apply a bitwise-exclusive-or (operator ^) to the bit flags. It would return true if the two flags are mutually exclusive (if one is true, the other is false), that is the exact opposite of what I want the function to return. So it is just a matter of negating it (operator !).

The original CodeEval problem asks to output a string ("false" or "true") according to the boolean value we get. This could be easily achieved through the ios boolalpha manipulator:

std::cout << std::boolalpha << solution(n, p1, p2) << std::endl;

Writing backwards

We want to write a function that reverses the words in input. Just the order of the words is reversed, not their actual structure. If we get in input "hello" and "world" we want in output "world hello" and not "dlrow olleh". Notice that there is a blank between each pair of words, but not at the end of the sentence.

This problem is based on the CodeEval Reverse Words challenge.

A few test cases (C++11 for GTest as xUnit framework) should clarify what I expect from that function:

TEST(ReWo, Given) // 1
{
  ASSERT_EQ(solution( { "Hello", "World" } ), "World Hello");
}

TEST(ReWo, OneWord) // 2
{
  ASSERT_EQ(solution( { "Hello" } ), "Hello");
}

TEST(ReWo, Empty) // 3
{
  ASSERT_DEATH(solution( { } ),"Assertion .* failed.");
}

1. Vanilla case, the words in input are combined in a string from the last one up.
2. A single word in input should not break the algorithm.
3. It is the caller's responsibility to ensure that at least one word is passed. To make this requirement explicit in the code, I want my function to assert that the input is not empty. I could have been less strict, and simply return an empty string, but I wanted to show how to use the GoogleTest ASSERT_DEATH macro. More on the same topic in a previous post.

Here is how I implemented the function in plain C++98:

std::string solution(const std::vector<std::string>& input)
{
  assert(!input.empty()); // 1

  std::ostringstream oss; // 2
  for(std::vector<std::string>::const_reverse_iterator it = input.rbegin(); it != input.rend() - 1; ++it)
    oss << *it << ' '; // 3
  oss << input[0]; // 4

  return oss.str(); // 5
}

1. The input vector can't be empty.
2. I cared more about readability than performance here. Otherwise I would have been tempted to use a plain STL string, reserving enough memory for it on creation (summing up all the string sizes plus the number of blanks).
3. Put to the string stream all the elements in the vector followed by a blank, starting from the last one (reverse begin) up to the second (the one before the reverse end). We have asserted that the vector has at least one element, so we know that this loop works fine. In case there is only one string in input, this block is simply skipped.
4. The first element in input (possibly the only one), is put to the string stream as last token. No trailing blank, as required.
5. Extract the resulting string from the stream and return it.
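
For the curious, the alternative hinted at in note 2 could look more or less like the following sketch, where the result is a plain string with its memory reserved up front. The name reverseWordsReserved is made up for this example, and I haven't measured whether it is actually faster.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical variant: a plain std::string with memory reserved up front.
std::string reverseWordsReserved(const std::vector<std::string>& input)
{
    assert(!input.empty());

    // One blank between each pair of words, plus the size of every word.
    std::size_t total = input.size() - 1;
    for (std::size_t i = 0; i < input.size(); ++i)
        total += input[i].size();

    std::string result;
    result.reserve(total);

    // From the last word down to the second one, each followed by a blank.
    for (std::size_t i = input.size(); i-- > 1; )
    {
        result += input[i];
        result += ' ';
    }
    result += input[0]; // the first word goes last, with no trailing blank

    return result;
}
```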

Repeat pattern

Given a vector containing an arbitrary number of integers, check if there is any repeat pattern in it.

I found this problem in CodeEval, where it is marked with the slightly misleading name of Detecting Cycles (usually we talk about detecting cycles in the domain of graphs, here we are dealing with a plain vector). There you can also find a test case that helps to clarify what our function has to do. I have rewritten it in C++11 for the GoogleTest framework:

std::vector<int> solution(const std::vector<int>& input);

TEST(DeCy, Given)
{
  std::vector<int> result = solution( {2, 0, 6, 3, 1, 6, 3, 1, 6, 3, 1} );

  ASSERT_EQ(3, result.size());
  ASSERT_EQ(6, result[0]);
  ASSERT_EQ(3, result[1]);
  ASSERT_EQ(1, result[2]);
}

As you see, here there is a repeat pattern of three integers (6, 3, 1). Our function should detect it and return it.

Our input could be much more complicated, so I have written a few more test cases, but I will spare you those. However, you should notice that the minimal case is a pattern consisting of just one number that appears twice.

I have interpreted the problem request in the most natural way, so that I can split it in two logical parts. Firstly I check if there is at least one repeated integer in the sequence. If I find it, I check how long the pattern is, and I return a vector filled with the matching elements.

This algorithm scales badly: when no value repeats, the two nested loops compare every pair of elements, which gives a quadratic asymptotic time complexity in the worst case. Any improvement suggestion is welcome. In the meantime, here is my C++98 implementation:

std::vector<int> solution(const std::vector<int>& input)
{
  if(input.size() < 2)
    return std::vector<int>(); // 1

  for(std::vector<int>::const_iterator first = input.begin(); first != input.end() - 1; ++first) // 2
  {
    for(std::vector<int>::const_iterator second = first + 1; second != input.end(); ++second) // 3
    {
      if(*first != *second)
        continue;

      int len = second - first; // 4
      if(len > input.end() - second)
        len = input.end() - second;

      for(int i = 0; i < len; ++i)
      {
        if(*(first+i) != *(second+i)) // 5
        {
          len = i;
          break;
        }
      }

      return std::vector<int>(first, first + len);
    }
  }

  return std::vector<int>();
}

1. The input sequence has one or no element, there is not much to do besides returning an empty vector.
2. Loop on all the elements (but the last one).
3. Loop on the elements following the first iterator.
4. The first and second elements match, so I calculate their distance. The pattern can't be longer than that. Wait a minute, we should also consider the vector size: if fewer elements are left after the second iterator, the pattern length is capped by them.
5. Find the actual pattern size, if less than the theoretical maximum length.

Matching many types of brackets

We have a string in input that could be quite long (200K). It contains just a bunch of open and close brackets, in three possible varieties: round, square, curly. We want to check if they are properly nested.

I have already discussed a similar problem in the past. In that case the string contained just one type of open/close parenthesis, so the job was even simpler.

I found this version of the problem in the Codility train page, section Stacks and Queues, under the nickname Brackets.

There you can find a couple of test cases, which I have rewritten here in C++ for the GoogleTest framework:

int solution(const std::string& input);

TEST(Brac, Given1) // 1
{
    ASSERT_EQ(1, solution("{[()()]}"));
}

TEST(Brac, Given2) // 2
{
    ASSERT_EQ(0, solution("([)()]"));
}

1. This is an example of a correctly nested string.
2. This is not. A closing round bracket is not expected in third position, since there is a pending square one before it.

The same considerations I made for the simpler version of this problem apply here. The only variation is that I have to use a proper stack to keep memory of the pending open brackets.

Here is my C++98 implementation:

int solution(const std::string& input)
{
    if(input.size() % 2) // 1
        return 0;

    std::stack<char> buffer;

    for(std::string::const_iterator it = input.begin(); it != input.end(); ++it)
    {
        if(*it == '{' || *it == '[' || *it == '(') // 2
            buffer.push(*it);
        else // 3
        {
            if(buffer.empty())
                return 0;

            char expected = *it == '}' ? '{': *it == ']' ? '[' : '(';
            if(buffer.top() == expected)
                buffer.pop();
            else
                return 0;
        }
    }

    if(buffer.empty()) // 4
        return 1;
    return 0;
}

1. There is no need to do any further check in case of an odd number of brackets.
2. If the current character is an open bracket, I put it in the stack.
3. Otherwise (assuming the input source is reliable) I expect it to be a closing bracket that should match a corresponding open one. If the stack is empty, the input string is corrupted, otherwise I determine which kind of bracket I expect, compare it with the element at the top of the stack, and pop it (if it matches) or return the "no good" value.
4. Check the stack one last time, it should be empty now, otherwise we have some open bracket that we failed to close.

Selecting the longest lines

We have in input a few strings, and we want to output a given number of them, ordered by their size, the longest first.

This is such a simple problem that I couldn't think of any interesting test case for it. You can find it in the CodeEval open challenges under the name Longest Lines.

At its core, the solution to this problem is nothing more than a couple of C++ statements.

Assuming the number of lines we want to output is stored in an int variable named nr, and a vector of strings named lines keeps the data, it is just a matter of sorting the vector by line size and returning its first nr elements:

std::sort(lines.begin(), lines.end(), StringSizeCmp());
for(int i = 0; i < nr; ++i)
  std::cout << lines[i] << std::endl;

Right. But I forgot to tell you who is that StringSizeCmp guy. It is a simple functor that defines how sort() should decide which element comes first:

struct StringSizeCmp : public std::binary_function<std::string, std::string, bool>
{
  bool operator()(const std::string& lhs, const std::string& rhs) const
  {
    return lhs.size() > rhs.size();
  }
};

As clearly stated by inheritance, StringSizeCmp is a binary function that gets a couple of strings in and gives a boolean out. The comparison here is resolved according to the input sizes.

C++11 makes it even simpler, by means of a lambda function. We could combine the predicate definition and its usage in a single statement:

std::sort(lines.begin(), lines.end(),
    [](const std::string& lhs, const std::string& rhs) {
  return lhs.size() > rhs.size();
});
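
Since only the first nr lines are needed, a full sort does more work than necessary. As a sketch of an alternative (not what I submitted), the STL partial_sort() algorithm orders just the leading part of the range. The function name longestLines is invented for this example, and clamping nr to the vector size is my own addition, to protect against a request for more lines than available.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the nr longest lines, longest first.
std::vector<std::string> longestLines(std::vector<std::string> lines, std::size_t nr)
{
    // Assumption: asking for more lines than available returns all of them.
    nr = std::min(nr, lines.size());

    // Only the first nr positions end up sorted, the rest is left unordered.
    std::partial_sort(lines.begin(),
        lines.begin() + static_cast<std::ptrdiff_t>(nr), lines.end(),
        [](const std::string& lhs, const std::string& rhs) {
            return lhs.size() > rhs.size();
        });

    lines.resize(nr);
    return lines;
}
```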

Summing up primes

What is the sum of the first one thousand prime numbers?

This question is (also) a CodeEval problem named Sum of Primes.

Even though it looks similar to the previous CodeEval problem I have seen, there's something that makes it more interesting:

int solution()
{
  std::vector<int> primes; // 1
  primes.reserve(1000);
  primes.push_back(2);
  primes.push_back(3);

  for(int i = 5; primes.size() < 1000; i+=2) // 2
  {
    bool isPrime = true;
    for(unsigned j = 1; j < primes.size() && (std::pow(primes[j], 2) <= i); ++j) // 3
    {
      if(i % primes[j] == 0)
      {
        isPrime = false;
        break;
      }
    }

    if(isPrime)
      primes.push_back(i);
  }

  return std::accumulate(primes.begin(), primes.end(), 0); // 4
}

1. I am going to put the generated primes in this vector.
2. I have already pushed the first two elements by hand, now I push all the other ones. I start checking from 5, and then I step to the next odd number.
3. To check if a number is prime, it is enough to ensure that no other prime number is among its divisors. Since the candidates are all odd, there is no need to test 2, so the loop starts from the second prime. Moreover, we can stop checking when we reach a tentative divisor that, when squared, is bigger than the candidate prime.
4. Finally, just sum all the prime numbers up.
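The same idea can be sketched using integer arithmetic only, replacing the call to std::pow() with a plain multiplication, and letting the caller choose how many primes to sum. The function name sumOfFirstPrimes and its parameter are assumptions I made for this example:

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical generalization: sum of the first n prime numbers.
long sumOfFirstPrimes(std::size_t n)
{
    if (n == 0)
        return 0;

    std::vector<int> primes;
    primes.reserve(n);
    primes.push_back(2);

    // Only odd candidates are checked, so the divisor 2 (primes[0]) is skipped.
    for (int i = 3; primes.size() < n; i += 2)
    {
        bool isPrime = true;
        for (std::size_t j = 1; j < primes.size() && primes[j] * primes[j] <= i; ++j)
        {
            if (i % primes[j] == 0)
            {
                isPrime = false;
                break;
            }
        }

        if (isPrime)
            primes.push_back(i);
    }

    return std::accumulate(primes.begin(), primes.end(), 0L);
}
```

Calling sumOfFirstPrimes(1000) answers the original question, and for the first ten primes the result is 129.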

Some time ago, I wrote about a similar but more general problem: how to check if a given number is prime.

Prime and palindrome

Write a program that outputs the biggest number below 1000 that is both a prime and a palindrome.

You could test your solution on CodeEval, where this very simple problem is named Prime Palindrome.

Its cleanest solution would be, for instance in Python, something like this:

print 929

However, if you are answering this question in an interview, you'd probably better think of a less concise solution.

For instance you could write a for-loop like this (here I used C++, but you could easily adapt it to about any programming language):

for(int i = 989; i > 100; i -= 2) // 1
{
  if(isPalindrome(i) && isPrime(i)) // 2
  {
    std::cout << i << std::endl;
    break;
  }
}

1. I start looping from 989, the highest palindromic number after 999, which is obviously not prime, down to 100, since it is easy to see that 101 is both a palindrome and a prime. I step by two, given that I don't want to check even numbers.
2. If the current number is both a palindrome and a prime, print it and break the loop.

Checking our number for its palindromicity is trivial. Since all the candidates have three digits, it is just a matter of comparing the first and last digit.

A bit more interesting is the check on its primality; please follow the link to a previous post I have written on the matter. Here we could use an even simpler algorithm, since we know that our biggest number to check is less than 1000; still, the structure of the algorithm stays the same.
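The two helpers used in the loop are not shown here, so this is just a minimal sketch of how they could look. The palindrome check assumes a three-digit input, and the primality check is a plain trial division, not necessarily the algorithm described in the linked post. The wrapper biggestPrimePalindrome is a name I made up to return the result instead of printing it:

```cpp
// Assumes 100 <= n <= 999: a three-digit number is a palindrome
// when its first and last digits are the same.
bool isPalindrome(int n)
{
    return n / 100 == n % 10;
}

// Plain trial division by odd numbers up to the square root of n.
bool isPrime(int n)
{
    if (n < 2)
        return false;
    if (n % 2 == 0)
        return n == 2;

    for (int d = 3; d * d <= n; d += 2)
        if (n % d == 0)
            return false;

    return true;
}

// Hypothetical wrapper around the loop in the post.
int biggestPrimePalindrome()
{
    for (int i = 989; i > 100; i -= 2)
        if (isPalindrome(i) && isPrime(i))
            return i;

    return 101; // the loop stops before 100, and 101 is a known good value
}
```

The first candidates, 989, 979, 969, 959, 949 and 939, are all palindromes but none of them is prime (for instance 989 is 23 * 43), so the loop goes on until 929.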

Max product of three

We have a vector containing between three and 100,000 integers in the range [−1,000 ... 1,000]. We want to get the maximum product of three elements among all the possible choices.

I found this problem in the Codility train page, section Sorting, under the nickname Max-product-of-three.

It looks very simple, and actually it is. Just be careful in considering the properties of multiplication.

A few test cases (written in C++ for the GoogleTest framework) would help to implement the solution correctly:

int solution(std::vector<int>& input);

TEST(MPoT, Given) // 1
{
    std::vector<int> input;
    input.push_back(-3);
    input.push_back(1);
    input.push_back(2);
    input.push_back(-2);
    input.push_back(5);
    input.push_back(6);

    ASSERT_EQ(60, solution(input));
}

TEST(MPoT, AllNeg) // 2
{
    std::vector<int> input;
    input.push_back(-2);
    input.push_back(-3);
    input.push_back(-4);
    input.push_back(-1);

    ASSERT_EQ(-6, solution(input));
}

TEST(MPoT, SomeNeg) // 3
{
    std::vector<int> input;
    input.push_back(0);
    input.push_back(-3);
    input.push_back(-4);
    input.push_back(1);
    input.push_back(2);

    ASSERT_EQ(24, solution(input));
}

1. This test case is part of Codility's problem description.
2. What if all the input elements are negative?
3. More interestingly, when a mix of positive and negative numbers is in the game, the weight of the biggest negative ones should be considered.

The first idea that I guess anyone would have is getting the three biggest elements and multiplying them. That works fine in the first two test cases, but fails spectacularly in the third one. We (or at least, I) have forgotten to consider the case when we have some big negative values.

Remembering that negative values could also contribute to the solution, as shown in the third test case, I came up with this piece of code:

int solution(std::vector<int>& input)
{
    std::sort(input.begin(), input.end(), std::greater<int>()); // 1
    int alpha = input[0] * input[1] * input[2]; // 2
    int beta = *input.rbegin() * *(input.rbegin()+1) * input[0]; // 3

    return alpha > beta ? alpha : beta; // 4
}

1. Actually, there is no real advantage in sorting the array in descending order. Notice that, as required by the Codility conditions, I am modifying the input vector. This is usually not considered a bright idea.
2. Having sorted the vector from the biggest element downward, I multiply the three leftmost elements. Often this would give the result we are looking for.
3. It could happen that the two most negative numbers in the sequence weigh more than the second and third biggest numbers. In this case beta would be bigger than alpha.
4. Now it is just a matter of comparing alpha and beta, and returning the bigger one.
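Since only the three biggest and the two smallest values matter, sorting the whole vector is not strictly needed. As a sketch of an alternative that leaves the input untouched, std::partial_sort_copy can extract just those five values. This is a different technique from the full sort above, and the function name maxProductOfThree is my own choice:

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical non-mutating variant: same alpha/beta comparison,
// but only the values that can contribute are extracted.
int maxProductOfThree(const std::vector<int>& input)
{
    assert(input.size() >= 3); // guaranteed by the problem statement

    std::vector<int> top(3);    // three biggest values, descending
    std::vector<int> bottom(2); // two smallest values, ascending

    std::partial_sort_copy(input.begin(), input.end(),
        top.begin(), top.end(), std::greater<int>());
    std::partial_sort_copy(input.begin(), input.end(),
        bottom.begin(), bottom.end());

    // With values in [-1000, 1000] the products fit in a 32 bit int.
    int alpha = top[0] * top[1] * top[2];
    int beta = bottom[0] * bottom[1] * top[0];

    return std::max(alpha, beta);
}
```

On the three test cases above it gives 60, -6 and 24.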

Triangle triplet

In a triangle, the sum of any two sides is bigger than the third one. Let's call "triangle triplet" a group of three integer numbers that represent the side lengths of a triangle. We want to check if in an array of integers, that could be longish (one million elements), there is at least one such triplet.

You can find this problem in the Codility train page, section Sorting, under the nickname Triangle. There you can also find an extra requirement: the values in input could also be negative (even if that doesn't make much sense, according to the definition I gave), and could cover the complete range of signed 32 bit integers in C/C++.

Codility also provides a couple of test cases, described in informal language. I felt that at least a third one was missing. Here they are, written in C++ for the googletest xUnit testing framework:

TEST(Tria, Given) // 1
{
    std::vector<int> input;
    input.push_back(10);
    input.push_back(2);
    input.push_back(5);
    input.push_back(1);
    input.push_back(8);
    input.push_back(20);

    ASSERT_EQ(1, solution(input));
}

TEST(Tria, Given2) // 2
{
    std::vector<int> input;
    input.push_back(10);
    input.push_back(50);
    input.push_back(5);
    input.push_back(1);

    ASSERT_EQ(0, solution(input));
}

TEST(Tria, MaxInt) // 3
{
    std::vector<int> input;
    input.push_back(INT_MAX);
    input.push_back(INT_MAX);
    input.push_back(INT_MAX);

    ASSERT_EQ(1, solution(input));
}

1. Example of a sequence that we should accept.
2. This input sequence does not contain any triangle triplet.
3. The input represents an equilateral triangle. However it is so huge that we could overflow, if our code is not designed carefully.

The idea of the algorithm I implemented is quite simple. Get the three smallest values in input: if the sum of the first two is bigger than the third one, we have got our positive solution. Otherwise the smallest number is too small to make a triangle with its two closest neighbors, and it is enough to look at neighbors only: if a good triplet exists at all, there is also one made of three consecutive values in the sorted sequence. So discard it and try again with the new smallest number in the lot. Go on till all the triplets are checked.

This was my implementation at the time of writing this post. In the meantime, Codility has upgraded its compiler to support C++11; my code here depends on the actual definition of the int type, and so now it fails in an extreme condition. Please have a look at the newer post I have written about it for a more robust solution:

int solution(const std::vector<int>& input)
{
    if(input.size() < 3) // 1
        return 0;

    std::vector<int> buffer(input.begin(), input.end()); // 2
    std::sort(buffer.begin(), buffer.end());

    for(unsigned i = 0; i < buffer.size() - 2; ++i) // 3
    {
        if(static_cast<int64_t>(buffer[i]) + buffer[i+1] > buffer[i+2])
            return 1;
    }

    return 0;
}

1. Trivial case.
2. Performance-wise, it would make no sense to operate on an unsorted collection. Since the input is constant, create a copy and sort it.
3. Apply the algorithm described above. To avoid overflow, we should store the sum of the two shorter sides in a 64 bit integer. The simplest way to do that is casting one of them to int64_t.
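The overflow could also be avoided without any 64 bit type, by comparing a difference instead of a sum. What follows is only a sketch of that idea under my own assumptions (the name hasTriangle and the by-value parameter are made up for the example); it is not the code of the newer post mentioned above:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical variant: returns 1 if a triangle triplet exists, 0 otherwise.
// The vector is taken by value, so we sort a private copy.
int hasTriangle(std::vector<int> values)
{
    if (values.size() < 3)
        return 0;

    std::sort(values.begin(), values.end());

    for (std::size_t i = 0; i + 2 < values.size(); ++i)
    {
        // A non-positive value can't be the side of a triangle.
        if (values[i] <= 0)
            continue;

        // Here all three values are positive, so the difference is in
        // [0, INT_MAX) and can't overflow. It is the same test as
        // values[i] + values[i + 1] > values[i + 2], rearranged.
        if (values[i] > values[i + 2] - values[i + 1])
            return 1;
    }

    return 0;
}
```

Skipping the non-positive values first is what makes the subtraction safe: with a negative values[i + 1] and a huge values[i + 2], the difference itself could overflow.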

Minimal element in subsequences

We have a non-empty string that could contain up to 100,000 characters (only the 'A','C','G','T' ones). The user specifies a few ranges in the string, up to 50 thousand of them, and expects as output the indices relative to the lowest letter in each range.

The problem itself is not complicated, it just asks us to check the requirements carefully. You can find it in the Codility train page, prefix sums section, under the name Genomic-range-query.

We can expect only ACGT letters in input, because the idea is that it represents a DNA sequence. Pay attention to the fact that the user specifies the subsequence ranges as zero-based indices, but expects the letters in output to be coded starting from one. If 'A' is the answer, we have to output 1, and so on.

As is traditional in this kind of problem, there is no stress at all on error handling. If this function were to be used in a real world environment, we should check the input thoroughly. Here we can just assume anything is fine.

There is a test case in the problem description; I added a few more to help me better understand how to design the code. Here are the original one and a couple of new ones, written in C++ for the xUnit GoogleTest framework:

std::vector<int> solution(const std::string& dna, // 0
        const std::vector<int>& begs, const std::vector<int>& ends);

TEST(GeRa, Simple) // 1
{
    std::string input("ACGT");

    std::vector<int> P;
    P.push_back(0);
    P.push_back(1);
    P.push_back(2);
    P.push_back(3);

    std::vector<int> Q;
    Q.push_back(0);
    Q.push_back(1);
    Q.push_back(2);
    Q.push_back(3);

    std::vector<int> output = solution(input, P, Q);
    ASSERT_EQ(4, output.size());

    EXPECT_EQ(1, output[0]);
    EXPECT_EQ(2, output[1]);
    EXPECT_EQ(3, output[2]);
    EXPECT_EQ(4, output[3]);
}

TEST(GeRa, Given) // 2
{
    std::string input("GACACCATA");

    std::vector<int> P;
    P.push_back(0);
    P.push_back(0);
    P.push_back(4);
    P.push_back(7);

    std::vector<int> Q;
    Q.push_back(8);
    Q.push_back(2);
    Q.push_back(5);
    Q.push_back(7);

    std::vector<int> output = solution(input, P, Q);
    ASSERT_EQ(4, output.size());

    ASSERT_EQ(1, output[0]);
    ASSERT_EQ(1, output[1]);
    ASSERT_EQ(2, output[2]);
    ASSERT_EQ(4, output[3]);
}

TEST(GeRa, Huge) // 3
{
    std::string input(100000, 'A');

    std::vector<int> P(50000);

    std::vector<int> Q(50000, input.size() - 1);

    std::vector<int> output = solution(input, P, Q);
    ASSERT_EQ(50000, output.size());

    for(unsigned i = 0; i < output.size(); ++i)
        EXPECT_EQ(1, output[i]);
}

0. Codility asks for a slightly different function interface, no parameter is marked there as const.
1. On a very short sequence ("ACGT"), I want to get the lowest element of each single-element subsequence, (0, 0), (1, 1), (2, 2), (3, 3). The expected result is a four-sized int vector containing 1, 2, 3, 4 (meaning 'A', 'C', 'G', 'T').
2. The Codility given test. From the input sequence "GACACCATA", we should return 1 for (0, 8), 1 for (0, 2), 2 for (4, 5), 4 for (7, 7).
3. This test case is meant to let the developer know how good the chosen algorithm is in terms of time complexity. The input sequence is as long as possible, and we have to check the biggest possible number of subsequences, all of them covering the entire range. I don't care much about the actual data, so in the input string I have only 'A'; consequently, the expected result is a 50,000 sized vector containing only 1's.

Simple but slow solution

A natural solution would be to check all the subintervals, looping from begin to end, looking for the lowest element.

Its obvious disadvantage is that it would be implemented with a for-in-a-for loop, that in the worst case would lead to an O(N*M) complexity, where N is the sequence size and M is the number of subsequences we need to check.

Here is a first naive implementation:

std::vector<int> solution(const std::string& dna,
        const std::vector<int>& begs, const std::vector<int>& ends)
{
    assert(begs.size() == ends.size()); // 1

    std::vector<int> result(begs.size()); // 2
    for (unsigned i = 0; i < begs.size(); ++i) // 3
    {
        char nucleotide = 'T'; // 4
        for (int j = begs[i]; j <= ends[i]; ++j) // 5
            if (dna[j] < nucleotide) // 6
                nucleotide = dna[j];

        result[i] = nucleotide == 'A' ? 1 : // 7
                    nucleotide == 'C' ? 2 :
                    nucleotide == 'G' ? 3 : 4;
    }
    return result;
}

1. Even if it is not required, I feel too bad not adding at least this minimal check on the input. If you are less paranoid than me, you can happily skip this line.
2. The output is going to be generated in this vector.
3. Check all the intervals.
4. Worst case, the current subsequence has a minimal nucleotide 'T'.
5. Loop on all the elements in the subsequence.
6. If the current element is less than the previous recorded nucleotide, it becomes the new minimal one.
7. I need to convert the selected nucleotide into its representative code.

There are a few improvements that we could make to this piece of code. For instance, we can break the loop on (5) if we find that the current nucleotide is an 'A', because we have already found the best solution. However, this won't improve our worst case time complexity, it would only help its best and average cases.

We need to look for an altogether different approach.

Using more space to save time

I would like to get the minimal nucleotide in a subinterval just checking its endpoints. To achieve this, I can use the partial sum algorithm, and check the difference between the values just before entering the interval and at its end.

A minor issue is that we have to keep track of four different values (A,C,G,T) and not a single variable. It is easy to overcome it, for example creating a vector for each nucleotide.

This generates a small nuisance: I need to convert the character that represents a nucleotide into its index in the vectors of partial sums. This is done with something similar to what I did at point 7 above, but here I need a zero-based index. To keep the code readable, I created a tiny function to do this mapping:

int getType(char nucleotide)
{
    switch (nucleotide)
    {
    case 'A':
        return 0;
    case 'C':
        return 1;
    case 'G':
        return 2;
    default:
        return 3;
    }
}

What I am going to do is calculate the partial sum for each nucleotide, and then use them on each passed interval to find out which is the minimal one present there.

Here is a possible solution:

std::vector<int> solution(const std::string& dna, const std::vector<int>& begs,
        const std::vector<int>& ends)
{
    assert(begs.size() == ends.size());

    std::vector<std::vector<int> > psums(4); // 1
    for (int i = 0; i < 4; ++i)
        psums[i].resize(dna.size());

    for (unsigned i = 0; i < dna.size(); ++i) // 2
        psums[getType(dna[i])][i] = 1;

    for (int i = 0; i < 4; ++i) // 3
        std::partial_sum(psums[i].begin(), psums[i].end(), psums[i].begin());

    std::vector<int> result(begs.size());
    for (unsigned i = 0; i < begs.size(); ++i) // 4
    {
        int type = 3; // 5
        for (unsigned j = 0; j < 3; ++j) // 6
        {
            int left = begs[i] > 0 ? psums[j][begs[i] - 1] : 0; // 7
            int right = psums[j][ends[i]]; // 8
            if (right != left) // 9
            {
                type = j;
                break;
            }
        }

        result[i] = type + 1; // 10
    }
    return result;
}

1. I create four int vectors, each the same size as the input DNA sequence. They are initialized with the default int value, that is zero.
2. Scan the DNA sequence. Detect the nucleotide type (0..3) and put a 1 in the corresponding partial sum vector to mark its presence there.
3. Calculate the partial sum for each nucleotide.
4. Loop on all the subintervals.
5. As in the previous implementation, we already know that the worst case is 'T'. Let's check if there is any "lower" nucleotide. By the way, this also means that we don't really need the fourth psums vector. We could completely get rid of it and save some space.
6. Check for A, C, G nucleotides in this subinterval.
7. This line is a bit tricky. I need to know how many nucleotides of the current type (j) have already been seen in the sequence before entering the subsequence. This information is stored in the element to the immediate left of begs[i]. With one exception: when begs[i] is zero, we are at the beginning of the full sequence. In that case we should use 0 instead, since obviously there were no j-nucleotides at that point.
8. No problem for the right element, just fetch the j-vector of psums, and fetch its ends[i] component.
9. If there is a variation between the final partial sum and the initial one, at least one j-nucleotide is in that interval. So we can set it as the minimal one and stop looping.
10. Last tweak. I stored the nucleotide type as a zero-based index, I need to increase it to match the user expectations.
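The special case in (7) and the unused fourth vector mentioned in (5) can both be removed with a small change in the layout: give each partial sum vector one extra leading zero, and keep only the A, C, G counters. This is just a sketch of that variation; the function name minimalNucleotides and the letters table are names I introduced for the example:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical variant using three prefix-count vectors of size N + 1.
std::vector<int> minimalNucleotides(const std::string& dna,
        const std::vector<int>& begs, const std::vector<int>& ends)
{
    assert(begs.size() == ends.size());

    static const char letters[3] = { 'A', 'C', 'G' };

    // psums[k][i] is the number of letters[k] in the first i characters,
    // so psums[k][0] is always zero and no check on begs[i] is needed.
    std::vector<std::vector<int> > psums(3, std::vector<int>(dna.size() + 1, 0));
    for (std::size_t i = 0; i < dna.size(); ++i)
        for (int k = 0; k < 3; ++k)
            psums[k][i + 1] = psums[k][i] + (dna[i] == letters[k] ? 1 : 0);

    std::vector<int> result(begs.size(), 4); // worst case, only 'T' in range
    for (std::size_t i = 0; i < begs.size(); ++i)
    {
        for (int k = 0; k < 3; ++k)
        {
            // A change in the count means letters[k] is in [begs[i], ends[i]].
            if (psums[k][ends[i] + 1] != psums[k][begs[i]])
            {
                result[i] = k + 1;
                break;
            }
        }
    }
    return result;
}
```

The price is one extra int per vector, in exchange for a simpler inner loop.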

Fizz buzz

You won't believe it, but I had no idea what the fizz buzz game was. According to Wikipedia, it's a popular way in (some) English speaking countries to teach children what division is. And checking on the web, I have got the impression that it is also a not so uncommon problem asked during developer interviews. It is so simple that I'd say it makes some sense just for junior positions.

I bumped into it looking at the CodeEval's problems, where it is presented in a slightly more generalized way. We want to list all the numbers from 1 to a given maximum, but replacing the ones that could be divided by two taboo factors, fizz and buzz, with placeholders.

A couple of test cases (written in C++ for GoogleTest) should clarify the requirements.

std::string solution(int fizz, int buzz, int n);

TEST(FiBu, CaseGiven1) // 1
{
  std::string result = solution(3, 5, 10);

  ASSERT_EQ(result, "1 2 F 4 B F 7 8 F B");
}

TEST(FiBu, CaseGiven2) // 2
{
  std::string result = solution(2, 7, 15);

  ASSERT_EQ(result, "1 F 3 F 5 F B F 9 F 11 F 13 FB 15");
}

1. We want to analyze the numbers in the [1..10] interval. Any number that has 3 among its factors should be replaced by an F, and any number that has 5 by a B. Notice that there is no trailing blank after the last generated element.
2. Here the interval is [1..15], the factor 2 should lead to F, and 7 to B. 14 could be divided by both 2 and 7, so it should be replaced by FB.

There is not much to think about before writing the code. The main issue is how we can check if a given number has fizz (or buzz) among its factors. In C, C++ and related languages we normally use the arithmetic operator % (called "modulus" or "modulo") that returns the remainder of the integer division between the two numbers. When it returns zero, we have an integer divisor.
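As a tiny illustration of that check, here is a sketch of a helper built on the % operator. The name hasFactor is hypothetical, and the assertion is there because in C++ the remainder of a division by zero is undefined behavior:

```cpp
#include <cassert>

// Hypothetical helper: true if factor divides number with no remainder.
bool hasFactor(int number, int factor)
{
    assert(factor != 0); // number % 0 is undefined behavior
    return number % factor == 0;
}
```

For instance, hasFactor(14, 2) and hasFactor(14, 7) are both true, which is why 14 becomes FB in the second test case, while hasFactor(15, 2) is false, since 15 % 2 is 1.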

Here is a possible solution:

std::string solution(int fizz, int buzz, int n)
{
  std::ostringstream oss;
  for(int i = 1; i <= n; ++ i)
  {
    bool fb = false; // 1

    if(i % fizz == 0) // 2
    {
      oss << 'F';
      fb = true;
    }
    if(i % buzz == 0) // 3
    {
      oss << 'B';
      fb = true;
    }

    if(!fb) // 4
      oss << i;

    if(i != n) // 5
      oss << ' ';
  }

  return oss.str();
}

1. Flag for fizz or buzz detection.
2. If the remainder of the division of the current number by fizz is zero, fizz is one of its factors.
3. Same for buzz.
4. If neither fizz nor buzz has been detected, the current number is used.
5. I don't want a trailing blank at the end of the string, so I ensure I am not in the last iteration.

Counting couples

We have a vector that contains up to one hundred thousand zeros and ones. We want to count the total number of ones that follow each zero.

This problem is better (?) described in the Passing-cars Codility test, part of their train page, prefix sums section.

In my opinion a few test cases would be a better way to clarify it. They are written for C++ with the Google Test framework, however I guess you can easily adapt to your preferred language/xUnit environment:

TEST(PaCa, Given) // 1
{
    std::vector<int> input;
    input.push_back(0);
    input.push_back(1);
    input.push_back(0);
    input.push_back(1);
    input.push_back(1);

    ASSERT_EQ(5, solution(input));
}

TEST(PaCa, Minimal1) // 2
{
    std::vector<int> input(2);
    input[0] = 1;

    ASSERT_EQ(0, solution(input));
}

TEST(PaCa, Huge3) // 3
{
    std::vector<int> input(50000);
    std::vector<int> more(50000, 1);
    input.insert(input.end(), more.begin(), more.end());

    ASSERT_EQ(-1, solution(input));
}

1. This is the test case provided by Codility. Our input is { 0, 1, 0, 1, 1 }. The first zero is followed by three ones, the second zero by another couple. Expected result is five.
2. A first minimal test case I have written (I spare you the other ones). Having in input { 1, 0 }, there is no one following the only zero available. Expected result is zero.
3. A test case that works on the biggest possible input. The first half of the elements are all set to zero, the second half includes only ones. Each zero is followed by 50,000 ones. Having 50,000 zeros, the expected result is 2,500,000,000. Two and a half billion. A clause in the problem says that if we have an output bigger than one billion we should return a sort of error code, minus one.

Inefficient solution

Anyone should easily find this solution. It works fine for small inputs, but it has an O(N**2) time complexity that makes it less usable as the input size grows.

int solution(std::vector<int>& input) // 1
{
    if(input.size() < 2) // 2
        return 0;

    unsigned couples = 0; // 3
    for(unsigned i = 0; i < input.size() - 1; ++i) // 4
    {
        if(input[i]) // 5
            continue;

        for(unsigned j = i + 1; j < input.size(); ++j) // 6
        {
            if(input[j]) // 7
            {
                ++couples;
            }
        }
    }

    return couples > 1000000000 ? -1 : couples; // 8
}

1. Usually we would expect the input parameter to be passed as a const reference. Here there is no real reason to pass it as a non-const one, besides the fact that it is a problem requirement. We'll see in the third proposed solution how this could be useful to save space.
2. Trivial cases. In real life code, we should also have checked for a size bigger than expected, maybe throwing an exception.
3. The worst case tested in Huge3 tells us that a plain 32 bit signed int might not be enough to keep the number of couples. And for the Codility compiler "int" means "32 bit int". The unsigned version of int is more than enough for our purposes.
4. Loop on all the input elements but the last one. We are looking for zeros, and we don't care if the last element is one of them, since it obviously can't be followed by anything.
5. If the element is "not zero", it is not what we are looking for, so go to the next iteration.
6. Loop on all the following elements looking for ones. Being an asymptotically O(N) loop inside another O(N) loop, this is the weak spot of this algorithm. We'll have to move it out of here somehow.
7. If the current element is "not zero", we have found another couple. I could have put here the test on the one billionth couple. If you expect to have many huge inputs leading to a "-1" solution, that could be a better choice. I assumed instead that it is a relatively rare case, and I decided not to pay the price for this repeated check, moving it to (8). Another effect of having the check here is that I could have given couples the "plain int" status.
8. If the number of couples is too big, return the error value, otherwise implicitly cast couples to plain int (and I know I can do it) and return it.

Time (and space) linear solution

We have already seen in previous problems how to trade space for time complexity. The idea here is creating a temporary buffer, same size as the input vector, where we are going to store the sum of all the ones to the right of each zero. Then it will only be a matter of adding the values we are interested in.

Here the C++ STL numeric algorithm partial_sum() comes in handy: it does exactly what we need, as long as we pay attention to the fact that we have to look at the reversed input, starting from its rightmost element up to the leftmost one.

Here is the refactored solution:

int solution(std::vector<int>& input)
{
    if(input.size() < 2)
        return 0;

    std::vector<int> buffer(input.size());
    std::partial_sum(input.rbegin(), input.rend(), buffer.rbegin()); // 1

    unsigned couples = 0;
    for(unsigned i = 0; i < input.size() - 1; ++i) // 2
    {
        if(input[i] == 0)
        {
            couples += buffer[i];
        }
    }

    return couples > 1000000000 ? -1 : couples;
}

1. Scan the input vector in reverse order, and put the calculated partial sum likewise in the local buffer.
2. Having extracted the internal loop to place it in (1), the original external loop gets much simpler. If the current input element is a zero, I should add the current partial sum (in buffer) to the value I am going to return.

This is a good solution, and it is going to get a 100% from Codility. Still, it shouldn't be accepted if you read their requirements carefully, where it is stated that an O(1) worst-case space complexity is expected.
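To see what the reversed partial sum in (1) actually stores, here is a minimal sketch that isolates that single step. The helper name onesFromHere is hypothetical, used only for this illustration:

```cpp
#include <numeric>
#include <vector>

// Hypothetical helper: buffer[i] is the number of ones in input[i..end].
std::vector<int> onesFromHere(const std::vector<int>& input)
{
    std::vector<int> buffer(input.size());

    // Both ranges are walked from right to left, so each buffer element
    // accumulates the input values at its own position and to its right.
    std::partial_sum(input.rbegin(), input.rend(), buffer.rbegin());

    return buffer;
}
```

For the Codility input { 0, 1, 0, 1, 1 } the buffer is { 3, 3, 2, 2, 1 }. The zeros are at positions 0 and 2, so the couples are 3 + 2 = 5.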

Dirtier but space constant

Noticing that we are allowed to modify the input in-place (usually not a good idea, since it could cause unpleasant surprises to the caller), we could think of a way to combine input and buffer behavior in a single place. Again, this is usually not a good idea. We are violating the single responsibility principle / separation of concerns, and this is going to make our code less readable, and hence less robust to changes. However, let's assume we are in a case where memory is at a premium, and refactor the code a second time, this time to get rid of the local buffer.

We could think of twisting the partial sum algorithm so that we store the calculated prefix sum in-place only if the current value is zero, otherwise we wipe out the value in input. The change in the algorithm is so radical that we can't reuse the STL function, and we have to write our own variation.

int solution(std::vector<int>& input)
{
    if(input.size() < 2)
        return 0;

    int sum = 0; // 1
    for(std::vector<int>::reverse_iterator it = input.rbegin(); it != input.rend(); ++it)
    {
        sum += *it; // 2
        *it = *it == 0 ? sum : 0; // 3
    }

    unsigned couples = std::accumulate(input.begin(), input.end(), static_cast<unsigned>(0)); // 4
    return couples > 1000000000 ? -1 : couples;
}

1. Keep track of the current partial sum value.
2. Add the current sequence value to the partial sum.
3. Here is the tricky bit. If the current value is zero, we put the partial sum in. Otherwise we flag it as uninteresting.
4. We need to sum up all the partial sums stored in the input vector. We are not interested in the uninteresting values, but since we have marked them with a zero, we could just sum all the elements in the vector. I'm using the STL function accumulate(); notice that I have explicitly told it to use an unsigned zero as starting value.

The other way round

As Wen-Kai suggests in his comment to this post, we get the same result counting the zeros instead. Think how each 1-element assumes a different weight according to the number of zeros that are before it, since each of them is going to count it for its own sum.

If we perform a slightly customized version of partial sum on zeros, storing the results where the original one-elements were, we get a mirror image of the previous solution:

int solution(std::vector<int>& input)
{
  int sum = 0; // 1
  for(std::vector<int>::iterator it = input.begin(); it != input.end(); ++it)
  {
      if(*it == 0) // 2
        ++sum;
      else
        *it = sum; // 3
  }

  unsigned couples = std::accumulate(input.begin(), input.end(), static_cast<unsigned>(0)); // 4
  return couples > 1000000000 ? -1 : couples;
}

1. Now "sum" is the number of zeros that we have already scanned.
2. A new zero detected.
3. The weight of the current element grows to keep track of the zeros that are before it.
4. As before, now it is just a matter of summing up the partial results.

You can get sleeker code by combining the partial sum loop with the accumulate one, following Wen-Kai's suggestion here below.
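A minimal sketch of how the two loops could be merged follows. It is my own reading of the idea, not necessarily the code from the comment; the name countCouples is made up, and the early exit on the one billion limit is an extra that also lets a plain int hold the counter:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical single-pass variant: no buffer, input left untouched.
int countCouples(const std::vector<int>& input)
{
    const int limit = 1000000000;

    int zeros = 0;   // zeros seen so far
    int couples = 0; // couples counted so far, never above limit here

    for (std::size_t i = 0; i < input.size(); ++i)
    {
        if (input[i] == 0)
        {
            ++zeros;
        }
        else
        {
            // Each one makes a couple with every zero before it. Since
            // zeros is at most 100,000, the sum still fits in a 32 bit int.
            couples += zeros;
            if (couples > limit)
                return -1;
        }
    }

    return couples;
}
```

Since nothing is written back, the parameter could be a const reference here, at the cost of departing from the interface Codility asks for.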
