Pages

Simple finite state automaton

This problem is about writing a trivial finite state machine that gets in input the number of elements that it should work with, and a vector of integers that represents the operations we want it to execute on them. As output we expect a vector containing its final state.

It comes from the Codility Train problems' collection, the section about Counting Elements, identified by the codename Max-Counters.

Our function has to create a vector of integers initialized to zero and then, accordingly to the the values specified in the input vector, increase by one the specified element, or set all of them to the current maximum value.

A test case (written in C++ and the Google Test framework) should clarify the expected behavior:
std::vector<int> solution(int N, std::vector<int> & input)

TEST(MaCo, Given)
{
    std::vector<int> input;
    input.push_back(3);
    input.push_back(4);
    input.push_back(4);
    input.push_back(6);
    input.push_back(1);
    input.push_back(4);
    input.push_back(4);

    std::vector<int> output = solution(5, input);
    ASSERT_EQ(5, output.size());

    ASSERT_EQ(3, output[0]);
    ASSERT_EQ(2, output[1]);
    ASSERT_EQ(2, output[2]);
    ASSERT_EQ(4, output[3]);
    ASSERT_EQ(2, output[4]);
}
The first parameter we pass to our function is a five. Meaning we want it to work on an int vector of five elements.
The second parameter is the vector containing the (sort of) program we want our function to execute. Each integer represent an instruction:
- 3: increase the third element, that should store now 1.
- 4: increase the fourth element, set to 1.
- 4: increase again the fourth element, now it is 2.
- 6: assign the top value (2) to all the elements.
- 1: increase the first element to 3.
- 4: increase the fourth element to 3.
- 4: increase the fourth element to 4.
The resulting vector should be { 3, 2, 2, 4, 2 }.

As it always happens in this kind of problem, we can avoid any error handling, trusting the user to pass correct input. Besides, we expect to work with reasonable small values (100,000 is the biggest integer we should see in input).

When you get what the problem is asking, you should be already close to get the solution. I could figure out rapidly a simple solution without the need of writing other test cases (even if I wouldn't recommend you to follow my example, and to play more safely instead, spending some time to improve the testing).

Inefficient solution

The codility evaluation for this solution is 77%. It works fine for small data input, it gets far too slow as N and M (the size of the input vector) grows. We should spot immediately that it has an O(N*M) time complexity:
vector<int> solution(int N, vector<int> & input)
{
    std::vector<int> result(N);

    int highest = 0; // 1
    for(int i = 0; i < input.size(); ++i)
    {
        int value = input[i];
        if(value > 0 && value <= N) // 2
        {
            int newValue = ++result[value-1];
            if(highest < newValue)
                highest = newValue;
        }
        else if(value == N + 1) // 3
        {
            for(int j = 0; j < result.size(); ++j) // 4
                result[j] = highest;
        }
    }
    
    return result;
}
1. The current highest value in the vector of results. It is going to be used for the "leveling" operation.
2. If the current value in the input vector is in [1..N-1], I should apply the "increase the specified element value" operation. If in this way I get a new highest value, I keep track of the change.
3. A less paranoid developer would have probably used a plain "else" here. I couldn't help to add a minimal error handling, checking if the value is actually the expected one before applying the leveling. In this way any unexpected input value would lead to a no-op.
4. This is the obvious weak point in the algorithm I have chosen. A loop in a loop that we should avoid.

Linear solution

We have immediately spot the first implementation issue, what we want to do is moving the loop for applying the leveling outside the loop that checks the operations we have to perform on our data.

If you think about it, this is nothing complicated. We just have to use another integer to remember which is the value we have used for the latest leveling operation.

The refactoring is not a complicated task:
std::vector<int> solution(int N, std::vector<int>& input)
{
    std::vector<int> result(N);

    int highest = 0;
    int watermark = 0; // 1

    for(unsigned i = 0; i < input.size(); ++i)
    {
        int index = input[i] - 1; // 2
        if(index >= 0 && index < N)
        {
            result[index] = std::max(watermark, result[index]) + 1; // 3
            highest = std::max(highest, result[index]); // 4
        }
        else if(index == N)
        {
            watermark = highest; // 5
        }
    }

    for(unsigned i = 0; i < result.size(); ++i) // 6
    {
        result[i] = std::max(result[i], watermark);
    }

    return result;
}
1. I think to the leveling operation like a sort of flooding. This variable keeps track of the level reached by the water.
2. It gets complicated to adjust the index as perceived by the user (based one) and as managed internally (based zero) in the following code, so I added an utility variable that should lead to a more pleasant reading.
3. The new value for the current element is determined by it previous value and the watermark. I get the highest value, and I increase it.
4. I could have left the if-check as in the original version, choose which one you feel is more readable.
5. This is the core of the change. I don't loop anymore, I just remember the new level. This leads to the nuisance in (3) but saves us load of time.
6. No more loop in a loop. However, we should pay attention to apply the watermark value only when required. We want to only level up the elements.

No comments:

Post a Comment