This Thread: February 2017

Fibonacci Iterator

Write an iterator that gives back all the Fibonacci numbers up to a given value.
Use it to get the highest Fibonacci number less than 1000, and to check which if there is in that range a multiple of 12.

This problem is thought as a follow-up to the previous post, where we implemented the same functionality as a Python generator. I have extracted it from Dive into Python 3, section 7.5. The idea is showing how python generators are shortcut implementation of iterators.

class Fibonacci:
    def __init__(self, top):  # 1
        self.top = top

    def __iter__(self):  # 2
        self.a = 0
        self.b = 1
        return self

    def __next__(self):  # 3
        if self.a > self.top:
            raise StopIteration  # 4

        result = self.a
        self.a, self.b = self.b, self.a + self.b
        return result

1. Initialize a Fibonacci object setting its top value.
2. The special __iter__() method prepares a Fibonacci object to be used as an iterator.
3. Each call to a Fibonacci object as iterator resolves to a call to its special method __next__()
4. The end of iterations is signaled raising a StopIteration exception. So, in this context an exception is not exceptional at all.

Even though the implementation is much verbose, in this case the functionality is just the same. So, beside studying purpose, the generator version easily wins over this iterator one.

The user is minimally affected by the implementation change, as you can see in these tests:

def test_list_1000(self):
    fib1000 = list(Fibonacci(1000))
    self.assertEqual(17, len(fib1000))
    self.assertEqual(987, fib1000[-1])

def test_multiple_12(self):
    for candidate in Fibonacci(1000):
        if candidate and candidate % 12 == 0:
            break
    else:
        self.fail('No fibonacci multiple of 12 found!')
    self.assertEqual(144, candidate)

If you want, you could check the full code on GitHub.

Go to the full post

Fibonacci Generator

Write a function that generates Fibonacci numbers up to a given value.
Use it to get the highest Fibonacci number less than 1000, and to check if there is in that range a Fibonacci number that is multiple of 12.

You can solve this problem in a myriad of ways. However, the original idea was to push towards writing a Python generator. I extracted this example from Dive into Python 3, paragraph 6.6.1.

Here is a possible implementation:

def fibonacci(top):
    a, b = 0, 1
    while a < top:
        yield a
        a, b = b, a + b

Having a 'yield' statement in its body, we see that this fibonacci function is a generator. That is, a sort of function that behaves like an iterator. Each time we call it, it should yield a value. If not, as in this case when a is not less than top, the generator is consumed, and the iteration stops.

Let's see it in action:

def test_list_1000(self):
    fib1000 = list(fibonacci(1000))  # 1
    self.assertEqual(17, len(fib1000))
    self.assertEqual(987, fib1000[-1])

def test_multiple_12(self):
    for candidate in fibonacci(1000):  # 2
        if candidate and candidate % 12 == 0:  # 3
            break
    else:
        self.fail('No fibonacci multiple of 12 found!')
    self.assertEqual(144, candidate)

1. Here fibonacci(1000) is used to populate a list of seventeen elements, when the generated value is not less than 1000, the generation of elements ends.
2. Each time we call fibonacci(1000) a new Fibonacci number is generated, starting from 0 on.
3. Since 144 is both a Fibonacci number and a multiple of 12, we won't consume completely the generator.

Full code pushed to GitHub.

Go to the full post

HackerRank Sorting: Comparator

Write a comparator for a class Player that has, as data member, name (a string) and score (integer) so that sorting a list of players would put higher score on top. In case of a tie, names should be ordered in natural order.
I don't think this HackerRank problem was conceived with Python in mind, however there is some interest in implementing a solution in this language too.

As usual, I converted the proposed sample in a python test.

def test_provided_1(self):
    data = {'amy': 100, 'david': 100, 'heraldo': 50, 'aakansha': 75, 'aleksa': 150}
    players = []
    for k, v in data.items():
        players.append(Player(k, v))

    players.sort(key=cmp_to_key(Player.comparator))

    self.assertEqual(5, len(players))
    self.assertEqual('aleksa', players[0].name)
    self.assertEqual('amy', players[1].name)
    self.assertEqual('david', players[2].name)
    self.assertEqual('aakansha', players[3].name)
    self.assertEqual('heraldo', players[4].name)

The tie case, where amy and david have the same score, is already sorted. So I added a second test.

def test_ex_aequo(self):
    players = []
    for i in range(3):
        players.append(Player('Tom' + chr(ord('k') - i), 100))

    players.sort(key=cmp_to_key(Player.comparator))

    self.assertEqual(3, len(players))
    self.assertEqual('Tomi', players[0].name)
    self.assertEqual('Tomj', players[1].name)
    self.assertEqual('Tomk', players[2].name)

Here I have Tomk, Tomj, Tomi, same score but in reversed order.
Writing a comparator for the class Player shouldn't be difficult.

class Player:
    def __init__(self, name, score):
        self.name = name
        self.score = score

# ...

    def comparator(self, rhs):
        if self.score == rhs.score:
            return 1 if self.name > rhs.name else -1
        return 1 if self.score < rhs.score else -1

The interesting point is, do we really need it? If the comparator is commonly used by the client code for the class Player, yes, definetely. However, if this is not a standard requirement we should leave the user free to define his own comparator on the fly. Instead of using the functools cmp_to_key() function to convert a comparator in a suitable key function for sorting, we could use the operator attrgetter() function to specify on which attribute of the class the sort should act. Something like:

players.sort(key=attrgetter('name'))
players.sort(key=attrgetter('score'), reverse=True)

I am using the fact that python sort() is stable, so I firstly sort the collection by the second key of interest, in this case 'name', in natural order, and then I sort again the players by their score, this time in reversed order. The advantage of this approach is that it is much more flexible. Any user could decide how to sort its players, and the Player does not have to create a comparator each time a new way of sorting is required. I pushed to GitHub both the python script that I submitted to HackerRank and the unit test that includes also the attrgetter() variation.

Go to the full post

Phone Numbers

Write a function that checks if a given string represents a phone number.

This problem should look to you suspiciously similar to the previous one, that asked to check about Roman numbers. And actually I have extracted it from the same source, Dive into Python 3, chapter 5, that is about Regular Expressions.

So, the problem is defining a good pattern that matches the expected input. Here is a first try:

pattern = re.compile(r'^(\d{3})-(\d{3})-(\d{4})$')

I have asked to the compile() function in the python re library to compile it to a pattern object, so that I can use it to call on it its method search(), as you can see below.
Notice that I passed to compile() a "raw" string, signaled by an 'r' before its begin. It is a useful trick, so that we can avoid backslashing the backslashes to specify them as actual characters and not escape ones.
Then I say that my number should use all the characters in the string, starting from the beginning, as the caret '^' anchor specifies, to the end, given the dollar '$' sign.
In the string I define three groups, round brackets, of digits. I'm using the '\d' shortcut to mean each possible digit. The curly brackets after it give the numerosity of that element. In the first and second case we have exactly 3 digits, in the third case are four.

This pattern works alright for simple numbers, as I proved in a test case:

result = pattern.search('800-555-1212')
self.assertIsNotNone(result)  # 1

groups = result.groups()
self.assertEqual(3, len(groups))  # 2
self.assertEqual('800', groups[0])
self.assertEqual('555', groups[1])
self.assertEqual('1212', groups[2])

1. Seach succeeds.
2. In the result there are three groups, as expected, and each group contains the expected block of the phone number.

However, this pattern is too limited. We want a fourth optional group, representing the number extension; we need to accept any possible kind of separators, and even the total lack of them; and we should expect some extra leading characters that should be skipped.

Last issue is easily solved, it's enough to get rid of the caret at the beginning of the pattern. In this way the number is not forced to start at the beginning of the string.
Adding the extension group is not a problem at all. We should just be prepared to check it, expecting an empty string if it is not present.
Separators require deciding what could actually be accepted in that position. Let's be permissive and accept anything that is not a number.

These considerations lead to a new pattern:

pattern = re.compile(r'(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$')

Notice the disappearing of the '^' and the mutation of the literal dash to '\D*', meaning any character that is not a digit - uppercase D, where lowercase d represents any possible digit - repeated zero or more times, that's the star '*'.

I have written a few tests that I pushed to GitHub, and I am quite satisfied with the result.

Go to the full post

Roman Numbers

Write a function that checks if a given string represents a Roman number.

We can keep simple the solution to this problem using Regular Expressions. Using the python re library we just have to wrap a call to its search function, passing to it the input string and an appropriate pattern. If it does not find it, it returns None.

re.search(pattern, data)

The problem is identify the right pattern. An hint, some unit test would help. Another hint, you could have a look at chapter 5 section 3 of Dive into Python 3 that uses this example as introduction to pattern matching in Python.

Assuming that the candidate Roman number should start at the beginning of the passed string, and end with it, here is a solution:

pattern = '^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$'

First character, a caret ('^'), acts as an anchor. The matching starts at the beginning of the passed string.
Then we have a M, saying that we expect this character in the passed string, just at its beginning (as stated by the caret before). How many of them? It is specified by the following curly brackets. Zero or more, up to three.
The following stuff is in round brackets. We have three of them. All of them are optional. And then we have a dollar sign, saying that the input string should end there. Nothing not specified in the pattern should follow, otherwise there would be no recognition.

The three optional blocks are very similar. Let's see the first one.

(CM|CD|D?C{0,3})

It says that we are expecting one of three possible alternatives. The first two are just two-character string, 'CM' or 'CD'. The third one is more interesting.

It says we are expecting a 'D'. But, wait, it is followed by a question mark, meaning that it is optional. We can have 0 or 1 of them. We could get the same result writing

D{0,1}  # same as D?

Usually the more compact 'D?' is preferred.

And finally we have zero or more, up to three, 'C'.

I pushed the python unit test I have written when playing with this regular expression on GitHub.

Go to the full post

Hackerrank Tries: Contacts

We are asked to provide a way to store contacts and check how many of them starts with a given prefix. As the name of this HackerRank problem suggests, we are gently pushed in the direction of implementing a trie. The issue is that one of the test cases that the solution has to pass to be accepted - #12 - is a bit too heavy for a plain python implementation.

My initial idea was to create a trie based on this node structure.

class Node:
    def __init__(self):
        self.count = 1
        self.children = {}

trie = Node()

An empty trie is just a node containing an empty dictionary of children.
If I add a new item on it, say a contact named 'x', I create a children with key 'x' and as value a Node created by the above constructor.
If I add a second item with the same starting letter, say 'xy, I increase the counter for the 'x' element and then create a 'y' element as its child.

I need two functions to implement the required interface, add() and find():

def add(node, data):
    pass


def find(node, data):
    pass

Given this skeleton, I written a python test unit based on the sample given by HackerRank:

def test_provided_1(self):
    add(trie, 'hack')
    add(trie, 'hackerrank')
    self.assertEqual(2, find(trie, 'hac'))

def test_provided_2(self):
    add(trie, 'hack')
    add(trie, 'hackerrank')
    self.assertEqual(0, find(trie, 'hak'))

I played a bit around, in the end I came out with this implementation:

def add(node, name):
    for letter in name:
        sub = node.children.get(letter)  # 1
        if sub:
            sub.count += 1
        else:
            sub = node.children[letter] = Node()  # 2
        node = sub  # 3

1. I am looping on each letter of the passed contact name, if the children dictionary in this level contains the relative letter, I access that node and increase its counter.
2. Otherwise I create a new node in the dictionary using as key the current letter.
3. Then I follow the recursive trie structure passing to the node in the level below.

Finding the counter associated to a prefix in a trie is just a matter of following its path in the tree.

def find(node, data):
    for letter in data:
        sub = node.children.get(letter)  # 1
        if not sub:
            return 0
        node = sub
    return node.count  # 2

1. If we don't find a letter in the path we are following in a matching level of the tree, there is no contact with such prefix. Otherwise we get the node among the children and pass to the level below.
2. If we have not intercepted a missing letter, the counter associated to the current node is the answer we want pass back to the caller.

This works fine, it is just a bit slow, with all the nodes that we create for each name we push in the trie.

We need a faster solution. A way would be changing programming language, and it looks very easy to convert this code in plain C. However, I had an idea that could make the algorithm faster.

Basically, I want to reduce the number of nodes created. When I insert the first name, for instance 'hack', I don't know if I really need to split it letter by letter. Maybe it will be the only contact starting by 'h'. So, I could keep it in a single chunk. Just one node instead of four. Good deal.
I pay it when I insert 'hackerrank', here I am forced to split 'hack' letter by letter (actually, I could have been more conservative, and keep the common prefix 'hack' in a single node. I kept it as a note. If the simpler split-all technique I went for was not enough, I would have tried this other approach). Still I let the tail of the new name, 'errank', in a single node.

Let's try it. No pun intended.

I have to store the actual piece of data in any Node:

class Node:
    def __init__(self, data=None):
        self.count = 1
        self.data = data
        self.children = {}

trie = Node()

Adding gets more complicated, since I have to think about splitting:

def add(node, name):
    for i in range(len(name)):  # 1
        sub = node.children.get(name[i])
        if not sub:
            node.children[name[i]] = Node(name[i:])  # 2
            return

        sub.count += 1
        if sub.data == name[i:]:  # 3
            return
        elif len(sub.data) > 1:  # 4
            split = sub.data[1:]
            sub.data = sub.data[0]
            sub.children[split[0]] = Node(split)
        node = sub  # 5

1. I can't use anymore the handy for-each loop, I must fallback to a plain for in range one since I need the current position in the word to split it when required.
2. When I detect a missing node I created it. The node is not associated anymore to a single letter in the name, but to the full chunk starting from the current position to the end of the word.
3. When I get a match, things are getting a bit more complicated. But not in this specific case, when the tail of the current word is a match with the content of the current node. Job done!
4. Here is the real extra work. The current node contains a chunk of two or more letters, and they don't match with the chunk from the new name. I need to split, I don't do any check on what I am splitting, I simply slice off one letter from the current chunk, let the first one in this node and put the rest in the child relative to the first letter of the split.
5. Back on the original track. I move to the node in the level below, and I am ready to the next step.

Finding stays simple, just a minor change:

def find(node, data):
    for i in range(len(data)):  # 1
        node = node.children.get(data[i])
        if not node:
            return 0
        if node.data.startswith(data[i:]):  # 2
            break
    return node.count

1. We need to know the position in the current word, here too.
2. We find the node when the data in it contains the current chunk of the new name, starting from its current position to its end.

Now the algorithm is fast enough so that also this python implementation gets fully accepted. I pushed on GitHub both versions, the "plain" one and the "chunk" one, and the relative unit tests.

I pushed a new python script on GitHub, that follows the code passed as anonymous comment here below (thank you again for contributing). It'a a "no trie" solution that works fine. I just changed the 'find' operation implementation to get rid of the try-except check. Being myself an old C++ programmer I always feel not at ease with throwing exceptions when I can avoid to.

Even though the rule of the games would push us to write a trie solution, I think it is a good idea to have a look at an implementation that avoid it. We could imagine what is its weak point, lots and lots of elements created, whereas the trie promises to reduce them on the long run. We should choose which implementation to use accordingly to the actual problem we are working with.

Go to the full post

HackerRank Tree traversal

Given a Binary Search Tree, give its node values following the in-order, pre-order, post-order, and level-order traversals, using both recursive and iterative algorithms. Actually, the HackerRank problem that gave me the idea for this post asks only to implement the level-order traversal algorithm, and you can write the code as you like, so probably in the iterative way, that comes more natural in this case.

Firstly, I sketched a BST on paper, to better visualize the problem. Nothing fancy, but not too basic too. Something like this.

Then I wrote the unit test for the eight functions I am about to write, using Python as implementation language. That lead me naturally to describe the data type Node that they all use, and setting up each test translating the picture I had of my BST in a data structure in my code.

class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

# ...
  
class TestCodeEval(unittest.TestCase):

    def setUp(self):
        node1 = Node(1)
        node2 = Node(2, node1)
        node4 = Node(4)
        node7 = Node(7)
        node5 = Node(5, node4, node7)

        self.root = Node(3, node2, node5)

As you can see, I gave to Node the bare minimal stuff to make it work.

def test_in_order_r(self):
    self.assertEqual([1, 2, 3, 4, 5, 7], in_order_r(self.root))

def test_in_order_i(self):
    self.assertEqual([1, 2, 3, 4, 5, 7], in_order_i(self.root))

def test_pre_order_r(self):
    self.assertEqual([3, 2, 1, 5, 4, 7], pre_order_r(self.root))

def test_pre_order_i(self):
    self.assertEqual([3, 2, 1, 5, 4, 7], pre_order_i(self.root))

def test_post_order_r(self):
    self.assertEqual([1, 2, 4, 7, 5, 3], post_order_r(self.root))

def test_post_order_i(self):
    expected_values = [1, 2, 4, 7, 5, 3]
    result = post_order_i(self.root)
    self.assertEqual(len(expected_values), len(result))
    for expected, actual in zip(expected_values, result):
        self.assertEqual(expected, actual)

def test_level_order_r(self):
    self.assertEqual([3, 2, 5, 1, 4, 7], level_order_r(self.root))

def test_level_order_i(self):
    self.assertEqual([3, 2, 5, 1, 4, 7], level_order_i(self.root))

The 'r' or 'i' at the end of the function names hint which version of the algorithm I plan to implement in there, recursive or iterative.
I have written the post-order iterative test in a slightly different way to hide the fact that I let this function return a deque instead of a plain list. Just to stress that is a bit different from all the other also when it comes to write it. In production code you don't usually want to leave such a thing. Better to convert the return value inside the function.

In order

The in/pre/post order traversals are so intrinsically recursive that could be easily written as a one-liner in Python. I have force myself to be more readable, still the code is very compact and nice to see.

def in_order_r(node):
    if not node:
        return []
    return in_order_r(node.left) + [node.data] + in_order_r(node.right)

I go down to fetch the most to the left node, then up, completing the left branch before looking in the right one.

If we don't want to use recursion, we need to explicitly use a stack.

def in_order_i(node):
    if not node:
        return []

    cur = node
    result = []
    stack = []
    while True:
        if cur:
            stack.append(cur)
            cur = cur.left
        elif stack:
            cur = stack.pop()
            result.append(cur.data)
            cur = cur.right
        else:
            return result

We push the nodes in the stack, until we find a leftmost node, than we pop the last node we pushed to the stack and put its value in the result. Look for what is now the leftmost node, and so on, until we consume all the nodes.

Pre order

The recursive code is almost the same.

def pre_order_r(node):
    if not node:
        return []
    return [node.data] + pre_order_r(node.left) + pre_order_r(node.right)

In this case, first we check the left subtree, than the right one, and finally we get the one on top.

The iterative version asks again for an explicit stack where to store the nodes temporary set apart. The loop is even more natural than the in-order one.

def pre_order_i(node):
    if not node:
        return []

    result = []
    stack = [node]
    while stack:
        cur = stack.pop()
        result.append(cur.data)
        if cur.right:
            stack.append(cur.right)
        if cur.left:
            stack.append(cur.left)

    return result

Push the root on the stack before starting the loop, then consume the top of the stack and add its children.

Post order

Again, a very clean recursive implementation.

def post_order_r(node):
    if not node:
        return []
    return post_order_r(node.left) + post_order_r(node.right) + [node.data]

The iterative version is not so immediate. At least, I spent some time on it thinking how was better to proceed. The trick is looking to the pre-order implementation and change, seeing the structure similarity, and acknowledging the differences. At first sight, they look completely different. However, if you reverse the result of the post-order traversal, you could see that something is there. Beside reversing, we need also to consider the branch in different order. This means, I am pushing the result in a queue (actually a deque, the python Queue is not a good choice in this context) and not in a stack, and I reverse the order of pushing nodes in the stack.

def post_order_i(node):
    if not node:
        return []

    stack = [node]
    result = deque()
    while stack:
        cur = stack.pop()
        result.appendleft(cur.data)  # 1
        if cur.left:
            stack.append(cur.left)  # 2
        if cur.right:
            stack.append(cur.right)
    return result

1. Push to the left in the result, and not to the right as in pre-order.
2. First the left node and then the right one.

Level order

This is kind of different from the other traversals. The recursive implementation is less immediate, and it usually comes more natural to implement it in an iterative way.

Here too it comes handy using a queue (again, here a deque because of Python) but instead of the stack, not for the result. This is because we want to get the value list level by level, so the first node we push to the buffer should be the first one to be consumed.

def level_order_i(node):
    result = []
    buffer = deque([node])
    while buffer:
        cur = buffer.popleft()
        result.append(cur.data)
        if cur.left:
            buffer.append(cur.left)
        if cur.right:
            buffer.append(cur.right)
    return result

If we really want to implement it in a recursive way, it is probably a good idea to have an helper function, since we have to carry on all the nodes at the same level, to have a way to check the level below.

Let the user call it passing the root, as usual, then convert it to a list an call the helper function:

def level_order_r(node):
    return level_order_rs([node])

In the result list we store, as usual, the values of the nodes we have already found, then we populate a list with the children of the nodes on this level, and we pass them to a recursive call.

def level_order_rs(nodes):
    result = []
    below = []
    for node in nodes:
        result.append(node.data)
        if node.left:
            below.append(node.left)
        if node.right:
            below.append(node.right)
    if below:
        result += level_order_rs(below)
    return result

I pushed both the unit test and the python script to GitHub.

Go to the full post

Hackerrank Heaps: Find the Running Median

Given in input a stream of integers, calculate their running median. As the name of this HackerRank problem suggests, we should find a solution based on heaps data structures.

However, it is difficult to avoid the temptation of trying a different approach. Since I am using Python 3 as implementation language, the insort() function from the bisect library, makes the naive solution of sorting the list of data each time a new element arrives less expensive, to the point that is accepted by HackerRank, even though is far from optimal.

Insort

Here is a possible implementation:

for i in range(n):  # 1
    item = int(input().strip())
    insort(data, item)  # 2
    pos = len(data) // 2  # 3
    result = float(data[pos])
    if len(data) % 2 == 0:
        result += data[pos - 1]
        result /= 2
    print(result)

1. I have previously read in n the number of elements that I should expect in input.
2. I insert in the list data each value I get using insort() so, keeping the list sorted. The cost of finding the right position in the list is logarithmic, hence the name bisect of the library, then we have to pay a linear cost to move to the right all the elements bigger than the current one. Still a Big-Oh n*log(n) algorithm, however, on average cheaper than sorting data each time, if we don't use an algorithm that knows it is dealing with a quasi-sorted collection.
3. Than we calculate the median, peaking the central element and, if the current size of the data list is even, also its left neighbor - in this case dividing by two.

It works find, but it sort of cheating.

Heaps

Unfortunately, the Python standard library does not support heaps so strongly as other languages do. There is a good implementation for the min heap, but here we need to work also with a max heap. The common hack, that I am going to use here, is negating the values inserted in a heap, to let it work as a max heap. Not the cleanest thing to do, however it works fine here.

I have written a class, MedianHeap, that shield the intricacies of the min/max heaps, and offer a simple interface to my solution, that gets very easy:

def solution(values):
    heap = MedianHeap()

    result = []
    for value in values:  # 1
        heap.push(value)
        result.append(str(heap.median()))

    return '\n'.join(result)

The MedianHeap constructor is trivial:

def __init__(self):
    self.left = []   # max heap
    self.right = []  # min heap

I am going to use two heaps. One the left, keeping the smaller values, that is going to be a max heap, and one on the right, a min heap that keeps the bigger values. I will be always only interested in the max value on the left and the min value on the right.

To push an element, I check if it is smaller that the max to the left, and in this case I am going to push it there, otherwise it will go to the right.

def push(self, value):
    if not self.left or value < -self.left[0]:  # 1
        heappush(self.left, -value)  # 2
    else:
        heappush(self.right, value)

1. Couple of nuisances. Before checking I must ensure the left heap is not empty. And, I have to remember my hack on the max heap, so I change the sign of the max element on it before comparing it to the current value.
2. Beside that, I simply use the heappush() function from the heapq python core library. Still, remember that the left heap is a max one, so change the sign of the value I pushing in.

After the push, we should ensure the size difference between the two heaps is minimal, zero or one, otherwise I have to balance them:

bigger = self.left if len(self.left) > len(self.right) else self.right
smaller = self.left if len(self.left) < len(self.right) else self.right
if len(bigger) - len(smaller) > 1:  # 1
    moved = -1 * heappop(bigger)  # 2
    heappush(smaller, moved)

1. If the unbalance is bigger than one, I move from the smaller heap to the bigger one. I care only of their size, left to right or the other way round is just the same.
2. Still, I should remember to swapping the sign of the element, since I am moving from a min to a max heap, or viceversa. Notice the use of the other heapq function at work here, heappop().

Getting the median element from my MedianHeap is easy:

def median(self):
    bigger = self.left if len(self.left) > len(self.right) else self.right
    smaller = self.left if len(self.left) < len(self.right) else self.right

    if bigger is smaller:
        return (self.right[0] - self.left[0]) / 2  # 1
    else:
        maxi = -1.0 if bigger is self.left else 1.0
        return maxi * bigger[0]

1. If the heaps have the same size, I should return the mean of their max/min elements. Instead of adding, I subtract them, for the hack on the max heap.
2. Otherwise I return the min/max element of the biggest heap. If the winner is the left heap, it is the max one, so I have to change its sign.

I have pushed both the full python script, the bisect insort and the heap based one, and the unit test I used for help me to develop the more complex second solution, to GitHub.

Go to the full post

Spring Prototype Bean

By default, the Spring framework creates beans as singleton. However we can instruct it to create a different bean each time a request is made specifying the prototype mode. Another couple of ways are available, to create a bean for each request or session.

I have written a simple JUnit test to see the difference between prototype and singleton.

I have created two empty classes, both of them implementing the empty interface Scoping:

@Component
@Scope(ConfigurableBeanFactory.SCOPE_SINGLETON)
public class UniqueThing implements Scoping {
}

@Component
@Scope(ConfigurableBeanFactory.SCOPE_PROTOTYPE)
public class Notepad implements Scoping {
}

The SCOPE_SINGLETON scope is the default, so usually you won't see it explicitly set in production code. Here is just to remark its existence.
The class Notepad, marked with SCOPE_PROTOTYPE is the interesting stuff.

This is what normally happens with beans:

@Test
public void testSingleton() {
    UniqueThing thing1 = context.getBean(UniqueThing.class);
    UniqueThing thing2 = context.getBean(UniqueThing.class);
    assertSame(thing1, thing2);
}

When I get a bean - here I get them through the Spring application context - the same bean is returned.

But if the bean is marked as SCOPE_PROTOTYPE, we have a different behavior:

@Test
public void testPrototype() {
    Notepad notepad1 = context.getBean(Notepad.class);
    Notepad notepad2 = context.getBean(Notepad.class);
    assertNotSame(notepad1, notepad2);
}

Each bean is a newly created one!

Reference: Advanced wiring, from Spring in Action, Fourth Edition by Craig Walls. Chapter three, section four, Scoping beans.

I pushed the code I created on GitHub, see the scoping package and the associated JUnit test case.

Go to the full post

CodeEval Reverse Words

Given a string of words, write a function that returns them in reversed order. This is CodeEval problem #8 that has a trivial Python solution.

This is what we want to get, as a python test:

def test_provided_1(self):
    self.assertEqual('World Hello', solution('Hello World'))

And the core of the solution is a mere one-liner:

' '.join(reversed(line.split()))

We split the line, getting a list of strings, reverse it, and join the result on a blank.

I pushed the complete python script, and even the associated unit test, to GitHub.

Go to the full post

HackerRank Trees: Is This a Binary Search Tree?

As the title suggest, the point of this hackerrank problem is checking if a tree is a BST.

Before being able to work on this problem, you should know what we are talking about. In a few words, a BST is a tree in which for each node is true that its left children are strictly smaller and the right one are strictly bigger.

In Python, if a node is represented in this way

class Node:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None

We can check a tree passing its root to a recursive function like this one:

def is_bst(node, left, right):  # 1
    if node is None:  # 2
        return True
    if not left < node.data < right:  # 3
        return False
    return is_bst(node.left, left, node.data) and is_bst(node.right, node.data, right)  # 4

1. The first parameter, node, is an instance of the class Node. The other two parameters are the limit in which the value is supposed to be. 2. An empty tree is a BST. This comes handy to manage the leaves of the tree. 3. The node value should lie in the specified interval, otherwise this is not a BST. 4. If the current node passes the check, we explore the rest of the tree. On the left we restrict the interval cutting it to the right, on the right we cut it on the left. Couple of questions. What should we pass as left and right for the root? And, What if we don't want to accept an empty tree as a valid BST? I think that in Python a good solution could be having a starting call like is_bst(root) that do not check for an interval and define on its own what to do in case of None. And then it call the above is_bst() for all the other nodes.

Go to the full post

Autowiring ambiguity

I have a hierarchy of Spring Components rooted in the interface Dessert. Let's see what could go wrong when I try to autowire on them.

Here is the base interface:

public interface Dessert {
    default double getPrice() {
        return 0;
    }
}

It has three empty implementing classes named Cake, Cookies, and IceCream. For instance, here is the third one:

@Component
public class IceCream implements Dessert {
}

In my test case, I autowire a Dessert that I am going to use in the tests, like this:

@Autowired
Dessert dessert;

@Test
public void test() {
    assertThat(dessert.getPrice(), is(0.0));
}

Spring has to know which Dessert to create and wire, and this could be done by class configuration. In this case, the tester should know it, by annotating as SpringApplicationConfiguration the class, that should be defined and annotated as a Spring Configuration file, like this:

@Configuration
@ComponentScan(basePackageClasses = Dessert.class)
public class DessertConfig {
}

Do not forget to specify the component scan annotation, otherwise Spring won't see the beans, and would complain something like:

org.springframework.beans.factory.BeanCreationException:
 Error creating bean with name 'dd.sia.restaurant.DessertTest':
 Injection of autowired dependencies failed; nested exception is
  org.springframework.beans.factory.BeanCreationException:
Could not autowire field:
 dd.sia.restaurant.Dessert dd.sia.restaurant.DessertTest.dessert;
nested exception is
 org.springframework.beans.factory.NoSuchBeanDefinitionException:
 No qualifying bean of type [dd.sia.restaurant.Dessert] found for dependency:
 expected at least 1 bean which qualifies as autowire candidate for this dependency.
 Dependency annotations:
 {@org.springframework.beans.factory.annotation.Autowired(required=true)}

However, activating the ComponentScan is only the first step. Since we have three Components based on Dessert, we should say to Spring which one to choose. If we don't we are going to have a different, albeit similar, exception:

org.springframework.beans.factory.BeanCreationException:
 Error creating bean with name 'dd.sia.restaurant.DessertTest':
 Injection of autowired dependencies failed; nested exception is
 org.springframework.beans.factory.BeanCreationException:
Could not autowire field:
 dd.sia.restaurant.Dessert dd.sia.restaurant.DessertTest.dessert;
nested exception is 
 org.springframework.beans.factory.NoUniqueBeanDefinitionException:
 No qualifying bean of type [dd.sia.restaurant.Dessert] is defined:
 expected single matching bean but found 3: cake,cookies,iceCream

One way to solve this issue is annotating a Component as primary:

@Component
@Primary
public class Cake implements Dessert {
}

In this way, Spring knows that Cake is the Dessert it should pick up when it is in doubt.

We can always bypass this default, specifying a default in the autowiring:

@Autowired
@Qualifier("cookies")
Dessert dessert;

Now Spring knows that we want cookies and won't pay attention to the primary annotation.
If we don't like the name generated by Spring for the component, that is, the class name but starting lowercase, we can set it as we please, using again the Qualifier annotation, this time on the Component itself. So, for instance, I can qualify Cookies as "yummy" and use this qualification when autowiring to ask Spring to select it.

Reference: Advanced wiring, of Spring in Action, Fourth Edition by Craig Walls. Chapter three, section three, Addressing ambiguity in autowiring.

I pushed the code I have written on this stuff to GitHub. See the package restaurant for the Java classes in the Spring application, and the DessertTest for the JUnit testcase.

Go to the full post

CodeEval Prefix Expressions

The CodeEval problem #7 asks us to write a toy calculator for simple arithmetic expressions written in Polish notation. We expect only addictions, multipications, and divisions. And no inner expressions are allowed, meaning that we can safely expect to have in input firstly all the operators then operands, as integer numbers. We should give an integer answer, even though the internal process should keep correctly track of real numbers generated by divisions.

As sample we are given a unique case, that I converted in a python test:

def test_provided_1(self):
    self.assertEqual(20, solution('* + 2 3 4'))

Since we have only to deal with binary operations, we can use the clever trick of starting from the middle of the input, were we can find the first operand, moving to the left to get the list of operators and to the right for the other operands. This lead naturally to think using zip() a python implementation:

i = len(data) // 2  # 1
result = float(data[i])  # 2
for op, value in zip(reversed(data[0:i]), data[i + 1:]):  # 3
    # ...

1. Getting the central element index in the input data list.
2. Convert the first value to real number, so that floating point arithmetic is applied to the expression.
3. I zip together the left and right part of the input data. First part reversed, so that I start to center and move to the edges fetching elements.

Now, what I should do in the loop. As C programmer my instinct went to a switch on the operators, to detect which operation I have to apply. But Python has no switch, so I had to think to something smarter. I put in a dictionary the mapping between each operator and the actual piece of code I need to run in that context, in form of a lambda function, and then I simply have to lookup in it to have the job done:

mapping = {'+': lambda r, v: r + v, '*': lambda r, v: r * v, '/': lambda r, v: r / v}

# ...

for # ...
    result = mapping[op](result, int(value))

Then, after the for loop, I just have to cast the result to integer and return its value.

On GitHub you can see the unit test and the full python script.

Go to the full post

Hackerrank Queues: A Tale of Two Stacks

This Hackerrank problem asks us to emulate a queue using two stacks. The rationale for this apparently weird data structure could be that we know for sure that we are going to have a data traffic in a sort of wave on the seaside fashion. We have a huge number of pushes with (about) no peek or pop, and then a huge number of peek or pop.
The architect should have calculate that it is not worthy to implement a proper queue, maybe memory comes at a premium on that environment. Or whatever.
Actually, the real sense of it is to see in an interview if a candidate knows about stacks and queue, and how to deal with them.

As usual, before starting writing my python code, I jotted down a few tests, so to clarify to myself what I am going to do. If I get on some soft spot during development, I can always get back here and fatten up the case.

def test_enqueue(self):
    queue = MyQueue()
    queue.put(42)
    self.assertEqual(42, queue.peek())

def test_enqueue_2(self):
    queue = MyQueue()
    queue.put(42)
    queue.put(2)
    self.assertEqual(42, queue.peek())

def test_dequeue(self):
    queue = MyQueue()
    queue.put(42)
    queue.put(2)
    queue.pop()
    self.assertEqual(2, queue.peek())

As often happens in this kind of coding, we can blissfully ignore the possible exception. Here letting our MyQueue crash in case the user try to pop when it is empty is a non-issue.

I have to use to stacks. One would emulate the head of the queue, form which pop and peek would operate, the other the tail, where push will add its values. So, I found it natural to name them head and tail:

class MyQueue(object):
    def __init__(self):
        self.head = []
        self.tail = []

Being in python world, the most obvious choice is using lists as their data type.

The push method (here called "put", as suggested by the template provided) is a no-brainer. I just append the passed value to tail.

def put(self, value):
    self.tail.append(value)

Peek and pop will substantially operate in the same way, getting the data from the other stack. There is only a question. How I should ensure that the head is populate with the correct data. Let's delegate it to another method, and just write their core functionality, for the moment.

def peek(self):
    self.refill_head()
    return self.head[-1]

def pop(self):
    self.refill_head()
    return self.head.pop()

And this is the point of the exercise, refill_head(). When we get the first call to pop or peek, we need to retrieve the first value stored in the tail queue. To do that, we can simply move all data stored in that stack to the head stack. As a bonus we have also all the other data ready to be used for pop and peek. Remember that the tail stack is accessed only for pushing data, so it won't mind at all of what there is inside it. We can safely empty it any time we need more data from head. But beware, we can move it only when head is empty, otherwise we would mess up with the elements' order.

def refill_head(self):
    if not self.head:  # 1
        while self.tail:  # 2
            self.head.append(self.tail.pop())

1. If the head is not empty we can (actually, we must) delay the refilling.
2. Otherwise we move all the data from tail to head. Being both stacks, the order is respected.

No need for extra testing, the solution got accepted, and I pushed unit test and python script to GitHub.

Go to the full post

Functional interfaces

In Java everything that is not a primitive is an object. OK, but what about functions? Introducing a support for functional programming in Java 8 led to the need of give an answer to this question, and the designers decided to use the ad-hoc solution of functional interfaces.

A functional interface is a Java interface that has only one abstract method. It is a good idea to annotate it as FunctionalInterface so that it's clear what it is mean for, and let the compiler to check if it is alright. However it is not mandatory.

We can define and use any interface we like for use it in this way, but we should save to ourselves, and to our code clients, some pain using when possible the standard one, define in the java.util.function package.

Let's see a few examples.

Predicate

The predicate functional interface is defined as:

@FunctionalInterface
public interface Predicate<T> {
    boolean test(T t);

    // ...
}

I can use it if I have to write a method filter() that gets in input a list of data and filter them accordingly some requirements decided by the user.

public static <T> List<T> filter(List<T> data, Predicate<T> p) { // 1
    List<T> results = new ArrayList<>();
    for (T t: data) { // 2
        if (p.test(t)) {
            results.add(t);
        }
    }
    return results;
}

1. It takes as input a list of T and a predicate on T.
2. For each element in the list, if it pass the test defined by the caller, I add it to the list I return.

Here is a piece of code that uses the filter() method:

List<String> result = FunctionalInterfaces.filter(Arrays.asList("", "x", ""), (s) -> !s.isEmpty());

I create a list of strings and I pass it to filter() alongside a lambda function, that has to match with the expected Predicate, so the lambada input has to be a String - I could state it explicitly, but I can also leave it implicit - and it should return a boolean.
The result is a list containing just one string, namely "x".

Consumer

It is defined as:

@FunctionalInterface
public interface Consumer<T> {
    void accept(T t);

    // ...
}

Here I use it in a forEach() method that let the user specify what it should do with each element of a List:

public static <T> void forEach(List<T> data, Consumer<T> c) {
    for(T t: data) {
        c.accept(t);
    }
}

I use this forEach() to keep calculate the length of all the strings in a list:

List<String> data = Arrays.asList("Functional", "Interfaces", "in", "Java 8");
consumed = 0;
FunctionalInterfaces.forEach(data, (s) -> consumed += s.length());
assertThat(consumed, is(28));

Again, note that here the type of the lambda parameter s is inferred from T, type of the list passed to forEach() and used to determine the actual Consumer functional interface to which the lambda should refer to.
Interesting to note how the consumed variable is captured by the lambda without the need of any explicit action by programmer. This mechanism works fine only for non-local variables. Local variables can be captured only if final (at least effectively so).

Function

@FunctionalInterface
public interface Function<T, R> {
    R apply(T t);

    // ...
}

This map() method maps objects of a given type to another one, delegating to the caller the job of specifying how to do the conversion.

public static <T, R> List<R> map(List<T> data, Function<T, R> f) {
    List<R> result = new ArrayList<>();
    for(T t: data) {
        result.add(f.apply(t));
    }
    return result;
}

I use it here to convert a list of strings in a list of integers:

List<String> data = Arrays.asList("Functional", "Interfaces", "in", "Java 8");
List<Integer> result = FunctionalInterfaces.map(data, (s) -> s.length());

The compiler does a lot of job for us here. Since map() is templatized by T, the template type of the input list, and R, from the returned list, it figures out that s, passed to the lambda, is a String, so it should return an object of the class returned by the String method lenght(), that is an int, boxed to Integer.

IntBinaryOperator

@FunctionalInterface
public interface IntBinaryOperator {
    int applyAsInt(int left, int right);
}

In the example for Function, boxing the integers was an expected behavior. We need them as objects since we want them in a collection.
Other times, boxing and unboxing could be just a useless cost that we could avoid using a special functional interface like this one, that works with lambdas working on primitives.

This combine() method works on primitive integers and don't want to pay any extra costs for objects

public static int combine(int[] data, IntBinaryOperator op) {
    if(data == null || data.length != 2) {
        return 0;            
    }
    return op.applyAsInt(data[0], data[1]);
}

I could use it to decide dinamically what kind of operation applying to a couple of integers, accordingly to the lambda passed.

int result = FunctionalInterfaces.combine(new int[] {6, 7}, (a, b) -> a * b);

Supplier

@FunctionalInterface
public interface Supplier<T> {
    T get();
}

We don't have anything to pass to the lambda, and we blindly let it to create an object of the specified type.

public static <T> T aFactory(Supplier<T> s) {
    return s.get();
}

Here is a trivial example:

String result = FunctionalInterfaces.aFactory(() -> "Hello");

I pushed to GitHub a class FunctionalInterfaces with the static method having a reference to functional interfaces as parameter and a JUnit tester that shows how I called them.

Reference: Java 8 in action, section 3.4 Using functional interfaces.

Go to the full post

CodeEval Longest Common Subsequence

The CodeEval problem #6 is commonly used as an example to introduce to Dynamic Programming.
We have two strings, we want to know how many and which characters they have in common.

On CodeEval we are given a sample, I added a second test, shorter, that helped me better to develop my python script:

def test_provided_1(self):
    self.assertEqual('MJAU', solution('XMJYAUZ;MZJAWXU'))

def test_simple(self):
    self.assertEqual('HI', solution('AHOI;HI'))

Solving this problem with DP comes quite naturally. I have two strings, I put it one vertically, the other horizontally, I leave top row and left column set to zero, just to simplify the calculations, and then I scan the matrix from top-left to right, row by row, since I get to the right-bottom corner.

Cell by cell, I check if the characters in the two strings match or not. In case of matching, I get the value of the cell one step to the left, one step over, and increase it by one. Otherwise I get the values from the cell over, the cell to the left, and I keep the highest value between the two.

Here is the matrix for my simple test above:

-  -  H  I
- [0, 0, 0]
A [0, 0, 0]
H [0, 1, 1]
O [0, 1, 1]
I [0, 1, 2]

The bottom right cell tells us that there are two matches. The algorithm I am using makes a bit more laborious to get which are them. Well, not in this case, since the shorter string has size two, we already know that "HI" is the longest common subsequence. But generally speaking, we have to backtrack the job we have done, starting from the bottom-right, observing where there is a change of value in a cell compared to its neighbors, moving up to find the next border inside the matrix.

The resulting python is quite terse, so I think could help to better understand the thing.

First part, generating the matrix. In ver and hor I put the two strings. Think that ver is "AHOI" and hor is "HI", from the example above.

t = [[0 for j in range(len(hor)+1)] for i in range(len(ver)+1)]  # 1
for i in range(len(ver)):  # 2
    for j in range(len(hor)):
        t[i + 1][j + 1] = t[i][j] + 1 if ver[i] == hor[j] else max(t[i+1][j], t[i][j+1])

1. Create an all-zero bidimensional list sized +1 of the respective referenced string.
2. Loop on all the cells, and apply the algorithm that I described above. Notice that the indexes i,j are 0-based, so they are OK when referring to the strings in ver and hor. When working on the matrix, however, I adjust them adding one to refer to the current cell, row and column. I guess this is the most confusing part. Maybe following the debugger stepping in the code could help a better understanding.

Now I prepare for backtracking. Let's set a few variables:

i, j = len(ver), len(hor)  # 1
result = [None] * t[i][j]  # 2
cur = -1  # 3

1. I let i and j refer to the bottom-right cell.
2. I am going to put the result in this list. I know the size, it is stored in the bottom-left cell, I initialize each element to None, missing any better default.
3. Since I am backtracking. I am going to put the first character in the last position, and so on.

And this is the backtracking loop. Notice that I changed the convention for i, j. Here it looked to me more natural using i and j without adjustment to refer to the cells in the matrix.

while i > 0 and j > 0:
    if t[i][j] == t[i-1][j]:  # 1
        i -= 1
    elif t[i][j] == t[i][j-1]:  # 2
        j -= 1
    else:
        result[cur] = ver[i-1]  # 3
        cur -= 1
        i -= 1
        j -= 1

1. Compare the current cell with the one above. If they are the same, I can move upward.
2. I couldn't move upward, I try to move leftward.
3. I am on the verge of changing values. This means that the letter in the string (pay attention to the index, adjusted by subtracting one!) is part of the subsequence. Store it in the result, and move on.

Finally we have just to convert the result from list to a string, the usual pythonic way is a join on an empty string.

The script has been accepted with full marks. It was just a bit slow, and it consumed lot of memory. So I decided to port the code to C++, task that was easy and rewarding. Just a few minor changes, besides the obvious differences due to the languages grammars, for instance, I found easier in the backtracking to write the result backward and the reverse it by the STL function.

For full reference, I put the unit test and the python script on GitHub.

Go to the full post

Hackerrank Stacks: Balanced Brackets

This HackerRank problem is an interview classic. I have got in input a string containing only brackets, I have to ensure they are correctly balanced. It is very easy if you get you should use a stack to process them. Here I implement a solution in Python, that is interesting because we should make use here a couple of pythonic constructions.

Besides the provided samples, I added a missing important element to my unit test.

def test_provided_1(self):
    self.assertEqual(True, solution('{[()]}'))

def test_provided_2(self):
    self.assertEqual(False, solution('{[(])}'))

def test_provided_3(self):
    self.assertEqual(True, solution('{{[[(())]]}}'))

def test_too_close(self):
    self.assertEqual(False, solution('}}'))

In the last one I have only closing brackets, that's obviously a False. I feel that explicitly setting a test on this case helps to get sooner to the solution.

The idea is pushing the opening brackets on the stack and popping an element when we are examining a closing bracket. If there is no matching bracket in the stack or, worse, the stack is empty, the sequence has to be rejected.

Here is my implementation:

def solution(data):
    matches = { '(': ')', '[': ']', '{': '}' }  # 1

    stack = []  # 2
    for c in data:  # 3
        if c in matches.keys():  # 4
            stack.append(c)
        elif not stack or c != matches[stack.pop()]:  # 5
            return False
    return True if not stack else False  # 6

1. I use a dictionary to store the opening brackets as keys and the closing ones as values. That helps keeping the code compact.
2. A plain list works a as a stack.
3. Looping on the characters in the input data. Notice that there is no error handling whatsoever. This is a typical interview code, robustness would be added on demand.
4. First use of the dict in (1), I compare the character against its keys to check if it is an opening bracket. If so, I push it in the stack.
5. Otherwise, I expect it to be a closing bracket. Before popping from the stack, I should ensure it is not empty, then I use the element in the stack as key for the dict in (1) to retrieve the matching bracket, and I compare it against the actual character.
6. I get to the end of the input data without detecting any mismatch. So I return true. But only if the stack is empty, i.e., all the opening brackets have been consumed.

Solution accepted by HakerRank, test case and python script pushed to GitHub.

Go to the full post

CodeEval Detecting Cycles

Getting in input a list of integers, we should identify where a cycle starts, and return it.
It's the CodeEval problem #5, and it is easier than it looks, because, even if it is not explicitly said, the passed samples suggest that the numbers are present only once, if their are not in a cycle. Maybe we should think of them as labels in a graph, and the sequence could represent the result of a traversal algorithm that enters in an infinite loop.

I have converted the samples in python tests, I added one more just because I felt bad of risking an exception if the detected cycle was interrupted in the input sequence.

def test_provided_1(self):
    self.assertEqual('6 3 1', solution('2 0 6 3 1 6 3 1 6 3 1'))

def test_provided_2(self):
    self.assertEqual('49', solution('3 4 8 0 11 9 7 2 5 6 10 1 49 49 49 49'))

def test_provided_3(self):
    self.assertEqual('1 2 3', solution('1 2 3 1 2 3 1 2 3'))

def test_tail(self):
    self.assertEqual('1 2', solution('1 2 3 1 2'))

For how the test case look, we simply have to search for the first repetition, and then count how many elements fit in the cycle. I assumed, see the last test, that we should return only the elements in the cycle we are sure are repeated.

The core of my solution is a triple (!) for loop:

for i in range(len(data)-1):
    for j in range(i+1, len(data)):
        if data[i] != data[j]:
            continue

        size = min(j - i, len(data) - j)  # 1
        for k in range(1, size):
            if data[i+k] != data[j+k]:  # 2
                size = k
                break
        return ' '.join(data[i:i+size])
return ''

1. I loop until I find two matching numbers. At this point I ensure that this is really a loop, checking that there is a match between the following numbers on the left and on the right. I added the extra check on the list size to avoid falling off the end.
2. When I complete the check, maybe finding a spurious element, I stop looping and return the found cycle.

Solution accepted, unit test and python script pushed to GitHub.

Go to the full post

Execute around pattern and try-with-resources

Say that you have to implement in Java 8 a functionality where there is a lot of boilerplate. Instead of duplicating code in sibling methods, we can try to create a single method where the changing behavior is passed by parameter as a lambda function.

Reduced to the minimal term, here is the example of a method that we could refactor using the execute around pattern. The code is made more interesting by using the try-with-resources statement available in Java since version 7.

public static String processFileLimited(String filename) throws IOException {
    try (BufferedReader br = new BufferedReader(new FileReader(filename))) { // 1
        return br.readLine(); // 2
    }
}

1. We try to create a BufferedReader. At the end of the block, even in case of an exception, Java takes care of closing the resource. Neat.
2. This is the behavior we want to be parametrized.

Lambda is here our friend. Only a minor nuisance, we can't use a standard functional interface because here we should let our lambdas throw an IO exception, and in Java this should specified in the method signature.

@FunctionalInterface
public interface BufferedReaderProcessor {
    public String process(BufferedReader br) throws IOException;
}

Given this interface, we can refactor the above method in this way:

public static String processFile(String filename, BufferedReaderProcessor p) throws IOException {
    try (BufferedReader br = new BufferedReader(new FileReader(filename))) {
        return p.process(br);
    }
}

The boilerplate is there unchanged, the actual behavior has been externalized. It should be user resposibility to provide something sensed to be done in. Here is a couple of different ways processFile() could be called:

ExecuteAround.processFile(FNAME, (BufferedReader b) -> b.readLine());
ExecuteAround.processFile(FNAME, (BufferedReader b) -> b.readLine() + '-' + b.readLine());

As second parameter, I pass a lambda that in its body uses the BufferedReader to perform its job. A string is (implicitly) returned.

I have divided the code between the actual ExecuteAround class and its unit test. You can find both on GitHub.

Code and ideas are based on Java 8 in action, section 3.3 Putting lambdas into practice: the execute around pattern.

Go to the full post

Hackerrank Hash Tables: Ransom Note

Given two lists of words, we have to say if the second is a subset of the first. This HackerRank problem is very similar to the previous one.

The given sample, that I converted to a python test, should clarify the point of the problem:

def test_provided_1(self):
    self.assertEqual(True, solution(['give', 'me', 'one', 'grand', 'today', 'night'], ['give', 'one', 'grand', 'today']))

I don't bother to add more tests, being the problem so simple. And I don't even spend much time in thinking on the function design, since the problem name itself gives a huge spoiler on its solution. I would put the words in hash tables (dictionaries, since I am using python as implementation language) and I ensure all the words in the second collection are also present in the first one.

from collections import Counter  # 1


def solution(magazine, ransom):
    available = Counter(magazine)
    needed = Counter(ransom)  # 2

    for word in needed:
        if needed.get(word) > available.get(word, 0):  # 3
            return False
    return True

1. Since it is all about counting items, instead of using naked dictionaries I use Counters, making my script a bit shorter. See my solution to the previous problem to see how the code looks like using dictionaries.
2. Actually, the second dictionary is not strictly required, and a check as implemented in the previous problem would reduce slightly the cost of the algorithm. However, I am not here to write optimized code (otherwise I would have chosen C++ as implementation language), and in this way the code is probably more readable.
3. If I don't have enough of the current word, or even not at all, I can stop looping and return a False. Only if all the words in ransom are also in magazine I'll return True.

Solution accepted, unit test and python script pushed to GitHub.

Go to the full post

Hackerrank Strings: Making Anagrams

Given two strings in input, tell how many characters we should remove from both ones to leave just the same characters even if a different order.

This HackerRank problem is meant to be about strings. however, I solved it using Python, and in this case I ended up seeing the two strings not differently as they was lists of whatever elements.

One aspect I have overseen initially was that in a string there could be more than an occurrence of a character. To ensure that my code would work correctly I added a test to the one suggested by Hackerrank:

def test_provided_1(self):
    self.assertEqual(4, solution('cde', 'abc'))

def test_many_a(self):
    self.assertEqual(3, solution('aaaa', 'a'))

In the one provided, I have to remove 'd' and 'e' from the first string, plus 'a' and 'b' from the second one to get to the same 'c' in both of them, total, 4 eliminations.
In my test, I just have to remove 3 'a' from the first string.

I decided to implement my solution using a dictionary to store the letters in the first string and their numbers. I could have written a piece of code like this:

counter = {}
for ch in a:
    counter[ch] = counter.setdefault(ch, 0) + 1

The handy setdefault() function return the value for the specified key, or second parameter, instead of the default None, if there is not such key in the dictionary.

But the job of counting the number of elements in a collection is so common, that the standard python collections library give us a class, Counter, that works exactly in this way. So I saved same typing and I used it instead.

from collections import Counter


def solution(a, b):
    counter = Counter(a)
    # ...

Now I have to sort of compare this counting with the values store in the other string. I decided to loop on all the characters in it, and decrease the associated value if the key exists and the value is not zero, otherwise to increase a buffer variable to keep track of all the characters that are in the second string and not in the first one.

extra = 0  # 1
for ch in b:
    if counter.get(ch):  # 2
        counter[ch] -= 1
    else:
        extra += 1

1. Number of characters that are in the second string without a match in the first one.
2. Remember that get() on dictionary returns the associated value to the passed key or None. Here I decrease the value only if I get a value (not None) and it is different from zero.

Finally, I add up all the values in the counter dictionary, plus the extra count coming from the second string, and I return it.

return sum(counter.values()) + extra

I pushed both the unit test and the python script to GitHub for full reference.

Go to the full post

HackerRank Arrays: Left Rotation

Given a collection of integers and the left shift we want to apply to it, return a collection in the new configuration.

This is the first HackerRank problem in their Cracking the Coding Interview series.

Python solution is trivial, nevertheless I converted their sample to a test case:

def test_provided_1(self):
    self.assertEqual([5, 1, 2, 3, 4], solution([1, 2, 3, 4, 5], 5, 4))

Given a list, its length, and the required left shift. The result should be the left-shifted input list.

This is a one-liner solution:

def solution(values, size, shift):
    return values[shift:] + values[:shift]

Obviously the second parameter is not needed, I left it there as for problem requirement.

I ask Python to slice a first time the passed list starting from shift to the end, then a second time, from beginning to shift (remember that the interval is close to the left and open to the right), and finally join them.

I have spent more time setting up the repository on GitHub and the project than solving the problem itself. So, even if I don't think it adds much value, here is the link to the test case and the python script in there.

Go to the full post

JFreeChart and PDFBox

In a previous post, I have created a pie chart with JFreeChart and I saved it as a file in PNG format. Let's now put it in a PDF file, using the Apache PDFBox library, version 2. And I stress version 2 because it is still young and has a few changes that also impact right in this area.

First thing, I have added a dependency to PDFBox in my POM file - it's a Maven project, as you have guessed:

<dependency>
 <groupId>org.apache.pdfbox</groupId>
 <artifactId>pdfbox</artifactId>
 <version>2.0.2</version>
</dependency>

For Gradle fueled project, your build.gradle should have this line among its dependencies:

compile group: 'org.apache.pdfbox', name: 'pdfbox', version: '2.0.2'

I suggest you to check the current version on the Apache PDFBox web site. Actually, this post is already slightly outdated since version 2.0.4 has already been released.

Since large part of the code I am about to use here has been already written, I have extracted the JFreeChart creation to a method, createPiePlotSimple(), that would return the object as seen before, without saving it to file. Then I created a createSimpleOnPdf() method that does the more interesting stuff.

Firstly, it creates a PDF document, and add a page to it.

PDDocument doc = new PDDocument();
PDPage page = new PDPage();
doc.addPage(page);

Then I call the createPiePlotSimple() to get the JFreeChart object, and create from it a buffered image.

JFreeChart chart = createPiePlotSimple();
BufferedImage bi = chart.createBufferedImage(500, 500);

Finally, in a try-catch block, since there are lot of IOException that could be thrown around in this piece of code, I convert the buffered image to a PDImageXObject, specific for the PDF format, using a specialized factory class. The image is drawn passing through a PDPageContentStream object, and then I can finally save my PDF file.

PDPageContentStream cs = new PDPageContentStream(doc, page);
PDImageXObject ximage = LosslessFactory.createFromImage(doc, bi);
cs.drawImage(ximage, 100, 100);
cs.close();
doc.save(new File("dump/pie.pdf"));

Pay attention that PDImageXObject and LosslessFactory are new in version 2. Before that we had PDXObjectImage and PDJpeg, PDPixelMap, and PDCCitt. I suggest to have a look at the migration page if you need more information.

I pushed the changes to GitHub.

Go to the full post

CodeEval Number Operations

Given five integers in the close interval [1 .. 52], determine if it is possible, just applying addition, subtraction, and multiplication, to get as result 42.
I spent some time thinking to a smarter way to solve codeeval problem #190, in the end I fell back to a brute force approach. After all, I thought, 42 must be an hint that the answer is easy, but the question is difficult to get.

Here is how I implemented a Python 3 solution for it.

These are the tests provided by CodeEval, that I put in a python unit test:

def test_provided_1(self):
    self.assertEqual('NO', solution('44 6 1 49 47'))

def test_provided_2(self):
    self.assertEqual('YES', solution('34 1 49 2 21'))

def test_provided_3(self):
    self.assertEqual('NO', solution('31 38 27 51 18'))

Assume that "data" is a list containing the five integers we have to deal with. I need to check all the possible permutations of them, after all they are only five, and 5! is just 120.
However, we have to check all the different combinations of +, -, * over them. That means that, for each permutation, examining all the items in the Cartesian product of three elements with four repetitions, that is 3**4, 81.
That's a total of 120 * 81 = 9720 tests.

Not terrible, however I can't expect this code to be blazingly fast. Good for us, the standard python library itertools gives us permutations() that generates all the possible permutations for a collection, and product() that generates all the elements in the Cartesian product for the passed collection and a given size of the result. Just what we need here.

for numbers in permutations(data):  # 1
    for operations in product('*+-', repeat=4):  # 2
        result, *values = numbers  # 3
        for value, op in zip(values, operations):  # 4
            if op == '+':
                result += value
            elif op == '-':
                result -= value
            else:
                result *= value
        if result == 42:  # 5
            return 'YES'
return 'NO'

1. Loop on all the data permutations.
2. Loop on all the combinations of the available operations, knowing that we need four of them.
3. Split the numbers, that is tuple, in two parts, result, the first number, on its own, and all the other values in a list. Notice the use of the star operator that here works as a gatherer, bringing together all the other elements besides the first one, taken by the variable before the comma.
4. Here you see why I did that weird thing in the line above (3). Now I can zip() values and operations. Each component with the same index are extracted together and associated to the loop variables value and op, so that I can happily apply the operation to the current result and the next value in list.
5. If the current permutation with the current operations result as required, I return success. Only if I won't find any match, after looping on all the elements, I will say that, no, I could not find a way to get 42 from those integers.

As I expected, CodeEval accepted the solution, but didn't was too large in giving points for it. Anyway, I pushed test case and python script to GitHub.

Go to the full post

CodeEval Sum of Primes

The CodeEval problem #4 is similar to the previous one. Again prime numbers, this time we have to get the first one thousand of them, and return their sum.

No need for a test case, we have just a single possible outcome, and it should be 3682913.

We need a generic function to check if a number is prime, I implemented it in this way:

primes = [2, 3]  # 1


def is_prime(value):
    for prime in primes:
        if prime ** 2 > value:  # 2
            break;
        elif i % prime == 0:  # 3
            return False
    return True

1. In this list I am going to store the prime numbers. It make sense to store them somewhere instead of simply checking if a number is prime and then increase a counter, because we also rely on this list to see if a candidate is prime or not.
2. Extra condition to interrupt the for loop. We stop checking at the square root of the candidate. Or, other way round, cheaper to calculate, when the square of the current prime is bigger than the candidate.
3. If we find a divisor for the candidate, it is not a prime number.

And here is the body of my python script:

while len(primes) < 1000:  # 1
    for i in count(primes[-1] + 2, 2):  # 2
        if is_prime(i):  # 3
            primes.append(i)
            break

1. I won't stop calculating until I have one thousand primes.
2. I start looking from the next prime from the last one stored in the list, increased by two. And then I go on, stepping by two. This is because I am not interested in even numbers, evidently not prime. I use the itertool count() function because I don't have a (theoric) end in this loop, and the built-in range() would require a dummy end to work, making the code a tad less clear.
3. When I find the next prime, I push it to the list, break the for loop, and go on to the next iteration in the when loop.

Then I just have to print the sum of all the primes.

print(sum(primes))

For a complete reference, I have pushed my python script to GitHub.

Go to the full post

CodeEval Prime Palindrome

Find the largest prime palindrome less than 1000. Such a simple request is CodeEval problem #3.

It doesn't make sense writing a test case in this case, we can just ensure that our code, in this case my python script, generates the expected solution, 929.

I have split the job in two parts. We are looking for a palindrome number of three ciphers. So, we want its first and last cipher to be the same:

for candidate in range(989, 100, -2):  # 1
    if (candidate // 100 == candidate % 10) and is_prime(candidate):  # 2
        print(candidate)
        break

1. I should have started looping from 999. However, this number is evidently not prime. So I started from the second palindrome available. I step by 2, downward, since I don't care about even numbers, being them not prime.
2. If I divide the candidate number by one hundred, I get the third, most significative, cipher. This is Python 3, so I use the // to apply the integer division. The remainder of the division by ten gives us the less significative cipher. They should be the same.

But this is not enough, the candidate should also be a prime number. We all know how to check if a number is prime. We ensure that it has no divisor besides one and itself. A safe trick is stopping checking when we reach the square root of the tested number. This lead to this function:

PRIMES = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31]  # 1


def is_prime(value):
    for prime in PRIMES:  # 2
        if prime**2 > value:  # 3
            break;
        if value % prime == 0:  # 4
            return False

    return True

1. Sort of cheating. Since I know that the biggest number I have to check is less than 1024, I need only prime numbers less than 32. It is easy to find out the list of them.
2. I am going to check the value against the prime numbers in the list.
3. It is cheaper to get the square of a number that a square root, so I inverted the check to determine if I can safely stop to loop on primes. Notice that I can't merge this check in the one in (2) even if it has to be performed on each cycle. That's Python, love it or leave it.
4. Only if no divisor is found, return true.

I guess you won't need to see the full python script. However I pushed it to GitHub.

Go to the full post

First Pie with JFreeChart

I have to generate charts from Java. Among the available possibility, JFreeChart has been chosen.
In this post I start from nothing to save a PNG with a pie chart in it.

Well, actually, I have something more than nothing, namely a windows box with Java 8 on it and Eclipse Neon as IDE. I create a Maven project, and I added a few dependencies. The most relevant here is the JFreeChart one, however, don't forget a test library, I'm using JUnit 4:

<dependency>
 <groupId>org.jfree</groupId>
 <artifactId>jfreechart</artifactId>
 <version>1.0.19</version>
</dependency>
<dependency>
 <groupId>junit</groupId>
 <artifactId>junit</artifactId>
 <version>4.12</version>
 <scope>test</scope>
</dependency>

I want to explore how to create pie charts, so I create a class Pie in a chart package with a static method in it, createSimple() that would return true if the process succeed.

package chart;

public class Pie {
    static boolean createSimple() {
        return false;
    }
}

Then I create a test case, that is going to be quite trivial, but at least clearly split the concern about writing the code and testing it in two different places. The alternative, giving a main function to the Pie class would have been more regrettable.

@Test
public void testCreate() {
    assertTrue(Pie.createSimple());
}

So, let's create a pie reporting a few elements in it, defined by a name and an associated value. We are going to use all the possible default provided, just to have a picture to display.

DefaultPieDataset dataset = new DefaultPieDataset(); // 1
dataset.setValue("Merlin", 2.50);
dataset.setValue("Tiger", 18.27);
dataset.setValue("Mustang", 41.33);
dataset.setValue("Dolphin", 72.02);

JFreeChart chart = ChartFactory.createPieChart("Versions", dataset); // 2

1. I create dataset, and I push a few description/value couples to it.
2. Then I create a pie chart going through the ChartFactory class. As parameter, this factory method requires just the name I want to give the chart and the data it should use to create it.

I want to put the image in a file, in the PNG format. This is easily done.

try {
    ChartUtilities.saveChartAsPNG(new File("dump/pie.png"), chart, 500, 500);
    return true;
} catch(IOException ioe) {
    log.error(ioe.getMessage());
    return false;
}

Be careful of passing to saveChartAsPNG() a file that could be created. If the path does not exist, you are going to get an exception. The other parameters are the chart that has to be created, and its size, width x height. Here is the result I've got:

Nice. However, I'd like not to see the tooltips. From the documentation, I read that I can have a better control on the generated picture calling the overload of createPieChart() that accepts three boolean more, ruling the display of legend, tooltips, and URLs. The shorter version I used above is equivalent to:

// legend and tooltips
JFreeChart chart = ChartFactory.createPieChart("Languages", dataset, true, true, false);

So we just set the second boolean to false to get rid of the tooltips, right? Well, not. For same strange reason, that flag does not work. Fortunately there is a workaround, we can disable the label generator.

// disable tooltips not working!
JFreeChart chart = ChartFactory.createPieChart("Versions", dataset, true, false, false);

PiePlot pp = (PiePlot) chart.getPlot();
// force tooltips off
pp.setLabelGenerator(null);

Try commenting the call to setLabelGenerator(), you will still see the tooltips on the pie, otherwise you'll have only the legend.

Say that I don't like the default color choice, and I want the Mustang slice in magenta. I can call setSectionPaint() on the PiePlot for that slice to change it:

pp.setSectionPaint("Mustang", Color.MAGENTA);

Notice that magenta replaced green for Mustang, but green is not out of the game, it has merely shifted to the next slice.

Last change I want to do here, let also the legend off:

JFreeChart chart = ChartFactory.createPieChart("Versions", dataset, false, false, false);

I have pushed the project on GitHub.

Go to the full post

CodeEval Consecutive Primes

Consider the sequence of natural number, up to a given even element. We have to say in how many ways we can rearrange these numbers so that any couple of neighbors added up gives a prime number. The first element should always be 1, and should go with the last one.
This is the CodeEval problem #187, and here I am giving a Python 3 solution that gets accepted with full marks but does not give any point for the rank, being too slow. Any suggestion for a better algorithm is welcome.

As usual, my first step is converting the given sample in a python test case.

def test_provided_1(self):
    self.assertEqual(1, solution('2'))

def test_provided_2(self):
    self.assertEqual(2, solution('4'))

def test_provided_3(self):
    self.assertEqual(0, solution('5'))

def test_sixteen(self):
    self.assertEqual(81024, solution('16'))

def test_eighteen(self):
    self.assertEqual(770144, solution('18'))

Do not even try the last one until you have a good implementation at hand, because it is going to take a long time to give you back an answer.
Notice that the third test violates the requirement stating that we should expect an even number in input. I interpretate it as a suggestion to explicitly take care of these cases, avoiding the main computation, as better described by the code below:

def solution(line):
    size = int(line)
    if size % 2:
        return 0
    return necklaces([x for x in range(1, size + 1)], 1)

If the passed size is odd, I just return zero. In any case, if you think about it, an odd number of numbers in the natural sequence can't be reorganized as requested by the problem.
I build on the fly, by list comprehension, the sequence I need, starting from 1 ending to size.
I called the function necklaces because, in the metaphor used in the problem, we want to count how many necklaces we can generate from the passed pebbles.

A brute force solution is absolutely out of scope, since generating all the possible pebbles permutations is a factorial algorithm. You could try it, using the itertools permutations function, but keep your input very low.

To save some CPU time, I decided to cache the prime numbers that my script should be aware of in a set:

PRIMES = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31}

Actually, this won't be a substantial help. Even for a simple prime checker function, the number involved are not big enough to cause performance problems.

After some thinking I came up with this recursive function:

def necklaces(beads, pos):  # 1
    if len(beads) == pos:  # 2
        return 1 if 1 + beads[- 1] in PRIMES else 0

    result = 0

    if beads[pos] + beads[pos-1] in PRIMES:  # 3
        result += necklaces(beads, pos + 1)

    for i in range(pos+2, len(beads), 2):  # 4
        if beads[pos-1] + beads[i] in PRIMES:  # 5
            beads[i], beads[pos] = beads[pos], beads[i]
            result += necklaces(beads, pos + 1)
            beads[i], beads[pos] = beads[pos], beads[i]

    return result

1. I pass to the function the list of beads, and the position we have to check. The first time I call it, from the solution() function, beads is the natural sequence from 1 to the number passed in input, and pos is 1, since we have as a requisite that the first one in the list, position 0, won't change.
2. We have reached the last pebble. Check the sum of its value, added to 1 for the first in the list. If this is a prime number, return 1.
3. Check the sum of the current pebble with the previous one. If it is a prime, accept it and call recursively the function on the next pebble.
4. We need to check all the other pebbles in this position. Actually, not really all of them, since we know that we are looking for a prime number. So, if the previous pebble was even, now we want an odd one, and viceversa. This means, we can safely skip by two in the loop.
5. If the current 'i' pebble makes a good pair with the previous one, we accept it and check for the other ones, just like as above in (3). With the variation that we have to put the 'i' pebble in the 'pos' position, and moving somewhere else the 'pos' pebble. Best place I could think of, was the one resulting from a swap between the two pebbles. This has the advantage that I can easily swap them back after the recursive call is done. An alternative would be using two different lists. One for the actual necklace, the other for the currently unused pebbles.

I couldn't think of a better algorithm, still, the only way to get marks from it on CodeEval was reimplementing it in C++. It was easy, just a few minor changes, and the reported time costs improved by a factor x20.

Even if a bit disappointed, I pushed test case and python script to GitHub.

Go to the full post

Subscribe to: Posts (Atom)