Pages

A few divide and conquer problems

The exercises for week four of edX MOOC Algs200x Algorithmic Design and Techniques by the UC San Diego are, in my opinion, a bit disappointing. I would have preferred if they pushed me to think more to how the divide and conquer strategy apply to different situation. Still, there are a few interesting points than we can get from them.

Binary Search

Task: Given in input two list of integers, being assured that the first one is sorted, output a list of integers where each value is the index of the positional matching element in the second list, as found in the first one, or -1 if missing.

There is almost nothing to say about to this problem. It is just a matter of implementing binary search.

Majority Element

Task: Given a list of integers, return true if it contains a majority element, that is, there is an element repeated more than half the size of the list.

Solving this problem with a divide and conquer approach, is not the best idea one could have. Even though it is certainly better than a naive approach, the common alternatives that should spring to the mind of any programmer (just a couple of hints, sorting, hashing) look more attractive. And I haven't even started talking about the amazing Boyer–Moore algorithm.

My python solution is a recursive function that returns the majority element in the closed subinterval [left..right] on the passed data list, or None if there is not such a beast:
def majority(data, left, right):
    size = right - left + 1
    if size < 3:  # 1
        return data[left] if data[left] == data[right] else None

    pivot = left + size // 2  # 2
    lhs = majority(data, left, pivot)
    rhs = majority(data, pivot + 1, right)

    if not lhs or not rhs:  # 3
        return lhs if lhs else rhs
    if lhs == rhs:
        return lhs

    for candidate in (lhs, rhs):  # 4
        count = 0
        for i in range(left, right + 1):
            if data[i] == candidate:
                count += 1
        if count > size // 2:
            return candidate
    return None
1. Base case, when we have just one or two elements, it is immediate to get the majority.
2. The divide part of the algorithm. Split the list in two, call recursively the function on the left and right side
3. Base cases for the conquering. If we have at least a None, return the definite element, if any, or None. Otherwise, if both the candidate majority are the same, return it.
4. The expensive part of the algorithm. We should decide which candidate choose, and to do that we just count the instances of both on the entire interval. If none of them has the majority here, we return None.

In the next post I provide different approaches to this problem.

Improving Quick Sort

Task: Improve the "plain" quick sort algorithm adapting it to support a 3-way partitioning.

I didn't spend any time at all for selecting a decent pivot, I just picked anytime the leftmost element. Don't mind about it, just focus on the partitioning.

Number of Inversions

Task: Count the number of inversions in a (partially sorted) list of integers.

Here the point is tweaking a merge sort algorithm to keep track of the detected inversions.

Organizing a Lottery

Given in input a list of segments on the x axis, each item being a 2-tuple of integers, begin and end, and a list of integers, representing points on the same axis, count for each point how many segments it intersects.

I spent quite a long time trying to figure out how to solve this problem using a divide and conquer approach. After a while, I decided to implement a solution based on sorting instead. Then I asked on the course forum about the right way to solve this exercise. Well, it looks like it was meant to be solved with sorting. How disappointing.

My solution is based on the consideration that I could get the number of intersecting segments in a point simply counting all the begin-end of segments up to there.

Here is my python code:
def solution_sort(segments, points):
    s_open = []  # 1
    s_close = []

    for segment in segments:
        s_open.append(segment[0])
        s_close.append(segment[1])

    s_open.sort()  # 2
    s_close.sort()
    points.sort()

    results = []
    count = 0  # 3
    oi = 0  # 4
    ci = 0
    size = len(segments)
    for point in points:  # 5
        while oi < size and s_open[oi] <= point:  # 6
            oi += 1
            count += 1
        while ci < size and s_close[ci] < point:  # 7
            ci += 1
            count -= 1
        results.append(count)

    return results
1. I put in the list s_open all the beginnings of segments, and in s_close their endings.
2. Segments and point had better to be sorted.
3. Current count for the number of segments currently open.
4. I am going to linearly scan open and close segment lists, oi and ci are the indexes for their current positions.
5. The scan is ruled by the points, being sorted, I start from the leftmost.
6. Each opening segment detected up to the point is added to the count of the currently open segments.
7. Decrease the count of open segments when a end of segment is seen. But be careful, the close should be strictly minor than the point.

Finding the Closest Pair of Points

Task: Given n points in a bidimensional space, find the smallest distance between a pair of them.

This is not an easy problem. Fortunately, it is also a well known one. See, for instance, this paper by Subhash Suri. In this other post, I show my python solution to this problem following that algorithm.

I pushed on GitHub the full python code for these problems.

8 comments:

  1. this algo is incorrect not work for many test cases

    ReplyDelete
    Replies
    1. I'm sorry to say, Abhishek, that I find your comment disappointing. I'm citing a few problems in this post, so it is not clear which algorithm you are criticizing. Besides, you are not constructive. You should provide an alternative solution or, at least, point out the weak spots you detected.

      Delete
  2. > Then I asked on the course forum about the right way to solve this exercise. Well, it looks like it was meant to be solved with sorting.

    Hi! As I know there is second approach described in a course book (http://bit.ly/moocbookc). Do you know anything about it? Maybe that one is more 'divide and conquer'-like?

    ReplyDelete
    Replies
    1. Oh, I see... They just used binary search (which is divide and conquer algorithm) to count right ends that are smaller than point and left ends that are greater than point.

      Delete
    2. Hello Egor, thank you for the hint. I guess that when I followed the course the book was not already published. I'll get a copy, sure it would be fun to read!

      Delete
  3. The explanation is very crisp and provides very helpful hints. Thanks for this blog.

    ReplyDelete
  4. Organizing a Lottery

    in above problem what will be time complexity. in my view it will be O(N^2). Can we make any inprovement?

    ReplyDelete
    Replies
    1. Hello Savan. Yes, we get a O(N^2), assuming the size of segments and points are more or less the same.
      In a "normal" case we should have O(N*M).
      Or, if one of the two is dominant, let's say N >> M, we could get a O(N lg N), for the sorting before the loop.
      I guess we need more information on the expected input data to improve the solution.
      Do you have any suggestion?

      Delete