Pages

CodeEval Suggest Groups

Given in input a number of strings where each of them represent an user, her/his friends and groups, generate a sorted list of suggestions (both by user and group name) for other groups a user could be interested to join, assuming that if a group is followed by half of his friends or more, it could be a good choice.
More details on this problem on its page on CodeEval #165.

Here is the provided sample, as I converted it to a Python test case:
def test_provided_1(self):
    data = 'Amira:Isaura,Lizzie,Madalyn,Margarito,Shakira,Un:Driving,Mineral collecting\n' \
            'Elliot:Isaura,Madalyn,Margarito,Shakira:Juggling,Mineral collecting\n' \
            'Isaura:Amira,Elliot,Lizzie,Margarito,Verla,Wilford:Juggling\n' \
            'Lizzie:Amira,Isaura,Verla:Driving,Mineral collecting,Rugby\n' \
            'Madalyn:Amira,Elliot,Margarito,Verla:Driving,Mineral collecting,Rugby\n' \
            'Margarito:Amira,Elliot,Isaura,Madalyn,Un,Verla:Mineral collecting\n' \
            'Shakira:Amira,Elliot,Verla,Wilford:Mineral collecting\n' \
            'Un:Amira,Margarito,Wilford:\n' \
            'Verla:Isaura,Lizzie,Madalyn,Margarito,Shakira:Driving,Juggling,Mineral collecting\n' \
            'Wilford:Isaura,Shakira,Un:Driving\n'
    output = 'Isaura:Driving,Mineral collecting\n' \
            'Lizzie:Juggling\n' \
            'Madalyn:Juggling\n' \
            'Margarito:Driving,Juggling\n' \
            'Shakira:Driving,Juggling\n' \
            'Un:Driving,Mineral collecting\n'
    self.assertEqual(output, solution(data))

I have split my solution in three parts. Firstly I parse the data in input to a couple of collection to easily manage them. The I actually look for the solution, and finally I prepare the output.

Get the input

I can keep the first two fields in each input string in a list, it is enough to manage the relation between each user and his associated friends. However, I'd better move the third field, in a different collection. For ease of access, and performance, I decided to put them in a dictionary, keeping the relation with the relative user.
def solution(data):
    users = [row.split(':') for row in data.rstrip().split('\n')] # 1
    u_groups = {} # 2
    for user in users:
        for i in [1, 2]: # 3
            user[i] = user[i].split(',') if user[i] else []
        u_groups[user[0]] = user.pop() # 4
1. The input parameter data contains all the lines in the file. I remove the last newline with a call to rstrip() to avoid a final empty record, then I split on newline. Again another split, on colon this time, so the result will be a list of list of strings.
2. In the dictionary u_groups I am going to put each user -> groups.
3: The first element in each line is the user name, no need to more processing on it. The other two elements need another split, on comma this time, to convert them from plain string to list of friends and groups respectively. Notice that it is possible for a user to have no friends or no groups, this should lead to an empty list.
4. I have formatted as expected the groups in the users list, now I pop it to the dictionary defined in (2)

The real job

In users we have, for each user, his name and his friends. In u_groups his groups. Let's put in the dictionary suggestions for each user which groups we think could be interesting for him.
suggestions = {}
for user in users:
    candidates = {} # 1
    for friend in user[1]:
        for group in u_groups[friend]:
            candidates[group] = candidates.setdefault(group, 0) + 1
    for group in candidates: # 2
        if group not in u_groups[user[0]] and candidates[group] >= len(user[1]) / 2:
            suggestions.setdefault(user[0], []).append(group)
1. I'm simply looping on all the friend for the current user, pushing the associated groups to this dictionary.
2. For each group that I found, I check if it is worthy as suggestion. It should not be already a group for the current user, and it should be quite popular in his circle.

To avoid the nuisance of checking if an item is in the dictionary, and then inserting it as new or modify its value, I used the more compact setdefault() method that return the value, if exists, or creatd the key inserting a specified value.

Formatting for output

Basically, I already have the output. I only have to convert it from the suggestions dictionary to a string in the expected format.
result = ['%s:%s' % (u, ','.join(sorted(suggestions[u]))) for u in sorted(suggestions.keys())]
return '\n'.join(result) + '\n'
In the list comprehension I loop on the sorted dictionary keys() then, for each user I put in the list a string that comes out from formatting as strings separated by colon the user name and all the sorted suggestions.
Finally, it is just a matter of putting a bunch of newlines where needed.

After getting the OK from CodeEval, I pushed test case and python script to GitHub.

No comments:

Post a Comment