Pages

File Fix-it with unordered_set

The Code Jam Round 1B 2010 Problem A, File Fix-it, requires a solver who knows how a file system is organized, better if from the UNIX family. We provides the already existing directories (maybe zero), and the one or more directories that we want to create. For some reason you can't specify the useful -p parameter on mkdir, and you need instead to iterate on any new subdirectory you need to create. Our solution should give the number of required mkdir runs.

This is a problem that asks loudly to be solved using a trie (aka prefix tree). Unfortunately there is not such a beast in the STL (I am using C++ as implementing language), and using here a non standard library, or developing a simple version of a trie, could be seen as an overkill.

So I am about to use an hash table instead. Namely, the C++11 unordered_set.

The idea is pretty simple, I store each already existing directory in an hash table, then I call a function that adds the new directories, keeping track of the numbers. The code gets a bit cleaner when organized in a class, like:
using MySet = std::unordered_set<std::string>; // 1

class FileFixIt
{
private:
    MySet paths_; // 2

public:
    FileFixIt(const std::vector<std::string>& paths); // 3

    int insert(const std::vector<std::string>& paths); // 4
};
1. I typedef'ed (C++11 style) the container I am using. Remember that the old STL set is a tree-based container that stores its elements in a ordered way. Here we don't care about ordering, so I use instead the C+11 unordered_set based on hash table.
2. The existing paths are stored here.
3. The constructor will get a number of paths (maybe zero) and store all of them in the member variable.
4. Another bunch of directories that have to be added to the hash table, but here we need also to say how many adds have been performed.

As usual, before starting developing, I have written a few test cases. Here is some of them:
TEST(FileFixIt, TestNoNews2) // 1
{
    std::vector<std::string> paths {"/a/b/c", "/b/c/d"};
    FileFixIt ffi(paths);
    EXPECT_EQ(0, ffi.insert(paths));
}

TEST(FileFixIt, TestSimple) // 2
{
    FileFixIt ffi({"/a/b/c"});
    EXPECT_EQ(3, ffi.insert({"/b/c/d"}));
}

TEST(FileFixIt, TestBadInput) // 3
{
    std::vector<std::string> paths {"a"};
    FileFixIt ffi(paths);
    EXPECT_EQ(0, ffi.insert(paths));
}

TEST(FileFixIt, Test1) // 4
{
    FileFixIt ffi({});
    EXPECT_EQ(4, ffi.insert({"/home/gcj/finals", "/home/gcj/quals"}));
}

TEST(FileFixIt, Test2) // 5
{
    FileFixIt ffi({"/chicken", "/chicken/egg"});
    EXPECT_EQ(0, ffi.insert({"/chicken/egg"}));
}

TEST(FileFixIt, Test3) // 6
{
    FileFixIt ffi({"/a"});
    EXPECT_EQ(4, ffi.insert({"/a/b", "/a/c", "/b/b"}));
}
1. In this test I create a string vector containing a couple of paths, and I use it both to create my FileFixIt object and as parameter to the call to insert(). I do expect it would return 0.
2. Here I store a path, then I insert a completely different one. The new one is composed by three parts, So the insert() function should create these new elements in its hash table, "/b", "/b/c", "/b/c/d", and return three.
3. No error handling is required in this simple test, but I'd like at least no disaster happens in case of unexpected input. If this code was meant for production, the constructor would have probably thrown an exception, since the path format is not respected. Here I am happy enough if this code does not crash.
4. First test given in the problem. No path at the beginning, two new ones passed, having the first two components in common. We expect 4 as a return value.
5. Second official test case, nothing new passed to insert(), zero should be the result.
6. Last Code Jam test, a bit more complicated, four new elements should be inserted.

I implemented the insert method in this way:
int FileFixIt::insert(const std::vector<std::string>& paths)
{
    int added = 0;

    for(std::string item : paths) // 1
    {
        while(!item.empty()) // 2
        {
            if(paths_.find(item) == paths_.end()) // 3
            {
                paths_.insert(item);
                ++added;
            }

            MySet::size_type last = item.rfind('/'); // 4
            if(last == 0 || last == std::string::npos) // 5
                break;
            item = item.substr(0, last); // 6
        }
    }

    return added;
}
1. Loop on each string passed in input. Notice that the loop variable "item" is copied from the current element in the vector, so that we can modify it without change the original value. And this matches what we declared in the function interface, being the parameter passed by constant reference.
2. If the current item is empty there is nothing to do. Otherwise we loop on it, checking all its components.
3. Is the current item already in the hash table? If not insert it and increase the counter.
4. Find the last (actually, the first starting from the reverse beginning of the string) occurrence of a slash in the string.
5. If we have reached the root, represented by the first slash in the path name, there is nothing more to do. The second check is to give a minimum of robustness to the code. If the class client passes a malformed path name, and there's no leading slash, I simply stop looping on the string. Even this simple check is kind of an overkilling in this case, on the other side, for real code this is far too less.
6. I found a slash, truncate the string to get rid of the last part, and go to the next iteration. Notice that the first part of the check on (5) is almost duplicated here. For a slash in first position, the resulting substring is empty, so the loop check on (1) would make the break redundant. Still I wanted to guarantee a loop termination in case of malformed path name.

The class constructor should do almost the same job that the insert() method. The only variation is that we don't care about the counter. Duplicating code would be a pity, so I just reused that function:
class FileFixIt
{
// ...

public:

    FileFixIt(const std::vector<std::string>& paths)
    {
        insert(paths);
    }
// ...
};

No comments:

Post a Comment