Studious Student Problem Analysis

You’ve been given a list of words to study and memorize. Being a diligent student of language and the arts, you’ve decided to not study them at all and instead make up pointless games based on them. One game you’ve come up with is to see how you can concatenate the words to generate the lexicographically lowest possible string.

Input
As input for playing this game you will receive a text file containing an integer N, the number of word sets you need to play your game against. This will be followed by N word sets, each starting with an integer M, the number of words in the set, followed by M words. All tokens in the input will be separated by some whitespace and, aside from N and M, will consist entirely of lowercase letters.

Output

Your submission should contain the lexicographically shortest strings for each corresponding word set, one per line and in order.

Constraints

1 <= N <= 100
1 <= M <= 9
1 <= all word lengths <= 10

Here is my problem analysis for Facebook Hacker Cup Qualification Round: Studious Student.

Studious Student Problem Analysis:
As I mentioned, this problem is not as straight forward as you think it might be. The first most natural way to approach this problem is sorting. Most people will reason that you can sort and concatenate each individual word together to form the lexicographically smallest string. This is incorrect, as illustrated in one of the sample inputs:

jibw ji jp bw jibw

By sorting and concatenate, the answer is:

bwjijibwjibwjp,

while the correct answer should be:

bwjibwjibwjijp.

Lexicographical order is also known as dictionary order, since it is how the words are ordered in the dictionary.

Notice in the above words, “ji” is the prefix of “jibw“. The “sort and concatenate” method definitely does not work when there is a case where a word is a prefix of one or more other words.

Well, one might try a naive way of doing a brute force. Although it is highly inefficient, it works for this problem. This is because each input would be at most 9 words, and it turns out there are only a total of 9! = 362880 possible permutations of words being concatenated together. Therefore, one can generate all possible permutations and find the answer.

We make an easy observation that if all words in the list are of equal length, then sort + concatenate must yield the smallest dictionary order. In fact, a better argument would be:

If no word appears to be a prefix of any other words, then the simple sort + concatenate must yield the smallest dictionary order string.

To solve this problem correctly, we must also handle the special case where a word appears as a prefix of other words. One efficient and easy (non-trivial to prove but easy to reason) solution for this problem is to re-define the order relation of two words, s1 and s2, as:

s1 is less than s2 iff (if and only if)

s1 + s2 < s2 + s1.

Then, by sorting and concatenating the words using the above ordering, it must yield the lexicographically smallest string. Why?

Here is a concrete example why this works. We use an example where the list of words are:

ji jibw jijibw

By the definition of s1 is less than s2 iff s1+s2 < s2+s1, we found that the lowest ordered word in the list is “jibw“. This is because “jibwji” < "jijibw” and “jibwjijibw” < "jijibwjibw“.

Now, the key to understand why the order relation s1+s2 < s2+s1 yields the smallest dictionary order is:

  • We have found the smallest-ordered word such that s1+x < x+s1. Therefore, it is impossible to swap the words to yield a smaller dictionary order.
  • For a case with more words, then this order relation holds: s1+x < x+s1, and x+y < y+x. As swapping at any point could not possibly yield a smaller dictionary order, therefore s1+x+y must yield the smallest dictionary order.
  • This result can be generalized to all M words by induction, due to the transitive property mentioned above.

I do not claim that this is a rigid or even a correctly constructed proof. However, it does help me in convincing myself this is a valid solution.

VN:F [1.9.22_1171]
Rating: 4.5/5 (17 votes cast)
Studious Student Problem Analysis, 4.5 out of 5 based on 17 ratings

30 thoughts on “Studious Student Problem Analysis

  1. Vishwas

    The way you have solved definitely works and all that we need to do is to provide the STL sort a new function :).

    VA:F [1.9.22_1171]
    0
  2. Anonymous

    I was surprised by how challenging this problem was. How did you arrive at your compareSort solution? That is very clever.

    I ended up writing a findLower() method that would find words that were lower than a given word using two facts: if words were the same length you can trust a traditional sort, and if they aren't, you can trust the sort if you add the first letter of the shorter word to the end of it. Like you said, its hard to prove why that works, but it does!

    Hopefully a problem this tricky wouldn't come up in an interview!

    VA:F [1.9.22_1171]
    +1
  3. 1337c0d3r

    @Anonymous:
    Just like you, my first intuition was to append a smaller letter (by finding the smallest first letter from the word list not including the word itself) to each word and sort it, progressively refining at each step.

    However, this does not work for the case:
    ji jijib

    Then, I thought that appending just one letter would not be enough. Then later it just lead me naturally to the order relation of s1 + s2 < s2 + s1.

    VA:F [1.9.22_1171]
    +1
  4. Cygwin98

    The comparer part is the tricky one. My solution was written in C#. It looks a bit clunky compared with your C++ solution.

    using System;
    using System.Collections;

    class Student
    {

    public class MyStringComparer : IComparer
    {
    int IComparer.Compare(object X, object Y)
    {
    string x = (string) X;
    string y = (string) Y;
    return (x+y).CompareTo(y+x);
    }
    }

    public void MySort(string[] words, IComparer comparer)
    {
    }
    static void Main()
    {
    int N = int.Parse(Console.ReadLine());
    for(int i=0; i < N; i++)
    {
    string line = Console.ReadLine();
    string[] splits = line.Trim().Split();
    int M = int.Parse(splits[0]);
    string[] words = new string[M];

    for(int j=0; j< M; j++)
    words[j] = splits[j+1];

    MyStringComparer comparer = new MyStringComparer();
    Array.Sort(words, comparer);
    for(int j=0; j< M; j++)
    Console.Write(words[j]);
    Console.WriteLine();
    }
    }
    }

    VA:F [1.9.22_1171]
    0
  5. Pop

    My post was wrong. This solution is brilliant. You are a genius.

    Here is my proof of your solution.

    Before we jump to the formal proof. Let us play with some simple cases.

    suppose we have a series of characters:a,b,c,d
    satisfied the criteria like in the solution. Then we have:
    abcab (fix c, exchange a and b)
    cab>acb (fix b, exchange a and c)
    acb>abc (fix a, exchange b and c)
    summarize above, we have cba>abc

    2. more exercise for abcd, dbca [here you will see the pattern, I simply exchange the first and last element]
    it is almost the same:
    dbca>dbac>dabc>adbc>abdc>abcd
    — — — — —

    the trick is exchange one pair at a time: the element 'a' moves from last potion to first and then move the element 'd' down from first position to last.

    Then we can conclude a general format
    x Z y < y Z x if xx[j] and ij.

    Obviously, we can construct the new solution by switching x[i] and x[j].

    the new solution will look like
    X' = x[0] x[1] x[2] ….x[i-1] x[j] x[i+1]…x[j-1] x[i] x[j+1]…x[n]

    Comparing X and X', they have the same prefix (x[0..i-1] and suffix(x[j+1..n]). We do not need to consider them. The most interesting part is

    in X', x[j]x[i+1]…x[j-1]x[i]
    in X, x[i]x[i+1]…x[j-1]x[j]

    Notice that we already prove X'j. This prove the solution.

    VA:F [1.9.22_1171]
    0
  6. Pop

    Did not notice that the "<" and ">" are special characters. Post it again.

    Here is my proof of your solution.

    Before we jump to the formal proof. Let us play with some simple cases.

    suppose we have a series of characters:a,b,c,d
    satisfied the criteria like in the solution. Then we have:
    cab > cab (fix c, exchange a and b)
    cab > acb (fix b, exchange a and c)
    acb > abc (fix a, exchange b and c)
    summarize above, we have cba > abc

    2. more exercise for abcd, dbca [here you will see the pattern, I simply exchange the first and last element]
    it is almost the same:
    dbca > dbac (fix db, exchange a,c)
    > dabc (fix d and c, exchange a,b)
    > adbc (fix bc, exchange a,d)
    > abdc (fix a and c, exchange b,d)
    > abcd (fix ab, exchange c,d)

    the trick is exchange one pair at a time: the element 'a' moves from last potion to first and then move the element 'd' down from first position to last.

    Then we can conclude a general format
    x {Z} y < y {Z} x if x < y and {Z} is in order. — A

    Obviously, we can construct the new solution by switching x[i] and x[j].

    the new solution will look like
    X' = x[0]..x[i-1] x[j] x[i+1]..x[j-1] x[i] x[j+1]…x[n]

    Comparing X and X', they have the same prefix (x[0..i-1] and suffix(x[j+1..n]). We do not need to consider them. The most interesting part is

    in X', x[j]x[i+1]…x[j-1]x[i]
    in X, x[i]x[i+1]…x[j-1]x[j]

    Notice that we already prove X' < X (see the general format A). This prove the solution.

    VA:F [1.9.22_1171]
    0
  7. Pop

    Darn. The blog editor sucks. It ate up one of the important sentences.

    We use proof of contradiction.
    Given the minimum solution X, we prove that there is no such pair of x[i], x[j] where x[i] > x[j] and i < j.

    Proof by contradiction, suppose we have one of such a pair of x[i], x[j], if we can construct a new solution X' based on X and X' < X, then we lead to a contradiction : X is not the minimum.

    Obviously, we can construct X' by switching x[i], x[j] in X.

    VA:F [1.9.22_1171]
    0
  8. Ron

    I ended up with the same solution in Python. Add I had to do from my initial wrong solution has add the custom comparison function.

    import sys

    def lex_compare(x, y):
    if x+y < y+x:
    return -1
    return 1

    f = open(sys.argv[1])
    m = int(f.readline())
    for r in range(0,m):
    l = f.readline().rstrip()
    arr = l.split(" ")
    l = arr[1:]
    l.sort(cmp=lex_compare)
    print ''.join(l)

    VA:F [1.9.22_1171]
    0
  9. 1337c0d3r

    @Pop:
    Thanks for your insightful comments. After reading your proof, I think I come up with a better argument (I mean better than my previous argument, not saying that my argument is better than yours), which is updated in my post above. I believe the key to the proof is the swapping and the transitive property.

    VA:F [1.9.22_1171]
    0
  10. K4pT3N

    i used PHP for this game…

    [php]
    // THis is the main code to sort the words
    if($_POST['generate']){
    for($ii=0;$ii<$_POST[n];$ii++){
    $list = $_POST['idx'.$ii];
    $getIdx = explode(" ",$list);
    asort ($getIdx);
    reset ($getIdx);
    while(list($key,$val)=each($getIdx)){
    echo "$val";
    }
    echo "
    ";
    }
    }
    [/php]

    VA:F [1.9.22_1171]
    0
  11. Ricardo Tomasi

    Wrote mine using nodeJS. I used a sort algorithm that finds equal character sequences in both strings and then compares their possible combinations. I didn't have much time, it's certainly incomplete but seems to have worked for the input.

    https://gist.github.com/794605

    VA:F [1.9.22_1171]
    0
  12. Anonymous

    Firstly, you have a nice BLOG, keep it going. Take a bow.
    Comparing s1 + s2 with s2 + s1 is not the best solution. That means your each compare is O(m+n). It can be made O(max(m,n)) by recursive call.

    for($i = 0; ($i < strlen($str1)) && ($i < strlen($str2)); $i++) {
    //echo $i . ': comparing ' . $str1[$i] . ' ' . $str2[$i];
    //echo "\n";
    if(ord($str1[$i]) < ord($str2[$i])) {
    return -1;
    } else if(ord($str1[$i]) > ord($str2[$i])) {
    return 1;
    }
    }
    if(($i == strlen($str1)) && ($i == strlen($str2))) {
    return 0;
    }
    if($i == strlen($str1)) {
    $str2 = substr($str2, $i);
    return self::cmp($str1, $str2);
    }

    if($i == strlen($str2)) {
    $str1 = substr($str1, $i);
    return self::cmp($str1, $str2);

    }

    VA:F [1.9.22_1171]
    0
  13. ted

    the proof would be easier and clearer if you can explicitly prove the transitive relation

    a < b && b a < c it can be proved by enumerating the 2 cases of a < b ( length(a) == length(b) && a < b lexicographically, or a = bx where the x < a lexicographically ) but it's not that clean

    VA:F [1.9.22_1171]
    0
  14. maxq

    I implemented a brute force string compare function:

    int string_compare(const char *str1, const char *str2)
    {
    const char *p1=str1, *p2=str2;

    for (; *p1 != ” && *p2 != ” && *p1 == *p2; p1++, p2++);

    if (*p1==” && *p2==”)
    return 0;

    if (*p1 == ”)
    return string_compare(str1, p2);
    else if (*p2 == ”)
    return string_compare(p1, str2);
    else
    return (int)(*p1) > (int)(*p2);
    }

    VA:F [1.9.22_1171]
    0
  15. a

    Why would ‘ji’ appear later than ‘jibw’ in the dictionary.
    Since ‘ji’ is a prefix of ‘jibw’ and len(‘ji’) check out ‘cat’ & ‘cataract’ in the dictionary.

    VA:F [1.9.22_1171]
    0
    1. zyfo2

      they want to connect these words. in ur example, the solution is that jibwji appears before jijibw in dictionary.

      VA:F [1.9.22_1171]
      0
  16. zyfo2

    @1337c0d3r
    I think the easiest way to compare is to redefine the comparing as following:
    one string is smaller if the character at the same index is smaller(which is the original definition), or the longer string’s next character is smaller than the first character.
    eg. jibx ji
    this just depends on the third character comparing with first character.
    this is easy to implement

    VA:F [1.9.22_1171]
    0
  17. akash

    Here is a ruby program. Since this wa a quick 10 minute hack, I didn’t compare performance.

    biginput = [
    [‘facebook’, ‘hacker’, ‘cup’, ‘for’, ‘studious’, ‘students’],
    [‘k’, ‘duz’, ‘q’, ‘rc’, ‘lvraw’],
    [‘mybea’, ‘zdr’, ‘yubx’, ‘xe’, ‘dyroiy’],
    [‘jibw’, ‘ji’, ‘jp’, ‘bw’, ‘jibw’],
    [‘uiuy’, ‘hopji’, ‘li’, ‘j’, ‘dcyi’],
    [‘ba’, ‘b’]
    ]

    biginput.each do |input|
    out = “”

    (‘a’..’z’).each do |x|
    temp = input.select{|str| str.start_with? x}
    temp.each {|str| str = str<<x}
    # puts temp.inspect
    temp.sort!
    # puts temp.inspect
    temp.each do |res|
    out << res[0..-2]
    end
    end

    puts out
    end

    VA:F [1.9.22_1171]
    0
    1. theOtherWC

      @1337c0d3r

      Unbelievable! Thanks for that!!

      My personal justification of the correctness of the new sorting scheme is :

      Given any ordering of any set of words, find the first pair of word (ith and i+1 th) such that i+1 appear before i in the new sorting rule.

      Swap ith and i+1 th word, notice the entire word gets smaller in dictionary order.

      Repeat the behavior until no such pair exists.

      Claim optimum is achieved.

      VA:F [1.9.22_1171]
      0
  18. Yunsong Wu

    @1337c0d3r

    You are a genius! Thank you very much for your post! I have learnt a lot from it!

    I have a question that needs your help. How can we prove transitivity? I mean, if we know

    s1 + s2 < s2 + s1 and s2 + s3 < s3 + s2

    how to prove s1 + s3 < s3 + s1? Thank you!

    VN:F [1.9.22_1171]
    0
  19. Pingback: LeetCode | techinterviewsolutions

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use the <code> tag to embed your code.

*