Sunday, June 30, 2013

Find the most frequent non-empty subarray

Problem:
Given an array of ints, find the most frequent non-empty subarray in it. If there are more than one such sub-arrays return the longest one/s.
Note: Two subarrays are equal if they contain identical elements and elements are in the same order.

For example: if input = {4,5,6,8,3,1,4,5,6,3,1}
Result: {4,5,6}

Solution:
The idea is to use a suffix array to easily identify repeated subarrays.

Suffix array is actually a 2D array. The suffix array for the given array {4,5,6,8,3,1,4,5,6,3,1} would be as below. Here, each element of the array itself is an array.

{4,5,6,8,3,1,4,5,6,3,1}
{5,6,8,3,1,4,5,6,3,1}
{6,8,3,1,4,5,6,3,1}
{8,3,1,4,5,6,3,1}
{3,1,4,5,6,3,1}
{1,4,5,6,3,1}
{4,5,6,3,1}
{5,6,3,1}
{6,3,1}
{3,1}
{1}

After sorting the suffix array, you'd get:
{8,3,1,4,5,6,3,1}
{6,8,3,1,4,5,6,3,1}
{6,3,1}
{5,6,8,3,1,4,5,6,3,1}
{5,6,3,1}
{4,5,6,8,3,1,4,5,6,3,1}
{4,5,6,3,1}
{3,1,4,5,6,3,1}
{3,1}
{1,4,5,6,3,1}
{1}

Checking for matching subarrays is easily done in a suffix array by comparing the prefixes. If you traverse the above sorted array and compare adjacent elements for similarity you'd see the prefix [4,5,6] is occurring maximum number(=2) of times and is also of maximum length. There are other subarrays as well, like [6], [5,6],[3,1] and [1] that are occurring 2 times, but they are shorter than the subarray [4,5,6], which is our required answer.

Complexity:
time - Θ(n) (to construct the suffix array)
space - O(n^2) (might probably avoid making copies of data, and only use indexes to the original array)
Links and credits:
http://www.careercup.com/question?id=20963685
http://en.wikipedia.org/wiki/Suffix_array

No comments:

Post a Comment