Pattern Matching using Mensa -- A Quick Overview

Mensa | Open Source Java on Github

Since first writing about the new open source Java project, Mensa, I’ve received a number of questions asking who is the target audience, what specific problems Mensa solves, and how it might actually be used. In this post, I address those questions. I also describe potential problems with pattern matching in Java using the standard, built-in functions and how…

#Java #Mensa #open source #pattern matching #string matching

110. Substring with Concatenation of All Words

You are given a string, S, and a list of words, L, that are all of the same length. Find all starting indices of substrings in S that are a concatenation of each word in L exactly once, without any intervening characters.

For example, given: S: "barfoothefoobarman" L: ["foo", "bar"]

You should return the indices: [0,9]. (order does not matter).

This is a very interesting problem. The naive approach is to iterate over every position i in S and, starting there, scan consecutive tokens whose length equals that of the words in L. When every word in L has been used exactly once, i is a solution. The time complexity is O(N^2).
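The naive scan can be sketched roughly as follows. This is a minimal illustration written for this post, not the code from the gist linked below; the class name `NaiveConcatSearch` and the method name `findSubstring` are hypothetical, and it assumes all words are non-empty and of equal length.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the naive approach; names are illustrative only.
class NaiveConcatSearch {
    static List<Integer> findSubstring(String s, String[] words) {
        List<Integer> result = new ArrayList<>();
        if (words.length == 0 || words[0].isEmpty()) {
            return result;
        }
        int wordLen = words[0].length();
        int total = wordLen * words.length;

        // How many times each word must appear in a match.
        Map<String, Integer> need = new HashMap<>();
        for (String w : words) {
            need.merge(w, 1, Integer::sum);
        }

        // Try every start position i that leaves room for all words.
        for (int i = 0; i + total <= s.length(); i++) {
            Map<String, Integer> seen = new HashMap<>();
            int used = 0;
            while (used < words.length) {
                int start = i + used * wordLen;
                String token = s.substring(start, start + wordLen);
                int allowed = need.getOrDefault(token, 0);
                int count = seen.merge(token, 1, Integer::sum);
                if (count > allowed) {
                    break; // unknown word, or word used too often
                }
                used++;
            }
            if (used == words.length) {
                result.add(i);
            }
        }
        return result;
    }
}
```

Calling `NaiveConcatSearch.findSubstring("barfoothefoobarman", new String[] {"foo", "bar"})` returns the indices 0 and 9 from the example above. Note that a fresh map is allocated and a substring is created for every token at every start position, which is where much of the cost comes from.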

However, it is not easy to pass the large data test with O(N^2) complexity. My solution takes 1200ms to pass the test, and without the optimization at line 20 it exceeds the time limit. (Line numbers here refer to the code linked at the end of this post.)

Any better solution?

I did some profiling and found that most of the time is spent on cloning at line 17 and on the HashMap lookup at line 22. For the cloning, I reworked the code to use a simple array to store the counts. The lookup at line 22 wastes a lot of cycles, because each token is looked up many times.

e.g.

S = "barfoothebarfooman" L = ["foo", "bar", "the"]

Here the token 'the' at index 6 is looked up three times at line 22: once each while the scan over S starts at i = 0, i = 3, and i = 6. Precomputing the word index for each position i in S prevents this redundant lookup. The overall complexity is still O(N^2), but the result takes 520ms, which is less than half the time of the previous solution.
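A minimal sketch of the precomputation idea, under the same assumptions (non-empty words of equal length). It is not the gist's code; `IndexedConcatSearch`, `tokenId`, `need`, and `seen` are hypothetical names. Each position in S is resolved to a small integer word id exactly once, and the per-start counts live in a plain int array instead of a cloned map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one HashMap lookup per position, array counts per start.
class IndexedConcatSearch {
    static List<Integer> findSubstring(String s, String[] words) {
        List<Integer> result = new ArrayList<>();
        if (words.length == 0 || words[0].isEmpty()) {
            return result;
        }
        int wordLen = words[0].length();
        int total = wordLen * words.length;
        int last = s.length() - wordLen;
        if (last < 0) {
            return result;
        }

        // Give each distinct word an id and record how often it is required.
        Map<String, Integer> idOf = new HashMap<>();
        int[] need = new int[words.length];
        for (String w : words) {
            Integer id = idOf.get(w);
            if (id == null) {
                id = idOf.size();
                idOf.put(w, id);
            }
            need[id]++;
        }

        // tokenId[i] is the id of the word starting at position i, or -1.
        int[] tokenId = new int[last + 1];
        for (int i = 0; i <= last; i++) {
            tokenId[i] = idOf.getOrDefault(s.substring(i, i + wordLen), -1);
        }

        int[] seen = new int[idOf.size()];
        for (int i = 0; i + total <= s.length(); i++) {
            Arrays.fill(seen, 0); // reset counts instead of cloning a map
            int used = 0;
            while (used < words.length) {
                int id = tokenId[i + used * wordLen];
                if (id < 0 || ++seen[id] > need[id]) {
                    break;
                }
                used++;
            }
            if (used == words.length) {
                result.add(i);
            }
        }
        return result;
    }
}
```

For S = "barfoothebarfooman" and L = ["foo", "bar", "the"], this returns the indices 0, 3, and 6, and the token 'the' at index 6 is hashed only once while `tokenId` is being filled.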

People on the forum mentioned using the Rabin–Karp algorithm to get a better speedup. However, I could not write a solution that beats the 520ms mentioned above, because calculating the hash values actually takes more time in this case. If you have a solution faster than 520ms, please let me know!
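For reference, here is one way a Rabin–Karp style rolling hash could be used to build the per-position word index. This is only a sketch under stated assumptions, not the forum's or the gist's solution: `RollingTokenIndex`, `BASE`, and `MOD` are hypothetical choices, the words are assumed non-empty and of equal length, and every hash hit is verified with `String.regionMatches` so collisions cannot produce a wrong id. The modular arithmetic on every character is the kind of overhead that can make this slower than a plain HashMap lookup for short words.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a rolling-hash token index; constants are arbitrary.
class RollingTokenIndex {
    private static final long BASE = 257;
    private static final long MOD = 1_000_000_007L;

    // Polynomial hash of a whole word.
    static long hash(String w) {
        long h = 0;
        for (int k = 0; k < w.length(); k++) {
            h = (h * BASE + w.charAt(k)) % MOD;
        }
        return h;
    }

    // For each position i, returns the index in words of the first word
    // equal to s[i, i + wordLen), or -1 if no word matches there.
    static int[] tokenIds(String s, String[] words) {
        if (words.length == 0 || words[0].isEmpty()) {
            throw new IllegalArgumentException("words must be non-empty");
        }
        int wordLen = words[0].length();
        int last = s.length() - wordLen;
        if (last < 0) {
            return new int[0];
        }

        // Group word indices by hash so colliding words are all checked.
        Map<Long, List<Integer>> byHash = new HashMap<>();
        for (int w = 0; w < words.length; w++) {
            byHash.computeIfAbsent(hash(words[w]), k -> new ArrayList<>()).add(w);
        }

        // BASE^(wordLen - 1) mod MOD, used to remove the outgoing character.
        long pow = 1;
        for (int k = 1; k < wordLen; k++) {
            pow = pow * BASE % MOD;
        }

        int[] ids = new int[last + 1];
        long h = hash(s.substring(0, wordLen));
        for (int i = 0; ; i++) {
            ids[i] = lookup(s, i, words, byHash.get(h));
            if (i == last) {
                break;
            }
            // Slide the window: drop s[i], append s[i + wordLen].
            h = (h - s.charAt(i) * pow % MOD + MOD) % MOD;
            h = (h * BASE + s.charAt(i + wordLen)) % MOD;
        }
        return ids;
    }

    // Verify candidates character by character to rule out hash collisions.
    private static int lookup(String s, int i, String[] words, List<Integer> candidates) {
        if (candidates == null) {
            return -1;
        }
        for (int w : candidates) {
            if (s.regionMatches(i, words[w], 0, words[w].length())) {
                return w;
            }
        }
        return -1;
    }
}
```

The resulting array plays the same role as a precomputed index: the scan over start positions can read word ids from it without creating substrings.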

You can read my code here.

https://gist.github.com/zyzyis/480cb2481f73ef77d521

#string matching
