Look, I’m all for fuzzy search, but… this search is straight-up plush.

Pattern Matching using Mensa -- A Quick Overview
Mensa | Open Source Java on Github
Since first writing about the new open source Java project, Mensa, I’ve received a number of questions asking who is the target audience, what specific problems Mensa solves, and how it might actually be used. In this post, I address those questions. I also describe potential problems with pattern matching in Java using the standard, built-in functions and how…
110. Substring with Concatenation of All Words
You are given a string, S, and a list of words, L, that are all of the same length. Find all starting indices of substring(s) in S that is a concatenation of each word in L exactly once and without any intervening characters.
For example, given: S: "barfoothefoobarman" L: ["foo", "bar"]
You should return the indices: [0,9]. (order does not matter).
Very interesting problem. The 'naive' way is to iterate over every position i in S and, starting from each i, scan consecutive tokens of the word length, checking each token against L. When every word in L has been used exactly once, i is a solution position. The time complexity is O(N^2).
However, it is not easy to pass the large data test with O(N^2) complexity. My solution here takes 1200ms to pass the test, and without the optimization at line 20 it exceeds the time limit.
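For readers who want something concrete, the naive scan described above can be sketched roughly as follows. This is only an illustration, not the submission timed above: the class name `NaiveConcatSearch` and the method name `findSubstring` are assumptions, and the line numbers mentioned in this post do not refer to this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class name; a sketch of the naive approach only.
class NaiveConcatSearch {
    // Returns every start index in s where all words appear back to back, each used exactly once.
    static List<Integer> findSubstring(String s, String[] words) {
        List<Integer> result = new ArrayList<>();
        if (words.length == 0 || words[0].isEmpty()) {
            return result;
        }
        int wordLen = words[0].length();
        int total = wordLen * words.length;

        // Count how often each word is required (L may contain duplicates).
        Map<String, Integer> need = new HashMap<>();
        for (String w : words) {
            need.merge(w, 1, Integer::sum);
        }

        for (int i = 0; i + total <= s.length(); i++) {
            // Fresh copy of the counts for every start position: this is the costly cloning step.
            Map<String, Integer> remaining = new HashMap<>(need);
            int used = 0;
            while (used < words.length) {
                int from = i + used * wordLen;
                String token = s.substring(from, from + wordLen);
                Integer left = remaining.get(token); // one HashMap lookup per token per start
                if (left == null || left == 0) {
                    break; // not a word in L, or used too many times
                }
                remaining.put(token, left - 1);
                used++;
            }
            if (used == words.length) {
                result.add(i);
            }
        }
        return result;
    }
}
```

Calling `NaiveConcatSearch.findSubstring("barfoothefoobarman", new String[]{"foo", "bar"})` on the example from the problem statement yields the indices 0 and 9.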
Any better solution?
I did some profiling and found that most of the time is spent on cloning at line 17 and on the HashMap lookup at line 22. For the cloning, I reworked the code to use a simple array to store the counts. The lookup at line 22 wastes many cycles because each token is checked many times.
e.g.
S = "barfoothebarfooman" L = ["foo", "bar", "the"]
The token 'the' is looked up 3 times at line 22: once each during the scans that start at i = 0, i = 3, and i = 6. Precomputing the word index for each position i in S prevents this redundant lookup. The complexity is still O(N^2), but the result takes 520ms, which is less than half the time of the previous solution.
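The precomputation idea might look something like the sketch below. It is not the code from the gist: the class name `IndexedConcatSearch` and the array names `wordAt`, `need`, and `seen` are made up for illustration. Each position in S gets exactly one HashMap lookup up front, and the per-start counts live in a plain int array instead of a cloned map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class name; a sketch of the precomputed-index variant.
class IndexedConcatSearch {
    static List<Integer> findSubstring(String s, String[] words) {
        List<Integer> result = new ArrayList<>();
        if (words.length == 0 || words[0].isEmpty()) {
            return result;
        }
        int wordLen = words[0].length();
        int total = wordLen * words.length;

        // Give each distinct word an id and record how many times it is required.
        Map<String, Integer> idOf = new HashMap<>();
        int[] need = new int[words.length];
        for (String w : words) {
            Integer id = idOf.get(w);
            if (id == null) {
                id = idOf.size();
                idOf.put(w, id);
            }
            need[id]++;
        }

        // One lookup per position: wordAt[p] is the id of the word starting at p, or -1.
        int[] wordAt = new int[Math.max(0, s.length() - wordLen + 1)];
        for (int p = 0; p < wordAt.length; p++) {
            Integer id = idOf.get(s.substring(p, p + wordLen));
            wordAt[p] = (id == null) ? -1 : id;
        }

        // The scan is the same as before, but it only touches int arrays.
        int[] seen = new int[idOf.size()];
        for (int i = 0; i + total <= s.length(); i++) {
            Arrays.fill(seen, 0); // reset the counts instead of cloning a map
            int used = 0;
            while (used < words.length) {
                int id = wordAt[i + used * wordLen];
                if (id < 0 || ++seen[id] > need[id]) {
                    break; // not a word in L, or used too many times
                }
                used++;
            }
            if (used == words.length) {
                result.add(i);
            }
        }
        return result;
    }
}
```

On the example above, `IndexedConcatSearch.findSubstring("barfoothebarfooman", new String[]{"foo", "bar", "the"})` returns 0, 3, and 6, and the token at position 6 is resolved once in `wordAt` rather than once per scan.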
People on the forum mentioned using the Rabin–Karp algorithm to get a better speedup. However, I could not write a solution that beats the 520ms time mentioned above, because calculating the hash values actually takes more time in our case. If you have a solution faster than 520ms, please let me know!
You can read my code here.
https://gist.github.com/zyzyis/480cb2481f73ef77d521
Might be useful.