kdb programming challenge - square-free sequences

Continuing on from the number base challenge, this time around we will be looking at square-free sequences. Although they might seem a little disconnected right now, trust me there is a theme to these challenges; all will become clear in time!

Not all squares are free

In combinatorics a square-free word is any word which does not contain any subword twice in row. An equivalent way to state this is that there can be no repetitions of the form XX, where X can be any list of symbols (or just a single symbol). So for example the word “squarefree” is not square-free, the letter “e” repeats, whereas “square” is square-free.

So the challenge today is to make a function that returns a bool telling us if a given list is square-free. The function should work regardless of the list type. So for example:

q) f "squarefree"
0b
q)f "square"
1b
q)f 10b
1b
q)f 1010b
0b

If you’re dealing with a list of booleans (i.e. an alphabet of two symbols) there are only 6 squarefree lists:

0b
1b
01b
10b
010b
101b

but with larger alphabets there are an infinite number…

[showhide type=”post” more_text=”Show Solution” less_text=”Solution” hidden=”yes”]

Again I’m sure there is more than one nice way to approach this problem, so I present you with two alternatives! The first way to approach the problem is by taking the list and cutting it up in every way possible, then comparing the neighbouring pieces for equality. Here’s a piece of q code that does just that:

f:{not any{any raze(~':')cut/:[x;til[x]_\:y]}\:[1+til count x;x]}

An observant reader might notice that – although this code is quite concise – it actually does a little more work than necessary. This version (courtesy of the one and only Kent Lee) takes away some of the redundant comparisons and is a little faster:

f:{not max{max{x~'next x}y cut z _ x}[x]'[where a;raze til@'a:til 1+floor .5*count x]}

An alternative approach is to make a list of every sub-word and double each one – “hello” goes to “hellohello” – and then search the original list of each double. Turns out kdb+ can do this quite concisely indeed:

f:{not any(l,'l)in l:raze -1_'{{-1_x}\[x]}each{1_x}\[x]}

So there’s at least two ways to solve the problem, but as always I’m sure there are others, and no doubt there are improvements that could be made to our solutions. Feel free to post!

[/showhide]