Skip to main content

Raymii.org Logo (IEC resistor symbol)logo

Quis custodiet ipsos custodes?
Home | About | All pages | RSS Feed | Gopher

Solve word puzzles with bash

Published: 08-03-2015 | Author: Ben Everard | Text only version of this article


Table of Contents


This article was originaly published in Linux Voice, issue 1, April 2014.This issue is now available under a Creative Commons BY-SA license. In anutshell: you can modify and share all content from the magazine (apart fromadverts), even for commercial purposes, providing you credit Linux Voice as theoriginal source, and retain the same license.

This remix is converted manually to Markdown and HTML for ease of archiving andcopy-pasting.

If you like this article, consider sponsoring me by trying out a Digital OceanVPS. With this link you'll get $100 credit for 60 days). (referral link)

Other converted Linux Voice articles can be found here.


The humble command line interface is amazingly powerful, for both real work andplaying games.

Why do this?

Solve word puzzles with Bash

It's no secret that Bash, the shell on most Linux systems, is an incrediblypowerful tool, however it's one that many Linux users don't take the time tofully learn. A lot of tutorials focus on boring but practical uses like managinglog files, but it doesn't have to be this way. Bash can be fun.

Here at Linux Voice, we want to give this tool some love, so we're inauguratingthe Grep Games. This is an event where you use Bash together with grep to solvethe sort of word puzzles you find in glossy magazines.

Here's an example: what is aedh an anagram of? To solve this, you're going toneed a list of English words. This comes as standard on most Linuxes, and canusually be found at /usr/share/dict/words or /usr/dict/words. If it's notthere, check for a words or wordlist package in your package manager. Failingthat, you can grab it from the DVD or linuxvoice.com.

In this article, we'll use /usr/share/dict/words, but you should change thisif your words file is elsewhere. We'll use egrep (like grep but uses extendedregular expressions, which have a cleaner syntax than plain regular explessions)to find the right words. If you haven't come across this tool before, take alook at the boxout on grep and regular expressions, right.

You can find any word that contains just the letters aedh with this line:

egrep "^[aedh]*$" /usr/share/dict/words 

The ^ matches the start of the line, $ the end of the line and [aedh]*matches any string of the letters aedh.

However, these aren't all anagrams. Any anagram must be exactly four letterslong, so let's only match words of exactly four characters:

egrep "^[aedh]{4}$" /usr/share/dict/words 

This is a bit better, but there are still some with repeated characters. Tosolve this we're going to pipe the output into a second instance of egrep, likethis:

egrep "^[aedh]{4}$" /usr/share/dict/words | egrep -v "(.).*\1" 

If you run this, you'll find that it only returns one line, the anagram ofaedh. The second egrep has the -v flag, which means that it works inreverse; that is, it only outputs lines that don't match the pattern.

The pattern (.).*\1 matches any line with a repeated character in it because(.) matches any character, .* matches any string of any length (includingnothing) and \1 is a back reference to the first character. For more detailson this, see backreferences in the boxout on Grep and regular expressions.

Sometimes an anagram will contain a repeated letter, and that would be missed bythe above. Take, for example, eeeddh. The previous method won't work, soinstead we need to match different letters different numbers of times. The codefor this is:

egrep "^[edh]{6}$" /usr/share/dict/words | egrep "*^[^e]*(e[^e]*) {3}[^e]*$" | egrep "^[^d]*(d[^d]*){2}[^d]*$" | egrep -v "([^ed]).*\1]*" 

Here the second and third egreps both work in the same way. They make sure thata particular letter is repeated exactly a certain number of times.

[^e] matches any character except e, so the second egrep matches any stringthat starts at a new line, has any character other than a letter e zero ormore times followed by three occurrences of the bracketed expression (whichcontains e once and any string of other characters), then anything that isn'tan e zero or more times followed by an end of line.

The final egrep makes sure that nothing other than e and d are repeated.

regex101

www.regex101.com is an online tool to help you understand regularexpressions. Unfortunately it uses regular expressions from PHP, Python andJavaScript, which are slightly different from egrep.

I'll have a vowel please Carol

This solves complete anagrams, but that's not always what you want to do. In theUK there's a quiz show called Countdown, in which the contestants have to makethe longest word they can out of a given sequence of nine letters.

You can solve this in a similar manner to the above problem, but by using rangesfor the number of characters rather than an absolute number.

Take a look at this example for the letters a,e,e,f,d,m,t,t,i:

egrep "^[aefdmti]{1,9}$" /usr/share/dict/words | egrep "*^[^e]*(e[^e]*){0,2}[^e]*$" | egrep "^[^t]*(t[^t]*){0,2}[^t]*$" | egrep -v "([^et]).*\1] 

However, this doesn't quite solve our problem. We don't want all the words thatmatch, just the longest one.

To get this, we need to go beyond a single line and create a script.

#!/bin/bash longestLength=0 longestWord="" while read word do   if (( ${#word} > longestLength ))   then     longestLength=${#word}     longestWord=$word   fi done echo $longestWord 

This code reads each line from standard in (while read line) and checks itslength against the previous longest word. At the end, it echos (prints) thelongest word its found.

To include this with the previous egrep commands, just use:

egrep "^[aefdmnti]{1,9}$" /usr/share/dict/words | egrep "*^[^e]*(e[^e]*){0,2}[^e]*$" | egrep "^[^t]*(t[^t]*){0,2}[^t]*$" | egrep -v "([^et]).*\1]*" | bash longest.sh 

Where longest.sh is the filename of the above script (it's on the website andDVD).

Another puzzle similar to Countdown is the word wheel. This is where there's aseries of letters on the outside of a circle and one in the middle. You thenhave to find as many words as possible that contain the letter in the middle andtwo or more of the letters on the outside.

wordwheel

Word wheels: a challenging mental puzzle or a simple command?

The example puzzle on the facing page can be solved with:

egrep "^[fedpt]*i[fedpt]*$" /usr/share/dict/words | egrep -v "(.).*\1" | egrep ".{3,}" 

gvim

Many programs have some form of regexes built in. Here, gvim is finding allUSB messages for user ben in the syslog.

Word ladders are a bit different to the puzzles we've looked at so far. Insteadof arranging various letters into words, you start with a word, then each rungof the ladder you change a single letter from the word above until you end upwith a final word.

There are two separate parts to look at. The first part is finding all the wordsthat can follow a particular word. The second part is finding out if aparticular word can precede the final word.

Let's try the ladder:

live ---- ---- ---- raft 

To solve this you have to come up with three words.

#!/bin/bash for x in $(egrep "^liv.$|^li.e$|^l.ve$|^.ive$" /usr/share/dict/words); do   query='^.'${x:1:3}'$|^'${x:0:1}'.'${x:2:2}'$|^'${x:0:2}'.'${x:3:1}'$| ^'${x:0:3}'.$'   for y in $(egrep $query /usr/share/dict/words); do     query2='^.'${y:1:3}'$|^'${y:0:1}'.'${y:2:2}'$|^'${y:0:2}'.'${y:3:1}'$|^'${y:0:3}'.$'    for z in $(egrep $query2 /usr/share/dict/words | egrep "^raf.$|^ra.t$|^r.ft$|^.aft$"); do      if [ $x != $y ] && [ $x != $z ] && [ $x != "live" ] && [ $x != "raft" ] && [ $y != $z ] && [ $y != "live" ] && [ $y != "raft" ] && [ $z != "live" ] && [ $z != "raft" ]; then         echo "live"        echo $x        echo $y         echo $z         echo "raft"         echo "---"      fi    done  donedone

This code performs three for loops, one for each of the missing words. The firstfor loop runs on every word that matches the regular expression "^liv.$|^li.e$|^l.ve$|^.ive$" this is effectively four different regular expressionsseparated by |. Together, it will return any word that matches any one ofthese sub-expressions.

Inside this for loop it runs the line:

query='^.'${x:1:3}'$|^'${x:0:1}'.'${x:2:2}'$|^'${x:0:2}'.'${x:3:1}'$|^'$ {x:0:3}'.$' 

This just builds up a regular expression equivalent to the first one but forevery word returned. x is the variable holding the word, and ${x:1:3} (forexample) returns characters 1 through 3 of the word held in variable x (thefirst character is 0).

The second for loop works in exactly the same way as the first. The final forloop is a bit different because it not only has to match the word above it, butthe word below it as well. For this reason it runs two egreps on the words: oneto match the words above, and the second to match the words below. The ifstatement simply removes any solutions that repeat words.

egrep

egrep will highlight the particular part of each line that matches the regularexpression.

Playing GCHQ

Substitution ciphers are easy-to-break encryption systems where you take eachletter of the alphabet and represent it with a different symbol. The point ofthe puzzle is to work out what letters the symbols represent.

As an example, the cipher:

12334, 56 7852 90 a27 

could correspond to:

hello, my name is ben 

because

h=1, e=2, l=3, o=4, m=5, y=6, n=7, a=8, i=9, b=a. 

Now take a look at the following:

123452 672 8298a2 bc 9889dbeb9c 

The main clue here are repeated letters which you can match using backreferences. You could try to build a script to match the whole lot in one go,but it's far easier and quicker to pick on part with quite a few repeatedcharacters and just match that.

Once you've got that, it should be quite trivial to finish it off. We decided towork with the final two words. A script to solve them is:

#!/bin/bash list2=$(egrep "(.)(.)\2\1.(.).\3\1." /usr/share/dict/words) for word1 in $(egrep "^.{2}$" /usr/share/dict/words); do   for word2 in $list2; do     echo $word1" "$word2 | egrep "^(.)(.) [[:space:]](.)(.)\4\3.\1.\1\3\2$"   done done 

The first loop goes through every two letter word while the second one loopsthrough every word that matches the particular pattern of backreferences.

The guts of the code is the line:

echo $word1" "$word2 | egrep "^(.)(.)[[:space:]](.) (.)\4\3.\1.\1\3\2$" 

It checks every pair of words generated by the two loops for a particularpattern of back references which correspond to repeated characters in theciphertext.

This method could be expanded to match three or more words, though it will slowdown significantly with each new word. Once you've got some of the letters, youshould be able to come up with patterns based on the letters you know to findthe other words.

Ben Everard is the co-author of Learning Python with Raspberry Pi, soon to bepublished by Wiley. He's also pretty good at turning foraged fruit into alcohol.

Boxout 1: Grep and regular expressions

Grep is a popular tool for finding particular pieces of text. As well as solvingword games, it's also useful in finding particular messages in log files andother 'real' work.

egrep is like grep, but it uses extended regular expressions rather thanordinary regular expressions. These have a cleaner syntax, so it's these thatwe'll use here. The basic usage is:

egrep <pattern> <file> 

This will output every line in the file that matches <pattern>. It can also beused in a pipe like this:

cat <file> | egrep <pattern> 

This just prints every line that cat outputs that matches <pattern>. The trickwith egrep is in mastering extended regular expressions. A letter just matchesitself, so for example, abc will match any line that contains the string abcanywhere in it.

^ matches the start of the line and $ matches the end of the line, so ^abcmatches any line that starts with abc, abc$ matches any line that ends withabc and ^abc$ matches any line that contains just abc.

The . character matches any character, so ^a.c$ will match abc, adc,aac, but not ac. This is known as backreferencing. You can also match groupsof characters, eg ^[ab] will match any line that starts with a or b, while^[^ab] will match any line that starts with any character other than a orb.

^[a-z] will match any line starting with a lower-case letter. There are also afew special options here such as [[:space:]], which matches any whitespace(space, tab, etc) and [[:lower:]] which matches any lower-case letter.

You can match characters more than once. * matches zero or more times, + oneor more time, and ? zero or one time. So, ^a*$ matches a line that containsa number of a's but no other characters. ^a.*a$ matches a line that startsand finishes with a letter a. ^a.+a$ matches any line that starts and endswith an a and has at least one character in between.

You can also specify a range of the number of matches you want by using {}.For example, ^a{2,3}$ will match the lines aa and aaa, but nothing else.You can bracket parts of regular expressions as well. This is useful because itallows you to refer to particular matches.

\1 matches whatever the first bracketed expression matched, \2 matches whatthe second matched and so on. For example, (.).\1 will match any twocharacters that are the same separated by a character, such as bob, did,aaa, but not abc.

The final part of extended regular expressions that we'll look at is |. Thisallows you to match against more than one pattern. For example, ^ab|^bc willmatch anything that starts with either ab or bc, but not ac or anythingelse. ^(ab|bc) does the same thing.

Challenges

Test your skills by writing scripts to solve the following word puzzles.

Anagrams

Countdown

Encryption

Word Wheel

wordwheel

wordwheel

wordwheel

Word ladder

First:

band------------meat

Second:

brag------------plan

Third:

wire------------pant
Tags: anagram, articles, bash, dict, egrep, grep, linux-voice, linux-voice-issue-1-2014, regex