http://xkcd.com/936/

Not worth creating a project for, and it might be interesting to see what changes people would make.

Non-standard dependencies:

  • words, for the dictionary
  • zsh (though this will probably work just fine with bash, too)
#!/usr/bin/zsh
# Author: @sxan@midwest.social
# 2025-02-23

final=(xargs echo)  # default final stage: pass the result through unchanged
count=6             # default number of words
while getopts d opt; do
	case $opt in
		d)
			final=(tr 'A-Z' 'a-z')  # -d: downcase the entire result
			;;
		*)
			printf "Password generator based on the correcthorse algorithm from http://xkcd.com/936//n/n"
			printf "USAGE: %s [-d] [#]\n" "$0"
			printf " -d  make the result all lower case; otherwise, each word will be capitalized.\n"
			printf " #   the number of words to include. Defaults to 6."
			exit 1
			;;
	esac
done
shift $(($OPTIND - 1))
[[ $# -gt 0 ]] && count=$1

# Draw 2·N candidates (stripping apostrophe suffixes creates duplicates),
# strip those suffixes and capitalize each word, then dedupe, pick N,
# and join into a single string. ("${final[@]}" rather than $final so the
# array also expands correctly under bash.)
shuf -n $((count * 2)) /usr/share/dict/american-english | \
	sed 's/'"'"'.*//; s/^\(\w\)/\U\1/' | \
	sort | uniq | shuf -n $count | xargs echo | \
	tr -d ' ' | "${final[@]}"
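
Assuming you save this as pony and make it executable, it runs like so (the output shown is illustrative, not real shuf output):

	./pony        # six capitalized words, e.g. CorrectHorseBatteryStapleCloudAnchor
	./pony -d 4   # four words, all lower case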

What’s going on here:

Nearly 30% of the American dictionary (34,242 words) contains apostrophes. They could be left in to help satisfy password requirements that demand “special characters,” but correcthorse isn’t an algorithm that handles idiot “password best practices” well anyway. So, since every word with an apostrophe has a pair word without one, we pull 2·N words to make sure we have enough. Then we strip the plural/possessive suffixes and capitalize every word. Then we remove duplicates and select our N words from the result. Finally, we compact that into a space-less string of words, and if the user passed the -d option, we downcase the entire thing.
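
Concretely, a single run might flow through those stages like this (the words are illustrative, not real shuf output):

	shuf -n 12 ...               ->  horse  dad's  dad  battery's  staple  correct ...
	sed (strip + capitalize)     ->  Horse  Dad  Dad  Battery  Staple  Correct ...
	sort | uniq                  ->  Battery  Correct  Dad  Horse  Staple ...
	shuf -n 6 | xargs | tr -d    ->  CorrectHorseBatteryStapleDad...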

Without the user options, this really could be a 1-liner; that’s how it started:

alias pony="shuf -n 12 /usr/share/dict/american-english | sed 's/'\"'\"'.*//; s/^\(\w\)/\U\1/' | sort | uniq | shuf -n 6 | xargs echo | tr -d ' '"
  • I don’t feel great about the 2n solution to apostrophes. You could just as well end up with 2n words with apostrophes, no? It’s not particularly robust.

    It doesn’t matter - the algorithm takes the stems, it doesn’t drop the words. “Dad’s” becomes “Dad”. If you get both “Dad’s” and “Dad”, you might indeed get a passphrase containing “DadDad” - but that’s not a weakness. Good randomness doesn’t include a guarantee of no duplicates. In fact, the uniq call reduces the quality of the passphrase: “DadDadDadDadDadDad” is a perfectly good phrase.

    But it’s a good catch in another way: I’d considered only plurals and possessives, but the American dictionary word file does indeed include many words with more than one apostrophe suffix. No word of more than one letter appears more than 5 times, so 5n would guarantee enough different words. But the best thing about your comment is that it exposes another weakness: the dictionary contains several 1-letter “words”, and one of them - “O” - has 25 variations with apostrophes. They’re all names: “O’Connell”, “O’Keefe”, etc. The next largest is “L” with 8 appearances: all borrowed words from French, such as “L’Amour”.
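
    If you want to reproduce those counts, a one-liner along these lines should do it (a sketch, assuming the Debian-style american-english word list):

    grep "'" /usr/share/dict/american-english | sed "s/'.*//" | sort | uniq -c | sort -rn | head

    It strips everything from the apostrophe on, then counts how often each stem appears.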

    I don’t see a simple solution to excluding names, although a tweak could ensure that we get no single letter words. However, maybe simplifying the algorithm would be better: simply grab N words and delete any apostrophes. You might end up with mush like “OBrianMustveHed”, but perhaps that’s not a bad thing.
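
    That simplified variant might look something like this (a sketch; the grep drops the 1-letter entries before shuf picks, so we still end up with all N words):

    grep .. /usr/share/dict/american-english | shuf -n 6 | tr -d "'" | xargs echo | tr -d ' '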

    Perhaps the best implementation would be the simplest:

    alias pony="shuf -n 6 /usr/share/dict/american-english | xargs echo | tr -d ' '
    

    Leave in the apostrophes; more random bits. Leave in the spaces, if they’re legal characters in the authentication program, and you get even more.
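
    And if spaces are legal, drop the final tr entirely:

    alias pony="shuf -n 6 /usr/share/dict/american-english | xargs echo"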

    • apotheotic (she/her)@beehaw.org

      Aaaaah I totally misunderstood why you were taking 2n. You were taking 2n in case the truncated string was the same as one you already had. Makes more sense now.