So the numbers to beat were 58,795 letters, or 10 × 540 words. I was able to exceed the goals, writing a program that generates a palindrome with 90,439 letters and 21,012 words.
Cognoscenti such as Mark Saltveit, editor of The Palindromist, rightfully point out that my creation should not be called a true palindrome, because it makes no sense. But Saltveit says that I am probably safe in calling this "the world's longest palindromic sentence, or the world's longest parody of `A man, a plan.' " I agree with that assessment.
The resulting program was easy to put together. (I appreciate the bug reports by Jasvir Nagra and Dan Hoey to fix it.) Let's look at the results:
|Palindrome:||A man, a plan, a caddy, Ore, Lee, tsuba, Thaine, a lair, ... (more) ..., Aeniah, Tabu, Steele, Roydd, a canal, Panama.|
Commentary: Maybe I'm biased, but I think the palindrome starts out quite strong. "A man, a plan, a caddy" is the basic premise of a classic piece of storytelling. Unfortunately, things go downhill from there rather quickly. It contains truths, but it does not have a plot. It has Putnam, but no logic; Tesla, but no electricity; Pareto, but no optimality; Ebert, but no thumbs up. It has an ensemble cast including Tim Allen, Ed Harris and Al Pacino, but they lack character development. It has Sinatra and Pink, but it doesn't sing. It has Monet and Goya, but no artistry. It has Slovak, Inuit, Creek, and Italian, but it's all Greek to me. It has exotic locations like Bali, Maui, Uranus, and Canada, but it jumps around needlessly. It has Occam, but it is the antithesis of his maxim "Entia non sunt multiplicanda praeter necessitatem." If you tried to read the whole thing, you'd get to "a yawn" and stop. Or you might be overcome by the jargon, such as PETN, ILGWU, PROM, UNESCO, and MYOB. Most serendipitous of all is that Steele, who collected several shorter versions of the Panama oeuvre in a book about a Lisp, shows up in the very last line. Steele and some others have some comments. You can read the results from top to bottom (if you don't get bored) or you can start in the middle; the letter "y" in "Moray".
Speed: The program increases the length of the longest palindrome found by roughly 200 words every second; so in 3 or 4 seconds it breaks Hoey's record, and in 30 seconds it is over 6000 words. At around 8000 words progress slows to about 1,000 words per minute, and by around 10,000 to 12,000 words progress is sporadic. This is because we are running out of good words: there are 126,000 words in the dictionary, but only about 10% of them are easily reversible. For example, there are 426 words that contain "eq" or "sq", but these are hard to use in a palindrome because there are no words containing "qe" or "qs", and only a few words that end in "q" (and could then be followed by a word starting with "e" or "s".
|Palindrome:||A man, a plan, a cameo, Zena, Bird, Mocha, ... (more) ..., Comdr, Ibanez, OEM, a canal, Panama!|
Commentary: After a reader sent in a suggestion, I made 4 changes to the program:
Speed: The program now starts out increasing by 1000 phrases per second, consistently gets over 10,000 phrases in under 30 seconds, and usually gets to 12,000 phrases in the first minute. Things slow down from there, but in ten runs all but one were over 17,000 words, which takes about an hour.
I thought I had leaped past Hoey's 540-word Panama Palindrome by a factor of 30, but when I showed him my palindrome, he said that he was really thinking of sticking with noun phrases of the form "<indefinite article> <noun>."
This is an example of the venerable language induction problem: given the single sentence "A man, a plan, a canal--Panama" as evidence, what language does it define? It seems clear that it consists of a series of any number of noun phrases. That is,
S => NP*But from one example, you can't say much more. If we look at the examples from Hoey and Steele, we can see they all start and end the same, so we can say:
S => "A man" "A plan" NP* "A canal" "Panama"But Hoey says that every noun phrase must have an indefinite article:
NP => IndefArt Noun IndefArt => "a" | "an"while I was allowing:
NP => ProperNoun NP => IndefArt Noun IndefArt => "a" | "an"Guy Steele suggested I should also try allowing other noun phrases, such as "two Xs". Gold said the language induction problem can't be solved in general, but others (e.g. Horning, Mooney, Muggleton, de Marcken) have shown that it can be solved if you use a "probably approximately correct" approach rather than a strictly logical formalism. With Hoey in mind, I also generated a solution with all indefinite articles:
|Created:||3:00-2003 (I forget what day, exactly)|
|Program:||pal2.py with a restricted dictionary|
|Palindrome:||A man, a plan, a casa, a bait, a lag, a malt, ... (more) ..., a natl, a mag, a lati, a baas, a canal, Panama!|
Commentary: Hoey said he thought that a better word list and
a smarter program could get to ten times his 540-word palindrome,
using only noun phrases with indefinite articles. I'm pretty sure that
will never happen. The problem is a dirth of "a"s. According to
Hoey's rules, every phrase must start with the letter "a". That means
that either the rest of the word must be an exact reverse of another
word (and we know there are 1100 of these) or the phrase must have
another "a" in it somewhere, and it must be matched by two or more
other phrases. Phrases such as "a man", "a plan" and "a canal" work
well because they contain multiple "a"s. Now consider a phrase such
as "a biologist". If that appears in the palindrome, then somewhere else
the letters "tsigoloib" must appear. But note that those letters must
all appear in one word/phrase, because there is no "a", and we only
get word boundries at "a"s. And of course, there is no single word that
contain those letters.
In general, take a word (such as "an asparagus" or "a biologist"),
split it into components around the "a"s (yielding
["n", "sp", "r", "gus"] and ["biologist"]). Collect the set of all
such segments, from all the phrases in the dictionary.
Now go back through the dictionary,
and for each word, see if the reverse of each of its components is
in this set. So "an asparagus" is good, because its reversed components
all appear in the set: "n" appears in many places (including "an asparagus"
itself), "ps" appears as a component in "a psalm", "r" appears in many
places (such as "a karat"), and "sug" appears in "a sugar". On
the other hand, "a biologist" is no good, because the component "tsigoloib"
does not appear.
When I first applied this test, I started with the 69,241-word anpdict.txt (containing only noun
phrases starting with "a" or "an"). I checked to see whether each
reversed component of a phrase appeared anywhere in any other phrase.
That eliminated "a biologist", but it let "a zoom" stick around,
because the reversed component "mooz" appears in "a schmooze". Doing
this level of reduction gets us down to a dictionary of 11,065
phrases. But on reflection I realized that not only does each
reversed component have to appear in some word, it must appear as a
component of some word. The fact that "mooz" appears in "a
schmooze" is not good enough. That would only work if "a zoom" could
be followed by the letters "hcsa", which of course it cannot; it must
be follwed by the letter "a". Using that stricter test, we get down
to only 4,528 noun phrases that are
actually usable. So to get a 5,400 word palindrome we would need to
start with a bigger dictionary.
Speed: My program consistently generates palindromes of over
2000 words in under 10 seconds using the 4,528 word dictionary. It doesn't
go much beyond that.
Version 3: 21,012 Word Palindrome
When I first applied this test, I started with the 69,241-word anpdict.txt (containing only noun phrases starting with "a" or "an"). I checked to see whether each reversed component of a phrase appeared anywhere in any other phrase. That eliminated "a biologist", but it let "a zoom" stick around, because the reversed component "mooz" appears in "a schmooze". Doing this level of reduction gets us down to a dictionary of 11,065 phrases. But on reflection I realized that not only does each reversed component have to appear in some word, it must appear as a component of some word. The fact that "mooz" appears in "a schmooze" is not good enough. That would only work if "a zoom" could be followed by the letters "hcsa", which of course it cannot; it must be follwed by the letter "a". Using that stricter test, we get down to only 4,528 noun phrases that are actually usable. So to get a 5,400 word palindrome we would need to start with a bigger dictionary.
Speed: My program consistently generates palindromes of over 2000 words in under 10 seconds using the 4,528 word dictionary. It doesn't go much beyond that.
|Program:||pal3.py or pal3.ipynb.|
|palindrome:||A man, a plan, a caretaker, ... (more) ..., Komarek, a ter, a canal, Panama!|
Commentary: When I saw a meme proclaiming the 6-10-2016 palindromic date, I went back to an old idea: maybe I could find a longer palindrome by advancing letter-by-letter rather than phrase-by-phrase. I could always choose a letter that advances both left and right, so I might not have to back up as much. Even better, I could choose first the letter that leads to the most completions of words on both left and right. It worked! I got a longer palindrome. The resulting program is quite similar to the original version, but I get to use more modern Python data structures (like a Counter, replacing the bisect class.) See the Python notebook for more on this version.
Speed: This program is about the same speed, but it makes more steady progress;, and ends up exceeding version 2.
|Program:||pal3.py or pal3.ipynb.|
|palindrome:||A man, a plan, a caret, a rec, ... (more) ..., a banana, a nan, a macerater, a canal, Panama!|
Commentary: We get a little farther than with the previous version.
|Peter Norvig, 20:02 02/20 2002||See some comments on this page.|