| | pre-1800 | 1800-99 | 1900-24 | 1925-49 | 1950-74 | 1975-1999 | 2000-now | overall |
|---|---|---|---|---|---|---|---|---|
| words | 87,217 | 97,734 | 98,425 | 99,525 | 99,968 | 100,000 | 100,000 | 100,000 |
| occurrences | 3 B | 79 B | 53 B | 47 B | 112 B | 248 B | 203 B | 744 B |

The resulting data file (5MB) is sorted by overall word frequency, with each line containing a single word followed by tab-separated occurrence counts for each of the seven time periods. The first and last 15 lines look like this:

| word | pre-1800 | 1800-99 | 1900-24 | 1925-49 | 1950-74 | 1975-1999 | 2000-now |
|---|---|---|---|---|---|---|---|
| THE | 214386567 | 6205478005 | 4136227396 | 3596808601 | 8434878977 | 17095905562 | 13413716353 |
| OF | 139080842 | 3835444951 | 2455783120 | 2141178652 | 5039228492 | 9911392356 | 7443965819 |
| AND | 104735110 | 2636959817 | 1659391866 | 1411410347 | 3297450997 | 7362373093 | 6159703274 |
| TO | 92589834 | 2236903043 | 1384523293 | 1209170285 | 2841681262 | 6260231443 | 5322298917 |
| IN | 61663464 | 1727940862 | 1202136632 | 1102307412 | 2695341106 | 5710080790 | 4391594997 |
| A | 52666775 | 1518556104 | 1067315757 | 965781133 | 2278228256 | 5149986234 | 4277553636 |
| IS | 30222523 | 845922593 | 630814098 | 548758334 | 1335987058 | 2818372092 | 2174169987 |
| THAT | 42207807 | 919231597 | 570562753 | 495588435 | 1148001772 | 2560771539 | 2264404325 |
| FOR | 22709571 | 587562031 | 427461710 | 407327855 | 988526132 | 2270067239 | 1841627493 |
| IT | 30586232 | 757131617 | 483135394 | 402426501 | 860369525 | 1702985300 | 1503450800 |
| AS | 25127245 | 649084129 | 428121174 | 366815239 | 850033321 | 1835148950 | 1546315200 |
| WAS | 21836236 | 679586883 | 445264203 | 397816862 | 852480827 | 1659824046 | 1445904911 |
| WITH | 22650143 | 600104567 | 377661407 | 325318042 | 755940286 | 1682707617 | 1418415187 |
| BE | 26016783 | 591061444 | 381343530 | 328060974 | 773748230 | 1541412667 | 1177221157 |
| BY | 24913613 | 624134890 | 379710635 | 322171395 | 767001993 | 1494625868 | 1090547690 |
| . . . | | | | | | | |
| PREUVES | 978 | 19027 | 6292 | 5140 | 19964 | 28651 | 15500 |
| TARANTULAS | 172 | 5254 | 6071 | 6385 | 9801 | 35034 | 32834 |
| SPIRULINA | 0 | 848 | 791 | 413 | 2524 | 50940 | 40033 |
| CORNEY | 113 | 12993 | 10726 | 7575 | 12067 | 26828 | 25245 |
| PATHBREAKING | 0 | 26 | 190 | 190 | 3218 | 47158 | 44764 |
| LITTORINA | 0 | 6275 | 4960 | 9937 | 21117 | 42326 | 10926 |
| HOOKAH | 106 | 13108 | 5647 | 4900 | 14804 | 29150 | 27823 |
| AUSLAND | 0 | 5224 | 3435 | 5333 | 19312 | 38004 | 24227 |
| ROUMANIE | 0 | 1157 | 4181 | 6801 | 26821 | 42310 | 14263 |
| IVAS | 6159 | 22550 | 6591 | 6779 | 12868 | 18666 | 21918 |
| ALANS | 1305 | 17402 | 7660 | 8129 | 18759 | 21509 | 20765 |
| GORDIE | 6 | 118 | 760 | 3680 | 5721 | 45385 | 39859 |
| THATCHERITE | 0 | 0 | 2 | 12 | 4 | 53353 | 42157 |
| EXCOMMUNICATING | 2362 | 30983 | 10239 | 6014 | 13512 | 16522 | 15895 |
| HEUSER | 2 | 761 | 3149 | 10533 | 17964 | 41368 | 21750 |

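
For readers who want to load the file, here is a minimal sketch of parsing one line of the format described above (a word followed by seven tab-separated counts). The names `PERIODS` and `parse_line` are illustrative, not part of the data release, and treating the overall count as the sum of the seven period counts is an assumption that happens to be consistent with the sort order of the sample lines.

```python
# Period labels, in the order the counts appear on each line.
PERIODS = ["pre-1800", "1800-99", "1900-24", "1925-49",
           "1950-74", "1975-1999", "2000-now"]


def parse_line(line):
    """Split 'WORD<TAB>n1<TAB>...<TAB>n7' into (word, {period: count})."""
    word, *fields = line.rstrip("\n").split("\t")
    counts = [int(field) for field in fields]
    if len(counts) != len(PERIODS):
        raise ValueError(f"expected {len(PERIODS)} counts, got {len(counts)}")
    return word, dict(zip(PERIODS, counts))


# One of the sample lines shown above, with tabs written out explicitly.
sample = "THATCHERITE\t0\t0\t2\t12\t4\t53353\t42157"
word, by_period = parse_line(sample)

# Assumption: the overall count is the sum over the seven periods.
overall = sum(by_period.values())
print(word, overall)
```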
In the table below, you can see the percentages for each letter, by time period. Overall, there has not been much change over time. The biggest change is that, as the Scrabblists have noted, there has been a steady increase in the frequency of "Z", which has more than doubled since pre-1800 (although the change in the 75 years since the invention of Scrabble has been smaller, from 0.08% to 0.10%).
In each column, the letters are ordered by frequency. When there is an exchange of frequency order for a time period (compared to the overall frequency) I have placed a horizontal line between the two exchanged letters (for example, "O" is more common than "A" in pre-1800). We see that 1950-74 is the most average time period (no letter exchanges), and 1975-99, which contains the so-called "me" decade, is the only period in which "I" surpasses "O" (but the word counts for "me", "my", and "I" are not unusual in that time period).

| pre-1800 | 1800-99 | 1900-24 | 1925-49 | 1950-74 | 1975-1999 | 2000-now | overall |
|---|---|---|---|---|---|---|---|
| E: 12.79 | E: 12.78 | E: 12.67 | E: 12.59 | E: 12.52 | E: 12.41 | E: 12.40 | E: 12.49 |
| T: 9.76 | T: 9.50 | T: 9.42 | T: 9.36 | T: 9.33 | T: 9.19 | T: 9.20 | T: 9.28 |
| O: 7.73 | A: 7.78 | A: 7.93 | A: 7.99 | A: 8.03 | A: 8.11 | A: 8.11 | A: 8.04 |
| A: 7.69 | O: 7.67 | O: 7.66 | O: 7.66 | O: 7.64 | I: 7.68 | O: 7.64 | O: 7.64 |
| I: 7.19 | I: 7.25 | I: 7.32 | I: 7.44 | I: 7.64 | O: 7.63 | I: 7.61 | I: 7.57 |
| N: 7.07 | N: 7.10 | N: 7.12 | N: 7.16 | N: 7.24 | N: 7.29 | N: 7.25 | N: 7.23 |
| S: 6.18 | S: 6.43 | S: 6.47 | S: 6.47 | S: 6.51 | S: 6.55 | S: 6.52 | S: 6.51 |
| H: 6.26 | R: 6.15 | R: 6.19 | R: 6.24 | R: 6.29 | R: 6.35 | R: 6.27 | R: 6.28 |
| R: 6.16 | H: 5.94 | H: 5.63 | H: 5.36 | H: 5.05 | H: 4.74 | H: 4.88 | H: 5.05 |
| D: 3.93 | D: 3.96 | L: 3.97 | L: 4.02 | L: 4.06 | L: 4.15 | L: 4.12 | L: 4.07 |
| L: 3.52 | L: 3.80 | D: 3.89 | D: 3.85 | D: 3.76 | D: 3.76 | D: 3.84 | D: 3.82 |
| C: 2.84 | C: 3.01 | C: 3.09 | C: 3.21 | C: 3.38 | C: 3.48 | C: 3.38 | C: 3.34 |
| U: 2.75 | U: 2.70 | U: 2.70 | U: 2.71 | U: 2.71 | U: 2.74 | U: 2.76 | U: 2.73 |
| F: 2.70 | F: 2.61 | F: 2.57 | F: 2.52 | M: 2.51 | M: 2.55 | M: 2.53 | M: 2.51 |
| M: 2.48 | M: 2.42 | M: 2.43 | M: 2.46 | F: 2.46 | F: 2.35 | F: 2.29 | F: 2.40 |
| P: 1.92 | P: 1.95 | P: 1.98 | P: 2.06 | P: 2.15 | P: 2.22 | P: 2.16 | P: 2.14 |
| W: 1.91 | W: 1.90 | W: 1.85 | G: 1.84 | G: 1.81 | G: 1.88 | G: 1.94 | G: 1.87 |
| G: 1.72 | G: 1.77 | G: 1.83 | W: 1.77 | W: 1.64 | Y: 1.64 | Y: 1.69 | W: 1.68 |
| Y: 1.74 | Y: 1.70 | Y: 1.69 | Y: 1.66 | Y: 1.63 | W: 1.57 | W: 1.67 | Y: 1.66 |
| B: 1.58 | B: 1.54 | B: 1.53 | B: 1.52 | B: 1.50 | B: 1.47 | B: 1.45 | B: 1.48 |
| V: 1.07 | V: 1.04 | V: 1.00 | V: 1.02 | V: 1.05 | V: 1.07 | V: 1.06 | V: 1.05 |
| K: 0.45 | K: 0.47 | K: 0.52 | K: 0.52 | K: 0.49 | K: 0.54 | K: 0.60 | K: 0.54 |
| X: 0.21 | X: 0.21 | X: 0.21 | X: 0.22 | X: 0.24 | X: 0.25 | X: 0.24 | X: 0.23 |
| J: 0.18 | J: 0.15 | J: 0.14 | J: 0.14 | J: 0.15 | J: 0.16 | J: 0.17 | J: 0.16 |
| Q: 0.13 | Q: 0.12 | Q: 0.11 | Q: 0.12 | Q: 0.12 | Q: 0.12 | Q: 0.12 | Q: 0.12 |
| Z: 0.04 | Z: 0.05 | Z: 0.07 | Z: 0.08 | Z: 0.09 | Z: 0.10 | Z: 0.10 | Z: 0.09 |

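
As a rough illustration of how percentages like these can be derived, the sketch below weights each letter of a word by that word's occurrence count and then normalizes to 100. This is an assumed reading of the method rather than a statement of exactly how the table was produced; `letter_percentages` and the toy counts are made up for the example.

```python
from collections import Counter


def letter_percentages(word_counts):
    """Map each letter to its share (in percent) of all letter occurrences.

    word_counts maps a word to the number of times it occurs in the corpus,
    so every letter in the word is counted that many times.
    """
    totals = Counter()
    for word, count in word_counts.items():
        for letter in word.upper():
            if letter.isalpha():
                totals[letter] += count
    grand_total = sum(totals.values())
    # most_common() yields letters from most to least frequent.
    return {letter: 100.0 * n / grand_total for letter, n in totals.most_common()}


# Toy counts, not real corpus data: "THE" seen 3 times, "OF" once.
pct = letter_percentages({"THE": 3, "OF": 1})
for letter, value in pct.items():
    print(f"{letter}: {value:.2f}")
```

Running the same function once per time period (using that period's counts) would give one column of the table per call.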
When Alfred Butts invented Scrabble in 1938, he determined the point values based on a frequency analysis of English letters (done by hand, not by computer). In the letter frequency column of the table below, we see that point value does indeed vary roughly inversely with letter frequency in the English books corpus. (In every column of the table, letter frequency is normalized against the letter "Q". That is, by definition "Q" has a frequency score of 1, and the score of 104 for "E" means it is 104 times as frequent as "Q". The Scrabble point value of each letter is shown in parentheses.)

| letter frequency |
|---|
| E: 104 (1) |
| T: 77 (1) |
| A: 67 (1) |
| O: 64 (1) |
| I: 63 (1) |
| N: 60 (1) |
| S: 54 (1) |
| R: 52 (1) |
| H: 42 (4) |
| L: 34 (1) |
| D: 32 (2) |
| C: 28 (3) |
| U: 23 (1) |
| M: 21 (3) |
| F: 20 (4) |
| P: 18 (3) |
| G: 16 (2) |
| W: 14 (4) |
| Y: 14 (4) |
| B: 12 (3) |
| V: 9 (4) |
| K: 5 (5) |
| X: 2 (8) |
| J: 1 (8) |
| Q: 1 (10) |
| Z: 1 (10) |

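
The normalization against "Q" is just a division, sketched below. The inputs are a few of the rounded overall percentages from the earlier table, so a score computed this way can differ by one from the published column, which was presumably computed from unrounded counts; `normalize_to` is a name invented for this example.

```python
def normalize_to(percentages, reference="Q"):
    """Express every letter's frequency as a multiple of the reference letter's."""
    base = percentages[reference]
    return {letter: value / base for letter, value in percentages.items()}


# A few overall percentages copied from the letter-frequency table.
overall = {"E": 12.49, "T": 9.28, "H": 5.05, "Q": 0.12}

scores = normalize_to(overall)
for letter, score in scores.items():
    print(f"{letter}: {round(score)}")
```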
To play a letter in Scrabble, you must form a word. The words column above shows the relative numbers of distinct words in the Scrabble word list that contain each letter. Of the 178,691 words in the Tournament Word List TWL06, 124,243 (or 70%) contain an "E", but only 2,576 (1.4%) contain a "Q". (Does that mean the "Q" should be worth 124243/2576 = 48 points? I don't think so, but you can decide what you think it means.) It does seem that there is an inequity in that there are 3 times as many words containing a "Z" as a "Q", but "Z" and "Q" have the same point value (10). Note also that "S" has moved up from the 7th spot to the 2nd, in part because there are so many nouns that have a plural form ending in "S".
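
A small sketch of the counting behind the words column: each distinct word contributes once to every letter it contains, no matter how often the letter repeats inside the word. The four-word list is a stand-in (TWL06 itself is not reproduced here), and `words_containing` is a made-up name. The last lines simply recheck the arithmetic quoted above.

```python
from collections import Counter


def words_containing(words):
    """Count, for each letter, how many distinct words contain it."""
    counts = Counter()
    for word in set(w.upper() for w in words):
        counts.update(set(word))  # set() so repeated letters count once per word
    return counts


# Stand-in word list for illustration only.
counts = words_containing(["QUIZ", "ZAX", "EAT", "TEA"])
print(counts["Z"], counts["Q"])

# Rechecking the figures quoted in the text.
total, with_e, with_q = 178_691, 124_243, 2_576
print(f"E: {100 * with_e / total:.1f}%  Q: {100 * with_q / total:.1f}%  "
      f"ratio: {with_e / with_q:.1f}")
```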
Not all Scrabble words are equally easy to play. You are more likely to be able to make "AT" than "SYZYGY." The weighted words column above compares the weighted sum of words that contain each letter. The weighting is by the number of letters: two-letter words are deemed easiest to make; a three-letter word is weighted as 4 times harder to make, a four-letter word as 4 times harder than a three-letter word, and so on. (Why 4 times? It is somewhat arbitrary but based on the idea that 26 letters divided by 7 letters in a rack is approximately 4.)
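
One way to read this weighting is that a word of length n gets weight 1/4^(n-2), so a two-letter word counts 1, a three-letter word 1/4, a four-letter word 1/16, and so on. The sketch below applies that reading to a tiny stand-in word list; the function name and the choice to give two-letter words a weight of exactly 1 are assumptions made for the example.

```python
from collections import defaultdict


def weighted_letter_totals(words, factor=4.0):
    """Sum length-based word weights for every letter a word contains.

    Each extra letter beyond two divides the word's weight by `factor`.
    """
    totals = defaultdict(float)
    for word in set(w.upper() for w in words):
        weight = factor ** -(len(word) - 2)
        for letter in set(word):
            totals[letter] += weight
    return dict(totals)


# Stand-in word list for illustration only.
totals = weighted_letter_totals(["AT", "EAT", "ZAX"])
for letter in sorted(totals, key=totals.get, reverse=True):
    print(letter, totals[letter])
```

Only the ratios between letters matter here, since the column is later normalized against "Q".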
Not all three-letter words are equally easy to play. It is hard to make "ZAX" because there is only one "Z" and one "X", and easy to make "EAT". In the first play column, I report the relative frequencies of being able to play a letter, based on the actual probability of being able to play each possible word as the first play of the game. For example, the probability of being able to play "THE" turns out to be 9.4%, based on the probability of drawing a "T", "H", and "E" (or blanks to make up for these letters) out of the seven letters in a hand.
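
The first-play probability for a single word can be sketched as an exact count over seven-tile racks drawn from the standard 100-tile English bag, with blanks (written `?` here) standing in for any missing letters. This is one plausible formulation, not necessarily the calculation used for the table, so it may not reproduce the 9.4% figure exactly; `first_play_probability` is a name chosen for this example.

```python
from collections import Counter
from itertools import product
from math import comb

# Standard English Scrabble tile distribution; '?' is the blank.
TILES = {
    "A": 9, "B": 2, "C": 2, "D": 4, "E": 12, "F": 2, "G": 3, "H": 2, "I": 9,
    "J": 1, "K": 1, "L": 4, "M": 2, "N": 6, "O": 8, "P": 2, "Q": 1, "R": 6,
    "S": 4, "T": 6, "U": 4, "V": 2, "W": 2, "X": 1, "Y": 2, "Z": 1, "?": 2,
}


def first_play_probability(word, tiles=TILES, rack=7):
    """Probability that a random opening rack can spell `word`."""
    need = Counter(word.upper())
    if sum(need.values()) > rack:
        return 0.0  # longer than a rack: impossible on the first turn
    letters = list(need)
    blanks = tiles.get("?", 0)
    total = sum(tiles.values())
    # Tiles that are neither blanks nor letters of the word.
    other = total - blanks - sum(tiles.get(c, 0) for c in letters)

    favorable = 0
    ranges = [range(min(tiles.get(c, 0), rack) + 1) for c in letters]
    for drawn in product(*ranges):  # how many of each needed letter we hold
        ways = 1
        for c, k in zip(letters, drawn):
            ways *= comb(tiles.get(c, 0), k)
        # Letters still missing must be covered by blanks.
        shortfall = sum(max(0, need[c] - k) for c, k in zip(letters, drawn))
        for b in range(shortfall, blanks + 1):
            rest = rack - sum(drawn) - b
            if rest < 0:
                break
            favorable += ways * comb(blanks, b) * comb(other, rest)
    return favorable / comb(total, rack)


for w in ("THE", "EAT", "ZAX"):
    print(w, f"{first_play_probability(w):.4f}")
```

Summing this probability over every word that contains a given letter, and then normalizing against "Q", would give one entry of a first-play column under these assumptions.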
Words longer than 7 letters are impossible on the first turn, but possible on subsequent turns. In the second play column of the table above, I show the letter frequencies based on the probability of playing a word as the second play of the game. That is, the word must either intersect the first-played word at one letter, or it must use all the letters of the first word. (That way, we can make words up to 14 letters.) I didn't attempt to model plays beyond the second, but I think the numbers would not change too much from the second play.
Conclusion: Based on the data above, I will make three possible proposals for Scrabble letter values: