At present, I have a data file that contains weighted associations for half of the Gutenberg corpus, around 600,000 words. Most of these words are even in English, though some crap data did work its way in, either where English texts included non-English phrases or where a text was incorrectly identified as English even though it was in some other language. My parser isn't smart enough to figure out that it is being fed words that aren't English.

Over the next couple of days, I hope to put this up and make it searchable. The data file is 185 MB. Please email me if you would like access to it (but please respect my copyright).