up goer five wordlist noodling
Jan. 23rd, 2013 10:36 am
Up-Goer Five stuff: if I used the first 1000 distinct entries of the COCA list at http://www.wordfrequency.info/top5000.asp instead of the current word list, we'd lose the following 364 words:
( lost )
and we'd gain the following 399:
( gained )
The lists aren't the same length because the two sources treat words containing apostrophes differently; I've removed apostrophe-containing terms from both lists. Note also that the COCA list categorises words by part of speech, so this comparison is a bit approximate: it might exclude lexemes that score pretty highly across lots of parts of speech but aren't in the top 1000 in any single one. If I have time I'll find a way to combine them.
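For the curious, here is a rough sketch of how a comparison like this could be done in Python. It is not the script I actually used; the function names (`clean`, `compare`, `top_lemmas_combined`) and the `(lemma, part_of_speech, frequency)` tuple layout are assumptions made up for illustration, and the words in the usage example are toy data, not the real lists. The last function shows one possible way to combine parts of speech: sum each lemma's frequency across all of them before ranking.

```python
from collections import defaultdict


def clean(words):
    """Lower-case, de-duplicate, and drop any term containing an apostrophe."""
    return {
        w.lower()
        for w in words
        if "'" not in w and "\u2019" not in w  # straight and curly apostrophes
    }


def compare(current, candidate):
    """Return (lost, gained) when switching from `current` to `candidate`."""
    current, candidate = clean(current), clean(candidate)
    lost = sorted(current - candidate)    # in the current list only
    gained = sorted(candidate - current)  # in the candidate list only
    return lost, gained


def top_lemmas_combined(entries, n):
    """Rank lemmas by frequency summed over every part of speech.

    `entries` is an iterable of (lemma, part_of_speech, frequency) tuples;
    this layout is an assumption, not the actual format of the COCA download.
    """
    totals = defaultdict(int)
    for lemma, _pos, freq in entries:
        totals[lemma.lower()] += freq
    # Highest total first; ties broken alphabetically so the result is stable.
    ranked = sorted(totals, key=lambda w: (-totals[w], w))
    return ranked[:n]


# Toy data only, to show the shape of the inputs and outputs.
lost, gained = compare(
    ["a", "can't", "rocket", "the"],
    ["the", "a", "people", "don't"],
)
combined = top_lemmas_combined(
    [("run", "v", 5), ("run", "n", 4), ("big", "j", 8), ("cat", "n", 6)],
    2,
)
```

With the toy data, "run" outranks "big" once its verb and noun counts are added together, which is exactly the kind of lexeme a per-part-of-speech ranking could miss.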

