I'd be interested to see where they got their wordlist from. The UG5 rubric is "the ten hundred words people use the most often", which, if interpreted too literally, would include a lot of Chinese.
The COCA list seems rather more formal than the UG5 list - the UG5 list is more "everyday" in its concepts, and includes more profanity.
Unfortunately it is impossible to compose a thought corpus; I suspect that private thoughts account for most word use. Therefore I think getting the UG5 task precisely right is impossible. However, I think a speech corpus would be good. The web has created an explosion of informal written English - emails, the blogosphere, etc. - there should be something in that.
The other "feel" issue, cheating slightly, is that the 1000 words used most by people with small vocabularies may differ from the 1000 most common words used by people with big vocabularies.
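A rough sketch of how one could check that hunch, assuming you had a plain-text corpus for each group. The function names `top_words` and `overlap` and the two toy strings are made up for illustration; they have nothing to do with how the UG5 or COCA lists were actually built.

```python
import re
from collections import Counter


def top_words(text, n):
    """Return the n most frequent lowercase word tokens in text."""
    words = re.findall(r"[a-z']+", text.lower())
    return [word for word, _ in Counter(words).most_common(n)]


def overlap(list_a, list_b):
    """Fraction of the distinct words in list_a that also appear in list_b."""
    if not list_a:
        return 0.0
    set_a = set(list_a)
    return len(set_a & set(list_b)) / len(set_a)


# Toy stand-ins for a small-vocabulary and a big-vocabulary corpus (hypothetical data).
small_vocab = "the cat sat on the mat and the cat ate the food"
big_vocab = "the feline reclined upon the mat and the feline devoured the victuals"

top_small = top_words(small_vocab, 5)
top_big = top_words(big_vocab, 5)
shared = overlap(top_small, top_big)
```

With real corpora you would use n = 1000 and look at which words fall outside the shared part, since those are the ones that change the "feel" of the list.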
"Oh, Google knows everything, let's ask Google. Aha!" Apparently it is a contemporary fiction word list held on Wiktionary.