I really enjoyed the article, reading it more from the perspective of what 21st-century lexicography could be, less as a customer of a word game however thoughtfully designed. As a Wiktionary editor (and Android user who's also grown out of bare word-relationship puzzle games) though, it's sad that there seems to be no way to just use the end-product network as a reference, which I would love to do, but I suppose they did spend a million bucks on it.
I'll also use this post to wish that more people would edit Wiktionary. It has such a good mission (information on all words) and yet there are only like 80 people editing on any given day or whatever. In some languages, it's even the best or most updated dictionary available. The barriers to entry and bureaucracy are really not high for HN audience types.
> it's sad that there seems to be no way to just use the end-product network as a reference, which I would love to do, but I suppose they did spend a million bucks on it.
From the OP: "This research and computational scale was made possible by $295k NSF SBIR seed funding (#2329817) and $150k Microsoft Azure compute resources." Does that NSF funding mean it's open source? Also, I'm not 100% sure that the quote applies to all the research rather than just one component of it.
> I'll also use this post to wish that more people would edit Wiktionary. It has such a good mission (information on all words) ...
I support open source, contribute to it, and love the spirit of Wiktionary, but I don't understand the practical reality of applying 'wisdom of the crowds' to a dictionary, especially the English edition, for two reasons:
Definitions are highly accurate (complete, correct, consistent), highly precise things - otherwise, what is their value? Assuming Wiktionary is descriptive - reporting the words' actual usage - it takes quite a bit of scholarship, skill, and editorial resources not to mislead people. I can't just write what I think it means - the meaning to me might not match the meaning to the person at the next desk. It takes quite a bit of research, using powerful (and sometimes expensive) tools, and understanding of lexicography to be complete and also precisely correct, including usages in places and times that are mostly unknown to any particular author. Also, writing definitions is tricky: You are using words - which have those aforementioned problems with meaning - to define words. Also, any writing anywhere can be easily misinterpreted - skill and editors are needed to avoid misunderstanding. How is the accuracy and precision problem solved?
Also, in English there are already many authoritative sources, many with a century of professional lexicography behind them by the best in the business. Some are free. There are also meta-lookup engines such as Wordnik and OneLook. Why use Wiktionary? The few times I've compared definitions or etymologies, the authoritative sources almost always exceed or equal Wiktionary (though online copies of older print editions suffer from the minimalism caused by the constraint of printing costs). Arguably, there is nothing else both unabridged and free: Oxford unabridged costs $, so does Merriam-Webster (the free edition is abridged); American Heritage is free, but has the minimalism issue I mentioned above.
"Why use Wiktionary?"
I can answer that one. I have free access to the Oxford English Dictionary (OED), which is brilliant and generally more detailed and reliable than Wiktionary when it has the word I'm looking for, but their login page is so awful that I sometimes use en.wiktionary.org instead just to save my time and temper. Also, en.wiktionary.org has proper nouns, other languages, and occasionally it has some recent or technical English word that OED does not have. So if I'm doing some serious amateur research: OED. But if I'm doing a crossword and want to check that a word exists and is spelt how I think it is: Wiktionary.
> their login page is so awful
I've used the OED login page: username, pw, [] keep me logged in. What is so awful?
I'm one of those people who says, unironically, "words have meanings." I readily argue with people who present "language is living and evolves" - sure, but in order to communicate we have to agree on a decent subset of overall definitions.
I enjoy etymology, maybe too much. It's like magic, finding out what a barrow was, or how filibuster has a direct lineage to pirates (freebooters... in Dutch).
I can't afford, really, the nicer Old English, Scandinavian, Frisian, Norse, etc. etymology dictionaries. I have incomplete scans that were printed and bound of some of them. I still have 6 etymology dictionaries, so I can be about as quick getting a dictionary as getting on the computer and going to !eo.
> in order to communicate we have to agree on a decent subset of overall definitions.
Sociologically speaking, however, it is precisely that agreement that evolves, alongside changes in spelling and pronunciation (and occasionally "new" words).
>I'm one of those people who says, unironically, "words have meanings." I readily argue with people who present "language is living and evolves" - sure, but in order to communicate we have to agree on a decent subset of overall definitions.
A few things.
>we have to agree on a decent subset of overall definitions.
Yes, but we should fairly obviously understand that a word can have multiple, often competing meanings, and make an effort to learn the new ones as they become available.
As language shifts, and it's shifted rapidly in my own lifetime, you can either make an effort to keep up, or be a sourpuss and refuse to understand changes in language.
It seems to me there's usually a political dimension to people who refuse to understand what people mean, because it's easier to denigrate people if they cling to definitions that aren't intended by their political opponents' use of a word.
I see this shit constantly, mind. Gender. Liberty. Capitalism. Communism. People get stuck fighting useless battles over the right to define a word instead of just learning and embracing their opponent's intention.
> It seems to me there's usually a political dimension to people who refuse to understand what people mean, because it's easier to denigrate people if they cling to definitions that aren't intended by their political opponents' use of a word.
and to an extent, the rest of your comment - the solution, according to my PhD friend, is to establish the framing of the argument before you actually have the argument. It's more fun to not establish framing, but it's more effective to establish framing first. I wonder if I have the publication (thesis?) he made on my NAS.
Yeah absolutely. I tend to just use definitions when I want someone to get my meaning rather than hotly contested words.
I don't think definitions "are" highly accurate precise things. Sometimes yes. The same scholarship, skill, and need to not mislead also applies for so many other things: encyclopedic articles, taxonomies, news, maps, operating systems. Do people still question the value of Wikipedia, OpenStreetMap? Yeah, there are problems with them, and with peer review. Using fuzzy words (or fuzzy phonetic symbols, fuzzy categories, fuzzy semantic links…) to define words is a problem (if at all) of literally any dictionary. I don't see any of these as particularly unique obstacles for Wiktionary.
Unabridged dictionaries take decades to release new editions and are still navigating transition into the exploding digital age. They are so expansive in scope, while often so limited in resources, and barely accept any crowd contributions. Such deliberately slow-going is often a good thing, but words also change quite quickly and these sources are now playing a very long game of catch-up. (Yesterday I tried to verify the latter English senses of "fandango" on Wiktionary with other dictionaries; OED's entry has not been touched for 131 years! What am I going to do with that, I need to use / understand the word now!)
Wiktionary is the big web-native word-resource (and is not cluttered with commercial junk) – allowing links, expandable quotes, images, diagrams, etc. that print's minimalism suffers from as you mention. When someone in 2025 wants information on a word, they'll likely use a search engine and click a link to Wiktionary (where Google blurbs steal some data from). Maybe they are a student wanting to confirm their nonstandard pronunciation with the IPA (still rarely used in mainstream English dictionaries) or if it's recognized in their own dialect (mainstream dictionaries rarely provide more than UK and US pronunciations) – if enough people have the same question, Wiktionary seems like the best place to put the answer – or see an accessible etymology tree. While you probably know this, it's also worth reminding that English Wiktionary isn't just for English words, it is a dictionary of all languages' words, which is written in English. It has metadata and links connecting languages' words that you can't find elsewhere.
Yes, I indeed do want people to just write what they think a word means – as a starting point in a collaborative refining process. I believe the number of word-users in the world with valuable potential contributions is a lot closer to a billion than the thousand gatekeepers working hard on classical dictionaries. The barrier to entry is really low, but the tooling could still be much better. This is one reason I'm putting my appeal under this article - because I think (professional) lexicography can stand to evolve more in the 21st century. (And are people today really buying enough dictionaries to sustain a professional version of Wiktionary, or even a professional dictionary offered in structured data form?) If we don't contribute to a crowdsourced dictionary, then we won't have any such thing.
(Meta-lookup sites are link/search engines, not dictionaries and IME really don't do a good job synthesizing their information or conventions.)
Wiktionary can be of great value without denigrating others.
> Unabridged dictionaries take decades to release new editions and are still navigating transition into the exploding digital age.
OED is now a 100% online service - a website - that releases updates every quarter, like much software. I don't see them 'still navigating' at all.
> barely accept any crowd contributions.
OED is famous for being arguably the first crowd-sourced research project. James Murray, the first great editor and driving force behind the first edition, solicited contributions from the public of usages of words and had a massive filing system of slips with all the contributions.
"Dictionary work relied on so much correspondence that a post box was installed right outside Murray’s Oxford home ...". "His children (eventually there were eleven) were paid pocket money to sort the dictionary slips into alphabetical order upon arrival." [0]
Today OED still solicits contributions, including specific appeals to the public. Every entry in the OED has a 'Contribute' button.
https://www.oed.com/information/using-the-oed/contributing-t...
> (Yesterday I tried to verify the latter English senses of "fandango" on Wiktionary with other dictionaries; OED's entry has not been touched for 131 years! What am I going to do with that, I need to use / understand the word now!)
You are misunderstanding what 'revise' means to the OED (which is unnecessarily confusing); they still update entries without a full revision. If you look at the entry history:
fandango, n. was first published in 1894; not yet revised.
fandango, n. was last modified in March 2025.
> I don't think definitions "are" highly accurate precise things. Sometimes yes. The same scholarship, skill, and need to not mislead also applies for so many other things: encyclopedic articles, taxonomies, news, maps, operating systems. Do people still question the value of Wikipedia, OpenStreetMap?
I think there's a difference between requirements - or expectations - for a dictionary and Wikipedia:
My guess is that people don't question Wikipedia because they have different expectations for it: They don't expect accuracy, as defined by the Three Cs: Completeness, Correctness, Consistency. Wikipedia is more the accumulation of information generally believed about a topic (with some standards, imperfectly followed, for secondary source support - but secondary sources reflect general, consensus belief). It's not expected to be Complete; no encyclopedia can completely cover any topic - the point is to be a starting place, a summary - and anyway Wikipedia is a sort of work in progress. It's not expected to be Correct; it's what people generally believe. And Consistency is tough with so many authors. It's really a product of the post-truth era; that's what people want - just try questioning it.
People's expectation for dictionaries - or my expectation at least :) - is not a starting point but the final word. Almost always I already have an idea of what the word means - from partial knowledge, from experience, from context, from its components. I'm expecting the Three Cs from the dictionary, to put a fine point on my understanding and use of the word, to fill in my blind spots - including knowledge of how others have been understanding and using the word.
Maybe Wiktionary just isn't for me. But I worry that people do assume it's CCC - many people believe anything they read is accurate, especially something from an authoritative-looking source - and are confused by it.
[0] https://www.oed.com/information/about-the-oed/history-of-the...
Could I make a plea to make a Wiktionary export easier to find/use? Assuming I can even find the magical page which hosts them, Wikipedia dumps are terribly documented and seem to incorporate shorthand which I do not recognize.
And they are full of wiki markup, templates, and inconsistent formatting. A human brain can easily understand it, but automated parsing is impossible (pre LLM).
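To be fair, the simplest cases are tractable without an LLM. Here is a rough Python sketch of pulling flat `{{name|arg|key=value}}` template calls out of a line of wikitext. It assumes templates are not nested and that `|` and `=` never appear inside an argument, which real entries frequently violate; the function name `parse_templates` and the sample line are made up for illustration and not copied from any actual entry or dump.

```python
import re

# Matches only innermost {{...}} calls: no braces allowed inside.
# Assumption: nested templates are out of scope for this sketch.
TEMPLATE_RE = re.compile(r"\{\{([^{}]*)\}\}")

def parse_templates(wikitext):
    """Return a list of (name, positional_args, named_args) tuples."""
    results = []
    for match in TEMPLATE_RE.finditer(wikitext):
        parts = [p.strip() for p in match.group(1).split("|")]
        name, positional, named = parts[0], [], {}
        for part in parts[1:]:
            if "=" in part:
                # key=value argument; split on the first "=" only
                key, value = part.split("=", 1)
                named[key.strip()] = value.strip()
            else:
                positional.append(part)
        results.append((name, positional, named))
    return results

# Hypothetical sample line, invented for this example.
sample = "From {{inh|en|enm|barowe}}, from {{der|en|ang|bearwe|t=basket}}."

for name, positional, named in parse_templates(sample):
    print(name, positional, named)
```

This only gets you the raw template arguments. Knowing what each template and its parameters mean, and coping with nesting and inconsistent formatting, is the hard part being complained about above.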
>I'll also use this post to wish that more people would edit Wiktionary.
If it's anything like Wikipedia, there is probably a reason more people aren't working on it, and it's because the existing people discourage it.
I get the impulse to assume they'd be alike, but I've found that Wiktionary really isn't much like Wikipedia.
The Wiktionary equivalent of citing sources confuses me.
https://en.wiktionary.org/wiki/Wiktionary:Criteria_for_inclu...
Which words should be attested? Presumably only uncommon ones? And how is it done, is the "quotes" section the attestation? Is there vandalism to clean up, like people adding their own names to define themselves as awesome? Wiktionary seems to "just work", and I don't really understand what holds it together.
I have a feeling that LLM model collapse will be accelerated as humans lose control of smaller Wiki projects like Wiktionary.
They’ll be unable to effectively patrol or prevent generative updates to the project, and for all intensive porpoises, humans will be unwilling to step foot into disputes, and AI will have free reign to redefine all human knowledge.
I second that! I have edited a few Wiktionary pages myself, and find it's a better overall environment than Wikipedia, if you can find something meaningful to add.