I recently ran into a task that none of the LLMs I tried could solve. It should be super simple for an LLM, but somehow they can’t manage it.
There’s a list of about 1000 words. I need to keep only the most functional words from it, like which, should, would, etc.
Request: I have a list of words: …. Select only 50 words from this list that are primarily functional and carry minimal meaning in the context of keyword searches (for example, which generate significant noise in the case of partial matches). Example – which, shall, very. Do not add any words not present on the list above. The resulting list should contain only words, one word per line.
ChatGPT-4o: started outputting words in alphabetical order and stopped at the word asking, going no further.
Google Gemini: began inventing words not in the list, despite clear instructions not to do so.
Google Gemini Pro: produced a list, but again with invented words; almost half of them weren’t on the list.
Anthropic Claude: also listed words alphabetically and stopped at words starting with the letter d.
Mixtral 8x7B Instruct (Mistral): also made up half of its words.
In short, none of the LLMs I tried managed the task. And it’s about words, not mathematics.
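Both kinds of failure described here (invented words and lists cut off early) can be detected mechanically. Below is a minimal Python sketch of such a check. It is not something from the experiment itself: the function name check_selection, the lowercase normalization, and the toy data are all assumptions made for illustration.

```python
def check_selection(source_words, llm_output, expected_count=50):
    """Check an LLM answer against the constraints stated in the request.

    Hypothetical helper: the name and the lowercase normalization are
    illustrative assumptions, not part of the original experiment.
    """
    # Words the model is allowed to return (the original list).
    allowed = {w.strip().lower() for w in source_words if w.strip()}

    # The request asks for one word per line, so split on lines.
    selected = [ln.strip().lower() for ln in llm_output.splitlines() if ln.strip()]

    return {
        # Did the model return exactly the requested number of words?
        "count_ok": len(selected) == expected_count,
        # Words that do not appear in the source list at all.
        "invented": [w for w in selected if w not in allowed],
        # Lines that contain something other than a single word.
        "not_single_word": [w for w in selected if len(w.split()) != 1],
        # Words returned more than once.
        "duplicates": sorted({w for w in selected if selected.count(w) > 1}),
    }


# Toy data standing in for the real 1000-word list and a model reply.
source = ["which", "should", "would", "asking", "database", "very"]
answer = "which\nshould\nnevertheless\nvery"

report = check_selection(source, answer, expected_count=3)
print(report)
```

With the toy data, the report flags nevertheless as invented and the count as wrong (four words returned instead of three). A check like this only verifies the formal constraints; whether the selected words are really the most functional ones still needs a human look.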
