Use LLMs for Translation and Fallible Reasoning
I keep seeing AI skeptic takes like "AI is useless" or "LLMs are only good at Natural Language Processing", usually because of hallucinations and AI slop. Both are real problems. But that conclusion still misses the point: it misidentifies what LLMs are actually good at.
My take is: LLMs are great at translation (in the broad sense), and they have emergent but fallible reasoning. Knowledge, in the "just tell me what's true" sense, is not their strong suit, despite how convincing they can sound.
Translation: what LLMs are actually for
Translation was one of the original NLP tasks that motivated the architecture behind LLMs (the Transformer was literally introduced for translating between natural languages). But "translation" is a lot broader than Spanish-to-English: it's taking information in one form and turning it into another form.
Once you start looking for it, a huge number of practical problems are just translation tasks. Programming is translating from "what I want" into working code, which is why generative AI feels most obviously useful there.
Question-answering with Retrieval-Augmented Generation (RAG) is also translation. The LLM gets context from somewhere else (retrieved snippets) and turns it into the shape of answer the user asked for. That's not "the model knows"; it's "the model can rewrite and synthesize what's in front of it."
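To make the "translation, not knowledge" framing concrete, here's a minimal RAG sketch. Everything here is illustrative: the retriever is a toy keyword-overlap scorer, the corpus is made up, and `build_prompt` stops at assembling the translation task; a real system would send this prompt to an actual model.

```python
# Minimal RAG sketch: the LLM's job is to *translate* retrieved text
# into an answer, not to "know" the answer itself.

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank snippets by keyword overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(corpus, key=lambda s: -len(q_words & set(s.lower().split())))
    return scored[:k]

def build_prompt(query: str, snippets: list[str]) -> str:
    """Assemble the translation task: context in, answer shape out."""
    context = "\n".join(f"- {s}" for s in snippets)
    return (
        "Answer using ONLY the context below. If the context is "
        "insufficient, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\nAnswer:"
    )

corpus = [
    "The Transformer architecture was introduced in 2017 for machine translation.",
    "RAG pipelines pass retrieved documents to the model at inference time.",
    "Sourdough bread relies on wild yeast for leavening.",
]
query = "What was the Transformer introduced for?"
snippets = retrieve(query, corpus)
prompt = build_prompt(query, snippets)
```

The instruction "using ONLY the context below" is the whole point: it turns an open-ended knowledge question into a bounded rewriting task.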
As an aside: consumer tools like ChatGPT and Claude often do pull in web or other data sources behind the scenes. So yes, you can ask factual questions and often get decent answers. But that's because the product is a bundle of systems (retrieval + tools + orchestration), not a naked LLM sitting in a box.
Reasoning: the surprising emergent capability
The big surprise was that this translation machine could also do reasoning. Emergent abilities appear as models scale up, showing capabilities that weren't explicitly trained for. The most plausible story I've heard is that the training data contains lots of examples of human reasoning: explanations, proofs, arguments, step-by-step solutions. The model learns those patterns and can use language to "think through" problems, imitating step-by-step progress. Techniques like Chain-of-Thought prompting leverage this by explicitly encouraging the model to show its work.
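Chain-of-Thought prompting is simple enough to sketch: prepend a worked example (and an instruction to reason step by step) to the question. The example problem and wording below are hypothetical, not taken from any particular paper.

```python
# Chain-of-Thought sketch: nudge the model to emit intermediate steps
# before the final answer, imitating the step-by-step reasoning
# patterns present in its training data.

def cot_prompt(question: str) -> str:
    """Build a one-shot CoT prompt from a worked example plus the question."""
    example = (
        "Q: A farmer has 3 pens with 4 sheep each. 2 sheep escape. "
        "How many remain?\n"
        "A: 3 pens x 4 sheep = 12 sheep. 12 - 2 = 10. The answer is 10.\n"
    )
    return f"{example}Q: {question}\nA: Let's think step by step."

prompt = cot_prompt("A train travels 60 km/h for 90 minutes. How far does it go?")
```

The model is left to complete the text after "Let's think step by step.", which is exactly where the imitated reasoning shows up.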
But it's not formal reasoning: decades-old computational systems use provable algorithms to guarantee correctness, and LLMs don't have that guarantee.
That said, human reasoning is also a messy mix of pattern-matching and approximation. Heuristics and biases research, from Tversky and Kahneman's foundational work onward, shows that humans use mental shortcuts and pattern matching when reasoning, leading to systematic errors. Human reasoning isn't formal and provable either. It's approximate and fallible.
LLMs often make mistakes that rhyme with human reasoning failures: confirmation bias, jumping to conclusions, missing edge cases. They can conflate content plausibility with logical validity, accepting plausible-sounding arguments that aren't logically sound. Still, they're remarkably good: like smart humans reasoning in domains they know well, they perform impressively despite making errors.
It's worth noting that both humans and LLMs use external tools to extend their reasoning. Humans use pen and paper, calculators, and formal systems for reasoning they can't do reliably in their heads. LLMs can be given similar tools: calculators, code execution environments, formal logic systems, or retrieval augmentation. The parallel suggests the gap between human and LLM reasoning might not be as fundamental as it first appears.
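The tool-use parallel can be sketched as a dispatch loop: the model decides to delegate, and exact arithmetic goes to a calculator instead of being "reasoned" in-context. The model is stubbed here as a hard-coded function, and all names are illustrative; real tool use works through structured tool-call outputs from an actual model.

```python
# Tool-use sketch: delegate exact computation to a tool rather than
# trusting the model's in-context arithmetic.
import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calculator(expr: str) -> float:
    """The 'tool': a safe arithmetic evaluator built on the ast module."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval"))

def fake_model(question: str) -> dict:
    """Stub: a real model would emit a structured tool call here."""
    return {"tool": "calculator", "args": "17 * 23 + 4"}

def run(question: str) -> float:
    """Dispatch loop: route the model's tool call to the actual tool."""
    call = fake_model(question)
    if call["tool"] == "calculator":
        return calculator(call["args"])
    raise ValueError(f"unknown tool: {call['tool']}")

result = run("What is 17 * 23 + 4?")  # exact arithmetic, no guessed digits
```

The design point is the division of labor: the model only has to translate the question into a tool call, and the tool supplies the guarantee the model lacks.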
Taxonomies of Understanding
Another lens on what LLMs are good at is Bloom's Taxonomy, which categorizes levels of learning and understanding from basic to advanced:
Bloom's Revised Taxonomy
- Remembering: This is the most basic level, and LLMs are oddly shaky here. The classic hallucination is a confident-sounding "fact" that's just wrong.
- Understanding: Organizing and summarizing information. They're excellent at this.
- Applying: They can often solve problems in new-but-analogous domains. In software engineering, translating between specs and code is where they shine. In less-structured work, "apply" can be hit-or-miss but often still useful.
- Analyzing: They're strong at proposing possible causes, motivations, and structure, especially as a starting point.
- Evaluating: Mixed results, especially when evaluating their own output. In production systems, "evals" (and sometimes one model critiquing another) help, but it's not magic.
- Creating: Originally called "Synthesis" by Bloom. This is another place where LLMs often disappoint: superficial synthesis is easy, but genuinely new, coherent synthesis is harder than it looks.
It's interesting that LLMs excel at the intermediate levels of this taxonomy but fail more often at the bottom and top levels. That's not a human-like pattern. LLMs are not human intelligences.
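The "Evaluating" point above is worth one concrete sketch: a minimal eval harness that scores candidate answers against a rubric. The judge is stubbed as a keyword check, and all names and data are hypothetical; in production the judge would typically be a second model call with a written rubric.

```python
# Minimal eval-harness sketch: score candidate answers against a
# rubric, here stubbed as required/forbidden keyword checks.

def judge(answer: str, must_contain: list[str], must_avoid: list[str]) -> bool:
    """Stub judge: a real harness would prompt a second model with a rubric."""
    text = answer.lower()
    return (all(word in text for word in must_contain)
            and not any(word in text for word in must_avoid))

candidates = {
    "good": "The Transformer was introduced for machine translation.",
    "hedgy": "I am not sure, possibly image classification?",
}
scores = {name: judge(ans, ["translation"], ["not sure"])
          for name, ans in candidates.items()}
# scores -> {"good": True, "hedgy": False}
```

Even this toy version shows why evals aren't magic: the rubric encodes what "good" means, and a weak rubric (or a weak judge model) passes weak answers.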
What this means
If you keep "translation + fallible reasoning" in your head, the practical guidance gets simpler.
Use them for translation-shaped tasks: converting between formats, generating code from descriptions, turning context into answers. Use them for reasoning when you can tolerate mistakes, when a human will review, or when the cost of being wrong is low. And whenever you can, let them use tools (retrieval, calculators, code execution) instead of pretending they have perfect internal knowledge.
What I try not to do: treat them as context-free knowledge bases, no matter how "sure" they sound. I also don't rely on them for formal correctness, or for situations where one bad answer is a real problem. And for the most creative work that requires deep, coherent synthesis, people are still better.
Critically, understand whether you're working with a raw LLM or a multi-system product that compensates for these limitations. Know what you're working with, and match the tool to the problem.