Why Machine Translators Still Stumble Over Idioms in 2026: A Guide for Writers, Students, and Translators

Picture a translator feeding a line of Hemingway into a polished AI tool. The grammar comes back flawless. The tense is correct. The names are spelled right. And yet, something is off. The rhythm is gone. A phrase that once shimmered with meaning now lies flat on the page.

This is the quiet crisis of machine translation in 2026. Raw accuracy has soared. Figurative accuracy has not.

For readers of this site, that gap is more than a technical footnote. It is the difference between a simile that breathes and a simile that limps. It is the reason a Japanese haiku in English can feel like a weather report, or why a Spanish proverb in French reads like a stage direction. If you write, translate, or teach literature, the machines are now good enough to tempt you and still not good enough to trust.

This article explains why that is, what the latest research shows, and what has quietly changed in the last eighteen months that gives figurative language a fighting chance.

The Walk in the Park Problem

Start with the English phrase “it’s a walk in the park.” To an English speaker, it means easy. To a literal translator, it means ambulatory exercise in a landscaped public space.

That is the core of what linguists call the non-compositional problem. The meaning of an idiom cannot be built from the meanings of its parts. And as a 2024 academic study titled It’s Not a Walk in the Park pointed out, even top translation systems still render many idioms word by word, losing the figurative sense entirely.

This problem scales with every layer of figurative complexity. Research on the idioms in literature drawn from English classics shows that many depend on cultural context that does not exist in the target language. A French reader will recognise “coûter les yeux de la tête” (to cost the eyes of the head) as “to cost a fortune.” An AI system, untrained on that precise cultural mapping, may render it as a description of ocular removal.

Idioms are only the beginning. Metaphors, similes, hyperbole, personification, and allusion all create the same obstacle. They are language speaking at two levels at once, and machines have historically understood only the first.

Five Idioms That Machines Get Wrong

Writers working across languages rarely need a full technical explanation. They need to see the failure. Here are five well-known English expressions and the kind of errors commonly produced when translated naively into other languages.

  1. “Break a leg.” Rendered into German by an early-2020s neural model as a literal instruction to cause a fracture. The theatrical good-luck sense vanishes.
  2. “It’s raining cats and dogs.” Translated into Japanese as a meteorological description of falling animals. No local idiom for sudden heavy rain is selected.
  3. “Spill the beans.” Frequently translated into French as a literal action with legumes, rather than the equivalent “vendre la mèche.”
  4. “The ball is in your court.” Rendered into Arabic as a sports commentary, losing the sense of decisional responsibility.
  5. “Once in a blue moon.” Translated into Chinese as a description of lunar colour rather than rarity.

Each failure looks small on its own. Across a full literary text, they accumulate into something much larger. They strip out what metaphors and figurative comparisons do in the first place, which is carry meaning beyond the literal.

What Actually Happens Inside an AI Translator

To understand why this keeps happening, it helps to know what an AI translator is actually doing.

Neural machine translation systems and Large Language Models both work by predicting the most statistically likely output given a source. They have read billions of sentences and learned which words tend to follow which. This works beautifully for plain prose. It works poorly for figurative prose, because the likeliest literal word is almost never the right figurative word.
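To make that instinct concrete, here is a toy sketch. Nothing below is a real model: the candidate words and their probabilities are invented for illustration, but they show why a greedy, probability-driven choice defaults to the literal word.

```python
# Toy illustration, not a real model: the probabilities are invented.
# Candidate French continuations a model might weigh when rendering
# "spill the beans", with made-up scores standing in for model likelihoods.
candidates = {
    "haricots": 0.46,  # literal "beans" -- statistically common, so it wins
    "mèche":    0.31,  # the word needed for the idiomatic "vendre la mèche"
    "grains":   0.14,
    "secret":   0.09,
}

# Greedy decoding: always take the single most probable continuation.
literal_choice = max(candidates, key=candidates.get)
print(literal_choice)  # -> "haricots": frequency beats figurative sense
```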

A 2024 arXiv paper, Crossing the Threshold: Idiomatic Machine Translation, put it bluntly: idiomatic expressions tend to be translated literally by state-of-the-art systems. The authors found that retrieval-augmented techniques could improve idiomatic accuracy by up to thirteen percentage points, a significant gain that also shows how far baseline systems still fall short.
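The paper's exact pipeline is not reproduced here, but a minimal sketch of the general retrieval-augmented idea looks something like the following. The IDIOM_GLOSSARY, the build_prompt() helper, and the prompt wording are hypothetical stand-ins for a real idiom index and prompting setup.

```python
# Minimal sketch of the general retrieval-augmented idea, not the paper's
# actual pipeline. The glossary and helper below are hypothetical.

IDIOM_GLOSSARY = {
    "spill the beans": "vendre la mèche",
    "a walk in the park": "un jeu d'enfant",
}

def build_prompt(sentence: str, target_lang: str = "French") -> str:
    """Retrieve known idiom equivalents and hand them to the model as hints."""
    hints = [
        f'"{src}" -> "{tgt}"'
        for src, tgt in IDIOM_GLOSSARY.items()
        if src in sentence.lower()
    ]
    hint_block = "\n".join(hints) if hints else "(no idioms found)"
    return (
        f"Translate into {target_lang}, preserving figurative meaning.\n"
        f"Known idiom equivalents:\n{hint_block}\n\n"
        f"Sentence: {sentence}"
    )

print(build_prompt("Don't spill the beans before the launch."))
```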

There is a second complication. The standard evaluation metric, known as BLEU, measures surface similarity between an AI translation and a reference translation. As recent research on figurative translation notes, BLEU often fails to detect semantic shifts in idioms and metaphors. A translation can score well on BLEU and still butcher the figurative meaning.
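A small, invented example makes the blind spot visible. Using NLTK's sentence-level BLEU, a literal rendering of “a walk in the park” still earns n-gram credit from the words around the idiom, even though the figurative sense is gone. The French sentences here are made up for the demonstration.

```python
# Invented example of BLEU's blind spot, using NLTK's sentence-level BLEU.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Reference uses the idiomatic French equivalent ("child's play");
# the machine output renders "a walk in the park" literally.
reference  = "le projet a été un jeu d'enfant pour elle".split()
literal_mt = "le projet a été une promenade dans le parc pour elle".split()

smooth = SmoothingFunction().method1  # avoids zero scores on short sentences
score = sentence_bleu([reference], literal_mt, smoothing_function=smooth)
print(f"BLEU for the literal rendering: {score:.2f}")
# The untouched words around the idiom ("le projet a été ... pour elle")
# still earn n-gram credit, so the score stays far from zero even though
# the figurative meaning has been lost.
```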

This is why writers have learned, sometimes painfully, that the cleanest-looking AI output is not always the most faithful one. The visible errors are easy. The invisible ones are the ones that matter when you are translating a novel, a poem, or any of the other forms catalogued in studies of literary devices.

The New Rule: Asking 22 Models Instead of One

Until recently, most online translation tools worked by routing a user’s input through one AI model. One opinion. One guess. If that model had a blind spot for Spanish proverbs or Russian diminutives, the user inherited the blind spot.

Something changed around 2024. Instead of asking one AI model, a newer class of tools began asking many and treating the question as one of agreement rather than authority.

MachineTranslation.com is one such AI translator. Its SMART system compares outputs from up to twenty-two different AI models, including ChatGPT, Claude, Gemini, DeepL, and several others, and selects the rendering the majority agrees on. The idea is simple. A single model can hallucinate in isolation. Twenty-two models rarely hallucinate in the same direction at the same time.
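The SMART system's internals are proprietary, so the sketch below is not its algorithm. It only illustrates the general consensus idea: gather candidates from several engines and keep the one the rest of the field agrees with most, so a lone literal rendering gets outvoted.

```python
# Generic illustration of consensus selection, not the SMART system's
# proprietary method: keep the candidate that agrees most with the others.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def pick_consensus(candidates: list[str]) -> str:
    # Score each candidate by its average similarity to every other candidate.
    def agreement(i: int) -> float:
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(similarity(candidates[i], o) for o in others) / len(others)
    best_index = max(range(len(candidates)), key=agreement)
    return candidates[best_index]

outputs = [
    "Il a vendu la mèche avant la réunion.",         # idiomatic
    "Il a renversé les haricots avant la réunion.",  # literal outlier
    "Il a vendu la mèche avant la réunion.",         # idiomatic
    "Il a vendu la mèche juste avant la réunion.",   # idiomatic, slight variant
]
print(pick_consensus(outputs))  # the literal outlier is outvoted
```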

Intento's State of Translation Automation 2025 report found that single top-tier LLMs still plateau at roughly 84 to 87 percent accuracy on high-resource European languages and drop further on morphologically complex ones. Internal benchmarks from MachineTranslation.com show that a consensus approach lifts that figure to 93 to 95 percent across those same European languages and, crucially, reduces outright hallucinations from the 10 to 18 percent range to under two percent.

For figurative language, the implication is direct. If one model translates a Japanese pun literally and seven others recognise the figurative layer, the literal reading is outvoted. The errors that used to slip through, because no one was there to contest them, now get contested twenty-one times.

Case in Point: What the Joke-Test Revealed

One of the harder tests for any translation system is humour. A joke usually depends on wordplay, cultural reference, or timing, which is the same stack that makes idioms and metaphors difficult.

In an internal benchmark, MachineTranslation.com ran thirty jokes through eight different translation engines and compared the outputs. The jokes translated using the platform’s candidate-term display and human review preserved the original humour in thirty-two percent more cases than default single-engine outputs.

Thirty-two percent is not a total solution. It means that roughly a third of the jokes that would have died in translation survived instead. For anyone translating comic fiction, children’s literature, or even a witty corporate tagline, that is the difference between shipping something that lands and shipping something that confuses.

The same pattern holds for poetic text, advertising copy, and any content where the meaning lives in the tone rather than the dictionary definition.

What This Means for Writers, Students, and Translators

None of this replaces a skilled human translator. The best literary translation is still a craft built on years of reading, listening, and cultural immersion. What has changed is the baseline.

For writers working with bilingual material, a consensus-based AI tool is now a reasonable first draft rather than a warning label. For students analysing translated poetry, the modern AI output is a more trustworthy reference point than it was even two years ago, though still not a final word. For professional translators, the role is shifting from producing every sentence from scratch to arbitrating between competing outputs and polishing the one that best carries the figurative weight. Comparing the machine's draft against the similes drawn from classic literature that a translator already knows remains essential, because the machine still does not understand why Homer's pine tree matters.

The 2025 Slator Pro Guide describes this shift as a move from task-level execution to outcome-driven supervision. Translators are no longer typists. They are editors of machine drafts, and the better those drafts are, the more attention translators can give to what only humans can do: the final one percent of nuance, cultural adaptation, and emotional resonance.

A Short Checklist Before Trusting Any AI Translation of Literary Text

For anyone using AI tools on text that contains idioms, metaphors, or other figurative devices, a few quick checks go a long way.

  • Run the passage through more than one model and compare. If they disagree sharply, the figurative layer is probably being lost. MachineTranslation.com does this automatically by showing the twenty-two-model comparison side by side.
  • Back-translate the output into the source language. If the back-translation drifts far from the original, something in the meaning layer has broken; a rough version of this check is sketched after the list.
  • For any idiom, search the target language for a known equivalent before accepting the AI version. Many idioms have direct counterparts that machines still miss.
  • For any metaphor, ask whether the target-language image works culturally. A “black sheep” in English is a “lost sheep” in some Romance cultures, a fox in others, and a pariah with no animal reference at all in certain Asian languages.
  • When the text matters, layer human verification on top. A verified translation is still the only way to guarantee full figurative fidelity.
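For the back-translation check in the list above, a rough sketch follows. The translate() function is a placeholder for whatever engine or API you actually use, and the 0.6 similarity threshold is a rule of thumb rather than a calibrated cut-off.

```python
# Rough sketch of the back-translation check from the checklist above.
# translate() is a placeholder for your preferred translation tool or API.
from difflib import SequenceMatcher

def translate(text: str, source: str, target: str) -> str:
    raise NotImplementedError("plug in your preferred translation tool here")

def survives_back_translation(original: str, source: str, target: str,
                              threshold: float = 0.6) -> bool:
    forward = translate(original, source, target)
    back = translate(forward, target, source)
    ratio = SequenceMatcher(None, original.lower(), back.lower()).ratio()
    return ratio >= threshold  # False: meaning has likely drifted, review by hand
```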

Figurative language is not a solved problem. It may never be fully solved, because it is the place where language stops being information and starts being art. But the gap between what machines could do in 2022 and what they can do in 2026 is real. Consensus architectures, retrieval-augmented models, and better evaluation metrics are closing it faster than most writers realise.

The next time a translator feeds a line of Hemingway into an AI tool, the output may still not match the original. But the odds that twenty-two models collectively preserve the rhythm, where one model alone would have flattened it, are higher than they have ever been.

That is progress worth paying attention to.
