
How AI Learned a Complex Coding Language Nobody Taught It

Artificial intelligence (AI) models usually learn programming languages by absorbing huge amounts of training data. But what happens when the language is rare and the examples are scarce?

A new study shows that feedback may matter more than data. Researchers found that a language model could learn to write correct programs in the little-known coding language Idris, even though it had barely encountered it before.

By repeatedly feeding compiler error messages back to the model and asking it to fix its own mistakes, the team boosted its success rate from 39 percent to 96 percent.

The result suggests that AI systems can push far beyond the limits of their original training when clear signals reveal exactly what went wrong.

Where the system stumbled

Across a set of coding exercises written in the language Idris on the platform Exercism, the gap between what the model first produced and what the language demanded became immediately visible.

Working from the University of Southern California (USC) Viterbi School of Engineering, Minda Li captured each compiler complaint and fed it back to the model as it revised its own answers.

The model hits a wall

On shared exercises, GPT-5 solved 90 percent in Python and 74 percent in Erlang, but only 39 percent in Idris. Those numbers matter because the AI model already knew how to code well when the language was common and forgiving.

Idris, a dependently typed language with an unusually strict compiler, punished missing names, incomplete branches, and mismatched types before deeper logic could even surface. Early failure in Idris therefore said less about raw intelligence than about getting trapped by strict local rules.

Li first tried gentler fixes, including platform feedback, a self-written error manual, and official language documentation. Those additions nudged the model upward, but none pushed overall Idris success past 61 percent.

Static advice helped the model remember syntax, yet it still missed the exact reason a specific answer failed. That ceiling revealed a harder lesson: broad guidance was weaker than immediate, case-by-case correction.

AI fixed mistakes with feedback

The winning loop started locally, where a compiler – software that checks and translates code – could flag each broken line. Li sent those error messages back to GPT-5, asked for a fix, and repeated the cycle as many as 20 times.
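
A minimal sketch of that loop might look like the Python below. The helper names, the `ask_model` callable standing in for GPT-5, and the `idris2 --check` invocation are illustrative assumptions, not the study's actual implementation.

```python
import subprocess

MAX_ROUNDS = 20  # the study capped the repair loop at 20 attempts


def compile_idris(path: str) -> tuple[bool, str]:
    """Type-check an Idris 2 source file; return (success, compiler output)."""
    # `idris2 --check` type-checks the file without building an executable.
    result = subprocess.run(["idris2", "--check", path],
                            capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


def repair_loop(ask_model, task: str, path: str) -> bool:
    """Generate a solution, then feed compiler errors back until it compiles."""
    code = ask_model(f"Solve this Idris exercise:\n{task}")
    for _ in range(MAX_ROUNDS):
        with open(path, "w") as f:
            f.write(code)
        ok, errors = compile_idris(path)
        if ok:
            return True  # clean compile; hand off to the exercise's tests
        # Feed the exact error messages back and ask for a revision.
        code = ask_model(f"This Idris code failed to compile.\n"
                         f"Errors:\n{errors}\nCode:\n{code}\nPlease fix it.")
    return False  # still failing after the final round
```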

Before the final loop, Li expected only a modest improvement. Instead, success jumped to 96 percent.

“I was surprised that just that alone, seemingly one simple thing, just keep recompiling, keep trying, was able to get to 96 percent,” said Li.

The reason was simple: compiler messages pointed to the exact problem in each program. In the weak baseline runs, missing-name errors alone appeared 123 times, and Idris often could not tell which meaning the code intended.

Manuals could warn about common pitfalls, but only the compiler could say what failed in that exact program.

“Our AI tools are now able to transcend their initial training,” said Bhaskar Krishnamachari, a professor at the USC Viterbi School of Engineering.

Beyond software alone

This kind of precise feedback is not limited to computer programming. Many fields – including math proofs, legal reasoning, and other rule-heavy work – can clearly flag when a step is wrong.

In those situations, the same kind of correction loop could allow an AI system to revise its work before a person ever sees the first draft.
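
As a hedged sketch of that generalization (the interface and names here are illustrative, not from the study): any checker that returns a pass/fail verdict plus a precise message could drive the same loop.

```python
from typing import Callable, Protocol


class Verifier(Protocol):
    """Anything that can say pass/fail and point at the exact mistake:
    a compiler, a proof checker, a linter, a citation validator."""
    def check(self, attempt: str) -> tuple[bool, str]: ...


def revise_until_valid(draft: Callable[[str], str], verifier: Verifier,
                       prompt: str, rounds: int = 20) -> str | None:
    """Generic draft-check-revise loop; returns a valid attempt or None."""
    attempt = draft(prompt)
    for _ in range(rounds):
        ok, feedback = verifier.check(attempt)
        if ok:
            return attempt  # the first draft a person sees is already checked
        attempt = draft(f"{prompt}\n\nYour previous attempt failed:\n"
                        f"{feedback}\nRevise it.")
    return None  # the feedback was not sharp enough to converge
```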

The approach might even help AI translate low-resource human languages, where limited written material has long made it difficult to train reliable systems.

But for the method to work, the feedback must be clear, accurate, and tied to the exact mistake each time.

Not every problem was solved

The approach is not perfect. Two Idris problems still remained unsolved even after 20 rounds of corrections, showing that repeated fixes can sometimes create new mistakes.

The researchers also note that some Idris examples or coding patterns may already have appeared somewhere in the model’s training data.

If that happened, part of the model’s success could reflect memory rather than learning purely from feedback.

To understand how much real learning occurred, future tests would need brand-new problems written after the model’s training was complete.

Teaching AI to remember

Li now wants the system to carry lessons from one problem into the next instead of starting cold each time. At the moment, the method mostly pushes through by trial and error, even when the same kind of mistake keeps returning.

A model that remembers past fixes could need fewer retries and waste less compute, instead of falling into the same traps again and again. That shift would turn a clever repair trick into something closer to steady skill building.
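
One way to picture that kind of memory, purely as an illustration rather than anything proposed in the paper: cache repairs that worked, keyed by a normalized error signature, and surface them as hints before the next retry.

```python
import re

# Illustrative memory of past repairs, keyed by a coarse error signature.
fix_memory: dict[str, list[str]] = {}


def error_signature(compiler_output: str) -> str:
    """Normalize the first error line so recurring error kinds
    (e.g. 'Undefined name') match across files and line numbers."""
    first = compiler_output.splitlines()[0] if compiler_output else ""
    first = re.sub(r"\S+\.idr:\d+:\d+", "<loc>", first)  # drop file:line:col
    return re.sub(r"\d+", "<n>", first)                  # drop stray numbers


def remember_fix(errors: str, fixed_code: str) -> None:
    """Record a repair that worked for this class of error."""
    fix_memory.setdefault(error_signature(errors), []).append(fixed_code)


def recall_hints(errors: str, limit: int = 3) -> list[str]:
    """Fetch up to `limit` past fixes to prepend to the next repair prompt."""
    return fix_memory.get(error_signature(errors), [])[:limit]
```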

The work also suggests that AI may be less boxed in by missing data than by the absence of sharp, reliable correction. Better feedback will not solve every hard problem, but it could help AI reach smaller languages and stricter fields far more easily.

The study is published as a preprint on arXiv.


This article appeared in Earth.com (https://www.earth.com/news/how-ai-learned-a-complex-coding-language-nobody-taught-it/).