
TL;DR
In our quest to generate effective code suggestions with LLMs at Qodo Merge, an AI-powered tool for automated pull request analysis and feedback, we discovered that prioritization isn’t enough.
Initially, we told the model to prioritize finding bugs and problems in pull requests while still allowing lower-priority suggestions for style, code smells, and best practices. This approach proved ineffective: despite the explicit prioritization, the model got overwhelmed by the easier-to-spot style issues, leaving developers flooded with low-importance suggestions.
The breakthrough came when we stopped asking for different priority levels altogether and instead limited the model to looking only for meaningful bugs and problems. This laser-focused approach improved bug detection rates and the signal-to-noise ratio, resulting in code suggestions that developers actually wanted to implement, with suggestion acceptance rates jumping 50% and overall impact increasing by 11% across pull requests.
The key takeaway – sometimes the best strategy with LLMs isn’t to prioritize or add complicated instructions, but to eliminate distractions entirely.
Initial approach: casting a wide net
Our initial strategy was ambitious. The model was instructed to prioritize critical issues in its review process, looking first for major problems before considering style-level improvements. Each review would contain up to 5 suggestions, prioritized by severity:
Primary Concerns:
- Critical bugs and problems
- Security vulnerabilities
Secondary Improvements:
- Code style
- Best practices
- Potential enhancements
This hierarchical review system seemed robust and effective. We leveraged modern prompting techniques like YAML-structured output formats to ensure consistency and machine-parseable responses, with each suggestion containing well-defined metadata fields.
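To make this concrete, here is a minimal sketch of what such a prioritized, YAML-structured review prompt could look like. The schema fields, severity scale, and the wrapper function are hypothetical illustrations, not Qodo Merge’s actual prompt or code:

```python
# Hypothetical sketch of a prioritized review prompt with a YAML output schema.
# Field names and the helper function are illustrative, not Qodo Merge's actual code.

REVIEW_PROMPT = """You are reviewing a pull request diff.
Provide up to 5 suggestions, ordered by severity:
- First, critical bugs, problems, and security vulnerabilities.
- Then, if room remains, code style, best practices, and potential enhancements.

Return YAML in exactly this format:
suggestions:
  - summary: <one-line description>
    category: <bug | security | style | best_practice | enhancement>
    severity: <critical | major | minor>
    relevant_file: <path>
    suggested_change: <short code snippet or instruction>
"""

def build_review_messages(diff: str) -> list[dict]:
    """Assemble the chat messages sent to the model for a single PR diff."""
    return [
        {"role": "system", "content": REVIEW_PROMPT},
        {"role": "user", "content": f"PR diff:\n{diff}"},
    ]
```

The structured output makes each suggestion easy to parse and display, but, as described next, it does nothing to keep the model’s attention on the top of the priority list.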
The unexpected challenge: style suggestion overflow
In practice, we encountered an interesting phenomenon. Despite our careful prompt and flow engineering strategy, the model consistently gravitated toward style-related suggestions. For example:
- Variable naming conventions
- General exception handling
- Minor refactoring opportunities
- Documentation improvements
While these suggestions weren’t incorrect, the abundance of minor findings often drowned out genuinely important ones. In addition, the model would sometimes misclassify style improvements as major concerns. The core problem wasn’t the model’s ability to spot serious bugs; it was that these easier-to-detect style issues dominated the output.
We tried to combat this overflow by adding explicit instructions like “don’t suggest documentation improvements” to our prompt, but this proved futile. The sheer volume of potential style improvements was overwhelming. Even a single mention of ‘code enhancement suggestions’ in our instructions seemed to hijack the model’s attention: there were simply too many possible minor improvements for the model to maintain its focus on truly critical issues.
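For illustration, the exclusion-rule approach amounted to appending negative constraints to the same base prompt. The wording below is a hypothetical reconstruction, not our actual instructions:

```python
# Hypothetical reconstruction of the exclusion-rule approach (it did not help much).
EXCLUSION_RULES = """
Additional constraints:
- Do not suggest documentation improvements.
- Do not suggest variable renaming or purely cosmetic refactoring.
"""

def add_exclusions(base_prompt: str) -> str:
    """Append negative constraints to the base review prompt."""
    return base_prompt + EXCLUSION_RULES
```

Each new rule blocked one family of minor suggestions while leaving countless others available, so the overall balance of the output barely changed.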
The signal-to-noise problem
The ‘casting a wide net’ approach revealed significant challenges:
- Developers struggled to locate critical issues buried within numerous style suggestions
- The high volume of suggestions led to “suggestion fatigue,” where developers started ignoring the feedback altogether rather than going through numerous style fixes to find the few critical issues.
Beyond adding explicit exclusion rules to our prompts, we implemented both importance scoring and suggestion categorization (‘bugs’, ‘enhancements’, ‘best practices’). But even this proved insufficient – the labels and the importance score weren’t accurate enough to truly separate critical issues from nice-to-have improvements.
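A simplified sketch of that scoring-and-labeling stage is shown below. The field names, the 1–10 importance scale, and the threshold are hypothetical; the point is that the filter can only be as good as the labels the model assigns:

```python
# Hypothetical post-processing that relies on the model's own category labels
# and importance scores. Field names and the threshold are illustrative.
import yaml

CRITICAL_CATEGORIES = {"bug", "security"}
MIN_IMPORTANCE = 7  # assumed 1-10 scale assigned by the model

def filter_critical(raw_yaml: str) -> list[dict]:
    """Keep only suggestions the model labeled as critical and important."""
    suggestions = yaml.safe_load(raw_yaml).get("suggestions", [])
    return [
        s for s in suggestions
        if s.get("category") in CRITICAL_CATEGORIES
        and s.get("importance", 0) >= MIN_IMPORTANCE
    ]
```

In practice, a filter like this inherits the model’s misclassifications, so style improvements tagged with a high importance score still slip through.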
User feedback revealed a fundamental truth: we needed to show only critical ‘problem-finding’ suggestions by default – everything else was often seen as noise.
The breakthrough: laser focus on problems and bugs
What ultimately worked was a paradigm shift in our approach. Instead of trying to manage priorities across different types of suggestions, we simplified the model’s task to a single focus: identifying only meaningful problems that could result in bugs and issues in production. Our new prompt was ruthlessly focused:
“Only give suggestions that address major problems and bugs in the pull request code. If no relevant suggestions are applicable, return an empty list”
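A minimal sketch of the focused variant, again with a hypothetical output schema and wrapper (only the core instruction is taken from the prompt above), looks like this:

```python
# Hypothetical sketch of the focused prompt. Only the core instruction comes
# from the post; the output schema and wrapper function are illustrative.
FOCUSED_PROMPT = """Only give suggestions that address major problems and bugs
in the pull request code. If no relevant suggestions are applicable, return an
empty list.

Return YAML:
suggestions: []  # or a list of {summary, relevant_file, suggested_change}
"""

def build_focused_messages(diff: str) -> list[dict]:
    """Single-purpose prompt: no priority tiers, no style categories."""
    return [
        {"role": "system", "content": FOCUSED_PROMPT},
        {"role": "user", "content": f"PR diff:\n{diff}"},
    ]
```

Arguably, the explicit permission to return an empty list matters here: it removes any pressure on the model to pad the output with minor suggestions when nothing serious is wrong.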
The results were dramatic:
- The acceptance rate of each code suggestion increased by 50%
- The impact level, meaning the percentage of pull requests where at least one suggestion was applied, increased by 11%
The power of this approach lies in its singular focus. Rather than explicitly excluding certain types of suggestions, we simply directed the model’s attention entirely toward finding meaningful code problems.
This clarity of purpose had two key benefits:
(1) it eliminated the complexity of evaluating and prioritizing different types of suggestions, and (2) it prevented the model from being overwhelmed by the sheer volume of potential style improvements that could otherwise dilute its attention from critical issues.
Code best practices: a parallel solution
While maintaining our focused approach to problem detection, we developed a separate solution for style and best-practice suggestions.
Rather than attempting to enforce every possible best practice (an almost infinite set), we enabled teams and organizations to define their own standards. Qodo Merge now operates through two distinct routes: one dedicated to finding critical problems, and another for evaluating code against company-specific best practices. Teams can also utilize our learning system that analyzes which suggestions developers actually implement – either to establish an initial set of best practices, or to continuously refine and enhance future recommendations.
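As an illustration of the dual-route idea, the sketch below dispatches two independent prompts per pull request. The file name, function signatures, and prompt wording are hypothetical, not Qodo Merge’s actual interface:

```python
# Hypothetical sketch of the two-route design: one prompt hunts for critical
# problems, the other checks the diff against team-defined best practices.
# File names and function signatures are illustrative only.
from pathlib import Path

def load_team_best_practices(repo_root: str) -> str:
    """Read team-defined standards, e.g. from a markdown file in the repo."""
    path = Path(repo_root) / "best_practices.md"
    return path.read_text() if path.exists() else ""

def build_review_routes(diff: str, repo_root: str) -> dict[str, str]:
    """Return one prompt per route; each runs as an independent request."""
    routes = {
        "critical_problems": (
            "Only give suggestions that address major problems and bugs "
            "in the pull request code. If none apply, return an empty list.\n\n"
            f"PR diff:\n{diff}"
        ),
    }
    practices = load_team_best_practices(repo_root)
    if practices:
        routes["best_practices"] = (
            "Check the PR diff only against the team's best practices below. "
            "Ignore anything not covered by them.\n\n"
            f"Best practices:\n{practices}\n\nPR diff:\n{diff}"
        )
    return routes
```

Keeping the two routes as separate requests means the problem-finding prompt stays as narrow as before, while best-practice feedback can be tuned, or switched off, per team.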
The success of this dual approach validated our strategy: developers trust and implement our critical bug suggestions because they know each one is, with high probability, a genuine problem that needs fixing, while teams still have the flexibility to maintain their own coding standards through configurable best practices.
Final thoughts: the power of scope constraint
This separation reflects a broader insight: sometimes the best strategy isn’t to create complex prioritization schemes or scoring systems – it’s to ruthlessly eliminate scope until you’re left with only what truly matters. This might seem counterintuitive in an era where AI promises to handle increasingly complex tasks. However, our experience suggests that the path to better AI assistance lies not in just adding more complexity, but in being more thoughtful about what we ask these systems to do in the first place.
The principle extends beyond code review: instead of trying to handle everything at once, we can handle each important aspect well, but separately. When working with LLMs, this focused approach yields better results than attempting to juggle multiple priorities simultaneously.