
Effective AI Assisted Software Development

By jdabulis

(Written by a human)

Like it or not, you've adopted the use of AI in software development: 90% of developers now report using AI at work. But are you doing it the right way? This article makes recommendations about the scope and strategy for effectively engaging in low-risk, high-productivity, AI-assisted software development (AIAD). It also discusses how disciplined AIAD differs from "vibe coding" and describes how AI-driven development fundamentally changes the practice of software engineering. First, let's consider the most obvious task we can put AI to work on within our software development workflow-- writing program code.

Why AI Can Code

As of 2025, you can easily spot that an article was AI generated because it will, with insufferable frequency, contain certain ghastly constructs such as "And here's the kicker..." or the formulaic "It's not just X... it's Y!" Once you recognize the patterns of LLM-generated text, they will begin to annoy you. But the very reason AI prose is tiresome may be why AI is so useful for code generation. The offending text is still syntactically correct standard written English, and its content conveys meaning relevant to the topic; criticisms of AI writing are then merely matters of style. AI-generated English is famously boring and predictable.

But isn't "boring and predictable" essentially the goal of all our coding standards? We want code to be readable, self-documenting, and comprehensible, using standard naming conventions, etc. Coding standards, whatever you think of them, have the goal of producing homogeneous code, so that we cannot recognize the author of the code simply by looking at it. Most code is actually unimaginative. Design patterns, in some cases overused, drive the implementation of algorithms. Between coding standards, best practices, and design patterns, much of the work of writing code is reduced to identifying the "formula" appropriate for the desired result and generating the least astonishing code required. Identifying and applying design patterns, the time and space complexity of data structures, and the idioms, syntax, and standards of programming languages are all well within the capability of LLMs-- and also beneath the concern of experienced software developers. Therefore, we think AIAD is justified and even recommended.

Stochastic Slop Gadgets

A literal incarnation of the philosophical zombie, an LLM generates code and all other documents using autoregressive next-token prediction. Your prompt is parsed and broken into "tokens"-- fragments of text that might be words, parts of words, or individual characters. Each token is converted to a high-dimensional vector that represents its meaning in an abstract space. An attention mechanism looks at all previous tokens in the context, calculates how much weight to give each one, and combines information from the relevant tokens. Each representation then passes through dense neural network layers-- on the order of 100 in current models-- that transform and refine it. The final layer produces a probability for each possible next token in the vocabulary-- tens of thousands of candidates-- and the model samples from this distribution with a degree of randomness. All of this takes immense processing power.
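As a toy illustration of that final step, the sketch below (the candidate tokens and scores are made up for the example) converts raw scores into a probability distribution with a softmax and then samples from it:

```javascript
// Convert raw scores ("logits") into probabilities that sum to 1.
function softmax(logits) {
  const max = Math.max(...logits); // subtract the max for numeric stability
  const exps = logits.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Pick a token index from the distribution; `rand` in [0, 1) supplies the
// "degree of randomness" described above.
function sampleToken(probs, rand) {
  let cumulative = 0;
  for (let i = 0; i < probs.length; i++) {
    cumulative += probs[i];
    if (rand < cumulative) return i;
  }
  return probs.length - 1;
}

const candidates = ["return", "const", "if", "while"]; // invented mini-vocabulary
const probs = softmax([2.1, 1.3, 0.4, -0.5]);          // invented scores
console.log(candidates[sampleToken(probs, 0.1)]);      // a low draw lands on the likeliest token: "return"
```

A real model does this over a vocabulary of tens of thousands of tokens, once per generated token, which is where the processing cost comes from.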

During training, the model has learned statistical patterns of which tokens tend to follow which other tokens in code. The model can only see a limited number of previous tokens-- the context window. This is why models lose track of details in very long conversations. Since it is predicting likely next tokens based on patterns, it can generate code that looks syntactically correct but will not compile. It might "hallucinate" API functions and signatures that do not exist, or that exist (or used to exist) in libraries it isn't even referencing. It cannot actually run the code it generates in a sandbox and evaluate the results, as a human would. Patterns of syntax it saw many times during training-- on billions of lines of code ingested from GitHub, Stack Overflow, and many other websites and code repositories-- are strongly favored. After seeing trillions of tokens, the model has learned which sequences are "code-like."

Since the LLMs you will be using all work the same general way-- as sophisticated pattern matchers, NOT reasoning engines-- you must expect them to choke on novel or complex problems and not to synthesize their "knowledge" in novel ways. An LLM performs like a very strange junior developer who, with perfect memory of billions of lines of code but no understanding of any of it, responds plausibly and confidently to nearly any request. In the future, LLMs may gain the ability to execute the code they generate. But as of the current year, we need to apply reasoning to this basic understanding of their limitations in order not to produce a lot of unusable, time-wasting garbage-- or, just as bad, a lot of actually usable but dangerous and ill-advised garbage.
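Because the model cannot execute what it emits, the verification loop falls to us. A minimal harness along these lines (the names and test cases are invented for illustration) runs a generated function against known input/output pairs before we trust it:

```javascript
// Run a candidate function against known cases; return the failures.
function verify(fn, cases) {
  const failures = [];
  for (const { args, expected } of cases) {
    let actual;
    try {
      actual = fn(...args);
    } catch (err) {
      failures.push({ args, expected, error: String(err) });
      continue;
    }
    if (actual !== expected) failures.push({ args, expected, actual });
  }
  return failures; // an empty array means every case passed
}

// Suppose the model generated this clamp function for us:
const clamp = (x, lo, hi) => Math.min(Math.max(x, lo), hi);

const failures = verify(clamp, [
  { args: [5, 0, 10], expected: 5 },
  { args: [-3, 0, 10], expected: 0 },
  { args: [42, 0, 10], expected: 10 },
]);
console.log(failures.length === 0 ? "all cases passed" : failures); // prints "all cases passed"
```

The point is not the harness itself but the habit: generated code is a hypothesis until it has been run.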

Decoupling Design from Implementation

Given that we have a justification for AIAD and knowledge of how AI generates code, we want to use it wisely-- by not asking it to perform higher-level activities requiring engineering judgment and critical decision making. This means limiting its participation to code generation: we design the structure of application modules (e.g. classes) before writing code, because AI is going to write the code, and we're going to give it the signature and purpose of each method we want it to write. Agile processes, with the exception of TDD (which has other problems), tend to guide human developers toward coupling the tasks of design and implementation. The result is having to choose between accumulating technical debt and frequent refactoring, both of which are expensive. If we don't do architecture carefully, and first, before coding, we are bound to hit a local maximum of velocity due to an accumulation of technical debt. Counter to current agile practice seen in the wild, we shouldn't select the simplest thing that could possibly work when it comes to design. When we work with AI to generate the code, we expect it to implement our designs. The time saved by not writing this code by hand should be invested upfront in careful design, so as not to lose those savings to deep refactorings along the way.
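As a hypothetical example of this hand-off, the signature and documented purpose below are what we would supply; the body is what we would expect the model to fill in against that contract (the function and its domain are invented for illustration):

```javascript
/**
 * Compute the total price of an order line in integer cents.
 * @param {number} quantity       - whole units ordered, >= 0
 * @param {number} unitPriceCents - price per unit, in integer cents
 * @param {number} discountPct    - discount in the range 0..100, applied to the subtotal
 * @returns {number} total in integer cents, rounded to the nearest cent
 */
function lineItemTotalCents(quantity, unitPriceCents, discountPct) {
  // Body as generated to the contract above.
  const subtotal = quantity * unitPriceCents;
  return Math.round(subtotal * (1 - discountPct / 100));
}

console.log(lineItemTotalCents(3, 199, 10)); // 3 × 199 = 597; 10% off → 537
```

The contract, not the body, is where the engineering happened; the body is cheap to regenerate if the contract holds.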

Many of us have practiced object-oriented design for so many decades that it's as difficult now to imagine non-OO software design as it once was, for those of us old enough to remember, to switch from prior analysis methods; accordingly, our designs have featured deep class hierarchies and middleware-driven dependencies. I'm sorry, but the future is going to look more like the past in this regard, because we will be working with LLMs which have no real understanding of the design of large software systems, much less domain knowledge. As a matter of sanity when working with models to create code that we expect the same or future models to replace or maintain, we should consider returning to design practices of the distant past, such as top-down/procedural styles. This sort of software design often required a greater volume of code that was less expressive and less elegant than object-oriented code; with AI code generation, however, we need no longer worry about these drawbacks. For what it's worth, the value of class inheritance in OOD has in hindsight been seriously questioned, as evidenced by the design of newer languages such as Rust and Go. We predict that "code maintainability" as a high-priority goal will also be deprecated in favor of "code replaceability." AI-generated code will have many comments-- leave them in. Remember the context window, the LLM's short little "attention span": the LLM will be able to use these comments when generating future modifications to the code.
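A sketch of what that top-down style looks like in practice (the domain and names here are invented): a flat pipeline of small, independently replaceable functions rather than a class hierarchy:

```javascript
// Each step is a free function with no shared state, so any one of them
// can be handed back to the model and regenerated without touching the rest.
function parseOrderLine(line) {
  const [sku, qty] = line.split(",");
  return { sku: sku.trim(), qty: Number(qty) };
}

function priceOrder(items, priceList) {
  return items.reduce((total, { sku, qty }) => total + priceList[sku] * qty, 0);
}

function formatReceipt(totalCents) {
  return `TOTAL $${(totalCents / 100).toFixed(2)}`;
}

// The top-level procedure reads like the design document.
function processOrder(lines, priceList) {
  return formatReceipt(priceOrder(lines.map(parseOrderLine), priceList));
}

console.log(processOrder(["A1, 2", "B2, 1"], { A1: 150, B2: 300 })); // TOTAL $6.00
```

Nothing here is clever, and that is the point: "replaceable" beats "elegant" when the maintainer is a pattern matcher with a short attention span.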

In AI-assisted software development, it is engineering and design that become the important tasks of the software engineer. With disciplined AIAD, the time we save by not manually writing code should be invested in the prerequisite activities: selection of the technology stack (frameworks, programming languages, database, network protocols, message brokers, deployment targets, etc.) and software architecture (component/module interaction, logging, authorization, dependency injection, etc.). This is still the domain of engineers-- we should not (at least not yet) trust a "mindless" pattern-matching token generator to make important design decisions, and we certainly don't employ "no code" platforms and frameworks. That said, it is useful to ask AI many questions in service of making such decisions, as it is good at comparing and contrasting candidate approaches. But after this is accomplished, all that's left is the coding, which itself is rather boring, algorithmic, and replete with stilted rubrics and the abstruse idioms of particular development languages.

One way to think of AI, in its current state of the art, is as an English-to-programming-language compiler. Compilers don't produce original or imaginative bytecode or assembly language, and that's a good thing. If we can expect AI to produce understandable code that functions reliably, efficiently, and according to spec, then it's an indispensable tool that can be enlisted to bang out code while we focus on higher-level concerns such as *intention*, architecture, and domain-level rules.

Empirical Results and Pitfalls

What are the actual results of using AIAD in practice? A few of the papers written about it are quoted below, and they cite others you can read.

DORA State of AI-assisted Software Development

  • "AI can help individuals by handling boilerplate and other rote scaffolding, surfacing plausible options quickly, providing highly problem-specific output, summarizing and synthesizing large swaths of disparate information, and completing higher-order tasks like design, planning, and analysis."
  • "Treat your AI adoption as an organizational transformation."
  • "...people are likely learning to offload mundane, tedious, and repetitive tasks to AI and spend more time on problem-solving, design, and creative work."

Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity

  • "...we find that allowing AI actually increases completion time by 19%—AI tooling slowed developers down."
  • "Speedup forecasts from 34 economics experts and 54 machine learning experts overestimate speedup even more drastically than developers, predicting AI will lead to decreases in implementation time of 39% and 38%, respectively"
  • "Developers accept <44% of AI generations... Majority report making major changes to clean up AI code"

Practical Recommendations

General Advice

  • When prompting an AI session, keep the context window as small as practical by starting new sessions as necessary.
  • Mention in your prompt that you strongly favor an "I don't know" response for answers with a low confidence level.
  • Add language like the following to your prompts: "If the information is uncertain or unavailable, say 'I don't know.'" If it looks like you're getting the best answer from a few years ago, add "Search the web for the most recent data."
  • Just as some developers scratched thinking off their list of things to do when GitHub became a thing, don't be the one who needs the LLM for the too-simple or the too-complex; write the damned thing yourself.
  • Make your requests as clear, simple, and unambiguous as possible. Address only one problem or feature at a time.
  • When AI provides code, ask it for the unit tests that correspond to the code offered.
  • In OO designs, favor composition over inheritance, limit class hierarchy depth, and provide the AI with any base classes and interfaces it is expected to use.
  • Classes should be as decoupled from one another as possible. Consider any request to change a piece of code as a request to replace that piece of code, and let AI do the replacement.
  • Have the same understanding of every line of code that AI produces as if you had written it yourself. If anything is unclear, ask it to explain what it generated.
  • When prompting a model to make changes to code to fix a bug, for example, it will often present several options. If you don't have a good understanding of what the code is doing and how it's doing it (the root cause), you won't know which of these options to select, and you'll end up in an even worse situation.
  • If you reach a dead end where the model is producing defective code, try asking a different LLM to get a second opinion.
  • Decompose large problems into simpler sub-problems and integrate manually.
    The JavaScript function below does only one thing and has no side effects:

        function argbToRgbHex(argbInt) {
            return "#" + (argbInt & 0x00FFFFFF).toString(16).padStart(6, '0');
        }
  • LLMs are good at generating simple things such as record types from SQL scripts and even validation logic for DTOs based on types, inferring meaning from the property names.
  • Don't trust LLM answers requiring specialized or domain-specific knowledge that you yourself don't possess.
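As an illustration of the DTO-validation point above, the sketch below (with naming conventions invented for the example) derives checks purely from property names, which is exactly the kind of rote inference an LLM handles well:

```javascript
// Validate a DTO by convention (assumed conventions, for illustration):
// properties ending in "Email" must look like an email address;
// properties ending in "Count" must be non-negative integers.
function validateDto(dto) {
  const errors = [];
  for (const [key, value] of Object.entries(dto)) {
    if (key.endsWith("Email") && !/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(String(value))) {
      errors.push(`${key}: not a valid email address`);
    }
    if (key.endsWith("Count") && (!Number.isInteger(value) || value < 0)) {
      errors.push(`${key}: expected a non-negative integer`);
    }
  }
  return errors; // empty array means the DTO is valid
}

console.log(validateDto({ contactEmail: "a@b.co", retryCount: 3 }).length);  // 0
console.log(validateDto({ contactEmail: "oops", retryCount: -1 }).length);   // 2
```

Mechanical rules like these are tedious to write and trivial to review, which is the sweet spot for delegation.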

Avoid Vibe Coding

If you like tech debt, you're gonna love vibe coding. Vibe coding can be most succinctly defined as using AI to generate an entire application at once, instead of limiting its use to generating small, simple, and independent pieces of code and then assembling them, like building blocks, according to a plan or architecture. You've probably seen videos of people relying fully on AI to do this with minimal human intervention-- accepting suggestions wholesale and trusting AI to correct any problems itself. While vibe coding can accelerate simple prototypes or throwaway projects, it introduces unacceptable risks in production systems.

Could you use vibe coding for rapid prototyping? Possibly. But the proof-of-concept prototypes we produce in rapid application development are more than UI mockups, and they're a preliminary step along the way. In vibe coding, that prototype is your production product. "I just see stuff, say stuff, run stuff, and copy-paste stuff, and it mostly works" is a strategy for long-term failure. You've used AI at too high a level and given it many inappropriate tasks. Remember, it provides only the illusion of understanding. It has none. Enjoy your AI slop.

Advice For New Projects

  • Consider all the constraints on your possible design, the budget, security requirements, performance, deadlines, acceptable vendors, external systems with which you must interface, etc. Research the answer to "How will I..?" and "What will I use to..?" questions. There will be technologies you *must* use due to policy or for practical reasons, those you prefer to use, and those you disfavor. For example, you might decide to build your application around Microsoft Azure, C#, .NET, Redis, Postgres, Dart, Flutter, Typescript, HTMX, etc.
  • Ask AI to assist in making these decisions; it's good at comparing and contrasting concepts and patterns, as well as listing candidate products and libraries.
  • Ask for a reference architecture for a system or design pattern implementation. Compare it to the one you've designed yourself first.
  • Ask it for its opinion about your design approaches and to offer alternatives.
  • Include a unit tests project in your design. Use AI to write tests and add them to this project.
  • Let AI assist in setting up your project/repository. Much of the work for adding the projects, folders, startup and configuration code in support of your design can be effected in seconds. Test the build. Use AI to solve any initial version mismatches and missing library problems. (npm, pip, PEAR, etc.)

Advice For Existing Projects

  • Use AI tools such as GitHub Copilot to locate classes or other modules of interest in your codebase by functional description
  • Use AI to document program files and algorithms. "Self-documenting" code be damned, now it's free-- let AI describe what the existing code is doing.
  • Use AI to analyze existing code for thread safety, potential bugs, uncaught exceptions, and null references
  • Do not directly apply the changes it suggests; apply them manually, and only to code for which unit tests exist.
  • Request test classes and methods for existing code.
  • Below we have a class implementing "enhanced" enum functionality-- it can implicitly convert to int and string as required. What we need are unit tests for it:
                
    public class Proximities {
        public enum Level {
            NotApplicable,
            VeryNear,
            Near,
            Far
        }
        public static readonly Kvp<Level> NotApplicable = new Kvp<Level>("NotApplicable", Level.NotApplicable);
        public static readonly Kvp<Level> VeryNear = new Kvp<Level>("VeryNear", Level.VeryNear);
        public static readonly Kvp<Level> Near = new Kvp<Level>("Near", Level.Near);
        public static readonly Kvp<Level> Far = new Kvp<Level>("Far", Level.Far);
    }
    public readonly record struct Kvp<TEnum>(string Label, TEnum Value) where TEnum : Enum {
        public override string ToString() => Label;
        public static implicit operator TEnum(Kvp<TEnum> kvp) => kvp.Value;
        public static implicit operator string(Kvp<TEnum> kvp) => kvp.Label;
        public static implicit operator int(Kvp<TEnum> kvp) => Convert.ToInt32(kvp.Value);
    }

    The following xUnit test class was generated:
                
    public class ProximitiesTests {

        [Fact]
        public void NotApplicable_Kvp_HasCorrectLabelAndValue() {
            Assert.Equal("NotApplicable", (string)Proximities.NotApplicable);
            Assert.Equal(Proximities.Level.NotApplicable, (Proximities.Level)Proximities.NotApplicable);
        }

        [Fact]
        public void VeryNear_Kvp_HasCorrectLabelAndValue() {
            Assert.Equal("VeryNear", (string)Proximities.VeryNear);
            Assert.Equal(Proximities.Level.VeryNear, (Proximities.Level)Proximities.VeryNear);
        }

        [Fact]
        public void Near_Kvp_HasCorrectLabelAndValue() {
            Assert.Equal("Near", (string)Proximities.Near);
            Assert.Equal(Proximities.Level.Near, (Proximities.Level)Proximities.Near);
        }

        [Fact]
        public void Far_Kvp_HasCorrectLabelAndValue() {
            Assert.Equal("Far", (string)Proximities.Far);
            Assert.Equal(Proximities.Level.Far, (Proximities.Level)Proximities.Far);
        }

        [Theory]
        [InlineData(Proximities.Level.NotApplicable, "NotApplicable")]
        [InlineData(Proximities.Level.VeryNear, "VeryNear")]
        [InlineData(Proximities.Level.Near, "Near")]
        [InlineData(Proximities.Level.Far, "Far")]
        public void Kvp_ToString_ReturnsExpectedFormat(Proximities.Level level, string label) {
            var kvp = new Kvp<Proximities.Level>(label, level);
            Assert.Equal(label, kvp.ToString());
        }
    }