
AI won’t fix a broken process.

Toni · Co-founder, OpsKings · 6 min read

Every YouTube thumbnail in your feed is promising AI will replace your team by the end of the quarter. What actually happens when founders try? They prompt ChatGPT, get a generic result, get disappointed, and conclude AI isn’t ready. It’s not that AI isn’t ready. It’s that most people are skipping the three things that make it useful.

Why “hey AI, make me a website” never works

The default way people try to use AI in a business is to throw a broad prompt at it and press enter. “Write me a landing page.” “Build me a creative brief.” “Generate a marketing strategy.” Then they judge the AI by whatever generic output it returns.

The problem isn’t the AI. The problem is the instruction. You just asked it to do what a whole team normally does (research, scope, design, revise, approve) in one shot, with no examples, no context, and no definition of what “good” looks like. A brand-new human employee would fail that same prompt.

The founders who actually get useful AI output don’t have better models. They have better structure around the models. Three specific pieces:

  1. A defined process broken into small, individually describable outputs.
  2. A structured database the AI can read from and learn against.
  3. A human-in-the-loop QA step before anything goes to a client.

Miss any one of these and you’ll be generating noise. Build all three and AI genuinely changes how your business operates. Here’s how each one works.

Component 1: Break the process into small, definable outputs

Think about how a human team would actually deliver the thing you want AI to build. A website doesn’t come out of someone’s head as a finished product. It goes through initial research, creative brief, wireframes, low-fidelity design, client review, high-fidelity design, and handoff to development. Distinct stages. Distinct outputs. Each one feeds the next.

That’s exactly how you should feed AI too. Don’t ask it to build the website. Ask it to write the research doc. Then, separately, use that research to produce the creative brief. Then, separately, use the brief to propose wireframes. Each stage is a manageable ask with a clearly defined output format.
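
To make that concrete, here’s a rough sketch of what the staged structure looks like if you script it, in Python purely for illustration. The generate function is a stand-in for whatever model call or automation step you actually wire up, and the stage names just mirror the website example above.

```python
# Illustrative sketch only: each stage is a small, separately defined ask,
# and the output of one stage becomes an input to the next.

def generate(stage_name: str, instructions: str, inputs: dict) -> str:
    """Stand-in for your actual model call (OpenAI, a Make.com scenario, etc.)."""
    # In a real setup this would call the model with the stage brief plus inputs.
    return f"[{stage_name} output based on {list(inputs)}]"

stages = [
    ("research_doc",   "Write the initial research doc for this client."),
    ("creative_brief", "Turn the research doc into a creative brief."),
    ("wireframes",     "Propose wireframes based on the creative brief."),
]

context = {"client_goals": "More demo bookings", "brand": "Example Co"}

for name, instructions in stages:
    output = generate(name, instructions, context)
    print(f"--- {name} ---\n{output}\n")
    context[name] = output  # the next stage reads this stage's output
```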

For each small output, the AI needs three things in the prompt:

  1. The inputs. The specific information required to produce this output: client goals, brand context, research findings, whatever the prior stage delivered.
  2. A good example. “Here’s a research doc we did for a previous client that we liked.”
  3. A bad example (optional but useful). “Here’s one that missed — here’s why.”

Now you’re not asking AI to guess what “good research” means. You’re showing it. The output quality jumps dramatically, not because the model got smarter, but because the brief got tighter.
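
Here’s a minimal sketch of what that brief might look like when it’s assembled programmatically. Every name below is illustrative, not a prescribed format; the point is that the prompt carries the task, the inputs, and at least one example of “good”, not just the ask.

```python
def build_stage_prompt(task: str, inputs: dict, good_example: str,
                       bad_example: str | None = None) -> str:
    """Assemble a single-stage brief: the task, the inputs, and what 'good' looks like."""
    parts = [
        f"Task: {task}",
        "Inputs:",
        *(f"- {key}: {value}" for key, value in inputs.items()),
        "Example of an output we liked:",
        good_example,
    ]
    if bad_example:
        parts += ["Example that missed the mark (avoid this):", bad_example]
    return "\n".join(parts)

prompt = build_stage_prompt(
    task="Write the research doc for this client.",
    inputs={"client_goals": "More demo bookings", "industry": "B2B SaaS"},
    good_example="(paste a past research doc your team rated highly)",
    bad_example="(paste one that felt generic, with a note on why)",
)
print(prompt)
```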

If you can’t write this brief for AI, it’s usually because you haven’t defined the stage for your own team yet. The AI prompt is a forcing function for the real work: having a process worth automating.

Component 2: Build a database the AI can learn from

This is the step most founders skip, and it’s the one that separates “AI as a gimmick” from “AI as a real ops lever.”

Every output the AI produces should get stored. Not in a chat log you scroll back through, but in a structured database. One record per output, linked to the relevant client and project, with fields for the AI’s generated content, a quality score (1–10 or 1–5), notes from the reviewer on what worked and what didn’t, and any relevant KPIs from the downstream work.
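
As a loose illustration, one record per output could look something like this. The field names are examples, not a prescribed schema; in practice they’d be columns in Airtable or properties on a Notion database.

```python
from dataclasses import dataclass, field

@dataclass
class OutputRecord:
    client: str             # which client this output belongs to
    project: str            # which project or engagement
    stage: str              # e.g. "research_doc", "creative_brief"
    content: str            # the AI-generated output itself
    quality_score: int | None = None   # 1-10, filled in by the human reviewer
    reviewer_notes: str = ""           # what worked, what didn't
    kpis: dict = field(default_factory=dict)  # downstream results, if any

record = OutputRecord(
    client="Acme Co",
    project="Website redesign",
    stage="creative_brief",
    content="(generated brief text)",
)
```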

Over time, this database becomes a graded record of what “good” looks like for your specific business. Then you feed it back into the AI’s context on future runs. The AI isn’t generating in a vacuum anymore; it’s generating with your last 40 graded creative briefs as a reference for what your team considers acceptable work.
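
Feeding the database back in can be as simple as pulling the highest-scored past outputs for the same stage and pasting them into the next prompt as reference. A rough sketch, with plain dicts standing in for whatever your database actually returns:

```python
def best_examples(records: list[dict], stage: str, top_n: int = 3) -> list[dict]:
    """Return the highest-scored past outputs for a given stage."""
    scored = [r for r in records if r["stage"] == stage and r.get("quality_score")]
    return sorted(scored, key=lambda r: r["quality_score"], reverse=True)[:top_n]

past_records = [
    {"stage": "creative_brief", "quality_score": 9, "content": "Brief A ..."},
    {"stage": "creative_brief", "quality_score": 4, "content": "Brief B ..."},
    {"stage": "research_doc",   "quality_score": 8, "content": "Research C ..."},
]

references = best_examples(past_records, stage="creative_brief")
context_block = "\n\n".join(
    f"Reference (scored {r['quality_score']}/10):\n{r['content']}" for r in references
)
# context_block then gets appended to the next creative-brief prompt.
```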

This is the actual compounding part of AI in a business. The first 10 runs will be mediocre. The next 50 runs, with the feedback loop, get measurably better. By run 200, the AI is producing work that matches the quality and style of your best operator.

You don’t need to build this database from scratch. Airtable, Notion databases, or any structured project-management tool with relational linking works fine. What matters is discipline: every output gets stored, every output gets graded.

“If you can’t write a clear brief for the AI, it’s because you haven’t defined the process for your own team yet.”

– Toni

Component 3: A human always reviews before the client sees it

This is non-negotiable, especially in the first year of running AI in your workflow.

Every AI output gets reviewed by a human before it goes anywhere near a client. No exceptions. The reviewer grades it on a defined scale and leaves notes on what landed and what didn’t.

That review step does two things. First, it’s quality control: no client ever sees a hallucinated fact, a wrong name, or a generic-feeling deliverable, because a human caught it first. Second, it’s the feedback loop. Those notes and scores are what make Component 2 work. Without the QA step, there’s nothing to grade, and the database becomes a pile of unscored outputs nobody learns from.
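
The review gate itself doesn’t need to be sophisticated. One way to sketch it: a record only flips to “approved for client” once a human has attached a score and notes. Everything below is illustrative, including the pass threshold.

```python
def submit_review(record: dict, score: int, notes: str) -> dict:
    """Attach the human review to a record and only then mark it client-ready."""
    if not 1 <= score <= 10:
        raise ValueError("Score must be on the agreed 1-10 scale")
    record["quality_score"] = score
    record["reviewer_notes"] = notes
    # The 7+ cutoff is an arbitrary example, not a recommendation.
    record["status"] = "approved_for_client" if score >= 7 else "needs_rework"
    return record

draft = {"stage": "creative_brief", "content": "(generated brief)", "status": "needs_review"}
reviewed = submit_review(draft, score=8, notes="Strong structure; tighten the audience section.")
```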

Founders often try to skip this layer because it feels like it defeats the purpose: “wasn’t the point to save time?” Yes, but the time savings come from AI producing 80% of the work, not from skipping review. A human reviewing and tweaking a strong draft is still far faster than writing it from scratch. The time math works. The skipping-review math does not.

The tools matter less than you think

Founders ask which tool to use first: Make, n8n, Zapier, Airtable, custom GPTs, agents, the new thing someone tweeted about last Tuesday. Almost none of it matters compared to the three components above.

Pick any mainstream automation tool that can connect an AI model to a database. Make.com and n8n are both fine. Airtable handles the database side well. What matters is that your process is mapped, your outputs are defined, and your human QA step is in place. The tool is how you wire it up, not what makes it work.

And if your process isn’t mapped yet, that’s where to start. Not with the AI tool. With a whiteboard.

TL;DR
  • AI won’t fix a broken process; it’ll amplify the mess. Fix the process first, then add AI.
  • Break every job into small, definable outputs. Don’t ask AI to “build a website”; ask it to write the research doc, then the creative brief, then the wireframes.
  • Feed every prompt with the inputs, a good example, and (optionally) a bad example. Quality jumps.
  • Store every AI output in a structured database with a quality score and reviewer notes. This is the compounding layer.
  • A human reviews every output before it hits a client. No exceptions. That review is where the learning loop closes.
  • The tools matter less than the structure. Pick any mainstream one and start.

Want us to implement this for you?

Book a 30-minute call. We’ll walk through which parts of your process are the highest-leverage places to introduce AI, and sketch the defined outputs, database, and QA loop you’d need to make it actually work. No pitch if it’s not a fit.

Book a Call
