concept 4 min

Start With The Week

The test is not whether AI can do the task

If you already use Claude Code, Codex, Claude Cowork, or agents, you do not need another explanation of what they are.

You need a harsher test.

Did anything in the business get easier this week?

Not in theory. Not in a demo. This week.

A useful AI OS should create one of these changes:

fewer dropped leads
faster client handoffs
cleaner follow-up
less reliance on founder memory
fewer manual status checks
one repeated loop removed from the week

If none of that happened, the setup is still living beside the business. It is not inside it.

The week-first audit

Pick one week of real work and mark every point where the founder had to remember, approve, chase, copy, rewrite, or reconnect something.

Then score each loop. Pain, revenue tie, and AI fit count in a loop's favor; human risk counts against it:

Loop	Pain	Revenue tie	AI fit	Human risk
Lead follow-up	High	High	High	Medium
Client status update	Medium	High	Medium	High
Invoice prep	Low	Medium	High	High
Content idea capture	Medium	Medium	High	Low

Do not start with the loop that looks coolest.

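Start with the loop that would make Friday feel different. In the table above, that is lead follow-up: high pain, high revenue tie, high AI fit, and only medium human risk.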
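The scoring table can be turned into a rough ranking. The sketch below is only one way to do it: the `Loop` class, the mapping of Low/Medium/High onto 1, 3, and 5, and the formula (pain plus revenue tie plus AI fit, minus human risk) are all assumptions for illustration, not a fixed method.

```python
from dataclasses import dataclass

# Assumed mapping from the table's labels onto a 1-5 scale.
LEVEL = {"Low": 1, "Medium": 3, "High": 5}


@dataclass
class Loop:
    name: str
    pain: str
    revenue_tie: str
    ai_fit: str
    human_risk: str

    def priority(self) -> int:
        # Hypothetical formula: reward pain, revenue tie, and AI fit,
        # and subtract human risk.
        return (LEVEL[self.pain] + LEVEL[self.revenue_tie]
                + LEVEL[self.ai_fit] - LEVEL[self.human_risk])


# Rows copied from the scoring table.
loops = [
    Loop("Lead follow-up", "High", "High", "High", "Medium"),
    Loop("Client status update", "Medium", "High", "Medium", "High"),
    Loop("Invoice prep", "Low", "Medium", "High", "High"),
    Loop("Content idea capture", "Medium", "Medium", "High", "Low"),
]

# Highest priority first.
ranked = sorted(loops, key=lambda loop: loop.priority(), reverse=True)
for loop in ranked:
    print(f"{loop.priority():>3}  {loop.name}")
```

With these assumed weights, lead follow-up comes out on top. If the ranking disagrees with your gut, change the weights and see which assumption was doing the work.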

Prompt

Look at my last 7 days of work. List every repeated loop where I had to remember, chase, copy, approve, rewrite, or move information between tools. For each loop, score pain, revenue tie, AI fit, and human risk from 1 to 5. Then recommend the first workflow to build, with the reason.

What this prevents

This keeps you from building something impressive but irrelevant.

A small service business does not need an AI lab.

It needs the work that already matters to happen with less drag.

Section 1 of 7

checklist 5 min

Choose One Pod

Do not build 14 agents

A small team does not need a huge agent roster.

It needs one reliable pod.

Use four pods as the map:

Acquisition: leads, outreach, replies, calls.
Delivery: client work, drafts, reports, handoffs.
Support: questions, updates, follow-through.
Operations: invoices, admin, reporting, reminders.

Pick the pod with the most drag right now.

Not the pod you want to talk about on LinkedIn. The one that is costing time, money, or trust.

The pod checklist

For the chosen pod, write down:

Trigger: what starts the workflow?
Inputs: what does the agent need before it acts?
Context: which files, records, or rules matter?
Output: what should exist when the run finishes?
Destination: where should the output go?
Human gate: who approves the risky step?
Failure path: what happens when the agent is unsure?

If you cannot answer these seven questions, the agent is not the problem yet. The workflow is not ready.
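The seven questions can double as a readiness check before anything gets built. This is a minimal sketch, assuming the workflow is written down as a plain dictionary; the key names and the `missing_answers` helper are hypothetical.

```python
# The seven checklist questions, as assumed dictionary keys.
CHECKLIST = ["trigger", "inputs", "context", "output",
             "destination", "human_gate", "failure_path"]


def missing_answers(workflow: dict) -> list:
    """Return the checklist questions that have no non-empty answer yet."""
    return [q for q in CHECKLIST if not (workflow.get(q) or "").strip()]


# A half-finished workflow description.
draft = {
    "trigger": "lead has had no reply in 7 days",
    "inputs": "last email, CRM notes",
    "output": "short follow-up draft",
    "destination": "review queue",
    "human_gate": "founder approves before sending",
}

gaps = missing_answers(draft)
print("Ready to build" if not gaps else f"Workflow not ready, missing: {gaps}")
```

Here the draft still lacks a context answer and a failure path, so the workflow is not ready yet.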

A good first pod

For many service businesses, acquisition is the best first pod because it has clear business value and clear repeated loops.

Example first workflow:

scan leads with no reply in 7 days
pull last email and CRM notes
draft a short follow-up in the founder's tone
place it in a review queue
wait for human approval before sending

That is narrow enough to build.

It also changes the week fast.
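The example workflow can be sketched in a few lines. Everything here is an assumption for illustration: the lead records are in-memory dictionaries rather than a real CRM, the field names are made up, and `draft_follow_up` is a placeholder where a real build would call a model. The point of the sketch is the shape: find stale leads, draft, queue, and stop before sending.

```python
from datetime import date, timedelta


def stale_leads(leads, today, days=7):
    """Return leads with no reply whose last contact is at least `days` old."""
    cutoff = today - timedelta(days=days)
    return [lead for lead in leads
            if not lead["replied"] and lead["last_contact"] <= cutoff]


def draft_follow_up(lead):
    # Placeholder for the model call. A real draft would use the last email,
    # the CRM notes, and the founder's voice guide.
    return (f"Hi {lead['name']}, checking back on {lead['crm_notes']}. "
            "Is this still a priority?")


def build_review_queue(leads, today):
    """Prepare drafts for human review. Nothing in this function sends email."""
    return [
        {"lead": lead["name"],
         "draft": draft_follow_up(lead),
         "status": "awaiting_approval"}
        for lead in stale_leads(leads, today)
    ]


# Hypothetical records standing in for CRM data.
leads = [
    {"name": "Ana", "last_contact": date(2024, 1, 5), "replied": False,
     "crm_notes": "slow client handoffs"},
    {"name": "Ben", "last_contact": date(2024, 1, 12), "replied": False,
     "crm_notes": "invoice prep"},
    {"name": "Cy", "last_contact": date(2024, 1, 1), "replied": True,
     "crm_notes": "lead follow-up"},
]

queue = build_review_queue(leads, today=date(2024, 1, 15))
for item in queue:
    print(item["status"], "|", item["lead"], "|", item["draft"])
```

Only Ana lands in the queue: Ben was contacted three days ago and Cy already replied. Sending stays outside the code on purpose.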

Section 2 of 7

actionable 5 min

Write The Spec Before The Agent Runs

Vague work creates vague agents

"Automate follow-up" is not a task.

It is a wish.

A usable agent task is closer to this:

"Find leads in Airtable with status Interested and no reply in 7 days. Read the last email, draft a 90-word reply in my tone, mention the original pain point, and save the draft for approval. Do not send."

Same goal.

Very different build.

The agent-ready spec

Use this format before building any skill, routine, or Codex task:

Workflow name:
Business goal:
Trigger:
Required inputs:
Allowed tools:
Output format:
Destination:
Approval rule:
Failure rule:
Definition of good:
Definition of bad:

The last two lines matter more than people think.

If nobody defines good, the agent will invent it.

The spec prompt

Prompt

Turn this fuzzy workflow into an agent-ready spec. Ask me questions until the trigger, inputs, output, destination, approval rule, failure rule, definition of good, and definition of bad are clear. Do not design the automation yet. First make the work concrete.

What to watch for

A spec is weak if it contains phrases like:

handle the leads
improve the workflow
keep clients updated
use my tone
make it better
follow up when needed

Those can be starting points.

They cannot be final instructions.
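A quick lint pass can catch these phrases before a spec reaches an agent. This is a rough sketch, assuming the spec is plain text; the `weak_phrases_in` helper is hypothetical, and simple substring matching will miss paraphrases.

```python
# The weak phrases listed above.
WEAK_PHRASES = [
    "handle the leads",
    "improve the workflow",
    "keep clients updated",
    "use my tone",
    "make it better",
    "follow up when needed",
]


def weak_phrases_in(spec_text: str) -> list:
    """Return every listed weak phrase that appears in the spec text."""
    lowered = spec_text.lower()
    return [phrase for phrase in WEAK_PHRASES if phrase in lowered]


fuzzy = "Automate follow-up. Handle the leads and make it better."
flagged = weak_phrases_in(fuzzy)
print(flagged if flagged else "No listed weak phrases found")
```

An empty result does not prove the spec is good. It only means none of the known weak phrases slipped through.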

The practical work is taking founder judgment and making it reusable.

Section 3 of 7

concept 5 min

Put Memory In The Right Place

Founder memory is not infrastructure

If every run needs the founder to explain the business again, the agent is not the operating layer.

It is another chat window.

Split memory into three parts.

Knowledge

Stable facts about the business.

Examples:

offer
ICP
voice
pricing
product notes
service rules
examples of good work

This can live in markdown files, project context, or a second brain.

State

Live records.

Examples:

lead status
client stage
last contact date
task owner
invoice state
approval status

This belongs in a database, CRM, Airtable, Linear, Notion, or another record system.

Do not bury live state in random markdown notes.

Judgment

Rules for what good looks like and what needs human review.

Examples:

never send cold replies without approval
client-facing reports need a preview
price, refund, contract, and promise steps require a human
if confidence is low, draft a question instead of acting

The simple architecture

Knowledge tells the agent what is true.
State tells it what changed.
Judgment tells it what it is allowed to do.

When those three are separate, the system is easier to debug.

When they are mixed together, the founder becomes the fallback for everything.
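One way to keep the three parts separate is to load them from different places and only join them at run time, under labeled headings. The sketch below assumes dictionaries and a list stand in for the real sources (files, a record system, a rules document); `build_context` and all sample values are hypothetical.

```python
def build_context(knowledge: dict, state: dict, judgment: list) -> str:
    """Join the three memory types into one labeled block of text."""
    sections = [
        ("KNOWLEDGE (stable facts)", [f"{k}: {v}" for k, v in knowledge.items()]),
        ("STATE (live records)", [f"{k}: {v}" for k, v in state.items()]),
        ("JUDGMENT (rules)", list(judgment)),
    ]
    return "\n\n".join(title + "\n" + "\n".join(lines) for title, lines in sections)


# Stable facts, e.g. read from markdown files.
knowledge = {"voice": "short and direct", "offer": "monthly reporting retainer"}

# Live records, e.g. fetched from the CRM for this run.
state = {"lead_status": "Interested", "last_contact_date": "2024-01-05"}

# Rules for what needs a human.
judgment = [
    "never send cold replies without approval",
    "if confidence is low, draft a question instead of acting",
]

context = build_context(knowledge, state, judgment)
print(context)
```

Because each value sits under its source heading, a wrong output can be traced to a stale fact, a stale record, or a missing rule.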

Section 4 of 7

checklist 4 min

Add Human Gates Where Trust Can Break

Do not automate the last mile of trust

Some steps can run while you sleep.

Some should not.

A useful AI OS separates preparation from commitment.

Preparation can often be agent-led:

research
draft
score
summarize
classify
prep a report
find the next action

Commitment should stay gated:

send the email
publish the post
issue the refund
change the contract state
promise a delivery date
mark a client-facing task complete

The gate map

For each workflow, add one of these labels to every step:

Label	Meaning
Auto	Safe to run without review
Draft	Agent prepares, human approves
Ask	Agent stops and asks a question
Block	Agent is not allowed to do this
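The four labels map cleanly onto a small dispatcher. This is a minimal sketch, assuming each step is a name plus a label; the `Gate` enum, the `handle` function, and the sample steps and their labels are illustrative assumptions, not a prescribed gate map.

```python
from enum import Enum


class Gate(Enum):
    AUTO = "Auto"    # safe to run without review
    DRAFT = "Draft"  # agent prepares, human approves
    ASK = "Ask"      # agent stops and asks a question
    BLOCK = "Block"  # agent is not allowed to do this


def handle(step: str, gate: Gate, approved: bool = False) -> str:
    """Decide what happens to a step based on its gate label."""
    if gate is Gate.AUTO:
        return f"run: {step}"
    if gate is Gate.DRAFT:
        # A Draft step only runs once a human has approved it.
        return f"run: {step}" if approved else f"queued for approval: {step}"
    if gate is Gate.ASK:
        return f"question for human: {step}"
    return f"refused: {step}"


# Hypothetical workflow with one step per label.
workflow = [
    ("classify the lead", Gate.AUTO),
    ("send the email", Gate.DRAFT),
    ("resolve an ambiguous client request", Gate.ASK),
    ("delete a client record", Gate.BLOCK),
]

for step, gate in workflow:
    print(handle(step, gate))
```

The useful property is that the default for anything not Auto is to stop. A step only moves forward when someone flips `approved`.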

Prompt

Review this workflow and label every step Auto, Draft, Ask, or Block. Use Draft for customer-facing communication, money, promises, contracts, and shared record changes. Explain any step where the label is not obvious.

The point

The goal is not to remove human judgment.

The goal is to stop wasting human judgment on the low-risk parts.

That is how AI makes the business faster without making it reckless.

Section 5 of 7

actionable 5 min

Build The Second Builder Into The System

Codex is not just a spare tire

If Claude Code is your main builder, Codex can still be part of the working system.

Use it for three jobs.

1. Backup path

Critical workflows should have a second way to run.

That means:

business context lives in files, not chat history
core workflows are skills or scripts
the repo has an AGENTS.md-style file
the workflow has been tested in more than one builder

Do this before the outage, not during it.

2. Second implementation path

When a workflow matters, ask Codex for a second build path or review.

Not because one tool is always better.

Because disagreement exposes hidden assumptions.

Prompt

Review this Claude Code workflow as a second builder. Identify missing assumptions, unclear inputs, risky side effects, weak approval gates, and any part that would be hard to run in Codex. Do not rewrite yet. Return the risks first.

3. GUI-heavy work

Some business tools do not have clean APIs or connectors.

Codex can be useful when the interface itself is the path: clicking, checking, comparing, or driving a desktop app.

Use the richest interface available first.

Connector beats MCP. MCP beats raw API. API beats browser clicks. Browser clicks beat being stuck.
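That preference order is easy to encode as a fallback chain. A small sketch, assuming each tool is described by the set of interfaces it offers; the interface names and `pick_interface` are hypothetical labels.

```python
from typing import Optional

# Preference order from the text, richest interface first.
PREFERENCE = ["connector", "mcp", "api", "browser"]


def pick_interface(available: set) -> Optional[str]:
    """Return the most preferred interface a tool offers, or None if stuck."""
    for option in PREFERENCE:
        if option in available:
            return option
    return None


# A hypothetical tool that exposes an API and a web UI, but nothing richer.
choice = pick_interface({"browser", "api"})
print(choice)
```

A `None` result is the "being stuck" case: no known interface, so the step needs a human or a different tool.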

The founder rule

If losing one AI tool stops the business, the system is too fragile.

A real AI OS should have a Day 2 plan: a tested way to keep critical workflows running when the main tool is down.

Section 6 of 7

checklist 4 min

Watch What Runs

The workflow that kinda works is the risky one

Clear failures get attention.

Quiet half-successes get ignored.

That is where AI work becomes expensive, stale, or wrong without anyone noticing.

A small business does not need a giant command center.

It needs visibility.

Minimum run log

Every repeated workflow should leave a record with:

Workflow name:
Run time:
Status:
Input record:
Output link:
Cost or token estimate:
Human approval needed:
Failure reason:
Next action:

This can start as a table.

It does not need to be fancy.

It does need to exist.
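A CSV file is enough for a first run log. The sketch below uses Python's standard `csv` module and writes to an in-memory buffer so it runs anywhere; the snake_case column names, the `log_run` helper, and the sample record are assumptions for illustration. In practice you would pass a file opened in append mode instead of the buffer.

```python
import csv
import io

# Column names derived from the run log fields above.
FIELDS = ["workflow_name", "run_time", "status", "input_record", "output_link",
          "cost_estimate", "human_approval_needed", "failure_reason", "next_action"]


def log_run(out, record: dict, write_header: bool = False) -> None:
    """Append one run record as a CSV row. Missing fields are left blank."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, restval="")
    if write_header:
        writer.writeheader()
    writer.writerow(record)


buffer = io.StringIO()
log_run(buffer, {
    "workflow_name": "lead-follow-up",
    "run_time": "2024-01-15T09:00:00",
    "status": "success",
    "input_record": "lead-042",
    "cost_estimate": "0.25",
    "human_approval_needed": "yes",
    "next_action": "founder reviews draft",
}, write_header=True)

print(buffer.getvalue())
```

Blank columns are information too. A successful run with no output link is worth a second look.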

Weekly review

Once a week, ask:

Which workflow saved real time?
Which one failed quietly?
Which one asked for too much founder input?
Which one cost more than expected?
Which context file is stale?
Which approval rule needs tightening?
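Some of these questions can be answered straight from the run log. This is a rough sketch, assuming each run is a dictionary with the fields shown; the field names, the `weekly_summary` helper, and the sample runs are hypothetical. It counts runs, failures, and approval requests, and totals cost per workflow. It cannot tell you whether a run failed quietly; that still takes a human look at the outputs.

```python
from collections import defaultdict


def weekly_summary(runs):
    """Aggregate run records per workflow."""
    summary = defaultdict(
        lambda: {"runs": 0, "failed": 0, "needed_human": 0, "cost": 0.0})
    for run in runs:
        row = summary[run["workflow_name"]]
        row["runs"] += 1
        row["failed"] += int(run["status"] != "success")
        row["needed_human"] += int(bool(run["human_approval_needed"]))
        row["cost"] += float(run["cost_estimate"])
    return dict(summary)


# Hypothetical run records for one week.
runs = [
    {"workflow_name": "lead-follow-up", "status": "success",
     "human_approval_needed": True, "cost_estimate": 0.25},
    {"workflow_name": "lead-follow-up", "status": "failed",
     "human_approval_needed": False, "cost_estimate": 0.5},
    {"workflow_name": "status-report", "status": "success",
     "human_approval_needed": True, "cost_estimate": 1.0},
]

report = weekly_summary(runs)
for name, row in report.items():
    print(name, row)
```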

Prompt

Review this week's AI workflow run log. Find the top 3 issues by business impact. For each one, explain the likely cause, the fix, and whether this is a workflow problem, context problem, tool problem, or approval problem.

The actual goal

You are not trying to watch agents for fun.

You are trying to stop the founder from becoming the monitoring layer.

That is when the AI OS starts to feel like part of the business instead of another thing to manage.

Section 7 of 7