Start With The Week

The test is not whether AI can do the task

If you already use Claude Code, Codex, Claude Cowork, or other agents, you do not need another explanation of what they are.

You need a harsher test.

Did anything in the business get easier this week?

Not in theory. Not in a demo. This week.

A useful AI OS should create one of these changes:

  • fewer dropped leads
  • faster client handoffs
  • cleaner follow-up
  • less reliance on the founder's memory
  • fewer manual status checks
  • one repeated loop removed from the week

If none of that happened, the setup is still living beside the business. It is not inside it.

The week-first audit

Pick one week of real work and mark every point where the founder had to remember, approve, chase, copy, rewrite, or reconnect something.

Then score each loop:

Loop                   Pain     Revenue tie   AI fit   Human risk
Lead follow-up         High     High          High     Medium
Client status update   Medium   High          Medium   High
Invoice prep           Low      Medium        High     High
Content idea capture   Medium   Medium        High     Low

Do not start with the loop that looks coolest.

Start with the loop that would make Friday feel different.
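
If you want to rank the loops instead of eyeballing them, here is a minimal sketch. The weighting (pain plus revenue tie plus AI fit, minus human risk) and the mapping of Low, Medium, High to 1, 3, 5 are assumptions for illustration, not a fixed rule. Adjust both to fit your business.

```python
from dataclasses import dataclass

# Assumed mapping from the table's labels to the 1-5 scale.
LEVEL = {"Low": 1, "Medium": 3, "High": 5}


@dataclass
class Loop:
    name: str
    pain: int
    revenue_tie: int
    ai_fit: int
    human_risk: int

    def priority(self) -> int:
        # Hypothetical weighting: reward pain, revenue tie, and AI fit,
        # and subtract human risk so risky loops rank lower.
        return self.pain + self.revenue_tie + self.ai_fit - self.human_risk


loops = [
    Loop("Lead follow-up", LEVEL["High"], LEVEL["High"], LEVEL["High"], LEVEL["Medium"]),
    Loop("Client status update", LEVEL["Medium"], LEVEL["High"], LEVEL["Medium"], LEVEL["High"]),
    Loop("Invoice prep", LEVEL["Low"], LEVEL["Medium"], LEVEL["High"], LEVEL["High"]),
    Loop("Content idea capture", LEVEL["Medium"], LEVEL["Medium"], LEVEL["High"], LEVEL["Low"]),
]

# Highest priority first.
ranked = sorted(loops, key=lambda loop: loop.priority(), reverse=True)

for loop in ranked:
    print(f"{loop.priority():>3}  {loop.name}")
```

With these assumed weights, lead follow-up ranks first. The score is a starting point for the conversation, not a replacement for it.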

Prompt
Look at my last 7 days of work. List every repeated loop where I had to remember, chase, copy, approve, rewrite, or move information between tools. For each loop, score pain, revenue tie, AI fit, and human risk from 1 to 5. Then recommend the first workflow to build, with the reason.

What this prevents

This keeps you from building something impressive but irrelevant.

A small service business does not need an AI lab.

It needs the work that already matters to happen with less drag.

Choose One Pod

Do not build 14 agents

A small team does not need a huge agent roster.

It needs one reliable pod.

Use four pods as the map:

  1. Acquisition: leads, outreach, replies, calls.
  2. Delivery: client work, drafts, reports, handoffs.
  3. Support: questions, updates, follow-through.
  4. Operations: invoices, admin, reporting, reminders.

Pick the pod with the most drag right now.

Not the pod you want to talk about on LinkedIn. The one that is costing time, money, or trust.

The pod checklist

For the chosen pod, write down:

  • Trigger: what starts the workflow?
  • Inputs: what does the agent need before it acts?
  • Context: which files, records, or rules matter?
  • Output: what should exist when the run finishes?
  • Destination: where should the output go?
  • Human gate: who approves the risky step?
  • Failure path: what happens when the agent is unsure?

If you cannot answer these seven questions, the agent is not the problem yet. The workflow is not ready.
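
One way to make the checklist hard to skip is to treat it as a record that must be complete before anything gets built. This is a sketch only. The class name, field names, and the `unanswered` helper are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class PodChecklist:
    # One field per checklist question. Empty string means "not answered yet".
    trigger: str = ""
    inputs: str = ""
    context: str = ""
    output: str = ""
    destination: str = ""
    human_gate: str = ""
    failure_path: str = ""


def unanswered(checklist: PodChecklist) -> list[str]:
    """Return the names of the questions that are still blank."""
    return [f.name for f in fields(checklist) if not getattr(checklist, f.name).strip()]


draft = PodChecklist(
    trigger="Lead has no reply in 7 days",
    output="Draft follow-up email",
)

missing = unanswered(draft)
if missing:
    print("Workflow is not ready. Still missing:", ", ".join(missing))
```

If the list of missing answers is not empty, the work is still on the workflow, not the agent.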

A good first pod

For many service businesses, acquisition is the best first pod because it has clear business value and clear repeated loops.

Example first workflow:

  • scan leads with no reply in 7 days
  • pull last email and CRM notes
  • draft a short follow-up in the founder's tone
  • place it in a review queue
  • wait for human approval before sending

That is narrow enough to build.

It also changes the week fast.
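
The first two and last two steps of that workflow can be sketched in a few lines. Everything here is an assumption for illustration: the lead fields (`name`, `last_contact`, `replied`), the sample data, and the placeholder draft text. The actual drafting would be done by the agent, and nothing in this sketch sends anything.

```python
from datetime import date, timedelta


def find_stale_leads(leads, today, days=7):
    """Return leads last contacted at least `days` ago that have not replied."""
    cutoff = today - timedelta(days=days)
    return [lead for lead in leads if not lead["replied"] and lead["last_contact"] <= cutoff]


def queue_for_review(lead, draft_text):
    """Wrap a draft in a review item. Sending stays a separate, human-approved step."""
    return {"lead": lead["name"], "draft": draft_text, "status": "needs_approval"}


# Made-up sample records standing in for a CRM export.
leads = [
    {"name": "Acme", "last_contact": date(2024, 5, 1), "replied": False},
    {"name": "Birch", "last_contact": date(2024, 5, 9), "replied": False},
    {"name": "Cedar", "last_contact": date(2024, 5, 1), "replied": True},
]
today = date(2024, 5, 10)

review_queue = [
    queue_for_review(lead, f"[draft follow-up for {lead['name']}]")
    for lead in find_stale_leads(leads, today)
]
print(review_queue)
```

Only the lead that is both old enough and unanswered lands in the queue, and it lands there as a draft.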

Write The Spec Before The Agent Runs

Vague work creates vague agents

"Automate follow-up" is not a task.

It is a wish.

A usable agent task is closer to this:

"Find leads in Airtable with status Interested and no reply in 7 days. Read the last email, draft a 90-word reply in my tone, mention the original pain point, and save the draft for approval. Do not send."

Same goal.

Very different build.

The agent-ready spec

Use this format before building any skill, routine, or Codex task:

Workflow name:
Business goal:
Trigger:
Required inputs:
Allowed tools:
Output format:
Destination:
Approval rule:
Failure rule:
Definition of good:
Definition of bad:

The last two lines matter more than people think.

If nobody defines good, the agent will invent it.

The spec prompt

Prompt
Turn this fuzzy workflow into an agent-ready spec. Ask me questions until the trigger, inputs, output, destination, approval rule, failure rule, definition of good, and definition of bad are clear. Do not design the automation yet. First make the work concrete.

What to watch for

A spec is weak if it contains phrases like:

  • handle the leads
  • improve the workflow
  • keep clients updated
  • use my tone
  • make it better
  • follow up when needed

Those can be starting points.

They cannot be final instructions.

The practical work is taking founder judgment and making it reusable.
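
A rough way to catch weak specs before they reach an agent is a plain phrase check. This sketch uses only the phrases listed above. The function name is hypothetical, and a simple substring match will miss rewordings, so treat it as a first filter.

```python
# Phrases from the list above that signal a spec is still a wish.
VAGUE_PHRASES = [
    "handle the leads",
    "improve the workflow",
    "keep clients updated",
    "use my tone",
    "make it better",
    "follow up when needed",
]


def find_vague_phrases(spec_text: str) -> list[str]:
    """Return every vague phrase that appears in the spec, ignoring case."""
    lowered = spec_text.lower()
    return [phrase for phrase in VAGUE_PHRASES if phrase in lowered]


weak_spec = "Handle the leads and make it better."
print(find_vague_phrases(weak_spec))
```

Any hit is a prompt to go back and write down the trigger, the output, and the definition of good.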

Put Memory In The Right Place

Founder memory is not infrastructure

If every run needs the founder to explain the business again, the agent is not the operating layer.

It is another chat window.

Split memory into three parts.

Knowledge

Stable facts about the business.

Examples:

  • offer
  • ICP
  • voice
  • pricing
  • product notes
  • service rules
  • examples of good work

This can live in markdown files, project context, or a second brain.

State

Live records.

Examples:

  • lead status
  • client stage
  • last contact date
  • task owner
  • invoice state
  • approval status

This belongs in a database, CRM, Airtable, Linear, Notion, or another record system.

Do not bury live state in random markdown notes.

Judgment

Rules for what good looks like and what needs human review.

Examples:

  • never send cold replies without approval
  • client-facing reports need a preview
  • price, refund, contract, and promise steps require a human
  • if confidence is low, draft a question instead of acting

The simple architecture

Knowledge tells the agent what is true.
State tells it what changed.
Judgment tells it what it is allowed to do.

When those three are separate, the system is easier to debug.

When they are mixed together, the founder becomes the fallback for everything.
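
Here is a small sketch of what keeping the three apart can look like. The dictionaries stand in for real storage (files for knowledge, a CRM or database for state, a rules file for judgment). All names and sample values are hypothetical.

```python
# Knowledge: stable facts. In practice this would be loaded from context files.
KNOWLEDGE = {"voice": "short, direct, no jargon"}

# State: live records. In practice this would be read from a CRM or database.
STATE = {"lead-001": {"status": "Interested", "last_contact": "2024-05-01"}}

# Judgment: rules about what needs a human. Kept separate so it can be audited.
JUDGMENT = {"requires_human": {"send_email", "issue_refund", "change_price"}}


def plan_action(action: str, lead_id: str) -> dict:
    """Combine the three sources into a plan without mixing them together."""
    record = STATE[lead_id]
    mode = "draft_for_approval" if action in JUDGMENT["requires_human"] else "auto"
    return {
        "action": action,
        "lead": lead_id,
        "lead_status": record["status"],
        "voice": KNOWLEDGE["voice"],
        "mode": mode,
    }


print(plan_action("send_email", "lead-001"))
print(plan_action("summarize", "lead-001"))
```

When a run goes wrong, you can ask one question at a time: was the fact wrong, was the record stale, or was the rule missing?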

Add Human Gates Where Trust Can Break

Do not automate the last mile of trust

Some steps can run while you sleep.

Some should not.

A useful AI OS separates preparation from commitment.

Preparation can often be agent-led:

  • research
  • draft
  • score
  • summarize
  • classify
  • prep a report
  • find the next action

Commitment should stay gated:

  • send the email
  • publish the post
  • issue the refund
  • change the contract state
  • promise a delivery date
  • mark a client-facing task complete

The gate map

For each workflow, add one of these labels to every step:

Label   Meaning
Auto    Safe to run without review
Draft   Agent prepares, human approves
Ask     Agent stops and asks a question
Block   Agent is not allowed to do this

Prompt
Review this workflow and label every step Auto, Draft, Ask, or Block. Use Draft for customer-facing communication, money, promises, contracts, and shared record changes. Explain any step where the label is not obvious.
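
If the workflow runs from a script, the four labels can be enforced in code instead of living in a document. This is a minimal sketch. The `Gate` enum, the `run_step` function, and the returned status strings are hypothetical names.

```python
from enum import Enum


class Gate(Enum):
    AUTO = "Auto"    # safe to run without review
    DRAFT = "Draft"  # agent prepares, human approves
    ASK = "Ask"      # agent stops and asks a question
    BLOCK = "Block"  # agent is not allowed to do this


def run_step(name: str, gate: Gate, approved: bool = False) -> str:
    """Decide what happens to a step based on its gate label."""
    if gate is Gate.BLOCK:
        return "blocked"
    if gate is Gate.ASK:
        return "waiting_for_answer"
    if gate is Gate.DRAFT and not approved:
        return "waiting_for_approval"
    return "ran"


workflow = [
    ("research the lead", Gate.AUTO),
    ("draft the follow-up", Gate.AUTO),
    ("send the email", Gate.DRAFT),
]

for step_name, gate in workflow:
    print(f"{step_name}: {run_step(step_name, gate)}")
```

The useful property is that a Draft step cannot run by accident. It runs only when someone passes the approval in.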

The point

The goal is not to remove human judgment.

The goal is to stop wasting human judgment on the low-risk parts.

That is how AI makes the business faster without making it reckless.

Build The Second Builder Into The System

Codex is not just a spare tire

If Claude Code is your main builder, Codex can still be part of the working system.

Use it for three jobs.

1. Backup path

Critical workflows should have a second way to run.

That means:

  • business context lives in files, not chat history
  • core workflows are skills or scripts
  • the repo has an AGENTS.md-style file
  • the workflow has been tested in more than one builder

Do this before the outage, not during it.

2. Second implementation path

When a workflow matters, ask Codex for a second build path or review.

Not because one tool is always better.

Because disagreement exposes hidden assumptions.

Prompt
Review this Claude Code workflow as a second builder. Identify missing assumptions, unclear inputs, risky side effects, weak approval gates, and any part that would be hard to run in Codex. Do not rewrite yet. Return the risks first.

3. GUI-heavy work

Some business tools do not have clean APIs or connectors.

Codex can be useful when the interface itself is the path: clicking, checking, comparing, or driving a desktop app.

Use the richest interface available first.

Connector beats MCP. MCP beats raw API. API beats browser clicks. Browser clicks beat being stuck.

The founder rule

If losing one AI tool stops the business, the system is too fragile.

A real AI OS should have a Day 2 plan.

Watch What Runs

The workflow that kinda works is the risky one

Clear failures get attention.

Quiet half-successes get ignored.

That is where AI work becomes expensive, stale, or wrong without anyone noticing.

A small business does not need a giant command center.

It needs visibility.

Minimum run log

Every repeated workflow should leave a record with:

Workflow name:
Run time:
Status:
Input record:
Output link:
Cost or token estimate:
Human approval needed:
Failure reason:
Next action:

This can start as a table.

It does not need to be fancy.

It does need to exist.
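
As a starting point, the log can be a CSV file with one row per run. This sketch assumes a hypothetical `RunRecord` class whose fields mirror the list above. The sample values are made up, and it writes to an in-memory buffer so it runs anywhere. In practice you would open a real file in append mode.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class RunRecord:
    workflow_name: str
    run_time: str
    status: str
    input_record: str
    output_link: str
    cost_estimate: str
    human_approval_needed: bool
    failure_reason: str
    next_action: str


def append_run(stream, record: RunRecord, write_header: bool = False) -> None:
    """Write one run as a CSV row. Pass write_header=True for a new, empty log."""
    writer = csv.DictWriter(stream, fieldnames=[f.name for f in fields(RunRecord)])
    if write_header:
        writer.writeheader()
    writer.writerow(asdict(record))


log = io.StringIO()  # stand-in for open("run_log.csv", "a", newline="")
append_run(
    log,
    RunRecord(
        workflow_name="lead-follow-up",
        run_time="2024-05-10T09:00:00",
        status="success",
        input_record="lead-001",
        output_link="drafts/lead-001.md",
        cost_estimate="unknown",
        human_approval_needed=True,
        failure_reason="",
        next_action="founder reviews draft",
    ),
    write_header=True,
)
print(log.getvalue())
```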

Weekly review

Once a week, ask:

  • Which workflow saved real time?
  • Which one failed quietly?
  • Which one asked for too much founder input?
  • Which one cost more than expected?
  • Which context file is stale?
  • Which approval rule needs tightening?

Prompt
Review this week's AI workflow run log. Find the top 3 issues by business impact. For each one, explain the likely cause, the fix, and whether this is a workflow problem, context problem, tool problem, or approval problem.
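
Some of the weekly questions can be answered straight from the log before any prompt runs. This sketch makes one assumption worth naming: a "quiet failure" is treated here as a run marked success that produced no output link. The field names and sample runs are hypothetical.

```python
from collections import Counter


def weekly_summary(runs: list[dict]) -> dict:
    """Summarize a week of run records: status counts, quiet failures, approval load."""
    by_status = Counter(run["status"] for run in runs)
    # Assumed definition: marked success, but nothing was actually produced.
    quiet_failures = [
        run["workflow"] for run in runs
        if run["status"] == "success" and not run["output_link"]
    ]
    approval_counts = Counter(run["workflow"] for run in runs if run["human_approval_needed"])
    return {
        "by_status": dict(by_status),
        "quiet_failures": quiet_failures,
        "approval_counts": dict(approval_counts),
    }


# Made-up sample week.
runs = [
    {"workflow": "lead-follow-up", "status": "success", "output_link": "drafts/acme.md", "human_approval_needed": True},
    {"workflow": "weekly-report", "status": "success", "output_link": "", "human_approval_needed": False},
    {"workflow": "invoice-prep", "status": "failed", "output_link": "", "human_approval_needed": True},
]

summary = weekly_summary(runs)
print(summary)
```

The counts do not replace the review. They tell you where to look first.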

The actual goal

You are not trying to watch agents for fun.

You are trying to stop the founder from becoming the monitoring layer.

That is when the AI OS starts to feel like part of the business instead of another thing to manage.
