Harness Engineering for Legacy Migration (Part 2): Practical Implementation, Agent Design, and System Setup

Sascha Turowski

April 22, 2026

In theory, harness engineering gives us a safe way to use AI for legacy migrations. In practice, the challenge is very concrete:

How do you design a system that can understand, analyze, and transform an entire legacy codebase—without losing control?

This post focuses on a practical setup, including a multi-agent pipeline, system design choices, and the real limitations you must account for.

Starting point: inventory before transformation

Before any migration begins, the system needs a structured understanding of both codebases:

the legacy system (what exists)
the target system (what should exist)

A useful approach is to introduce a scanning agent as the first stage in the harness.

This agent does not rewrite code. It builds an inventory.

From a technical perspective, it extracts:

modules, services, and dependencies
APIs and data flows
database interactions
external integrations

From a business perspective, it attempts to infer:

domain concepts (e.g., “orders”, “payments”, “users”)
critical workflows
implicit business rules embedded in code

A well-designed harness persists this as a structured knowledge base (graphs, schemas, or indexed documents), not just text summaries.
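To make "structured, not just text" concrete, here is a minimal sketch of what an inventory record could look like. The `Component` and `Inventory` names, their fields, and the sample entries are assumptions for illustration; a real harness would likely back this with a graph or vector store rather than an in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One inventoried unit of the legacy system (module or service)."""
    name: str
    kind: str  # e.g. "module" or "service"
    depends_on: set[str] = field(default_factory=set)
    domain_concepts: set[str] = field(default_factory=set)


class Inventory:
    """In-memory stand-in for the persisted knowledge base (hypothetical)."""

    def __init__(self) -> None:
        self.components: dict[str, Component] = {}

    def add(self, component: Component) -> None:
        self.components[component.name] = component

    def dependents_of(self, name: str) -> list[str]:
        """Components that would be affected if `name` changes."""
        return sorted(
            c.name for c in self.components.values() if name in c.depends_on
        )

    def components_for_concept(self, concept: str) -> list[str]:
        """Components in which the given domain concept was detected."""
        return sorted(
            c.name for c in self.components.values()
            if concept in c.domain_concepts
        )


inventory = Inventory()
inventory.add(Component("billing", "service", {"orders_db"}, {"payments"}))
inventory.add(Component("checkout", "module", {"billing", "orders_db"}, {"orders", "payments"}))
inventory.add(Component("orders_db", "module"))

print(inventory.dependents_of("orders_db"))          # ['billing', 'checkout']
print(inventory.components_for_concept("payments"))  # ['billing', 'checkout']
```

Because the inventory is queryable, later stages can ask targeted questions ("what depends on this table?") instead of re-reading summaries.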

Identifying mistakes and logical problems

A second responsibility of the scanning phase is diagnostic analysis.

A dedicated analysis agent (or pass) can flag:

dead code
duplicated logic
inconsistent naming or models
unreachable branches
error-prone patterns
violations of modern best practices

More interestingly, it can surface logical inconsistencies, such as:

conflicting business rules across modules
mismatched assumptions between services
edge cases handled differently in different places

This is one of the hidden advantages of AI-assisted migration:
you are not just moving the system—you are auditing it.
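As a rough illustration of the "conflicting business rules across modules" check, the sketch below assumes the inventory stage has already emitted rule findings in a structured form. The `RuleFinding` shape and the rule keys are hypothetical; in practice an agent would have to extract and normalize them first, which is the hard part.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleFinding:
    """A business rule inferred from one module (hypothetical shape)."""
    module: str
    rule: str      # normalized rule key, e.g. "max_order_items"
    value: object  # the value this module enforces


def find_conflicts(findings: list[RuleFinding]) -> dict[str, dict[str, object]]:
    """Return rules that different modules enforce with different values."""
    by_rule: dict[str, dict[str, object]] = defaultdict(dict)
    for finding in findings:
        by_rule[finding.rule][finding.module] = finding.value
    # Compare via repr so unhashable values (lists, dicts) also work.
    return {
        rule: per_module
        for rule, per_module in by_rule.items()
        if len({repr(v) for v in per_module.values()}) > 1
    }


findings = [
    RuleFinding("checkout", "max_order_items", 50),
    RuleFinding("admin_ui", "max_order_items", 100),
    RuleFinding("checkout", "currency", "EUR"),
    RuleFinding("billing", "currency", "EUR"),
]

conflicts = find_conflicts(findings)
print(conflicts)  # {'max_order_items': {'checkout': 50, 'admin_ui': 100}}
```

The comparison itself is deterministic. Only the extraction of findings needs a model, which keeps the audit result easy to verify.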

Why you should not scan everything at once

It is tempting to give a large model the entire codebase and ask:

“Explain this system and rewrite it.”

This approach still fails in practice, for several reasons.

1. Context window limits still matter

Even with modern large context windows, full codebases often exceed limits. More importantly, even when they fit, effective attention degrades.

Models do not treat all tokens equally. As input grows:

important details get diluted
long-range dependencies are missed
reasoning becomes less reliable

2. “Slope” effects in long contexts

As context size increases, performance does not degrade evenly across the input. It tends to fall off in a slope-like pattern that depends on where information sits:

early parts of the input dominate
middle sections are partially ignored
later sections may be over-weighted or inconsistently used

This creates subtle, hard-to-detect errors.

3. Hallucination risk increases with scope

When asked to reason over very large, complex systems:

the model fills gaps with plausible assumptions
it may “connect” components that are unrelated
it may invent abstractions that do not exist

Newer models reduce this risk, but they do not eliminate it.
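One way to respect these limits is to split the codebase into chunks that each fit a fixed token budget before any agent sees them. The sketch below is a greedy grouping under stated assumptions: `estimate_tokens` is a crude four-characters-per-token heuristic rather than a real tokenizer, and the file names and budget are made up.

```python
def estimate_tokens(text: str) -> int:
    """Crude heuristic (about 4 characters per token); not a real tokenizer."""
    return max(1, len(text) // 4)


def chunk_files(files: dict[str, str], budget: int) -> list[list[str]]:
    """Greedily group file paths so each group's estimated tokens fit `budget`.

    A single file larger than the budget ends up in a group of its own and
    would need further splitting (for example per function) in a real harness.
    """
    chunks: list[list[str]] = []
    current: list[str] = []
    used = 0
    for path in sorted(files):  # sorting keeps files from one directory adjacent
        cost = estimate_tokens(files[path])
        if current and used + cost > budget:
            chunks.append(current)
            current, used = [], 0
        current.append(path)
        used += cost
    if current:
        chunks.append(current)
    return chunks


files = {
    "billing/invoice.py": "x" * 400,   # ~100 tokens
    "billing/tax.py": "x" * 400,       # ~100 tokens
    "orders/cart.py": "x" * 800,       # ~200 tokens
    "orders/checkout.py": "x" * 2000,  # ~500 tokens, over budget on its own
}
chunks = chunk_files(files, budget=300)
print(chunks)
# [['billing/invoice.py', 'billing/tax.py'], ['orders/cart.py'], ['orders/checkout.py']]
```

Grouping by directory is only a starting point. Grouping by dependency cluster from the inventory would usually keep related code together more reliably.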

Are bigger models solving this?

Larger and more advanced models have:

bigger context windows
better retrieval and reasoning
improved consistency

However, the fundamental constraints remain:

Attention is still finite
Noise increases with scale
Precision drops when tasks are underspecified

In practice, the best results still come from:

structured decomposition + retrieval + iterative reasoning

Not from single-pass “understand everything” prompts.
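The retrieval part of that formula can be sketched in a few lines. This example ranks chunk summaries by plain word overlap with the task description, which is a deliberately simple stand-in for the embedding-based search a vector database would provide. The chunk ids and summaries are invented for illustration.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z_]+", text.lower()))


def retrieve(task: str, summaries: dict[str, str], k: int = 2) -> list[str]:
    """Return up to k chunk ids whose summaries share the most words with the task."""
    task_words = tokenize(task)
    scored = [
        (len(task_words & tokenize(summary)), chunk_id)
        for chunk_id, summary in summaries.items()
    ]
    relevant = [(score, chunk_id) for score, chunk_id in scored if score > 0]
    relevant.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, then by id
    return [chunk_id for _, chunk_id in relevant[:k]]


summaries = {
    "billing/tax": "computes tax for an invoice based on country",
    "orders/cart": "adds and removes items in the cart",
    "billing/invoice": "creates an invoice for a paid order",
    "auth/session": "creates and validates login sessions",
}

selected = retrieve("migrate invoice tax calculation", summaries)
print(selected)  # ['billing/tax', 'billing/invoice']
```

The agent then reasons over the two selected chunks only, and can request another retrieval round if it finds a reference it cannot resolve.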

A multi-agent harness pipeline

To address these constraints, a practical system uses multiple specialized agents, each operating within controlled scope.

This is not just architectural preference—it is a reliability requirement.

A typical pipeline might look like this:

1. Inventory Agent

scans code in chunks
builds structured representations
extracts entities, relationships, and flows

Output:

system map
dependency graph
domain model candidates

2. Analysis Agent

reviews inventory data
identifies problems and inconsistencies
flags technical and logical issues

Output:

diagnostic report
risk areas for migration

3. Mapping Agent

aligns legacy components with target architecture
defines how modules should be transformed

Output:

migration plan per module/service
mapping between old and new structures

4. Transformation Agent

rewrites code incrementally
translates patterns, languages, and frameworks

Output:

candidate implementations

5. Verification Agent

runs tests and behavioral comparisons
checks correctness against legacy behavior

Output:

pass/fail signals
structured feedback

6. Integration Agent

prepares validated code for merge
ensures compatibility with the new system

Output:

deployable artifacts or pull requests
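The six stages above can be wired together with a very small orchestrator. The following is a sketch, not a reference implementation: each stage is a plain function standing in for an LLM-backed agent, the shared `state` dictionary and its keys are assumptions, and the stub logic is intentionally trivial.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Stage = Callable[[dict], dict]  # takes the shared state, returns new keys to merge


@dataclass
class PipelineResult:
    state: dict
    completed: list[str] = field(default_factory=list)
    failed_stage: Optional[str] = None
    error: Optional[str] = None


def run_pipeline(stages: list[tuple[str, Stage]], initial: dict) -> PipelineResult:
    """Run stages in order; stop at the first failure and record where it happened."""
    result = PipelineResult(state=dict(initial))
    for name, stage in stages:
        try:
            result.state.update(stage(result.state))
        except Exception as exc:  # a real harness would catch narrower errors
            result.failed_stage, result.error = name, str(exc)
            break
        result.completed.append(name)
    return result


# Stub stages standing in for the agents described above.
def inventory(state: dict) -> dict:
    return {"system_map": {"billing": ["orders_db"]}}


def analysis(state: dict) -> dict:
    return {"risks": [m for m, deps in state["system_map"].items() if deps]}


def mapping(state: dict) -> dict:
    return {"plan": {m: f"new_{m}" for m in state["system_map"]}}


def transformation(state: dict) -> dict:
    return {"candidates": list(state["plan"].values())}


def verification(state: dict) -> dict:
    if not state["candidates"]:
        raise ValueError("nothing to verify")
    return {"verified": state["candidates"]}


def integration(state: dict) -> dict:
    return {"pull_requests": [f"migrate {c}" for c in state["verified"]]}


STAGES = [
    ("inventory", inventory), ("analysis", analysis), ("mapping", mapping),
    ("transformation", transformation), ("verification", verification),
    ("integration", integration),
]

result = run_pipeline(STAGES, initial={})
print(result.completed)
print(result.state["pull_requests"])  # ['migrate new_billing']
```

Recording `failed_stage` is what makes the pipeline debuggable: a failure is attributed to one stage and its inputs rather than to "the model".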

Why multiple agents matter

Splitting responsibilities across agents provides:

Controlled context
Each agent sees only what it needs. This improves accuracy.

Clear handoffs
Outputs become structured inputs for the next step.

Debuggability
You can identify where failures occur:

misunderstanding (inventory)
bad reasoning (analysis)
incorrect rewrite (transformation)

Iterative improvement
You can refine prompts and behavior per agent, instead of globally.
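"Controlled context" can be enforced mechanically. The sketch below assembles the input for one agent call from a dependency graph, including only the target module and its dependencies up to a chosen depth. The `deps` and `sources` structures are assumptions about what the inventory stage provides.

```python
def build_context(
    target: str,
    deps: dict[str, list[str]],
    sources: dict[str, str],
    depth: int = 1,
) -> dict[str, str]:
    """Return source for `target` plus its dependencies up to `depth` hops away."""
    selected = {target}
    frontier = {target}
    for _ in range(depth):
        # Expand one hop, skipping modules that are already selected.
        frontier = {d for name in frontier for d in deps.get(name, [])} - selected
        selected |= frontier
    return {name: sources[name] for name in sorted(selected) if name in sources}


deps = {
    "checkout": ["billing", "cart"],
    "billing": ["tax"],
    "reports": ["billing"],
}
sources = {
    name: f"<source of {name}>"
    for name in ["checkout", "billing", "cart", "tax", "reports"]
}

print(list(build_context("checkout", deps, sources)))           # ['billing', 'cart', 'checkout']
print(list(build_context("checkout", deps, sources, depth=2)))  # ['billing', 'cart', 'checkout', 'tax']
```

Note that `reports` never enters the context for `checkout`, even though it touches the same billing code. That exclusion is the point.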

Human in the loop: where it fits

A fully autonomous system is not the goal—at least not today.

Human involvement is most valuable at:

Inventory validation
- Confirm that the system map reflects reality
Business logic review
- Validate inferred rules and workflows
Risk decisions
- Approve transformations in sensitive areas
Final integration
- Decide what gets merged or deployed

The harness should make human input:

targeted (only where needed)
efficient (review structured outputs, not raw code dumps)
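A simple way to keep review targeted is to route changes through an explicit gate. In this sketch the sensitive areas, the confidence threshold, and the `Change` fields are all hypothetical. The self-reported `confidence` score in particular is an assumed signal that a real system would need to calibrate before trusting.

```python
from dataclasses import dataclass

SENSITIVE_AREAS = {"payments", "auth"}  # hypothetical; defined per project


@dataclass
class Change:
    module: str
    area: str
    tests_passed: bool
    confidence: float  # assumed agent-reported score in [0, 1]


def review_reasons(change: Change, min_confidence: float = 0.8) -> list[str]:
    """Return why a change needs targeted human review; empty means none found."""
    reasons = []
    if change.area in SENSITIVE_AREAS:
        reasons.append(f"sensitive area: {change.area}")
    if not change.tests_passed:
        reasons.append("verification failed")
    if change.confidence < min_confidence:
        reasons.append(f"low confidence: {change.confidence:.2f}")
    return reasons


changes = [
    Change("invoice", "payments", tests_passed=True, confidence=0.95),
    Change("cart", "orders", tests_passed=True, confidence=0.90),
    Change("search", "catalog", tests_passed=True, confidence=0.60),
]

queue = {c.module: review_reasons(c) for c in changes}
print(queue)
# {'invoice': ['sensitive area: payments'], 'cart': [], 'search': ['low confidence: 0.60']}
```

Returning reasons rather than a boolean gives the reviewer a structured starting point, in line with reviewing structured outputs instead of raw code dumps.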

Iterative learning and prompt evolution

One of the most practical advantages of this setup is that it improves over time.

You can:

refine prompts per agent
add rules based on past failures
incorporate domain-specific constraints
store successful patterns and reuse them

This turns the harness into a learning system, not just a pipeline.
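As a minimal sketch of per-agent prompt evolution, the class below keeps a base prompt for each agent and appends rules learned from past failures when the prompt is rendered. `PromptBook`, the agent names, and the example rules are illustrative assumptions. A production setup would also version these rules and track which failure each one came from.

```python
from collections import defaultdict


class PromptBook:
    """Keeps a base prompt per agent plus rules learned from past failures."""

    def __init__(self, base_prompts: dict[str, str]) -> None:
        self.base_prompts = base_prompts
        self.rules: dict[str, list[str]] = defaultdict(list)

    def add_rule(self, agent: str, rule: str) -> None:
        """Record a rule for one agent, ignoring exact duplicates."""
        if rule not in self.rules[agent]:
            self.rules[agent].append(rule)

    def render(self, agent: str) -> str:
        """Build the prompt text for one agent, with its learned rules appended."""
        lines = [self.base_prompts[agent]]
        if self.rules[agent]:
            lines.append("Rules learned from earlier runs:")
            lines += [f"- {rule}" for rule in self.rules[agent]]
        return "\n".join(lines)


book = PromptBook({
    "transformation": "Rewrite the module for the target framework.",
    "analysis": "List risks found in the inventory.",
})
book.add_rule("transformation", "Keep public function names unchanged.")
book.add_rule("transformation", "Keep public function names unchanged.")  # duplicate, ignored
book.add_rule("transformation", "Do not change rounding of monetary values.")

print(book.render("transformation"))
```

Rules added for the transformation agent leave the analysis agent's prompt untouched, which is what refining "per agent, instead of globally" means in practice.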

A minimal system setup

A practical implementation might include:

Code ingestion layer
- repo access
- file chunking and indexing
Knowledge store
- vector database or structured graph
- stores system understanding
Agent orchestration
- workflow engine controlling agent sequence
Execution environment
- sandbox for running code and tests
Evaluation layer
- automated checks and comparisons
Observability
- logs, traces, and metrics for every step
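For the evaluation layer, one common form of "automated checks and comparisons" is differential testing: run the legacy and the candidate implementation on the same inputs and report any divergence. The sketch below assumes both can be called as Python functions, which is a simplification, since real legacy code often has to be driven through an API or a recorded-traffic replay. The discount rule is invented for the example.

```python
from typing import Any, Callable, Iterable


def compare_behavior(
    legacy: Callable[..., Any],
    candidate: Callable[..., Any],
    cases: Iterable[tuple],
) -> list[dict]:
    """Call both implementations on each case and collect mismatches."""

    def outcome(fn: Callable[..., Any], args: tuple) -> tuple:
        # Exceptions count as behavior too, so compare them by type name.
        try:
            return ("ok", fn(*args))
        except Exception as exc:
            return ("error", type(exc).__name__)

    mismatches = []
    for args in cases:
        old, new = outcome(legacy, args), outcome(candidate, args)
        if old != new:
            mismatches.append({"args": args, "legacy": old, "candidate": new})
    return mismatches


# Hypothetical legacy rule: 10% discount at 10,000 cents or more.
def legacy_discount(total_cents: int) -> int:
    return total_cents * 90 // 100 if total_cents >= 10_000 else total_cents


# Candidate rewrite with an off-by-one error at the threshold.
def candidate_discount(total_cents: int) -> int:
    return total_cents * 90 // 100 if total_cents > 10_000 else total_cents


cases = [(0,), (9_999,), (10_000,), (10_001,)]
report = compare_behavior(legacy_discount, candidate_discount, cases)
print(report)
# [{'args': (10000,), 'legacy': ('ok', 9000), 'candidate': ('ok', 10000)}]
```

The mismatch list is the kind of structured feedback the verification stage can hand back to the transformation stage for another attempt.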

Final thought

It is natural to hope that better models will eventually remove the need for careful system design.

So far, the opposite has been true.

As models become more powerful, the importance of harness engineering increases, not decreases.

The winning approach is not:

“Give the model everything”

It is:

“Design a system where the model only needs to understand what matters—at the right time.”

That is what turns AI from a risky abstraction engine into a precise tool for evolving complex systems.
