How Kaptrix Works

The evaluation layer for AI-driven businesses.

Kaptrix is an AI system assessment and decision engine. It evaluates whether an AI-driven product is credible, durable, and investment-worthy based on evidence, not narrative.

The problem

AI diligence has a structural blind spot.

Diligence on AI-driven businesses has a structural problem: the thing being evaluated is often the thing least visible in a data room. Demos perform. Decks assert. Founders narrate. The underlying system (models, dependencies, data posture, controls, failure modes) sits behind a layer of interpretation that traditional diligence was not designed to penetrate.

Demos don't generalize

A polished prototype and a production-grade system can look identical for forty-five minutes.

Interviews anchor on narrative

Management tells a coherent story. Coherence is not evidence.

Workstreams fragment

Technical, data, and governance analysis live in different workstreams. Nothing stitches them into a single decision.

AI systems fail in new ways

Hidden vendor wrappers, brittle models, unclear data provenance, governance that hasn't caught up. Traditional frameworks weren't built to detect these.

Kaptrix is built for the moment after that gap becomes visible. It gives investment teams a structured, evidence-backed view of whether an AI system is real, durable, and worth the capital, and a live reasoning surface to interrogate that view as new information arrives.

What Kaptrix is

Three engines. One platform.

Kaptrix combines three things that don't usually coexist in a single platform. The operator owns the score. The AI expands what the operator can see. Evidence, not opinion, is the only thing that moves a score.

Structured scoring engine

The same inputs produce the same output, every time.

Evidence engine

Turns artifacts into structured, machine-readable signals.

Reasoning engine

Operates continuously on top of both, grounded in what has actually been observed about this specific system.

The core idea

Most AI evaluation tools do one of two things.

They summarize

They read artifacts and produce a narrative. Fast, but not defensible.

They score by rubric

They apply a checklist. Defensible, but static and blind to evidence.

Kaptrix does neither in isolation

A rubric-driven scoring engine runs underneath a live reasoning layer, with a strict separation between what the operator decides and what the AI contributes. The scoring logic is fixed and inspectable. The AI layer is bounded and auditable. Together they produce something that is both fast and defensible, a combination traditional diligence and first-generation AI tools have not delivered.

The three layers

Coordinated. Bounded. Each with a specific job.

Understanding the separation between the three layers is the key to understanding why the platform's outputs hold up under scrutiny.

Layer 01

The Scoring Engine

Auditable. Operator-controlled.

This is the trust anchor. Every downstream output (insights, comparisons, recommendations, risk flags) rolls up to a score produced by a fixed, inspectable process.

Six risk dimensions

Product Credibility

Whether the AI claims being made are real and substantiated.

Tooling & Vendor Exposure

How much of the system depends on external providers, and how concentrated that dependency is.

Data & Sensitivity Risk

How data is sourced, handled, licensed, and protected.

Governance & Safety

What controls exist, and whether they match the system's operating context.

Production Readiness

Whether the system is genuinely operational or still prototype-grade.

Open Validation

What has been independently verified, and what remains untested.

Four properties define how this layer behaves

Scores are reproducible

The same inputs produce the same output, every time. No model variance, no drift between runs, no ‘the AI felt differently today.’

Failure-weighted, not feature-weighted

A system that looks complete on the surface but is weak on claim integrity cannot score its way out through strong peripheral signals.

The operator assigns every base score

The AI does not. This is a hard architectural constraint, not a configuration setting, not user-toggleable.

Every score carries rationale

Nothing is stored as a number alone. Every sub-criterion is accompanied by the reasoning that produced it.
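The four properties above can be sketched in code. This is a minimal, illustrative sketch, not the platform's actual implementation: the names, the 0-1 scale, and the geometric-mean aggregation are assumptions, chosen because a geometric mean has the failure-weighted property the text describes.

```python
from dataclasses import dataclass

# The six risk dimensions named above.
DIMENSIONS = (
    "product_credibility", "tooling_vendor_exposure", "data_sensitivity_risk",
    "governance_safety", "production_readiness", "open_validation",
)

@dataclass(frozen=True)
class SubScore:
    dimension: str
    value: float      # 0.0-1.0, assigned by the operator, never by the AI
    rationale: str    # required: nothing is stored as a number alone

    def __post_init__(self) -> None:
        if not (0.0 <= self.value <= 1.0):
            raise ValueError(f"{self.dimension}: score out of range")
        if not self.rationale.strip():
            raise ValueError(f"{self.dimension}: every score carries rationale")

def composite(scores: list[SubScore]) -> float:
    """Failure-weighted aggregation via a geometric mean: one weak
    dimension drags the composite down and cannot be offset by strong
    peripheral signals. A pure function: same inputs, same output."""
    product = 1.0
    for s in scores:
        product *= max(s.value, 1e-9)  # guard: avoid collapsing to exactly zero
    return product ** (1.0 / len(scores))
```

With five dimensions at 0.9 and one at 0.2, this lands near 0.70 where a plain average would report 0.78: the weak dimension dominates, which is the intended behavior.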

Layer 02

The Evidence Engine

Artifact-driven, not interview-driven.

Traditional diligence front-loads interviews because there is no better way to surface hidden structure when starting from zero. Kaptrix inverts this.

Artifacts come first: architecture diagrams, model documentation, policies, contracts, vendor dependencies, logs, metrics, output samples. The platform ingests them and extracts structured signals: claims made, controls in place, dependencies declared, gaps visible.

The effect: interviews become targeted follow-up rather than primary discovery. You walk into the management call already knowing what is missing, what is inconsistent, and what needs pressure.
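As a sketch, the extracted signals can be modeled as one structured record per artifact, with the interview agenda derived from what is missing. Field names and the record shape are illustrative assumptions, not the engine's real schema.

```python
from dataclasses import dataclass, field

@dataclass
class ArtifactSignals:
    """Structured signals extracted from one ingested artifact."""
    artifact: str                                          # e.g. "dpa.pdf" (hypothetical name)
    claims: list[str] = field(default_factory=list)        # claims made
    controls: list[str] = field(default_factory=list)      # controls in place
    dependencies: list[str] = field(default_factory=list)  # dependencies declared
    gaps: list[str] = field(default_factory=list)          # gaps visible

def interview_agenda(signals: list[ArtifactSignals]) -> list[str]:
    """Turn extracted gaps into targeted follow-up questions, so the
    management call starts from what is missing, not from zero."""
    questions = []
    for s in signals:
        for gap in s.gaps:
            questions.append(f"[{s.artifact}] No evidence found for: {gap}")
    return questions
```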

Layer 03

The Reasoning Engine

Continuous. Grounded. Context-aware.

This is the layer most evaluation tools lack entirely, and it is what makes Kaptrix a live system rather than a report generator.

Questions it answers in context

  • Where is this system most likely to fail?
  • Which claims in the pitch are unsupported by the artifacts provided?
  • What is missing that should exist for a system at this maturity stage?
  • How does this system compare to others we have evaluated in the same category?
  • Where has our confidence in the score shifted since diligence began, and why?

Outputs are grounded. No generic answers. No hallucinated confidence. If the evidence does not support a conclusion, the engine says so, and flags it as a gap to close rather than a question to paper over.

How evidence changes a score

Evidence never silently modifies a score.

When the platform ingests a new artifact and extracts a signal that affects the model, it generates a structured proposal. That proposal names the sub-criterion it affects, the direction of the pressure it creates, the rationale behind it, and the supporting artifacts.

Support

Evidence that confirms an existing score.

Contradiction

Evidence that creates downward pressure.

Augmentation

Evidence that supports an upward signal.

Gap

Missing evidence that should exist but does not.

Bounded. Reviewed. Approved.

  • A single piece of evidence cannot move a score by an arbitrary amount. Evidence affecting one dimension cannot bleed into another. These are enforced at the engine level, not left to operator discipline.
  • Proposals are reviewed and approved by the operator before they affect the score. Nothing enters the composite without a human decision.
  • The result is a scoring model that stays live, continuously updated as new artifacts arrive, while remaining controlled.
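The proposal mechanism above can be sketched as follows. The bound value, field names, and which signal types touch the score are assumptions for illustration; the real engine's limits are not specified here.

```python
from dataclasses import dataclass
from enum import Enum

class Signal(Enum):
    SUPPORT = "support"              # confirms an existing score
    CONTRADICTION = "contradiction"  # downward pressure
    AUGMENTATION = "augmentation"    # upward pressure
    GAP = "gap"                      # missing evidence that should exist

MAX_STEP = 0.05  # engine-level bound on any single artifact; illustrative value

@dataclass
class Proposal:
    sub_criterion: str      # exactly one dimension; no cross-dimension bleed
    signal: Signal
    delta: float            # proposed pressure on the score
    rationale: str          # the reasoning behind the proposal
    artifacts: list[str]    # supporting evidence
    approved: bool = False  # set only by the operator

def apply(score: float, proposal: Proposal) -> float:
    """Nothing enters the composite without a human decision, and a
    single piece of evidence can never move a score more than MAX_STEP."""
    if not proposal.approved:
        return score  # pending proposals are visible but inert
    if proposal.signal in (Signal.SUPPORT, Signal.GAP):
        return score  # these feed confidence, not the score itself
    step = max(-MAX_STEP, min(MAX_STEP, proposal.delta))
    return max(0.0, min(1.0, score + step))
```

Note the clamp runs inside the engine, not in operator-facing code: the bound holds even if a proposal asks for more.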

Confidence is separate from score

A score tells you where the system landed. Confidence tells you how much to trust that landing.

Kaptrix calculates confidence independently from the score itself, based on how much of the model is actually covered by evidence, the quality of the sources feeding it, how recent the evidence is, and how consistent the signals are with each other. Confidence qualifies the score. It does not override it.
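A sketch of that calculation, under stated assumptions: the four factors are normalized to [0, 1], and a geometric mean is used so a single weak factor (say, thin coverage) cannot be averaged away. The weighting scheme is illustrative, not Kaptrix's actual formula.

```python
def confidence(coverage: float, source_quality: float,
               recency: float, consistency: float) -> float:
    """Confidence computed independently of the score, from four
    evidence qualities. It qualifies the score; it never overrides it."""
    factors = (coverage, source_quality, recency, consistency)
    if any(not 0.0 <= f <= 1.0 for f in factors):
        raise ValueError("each factor must be in [0, 1]")
    product = 1.0
    for f in factors:
        product *= f
    return product ** 0.25  # geometric mean of four factors
```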

Soft signal

High score, low confidence

The evidence we have is positive, but we have not seen enough.

Strong signal

High score, high confidence

We have seen enough, and it holds up.

Investment committees need to be able to tell these two states apart. Kaptrix makes that distinction visible, explicit, and reviewable.

Why this holds up under scrutiny

Every output can be walked backwards to its source.

When an LP, an investment committee, or a legal team asks "How did you arrive at this view?", the answer needs to be traceable.

  • Every score traces to its sub-criteria and the rationale behind them.
  • Every adjustment traces to the proposal that triggered it and the artifact that supported it.
  • Every AI-generated insight traces to the signals and evidence it drew from.
  • No scoring logic is hidden. No adjustments happen silently. No conclusions float free of their evidence base.

The audit trail is not a feature bolted on afterward; it is the structure the platform is built on. If a conclusion cannot be traced to its source, it is not a conclusion Kaptrix will present.
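One way to picture that structure: every output is a node whose sources can be walked back to the artifacts beneath it. A minimal sketch, assuming a simple depth-first provenance chain; node kinds and names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator

@dataclass
class TraceNode:
    """One node in the audit trail: a score, an adjustment,
    an AI-generated insight, or a raw artifact."""
    kind: str      # "score" | "adjustment" | "insight" | "artifact"
    summary: str
    sources: list["TraceNode"] = field(default_factory=list)

def walk_back(node: TraceNode, depth: int = 0) -> Iterator[tuple[int, TraceNode]]:
    """Yield the full provenance chain behind an output, depth-first.
    A conclusion with no path down to an artifact is a gap, not a result."""
    yield depth, node
    for src in node.sources:
        yield from walk_back(src, depth + 1)
```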

What makes Kaptrix different

Two existing approaches. Each with structural limits.

Manual diligence

  • Defensible but slow
  • Inconsistent across deals
  • Dependent on whichever partner is most technical
  • Scales poorly, every evaluation starts from zero

First-gen AI analysis tools

  • Fast but shallow
  • Summarize without structure
  • No audit trail behind outputs
  • No operator judgment inside outputs

What Kaptrix combines

Structured human judgment
Automated evidence extraction
Rubric-driven scoring logic
Full auditability
Continuous reasoning that stays live throughout diligence

Faster than traditional diligence, more structured than ad hoc review, more defensible than pure AI output.

What Kaptrix will not do

Trust also comes from knowing where a system stops.

Does not predict financial performance

Evaluates the AI system itself, not the business built around it.

Does not replace full technical or legal diligence

Compresses the fragmented parts of AI-specific evaluation into a structured layer the rest of diligence can build on.

Does not operate as a black-box autonomous evaluator

The operator decides. The platform equips that decision.

Does not accept claims it cannot trace

If the evidence is not there, the gap is surfaced, not filled with inference.

What you get

When diligence concludes.

01

Composite score

Calibrated to failure-weighted risk, with a full dimension-level breakdown.

02

Confidence signal

Qualifies how much of the model is evidence-backed.

03

Evidence coverage map

Showing what has been validated and what has not.

04

Identified gaps & contradictions

A structured view of risks requiring follow-up.

05

Complete audit trail

Linking every output to the artifacts and decisions that produced it.

06

Live reasoning surface

Continues to answer questions during diligence, at IC, and after close.

The shift

Small on the surface. Large in practice.

Kaptrix moves the central question of AI diligence from a question with no defensible answer to one that has one.

Before

Do we believe this system?

After

What evidence supports it, what contradicts it, and how much risk remains?

Kaptrix is built to answer the second question in real time, with full context, and with logic you can defend, in front of an investment committee, a board, or an LP.

Built for high-stakes evaluation

For the moments where the decision has to be right and the reasoning has to hold up.

Pre-investment evaluation of AI-driven targets. Acquisition diligence. Validation of vendor AI claims. Assessment of internal AI initiatives where governance and capital exposure intersect. Most valuable when decisions must be made quickly, information is incomplete or biased, and claims need to be pressure-tested against evidence.