Redproof: AI red-teaming in Europe & EU AI Act testing

The clock that's already running

Aug 2
2026

Most of the EU AI Act becomes operative for high-risk and general-purpose AI systems. By then adversarial testing is no longer a nice-to-have. Procurement teams in regulated sectors across Europe already ask for a red-team report before they sign anything. Around 3,200 Dutch businesses fall directly in scope, and the work itself takes weeks. The real risk is leaving it too late.

What I test

I test the whole system, not just the model.

The base model is rarely the weak point. The real gaps sit in your prompts, your retrieval pipeline, and the tools your agent is allowed to call. That is where I go looking.

LLM01

Prompt injection

Direct and indirect, including multi-turn jailbreaks and payloads buried in the documents, web pages, and tool output your agent quietly trusts.

LLM02

Sensitive information disclosure

Coaxing the system into leaking secrets, training data, internal errors, or other users' information.

LLM05

Improper output handling

Unsafe or unescaped output (XSS, SSRF, leaked stack traces) that the app around the model trusts and renders.

LLM06

Excessive agency

Getting an agent to call APIs, move money, or take actions it was never meant to take.

LLM07

System prompt leakage

Extracting your system prompt and the tool schema that is supposed to protect it.

Bespoke

Business-logic abuse

The exploits specific to your product (your pricing, your workflow, your data boundaries) that sit beyond the OWASP checklist.

How an engagement runs

A fixed, predictable sequence.

Scope

I agree the target system, threat model, and rules of engagement with you, all in writing.

Attack

Automated breadth, then manual depth where the real findings hide.

Triage

Each finding ranked by severity, with a working proof of exploit.

Report

Plain-language findings mapped to OWASP LLM and the relevant EU AI Act articles.

Re-test

You patch, I re-run the attacks, and your evidence shows the fix held.

Pricing

Fixed-scope packages. No "contact sales" maze.

Most teams start with a Full Engagement, then move to a quarterly retainer as the product changes.

Baseline

Baseline Scan

from €3,500

Automated testing of one AI feature (for example, your chatbot or document summarizer)
Findings report with severity
3-5 days

Most start here

Full Engagement

from €8,000

Automated + manual custom attacks
OWASP LLM + EU AI Act mapping
Remediation guidance
2-3 weeks

Agents

Agent Engagement

from €15,000

Full engagement on a tool-using agent
Tool-misuse & action-safety testing
Re-test included
3-4 weeks

Ongoing

Retainer

from €1,500 / quarter

Re-test as your system changes
New-attack coverage
Always-current evidence

Enterprise vendors start around €15k for one engagement, and a junior often does the actual testing. With Redproof the person who understands your system is the person testing it. Priced for the stage you are at, not theirs.

Who's behind it

Not a scanner with a logo.

I'm Mohamad (Sam) Rostami, a platform and infrastructure engineer. I build and run the production systems behind large AI models at Together AI. Most security shops point a tool at your endpoint and email you the printout. I work the way an attacker actually does, because knowing how these systems are built, deployed, and broken is my day job, not a side service.

I scope your engagement, I run the attacks, and I write the report. No handoff to a junior, no account manager in the middle. As the work grows I bring in vetted specialists for the larger jobs, but the standard holds on every test: senior hands, start to finish.

FAQ

The questions every team asks first.

Q. Can you test our live production system?

I prefer a staging or test tenant with seeded data, but yes, I can test production with your signed authorization, strict rate limits, and clear stop conditions. If anything risks real user data or stability, I pause and call you. I never run load or denial-of-service testing.

Q. How long does it take?

A Baseline Scan is 3-5 days, a Full Engagement 2-3 weeks, and an Agent Engagement 3-4 weeks. The testing itself takes weeks, so the teams that start before procurement asks are the ones who aren't squeezed.

Q. What happens after I book a call?

A 20-minute scoping call to agree the target and threat model, then a fixed quote within a day. No "contact sales" maze. Once the scope is agreed, you sign the engagement paperwork and authorization, give me access, I test, and you get a ranked report plus a walkthrough call.

Q. What do you need from us?

What your AI does and the rules it must hold, the tools and data it can reach, and a safe way to connect to it. You can share your system prompt or just describe it. I adapt to closed third-party platforms too. No prep beyond a short checklist I send you.

Q. What if you disagree with a finding?

I walk you through it on the call. Every finding ships with a working proof of exploit and a fixed severity rubric, but if something is genuinely not exploitable or is known, accepted debt, I say so and annotate it. The evidence is only useful if it's honest.

Q. Do you offer ongoing testing?

Yes. After you patch, I re-run the attacks and the report shows the fixes held. As your system changes, a quarterly retainer keeps the evidence current with new-attack coverage. Most teams move to one once the product is shipping changes regularly.

Break it in private.
Prove it in public.