10 min read · Tracehold team

How to build an immutable audit log with HMAC hash chaining

Append-only is not tamper-evident. Here is how to build an audit log that proves nothing in the past has changed, using HMAC-SHA256, a canonical schema, and a Postgres advisory lock. Lessons from production.

audit logging · HMAC · hash chain · tamper evidence · compliance · Postgres

Append-only is not tamper-evident

The first time a customer asks "can you prove nothing in this log has been changed?" is usually the moment a team realizes their audit log is not an audit log. It is a table of events that the application writes to but never reads back to verify.

A normal database table is append-only in practice because nobody writes UPDATE or DELETE queries against it. That is not a security property. It is a convention. An attacker who lands on the production database with credentials, or an insider with legitimate access, can edit a row and the rest of the system will never notice.

What an auditor, a CISO, or a regulator asks for is different. They want proof. They want a cryptographic primitive that makes it impossible to rewrite history without also rewriting every entry that followed, and for that rewriting to be detectable with a verification run they can execute themselves.

This post walks through how to build that. The code is from our production audit engine. The lessons are from the mistakes we made getting there.

The shape of a hash-chained log

Every entry carries a sequence_number that monotonically increases per organization, and an entry_hash computed over a canonical string that includes:

  • The entry's own content fields.
  • The previous entry's entry_hash.

If you change any field in entry 42, its hash no longer matches its own content. You can rewrite the hash, but now entry 43's prev_hash is wrong, and so are 44, 45, 46, and every entry through to the latest one. To cover the tampering, an attacker has to rewrite every entry from 42 to MAX(sequence_number), and also produce a valid signature for each.

That last part is where HMAC comes in. Without a shared secret, the attacker could recompute every hash and the chain would look correct. HMAC-SHA256 means the hash itself is a keyed MAC; you need the key to forge a valid entry. Without it, any tampering shows up as a verification failure.
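
At its core the primitive is tiny. A minimal sketch, assuming the key is loaded from configuration elsewhere:

import hashlib
import hmac

def compute_entry_hash(key: bytes, content: str, prev_hash: str) -> str:
    # Keyed hash over the entry's content plus the previous entry's hash.
    # Without `key`, an attacker cannot recompute valid hashes after a rewrite.
    msg = f"{content}|{prev_hash}".encode("utf-8")
    return hmac.new(key, msg, hashlib.sha256).hexdigest()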

The canonical string: the part everyone gets wrong

The single most important decision in a hash-chained log is which fields go into the hash. Anything you leave out can be silently mutated.

Our first version covered six content fields plus the previous hash:

"|".join([
    str(seq),
    organization_id,
    event_type,
    summary,
    params_hash or "",
    decision or "",
    prev_hash,
])

This was fine until a code reviewer pointed out that actor_id, actor_type, action_id, risk_level, cloud_provider, trace_id, and nist_controls were all mutable without invalidating the chain. An attacker with UPDATE rights could rewrite who did the action, which resource was touched, what the risk level was, and the chain would still verify as valid.

The fix was to expand the canonical string to cover every field an attacker with UPDATE rights could plausibly want to rewrite:

def _canonical_v2(entry, *, seq, prev_hash, params_hash, created_at_iso) -> str:
    return "|".join([
        "v2",
        str(seq),
        entry.organization_id,
        entry.actor_type,
        entry.actor_id,
        entry.agent_id or "",
        entry.task_id or "",
        entry.action_id or "",
        entry.user_id or "",
        entry.event_type,
        entry.summary,
        params_hash or "",
        entry.decision or "",
        entry.execution_status or "",
        _stringify_list(entry.nist_controls),
        _stringify_list(entry.owasp_tags),
        entry.trace_id or "",
        entry.risk_level or "",
        entry.cloud_provider or "",
        created_at_iso,
        prev_hash,
    ])
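
The helper _stringify_list is elided above; any encoding works as long as it is deterministic and, like the canonical builders themselves, frozen once used in production. A minimal sketch:

def _stringify_list(values: list[str] | None) -> str:
    # Deterministic encoding for list-valued fields (nist_controls, owasp_tags).
    # None and [] both collapse to "" so the canonical string stays stable.
    return ",".join(values) if values else ""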

Two things worth calling out.

Schema versioning. We prefixed the v2 canonical with the literal string "v2" and added a hmac_schema_version column on the row. Old entries verify against _canonical_v1. New entries verify against _canonical_v2. The verifier dispatches per-row. This means you can evolve the canonical string over time without invalidating the history you already wrote.

Never mutate a canonical function. Once _canonical_v1 has been used for a single production row, it is frozen forever. If you want to change what's covered, write _canonical_v2. This is exactly the same discipline as database migration files: the past is read-only.

The concurrency trap

A hash chain is a sequence. If two requests append entries concurrently, they must not both read the same MAX(sequence_number) or you end up with two entries claiming sequence N, both pointing at the same prev_hash. The chain has forked, and verification will fail against whichever one wins the insert.

Our first implementation looked like this:

# WRONG: racy
max_seq = await db.execute(select(func.max(AuditLog.sequence_number)))
# Then SELECT ... FOR UPDATE on the row at max_seq

The intent was: lock the last row, read its hash, compute the next one, insert. The bug is that two concurrent transactions can both execute the MAX query before either one has inserted. Neither blocks the other. Both read the same value. Both insert max + 1. The chain forks.

FOR UPDATE on the last row does not help; it only serializes transactions that also try to read or update that row, and inserting a new row does not count.

The fix is a transaction-scoped advisory lock keyed on the organization:

await self.db.execute(
    text("SELECT pg_advisory_xact_lock(hashtext(:org_key))"),
    {"org_key": f"tracehold:audit:{req.organization_id}"},
)

max_result = await self.db.execute(
    select(func.max(AuditLog.sequence_number))
    .where(AuditLog.organization_id == req.organization_id)
)

pg_advisory_xact_lock takes an integer key, is held until the transaction commits or rolls back, and serializes every concurrent append for a given organization without touching the audit rows themselves. hashtext() deterministically maps the org-scoped key string to a 32-bit integer; a hash collision between two organizations only means they share a lock and serialize unnecessarily, never that a chain is corrupted. No extra infrastructure. No Redis. No retry logic. Just Postgres.

Per-organization scoping matters. If you take a global lock, you serialize every append in the system behind a single mutex. With a per-org lock, Org A and Org B append in parallel; only within an org is the chain single-threaded. That is exactly what the chain property requires. No more, no less.

What the append actually does

Once the lock is held, the append is four steps:

  1. Read MAX(sequence_number) for this org. Next sequence is max + 1.
  2. Read the previous entry's entry_hash (or use a genesis value of "0" * 64 for the first entry).
  3. Hash the request parameters separately: params_hash = sha256(canonical_json(parameters)). Store the hash in the canonical string; either store the raw params or redact them, depending on your PII posture. (A sketch of canonical_json follows below.)
  4. Build the canonical string, compute entry_hash = hmac_sha256(key, canonical_string), and insert the row.

from app.core.security import compute_hmac

canonical = _canonical_v2(
    req,
    seq=next_seq,
    prev_hash=prev_hash,
    params_hash=params_hash,
    created_at_iso=now_iso,
)
entry_hash = compute_hmac(canonical)  # HMAC-SHA256 with the audit key

row = AuditLog(
    organization_id=req.organization_id,
    sequence_number=next_seq,
    prev_entry_hash=prev_hash,
    entry_hash=entry_hash,
    hmac_schema_version=2,
    created_at=now,
    # ... all the fields that went into canonical
)
db.add(row)

Commit releases the advisory lock. The next waiter proceeds.
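
Step 3's canonical_json deserves the same discipline as the canonical string: one deterministic encoding, frozen forever. A minimal sketch, assuming the standard json module:

import hashlib
import json

def canonical_json(obj) -> str:
    # Sorted keys and fixed separators make the encoding deterministic, so
    # the same parameters always produce the same hash.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=True)

def compute_params_hash(parameters) -> str | None:
    if parameters is None:
        return None
    return hashlib.sha256(canonical_json(parameters).encode("utf-8")).hexdigest()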

Verifying the chain

The verifier walks the chain in order and re-computes each hash from scratch:

async def verify_chain_integrity(db, org_id: str) -> ChainIntegrityReport:
    rows = await db.execute(
        select(AuditLog)
        .where(AuditLog.organization_id == org_id)
        .order_by(AuditLog.sequence_number.asc())
    )
    expected_prev = GENESIS_HASH  # "0" * 64
    count = 0

    for row in rows.scalars():
        count += 1
        canonical = _canonical_for_version(row, prev_hash=expected_prev)
        expected = compute_hmac(canonical)
        if expected != row.entry_hash or row.prev_entry_hash != expected_prev:
            return ChainIntegrityReport(
                organization_id=org_id,
                total_entries=count,
                valid=False,
                first_invalid_sequence=row.sequence_number,
                error="hash mismatch" if expected != row.entry_hash else "prev_hash mismatch",
            )
        expected_prev = row.entry_hash

    return ChainIntegrityReport(organization_id=org_id, total_entries=count, valid=True)

The function dispatches to _canonical_v1 or _canonical_v2 based on row.hmac_schema_version. Old rows verify under the old rules. New rows verify under the new rules. Neither changes.
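
A sketch of that dispatch, assuming params_hash and created_at are stored on the row alongside the hash columns:

def _canonical_for_version(row, *, prev_hash: str) -> str:
    # Each row is verified under the canonical schema it was written with.
    if row.hmac_schema_version == 1:
        return _canonical_v1(row, prev_hash=prev_hash)
    if row.hmac_schema_version == 2:
        return _canonical_v2(
            row,
            seq=row.sequence_number,
            prev_hash=prev_hash,
            params_hash=row.params_hash,
            created_at_iso=row.created_at.isoformat(),
        )
    raise ValueError(f"unknown hmac_schema_version: {row.hmac_schema_version}")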

First-invalid reporting matters: when verification fails, you want to know which row is the first one out of sync, because that is the boundary of the tampering. Everything before it is still provably intact.

The layered threat model

HMAC hash chaining is strong against a narrow threat: an attacker with UPDATE or DELETE rights to the audit table who does not have the HMAC key. That covers a lot of realistic scenarios (a compromised replica, an insider with SQL access, a backup exfiltration), but it does not cover two important ones.

The app is compromised. If an attacker runs code in your FastAPI process, they have the HMAC key in memory and can forge any entry or rewrite the chain. HMAC alone cannot defeat this.

The database is compromised at the schema level. An attacker who can read the HMAC key from a stored secret, or who tampers with the verification code itself, can hide their own rewrites.

The defense-in-depth answers are:

  1. Database-level append-only. A migration that issues REVOKE UPDATE, DELETE ON audit_logs FROM PUBLIC for the app role, plus a BEFORE UPDATE OR DELETE trigger that raises an exception. An application bug cannot then accidentally mutate a row, and a SQL-injection vulnerability loses the ability to rewrite history. This is not a cryptographic guarantee, but it removes one of the easier attack paths. (A migration sketch follows this list.)

  2. External anchoring. On a timer, write the latest chain hash to an external system the attacker does not control: S3 Object Lock with compliance mode, a KMS-signed artifact, or a managed audit-log service. An attacker rewriting the chain now has to also forge a valid anchor at the external system, which is materially harder. The verifier can cross-check the anchor during compliance export, making any rewrite detectable even if the HMAC key has leaked.

  3. Per-chunk KMS signatures. For high-assurance environments, sign every 1,000-row chunk with an AWS KMS or GCP KMS key. The key never leaves the HSM. Even a fully-compromised app process cannot forge a chunk signature without going through the KMS API, which is itself logged.
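
For (1), a minimal sketch of the migration, assuming Alembic and an application role named app_rw (the role name is illustrative):

from alembic import op

def upgrade() -> None:
    op.execute("REVOKE UPDATE, DELETE ON audit_logs FROM PUBLIC, app_rw")
    op.execute("""
        CREATE OR REPLACE FUNCTION audit_logs_block_mutation() RETURNS trigger AS $$
        BEGIN
            RAISE EXCEPTION 'audit_logs is append-only';
        END;
        $$ LANGUAGE plpgsql;
    """)
    op.execute("""
        CREATE TRIGGER audit_logs_append_only
        BEFORE UPDATE OR DELETE ON audit_logs
        FOR EACH ROW EXECUTE FUNCTION audit_logs_block_mutation();
    """)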

We ship (1) and (2) today; (3) is on the roadmap for customers with FedRAMP High requirements.
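
For (2), a sketch of the anchor write, assuming boto3 and a bucket created with Object Lock enabled (bucket, key layout, and retention period are illustrative):

from datetime import datetime, timedelta, timezone

import boto3

def anchor_chain_head(bucket: str, org_id: str, entry_hash: str) -> None:
    # Write the latest chain hash to a WORM object. In compliance mode,
    # not even the account root can delete it before the retention date.
    now = datetime.now(timezone.utc)
    boto3.client("s3").put_object(
        Bucket=bucket,
        Key=f"audit-anchors/{org_id}/{now.isoformat()}",
        Body=entry_hash.encode("utf-8"),
        ObjectLockMode="COMPLIANCE",
        ObjectLockRetainUntilDate=now + timedelta(days=365),
    )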

What it looks like in a compliance export

A customer asked us last quarter whether we could produce evidence that a specific decision was real and not retrofitted after an incident. We gave them a compliance bundle with a chain_proof.json:

{
  "organization_id": "...",
  "first_sequence": 1,
  "last_sequence": 4731,
  "first_entry_hash": "a3f2...",
  "last_entry_hash": "e109...",
  "latest_external_anchor": {
    "written_to": "s3://customer-audit-anchors/2026-04-17T23:00:00Z",
    "sha256": "e109..."
  },
  "verify_command": "python scripts/verify_bundle.py"
}

Paired with a MANIFEST.json (sha256 of every file in the zip) and a detached SIGNATURE, this gave the auditor three independently verifiable artifacts: the chain itself, the manifest hash, and the external anchor. Any one of them being tampered with would show up as a failed verification step.
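
Verifying the manifest side is a few lines. A sketch, assuming MANIFEST.json is a flat map of file name to sha256 digest (the exact layout is illustrative):

import hashlib
import json
import pathlib

def verify_manifest(bundle_dir: str) -> bool:
    # Recompute the sha256 of every file named in the manifest and compare
    # it against the recorded digest.
    root = pathlib.Path(bundle_dir)
    manifest = json.loads((root / "MANIFEST.json").read_text())
    return all(
        hashlib.sha256((root / name).read_bytes()).hexdigest() == digest
        for name, digest in manifest.items()
    )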

The takeaway

A hash-chained, HMAC-signed audit log is not a heavy piece of infrastructure. It is four fields on your audit table (sequence_number, prev_entry_hash, entry_hash, hmac_schema_version), an advisory lock around the append, and a canonical string you agree never to change.
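
In SQLAlchemy terms, roughly this, with column types as assumptions:

from sqlalchemy import BigInteger, Integer, String, UniqueConstraint
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

class AuditLog(Base):
    __tablename__ = "audit_logs"
    # The advisory lock already prevents duplicate sequences; the unique
    # constraint makes the database reject a fork even if the lock is bypassed.
    __table_args__ = (UniqueConstraint("organization_id", "sequence_number"),)

    id: Mapped[int] = mapped_column(BigInteger, primary_key=True)
    organization_id: Mapped[str] = mapped_column(String(36), nullable=False)
    sequence_number: Mapped[int] = mapped_column(BigInteger, nullable=False)
    prev_entry_hash: Mapped[str] = mapped_column(String(64), nullable=False)
    entry_hash: Mapped[str] = mapped_column(String(64), nullable=False)
    hmac_schema_version: Mapped[int] = mapped_column(Integer, nullable=False, default=2)
    # ... plus every content column covered by the canonical string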

The work is not in the code. The work is in:

  1. Picking a canonical string that covers every mutable field, on day one, and agreeing not to alter it.
  2. Versioning the canonical so you can evolve it without breaking old rows.
  3. Using a per-org advisory lock, not a row-level one.
  4. Layering database-level append-only and external anchoring on top, because HMAC alone does not defend against a compromised app.

Do those four things and you have something you can hand to an auditor. Skip any of them and you have a table that looks like an audit log but does not pass a serious review.


If you want to see this in production, Tracehold intercepts every tool call an AI agent makes, writes a hash-chained audit row, and ships a signed compliance bundle on demand. Book a 30-minute walk-through and we will show you the chain end-to-end, including a live tamper demonstration against a read replica.

