10 min read · Tracehold team

Task-scoped credentials for AI agents: AWS STS vs GCP WIF vs Azure Managed Identity

Your AI agent does not need a permanent API key. Here is how to issue short-lived, least-privilege credentials per task using each cloud's native identity federation, and why the old way will eventually burn you.

Tags: credentials, AI agent security, AWS STS, GCP Workload Identity, Azure Managed Identity, least privilege

The permanent key problem

Most AI agent setups ship with a long-lived API key baked into the environment: AWS access keys in .env, a GCP service account JSON on disk, an Azure client secret in a vault that nobody rotates. The agent starts, reads the key, and now it has the same permissions for every task it runs until someone remembers to rotate it.

This works until it doesn't. A prompt injection trick, a confused-deputy bug, or simple scope creep, and suddenly the agent is calling iam:AttachRolePolicy with a key that has AdministratorAccess. The blast radius is the entire account.

The fix is not a better secret manager. The fix is credentials that are born scoped to one task and die when the task ends.

All three major clouds already support this. The mechanisms are different, the terminology is different, but the outcome is the same: a short-lived token that can only do what the current task requires.

How task-scoped credentials work

The idea is simple:

  1. Agent starts a task ("provision staging-v2 for QA").
  2. Before the first tool call, a credential is minted that covers only the actions this task needs (e.g. ec2:RunInstances, ec2:CreateTags in a specific VPC).
  3. The credential has a TTL, typically 15 minutes. It auto-expires even if nobody revokes it.
  4. When the task ends, the credential is discarded and, where the platform allows it, revoked explicitly as a belt-and-suspenders measure.

No permanent key is stored. No broad permissions are inherited. If the agent gets hijacked mid-task, the attacker gets a 15-minute token that can only launch t3.medium instances in staging. That is a bad day. It is not an account takeover.
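The lifecycle above can be modeled without any cloud at all. The following is a minimal sketch, assuming a hypothetical in-memory TaskCredential class and task_scope helper (neither is part of any SDK), that shows the three properties that matter: scoped actions, a TTL, and revocation when the task ends.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class TaskCredential:
    """Hypothetical in-memory stand-in for a cloud-issued token."""
    task_id: str
    allowed_actions: frozenset
    ttl_seconds: int = 900  # 15 minutes
    issued_at: float = field(default_factory=time.time)
    revoked: bool = False

    def permits(self, action, now=None):
        """True only while the credential is live and the action is in scope."""
        now = time.time() if now is None else now
        expired = now >= self.issued_at + self.ttl_seconds
        return not self.revoked and not expired and action in self.allowed_actions


@contextmanager
def task_scope(task_id, actions, ttl_seconds=900):
    """Mint a credential at task start and revoke it when the task ends."""
    cred = TaskCredential(task_id, frozenset(actions), ttl_seconds)
    try:
        yield cred
    finally:
        cred.revoked = True  # explicit revocation; the TTL is the backstop


with task_scope("provision-staging-v2", ["ec2:RunInstances", "ec2:CreateTags"]) as cred:
    in_scope = cred.permits("ec2:RunInstances")          # granted for this task
    out_of_scope = cred.permits("iam:AttachRolePolicy")  # never granted
after_task = cred.permits("ec2:RunInstances")            # revoked on exit
```

In a real deployment the cloud enforces these checks, not your own code; the sketch only illustrates the contract the rest of this post relies on.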

AWS STS: AssumeRole with inline session policies

AWS Security Token Service is the most mature option and the one most teams hit first.

The mechanism

Your organization creates a single gateway IAM role (e.g. TraceholdGatewayRole) whose trust policy allows only your credential issuer to assume it. The role itself has wide permissions, but that's fine because every assumption narrows them down with an inline session policy, for example:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["ec2:RunInstances", "ec2:CreateTags"],
      "Resource": "*"
    }
  ]
}

The resulting session token can only do what the inline policy allows, regardless of what the underlying role permits. This is the key insight: the intersection of the role policy and the session policy is what the agent actually gets.
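A session policy like the one above is usually generated per task rather than written by hand. Here is a minimal sketch, assuming a hypothetical build_session_policy helper; the 2048-character check reflects the STS limit on inline session policies.

```python
import json

MAX_SESSION_POLICY_CHARS = 2048  # STS limit on the inline Policy parameter


def build_session_policy(actions, resources=("*",)):
    """Return a compact JSON session policy allowing only the given actions."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(actions),
                "Resource": list(resources),
            }
        ],
    }
    # Compact separators keep the document small; STS counts characters.
    document = json.dumps(policy, separators=(",", ":"))
    if len(document) > MAX_SESSION_POLICY_CHARS:
        raise ValueError(
            f"session policy is {len(document)} chars, limit is {MAX_SESSION_POLICY_CHARS}"
        )
    return document


inline_session_policy_json = build_session_policy(
    ["ec2:RunInstances", "ec2:CreateTags"]
)
```

Passing specific resource ARNs instead of the default "*" tightens the scope further.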

The call

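import boto3

sts = boto3.client("sts")

# agent_id, task_id, and inline_session_policy_json are supplied by the caller
response = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/TraceholdGatewayRole",
    RoleSessionName=f"tracehold-{agent_id[:8]}-{task_id[:8]}",
    Policy=inline_session_policy_json,  # JSON string that narrows the role
    DurationSeconds=900,  # 15 minutes, the minimum STS allows
)
credentials = response["Credentials"]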

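The response's Credentials object contains AccessKeyId, SecretAccessKey, and SessionToken, plus an Expiration timestamp. Hand the first three to the agent as the environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN. They expire in 15 minutes whether or not anyone revokes them.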
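Handing the credentials over as environment variables might look like the sketch below. The credentials_to_env helper is hypothetical, and the response dict uses placeholder values in the shape boto3 returns; the variable names are the standard ones AWS SDKs and the CLI read.

```python
def credentials_to_env(assume_role_response):
    """Map an STS AssumeRole response onto the env vars AWS SDKs read."""
    creds = assume_role_response["Credentials"]
    return {
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["SessionToken"],
    }


# Placeholder values only; a real response comes from sts.assume_role(...).
example_response = {
    "Credentials": {
        "AccessKeyId": "ASIAEXAMPLE",
        "SecretAccessKey": "example-secret",
        "SessionToken": "example-token",
    }
}
agent_env = credentials_to_env(example_response)
```

The resulting dict can be merged into the environment of the agent's subprocess so the credentials never touch disk.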

The tradeoff

Strengths:

  • Inline session policies are powerful. You can scope down to specific resource ARNs, conditions, even IP ranges.
  • Session names show up in CloudTrail, so you get free traceability back to the agent and task.
  • No additional infrastructure. STS is built into every AWS account.

Weaknesses:

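  • STS session credentials cannot be revoked individually. If you need instant revocation, you have to attach a deny policy to the role (typically conditioned on aws:TokenIssueTime), which cuts off every session issued before the chosen time, not just the one you care about.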
  • The maximum session duration is 12 hours (role-level setting), and inline session policies are limited to 2048 characters. Complex permission sets require creative policy design.
  • Whatever calls AssumeRole needs network access to STS. In a VPC without a NAT gateway or an STS VPC endpoint, the call does not fail fast; it hangs until the client times out.
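The role-level deny used for emergency revocation can be sketched as follows. This assumes a hypothetical revoke_sessions_before helper; the policy shape mirrors the one AWS itself attaches when you revoke active sessions for a role, using the aws:TokenIssueTime condition key.

```python
import json
from datetime import datetime, timezone


def revoke_sessions_before(cutoff):
    """Build a deny policy for sessions whose token was issued before cutoff.

    cutoff must be a timezone-aware datetime; it is converted to UTC.
    """
    issued_before = cutoff.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                "Condition": {
                    "DateLessThan": {"aws:TokenIssueTime": issued_before}
                },
            }
        ],
    })


deny_policy = revoke_sessions_before(
    datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
)
```

Attaching the document to the gateway role (for example with IAM's put_role_policy) invalidates every session issued before the cutoff, which is why the TTL remains the primary control.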

When to use it

AWS STS is the right choice when your agents run on AWS infrastructure and you need fine-grained, per-action scoping. It is the most flexible of the three options.

GCP Workload Identity Federation: token exchange without service account keys

GCP's approach to short-lived credentials is Workload Identity Federation (WIF). The goal is the same as STS but the mechanism is different: instead of assuming a role, you exchange an external identity token for a GCP access token.

The mechanism

You configure a workload identity pool and a provider that trusts your credential issuer (e.g. an OIDC provider, an AWS account, or a custom token service). When the agent needs credentials, your issuer mints a token, exchanges it with GCP's Security Token Service, and then impersonates a service account.

The flow:

  1. Your credential issuer generates a signed JWT or OIDC token that asserts the agent's identity and task scope.
  2. The token is exchanged via sts.googleapis.com/v1/token for a federated access token.
  3. The federated token impersonates a GCP service account via iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/{sa}:generateAccessToken.
  4. The resulting access token has the service account's permissions, scoped by IAM conditions.
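Steps 2 and 3 boil down to two HTTP requests. The sketch below only builds the request payloads and sends nothing; the helper names are hypothetical, and the project number, pool, provider, and service account values are placeholders you would replace with your own.

```python
from urllib.parse import urlencode

STS_URL = "https://sts.googleapis.com/v1/token"
IAM_CREDENTIALS_URL = (
    "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/"
    "{service_account}:generateAccessToken"
)
CLOUD_PLATFORM_SCOPE = "https://www.googleapis.com/auth/cloud-platform"


def sts_exchange_form(subject_jwt, project_number, pool_id, provider_id):
    """Step 2: form-encoded body exchanging the issuer's JWT for a federated token."""
    audience = (
        f"//iam.googleapis.com/projects/{project_number}/locations/global/"
        f"workloadIdentityPools/{pool_id}/providers/{provider_id}"
    )
    return urlencode({
        "grant_type": "urn:ietf:params:oauth:grant-type:token-exchange",
        "audience": audience,
        "scope": CLOUD_PLATFORM_SCOPE,
        "requested_token_type": "urn:ietf:params:oauth:token-type:access_token",
        "subject_token": subject_jwt,
        "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
    })


def impersonation_request(service_account, federated_token, lifetime_seconds=900):
    """Step 3: URL, headers, and JSON body for generateAccessToken."""
    url = IAM_CREDENTIALS_URL.format(service_account=service_account)
    headers = {"Authorization": f"Bearer {federated_token}"}
    body = {"scope": [CLOUD_PLATFORM_SCOPE], "lifetime": f"{lifetime_seconds}s"}
    return url, headers, body


form = sts_exchange_form("issuer-signed-jwt", "123456789012", "agents", "tracehold")
url, headers, body = impersonation_request(
    "storage-writer@example-project.iam.gserviceaccount.com", "federated-token"
)
```

POST the form to STS_URL, take access_token from the response, and use it as the bearer token for the second request.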

Scoping permissions

Unlike AWS inline session policies, GCP has no general-purpose way to narrow permissions per request at the token level (Credential Access Boundaries can downscope a token, but only for Cloud Storage). Instead, you scope through:

  • Multiple service accounts: create one per permission set (e.g. bigquery-reader@, storage-writer@) and impersonate the right one per task.
  • IAM Conditions: use resource.name conditions, time-based conditions, or custom attributes to limit what the service account can do.
  • Short TTL: the access token defaults to 1 hour but can be shortened, for example to 15 minutes (900s), via the lifetime parameter of generateAccessToken.
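An IAM Condition that combines a resource restriction with an expiry could be built as in this sketch. The time_boxed_binding helper, the bucket name, and the service account are hypothetical; the expression uses the CEL syntax IAM Conditions accept, and resource.name conditions only apply to services that support them.

```python
from datetime import datetime, timezone


def time_boxed_binding(role, service_account_email, bucket, expires_at):
    """IAM policy binding limited to one bucket and an expiry time.

    expires_at must be a timezone-aware datetime.
    """
    expiry = expires_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    expression = (
        f'resource.name.startsWith("projects/_/buckets/{bucket}") && '
        f'request.time < timestamp("{expiry}")'
    )
    return {
        "role": role,
        "members": [f"serviceAccount:{service_account_email}"],
        "condition": {
            "title": "task-scoped-access",
            "description": "Limited to one bucket; expires with the task",
            "expression": expression,
        },
    }


binding = time_boxed_binding(
    "roles/storage.objectViewer",
    "storage-reader@example-project.iam.gserviceaccount.com",
    "staging-v2-artifacts",
    datetime(2025, 1, 1, 12, 15, tzinfo=timezone.utc),
)
```

The binding goes into the resource's IAM policy, so it is set up ahead of time per permission set rather than per token.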

The tradeoff

Strengths:

  • No service account keys on disk, ever. The entire flow is keyless.
  • Workload Identity Federation supports cross-cloud identity (e.g. an AWS-hosted agent can get GCP credentials without storing GCP secrets).
  • Access tokens are bearer tokens, simpler to pass around than STS's three-value credential set.

Weaknesses:

  • Per-request scoping is coarser than AWS. You cannot attach an inline policy to a token exchange. Scoping requires pre-built service accounts or IAM conditions.
  • Initial setup is more complex. You need a workload identity pool, a provider, and the trust chain configured before you can issue a single token.
  • Revocation: you can disable the service account, but individual tokens cannot be revoked. TTL is the primary control.

When to use it

GCP WIF is the right choice when your agents touch GCP resources and you want to eliminate service account keys entirely. It is also the best option for cross-cloud setups where agents run on AWS or Azure but need GCP access.

Azure Managed Identity: identity at the infrastructure level

Azure takes a different approach. Instead of issuing tokens from a central credential service, Azure assigns an identity to the compute resource itself (the VM, the container, the Function). The agent inherits the identity from the infrastructure it runs on.

The mechanism

There are two flavors:

  • System-assigned: Azure creates an identity tied to the lifecycle of the resource. Delete the VM, the identity goes with it.
  • User-assigned: you create the identity independently and attach it to one or more resources. This is the one you want for agents, because you can share a single identity across a fleet and manage its permissions centrally.

The agent retrieves a token from the Azure Instance Metadata Service (IMDS):

GET http://169.254.169.254/metadata/identity/oauth2/token
    ?api-version=2018-02-01
    &resource=https://management.azure.com/
Metadata: true

No credentials are passed. The VM's identity is proven by the fact that the request comes from the VM itself. The response is a bearer token with the identity's permissions.
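In code, the IMDS call is a plain HTTP GET with a mandatory Metadata: true header. The sketch below uses only the standard library; build_imds_request and fetch_token are hypothetical helper names, and client_id is the optional parameter for selecting a specific user-assigned identity. fetch_token only succeeds from inside an Azure resource with an identity attached.

```python
import json
import urllib.request
from urllib.parse import urlencode

IMDS_TOKEN_URL = "http://169.254.169.254/metadata/identity/oauth2/token"


def build_imds_request(resource="https://management.azure.com/", client_id=None):
    """Build the token request; client_id picks a user-assigned identity."""
    params = {"api-version": "2018-02-01", "resource": resource}
    if client_id:
        params["client_id"] = client_id
    request = urllib.request.Request(f"{IMDS_TOKEN_URL}?{urlencode(params)}")
    request.add_header("Metadata", "true")  # IMDS rejects requests without it
    return request


def fetch_token(request, timeout=5):
    """Send the request and return the bearer token from the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["access_token"]


imds_request = build_imds_request(client_id="00000000-0000-0000-0000-000000000000")
```

In most applications DefaultAzureCredential from the Azure SDK does this for you; the raw request is shown to make the mechanism visible.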

Scoping permissions

Azure scoping happens through RBAC role assignments:

  • Assign roles at the resource group or resource level, not the subscription level.
  • Use custom roles with only the actions the task needs (e.g. Microsoft.Compute/virtualMachines/read + Microsoft.Compute/virtualMachines/deallocate).
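  • Where the target service supports them (mainly storage data actions), add role assignment conditions to narrow access further. Conditional Access policies do not help here: Conditional Access for workload identities does not cover managed identities.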
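A custom role like the one described above is defined as a small JSON document. The sketch below assumes a hypothetical custom_role_definition helper and placeholder subscription and resource group names; the field names follow the Azure custom role definition format.

```python
import json


def custom_role_definition(name, actions, subscription_id, resource_group):
    """Build a custom role assignable only within one resource group."""
    return {
        "Name": name,
        "IsCustom": True,
        "Description": f"Task-scoped role: {name}",
        "Actions": sorted(actions),
        "NotActions": [],
        "AssignableScopes": [
            f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        ],
    }


role = custom_role_definition(
    "vm-deallocator",
    [
        "Microsoft.Compute/virtualMachines/read",
        "Microsoft.Compute/virtualMachines/deallocate",
    ],
    "00000000-0000-0000-0000-000000000000",
    "staging-v2",
)
role_json = json.dumps(role, indent=2)
```

The JSON can then be registered with your usual tooling (for example az role definition create) and assigned to the user-assigned identity at the resource group scope.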

The tradeoff

Strengths:

  • Zero credential management. No keys, no tokens to store, no rotation to schedule. The identity is the infrastructure.
  • Simplest developer experience: the SDK handles token retrieval automatically via DefaultAzureCredential.
  • System-assigned identities are automatically cleaned up when the resource is deleted. No orphaned credentials.

Weaknesses:

  • Scoping is per-identity, not per-request. You cannot narrow permissions on a per-task basis without switching identities.
  • Task-level scoping requires creating multiple user-assigned identities (one per permission set) and attaching/detaching them per task. This is operationally heavier than STS inline policies.
  • Managed identity only works when the agent runs on Azure compute. Agents running elsewhere need a separate mechanism, such as workload identity federation in Microsoft Entra ID.
  • IMDS is only reachable from the VM itself. If your credential issuer runs outside the VM, you need a different approach.

When to use it

Azure Managed Identity is the right choice when your agents run on Azure compute and you want zero credential management overhead. For fine-grained per-task scoping, combine it with multiple user-assigned identities and automate the attachment.

Side-by-side comparison

| Dimension | AWS STS | GCP WIF | Azure MI |
| --- | --- | --- | --- |
| Per-request scoping | Inline session policies (fine-grained) | Service account + IAM conditions (coarse) | RBAC role assignment (coarse) |
| Credential format | AccessKeyId + SecretKey + SessionToken | Bearer access token | Bearer access token |
| Token lifetime | 15 min to 12 hours | 15 min to 1 hour | 1 hour (adjustable) |
| Early revocation | Not possible for sessions | Not possible for tokens | Not possible for tokens |
| Cross-cloud | No | Yes (WIF accepts external OIDC) | No |
| Setup complexity | Low (built into every account) | Medium (pool + provider + SA chain) | Low (assign identity to resource) |
| Key on disk | No (STS is keyless) | No (WIF is keyless) | No (IMDS is keyless) |
| Agent location | Anywhere with STS access | Anywhere with OIDC token | Azure compute only |

The pattern: a credential gateway

Regardless of which cloud you use, the pattern is the same:

  1. Intercept the tool call before it reaches the cloud API.
  2. Classify the action to determine what permissions it needs.
  3. Mint a credential scoped to exactly those permissions, with a TTL tied to the task duration.
  4. Hand the credential to the agent as environment variables or a bearer token.
  5. Log everything: which agent, which task, which action, which permissions, when it was issued, when it expired.
  6. Revoke on task completion where the platform supports it, as a safety net on top of the TTL.
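The steps above can be sketched as a small, cloud-agnostic class. Everything here is hypothetical: the ACTION_PERMISSIONS table, the CredentialGateway class, and the fake_mint function stand in for your own classifier and for the real STS, WIF, or managed identity call.

```python
import time
import uuid

# Hypothetical mapping from tool-call names to the cloud permissions they need.
ACTION_PERMISSIONS = {
    "launch_instance": ["ec2:RunInstances", "ec2:CreateTags"],
    "read_report": ["s3:GetObject"],
}


class CredentialGateway:
    """Classifies a tool call, mints a scoped credential, and logs the issuance."""

    def __init__(self, mint, ttl_seconds=900):
        self._mint = mint  # callable(permissions, ttl_seconds) -> credential
        self._ttl = ttl_seconds
        self.audit_log = []

    def authorize(self, agent_id, task_id, tool_call):
        permissions = ACTION_PERMISSIONS.get(tool_call)
        if permissions is None:
            # Unclassified actions get no credential at all.
            raise PermissionError(f"unclassified tool call: {tool_call}")
        credential = self._mint(permissions, self._ttl)
        issued_at = time.time()
        self.audit_log.append({
            "agent_id": agent_id,
            "task_id": task_id,
            "action": tool_call,
            "permissions": permissions,
            "issued_at": issued_at,
            "expires_at": issued_at + self._ttl,
        })
        return credential


def fake_mint(permissions, ttl_seconds):
    """Stand-in for the cloud call; returns an opaque token and its scope."""
    return {"token": uuid.uuid4().hex, "permissions": permissions, "ttl": ttl_seconds}


gateway = CredentialGateway(fake_mint)
credential = gateway.authorize("agent-1", "task-42", "launch_instance")
```

Denying unclassified tool calls by default is the design choice that matters most: an action the gateway cannot map to a permission set never receives a credential.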

This is what a credential gateway does. It sits between the agent and the cloud, maps every action to the minimum credential it needs, and ensures nothing gets a permanent key or a broad role.

The alternative is trusting that your agent will never be tricked, never drift, and never use its AdministratorAccess key for something you didn't intend. That bet gets worse every time you add an agent.

If you want to see task-scoped credentials in action against a live agent workflow, book a 30-minute walkthrough. We run it on our sandbox, no SDK install.

