Quota Management

AI costs can spiral quickly when left unchecked. Almirant's quota system gives you guardrails that keep spending within budget without having to micromanage every execution.

What are quotas

Quotas are configurable limits that define how much your organization can consume in terms of:

Metric	Description
Tokens	Maximum number of tokens that can be processed
Cost USD	Maximum spending in US dollars
Requests	Maximum number of API calls to the provider

You can configure independent quotas for each AI provider (OpenAI, Anthropic) and define different limits based on the time period.

Period types

Quotas are configured by period, allowing you to set limits that fit your budget:

Type	Description	Use case
Daily	Resets every 24 hours at midnight (UTC)	Granular control for teams with intensive usage
Weekly	Resets every Monday at midnight (UTC)	Balance between flexibility and control
Monthly	Resets on the first day of each month	Alignment with billing cycles
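
The reset rules in the table can be sketched as a small date calculation. This is only an illustration: the function name `next_reset` is hypothetical, and it assumes the monthly reset also happens at midnight UTC, which matches the `periodEnd` values in the API examples on this page but is not stated in the table.

```python
from datetime import datetime, timedelta, timezone


def next_reset(now: datetime, period_type: str) -> datetime:
    """Return the next UTC reset instant for a daily, weekly, or monthly quota."""
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if period_type == "daily":
        return midnight + timedelta(days=1)
    if period_type == "weekly":
        # weekday() is 0 for Monday, so this lands on the next Monday.
        return midnight + timedelta(days=7 - now.weekday())
    if period_type == "monthly":
        # Assumption: the monthly reset also happens at midnight UTC.
        first = midnight.replace(day=1)
        if first.month == 12:
            return first.replace(year=first.year + 1, month=1)
        return first.replace(month=first.month + 1)
    raise ValueError(f"unknown period type: {period_type}")


now = datetime(2024, 1, 17, 15, 30, tzinfo=timezone.utc)  # a Wednesday
print(next_reset(now, "daily"))    # 2024-01-18 00:00:00+00:00
print(next_reset(now, "weekly"))   # 2024-01-22 00:00:00+00:00
print(next_reset(now, "monthly"))  # 2024-02-01 00:00:00+00:00
```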

Tip: We recommend starting with monthly quotas aligned to your AI budget, and adding daily quotas if you detect consumption spikes that affect availability for the entire team.

Configuring quotas

To configure your organization's quotas:

1. Go to Settings > Quota Management.
2. Select the provider you want to configure.
3. Define the limits for each period type:
   - Max tokens: Token limit per period
   - Max cost USD: Spending limit in dollars
   - Max requests: API call limit
4. Enable or disable the quota with the Active toggle.
5. Save changes.

Per-provider configuration

Each provider can have independent configurations. This is useful when:

- You have different budgets allocated to each provider
- You want to limit one provider more strictly while testing another
- You need to control costs for premium models (like o1) separately

Example configuration:

OpenAI:
  - Monthly: 500,000 tokens / $50 USD / 1,000 requests
  - Daily: 50,000 tokens / $10 USD / 200 requests

Anthropic:
  - Monthly: 300,000 tokens / $30 USD / 500 requests
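
When daily and monthly limits coexist, it can help to check how they relate. The sketch below encodes the example limits as plain data and computes how many days of maxed-out daily usage would reach each monthly limit. The dictionary layout and the function `days_until_monthly_exhausted` are illustrative only and do not reflect how Almirant stores quotas.

```python
# Limits copied from the example configuration; the structure is hypothetical.
QUOTAS = {
    "openai": {
        "monthly": {"max_tokens": 500_000, "max_cost_usd": 50.0, "max_requests": 1_000},
        "daily": {"max_tokens": 50_000, "max_cost_usd": 10.0, "max_requests": 200},
    },
    "anthropic": {
        "monthly": {"max_tokens": 300_000, "max_cost_usd": 30.0, "max_requests": 500},
    },
}


def days_until_monthly_exhausted(provider: str) -> dict:
    """Days of maxed-out daily usage needed to hit each monthly limit."""
    periods = QUOTAS[provider]
    if "daily" not in periods or "monthly" not in periods:
        return {}
    return {
        metric: periods["monthly"][metric] / periods["daily"][metric]
        for metric in periods["monthly"]
    }


print(days_until_monthly_exhausted("openai"))
# {'max_tokens': 10.0, 'max_cost_usd': 5.0, 'max_requests': 5.0}
```

With these example numbers, five days of spending at the daily cost cap would use up the monthly cost limit, so the daily quota smooths spikes but does not by itself spread usage across the month.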

Alert system

Almirant proactively notifies you when your consumption approaches the configured limits. Alerts are triggered at the following thresholds:

Alert type	Threshold	Recommended action
warning_75	75% of limit	Monitor consumption closely
warning_80	80% of limit	Consider reducing non-critical operations
warning_90	90% of limit	Prepare quota expansion if needed
exceeded	100% of limit	Quota exhausted, new operations blocked
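
The mapping from usage percentage to alert type can be sketched as a lookup over the thresholds in the table. The function name `alert_type_for` is hypothetical, and treating each threshold as inclusive (usage of exactly 75% triggers `warning_75`) is an assumption.

```python
from typing import Optional

# Thresholds taken from the table, highest first.
THRESHOLDS = [
    (100, "exceeded"),
    (90, "warning_90"),
    (80, "warning_80"),
    (75, "warning_75"),
]


def alert_type_for(percent_used: float) -> Optional[str]:
    """Return the most severe alert type reached, or None below 75%."""
    for threshold, alert_type in THRESHOLDS:
        # Assumption: thresholds are inclusive.
        if percent_used >= threshold:
            return alert_type
    return None


for pct in (60, 75, 85, 100):
    print(pct, alert_type_for(pct))
```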

Receiving alerts

Alerts are sent to:

- Organization administrators: Receive all alerts via email
- Settings panel: Active alerts appear in the quotas section
- Dashboard: Visual indicator when there are pending alerts

Acknowledging alerts

You can acknowledge an alert to indicate you have taken action:

1. Go to Settings > Quota Management.
2. In the Active alerts section, click on the alert.
3. Select Acknowledge to mark it as addressed.

Acknowledged alerts are not shown again for the same period, but a new alert will be generated if the next threshold is reached.
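
One way to picture this behavior: each threshold produces its own alert record for the period, and acknowledging one record hides only that record. The sketch below is a simplified model, not Almirant's implementation; the `Alert` class and `active_alerts` function are hypothetical, with fields loosely mirroring `alertType`, `periodStart`, and `acknowledgedAt`.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Alert:
    alert_type: str  # warning_75, warning_80, warning_90, or exceeded
    period_start: datetime
    acknowledged_at: Optional[datetime] = None


def active_alerts(alerts: list, period_start: datetime) -> list:
    """Alerts for the given period that have not been acknowledged yet."""
    return [
        a for a in alerts
        if a.period_start == period_start and a.acknowledged_at is None
    ]


january = datetime(2024, 1, 1, tzinfo=timezone.utc)
alerts = [
    # Acknowledged on Jan 10, so it no longer appears as active.
    Alert("warning_75", january, datetime(2024, 1, 10, tzinfo=timezone.utc)),
    # Generated later when usage crossed the next threshold.
    Alert("warning_80", january),
]
print([a.alert_type for a in active_alerts(alerts, january)])  # ['warning_80']
```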

Viewing current usage

The quota management page shows a summary of current consumption by provider and period:

Field	Description
Provider	OpenAI or Anthropic
Period type	Daily, weekly, or monthly
Tokens used / max	Current consumption vs limit
Cost used / max	Current spending vs limit
Requests used / max	Current calls vs limit
Percentage	Visual indicator of consumption
Period end	When the quota resets

Info: The percentage shown is the highest of the three usage percentages (tokens, cost, and requests), so you always see the metric that is closest to its limit.
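
As a rough illustration of that rule, the displayed value is the maximum of the three per-metric percentages. The helper names are hypothetical, the usage figures are made up for the example, and capping each percentage at 100 is an assumption.

```python
def percent(used: float, limit: float) -> float:
    """Usage as a percentage of the limit, capped at 100 (assumption)."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return min(100.0, used * 100.0 / limit)


def display_percentage(usage: dict) -> float:
    """Highest of the token, cost, and request percentages."""
    return max(
        percent(usage["usedTokens"], usage["maxTokens"]),
        percent(usage["usedCostUsd"], usage["maxCostUsd"]),
        percent(usage["usedRequests"], usage["maxRequests"]),
    )


# Hypothetical usage: tokens and requests at 25%, cost at 80%.
usage = {
    "usedTokens": 125_000, "maxTokens": 500_000,
    "usedCostUsd": 40.0, "maxCostUsd": 50.0,
    "usedRequests": 250, "maxRequests": 1_000,
}
print(display_percentage(usage))  # 80.0
```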

Auto-resume

When a period ends, quotas reset automatically:

- Counter reset: Token, cost, and request counters return to zero.
- Operation unblocking: Blocked operations can resume.
- Alert cleanup: Alerts from the previous period are archived.

No manual action is required for the quota to renew.

Blocked operation behavior

When the quota runs out:

- AI Planning: New conversations show a quota exhausted message
- AI Agents: Jobs remain in a pending state and are processed automatically when quota becomes available
- Running jobs: Not interrupted, but cannot initiate new calls to the provider

When the quota renews, pending jobs begin processing in FIFO order (first in, first out).
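
The FIFO behavior can be modeled with a simple queue. This is a conceptual sketch only: the `PendingJobs` class and the `has_quota` callback are hypothetical and say nothing about how Almirant schedules jobs internally.

```python
from collections import deque
from typing import Callable


class PendingJobs:
    """Holds blocked jobs and releases them oldest-first."""

    def __init__(self) -> None:
        self._queue = deque()

    def add(self, job_id: str) -> None:
        self._queue.append(job_id)

    def resume(self, has_quota: Callable[[], bool]) -> list:
        """Start jobs in arrival order for as long as quota remains."""
        started = []
        while self._queue and has_quota():
            started.append(self._queue.popleft())
        return started


jobs = PendingJobs()
for job_id in ("job-1", "job-2", "job-3"):
    jobs.add(job_id)

# Pretend the renewed quota covers two jobs before running out again.
slots = iter([True, True, False])
print(jobs.resume(lambda: next(slots)))  # ['job-1', 'job-2']
```

In this model `job-3` stays queued and would be the first job started on the next call to `resume`.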

Best practices

- Configure early alerts -- The 75% threshold gives you time to react before the quota runs out.
- Use daily quotas for granular control -- A daily quota keeps one day of heavy usage from leaving the team without service for the rest of the week.
- Align monthly quotas with billing -- Set limits that match your monthly AI budget.
- Review the breakdown by project -- Identify high-consumption projects to optimize or adjust expectations.
- Reserve a margin for emergencies -- Don't set quotas at 100% of your budget; leave a 10-15% margin.
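
The margin recommendation is simple arithmetic. As a worked example with a hypothetical $50 monthly budget (the helper name `quota_with_margin` is illustrative):

```python
def quota_with_margin(budget_usd: float, margin_percent: int) -> float:
    """Cost limit that leaves margin_percent of the budget unallocated."""
    return budget_usd * (100 - margin_percent) / 100


print(quota_with_margin(50, 10))  # 45.0
print(quota_with_margin(50, 15))  # 42.5
```

So a $50 budget with a 10-15% margin suggests a Max cost USD between $42.50 and $45.00.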

For Developers

MCP Tools

The following tools are available via MCP to query and verify quotas:

Tool	Description	Main parameters
`check_quota`	Checks if quota is available for an operation	`organizationId`, `provider`, `estimatedTokens`
`get_quota_usage`	Gets consumption details by provider and period	`organizationId`, `provider`, `periodType`

Example: Checking quota before an operation

Tool: check_quota
Parameters:
  organizationId: "organization-uuid"
  provider: "openai"
  estimatedTokens: 10000

Response when quota is available:

{
  "available": true,
  "provider": "openai",
  "remainingTokens": 45000,
  "remainingCostUsd": 12.50,
  "remainingRequests": 150,
  "periodEnd": "2024-02-01T00:00:00Z"
}

Response when quota is not available:

{
  "available": false,
  "provider": "openai",
  "remainingTokens": 0,
  "reason": "exceeded",
  "periodEnd": "2024-02-01T00:00:00Z"
}
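
A caller might act on these two responses along the following lines. Only the response fields come from the examples above. The function names and the hour-based wait message are illustrative, and how the tool is actually invoked depends on your MCP client, so the sketch starts from the raw JSON text.

```python
import json
from datetime import datetime, timezone


def parse_period_end(value: str) -> datetime:
    # Swap the trailing "Z" so fromisoformat also works before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def plan_next_step(raw: str, now: datetime) -> str:
    """Decide what to do with a check_quota response (hypothetical helper)."""
    response = json.loads(raw)
    if response["available"]:
        return "proceed"
    wait = parse_period_end(response["periodEnd"]) - now
    hours = int(wait.total_seconds() // 3600)
    return f"wait {hours}h until quota resets"


blocked = """{
  "available": false,
  "provider": "openai",
  "remainingTokens": 0,
  "reason": "exceeded",
  "periodEnd": "2024-02-01T00:00:00Z"
}"""
now = datetime(2024, 1, 31, 12, 0, tzinfo=timezone.utc)
print(plan_next_step(blocked, now))  # wait 12h until quota resets
```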

Example: Querying detailed usage

Tool: get_quota_usage
Parameters:
  organizationId: "organization-uuid"
  provider: "openai"
  periodType: "monthly"

Response:

{
  "provider": "openai",
  "periodType": "monthly",
  "maxTokens": 500000,
  "maxCostUsd": 50.00,
  "maxRequests": 1000,
  "usedTokens": 125000,
  "usedCostUsd": 12.50,
  "usedRequests": 250,
  "percentTokens": 25,
  "percentCost": 25,
  "percentRequests": 25,
  "periodStart": "2024-01-01T00:00:00Z",
  "periodEnd": "2024-02-01T00:00:00Z"
}
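
The response does not include remaining amounts directly, but they follow from the max and used fields. A minimal sketch, using the values from the response above and a hypothetical helper named `remaining`:

```python
import json

raw = """{
  "provider": "openai",
  "periodType": "monthly",
  "maxTokens": 500000,
  "maxCostUsd": 50.00,
  "maxRequests": 1000,
  "usedTokens": 125000,
  "usedCostUsd": 12.50,
  "usedRequests": 250
}"""


def remaining(usage: dict) -> dict:
    """Headroom left in the current period for each metric."""
    return {
        "tokens": usage["maxTokens"] - usage["usedTokens"],
        # Round to cents to avoid floating-point noise in the cost figure.
        "costUsd": round(usage["maxCostUsd"] - usage["usedCostUsd"], 2),
        "requests": usage["maxRequests"] - usage["usedRequests"],
    }


print(remaining(json.loads(raw)))
# {'tokens': 375000, 'costUsd': 37.5, 'requests': 750}
```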

Data model

For technical reference, these are the main types in the quota system:

QuotaConfig

Field	Type	Description
`id`	UUID	Unique configuration identifier
`provider`	string	AI provider (openai, anthropic)
`quotaType`	QuotaType	Period type (daily, weekly, monthly)
`maxTokens`	number	Token limit
`maxCostUsd`	number	Cost limit in USD
`maxRequests`	number	Request limit
`isActive`	boolean	Whether the quota is active

UsageSummaryItem

Field	Type	Description
`provider`	string	AI provider
`periodType`	QuotaType	Period type
`maxTokens`	number	Configured limit
`usedTokens`	number	Tokens consumed
`percentTokens`	number	Usage percentage (0-100)
`maxCostUsd`	number	Cost limit
`usedCostUsd`	number	Cost consumed
`percentCost`	number	Cost percentage
`maxRequests`	number	Request limit
`usedRequests`	number	Requests made
`percentRequests`	number	Request percentage
`periodEnd`	DateTime	Current period end

QuotaAlert

Field	Type	Description
`id`	UUID	Alert identifier
`providerQuotaId`	UUID	Reference to QuotaConfig
`alertType`	AlertType	Alert type (warning_75, warning_80, warning_90, exceeded)
`periodStart`	DateTime	Alert period start
`message`	string	Descriptive message
`acknowledgedAt`	DateTime	Acknowledgment date (null if not acknowledged)
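
For client code, these tables could be mirrored as typed structures. The sketch below covers `QuotaConfig` only. The Python class layout, the snake_case attribute names, and the `from_json` helper are assumptions for illustration; the field names and enum values come from the tables above, and the UUID in the usage example is a placeholder.

```python
from dataclasses import dataclass
from enum import Enum
from uuid import UUID


class QuotaType(str, Enum):
    DAILY = "daily"
    WEEKLY = "weekly"
    MONTHLY = "monthly"


class AlertType(str, Enum):
    WARNING_75 = "warning_75"
    WARNING_80 = "warning_80"
    WARNING_90 = "warning_90"
    EXCEEDED = "exceeded"


@dataclass(frozen=True)
class QuotaConfig:
    id: UUID
    provider: str
    quota_type: QuotaType
    max_tokens: int
    max_cost_usd: float
    max_requests: int
    is_active: bool

    @classmethod
    def from_json(cls, data: dict) -> "QuotaConfig":
        """Build a QuotaConfig from a dict using the camelCase field names."""
        return cls(
            id=UUID(data["id"]),
            provider=data["provider"],
            quota_type=QuotaType(data["quotaType"]),
            max_tokens=data["maxTokens"],
            max_cost_usd=data["maxCostUsd"],
            max_requests=data["maxRequests"],
            is_active=data["isActive"],
        )


config = QuotaConfig.from_json({
    "id": "00000000-0000-0000-0000-000000000001",  # placeholder UUID
    "provider": "openai",
    "quotaType": "monthly",
    "maxTokens": 500000,
    "maxCostUsd": 50.0,
    "maxRequests": 1000,
    "isActive": True,
})
print(config.quota_type.value, config.max_tokens)  # monthly 500000
```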
