Quota Management

AI costs can escalate quickly when usage goes unchecked. Almirant's quota system provides guardrails that cap spending without requiring you to micromanage every execution.

What are quotas

Quotas are configurable limits that define how much your organization can consume in terms of:

| Metric | Description |
| --- | --- |
| Tokens | Maximum number of tokens that can be processed |
| Cost USD | Maximum spending in US dollars |
| Requests | Maximum number of API calls to the provider |

You can configure independent quotas for each AI provider (OpenAI, Anthropic) and define different limits based on the time period.

Period types

Quotas are configured by period, allowing you to set limits that fit your budget:

| Type | Description | Use case |
| --- | --- | --- |
| Daily | Resets every 24 hours at midnight (UTC) | Granular control for teams with intensive usage |
| Weekly | Resets every Monday at midnight (UTC) | Balance between flexibility and control |
| Monthly | Resets on the first day of each month | Alignment with billing cycles |

Tip: We recommend starting with monthly quotas aligned to your AI budget, and adding daily quotas if you detect consumption spikes that affect availability for the entire team.
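
The reset rules for each period type can be sketched in code. This is a minimal illustration, not Almirant's implementation: the function name `next_reset` is hypothetical, and the midnight (UTC) reset for monthly quotas is an assumption inferred from the sample `periodEnd` values in the developer examples on this page.

```python
from datetime import datetime, timedelta, timezone


def next_reset(now: datetime, period_type: str) -> datetime:
    """Return the next reset instant (UTC) for a daily, weekly, or monthly quota."""
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if period_type == "daily":
        return midnight + timedelta(days=1)
    if period_type == "weekly":
        # weekday() is 0 on Monday, so a Monday maps to the following Monday.
        return midnight + timedelta(days=7 - midnight.weekday())
    if period_type == "monthly":
        # First day of the following month; midnight UTC is an assumption.
        if midnight.month == 12:
            return midnight.replace(year=midnight.year + 1, month=1, day=1)
        return midnight.replace(month=midnight.month + 1, day=1)
    raise ValueError(f"unknown period type: {period_type}")
```

For a timestamp on Wednesday 17 January 2024, this sketch returns 18 January for a daily quota, Monday 22 January for a weekly quota, and 1 February for a monthly quota.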

Configuring quotas

To configure your organization's quotas:

  1. Go to Settings > Quota Management.
  2. Select the provider you want to configure.
  3. Define the limits for each period type:
    • Max tokens: Token limit per period
    • Max cost USD: Spending limit in dollars
    • Max requests: API call limit
  4. Enable or disable the quota with the Active toggle.
  5. Save changes.

Per-provider configuration

Each provider can have independent configurations. This is useful when:

  • You have different budgets allocated to each provider
  • You want to limit one provider more strictly while testing another
  • You need to control costs for premium models (like o1) separately

Example configuration:

OpenAI:
- Monthly: 500,000 tokens / $50 USD / 1,000 requests
- Daily: 50,000 tokens / $10 USD / 200 requests

Anthropic:
- Monthly: 300,000 tokens / $30 USD / 500 requests

Alert system

Almirant proactively notifies you when your consumption approaches the configured limits. Alerts are triggered at the following thresholds:

| Alert type | Threshold | Recommended action |
| --- | --- | --- |
| warning_75 | 75% of limit | Monitor consumption closely |
| warning_80 | 80% of limit | Consider reducing non-critical operations |
| warning_90 | 90% of limit | Prepare quota expansion if needed |
| exceeded | 100% of limit | Quota exhausted, new operations blocked |
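
To make the thresholds concrete, the sketch below maps a usage figure to the highest alert type it has reached. The function name `alert_type_for` and the constant `ALERT_THRESHOLDS` are hypothetical; only the alert type names and percentages come from the table.

```python
from typing import Optional

# Thresholds from the alert table, checked from highest to lowest.
ALERT_THRESHOLDS = [
    (100, "exceeded"),
    (90, "warning_90"),
    (80, "warning_80"),
    (75, "warning_75"),
]


def alert_type_for(used: float, limit: float) -> Optional[str]:
    """Return the highest alert type reached, or None below 75% of the limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    percent = used * 100 / limit
    for threshold, alert_type in ALERT_THRESHOLDS:
        if percent >= threshold:
            return alert_type
    return None
```

With a limit of 1,000 requests, 750 requests maps to `warning_75`, 899 to `warning_80`, and anything at or above 1,000 to `exceeded`.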

Receiving alerts

Alerts are sent to:

  • Organization administrators: Receive all alerts via email
  • Settings panel: Active alerts appear in the quotas section
  • Dashboard: Visual indicator when there are pending alerts

Acknowledging alerts

You can acknowledge an alert to indicate you have taken action:

  1. Go to Settings > Quota Management.
  2. In the Active alerts section, click on the alert.
  3. Select Acknowledge to mark it as addressed.

Acknowledged alerts are not shown again for the same period, but a new alert will be generated if the next threshold is reached.
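
The acknowledgment rule can be illustrated with a small sketch. It assumes that each threshold raises its own alert and that acknowledgments are tracked per alert type within a period; the source does not specify either detail, and the name `visible_alerts` is hypothetical.

```python
from typing import List, Set

# Alert types and thresholds as listed in the alert table.
THRESHOLDS = {"warning_75": 75, "warning_80": 80, "warning_90": 90, "exceeded": 100}


def visible_alerts(percent: float, acknowledged: Set[str]) -> List[str]:
    """Return alert types reached in the current period that are not acknowledged."""
    reached = [name for name, threshold in THRESHOLDS.items() if percent >= threshold]
    return [name for name in reached if name not in acknowledged]
```

Under these assumptions, acknowledging `warning_75` at 76% usage leaves nothing to show, while reaching 82% afterwards surfaces a fresh `warning_80` alert.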

Viewing current usage

The quota management page shows a summary of current consumption by provider and period:

| Field | Description |
| --- | --- |
| Provider | OpenAI or Anthropic |
| Period type | Daily, weekly, or monthly |
| Tokens used / max | Current consumption vs limit |
| Cost used / max | Current spending vs limit |
| Requests used / max | Current calls vs limit |
| Percentage | Visual indicator of consumption |
| Period end | When the quota resets |

Info: The percentage shown is the highest of the three usage percentages (tokens, cost, and requests), so you always see the most restrictive indicator.
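
As a worked example of this rule, the sketch below computes each metric's percentage and returns the largest. The function names are hypothetical and the figures are illustrative.

```python
from typing import Tuple

Usage = Tuple[float, float]  # (used, limit)


def usage_percent(used: float, limit: float) -> float:
    """Percentage of the limit consumed."""
    return used * 100 / limit


def displayed_percentage(tokens: Usage, cost: Usage, requests: Usage) -> float:
    """Return the highest of the token, cost, and request percentages."""
    return max(usage_percent(*tokens), usage_percent(*cost), usage_percent(*requests))
```

With 200,000 of 500,000 tokens (40%), $36 of $50 (72%), and 150 of 1,000 requests (15%), the displayed percentage is 72, because cost is the metric closest to its limit.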

Auto-resume

When a period ends, quotas reset automatically:

  1. Counter reset: Token, cost, and request counters return to zero.
  2. Operation unblocking: Blocked operations can resume.
  3. Alert cleanup: Alerts from the previous period are archived.

No manual action is required for the quota to renew.

Blocked operation behavior

When the quota runs out:

  • AI Planning: New conversations show a quota exhausted message
  • AI Agents: Jobs remain in pending state and are processed automatically when quota becomes available
  • Running jobs: Not interrupted, but cannot initiate new calls to the provider

When the quota renews, pending jobs begin processing in FIFO order (first in, first out).
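
FIFO ordering means the job that has waited longest is released first. The sketch below shows the idea with an in-memory queue; the `PendingJobs` class is hypothetical, since this page does not describe how Almirant stores pending jobs.

```python
from collections import deque
from typing import Deque, List


class PendingJobs:
    """Hypothetical in-memory queue of job IDs waiting for quota."""

    def __init__(self) -> None:
        self._queue: Deque[str] = deque()

    def enqueue(self, job_id: str) -> None:
        """Add a job at the back of the queue."""
        self._queue.append(job_id)

    def drain(self, slots: int) -> List[str]:
        """Release up to `slots` jobs, oldest first."""
        released: List[str] = []
        while self._queue and len(released) < slots:
            released.append(self._queue.popleft())
        return released
```

If jobs `a`, `b`, and `c` are enqueued in that order, `drain(2)` releases `a` and `b`, and a later `drain(5)` releases `c`.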

Best practices

  • Configure early alerts -- The 75% threshold gives you time to react before running out of quota.
  • Use daily quotas for granular control -- If your team consumes heavily in a single day, a daily quota keeps that spike from leaving the rest of the week or month without service.
  • Align monthly quotas with billing -- Configure limits that match your monthly AI budget.
  • Review breakdown by project -- Identify high-consumption projects to optimize or adjust expectations.
  • Reserve margin for emergencies -- Don't configure quotas at 100% of your budget; leave a 10-15% margin.
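
The margin recommendation comes down to simple arithmetic. A minimal sketch, using a hypothetical helper name and an illustrative $50 budget:

```python
def quota_with_margin(budget_usd: float, margin: float = 0.15) -> float:
    """Return a cost quota that leaves `margin` (a fraction) of the budget unallocated."""
    if not 0 <= margin < 1:
        raise ValueError("margin must be in the range [0, 1)")
    return round(budget_usd * (1 - margin), 2)
```

For a $50 monthly budget, a 10% margin gives a quota of $45.00 and a 15% margin gives $42.50.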

For Developers

MCP Tools

The following tools are available via MCP to query and verify quotas:

| Tool | Description | Main parameters |
| --- | --- | --- |
| check_quota | Checks if quota is available for an operation | organizationId, provider, estimatedTokens |
| get_quota_usage | Gets consumption details by provider and period | organizationId, provider, periodType |

Example: Checking quota before an operation

Tool: check_quota
Parameters:
  organizationId: "organization-uuid"
  provider: "openai"
  estimatedTokens: 10000

Response when quota is available:

{
  "available": true,
  "provider": "openai",
  "remainingTokens": 45000,
  "remainingCostUsd": 12.50,
  "remainingRequests": 150,
  "periodEnd": "2024-02-01T00:00:00Z"
}

Response when quota is not available:

{
  "available": false,
  "provider": "openai",
  "remainingTokens": 0,
  "reason": "exceeded",
  "periodEnd": "2024-02-01T00:00:00Z"
}
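
A caller can use these two responses to decide whether to proceed or wait. The sketch below only parses the JSON shown above; it does not call the MCP tool itself, and the function name `seconds_until_reset` is hypothetical.

```python
import json
from datetime import datetime, timezone


def seconds_until_reset(response_json: str, now: datetime) -> float:
    """Return 0 if quota is available, otherwise the seconds until periodEnd."""
    response = json.loads(response_json)
    if response["available"]:
        return 0.0
    # Swap the trailing "Z" for an explicit offset so fromisoformat
    # also accepts the value on Python versions before 3.11.
    period_end = datetime.fromisoformat(response["periodEnd"].replace("Z", "+00:00"))
    return max(0.0, (period_end - now).total_seconds())
```

For the blocked response above, a call made at 2024-01-31T23:00:00Z returns 3600.0, meaning the quota renews in one hour.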

Example: Querying detailed usage

Tool: get_quota_usage
Parameters:
  organizationId: "organization-uuid"
  provider: "openai"
  periodType: "monthly"

Response:

{
  "provider": "openai",
  "periodType": "monthly",
  "maxTokens": 500000,
  "maxCostUsd": 50.00,
  "maxRequests": 1000,
  "usedTokens": 125000,
  "usedCostUsd": 12.50,
  "usedRequests": 250,
  "percentTokens": 25,
  "percentCost": 25,
  "percentRequests": 25,
  "periodStart": "2024-01-01T00:00:00Z",
  "periodEnd": "2024-02-01T00:00:00Z"
}
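
The usage response reports limits and consumption but not the remaining headroom, which a client can derive itself. A minimal sketch, assuming the field names shown in the response above; `remaining_headroom` and `SAMPLE_USAGE` are hypothetical names.

```python
from typing import Dict

# Subset of the sample get_quota_usage response shown above.
SAMPLE_USAGE = {
    "maxTokens": 500000, "usedTokens": 125000,
    "maxCostUsd": 50.00, "usedCostUsd": 12.50,
    "maxRequests": 1000, "usedRequests": 250,
}


def remaining_headroom(usage: Dict[str, float]) -> Dict[str, float]:
    """Return what is left of each limit, never below zero."""
    return {
        "tokens": max(0, usage["maxTokens"] - usage["usedTokens"]),
        "costUsd": round(max(0.0, usage["maxCostUsd"] - usage["usedCostUsd"]), 2),
        "requests": max(0, usage["maxRequests"] - usage["usedRequests"]),
    }
```

For the sample response this yields 375,000 tokens, $37.50, and 750 requests remaining in the period.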

Data model

For technical reference, these are the main types in the quota system:

QuotaConfig

| Field | Type | Description |
| --- | --- | --- |
| id | UUID | Unique configuration identifier |
| provider | string | AI provider (openai, anthropic) |
| quotaType | QuotaType | Period type (daily, weekly, monthly) |
| maxTokens | number | Token limit |
| maxCostUsd | number | Cost limit in USD |
| maxRequests | number | Request limit |
| isActive | boolean | Whether the quota is active |
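
For readers who prefer code to tables, QuotaConfig could be mirrored client-side roughly as follows. This is an illustrative sketch rather than the actual schema: the snake_case field names and the validation rules are assumptions, and the values come from the OpenAI monthly example earlier on this page.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4

PROVIDERS = ("openai", "anthropic")
QUOTA_TYPES = ("daily", "weekly", "monthly")


@dataclass(frozen=True)
class QuotaConfig:
    id: UUID
    provider: str
    quota_type: str
    max_tokens: int
    max_cost_usd: float
    max_requests: int
    is_active: bool

    def __post_init__(self) -> None:
        # Validation here is an assumption, not documented behavior.
        if self.provider not in PROVIDERS:
            raise ValueError(f"unknown provider: {self.provider}")
        if self.quota_type not in QUOTA_TYPES:
            raise ValueError(f"unknown quota type: {self.quota_type}")
        if min(self.max_tokens, self.max_cost_usd, self.max_requests) < 0:
            raise ValueError("limits must not be negative")


# OpenAI monthly quota from the example configuration.
openai_monthly = QuotaConfig(uuid4(), "openai", "monthly", 500_000, 50.0, 1_000, True)
```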

UsageSummaryItem

| Field | Type | Description |
| --- | --- | --- |
| provider | string | AI provider |
| periodType | QuotaType | Period type |
| maxTokens | number | Configured limit |
| usedTokens | number | Tokens consumed |
| percentTokens | number | Usage percentage (0-100) |
| maxCostUsd | number | Cost limit |
| usedCostUsd | number | Cost consumed |
| percentCost | number | Cost percentage |
| maxRequests | number | Request limit |
| usedRequests | number | Requests made |
| percentRequests | number | Request percentage |
| periodEnd | DateTime | Current period end |

QuotaAlert

| Field | Type | Description |
| --- | --- | --- |
| id | UUID | Alert identifier |
| providerQuotaId | UUID | Reference to QuotaConfig |
| alertType | AlertType | Alert type (warning_75, warning_80, warning_90, exceeded) |
| periodStart | DateTime | Alert period start |
| message | string | Descriptive message |
| acknowledgedAt | DateTime | Acknowledgment date (null if not acknowledged) |