Quota Management
AI costs can spiral quickly if left unchecked. Almirant's quota system gives you guardrails that keep spending within budget without micromanaging every execution.
What are quotas
Quotas are configurable limits that define how much your organization can consume in terms of:
| Metric | Description |
|---|---|
| Tokens | Maximum number of tokens that can be processed |
| Cost USD | Maximum spending in US dollars |
| Requests | Maximum number of API calls to the provider |
You can configure independent quotas for each AI provider (OpenAI, Anthropic) and define different limits based on the time period.
Period types
Quotas are configured by period, allowing you to set limits that fit your budget:
| Type | Description | Use case |
|---|---|---|
| Daily | Resets every 24 hours at midnight (UTC) | Granular control for teams with intensive usage |
| Weekly | Resets every Monday at midnight (UTC) | Balance between flexibility and control |
| Monthly | Resets on the first day of each month at midnight (UTC) | Alignment with billing cycles |
We recommend starting with monthly quotas aligned to your AI budget, and adding daily quotas if you detect consumption spikes that affect availability for the entire team.
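The reset rules in the table can be sketched as a small date calculation. This is an illustration only: `nextReset` is a hypothetical helper, not part of Almirant, and it assumes every period boundary falls at midnight UTC.

```typescript
type QuotaType = "daily" | "weekly" | "monthly";

// Hypothetical helper: returns the next reset instant (UTC) strictly after `now`.
function nextReset(now: Date, quotaType: QuotaType): Date {
  const y = now.getUTCFullYear();
  const m = now.getUTCMonth();
  const d = now.getUTCDate();
  switch (quotaType) {
    case "daily":
      // Midnight UTC of the following day (Date.UTC normalizes day overflow).
      return new Date(Date.UTC(y, m, d + 1));
    case "weekly": {
      // getUTCDay(): 0 = Sunday, 1 = Monday, ..., 6 = Saturday.
      // On a Monday the next reset is a full week away, hence the `|| 7`.
      const daysUntilMonday = (8 - now.getUTCDay()) % 7 || 7;
      return new Date(Date.UTC(y, m, d + daysUntilMonday));
    }
    case "monthly":
      // First day of the following month (month 12 rolls over to January).
      return new Date(Date.UTC(y, m + 1, 1));
  }
}

// Wednesday, 17 January 2024, 15:30 UTC
const sampleNow = new Date(Date.UTC(2024, 0, 17, 15, 30));
const dailyReset = nextReset(sampleNow, "daily").toISOString();     // "2024-01-18T00:00:00.000Z"
const weeklyReset = nextReset(sampleNow, "weekly").toISOString();   // "2024-01-22T00:00:00.000Z"
const monthlyReset = nextReset(sampleNow, "monthly").toISOString(); // "2024-02-01T00:00:00.000Z"
```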
Configuring quotas
To configure your organization's quotas:
- Go to Settings > Quota Management.
- Select the provider you want to configure.
- Define the limits for each period type:
  - Max tokens: Token limit per period
  - Max cost USD: Spending limit in dollars
  - Max requests: API call limit
- Enable or disable the quota with the Active toggle.
- Save changes.
Per-provider configuration
Each provider can have independent configurations. This is useful when:
- You have different budgets allocated to each provider
- You want to limit one provider more strictly while testing another
- You need to control costs for premium models (like o1) separately
Example configuration:
OpenAI:
- Monthly: 500,000 tokens / $50 USD / 1,000 requests
- Daily: 50,000 tokens / $10 USD / 200 requests
Anthropic:
- Monthly: 300,000 tokens / $30 USD / 500 requests
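For readers who think in data structures, the example configuration above could be written as plain objects. The field names follow the QuotaConfig type in the data model; the `QuotaLimits` interface, the `limitsFor` helper, and the `isActive: true` values are assumptions made for this sketch.

```typescript
type QuotaType = "daily" | "weekly" | "monthly";

// Subset of the QuotaConfig fields; the interface name is hypothetical.
interface QuotaLimits {
  provider: string;
  quotaType: QuotaType;
  maxTokens: number;
  maxCostUsd: number;
  maxRequests: number;
  isActive: boolean; // assumed true here; the example does not state it
}

const exampleQuotas: QuotaLimits[] = [
  { provider: "openai", quotaType: "monthly", maxTokens: 500000, maxCostUsd: 50, maxRequests: 1000, isActive: true },
  { provider: "openai", quotaType: "daily", maxTokens: 50000, maxCostUsd: 10, maxRequests: 200, isActive: true },
  { provider: "anthropic", quotaType: "monthly", maxTokens: 300000, maxCostUsd: 30, maxRequests: 500, isActive: true },
];

// Hypothetical lookup: limits for one provider and period, or null if none is configured.
function limitsFor(provider: string, quotaType: QuotaType): QuotaLimits | null {
  for (const quota of exampleQuotas) {
    if (quota.provider === provider && quota.quotaType === quotaType) {
      return quota;
    }
  }
  return null;
}
```

In this example Anthropic has no daily quota, so `limitsFor("anthropic", "daily")` returns `null` and only the monthly limit applies.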
Alert system
Almirant proactively notifies you when your consumption approaches the configured limits. Alerts are triggered at the following thresholds:
| Alert type | Threshold | Recommended action |
|---|---|---|
| warning_75 | 75% of limit | Monitor consumption closely |
| warning_80 | 80% of limit | Consider reducing non-critical operations |
| warning_90 | 90% of limit | Prepare quota expansion if needed |
| exceeded | 100% of limit | Quota exhausted, new operations blocked |
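The threshold table maps naturally to a lookup from usage percentage to alert type. The sketch below is a client-side illustration: `alertTypeFor` is a hypothetical name, and it assumes each threshold is inclusive (for example, exactly 75% already counts as warning_75).

```typescript
type AlertType = "warning_75" | "warning_80" | "warning_90" | "exceeded";

// Thresholds from the table, ordered from highest to lowest so the first match wins.
const ALERT_THRESHOLDS: { minPercent: number; alertType: AlertType }[] = [
  { minPercent: 100, alertType: "exceeded" },
  { minPercent: 90, alertType: "warning_90" },
  { minPercent: 80, alertType: "warning_80" },
  { minPercent: 75, alertType: "warning_75" },
];

// Returns the most severe alert type reached, or null below the first threshold.
function alertTypeFor(percentUsed: number): AlertType | null {
  for (const entry of ALERT_THRESHOLDS) {
    if (percentUsed >= entry.minPercent) {
      return entry.alertType;
    }
  }
  return null;
}
```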
Receiving alerts
Alerts are sent to:
- Organization administrators: Receive all alerts via email
- Settings panel: Active alerts appear in the quotas section
- Dashboard: Visual indicator when there are pending alerts
Acknowledging alerts
You can acknowledge an alert to indicate you have taken action:
- Go to Settings > Quota Management.
- In the Active alerts section, click on the alert.
- Select Acknowledge to mark it as addressed.
Acknowledged alerts are not shown again for the same period, but a new alert will be generated if the next threshold is reached.
Viewing current usage
The quota management page shows a summary of current consumption by provider and period:
| Field | Description |
|---|---|
| Provider | OpenAI or Anthropic |
| Period type | Daily, weekly, or monthly |
| Tokens used / max | Current consumption vs limit |
| Cost used / max | Current spending vs limit |
| Requests used / max | Current calls vs limit |
| Percentage | Visual indicator of consumption |
| Period end | When the quota resets |
The percentage shown is the highest of the three usage percentages (tokens, cost, and requests), so you always see the most restrictive indicator.
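As a minimal sketch of that rule, the displayed value is the maximum of the three per-metric percentages. The field names match UsageSummaryItem in the data model; the `displayedPercentage` function and the sample numbers are hypothetical.

```typescript
interface UsagePercentages {
  percentTokens: number;
  percentCost: number;
  percentRequests: number;
}

// The most restrictive metric decides what the page shows.
function displayedPercentage(usage: UsagePercentages): number {
  return Math.max(usage.percentTokens, usage.percentCost, usage.percentRequests);
}

// Illustrative values: cost is the binding limit, so the page would show 85.
const shownPercentage = displayedPercentage({ percentTokens: 40, percentCost: 85, percentRequests: 10 });
```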
Auto-resume
When a period ends, quotas reset automatically:
- Counter reset: Token, cost, and request counters return to zero.
- Operation unblocking: Blocked operations can resume.
- Alert cleanup: Alerts from the previous period are archived.
No manual action is required for the quota to renew.
Blocked operation behavior
When the quota runs out:
- AI Planning: New conversations show a quota exhausted message
- AI Agents: Jobs remain in pending state and are processed automatically when quota becomes available
- Running jobs: Not interrupted, but cannot initiate new calls to the provider
When the quota renews, pending jobs begin processing in FIFO order (first in, first out).
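The pending-job behavior can be pictured as a simple FIFO queue that only releases work while quota is available. This is a conceptual sketch, not Almirant's scheduler: `PendingJobQueue`, `PendingJob`, and the `quotaAvailable` flag are all hypothetical.

```typescript
interface PendingJob {
  id: string;
}

class PendingJobQueue {
  private jobs: PendingJob[] = [];

  // New jobs always go to the back of the queue.
  enqueue(job: PendingJob): void {
    this.jobs.push(job);
  }

  // Releases the oldest job only when quota is available; otherwise it stays pending.
  next(quotaAvailable: boolean): PendingJob | undefined {
    if (!quotaAvailable) {
      return undefined;
    }
    return this.jobs.shift();
  }

  get size(): number {
    return this.jobs.length;
  }
}

const queue = new PendingJobQueue();
queue.enqueue({ id: "job-a" });
queue.enqueue({ id: "job-b" });

const whileBlocked = queue.next(false); // undefined: quota exhausted, both jobs stay pending
const afterRenewal = queue.next(true);  // job-a: first in, first out
```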
Best practices
- Configure early alerts -- The 75% threshold gives you time to react before running out of quota.
- Use daily quotas for granular control -- If your team consumes a lot in one day, a daily quota prevents leaving the rest of the week or month without service.
- Align monthly quotas with billing -- Configure limits that match your monthly AI budget.
- Review breakdown by project -- Identify high-consumption projects to optimize or adjust expectations.
- Reserve margin for emergencies -- Don't configure quotas at 100% of your budget; leave a 10-15% margin.
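The margin recommendation is simple arithmetic: multiply the budget by one minus the margin. The helper below is a hypothetical sketch, and the budget figures are illustrative.

```typescript
// Returns the cost limit to configure, rounded to cents, after reserving a margin.
// `marginFraction` is a fraction of the budget, e.g. 0.15 for 15%.
function quotaWithMargin(monthlyBudgetUsd: number, marginFraction: number): number {
  return Math.round(monthlyBudgetUsd * (1 - marginFraction) * 100) / 100;
}

const conservativeLimit = quotaWithMargin(100, 0.15); // 85: a $100 budget with a 15% reserve
const relaxedLimit = quotaWithMargin(100, 0.1);       // 90: the same budget with a 10% reserve
```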
MCP Tools
The following tools are available via MCP to query and verify quotas:
| Tool | Description | Main parameters |
|---|---|---|
| check_quota | Checks if quota is available for an operation | organizationId, provider, estimatedTokens |
| get_quota_usage | Gets consumption details by provider and period | organizationId, provider, periodType |
Example: Checking quota before an operation
Tool: check_quota
Parameters:
  organizationId: "organization-uuid"
  provider: "openai"
  estimatedTokens: 10000
Response when quota is available:
{
  "available": true,
  "provider": "openai",
  "remainingTokens": 45000,
  "remainingCostUsd": 12.50,
  "remainingRequests": 150,
  "periodEnd": "2024-02-01T00:00:00Z"
}
Response when quota is not available:
{
  "available": false,
  "provider": "openai",
  "remainingTokens": 0,
  "reason": "exceeded",
  "periodEnd": "2024-02-01T00:00:00Z"
}
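A caller might turn either response into a go/no-go decision along these lines. The response type is inferred from the two example payloads (fields that appear in only one of them are marked optional), and `summarizeQuotaCheck` is a hypothetical helper rather than part of the MCP tool.

```typescript
// Shape inferred from the example responses; not an official schema.
interface CheckQuotaResponse {
  available: boolean;
  provider: string;
  remainingTokens: number;
  remainingCostUsd?: number;
  remainingRequests?: number;
  reason?: string;
  periodEnd: string; // ISO 8601 timestamp
}

// Hypothetical helper: summarizes whether the operation should go ahead.
function summarizeQuotaCheck(res: CheckQuotaResponse): string {
  if (res.available) {
    return "proceed: " + res.remainingTokens + " tokens left for " + res.provider;
  }
  const reason = res.reason !== undefined ? res.reason : "unknown";
  return "wait: " + res.provider + " quota " + reason + ", resets at " + res.periodEnd;
}

// The "not available" example payload, parsed as a client would receive it.
const blockedResponse: CheckQuotaResponse = JSON.parse(
  '{"available": false, "provider": "openai", "remainingTokens": 0, "reason": "exceeded", "periodEnd": "2024-02-01T00:00:00Z"}'
);
const blockedSummary = summarizeQuotaCheck(blockedResponse);
// "wait: openai quota exceeded, resets at 2024-02-01T00:00:00Z"
```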
Example: Querying detailed usage
Tool: get_quota_usage
Parameters:
  organizationId: "organization-uuid"
  provider: "openai"
  periodType: "monthly"
Response:
{
  "provider": "openai",
  "periodType": "monthly",
  "maxTokens": 500000,
  "maxCostUsd": 50.00,
  "maxRequests": 1000,
  "usedTokens": 125000,
  "usedCostUsd": 12.50,
  "usedRequests": 250,
  "percentTokens": 25,
  "percentCost": 25,
  "percentRequests": 25,
  "periodStart": "2024-01-01T00:00:00Z",
  "periodEnd": "2024-02-01T00:00:00Z"
}
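The percent fields in this response are consistent with a plain used/max ratio, which the sketch below reproduces. How Almirant actually rounds or caps the value is not documented here, so the rounding to whole percent, the cap at 100, and the handling of a zero limit are assumptions.

```typescript
// Hypothetical helper: usage as a whole-number percentage in the range 0-100.
function percentUsed(used: number, max: number): number {
  if (max <= 0) {
    return 0; // assumption: a missing or zero limit reports no usage
  }
  return Math.min(100, Math.round((used / max) * 100));
}

const tokenPercent = percentUsed(125000, 500000); // 25, matches percentTokens
const costPercent = percentUsed(12.5, 50);        // 25, matches percentCost
const requestPercent = percentUsed(250, 1000);    // 25, matches percentRequests
```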
Data model
For technical reference, these are the main types in the quota system:
QuotaConfig
| Field | Type | Description |
|---|---|---|
| id | UUID | Unique configuration identifier |
| provider | string | AI provider (openai, anthropic) |
| quotaType | QuotaType | Period type (daily, weekly, monthly) |
| maxTokens | number | Token limit |
| maxCostUsd | number | Cost limit in USD |
| maxRequests | number | Request limit |
| isActive | boolean | Whether the quota is active |
UsageSummaryItem
| Field | Type | Description |
|---|---|---|
| provider | string | AI provider |
| periodType | QuotaType | Period type |
| maxTokens | number | Configured limit |
| usedTokens | number | Tokens consumed |
| percentTokens | number | Usage percentage (0-100) |
| maxCostUsd | number | Cost limit |
| usedCostUsd | number | Cost consumed |
| percentCost | number | Cost percentage |
| maxRequests | number | Request limit |
| usedRequests | number | Requests made |
| percentRequests | number | Request percentage |
| periodEnd | DateTime | Current period end |
QuotaAlert
| Field | Type | Description |
|---|---|---|
| id | UUID | Alert identifier |
| providerQuotaId | UUID | Reference to QuotaConfig |
| alertType | AlertType | Alert type (warning_75, warning_80, warning_90, exceeded) |
| periodStart | DateTime | Alert period start |
| message | string | Descriptive message |
| acknowledgedAt | DateTime | Acknowledgment date (null if not acknowledged) |
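Transcribed into TypeScript, the three tables might look like the interfaces below. This is a sketch for client code rather than an official type package: representing UUID and DateTime as strings is an assumption based on the JSON examples, and the sample alert values are illustrative.

```typescript
type UUID = string;     // assumption: UUIDs are exchanged as strings
type DateTime = string; // assumption: ISO 8601 strings, as in the JSON examples
type QuotaType = "daily" | "weekly" | "monthly";
type AlertType = "warning_75" | "warning_80" | "warning_90" | "exceeded";

interface QuotaConfig {
  id: UUID;
  provider: string; // "openai" or "anthropic"
  quotaType: QuotaType;
  maxTokens: number;
  maxCostUsd: number;
  maxRequests: number;
  isActive: boolean;
}

interface UsageSummaryItem {
  provider: string;
  periodType: QuotaType;
  maxTokens: number;
  usedTokens: number;
  percentTokens: number; // 0-100
  maxCostUsd: number;
  usedCostUsd: number;
  percentCost: number;
  maxRequests: number;
  usedRequests: number;
  percentRequests: number;
  periodEnd: DateTime;
}

interface QuotaAlert {
  id: UUID;
  providerQuotaId: UUID; // references QuotaConfig.id
  alertType: AlertType;
  periodStart: DateTime;
  message: string;
  acknowledgedAt: DateTime | null; // null until someone acknowledges the alert
}

// Illustrative, unacknowledged alert using placeholder identifiers.
const sampleAlert: QuotaAlert = {
  id: "alert-uuid",
  providerQuotaId: "quota-config-uuid",
  alertType: "warning_90",
  periodStart: "2024-01-01T00:00:00Z",
  message: "Monthly quota at 90% of limit",
  acknowledgedAt: null,
};
```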