Chapter 7: API Pricing
Overview
The Polysystems Backend API uses a transparent, usage-based pricing model. This chapter explains how pricing works, how costs are calculated, and strategies for optimizing your API spend.
Pricing Model
Pay-Per-Use
The API operates on a pay-per-use model:
- No Subscription Fees: Only pay for what you use
- No Minimum Commitment: Start with any amount
- Transparent Pricing: Know the cost before making requests
- Real-Time Deduction: Credits deducted immediately upon request
- Detailed Tracking: Every transaction logged and itemized
Pricing Components
Total Request Cost = Base Price + Token-Based Price
Where:
├── Base Price: Fixed cost per API call
└── Token-Based Price: Variable cost based on tokens processed
    └── (Only for LLM/generation endpoints)
Price Structure
Fixed-Price Endpoints
Some endpoints have a flat cost regardless of usage:
| Endpoint Category | Price per Request | Description |
|---|---|---|
| Health Checks | $0.0000 | Free status checks |
| Memory Storage | $0.0001 | Store/retrieve memory |
| Search Indexing | $0.0002 | Index content for search |
| Document Upload | $0.0005 | Upload document |
Token-Based Pricing
LLM and generation endpoints charge based on tokens processed:
Cost = Base Price + (Tokens × Price per Token)
Example:
├── Base Price: $0.0020
├── Tokens Used: 1,000
├── Price per Token: $0.00001
└── Total Cost: $0.0020 + (1,000 × $0.00001) = $0.0120
Token-Based Endpoints
| Endpoint | Base Price | Price per Token | Typical Cost Range |
|---|---|---|---|
| /api/hub/agents/chat | $0.0020 | $0.00001 | Up to $0.0500 |
| /api/hub/agents/generate | $0.0020 | $0.00001 | Up to $0.0500 |
| /api/omm/generate | $0.0030 | $0.00002 | Up to $0.1000 |
| /api/yggdrasil/process | $0.0040 | $0.00002 | Up to $0.1500 |
| /v1/chat/completions | $0.0020 | $0.00001 | Up to $0.0500 |
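The formula and table above can be turned into a small lookup. The following is a minimal sketch, not an official client: the `TOKEN_PRICING` dictionary and the `request_cost` helper are hypothetical names introduced here, and the prices are copied from the table, so they may drift from live pricing. `Decimal` is used so that small amounts do not pick up floating-point rounding noise.

```python
from decimal import Decimal

# Hypothetical lookup built from the table above: route -> (base price, price per token).
TOKEN_PRICING = {
    "/api/hub/agents/chat": (Decimal("0.0020"), Decimal("0.00001")),
    "/api/hub/agents/generate": (Decimal("0.0020"), Decimal("0.00001")),
    "/api/omm/generate": (Decimal("0.0030"), Decimal("0.00002")),
    "/api/yggdrasil/process": (Decimal("0.0040"), Decimal("0.00002")),
    "/v1/chat/completions": (Decimal("0.0020"), Decimal("0.00001")),
}


def request_cost(route: str, tokens: int) -> Decimal:
    """Cost = Base Price + (Tokens x Price per Token)."""
    base, per_token = TOKEN_PRICING[route]
    return base + per_token * tokens


# Reproduces the worked example: $0.0020 + (1,000 x $0.00001)
print(f"{request_cost('/api/hub/agents/chat', 1000):.4f}")  # 0.0120
```

For real budgeting, the prices are better read from the pricing endpoint than hard-coded, since the table here is only a snapshot.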
RAG (Retrieval Augmented Generation)
RAG endpoints combine retrieval and generation costs:
| Endpoint | Price | Description |
|---|---|---|
| /api/hub/rag/index | $0.0005 | Index document for retrieval |
| /api/hub/rag/retrieve | $0.0010 | Retrieve relevant documents |
| /api/hub/rag/generate | $0.0030 + tokens | Generate with retrieved context |
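Because a RAG workflow usually touches all three endpoints, its cost is the sum of the three rows. The sketch below illustrates that sum; `rag_workflow_cost` is a hypothetical helper, and the per-token rate of $0.00001 for the generate step is an assumption taken from the RAG workflow example in this chapter, since the table itself only says "+ tokens".

```python
from decimal import Decimal

RAG_INDEX = Decimal("0.0005")          # per index call (from the table)
RAG_RETRIEVE = Decimal("0.0010")       # per retrieve call (from the table)
RAG_GENERATE_BASE = Decimal("0.0030")  # base price of generate (from the table)
# Assumption: token rate as used in this chapter's RAG workflow example.
RAG_GENERATE_PER_TOKEN = Decimal("0.00001")


def rag_workflow_cost(generate_tokens: int,
                      index_calls: int = 1,
                      retrieve_calls: int = 1) -> Decimal:
    """Total cost of indexing, retrieving, and one generate call."""
    generate = RAG_GENERATE_BASE + RAG_GENERATE_PER_TOKEN * generate_tokens
    return RAG_INDEX * index_calls + RAG_RETRIEVE * retrieve_calls + generate


print(f"{rag_workflow_cost(1200):.4f}")  # 0.0165
```

Indexing is typically a one-time cost per document, so for repeated questions over the same document `index_calls=0` gives the marginal cost per query.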
Domain-Specific Services
Specialized domain services have different pricing:
| Domain | Endpoint | Base Price | Token Price |
|---|---|---|---|
| Legal | /api/legal/analyze | $0.0050 | $0.00003 |
| Legal | /api/legal/compare | $0.0100 | $0.00003 |
| Professional | /api/professional/analyze | $0.0040 | $0.00002 |
| Scholar | /api/scholar/research | $0.0060 | $0.00003 |
| Creative | /api/creative/generate | $0.0030 | $0.00002 |
| Documents | /api/docs/generate | $0.0035 | $0.00002 |
Viewing Current Pricing
Get All Active Pricing
curl -X GET https://api.polysystems.ai/api/pricing \
  -H "X-API-Key: YOUR_ACCESS_TOKEN"
Response:
{
  "pricing": [
    {
      "id": "price-123",
      "route": "/api/hub/agents/chat",
      "method": "POST",
      "price_per_request": 0.0020,
      "price_per_token": 0.00001,
      "currency": "USD",
      "is_active": true,
      "description": "Chat completion with agents"
    },
    {
      "id": "price-456",
      "route": "/api/hub/memory",
      "method": "POST",
      "price_per_request": 0.0001,
      "price_per_token": null,
      "currency": "USD",
      "is_active": true,
      "description": "Store memory entry"
    }
  ]
}
Understanding Pricing Response
- route: API endpoint path
- method: HTTP method (POST, GET, etc.)
- price_per_request: Fixed base cost
- price_per_token: Cost per token (null if not applicable)
- currency: Always USD
- is_active: Whether pricing is currently in effect
- description: Human-readable description
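The fields above are enough to build a local price table and estimate a request before sending it. The following is a minimal sketch under stated assumptions: `build_price_table` and `estimate_cost` are hypothetical helpers, the sample payload is a trimmed copy of the response shown earlier, and skipping entries with `is_active` set to false is a design choice rather than documented behavior.

```python
import json

# Trimmed copy of the sample pricing response shown earlier in this chapter.
SAMPLE_RESPONSE = """
{
  "pricing": [
    {"route": "/api/hub/agents/chat", "method": "POST",
     "price_per_request": 0.0020, "price_per_token": 0.00001, "is_active": true},
    {"route": "/api/hub/memory", "method": "POST",
     "price_per_request": 0.0001, "price_per_token": null, "is_active": true}
  ]
}
"""


def build_price_table(payload):
    """Map (method, route) -> (base price, per-token price) for active entries."""
    table = {}
    for entry in payload["pricing"]:
        if not entry["is_active"]:
            continue  # assumption: inactive pricing should not be used for estimates
        key = (entry["method"], entry["route"])
        # A null price_per_token means the endpoint is fixed-price.
        table[key] = (entry["price_per_request"], entry["price_per_token"] or 0.0)
    return table


def estimate_cost(table, method, route, tokens=0):
    """Base price plus token price for the expected token count."""
    base, per_token = table[(method, route)]
    return base + per_token * tokens


table = build_price_table(json.loads(SAMPLE_RESPONSE))
print(f"{estimate_cost(table, 'POST', '/api/hub/agents/chat', tokens=150):.4f}")  # 0.0035
print(f"{estimate_cost(table, 'POST', '/api/hub/memory'):.4f}")                   # 0.0001
```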
Cost Calculation Examples
Example 1: Simple Fixed-Price Request
# Memory storage request
curl -X POST https://api.polysystems.ai/api/hub/memory \
  -H "X-API-Key: YOUR_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"key": "user_pref", "value": "dark_mode"}'
# Cost: $0.0001 (fixed price, no token cost)
Example 2: Chat Completion with Tokens
# Chat request
curl -X POST https://api.polysystems.ai/api/hub/agents/chat \
  -H "X-API-Key: YOUR_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Explain quantum computing in 100 words"}
    ]
  }'
# Calculation:
# Base Price: $0.0020
# Tokens Used: 150 (prompt + completion)
# Token Cost: 150 × $0.00001 = $0.0015
# Total Cost: $0.0020 + $0.0015 = $0.0035
Example 3: Long Conversation
# Extended chat with context
curl -X POST https://api.polysystems.ai/api/hub/agents/chat \
  -H "X-API-Key: YOUR_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant..."},
      {"role": "user", "content": "Previous question..."},
      {"role": "assistant", "content": "Previous response..."},
      {"role": "user", "content": "Follow-up question..."}
    ],
    "max_tokens": 500
  }'
# Calculation:
# Base Price: $0.0020
# Tokens Used: 2,500 (large context + response)
# Token Cost: 2,500 × $0.00001 = $0.0250
# Total Cost: $0.0020 + $0.0250 = $0.0270
Example 4: RAG Workflow
# Step 1: Index document
curl -X POST https://api.polysystems.ai/api/hub/rag/index \
  -H "X-API-Key: YOUR_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"document": "content..."}'
# Cost: $0.0005
# Step 2: Retrieve relevant passages
curl -X POST https://api.polysystems.ai/api/hub/rag/retrieve \
  -H "X-API-Key: YOUR_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query": "search query"}'
# Cost: $0.0010
# Step 3: Generate with context
curl -X POST https://api.polysystems.ai/api/hub/rag/generate \
  -H "X-API-Key: YOUR_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query": "question", "context": "retrieved..."}'
# Cost: $0.0030 + (1,200 tokens × $0.00001) = $0.0150
# Total Workflow Cost: $0.0005 + $0.0010 + $0.0150 = $0.0165
Response Headers
Every API response includes cost information in headers:
HTTP/1.1 200 OK
X-Request-Cost: 0.0035
X-Tokens-Used: 150
X-Balance-Remaining: 47.2915
Cost Headers
- X-Request-Cost: Total cost of this request in USD
- X-Tokens-Used: Number of tokens processed (if applicable)
- X-Balance-Remaining: Your remaining account balance
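These headers can be read on the client to log spend per request. The sketch below parses them from any header mapping; `RequestCost` and `parse_cost_headers` are hypothetical names, and treating a missing `X-Tokens-Used` header as "not applicable" is an assumption based on the "(if applicable)" note above.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RequestCost:
    cost_usd: float
    tokens_used: Optional[int]  # None for fixed-price endpoints
    balance_remaining: float


def parse_cost_headers(headers: Mapping[str, str]) -> RequestCost:
    """Extract the cost headers from a response's header mapping."""
    tokens = headers.get("X-Tokens-Used")
    return RequestCost(
        cost_usd=float(headers["X-Request-Cost"]),
        tokens_used=int(tokens) if tokens is not None else None,
        balance_remaining=float(headers["X-Balance-Remaining"]),
    )


# Headers from the example response above.
example = {
    "X-Request-Cost": "0.0035",
    "X-Tokens-Used": "150",
    "X-Balance-Remaining": "47.2915",
}
print(parse_cost_headers(example))
```

HTTP header names are case-insensitive, while a plain `dict` is not. The `headers` attribute of a `requests` response is already a case-insensitive mapping, so it can be passed in directly.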
Cost Optimization Strategies
1. Minimize Token Usage
Optimize Prompts
# ❌ Inefficient: Verbose prompt
{
  "messages": [{
    "role": "user",
    "content": "I would like to kindly ask you to please help me understand the concept of machine learning. Could you please explain it to me in great detail with many examples and use cases? Thank you very much for your help."
  }]
}
# Tokens: ~50, Cost: $0.0020 + $0.0005 = $0.0025
# ✅ Efficient: Concise prompt
{
  "messages": [{
    "role": "user",
    "content": "Explain machine learning with examples."
  }]
}
# Tokens: ~10, Cost: $0.0020 + $0.0001 = $0.0021
# Savings: $0.0004 per request
Limit Response Length
# Set max_tokens to cap the length of the response
curl -X POST https://api.polysystems.ai/api/hub/agents/chat \
  -H "X-API-Key: YOUR_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Explain AI"}],
    "max_tokens": 100
  }'
# Caps response token cost at: 100 × $0.00001 = $0.0010 (input tokens are counted in addition)
2. Use Appropriate Endpoints
Choose Simpler Endpoints When Possible
# ❌ Expensive: Using premium endpoint for simple task
POST /api/yggdrasil/process
# Cost: $0.0040 base + tokens
# ✅ Cost-effective: Using standard endpoint
POST /api/hub/agents/chat
# Cost: $0.0020 base + tokens
# Savings: $0.0020 per request
3. Cache Results
Cache Repeated Queries
import os
from functools import lru_cache

import requests

@lru_cache(maxsize=1000)
def cached_api_call(prompt):
    """Cache API responses to avoid duplicate calls"""
    response = requests.post(
        'https://api.polysystems.ai/api/hub/agents/chat',
        headers={'X-API-Key': os.getenv('PS_API_KEY')},
        json={'messages': [{'role': 'user', 'content': prompt}]},
        timeout=30
    )
    # Raise on HTTP errors so failed responses are never cached
    response.raise_for_status()
    return response.json()

# First call: Makes API request, costs $0.0035
result1 = cached_api_call("What is AI?")
# Second call: Returns cached result, costs $0.0000
result2 = cached_api_call("What is AI?")
# Savings: $0.0035 per cached hit
4. Batch Operations
Combine Multiple Requests
# ❌ Multiple individual requests
# 10 requests × $0.0020 = $0.0200
# ✅ Single batch request
curl -X POST https://api.polysystems.ai/api/hub/batch \
  -H "X-API-Key: YOUR_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "requests": [
      {"prompt": "Question 1"},
      {"prompt": "Question 2"},
      ...
    ]
  }'
# Cost: $0.0050 (reduced base price for batch)
# Savings: $0.0150
5. Use Streaming for Long Responses
Stream Responses to Stop Early
// Stop streaming when you have enough data
const response = await fetch('https://api.polysystems.ai/api/hub/stream', {
  method: 'POST',
  headers: {'X-API-Key': apiKey, 'Content-Type': 'application/json'},
  body: JSON.stringify({messages: [{role: 'user', content: 'Explain...'}]})
});
const reader = response.body.getReader();
// Reuse one decoder so characters split across chunks decode correctly
const decoder = new TextDecoder();
let text = '';
while (true) {
  const {done, value} = await reader.read();
  if (done) break;
  text += decoder.decode(value, {stream: true});
  // Stop when you have enough
  if (text.length > 500) {
    await reader.cancel();
    break; // Saves token costs for ungenerated tokens
  }
}
6. Implement Request Deduplication
Prevent Duplicate Requests
import hashlib
import json
import time

class RequestDeduplicator:
    def __init__(self, window_seconds=60):
        self.cache = {}  # request hash -> time the request was last sent
        self.window = window_seconds

    def get_hash(self, request_data):
        """Generate hash of request"""
        content = json.dumps(request_data, sort_keys=True)
        return hashlib.sha256(content.encode()).hexdigest()

    def should_skip(self, request_data):
        """Check if request is duplicate"""
        req_hash = self.get_hash(request_data)
        now = time.time()
        if req_hash in self.cache:
            last_time = self.cache[req_hash]
            if now - last_time < self.window:
                return True  # Skip duplicate
        self.cache[req_hash] = now
        return False

# Usage
dedup = RequestDeduplicator(window_seconds=300)  # 5 minute window
request_data = {'messages': [{'role': 'user', 'content': 'Hello'}]}
if not dedup.should_skip(request_data):
    response = make_api_call(request_data)  # make_api_call: your own request helper
else:
    print("Skipping duplicate request - saved $0.0035")
Cost Monitoring
Track Daily Spending
import requests
from datetime import datetime, timezone

# jwt_token: your account JWT, assumed to be defined elsewhere

def get_daily_cost():
    """Get cost for today (UTC)"""
    today = datetime.now(timezone.utc).date()
    response = requests.get(
        'https://api.polysystems.ai/api/payments/transactions/stats',
        params={
            'start_date': today.isoformat(),
            'end_date': today.isoformat()
        },
        headers={'Authorization': f'Bearer {jwt_token}'}
    )
    stats = response.json()
    return stats['summary']['total_credits_spent']

cost = get_daily_cost()
print(f"Today's cost: ${cost:.4f}")
Set Cost Alerts
def check_cost_threshold(threshold=10.0):
    """Alert if daily cost exceeds threshold"""
    cost = get_daily_cost()
    if cost > threshold:
        # send_alert: your own notification helper (email, chat, pager, ...)
        send_alert(f"Daily cost ${cost:.2f} exceeds threshold ${threshold:.2f}")
        return True
    return False

# Run hourly
check_cost_threshold(threshold=10.0)
Cost Attribution by Feature
def analyze_costs_by_endpoint():
    """Break down costs by endpoint"""
    response = requests.get(
        'https://api.polysystems.ai/api/payments/transactions/stats',
        params={
            'start_date': '2024-01-01',
            'end_date': '2024-01-31'
        },
        headers={'Authorization': f'Bearer {jwt_token}'}
    )
    stats = response.json()
    print("Cost Breakdown by Endpoint:")
    for route in stats['top_routes']:
        print(f"  {route['route']}: ${route['total_cost']:.4f} ({route['count']:,} requests)")

# Example output:
# Cost Breakdown by Endpoint:
#   /api/hub/agents/chat: $24.6800 (1,234 requests)
#   /api/hub/memory: $0.0567 (567 requests)
#   /api/hub/rag/generate: $8.9100 (234 requests)
Pricing Calculator
Interactive Cost Estimator
class PricingCalculator:
    def __init__(self):
        self.prices = {
            'chat': {'base': 0.0020, 'per_token': 0.00001},
            'memory': {'base': 0.0001, 'per_token': None},
            'rag_generate': {'base': 0.0030, 'per_token': 0.00001},
            'legal_analyze': {'base': 0.0050, 'per_token': 0.00003}
        }

    def calculate(self, endpoint, num_requests, avg_tokens=0):
        """Calculate estimated cost"""
        pricing = self.prices.get(endpoint)
        if not pricing:
            return None  # Unknown endpoint
        base_cost = pricing['base'] * num_requests
        if pricing['per_token'] and avg_tokens > 0:
            token_cost = pricing['per_token'] * avg_tokens * num_requests
        else:
            token_cost = 0
        total = base_cost + token_cost
        return {
            'endpoint': endpoint,
            'requests': num_requests,
            'avg_tokens': avg_tokens,
            'base_cost': base_cost,
            'token_cost': token_cost,
            'total_cost': total,
            # Avoid dividing by zero when no requests are made
            'per_request': total / num_requests if num_requests else 0
        }

    def estimate_monthly(self, endpoint, requests_per_day, avg_tokens=0):
        """Estimate monthly cost (assumes a 30-day month)"""
        monthly_requests = requests_per_day * 30
        result = self.calculate(endpoint, monthly_requests, avg_tokens)
        if result is None:
            return None  # Unknown endpoint
        result['daily_cost'] = result['total_cost'] / 30
        return result

# Usage
calc = PricingCalculator()
# Estimate chat endpoint
chat_estimate = calc.estimate_monthly('chat', requests_per_day=1000, avg_tokens=500)
print(f"Monthly chat cost: ${chat_estimate['total_cost']:.2f}")
print(f"Daily average: ${chat_estimate['daily_cost']:.2f}")
# Output:
# Monthly chat cost: $210.00
# Daily average: $7.00
Cost Projection
def project_costs(current_daily_spend, growth_rate=0.1):
    """Project costs with growth"""
    projections = []
    for month in range(1, 13):
        monthly_spend = current_daily_spend * 30 * (1 + growth_rate) ** month
        projections.append({
            'month': month,
            'estimated_cost': monthly_spend
        })
    return projections

# Example: Currently spending $5/day with 10% monthly growth
projections = project_costs(current_daily_spend=5.0, growth_rate=0.10)
for p in projections[:6]:  # First 6 months
    print(f"Month {p['month']}: ${p['estimated_cost']:.2f}")
# Output:
# Month 1: $165.00
# Month 2: $181.50
# Month 3: $199.65
# Month 4: $219.62
# Month 5: $241.58
# Month 6: $265.73
Volume Discounts
Enterprise Pricing
For high-volume usage, contact sales for custom pricing:
| Monthly Volume | Standard Rate | Enterprise Rate | Savings |
|---|---|---|---|
| Up to $500 | Standard | Standard | 0% |
| Up to $2,000 | Standard | -10% | 10% |
| Up to $10,000 | Standard | -15% | 15% |
| $10,000+ | Standard | -20%+ | 20%+ |
Contact: enterprise@polysystems.ai
Pricing FAQs
When are credits deducted?
Credits are deducted immediately when the API request is processed, before the response is returned.
What happens if I run out of credits mid-request?
The system checks balance before processing. If insufficient, the request is rejected with a 402 Payment Required error.
Are failed requests charged?
No. Only successful requests (HTTP 2xx) are charged. Failed requests (4xx, 5xx) are not charged.
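The billing rules in the two answers above can be summarized as a small client-side check. This is a sketch only: `billing_outcome` and its return labels are hypothetical names, and the mapping simply restates the rules as written here (402 for insufficient credits, charges for 2xx only).

```python
def billing_outcome(status_code: int) -> str:
    """Classify a response by its billing effect, per the rules stated above."""
    if status_code == 402:
        # Balance check failed before processing; nothing was charged.
        return "rejected_insufficient_credits"
    if 200 <= status_code < 300:
        return "charged"
    # Other 4xx and 5xx responses are not charged.
    return "not_charged"


for code in (200, 402, 500):
    print(code, billing_outcome(code))
```

A client could use the `rejected_insufficient_credits` case to pause a job queue and prompt for a top-up instead of retrying.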
How are tokens counted?
Tokens are counted using the same tokenization as the underlying model (typically a GPT tokenizer). Both input and output tokens are counted.
Can I get a refund for unused credits?
No. Credits are prepaid and non-refundable. They never expire, so you can use them at any time.
Do prices ever change?
Prices may be adjusted with 30 days' notice. Current pricing is always available via the API.
Summary
In this chapter, you learned:
- ✅ How the pay-per-use pricing model works
- ✅ Fixed-price vs token-based pricing
- ✅ Pricing for different endpoint categories
- ✅ How to calculate costs for your use case
- ✅ Cost optimization strategies
- ✅ Cost monitoring and tracking
- ✅ Using the pricing calculator
- ✅ Volume discounts and enterprise options
Next Steps
- Chapter 6: Spending Limits - Control costs with limits
- Chapter 8: Webhooks - Get notified of events
- Chapter 12: Best Practices - Optimize your integration