Chapter 7: API Pricing

Overview

The Polysystems Backend API uses a transparent, usage-based pricing model. This chapter explains how pricing works, how costs are calculated, and strategies for optimizing your API spend.

Pricing Model

Pay-Per-Use

The API operates on a pay-per-use model:

No Subscription Fees: Only pay for what you use
No Minimum Commitment: Start with any amount
Transparent Pricing: Know the cost before making requests
Real-Time Deduction: Credits deducted immediately upon request
Detailed Tracking: Every transaction logged and itemized

Pricing Components

Total Request Cost = Base Price + Token-Based Price

Where:
├── Base Price: Fixed cost per API call
└── Token-Based Price: Variable cost based on tokens processed
    └── (Only for LLM/generation endpoints)

Price Structure

Fixed-Price Endpoints

Some endpoints have a flat cost regardless of usage:

Endpoint Category	Price per Request	Description
Health Checks	$0.0000	Free status checks
Memory Storage	$0.0001	Store/retrieve memory
Search Indexing	$0.0002	Index content for search
Document Upload	$0.0005	Upload document

Token-Based Pricing

LLM and generation endpoints charge based on tokens processed:

Cost = Base Price + (Tokens × Price per Token)

Example:
├── Base Price: $0.0020
├── Tokens Used: 1,000
├── Price per Token: $0.00001
└── Total Cost: $0.0020 + (1,000 × $0.00001) = $0.0120
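The same arithmetic can be scripted. The sketch below is illustrative, not an official client. `request_cost` is a hypothetical helper name. It uses Python's `Decimal` so that four- and five-decimal prices are not distorted by binary floating point.

```python
from decimal import Decimal
from typing import Optional


def request_cost(base_price: str, tokens: int = 0,
                 price_per_token: Optional[str] = None) -> Decimal:
    """Total Request Cost = Base Price + (Tokens x Price per Token).

    Prices are passed as strings so Decimal represents them exactly.
    Fixed-price endpoints leave price_per_token as None.
    """
    cost = Decimal(base_price)
    if price_per_token is not None:
        cost += tokens * Decimal(price_per_token)
    return cost


# The example above: $0.0020 base, 1,000 tokens at $0.00001 per token
print(request_cost("0.0020", 1000, "0.00001"))  # 0.01200
# A fixed-price call such as memory storage
print(request_cost("0.0001"))  # 0.0001
```

Passing prices as strings matters. If you write `Decimal(0.00001)` from a float literal, the value carries the float's binary rounding error.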

Token-Based Endpoints

Endpoint	Base Price	Price per Token	Typical Cost Range
`/api/hub/agents/chat`	$0.0020	$0.00001	$0.0020 - $0.0500
`/api/hub/agents/generate`	$0.0020	$0.00001	$0.0020 - $0.0500
`/api/omm/generate`	$0.0030	$0.00002	$0.0030 - $0.1000
`/api/yggdrasil/process`	$0.0040	$0.00002	$0.0040 - $0.1500
`/v1/chat/completions`	$0.0020	$0.00001	$0.0020 - $0.0500
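You can also read the typical ranges in reverse, to see how many tokens a given per-request budget buys. This is a minimal sketch. The base and per-token prices are copied from the table, and `max_tokens_for_budget` is a hypothetical helper, not part of the API.

```python
from decimal import Decimal

# (base price, price per token) for a subset of the routes in the table above
TOKEN_PRICING = {
    "/api/hub/agents/chat": (Decimal("0.0020"), Decimal("0.00001")),
    "/api/omm/generate": (Decimal("0.0030"), Decimal("0.00002")),
    "/api/yggdrasil/process": (Decimal("0.0040"), Decimal("0.00002")),
}


def max_tokens_for_budget(route: str, budget: str) -> int:
    """Largest token count whose total cost stays within budget (USD)."""
    base, per_token = TOKEN_PRICING[route]
    remaining = Decimal(budget) - base
    if remaining < 0:
        return 0  # the budget does not even cover the base price
    # Integer division; remaining is non-negative, so this rounds down
    return int(remaining // per_token)


for route, upper in [("/api/hub/agents/chat", "0.0500"),
                     ("/api/omm/generate", "0.1000"),
                     ("/api/yggdrasil/process", "0.1500")]:
    print(route, max_tokens_for_budget(route, upper))
```

At the upper ends of the typical ranges, this gives 4,800, 4,850 and 7,300 tokens. Those ranges therefore correspond to requests of a few thousand tokens.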

RAG (Retrieval Augmented Generation)

RAG endpoints combine retrieval and generation costs:

Endpoint	Price	Description
`/api/hub/rag/index`	$0.0005	Index document for retrieval
`/api/hub/rag/retrieve`	$0.0010	Retrieve relevant documents
`/api/hub/rag/generate`	$0.0030 + $0.00001 per token	Generate with retrieved context
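Indexing is paid once per document, while retrieval and generation are paid per query. The indexing charge therefore shrinks in relative terms as a document is queried more often. The sketch below shows that arithmetic. It assumes one `index` call per document and one `retrieve` plus one `generate` call per query. The $0.00001 per-token rate for `/api/hub/rag/generate` is the one used in Example 4 below. `rag_cost` is a hypothetical helper.

```python
from decimal import Decimal

INDEX_PRICE = Decimal("0.0005")          # /api/hub/rag/index, per document
RETRIEVE_PRICE = Decimal("0.0010")       # /api/hub/rag/retrieve, per query
GENERATE_BASE = Decimal("0.0030")        # /api/hub/rag/generate, per query
GENERATE_PER_TOKEN = Decimal("0.00001")  # rate used in Example 4


def rag_cost(documents: int, queries: int, tokens_per_query: int) -> Decimal:
    """Index each document once, then retrieve and generate for every query."""
    indexing = documents * INDEX_PRICE
    per_query = (RETRIEVE_PRICE + GENERATE_BASE
                 + tokens_per_query * GENERATE_PER_TOKEN)
    return indexing + queries * per_query


print(rag_cost(documents=1, queries=1, tokens_per_query=1200))    # 0.01650
print(rag_cost(documents=1, queries=100, tokens_per_query=1200))  # 1.60050
```

A single query reproduces the $0.0165 workflow total. Over 100 queries, the one-time $0.0005 indexing charge becomes negligible.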

Domain-Specific Services

Specialized domain services have different pricing:

Domain	Endpoint	Base Price	Token Price
Legal	`/api/legal/analyze`	$0.0050	$0.00003
Legal	`/api/legal/compare`	$0.0100	$0.00003
Professional	`/api/professional/analyze`	$0.0040	$0.00002
Scholar	`/api/scholar/research`	$0.0060	$0.00003
Creative	`/api/creative/generate`	$0.0030	$0.00002
Documents	`/api/docs/generate`	$0.0035	$0.00002
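At realistic token counts, the per-token rate dominates these endpoints rather than the base price. At 2,000 tokens, for example, the token charge on `/api/legal/analyze` is $0.0600 against a $0.0050 base. The sketch below shows what a request of a given size costs on each route. It is a planning aid built only from the table. The helper names and the 2,000-token figure are illustrative.

```python
from decimal import Decimal
from typing import List

# (base price, price per token) from the domain-specific table above
DOMAIN_PRICING = {
    "/api/legal/analyze": ("0.0050", "0.00003"),
    "/api/legal/compare": ("0.0100", "0.00003"),
    "/api/professional/analyze": ("0.0040", "0.00002"),
    "/api/scholar/research": ("0.0060", "0.00003"),
    "/api/creative/generate": ("0.0030", "0.00002"),
    "/api/docs/generate": ("0.0035", "0.00002"),
}


def domain_cost(route: str, tokens: int) -> Decimal:
    """Base price plus tokens x per-token price for one request."""
    base, per_token = DOMAIN_PRICING[route]
    return Decimal(base) + tokens * Decimal(per_token)


def cheapest_first(tokens: int) -> List[str]:
    """Routes ordered by the cost of one request of `tokens` tokens."""
    return sorted(DOMAIN_PRICING, key=lambda route: domain_cost(route, tokens))


for route in cheapest_first(2000):
    print(f"{route}: ${domain_cost(route, 2000)}")
```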

Viewing Current Pricing

Get All Active Pricing

curl -X GET https://api.polysystems.ai/api/pricing \
  -H "X-API-Key: YOUR_ACCESS_TOKEN"

Response:

{
  "pricing": [
    {
      "id": "price-123",
      "route": "/api/hub/agents/chat",
      "method": "POST",
      "price_per_request": 0.0020,
      "price_per_token": 0.00001,
      "currency": "USD",
      "is_active": true,
      "description": "Chat completion with agents"
    },
    {
      "id": "price-456",
      "route": "/api/hub/memory",
      "method": "POST",
      "price_per_request": 0.0001,
      "price_per_token": null,
      "currency": "USD",
      "is_active": true,
      "description": "Store memory entry"
    }
  ]
}

Understanding Pricing Response

route: API endpoint path
method: HTTP method (POST, GET, etc.)
price_per_request: Fixed base cost
price_per_token: Cost per token (null if not applicable)
currency: Always USD
is_active: Whether pricing is currently in effect
description: Human-readable description
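A client can use the `/api/pricing` response directly to estimate a request before sending it. This sketch assumes `pricing` is the parsed `"pricing"` array shown above. `find_price` and `estimate` are hypothetical helpers. A `null` `price_per_token` (parsed as `None`) is treated as no token charge.

```python
from typing import Any, Dict, List, Optional

PricingEntry = Dict[str, Any]


def find_price(pricing: List[PricingEntry], route: str,
               method: str = "POST") -> Optional[PricingEntry]:
    """Return the active entry for route + method, or None if there is none."""
    for entry in pricing:
        if (entry["route"] == route and entry["method"] == method
                and entry["is_active"]):
            return entry
    return None


def estimate(entry: PricingEntry, tokens: int = 0) -> float:
    """price_per_request + tokens x price_per_token (None means no token charge)."""
    per_token = entry["price_per_token"] or 0.0
    return entry["price_per_request"] + tokens * per_token


# Entries as in the sample response above (other fields omitted)
pricing = [
    {"route": "/api/hub/agents/chat", "method": "POST", "is_active": True,
     "price_per_request": 0.0020, "price_per_token": 0.00001},
    {"route": "/api/hub/memory", "method": "POST", "is_active": True,
     "price_per_request": 0.0001, "price_per_token": None},
]
chat = find_price(pricing, "/api/hub/agents/chat")
print(round(estimate(chat, tokens=150), 6))  # 0.0035
```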

Cost Calculation Examples

Example 1: Simple Fixed-Price Request

# Memory storage request
curl -X POST https://api.polysystems.ai/api/hub/memory \
  -H "X-API-Key: YOUR_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"key": "user_pref", "value": "dark_mode"}'
 
# Cost: $0.0001 (fixed price, no token cost)

Example 2: Chat Completion with Tokens

# Chat request
curl -X POST https://api.polysystems.ai/api/hub/agents/chat \
  -H "X-API-Key: YOUR_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Explain quantum computing in 100 words"}
    ]
  }'
 
# Calculation:
# Base Price: $0.0020
# Tokens Used: 150 (prompt + completion)
# Token Cost: 150 × $0.00001 = $0.0015
# Total Cost: $0.0020 + $0.0015 = $0.0035

Example 3: Long Conversation

# Extended chat with context
curl -X POST https://api.polysystems.ai/api/hub/agents/chat \
  -H "X-API-Key: YOUR_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant..."},
      {"role": "user", "content": "Previous question..."},
      {"role": "assistant", "content": "Previous response..."},
      {"role": "user", "content": "Follow-up question..."}
    ],
    "max_tokens": 500
  }'
 
# Calculation:
# Base Price: $0.0020
# Tokens Used: 2,500 (large context + response)
# Token Cost: 2,500 × $0.00001 = $0.0250
# Total Cost: $0.0020 + $0.0250 = $0.0270
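Example 3 shows why long conversations get expensive. Each request resends the whole history, so every turn is billed again for all earlier messages. The sketch below is a rough model using the chat endpoint's prices. It assumes fixed token counts per message. The `system_tokens`, `user_tokens` and `reply_tokens` values are illustrative, not measured.

```python
def conversation_cost(turns: int, system_tokens: int, user_tokens: int,
                      reply_tokens: int, base: float = 0.0020,
                      per_token: float = 0.00001) -> float:
    """Total cost of a chat in which every request resends the full history.

    Turn k sends the system prompt, k user messages and k - 1 earlier
    replies, then receives one new reply, so it is billed for
    system_tokens + k * (user_tokens + reply_tokens) tokens.
    """
    total = 0.0
    for k in range(1, turns + 1):
        tokens = system_tokens + k * (user_tokens + reply_tokens)
        total += base + tokens * per_token
    return total


# 10 turns, 200-token system prompt, 50-token questions, 250-token answers
print(f"${conversation_cost(10, 200, 50, 250):.4f}")  # $0.2050
```

Sending the same messages without history (500 tokens per turn) would cost $0.0700 for ten turns. The gap grows roughly with the square of the number of turns, which is why trimming or summarizing old turns pays off.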

Example 4: RAG Workflow

# Step 1: Index document
curl -X POST https://api.polysystems.ai/api/hub/rag/index \
  -H "X-API-Key: YOUR_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"document": "content..."}'
# Cost: $0.0005
 
# Step 2: Retrieve relevant passages
curl -X POST https://api.polysystems.ai/api/hub/rag/retrieve \
  -H "X-API-Key: YOUR_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query": "search query"}'
# Cost: $0.0010
 
# Step 3: Generate with context
curl -X POST https://api.polysystems.ai/api/hub/rag/generate \
  -H "X-API-Key: YOUR_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query": "question", "context": "retrieved..."}'
# Cost: $0.0030 + (1,200 tokens × $0.00001) = $0.0150
 
# Total Workflow Cost: $0.0005 + $0.0010 + $0.0150 = $0.0165

Response Headers

Every API response includes cost information in headers:

HTTP/1.1 200 OK
X-Request-Cost: 0.0035
X-Tokens-Used: 150
X-Balance-Remaining: 47.2915

Cost Headers

X-Request-Cost: Total cost of this request in USD
X-Tokens-Used: Number of tokens processed (if applicable)
X-Balance-Remaining: Your remaining account balance
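Reading these headers on every response gives you a per-request cost log without extra API calls. This is a minimal sketch, and `parse_cost_headers` is a hypothetical helper. It treats a missing header as `None`, for example `X-Tokens-Used` on a fixed-price endpoint where it may not apply.

```python
from typing import Any, Dict, Mapping


def parse_cost_headers(headers: Mapping[str, str]) -> Dict[str, Any]:
    """Extract cost fields from response headers (names matched case-insensitively)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    cost = lowered.get("x-request-cost")
    tokens = lowered.get("x-tokens-used")
    balance = lowered.get("x-balance-remaining")
    return {
        "cost": float(cost) if cost is not None else None,
        "tokens": int(tokens) if tokens is not None else None,
        "balance": float(balance) if balance is not None else None,
    }


print(parse_cost_headers({
    "X-Request-Cost": "0.0035",
    "X-Tokens-Used": "150",
    "X-Balance-Remaining": "47.2915",
}))
```

With the `requests` library, you can pass `response.headers` directly, since it supports `.items()`.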

Cost Optimization Strategies

1. Minimize Token Usage

Optimize Prompts

# ❌ Inefficient: Verbose prompt
{
  "messages": [{
    "role": "user",
    "content": "I would like to kindly ask you to please help me understand the concept of machine learning. Could you please explain it to me in great detail with many examples and use cases? Thank you very much for your help."
  }]
}
# Prompt tokens: ~50, Cost before response tokens: $0.0020 + $0.0005 = $0.0025

# ✅ Efficient: Concise prompt
{
  "messages": [{
    "role": "user",
    "content": "Explain machine learning with examples."
  }]
}
# Prompt tokens: ~10, Cost before response tokens: $0.0020 + $0.0001 = $0.0021
# Savings: $0.0004 per request, before any difference in response length
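For English text, a common rule of thumb with GPT-style tokenizers is roughly four characters per token. That is enough to compare prompts before sending them. This is an approximation, not the API's actual count. For exact numbers, use the model's own tokenizer, for example OpenAI's `tiktoken` library for GPT models. The helper names below are hypothetical.

```python
import math

CHARS_PER_TOKEN = 4  # rough average for English text; not an exact count


def approx_tokens(text: str) -> int:
    """Rough token estimate from character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def approx_prompt_cost(text: str, base: float = 0.0020,
                       per_token: float = 0.00001) -> float:
    """Base price plus estimated prompt tokens; the response is billed on top."""
    return base + approx_tokens(text) * per_token


concise = "Explain machine learning with examples."
print(approx_tokens(concise))  # 10
print(f"${approx_prompt_cost(concise):.4f}")  # $0.0021
```

The verbose prompt above is about 210 characters. This heuristic puts it at roughly 50 tokens, consistent with the estimate in the comment.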

Limit Response Length

# Control max_tokens to limit cost
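curl -X POST https://api.polysystems.ai/api/hub/agents/chat \
  -H "X-API-Key: YOUR_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Explain AI"}],
    "max_tokens": 100
  }'
# Caps the completion-token cost at: 100 × $0.00001 = $0.0010
# (the $0.0020 base price and the prompt tokens are billed on top)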
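Both input and output tokens are billed, so `max_tokens` bounds only part of the cost. The sketch below computes the resulting worst-case bound for the chat endpoint. `worst_case_cost` is a hypothetical helper. It assumes the prompt token count is known or estimated and that the completion never exceeds `max_tokens`.

```python
def worst_case_cost(prompt_tokens: int, max_tokens: int,
                    base: float = 0.0020, per_token: float = 0.00001) -> float:
    """Upper bound on one chat request's cost when max_tokens is set.

    Prompt tokens are always billed; max_tokens caps only the completion.
    """
    return base + (prompt_tokens + max_tokens) * per_token


# "Explain AI" is only a few tokens; assume 5 for illustration
print(f"${worst_case_cost(prompt_tokens=5, max_tokens=100):.5f}")  # $0.00305
```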

2. Use Appropriate Endpoints

Choose Simpler Endpoints When Possible

# ❌ Expensive: Using premium endpoint for simple task
POST /api/yggdrasil/process
# Cost: $0.0040 base + tokens
 
# ✅ Cost-effective: Using standard endpoint
POST /api/hub/agents/chat
# Cost: $0.0020 base + tokens
# Savings: $0.0020 per request in base price, and the per-token rate is halved ($0.00001 vs $0.00002)
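The per-token rates of these two routes also differ, so the savings grow with request size. The sketch below uses the two routes' prices from the token-based endpoint table. `savings_per_request` is a hypothetical helper.

```python
from decimal import Decimal

YGGDRASIL = (Decimal("0.0040"), Decimal("0.00002"))  # /api/yggdrasil/process
HUB_CHAT = (Decimal("0.0020"), Decimal("0.00001"))   # /api/hub/agents/chat


def savings_per_request(tokens: int) -> Decimal:
    """Cost difference between the two endpoints for one request."""
    expensive = YGGDRASIL[0] + tokens * YGGDRASIL[1]
    cheaper = HUB_CHAT[0] + tokens * HUB_CHAT[1]
    return expensive - cheaper


for tokens in (0, 1000, 5000):
    print(tokens, savings_per_request(tokens))
```

At 1,000 tokens, the difference is $0.0120 per request, six times the base-price saving alone.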

3. Cache Results

Cache Repeated Queries

import os
from functools import lru_cache

import requests

@lru_cache(maxsize=1000)
def cached_api_call(prompt):
    """Cache API responses to avoid duplicate calls"""
    response = requests.post(
        'https://api.polysystems.ai/api/hub/agents/chat',
        headers={'X-API-Key': os.getenv('PS_API_KEY')},
        json={'messages': [{'role': 'user', 'content': prompt}]},
        timeout=30
    )
    # Raise on HTTP errors; lru_cache does not store calls that raise,
    # so failed requests are not cached
    response.raise_for_status()
    return response.json()

# First call: makes the API request (e.g. $0.0035 for a 150-token exchange)
result1 = cached_api_call("What is AI?")

# Second call with the same prompt: returns the cached result, costs $0.0000
result2 = cached_api_call("What is AI?")

# Savings: the full request cost on every cache hit
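`lru_cache` keeps an entry until it is evicted by size, so a cached answer can go stale indefinitely. If responses should be refreshed periodically, a time-based cache is one option. The sketch below is minimal. The `ttl_cache` decorator is hypothetical, not a standard-library function. Like `lru_cache`, it does not store a result when the wrapped call raises, so failed requests are retried.

```python
import time
from functools import wraps
from typing import Any, Callable, Dict, Hashable, Tuple


def ttl_cache(seconds: float) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Cache results per positional-argument tuple for `seconds` seconds."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        store: Dict[Tuple[Hashable, ...], Tuple[float, Any]] = {}

        @wraps(func)
        def wrapper(*args: Hashable) -> Any:
            now = time.monotonic()
            hit = store.get(args)
            if hit is not None and now - hit[0] < seconds:
                return hit[1]          # fresh entry: no API call
            value = func(*args)        # miss or expired; exceptions propagate uncached
            store[args] = (now, value)
            return value

        return wrapper

    return decorator
```

To reuse each answer for an hour, decorate `cached_api_call` with `@ttl_cache(seconds=3600)` instead of `@lru_cache(maxsize=1000)`. Expired entries are overwritten rather than evicted. A long-running process that sees many distinct prompts should therefore also cap the store's size.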

4. Batch Operations

Combine Multiple Requests

# ❌ Multiple individual requests
# 10 requests × $0.0020 base price = $0.0200
 
# ✅ Single batch request
curl -X POST https://api.polysystems.ai/api/hub/batch \
  -H "X-API-Key: YOUR_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "requests": [
      {"prompt": "Question 1"},
      {"prompt": "Question 2"},
      ...
    ]
  }'
# Cost: $0.0050 (reduced base price for batch)
# Savings: $0.0150 in base price

5. Use Streaming for Long Responses

Stream Responses to Stop Early

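// Stop streaming when you have enough data
const apiKey = process.env.PS_API_KEY; // Node.js; load the key however your app does
const response = await fetch('https://api.polysystems.ai/api/hub/stream', {
  method: 'POST',
  headers: {'X-API-Key': apiKey, 'Content-Type': 'application/json'},
  body: JSON.stringify({messages: [{role: 'user', content: 'Explain...'}]})
});

const reader = response.body.getReader();
// One decoder with {stream: true} keeps multi-byte characters that are
// split across chunks intact
const decoder = new TextDecoder();
let text = '';

while (true) {
  const {done, value} = await reader.read();
  if (done) break;

  text += decoder.decode(value, {stream: true});

  // Stop when you have enough (500 characters here, not tokens)
  if (text.length > 500) {
    await reader.cancel();
    break; // May save the cost of ungenerated tokens, if the server
           // stops generating when the stream is cancelled
  }
}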
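Here is the same early-stop pattern in Python. It is written against any iterable of byte chunks, so it can be tested without a network connection. It uses an incremental UTF-8 decoder from the standard `codecs` module for the same reason the JavaScript version needs `{stream: true}`: a multi-byte character can be split across two chunks. `read_until` is a hypothetical helper. Whether stopping early actually avoids charges for ungenerated tokens depends on the server halting generation when the client disconnects.

```python
import codecs
from typing import Iterable


def read_until(chunks: Iterable[bytes], max_chars: int) -> str:
    """Decode a byte stream incrementally, stopping once max_chars is exceeded."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    text = ""
    for chunk in chunks:
        text += decoder.decode(chunk)  # buffers an incomplete trailing character
        if len(text) > max_chars:
            break  # stop consuming; the caller should then close the stream
    else:
        text += decoder.decode(b"", final=True)  # flush at normal end of stream
    return text


# "é" (b"\xc3\xa9") arrives split across two chunks
print(read_until([b"caf", b"\xc3", b"\xa9 ok"], max_chars=100))  # café ok
```

With the `requests` library, the chunks could come from `response.iter_content(chunk_size=None)` on a request made with `stream=True`. Call `response.close()` once `read_until` returns.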

6. Implement Request Deduplication

Prevent Duplicate Requests

import hashlib
import json
import time
 
class RequestDeduplicator:
    def __init__(self, window_seconds=60):
        self.cache = {}
        self.window = window_seconds
    
    def get_hash(self, request_data):
        """Generate hash of request"""
        content = json.dumps(request_data, sort_keys=True)
        return hashlib.sha256(content.encode()).hexdigest()
    
    def should_skip(self, request_data):
        """Check if request is duplicate"""
        req_hash = self.get_hash(request_data)
        now = time.time()
        
        if req_hash in self.cache:
            last_time = self.cache[req_hash]
            if now - last_time < self.window:
                return True  # Skip duplicate
        
        self.cache[req_hash] = now
        return False
 
# Usage
dedup = RequestDeduplicator(window_seconds=300)  # 5 minute window
 
request_data = {'messages': [{'role': 'user', 'content': 'Hello'}]}
 
if not dedup.should_skip(request_data):
    response = make_api_call(request_data)  # make_api_call: your own request function
else:
    print("Skipping duplicate request - saved $0.0035")

Cost Monitoring

Track Daily Spending

import requests
from datetime import datetime, timezone

# Assumes jwt_token holds your session JWT

def get_daily_cost():
    """Get total credits spent today (UTC)"""
    today = datetime.now(timezone.utc).date()

    response = requests.get(
        'https://api.polysystems.ai/api/payments/transactions/stats',
        params={
            'start_date': today.isoformat(),
            'end_date': today.isoformat()
        },
        headers={'Authorization': f'Bearer {jwt_token}'},
        timeout=30
    )
    response.raise_for_status()

    stats = response.json()
    return stats['summary']['total_credits_spent']

cost = get_daily_cost()
print(f"Today's cost: ${cost:.4f}")
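The stats endpoint is the authoritative record. A client can also keep its own running total from the `X-Request-Cost` header, which avoids an extra call when checking spend inside a busy loop. The `SpendTracker` class below is a hypothetical sketch. It buckets costs by UTC day to match the UTC date used in `get_daily_cost`.

```python
from collections import defaultdict
from datetime import date, datetime, timezone
from decimal import Decimal
from typing import Dict, Mapping, Optional


class SpendTracker:
    """Client-side running total of X-Request-Cost per UTC day."""

    def __init__(self) -> None:
        self.by_day: Dict[date, Decimal] = defaultdict(Decimal)

    def record(self, headers: Mapping[str, str],
               when: Optional[datetime] = None) -> None:
        # requests' response.headers is case-insensitive; a plain dict is not
        cost = headers.get("X-Request-Cost")
        if cost is None:
            return  # no cost header: nothing to record
        day = (when or datetime.now(timezone.utc)).date()
        self.by_day[day] += Decimal(cost)

    def spent_on(self, day: date) -> Decimal:
        return self.by_day.get(day, Decimal(0))
```

After each API call, `tracker.record(response.headers)` adds the request's cost to today's total.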

Set Cost Alerts

def check_cost_threshold(threshold=10.0):
    """Alert if daily cost exceeds threshold"""
    cost = get_daily_cost()

    if cost > threshold:
        # send_alert is your own notification function (email, Slack, etc.)
        send_alert(f"Daily cost ${cost:.2f} exceeds threshold ${threshold:.2f}")
        return True
    return False

# Run this hourly, e.g. from cron or a task scheduler
check_cost_threshold(threshold=10.0)
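Run hourly as written, `check_cost_threshold` alerts again every hour once the threshold is crossed. The sketch below alerts at most once per day. `DailyThresholdAlert` is hypothetical, and `notify` stands in for whatever `send_alert` function you use. It keeps its state in memory, so it suits a long-running process rather than a fresh cron invocation each hour.

```python
from datetime import date
from typing import Callable, Optional


class DailyThresholdAlert:
    """Fire an alert at most once per day when spend crosses a threshold."""

    def __init__(self, threshold: float, notify: Callable[[str], None]) -> None:
        self.threshold = threshold
        self.notify = notify
        self.alerted_on: Optional[date] = None  # day of the last alert sent

    def check(self, day: date, cost: float) -> bool:
        """Return True if an alert was sent for this check."""
        if cost > self.threshold and self.alerted_on != day:
            self.notify(f"Daily cost ${cost:.2f} exceeds threshold ${self.threshold:.2f}")
            self.alerted_on = day
            return True
        return False
```

To use it, create `alert = DailyThresholdAlert(10.0, send_alert)` once. Then call `alert.check(datetime.now(timezone.utc).date(), get_daily_cost())` every hour.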

Cost Attribution by Feature

def analyze_costs_by_endpoint():
    """Break down costs by endpoint"""
    response = requests.get(
        'https://api.polysystems.ai/api/payments/transactions/stats',
        params={
            'start_date': '2024-01-01',
            'end_date': '2024-01-31'
        },
        headers={'Authorization': f'Bearer {jwt_token}'}
    )
    
    response.raise_for_status()
    stats = response.json()

    print("Cost Breakdown by Endpoint:")
    for route in stats['top_routes']:
        # :, adds thousands separators, e.g. 1,234
        print(f"  {route['route']}: ${route['total_cost']:.4f} ({route['count']:,} requests)")
 
# Output:
# Cost Breakdown by Endpoint:
#   /api/hub/agents/chat: $24.6800 (1,234 requests)
#   /api/hub/memory: $0.0567 (567 requests)
#   /api/hub/rag/generate: $8.9100 (234 requests)
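Percentages are often easier to act on than raw totals. The sketch below turns the `top_routes` list into cost shares. `cost_shares` is hypothetical. `top_routes` may list only the highest-ranked routes, so the shares are of the listed total, not necessarily of the whole bill.

```python
from typing import Any, Dict, List, Tuple


def cost_shares(top_routes: List[Dict[str, Any]]) -> List[Tuple[str, float, float]]:
    """(route, total_cost, percent of the listed total), most expensive first."""
    total = sum(r["total_cost"] for r in top_routes)
    ranked = sorted(top_routes, key=lambda r: r["total_cost"], reverse=True)
    return [(r["route"], r["total_cost"],
             100 * r["total_cost"] / total if total else 0.0)
            for r in ranked]


# Figures from the sample output above
top_routes = [
    {"route": "/api/hub/agents/chat", "total_cost": 24.68, "count": 1234},
    {"route": "/api/hub/memory", "total_cost": 0.0567, "count": 567},
    {"route": "/api/hub/rag/generate", "total_cost": 8.91, "count": 234},
]
for route, cost, share in cost_shares(top_routes):
    print(f"  {route}: ${cost:.4f} ({share:.1f}%)")
```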

Pricing Calculator

Interactive Cost Estimator

class PricingCalculator:
    def __init__(self):
        self.prices = {
            'chat': {'base': 0.0020, 'per_token': 0.00001},
            'memory': {'base': 0.0001, 'per_token': None},
            'rag_generate': {'base': 0.0030, 'per_token': 0.00001},
            'legal_analyze': {'base': 0.0050, 'per_token': 0.00003}
        }
    
    def calculate(self, endpoint, num_requests, avg_tokens=0):
        """Calculate estimated cost for num_requests calls to an endpoint"""
        pricing = self.prices.get(endpoint)
        if pricing is None:
            raise ValueError(f"Unknown endpoint: {endpoint}")

        base_cost = pricing['base'] * num_requests

        # Fixed-price endpoints (per_token is None) have no token component
        if pricing['per_token'] and avg_tokens > 0:
            token_cost = pricing['per_token'] * avg_tokens * num_requests
        else:
            token_cost = 0

        total = base_cost + token_cost

        return {
            'endpoint': endpoint,
            'requests': num_requests,
            'avg_tokens': avg_tokens,
            'base_cost': base_cost,
            'token_cost': token_cost,
            'total_cost': total,
            'per_request': total / num_requests if num_requests else 0.0
        }

    def estimate_monthly(self, endpoint, requests_per_day, avg_tokens=0):
        """Estimate monthly cost, assuming a 30-day month"""
        monthly_requests = requests_per_day * 30
        result = self.calculate(endpoint, monthly_requests, avg_tokens)
        result['daily_cost'] = result['total_cost'] / 30
        return result
 
# Usage
calc = PricingCalculator()
 
# Estimate chat endpoint
chat_estimate = calc.estimate_monthly('chat', requests_per_day=1000, avg_tokens=500)
print(f"Monthly chat cost: ${chat_estimate['total_cost']:.2f}")
print(f"Daily average: ${chat_estimate['daily_cost']:.2f}")
 
# Output:
# Monthly chat cost: $210.00
# Daily average: $7.00

Cost Projection

def project_costs(current_daily_spend, growth_rate=0.1):
    """Project costs with growth"""
    projections = []
    
    for month in range(1, 13):
        monthly_spend = current_daily_spend * 30 * (1 + growth_rate) ** month
        projections.append({
            'month': month,
            'estimated_cost': monthly_spend
        })
    
    return projections
 
# Example: Currently spending $5/day with 10% monthly growth
projections = project_costs(current_daily_spend=5.0, growth_rate=0.10)
 
for p in projections[:6]:  # First 6 months
    print(f"Month {p['month']}: ${p['estimated_cost']:.2f}")
 
# Output:
# Month 1: $165.00
# Month 2: $181.50
# Month 3: $199.65
# Month 4: $219.62
# Month 5: $241.58
# Month 6: $265.73
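The twelve monthly figures form a geometric series, so their total has a closed form. Let B = 30 × current daily spend and r = 1 + growth rate. Month m then costs B × r^m, and the first n months sum to B × r × (r^n - 1) / (r - 1). The sketch below computes the sum of `project_costs` without the loop. `projected_total` is a hypothetical helper.

```python
def projected_total(current_daily_spend: float, growth_rate: float,
                    months: int = 12) -> float:
    """Sum of the monthly projections via the geometric-series formula."""
    base = current_daily_spend * 30  # B: one month at the current rate
    r = 1 + growth_rate
    if growth_rate == 0:
        return base * months  # no growth: every month costs B
    return base * r * (r ** months - 1) / (r - 1)


# $5/day with 10% monthly growth, summed over 12 months
print(f"${projected_total(5.0, 0.10):.2f}")  # $3528.41
```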

Volume Discounts

Enterprise Pricing

For high-volume usage, contact sales for custom pricing:

Monthly Volume	Standard Rate	Enterprise Rate	Savings
$0 - $500	Standard	Standard	0%
$500 - $2,000	Standard	-10%	10%
$2,000 - $10,000	Standard	-15%	15%
$10,000+	Standard	-20%+	20%+
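To gauge whether contacting sales is worthwhile, you can turn the table into a rough estimate. This sketch is not a quote, and it rests on three assumptions. It applies the listed discount to the whole monthly spend. It treats each tier's lower bound as inclusive, because the table lists $500, $2,000 and $10,000 in two rows each. It uses 20% for the top tier even though the table says 20%+. Actual enterprise rates are set by sales, and `enterprise_estimate` is a hypothetical helper.

```python
# (minimum monthly spend in USD, discount), highest tier first.
# Assumption: a boundary value such as $500 belongs to the higher tier.
TIERS = [(10_000, 0.20), (2_000, 0.15), (500, 0.10), (0, 0.0)]


def enterprise_estimate(monthly_spend: float) -> float:
    """Estimated monthly cost after the tier discount."""
    for minimum, discount in TIERS:
        if monthly_spend >= minimum:
            return monthly_spend * (1 - discount)
    return monthly_spend


for spend in (400, 1_000, 5_000, 20_000):
    print(spend, round(enterprise_estimate(spend), 2))
```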

Contact: enterprise@polysystems.ai

Pricing FAQs

When are credits deducted?

Credits are deducted immediately when the API request is processed, before the response is returned.

What happens if I run out of credits mid-request?

The system checks balance before processing. If insufficient, the request is rejected with a 402 Payment Required error.

Are failed requests charged?

No. Only successful requests (HTTP 2xx) are charged. Failed requests (4xx, 5xx) are not charged.
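The two answers above translate into simple client-side checks. In this sketch, `InsufficientCreditsError`, `is_billable` and `check_status` are hypothetical names. Only the rules themselves come from this FAQ: a 402 response when the balance is too low, and charges for 2xx responses only.

```python
class InsufficientCreditsError(Exception):
    """Raised when the API answers 402 Payment Required."""


def is_billable(status_code: int) -> bool:
    """Only successful (2xx) responses are charged."""
    return 200 <= status_code < 300


def check_status(status_code: int) -> None:
    """Raise a dedicated error on 402 so callers can add credits before retrying."""
    if status_code == 402:
        raise InsufficientCreditsError("Insufficient credits: add credits and retry")
```

With `requests`, pass `response.status_code` to these checks. For other error codes, `response.raise_for_status()` remains the usual check.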

How are tokens counted?

Tokens are counted using the same tokenization as the underlying model (typically a GPT tokenizer). Both input and output tokens are counted.

Can I get a refund for unused credits?

No. Credits are prepaid and non-refundable. They never expire, so you can use them at any time.

Do prices ever change?

Prices may be adjusted with 30 days' notice. Current pricing is always available via the API (GET /api/pricing).

Summary

In this chapter, you learned:

✅ How the pay-per-use pricing model works
✅ Fixed-price vs token-based pricing
✅ Pricing for different endpoint categories
✅ How to calculate costs for your use case
✅ Cost optimization strategies
✅ Cost monitoring and tracking
✅ Using the pricing calculator
✅ Volume discounts and enterprise options

Next Steps

Chapter 6: Spending Limits - Control costs with limits
Chapter 8: Webhooks - Get notified of events
Chapter 12: Best Practices - Optimize your integration

Spending Limits Code Examples