Tracing Fundamentals

Distributed tracing is essential for understanding AI agent behavior. Learn how AgenticAnts implements tracing for AI systems.

What is Tracing?

Tracing tracks a request as it flows through your system, capturing:

  • What happened
  • When it happened
  • How long it took
  • What data was involved
  • Any errors that occurred

Trace vs Span vs Event

Example:
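As a minimal sketch of how the three terms relate: a trace is the full record of one request, a span is a timed unit of work inside that trace, and an event is a point-in-time marker attached to a span. The type names, field names, and sample values below are illustrative assumptions, not the AgenticAnts SDK's data model.

```typescript
// Hypothetical shapes for illustration only; not the SDK's actual types.
interface TraceEvent {
  name: string
  timestampMs: number // a single instant, so it has no duration
}

interface Span {
  name: string
  startMs: number
  endMs: number
  events: TraceEvent[]
}

interface Trace {
  id: string
  name: string
  spans: Span[]
}

// A span's duration is the gap between its start and its end.
function spanDurationMs(span: Span): number {
  return span.endMs - span.startMs
}

// A trace runs from its earliest span start to its latest span end.
function traceDurationMs(trace: Trace): number {
  if (trace.spans.length === 0) return 0
  const start = Math.min(...trace.spans.map((s) => s.startMs))
  const end = Math.max(...trace.spans.map((s) => s.endMs))
  return end - start
}

const exampleTrace: Trace = {
  id: 'trace_1',
  name: 'process-customer-query',
  spans: [
    { name: 'classify-intent', startMs: 0, endMs: 120, events: [] },
    {
      name: 'generate-response',
      startMs: 120,
      endMs: 900,
      events: [{ name: 'first-token', timestampMs: 310 }]
    }
  ]
}
```

Here the trace holds two spans, and the `first-token` event marks a moment inside the second span without adding a duration of its own.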

Creating Traces

Basic Trace

```typescript
import { AgenticAnts } from '@agenticants/sdk'

const ants = new AgenticAnts({
  apiKey: process.env.AGENTICANTS_API_KEY
})

async function processQuery(query: string) {
  // Start a trace
  const trace = await ants.trace.create({
    name: 'process-customer-query',
    input: query,
    metadata: {
      userId: 'user_123',
      timestamp: new Date().toISOString()
    }
  })

  try {
    // Your agent logic (`agent` is your own agent instance)
    const result = await agent.process(query)

    // Complete the trace
    await trace.complete({
      output: result,
      metadata: {
        success: true,
        confidence: 0.95
      }
    })

    return result
  } catch (error) {
    // A caught value is `unknown` in strict TypeScript, so narrow it first
    const err = error instanceof Error ? error : new Error(String(error))

    // Record the error
    await trace.error({
      error: err.message,
      stack: err.stack
    })
    throw error
  }
}
```

Nested Spans

Add spans to break a trace down into individual steps:

```typescript
async function processQuery(query: string) {
  const trace = await ants.trace.create({ name: 'process-query' })

  // Span 1: Classification
  const classifySpan = trace.span('classify-intent')
  const intent = await classifyIntent(query)
  classifySpan.end({ intent })

  // Span 2: Retrieval
  const retrievalSpan = trace.span('retrieve-context')
  const context = await retrieveContext(intent)
  retrievalSpan.end({ documents: context.length })

  // Span 3: Generation
  const generationSpan = trace.span('generate-response')
  const response = await generateResponse(query, context)
  generationSpan.end({
    tokens: response.usage.total,
    cost: response.cost
  })

  await trace.complete({ output: response.text })
}
```

Trace Context

Propagation

Trace context flows through your system:

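```typescript
// Service A: start a trace
const trace = await ants.trace.create({ name: 'main-request' })
const traceContext = trace.getContext()

// Pass the context to Service B
await fetch('https://service-b.com/api', {
  headers: {
    'x-trace-id': traceContext.traceId,
    'x-span-id': traceContext.spanId
  }
})

// Service B (a separate process): rebuild the context from the incoming
// headers, then continue the trace. `req` stands for the incoming HTTP
// request in whatever server framework Service B uses.
const incomingContext = {
  traceId: req.headers['x-trace-id'],
  spanId: req.headers['x-span-id']
}
const continuedTrace = ants.trace.fromContext(incomingContext)
const span = continuedTrace.span('service-b-operation')
// ...
```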
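The header handling on both sides can be isolated into two small helpers. This is a sketch under assumptions: the `TraceContext` shape and the `injectContext` and `extractContext` names are hypothetical and only mirror the two headers used above. The header map type follows Node's `http` module, which lowercases incoming header names and may deliver a repeated header as an array.

```typescript
// Hypothetical context shape, mirroring the two headers used above.
interface TraceContext {
  traceId: string
  spanId: string
}

// Service A: turn a context into outgoing HTTP headers.
function injectContext(ctx: TraceContext): Record<string, string> {
  return { 'x-trace-id': ctx.traceId, 'x-span-id': ctx.spanId }
}

// Service B: rebuild the context from incoming headers.
// Returns null when either header is missing, so the caller can
// decide to start a fresh trace instead.
function extractContext(
  headers: Record<string, string | string[] | undefined>
): TraceContext | null {
  // A repeated header arrives as an array; use its first value.
  const first = (v: string | string[] | undefined) => (Array.isArray(v) ? v[0] : v)
  const traceId = first(headers['x-trace-id'])
  const spanId = first(headers['x-span-id'])
  if (!traceId || !spanId) return null
  return { traceId, spanId }
}
```

Returning `null` rather than throwing keeps a request without trace headers from failing; it simply is not linked to an upstream trace.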

Correlation

Link related traces:

```typescript
// Parent trace
const parentTrace = await ants.trace.create({
  name: 'user-session'
})

// Child traces reference parent
const childTrace1 = await ants.trace.create({
  name: 'query-1',
  parentTraceId: parentTrace.id
})

const childTrace2 = await ants.trace.create({
  name: 'query-2',
  parentTraceId: parentTrace.id
})
```

Metadata and Tags

Adding Metadata

Enrich traces with context:

```typescript
const trace = await ants.trace.create({
  name: 'agent-execution',
  metadata: {
    // User information
    userId: 'user_123',
    userEmail: 'user@example.com',
    userTier: 'premium',

    // Request context
    requestId: 'req_abc',
    sessionId: 'session_xyz',
    ipAddress: '192.168.1.1',

    // Business context
    customerId: 'customer_456',
    accountType: 'enterprise',
    feature: 'customer-support',

    // Technical context
    agentVersion: '1.2.3',
    model: 'gpt-4',
    temperature: 0.7,
    region: 'us-east-1'
  }
})
```

Using Tags

Categorize and filter traces:

```python
trace = ants.trace.create(
    name='agent-execution',
    tags={
        'environment': 'production',
        'team': 'customer-success',
        'priority': 'high',
        'ab_test': 'variant_b'
    }
)

# Query by tags later
traces = ants.traces.query(
    tags={'environment': 'production', 'priority': 'high'}
)
```

Sampling Strategies

Head-Based Sampling

Decide whether to keep a trace at the moment it is created:

```typescript
const ants = new AgenticAnts({
  apiKey: process.env.AGENTICANTS_API_KEY,
  sampling: {
    strategy: 'head-based',
    rate: 0.1 // Sample 10% of traces
  }
})

// Or custom logic
const shouldSample = (request: { expectedError?: boolean; userTier?: string }) => {
  // Always sample errors
  if (request.expectedError) return true

  // Always sample premium users
  if (request.userTier === 'premium') return true

  // Sample 10% of others
  return Math.random() < 0.1
}
```
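With `Math.random()`, two services handling the same request can reach different sampling decisions. A common alternative is to derive the decision from the trace ID itself. The sketch below is one way to do that using a 32-bit FNV-1a hash; `fnv1a` and `shouldSampleTrace` are hypothetical helper names and are not part of the SDK.

```typescript
// FNV-1a 32-bit hash: maps a string to an unsigned 32-bit integer.
// It hashes UTF-16 code units, which equals the byte-wise hash for ASCII IDs.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5 // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) // multiply by the FNV prime, modulo 2^32
  }
  return hash >>> 0 // reinterpret as unsigned
}

// Keep a trace when its hash falls in the lowest `rate` fraction of the 32-bit range.
function shouldSampleTrace(traceId: string, rate: number): boolean {
  if (rate <= 0) return false
  if (rate >= 1) return true
  return fnv1a(traceId) / 0x100000000 < rate
}
```

Because the result depends only on the trace ID and the rate, every service that evaluates the same ID reaches the same decision, and a trace kept at a low rate is also kept at any higher rate.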

Tail-Based Sampling

Decide whether to keep a trace after it completes, when its outcome is known:

```python
import os

ants = AgenticAnts(
    api_key=os.getenv('AGENTICANTS_API_KEY'),
    sampling={
        'strategy': 'tail-based',
        'rules': [
            # Keep all errors
            {'condition': 'error = true', 'rate': 1.0},

            # Keep all slow requests
            {'condition': 'duration > 5000', 'rate': 1.0},

            # Keep 50% of high-value customers
            {'condition': 'customer_tier = "enterprise"', 'rate': 0.5},

            # Keep 10% of everything else
            {'condition': 'true', 'rate': 0.1}
        ]
    }
)
```
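To make the rule list above concrete, here is a rough sketch of how such rules could be evaluated against a finished trace. It makes several assumptions that the configuration does not state: rules are checked in order and the first match wins, `duration` is in milliseconds, and the string conditions are replaced here by plain predicate functions. All type and function names are hypothetical.

```typescript
// Hypothetical summary of a finished trace; field names are assumptions.
interface CompletedTrace {
  error: boolean
  durationMs: number
  customerTier?: string
}

interface TailRule {
  matches: (trace: CompletedTrace) => boolean
  rate: number // probability of keeping a matching trace
}

// The same four rules as the configuration above, written as predicates.
const rules: TailRule[] = [
  { matches: (t) => t.error, rate: 1.0 },
  { matches: (t) => t.durationMs > 5000, rate: 1.0 },
  { matches: (t) => t.customerTier === 'enterprise', rate: 0.5 },
  { matches: () => true, rate: 0.1 }
]

// Assumes first-match-wins: the first rule whose predicate holds sets the rate.
function keepRate(trace: CompletedTrace, ruleSet: TailRule[]): number {
  const rule = ruleSet.find((r) => r.matches(trace))
  return rule ? rule.rate : 0
}

// `random` is injectable so the decision can be tested deterministically.
function shouldKeep(
  trace: CompletedTrace,
  ruleSet: TailRule[],
  random: () => number = Math.random
): boolean {
  return random() < keepRate(trace, ruleSet)
}
```

Under first-match ordering, a slow enterprise request is kept at 100% because the duration rule precedes the tier rule, so rule order matters as much as the rates themselves.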

Performance Tracking

Measuring Latency

```typescript
const trace = await ants.trace.create({ name: 'agent-run' })

// Automatic timing
const llmSpan = trace.span('llm-call')
const result = await llm.generate(prompt)
llmSpan.end() // Duration calculated automatically

// Manual timing (a separate span, since each span is ended once)
const opSpan = trace.span('operation')
const start = Date.now()
const opResult = await operation()
const duration = Date.now() - start
opSpan.end({ duration })
```
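The manual-timing pattern leaves a span open if the operation throws before `end` is reached. One way to guard against that is a small wrapper that ends the span in a `finally` block. This is a sketch only: `SpanLike` is a minimal shape assumed for illustration, and `timed` is a hypothetical helper rather than an SDK function.

```typescript
// Minimal span shape assumed for this sketch; the SDK's span may differ.
interface SpanLike {
  end(data?: Record<string, unknown>): void
}

// Runs `fn`, measures how long it took, and ends the span whether
// `fn` resolves or throws.
async function timed<T>(span: SpanLike, fn: () => Promise<T>): Promise<T> {
  const start = Date.now()
  try {
    return await fn()
  } finally {
    span.end({ duration: Date.now() - start })
  }
}
```

A call might then look like `const result = await timed(trace.span('llm-call'), () => llm.generate(prompt))`. `Date.now()` follows the wall clock, so a monotonic source such as `performance.now()` may be preferable where clock adjustments are a concern.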

Token Tracking

```python
span = trace.span('llm-inference')

response = openai.chat.completions.create(
    model='gpt-4',
    messages=messages
)

span.end({
    'tokens': {
        'prompt': response.usage.prompt_tokens,
        'completion': response.usage.completion_tokens,
        'total': response.usage.total_tokens
    },
    'cost': calculate_cost(response.usage, 'gpt-4')
})
```
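The snippet above calls a `calculate_cost` helper without defining it. A minimal sketch of such a helper follows, assuming prompt and completion tokens are priced separately per 1,000 tokens. The model name and prices in the table are placeholders for illustration and are not real provider rates.

```typescript
// Token counts as reported by the model provider.
interface TokenUsage {
  promptTokens: number
  completionTokens: number
}

// Price per 1,000 tokens, in USD.
interface ModelPricing {
  promptPer1k: number
  completionPer1k: number
}

// Placeholder prices for illustration only; replace them with your
// provider's current price list.
const PRICING: Record<string, ModelPricing> = {
  'example-model': { promptPer1k: 0.03, completionPer1k: 0.06 }
}

function calculateCost(usage: TokenUsage, model: string): number {
  const pricing = PRICING[model]
  // Fail loudly rather than record a cost of zero for an unknown model.
  if (!pricing) throw new Error(`No pricing configured for model: ${model}`)
  return (
    (usage.promptTokens / 1000) * pricing.promptPer1k +
    (usage.completionTokens / 1000) * pricing.completionPer1k
  )
}
```

Keeping prices in a single table means a price change is a one-line edit, and throwing on an unknown model avoids silently under-reporting cost.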

Error Tracking

Recording Errors

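```typescript
try {
  const result = await riskyOperation()
  span.end({ output: result })
} catch (error) {
  // A caught value is `unknown` in strict TypeScript, so narrow it first
  const err = error instanceof Error ? error : new Error(String(error))

  span.error({
    error: err.message,
    stack: err.stack,
    code: (err as Error & { code?: string }).code,
    severity: 'error',
    context: {
      operation: 'riskyOperation',
      inputs: { /* ... */ }
    }
  })
  throw error
}
```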
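JavaScript allows any value to be thrown, so the caught value is not guaranteed to have `message`, `stack`, or `code`. A small normalizer can produce a consistent payload before it is passed to `span.error`. This is a sketch: `ErrorPayload` is a shape assumed from the fields used above, and `toErrorPayload` is a hypothetical helper.

```typescript
// Payload shape assumed from the fields passed to span.error above.
interface ErrorPayload {
  error: string
  stack: string | undefined
  code: string | undefined
}

// Normalizes any thrown value into a payload with predictable fields.
function toErrorPayload(thrown: unknown): ErrorPayload {
  if (thrown instanceof Error) {
    // Some errors (Node system errors, for example) carry a string `code`.
    const code: unknown =
      'code' in thrown ? (thrown as Error & { code: unknown }).code : undefined
    return {
      error: thrown.message,
      stack: thrown.stack,
      code: typeof code === 'string' ? code : undefined
    }
  }
  // Non-Error values (strings, numbers, plain objects) have no stack or code.
  return { error: String(thrown), stack: undefined, code: undefined }
}
```

Inside a `catch` block this could be used as `span.error({ ...toErrorPayload(error), severity: 'error' })`.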

Error Categories

```python
# Classify errors
if isinstance(error, ValidationError):
    severity = 'warning'
elif isinstance(error, RateLimitError):
    severity = 'warning'
elif isinstance(error, NetworkError):
    severity = 'error'
else:
    severity = 'critical'

span.error(
    error=str(error),
    severity=severity,
    recoverable=isinstance(error, RetryableError)
)
```

Visualizing Traces

Trace Timeline
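A trace timeline lays spans out on a shared time axis, so gaps and long-running steps stand out. As a rough text-only sketch of the idea, the function below draws one bar per span. The `TimedSpan` shape and `renderTimeline` are illustrative assumptions and do not describe how the AgenticAnts dashboard renders traces.

```typescript
interface TimedSpan {
  name: string
  startMs: number
  endMs: number
}

// Renders each span as a bar on a shared time axis `width` characters wide.
function renderTimeline(spans: TimedSpan[], width = 40): string {
  if (spans.length === 0) return ''
  const t0 = Math.min(...spans.map((s) => s.startMs))
  const t1 = Math.max(...spans.map((s) => s.endMs))
  const total = Math.max(t1 - t0, 1) // avoid dividing by zero
  const labelWidth = Math.max(...spans.map((s) => s.name.length))

  return spans
    .map((s) => {
      // Clamp so that even a zero-length span at the very end stays in range.
      const offset = Math.min(Math.floor(((s.startMs - t0) / total) * width), width - 1)
      const scaled = Math.round(((s.endMs - s.startMs) / total) * width)
      const length = Math.min(Math.max(1, scaled), width - offset)
      const bar = ' '.repeat(offset) + '#'.repeat(length)
      return `${s.name.padEnd(labelWidth)} |${bar.padEnd(width)}| ${s.endMs - s.startMs}ms`
    })
    .join('\n')
}

const timeline = renderTimeline(
  [
    { name: 'classify', startMs: 0, endMs: 120 },
    { name: 'retrieve', startMs: 120, endMs: 400 },
    { name: 'generate', startMs: 400, endMs: 1000 }
  ],
  20
)
```

Each bar starts where its span begins relative to the earliest span, so sequential steps appear as a staircase and overlapping steps appear stacked.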

Flamegraph

Best Practices

1. Meaningful Names

```typescript
// Good
span('llm-inference')
span('database-query')
span('vector-search')

// Avoid
span('step1')
span('process')
span('func')
```

2. Rich Metadata

```python
trace.complete(
    output=response,
    metadata={
        'model': 'gpt-4',
        'tokens': 350,
        'cost': 0.0105,
        'confidence': 0.95,
        'cache_hit': False,
        'retries': 0
    }
)
```

3. Proper Error Handling

Always record errors in traces:

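```typescript
try {
  // ... traced work ...
} catch (error) {
  // A caught value is `unknown` in strict TypeScript, so narrow it first
  const err = error instanceof Error ? error : new Error(String(error))

  await trace.error({
    error: err.message,
    stack: err.stack,
    severity: 'error',
    context: { /* relevant data */ }
  })
  throw error // Still throw after recording
}
```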

4. Smart Sampling

Balance coverage and cost:

```python
import random

# Sample strategically
def should_trace(request):
    # Always trace errors
    if has_error(request):
        return True

    # Always trace slow requests
    if is_slow(request):
        return True

    # Sample by user tier
    if request.user_tier == 'enterprise':
        return random.random() < 0.5  # 50%
    else:
        return random.random() < 0.1  # 10%
```

Next Steps

Explore Integrations →

© 2026 ANTS Platform, Inc. · Docs v1.0 · Last updated June 2026