
Context Management

Overview

The Context Management feature provides intelligent control over conversation token usage through automated message pruning and summarization. When enabled via a project-level secret, it helps maintain conversation continuity while staying within model token limits by automatically managing message history, generating summaries of older conversations, and preserving important messages.

The Context Budget widget displays real-time token usage metrics across Chat conversations, Agent runs, Pipeline executions, and Application configurations, providing immediate visibility into context consumption and management status.

Prerequisites

To use context management, you need:

  • Project-level secret named context_manager with value true
  • An active conversation in Chat, Agent, Pipeline, or Application
  • LLM model configured with context management support

Enabling Context Management

Context management is controlled by a project-level secret that enables the feature across all applicable interfaces.

Access Project Secrets

  1. Navigate to Settings in the main menu
  2. Click on the Secrets section
  3. Click the + button
  4. In the secret creation form:
    • Name: Enter context_manager (exact name required)
    • Value: Enter true (exact value required)
  5. Save the secret

(Screenshot: Secret)

Widget Visibility

The Context Budget widget only appears when the context_manager secret exists and is set to true. Changes take effect immediately after the secret is created or updated.


Accessing Context Management

Context management is available in multiple locations within ELITEA:

From Chat Conversations

Monitor and control context during active conversations:

  1. Navigate to Chat → Conversations in the main menu
  2. Select or create a conversation
  3. Send the first message to initiate the conversation
  4. The Context Budget widget appears in the right panel (bottom left) after the first message
  5. The widget displays real-time token usage and management status
  6. Click on the widget to view detailed metrics and controls

(Screenshot: Chat Context Budget)


In Agent Runs

Track context usage during agent execution:

  1. Navigate to Agents and select an agent
  2. Send the first message to initiate the conversation
  3. The Context Budget widget appears above the chat panel interface after the first message
  4. Monitor token consumption as the agent processes requests
  5. View pruning and summarization activity in real-time

(Screenshot: Chat Context Budget)


In Pipeline Executions

Monitor context during pipeline chat panels:

  1. Navigate to Pipelines and select a pipeline
  2. Open the pipeline's chat panel interface
  3. Send the first message to initiate the conversation
  4. The Context Budget widget appears above the chat panel interface after the first message
  5. Track context usage across pipeline node executions
  6. Observe automatic context management as the pipeline runs

(Screenshot: Chat Context Budget)

Understanding the Context Budget Widget

The Context Budget widget provides three view modes that display progressively more detailed information.

Collapsed View

The minimal view shows essential token usage at a glance:

  • Status Indicator: Simple line indicator showing usage status
    • Green: Normal usage (0-100%)
    • Orange: High usage (more than 100%)

(Screenshot: Collapsed View)


Compact View

The compact view adds strategy and message tracking:

  • Strategy Indicator: Current pruning strategy (e.g., "oldest_first", "importance_based")
  • Messages Count: Total messages in conversation context
  • Summaries Count: Number of generated summaries
  • Expand Button: Click to reveal full details

(Screenshot: Compact View in a Conversation)

(Screenshot: Compact View in Agents and Pipelines)

Expanded View

The full view displays comprehensive context management details organized in collapsible sections. Click on each section to expand and configure settings.

Available Sections:

  • Context Strategy & Token Management: Configure pruning strategy, token limits, and message preservation settings
  • Summarization: Enable automatic summarization and configure summary generation parameters
  • System Messages: Manage system-level instructions and preservation settings

For detailed information about each parameter, see the Configuration Parameters section below.

(Screenshot: Expanded View)

Context Management Toggle

At the top of the expanded view, there is a toggle switch to enable or disable Context Management entirely. When disabled, all automatic context management features (pruning and summarization) are turned off.

(Screenshot: Context Management Toggle)


How Context Management Works

Automatic Token Tracking

The system continuously monitors token consumption:

  1. Message Addition: Every new message is added to the conversation context
  2. Token Estimation: Tokens are calculated using the tiktoken library (with a character-based fallback)
  3. Real-Time Display: The Context Budget widget updates immediately
  4. Threshold Monitoring: The system checks whether usage exceeds summary_trigger_ratio
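
The token estimate itself can be sketched in a few lines. A minimal illustration of the approach above, assuming tiktoken is available and falling back to the ~4-characters-per-token heuristic when it is not (the function and model names are illustrative, not the platform's actual code):

```python
def estimate_tokens(text: str, model: str = "gpt-4") -> int:
    """Estimate token count via tiktoken, with a character-based fallback."""
    try:
        import tiktoken
        encoding = tiktoken.encoding_for_model(model)
        return len(encoding.encode(text))
    except Exception:
        # Fallback heuristic: roughly 4 characters per token.
        return max(1, len(text) // 4)
```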

Configuration Parameters

Context Management settings are organized into three main sections in the expanded view modal.

Overview Metrics

Metric | Description | Example
------ | ----------- | -------
Tokens | Current token usage with percentage | "2,591 / 64,000 (4%)"
Messages | Total number of messages in conversation | "7"
Summaries | Number of generated summaries | "0"

Context Strategy & Token Management

Parameter | Description | Default | Range/Options | Purpose
--------- | ----------- | ------- | ------------- | -------
Pruning Strategy | Method for removing messages from context when the limit is exceeded | Oldest First | Oldest First: remove oldest messages first; Importance Based: prioritize messages by importance scoring; Thread Aware: maintain thread continuity when pruning; Hybrid: combine multiple strategies | Determines how messages are removed when the context limit is exceeded. Note: currently view-only in the UI
Max Context Tokens | Maximum number of tokens to keep in conversation context | 64,000 tokens | 1,000 - 100,000 | Defines the upper limit before pruning or summarization occurs
Preserve Recent Messages | Number of most recent messages to always keep in context | 5 messages | 1 - 50 | Ensures the most recent messages are protected during context optimization
Summaries Limit Count | Maximum number of conversation summaries to maintain | 5 summaries | 1 - 20 | Prevents unlimited summary accumulation while preserving conversation history
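
Taken together, these parameters could be represented as a single configuration object. A hedged sketch with the documented defaults; the key names are assumptions derived from the UI labels, not the platform's actual schema:

```python
# Illustrative configuration; key names are assumptions, not the real schema.
context_config = {
    "pruning_strategy": "oldest_first",  # or "importance_based", "thread_aware", "hybrid"
    "max_context_tokens": 64_000,        # range: 1,000 - 100,000
    "preserve_recent_messages": 5,       # range: 1 - 50
    "summaries_limit_count": 5,          # range: 1 - 20
}
```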

Pruning Strategy Details

Oldest First (FIFO)

  • Description: Removes oldest messages first when context limit is reached
  • Behavior: Simple chronological pruning
  • Use Case: Basic context management with straightforward message history
  • Preserved: Recent messages (per preserve_recent_messages setting)

Importance Based

  • Description: Scores messages by importance and removes lowest-scored messages
  • Scoring Factors:
    • Message recency (newer messages score higher)
    • Role importance (system/user messages scored higher than assistant)
    • Message length (longer messages may score higher)
    • Position in conversation (earlier messages in threads preserved)
  • Use Case: Intelligent context management for complex conversations
  • Preserved: High-importance messages and recent messages
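
These factors could combine into a simple additive score. A minimal sketch with illustrative weights (the actual scoring formula is not documented here):

```python
def importance_score(message: dict, index: int, total: int) -> float:
    """Illustrative importance score; lowest-scored messages are pruned first."""
    role_weight = {"system": 2.0, "user": 1.0, "assistant": 0.5}
    score = index / max(total - 1, 1)                  # recency: newer scores higher
    score += role_weight.get(message["role"], 0.5)     # role importance
    score += min(len(message["content"]) / 1000, 1.0)  # length bonus, capped
    return score
```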

Strategy Selection

The pruning strategy dropdown is currently disabled in the UI.

How Pruning Works

When context approaches the token limit:

  1. Trigger Detection: System detects usage approaching max_context_tokens
  2. Recent Message Protection: Preserves last N messages (per preserve_recent_messages setting)
  3. Strategy Application: Applies active pruning strategy (oldest_first or importance_based)
  4. Message Removal: Removes messages according to strategy logic
  5. Context Rebuild: Rebuilds conversation context with remaining messages
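
As a concrete illustration of steps 1-5 for the oldest_first strategy, reusing the estimate_tokens sketch above (messages are assumed to be dicts with a content field; this is not the platform's actual implementation):

```python
def prune_oldest_first(messages, max_context_tokens, preserve_recent_messages):
    """Drop the oldest unprotected messages until the context fits the budget."""
    protected = messages[-preserve_recent_messages:]   # step 2: recent messages
    candidates = messages[:-preserve_recent_messages]
    budget = max_context_tokens - sum(estimate_tokens(m["content"]) for m in protected)
    while candidates and sum(estimate_tokens(m["content"]) for m in candidates) > budget:
        candidates.pop(0)                              # steps 3-4: remove oldest first
    return candidates + protected                      # step 5: rebuilt context
```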

Summarization

Parameter | Description | Default | Range/Options | Purpose
--------- | ----------- | ------- | ------------- | -------
Enable Automatic Summarization | Toggle to enable or disable automatic conversation summarization | Enabled | On/Off | Controls whether the system automatically generates summaries when context limits are approached
Summarization Instructions | Custom instructions for how summaries should be generated | "Generate a concise summary of the following conversation messages" | Free text (multiline) | Guides the LLM on how to create summaries that match your needs
Summary Model | AI model used for generating conversation summaries | Project's default model | All available LLM models from your project and shared models | Determines which model processes the summarization task
Summary Trigger Ratio | Trigger summarization when context reaches this percentage of max tokens | 0.8 (80%) | 0.1 - 1.0 | Controls when automatic summarization is initiated
Min Messages for Summary | Minimum number of messages required before creating a summary | 5 messages | 1 - 50 | Prevents summarization of very short conversations
Target Summary Tokens | Target length for generated summaries | 4,096 tokens | 1 - 100,000 | Controls the conciseness of generated summaries

How Summarization Works

When summary_trigger_ratio threshold is reached:

  1. Summarization Trigger: System detects token usage exceeds the trigger ratio (e.g., 80% of max_context_tokens)
  2. Message Selection: Identifies messages eligible for summarization (excludes preserved recent messages)
  3. Summary Generation: LLM generates concise summary of selected messages using the configured Summary Model and Summarization Instructions
  4. Message Replacement: Original messages replaced with summary in context
  5. Token Reduction: Context token count reduced while preserving conversation continuity
  6. Summary Storage: Summary tracked (total summaries limited by summaries_limit_count)
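
A compact sketch of this flow, reusing estimate_tokens and assuming a cfg dict that carries the parameters from the table above plus max_context_tokens and preserve_recent_messages; the summarize callable stands in for the LLM call made with the configured Summary Model and Summarization Instructions:

```python
def maybe_summarize(messages, cfg, summarize):
    """Replace older messages with a summary once the trigger ratio is hit."""
    used = sum(estimate_tokens(m["content"]) for m in messages)
    if used < cfg["summary_trigger_ratio"] * cfg["max_context_tokens"]:
        return messages  # step 1: below the trigger threshold, nothing to do
    eligible = messages[:-cfg["preserve_recent_messages"]]  # step 2
    if len(eligible) < cfg["min_messages_for_summary"]:
        return messages  # too few messages to be worth summarizing
    summary = {"role": "system", "content": summarize(eligible)}  # step 3
    # Steps 4-5: the summary replaces the eligible messages; recent ones are kept.
    return [summary] + messages[-cfg["preserve_recent_messages"]:]
```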

System Messages

Parameter | Description | Default | Range/Options | Purpose
--------- | ----------- | ------- | ------------- | -------
Always Preserve System Messages | Toggle to keep system messages during context pruning | Enabled | On/Off | Ensures system-level instructions remain available throughout the conversation
System Messages | Custom system messages for the conversation | "You are a helpful assistant." | Free text (multiline) | Defines the AI assistant's role and behavior guidelines

Manual Context Optimization

In addition to automatic context management, you can manually trigger context optimization when needed. This is particularly useful when you want to immediately prune messages without waiting for automatic thresholds to be reached.

When to Use Manual Optimization

  • High Context Usage: When you see the orange status indicator (>100% usage) in the Context Budget widget
  • Immediate Cleanup: When you want to reduce token usage before continuing a conversation
  • Before Important Interactions: To ensure maximum available context for upcoming complex tasks
  • Performance Issues: When experiencing slow response times due to high token counts

How to Manually Optimize

  1. Click on the Context Budget widget to open the expanded view
  2. When context usage exceeds 100%, a yellow warning banner appears at the top of the widget

(Screenshot: Manual optimization)

  3. Click the Optimize now button in the warning banner
  4. Confirm the action in the dialog that appears
  5. The system immediately prunes messages based on your configured strategy

Irreversible Action

Manual optimization cannot be undone. Preserved recent messages (per your configuration) will always be retained.

(Screenshot: Manual optimization)

What Happens During Manual Optimization

When you manually trigger optimization:

  1. Strategy Application: The system applies your configured pruning strategy (oldest_first or importance_based)
  2. Message Protection: Recent messages (per preserve_recent_messages setting) are protected from removal
  3. System Message Preservation: If enabled, system messages are retained
  4. Token Reduction: Messages are removed until the target token count (max_context_tokens) is reached
  5. Context Rebuild: The conversation context is rebuilt with the remaining messages
  6. Widget Update: The Context Budget widget updates to reflect the new token count

Best Practices for Manual Optimization

  • Review Settings First: Before manually optimizing, review your Context Strategy settings to ensure recent message preservation is appropriate
  • Monitor Usage: Use manual optimization proactively when you see the status indicator turning orange
  • Strategic Timing: Trigger optimization before starting new complex tasks or multi-turn interactions
  • Combine with Configuration: Use manual optimization alongside proper configuration of automatic settings for best results

Usage Scenarios

Long-Running Conversations

Use Case: Maintain coherent conversations that exceed model token limits

Configuration:

  • Max Context Tokens: 64,000
  • Summary Trigger Ratio: 0.8
  • Preserve Recent Messages: 10
  • Pruning Strategy: importance_based

Behavior:

  1. User engages in extended conversation with AI assistant
  2. Context grows naturally as messages are added
  3. At 51,200 tokens (80% of 64,000), summarization triggers automatically
  4. System generates summary of older messages
  5. Last 10 messages always preserved for immediate context
  6. Conversation continues seamlessly with reduced token usage

Benefits:

  • No manual intervention required
  • Important conversation details preserved in summaries
  • Recent context always available
  • Conversation never "forgets" early important information

Multi-Turn Agent Tasks

Use Case: Agent performing complex tasks requiring multiple interactions

Configuration:

  • Max Context Tokens: 32,000
  • Summary Trigger Ratio: 0.75
  • Preserve Recent Messages: 5
  • Pruning Strategy: oldest_first

Behavior:

  1. Agent starts task with initial instructions
  2. Multiple tool calls and responses accumulate
  3. At 24,000 tokens (75% of 32,000), oldest messages are pruned
  4. Last 5 exchanges preserved for immediate task context
  5. Agent continues task execution without context overflow

Benefits:

  • Task execution never interrupted by token limits
  • Most recent tool results always accessible
  • Efficient token usage for long-running tasks
  • Simplified context management for automated workflows

Pipeline Chat Contexts

Use Case: Pipeline with chat panel interface requiring context preservation

Configuration:

  • Max Context Tokens: 16,000
  • Summary Trigger Ratio: 0.8
  • Preserve Recent Messages: 8
  • Pruning Strategy: importance_based

Behavior:

  1. Pipeline nodes generate output and chat messages
  2. User interactions add additional context
  3. Context Budget widget shows real-time usage across pipeline execution
  4. At 12,800 tokens (80% of 16,000), importance-based pruning occurs
  5. System preserves critical pipeline outputs and recent user messages
  6. Pipeline continues with optimized context

Benefits:

  • Pipeline execution state preserved
  • User can continue interacting without interruption
  • Important node outputs retained
  • Balanced context across pipeline stages

Best Practices

Monitoring Context Usage

Regular Budget Checks
  • Check the Context Budget widget periodically during long conversations
  • Pay attention to color changes in the percentage bar:
    • Green: Safe range, no action needed
    • Yellow: Monitor closely, approaching limit
    • Red: Critical range, summarization or pruning likely
  • Expand the widget to full view for detailed metrics when yellow or red
  • Use compact view for quick strategy and message count checks

Understanding Token Consumption
  • Different message types consume different token amounts:
    • System prompts: Variable (often 100-500 tokens)
    • User messages: Depends on length (typically 10-200 tokens)
    • Assistant responses: Variable (often 100-1000+ tokens)
    • Tool calls: Includes function definitions (can be 50-300 tokens each)
  • Attachments and images can significantly increase token usage
  • Summary messages reduce overall token count while preserving information

Strategy Awareness
  • Know which pruning strategy is active for your conversations:
    • oldest_first: Predictable and simple, but may lose important early context
    • importance_based: Intelligent and preserves high-value messages, but less predictable
  • Contact your administrator if the strategy doesn't match your use case

Optimizing Conversations

Message Structure
  • Keep messages concise when possible to reduce token consumption
  • Break long messages into smaller logical chunks
  • Use clear, structured formatting to help importance-based scoring
  • Avoid unnecessary repetition or verbose phrasing
Preserve Recent Messages Setting
  • Adjust based on conversation type:
    • Quick Q&A: Lower number (3-5 messages)
    • Complex discussions: Higher number (10-15 messages)
    • Multi-step tasks: Medium number (5-10 messages)
  • Remember: Preserved messages are never pruned or summarized
  • Higher numbers mean more guaranteed context but less flexibility

Summary Trigger Ratio
  • Lower ratios (0.7-0.75): More frequent summarization, lower peak token usage
  • Higher ratios (0.8-0.9): Less frequent summarization, higher token efficiency
  • Balance based on:
    • Model token limits
    • Conversation importance
    • Cost considerations (summarization uses LLM calls)
    • Desired conversation continuity

Troubleshooting Common Issues

Context Budget Widget Not Visible

Symptoms:

  • Widget completely missing from right panel
  • No context management controls available

Diagnosis:

  1. Verify project secret context_manager exists
  2. Check secret value is exactly true (case-sensitive)
  3. Confirm you're viewing a supported interface (Chat, Agent, Pipeline)
  4. Check browser console for errors

Resolution:

  1. Navigate to Settings → Secrets
  2. Create or update context_manager secret with value true
  3. Refresh the page
  4. The widget should appear immediately if the secret is correct

Token Count Seems Inaccurate

Symptoms:

  • Displayed token count doesn't match expectations
  • Percentage bar doesn't align with message count

Explanation:

  • Token counting uses tiktoken library with character-based fallback (~4 chars per token)
  • Different message types have different token densities
  • System messages, role labels, and formatting add overhead
  • Tool calls include function definitions in token count

Resolution:

  • Token counts are estimates and may vary slightly from actual LLM processing
  • Focus on relative changes (increasing/decreasing) rather than absolute accuracy
  • If consistently far off, contact administrator to check token estimation configuration

Summarization Not Occurring

Symptoms:

  • Token usage reaches trigger ratio but no summary is generated
  • Context continues to grow beyond expected limit

Possible Causes:

  1. Insufficient messages to summarize (all recent messages preserved)
  2. Summary limit already reached (summaries_limit_count)
  3. LLM model configuration issue
  4. Backend summarization disabled

Resolution:

  1. Check expanded view: Compare Messages vs Preserve Recent count
  2. If Messages ≤ Preserve Recent, summarization cannot occur
  3. Check expanded view: Review Summaries count
  4. If at limit (default 5), oldest summary will be replaced
  5. Verify LLM model is properly configured for text generation
  6. Contact administrator to check backend context management configuration

Messages Disappearing Unexpectedly

Symptoms:

  • Messages from earlier in conversation no longer visible
  • Conversation feels disjointed or missing context

Explanation:

  • This is expected behavior when pruning occurs
  • Messages are removed from context when token limit is approached
  • Pruned messages are not deleted, just removed from active context

Understanding:

  1. Check Context Budget widget for strategy in use
  2. oldest_first: Messages disappear in chronological order
  3. importance_based: Lower-scored messages disappear first
  4. Recent messages (per preserve_recent_messages) never disappear

If Unwanted:

  • Increase max_context_tokens to reduce pruning frequency
  • Increase preserve_recent_messages to keep more messages
  • Request administrator to adjust pruning strategy

Performance Issues

Symptoms:

  • Slow message sending or response times
  • UI lag when interacting with Context Budget widget
  • Browser becomes unresponsive

Possible Causes:

  1. Very high max_context_tokens causing expensive operations
  2. Excessive message count in conversation
  3. Frequent summarization operations
  4. Browser memory limitations

Resolution:

  1. Reduce max_context_tokens:
    • Lower values mean less context to process
    • Typical range: 8,000 - 32,000 for optimal performance
  2. Start a new conversation:
    • Very long conversations can accumulate state
    • Consider starting fresh for new topics
  3. Check browser resources:
    • Close unnecessary tabs
    • Ensure the browser is up to date
    • Clear the browser cache if needed
  4. Contact your administrator:
    • May need to adjust backend processing limits
    • Could configure more aggressive pruning

Related Documentation