Index Confluence Data¶

Availability

Indexing tools are available in the Next environment (Release 1.7.0) and replace legacy Datasources/Datasets. For context, see Release Notes 1.7.0 and the Indexing Overview.

This guide provides a complete step-by-step walkthrough for indexing Confluence data and then searching or chatting with the indexed content using ELITEA's AI-powered tools.

Primary Interface

Indexing operations are performed through the Indexes interface within Toolkit Configuration. This interface provides comprehensive index management with visual status indicators, real-time progress monitoring, and integrated search capabilities. For detailed information, see How to create and use indexes.

Overview¶

Confluence indexing allows you to create searchable indexes from your Confluence knowledge base content:

Pages & Documentation: Meeting notes, project documentation, how-to guides, and knowledge articles
Spaces: Organized content collections for teams, projects, or departments
Labels & Metadata: Content categorization and tagging systems
Attachments: Document files, images, and other media attached to pages
Comments: User discussions and collaborative input on pages

What you can do with indexed Confluence data:

Semantic Search: Find information across your Confluence spaces using natural language queries
Context-Aware Chat: Get AI-generated answers from your knowledge base with citations
Cross-Space Discovery: Search across multiple Confluence spaces and projects
Knowledge Extraction: Transform Confluence content into searchable organizational knowledge
Content Analysis: Analyze patterns, gaps, and relationships in your documentation

Common use cases:

Onboarding new team members by allowing them to ask questions about processes and documentation
Finding specific procedures, policies, or best practices across your knowledge base
Generating summaries or explanations from existing documentation
Support ticket resolution using indexed knowledge articles and FAQ content
Content audit and analysis to identify documentation gaps or outdated information

Prerequisites¶

Before indexing Confluence data, ensure you have:

Confluence Credential: A Confluence API token or authentication credentials configured in ELITEA
Vector Storage: PgVector selected in Settings → AI Configuration
Embedding Model: Selected in AI Configuration (defaults available) → AI Configuration
Confluence Toolkit: Configured with your Confluence instance details, credentials, and Index Data tool enabled

Requirements

The Indexes interface requires: - PgVector and Embedding Model configured at the project level - The Index Data tool enabled in your toolkit configuration

Complete both project-level setup and toolkit configuration to access indexing functionality.

Required Permissions¶

Your Confluence credential needs appropriate permissions based on what you want to index:

For Content Access:

Read access to Confluence spaces and pages
Permission to view the specific spaces you want to index

For Comprehensive Indexing:

Access to view attachments (if including attachments)
Permission to view comments (if including comments)
Access to both public and restricted content (based on your requirements)

Authentication Methods:

Basic Authentication: Username and API Key
Bearer Token: Confluence API token

Step-by-Step: Creating a Confluence Credential¶

Generate Confluence API Token in your Atlassian account (Security → API Tokens)
Create Credential in ELITEA: Navigate to Credentials → + Create → Confluence → enter details and save

Detailed Instructions

For complete credential setup steps including token generation and security best practices, see:

Create a Credential
Confluence Credential Setup
Confluence Toolkit Guide (Token generation section)

Step-by-Step: Configure Confluence Toolkit¶

Create Toolkit: Navigate to Toolkits → + Create → Confluence
Configure Settings: Set base URL, space, hosting option, and assign your Confluence credential
Enable Tools: Select Index Data, List Collections, Search Index, Stepback Search Index, Stepback Summary Index, and Remove Index tools
Save Configuration

Required Tool

The Index Data tool must be enabled for indexing functionality to be available. Without this tool, you cannot access the indexing interface.

Tool Overview:¶

Index Data: Creates searchable indexes from Confluence pages and content
List Collections: Lists all available collections/indexes to verify what's been indexed
Search Index: Performs semantic search across indexed content using natural language queries
Stepback Search Index: Advanced search that breaks down complex questions into simpler parts for better results
Stepback Summary Index: Generates summaries and insights from search results across indexed content
Remove Index: Deletes existing collections/indexes when you need to clean up or start fresh

Detailed Instructions

For complete toolkit configuration including space setup and authentication options, see:

Step-by-Step: Index Confluence Data¶

Step 1: Access the Interface¶

Navigate to Toolkits: Go to Toolkits in the main navigation
Select Your Confluence Toolkit: Choose your configured Confluence toolkit from the list
Open Indexes Tab: Click on the Indexes tab in the toolkit detail view

If the tab is disabled or not visible, verify that: - PgVector and Embedding Model are configured in Settings → AI Configuration - The Index Data tool is enabled in your toolkit configuration

Step 2: Create a New Index¶

Click Create New Index: In the Indexes sidebar, click the + Create New Index button
New Index Form: The center panel displays the new index creation form

Step 3: Configure Index Parameters¶

Fill in the required and optional parameters for your Confluence space:

Parameter	Description	Example Value	Required
Index Name	Suffix for collection name (max 7 chars)	`kb` or `docs`	✓
Clean Index	Remove existing index data before re-indexing	✓ (checked) or ✗ (unchecked)	✗
Chunking Tool	Method for splitting content into chunks	Default chunking or custom	✗
content_format	The format of the content to be retrieved	`view`, `storage`, `export_view`, `editor`, `anonymous`	✗
page_ids	List of page IDs to retrieve	`["123456", "789012"]`	✗
label	Label to filter pages	`"api-docs"`	✗
cql	CQL query to filter pages	`"space = DEV"`	✗
limit	Limit the number of results	`10`	✗
max_pages	Maximum number of pages to retrieve	`1000`	✗
include_restricted_content	Include restricted content in indexing	✓ (checked) or ✗ (unchecked)	✗
include_archived_content	Include archived content in indexing	✓ (checked) or ✗ (unchecked)	✗
include_attachments	Include attachments in indexing	✓ (checked) or ✗ (unchecked)	✗
include_comments	Include page comments	✓ (checked) or ✗ (unchecked)	✗
include_labels	Include page labels	✓ (checked) or ✗ (unchecked)	✗
ocr_languages	OCR languages for processing attachments	`eng`, `fra`, `deu`	✗
keep_markdown_format	Preserve Markdown formatting	✓ (checked) or ✗ (unchecked)	✗
keep_newlines	Preserve line breaks and formatting	✓ (checked) or ✗ (unchecked)	✗
bins_with_llm	Use LLM for processing binary files	✓ (checked) or ✗ (unchecked)	✗

Step 4: Start Indexing¶

Form Validation: The Index button remains inactive until all required fields are filled
Review Configuration: Verify all parameters are correct
Click Index Button: Start the indexing process
Monitor Progress: Watch real-time updates with visual indicators:
- 🔄 In Progress: Indexing is currently running
- ✅ Completed: Indexing finished successfully
- ❌ Failed: Indexing encountered an error

Step 5: Verify Index Creation¶

Once indexing completes:

Check Index Status: Verify the index shows ✅ Completed status in the sidebar
Review Index Information: Click on your index to see:
- Document Count: Number of indexed pages
- Last Updated: Timestamp of indexing completion
- Index Name: Your specified collection suffix

Using Search Tools with Indexed Data¶

Once your Confluence data is indexed, you can search and interact with it directly through the interface:

Accessing Search Functionality¶

Select Your Index: Click on your completed index from the sidebar
Navigate to Run Tab: Click the Run tab in the center panel
Choose Search Tool: Select from available search tools in the dropdown:
- Search Index: Basic semantic search across indexed content
- Stepback Search Index: Advanced search that breaks down complex questions
- Stepback Summary Index: Search with automatic summarization of results

Running a Search¶

Enter Your Query: Type your search query (e.g., "What is our vacation policy?")
Configure Parameters: Adjust optional settings like filters and model configuration
Click Run: Execute the search
View Results: Results appear in the integrated chat interface on the right panel

Search and Chat with Indexed Data¶

Once your Confluence data is indexed, you can use the toolkit to search and interact with your content in multiple ways:

Using Toolkit in Conversations and Agents¶

Your Confluence toolkit can be used in two main contexts:

In Conversations: Add the toolkit as a participant to ask questions and search your indexed Confluence data
In Agents: Include the toolkit when creating AI agents to give them access to your Confluence knowledge base

How to use:

Start a New Conversation or Create an Agent
Add Toolkit as Participant: Select your Confluence toolkit from the available toolkits
Ask Natural Language Questions: The toolkit will automatically search your indexed data and provide relevant answers with citations

Real-Life Example Workflow¶

Let's walk through a complete example of indexing and using a company knowledge base:

Step 1: Setup Confluence Toolkit for Company Knowledge Base

Configure Confluence Toolkit:

Base URL: https://yourcompany.atlassian.net/wiki
Space: KB (Knowledge Base space key)
Hosting Option: Cloud
Credential: Your Confluence API token
Tools enabled: Index Data, List Collections, Search Index, Stepback Search Index, Stepback Summary Index, Remove Index

Step 2: Use in Conversations

Add toolkit to a conversation and start asking questions:

Navigate to Conversations → + New Conversation
In the participants section click + to add Toolkit
Select your Confluence toolkit from the available toolkits list

Step 3: Index the Knowledge Base

User Request in Chat:

"Index our knowledge base with collection suffix 'kb'. Include attachments and labels but exclude comments. Keep markdown formatting and clean any existing index first."

Step 4: Verify Index Creation

User Request:

"List my collections"

Confluence Toolkit Response:

This confirms the indexing completed successfully and your knowledge base collection is available for searching.

Step 5: Start the conversation and begin asking questions about your indexed content

User Request: "What is our company's vacation policy?"
Confluence Toolkit: "Based on your indexed knowledge base, here's your company's vacation policy:
1. Annual Leave Entitlement: All full-time employees are entitled to 25 days of annual leave per year
2. Accrual System: Vacation days accrue monthly at a rate of 2.08 days per month
3. Carryover Policy: Up to 5 unused vacation days can be carried over to the next year...

Source: HR Policies → Employee Benefits → Vacation Policy (Page ID: 12345)"

User Request: "How do I set up a new development environment?"
Confluence Toolkit: "Here's the step-by-step process for setting up a new development environment:

Prerequisites:
1. Install Docker Desktop
2. Obtain access credentials from the DevOps team
3. Clone the main repository from GitHub
Setup Steps:
1. Run the environment setup script: ./scripts/setup-dev.sh
2. Configure your local environment variables...

Source: Developer Documentation → Environment Setup → Development Environment (Page ID: 67890)"

Troubleshooting & Tips¶

Common Issues and Solutions¶

"Indexing interface not visible" or "Tab disabled":

Verify PgVector and Embedding Model are configured in Settings → AI Configuration
Ensure the Index Data tool is enabled in your Confluence toolkit configuration
Check that your toolkit supports indexing (Confluence is supported)
Refresh the browser page and retry

"+ Create New Index button not working":

Verify all project-level prerequisites are met (PgVector and Embedding Model)
Check that you have proper permissions for the toolkit
Ensure the toolkit is properly saved with credentials

Common Errors and Solutions¶

"Space not found" or "Authentication failed":

Verify your Confluence credential has the correct API token
Ensure the space key is exact and case-sensitive (e.g., KB, not kb)
Check that your token has appropriate permissions for the space

"Index creation failed" or "Indexing stuck in progress":

Check your space and filter configurations aren't too restrictive
Verify the space contains pages matching your criteria
Monitor the progress indicators for specific error messages

"API rate limit exceeded":

Large spaces may hit Confluence API limits
Reduce Max Pages or use more specific filtering (labels, CQL)
Wait and retry, or consider indexing in smaller batches

"No documents indexed":

Check your label or CQL filters aren't too restrictive
Verify the space contains pages matching your criteria
Try indexing without filters first, then add restrictions

"No search results returned":

Verify the index shows ✅ Completed status
Check that your search query matches the type of content indexed
Try broader search terms or different search tools (Stepback Search, Stepback Summary)
Ensure the indexed content contains relevant information

"Vector database connection failed" or "PgVector errors":

Ensure PgVector is properly configured in Settings → AI Configuration
Verify the vector database is running and accessible
Check connection credentials and database permissions
Restart the vector database service if connection issues persist

"Embedding model not found" or "Embedding errors":

Verify an embedding model is selected in AI Configuration
Check if the embedding model is properly downloaded/initialized
Try switching to a different embedding model (e.g., text-embedding-ada-002)
Ensure sufficient system resources for the embedding model to load

Performance and Scope Considerations¶

For Large Confluence Spaces:

Use specific label filters: label="documentation" or label="public"
Use CQL queries to target specific content: space="KB" AND created>="2024-01-01"
Consider indexing by space hierarchy: index parent pages first, then children
Monitor the progress indicators and document count during indexing

Search Result Quality¶

If search returns few/no results:

Verify the index shows ✅ Completed status in the sidebar
Lower the cut-off score from 0.5 to 0.35 or 0.3
Increase search_top from 10 to 20 or 30
Try rephrasing your query with different keywords
Verify the indexed content contains relevant information for your query

For better search quality:

Include both pages and attachments for comprehensive coverage
Use natural language queries rather than exact keyword matches
Leverage the integrated chat interface for follow-up questions
Create separate indexes for different content types (public vs internal, different spaces)

Content-Specific Indexing Tips¶

For Knowledge Base Content:

Focus on pages with public or documentation labels
Include attachments for comprehensive policy and procedure documents
Consider excluding comments unless they contain valuable information

For Project Documentation:

Use space-specific indexing for better organization
Include both current and archived content for historical context
Index meeting notes and project updates separately from technical documentation

For Troubleshooting Guides:

Include pages with labels like troubleshooting, faq, or support
Index both internal and customer-facing content
Consider including comments as they often contain additional solutions