Migrate Datasources to Indexing¶

Overview¶

This guide explains how to migrate your existing Datasources from the Nexus environment to Indexing in the Next environment. The new indexing system provides better performance, more flexibility, and standardized tools across all supported platforms.

What Changed: Datasources → Indexing¶

Before (Nexus - Datasources):

Datasources were standalone entities that created datasets
Limited to specific data types and configurations
Separate entity management outside of toolkits
Fixed chunking and processing options

Now (Next - Indexing):

Indexing is integrated directly into Toolkits
Standardized tools available across all supported platforms
Flexible collection management and naming
Advanced search capabilities with stepback search
Better chunking and metadata handling

Deduplication Deprecated

The Deduplication feature available in Datasources is deprecated and has no equivalent in the new Indexing system. Use the search and filtering capabilities of the new indexing tools instead.

Migration Process Overview¶

Step 1: Identify Your Current Datasources¶

In the Nexus environment, review your existing datasources to understand:

What type of data each datasource indexes (Jira, Confluence, Files, etc.)
The scope and filters used in each datasource
Which agents or pipelines use these datasources
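
One way to capture this review is a small inventory kept alongside your migration notes. The sketch below is only an illustration: the `DatasourceRecord` class, its field names, and the sample entries are hypothetical and are not part of either environment.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasourceRecord:
    """One row of a migration inventory (all field names are illustrative)."""
    name: str
    source_type: str  # e.g. "Jira", "Confluence", "File", "Table", "Git"
    scope: str        # filters used by the datasource, e.g. a JQL query or space key
    used_by: List[str] = field(default_factory=list)  # agents or pipelines


# Hypothetical sample entries, filled in by hand from the Nexus environment.
inventory = [
    DatasourceRecord("support-tickets", "Jira", "project = SUP", ["Support Agent"]),
    DatasourceRecord("eng-wiki", "Confluence", "space = ENG", ["Docs Agent"]),
    DatasourceRecord("old-specs", "File", "all files"),
]


def unused(records):
    """Return names of datasources that no agent or pipeline depends on."""
    return [r.name for r in records if not r.used_by]


print(unused(inventory))
```

Datasources that nothing depends on are candidates to skip rather than migrate.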

Step 2: Set Up Prerequisites in Next Environment¶

Before migrating, ensure you have:

Credentials: Migrate or recreate your credentials in the Next environment
Vector Storage: Configure PgVector in Settings → AI Configuration
Embedding Model: Select an embedding model in AI Configuration

Step 3: Create Corresponding Toolkits¶

Create the appropriate toolkits in the Next environment based on your datasource types:

File/Table → Artifact Toolkit or SharePoint Toolkit
Jira → Jira Toolkit
Confluence → Confluence Toolkit
Git → GitHub Toolkit
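
If you track the migration in a script or notebook, the mapping above can be written as a lookup table. This is a minimal sketch: the dictionary and function names are hypothetical, and only the type-to-toolkit pairs come from this guide.

```python
# Datasource type -> candidate toolkits, as listed in this guide.
TOOLKIT_FOR_DATASOURCE = {
    "File": ["Artifact Toolkit", "SharePoint Toolkit"],
    "Table": ["Artifact Toolkit", "SharePoint Toolkit"],
    "Jira": ["Jira Toolkit"],
    "Confluence": ["Confluence Toolkit"],
    "Git": ["GitHub Toolkit"],
}


def target_toolkits(datasource_type):
    """Return the toolkits that can replace a given datasource type."""
    try:
        return TOOLKIT_FOR_DATASOURCE[datasource_type]
    except KeyError:
        raise ValueError(
            f"No toolkit mapping for datasource type: {datasource_type!r}"
        ) from None


print(target_toolkits("Table"))
```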

Step 4: Index Your Data¶

Use the Index Data tool in each toolkit to recreate your indexed content with improved capabilities.

Migration Examples by Datasource Type¶

File/Table Datasources → Artifacts or SharePoint Indexing¶

What you had in Nexus:

File datasources that indexed documents, spreadsheets, or text files
Table datasources for structured data or spreadsheets
Basic file content extraction and tabular data indexing

What to do in Next:

Option A: Use SharePoint Indexing

Upload files to SharePoint (if not already there):
- Upload your files (documents, Excel, CSV, etc.) to your SharePoint site/library
- Ensure files are accessible with your SharePoint credentials
Create SharePoint Credential:
- Create credential in Credentials → + Create → SharePoint
Create SharePoint Toolkit with your SharePoint credentials
Index SharePoint files containing your data:
- Use the Index Data tool and specify indexing parameters (e.g., collection suffix, file type filters)
Advanced Search Capabilities:
- Use Stepback Search Index for complex document queries
- Search across multiple SharePoint libraries with semantic understanding

Option B: Use Artifacts Indexing (for local files)

Upload files to Artifacts:
- Navigate to Artifacts → + Create bucket
- Upload your files to the artifact bucket
Create Artifact Toolkit:
- Navigate to Toolkits → + Create → Artifact
- Specify the bucket name containing your files
- Enable indexing tools: Index Data, Search Index, Stepback Search Index, Stepback Summary Index, List Collections, Remove Index
Index your artifact data:
- Use the Index Data tool and specify indexing parameters (e.g., collection suffix, chunking method, clean index option, file filters)
Advanced Search Capabilities:
- Use Stepback Search Index for complex document queries
- Search across multiple artifact buckets with semantic understanding

Detailed Instructions

For complete step-by-step instructions, see:

Index SharePoint Data
Index Artifacts Data

Jira Datasources → Jira Indexing¶

What you had in Nexus:

Jira datasources that indexed issues, stories, comments, and attachments
Project-based or JQL-filtered content

What to do in Next:

Create Jira Credential:
- Use your existing Jira API token (or generate a new one if needed)
- Create credential in Credentials → + Create → Jira
Create Jira Toolkit:
- Navigate to Toolkits → + Create → Jira
- Configure with your Jira instance URL and credentials
- Enable indexing tools: Index Data, Search Index, Stepback Search Index, Stepback Summary Index, List Collections, Remove Index
Index Jira Data:
- Use the Index Data tool and specify indexing parameters (e.g., collection suffix, project keys or JQL queries)
Enhanced Search Capabilities:
- Use Stepback Search Index for complex queries
- Search across multiple projects with natural language
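
If your Nexus datasource was scoped by project, an equivalent JQL query can be prepared before running Index Data. The helper below is a hypothetical sketch that only builds the query string; it does not call Jira or the toolkit, and the optional time window is an assumption about how you might narrow the scope.

```python
def build_jql(project_keys, updated_within_days=None):
    """Build a JQL filter for one or more project keys.

    updated_within_days optionally limits results to recently updated issues.
    """
    if not project_keys:
        raise ValueError("At least one project key is required")
    jql = "project in ({})".format(", ".join(project_keys))
    if updated_within_days is not None:
        # JQL relative dates: -90d means "90 days ago".
        jql += f" AND updated >= -{updated_within_days}d"
    return jql


print(build_jql(["SUP", "ENG"], updated_within_days=90))
```

Project keys that collide with JQL reserved words need to be quoted; this sketch does not handle that case.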

Detailed Instructions

For complete step-by-step instructions, see: Index Jira Data

Confluence Datasources → Confluence Indexing¶

What you had in Nexus:

Confluence datasources that indexed pages, spaces, and attachments
Space-based or label-filtered content

What to do in Next:

Create Confluence Credential:
- Use your existing Confluence API token (or generate a new one if needed)
- Create credential in Credentials → + Create → Confluence
Create Confluence Toolkit:
- Navigate to Toolkits → + Create → Confluence
- Configure with your Confluence instance URL and credentials
- Enable indexing tools: Index Data, Search Index, Stepback Search Index, Stepback Summary Index, List Collections, Remove Index
Index Confluence Data:
- Use the Index Data tool and specify indexing parameters (e.g., collection suffix, space keys, content filters)
Advanced Documentation Search:
- Use Stepback Summary Index for documentation analysis
- Search across multiple spaces with semantic queries

Detailed Instructions

For complete step-by-step instructions, see: Index Confluence Data

Git Datasources → Repository Indexing¶

What you had in Nexus:

Git datasources that indexed code repositories, documentation, and commit history
Branch-based or repository-wide content indexing
Basic code and documentation search

What to do in Next:

Create Repository Credential:
- Use your existing GitHub Personal Access Token (or generate a new one if needed)
- Create credential in Credentials → + Create → GitHub
Create Repository Toolkit:
- Navigate to Toolkits → + Create → GitHub
- Configure with your repository URL and credentials
- Enable indexing tools: Index Data, Search Index, Stepback Search Index, Stepback Summary Index, List Collections, Remove Index
Index Repository Data:
- Use the Index Data tool and specify indexing parameters (e.g., collection suffix, branch name, file type filters)
Enhanced Code and Documentation Search:
- Use Stepback Search Index for complex code analysis queries
- Search across multiple GitHub repositories with semantic understanding
- Find code patterns, documentation, and implementation examples

Detailed Instructions

For complete step-by-step instructions, see: Index Repository Data

Key Improvements in Indexing¶

Better Search Capabilities¶

New search tools available:

Search Index: Basic semantic search across indexed content
Stepback Search Index: Advanced search that breaks down complex questions for better results
Stepback Summary Index: Search with automatic summarization of results
List Collections: View all available indexed collections
Remove Index: Clean up or refresh indexed data

Improved Organization¶

Collection Management:

Use meaningful collection suffixes to organize different types of content
Create multiple indexes for different scopes or time periods
Better naming conventions for easier discovery
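
A consistent naming scheme is easier to keep if suffixes are generated rather than typed by hand. The sketch below is one possible convention, not a platform rule: the function name, the lowercase-and-underscore format, and the optional `max_len` limit are all assumptions, so check the Index Data tool for any actual constraints on collection suffixes.

```python
import re


def collection_suffix(source, scope, max_len=None):
    """Derive a lowercase, underscore-separated suffix from a source and scope."""
    raw = f"{source}_{scope}".lower()
    # Collapse every run of non-alphanumeric characters into one underscore.
    cleaned = re.sub(r"[^a-z0-9]+", "_", raw).strip("_")
    if max_len is not None:
        # Truncate, then drop any underscore left dangling at the cut.
        cleaned = cleaned[:max_len].strip("_")
    return cleaned


print(collection_suffix("Jira", "SUP 2024"))
```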

Enhanced Integration¶

Toolkit Integration:

Indexing tools are built into each toolkit
Consistent experience across all platforms
Direct integration with conversations and agents

Migration Checklist¶

Before You Start¶

☐ Review all existing datasources in Nexus environment
☐ Document the scope and purpose of each datasource
☐ Identify which agents/pipelines use each datasource
☐ Plan your new collection naming strategy

Setting Up Next Environment¶

☐ Configure Vector Storage (PgVector) in AI Configuration
☐ Select Embedding Model in AI Configuration
☐ Recreate or migrate credentials for each data source
☐ Create appropriate toolkits for each data type

Migration Process¶

☐ Start with the most critical datasources
☐ Create indexes using the Index Data tool in each toolkit
☐ Test search functionality with sample queries
☐ Update agents and pipelines to use new toolkits
☐ Verify search results match expected content
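
For the verification step, a quick check is to confirm that terms you expect to find actually appear in the returned results. The sketch below assumes you paste result text in by hand; `missing_terms` is a hypothetical helper and does not call any indexing tool.

```python
def missing_terms(expected_terms, result_texts):
    """Return the expected terms that appear in none of the result texts."""
    haystack = " ".join(result_texts).lower()
    return [term for term in expected_terms if term.lower() not in haystack]


# Hypothetical result snippets copied from a Search Index run.
results = ["Refund policy updated", "Escalation steps"]
print(missing_terms(["refund", "invoice"], results))
```

An empty list means every expected term was found. Because the check uses plain substring matching, treat it as a rough signal rather than proof of complete coverage.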

Post-Migration¶

☐ Remove references to old datasources in agents/pipelines
☐ Train team members on new indexing tools
☐ Establish process for maintaining and updating indexes
☐ Consider creating multiple collections for better organization

Getting Help¶

If you encounter issues during migration:

Check Prerequisites: Ensure vector storage and embedding models are properly configured
Review Toolkit Configuration: Verify credentials and connection settings
Test with Small Scope: Start with a small subset of data to validate the process
Consult Documentation: Each indexing guide provides troubleshooting for specific platforms