Index SharePoint Data¶
Availability
Indexing tools are available in the Next environment (Release 1.7.0) and replace legacy Datasources/Datasets. For context, see Release Notes 1.7.0 and the Indexing Overview.
Migration from Datasources
In previous releases, there was a Source type: File option in Datasources. Now, Datasources have been removed from ELITEA, and users can perform the same action through SharePoint Toolkit indexing.
How to migrate from Datasources to SharePoint Toolkit:
- Add files to SharePoint: Upload your documents to SharePoint document libraries or sites (if not already there)
- Set up credentials: Configure your SharePoint credentials in Settings → Credentials
- Create SharePoint Toolkit: Go to Toolkits → + Create → SharePoint and configure with your site details
- Index your data: Use the "Index Data" tool from the SharePoint Toolkit to create searchable indexes
- Search and chat: Use the toolkit in conversations or agents to query your indexed SharePoint content
This provides the same SharePoint indexing capabilities as the previous Datasources system with improved performance and integration.
This guide provides a complete step-by-step walkthrough for indexing SharePoint data and then searching or chatting with the indexed content using ELITEA's AI-powered tools.
Overview¶
SharePoint indexing allows you to create searchable indexes from your SharePoint document management and collaboration content:
- Documents & Files: Word documents, PDFs, Excel spreadsheets, PowerPoint presentations, and other file types
- Document Libraries: Organized collections of documents with metadata and version control
- Lists & Custom Data: Task lists, contact lists, custom data tables, and project tracking information
- Site Collections: Multiple SharePoint sites organized under a shared management structure
- Metadata & Properties: Document properties, custom fields, and content categorization
What you can do with indexed SharePoint data:
- Semantic Search: Find documents and content across SharePoint sites using natural language queries
- Context-Aware Chat: Get AI-generated answers from your document content with citations to specific files
- Cross-Site Discovery: Search across multiple SharePoint sites and document libraries
- Document Analysis: Extract insights and summaries from business documents and reports
- Content Organization: Analyze document types, usage patterns, and content relationships
Common use cases:
- Finding specific documents, policies, or procedures across your organization's SharePoint sites
- Onboarding new employees by allowing them to ask questions about company documents and processes
- Analyzing project documents and extracting key information for reporting and decision-making
- Support teams searching for solutions and documentation from indexed knowledge bases
- Compliance and audit teams finding relevant documents based on content and metadata
Prerequisites¶
Before indexing SharePoint data, ensure you have:
- SharePoint Credential: An Azure AD app registration with authentication credentials configured in ELITEA
- Vector Storage: PgVector selected in Settings → AI Configuration
- Embedding Model: Selected in Settings → AI Configuration (defaults available)
- SharePoint Toolkit: Configured with your SharePoint site details and credentials
Required Permissions¶
Your SharePoint credential needs appropriate permissions based on what you want to index:
For Content Access:
- Read access to SharePoint sites and document libraries
- Permission to view the specific sites and libraries you want to index
For Comprehensive Indexing:
- Access to view document content and metadata
- Permission to access both public and restricted content (based on your requirements)
- Ability to read from multiple document libraries and lists
Authentication Methods:
- Azure AD App Registration: Client ID and Client Secret for application-only access
- Delegated Permissions: For user-context access (alternative approach)
Step-by-Step: Creating a SharePoint Credential¶
- Register App in Azure AD: Create an Azure AD app registration with appropriate SharePoint permissions
- Generate Client Secret: Create a client secret for secure authentication
- Grant Site Permissions: Use SharePoint's AppInv.aspx to grant site-level permissions
- Create Credential in ELITEA: Navigate to Credentials → + Create → SharePoint → enter details and save
Detailed Instructions
For complete credential setup steps including Azure AD app registration, permissions, and security best practices, see the credential guides listed under Related Documentation.
Step-by-Step: Configure SharePoint Toolkit¶
- Create Toolkit: Navigate to Toolkits → + Create → SharePoint
- Configure Settings: Set SharePoint site URL and assign your SharePoint credential
- Enable Tools: Select the Index Data, List Collections, Search Index, Stepback Search Index, Stepback Summary Index, and Remove Index tools
- Save Configuration
Tool Overview:¶
- Index Data: Creates searchable indexes from SharePoint documents and content
- List Collections: Lists all available collections/indexes to verify what's been indexed
- Search Index: Performs semantic search across indexed content using natural language queries
- Stepback Search Index: Advanced search that breaks down complex questions into simpler parts for better results
- Stepback Summary Index: Generates summaries and insights from search results across indexed content
- Remove Index: Deletes existing collections/indexes when you need to clean up or start fresh
Detailed Instructions
For complete toolkit configuration including site URL setup and authentication options, see the SharePoint Toolkit Integration Guide listed under Related Documentation.
Step-by-Step: Index SharePoint Data¶
Step 1: Open the Indexes Tab
- Navigate to Toolkits and select your SharePoint toolkit.
- Click the Indexes tab in the toolkit detail view.
- In the left sidebar, click + Create New Index. The creation form opens in the center panel.
Indexes tab disabled?
Verify PgVector and Embedding Model are configured in Settings → AI Configuration and that the Index Data tool is enabled in your toolkit configuration.
Step 2: Configure Index Parameters
- Fill in the required and optional parameters for your SharePoint indexing:

| Parameter | Required | Description | Example Value |
|---|---|---|---|
| Index Name | ✓ | Suffix for collection name (min 1, max 7 chars) | `docs` or `files` |
| Clean Index | ✗ | Remove existing index data before re-indexing (default: `false`) | ✓ (checked) or ✗ (unchecked) |
| Progress Step (0 - 100) | ✗ | Step size for progress reporting during indexing (default: `10`) | `10` or `25` |
| Chunking Config | ✗ | Configuration for document chunking | Default or custom settings |
| Limit Files | ✗ | Maximum number of files to index from the document library (default: `1000`) | `500` |
| Include Extensions | ✗ | File extensions to include. Applies to both document library files and OneNote attachments. Leave empty to include all (except skipped). Accepts `pdf` or `.pdf` form. | `["*.docx", "*.pdf", "*.xlsx"]` |
| Skip Extensions | ✗ | File extensions to skip. Applies to both document library files and OneNote attachments. Accepts `exe` or `.exe` form. | `["*.exe", "*.zip", "*.png"]` |
| Path | ✗ | Scope indexing to a specific folder. Accepts a full server-relative path (`/sites/SiteName/...`) or a relative path (searched recursively under Shared Documents and private libraries). Leave empty to index the whole site. | `/sites/MySite/Shared Documents/Reports` |
| Include Onenote | ✗ | When `true`, also indexes OneNote pages from the SharePoint site in addition to document library files. Requires Delegated (User OAuth) authentication; it will not work with App-Only credentials. (default: `false`) | `true` |
| Onenote Filter | ✗ | Optional dictionary to control which OneNote notebooks, sections, and pages are indexed, and how they are processed (e.g., whether to capture images or include file attachments). Leave empty to index all OneNote content with defaults. Only used when `include_onenote` is `true`. | See note below |
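For illustration, the parameters above can be written out as one configuration object. This is only a sketch: the snake_case key names are assumptions modelled on the identifiers this guide mentions (`index_name`, `include_extensions`, `skip_extensions`, `include_onenote`), and the form in the UI remains the authoritative reference.

```python
import json

# Hypothetical parameter set mirroring the form fields described above.
# Key names are assumed snake_case versions of the UI labels, not a confirmed schema.
index_params = {
    "index_name": "docs",      # 1 to 7 characters, used as the collection suffix
    "clean_index": False,      # default: keep existing index data
    "progress_step": 10,       # report progress in steps of 10
    "limit_files": 500,        # the documented default is 1000
    "include_extensions": ["*.docx", "*.pdf", "*.xlsx"],
    "skip_extensions": ["*.exe", "*.zip", "*.png"],
    "path": "/sites/MySite/Shared Documents/Reports",
    "include_onenote": False,  # True requires Delegated (User OAuth) authentication
}

# Print the parameters as JSON to show the overall shape.
print(json.dumps(index_params, indent=2))
```

Leaving `path` out of such an object would correspond to leaving the Path field empty, which indexes the whole site.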
onenote_filter structure
The onenote_filter parameter accepts a JSON object with the following optional keys:
- `notebooks`: list of notebook scope filters. Omit to index all notebooks. Each entry: `{"id": "<notebook-id>", "sections": [{"id": "<section-id>", "pages": ["<page-id>", ...]}]}`. Omit `sections` to include all sections; omit `pages` to include all pages in a section.
- `capture_images`: boolean, default `true`. When `true` and an LLM is configured, embedded images are described.
- `include_attachments`: boolean, default `false`. When `true`, file attachments on OneNote pages are also indexed.
Extension filtering for attachments reuses the top-level include_extensions and skip_extensions parameters.
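A filter that combines the three keys might look like the sketch below. The IDs are placeholders, not real values; actual notebook, section, and page IDs come from your own OneNote content.

```python
import json

# Placeholder IDs only; substitute the IDs of your own notebooks, sections, and pages.
onenote_filter = {
    "notebooks": [
        {
            "id": "<notebook-id>",
            "sections": [
                {"id": "<section-id>", "pages": ["<page-id>"]},  # only the listed page
                {"id": "<other-section-id>"},  # no "pages" key: all pages in this section
            ],
        },
        {"id": "<second-notebook-id>"},  # no "sections" key: all sections in this notebook
    ],
    "capture_images": True,        # documented default
    "include_attachments": False,  # documented default; True also indexes page attachments
}

# The parameter is described as a JSON object, so serialize the dict to JSON.
print(json.dumps(onenote_filter, indent=2))
```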
Step 3: Start Indexing
- Click Index to start the process.
- Monitor real-time status via the indicator on the index list item and detail header.
Stopping an indexing run
If indexing is in progress and the backend task ID is available, a Stop button appears in place of the Index button. Click Stop to cancel the current run. The index retains any documents successfully indexed before the stop.
Managing Existing Indexes¶
Once an index exists, selecting it in the sidebar reveals a three-tab detail view:
| Tab | Description |
|---|---|
| Run | Run search tools against the index: Search Index, Stepback Search Index, or Stepback Summary Index. Select the tool from the dropdown, enter a query, and click Run. Only tools enabled in the toolkit configuration appear here. |
| Configuration | View the parameters used when the index was created. This tab is read-only. |
| History | Audit log of all indexing events (Created, Reindexed, Stopped, Failed, Partially Indexed) with timestamps. Selecting a history entry lets you view the chat results from that specific run. |
Reindexing¶
- To refresh an existing index with updated SharePoint content, click the Reindex button (visible in the Run tab header of an existing index). Reindexing uses the same configuration as the original index. The history entry for the new run is added to the History tab.
Scheduling Automatic Reindexing¶
The Schedule toggle (visible when an existing index is selected) lets you enable cron-based automatic reindexing:
- Click the gear icon next to the Schedule toggle to open the Schedule Settings dialog.
- Choose "Default" for a picker UI or "Advanced" to enter a cron expression directly.
- Optionally select specific credentials to use for scheduled runs.
- Click Apply to save the schedule, then enable the Schedule toggle.
The default cron value is 0 0 * * 6 (every Saturday at midnight). Scheduling requires appropriate project permissions.
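To see what the default expression means in practice, the following sketch computes the next matching time for a five-field cron expression. It is a simplified reader written for this guide, not the scheduler ELITEA uses: the helper names are hypothetical, and each field may only be `*` or a single integer.

```python
from datetime import datetime, timedelta

def matches(field: str, value: int) -> bool:
    """Match one cron field; this sketch supports only '*' or a single integer."""
    return field == "*" or int(field) == value

def next_run(expr: str, after: datetime) -> datetime:
    """Return the first whole minute strictly after `after` that matches `expr`."""
    minute, hour, dom, month, dow = expr.split()
    t = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    # Scan minute by minute, giving up after one year.
    for _ in range(366 * 24 * 60):
        cron_dow = (t.weekday() + 1) % 7  # cron numbering: 0 = Sunday ... 6 = Saturday
        if (matches(minute, t.minute) and matches(hour, t.hour)
                and matches(dom, t.day) and matches(month, t.month)
                and matches(dow, cron_dow)):
            return t
        t += timedelta(minutes=1)
    raise ValueError(f"no match found for {expr!r}")

# Starting from Wednesday 2024-10-16 at noon, find the next default run.
print(next_run("0 0 * * 6", datetime(2024, 10, 16, 12, 0)))
```

The fields read left to right as minute, hour, day of month, month, and day of week, so `0 0 * * 6` fires at 00:00 on day-of-week 6, which is Saturday.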
Detailed Instructions
For a complete walkthrough of the scheduling feature, see Schedule Indexing.
Deleting an Index¶
- Click the Delete button on an existing index to permanently remove all indexed data for that collection. The Remove Index tool must be enabled in the toolkit for this button to be active.
Search and Chat with Indexed Data¶
Once your SharePoint data is indexed, you can use it in multiple ways:
Using the Indexes Interface¶
Direct Search via Indexes Tab:
- Access Indexes Tab: Navigate to your SharePoint toolkit → Indexes tab
- Select Index: Click on your created index from the sidebar
- Open Run Tab: Click the Run tab in the center panel
- Choose Search Tool: Select from available search tools:
- Search Index: Basic semantic search
- Stepback Search Index: Advanced search with question breakdown
- Stepback Summary Index: Summarized insights from search results
- Enter Query: Type your natural language question
- View Results: See responses with citations to specific SharePoint documents
Using Toolkit in Conversations and Agents¶
Your SharePoint toolkit can be used in two main contexts:
- In Conversations: Add the toolkit as a participant to ask questions and search your indexed SharePoint data
- In Agents: Include the toolkit when creating AI agents to give them access to your document data
How to use:
- Start a New Conversation or Create an Agent
- Add Toolkit as Participant: Select your SharePoint toolkit from the available toolkits
- Ask Natural Language Questions: The toolkit will automatically search your indexed data and provide relevant answers with citations
Real-Life Example Workflow¶
The following examples show a typical end-to-end flow: indexing documents, verifying the index, and querying the content — all through a conversation with the SharePoint toolkit added.
Index the Document Library
User:
"Index all documents from our SharePoint site with suffix 'docs'. Include Word and PDF files but skip image files. Clean any existing index first."
Verify Index Creation
User:
"List my collections"
This confirms the indexing completed successfully and the collection is available for searching.
Search Indexed Content
User: "What are the key points from our Q3 financial report?"
SharePoint Toolkit: "Based on your indexed documents, here are the key points from the Q3 financial report:
- Revenue Growth: 15% increase compared to Q2, reaching $2.3M total revenue
- Cost Management: Operating expenses reduced by 8% through efficiency initiatives
- Market Expansion: Successful launch in two new regional markets contributing 12% of revenue
Source: Q3_Financial_Report.pdf (Created: 2024-10-15, Modified: 2024-10-20)"
User: "Find me the latest project management templates"
SharePoint Toolkit: "Here are the latest project management templates from your indexed documents:
Templates Found:
- Project Charter Template v3.2: Comprehensive project initiation document with stakeholder analysis
- Sprint Planning Template: Agile project management with backlog and capacity planning
- Risk Management Matrix: Template for identifying and tracking project risks
Template Features:
- Pre-formatted sections for scope, objectives, and deliverables
- Built-in formulas for timeline and resource calculations
- Integration with company branding and approval workflows
Sources: Project_Charter_Template_v3.2.docx, Sprint_Planning_Template.xlsx, Risk_Management_Matrix.xlsx"
Troubleshooting¶
Common Errors and Solutions
Indexes Tab Not Visible or Tab Disabled
- Verify PgVector and Embedding Model are configured in Settings → AI Configuration
- Ensure the Index Data tool is enabled in your SharePoint toolkit configuration
- Check that your toolkit supports indexing (SharePoint is supported)
- Refresh the browser page and retry
+ Create New Index Button Not Working
- Verify all project-level prerequisites are met (PgVector and Embedding Model)
- Check that you have proper permissions for the toolkit
- Ensure the toolkit is properly saved with credentials
Authentication Failed or Unauthorized Access
The toolkit supports two authentication modes — ensure you are using the correct one:
App-Only (Client Credentials): client_id + client_secret
- Verify the Client ID and Client Secret are correct and not expired
- Ensure your Azure AD app registration has `Sites.Read.All` (or `Sites.FullControl.All`) application permissions and admin consent was granted
- Grant the app access to the specific site collection via `/_layouts/15/AppInv.aspx`
- Verify the SharePoint site URL includes `https://` and the full site path
Delegated (User OAuth): token + scopes
- Ensure the OAuth token is valid and has not expired — click Log in again to refresh it
- Verify all required scopes are listed in the credential (e.g., `https://graph.microsoft.com/Sites.ReadWrite.All`)
- Ensure your Azure AD app has Delegated permissions (not Application) for the required Graph API scopes
Note
Providing only a token without scopes routes to the REST backend (legacy). All Graph API and OneNote features require token + scopes.
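The routing described in this section can be summarized as a small decision function. This is a sketch under stated assumptions: the field names `client_id`, `client_secret`, `token`, and `scopes` follow the labels used above, while the function names, the mode labels, and the precedence when both sets of fields are present are hypothetical.

```python
def auth_mode(credential: dict) -> str:
    """Classify a credential by which fields are filled in (illustrative labels)."""
    if credential.get("client_id") and credential.get("client_secret"):
        return "app-only"          # Client Credentials
    if credential.get("token") and credential.get("scopes"):
        return "delegated-graph"   # User OAuth with Graph API access
    if credential.get("token"):
        return "rest-legacy"       # token without scopes uses the legacy REST backend
    return "incomplete"

def supports_onenote(credential: dict) -> bool:
    """Graph API and OneNote features need token + scopes, per the note above."""
    return auth_mode(credential) == "delegated-graph"

delegated = {
    "token": "<oauth-token>",
    "scopes": ["https://graph.microsoft.com/Sites.ReadWrite.All"],
}
print(auth_mode(delegated), supports_onenote(delegated))
print(auth_mode({"token": "<oauth-token>"}))
```

A check of this kind makes it easier to spot a credential that silently falls back to the legacy backend because its scopes list was left empty.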
Site Not Found or Access Denied to Site
- Verify the SharePoint site URL is correct and accessible
- Ensure your Azure AD app has been granted permissions to the specific site collection using AppInv.aspx
- Check that the site collection exists and is not archived or deleted
- Confirm your app registration has the necessary SharePoint API permissions
No Files Indexed or Empty Document Library
- Check that the document library contains accessible files
- Verify file extensions are not being filtered out by the Skip Extensions parameter
- Ensure your app has read permissions to the document library
- Try indexing without extension filters first, then add restrictions
- Check the Limit Files parameter: a value of `0` disables file collection entirely; use the default (1000) or a positive number
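When a run indexes nothing, it can help to reason through how the filters interact. The sketch below shows one plausible way extension filters and the file limit combine; it is not the SDK's actual logic. The function names are hypothetical, and the assumption that a skip entry wins over an include entry is mine.

```python
from pathlib import PurePosixPath

def normalize(ext: str) -> str:
    """Reduce 'pdf', '.pdf', or '*.pdf' to the bare lowercase form 'pdf'."""
    return ext.lower().lstrip("*").lstrip(".")

def is_selected(filename, include=(), skip=()):
    """Apply skip first (assumed precedence), then include; empty include means all."""
    ext = normalize(PurePosixPath(filename).suffix)
    if ext in {normalize(e) for e in skip}:
        return False
    include_set = {normalize(e) for e in include}
    return not include_set or ext in include_set

def select_files(filenames, include=(), skip=(), limit_files=1000):
    """Filter by extension, then cap the result at limit_files."""
    selected = [f for f in filenames if is_selected(f, include, skip)]
    return selected[:limit_files]  # a limit of 0 leaves nothing to index

library = ["Report.PDF", "budget.xlsx", "logo.png", "setup.exe"]
print(select_files(library, include=["pdf", ".xlsx"], skip=["*.png"]))
print(select_files(library, limit_files=0))
```

In this model an include list that omits the library's main file type, or a limit of `0`, both produce an empty selection, which matches the symptoms listed above.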
Vector Database Connection Failed or PgVector Errors
- Ensure PgVector is properly configured in Settings → AI Configuration
- Verify the vector database is running and accessible
- Check connection credentials and database permissions
- Restart the vector database service if connection issues persist
Index Name Validation Error (Exceeds 7 Characters)
Cause: The index_name field has a hard limit of 7 characters enforced by the SDK. Providing a longer value will cause a Pydantic validation error before indexing starts.
- Keep the Index Name to 7 characters or fewer (e.g., `docs`, `sp2024`, `hr`)
- Leave the field blank to use the default collection name without a suffix
- Use short abbreviations for descriptive names (e.g., `fin` instead of `finance`)
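A pre-flight check can catch an overlong name before a run is submitted. The sketch below uses plain Python rather than Pydantic and enforces only the 7-character maximum stated in this guide; the function and constant names are hypothetical and this is not the SDK's validator.

```python
MAX_INDEX_NAME_LENGTH = 7  # limit stated in this guide

def validate_index_name(name: str) -> str:
    """Return the name unchanged if it fits the limit, otherwise raise ValueError."""
    if len(name) > MAX_INDEX_NAME_LENGTH:
        raise ValueError(
            f"index name {name!r} has {len(name)} characters; "
            f"the limit is {MAX_INDEX_NAME_LENGTH}"
        )
    return name

# Try a few candidate names and report which ones pass.
for candidate in ("docs", "sp2024", "hr", "quarterly-reports"):
    try:
        print(validate_index_name(candidate), "ok")
    except ValueError as err:
        print("rejected:", err)
```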
File Processing Errors or Document Parsing Failures
Individual file-level parse errors are non-fatal — the SDK logs them as warnings and continues indexing remaining files. The final result will report a partial count if some files failed.
- Large files may cause timeouts; consider using file size limits or Skip Extensions
- Binary files (executables, archives) should be excluded via Skip Extensions
- Check available storage space for the vector database
- Verify document formats are supported (Word, PDF, Excel, PowerPoint, text files)
- Review application logs for specific file paths that failed to parse
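The non-fatal behaviour described above follows a common pattern: catch the error per file, log a warning, and keep going. The sketch below illustrates that pattern in general terms; `index_files` and `fake_parse` are hypothetical names, and the result shape is an assumption rather than the SDK's actual report format.

```python
import logging

logger = logging.getLogger("indexing_sketch")

def index_files(paths, parse):
    """Parse each file, logging failures as warnings instead of aborting the run."""
    indexed, failed = [], []
    for path in paths:
        try:
            parse(path)
            indexed.append(path)
        except Exception as err:  # one bad file should not stop the others
            logger.warning("skipping %s: %s", path, err)
            failed.append(path)
    return {"indexed": len(indexed), "failed": failed, "partial": bool(failed)}

def fake_parse(path):
    """Stand-in parser for the demo: rejects anything that is not a .pdf file."""
    if not path.endswith(".pdf"):
        raise ValueError("unsupported format")

result = index_files(["a.pdf", "b.bin", "c.pdf"], fake_parse)
print(result)
```

The list of failed paths is the part worth keeping, because it tells you which extensions to add to Skip Extensions on the next run.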
OneNote Indexing Fails with 401 Unauthorized (include_onenote: true enabled)
Cause: The Delegated OAuth token does not include the Notes.ReadWrite.All permission — either it was never added, admin consent was not granted, or the token was issued before the scope was added.
- In the Azure Portal, go to Microsoft Entra ID → App registrations → [your app] → API permissions. Click "+ Add a permission" → Microsoft Graph → Delegated permissions, search for `Notes.ReadWrite.All`, and add it.
- Click "Grant admin consent for [Your Organization]" on the API permissions page. The `Notes.ReadWrite.All` entry must show a green checkmark.
- Update the Scopes field in your ELITEA SharePoint credential using full URL format, for example `https://graph.microsoft.com/Notes.ReadWrite.All`.
- Click Log in again in your ELITEA credential to obtain a fresh token that includes the Notes scope.
Note
include_onenote requires Delegated (User OAuth) authentication — it will not work with App-Only (Client Credentials).
Performance and Scope Considerations
Large SharePoint Sites
- Use file type filters: `Include Extensions: ["*.docx", "*.pdf", "*.xlsx"]`
- Set reasonable file limits: start with 500-1000 files for testing
- Consider indexing by document library: create separate indexes for different libraries
- Index by content type: separate indexes for documents vs. lists vs. archived content
Search Result Quality
Search Returns Few or No Results
- Lower the cut-off score from 0.5 to 0.35 or 0.3
- Increase search_top from 10 to 20 or 30
- Try rephrasing your query with document-specific terms (file names, content types)
- Verify the indexed content contains relevant information for your query
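The effect of the two tuning knobs can be illustrated with a toy ranking function. This is a sketch, not the search tool's implementation: it assumes scores are similarities where higher means more relevant, the name `cut_off` is an assumed spelling of the cut-off score setting, and the sample scores are invented for the demonstration.

```python
def select_results(scored_docs, cut_off=0.5, search_top=10):
    """Keep results scoring at least cut_off, best first, capped at search_top."""
    kept = [(doc, score) for doc, score in scored_docs if score >= cut_off]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:search_top]

# Invented scores for illustration only.
hits = [
    ("policy.pdf", 0.62),
    ("handbook.docx", 0.41),
    ("faq.docx", 0.33),
    ("menu.pdf", 0.12),
]

print(select_results(hits))               # default cut-off of 0.5
print(select_results(hits, cut_off=0.3))  # lowered cut-off admits weaker matches
```

Lowering the cut-off widens the pool of candidates, while raising `search_top` only helps when more than that many candidates already pass the cut-off.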
Improving Search Quality
- Include multiple document types for comprehensive coverage
- Use natural language queries rather than exact file names
- Leverage stepback search for complex business questions that require reasoning
- Create separate indexes for different content types (current vs archived, public vs restricted)
Content-Specific Indexing Tips
Business Documents
- Focus on current documents: exclude outdated templates and archived files
- Include metadata-rich content: documents with proper titles, descriptions, and tags
- Index both working documents and finalized reports for complete coverage
Project Management
- Include project templates, status reports, and planning documents
- Index across multiple project sites for portfolio-level insights
- Consider including both active and completed projects for lessons learned
Knowledge Management
- Include policy documents, procedures, and training materials
- Index FAQ documents and troubleshooting guides for support scenarios
- Focus on documents with high business value and frequent access patterns
Related Documentation
For additional information and detailed setup instructions, see:
- Indexing Overview - General indexing concepts and features
- Create a Credential - Step-by-step credential creation guide
- How to Use Credentials - Credential management and SharePoint setup
- Toolkits Menu - Toolkit configuration and management
- SharePoint Toolkit Integration Guide - Complete SharePoint toolkit reference
- AI Configuration - Vector storage and embedding model setup
- Chat Menu - Creating conversations and adding toolkit participants










