Add source code indexing and search with comprehensive documentation

Features:
- Implemented SourceCodeIndexer class for indexing TypeScript/JavaScript source files
  - Chunks large files into 200-line segments with 20-line overlap
  - Extracts imports, exports, and metadata
  - Generates semantic embeddings using Xenova/all-MiniLM-L6-v2
  - Creates GitHub URLs with line numbers for easy navigation
- Enhanced LanceDBSearch with source code search capabilities
  - Added searchSourceCode() method for semantic source code search
  - Added getSourceFile() method for retrieving specific files or line ranges
  - Supports package filtering and configurable table names
  - Fixed score calculation to ensure values between 0-100%
- Added two new MCP tools
  - search_babylon_source: Search Babylon.js source code with semantic search
  - get_babylon_source: Retrieve full source files or specific line ranges
  - Both tools include comprehensive error handling and JSON responses
- Created indexing and testing scripts
  - scripts/index-source.ts: Production script for indexing all packages
  - scripts/test-source-indexing.ts: Test script for core package only
  - scripts/test-source-search.ts: Test script for search functionality
- Updated package.json with comprehensive indexing commands
  - npm run index:docs - Index documentation only
  - npm run index:api - Index API documentation only
  - npm run index:source - Index source code only
  - npm run index:all - Master script to index everything
- Created comprehensive README.md
  - Complete setup and installation instructions
  - Claude Desktop integration guide with configuration examples
  - Documentation of all 5 MCP tools with parameters and examples
  - Project structure, development commands, and troubleshooting guide
  - Architecture overview and disk space requirements

Testing:
- All 118 tests passing
- TypeScript compilation successful
- Source code search verified with real queries
- Successfully indexed 1,561 files into 5,650 searchable chunks

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
parent 5459fe9179
commit 779fa53363
README.md (Normal file, 322 lines)
@@ -0,0 +1,322 @@
# Babylon MCP Server

A Model Context Protocol (MCP) server that provides AI agents with access to Babylon.js documentation, API references, and source code through semantic search.

## Overview

The Babylon MCP server enables AI assistants to:
- Search and retrieve Babylon.js documentation
- Query API documentation for classes, methods, and properties
- Search through Babylon.js source code
- Retrieve specific source code files or line ranges

This provides a canonical source for Babylon.js framework information, reducing token usage and improving accuracy when working with AI agents.

## Features

- **Documentation Search**: Semantic search across Babylon.js documentation
- **API Documentation**: Search TypeScript API documentation with full TSDoc details
- **Source Code Search**: Vector-based semantic search through Babylon.js source code
- **Source Code Retrieval**: Fetch specific files or line ranges from the repository
- **Local Repository Management**: Automatically clones and updates Babylon.js repositories

## Prerequisites

- Node.js 18 or higher
- npm or yarn
- ~2GB disk space for repositories and vector database

## Installation

1. Clone this repository:
```bash
git clone <repository-url>
cd babylon-mcp
```

2. Install dependencies:
```bash
npm install
```

3. Build the project:
```bash
npm run build
```

## Initial Setup

Before using the MCP server, you need to index the Babylon.js repositories. This is a one-time setup process.

### Index All Data (Recommended)

Run the master indexing script to index documentation, API, and source code:

```bash
npm run index:all
```

This will:
1. Clone the required repositories (Documentation, Babylon.js, havok)
2. Index all documentation files (~5-10 minutes)
3. Index API documentation from TypeScript source (~10-15 minutes)
4. Index source code from core packages (~15-20 minutes)

Total indexing time: **30-45 minutes** depending on your system.

### Index Individual Components

You can also index components separately:

```bash
# Index documentation only
npm run index:docs

# Index API documentation only
npm run index:api

# Index source code only
npm run index:source
```

## Running the Server

### Development Mode

Run the server with hot reload:

```bash
npm run dev
```

### Production Mode

```bash
npm start
```

The server runs on **port 4000** by default.

## Integration with Claude Desktop

To use this MCP server with Claude Desktop, add it to your Claude configuration file.

### Configuration File Location

- **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
- **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`

### Configuration

Add the following to your `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "babylon-mcp": {
      "command": "node",
      "args": [
        "/absolute/path/to/babylon-mcp/dist/mcp/index.js"
      ],
      "env": {}
    }
  }
}
```

Replace `/absolute/path/to/babylon-mcp` with the actual path to your babylon-mcp directory.

### Restart Claude Desktop

After updating the configuration, restart Claude Desktop for the changes to take effect.

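If you prefer to generate the configuration entry rather than type the absolute path by hand, a small script can resolve it for you. The following is a minimal sketch, not part of this repository: the `buildClaudeConfig` helper is hypothetical, and it assumes the compiled entry point lives at `dist/mcp/index.js` as shown in the configuration above.

```typescript
import * as path from "node:path";

// Shape of one server entry in claude_desktop_config.json, as shown in the README.
interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

// Hypothetical helper: builds the "mcpServers" block with an absolute entry-point path.
function buildClaudeConfig(projectDir: string): { mcpServers: Record<string, McpServerEntry> } {
  // path.resolve always returns an absolute path, relative to the current working directory.
  const entryPoint = path.resolve(projectDir, "dist", "mcp", "index.js");
  return {
    mcpServers: {
      "babylon-mcp": { command: "node", args: [entryPoint], env: {} },
    },
  };
}

// Run from the babylon-mcp directory and paste the output into the config file.
console.log(JSON.stringify(buildClaudeConfig("."), null, 2));
```

`JSON.stringify` escapes backslashes, so the printed path should also be valid JSON on Windows.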
## Available MCP Tools

Once configured, Claude will have access to these tools:

### 1. search_babylon_docs
Search Babylon.js documentation with semantic search.

**Parameters:**
- `query` (string, required): Search query
- `category` (string, optional): Filter by category (e.g., "api", "tutorial")
- `limit` (number, optional): Maximum results (default: 5)

**Example:**
```
Search for "how to create a mesh" in Babylon.js documentation
```

### 2. get_babylon_doc
Retrieve full content of a specific documentation page.

**Parameters:**
- `path` (string, required): Documentation file path or identifier

**Example:**
```
Get the full documentation for "features/featuresDeepDive/mesh/creation"
```

### 3. search_babylon_api
Search Babylon.js API documentation (classes, methods, properties).

**Parameters:**
- `query` (string, required): API search query (e.g., "getMeshByName", "Scene")
- `limit` (number, optional): Maximum results (default: 5)

**Example:**
```
Search the API for "getMeshByName"
```

### 4. search_babylon_source
Search Babylon.js source code using semantic search.

**Parameters:**
- `query` (string, required): Search query for source code
- `package` (string, optional): Filter by package (e.g., "core", "gui")
- `limit` (number, optional): Maximum results (default: 5)

**Example:**
```
Search the source code for "mesh rendering implementation"
```

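For clients that call the tool programmatically instead of through a chat prompt, the request could look roughly like the sketch below. It assumes the standard MCP JSON-RPC `tools/call` method and the parameter names listed for `search_babylon_source`; the `buildSearchSourceRequest` helper and its empty-query check are hypothetical and not part of this server.

```typescript
// Arguments accepted by search_babylon_source, per the parameter list in this README.
interface SearchSourceArgs {
  query: string;
  package?: string; // optional package filter, e.g. "core"
  limit?: number;   // the server defaults to 5 when omitted
}

// JSON-RPC 2.0 envelope for an MCP tool call.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: SearchSourceArgs };
}

// Hypothetical helper: wraps the arguments in a tools/call request.
function buildSearchSourceRequest(id: number, args: SearchSourceArgs): ToolCallRequest {
  if (args.query.trim() === "") {
    throw new Error("query must not be empty");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "search_babylon_source", arguments: args },
  };
}

const request = buildSearchSourceRequest(1, {
  query: "mesh rendering implementation",
  package: "core",
  limit: 3,
});
console.log(JSON.stringify(request, null, 2));
```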
### 5. get_babylon_source
Retrieve a full source code file or a specific line range.

**Parameters:**
- `filePath` (string, required): Relative path from repository root
- `startLine` (number, optional): Start line number (1-indexed)
- `endLine` (number, optional): End line number (1-indexed)

**Example:**
```
Get the source code from "packages/dev/core/src/scene.ts" lines 4100-4110
```

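Because `startLine` and `endLine` are 1-indexed, it helps to see how an inclusive range maps onto a zero-based array slice. The sketch below is illustrative only: `sliceLines` is a hypothetical helper, and its handling of a single bound (only `startLine` or only `endLine`) is an assumption about reasonable behavior, not a description of what the server does.

```typescript
// Hypothetical helper: returns lines startLine..endLine (1-indexed, inclusive).
function sliceLines(content: string, startLine?: number, endLine?: number): string {
  const lines = content.split("\n");
  // Clamp the requested range to the lines that actually exist.
  const first = Math.max(1, startLine ?? 1);
  const last = Math.min(lines.length, endLine ?? lines.length);
  if (first > last) {
    return "";
  }
  // slice() is zero-based with an exclusive end, so shift only the start index.
  return lines.slice(first - 1, last).join("\n");
}

const file = ["line 1", "line 2", "line 3", "line 4", "line 5"].join("\n");
console.log(sliceLines(file, 2, 4)); // lines 2 through 4
console.log(sliceLines(file, 4));    // line 4 to the end of the file
```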
## Project Structure

```
babylon-mcp/
├── src/
│   ├── mcp/                       # MCP server implementation
│   │   ├── index.ts               # Server entry point
│   │   ├── server.ts              # BabylonMCPServer class
│   │   ├── handlers.ts            # MCP tool handlers
│   │   └── ...
│   └── search/                    # Search and indexing
│       ├── lancedb-search.ts      # Search implementation
│       ├── lancedb-indexer.ts     # Documentation indexer
│       ├── api-indexer.ts         # API indexer
│       ├── source-code-indexer.ts # Source code indexer
│       └── ...
├── scripts/                       # Indexing scripts
│   ├── index-docs.ts              # Index documentation
│   ├── index-api.ts               # Index API docs
│   └── index-source.ts            # Index source code
├── data/                          # Data directory (created during indexing)
│   ├── repositories/              # Cloned repositories
│   └── lancedb/                   # Vector database
└── dist/                          # Compiled output
```

## Development

### Running Tests

```bash
# Run tests in watch mode
npm test

# Run tests once
npm run test:run

# Run tests with UI
npm run test:ui

# Run tests with coverage
npm run test:coverage
```

### Type Checking

```bash
npm run typecheck
```

### Building

```bash
npm run build
```

## Data Storage

The server stores data in the `./data` directory:

- **`./data/repositories/`**: Cloned Git repositories (Documentation, Babylon.js, havok)
- **`./data/lancedb/`**: Vector database containing indexed content

This directory will be approximately **1.5-2GB** after full indexing.

## Updating Data

To update the indexed data with the latest Babylon.js content:

1. The repositories are automatically updated during indexing
2. Re-run the indexing scripts:

```bash
npm run index:all
```

## Troubleshooting

### Server won't start
- Ensure port 4000 is available
- Check that the project has been built: `npm run build`
- Verify Node.js version is 18 or higher

### Indexing fails
- Ensure you have internet connectivity (for cloning repositories)
- Check disk space (~2GB required)
- Try indexing components individually to isolate the issue

### Claude Desktop doesn't see the tools
- Verify the path in `claude_desktop_config.json` is absolute
- Restart Claude Desktop after configuration changes
- Check that the server builds without errors: `npm run build`

### Search returns no results
- Ensure indexing has completed successfully
- Check that the `./data/lancedb` directory exists and contains data
- Try re-indexing: `npm run index:all`

## Architecture

The server uses:
- **LanceDB**: Vector database for semantic search
- **Xenova/all-MiniLM-L6-v2**: Transformer model for embeddings
- **TypeDoc**: For extracting TypeScript API documentation
- **Express.js**: Web server framework
- **MCP SDK**: Model Context Protocol implementation

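Semantic search works by comparing the embedding of a query against the embeddings of indexed chunks. LanceDB performs that nearest-neighbour comparison internally, and this README does not state which distance metric it is configured with, so the following is only a conceptual sketch of one common measure, cosine similarity. The `cosineSimilarity` function and the toy three-dimensional vectors are illustrative assumptions; real all-MiniLM-L6-v2 embeddings have 384 dimensions.

```typescript
// Illustrative only: cosine similarity between two equal-length vectors.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  // A zero vector has no direction, so treat it as unrelated to everything.
  if (normA === 0 || normB === 0) {
    return 0;
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors standing in for a query embedding and two chunk embeddings.
const queryVec = [0.9, 0.1, 0.0];
const chunkAbout = [0.8, 0.2, 0.1];
const chunkUnrelated = [0.0, 0.1, 0.9];
console.log(cosineSimilarity(queryVec, chunkAbout) > cosineSimilarity(queryVec, chunkUnrelated));
```

Vectors pointing in a similar direction score close to 1, while unrelated ones score close to 0, which is the intuition behind ranking chunks by relevance.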
## Contributing

Contributions are welcome! Please ensure:
- All tests pass: `npm run test:run` (`npm test` starts watch mode)
- Type checking passes: `npm run typecheck`
- Code follows the project style

## License

ISC

## Resources

- [Babylon.js Documentation](https://doc.babylonjs.com/)
- [Babylon.js Repository](https://github.com/BabylonJS/Babylon.js)
- [Model Context Protocol](https://modelcontextprotocol.io/)
- [Claude Desktop](https://claude.ai/download)
@@ -13,8 +13,13 @@
     "test:ui": "vitest --ui",
     "test:run": "vitest run",
     "test:coverage": "vitest run --coverage",
-    "index-docs": "tsx scripts/index-docs.ts",
-    "index-api": "tsx scripts/index-api.ts"
+    "index:docs": "tsx scripts/index-docs.ts",
+    "index:api": "tsx scripts/index-api.ts",
+    "index:source": "tsx scripts/index-source.ts",
+    "index:all": "npm run index:docs && npm run index:api && npm run index:source",
+    "index-docs": "npm run index:docs",
+    "index-api": "npm run index:api",
+    "index-source": "npm run index:source"
   },
   "keywords": [],
   "author": "",
scripts/index-source.ts (Normal file, 39 lines)
@@ -0,0 +1,39 @@
import { SourceCodeIndexer } from '../src/search/source-code-indexer.js';

async function main() {
  // Define packages to index
  const packages = [
    'core',
    'gui',
    'materials',
    'loaders',
    'serializers',
  ];

  console.log('Starting source code indexing for Babylon.js packages...');
  console.log(`Indexing ${packages.length} packages:`, packages.join(', '));
  console.log();

  const indexer = new SourceCodeIndexer(
    './data/lancedb',
    'babylon_source_code',
    './data/repositories/Babylon.js',
    200, // chunk size (lines)
    20 // chunk overlap (lines)
  );

  try {
    await indexer.initialize();
    await indexer.indexSourceCode(packages);
    await indexer.close();
    console.log('\n✓ Source code indexing completed successfully!');
  } catch (error) {
    console.error('Error during source code indexing:', error);
    if (error instanceof Error) {
      console.error('Stack trace:', error.stack);
    }
    process.exit(1);
  }
}

main().catch(console.error);
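The script above passes a chunk size of 200 lines and an overlap of 20 lines to the indexer. The chunking code itself is not visible in this part of the diff, so the following is a minimal sketch of how overlapping line windows can be produced under those two parameters. The `chunkLines` function and the `LineChunk` type are hypothetical names, not the indexer's actual implementation.

```typescript
// One window of source lines; line numbers are 1-indexed and inclusive.
interface LineChunk {
  startLine: number;
  endLine: number;
  content: string;
}

// Hypothetical sketch: split lines into windows of `chunkSize` lines, where
// each window starts `chunkSize - overlap` lines after the previous one.
function chunkLines(lines: string[], chunkSize = 200, overlap = 20): LineChunk[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const step = chunkSize - overlap;
  const chunks: LineChunk[] = [];
  for (let start = 0; start < lines.length; start += step) {
    const end = Math.min(start + chunkSize, lines.length);
    chunks.push({
      startLine: start + 1,
      endLine: end,
      content: lines.slice(start, end).join("\n"),
    });
    // Stop once a window reaches the end, to avoid a trailing all-overlap chunk.
    if (end === lines.length) {
      break;
    }
  }
  return chunks;
}

const demoLines = Array.from({ length: 450 }, (_, i) => `line ${i + 1}`);
for (const chunk of chunkLines(demoLines)) {
  console.log(`${chunk.startLine}-${chunk.endLine}`);
}
```

With these defaults a 450-line file yields windows 1-200, 181-380, and 361-450, so any 20-line region near a boundary appears in two chunks and stays searchable in context.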
scripts/test-source-indexing.ts (Normal file, 32 lines)
@@ -0,0 +1,32 @@
import { SourceCodeIndexer } from '../src/search/source-code-indexer.js';

async function main() {
  // Start with just core package for testing
  const packages = ['core'];

  console.log('Testing source code indexing with core package...');
  console.log();

  const indexer = new SourceCodeIndexer(
    './data/lancedb',
    'babylon_source_test',
    './data/repositories/Babylon.js',
    100, // smaller chunk size for testing
    10 // smaller overlap for testing
  );

  try {
    await indexer.initialize();
    await indexer.indexSourceCode(packages);
    await indexer.close();
    console.log('\n✓ Test source code indexing completed successfully!');
  } catch (error) {
    console.error('Error during test indexing:', error);
    if (error instanceof Error) {
      console.error('Stack trace:', error.stack);
    }
    process.exit(1);
  }
}

main().catch(console.error);
scripts/test-source-search.ts (Normal file, 71 lines)
@@ -0,0 +1,71 @@
import { LanceDBSearch } from '../src/search/lancedb-search.js';

async function main() {
  console.log('Testing source code search...\n');

  // Note: We use babylon_docs as the main table, but specify babylon_source_test for source code search
  const search = new LanceDBSearch('./data/lancedb', 'babylon_docs');
  await search.initialize();

  try {
    // Test 1: Search for getMeshByName implementation
    console.log('='.repeat(80));
    console.log('Test 1: Searching for "getMeshByName implementation"');
    console.log('='.repeat(80));
    const results1 = await search.searchSourceCode('getMeshByName implementation', {
      limit: 3,
      tableName: 'babylon_source_test'
    });
    console.log(`Found ${results1.length} results:\n`);

    for (const result of results1) {
      console.log(`File: ${result.filePath}`);
      console.log(`Lines: ${result.startLine}-${result.endLine}`);
      console.log(`Score: ${(result.score * 100).toFixed(1)}%`);
      console.log(`Preview: ${result.content.substring(0, 200)}...`);
      console.log(`URL: ${result.url}`);
      console.log('-'.repeat(80));
    }

    // Test 2: Get specific source file
    console.log('\n');
    console.log('='.repeat(80));
    console.log('Test 2: Getting source file scene.ts lines 4100-4110');
    console.log('='.repeat(80));
    const sourceCode = await search.getSourceFile('packages/dev/core/src/scene.ts', 4100, 4110);
    if (sourceCode) {
      console.log(sourceCode);
    } else {
      console.log('File not found');
    }

    // Test 3: Search for mesh management
    console.log('\n');
    console.log('='.repeat(80));
    console.log('Test 3: Searching for "mesh management scene"');
    console.log('='.repeat(80));
    const results3 = await search.searchSourceCode('mesh management scene', {
      limit: 2,
      tableName: 'babylon_source_test'
    });
    console.log(`Found ${results3.length} results:\n`);

    for (const result of results3) {
      console.log(`File: ${result.filePath}`);
      console.log(`Lines: ${result.startLine}-${result.endLine}`);
      console.log(`Exports: ${result.exports}`);
      console.log(`Score: ${(result.score * 100).toFixed(1)}%`);
      console.log('-'.repeat(80));
    }

  } catch (error) {
    console.error('Error during search:', error);
    if (error instanceof Error) {
      console.error('Stack:', error.stack);
    }
  }

  await search.close();
}

main().catch(console.error);
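The test script prints a `url` for each result, and the commit message says the indexer creates GitHub URLs with line numbers. The URL-building code is not shown in this part of the diff, so here is a minimal sketch of how such a link could be assembled. The `buildGitHubUrl` helper is hypothetical, and the `master` ref is an assumption; GitHub's `#L<start>-L<end>` fragment is the standard way to highlight a line range.

```typescript
// Hypothetical helper: builds a GitHub blob URL that highlights a line range.
function buildGitHubUrl(
  filePath: string,
  startLine: number,
  endLine: number,
  ref = "master" // assumption: the branch or tag the local clone tracks
): string {
  const base = "https://github.com/BabylonJS/Babylon.js/blob";
  // Normalize Windows separators and strip leading slashes from the relative path.
  const normalized = filePath.replace(/\\/g, "/").replace(/^\/+/, "");
  const anchor = startLine === endLine ? `#L${startLine}` : `#L${startLine}-L${endLine}`;
  return `${base}/${ref}/${normalized}${anchor}`;
}

console.log(buildGitHubUrl("packages/dev/core/src/scene.ts", 4100, 4110));
```

Pinning `ref` to a commit hash instead of a branch name would keep the highlighted lines stable as the upstream file changes.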
@@ -17,7 +17,7 @@ describe('MCP Handlers', () => {
     it('should register all required tools', () => {
       setupHandlers(mockServer);

-      expect(registerToolSpy).toHaveBeenCalledTimes(3);
+      expect(registerToolSpy).toHaveBeenCalledTimes(5);
     });

     it('should register search_babylon_docs tool', () => {
@@ -16,6 +16,8 @@ export function setupHandlers(server: McpServer): void {
   registerSearchDocsTool(server);
   registerGetDocTool(server);
   registerSearchApiTool(server);
+  registerSearchSourceTool(server);
+  registerGetSourceTool(server);
 }

 function registerSearchDocsTool(server: McpServer): void {
@@ -247,3 +249,149 @@ function registerSearchApiTool(server: McpServer): void {
     }
   );
 }
+
+function registerSearchSourceTool(server: McpServer): void {
+  server.registerTool(
+    'search_babylon_source',
+    {
+      description: 'Search Babylon.js source code files',
+      inputSchema: {
+        query: z.string().describe('Search query for source code (e.g., "getMeshByName implementation", "scene rendering")'),
+        package: z
+          .string()
+          .optional()
+          .describe('Optional package filter (e.g., "core", "gui", "materials")'),
+        limit: z
+          .number()
+          .optional()
+          .default(5)
+          .describe('Maximum number of results to return (default: 5)'),
+      },
+    },
+    async ({ query, package: packageFilter, limit = 5 }) => {
+      try {
+        const search = await getSearchInstance();
+        const options = packageFilter ? { package: packageFilter, limit } : { limit };
+        const results = await search.searchSourceCode(query, options);
+
+        if (results.length === 0) {
+          return {
+            content: [
+              {
+                type: 'text',
+                text: `No source code found for "${query}". Try different search terms or check if the source code has been indexed.`,
+              },
+            ],
+          };
+        }
+
+        // Format results for better readability
+        const formattedResults = results.map((result, index) => ({
+          rank: index + 1,
+          filePath: result.filePath,
+          package: result.package,
+          startLine: result.startLine,
+          endLine: result.endLine,
+          language: result.language,
+          codeSnippet: result.content.substring(0, 500) + (result.content.length > 500 ? '...' : ''),
+          imports: result.imports,
+          exports: result.exports,
+          url: result.url,
+          relevance: (result.score * 100).toFixed(1) + '%',
+        }));
+
+        return {
+          content: [
+            {
+              type: 'text',
+              text: JSON.stringify(
+                {
+                  query,
+                  totalResults: results.length,
+                  results: formattedResults,
+                },
+                null,
+                2
+              ),
+            },
+          ],
+        };
+      } catch (error) {
+        return {
+          content: [
+            {
+              type: 'text',
+              text: `Error searching source code: ${error instanceof Error ? error.message : String(error)}`,
+            },
+          ],
+        };
+      }
+    }
+  );
+}
+
+function registerGetSourceTool(server: McpServer): void {
+  server.registerTool(
+    'get_babylon_source',
+    {
+      description: 'Retrieve full Babylon.js source code file or specific line range',
+      inputSchema: {
+        filePath: z.string().describe('Relative file path from repository root (e.g., "packages/dev/core/src/scene.ts")'),
+        startLine: z
+          .number()
+          .optional()
+          .describe('Optional start line number (1-indexed)'),
+        endLine: z
+          .number()
+          .optional()
+          .describe('Optional end line number (1-indexed)'),
+      },
+    },
+    async ({ filePath, startLine, endLine }) => {
+      try {
+        const search = await getSearchInstance();
+        const sourceCode = await search.getSourceFile(filePath, startLine, endLine);
+
+        if (!sourceCode) {
+          return {
+            content: [
+              {
+                type: 'text',
+                text: `Source file not found: ${filePath}. The path may be incorrect or the file does not exist in the repository.`,
+              },
+            ],
+          };
+        }
+
+        return {
+          content: [
+            {
+              type: 'text',
+              text: JSON.stringify(
+                {
+                  filePath,
+                  startLine: startLine || 1,
+                  endLine: endLine || sourceCode.split('\n').length,
+                  totalLines: sourceCode.split('\n').length,
+                  language: filePath.endsWith('.ts') || filePath.endsWith('.tsx') ? 'typescript' : 'javascript',
+                  content: sourceCode,
+                },
+                null,
+                2
+              ),
+            },
+          ],
+        };
+      } catch (error) {
+        return {
+          content: [
+            {
+              type: 'text',
+              text: `Error retrieving source file: ${error instanceof Error ? error.message : String(error)}`,
+            },
+          ],
+        };
+      }
+    }
+  );
+}
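Both new handlers report failures as ordinary text content, so a client cannot tell an error message from a successful result without parsing the text. The MCP tool result shape also allows an optional `isError` flag for this purpose. The sketch below shows one way the repeated result-building code could be factored out; `textResult` and `errorResult` are hypothetical helpers, not functions from this commit, and the `TextToolResult` type is a simplified local stand-in for the SDK's result type.

```typescript
// Simplified local stand-in for an MCP tool result that carries text content.
interface TextToolResult {
  content: Array<{ type: "text"; text: string }>;
  isError?: boolean;
}

// Hypothetical helper: wraps a string as a text result, optionally flagged as an error.
function textResult(text: string, isError = false): TextToolResult {
  const content: TextToolResult["content"] = [{ type: "text", text }];
  return isError ? { content, isError: true } : { content };
}

// Hypothetical helper: formats an unknown thrown value the same way the handlers do.
function errorResult(prefix: string, error: unknown): TextToolResult {
  const message = error instanceof Error ? error.message : String(error);
  return textResult(`${prefix}: ${message}`, true);
}

console.log(JSON.stringify(errorResult("Error searching source code", new Error("table not found"))));
```

A handler's `catch` block could then return `errorResult('Error retrieving source file', error)`, keeping the message text unchanged while marking the result as a failure.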
@@ -53,7 +53,7 @@ vi.mock('@lancedb/lancedb', () => ({
 }));

 vi.mock('@xenova/transformers', () => ({
-  pipeline: vi.fn(() => Promise.resolve((text: string) => ({
+  pipeline: vi.fn(() => Promise.resolve((_text: string) => ({
     data: new Float32Array([0.1, 0.2, 0.3]),
   }))),
 }));
@@ -203,6 +203,52 @@ export class LanceDBSearch {
       .replace(/\//g, '_');
   }

+  async searchSourceCode(
+    query: string,
+    options: { package?: string; limit?: number; tableName?: string } = {}
+  ): Promise<Array<any & { score: number }>> {
+    if (!this.db || !this.embedder) {
+      throw new Error('Search not initialized');
+    }
+
+    const limit = options.limit || 5;
+    const tableName = options.tableName || 'babylon_source_code';
+    const queryVector = await this.generateEmbedding(query);
+
+    const sourceTable = await this.db.openTable(tableName);
+    let searchQuery = sourceTable.vectorSearch(queryVector).limit(limit);
+
+    if (options.package) {
+      searchQuery = searchQuery.where(`package = '${options.package}'`);
+    }
+
+    const results = await searchQuery.toArray();
+    return results.map((doc: any) => ({
+      ...doc,
+      score: doc._distance ? Math.max(0, 1 - doc._distance) : 0,
+    }));
+  }
+
+  async getSourceFile(
+    filePath: string,
+    startLine?: number,
+    endLine?: number
+  ): Promise<string | null> {
+    try {
+      const fullPath = path.join('./data/repositories/Babylon.js', filePath);
+      const content = await fs.readFile(fullPath, 'utf-8');
+
+      if (startLine !== undefined && endLine !== undefined) {
+        const lines = content.split('\n');
+        return lines.slice(startLine - 1, endLine).join('\n');
+      }
+      return content;
+    } catch (error) {
+      console.error(`Error reading source file ${filePath}:`, error);
+      return null;
+    }
+  }
+
   async close(): Promise<void> {
     // LanceDB doesn't require explicit closing
   }
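Two details in `searchSourceCode` may be worth a second look. The expression `doc._distance ? ... : 0` treats a distance of exactly 0 as falsy, so an exact match would receive a score of 0, and the `package` value is interpolated into the filter string without escaping. The sketch below illustrates one possible way to handle both. It assumes `_distance` is a non-negative number where 0 means identical, and that the filter accepts SQL-style string literals in which a single quote is escaped by doubling it; `distanceToScore` and `packageFilterClause` are hypothetical helpers, not part of this commit.

```typescript
// Hypothetical helper: maps a vector distance to a score clamped to [0, 1].
// Assumption: distance 0 means identical vectors and larger means less similar.
function distanceToScore(distance: number | null | undefined): number {
  if (typeof distance !== "number" || Number.isNaN(distance)) {
    return 0; // no usable distance, so report no relevance
  }
  return Math.min(1, Math.max(0, 1 - distance));
}

// Hypothetical helper: builds the package filter with single quotes doubled,
// the usual SQL way to embed a quote inside a string literal.
function packageFilterClause(pkg: string): string {
  const escaped = pkg.replace(/'/g, "''");
  return `package = '${escaped}'`;
}

console.log(distanceToScore(0));    // exact match keeps the top score
console.log(distanceToScore(0.25));
console.log(packageFilterClause("core"));
```

Validating the package name against a known list such as `core`, `gui`, or `materials` would be a stricter alternative to escaping.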
264
src/search/source-code-indexer.ts
Normal file
264
src/search/source-code-indexer.ts
Normal file
@ -0,0 +1,264 @@
|
|||||||
|
import { connect } from '@lancedb/lancedb';
|
||||||
|
import { pipeline } from '@xenova/transformers';
|
||||||
|
import fs from 'fs/promises';
|
||||||
|
import path from 'path';
|
||||||
|
|
||||||
|
export interface SourceCodeChunk {
|
||||||
|
id: string;
|
||||||
|
filePath: string;
|
||||||
|
package: string;
|
||||||
|
content: string;
|
||||||
|
startLine: number;
|
||||||
|
endLine: number;
|
||||||
|
language: string;
|
||||||
|
imports: string;
|
||||||
|
exports: string;
|
||||||
|
url: string;
|
||||||
|
vector: number[];
|
||||||
|
}
|
||||||
|
|
||||||
|
export class SourceCodeIndexer {
|
||||||
|
private db: any;
|
||||||
|
private embedder: any;
|
||||||
|
private readonly dbPath: string;
|
||||||
|
private readonly tableName: string;
|
||||||
|
private readonly repositoryPath: string;
|
||||||
|
private readonly chunkSize: number;
|
||||||
|
private readonly chunkOverlap: number;
|
||||||
|
|
||||||
|
constructor(
|
||||||
|
dbPath: string = './data/lancedb',
|
||||||
|
tableName: string = 'babylon_source_code',
|
||||||
|
repositoryPath: string = './data/repositories/Babylon.js',
|
||||||
|
chunkSize: number = 200,
|
||||||
|
chunkOverlap: number = 20
|
||||||
|
) {
|
||||||
|
this.dbPath = dbPath;
|
||||||
|
this.tableName = tableName;
|
||||||
|
this.repositoryPath = repositoryPath;
|
||||||
|
this.chunkSize = chunkSize;
|
||||||
|
this.chunkOverlap = chunkOverlap;
|
||||||
|
}
|
||||||
|
|
||||||
|
async initialize(): Promise<void> {
|
||||||
|
console.log('Initializing LanceDB connection...');
|
||||||
|
this.db = await connect(this.dbPath);
|
||||||
|
|
||||||
|
console.log('Loading embedding model...');
|
||||||
|
this.embedder = await pipeline(
|
||||||
|
'feature-extraction',
|
||||||
|
'Xenova/all-MiniLM-L6-v2'
|
||||||
|
);
|
||||||
|
console.log('Embedding model loaded');
|
||||||
|
}
|
||||||
|
|
||||||
|
  async indexSourceCode(packages: string[] = ['core']): Promise<void> {
    if (!this.embedder) {
      throw new Error('Indexer not initialized. Call initialize() first.');
    }

    const chunks: SourceCodeChunk[] = [];
    let fileCount = 0;

    for (const pkg of packages) {
      console.log(`\nIndexing package: ${pkg}...`);
      const packagePath = path.join(this.repositoryPath, 'packages/dev', pkg, 'src');

      try {
        const files = await this.getAllSourceFiles(packagePath);
        console.log(`Found ${files.length} source files in ${pkg}`);

        for (const file of files) {
          try {
            const fileChunks = await this.processFile(file, pkg);
            chunks.push(...fileChunks);
            fileCount++;

            // fileCount is cumulative across packages, so it is reported
            // without the per-package total.
            if (fileCount % 50 === 0) {
              console.log(`Processed ${fileCount} files...`);
            }
          } catch (error) {
            console.error(`Error processing ${file}:`, error);
          }
        }
      } catch (error) {
        console.error(`Error indexing package ${pkg}:`, error);
      }
    }

    console.log(`\nTotal source code chunks: ${chunks.length}`);
    console.log('Creating LanceDB table...');

    // Drop existing table if it exists
    const tableNames = await this.db.tableNames();
    if (tableNames.includes(this.tableName)) {
      await this.db.dropTable(this.tableName);
    }

    // Create new table
    await this.db.createTable(this.tableName, chunks);
    console.log('Source code indexing complete!');
  }

  private async getAllSourceFiles(dir: string): Promise<string[]> {
    const files: string[] = [];

    try {
      const entries = await fs.readdir(dir, { withFileTypes: true });

      for (const entry of entries) {
        const fullPath = path.join(dir, entry.name);

        if (entry.isDirectory()) {
          // Skip node_modules, dist, build, etc.
          if (!['node_modules', 'dist', 'build', 'lib', '.git'].includes(entry.name)) {
            const subFiles = await this.getAllSourceFiles(fullPath);
            files.push(...subFiles);
          }
        } else if (entry.isFile()) {
          // Include .ts, .tsx, .js, .jsx files
          if (/\.(ts|tsx|js|jsx)$/.test(entry.name)) {
            files.push(fullPath);
          }
        }
      }
    } catch (error) {
      // Directory doesn't exist or can't be read
      return [];
    }

    return files;
  }

  private async processFile(filePath: string, pkg: string): Promise<SourceCodeChunk[]> {
    const content = await fs.readFile(filePath, 'utf-8');
    const lines = content.split('\n');
    const chunks: SourceCodeChunk[] = [];

    // Extract imports and exports for metadata
    const imports = this.extractImports(content);
    const exports = this.extractExports(content);

    // Determine language
    const language = filePath.endsWith('.ts') || filePath.endsWith('.tsx') ? 'typescript' : 'javascript';

    // Get relative path from repository root
    const relativePath = path.relative(this.repositoryPath, filePath);

    // Chunk the file
    for (let i = 0; i < lines.length; i += this.chunkSize - this.chunkOverlap) {
      const startLine = i + 1;
      const endLine = Math.min(i + this.chunkSize, lines.length);
      const chunkLines = lines.slice(i, endLine);
      const chunkContent = chunkLines.join('\n');

      // Skip empty chunks
      if (chunkContent.trim().length === 0) {
        continue;
      }

      // Create embedding
      const embeddingText = this.createEmbeddingText(chunkContent, relativePath);
      const vector = await this.generateEmbedding(embeddingText);

      chunks.push({
        id: `${relativePath}:${startLine}-${endLine}`,
        filePath: relativePath,
        package: pkg,
        content: chunkContent,
        startLine,
        endLine,
        language,
        imports,
        exports,
        url: this.generateGitHubUrl(relativePath, startLine, endLine),
        vector,
      });
    }

    return chunks;
  }

  private extractImports(content: string): string {
    const imports: string[] = [];
    const importRegex = /import\s+(?:{[^}]+}|[^;]+)\s+from\s+['"]([^'"]+)['"]/g;
    let match;

    while ((match = importRegex.exec(content)) !== null) {
      if (match[1]) {
        imports.push(match[1]);
      }
    }

    return imports.slice(0, 20).join(', '); // Limit to first 20 imports
  }

  private extractExports(content: string): string {
    const exports: string[] = [];
    const exportRegex = /export\s+(?:class|function|interface|type|const|let|var|enum|default)\s+([A-Za-z_$][A-Za-z0-9_$]*)/g;
    let match;

    while ((match = exportRegex.exec(content)) !== null) {
      if (match[1]) {
        exports.push(match[1]);
      }
    }

    return exports.slice(0, 20).join(', '); // Limit to first 20 exports
  }

  private createEmbeddingText(code: string, filePath: string): string {
    // Combine file path, code, and extract key terms for better search
    const fileName = path.basename(filePath);
    const dirName = path.dirname(filePath).split('/').pop() || '';

    // Extract comments for context
    const comments = this.extractComments(code);

    return `${fileName} ${dirName} ${comments} ${code.substring(0, 1000)}`;
  }

  private extractComments(code: string): string {
    const comments: string[] = [];

    // Single-line comments
    const singleLineRegex = /\/\/\s*(.+)$/gm;
    let match;
    while ((match = singleLineRegex.exec(code)) !== null) {
      if (match[1]) {
        comments.push(match[1].trim());
      }
    }

    // Multi-line comments
    const multiLineRegex = /\/\*\*?([\s\S]*?)\*\//g;
    while ((match = multiLineRegex.exec(code)) !== null) {
      if (match[1]) {
        comments.push(match[1].trim());
      }
    }

    return comments.slice(0, 5).join(' ');
  }

  private async generateEmbedding(text: string): Promise<number[]> {
    if (!this.embedder) {
      throw new Error('Embedder not initialized');
    }

    const result = await this.embedder(text, {
      pooling: 'mean',
      normalize: true,
    });

    return Array.from(result.data);
  }

  private generateGitHubUrl(relativePath: string, startLine: number, endLine: number): string {
    return `https://github.com/BabylonJS/Babylon.js/blob/master/${relativePath}#L${startLine}-L${endLine}`;
  }

  async close(): Promise<void> {
    console.log('Source code indexer closed');
  }
}
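
The chunking loop in `processFile` can be hard to picture from the arithmetic alone. The following is a minimal standalone sketch of the same windowing, assuming the defaults named in the commit message (200-line chunks with a 20-line overlap). The `chunkWindows` function, the `ChunkWindow` type, and the guard against a non-positive step are hypothetical additions for illustration and are not part of the class above. Unlike `processFile`, the sketch does not skip whitespace-only chunks.

```typescript
// Hypothetical helper that mirrors the loop bounds used in processFile.
interface ChunkWindow {
  startLine: number; // 1-based, inclusive
  endLine: number; // 1-based, inclusive
}

function chunkWindows(
  totalLines: number,
  chunkSize: number = 200,
  chunkOverlap: number = 20
): ChunkWindow[] {
  const step = chunkSize - chunkOverlap;
  // Assumption: reject settings that would never advance the loop.
  if (step <= 0) {
    throw new Error('chunkOverlap must be smaller than chunkSize');
  }

  const windows: ChunkWindow[] = [];
  for (let i = 0; i < totalLines; i += step) {
    windows.push({
      startLine: i + 1,
      endLine: Math.min(i + chunkSize, totalLines),
    });
  }
  return windows;
}

// A 450-line file yields the windows 1-200, 181-380, and 361-450.
console.log(chunkWindows(450));
```

One consequence of this stepping is that a file of exactly 200 lines produces two windows, 1-200 and 181-200, because the loop advances by 180 lines and line 181 is still inside the file. The second window is fully contained in the first, so an indexer built this way may store a short redundant tail chunk for some files.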