Tool Reference

Complete guide to Code Scalpel's 22 MCP tools for surgical code operations

Overview

Code Scalpel provides 22 specialized MCP tools for AI-assisted code operations. All tools are available at all tiers (Community, Pro, Enterprise) with tier-based limits and progressive feature enhancement.

🔑 Key Principles

  • Token Efficiency: 99% reduction vs traditional file reading (50 tokens vs 10,000+)
  • AST-Based Accuracy: Zero hallucination via real parsers (Python ast, tree-sitter)
  • Multi-Language: Python (full AST), JavaScript/TypeScript/Java (tree-sitter/heuristic)
  • Tier Scaling: All tools available; limits expand by tier (Community → Pro → Enterprise)

Tool Categories

📊 Code Analysis (4 tools)

Understand code structure, complexity, and architecture without reading full files.

analyze_code

All Tiers

Purpose: Static code structure analysis via AST parsing. Returns functions, classes, imports, and complexity metrics without hallucination risk.

Token Savings: ~200 tokens vs ~5,000+ for full file read
Languages: Python (ast), JavaScript/TypeScript (tree-sitter), Java
Performance: <100ms for files <1K LOC
Use Cases: Understanding code before extraction/modification, identifying functions/classes, complexity assessment
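
The kind of structural summary described above can be approximated with Python's standard ast module. This is a minimal sketch, not Code Scalpel's implementation: the function name summarize_source and the shape of the returned dictionary are assumptions made for illustration, and complexity metrics are left out.

```python
import ast


def summarize_source(source: str) -> dict:
    """Collect function, class, and import names from Python source text."""
    tree = ast.parse(source)
    summary = {"functions": [], "classes": [], "imports": []}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            summary["functions"].append(node.name)
        elif isinstance(node, ast.ClassDef):
            summary["classes"].append(node.name)
        elif isinstance(node, ast.Import):
            summary["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Relative imports such as "from . import x" have no module name.
            summary["imports"].append(node.module or ".")
    return summary


SAMPLE = '''
import os
from pathlib import Path

class Loader:
    def load(self, name):
        return Path(name).read_text()

def main():
    return os.getcwd()
'''

summary = summarize_source(SAMPLE)
print(summary)
```

Because the names come from the parse tree rather than from text matching, a function mentioned only in a comment or string never appears in the summary.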

get_file_context

All Tiers

Purpose: Token-efficient "strategic glance" at a file. Returns structured summary including functions, classes, imports, complexity, and security warnings.

Token Savings: 20-50x reduction (~50-150 tokens vs ~5,000+)
Security: Full security warning support (CWE mapping) in all tiers
Performance: ~50ms average response time
Use Cases: Assessing file relevance before extraction, checking for specific imports, identifying security hotspots quickly
Pro Tier: Semantic summarization, intent extraction, code smells detection, maintainability index
Enterprise Tier: Compliance flags (HIPAA, GDPR, PCI-DSS), PII/secret redaction, CODEOWNERS integration

crawl_project

Community: 100 files | Pro: 1000 files | Enterprise: Unlimited

Purpose: Project-wide analysis providing comprehensive structure, complexity metrics, and code intelligence across all files. Bird's-eye view of entire codebase.

Performance: >100 files/second, <500MB for 10K files
Community: File inventory, basic metrics, hotspot identification (100 file limit)
Pro: Complexity hotspots, architectural layer detection, dependency mapping (1000 files)
Enterprise: Monorepo support, historical trends, compliance scanning (unlimited)
Use Cases: Understanding overall project structure, identifying complexity hotspots, risk assessment, framework detection, dependency analysis

get_project_map

Community: 100 files | Pro: Enhanced | Enterprise: Full

Purpose: Architectural reconnaissance engine generating comprehensive project structure visualizations. Shows packages, modules, complexity hotspots, and architectural patterns.

Performance: <10s for 1K files
Visualizations: Mermaid (all tiers), City Map (Enterprise), Force Graph (Enterprise)
Community: Package/module hierarchy, entry points, circular imports (100 files, 50 modules)
Pro: Complexity hotspots, architectural layer detection, smart context
Enterprise: Compliance flags, Git ownership/churn analysis, technical debt scoring

✂️ Code Extraction (3 tools)

Surgically extract specific code elements by name with maximum token efficiency.

extract_code

Community: Single-file | Pro: Cross-file depth=1 | Enterprise: Unlimited

Purpose: Primary code retrieval tool. Surgically extracts specific functions/classes/methods by name. Agent sends ~50 tokens, receives ~200 (vs 10,000+ for full file).

Token Efficiency: 99% savings (50 vs 10,000+ tokens)
Languages: Python (full AST), JavaScript/TypeScript/Java (tree-sitter)
Performance: <50ms extraction time
Features: Extract by symbol name (no line numbers), context-aware (includes decorators/docstrings), React metadata support (JSX/TSX)
Community: Single-file extraction with intra-file dependencies
Pro/Enterprise: Cross-file dependency resolution (Python), depth=1 imports resolved

get_symbol_references

Community: 10 files, 50 refs | Pro: Unlimited | Enterprise: + Risk Scoring

Purpose: Find all references to a symbol (function, class, variable) across the project for safe refactoring and impact analysis. AST-based accuracy eliminates false positives.

Accuracy: Zero hallucination via AST parsing (excludes comments/strings)
Performance: <100ms for typical projects (<100 files)
Community: Accurate reference finding (10 files, 50 references max)
Pro: Reference categorization (call, import, read, write), unlimited files/refs
Enterprise: Risk scoring, CODEOWNERS integration, team coordination
Use Cases: Pre-refactoring impact analysis, understanding call sites, safe symbol changes
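
To illustrate why AST-based lookup skips comments and strings, here is a single-file sketch using Python's ast module. The function find_references and its read/write/definition labels are hypothetical; the real tool's categories and output format may differ.

```python
import ast


def find_references(source: str, symbol: str) -> list[tuple[int, str]]:
    """Return (line, kind) pairs for every real use of `symbol` in the source."""
    refs = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and node.id == symbol:
            # Store context means assignment; anything else is treated as a read.
            kind = "write" if isinstance(node.ctx, ast.Store) else "read"
            refs.append((node.lineno, kind))
        elif (
            isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
            and node.name == symbol
        ):
            refs.append((node.lineno, "definition"))
    return sorted(refs)


SAMPLE = '''def total(items):
    return sum(items)

# total is mentioned in this comment
label = "total"
result = total([1, 2])
'''

references = find_references(SAMPLE, "total")
print(references)
```

Only the definition on line 1 and the call on line 6 are reported. The comment and the string literal contain the same text but are not identifier nodes in the tree, so they are ignored.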

get_cross_file_dependencies

Community: Depth=1, 50 files | Pro: Depth=5, 500 files | Enterprise: Unlimited

Purpose: Analyze import/require statements and trace dependency chains across file boundaries with confidence scoring. Gathers complete code context for AI-assisted editing.

Confidence Scoring: Exponential decay (0.9^depth) shows reliability of deep chains
Languages: Python, JavaScript, TypeScript
Features: Circular import detection, combined code output, Mermaid diagrams
Community: Depth=1 (immediate imports), 50 files max
Pro: Transitive resolution (A→B→C chains), depth=5, 500 files
Enterprise: Architectural enforcement, unlimited depth/files, coupling metrics
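
The exponential decay formula is easy to work through by hand. The sketch below assumes that depth 0 is the file being analyzed and that each import hop adds 1 to the depth; the function name chain_confidence is illustrative only.

```python
def chain_confidence(depth: int, base: float = 0.9) -> float:
    """Confidence for a dependency found `depth` import hops from the target."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return base ** depth


for depth in (1, 3, 5):
    print(depth, round(chain_confidence(depth), 3))
```

Under these assumptions a direct import scores 0.9, a three-hop chain 0.729, and a five-hop chain roughly 0.59, which is why results at the Pro depth limit deserve more scrutiny than immediate imports.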

✏️ Code Modification (2 tools)

Surgical, safe code modification with automatic validation and backup.

update_symbol

All Tiers (No Restrictions)

Purpose: Primary write tool for surgical code modification. Replace specific functions/classes/methods with new code while preserving surrounding context. Atomic write with backup.

Safety: Automatic syntax validation before writing, atomic operations prevent corruption
Backup: Creates .bak files by default (rollback safety)
Languages: Python (v1.0), JS/TS/Java (planned v2.0)
Performance: <100ms for patch application
Features: Decorator awareness, automatic indentation handling, zero hallucination risk
Use Cases: Applying generated fixes, updating single functions, adding methods to classes, refactoring implementation
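
The safety properties listed above (validate first, back up, write atomically) can be sketched with the standard library. This is an assumption-laden illustration, not the tool's code: replace_function is a hypothetical helper that only handles top-level Python functions, and the atomic step relies on os.replace swapping a temporary file into place.

```python
import ast
import os
import shutil
import tempfile


def replace_function(path: str, name: str, new_code: str) -> None:
    """Replace a top-level function in `path`, validating before any write."""
    with open(path, encoding="utf-8") as handle:
        source = handle.read()
    tree = ast.parse(source)
    target = next(
        (
            node
            for node in tree.body
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
            and node.name == name
        ),
        None,
    )
    if target is None:
        raise KeyError(f"function {name!r} not found")

    # Start at the first decorator so decorators are replaced with the body.
    first_line = min([target.lineno] + [d.lineno for d in target.decorator_list])
    lines = source.splitlines(keepends=True)
    updated = (
        "".join(lines[: first_line - 1])
        + new_code.rstrip("\n")
        + "\n"
        + "".join(lines[target.end_lineno :])
    )

    ast.parse(updated)  # raises SyntaxError before anything touches the disk

    shutil.copy2(path, path + ".bak")  # rollback copy
    directory = os.path.dirname(os.path.abspath(path))
    fd, temp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(updated)
    os.replace(temp_path, path)  # atomic swap on the same filesystem
```

If the patched source fails to parse, the function raises before the backup or the write happens, so the original file is left untouched.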

rename_symbol

Community: Single-file | Pro: Cross-file (bounded) | Enterprise: Org-wide

Purpose: Safely rename functions, classes, or methods while automatically updating all references. AST-based transformation ensures syntactic correctness and preserves formatting.

Accuracy: AST-based (won't rename in strings/comments), identifier validation
Languages: Python, JavaScript, TypeScript, JSX
Community: Single-file renames with local reference updates
Pro: Cross-file propagation with import statement updates (bounded by limits)
Enterprise: Repository-wide and multi-repo renames, audit trails, approval workflows

🔀 Code Flow Analysis (3 tools)

Understand function relationships, call graphs, and execution paths.

get_call_graph

Community: Depth=3, 50 nodes | Pro: Depth=50, 500 nodes | Enterprise: Unlimited

Purpose: Generate static call graphs showing function-to-function relationships, entry points, and circular dependencies. Primary architecture visualization tool.

Languages: Python, JavaScript, TypeScript
Performance: <500ms for 1K functions
Precision: >90% (Community), >95% (Pro), >98% (Enterprise)
Features: Entry point discovery, circular dependency detection, Mermaid diagrams
Use Cases: Impact analysis (what breaks if I change this?), dead code detection, security auditing (trace to sinks), architecture validation
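
A static call graph for a single Python file can be sketched in a few lines with ast. The helpers build_call_graph and entry_points below are hypothetical names; the sketch only resolves direct calls by bare name, so method calls and dynamic dispatch are outside its scope.

```python
import ast


def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each function name to the set of bare names it calls."""
    graph: dict[str, set[str]] = {}
    for func in ast.walk(ast.parse(source)):
        if isinstance(func, (ast.FunctionDef, ast.AsyncFunctionDef)):
            calls = graph.setdefault(func.name, set())
            for node in ast.walk(func):
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    calls.add(node.func.id)
    return graph


def entry_points(graph: dict[str, set[str]]) -> list[str]:
    """Functions that no other function in the graph calls."""
    called = set()
    for callees in graph.values():
        called |= callees
    return sorted(name for name in graph if name not in called)


SAMPLE = '''
def helper(x):
    return x + 1

def process(items):
    return [helper(i) for i in items]

def main():
    return process([1, 2])
'''

graph = build_call_graph(SAMPLE)
print(graph)
print(entry_points(graph))
```

Here main calls process, process calls helper, and main is the only function nothing else calls, which is the sense in which it counts as an entry point.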

symbolic_execute

Community: 3 paths, 10 loop iterations | Pro: 10 paths, 100 iterations | Enterprise: Unlimited

Purpose: Formal verification and path exploration powered by Z3 Theorem Prover. Treats variables as mathematical symbols to solve for inputs that trigger specific code paths.

Mathematical Certainty: Uses SMT (Satisfiability Modulo Theories) to prove code reachability
Solver Engine: Microsoft Z3
Primary Language: Python (v1.0), JS/Java planned
Features: Corner case discovery, zero hallucination test generation, unreachable code detection
Use Cases: Determining how to reach specific code, generating test inputs, verifying critical business logic, proving code is unreachable

get_graph_neighborhood

Community: k=1 | Pro: k=5 | Enterprise: Unlimited

Purpose: Extract localized subgraph around a specific function to avoid the "Exploding Graph Problem". Surgically explore dependency chains without loading entire codebase.

Memory Safety: Prevents OOM errors via node limits and truncation protection
Performance: <100ms for neighborhood extraction
Community: k=1 (immediate neighbors only)
Pro: k=5 hops, semantic intelligence (functionally similar nodes)
Enterprise: Unlimited hops, Cypher-like query language for complex traversals
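
The k-hop idea with a node cap can be sketched as a bounded breadth-first search. Everything named here is an assumption: the function neighborhood, the max_nodes parameter, and the choice to follow edges in both directions (callers and callees) are illustrative, not the tool's documented behavior.

```python
from collections import deque


def neighborhood(
    graph: dict[str, set[str]], center: str, k: int, max_nodes: int = 100
) -> tuple[dict[str, int], bool]:
    """Return ({node: hop distance}, truncated) for nodes within k hops of center."""
    # Build an undirected view so callers and callees both count as neighbors.
    adjacency: dict[str, set[str]] = {}
    for src, dsts in graph.items():
        adjacency.setdefault(src, set())
        for dst in dsts:
            adjacency[src].add(dst)
            adjacency.setdefault(dst, set()).add(src)

    distances = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if distances[node] == k:
            continue  # do not expand beyond k hops
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor in distances:
                continue
            if len(distances) >= max_nodes:
                return distances, True  # stop early instead of exhausting memory
            distances[neighbor] = distances[node] + 1
            queue.append(neighbor)
    return distances, False


GRAPH = {
    "main": {"process"},
    "process": {"helper", "log"},
    "helper": set(),
    "log": set(),
}

print(neighborhood(GRAPH, "main", k=1))
print(neighborhood(GRAPH, "main", k=2))
```

With k=1 only main and process are returned; raising k to 2 pulls in helper and log. The truncated flag tells the caller that the cap, not the hop limit, ended the search.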

🔒 Security Analysis (5 tools)

Comprehensive security vulnerability detection with taint analysis and dependency scanning.

security_scan

Community: 50 findings | Pro: Unlimited + Remediation | Enterprise: + Custom Rules

Purpose: Primary Single-File SAST engine. Identifies SQL injection, XSS, command injection via sophisticated Python taint analysis. Sink detection for JS/TS/Java.

Precision: Python taint analysis tracks data flow from sources to sinks, recognizes sanitizers
Performance: <200ms per file average
Community: Big Four vulnerabilities (SQL, XSS, Command Injection, Path Traversal), 50 findings max
Pro: Extended detection (NoSQL, LDAP, Secrets, JWT), unlimited findings, remediation hints
Enterprise: Custom rules, compliance reporting, organization-wide patterns

cross_file_security_scan

Community: 10 modules, depth=3 | Pro: 100 modules, depth=10 | Enterprise: Unlimited

Purpose: Detect vulnerabilities that span multiple files by tracking tainted data flow across module boundaries. Answers "How does untrusted data flow through my entire system?"

Multi-File Taint Tracking: Follows data from entry (routes.py) → storage (models.py) → execution (db.py)
Languages: Python (full), others (heuristic context)
Community: 10 modules, depth=3, basic cross-file flows
Pro: 100 modules, depth=10, architectural visibility, compliance ready
Enterprise: Unlimited modules/depth, microservice boundary tracking

unified_sink_detect

Community: 50 sinks | Pro: Unlimited + Context | Enterprise: + Risk Scoring

Purpose: Polyglot detection of dangerous "sinks" (functions that become vulnerabilities when they execute or render untrusted data). Confidence scoring and CWE mapping reduce false positives.

Languages: Python, JavaScript, TypeScript, Java
Sink Types: eval, exec, subprocess, innerHTML, document.write, Runtime.exec, etc.
Confidence Scoring: High (0.9-1.0), Medium (0.6-0.8), Low (0.0-0.5) based on context
Community: 50 sinks max, basic signature matching
Pro: Unlimited sinks, context-aware analysis (rules out safe usages), framework support
Enterprise: Risk scoring, compliance mapping (GDPR, PCI-DSS), remediation advice
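
Signature-based sink matching with a context-sensitive confidence score might look like the following for Python sources. The sink table, the CWE assignments chosen here, and the 0.9/0.3 confidence values are assumptions for illustration; they follow the High and Low bands above but are not the tool's actual rules.

```python
import ast

# Hypothetical sink table: dotted call name -> CWE identifier.
SINKS = {
    "eval": "CWE-95",
    "exec": "CWE-94",
    "os.system": "CWE-78",
    "subprocess.run": "CWE-78",
}


def dotted_name(node: ast.AST):
    """Turn Name/Attribute chains like os.system into a dotted string."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None


def detect_sinks(source: str) -> list[dict]:
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = dotted_name(node.func)
        if name not in SINKS:
            continue
        # A call whose arguments are all literals cannot carry user input,
        # so it gets a low score instead of being dropped outright.
        literal_only = bool(node.args) and all(
            isinstance(arg, ast.Constant) for arg in node.args
        )
        findings.append(
            {
                "line": node.lineno,
                "sink": name,
                "cwe": SINKS[name],
                "confidence": 0.3 if literal_only else 0.9,
            }
        )
    return sorted(findings, key=lambda finding: finding["line"])


SAMPLE = '''import os

def run(user_input):
    os.system("ls " + user_input)
    eval("1 + 1")
    note = "eval(user_input)"
'''

findings = detect_sinks(SAMPLE)
for finding in findings:
    print(finding)
```

The os.system call built from user_input scores high, the eval of a constant scores low, and the string that merely contains the text eval(...) produces no finding.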

scan_dependencies

Community: 50 dependencies | Pro: Unlimited + Reachability | Enterprise: + Compliance

Purpose: Software Composition Analysis (SCA) to identify security risks in project dependencies. Scans manifests against OSV database with typosquatting detection and license compliance.

Multi-Language: Python (pip), JavaScript (npm), Java (Maven/Gradle)
Vulnerability DB: Google OSV (aggregates GHSA, PySEC, NVD)
Community: CVE detection, transitive dependencies, 50 deps max
Pro: Reachability analysis (is vulnerable code imported?), typosquatting detection, license compliance, supply chain risk scoring
Enterprise: Policy enforcement, SOC2/ISO 27001 reports, SBOM generation (CycloneDX), custom registry support

type_evaporation_scan

All Tiers

Purpose: Detect type safety violations where type hints are present but can be bypassed at runtime. Focuses on Python's dynamic type system vulnerabilities.

Detection: Any casting, getattr, setattr, __dict__ manipulation that bypasses type hints
Languages: Python (primary), TypeScript (planned)
Use Cases: Type safety enforcement, runtime integrity validation, security boundary verification
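
The detection targets listed above can be located syntactically. The sketch below flags Any annotations, getattr/setattr calls, and __dict__ access in Python source; find_type_bypasses is a hypothetical name, and a real scanner would need data-flow context to decide which of these hits actually undermine a type hint.

```python
import ast


def find_type_bypasses(source: str) -> list[tuple[int, str]]:
    """Return (line, construct) pairs for patterns that can sidestep type hints."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in {"getattr", "setattr"}
        ):
            findings.append((node.lineno, node.func.id))
        elif isinstance(node, ast.Attribute) and node.attr == "__dict__":
            findings.append((node.lineno, "__dict__"))
        elif isinstance(node, ast.Name) and node.id == "Any":
            findings.append((node.lineno, "Any"))
    return sorted(findings)


SAMPLE = '''from typing import Any

class Account:
    balance: int = 0

def load(raw: Any) -> Account:
    account = Account()
    setattr(account, "balance", raw)
    account.__dict__["balance"] = raw
    return account
'''

bypasses = find_type_bypasses(SAMPLE)
print(bypasses)
```

In the sample, balance is annotated as int, yet load can store any value in it through setattr or __dict__, and Python raises no error at runtime. That gap between the annotation and the runtime behavior is what the scan is meant to surface.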

📋 Policy & Compliance (2 tools)

Enforce organizational standards, best practices, and regulatory compliance.

code_policy_check

Community: Style | Pro: + Best Practices | Enterprise: + Compliance

Purpose: Unified enforcement of coding standards, best practices, security patterns, and compliance requirements. Single interface for style, security, and regulatory auditing.

Performance: <500ms per file
Patterns: 50+ (Community/Pro/Enterprise combined)
Community: Style guides (PEP8, ESLint), basic patterns
Pro: Best practices (type hints, docstrings), security patterns (hardcoded secrets, SQL injection)
Enterprise: Compliance auditing (HIPAA, SOC2, GDPR, PCI-DSS), custom rules, audit trails, PDF certification generation

verify_policy_integrity

Enterprise Only

Purpose: Cryptographic guardian of the governance model. Ensures policy definitions (allowlists, denylists, compliance rules) haven't been tampered with. Fail-closed security.

Algorithm: HMAC-SHA256 & SHA-256 cryptographic verification
Security Model: Fail-closed (deny by default on any anomaly)
Bypass Resistance: Defeats chmod attacks by verifying cryptographic content, not file permissions
Features: Tamper detection (single-bit changes), manifest signature validation, audit-ready error reporting
Use Cases: Pre-flight check before AI agents, policy drift auditing, compliance artifact verification in CI/CD
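
HMAC-SHA256 verification with fail-closed behavior can be sketched with Python's hmac and hashlib modules. The function names and the idea of passing the policy as raw bytes are assumptions; how Code Scalpel stores keys and manifests is not described here.

```python
import hashlib
import hmac


def sign_policy(content: bytes, key: bytes) -> str:
    """Compute the hex HMAC-SHA256 signature of a policy document."""
    return hmac.new(key, content, hashlib.sha256).hexdigest()


def verify_policy(content: bytes, key: bytes, expected_signature) -> bool:
    """Return True only when the signature matches; any anomaly means deny."""
    try:
        actual = sign_policy(content, key)
        # compare_digest runs in constant time to avoid timing side channels.
        return hmac.compare_digest(actual, expected_signature)
    except Exception:
        return False  # fail closed: malformed input is treated as tampering


KEY = b"example-secret-key"
POLICY = b"denylist:\n  - rm -rf\n"
SIGNATURE = sign_policy(POLICY, KEY)

print(verify_policy(POLICY, KEY, SIGNATURE))
print(verify_policy(POLICY + b" ", KEY, SIGNATURE))
```

The check depends only on the bytes of the policy and the key, so changing file permissions has no effect on the result, while changing even one byte of the content does.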

🧪 Testing & Validation (2 tools)

Automated test generation and pre-flight safety validation for code changes.

generate_unit_tests

Community: 5 tests | Pro: 20 tests + Parametrized | Enterprise: Unlimited + Bug Reproduction

Purpose: Automatically create comprehensive unit tests using symbolic execution. Explores all execution paths to generate concrete test cases with specific input values.

Method: Z3-powered symbolic execution (zero hallucination)
Frameworks: pytest (all tiers), unittest (Pro+)
Performance: <5s per function
Community: 5 test cases max, pytest framework, path coverage
Pro: 20 test cases, data-driven parametrized tests, unittest support
Enterprise: Unlimited tests, bug reproduction from crash logs

simulate_refactor

All Tiers

Purpose: Pre-flight safety validator for code modifications. Performs "dry run" analysis to detect security regressions, breaking changes, and behavioral inconsistencies before applying changes.

Performance: <1s for typical refactors (<500 LOC)
Languages: Python, JavaScript, TypeScript, Java
Analysis: Syntax validation, security differential (compares security_scan results), structural comparison (signature changes, removed symbols), semantic analysis (Pro+)
Safety Gate: Prevents SQL injection, XSS introduction during refactoring
Enterprise: Compliance impact analysis, automated rollback strategies
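
The syntax-validation and structural-comparison parts of this analysis can be sketched for Python as a before/after diff of top-level function signatures. The names signatures and structural_diff and the shape of the report are hypothetical; the security differential and semantic analysis are not covered.

```python
import ast


def signatures(source: str) -> dict[str, list[str]]:
    """Map each top-level function to its positional parameter names."""
    result = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            result[node.name] = [arg.arg for arg in node.args.args]
    return result


def structural_diff(old_source: str, new_source: str) -> dict:
    """Dry-run comparison: nothing is written, only a report is produced."""
    try:
        new_sigs = signatures(new_source)
    except SyntaxError as exc:
        return {"safe": False, "syntax_error": str(exc), "removed": [], "changed": []}
    old_sigs = signatures(old_source)
    removed = sorted(set(old_sigs) - set(new_sigs))
    changed = sorted(
        name
        for name in old_sigs
        if name in new_sigs and old_sigs[name] != new_sigs[name]
    )
    return {
        "safe": not removed and not changed,
        "syntax_error": None,
        "removed": removed,
        "changed": changed,
    }


OLD = "def pay(amount, currency):\n    return amount\n\ndef refund(amount):\n    return -amount\n"
NEW = "def pay(amount):\n    return amount\n"

report = structural_diff(OLD, NEW)
print(report)
```

For this pair the report flags refund as removed and pay as having a changed signature, the two kinds of breaking change a caller would want to see before applying an edit.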

🔧 Utility (1 tool)

Path validation and file system operations with Docker awareness.

validate_paths

Community: 100 paths | Pro/Enterprise: Unlimited

Purpose: Pre-flight checklist for filesystem operations. Validates file paths are accessible, exist, and safe before expensive operations. Docker-aware with volume mount suggestions.

Performance: <10ms for 100 paths
Docker Intelligence: Detects container execution, suggests missing volume mounts
Community: Batch validation (100 paths), basic existence checks
Pro/Enterprise: Unlimited paths, Enterprise adds workspace boundary enforcement, audit logging for violations
Use Cases: Fail fast before operations, turn "File Not Found" into actionable DevOps guidance, security enforcement
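
A batch pre-flight check of this kind can be sketched with pathlib. The function names and report format are assumptions, and the container check uses the common heuristic of looking for /.dockerenv, which may not match how the tool itself detects Docker.

```python
import os
from pathlib import Path


def running_in_docker() -> bool:
    """Heuristic: Docker creates /.dockerenv inside containers."""
    return Path("/.dockerenv").exists()


def validate_paths(paths: list[str]) -> dict[str, dict]:
    """Classify each path as ok, unreadable, or missing, with a hint if useful."""
    in_docker = running_in_docker()
    report = {}
    for raw in paths:
        path = Path(raw)
        hint = None
        if not path.exists():
            status = "missing"
            if in_docker:
                hint = (
                    "Path is not visible inside the container; "
                    "check that its host directory is mounted as a volume."
                )
        elif not os.access(path, os.R_OK):
            status = "unreadable"
        else:
            status = "ok"
        report[raw] = {"status": status, "hint": hint}
    return report


if __name__ == "__main__":
    for name, result in validate_paths([".", "does/not/exist.py"]).items():
        print(name, result)
```

Running the check once up front lets a later tool call fail with a specific, actionable message instead of a bare "File Not Found".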

Tier Comparison Summary

| Feature | Community | Pro | Enterprise |
| --- | --- | --- | --- |
| All 22 Tools | ✅ Available | ✅ Available | ✅ Available |
| security_scan findings | 50 max | Unlimited | Unlimited + Custom |
| symbolic_execute paths | 3 paths | 10 paths | Unlimited |
| crawl_project files | 100 files | 1,000 files | Unlimited |
| extract_code dependencies | Single-file | Cross-file depth=1 | Unlimited depth |
| generate_unit_tests | 5 tests | 20 tests | Unlimited |
| Remediation Suggestions | ❌ | ✅ | ✅ |
| Compliance Reporting | ❌ | ❌ | ✅ (HIPAA, SOC2, GDPR, PCI-DSS) |
| Custom Policies | ❌ | ❌ | ✅ |

Common Integration Patterns

Typical Workflow: Analyze → Extract → Modify → Verify

  1. Understand Structure: analyze_code or get_file_context
  2. Extract Code: extract_code with symbol name
  3. Verify Safety: security_scan on extracted code
  4. Check Impact: get_symbol_references to find all call sites
  5. Simulate Changes: simulate_refactor with proposed modifications
  6. Apply Changes: update_symbol if simulation passes
  7. Validate: Re-run security_scan and tests
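
The seven steps above can be sketched as a small orchestration function. Everything about the interface is hypothetical: call_tool stands in for whatever MCP client is in use, and the argument names (file_path, symbol, code, new_code) and result keys ("code", "is_safe") are placeholders rather than the tools' documented parameters.

```python
from typing import Any, Callable


def safe_update(
    call_tool: Callable[..., dict],
    file_path: str,
    symbol: str,
    new_code: str,
) -> list[str]:
    """Run the workflow in order and return the names of the tools invoked."""
    executed: list[str] = []

    def run(tool: str, **arguments: Any) -> dict:
        executed.append(tool)
        return call_tool(tool, **arguments)

    run("get_file_context", file_path=file_path)                    # 1. understand
    extracted = run("extract_code", file_path=file_path, symbol=symbol)  # 2. extract
    run("security_scan", code=extracted.get("code", ""))            # 3. verify safety
    run("get_symbol_references", symbol=symbol)                     # 4. check impact
    simulation = run(                                               # 5. simulate
        "simulate_refactor", file_path=file_path, symbol=symbol, new_code=new_code
    )
    if not simulation.get("is_safe", False):
        return executed  # stop before any write when the dry run fails
    run("update_symbol", file_path=file_path, symbol=symbol, new_code=new_code)  # 6. apply
    run("security_scan", file_path=file_path)                       # 7. validate
    return executed
```

The important design point is the gate after step 5: update_symbol is only reached when the simulation reports the change as safe, so a failed dry run never modifies the file.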

Getting Started

Ready to use these tools? Check out our documentation for installation instructions, configuration guides, and detailed usage examples.