22 Tools for Surgical Code Operations
Give your AI agent deep understanding of your codebase. Parse structure, track security flows, refactor safely, and verify everything with zero hallucination.
See it in action
Real examples of what your AI agent does differently with Code Scalpel.
"Add a new field to the users table"
Your agent uses get_project_map to find the User model, get_symbol_references to find all usages, security_scan to check implications, and update_symbol to apply changes safely.
Result: Safe change, fully verified, completely auditable. Zero broken builds.
"Fix this security vulnerability"
Your agent uses cross_file_security_scan to trace data flow, symbolic_execute to generate test cases, applies the fix with update_symbol, and verifies with simulate_refactor.
Result: Vulnerability closed. All paths verified safe. Compliance proof generated.
"Refactor this monolithic file"
Your agent uses analyze_code to understand structure, get_call_graph to map dependencies, extract_code to pull out functions, and simulate_refactor to verify behavior preservation.
Result: Clean, testable functions. Every reference updated. Behavior preserved.
What you get at each tier
Every tool works at every tier. Higher tiers increase limits and unlock advanced features.
| Tool | Community (Free) | Pro | Enterprise |
|---|---|---|---|
| Analysis & Context | | | |
| analyze_code | ✅ Full | ✅ Full | ✅ Full |
| crawl_project | ✅ Full | ✅ Full | ✅ Full |
| get_file_context | ✅ Full | ✅ Full | ✅ Full |
| get_symbol_references | ✅ Full | ✅ Full | ✅ Full |
| Extraction & Modification | | | |
| extract_code | ✅ Single-file | ✅ + Cross-file deps | ✅ + Cross-file deps |
| update_symbol | ✅ Full | ✅ Full | ✅ Full |
| rename_symbol | ✅ Single-file | ✅ + Cross-file refs | ✅ + Cross-file refs |
| Security | | | |
| security_scan | ✅ 10 paths | ✅ 100 paths | ✅ Unlimited |
| unified_sink_detect | ✅ Full | ✅ Full | ✅ Full |
| cross_file_security_scan | ✅ Basic | ✅ 100 files | ✅ Unlimited |
| scan_dependencies | ✅ Full | ✅ Full | ✅ Full |
| type_evaporation_scan | ✅ Full | ✅ Full | ✅ Full |
| Graph Analysis | | | |
| get_call_graph | ✅ 3 depth / 50 nodes | ✅ 50 depth / 500 nodes | ✅ Unlimited |
| get_project_map | ✅ 100 files | ✅ 1,000 files | ✅ Unlimited |
| get_graph_neighborhood | ✅ 2 hops / 50 nodes | ✅ 5 hops / 200 nodes | ✅ Unlimited |
| get_cross_file_dependencies | ✅ 2 depth | ✅ 5 depth | ✅ Unlimited |
| Symbolic Execution | | | |
| symbolic_execute | ✅ 10 paths / depth 3 | ✅ 100 paths / depth 10 | ✅ Unlimited |
| generate_unit_tests | ✅ Basic | ✅ Full coverage | ✅ Full coverage |
| simulate_refactor | ✅ Single-file | ✅ + Cross-file | ✅ + Cross-file |
| Policy & Governance | | | |
| validate_paths | ✅ Full | ✅ Full | ✅ Full |
| verify_policy_integrity | Not available | Not available | ✅ Cryptographic |
| code_policy_check | ✅ OWASP basics | ✅ Multi-standard | ✅ + Custom OPA rules |
Complete Tool Reference
All 22 tools organized by what they do. Every tool works in every tier. Pro and Enterprise increase limits and add advanced features.
✨ All 22 tools available at all tiers
Community (free), Pro, and Enterprise all include the complete toolkit. Higher tiers raise limits (max paths, depth, files) and add advanced features (cross-file analysis, custom rules, compliance reporting).
Analysis & Context (4 tools)
Understand code structure without guessing. Parse files, inventory projects, and get quick context.
analyze_code – Parse code structure (functions, classes, imports)
What it does: Parses code into an Abstract Syntax Tree (AST) and extracts its structure: every function, class, method, import statement, and decorator, plus cyclomatic complexity metrics. Because it works on the parse tree, it avoids the fragile regex patterns and guesswork of text matching.
When agents use it: Before any code modification, to map dependencies, identify complexity hotspots (functions over 10 lines with 5+ branches), find all callable functions/classes, or plan safe refactoring strategies.
Languages: Python, JavaScript, TypeScript, Java, JSX, TSX
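To illustrate what AST-based structure extraction means, here is a minimal sketch using Python's standard `ast` module rather than Code Scalpel itself. The helper name `summarize_structure` and the sample source are hypothetical, and the tool's real output format is not shown here.

```python
import ast

SAMPLE = '''
import os
from pathlib import Path

class Cart:
    def total(self, items):
        return sum(items)

def main():
    return Cart().total([1, 2])
'''

def summarize_structure(source: str) -> dict:
    """Collect imports, classes, and functions from Python source via the AST."""
    tree = ast.parse(source)
    summary = {"imports": [], "classes": [], "functions": []}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            summary["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for "from . import x"; record it as "."
            summary["imports"].append(node.module or ".")
        elif isinstance(node, ast.ClassDef):
            summary["classes"].append(node.name)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            summary["functions"].append(node.name)
    return summary

structure = summarize_structure(SAMPLE)
```

Methods and top-level functions both land in `functions` in this sketch; a fuller version would record the enclosing class as well.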
crawl_project – Inventory all files in a project
What it does: Recursively scans your project directory and catalogs every file. Returns normalized paths, file types (Python, JavaScript, config, docs), and file sizes, and respects .gitignore and skips .git to avoid noise. Produces a complete project inventory in seconds.
When agents use it: On first pass to understand project shape, locate configuration files (setup.py, package.json, .env), find test directories, or identify which directories matter for analysis (skipping node_modules, dist, .git).
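The directory-skipping behavior can be sketched with the standard library. This is an illustration under stated assumptions, not the tool's implementation: the `crawl` helper and the hard-coded `SKIP_DIRS` set are hypothetical, and real `.gitignore` pattern matching is left out.

```python
import os
from pathlib import Path

# Assumed skip list, taken from the directories named in the text above.
SKIP_DIRS = {"node_modules", "dist", ".git"}

def crawl(root: str) -> list[dict]:
    """Walk root and return one record per file, pruning skipped directories."""
    records = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            path = Path(dirpath) / name
            records.append({
                "path": path.relative_to(root).as_posix(),  # normalized, relative
                "suffix": path.suffix,
                "size": path.stat().st_size,
            })
    return records
```

Calling `crawl(".")` from a project root returns the inventory as a list of dictionaries, with paths relative to that root.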
get_file_context – Quick file overview without full read
What it does: Provides a lightweight file preview: size, line count, first 10 lines (imports/headers), last 10 lines (EOF markers), mime type, and encoding. Perfect for triage without the cost of parsing 10MB files.
When agents use it: Before expensive operations, to determine if a file is relevant (headers show if it's a config file, test file, or utility). Saves tokens by filtering out non-code files first.
get_symbol_references – Find all usages of a symbol
What it does: Uses AST-based symbol tracking to find every reference to a function, class, method, or variable across your entire codebase. Returns exact file locations, line numbers, and context (is it a definition, call, or reference?). No false positives from string matches.
When agents use it: Before renaming (to update all callers), before deleting (to confirm nothing depends on it), or to understand impact radius of a change ("If I modify this function, what else breaks?").
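The difference between AST-based tracking and string matching shows up in a small example. The sketch below uses Python's standard `ast` module; `find_references` is a hypothetical helper and handles only a single file and plain names, not attribute access or cross-file resolution.

```python
import ast

SAMPLE = '''
def calc_total(items):
    return sum(items)

label = "calc_total is documented elsewhere"
result = calc_total([1, 2, 3])
'''

def find_references(source: str, symbol: str) -> list[tuple[int, str]]:
    """Return (line, kind) for each AST node that defines or uses symbol."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        is_def = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        if is_def and node.name == symbol:
            hits.append((node.lineno, "definition"))
        elif isinstance(node, ast.Name) and node.id == symbol:
            hits.append((node.lineno, "reference"))
    return sorted(hits)

references = find_references(SAMPLE, "calc_total")
# The string literal on line 5 mentions calc_total but is not reported.
```

A plain text search would return three hits here; the AST walk returns only the definition and the call.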
Code Extraction & Modification
extract_code – Surgical extraction of functions/classes by name
What it does: Surgically extracts a specific function, class, or method from a file with exact source code and line numbers. Optionally resolves and includes dependencies (imports, parent classes) so the extracted code is self-contained.
When agents use it: To get current implementation before making changes. Agents extract → modify → use update_symbol to apply. Works like copying code but with automatic context inclusion.
Languages: Python, JavaScript, TypeScript, Java, JSX, TSX
update_symbol – Safely replace functions/classes in files
What it does: Precisely replaces a function, class, or method in a source file with new code. Preserves all surrounding code exactly, validates Python/JavaScript syntax before writing, and creates automatic .bak backups. Zero risk of mangling nearby code.
When agents use it: After modifying extracted code. Extract → edit locally → update_symbol applies changes. Syntax validation prevents broken deployments.
Use case: "Update the validate_email function to use RFC 5322 pattern" → extract_code → modify → update_symbol
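The replace-and-validate idea can be sketched in a few lines with Python's `ast` module. This is an assumption-laden illustration, not the tool's code: `replace_function` is a hypothetical helper, it works on an in-memory string (no `.bak` file is written), it leaves any decorators above the `def` line in place, and the regex is a simplified placeholder rather than an RFC 5322 pattern.

```python
import ast

def replace_function(source: str, name: str, new_code: str) -> str:
    """Swap the top-level function `name` for new_code, leaving other lines untouched."""
    ast.parse(new_code)  # reject a replacement that is not valid Python
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name:
            lines = source.splitlines(keepends=True)
            start = node.lineno - 1   # lineno is 1-based
            end = node.end_lineno     # inclusive 1-based end, so usable as a slice end
            if not new_code.endswith("\n"):
                new_code += "\n"
            updated = "".join(lines[:start]) + new_code + "".join(lines[end:])
            ast.parse(updated)        # validate the whole file before returning it
            return updated
    raise KeyError(f"function {name!r} not found")

ORIGINAL = '''import re

def validate_email(value):
    return "@" in value

DEFAULT_DOMAIN = "example.com"
'''

REPLACEMENT = '''def validate_email(value):
    return re.fullmatch(r"[^@ ]+@[^@ ]+", value) is not None
'''

updated = replace_function(ORIGINAL, "validate_email", REPLACEMENT)
```

Validating both the replacement and the resulting file means a syntax error surfaces as an exception before anything would be written to disk.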
rename_symbol – Rename functions/classes throughout codebase
What it does: Renames a function, class, or method with surgical precision. Automatically updates all references across the codebase (imports, calls, type hints) using AST-based analysis to avoid string-match errors.
When agents use it: Refactoring tasks: "Rename calc_total to calculate_total with all callers updated," enforce naming conventions, or make code more semantic.
Tier differences: Community renames within a single file; Pro/Enterprise add cross-file reference updates and automatic import fixing.
Security
security_scan – Detect vulnerabilities using taint analysis
What it does: Advanced taint tracking analysis that traces untrusted data (user input, environment variables, network calls) through your code to identify how it reaches dangerous operations (SQL queries, system commands, file writes). Maps complete attack paths with evidence. Detects SQL injection, XSS, command injection, path traversal, SSRF, hardcoded secrets, with confidence scoring.
When agents use it: Before deployment for a security audit, or when a user asks "Is this code secure?" Returns a taint flow visualization that classifies each path, e.g., user input → sanitizer present? → SQL execute, as vulnerable or safe.
Example: User input → f-string → SQL execute = CRITICAL vulnerability detected with remediation advice.
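The f-string pattern in the example above, and the usual remediation, look like this with Python's standard `sqlite3` module. The table, function names, and payload are hypothetical and chosen only to make the taint path visible.

```python
import sqlite3

def find_user_unsafe(conn, username):
    # Tainted input is formatted straight into the SQL text: the flagged pattern.
    query = f"SELECT name FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()

def find_user_safe(conn, username):
    # A bound parameter keeps the input as data, never as SQL syntax.
    return conn.execute("SELECT name FROM users WHERE name = ?", (username,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

payload = "nobody' OR '1'='1"
leaked = find_user_unsafe(conn, payload)   # the injected OR clause matches every row
blocked = find_user_safe(conn, payload)    # no user has this literal name
```

The same input returns every row through the f-string version and nothing through the parameterized one, which is the distinction a taint analysis is looking for.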
unified_sink_detect – Polyglot sink detection across languages
What it does: Finds dangerous operations (SQL execute, eval, system calls, file operations, DOM manipulation) across languages: Python (os.system, sqlite3 execute calls), JavaScript (innerHTML, fetch), TypeScript, and Java (Runtime.exec), with a confidence score for each detection.
When agents use it: Quick surface-level security audit without full taint analysis. Answer: "Show me all potentially dangerous operations in this file."
Languages: Python (os.system, eval), JavaScript (innerHTML, eval), TypeScript, Java (Runtime.exec)
cross_file_security_scan – Track taint flow across file boundaries
What it does: Extends single-file security analysis to multi-file attack chains. Traces tainted data flowing across module boundaries: routes.py (accepts user input) → services.py (processes data) → db.py (executes SQL). Detects vulnerabilities that local analysis misses. Generates Mermaid visualizations showing complete taint flow.
When agents use it: Full-project security audits. Catches cross-module vulnerabilities where input validation happens in one file but is bypassed in another.
Tier limits: Community gets basic cross-file tracking; Pro tracks across up to 100 files; Enterprise is unlimited. Includes Mermaid flow diagrams showing the taint path.
scan_dependencies – Scan dependencies for CVEs
What it does: Scans requirements.txt, package.json, pyproject.toml, go.mod against the OSV (Open Source Vulnerabilities) database. Returns CVE IDs, CVSS severity scores, affected version ranges, and direct links to security advisories. Works for Python (PyPI), JavaScript (npm), Go, Rust, Java ecosystems.
When agents use it: Supply chain security: "Are my dependencies safe?" Returns actionable patch versions and remediation steps.
Ecosystems: Python (PyPI), JavaScript (npm), Go, Rust
type_evaporation_scan – Detect Type System Evaporation
What it does: Detects type system vulnerabilities at API boundaries. A TypeScript frontend declares {age: number, isAdmin: boolean} but backend receives raw JSON without validation. Attacker sends {age: 30, isAdmin: "yes"} (truthy string) and becomes admin. Identifies these full-stack type confusion vulnerabilities.
When agents use it: Full-stack projects with TypeScript frontend + Python/Node backend. Verifies backend properly validates and type-checks all API inputs.
Example vuln: Frontend: isAdmin: boolean → Backend: if data.get('isAdmin') → Attacker sends isAdmin="yes" (truthy string) = admin access granted
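The example vulnerability is easy to reproduce in plain Python. In the sketch below, `is_admin_unsafe` mirrors the truthiness check described above, and `parse_user` is a hypothetical validator showing one way a backend could enforce the declared types; real projects often use a schema library instead.

```python
def is_admin_unsafe(data: dict) -> bool:
    # Truthiness check: any non-empty string passes, including "yes" or "false".
    return bool(data.get("isAdmin"))

def parse_user(data: dict) -> dict:
    """Reject payloads whose fields do not have the declared types."""
    age, is_admin = data.get("age"), data.get("isAdmin")
    # bool is a subclass of int in Python, so exclude it from the age check.
    if not isinstance(age, int) or isinstance(age, bool):
        raise TypeError("age must be an integer")
    if not isinstance(is_admin, bool):
        raise TypeError("isAdmin must be a boolean")
    return {"age": age, "isAdmin": is_admin}

attack = {"age": 30, "isAdmin": "yes"}
granted = is_admin_unsafe(attack)  # True: the string "yes" is truthy
```

Passing `attack` to `parse_user` raises `TypeError`, while a well-typed payload such as `{"age": 30, "isAdmin": False}` is returned unchanged.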
Graph Analysis
get_call_graph – Build call graphs showing function relationships
What it does: Builds a complete call graph: every function and which other functions it calls. Detects entry points (main(), CLI handlers, API routes), identifies circular dependencies, shows execution depth and breadth. Outputs Mermaid diagrams showing the entire call chain from entry to leaf functions. Includes confidence scoring (100% certain vs. dynamic dispatch).
When agents use it: Understand execution paths (user request → which functions run?), find dead code, trace taint from input source to output sink, or plan refactoring impact zones.
Tier limits: Community 3 depth / 50 nodes; Pro 50 depth / 500 nodes; Enterprise unlimited. Path queries, k-hop traversal, and confidence scoring are available at all tiers, subject to those limits.
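A call graph of the kind described can be approximated for a single Python file with the standard `ast` module. This sketch assumes a hypothetical `build_call_graph` helper and resolves only direct calls by plain name; method calls and dynamic dispatch are skipped, which is the sort of case where a confidence score below 100% would apply.

```python
import ast

SAMPLE = '''
def load(path):
    return open(path).read()

def parse(text):
    return text.split()

def main():
    return parse(load("data.txt"))
'''

def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the names it calls directly."""
    graph = {}
    for func in ast.parse(source).body:
        if not isinstance(func, ast.FunctionDef):
            continue
        calls = set()
        for node in ast.walk(func):
            # Only plain-name calls are resolved; text.split() and .read() are skipped.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                calls.add(node.func.id)
        graph[func.name] = calls
    return graph

call_graph = build_call_graph(SAMPLE)
```

Here `main` maps to `parse` and `load`, `load` maps to the builtin `open`, and `parse` has no resolvable callees.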
get_project_map – Generate comprehensive project structure map
What it does: Generates a high-level architecture map of your entire codebase: package structure, exposed functions/classes, entry points, complexity hotspots (cyclomatic complexity > 10), import graph visualization, circular dependencies, size metrics. Returns Mermaid-compatible diagrams and JSON for analysis.
When agents use it: Onboarding to new codebase ("What does this project do?"), identifying code to refactor (high complexity), finding test entry points, or planning modularization.
Use case: Onboarding to new codebase, finding complexity hotspots (functions with cyclomatic complexity > 10).
get_graph_neighborhood – Extract k-hop neighborhood subgraph
What it does: Extracts a focused subgraph from the full call graph. Choose a center node (e.g., the process_payment function) and a depth (e.g., 2 hops), then get the functions that call it, the functions it calls, or both. Limiting scope this way prevents graph explosion on large codebases.
When agents use it: Impact analysis ("What does delete_user call?"), focused refactoring ("Which functions touch this module?"), or security analysis ("What calls execute_sql?").
Directions: outgoing (callees), incoming (callers), both (full neighborhood)
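The three directions map naturally onto a bounded breadth-first search. The sketch below is one way to express it, assuming a hypothetical `neighborhood` helper and a hand-written call graph; it is not the tool's algorithm or data model.

```python
from collections import deque

# Hypothetical call graph: caller -> direct callees.
CALLS = {
    "checkout": ["process_payment", "send_receipt"],
    "process_payment": ["charge_card", "log_event"],
    "charge_card": ["call_gateway"],
    "send_receipt": ["log_event"],
}

def neighborhood(graph, center, hops, direction="both"):
    """Return every node within `hops` edges of center, excluding center itself."""
    outgoing, incoming = {}, {}
    for caller, callees in graph.items():
        for callee in callees:
            outgoing.setdefault(caller, set()).add(callee)
            incoming.setdefault(callee, set()).add(caller)

    def step(node):
        found = set()
        if direction in ("outgoing", "both"):
            found |= outgoing.get(node, set())
        if direction in ("incoming", "both"):
            found |= incoming.get(node, set())
        return found

    seen = {center}
    queue = deque([(center, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == hops:
            continue  # the hop limit is what bounds the subgraph size
        for neighbor in step(node):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return seen - {center}

callees = neighborhood(CALLS, "process_payment", 1, "outgoing")
callers = neighborhood(CALLS, "process_payment", 1, "incoming")
```

With one outgoing hop the result is `charge_card` and `log_event`; raising `hops` to 2 also pulls in `call_gateway`.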
get_cross_file_dependencies – Analyze cross-file dependency chains
What it does: Recursively resolves all dependencies a function/class needs from other files (imports, parent classes, used utilities). Returns combined source code with all dependencies inlined, a dependency graph, circular dependency detection, and Mermaid visualization of the dependency tree. Includes confidence decay scoring (direct deps = high confidence, transitive deps = lower).
When agents use it: Get full context for a function (all imports resolved and visible), detect circular imports, plan module extraction ("Can we move this function to a new file?"), or understand hidden dependencies.
Output: Dependency graph, combined code (all deps in one string), confidence decay scoring, Mermaid diagram.
Symbolic Execution
symbolic_execute – Explore execution paths with Z3 solver
What it does: Uses the Z3 theorem prover to explore all possible execution paths through a function. Sends symbolic values (not concrete numbers) through if/else branches and computes path constraints. Discovers edge cases and crash conditions: "If quantity < 0 then ValueError," "If total == 0 then ZeroDivisionError," etc. Returns all paths with concrete example inputs.
When agents use it: Answer "What edge cases am I missing?" or find hard-to-reach code paths. Generates vulnerability-triggering inputs and crash proofs.
Example: A function whose if/else branches produce 5 feasible paths → the tool explores all 5 and generates example inputs for each.
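To make "paths with concrete example inputs" tangible, here is a hypothetical function with three paths, paired with one input per path of the kind a symbolic executor would report. The inputs below were picked by hand for illustration; no solver is run, and the report format is an assumption.

```python
def unit_price(total: float, quantity: int) -> float:
    if quantity < 0:
        raise ValueError("quantity must not be negative")
    # quantity == 0 falls through to the division and raises ZeroDivisionError.
    return total / quantity

# One hand-picked input per path: (path constraint, inputs, expected outcome).
PATHS = [
    ("quantity < 0", (10.0, -1), ValueError),
    ("quantity == 0", (10.0, 0), ZeroDivisionError),
    ("quantity > 0", (10.0, 4), 2.5),
]

def run_path(inputs, expected):
    """Return True when the function's outcome on inputs matches expected."""
    try:
        return unit_price(*inputs) == expected
    except Exception as exc:
        return isinstance(expected, type) and isinstance(exc, expected)

results = [run_path(inputs, expected) for _, inputs, expected in PATHS]
```

Each entry in `PATHS` doubles as a ready-made test case, which is the link between path exploration and test generation.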
generate_unit_tests – Generate tests from symbolic paths
What it does: Leverages symbolic execution to automatically generate comprehensive unit tests. Explores function branches and generates test cases for each path: happy path, error conditions (ValueError, ZeroDivisionError, TypeError), edge cases (empty inputs, negative numbers, None values). Outputs ready-to-run pytest, unittest, or hypothesis tests that cover each code path the symbolic executor explores.
When agents use it: User says "Write tests for this function" or for TDD workflows. Ensures every code branch is tested, including error handling.
Output: Complete test file with test cases for all branches, error cases (ValueError, ZeroDivisionError), and edge cases.
simulate_refactor – Verify behavior preservation before applying changes
What it does: Simulates applying a code change and verifies behavior preservation through comparative symbolic execution. Passes original and refactored code to Z3, explores all paths in both, and confirms they produce identical results for all inputs. Returns is_safe (boolean), differences (if any), and path comparison analysis.
When agents use it: Before update_symbol applies changes. Critical safety check: "This refactor preserves 100% of behavior" or "Warning: This changes behavior when quantity == 0"
Output: is_safe (boolean), differences (if any), path_comparison (original vs new paths), recommendation.
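The shape of such a report can be sketched without a solver. The example below swaps comparative symbolic execution for a much weaker stand-in, brute-force comparison over a small input grid, so it can only find differences inside that grid and proves nothing about other inputs. The functions and the `compare` helper are hypothetical; only the `is_safe` and `differences` field names come from the output description above.

```python
from itertools import product

def original(total, quantity):
    if quantity == 0:
        return 0.0
    return total / quantity

def refactored(total, quantity):
    # The refactor dropped the zero guard, so quantity == 0 now raises.
    return total / quantity

def outcome(func, args):
    """Capture either the return value or the exception type."""
    try:
        return ("value", func(*args))
    except Exception as exc:
        return ("raises", type(exc).__name__)

def compare(old, new, domain):
    """Report every input in domain where the two versions behave differently."""
    differences = [
        {"inputs": args, "original": outcome(old, args), "refactored": outcome(new, args)}
        for args in domain
        if outcome(old, args) != outcome(new, args)
    ]
    return {"is_safe": not differences, "differences": differences}

domain = list(product([0, 10], [0, 1, 2]))  # (total, quantity) pairs
report = compare(original, refactored, domain)
```

The report flags both inputs with `quantity == 0`, matching the kind of warning quoted above.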
Policy & Governance
validate_paths – Validate file paths before operations
What it does: Validates file paths before all operations: confirms files exist, are readable/writable, fall within project boundaries (no path traversal escape), and detects suspicious symlinks pointing outside the project. Essential security layer for sandboxed deployments.
When agents use it: Every agent operation on files is guarded by path validation first. Prevents attacks, enforces sandbox boundaries, and confirms file accessibility before expensive parsing.
Checks: File exists, is readable/writable, within project boundaries, not a symlink outside project.
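The boundary check can be sketched with `pathlib`. This is a minimal illustration under assumptions: `validate_path` is a hypothetical helper, and its result keys are invented for the example rather than taken from the tool.

```python
import os
from pathlib import Path

def validate_path(project_root: str, candidate: str) -> dict:
    """Check that candidate resolves to an accessible path inside project_root."""
    root = Path(project_root).resolve()
    # resolve() follows symlinks and collapses "..", so escapes become visible.
    resolved = (root / candidate).resolve()
    inside = resolved == root or root in resolved.parents
    return {
        "inside_project": inside,
        "exists": resolved.exists(),
        "readable": inside and os.access(resolved, os.R_OK),
        "writable": inside and os.access(resolved, os.W_OK),
    }
```

Resolving before comparing is the important design choice: a symlink or a `../` segment that points outside the project is judged by where it actually lands, not by how the path is spelled.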
verify_policy_integrity – Cryptographic policy verification
What it does: Generates SHA-256 hashes of governance configuration files (limits.toml, security-policy.yaml, .code-scalpel config) and verifies them against a trusted manifest. Detects whether policies have been tampered with or downgraded.
When agents use it: Compliance audits in Enterprise environments. Answer: "Have our security/governance policies been modified?" with cryptographic proof.
Pro/Enterprise limits: Advanced cryptographic verification and remote manifest support available in higher tiers.
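Hash-against-manifest verification can be sketched with Python's `hashlib`. The manifest shape (file name mapped to hex digest) and the `verify_policies` helper are assumptions made for illustration; how the manifest itself is signed or fetched is outside this sketch.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def verify_policies(root: Path, manifest: dict[str, str]) -> dict[str, str]:
    """Compare each file's SHA-256 digest with its manifest entry."""
    status = {}
    for name, expected in manifest.items():
        path = root / name
        if not path.is_file():
            status[name] = "missing"
        elif sha256_of(path) == expected:
            status[name] = "ok"
        else:
            status[name] = "modified"
    return status
```

The check is only as trustworthy as the manifest, so the manifest has to live somewhere the code being audited cannot rewrite it.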
code_policy_check – Check code against compliance standards
What it does: Audits code against multi-standard compliance frameworks: OWASP Top 10 (hardcoded secrets, eval usage), SOC 2 (audit logging, access controls), ISO 27001 (data handling), HIPAA (PII handling), PCI-DSS. Also enforces org-specific style guides. Returns violations with severity (CRITICAL/HIGH/MEDIUM/LOW) and remediation guidance.
When agents use it: Pre-commit checks, CI/CD gates, compliance reporting. Answer: "Does this code violate our security/compliance standards?"
Enterprise adds: Custom OPA (Open Policy Agent) rules and compliance reporting for organization-specific policies.
Start Using Code Scalpel
All 22 tools included free. Install in 30 seconds.