Known Limitations & Edge Cases
We believe in radical transparency. Every tool has trade-offs, and we document ours upfront so you can make informed decisions.
🎯 Why This Page Exists
Code Scalpel is built for precision, not magic. We document limitations because we believe users deserve to know exactly what our tools can and cannot do. This page represents our commitment to engineering honesty over marketing hype.
Tool-Specific Limitations
"Middleware Blindness"
Global middleware sanitizers (e.g., Express body parsers, Flask request validators) are invisible to the taint tracker unless explicitly registered in your configuration.
Why: The tool analyzes call graphs from entry points. If sanitization happens in framework middleware that isn't invoked in the analyzed code path, the data remains tainted when it reaches your controller.
Workaround: Register middleware sanitizers in SANITIZER_PATTERNS config, or ensure sanitizer calls appear in the analyzed call graph.
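As a sketch of the workaround, middleware sanitizers can be declared in SANITIZER_PATTERNS. The entry format below is illustrative only — check your Code Scalpel version for the exact schema — and the module/function names are hypothetical:

```python
# Hypothetical sketch: the exact SANITIZER_PATTERNS schema may differ
# in your Code Scalpel version; treat these entries as examples only.
SANITIZER_PATTERNS = [
    # Express-style body parser that runs as framework middleware and
    # never appears in the analyzed controller's call graph:
    {"module": "express", "function": "json"},
    # A project-local Flask request validator applied via decorator:
    {"module": "myapp.middleware", "function": "validate_request"},
]
```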
"Duck Typing Blindness"
When the tool analyzes an x.save() call, it does not link the call to every save() method in your codebase; this restraint prevents graph explosion.
Why: In Python's duck-typed world, linking x.save() to all classes with a save() method would create thousands of false edges. We prefer "missing a link" over "creating 1,000 false links."
Behavior: With advanced_resolution=True, we track simple type assignments. Without type information, the call remains unlinked.
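The two cases can be illustrated with a small sketch of code under analysis (the class and function names are invented for the example):

```python
class User:
    def save(self):
        return "user saved"

def resolvable():
    x = User()       # simple assignment: the type of x is locally known,
    return x.save()  # so advanced_resolution=True can link this to User.save

def unresolvable(x):
    # x arrives untyped; linking x.save() to every save() in the codebase
    # would create false edges, so this call stays unlinked
    return x.save()
```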
"Stale Ghosts"
Incremental crawl mode does not detect file deletions. Deleted files vanish from results but may linger in the cache until a full re-crawl.
Why: Incremental mode optimizes for speed by only re-scanning changed files. Deletion detection would require full directory traversal, negating the performance benefit.
Workaround: Periodically run full crawls (without incremental_crawl) to clean up stale cache entries.
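To see why deletion detection is expensive, the check amounts to diffing the cache against a full directory walk — exactly the traversal that incremental mode skips. A minimal sketch (the helper name is hypothetical, not part of the tool):

```python
from pathlib import Path

def stale_cache_entries(cached_paths, project_root):
    """Return cached paths whose files no longer exist on disk.

    Hypothetical helper: finding deletions requires walking the whole
    tree, which is precisely the cost incremental_crawl avoids.
    """
    on_disk = {str(p) for p in Path(project_root).rglob("*") if p.is_file()}
    return sorted(set(cached_paths) - on_disk)
```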
"Unmockable Logic"
Generated tests for code relying on C-extensions, hardware, or complex external state (databases) will likely fail at runtime because the tool does not auto-mock these dependencies.
Why: Symbolic execution operates on pure Python AST. It cannot model numpy arrays, torch tensors, or hardware I/O.
Behavior: Tests will contain concrete input values but call the actual function—no mocking is inserted.
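In practice this means adding mocks by hand after generation. A sketch of the pattern using unittest.mock (the function under test is invented for the example):

```python
from unittest import mock

def fetch_user_count(db):
    # Imagine this hits a real database; a generated test would call it
    # directly with a concrete db object and fail at runtime.
    return db.execute("SELECT COUNT(*) FROM users").scalar()

def test_fetch_user_count_with_manual_mock():
    # The mock you add by hand -- the generator will not insert this:
    db = mock.Mock()
    db.execute.return_value.scalar.return_value = 42
    assert fetch_user_count(db) == 42
```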
Impact: Medium

Security & Safety Edge Cases
Dead Code Vulnerabilities
Vulnerabilities in never-called functions are flagged with the same severity as reachable code.
Why: Static analysis cannot guarantee a function is truly "dead" (reflection, dynamic imports, future changes). We prioritize completeness over noise reduction.
Use Case: Useful for legacy code audits, but may create noise in actively maintained codebases.
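A short example of why "dead" is hard to prove statically — no call site references the function below, yet reflection still reaches it at runtime (names are invented for the example):

```python
import sys

def seemingly_dead():
    # No static call site anywhere references this function ...
    return "reached"

def dispatch(name):
    # ... yet reflection can invoke it at runtime, which is why the
    # analyzer refuses to down-rank vulnerabilities in "dead" code.
    return getattr(sys.modules[__name__], name)()
```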
Impact: Low

Fail-Closed on API Timeouts
If the OSV API is unreachable (timeout/500), the tool returns success=False with an error message.
Why: In security tools, a network error is not a clean bill of health. We fail-closed to prevent false confidence.
Note: Requires outbound connectivity to OSV API. For air-gapped environments, consider local vulnerability databases.
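The fail-closed pattern itself is simple: any network error becomes a scan failure, never an empty (clean-looking) result. A hedged sketch, not the tool's actual implementation — query_osv here is a caller-supplied callable that raises on timeout or HTTP 500:

```python
def scan_with_fail_closed(query_osv):
    """Fail-closed wrapper sketch: a network error is reported as a
    failed scan, never as 'no vulnerabilities found'."""
    try:
        vulns = query_osv()  # hypothetical callable; raises on timeout/500
    except Exception as exc:
        return {"success": False, "error": f"OSV API unreachable: {exc}"}
    return {"success": True, "vulnerabilities": vulns}
```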
Impact: High

Generics with any
Cannot detect any types hidden inside generics (e.g., List<any>, Promise<any>).
Why: The detector inspects as assertions and call patterns but does not traverse generic type arguments.
Workaround: Manually review generic type declarations, or use TypeScript's noImplicitAny in strict mode.
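The missing step is a recursive walk of generic type arguments. As an illustration (shown here as a Python analog of the TypeScript case, since this page's examples use Python), a traversal that does catch nested Any might look like:

```python
import typing

def contains_any(tp):
    """Recursively check whether a type annotation hides Any inside
    generic arguments, e.g. List[Any] or Dict[str, List[Any]].

    Illustrative sketch of the traversal the detector currently skips;
    the shipped detector stops at the top level.
    """
    if tp is typing.Any:
        return True
    return any(contains_any(arg) for arg in typing.get_args(tp))
```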
Simple Obfuscation Evasion
String splitting like "ex" + "ec" instead of "exec" evades detection.
Why: Matching is literal (AST call names for Python, substring search for others). Deobfuscation requires symbolic execution, which is not applied at this layer.
Note: Python comments are ignored (AST-based), but JavaScript/TypeScript comments may trigger false positives.
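A minimal stand-in shows the evasion: literal AST call-name matching flags a direct exec(...) call, but the same call reached through string concatenation is invisible to it (detector below is a toy, not the shipped one):

```python
import ast

def flags_exec(source):
    """Toy version of literal AST call-name matching: flags only a
    direct call to a name spelled exactly 'exec'."""
    tree = ast.parse(source)
    return any(
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "exec"
        for node in ast.walk(tree)
    )
```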
Impact: Low

Performance Boundaries
Hanging on Unresponsive Mounts
Can hang indefinitely on unresponsive NFS/SMB mounts (no timeout mechanism).
Why: The tool calls Path.exists() synchronously. If the OS-level stat call hangs, the tool waits indefinitely.
Workaround: Ensure mount points are responsive before validation, or use OS-level timeouts (mount -o soft,timeo=5 for NFS).
10 Iteration Loop Limit
Hard limit of 10 loop iterations. Paths requiring >10 iterations are not fully explored.
Why: Symbolic execution can generate infinite paths for unbounded loops. The "fuel" limit prevents hanging on while True: constructs.
Configurable: max_loop_iterations can be adjusted, but higher values increase execution time exponentially.
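A toy model of the "fuel" mechanism makes the behavior concrete — the loop is unrolled at most max_loop_iterations times, then exploration gives up on deeper paths (this sketch is illustrative, not the tool's symbolic executor):

```python
def explore_loop(condition, step, state, max_loop_iterations=10):
    """Unroll a loop at most max_loop_iterations times.

    Returns (final_state, fully_explored): fully_explored is False when
    the fuel runs out before the loop condition becomes false.
    """
    for _ in range(max_loop_iterations):
        if not condition(state):
            return state, True          # loop exited within the budget
        state = step(state)
    return state, not condition(state)  # fuel exhausted
```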
File Size Limits
Enforces tier-based byte limits with a hard cap of 110MB per file.
Why: Parsing 10MB single-line minified files can cause parser slowdowns or memory issues.
Behavior: Files exceeding the limit return an error before parsing begins. No truncation or chunking.
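The guard amounts to a size check before any parsing starts. A hedged sketch of the pattern (function name and result shape are illustrative, not the tool's API):

```python
import os

HARD_CAP_BYTES = 110 * 1024 * 1024  # hard cap from the docs; tier limits may be lower

def check_file_size(path, limit=HARD_CAP_BYTES):
    """Pre-parse guard sketch: reject oversized files outright rather
    than truncating or chunking them."""
    size = os.path.getsize(path)
    if size > limit:
        return {"success": False,
                "error": f"{path} is {size} bytes (limit {limit})"}
    return {"success": True}
```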
Impact: Low

Design Trade-Offs
Lexicographic File Selection (Pro Tier Limits)
When Pro tier file limits are exceeded (e.g., 1,000 files in get_project_map), files are selected lexicographically (alphabetically by path), not by "importance."
Why: Deterministic behavior is more valuable than heuristic "importance" scoring. Alphabetical selection is predictable and reproducible.
Result: In a monorepo, services/auth/ will be included before services/payments/.
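The selection rule is effectively a sort-and-truncate (shown here as a sketch of the behavior, not the tool's internal code):

```python
def select_files(paths, limit=1000):
    """Deterministic Pro-tier truncation: sort paths lexicographically
    and keep the first `limit`; 'importance' is never considered."""
    return sorted(paths)[:limit]
```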
No Behavioral Equivalence Checking
simulate_refactor checks syntax, structure, and security—but not behavioral equivalence.
Example: Refactoring uuid.uuid4() to uuid.uuid1() returns SAFE because no syntax/security issues are detected, even though output behavior differs.
Why: Behavioral equivalence requires test execution or full symbolic execution, which is out of scope for a fast simulation tool.
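The uuid example is easy to reproduce: the two variants are structurally near-identical, so the simulation sees no syntax or security issue, yet uuid1 embeds the host's timestamp and node identifier while uuid4 is random:

```python
import uuid

def make_token_before():
    return uuid.uuid4()  # random UUID

def make_token_after():
    # Refactored call: parses fine, passes security checks, but the
    # output is derived from timestamp + node ID -- different behavior.
    return uuid.uuid1()
```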
Impact: Medium

What We're Working On
🚧 Active Improvements
rename_symbol Keyword Validation: Moving identifier validation to the top of the function to prevent partial application on invalid names (e.g., renaming to reserved keywords like yield).
scan_dependencies Fail-Closed: Ensuring API timeouts return success=False instead of success=True with a warning note.
Questions about a specific limitation? Ask in our GitHub Discussions
Last Updated: January 2026