cross_file_security_scan

Track how tainted data flows across file boundaries to detect vulnerabilities where the source and sink are in different files.

Quick Reference

cross_file_security_scan(
    project_root: str | None = None,        # Project directory (default: cwd)
    entry_points: list[str] | None = None,  # Starting functions ("file:function")
    max_depth: int = 5,                 # Max traversal depth
    include_diagram: bool = True,       # Include Mermaid diagram
    confidence_threshold: float = 0.7,  # Minimum confidence
    timeout_seconds: int = 120,         # Analysis timeout
    max_modules: int = 500              # Max modules to analyze
) -> CrossFileSecurityResult

User Stories

Persona	Story	Tool Value
🛡️ Marcus (Security Engineer)	"Track taint flow across multiple files (user input → SQL query)"	Multi-file vulnerability detection
🏢 Jennifer (Enterprise Architect)	"Scan organization-wide for cross-module vulnerabilities"	Comprehensive security
👥 David (Team Lead)	"Verify no security issues span module boundaries before release"	Risk mitigation

→ See all user stories

Parameters

Parameter	Type	Required	Default	Description
`project_root`	string	No	cwd	Project root directory
`entry_points`	list	No	None	Entry point functions in "file:function" form (e.g., ["routes.py:index"]); when omitted, entry points are detected automatically
`max_depth`	int	No	5	Maximum call depth to trace
`include_diagram`	bool	No	true	Include Mermaid visualization
`confidence_threshold`	float	No	0.7	Minimum confidence for a finding to be reported (0.0 to 1.0)
`timeout_seconds`	int	No	120	Maximum analysis time
`max_modules`	int	No	500	Maximum modules to analyze
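If you assemble the MCP arguments in code, a small client-side check can catch out-of-range values before the call. The sketch below mirrors the defaults in the Parameters table above and only sends parameters that differ from them. The helper `build_scan_args` is hypothetical, and the tool's own validation may be stricter or looser than this.

```python
from typing import Any, Optional

# Defaults as listed in the Parameters table.
DEFAULTS: dict[str, Any] = {
    "max_depth": 5,
    "include_diagram": True,
    "confidence_threshold": 0.7,
    "timeout_seconds": 120,
    "max_modules": 500,
}


def build_scan_args(
    project_root: Optional[str] = None,
    entry_points: Optional[list[str]] = None,
    **overrides: Any,
) -> dict[str, Any]:
    """Hypothetical helper: build a cross_file_security_scan argument dict."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    threshold = overrides.get("confidence_threshold", DEFAULTS["confidence_threshold"])
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("confidence_threshold must be between 0.0 and 1.0")
    for name in ("max_depth", "timeout_seconds", "max_modules"):
        if overrides.get(name, DEFAULTS[name]) < 1:
            raise ValueError(f"{name} must be a positive integer")

    # Entry points use the "file:function" form, e.g. "routes.py:index".
    for ep in entry_points or []:
        file_part, sep, func = ep.rpartition(":")
        if not sep or not file_part or not func:
            raise ValueError(f"entry point {ep!r} is not in file:function form")

    args: dict[str, Any] = {}
    if project_root is not None:
        args["project_root"] = project_root  # omitted -> tool uses cwd
    if entry_points:
        args["entry_points"] = list(entry_points)
    # Only send values that differ from the documented defaults.
    for name, value in overrides.items():
        if value != DEFAULTS[name]:
            args[name] = value
    return args
```

For example, `build_scan_args("/project/src", confidence_threshold=0.5, max_depth=10)` produces the same arguments as the audit example further down.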

Response Schema

{
  "data": {
    "vulnerabilities": [
      {
        "type": "string",
        "severity": "string",
        "cwe": "string",
        "source_file": "string",
        "source_line": "integer",
        "source_function": "string",
        "sink_file": "string",
        "sink_line": "integer",
        "sink_function": "string",
        "cross_file_flow": [
          {
            "file": "string",
            "line": "integer",
            "code": "string",
            "taint_state": "string"
          }
        ],
        "confidence": "float",
        "remediation": "string"
      }
    ],
    "taint_entry_points": [
      {
        "file": "string",
        "function": "string",
        "source": "string",
        "line": "integer"
      }
    ],
    "dangerous_sinks": [
      {
        "file": "string",
        "function": "string",
        "sink_type": "string",
        "line": "integer"
      }
    ],
    "summary": {
      "critical": "integer",
      "high": "integer",
      "medium": "integer",
      "low": "integer",
      "files_analyzed": "integer",
      "functions_traced": "integer"
    },
    "mermaid_diagram": "string"
  },
  "tier_applied": "string",
  "duration_ms": "integer"
}
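Once the response has been parsed from JSON into a Python dict, its fields can be read with ordinary dict access. The sketch below, assuming the field names shown in the schema above, turns each vulnerability into one line listing the files its taint passes through. The function name `describe_findings` is hypothetical.

```python
from typing import Any


def describe_findings(response: dict[str, Any]) -> list[str]:
    """Return one readable line per vulnerability, most confident first."""
    data = response.get("data") or {}  # data is null on error responses
    vulns = data.get("vulnerabilities", [])
    lines: list[str] = []
    for v in sorted(vulns, key=lambda item: item.get("confidence", 0.0), reverse=True):
        # Collapse consecutive flow steps in the same file into one hop.
        files: list[str] = []
        for step in v.get("cross_file_flow", []):
            if not files or files[-1] != step["file"]:
                files.append(step["file"])
        lines.append(
            f"[{v.get('severity', '?')}] {v.get('type', '?')} "
            f"({v.get('cwe', 'no CWE')}, confidence {v.get('confidence', 0.0):.2f}): "
            + " -> ".join(files)
        )
    return lines
```

Applied to the Full Project Scan response below, this yields a single line: `[CRITICAL] SQL_INJECTION (CWE-89, confidence 0.95): routes.py -> db.py`.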

Examples

Full Project Scan

AI Prompt:

Run a cross-file security scan on the entire src directory

MCP Tool Call:

{
  "project_root": "/project/src"
}

CLI Command:

codescalpel cross-file-security-scan src/

Response:

{
  "data": {
    "vulnerabilities": [
      {
        "type": "SQL_INJECTION",
        "severity": "CRITICAL",
        "cwe": "CWE-89",
        "source_file": "routes.py",
        "source_line": 25,
        "source_function": "search",
        "sink_file": "db.py",
        "sink_line": 42,
        "sink_function": "execute_query",
        "cross_file_flow": [
          {
            "file": "routes.py",
            "line": 25,
            "code": "query = request.args.get('q')",
            "taint_state": "TAINTED (user input)"
          },
          {
            "file": "routes.py",
            "line": 26,
            "code": "results = db.search(query)",
            "taint_state": "PASSED to db.search"
          },
          {
            "file": "db.py",
            "line": 35,
            "code": "def search(term):",
            "taint_state": "RECEIVED as parameter"
          },
          {
            "file": "db.py",
            "line": 42,
            "code": "cursor.execute(f\"SELECT * FROM items WHERE name LIKE '%{term}%'\")",
            "taint_state": "SINK (SQL execution)"
          }
        ],
        "confidence": 0.95,
        "remediation": "Use parameterized queries in db.search()"
      }
    ],
    "taint_entry_points": [
      {
        "file": "routes.py",
        "function": "search",
        "source": "request.args.get('q')",
        "line": 25
      },
      {
        "file": "api.py",
        "function": "handle_upload",
        "source": "request.files",
        "line": 45
      }
    ],
    "dangerous_sinks": [
      {
        "file": "db.py",
        "function": "execute_query",
        "sink_type": "SQL_EXECUTION",
        "line": 42
      },
      {
        "file": "utils.py",
        "function": "run_command",
        "sink_type": "COMMAND_EXECUTION",
        "line": 78
      }
    ],
    "summary": {
      "critical": 1,
      "high": 0,
      "medium": 0,
      "low": 0,
      "files_analyzed": 12,
      "functions_traced": 45
    },
    "mermaid_diagram": "graph TD\n    subgraph routes.py\n        A[search: request.args.get]\n    end\n    subgraph db.py\n        B[execute_query: cursor.execute]\n    end\n    A -->|tainted| B"
  },
  "tier_applied": "pro",
  "duration_ms": 2450
}

Scan from Specific Entry Points

AI Prompt:

Analyze security starting from the API endpoints in routes.py

MCP Tool Call:

{
  "project_root": "/project/src",
  "entry_points": [
    "routes.py:handle_login",
    "routes.py:handle_search",
    "routes.py:handle_upload"
  ],
  "max_depth": 8
}

CLI Command:

codescalpel cross-file-security-scan src/ \
  --entry-points "routes.py:handle_login,routes.py:handle_search,routes.py:handle_upload" \
  --max-depth 8

Response:

{
  "data": {
    "vulnerabilities": [
      {
        "type": "COMMAND_INJECTION",
        "severity": "CRITICAL",
        "source_file": "routes.py",
        "source_function": "handle_upload",
        "sink_file": "processor.py",
        "sink_function": "process_file",
        "cross_file_flow": [
          {"file": "routes.py", "line": 45, "code": "filename = request.files['file'].filename"},
          {"file": "routes.py", "line": 48, "code": "processor.process_file(filename)"},
          {"file": "processor.py", "line": 22, "code": "os.system(f'convert {filename} output.pdf')"}
        ],
        "confidence": 0.92
      }
    ],
    "summary": {
      "critical": 1,
      "files_analyzed": 5,
      "functions_traced": 18
    }
  },
  "tier_applied": "pro",
  "duration_ms": 1250
}
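The MCP call takes `entry_points` as a JSON list, while the CLI takes the same values as one comma-separated `--entry-points` string. A small sketch for converting between the two and building the CLI argument list shown above, assuming entry-point names never contain commas (all three function names are hypothetical):

```python
def to_cli_entry_points(entry_points: list[str]) -> str:
    """Join MCP-style entry points into one --entry-points value."""
    for ep in entry_points:
        if "," in ep:
            raise ValueError(f"entry point {ep!r} contains a comma")
    return ",".join(entry_points)


def from_cli_entry_points(value: str) -> list[str]:
    """Split an --entry-points value back into a list, ignoring blanks."""
    return [part.strip() for part in value.split(",") if part.strip()]


def cli_argv(project_dir: str, entry_points: list[str], max_depth: int) -> list[str]:
    """Build the argument list for the CLI form of the call above."""
    return [
        "codescalpel", "cross-file-security-scan", project_dir,
        "--entry-points", to_cli_entry_points(entry_points),
        "--max-depth", str(max_depth),
    ]
```

Passing the list from `cli_argv` to `subprocess.run` avoids shell quoting issues with the comma-separated value.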

With Lower Confidence for Audit

AI Prompt:

Do a comprehensive security audit with a low confidence threshold

MCP Tool Call:

{
  "project_root": "/project/src",
  "confidence_threshold": 0.5,
  "max_depth": 10
}

CLI Command:

codescalpel cross-file-security-scan src/ \
  --confidence-threshold 0.5 \
  --max-depth 10
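A lower threshold returns more speculative findings. One way to triage the larger result set, sketched below under the assumption that the default threshold of 0.7 is a reasonable line between "fix now" and "review manually", is to bucket findings by confidence. The function name `triage` and the bucket names are illustrative.

```python
from typing import Any


def triage(vulns: list[dict[str, Any]], cutoff: float = 0.7) -> dict[str, list[str]]:
    """Split findings into confident ones and ones that need manual review."""
    buckets: dict[str, list[str]] = {"confirmed": [], "review": []}
    for v in vulns:
        key = "confirmed" if v.get("confidence", 0.0) >= cutoff else "review"
        buckets[key].append(f"{v.get('type', '?')} in {v.get('sink_file', '?')}")
    return buckets
```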

Cross-File Vulnerability Patterns

Pattern 1: SQL Injection Across Files

routes.py:search() → services.py:find_items() → db.py:execute()
         ↑ user input                                    ↑ SQL sink

Pattern 2: Command Injection via Processing

api.py:upload() → processor.py:convert() → utils.py:run_cmd()
       ↑ filename                              ↑ os.system

Pattern 3: Path Traversal via Storage

handlers.py:download() → storage.py:get_file() → open()
            ↑ path parameter                      ↑ file open
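All three patterns share one shape: a source in one file reaches a sink in another file through a chain of calls. A deliberately simplified model of that search (not the tool's actual algorithm) treats each function as a node named "file:function", marks sources and sinks, and walks the call graph breadth-first up to `max_depth` calls. The graph is hand-written to match Pattern 1; a real analyzer derives it from the code and also checks that the tainted value is actually what gets passed along.

```python
from collections import deque

# Hypothetical call graph for Pattern 1: caller -> callees that receive its data.
CALLS: dict[str, list[str]] = {
    "routes.py:search": ["services.py:find_items"],
    "services.py:find_items": ["db.py:execute"],
    "db.py:execute": [],
}
SOURCES = {"routes.py:search"}  # reads user input
SINKS = {"db.py:execute"}       # runs SQL


def find_cross_file_flows(
    calls: dict[str, list[str]], sources: set[str], sinks: set[str], max_depth: int
) -> list[list[str]]:
    """Return source-to-sink paths of at most max_depth calls that span 2+ files."""
    flows: list[list[str]] = []
    for source in sorted(sources):
        queue = deque([[source]])
        while queue:
            path = queue.popleft()
            node = path[-1]
            files = {n.rpartition(":")[0] for n in path}
            if node in sinks and len(files) > 1:
                flows.append(path)
            if len(path) - 1 >= max_depth:
                continue  # depth budget exhausted
            for callee in calls.get(node, []):
                if callee not in path:  # avoid cycles
                    queue.append(path + [callee])
    return flows
```

With `max_depth=2` the Pattern 1 chain is found; with `max_depth=1` it is not, which illustrates why a shallow depth limit can miss longer cross-file chains.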

Mermaid Diagram

When include_diagram is true, the response's mermaid_diagram field contains a Mermaid diagram of the taint flow across files, for example:

graph TD
    subgraph "routes.py"
        A[handle_request<br/>request.args.get]
    end

    subgraph "services.py"
        B[process_data<br/>passes through]
    end

    subgraph "db.py"
        C[execute_query<br/>cursor.execute]
    end

    A -->|tainted| B
    B -->|tainted| C

    style A fill:#ff6b6b
    style C fill:#ffd93d
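A diagram like this can be derived from a vulnerability's `cross_file_flow` list: one subgraph per file, one node per step, and a `tainted` edge between consecutive steps. The sketch below is an illustration only; the function name `flow_to_mermaid` is hypothetical, and the tool's own generator may group and label nodes differently. Nodes are labeled by line number to avoid escaping quotes from the `code` field.

```python
def flow_to_mermaid(flow: list[dict]) -> str:
    """Render a cross_file_flow list as Mermaid flowchart text."""
    lines = ["graph TD"]
    node_ids = [f"N{i}" for i in range(len(flow))]
    # Group step indexes by file, preserving first-seen file order.
    by_file: dict[str, list[int]] = {}
    for i, step in enumerate(flow):
        by_file.setdefault(step["file"], []).append(i)
    for file, indexes in by_file.items():
        lines.append(f'    subgraph "{file}"')
        for i in indexes:
            lines.append(f'        {node_ids[i]}["line {flow[i]["line"]}"]')
        lines.append("    end")
    # Chain consecutive steps with tainted edges.
    for a, b in zip(node_ids, node_ids[1:]):
        lines.append(f"    {a} -->|tainted| {b}")
    return "\n".join(lines)
```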

Tier Differences

This tool is available at all tiers. What differs are the limits and capabilities:

Feature	Community	Pro	Enterprise
Availability	✅ Available	✅ Available	✅ Available
Cross-file analysis	✅ Basic	✅ Advanced	✅ Advanced
Max depth	3	10	Unlimited
Max modules	10	200	Unlimited
Timeout	60 seconds	180 seconds	600 seconds
Confidence threshold	0.7	0.7	0.5 (adjustable)
Custom entry points	Not available	✅	✅
Mermaid diagrams	Not available	✅	✅
SARIF export	Not available	✅	✅
Compliance reporting	Not available	Not available	✅
Custom sinks	Not available	Not available	✅

Community Tier

Cross-file security scanning is available in the Community tier with basic taint tracking (max depth 3, at most 10 modules). For deeper analysis and visualization, consider upgrading to Pro or Enterprise.

Error Handling

Timeout

{
  "data": {
    "partial_results": true,
    "vulnerabilities": [...],
    "warning": "Analysis timed out after 120s. Results may be incomplete."
  },
  "error": null
}

Too Many Modules

{
  "data": null,
  "error": {
    "code": "MODULE_LIMIT_EXCEEDED",
    "message": "Project has 750 modules, exceeds limit of 500",
    "suggestion": "Use entry_points to focus analysis or upgrade tier"
  }
}
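Client code has to distinguish the two shapes above: a timeout still returns `data` (flagged with `partial_results`), while a module-limit failure returns `data: null` plus an `error` object. A minimal sketch, where `ScanError` and `unwrap_scan_result` are hypothetical names:

```python
from typing import Any


class ScanError(Exception):
    """Raised when the scan returns an error object instead of data."""

    def __init__(self, code: str, message: str, suggestion: str = ""):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.suggestion = suggestion


def unwrap_scan_result(response: dict[str, Any]) -> tuple[list[dict], bool]:
    """Return (vulnerabilities, is_partial), or raise ScanError on failure."""
    error = response.get("error")
    if error:
        raise ScanError(
            error.get("code", "UNKNOWN"),
            error.get("message", ""),
            error.get("suggestion", ""),
        )
    data = response.get("data") or {}
    partial = bool(data.get("partial_results", False))
    return data.get("vulnerabilities", []), partial
```

On `MODULE_LIMIT_EXCEEDED`, a caller might follow the returned suggestion and retry with `entry_points` set; on partial results, it might rerun with a larger `timeout_seconds`.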

Tier Limits

cross_file_security_scan capabilities vary by tier:

Feature	Community	Pro	Enterprise
Max modules analyzed	10	100	Unlimited
Max depth	3	10	Unlimited
Taint tracking	✅ Basic	✅ Advanced	✅ Full
Mermaid diagrams	✅	✅	✅ Enhanced
Confidence scoring	✅	✅	✅
Entry point detection	✅	✅	✅
Timeout protection	120s	120s	Unlimited
Progress reporting	❌	✅	✅
Custom sources/sinks	❌	❌	✅

Community Tier

✅ Track taint flow across file boundaries
✅ Detect cross-file SQL injection, XSS, command injection
✅ Entry point detection (routes, handlers)
✅ Basic taint tracking
✅ Confidence scoring for vulnerabilities
⚠️ Limited to 10 modules - Small projects only
⚠️ Max depth of 3 - Shallow call chains
❌ No progress reporting for long scans
❌ No custom taint sources/sinks

Pro Tier

✅ All Community features
✅ 100 modules analyzed - Handle larger codebases
✅ Max depth of 10 - Deeper call chain analysis
✅ Advanced taint tracking - Better precision
✅ Progress reporting - Know analysis status
✅ Enhanced Mermaid diagrams - Better visualization
✅ Framework-aware - Understand Django/Flask/Express patterns

Enterprise Tier

✅ All Pro features
✅ Unlimited modules - No project size restrictions
✅ Unlimited depth - Complete call chain analysis
✅ Full taint tracking - Highest precision
✅ Custom sources/sinks - Organization-specific rules
✅ Multi-repository analysis - Scan across repos
✅ No timeout limits - Complete analysis guaranteed

Key Difference: Module Coverage and Depth
- Community: 10 modules, depth 3 (small apps, quick checks)
- Pro: 100 modules, depth 10 (production apps, thorough scans)
- Enterprise: unlimited modules and depth (complete security analysis)

→ See tier comparison

Best Practices

Start with entry points - Focus on routes/API handlers
Use for security audits - Find vulnerabilities that single-file analysis misses
Review taint flows - Understand full data path
Check all entry points - Web inputs, file uploads, queue messages
Run before deployment - Catch cross-file issues early
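To act on "Run before deployment", one option is a CI step that fails when the summary reports blocking findings. The sketch below assumes the scan response has already been saved and parsed into a dict (how it is saved depends on how you invoke the tool and is not shown), and the choice of critical and high as blocking severities is an assumed policy. The function name `gate` is hypothetical.

```python
import sys
from typing import Any

# Severities that fail the build; this policy is an assumption, adjust to taste.
BLOCKING = ("critical", "high")


def gate(response: dict[str, Any]) -> int:
    """Return an exit code: 2 if the scan errored, 1 if blocking findings, else 0."""
    if response.get("error"):
        print(f"scan failed: {response['error']}", file=sys.stderr)
        return 2
    summary = (response.get("data") or {}).get("summary", {})
    found = {level: summary.get(level, 0) for level in BLOCKING}
    for level, count in found.items():
        if count:
            print(f"{count} {level} cross-file finding(s)", file=sys.stderr)
    return 1 if any(found.values()) else 0
```

In a CI script, load the saved response with `json.load` and pass `gate(...)` to `sys.exit`. Treating an errored scan as a failure keeps a broken scan from silently passing the build.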

Remediation Strategies

For Cross-File SQL Injection

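# db.py - Use a parameterized query
# Assumes `cursor` is a module-level DB-API cursor (sqlite3 shown; other
# drivers may use a different placeholder, such as %s, instead of ?)
def search(term: str) -> list:
    # Before (vulnerable): user input becomes part of the SQL text
    # cursor.execute(f"SELECT * FROM items WHERE name LIKE '%{term}%'")

    # After (safe): the driver binds term as a value, never as SQL
    cursor.execute(
        "SELECT * FROM items WHERE name LIKE ?",
        (f"%{term}%",),
    )
    return cursor.fetchall()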
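To see why the parameterized form is safe, the self-contained sketch below runs both forms against an in-memory SQLite database. The `items` rows and the payload are made up for the demo. The payload rewrites the WHERE clause of the f-string query so that it matches every row, while the parameterized query treats it as a literal search string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE items (name TEXT)")
cur.executemany("INSERT INTO items VALUES (?)", [("apple",), ("banana",)])

payload = "zzz' OR 'a' LIKE 'a"  # attacker-controlled search term

# Vulnerable: the payload is spliced into the SQL text, adding an OR clause
# ('a' LIKE 'a%') that is true for every row.
unsafe_sql = f"SELECT name FROM items WHERE name LIKE '%{payload}%'"
unsafe_rows = cur.execute(unsafe_sql).fetchall()

# Safe: the payload is bound as a value, so it is only ever a search string.
safe_rows = cur.execute(
    "SELECT name FROM items WHERE name LIKE ?", (f"%{payload}%",)
).fetchall()

print(unsafe_rows)  # both rows
print(safe_rows)    # no rows
```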

For Cross-File Command Injection

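# processor.py - Pass arguments as a list so no shell is involved
import subprocess

def process_file(filename: str) -> None:
    # Before (vulnerable): the shell interprets metacharacters in filename
    # os.system(f"convert {filename} output.pdf")

    # After (safe): filename reaches convert as a single argument
    subprocess.run(["convert", filename, "output.pdf"], check=True)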
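The effect of the list form can be checked without ImageMagick installed. In the sketch below, a tiny Python child process stands in for `convert` and simply prints the arguments it received. The filename containing `; echo INJECTED` arrives as one literal argument instead of starting a second command.

```python
import subprocess
import sys

# Attacker-controlled "filename" containing shell metacharacters.
filename = "photo.png; echo INJECTED"

# List form: no shell is involved, so each list item is exactly one argument.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1:])", filename, "output.pdf"],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # ['photo.png; echo INJECTED', 'output.pdf']
```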

Related Tools

security_scan - Single-file analysis
get_call_graph - Visualize call flow
scan_dependencies - Dependency CVEs