cross_file_security_scan

Track how tainted data flows across file boundaries to detect vulnerabilities where the source and sink are in different files.

Quick Reference

cross_file_security_scan(
    project_root: str = None,           # Project directory
    entry_points: list = None,          # Starting functions
    max_depth: int = 5,                 # Max traversal depth
    include_diagram: bool = True,       # Include Mermaid diagram
    confidence_threshold: float = 0.7,  # Minimum confidence
    timeout_seconds: int = 120,         # Analysis timeout
    max_modules: int = 500              # Max modules to analyze
) -> CrossFileSecurityResult

User Stories

| Persona | Story | Tool Value |
|---|---|---|
| 🛡️ Marcus (Security Engineer) | "Track taint flow across multiple files (user input → SQL query)" | Multi-file vulnerability detection |
| 🏢 Jennifer (Enterprise Architect) | "Scan organization-wide for cross-module vulnerabilities" | Comprehensive security |
| 👥 David (Team Lead) | "Verify no security issues span module boundaries before release" | Risk mitigation |

→ See all user stories

Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| project_root | string | No | cwd | Project root directory |
| entry_points | list | No | None | Entry point functions (e.g., ["routes.py:index"]) |
| max_depth | int | No | 5 | Maximum call depth to trace |
| include_diagram | bool | No | true | Include Mermaid visualization |
| confidence_threshold | float | No | 0.7 | Minimum confidence (0.0-1.0) |
| timeout_seconds | int | No | 120 | Maximum analysis time in seconds |
| max_modules | int | No | 500 | Maximum modules to analyze |
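
The parameter rules above can be checked before a request is sent. The following is a minimal sketch, assuming only the names, defaults, and ranges listed in the table; `build_scan_request` is a hypothetical helper for illustration and is not part of the tool.

```python
from typing import Any, Optional


def build_scan_request(
    project_root: Optional[str] = None,
    entry_points: Optional[list[str]] = None,
    max_depth: int = 5,
    include_diagram: bool = True,
    confidence_threshold: float = 0.7,
    timeout_seconds: int = 120,
    max_modules: int = 500,
) -> dict[str, Any]:
    """Hypothetical helper: validate arguments and return a request dict."""
    if not 0.0 <= confidence_threshold <= 1.0:
        raise ValueError("confidence_threshold must be between 0.0 and 1.0")
    if max_depth < 1 or timeout_seconds < 1 or max_modules < 1:
        raise ValueError("max_depth, timeout_seconds and max_modules must be positive")
    for entry in entry_points or []:
        # Entry points use the "file.py:function" form shown in the table.
        file_part, sep, func_part = entry.partition(":")
        if not (sep and file_part and func_part):
            raise ValueError(f"entry point {entry!r} is not in 'file:function' form")

    request: dict[str, Any] = {
        "max_depth": max_depth,
        "include_diagram": include_diagram,
        "confidence_threshold": confidence_threshold,
        "timeout_seconds": timeout_seconds,
        "max_modules": max_modules,
    }
    # Optional values are left out so the tool can apply its own defaults.
    if project_root is not None:
        request["project_root"] = project_root
    if entry_points:
        request["entry_points"] = list(entry_points)
    return request
```

Validating locally keeps malformed entry points or out-of-range thresholds from consuming part of the analysis timeout.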

Response Schema

{
  "data": {
    "vulnerabilities": [
      {
        "type": "string",
        "severity": "string",
        "cwe": "string",
        "source_file": "string",
        "source_line": "integer",
        "source_function": "string",
        "sink_file": "string",
        "sink_line": "integer",
        "sink_function": "string",
        "cross_file_flow": [
          {
            "file": "string",
            "line": "integer",
            "code": "string",
            "taint_state": "string"
          }
        ],
        "confidence": "float",
        "remediation": "string"
      }
    ],
    "taint_entry_points": [
      {
        "file": "string",
        "function": "string",
        "source": "string",
        "line": "integer"
      }
    ],
    "dangerous_sinks": [
      {
        "file": "string",
        "function": "string",
        "sink_type": "string",
        "line": "integer"
      }
    ],
    "summary": {
      "critical": "integer",
      "high": "integer",
      "medium": "integer",
      "low": "integer",
      "files_analyzed": "integer",
      "functions_traced": "integer"
    },
    "mermaid_diagram": "string"
  },
  "tier_applied": "string",
  "duration_ms": "integer"
}
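
A consumer of this schema will usually want the most serious findings first. The sketch below assumes the response has already been parsed into a Python dict and that `severity` takes the upper-case labels `CRITICAL`, `HIGH`, `MEDIUM`, and `LOW` (only `CRITICAL` is confirmed by the examples); `select_findings` is a hypothetical helper.

```python
from typing import Any

# Assumed severity labels; the documented examples only show "CRITICAL".
SEVERITY_RANK = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}


def select_findings(
    response: dict[str, Any],
    min_severity: str = "HIGH",
    min_confidence: float = 0.7,
) -> list[dict[str, Any]]:
    """Return findings at or above the given severity and confidence."""
    data = response.get("data") or {}
    worst_allowed = SEVERITY_RANK[min_severity]

    def rank(vuln: dict[str, Any]) -> int:
        # Unknown labels sort last instead of raising.
        return SEVERITY_RANK.get(vuln.get("severity", ""), len(SEVERITY_RANK))

    selected = [
        vuln
        for vuln in data.get("vulnerabilities", [])
        if rank(vuln) <= worst_allowed
        and vuln.get("confidence", 0.0) >= min_confidence
    ]
    # Most severe first, then highest confidence first.
    return sorted(selected, key=lambda v: (rank(v), -v.get("confidence", 0.0)))
```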

Examples

Full Project Scan

Run a cross-file security scan on the entire src directory
{
  "project_root": "/project/src"
}
codescalpel cross-file-security-scan src/
{
  "data": {
    "vulnerabilities": [
      {
        "type": "SQL_INJECTION",
        "severity": "CRITICAL",
        "cwe": "CWE-89",
        "source_file": "routes.py",
        "source_line": 25,
        "source_function": "search",
        "sink_file": "db.py",
        "sink_line": 42,
        "sink_function": "execute_query",
        "cross_file_flow": [
          {
            "file": "routes.py",
            "line": 25,
            "code": "query = request.args.get('q')",
            "taint_state": "TAINTED (user input)"
          },
          {
            "file": "routes.py",
            "line": 26,
            "code": "results = db.search(query)",
            "taint_state": "PASSED to db.search"
          },
          {
            "file": "db.py",
            "line": 35,
            "code": "def search(term):",
            "taint_state": "RECEIVED as parameter"
          },
          {
            "file": "db.py",
            "line": 42,
            "code": "cursor.execute(f\"SELECT * FROM items WHERE name LIKE '%{term}%'\")",
            "taint_state": "SINK (SQL execution)"
          }
        ],
        "confidence": 0.95,
        "remediation": "Use parameterized queries in db.search()"
      }
    ],
    "taint_entry_points": [
      {
        "file": "routes.py",
        "function": "search",
        "source": "request.args.get('q')",
        "line": 25
      },
      {
        "file": "api.py",
        "function": "handle_upload",
        "source": "request.files",
        "line": 45
      }
    ],
    "dangerous_sinks": [
      {
        "file": "db.py",
        "function": "execute_query",
        "sink_type": "SQL_EXECUTION",
        "line": 42
      },
      {
        "file": "utils.py",
        "function": "run_command",
        "sink_type": "COMMAND_EXECUTION",
        "line": 78
      }
    ],
    "summary": {
      "critical": 1,
      "high": 0,
      "medium": 0,
      "low": 0,
      "files_analyzed": 12,
      "functions_traced": 45
    },
    "mermaid_diagram": "graph TD\n    subgraph routes.py\n        A[search: request.args.get]\n    end\n    subgraph db.py\n        B[execute_query: cursor.execute]\n    end\n    A -->|tainted| B"
  },
  "tier_applied": "pro",
  "duration_ms": 2450
}
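
The `cross_file_flow` list in a response like this one reads more easily as a step-by-step trace. A minimal sketch, assuming the finding is already a Python dict; `format_flow` is a hypothetical helper, and it tolerates missing `cwe` and `taint_state` fields.

```python
from typing import Any


def format_flow(vuln: dict[str, Any]) -> str:
    """Render one finding as a header line plus one line per flow step."""
    header = (
        f'{vuln["type"]} ({vuln.get("cwe", "no CWE")}): '
        f'{vuln["source_file"]} -> {vuln["sink_file"]}'
    )
    lines = [header]
    for step in vuln.get("cross_file_flow", []):
        line = f'  {step["file"]}:{step["line"]}  {step["code"]}'
        state = step.get("taint_state")
        if state:
            # taint_state is optional in some responses.
            line += f"  [{state}]"
        lines.append(line)
    return "\n".join(lines)
```

Printing `format_flow(v)` for each entry in `data["vulnerabilities"]` gives a reviewer the full source-to-sink path without opening the JSON.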

Scan from Specific Entry Points

Analyze security starting from the API endpoints in routes.py
{
  "project_root": "/project/src",
  "entry_points": [
    "routes.py:handle_login",
    "routes.py:handle_search",
    "routes.py:handle_upload"
  ],
  "max_depth": 8
}
codescalpel cross-file-security-scan src/ \
  --entry-points "routes.py:handle_login,routes.py:handle_search,routes.py:handle_upload" \
  --max-depth 8
{
  "data": {
    "vulnerabilities": [
      {
        "type": "COMMAND_INJECTION",
        "severity": "CRITICAL",
        "source_file": "routes.py",
        "source_function": "handle_upload",
        "sink_file": "processor.py",
        "sink_function": "process_file",
        "cross_file_flow": [
          {"file": "routes.py", "line": 45, "code": "filename = request.files['file'].filename"},
          {"file": "routes.py", "line": 48, "code": "processor.process_file(filename)"},
          {"file": "processor.py", "line": 22, "code": "os.system(f'convert {filename} output.pdf')"}
        ],
        "confidence": 0.92
      }
    ],
    "summary": {
      "critical": 1,
      "files_analyzed": 5,
      "functions_traced": 18
    }
  },
  "tier_applied": "pro",
  "duration_ms": 1250
}

With Lower Confidence for Audit

Do a comprehensive security audit with a low confidence threshold
{
  "project_root": "/project/src",
  "confidence_threshold": 0.5,
  "max_depth": 10
}
codescalpel cross-file-security-scan src/ \
  --confidence-threshold 0.5 \
  --max-depth 10

Cross-File Vulnerability Patterns

Pattern 1: SQL Injection Across Files

routes.py:search() → services.py:find_items() → db.py:execute()
         ↑ user input                                    ↑ SQL sink

Pattern 2: Command Injection via Processing

api.py:upload() → processor.py:convert() → utils.py:run_cmd()
       ↑ filename                              ↑ os.system

Pattern 3: Path Traversal via Storage

handlers.py:download() → storage.py:get_file() → open()
            ↑ path parameter                      ↑ file open
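
Each pattern above is a path through the call graph from a taint source to a sink in another file. The sketch below illustrates that idea with a breadth-first search bounded by `max_depth`. It is a toy model under stated assumptions, not the tool's actual algorithm: the call graph, `SOURCES`, `SINKS`, and both functions are hypothetical, and real analysis also has to track which arguments carry the taint.

```python
from collections import deque

# Hypothetical call graph: "file:function" -> callees that receive tainted data.
CALL_GRAPH = {
    "routes.py:search": ["services.py:find_items"],
    "services.py:find_items": ["db.py:execute"],
    "db.py:execute": [],
}
SOURCES = {"routes.py:search"}  # functions that read user input
SINKS = {"db.py:execute"}       # functions that reach a dangerous operation


def file_of(node: str) -> str:
    """Return the file part of a "file:function" node name."""
    return node.split(":", 1)[0]


def find_cross_file_flows(call_graph, sources, sinks, max_depth=5):
    """Return call paths from a source to a sink located in a different file."""
    flows = []
    for source in sorted(sources):
        queue = deque([[source]])
        while queue:
            path = queue.popleft()
            node = path[-1]
            if node in sinks and file_of(node) != file_of(source):
                flows.append(path)
                continue
            # Depth counts calls followed so far, so stop expanding at the limit.
            if len(path) - 1 >= max_depth:
                continue
            for callee in call_graph.get(node, []):
                if callee not in path:  # skip recursive cycles
                    queue.append(path + [callee])
    return flows
```

With the default depth the search finds the three-function chain from Pattern 1. With `max_depth=1` it finds nothing, which shows how a shallow depth limit can miss a flow that passes through an intermediate service layer.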

Mermaid Diagram

The tool generates visualizations showing taint flow:

graph TD
    subgraph "routes.py"
        A[handle_request<br/>request.args.get]
    end

    subgraph "services.py"
        B[process_data<br/>passes through]
    end

    subgraph "db.py"
        C[execute_query<br/>cursor.execute]
    end

    A -->|tainted| B
    B -->|tainted| C

    style A fill:#ff6b6b
    style C fill:#ffd93d
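
A diagram of this shape can also be rebuilt from a flow when a response omits `mermaid_diagram`. This is a minimal sketch assuming an ordered list of (file, label) pairs along the taint path; `flow_to_mermaid` is a hypothetical helper, and it does not escape labels that contain square brackets.

```python
def flow_to_mermaid(nodes: list[tuple[str, str]]) -> str:
    """Build Mermaid text with one subgraph per step and tainted edges between steps."""
    lines = ["graph TD"]
    node_ids = [f"N{i}" for i in range(len(nodes))]
    for node_id, (file_name, label) in zip(node_ids, nodes):
        lines.append(f'    subgraph "{file_name}"')
        lines.append(f"        {node_id}[{label}]")
        lines.append("    end")
    # Connect each step to the next one in the path.
    for src, dst in zip(node_ids, node_ids[1:]):
        lines.append(f"    {src} -->|tainted| {dst}")
    return "\n".join(lines)
```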

Tier Differences

This tool is available at all tiers. What differs are the limits and capabilities:

| Feature | Community | Pro | Enterprise |
|---|---|---|---|
| Availability | ✅ Available | ✅ Available | ✅ Available |
| Cross-file analysis | ✅ Basic | ✅ Advanced | ✅ Advanced |
| Max depth | 3 | 10 | Unlimited |
| Max modules | 10 | 200 | Unlimited |
| Timeout | 60 seconds | 180 seconds | 600 seconds |
| Confidence threshold | 0.7 | 0.7 | 0.5 (adjustable) |
| Custom entry points | Not available | ✅ | ✅ |
| Mermaid diagrams | Not available | ✅ | ✅ |
| SARIF export | Not available | ✅ | ✅ |
| Compliance reporting | Not available | Not available | ✅ |
| Custom sinks | Not available | Not available | ✅ |

Community Tier

Cross-file security scanning is available in the Community tier with basic taint tracking (maximum depth of 3, up to 10 modules). For deeper analysis and visualization, consider upgrading to Pro or Enterprise.

Error Handling

Timeout

{
  "data": {
    "partial_results": true,
    "vulnerabilities": [...],
    "warning": "Analysis timed out after 120s. Results may be incomplete."
  },
  "error": null
}

Too Many Modules

{
  "data": null,
  "error": {
    "code": "MODULE_LIMIT_EXCEEDED",
    "message": "Project has 750 modules, exceeds limit of 500",
    "suggestion": "Use entry_points to focus analysis or upgrade tier"
  }
}
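
Both error shapes can be handled in one place. The sketch below assumes the response is a parsed dict with the `data`, `error`, `partial_results`, and `code` fields shown above; `interpret_response` and its status strings are hypothetical.

```python
from typing import Any


def interpret_response(response: dict[str, Any]) -> tuple[str, list[dict[str, Any]]]:
    """Map a response to a (status, vulnerabilities) pair."""
    error = response.get("error")
    if error:
        if error.get("code") == "MODULE_LIMIT_EXCEEDED":
            # Per the suggestion field: narrow the scan with entry_points.
            return "retry_with_entry_points", []
        return "failed", []

    data = response.get("data") or {}
    # A timeout still returns findings, flagged as partial.
    status = "partial" if data.get("partial_results") else "complete"
    return status, data.get("vulnerabilities", [])
```

Treating a `partial` status differently from `complete` matters for audits: an empty list from a timed-out scan does not mean the project is clean.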

Tier Limits

cross_file_security_scan capabilities vary by tier:

| Feature | Community | Pro | Enterprise |
|---|---|---|---|
| Max modules analyzed | 10 | 100 | Unlimited |
| Max depth | 3 | 10 | Unlimited |
| Taint tracking | ✅ Basic | ✅ Advanced | ✅ Full |
| Mermaid diagrams | ✅ | ✅ | ✅ Enhanced |
| Confidence scoring | ✅ | ✅ | ✅ |
| Entry point detection | ✅ | ✅ | ✅ |
| Timeout protection | 120s | 120s | Unlimited |
| Progress reporting | ❌ | ✅ | ✅ |
| Custom sources/sinks | ❌ | ❌ | ✅ |

Community Tier

  • ✅ Track taint flow across file boundaries
  • ✅ Detect cross-file SQL injection, XSS, command injection
  • ✅ Entry point detection (routes, handlers)
  • ✅ Basic taint tracking
  • ✅ Confidence scoring for vulnerabilities
  • ⚠️ Limited to 10 modules - Small projects only
  • ⚠️ Max depth of 3 - Shallow call chains
  • ❌ No progress reporting for long scans
  • ❌ No custom taint sources/sinks

Pro Tier

  • ✅ All Community features
  • ✅ 100 modules analyzed - Handle larger codebases
  • ✅ Max depth of 10 - Deeper call chain analysis
  • ✅ Advanced taint tracking - Better precision
  • ✅ Progress reporting - Know analysis status
  • ✅ Enhanced Mermaid diagrams - Better visualization
  • ✅ Framework-aware - Understand Django/Flask/Express patterns

Enterprise Tier

  • ✅ All Pro features
  • ✅ Unlimited modules - No project size restrictions
  • ✅ Unlimited depth - Complete call chain analysis
  • ✅ Full taint tracking - Highest precision
  • ✅ Custom sources/sinks - Organization-specific rules
  • ✅ Multi-repository analysis - Scan across repos
  • ✅ No timeout limits - Complete analysis guaranteed

Key Difference: Module Coverage and Depth

  • Community: 10 modules, depth 3 - Small apps, quick check
  • Pro: 100 modules, depth 10 - Production apps, thorough scan
  • Enterprise: Unlimited - Complete security analysis

→ See tier comparison

Best Practices

  1. Start with entry points - Focus on routes/API handlers
  2. Use for security audits - Find vulnerabilities single-file misses
  3. Review taint flows - Understand full data path
  4. Check all entry points - Web inputs, file uploads, queue messages
  5. Run before deployment - Catch cross-file issues early
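
Running the scan before deployment is easiest to enforce as an automated gate. The sketch below assumes a parsed response with the `summary` counts from the response schema; `release_gate` and the choice of blocking levels are hypothetical, and how the response is obtained is left to the surrounding pipeline.

```python
from typing import Any


def release_gate(
    response: dict[str, Any],
    blocking: tuple[str, ...] = ("critical", "high"),
) -> int:
    """Return a process exit code: 1 if any blocking severity has findings, else 0."""
    summary = (response.get("data") or {}).get("summary", {})
    blocked = {
        level: summary.get(level, 0)
        for level in blocking
        if summary.get(level, 0) > 0
    }
    for level, count in blocked.items():
        print(f"blocking: {count} {level} cross-file finding(s)")
    return 1 if blocked else 0
```

A pipeline step could pass the return value to `sys.exit` so that a critical or high finding fails the build.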

Remediation Strategies

For Cross-File SQL Injection

# db.py - Add parameterized query
def search(term: str) -> list:
    # Before (vulnerable)
    cursor.execute(f"SELECT * FROM items WHERE name LIKE '%{term}%'")

    # After (safe)
    cursor.execute(
        "SELECT * FROM items WHERE name LIKE ?",
        (f"%{term}%",)
    )
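
The difference between the two calls can be demonstrated end to end. This is a self-contained sketch using the standard-library `sqlite3` module with an in-memory database; the `items` table, the two function names, and the payload are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")
conn.executemany("INSERT INTO items VALUES (?)", [("apple",), ("banana",)])


def search_unsafe(term: str) -> list:
    # The term becomes part of the SQL text, so it can change the query.
    return conn.execute(
        f"SELECT name FROM items WHERE name LIKE '%{term}%'"
    ).fetchall()


def search_safe(term: str) -> list:
    # The term is bound as data; the SQL text never changes.
    return conn.execute(
        "SELECT name FROM items WHERE name LIKE ?", (f"%{term}%",)
    ).fetchall()


# Closes the string literal, adds an always-true condition, comments out the rest.
payload = "zzz' OR 1=1 --"
print(len(search_unsafe(payload)))  # every row is returned
print(search_safe(payload))         # no row contains the literal payload
```

The `?` placeholder is the style used by `sqlite3`; other drivers use different markers such as `%s`, so the placeholder should match the database driver in use.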

For Cross-File Command Injection

# processor.py - Use safe subprocess
import os
import subprocess


def process_file(filename: str) -> None:
    # Before (vulnerable): the shell interprets metacharacters in filename
    os.system(f"convert {filename} output.pdf")

    # After (safe): arguments are passed as a list, so no shell parses them
    subprocess.run(["convert", filename, "output.pdf"], check=True)
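
To see why the list form is safe, it helps to check what the child process actually receives. The sketch below swaps `convert` for a child Python interpreter so it runs anywhere; `echo_argument` and the sample filename are hypothetical.

```python
import subprocess
import sys


def echo_argument(filename: str) -> str:
    """Run a child Python process that prints its first argument back."""
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", filename],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


# With a shell, the part after ';' would run as a second command.
malicious = "report.png; echo injected"
print(echo_argument(malicious) == malicious)  # the child sees one literal argument
```

Because no shell is involved, the semicolon is just another character in the filename. A filename that starts with `-` could still be read as an option by the target program, so input validation remains worthwhile.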