crawl_project

Crawl an entire project directory, analyzing all Python files for structure, complexity, and security issues. Provides a comprehensive project-wide view.

Quick Reference

crawl_project(
    root_path: str = None,              # Project root
    pattern: str = None,                # File pattern to match
    pattern_type: str = "regex",        # regex or glob
    exclude_dirs: list = None,          # Directories to skip
    complexity_threshold: int = 10,     # Flag complex functions
    include_related: list = None,       # Include related file types
    include_report: bool = True         # Generate summary report
) -> ProjectCrawlResult

User Stories

| Persona | Story | Tool Value |
|---------|-------|------------|
| 🔰 Alex (First-Timer) | "Inventory all files in this repository" | See project scope |
| 👥 David (Team Lead) | "Identify complexity hotspots needing attention" | Technical debt visibility |
| 🔧 Chris (OSS Contributor) | "Analyze entire codebase structure for contribution planning" | Project understanding |

→ See all user stories

Parameters

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| root_path | string | No | cwd | Project root directory |
| pattern | string | No | None | File pattern to match |
| pattern_type | string | No | "regex" | Pattern type: "regex" or "glob" |
| exclude_dirs | list | No | None | Directories to exclude |
| complexity_threshold | int | No | 10 | Cyclomatic complexity threshold |
| include_related | list | No | None | Related file types (e.g., [".yaml", ".json"]) |
| include_report | bool | No | true | Include summary report |
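
The parameters above map directly onto a request payload. As a minimal sketch, the helper below assembles that payload, leaves out optional values that were not set, and rejects any `pattern_type` other than the two documented ones. The name `build_crawl_request` is hypothetical and is not part of the tool.

```python
import json
from typing import Optional


def build_crawl_request(
    root_path: Optional[str] = None,
    pattern: Optional[str] = None,
    pattern_type: str = "regex",
    exclude_dirs: Optional[list] = None,
    complexity_threshold: int = 10,
    include_related: Optional[list] = None,
    include_report: bool = True,
) -> dict:
    """Hypothetical helper: build the argument dict for a crawl_project call."""
    if pattern_type not in ("regex", "glob"):
        raise ValueError('pattern_type must be "regex" or "glob"')
    request = {
        "root_path": root_path,
        "pattern": pattern,
        "pattern_type": pattern_type,
        "exclude_dirs": exclude_dirs,
        "complexity_threshold": complexity_threshold,
        "include_related": include_related,
        "include_report": include_report,
    }
    # Drop unset optional values so the documented defaults apply.
    return {key: value for key, value in request.items() if value is not None}


if __name__ == "__main__":
    payload = build_crawl_request(root_path="/project", exclude_dirs=["vendor"])
    print(json.dumps(payload, indent=2))
```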

Response Schema

{
  "data": {
    "files_analyzed": "integer",
    "total_lines": "integer",
    "files": [
      {
        "path": "string",
        "lines": "integer",
        "functions": "integer",
        "classes": "integer",
        "complexity": {
          "average": "float",
          "max": "integer",
          "hotspots": [{"name": "string", "complexity": "integer"}]
        },
        "imports": ["string"],
        "security_warnings": ["string"]
      }
    ],
    "summary": {
      "total_functions": "integer",
      "total_classes": "integer",
      "average_complexity": "float",
      "high_complexity_count": "integer",
      "security_warnings_count": "integer"
    },
    "complexity_hotspots": [
      {
        "file": "string",
        "function": "string",
        "complexity": "integer",
        "line": "integer"
      }
    ],
    "security_overview": {
      "warnings_by_type": {},
      "high_risk_files": ["string"]
    },
    "report": "string"
  },
  "tier_applied": "string",
  "duration_ms": "integer"
}
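
A response with this shape can be consumed as an ordinary dictionary. The sketch below, which assumes the JSON has already been decoded, lists the project-wide hotspots from `data.complexity_hotspots` in descending order of complexity. The function name `list_hotspots` is hypothetical, and `line` is treated as optional because not every example response includes it.

```python
def list_hotspots(response: dict) -> list:
    """Return one description per complexity hotspot, highest complexity first."""
    data = response.get("data", {})
    hotspots = sorted(
        data.get("complexity_hotspots", []),
        key=lambda item: item["complexity"],
        reverse=True,
    )
    lines = []
    for item in hotspots:
        # "?" is only a placeholder for a missing line number.
        location = f'{item["file"]}:{item.get("line", "?")}'
        lines.append(f'{item["function"]} ({item["complexity"]}) at {location}')
    return lines


if __name__ == "__main__":
    sample = {
        "data": {
            "complexity_hotspots": [
                {"file": "src/api/routes.py", "function": "handle_upload",
                 "complexity": 12, "line": 112},
                {"file": "src/services/order_service.py", "function": "process_order",
                 "complexity": 15, "line": 45},
            ]
        }
    }
    for line in list_hotspots(sample):
        print(line)
```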

Examples

Basic Project Crawl

Analyze all Python files in the project
{
  "root_path": "/project"
}
codescalpel crawl-project .
{
  "data": {
    "files_analyzed": 45,
    "total_lines": 12580,
    "files": [
      {
        "path": "src/services/order_service.py",
        "lines": 450,
        "functions": 12,
        "classes": 2,
        "complexity": {
          "average": 6.5,
          "max": 15,
          "hotspots": [
            {"name": "process_order", "complexity": 15}
          ]
        },
        "imports": ["sqlalchemy", "pydantic", "datetime"],
        "security_warnings": []
      },
      {
        "path": "src/handlers/auth.py",
        "lines": 280,
        "functions": 8,
        "classes": 1,
        "complexity": {
          "average": 4.2,
          "max": 8
        },
        "security_warnings": [
          "Potential hardcoded secret at line 45"
        ]
      }
    ],
    "summary": {
      "total_functions": 156,
      "total_classes": 32,
      "average_complexity": 4.8,
      "high_complexity_count": 5,
      "security_warnings_count": 3
    },
    "complexity_hotspots": [
      {"file": "src/services/order_service.py", "function": "process_order", "complexity": 15},
      {"file": "src/utils/validators.py", "function": "validate_complex_form", "complexity": 14},
      {"file": "src/api/routes.py", "function": "handle_upload", "complexity": 12}
    ],
    "security_overview": {
      "warnings_by_type": {
        "hardcoded_secret": 1,
        "sql_string_concat": 2
      },
      "high_risk_files": ["src/handlers/auth.py"]
    }
  },
  "tier_applied": "community",
  "duration_ms": 1250
}

Filter by Pattern

Analyze only test files
{
  "root_path": "/project",
  "pattern": "test_.*\\.py$",
  "pattern_type": "regex"
}
codescalpel crawl-project . --pattern "test_.*\.py$" --pattern-type regex
{
  "data": {
    "files_analyzed": 28,
    "pattern_matched": "test_.*\\.py$",
    "files": [
      {"path": "tests/test_auth.py", "lines": 180, "functions": 12},
      {"path": "tests/test_orders.py", "lines": 250, "functions": 18}
    ],
    "summary": {
      "total_functions": 156,
      "average_complexity": 2.1,
      "test_to_code_ratio": 0.62
    }
  }
}
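
To see how the two `pattern_type` values differ, the sketch below applies the regex from this example and a comparable glob to a few sample paths using only the Python standard library. How the tool itself applies patterns (full path or file name, anchored or not) is not stated here, so the use of `re.search` on the full path and `fnmatchcase` on the file name are assumptions made for illustration.

```python
import fnmatch
import re

paths = [
    "tests/test_auth.py",
    "tests/test_orders.py",
    "src/handlers/auth.py",
    "tests/test_data.pyc",
]

# Regex: searched anywhere in the path (assumption), "$" anchors the ".py" suffix.
regex = re.compile(r"test_.*\.py$")
regex_hits = [p for p in paths if regex.search(p)]

# Glob: matched against the file name only (assumption); "*" replaces ".*".
glob_hits = [p for p in paths if fnmatch.fnmatchcase(p.rsplit("/", 1)[-1], "test_*.py")]

if __name__ == "__main__":
    print(regex_hits)
    print(glob_hits)
```

Under these assumptions both patterns select the same two test files and skip the `.pyc` file.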

Exclude Directories

Analyze the project but skip vendor and migrations
{
  "root_path": "/project",
  "exclude_dirs": ["vendor", "migrations", "__pycache__", ".git"]
}
codescalpel crawl-project . \
  --exclude-dirs vendor,migrations,__pycache__,.git
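
Directory exclusion of this kind is commonly implemented by pruning the directory walk. The sketch below shows one way to do that with `os.walk`. It is not the tool's implementation, the name `iter_python_files` is hypothetical, and matching excluded names at any depth is an assumption.

```python
import os
import tempfile


def iter_python_files(root_path, exclude_dirs=()):
    """Yield .py paths relative to root_path, skipping excluded directory names."""
    excluded = set(exclude_dirs)
    for current_dir, dir_names, file_names in os.walk(root_path):
        # Prune in place so os.walk never descends into excluded directories.
        dir_names[:] = sorted(d for d in dir_names if d not in excluded)
        for name in sorted(file_names):
            if name.endswith(".py"):
                yield os.path.relpath(os.path.join(current_dir, name), root_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for rel in ("src/app.py", "vendor/lib.py", "migrations/0001_init.py"):
            full = os.path.join(root, *rel.split("/"))
            os.makedirs(os.path.dirname(full), exist_ok=True)
            open(full, "w").close()
        print(list(iter_python_files(root, ["vendor", "migrations"])))
```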

Include Related Files

Analyze Python files and also include configuration files
{
  "root_path": "/project",
  "include_related": ["*.yaml", "*.json", "*.toml"]
}
codescalpel crawl-project . --include-related "*.yaml,*.json,*.toml"
{
  "data": {
    "files_analyzed": 45,
    "related_files": [
      {"path": "config/settings.yaml", "type": "yaml", "size": 1250},
      {"path": "pyproject.toml", "type": "toml", "size": 850},
      {"path": "package.json", "type": "json", "size": 420}
    ],
    "configuration_overview": {
      "yaml_files": 3,
      "json_files": 2,
      "toml_files": 1
    }
  }
}
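
The `configuration_overview` counts in the response above can be reproduced from a list of paths. The sketch below is one possible derivation and assumes the keys are formed from the file extension plus `_files`. The function name `count_related_files` is hypothetical.

```python
from collections import Counter
from pathlib import PurePosixPath


def count_related_files(paths):
    """Count files per extension, keyed like "yaml_files"."""
    counts = Counter()
    for path in paths:
        suffix = PurePosixPath(path).suffix.lstrip(".").lower()
        if suffix:
            counts[f"{suffix}_files"] += 1
    return dict(counts)


if __name__ == "__main__":
    print(count_related_files(["config/settings.yaml", "pyproject.toml", "package.json"]))
```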

Complexity Analysis

Find all functions with complexity over 8
{
  "root_path": "/project/src",
  "complexity_threshold": 8
}
{
  "data": {
    "complexity_threshold": 8,
    "complexity_hotspots": [
      {
        "file": "src/services/order_service.py",
        "function": "process_order",
        "complexity": 15,
        "line": 45,
        "recommendation": "Consider splitting into smaller functions"
      },
      {
        "file": "src/utils/validators.py",
        "function": "validate_complex_form",
        "complexity": 14,
        "line": 28,
        "recommendation": "Extract validation rules to separate methods"
      },
      {
        "file": "src/api/routes.py",
        "function": "handle_upload",
        "complexity": 12,
        "line": 112
      }
    ],
    "summary": {
      "functions_above_threshold": 5,
      "average_complexity": 4.8,
      "max_complexity": 15
    },
    "report": "## Complexity Report\n\n3 functions exceed complexity threshold of 8:\n\n1. `process_order` in order_service.py (complexity: 15)\n2. `validate_complex_form` in validators.py (complexity: 14)\n..."
  }
}
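
For readers unfamiliar with the metric, cyclomatic complexity is roughly one plus the number of decision points in a function. The sketch below approximates it with Python's `ast` module and flags functions whose score is over a threshold. It is an illustration only: the set of counted node types is an assumption, the scores may differ from the ones the tool reports, and nested functions are counted toward their enclosing function.

```python
import ast

# Node types counted as decision points in this approximation (an assumption).
_BRANCH_NODES = (
    ast.If, ast.For, ast.AsyncFor, ast.While,
    ast.ExceptHandler, ast.IfExp, ast.Assert, ast.comprehension,
)


def cyclomatic_complexity(func: ast.AST) -> int:
    """Approximate cyclomatic complexity: 1 + number of decision points."""
    score = 1
    for node in ast.walk(func):
        if isinstance(node, _BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra paths.
            score += len(node.values) - 1
    return score


def find_hotspots(source: str, threshold: int = 10) -> list:
    """Return functions whose approximate complexity is over the threshold."""
    hotspots = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = cyclomatic_complexity(node)
            if score > threshold:
                hotspots.append(
                    {"function": node.name, "complexity": score, "line": node.lineno}
                )
    return hotspots


SOURCE = '''
def simple(x):
    return x + 1

def branchy(x, y):
    if x and y:
        for i in range(x):
            if i % 2:
                y += i
    elif x:
        y = 0
    return y
'''

if __name__ == "__main__":
    print(find_hotspots(SOURCE, threshold=3))
```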

Report Format

When include_report=true, a Markdown report is generated (report generation requires the Pro or Enterprise tier, see Tier Limits):

# Project Analysis Report

## Overview
- **Files Analyzed:** 45
- **Total Lines:** 12,580
- **Functions:** 156
- **Classes:** 32

## Complexity Summary
- **Average Complexity:** 4.8
- **High Complexity Functions:** 5

### Hotspots
| File | Function | Complexity | Line |
|------|----------|------------|------|
| order_service.py | process_order | 15 | 45 |
| validators.py | validate_complex_form | 14 | 28 |

## Security Overview
- **Total Warnings:** 3
- **High Risk Files:** 1

### Warnings by Type
- Hardcoded secrets: 1
- SQL string concatenation: 2

## Recommendations
1. Refactor `process_order` - complexity too high
2. Review hardcoded secret in auth.py:45
3. Use parameterized queries in db_utils.py
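
The Hotspots table in such a report can also be rebuilt from the `complexity_hotspots` array of a response, for example to embed it in another document. The following is a minimal sketch of that rendering. The name `render_hotspot_table` is hypothetical, and shortening each path to its file name simply mirrors the sample report above.

```python
import posixpath


def render_hotspot_table(hotspots):
    """Render hotspot entries as a Markdown table like the one in the report."""
    rows = [
        "| File | Function | Complexity | Line |",
        "|------|----------|------------|------|",
    ]
    for item in hotspots:
        file_name = posixpath.basename(item["file"])
        # A missing line number is rendered as an empty cell.
        line = item.get("line", "")
        rows.append(f"| {file_name} | {item['function']} | {item['complexity']} | {line} |")
    return "\n".join(rows)


if __name__ == "__main__":
    print(render_hotspot_table([
        {"file": "src/services/order_service.py", "function": "process_order",
         "complexity": 15, "line": 45},
        {"file": "src/utils/validators.py", "function": "validate_complex_form",
         "complexity": 14, "line": 28},
    ]))
```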

Tier Limits

crawl_project capabilities vary by tier:

| Feature | Community | Pro | Enterprise |
|---------|-----------|-----|------------|
| Max files | 100 | Unlimited | Unlimited |
| Max depth | 10 | 10 | 10 |
| Parsing enabled | ❌ | ✅ | ✅ |
| Complexity analysis | ❌ | ✅ | ✅ Advanced |
| Respect .gitignore | ✅ | ✅ | ✅ |
| Security warnings | ✅ | ✅ | ✅ |
| Pattern filtering | Basic | ✅ Full | ✅ Full |
| Report generation | ❌ | ✅ Markdown/HTML | ✅ Markdown/HTML/PDF |
| Related file types | ❌ | ✅ YAML/JSON/TOML | ✅ All types |
| Historical comparison | ❌ | ✅ Version tracking | ✅ Full history |
| Custom analyzers | ❌ | ❌ | ✅ Plugin support |
| Timeout | 60 seconds | 120 seconds | 600 seconds |
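
A client can check a planned crawl against these limits before sending it. The sketch below encodes a few rows of the table (max files, timeout, report generation, related file types) as a lookup and reports which requested features a tier lacks. `TIER_LIMITS` and `unavailable_features` are hypothetical names, and the tool may enforce the limits differently.

```python
# Values copied from the tier table; None means "Unlimited".
TIER_LIMITS = {
    "community": {"max_files": 100, "timeout_s": 60, "report": False, "related": False},
    "pro": {"max_files": None, "timeout_s": 120, "report": True, "related": True},
    "enterprise": {"max_files": None, "timeout_s": 600, "report": True, "related": True},
}


def unavailable_features(tier, file_count, wants_report=True, wants_related=False):
    """Return human-readable reasons why a crawl does not fit the given tier."""
    limits = TIER_LIMITS[tier]
    problems = []
    if limits["max_files"] is not None and file_count > limits["max_files"]:
        problems.append(f"file count {file_count} exceeds limit of {limits['max_files']}")
    if wants_report and not limits["report"]:
        problems.append("report generation is not available")
    if wants_related and not limits["related"]:
        problems.append("related file types are not available")
    return problems


if __name__ == "__main__":
    print(unavailable_features("community", 250))
    print(unavailable_features("pro", 5000, wants_related=True))
```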

Community Tier

  • ✅ Crawl up to 100 Python files
  • ✅ Respect .gitignore patterns
  • ✅ Basic file inventory
  • ✅ Basic security warnings
  • ⚠️ Max 100 files - Small projects only
  • ⚠️ No parsing - File listing only, no AST analysis
  • ⚠️ No complexity analysis - Can't identify hotspots
  • ❌ No report generation
  • ❌ No related file types (YAML, JSON, etc.)

Pro Tier

  • ✅ All Community features
  • ✅ Unlimited files - Handle large codebases
  • ✅ Full parsing enabled - Complete AST analysis
  • ✅ Complexity analysis - Identify hotspots
  • ✅ Full pattern filtering - Advanced file selection
  • ✅ Report generation - Markdown and HTML reports
  • ✅ Related file types - Include YAML, JSON, TOML
  • ✅ Version tracking - Compare across versions
  • ✅ 120 second timeout - Longer analysis time

Enterprise Tier

  • ✅ All Pro features
  • ✅ Advanced complexity analysis - More metrics
  • ✅ PDF report generation - Professional reports
  • ✅ All file types supported - Not just Python/YAML/JSON
  • ✅ Full historical comparison - Trend analysis
  • ✅ Custom analyzer plugins - Extend functionality
  • ✅ 600 second timeout - Handle massive codebases
  • ✅ Multi-repository crawling - Organization-wide analysis

Key Difference: Scale and Analysis Depth

  • Community: 100 files, no parsing - Quick file listing
  • Pro: Unlimited files, parsing, complexity - Full project analysis
  • Enterprise: Unlimited, advanced analysis, plugins - Enterprise reporting

→ See tier comparison

Use Cases

1. New Project Onboarding

# Get complete project overview
result = crawl_project(root_path="/project")
print(result.report)
# Understand structure, complexity, issues

2. Pre-Refactoring Analysis

# Find complex code before refactoring
result = crawl_project(
    root_path="/project",
    complexity_threshold=10
)
for hotspot in result.complexity_hotspots:
    print(f"Refactor: {hotspot.function} in {hotspot.file}")

3. Security Audit

# Quick security overview
result = crawl_project(root_path="/project")
for file in result.security_overview.high_risk_files:
    print(f"Review: {file}")

Best Practices

  1. Exclude generated code - Skip migrations, vendor, cache
  2. Set realistic thresholds - Start at 10, adjust as needed
  3. Use pattern filtering - Focus on relevant files
  4. Review hotspots - High complexity = high risk
  5. Track over time - Compare reports across versions
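
On tiers without built-in version tracking, tracking over time can be approximated by saving the `summary` object from each crawl and diffing two of them. The sketch below shows that comparison. The name `compare_summaries` is hypothetical, and the "previous" numbers are invented purely for illustration.

```python
def compare_summaries(previous: dict, current: dict) -> dict:
    """Return current minus previous for each numeric key present in both."""
    changes = {}
    for key, old_value in previous.items():
        new_value = current.get(key)
        if isinstance(old_value, (int, float)) and isinstance(new_value, (int, float)):
            # Round to hide floating-point noise in averaged metrics.
            changes[key] = round(new_value - old_value, 2)
    return changes


if __name__ == "__main__":
    previous = {  # made-up earlier crawl
        "total_functions": 150,
        "average_complexity": 5.1,
        "high_complexity_count": 7,
    }
    current = {
        "total_functions": 156,
        "average_complexity": 4.8,
        "high_complexity_count": 5,
    }
    print(compare_summaries(previous, current))
```

A negative change in `average_complexity` or `high_complexity_count` indicates the codebase became simpler between the two crawls.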