crawl_project
Crawl an entire project directory and analyze its Python files for structure, complexity, and security issues, producing a project-wide summary. How much analysis you get depends on your tier (see Tier Limits below).
Quick Reference
```python
crawl_project(
    root_path: str | None = None,              # Project root (defaults to the current working directory)
    pattern: str | None = None,                # Only analyze files matching this pattern
    pattern_type: str = "regex",               # "regex" or "glob"
    exclude_dirs: list[str] | None = None,     # Directories to skip
    complexity_threshold: int = 10,            # Flag functions exceeding this cyclomatic complexity
    include_related: list[str] | None = None,  # Related file types, e.g. [".yaml", ".json"]
    include_report: bool = True,               # Generate a Markdown summary report
) -> ProjectCrawlResult
```
User Stories
| Persona | Story | Tool Value |
|---|---|---|
| Alex (First-Timer) | "Inventory all files in this repository" | See the project's scope |
| David (Team Lead) | "Identify complexity hotspots needing attention" | Technical debt visibility |
| Chris (OSS Contributor) | "Analyze the entire codebase structure to plan contributions" | Project understanding |
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `root_path` | string | No | Current working directory | Project root directory to crawl |
| `pattern` | string | No | None | Only files matching this pattern are analyzed |
| `pattern_type` | string | No | `"regex"` | How `pattern` is interpreted: `"regex"` or `"glob"` |
| `exclude_dirs` | list | No | None | Directories to skip (e.g., migrations, vendor, and cache directories) |
| `complexity_threshold` | int | No | 10 | Cyclomatic complexity above which a function is flagged as a hotspot |
| `include_related` | list | No | None | Related non-Python file types to include (e.g., `[".yaml", ".json"]`) |
| `include_report` | bool | No | `true` | Generate a Markdown summary report in the `report` field |
Response Schema
All analysis results are nested under `data`; `report` is populated only when `include_report` is true.
```json
{
  "data": {
    "files_analyzed": "integer",
    "total_lines": "integer",
    "files": [
      {
        "path": "string",
        "lines": "integer",
        "functions": "integer",
        "classes": "integer",
        "complexity": {
          "average": "float",
          "max": "integer",
          "hotspots": [{"name": "string", "complexity": "integer"}]
        },
        "imports": ["string"],
        "security_warnings": ["string"]
      }
    ],
    "summary": {
      "total_functions": "integer",
      "total_classes": "integer",
      "average_complexity": "float",
      "high_complexity_count": "integer",
      "security_warnings_count": "integer"
    },
    "complexity_hotspots": [
      {
        "file": "string",
        "function": "string",
        "complexity": "integer",
        "line": "integer"
      }
    ],
    "security_overview": {
      "warnings_by_type": {"<warning_type>": "integer"},
      "high_risk_files": ["string"]
    },
    "report": "string"
  },
  "tier_applied": "string",
  "duration_ms": "integer"
}
```
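If you handle the response as parsed JSON, the hotspot list can be ranked with plain dictionary access. The sketch below assumes the response is a dict shaped like the schema above; `top_hotspots` is a hypothetical helper, not part of the tool, and the sample data is taken from the examples on this page.

```python
import json


def top_hotspots(response: dict, n: int = 3) -> list:
    """Return the n most complex functions as (file, function, complexity) tuples."""
    hotspots = response.get("data", {}).get("complexity_hotspots", [])
    # Highest complexity first; sorted() is stable, so ties keep the tool's order.
    ranked = sorted(hotspots, key=lambda h: h["complexity"], reverse=True)
    return [(h["file"], h["function"], h["complexity"]) for h in ranked[:n]]


raw = """{"data": {"complexity_hotspots": [
  {"file": "src/api/routes.py", "function": "handle_upload", "complexity": 12, "line": 112},
  {"file": "src/services/order_service.py", "function": "process_order", "complexity": 15, "line": 45}
]}}"""

for path, func, cx in top_hotspots(json.loads(raw), n=2):
    print(f"{cx:>3}  {func}  ({path})")
```

Using `.get()` with defaults means a response without hotspots (for example, from a tier without complexity analysis) yields an empty list instead of a `KeyError`.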
Examples
Basic Project Crawl
Abridged response (only two of the 45 analyzed files are listed):
```json
{
  "data": {
    "files_analyzed": 45,
    "total_lines": 12580,
    "files": [
      {
        "path": "src/services/order_service.py",
        "lines": 450,
        "functions": 12,
        "classes": 2,
        "complexity": {
          "average": 6.5,
          "max": 15,
          "hotspots": [
            {"name": "process_order", "complexity": 15}
          ]
        },
        "imports": ["sqlalchemy", "pydantic", "datetime"],
        "security_warnings": []
      },
      {
        "path": "src/handlers/auth.py",
        "lines": 280,
        "functions": 8,
        "classes": 1,
        "complexity": {
          "average": 4.2,
          "max": 8
        },
        "security_warnings": [
          "Potential hardcoded secret at line 45"
        ]
      }
    ],
    "summary": {
      "total_functions": 156,
      "total_classes": 32,
      "average_complexity": 4.8,
      "high_complexity_count": 5,
      "security_warnings_count": 3
    },
    "complexity_hotspots": [
      {"file": "src/services/order_service.py", "function": "process_order", "complexity": 15, "line": 45},
      {"file": "src/utils/validators.py", "function": "validate_complex_form", "complexity": 14, "line": 28},
      {"file": "src/api/routes.py", "function": "handle_upload", "complexity": 12, "line": 112}
    ],
    "security_overview": {
      "warnings_by_type": {
        "hardcoded_secret": 1,
        "sql_string_concat": 2
      },
      "high_risk_files": ["src/handlers/auth.py"]
    }
  },
  "tier_applied": "community",
  "duration_ms": 1250
}
```
Filter by Pattern
Abridged response when `pattern` is the regex `test_.*\.py$`:
```json
{
  "data": {
    "files_analyzed": 28,
    "pattern_matched": "test_.*\\.py$",
    "files": [
      {"path": "tests/test_auth.py", "lines": 180, "functions": 12},
      {"path": "tests/test_orders.py", "lines": 250, "functions": 18}
    ],
    "summary": {
      "total_functions": 156,
      "average_complexity": 2.1,
      "test_to_code_ratio": 0.62
    }
  }
}
```
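To illustrate how `pattern_type` changes the meaning of `pattern`, here is a minimal sketch comparing a regex with a roughly equivalent glob. The `matches` helper is hypothetical, and it assumes the pattern is tested against the file name only; the tool may match against the full relative path instead.

```python
import fnmatch
import re


def matches(path: str, pattern: str, pattern_type: str = "regex") -> bool:
    """Hypothetical matcher that tests only the file name against pattern."""
    name = path.rsplit("/", 1)[-1]
    if pattern_type == "regex":
        # re.search matches anywhere in the name unless the pattern is anchored.
        return re.search(pattern, name) is not None
    if pattern_type == "glob":
        # fnmatchcase applies shell-style wildcards to the whole name, case-sensitively.
        return fnmatch.fnmatchcase(name, pattern)
    raise ValueError(f"pattern_type must be 'regex' or 'glob', not {pattern_type!r}")


paths = ["tests/test_auth.py", "tests/test_orders.py", "src/services/order_service.py"]
print([p for p in paths if matches(p, r"test_.*\.py$")])
print([p for p in paths if matches(p, "test_*.py", pattern_type="glob")])
```

Both calls select the two test files. The regex is not anchored at the start, so it would also match a name such as `contest_utils.py`, while the glob would not; prefix the regex with `^` if that difference matters.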
Exclude Directories
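`exclude_dirs` removes directories from the crawl. The sketch below shows one common way to implement that kind of pruning with `os.walk`; it is not the tool's implementation, it assumes `exclude_dirs` holds bare directory names (such as `venv`) matched at any depth, and it ignores `.gitignore`, which the tool also respects. `iter_python_files` is a hypothetical name.

```python
import os
import tempfile
from typing import Iterator, Optional


def iter_python_files(root: str, exclude_dirs: Optional[list[str]] = None) -> Iterator[str]:
    """Yield .py paths under root, never descending into excluded directories.

    Assumption: exclude_dirs holds bare directory names, not paths relative to root.
    """
    excluded = set(exclude_dirs or [])
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place (top-down walk) stops os.walk from entering them.
        dirnames[:] = [d for d in dirnames if d not in excluded]
        for name in filenames:
            if name.endswith(".py"):
                yield os.path.join(dirpath, name)


# Demo on a throwaway tree: only app/main.py survives the exclusion.
with tempfile.TemporaryDirectory() as root:
    for rel in ("app/main.py", "venv/lib/site.py", "migrations/0001_init.py"):
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    found = sorted(
        os.path.relpath(p, root) for p in iter_python_files(root, ["venv", "migrations"])
    )

print(found)
```

Pruning `dirnames` rather than filtering results afterwards means excluded trees such as virtual environments are never traversed at all, which matters for both speed and the file-count limits.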
Include Related Files
Abridged response when `include_related` requests YAML, JSON, and TOML files (the overview counts more files than the list shows):
```json
{
  "data": {
    "files_analyzed": 45,
    "related_files": [
      {"path": "config/settings.yaml", "type": "yaml", "size": 1250},
      {"path": "pyproject.toml", "type": "toml", "size": 850},
      {"path": "package.json", "type": "json", "size": 420}
    ],
    "configuration_overview": {
      "yaml_files": 3,
      "json_files": 2,
      "toml_files": 1
    }
  }
}
```
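The `configuration_overview` counts appear to tally related files by type. A minimal sketch of that tally, assuming the type is derived from the file extension; the `configuration_overview` function and its `.yml` handling are illustrative assumptions, not documented tool behavior.

```python
from collections import Counter
from pathlib import PurePosixPath


def configuration_overview(paths: list[str]) -> dict[str, int]:
    """Count related files per type, keyed like the example's configuration_overview."""
    counts: Counter = Counter()
    for p in paths:
        ext = PurePosixPath(p).suffix.lower().lstrip(".")
        if ext == "yml":
            ext = "yaml"  # assumption: .yml and .yaml count as the same type
        if ext:
            counts[f"{ext}_files"] += 1
    return dict(counts)


related = ["config/settings.yaml", "pyproject.toml", "package.json"]
print(configuration_overview(related))
```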
Complexity Analysis
Abridged response with `complexity_threshold` set to 8 (three of the five flagged functions are shown):
```json
{
  "data": {
    "complexity_threshold": 8,
    "complexity_hotspots": [
      {
        "file": "src/services/order_service.py",
        "function": "process_order",
        "complexity": 15,
        "line": 45,
        "recommendation": "Consider splitting into smaller functions"
      },
      {
        "file": "src/utils/validators.py",
        "function": "validate_complex_form",
        "complexity": 14,
        "line": 28,
        "recommendation": "Extract validation rules to separate methods"
      },
      {
        "file": "src/api/routes.py",
        "function": "handle_upload",
        "complexity": 12,
        "line": 112
      }
    ],
    "summary": {
      "functions_above_threshold": 5,
      "average_complexity": 4.8,
      "max_complexity": 15
    },
    "report": "## Complexity Report\n\n5 functions exceed complexity threshold of 8:\n\n1. `process_order` in order_service.py (complexity: 15)\n2. `validate_complex_form` in validators.py (complexity: 14)\n..."
  }
}
```
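The hotspots above are ranked by cyclomatic complexity. The tool's exact counting rules are not documented here, so the following is only an approximation of McCabe complexity built on Python's standard `ast` module: start at 1 and add one per branch point (`if`, loops, `except`, conditional expressions, extra `and`/`or` operands, comprehension clauses). `cyclomatic_complexity` and `find_hotspots` are hypothetical helpers, and a function is flagged only when its score exceeds the threshold, matching the report's "exceed" wording.

```python
import ast

# Node types that each add one independent path through the code.
_BRANCH_NODES = (ast.If, ast.IfExp, ast.For, ast.AsyncFor, ast.While, ast.ExceptHandler)


def cyclomatic_complexity(func: ast.AST) -> int:
    """Approximate McCabe complexity: 1 + number of decision points.

    Simplification: nested functions are counted into the enclosing one.
    """
    score = 1
    for node in ast.walk(func):
        if isinstance(node, _BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` short-circuits twice, adding two paths.
            score += len(node.values) - 1
        elif isinstance(node, ast.comprehension):
            # Each `for` clause and each `if` filter in a comprehension branches.
            score += 1 + len(node.ifs)
    return score


def find_hotspots(source: str, threshold: int = 10) -> list:
    """Return (name, complexity, line) for functions whose complexity exceeds threshold."""
    results = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = cyclomatic_complexity(node)
            if score > threshold:
                results.append((node.name, score, node.lineno))
    return sorted(results, key=lambda item: item[1], reverse=True)


# The sample is only parsed, never executed, so its undefined names are harmless.
SAMPLE = '''
def ship(order):
    if not order.items:
        return None
    for item in order.items:
        if item.qty > 0 and item.in_stock:
            reserve(item)
    try:
        charge(order)
    except PaymentError:
        return False
    return True

def noop():
    pass
'''

print(find_hotspots(SAMPLE, threshold=2))  # [('ship', 6, 2)]
```

Here `ship` scores 6: the base path plus two `if` statements, one `for` loop, one extra `and` operand, and one `except` handler.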
Report Format
When `include_report` is true, the `report` field contains a Markdown report such as:
```markdown
# Project Analysis Report
## Overview
- **Files Analyzed:** 45
- **Total Lines:** 12,580
- **Functions:** 156
- **Classes:** 32
## Complexity Summary
- **Average Complexity:** 4.8
- **High Complexity Functions:** 5
### Hotspots
| File | Function | Complexity | Line |
|------|----------|------------|------|
| order_service.py | process_order | 15 | 45 |
| validators.py | validate_complex_form | 14 | 28 |
## Security Overview
- **Total Warnings:** 3
- **High Risk Files:** 1
### Warnings by Type
- Hardcoded secrets: 1
- SQL string concatenation: 2
## Recommendations
1. Refactor `process_order` - complexity too high
2. Review hardcoded secret in auth.py:45
3. Use parameterized queries in db_utils.py
```
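The Hotspots table in this report carries the same fields as the `complexity_hotspots` entries in the response. If you build your own reports from the JSON, a sketch like the following reproduces that table; `hotspot_table` is a hypothetical helper, and showing only the file name (not the full path) mirrors the sample report above.

```python
from pathlib import PurePosixPath


def hotspot_table(hotspots: list) -> str:
    """Render complexity_hotspots entries as a Markdown Hotspots table."""
    rows = [
        "| File | Function | Complexity | Line |",
        "|------|----------|------------|------|",
    ]
    for h in sorted(hotspots, key=lambda item: item["complexity"], reverse=True):
        # The sample report shows only the file name, not the full path.
        name = PurePosixPath(h["file"]).name
        rows.append(f"| {name} | {h['function']} | {h['complexity']} | {h.get('line', '')} |")
    return "\n".join(rows)


table = hotspot_table([
    {"file": "src/utils/validators.py", "function": "validate_complex_form", "complexity": 14, "line": 28},
    {"file": "src/services/order_service.py", "function": "process_order", "complexity": 15, "line": 45},
])
print(table)
```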
Tier Limits
`crawl_project` capabilities vary by tier:
| Feature | Community | Pro | Enterprise |
|---|---|---|---|
| Max files | 100 | Unlimited | Unlimited |
| Max depth | 10 | 10 | 10 |
| Parsing enabled | ❌ | ✅ | ✅ |
| Complexity analysis | ❌ | ✅ | ✅ Advanced |
| Respect .gitignore | ✅ | ✅ | ✅ |
| Security warnings | ✅ | ✅ | ✅ |
| Pattern filtering | Basic | ✅ Full | ✅ Full |
| Report generation | ❌ | ✅ Markdown/HTML | ✅ Markdown/HTML/PDF |
| Related file types | ❌ | ✅ YAML/JSON/TOML | ✅ All types |
| Historical comparison | ❌ | ✅ Version tracking | ✅ Full history |
| Custom analyzers | ❌ | ❌ | ✅ Plugin support |
| Timeout | 60 seconds | 120 seconds | 600 seconds |
Community Tier
- ✅ Crawl up to 100 Python files
- ✅ Respect .gitignore patterns
- ✅ Basic file inventory
- ✅ Basic security warnings
- ⚠️ Max 100 files - Small projects only
- ⚠️ No parsing - File listing only, no AST analysis
- ⚠️ No complexity analysis - Can't identify hotspots
- ❌ No report generation
- ❌ No related file types (YAML, JSON, etc.)
Pro Tier
- ✅ All Community features
- ✅ Unlimited files - Handle large codebases
- ✅ Full parsing enabled - Complete AST analysis
- ✅ Complexity analysis - Identify hotspots
- ✅ Full pattern filtering - Advanced file selection
- ✅ Report generation - Markdown and HTML reports
- ✅ Related file types - Include YAML, JSON, TOML
- ✅ Version tracking - Compare across versions
- ✅ 120-second timeout - Longer analysis time
Enterprise Tier
- ✅ All Pro features
- ✅ Advanced complexity analysis - More metrics
- ✅ PDF report generation - Professional reports
- ✅ All file types supported - Not just Python/YAML/JSON
- ✅ Full historical comparison - Trend analysis
- ✅ Custom analyzer plugins - Extend functionality
- ✅ 600-second timeout - Handle massive codebases
- ✅ Multi-repository crawling - Organization-wide analysis
Key difference: scale and analysis depth.
- Community: 100 files, no parsing - Quick file listing
- Pro: Unlimited files, parsing, complexity analysis - Full project analysis
- Enterprise: Unlimited files, advanced analysis, plugins - Enterprise reporting
Use Cases
1. New Project Onboarding
```python
# Get a complete project overview
result = crawl_project(root_path="/project")

# Markdown summary of structure, complexity, and security issues
# (report generation requires the Pro tier or above)
print(result.report)
```
2. Pre-Refactoring Analysis
```python
# Find complex code before refactoring
# (complexity analysis requires the Pro tier or above)
result = crawl_project(
    root_path="/project",
    complexity_threshold=10,
)
for hotspot in result.complexity_hotspots:
    print(f"Refactor: {hotspot.function} in {hotspot.file} (complexity {hotspot.complexity})")
```
3. Security Audit
```python
# Quick security overview
result = crawl_project(root_path="/project")
for path in result.security_overview.high_risk_files:
    print(f"Review: {path}")
```
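Best Practices
- Exclude generated and third-party code - Pass `exclude_dirs` for directories such as migrations, vendor, and caches
- Set realistic thresholds - Start with the default `complexity_threshold` of 10 and adjust for your codebase
- Use pattern filtering - Restrict the crawl to relevant files with `pattern` and `pattern_type`
- Review hotspots - Functions with high cyclomatic complexity are harder to test and maintain
- Track over time - Compare reports across versions (version tracking requires Pro or above)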
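Pro and Enterprise tiers provide version tracking. If you only have two saved responses, a client-side comparison of their `complexity_hotspots` lists is a reasonable stopgap. The sketch below assumes both runs used the same `complexity_threshold`; `compare_hotspots` is a hypothetical helper, and the `after` data is invented for illustration.

```python
from pprint import pprint


def compare_hotspots(before: list, after: list) -> dict:
    """Diff two complexity_hotspots lists, keyed by (file, function)."""
    old = {(h["file"], h["function"]): h["complexity"] for h in before}
    new = {(h["file"], h["function"]): h["complexity"] for h in after}
    shared = old.keys() & new.keys()
    return {
        "new": sorted(new.keys() - old.keys()),   # reported now, not before
        "gone": sorted(old.keys() - new.keys()),  # reported before, not now
        "worse": sorted((k, old[k], new[k]) for k in shared if new[k] > old[k]),
        "better": sorted((k, old[k], new[k]) for k in shared if new[k] < old[k]),
    }


before = [
    {"file": "src/services/order_service.py", "function": "process_order", "complexity": 15},
    {"file": "src/api/routes.py", "function": "handle_upload", "complexity": 12},
]
after = [  # hypothetical later run
    {"file": "src/services/order_service.py", "function": "process_order", "complexity": 11},
    {"file": "src/utils/validators.py", "function": "validate_complex_form", "complexity": 14},
]
pprint(compare_hotspots(before, after))
```

A function in `gone` may have been refactored below the threshold, renamed, or deleted; the diff alone cannot tell these apart.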
Related Tools
- get_project_map - Project structure map
- security_scan - Detailed security analysis
- analyze_code - Single file analysis