[20260312_DOCS] Added stable anchors and compliance guidance used by the current website.
Security Analysis Guide¶
Code Scalpel analyzes code for security issues using static analysis: taint tracking for injection-style flaws and pattern matching for hardcoded secrets. This guide covers how to detect vulnerabilities, trace tainted data flows, and integrate security scanning into your workflow.
Security Analysis Overview¶
Code Scalpel detects vulnerabilities by tracking how untrusted data (taint sources) flows to dangerous operations (sinks).
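To make the source-to-sink idea concrete, here is a toy sketch built on Python's standard `ast` module. It is not how Code Scalpel is implemented: the source and sink names are assumptions borrowed from the examples in this guide, and the analysis only follows straight-line, top-level assignments.

```python
import ast

# Assumed names for illustration only; real tools ship much larger catalogs.
SOURCE_CALLS = {"request.args.get"}
SINK_CALLS = {"cursor.execute"}

def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None

def find_tainted_sinks(source):
    """Return line numbers of sink calls that receive tainted arguments."""
    tainted = set()   # variable names currently carrying untrusted data
    findings = []

    def is_tainted(expr):
        for node in ast.walk(expr):
            if isinstance(node, ast.Name) and node.id in tainted:
                return True
            if isinstance(node, ast.Call) and dotted_name(node.func) in SOURCE_CALLS:
                return True
        return False

    for stmt in ast.parse(source).body:
        # Propagate taint through simple assignments.
        if isinstance(stmt, ast.Assign) and is_tainted(stmt.value):
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    tainted.add(target.id)
        # Report sink calls whose arguments are tainted.
        for node in ast.walk(stmt):
            if isinstance(node, ast.Call) and dotted_name(node.func) in SINK_CALLS:
                if any(is_tainted(arg) for arg in node.args):
                    findings.append(node.lineno)
    return findings

SAMPLE = '''\
user_id = request.args.get("id")
query = f"SELECT * FROM users WHERE id = {user_id}"
cursor.execute(query)
cursor.execute("SELECT 1")
'''

findings = find_tainted_sinks(SAMPLE)
print(findings)  # only the call on line 3 receives tainted data
```

The sketch ignores functions, branches, and sanitizers, which is exactly where a production analyzer spends most of its effort.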
graph LR
    A[User Input<br/>Taint Source] --> B[Data Processing]
    B --> C[Database Query<br/>Sink]
    style A fill:#ff6b6b
    style C fill:#ffd93d
Supported Vulnerability Types¶
| Vulnerability | CWE | Severity | Detection |
|---|---|---|---|
| SQL Injection | CWE-89 | Critical | Taint tracking |
| Command Injection | CWE-78 | Critical | Taint tracking |
| XSS | CWE-79 | High | Taint tracking |
| Path Traversal | CWE-22 | High | Taint tracking |
| LDAP Injection | CWE-90 | High | Taint tracking |
| NoSQL Injection | CWE-943 | High | Taint tracking |
| SSRF | CWE-918 | High | Taint tracking |
| Hardcoded Secrets | CWE-798 | Medium | Pattern matching |
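Unlike the taint-tracked classes, hardcoded secrets are found by pattern matching. The sketch below shows the general idea with a single regular expression. The pattern and the function name `find_hardcoded_secrets` are illustrative assumptions, not Code Scalpel's actual rule set.

```python
import re

# Hypothetical rule: a secret-looking variable name assigned a quoted literal
# of at least 8 characters.
SECRET_ASSIGNMENT = re.compile(
    r"""
    \b(\w*(?:password|passwd|secret|api_key|token)\w*)   # variable name
    \s*=\s*
    (['"])([^'"]{8,})\2                                  # quoted literal value
    """,
    re.IGNORECASE | re.VERBOSE,
)

def find_hardcoded_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line_number, variable_name) for each suspicious assignment."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = SECRET_ASSIGNMENT.search(line)
        if match:
            hits.append((lineno, match.group(1)))
    return hits

sample = '''
API_KEY = "sk_live_0123456789abcdef"
password = os.environ["DB_PASSWORD"]
timeout = 30
'''

print(find_hardcoded_secrets(sample))  # flags API_KEY, not the environment lookup
```

Reading the secret from the environment, as the second line of the sample does, is not flagged because no literal value is assigned.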
Single-File Security Scan¶
Basic Usage¶
The AI will use security_scan:
Understanding Results¶
{
  "vulnerabilities": [
    {
      "type": "SQL_INJECTION",
      "severity": "CRITICAL",
      "cwe": "CWE-89",
      "line": 45,
      "function": "get_user",
      "source": "user_id (request.args.get)",
      "sink": "cursor.execute(query)",
      "taint_flow": [
        {"line": 42, "code": "user_id = request.args.get('id')"},
        {"line": 44, "code": "query = f\"SELECT * FROM users WHERE id = {user_id}\""},
        {"line": 45, "code": "cursor.execute(query)"}
      ],
      "confidence": 0.95,
      "remediation": "Use parameterized queries: cursor.execute('SELECT * FROM users WHERE id = ?', (user_id,))"
    }
  ],
  "summary": {
    "critical": 1,
    "high": 0,
    "medium": 0,
    "low": 0,
    "total": 1
  }
}
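If you save a result like this to a file or receive it from a tool call, a few lines of Python can turn it into a readable report. This is a minimal sketch that assumes the field names shown in the example above; `format_findings` is a hypothetical helper, not part of Code Scalpel.

```python
import json

# Trimmed copy of the example result; field names follow that example.
result_json = """
{
  "vulnerabilities": [
    {
      "type": "SQL_INJECTION",
      "severity": "CRITICAL",
      "cwe": "CWE-89",
      "line": 45,
      "function": "get_user",
      "taint_flow": [
        {"line": 42, "code": "user_id = request.args.get('id')"},
        {"line": 45, "code": "cursor.execute(query)"}
      ],
      "confidence": 0.95
    }
  ],
  "summary": {"critical": 1, "high": 0, "medium": 0, "low": 0, "total": 1}
}
"""

def format_findings(result: dict) -> list[str]:
    """Render each finding as a headline followed by its taint flow steps."""
    lines = []
    for vuln in result["vulnerabilities"]:
        lines.append(
            f"[{vuln['severity']}] {vuln['type']} ({vuln['cwe']}) "
            f"in {vuln['function']}() at line {vuln['line']}, "
            f"confidence {vuln['confidence']:.2f}"
        )
        for step in vuln["taint_flow"]:
            lines.append(f"    line {step['line']}: {step['code']}")
    return lines

report = format_findings(json.loads(result_json))
print("\n".join(report))
```

Reading the taint flow top to bottom shows where the untrusted value entered and where it reached the sink, which is usually the fastest way to decide where to apply the fix.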
Adjusting Confidence Threshold¶
# Higher threshold = fewer false positives
"Scan api/views.py with high confidence threshold (0.9)"
# Lower threshold = more comprehensive
"Scan api/views.py with low confidence threshold (0.5)"
Cross-File Security Analysis¶
Why Cross-File Analysis?¶
Many vulnerabilities span multiple files:
# routes.py
@app.route('/search')
def search():
    query = request.args.get('q')  # Source: user input
    results = db.search(query)     # Passes to another file
    return render(results)
# db.py
def search(term):
    # Sink: SQL execution
    cursor.execute(f"SELECT * FROM items WHERE name LIKE '%{term}%'")
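Single-file analysis misses this: routes.py contains the source but no sink, and db.py contains the sink but no source. Cross-file analysis follows the call from search() in routes.py into db.py and reports the full flow.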
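Once the flow is found, the fix belongs at the sink in db.py. The sketch below is one possible parameterized rewrite, using the standard `sqlite3` module as a stand-in for whatever driver the project uses. The `conn` parameter and the wildcard escaping are additions for illustration.

```python
import sqlite3

def search(conn, term):
    """Parameterized version of the db.py sink; also escapes LIKE wildcards."""
    # Treat %, _ and \ in the user's term as literal characters.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT name FROM items WHERE name LIKE ? ESCAPE '\\' ORDER BY name",
        (f"%{escaped}%",),  # the wildcards live in the bound value, not the SQL
    )
    return [row[0] for row in rows]

# Small in-memory database to exercise the function.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?)",
    [("red widget",), ("blue widget",), ("100% cotton",)],
)

print(search(conn, "widget"))
print(search(conn, "' OR '1'='1"))  # injection attempt is treated as plain text
```

Because the term is bound as a value, the quote characters in the injection attempt never reach the SQL parser as syntax.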
Running Cross-File Scan¶
The AI uses cross_file_security_scan:
{
  "project_root": "./src",
  "entry_points": ["routes.py:search", "api.py:handle_request"],
  "max_depth": 5
}
Cross-File Results¶
{
  "vulnerabilities": [
    {
      "type": "SQL_INJECTION",
      "severity": "CRITICAL",
      "source_file": "routes.py",
      "source_line": 15,
      "sink_file": "db.py",
      "sink_line": 23,
      "cross_file_flow": [
        {"file": "routes.py", "line": 15, "code": "query = request.args.get('q')"},
        {"file": "routes.py", "line": 16, "code": "results = db.search(query)"},
        {"file": "db.py", "line": 22, "code": "def search(term):"},
        {"file": "db.py", "line": 23, "code": "cursor.execute(f\"SELECT...{term}...\")"}
      ]
    }
  ],
  "taint_entry_points": [
    {"file": "routes.py", "function": "search", "source": "request.args.get"}
  ],
  "mermaid_diagram": "graph TD\n..."
}
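For long flows it can help to collapse the step list into one hop per file. The following sketch assumes the `cross_file_flow` shape shown above; `summarize_flow` is a hypothetical helper.

```python
from itertools import groupby

# Steps copied from the example result above.
cross_file_flow = [
    {"file": "routes.py", "line": 15, "code": "query = request.args.get('q')"},
    {"file": "routes.py", "line": 16, "code": "results = db.search(query)"},
    {"file": "db.py", "line": 22, "code": "def search(term):"},
    {"file": "db.py", "line": 23, "code": 'cursor.execute(f"SELECT...{term}...")'},
]

def summarize_flow(flow):
    """Collapse consecutive steps in the same file into 'file:first-last' hops."""
    hops = []
    for file, steps in groupby(flow, key=lambda step: step["file"]):
        lines = [step["line"] for step in steps]
        hops.append(f"{file}:{lines[0]}-{lines[-1]}")
    return " -> ".join(hops)

summary = summarize_flow(cross_file_flow)
print(summary)  # routes.py:15-16 -> db.py:22-23
```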
Tier Limits for Cross-File Scan¶
| Tier | Max Depth | Max Modules | Timeout |
|---|---|---|---|
| Community | 3 | 50 | 60s |
| Pro | 10 | 200 | 180s |
| Enterprise | Unlimited | Unlimited | 600s |
Dependency Vulnerability Scanning¶
Scanning Dependencies¶
Uses scan_dependencies:
Results¶
{
  "dependencies": [
    {
      "name": "requests",
      "version": "2.25.0",
      "vulnerabilities": [
        {
          "id": "CVE-2023-32681",
          "severity": "HIGH",
          "description": "Proxy-Authorization header leaked",
          "fixed_version": "2.31.0"
        }
      ]
    }
  ],
  "summary": {
    "total_packages": 45,
    "vulnerable_packages": 1,
    "critical_vulns": 0,
    "high_vulns": 1
  }
}
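A result like this maps naturally onto an upgrade list. The sketch below assumes the field names from the example and purely numeric dotted versions such as `2.31.0`; `upgrade_plan` and `version_key` are hypothetical helpers.

```python
def version_key(version):
    """Turn '2.31.0' into (2, 31, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

def upgrade_plan(scan_result):
    """Map each vulnerable package to the lowest version that fixes all findings."""
    plan = {}
    for dep in scan_result["dependencies"]:
        fixed = [
            vuln["fixed_version"]
            for vuln in dep["vulnerabilities"]
            if vuln.get("fixed_version")
        ]
        if fixed:
            # The highest fixed_version covers every listed vulnerability.
            plan[dep["name"]] = max(fixed, key=version_key)
    return plan

scan_result = {
    "dependencies": [
        {
            "name": "requests",
            "version": "2.25.0",
            "vulnerabilities": [
                {"id": "CVE-2023-32681", "severity": "HIGH", "fixed_version": "2.31.0"}
            ],
        }
    ]
}

plan = upgrade_plan(scan_result)
for name, version in plan.items():
    print(f"{name}>={version}")
```

Comparing versions as integer tuples avoids the string-comparison trap where `"2.9.0"` sorts above `"2.31.0"`.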
Security Workflows¶
Pre-Commit Security Check¶
# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: code-scalpel-security
        name: Security Scan
        entry: code-scalpel scan --security
        language: system
        types: [python]
        pass_filenames: true
CI/CD Security Gate¶
# GitHub Actions
jobs:
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install Code Scalpel
        run: pip install codescalpel-pro  # or enterprise
      - name: Security Scan
        run: |
          code-scalpel scan --security ./src --format json > security.json
      - name: Check Results
        run: |
          python -c "
          import json, sys
          with open('security.json') as f:
              data = json.load(f)
          criticals = data['summary']['critical']
          if criticals > 0:
              print(f'Found {criticals} critical vulnerabilities!')
              sys.exit(1)
          "
Weekly Security Audit¶
# .github/workflows/security-audit.yml
name: Weekly Security Audit
on:
  schedule:
    - cron: '0 9 * * 1'  # Mondays at 09:00 UTC
  workflow_dispatch:
jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Full Security Scan
        run: |
          pip install codescalpel-enterprise
          code-scalpel scan --security --cross-file ./src > audit.json
          code-scalpel scan --dependencies ./requirements.txt >> audit.json
      - name: Create Issue
        uses: peter-evans/create-issue-from-file@v5
        with:
          title: Weekly Security Audit
          content-filepath: audit.json
Compliance & Audit¶
Enterprise teams can pair security scans with governance enforcement to support SOC 2, HIPAA, GDPR, and PCI-DSS review workflows.
- Use code_policy_check to evaluate code against compliance-oriented rulesets.
- Use verify_policy_integrity before audits to confirm governance files have not been tampered with.
- Combine cross_file_security_scan results with CI artifact retention for audit evidence.
Custom Security Rules¶
Adding Custom Sinks¶
For Pro and Enterprise tiers, you can define custom sinks in governance.yaml:
# .code-scalpel/governance.yaml
security:
  custom_sinks:
    - name: "audit_log"
      patterns:
        - "logger.audit(*)"
        - "AuditLog.write(*)"
      sensitivity: "HIGH"
      reason: "Audit logs may contain sensitive data"
    - name: "payment_processor"
      patterns:
        - "payment.process(*)"
        - "stripe.charge(*)"
      sensitivity: "CRITICAL"
      reason: "Payment data requires extra scrutiny"
Adding Custom Sources¶
security:
  custom_sources:
    - name: "websocket_input"
      patterns:
        - "ws.receive(*)"
        - "socket.recv(*)"
      taint_level: "HIGH"
    - name: "queue_message"
      patterns:
        - "queue.get(*)"
        - "redis.lpop(*)"
      taint_level: "MEDIUM"
Custom Sanitizers¶
security:
  sanitizers:
    - name: "html_escape"
      patterns:
        - "bleach.clean(*)"
        - "escape_html(*)"
      neutralizes:
        - "XSS"
    - name: "sql_param"
      patterns:
        - "sqlalchemy.text(*).bindparams(**)"
      neutralizes:
        - "SQL_INJECTION"
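One way to picture what a sanitizer rule does: when a call matches one of its patterns, the vulnerability classes listed under `neutralizes` are removed from the value's set of risks. The sketch below models that with glob-style matching from the standard library. Treating the patterns as globs is an assumption, since this guide does not specify the matching semantics, and `remaining_risks` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Mirrors the sanitizer rules from the YAML example above.
SANITIZERS = [
    {
        "name": "html_escape",
        "patterns": ["bleach.clean(*)", "escape_html(*)"],
        "neutralizes": {"XSS"},
    },
    {
        "name": "sql_param",
        "patterns": ["sqlalchemy.text(*).bindparams(**)"],
        "neutralizes": {"SQL_INJECTION"},
    },
]

def remaining_risks(call_text, risks):
    """Drop the risk classes neutralized by any sanitizer matching call_text."""
    remaining = set(risks)
    for sanitizer in SANITIZERS:
        # Assumption: patterns are shell-style globs matched against call text.
        if any(fnmatchcase(call_text, pattern) for pattern in sanitizer["patterns"]):
            remaining -= sanitizer["neutralizes"]
    return remaining

print(remaining_risks("bleach.clean(comment)", {"XSS", "SQL_INJECTION"}))
```

Note that a sanitizer only clears the classes it names: HTML escaping leaves a value just as dangerous for SQL as it was before.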
Interpreting Confidence Scores¶
Score Ranges¶
| Score | Meaning | Action |
|---|---|---|
| 0.9-1.0 | Very likely vulnerability | Fix immediately |
| 0.7-0.9 | Probable vulnerability | Review and fix |
| 0.5-0.7 | Possible vulnerability | Investigate |
| <0.5 | Unlikely vulnerability | Low priority |
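The table can be applied mechanically when triaging a long list of findings. This sketch uses the band edges from the table and treats a boundary value such as 0.9 as belonging to the higher band, which is an assumption; `triage` is a hypothetical helper.

```python
from bisect import bisect_right

# Band edges and actions taken from the table above.
THRESHOLDS = [0.5, 0.7, 0.9]
ACTIONS = ["Low priority", "Investigate", "Review and fix", "Fix immediately"]

def triage(confidence):
    """Return the recommended action for a confidence score between 0 and 1."""
    # bisect_right counts how many thresholds are <= confidence.
    return ACTIONS[bisect_right(THRESHOLDS, confidence)]

for score in (0.95, 0.72, 0.55, 0.3):
    print(score, "->", triage(score))
```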
Factors Affecting Confidence¶
- Direct data flow: Higher confidence
- Complex transformations: Lower confidence
- Unknown function calls: Lower confidence
- Sanitizer presence: Lower confidence
Best Practices¶
1. Scan Early, Scan Often¶
# During development
"Scan this function I just wrote for security issues"
# Before commit
"Quick security check on my changes"
# Before PR
"Full security scan of the modified files"
2. Fix Critical First¶
Prioritize by severity:
- Critical: SQL Injection, Command Injection
- High: XSS, Path Traversal, SSRF
- Medium: Hardcoded secrets, Information disclosure
- Low: Minor issues, best practice violations
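When a scan returns many findings, sorting them puts this ordering into practice. The sketch below assumes the `severity` and `confidence` fields from the result examples in this guide; breaking ties by confidence is a design choice, not something the tool prescribes.

```python
SEVERITY_RANK = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}

def prioritize(vulnerabilities):
    """Order findings by severity, then by descending confidence."""
    return sorted(
        vulnerabilities,
        key=lambda v: (SEVERITY_RANK[v["severity"]], -v["confidence"]),
    )

findings = [
    {"type": "XSS", "severity": "HIGH", "confidence": 0.80},
    {"type": "HARDCODED_SECRET", "severity": "MEDIUM", "confidence": 0.99},
    {"type": "SQL_INJECTION", "severity": "CRITICAL", "confidence": 0.95},
    {"type": "SSRF", "severity": "HIGH", "confidence": 0.90},
]

for finding in prioritize(findings):
    print(finding["severity"], finding["type"], finding["confidence"])
```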
3. Use Parameterized Queries¶
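# Bad: user input is interpolated into the SQL text
query = f"SELECT * FROM users WHERE id = {user_id}"
cursor.execute(query)
# Good: the driver binds the value separately
# (placeholder style depends on the driver: sqlite3 uses ?, psycopg2 uses %s)
query = "SELECT * FROM users WHERE id = ?"
cursor.execute(query, (user_id,))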
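The difference is easy to reproduce with an in-memory SQLite database. This is a self-contained demonstration using the standard `sqlite3` module; the table contents and the payload are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(1, "alice"), (2, "bob")],
)

user_id = "1 OR 1=1"  # attacker-controlled value

# Unsafe: the payload becomes part of the SQL text and matches every row.
unsafe_rows = conn.execute(
    f"SELECT name FROM users WHERE id = {user_id}"
).fetchall()

# Safe: the payload is bound as a single value and matches nothing.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE id = ?", (user_id,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

With the interpolated query the condition reads `id = 1 OR 1=1`, so both users come back. With the bound parameter the whole string is compared against `id` as one value.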
4. Validate All Input¶
# Validate before use
def get_user(user_id: str):
    if not user_id.isdigit():
        raise ValueError("Invalid user ID")
    # Now safe to use
    return db.query(User).filter(User.id == int(user_id)).first()
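The same principle applies to file paths, where the risk is path traversal (CWE-22 in the table of supported vulnerability types). A minimal sketch, assuming Python 3.9 or later for `Path.is_relative_to`; the function name and the `/srv/uploads` directory are hypothetical.

```python
from pathlib import Path

def resolve_upload(base_dir: str, user_path: str) -> Path:
    """Resolve user_path under base_dir, rejecting anything that escapes it."""
    base = Path(base_dir).resolve()
    # resolve() collapses '..' segments and follows symlinks before the check.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):  # requires Python 3.9+
        raise ValueError("Path escapes the allowed directory")
    return candidate

print(resolve_upload("/srv/uploads", "reports/q1.txt"))

try:
    resolve_upload("/srv/uploads", "../../etc/passwd")
except ValueError as exc:
    print("rejected:", exc)
```

Checking the resolved path rather than the raw string matters: a filter that only looks for `..` in the input can be bypassed with absolute paths or symlinks.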
5. Use Framework Protections¶
# Flask - Use escape for XSS
from flask import Flask
from markupsafe import escape

app = Flask(__name__)

@app.route('/hello/<name>')
def hello(name):
    return f"Hello, {escape(name)}!"

# Django - Use ORM instead of raw SQL
User.objects.filter(id=user_id)  # Safe
Next Steps¶
- Cross-File Security Tutorial - Hands-on tutorial
- Security Scan Tool Reference - Tool details
- Governance Configuration - Custom rules
- CI/CD Integration - Automation setup