# Architecture
Grippy is built on the Agno agent framework. It has two deployment modes: an MCP server (`grippy serve` / `uvx grippy-mcp serve`) for local diff auditing, and a GitHub Actions CI pipeline (`grippy` / `python -m grippy`) for PR review.
## CI Pipeline Flow

```text
PR event (GITHUB_EVENT_PATH)
   |
   v
Load PR metadata ......................... review.py
   |
   v
Index codebase into LanceDB .............. codebase.py, embedder.py [non-fatal]
   |
   v
Build dependency graph ................... imports.py, graph_store.py [non-fatal]
   |  (walk Python files, extract imports, upsert FILE nodes + IMPORTS edges)
   v
Fetch full PR diff ....................... github_review.py
   |
   v
Run deterministic rule engine ............ rules/ [when profile != general]
   |  (10 rules, gate check, mode override to security_audit)
   v
Record review + author in graph .......... graph_store.py [non-fatal]
   |  (AUTHOR, REVIEW nodes; AUTHORED, TOUCHED edges)
   v
Query graph for pre-review context ....... graph_context.py [non-fatal]
   |  (blast radius, recurring findings, author risk summary)
   v
Truncate diff ............................ review.py (500K char cap, after rules)
   |
   v
Create agent with prompt chain + tools ... agent.py, prompts.py
   |
   v
Format PR context with rule findings ..... agent.py
   |  (includes <graph-context> from knowledge graph)
   v
Run review with structured output ........ retry.py, schema.py
   |  (up to 3 retries on validation failure + rule coverage check)
   v
Post inline findings + summary to PR ..... github_review.py
   |
   v
Resolve stale threads .................... github_review.py (GraphQL)
   |
   v
Persist findings to graph ................ graph_store.py [non-fatal]
   |  (FINDING nodes, FOUND_IN/PRODUCED/VIOLATES edges, observations)
   v
Set GitHub Actions outputs ............... review.py
   (score, verdict, findings-count, merge-blocking,
    rule-findings-count, rule-gate-failed, profile)
```
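The final step uses GitHub Actions' standard `GITHUB_OUTPUT` mechanism: each output is appended as a `name=value` line to the file the runner exposes through that environment variable. A minimal sketch (the helper name is hypothetical, not Grippy's actual function):

```python
import os

def set_actions_outputs(outputs: dict[str, str]) -> None:
    """Append step outputs to the file GitHub Actions exposes via GITHUB_OUTPUT."""
    path = os.environ.get("GITHUB_OUTPUT")
    if not path:
        return  # not running under Actions; nothing to do
    with open(path, "a", encoding="utf-8") as fh:
        for name, value in outputs.items():
            # single-line values use the simple `name=value` form
            fh.write(f"{name}={value}\n")
```

Downstream workflow steps can then read these as `steps.<id>.outputs.score`, `steps.<id>.outputs.verdict`, and so on.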
## MCP Server Flow

```text
MCP client (Claude Code / Cursor / Claude Desktop)
   |
   v
scan_diff or audit_diff tool ............. mcp_server.py
   |
   v
Parse scope + get local diff ............. local_diff.py
   |  (git subprocess, ref validation, timeout)
   v
scan_diff path:                    audit_diff path:
  Run deterministic rules            Run rules + create agent
  Serialize scan results             Run review with structured output
  Return JSON                        Serialize audit results
                                     Return dense JSON (no personality)
```
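The scope strings handled by `local_diff.py` (`staged`, `commit:<ref>`, `range:<base>..<head>`) can be parsed with simple string dispatch. A sketch, assuming each scope translates to a `git diff` argument list roughly like this (function name and exact argument mapping are illustrative, not Grippy's actual API):

```python
def parse_scope(scope: str) -> list[str]:
    """Translate a diff scope string into `git diff` arguments.

    Illustrative mapping:
      "staged"         -> ["--staged"]
      "commit:<ref>"   -> ["<ref>^", "<ref>"]
      "range:<a>..<b>" -> ["<a>", "<b>"]
    """
    if scope == "staged":
        return ["--staged"]
    if scope.startswith("commit:"):
        ref = scope[len("commit:"):]
        return [f"{ref}^", ref]
    if scope.startswith("range:"):
        base, _, head = scope[len("range:"):].partition("..")
        return [base, head]
    raise ValueError(f"unknown scope: {scope!r}")
```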
## Module Map

All modules live in `src/grippy/`.

| Module | Purpose |
|---|---|
| `__main__.py` | CLI dispatch: `serve`, `install-mcp`, legacy CI. `main()` entry point for console scripts. |
| `review.py` | Orchestration entry point. Loads the PR event, coordinates the full CI pipeline, handles errors and timeouts, sets Actions outputs. |
| `agent.py` | Agent factory. `_PROVIDERS` registry with deferred imports for 6 providers (openai, anthropic, google, groq, mistral, local). Composes the prompt chain, attaches tools, configures structured output. |
| `codebase.py` | Codebase indexing (LanceDB hybrid search: vector + keyword + RRF reranking) and `CodebaseToolkit`, which provides the `search_code`, `read_file`, `grep_code`, and `list_files` tools for the agent. |
| `github_review.py` | GitHub API integration. Parses unified diffs to map findings to addressable lines, posts inline review comments (batched), manages thread lifecycle (new/persists/resolved), builds the summary dashboard. |
| `schema.py` | Pydantic models for the complete structured output: `GrippyReview`, `Finding`, `Score`, `Verdict`, `Escalation`, `Personality`. Enums for severity, category, verdict status, and tone register. |
| `prompts.py` | Prompt chain loader. Reads the markdown prompt files from `prompts_data/` and composes them into identity + instruction layers. |
| `mcp_server.py` | FastMCP server. `scan_diff` + `audit_diff` tools with `readOnlyHint` annotations. |
| `mcp_config.py` | MCP client detection (Claude Code/Desktop/Cursor), server entry generation (uvx or dev mode). |
| `mcp_response.py` | AI-facing serializers. Strips personality, outputs dense structured JSON for MCP tool responses. |
| `ignore.py` | Suppression directives. `.grippyignore` file loading (gitignore syntax via pathspec), diff filtering, `# nogrip` line-level pragma parsing and index building. |
| `local_diff.py` | Git diff acquisition. Scope parsing (`staged`, `commit:<ref>`, `range:<base>..<head>`), ref validation, subprocess with timeout. |
| `graph_types.py` | Node/edge type enums (`NodeType`, `EdgeType`), dataclasses, deterministic ID helpers. Defines the navi-graph shape. |
| `graph_store.py` | `SQLiteGraphStore` --- schema init, node/edge writes, neighbor queries, BFS traversal, append-only observations, migrations. |
| `graph_context.py` | Pre-review context builder. Queries the graph for blast radius, recurring findings, and author risk summary. |
| `imports.py` | Python import extraction via `ast.parse`. Produces IMPORTS edges for the dependency graph. |
| `graph.py` | Re-exports `NodeType` and `EdgeType` for backward compatibility. |
| `retry.py` | Structured output validation with retry. Handles JSON strings, dicts, markdown-fenced JSON, and `GrippyReview` instances. Retries with validation-error feedback. Post-parse rule coverage validation ensures all deterministic findings appear in the LLM output. |
| `embedder.py` | Embedding model factory. Creates Agno `OpenAIEmbedder` instances configured for OpenAI or local endpoints. |
| `rules/` | Deterministic security rule engine (subpackage). See Rule Engine below. |
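The coercion behavior described for `retry.py` --- accepting dicts, JSON strings, and markdown-fenced JSON --- can be sketched like this (function name hypothetical; the real module validates the result against the Pydantic schema afterwards):

```python
import json
import re

def coerce_json(raw) -> dict:
    """Best-effort coercion of model output into a dict, tolerating
    plain dicts, JSON strings, and markdown-fenced JSON."""
    if isinstance(raw, dict):
        return raw
    text = raw.strip()
    # strip a ```json ... ``` (or bare ```) fence if present
    m = re.match(r"^```(?:json)?\s*(.*?)\s*```$", text, re.DOTALL)
    if m:
        text = m.group(1)
    return json.loads(text)
```

On a `json.JSONDecodeError` or schema validation failure, the retry loop re-prompts the model with the validation error appended as feedback.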
## Prompt Composition System

Grippy's prompts are 20 markdown prompt files in `src/grippy/prompts_data/`, composed into two Agno layers by `prompts.py`.
### Layer 1: Identity (Agno `description`)

Loaded as a single concatenated string. Defines who Grippy is.

| File | Purpose |
|---|---|
| `CONSTITUTION.md` | Core rules, constraints, and behavioral boundaries |
| `PERSONA.md` | The grumpy security auditor personality |
### Layer 2: Instructions (Agno `instructions`)

Loaded as a list of strings (one per file). Tells Grippy what to do for a given review mode.

The instruction chain is composed in three parts:

**Mode prefix** --- `system-core.md` plus one mode-specific file:

| Mode | Files |
|---|---|
| `pr_review` | `system-core.md`, `pr-review.md` |
| `security_audit` | `system-core.md`, `security-audit.md` |
| `governance_check` | `system-core.md`, `governance-check.md` |
| `surprise_audit` | `system-core.md`, `surprise-audit.md` |
| `cli` | `system-core.md`, `cli-mode.md` |
| `github_app` | `system-core.md`, `github-app.md` |

**Shared prompts** --- personality and quality-gate prompts included in every mode:

- `tone-calibration.md`
- `confidence-filter.md`
- `escalation.md`
- `context-builder.md`
- `catchphrases.md`
- `disguises.md`
- `ascii-art.md`
- `all-clear.md`

**Conditional:** when rule engine findings exist, `rule-findings-context.md` is inserted into the instruction chain before the suffix.

**Suffix** --- always anchored at the end:

- `scoring-rubric.md` --- the scoring formula and deduction rules
- `output-schema.md` --- the exact JSON schema the model must produce

The full instruction chain for any mode is: `MODE_PREFIX + SHARED_PROMPTS + [rule-findings-context.md if rules fired] + SUFFIX`.
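That composition rule can be sketched directly from the tables above (the loader functions are hypothetical stand-ins for `prompts.py`, not its actual API):

```python
from pathlib import Path

# File lists are taken from the tables above; the functions are illustrative.
MODE_PREFIX = {
    "pr_review": ["system-core.md", "pr-review.md"],
    "security_audit": ["system-core.md", "security-audit.md"],
}
SHARED_PROMPTS = [
    "tone-calibration.md", "confidence-filter.md", "escalation.md",
    "context-builder.md", "catchphrases.md", "disguises.md",
    "ascii-art.md", "all-clear.md",
]
SUFFIX = ["scoring-rubric.md", "output-schema.md"]

def instruction_files(mode: str, rules_fired: bool) -> list[str]:
    chain = MODE_PREFIX[mode] + SHARED_PROMPTS
    if rules_fired:
        chain.append("rule-findings-context.md")  # conditional, before the suffix
    return chain + SUFFIX

def load_instructions(mode: str, rules_fired: bool, base: Path) -> list[str]:
    # one instruction string per file, matching Agno's list-of-strings layer
    return [(base / name).read_text() for name in instruction_files(mode, rules_fired)]
```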
## Codebase Indexing Pipeline

When `GITHUB_WORKSPACE` is set (i.e., running in CI), Grippy indexes the checked-out repository before running the review. This lets the agent search and read the full codebase, not just the diff.
### 1. Walk source files

`walk_source_files()` uses `git ls-files` to enumerate tracked files, respecting `.gitignore`, and falls back to a manual directory walk if git is unavailable. Only files with indexable extensions are included (`.py`, `.md`, `.yaml`, `.yml`, `.toml` by default), capped at 5,000 files.
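A sketch of this enumeration logic, using the caps and extensions stated above (error handling and exact fallback behavior are assumptions, not Grippy's implementation):

```python
import subprocess
from pathlib import Path

INDEXABLE = {".py", ".md", ".yaml", ".yml", ".toml"}
MAX_FILES = 5_000

def walk_source_files(root: Path) -> list[Path]:
    """Enumerate tracked files via `git ls-files`, falling back to a
    manual walk when git is unavailable or `root` is not a repo."""
    try:
        out = subprocess.run(
            ["git", "ls-files"], cwd=root, capture_output=True,
            text=True, check=True, timeout=30,
        ).stdout
        candidates = [root / line for line in out.splitlines()]
    except (OSError, subprocess.SubprocessError):
        # git missing, not a repo, or timed out: walk the tree directly
        candidates = [p for p in root.rglob("*") if p.is_file()]
    files = [p for p in candidates if p.suffix in INDEXABLE]
    return files[:MAX_FILES]
```

Using `git ls-files` keeps `.gitignore` handling out of the indexer entirely; the fallback walk trades that precision for availability.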
### 2. Chunk files

`chunk_file()` splits each file into 4KB character windows with 200-character overlap. Small files become a single chunk. Each chunk records its file path, chunk index, and line range for attribution.
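The windowing can be sketched as follows (`chunk_file()` operates on files; this illustrative variant takes the text directly, and the chunk record layout is assumed):

```python
CHUNK_SIZE = 4_096  # 4KB character window
OVERLAP = 200       # characters shared between consecutive chunks

def chunk_text(text: str, path: str) -> list[dict]:
    """Split text into overlapping windows, recording file path,
    chunk index, and an approximate line range for attribution."""
    chunks, start, index = [], 0, 0
    while start < len(text) or index == 0:
        window = text[start:start + CHUNK_SIZE]
        first_line = text.count("\n", 0, start) + 1
        last_line = first_line + window.count("\n")
        chunks.append({
            "path": path, "index": index,
            "lines": (first_line, last_line), "text": window,
        })
        if start + CHUNK_SIZE >= len(text):
            break  # small file or final window
        start += CHUNK_SIZE - OVERLAP
        index += 1
    return chunks
```

The overlap means a statement split by a window boundary still appears whole in at least one chunk, at the cost of a small amount of duplicate embedding work.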
### 3. Embed and store

`CodebaseIndex.build()` computes embeddings for all chunks (batched when supported, sequential otherwise) and stores them in a LanceDB table (`codebase_chunks`). The table is overwritten on each build.
### 4. Agent tools

During review, the agent has access to four codebase tools via `CodebaseToolkit`:

| Tool | Purpose |
|---|---|
| `search_code` | Hybrid search (vector + keyword + RRF reranking) over indexed chunks. "Find code that handles authentication." |
| `grep_code` | Regex search across the codebase with context lines. Exact pattern matching. |
| `read_file` | Read a file or line range with line numbers. Path-traversal protection via `is_relative_to()`. |
| `list_files` | List directory contents with glob filtering. Capped at 500 results. |
Tool output is capped at 12,000 characters per response. All tool responses pass through the `sanitize_tool_hook` middleware (`navi_sanitize.clean()` + XML-escape + 12K truncation) before reaching the LLM, preventing indirect prompt injection through crafted file contents.
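The `read_file` guard described above can be sketched with pathlib's `is_relative_to()` (helper name, error type, and truncation placement are illustrative):

```python
from pathlib import Path

MAX_TOOL_OUTPUT = 12_000  # character cap per tool response

def safe_read(workspace: Path, relative: str) -> str:
    """Read a file for the agent, rejecting paths that escape the
    workspace (e.g. via `../`) and truncating oversized output."""
    target = (workspace / relative).resolve()
    if not target.is_relative_to(workspace.resolve()):
        raise PermissionError(f"path escapes workspace: {relative}")
    text = target.read_text(encoding="utf-8", errors="replace")
    return text[:MAX_TOOL_OUTPUT]
```

Resolving the joined path *before* the containment check is the important step: `../` segments are collapsed first, so a traversal attempt cannot pass the check and then escape on read.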
## Knowledge Graph

Grippy maintains a SQLite-backed knowledge graph that tracks relationships between files, reviews, findings, rules, and authors across PRs. The graph is structural only --- vectors stay in the LanceDB codebase index.

What it tracks:

| Node Type | Represents |
|---|---|
| `FILE` | Source code file |
| `REVIEW` | A single PR review run |
| `FINDING` | A code issue or recommendation |
| `RULE` | A deterministic security rule |
| `AUTHOR` | A PR submitter |
These are connected by six edge types: `IMPORTS` (file dependencies), `FOUND_IN` (finding location), `VIOLATES` (rule match), `PRODUCED` (review → finding), `TOUCHED` (review → file), and `AUTHORED` (author → review).

Before each review, the context builder queries the graph for blast radius (which modules depend on changed files), recurring findings (prior issues in the same files), and author risk (historical finding patterns). This context is injected into the LLM prompt as `<graph-context>`.
All graph operations are non-fatal --- if the graph store is unavailable, the review proceeds without it.
See the Knowledge Graph page for the full data model and technical details.
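A minimal sketch of such a structural store --- two SQLite tables with upsert semantics and a one-hop neighbor query. The schema is illustrative, not the actual `SQLiteGraphStore` layout:

```python
import sqlite3

def init_graph(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS nodes (
            id TEXT PRIMARY KEY, type TEXT NOT NULL, label TEXT
        );
        CREATE TABLE IF NOT EXISTS edges (
            src TEXT NOT NULL, dst TEXT NOT NULL, type TEXT NOT NULL,
            PRIMARY KEY (src, dst, type)
        );
    """)

def upsert_node(conn, node_id: str, node_type: str, label: str) -> None:
    # deterministic IDs make re-runs idempotent: same node, updated label
    conn.execute(
        "INSERT INTO nodes (id, type, label) VALUES (?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET label = excluded.label",
        (node_id, node_type, label),
    )

def add_edge(conn, src: str, dst: str, edge_type: str) -> None:
    conn.execute(
        "INSERT OR IGNORE INTO edges (src, dst, type) VALUES (?, ?, ?)",
        (src, dst, edge_type),
    )

def neighbors(conn, node_id: str, edge_type: str) -> list[str]:
    rows = conn.execute(
        "SELECT dst FROM edges WHERE src = ? AND type = ?", (node_id, edge_type)
    )
    return [r[0] for r in rows]
```

A blast-radius query is then a reverse traversal over `IMPORTS` edges starting from the changed files, repeated per hop for BFS.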
## Rule Engine

The `src/grippy/rules/` subpackage implements a deterministic security rule engine that runs before the LLM. Core contract: rules detect, the LLM explains.

### Submodule Map

| Module | Purpose |
|---|---|
| `__init__.py` | Public API: `run_rules()` and `check_gate()` convenience functions |
| `base.py` | `Rule` protocol, `RuleResult` dataclass, `RuleSeverity` enum (CRITICAL, ERROR, WARN, INFO) |
| `context.py` | Diff parser: `parse_diff()` produces `ChangedFile` / `DiffHunk` / `DiffLine`. `RuleContext` holds the parsed diff + profile config |
| `config.py` | `ProfileConfig`, `load_profile()` (CLI > `GRIPPY_PROFILE` env > default), `PROFILES` dict |
| `engine.py` | `RuleEngine`: instantiates rules from the registry, runs them all, aggregates results. `check_gate()` compares max severity against the profile threshold |
| `registry.py` | `RULE_REGISTRY`: explicit list of all `Rule` classes |
### Rules (v1)

| Rule ID | Module | Default Severity | Detects |
|---|---|---|---|
| `workflow-permissions-expanded` | `workflow_permissions.py` | ERROR | write/admin permissions, `pull_request_target`, unpinned actions in GitHub workflows |
| `secrets-in-diff` | `secrets_in_diff.py` | CRITICAL | API keys (AWS, GitHub, OpenAI), private key headers, `.env` additions |
| `dangerous-execution-sinks` | `dangerous_sinks.py` | ERROR | Unsafe code execution patterns in Python and JS/TS |
| `path-traversal-risk` | `path_traversal.py` | WARN | Tainted path variables, `../` traversal patterns |
| `llm-output-unsanitized` | `llm_output_sinks.py` | ERROR | Model output piped to output sinks without a sanitizer |
| `ci-script-execution-risk` | `ci_script_risk.py` | WARN | Risky CI script patterns, sudo in CI, `chmod +x` (individual patterns like `curl\|bash` are elevated to CRITICAL dynamically) |
| `sql-injection-risk` | `sql_injection.py` | ERROR | SQL queries built with f-strings, %-formatting, or concatenation |
| `weak-crypto` | `weak_crypto.py` | WARN | MD5, SHA1, DES, ECB mode, and the `random` module in security contexts |
| `hardcoded-credentials` | `hardcoded_credentials.py` | ERROR | Hardcoded passwords, DB connection strings, and auth headers |
| `insecure-deserialization` | `insecure_deserialization.py` | ERROR | Unsafe deserialization via `shelve`, `jsonpickle`, `dill`, `cloudpickle`, and `torch.load` |
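As an illustration of how a rule like `secrets-in-diff` stays deterministic, here is a sketch that scans added diff lines against a few well-known secret shapes (the patterns and function are illustrative, not the rule's actual regexes):

```python
import re

# Illustrative patterns only --- not the real rule's regex set.
SECRET_PATTERNS = {
    "aws-access-key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github-token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private-key-header": re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
}

def scan_added_lines(added_lines: list[str]) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for each added line that
    matches a known secret shape."""
    hits = []
    for lineno, line in enumerate(added_lines, start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

Because the matches are pure regex over the diff, the same input always yields the same findings --- which is what lets the coverage check later hold the LLM accountable for every one of them.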
### Pipeline Integration

- The rule engine runs on the full, un-truncated diff (before `truncate_diff()`)
- Rules only run when `profile != general`
- If any rule finding has severity >= the profile's `fail_on` threshold, the gate fails (CI exits non-zero after posting)
- When rules fire, the review mode is automatically overridden to `security_audit`
- Rule findings are XML-escaped and injected into the LLM user message as `<rule_findings>` context
- After the LLM produces output, `_validate_rule_coverage()` in `retry.py` checks that every rule finding appears in the structured output with its `rule_id` set --- missing findings trigger a retry
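The gate check reduces to a severity comparison. A sketch, assuming the severities order as listed for `base.py` (the numeric values and `check_gate` signature are illustrative):

```python
from enum import IntEnum

class RuleSeverity(IntEnum):
    # ordering mirrors the enum described above; numeric values are illustrative
    INFO = 0
    WARN = 1
    ERROR = 2
    CRITICAL = 3

def check_gate(finding_severities: list[RuleSeverity],
               fail_on: RuleSeverity) -> bool:
    """Return True when the gate fails: any finding meets or exceeds
    the profile's fail_on threshold."""
    return any(sev >= fail_on for sev in finding_severities)
```

A profile with `fail_on = ERROR` thus blocks on ERROR and CRITICAL findings while letting WARN and INFO findings through to the review without failing CI.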