paiml-mcp-agent-toolkit
💡 Summary
PMAT is a comprehensive toolkit for analyzing code quality and generating AI-ready context across multiple programming languages.
Risk: Medium. Before enabling in production, review its shell/CLI command execution, outbound network access (SSRF and data-egress exposure), and filesystem read/write scope (path traversal). Run it with least privilege and audit it first.
PMAT
Getting Started | Features | Examples | Documentation
What is PMAT?
PMAT (Pragmatic Multi-language Agent Toolkit) provides everything needed to analyze code quality and generate AI-ready context:
- Context Generation - Deep analysis for Claude, GPT, and other LLMs
- Technical Debt Grading - A+ through F scoring with 6 orthogonal metrics
- Mutation Testing - Test suite quality validation (85%+ kill rate)
- Repository Scoring - Quantitative health assessment (0-211 scale)
- Semantic Search - Natural language code discovery
- MCP Integration - 19 tools for Claude Code, Cline, and AI agents
- Quality Gates - Pre-commit hooks, CI/CD integration
- 17+ Languages - Rust, TypeScript, Python, Go, Java, C/C++, and more
Part of the PAIML Stack, following Toyota Way quality principles (Jidoka, Genchi Genbutsu, Kaizen).
Getting Started
Install the toolkit:
```bash
# Install from crates.io
cargo install pmat

# Or from source (latest)
git clone https://github.com/paiml/paiml-mcp-agent-toolkit
cd paiml-mcp-agent-toolkit && cargo install --path server
```
Basic Usage
```bash
# Generate AI-ready context
pmat context --output context.md --format llm-optimized

# Analyze code complexity
pmat analyze complexity

# Grade technical debt (A+ through F)
pmat analyze tdg

# Score repository health
pmat repo-score .

# Run mutation testing
pmat mutate --target src/
```
MCP Server Mode
```bash
# Start MCP server for Claude Code, Cline, etc.
pmat mcp
```
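Clients discover MCP servers through a JSON configuration file. A minimal sketch for registering the server with Claude Code, assuming the project-level `.mcp.json` convention (the file name, schema, and `pmat` entry are assumptions drawn from Claude Code's documentation, not from this README; check your client's docs):

```bash
# Register PMAT as a project-scoped MCP server for Claude Code.
# Schema per Claude Code's .mcp.json convention; verify against your client.
cat > .mcp.json <<'EOF'
{
  "mcpServers": {
    "pmat": {
      "command": "pmat",
      "args": ["mcp"]
    }
  }
}
EOF
```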
Features
Context Generation
Generate comprehensive context for AI assistants:
```bash
pmat context                         # Basic analysis
pmat context --format llm-optimized  # AI-optimized output
pmat context --include-tests         # Include test files
```
Technical Debt Grading (TDG)
Six orthogonal metrics for accurate quality assessment:
```bash
pmat analyze tdg                       # Project-wide grade
pmat analyze tdg --include-components  # Per-component breakdown
pmat tdg baseline create               # Create quality baseline
pmat tdg check-regression              # Detect quality degradation
```
Grading Scale:
- A+/A: Excellent quality, minimal debt
- B+/B: Good quality, manageable debt
- C+/C: Needs improvement
- D/F: Significant technical debt
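These grades can double as a hard CI gate. A minimal sketch, reusing the flags that appear in the CI/CD Integration example later in this README:

```bash
# Fail the pipeline when the project grades below B
pmat analyze tdg --fail-on-violation --min-grade B
```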
Mutation Testing
Validate test suite effectiveness:
```bash
pmat mutate --target src/lib.rs           # Single file
pmat mutate --target src/ --threshold 85  # Quality gate
pmat mutate --failures-only               # CI optimization
```
Supported Languages: Rust, Python, TypeScript, JavaScript, Go, C++
Repository Health Scoring
Evidence-based quality metrics (0-211 scale):
```bash
pmat rust-project-score         # Fast mode (~3 min)
pmat rust-project-score --full  # Comprehensive (~10-15 min)
pmat repo-score . --deep        # Full git history
```
Workflow Prompts
Pre-configured AI prompts enforcing EXTREME TDD:
```bash
pmat prompt --list               # Available prompts
pmat prompt code-coverage        # 85%+ coverage enforcement
pmat prompt debug                # Five Whys analysis
pmat prompt quality-enforcement  # All quality gates
```
Git Hooks
Automatic quality enforcement:
```bash
pmat hooks install                    # Install pre-commit hooks
pmat hooks install --tdg-enforcement  # With TDG quality gates
pmat hooks status                     # Check hook status
```
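`pmat hooks install` writes the hook for you; for teams that vendor hooks in the repository, a hand-rolled equivalent is a few lines. An illustrative sketch (the hook body is an assumption, not what `pmat hooks install` actually generates):

```bash
#!/bin/sh
# .git/hooks/pre-commit — illustrative manual equivalent of the managed hook.
# Blocks the commit when the TDG grade falls below B.
pmat analyze tdg --fail-on-violation --min-grade B || {
  echo "commit blocked: TDG grade below B" >&2
  exit 1
}
```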
Examples
Generate Context for AI
```bash
# For Claude Code
pmat context --output context.md --format llm-optimized

# With semantic search
pmat embed sync ./src
pmat semantic search "error handling patterns"
```
CI/CD Integration
```yaml
# .github/workflows/quality.yml
name: Quality Gates
on: [push, pull_request]
jobs:
  quality:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: cargo install pmat
      - run: pmat analyze tdg --fail-on-violation --min-grade B
      - run: pmat mutate --target src/ --threshold 80
```
Quality Baseline Workflow
```bash
# 1. Create baseline
pmat tdg baseline create --output .pmat/baseline.json

# 2. Check for regressions
pmat tdg check-regression \
  --baseline .pmat/baseline.json \
  --max-score-drop 5.0 \
  --fail-on-regression
```
Architecture
```text
pmat/
├── server/                 CLI and MCP server
│   ├── src/
│   │   ├── cli/            Command handlers
│   │   ├── services/       Analysis engines
│   │   ├── mcp/            MCP protocol
│   │   └── tdg/            Technical Debt Grading
├── crates/
│   └── pmat-dashboard/     Pure WASM dashboard
└── docs/
    └── specifications/     Technical specs
```
Quality
| Metric | Value |
|--------|-------|
| Tests | 4600+ passing |
| Coverage | >85% |
| Mutation Score | >80% |
| Languages | 17+ supported |
| MCP Tools | 19 available |
Falsifiable Quality Commitments
Per Popper's demarcation criterion, all claims are measurable and testable:
| Commitment | Threshold | Verification Method |
|------------|-----------|---------------------|
| Context Generation | < 5 seconds for 10K LOC project | time pmat context on test corpus |
| Memory Usage | < 500 MB for 100K LOC analysis | Measured via heaptrack in CI |
| Test Coverage | ≥ 85% line coverage | cargo llvm-cov (CI enforced) |
| Mutation Score | ≥ 80% killed mutants | pmat mutate --threshold 80 |
| Build Time | < 3 minutes incremental | cargo build --timings |
| CI Pipeline | < 15 minutes total | GitHub Actions workflow timing |
| Binary Size | < 50 MB release binary | ls -lh target/release/pmat |
| Language Parsers | All 17 languages parse without panic | Fuzz testing in CI |
How to Verify:
```bash
# Run self-assessment with Popper Falsifiability Score
pmat popper-score --verbose

# Individual commitment verification
cargo llvm-cov --html       # Coverage ≥85%
pmat mutate --threshold 80  # Mutation ≥80%
cargo build --timings       # Build time <3min
```
Failure = Regression: Any commitment violation blocks CI merge.
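The individual checks compose into one gate script that exits non-zero on the first violated commitment. A sketch built only from the commands above (the `--fail-under-lines` flag comes from cargo-llvm-cov's own CLI, not from this README):

```bash
#!/usr/bin/env bash
# verify.sh — illustrative gate over the falsifiable commitments.
set -euo pipefail

pmat mutate --threshold 80            # Mutation score >= 80%
cargo llvm-cov --fail-under-lines 85  # Line coverage >= 85% (cargo-llvm-cov flag)
pmat popper-score --verbose           # Self-assessment of all commitments
```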
Benchmark Results (Statistical Rigor)
All benchmarks use Criterion.rs with proper statistical methodology:
| Operation | Mean | 95% CI | Std Dev | Sample Size |
|-----------|------|--------|---------|-------------|
| Context (1K LOC) | 127ms | [124, 130] | ±12.3ms | n=1000 runs |
| Context (10K LOC) | 1.84s | [1.79, 1.90] | ±156ms | n=500 runs |
| TDG Scoring | 156ms | [148, 164] | ±18.2ms | n=500 runs |
| Complexity Analysis | 23ms | [22, 24] | ±3.1ms | n=1000 runs |
Comparison Baselines (vs. Alternatives):
| Metric | PMAT | ctags | tree-sitter | Effect Size |
|--------|------|-------|-------------|-------------|
| 10K LOC parsing | 1.84s | 0.3s | 0.8s | d=0.72 (medium) |
| Memory (10K LOC) | 287MB | 45MB | 120MB | - |
| Semantic depth | Full | Syntax only | AST only | - |
See docs/BENCHMARKS.md for complete statistical analysis.
ML/AI Reproducibility
PMAT uses ML for semantic search and embeddings. All ML operations are reproducible:
Random Seed Management:
- Embedding generation uses fixed seed (SEED=42) for deterministic outputs
- Clustering operations use fixed seed (SEED=12345)
- Seeds documented in docs/ml/REPRODUCIBILITY.md
Model Artifacts:
- Pre-trained models from HuggingFace (all-MiniLM-L6-v2)
- Model versions pinned in Cargo.toml
- Hash verification on download
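A quick end-to-end determinism spot-check is to rebuild the embeddings twice and diff the results of the same query. A sketch using only commands shown earlier (the output file names are illustrative):

```bash
# With fixed seeds, two independent syncs should rank results identically.
pmat embed sync ./src && pmat semantic search "error handling patterns" > run1.txt
pmat embed sync ./src && pmat semantic search "error handling patterns" > run2.txt
diff run1.txt run2.txt && echo "deterministic: identical results"
```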
Dataset Sources
PMAT does not train models but uses these data sources for evaluation:
| Dataset | Source | Purpose | Size |
|---------|--------|---------|------|
| CodeSearchNet | GitHub/Microsoft | Semantic search benchmarks | 2M functions |
| PMAT-bench | Internal | Regression testing | 500 queries |
Data provenance and licensing documented in docs/ml/REPRODUCIBILITY.md.
Sovereign Stack
PMAT is built on the PAIML Sovereign Stack - pure-Rust, SIMD-accelerated libraries:
| Library | Purpose | Version |
|---------|---------|---------|
| aprender | ML library (text similarity, clustering, topic modeling) | 0.24.0 |
| trueno | SIMD compute library for matrix operations | 0.11.0 |
| trueno-graph | GPU-first graph database (PageRank, Louvain, CSR) | 0.1.7 |
| trueno-rag | RAG pipeline with VectorStore | 0.1.8 |
| trueno-db | Embedded analytics database | 0.3.10 |
| trueno-viz | Terminal graph visualization | 0.1.17 |
| trueno-zram-core | SIMD LZ4/ZSTD compression (optional) | 0.3.0 |
| pmat | Code analysis toolkit | 2.213.4 |
Key Benefits:
- Pure Rust (no C dependencies, no FFI)
- SIMD-first (AVX2, AVX-512, NEON auto-detection)
- 2-4x speedup on graph algorithms
Pros
- Supports multiple programming languages
- Comprehensive analysis features
- Integrates well with CI/CD workflows
- High test coverage and reliability
Cons
- May have a steep learning curve for beginners
- Requires Rust toolchain for installation
- Limited documentation on advanced features
- Performance may vary with large codebases