Co-Pilot

Updated 4 months ago

paiml-mcp-agent-toolkit

Name: paiml-mcp-agent-toolkit
Rating: 4.3 (121 reviews)
Author: paiml

0.1k

paiml/paiml-mcp-agent-toolkit

Agent Score

💡 Summary

PMAT is a comprehensive toolkit for analyzing code quality and generating AI-ready context across multiple programming languages.

🎯 Target Audience

Software developers seeking to improve code quality
DevOps engineers implementing CI/CD pipelines
Technical leads assessing project health
Data scientists utilizing AI for code analysis
Quality assurance teams focusing on technical debt

🤖 AI Roast: “Powerful, but the setup might scare off the impatient.”

Security Analysis: Medium Risk

Risk: Medium. Areas to review: shell/CLI command execution; outbound network access (SSRF, data egress); filesystem read/write scope and path traversal. Run with least privilege and audit before enabling in production.

PMAT

What is PMAT?

PMAT (Pragmatic Multi-language Agent Toolkit) provides everything needed to analyze code quality and generate AI-ready context:

Context Generation - Deep analysis for Claude, GPT, and other LLMs
Technical Debt Grading - A+ through F scoring with 6 orthogonal metrics
Mutation Testing - Test suite quality validation (85%+ kill rate)
Repository Scoring - Quantitative health assessment (0-211 scale)
Semantic Search - Natural language code discovery
MCP Integration - 19 tools for Claude Code, Cline, and AI agents
Quality Gates - Pre-commit hooks, CI/CD integration
17+ Languages - Rust, TypeScript, Python, Go, Java, C/C++, and more

Part of the PAIML Stack, following Toyota Way quality principles (Jidoka, Genchi Genbutsu, Kaizen).

Getting Started

Install PMAT:

# Install from crates.io
cargo install pmat

# Or from source (latest)
git clone https://github.com/paiml/paiml-mcp-agent-toolkit
cd paiml-mcp-agent-toolkit && cargo install --path server

Basic Usage

# Generate AI-ready context
pmat context --output context.md --format llm-optimized

# Analyze code complexity
pmat analyze complexity

# Grade technical debt (A+ through F)
pmat analyze tdg

# Score repository health
pmat repo-score .

# Run mutation testing
pmat mutate --target src/

MCP Server Mode

# Start MCP server for Claude Code, Cline, etc.
pmat mcp

Features

Context Generation

Generate comprehensive context for AI assistants:

pmat context                           # Basic analysis
pmat context --format llm-optimized    # AI-optimized output
pmat context --include-tests           # Include test files

Technical Debt Grading (TDG)

Six orthogonal metrics for accurate quality assessment:

pmat analyze tdg                       # Project-wide grade
pmat analyze tdg --include-components  # Per-component breakdown
pmat tdg baseline create               # Create quality baseline
pmat tdg check-regression              # Detect quality degradation

Grading Scale:

A+/A: Excellent quality, minimal debt
B+/B: Good quality, manageable debt
C+/C: Needs improvement
D/F: Significant technical debt
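The letter scale above can be pictured as a lookup from a numeric score to a grade, plus a gate in the spirit of the `--min-grade B` flag used in the CI example. This is a minimal sketch under stated assumptions: it assumes a 0-100 score where higher is better, and the cutoffs are hypothetical because this README does not publish PMAT's real thresholds.

```rust
/// Maps a hypothetical 0-100 TDG score to a letter grade.
/// NOTE: these cutoffs are illustrative assumptions, not PMAT's real thresholds.
fn letter_grade(score: f64) -> &'static str {
    // Ordered from best to worst; the first cutoff the score meets wins.
    const CUTOFFS: [(f64, &str); 8] = [
        (95.0, "A+"),
        (90.0, "A"),
        (85.0, "B+"),
        (80.0, "B"),
        (75.0, "C+"),
        (70.0, "C"),
        (60.0, "D"),
        (0.0, "F"),
    ];
    for (min, grade) in CUTOFFS {
        if score >= min {
            return grade;
        }
    }
    "F" // negative or NaN scores fall through to the lowest grade
}

/// Gate similar in spirit to `--min-grade B`: passes when the computed
/// grade is at least as good as `min`.
fn meets_min_grade(score: f64, min: &str) -> bool {
    const ORDER: [&str; 8] = ["A+", "A", "B+", "B", "C+", "C", "D", "F"];
    let rank = |g: &str| ORDER.iter().position(|&x| x == g).unwrap_or(ORDER.len());
    rank(letter_grade(score)) <= rank(min)
}

fn main() {
    for score in [97.0, 88.5, 72.0, 41.0] {
        println!("{score:>5} -> {}", letter_grade(score));
    }
    println!("gate (min B) for 82.0: {}", meets_min_grade(82.0, "B"));
}
```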

Mutation Testing

Validate test suite effectiveness:

pmat mutate --target src/lib.rs        # Single file
pmat mutate --target src/ --threshold 85  # Quality gate
pmat mutate --failures-only            # CI optimization

Supported Languages: Rust, Python, TypeScript, JavaScript, Go, C++
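The kill rate checked by `--threshold` is the share of generated mutants that the test suite detects. The sketch below assumes a simple outcome model (killed, survived, timed out) and, as an assumption, counts timeouts as detected, a convention used by tools such as Stryker. PMAT's own accounting may differ.

```rust
/// Possible outcomes when the test suite runs against one mutant.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Outcome {
    Killed,   // at least one test failed: the mutant was detected
    Survived, // every test still passed: a gap in the suite
    Timeout,  // the mutant caused a hang; counted as detected here (assumption)
}

/// Mutation score as a percentage: detected mutants / all mutants * 100.
fn mutation_score(outcomes: &[Outcome]) -> f64 {
    if outcomes.is_empty() {
        return 0.0;
    }
    let detected = outcomes
        .iter()
        .filter(|&&o| matches!(o, Outcome::Killed | Outcome::Timeout))
        .count();
    detected as f64 / outcomes.len() as f64 * 100.0
}

/// Quality gate analogous to `--threshold 85`.
fn passes_threshold(outcomes: &[Outcome], threshold: f64) -> bool {
    mutation_score(outcomes) >= threshold
}

fn main() {
    use Outcome::*;
    // Hypothetical run: 17 killed, 2 survived, 1 timeout.
    let mut run = vec![Killed; 17];
    run.extend([Survived, Survived, Timeout]);
    let score = mutation_score(&run);
    println!("mutation score: {score:.1}%");
    println!("passes 85% gate: {}", passes_threshold(&run, 85.0));
}
```

Each surviving mutant points at behavior no test pins down, which is why the gate is expressed as a minimum percentage rather than a raw count.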

Repository Health Scoring

Evidence-based quality metrics (0-211 scale):

pmat rust-project-score                # Fast mode (~3 min)
pmat rust-project-score --full         # Comprehensive (~10-15 min)
pmat repo-score . --deep               # Full git history

Workflow Prompts

Pre-configured AI prompts enforcing EXTREME TDD:

pmat prompt --list                     # Available prompts
pmat prompt code-coverage              # 85%+ coverage enforcement
pmat prompt debug                      # Five Whys analysis
pmat prompt quality-enforcement        # All quality gates

Git Hooks

Automatic quality enforcement:

pmat hooks install                     # Install pre-commit hooks
pmat hooks install --tdg-enforcement   # With TDG quality gates
pmat hooks status                      # Check hook status

Examples

Generate Context for AI

# For Claude Code
pmat context --output context.md --format llm-optimized

# With semantic search
pmat embed sync ./src
pmat semantic search "error handling patterns"
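Embedding-based search of this kind typically ranks code chunks by cosine similarity between the query's embedding and each chunk's embedding. The sketch below uses tiny hand-made 3-dimensional vectors in place of real model output (all-MiniLM-L6-v2 emits 384-dimensional embeddings). The chunk names and vector values are hypothetical, and PMAT's actual ranking pipeline may differ.

```rust
/// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0.0 if either vector is zero.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must share a dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Returns (name, score) pairs sorted from most to least similar to `query`.
fn rank<'a>(query: &[f32], index: &[(&'a str, Vec<f32>)]) -> Vec<(&'a str, f32)> {
    let mut hits: Vec<(&str, f32)> = index
        .iter()
        .map(|(name, emb)| (*name, cosine(query, emb)))
        .collect();
    // total_cmp gives a total order on f32, so sorting cannot panic on NaN.
    hits.sort_by(|a, b| b.1.total_cmp(&a.1));
    hits
}

fn main() {
    // Hypothetical 3-d embeddings; real models emit hundreds of dimensions.
    let index: Vec<(&str, Vec<f32>)> = vec![
        ("fn parse_config() -> Result<Config, Error>", vec![0.9, 0.1, 0.2]),
        ("fn render_chart(data: &[f64])", vec![0.1, 0.9, 0.1]),
        ("impl From<io::Error> for AppError", vec![0.8, 0.2, 0.4]),
    ];
    // Stands in for the embedding of "error handling patterns".
    let query: [f32; 3] = [0.85, 0.1, 0.35];
    for (name, score) in rank(&query, &index) {
        println!("{score:.3}  {name}");
    }
}
```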

CI/CD Integration

# .github/workflows/quality.yml
name: Quality Gates
on: [push, pull_request]

jobs:
  quality:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: cargo install pmat
      - run: pmat analyze tdg --fail-on-violation --min-grade B
      - run: pmat mutate --target src/ --threshold 80

Quality Baseline Workflow

# 1. Create baseline
pmat tdg baseline create --output .pmat/baseline.json

# 2. Check for regressions
pmat tdg check-regression \
  --baseline .pmat/baseline.json \
  --max-score-drop 5.0 \
  --fail-on-regression
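The regression check compares current scores with the stored baseline and fails when any drop exceeds `--max-score-drop`. The sketch below models that comparison per file. The `Scores` map is hypothetical, because the real `.pmat/baseline.json` schema is not documented here, and it assumes higher scores are better.

```rust
use std::collections::HashMap;

/// Hypothetical per-file TDG scores (higher is assumed to be better).
type Scores = HashMap<String, f64>;

/// Builds a score map from (path, score) pairs.
fn scores(pairs: &[(&str, f64)]) -> Scores {
    pairs.iter().map(|&(path, s)| (path.to_string(), s)).collect()
}

/// Returns (path, drop) for every file whose score fell by more than `max_drop`.
/// Files absent from the baseline are treated as new and never flagged.
fn regressions(baseline: &Scores, current: &Scores, max_drop: f64) -> Vec<(String, f64)> {
    let mut out: Vec<(String, f64)> = current
        .iter()
        .filter_map(|(path, &now)| {
            let before = *baseline.get(path)?;
            let delta = before - now;
            (delta > max_drop).then(|| (path.clone(), delta))
        })
        .collect();
    out.sort_by(|a, b| a.0.cmp(&b.0)); // deterministic report order
    out
}

fn main() {
    let baseline = scores(&[("src/lib.rs", 92.0), ("src/cli.rs", 81.0)]);
    let current = scores(&[("src/lib.rs", 90.5), ("src/cli.rs", 74.0), ("src/new.rs", 60.0)]);

    let found = regressions(&baseline, &current, 5.0);
    for (path, delta) in &found {
        println!("REGRESSION {path}: -{delta:.1} points");
    }
    // With --fail-on-regression, a non-empty list fails the CI step;
    // a real gate would call std::process::exit(1) at this point.
    println!("regression gate: {}", if found.is_empty() { "pass" } else { "fail" });
}
```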

Architecture

pmat/
├── server/           CLI and MCP server
│   ├── src/
│   │   ├── cli/      Command handlers
│   │   ├── services/ Analysis engines
│   │   ├── mcp/      MCP protocol
│   │   └── tdg/      Technical Debt Grading
├── crates/
│   └── pmat-dashboard/  Pure WASM dashboard
└── docs/
    └── specifications/  Technical specs

Quality

| Metric | Value |
|--------|-------|
| Tests | 4600+ passing |
| Coverage | >85% |
| Mutation Score | >80% |
| Languages | 17+ supported |
| MCP Tools | 19 available |

Falsifiable Quality Commitments

Per Popper's demarcation criterion, all claims are measurable and testable:

| Commitment | Threshold | Verification Method |
|------------|-----------|---------------------|
| Context Generation | < 5 seconds for 10K LOC project | time pmat context on test corpus |
| Memory Usage | < 500 MB for 100K LOC analysis | Measured via heaptrack in CI |
| Test Coverage | ≥ 85% line coverage | cargo llvm-cov (CI enforced) |
| Mutation Score | ≥ 80% killed mutants | pmat mutate --threshold 80 |
| Build Time | < 3 minutes incremental | cargo build --timings |
| CI Pipeline | < 15 minutes total | GitHub Actions workflow timing |
| Binary Size | < 50 MB release binary | ls -lh target/release/pmat |
| Language Parsers | All 17 languages parse without panic | Fuzz testing in CI |

How to Verify:

# Run self-assessment with Popper Falsifiability Score
pmat popper-score --verbose

# Individual commitment verification
cargo llvm-cov --html        # Coverage ≥85%
pmat mutate --threshold 80   # Mutation ≥80%
cargo build --timings        # Build time <3min

Failure = Regression: Any commitment violation blocks CI merge.
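The rule that any violated commitment blocks a merge amounts to checking every measured value against its threshold and failing if any single check fails. The sketch below encodes a few rows of the commitments table as data. The `Commitment` and `Bound` types are hypothetical, and the measured values are made-up placeholders, not real PMAT results.

```rust
/// Direction of a threshold: strict upper limit or inclusive lower limit.
#[derive(Clone, Copy)]
enum Bound {
    Below(f64),   // measured value must be strictly less than the limit
    AtLeast(f64), // measured value must be greater than or equal to the limit
}

struct Commitment {
    name: &'static str,
    bound: Bound,
    measured: f64, // hypothetical measurement supplied by CI
}

impl Commitment {
    fn holds(&self) -> bool {
        match self.bound {
            Bound::Below(limit) => self.measured < limit,
            Bound::AtLeast(limit) => self.measured >= limit,
        }
    }
}

/// Names of all violated commitments; an empty list means the merge may proceed.
fn violations(commitments: &[Commitment]) -> Vec<&'static str> {
    commitments.iter().filter(|c| !c.holds()).map(|c| c.name).collect()
}

fn main() {
    // Thresholds from the commitments table; measured values are made up.
    let checks = [
        Commitment { name: "Context generation (s, 10K LOC)", bound: Bound::Below(5.0), measured: 2.1 },
        Commitment { name: "Test coverage (%)", bound: Bound::AtLeast(85.0), measured: 86.2 },
        Commitment { name: "Mutation score (%)", bound: Bound::AtLeast(80.0), measured: 78.5 },
        Commitment { name: "Binary size (MB)", bound: Bound::Below(50.0), measured: 41.0 },
    ];
    let failed = violations(&checks);
    if failed.is_empty() {
        println!("all commitments hold: merge allowed");
    } else {
        println!("merge blocked by: {}", failed.join(", "));
    }
}
```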

Benchmark Results (Statistical Rigor)

All benchmarks use Criterion.rs with proper statistical methodology:

| Operation | Mean | 95% CI | Std Dev | Sample Size |
|-----------|------|--------|---------|-------------|
| Context (1K LOC) | 127 ms | [124, 130] ms | ±12.3 ms | n=1000 runs |
| Context (10K LOC) | 1.84 s | [1.79, 1.90] s | ±156 ms | n=500 runs |
| TDG Scoring | 156 ms | [148, 164] ms | ±18.2 ms | n=500 runs |
| Complexity Analysis | 23 ms | [22, 24] ms | ±3.1 ms | n=1000 runs |
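To show what the Mean, Std Dev, and 95% CI columns summarize, here is a sketch that computes them from raw timings using a normal approximation (mean ± 1.96 · s/√n). This is a simplification: Criterion.rs derives its intervals by bootstrap resampling, so its figures will not generally match this formula. The sample timings are hypothetical.

```rust
/// Summary statistics for a set of timing samples (milliseconds).
#[derive(Debug)]
struct Summary {
    mean: f64,
    std_dev: f64,     // sample standard deviation (n - 1 denominator)
    ci95: (f64, f64), // normal-approximation 95% CI for the mean
}

fn summarize(samples: &[f64]) -> Option<Summary> {
    let n = samples.len();
    if n < 2 {
        return None; // a sample standard deviation needs at least two values
    }
    let mean = samples.iter().sum::<f64>() / n as f64;
    let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1) as f64;
    let std_dev = var.sqrt();
    let half_width = 1.96 * std_dev / (n as f64).sqrt(); // 1.96 = z-value for 95%
    Some(Summary { mean, std_dev, ci95: (mean - half_width, mean + half_width) })
}

fn main() {
    // Hypothetical timings for one operation, in milliseconds.
    let samples = [125.0, 131.0, 118.0, 140.0, 122.0, 129.0, 133.0, 121.0];
    if let Some(s) = summarize(&samples) {
        println!(
            "mean {:.1} ms, sd ±{:.1} ms, 95% CI [{:.1}, {:.1}] ms (n={})",
            s.mean, s.std_dev, s.ci95.0, s.ci95.1, samples.len()
        );
    }
}
```

The interval describes uncertainty in the mean, so it narrows roughly with √n as more runs are collected, while the standard deviation stays about the same.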

Comparison Baselines (vs. Alternatives):

| Metric | PMAT | ctags | tree-sitter | Effect Size |
|--------|------|-------|-------------|-------------|
| 10K LOC parsing | 1.84s | 0.3s | 0.8s | d=0.72 (medium) |
| Memory (10K LOC) | 287MB | 45MB | 120MB | - |
| Semantic depth | Full | Syntax only | AST only | - |
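The Effect Size column reports Cohen's d, the difference between two group means divided by a pooled standard deviation. A minimal sketch of the standard pooled-variance formula follows. The inputs are hypothetical, and the README does not say which variant of d produced the table's value.

```rust
/// Sample mean and sample variance (n - 1 denominator).
fn mean_var(xs: &[f64]) -> (f64, f64) {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var)
}

/// Cohen's d with pooled standard deviation:
/// d = (mean_a - mean_b) / sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
fn cohens_d(a: &[f64], b: &[f64]) -> f64 {
    assert!(a.len() >= 2 && b.len() >= 2, "each group needs at least two samples");
    let (ma, va) = mean_var(a);
    let (mb, vb) = mean_var(b);
    let (na, nb) = (a.len() as f64, b.len() as f64);
    let pooled = (((na - 1.0) * va + (nb - 1.0) * vb) / (na + nb - 2.0)).sqrt();
    (ma - mb) / pooled
}

fn main() {
    // Hypothetical parse times (seconds) for two tools on the same corpus.
    let tool_a = [1.80, 1.95, 1.70, 1.88, 1.92];
    let tool_b = [1.60, 1.75, 1.85, 1.55, 1.70];
    let d = cohens_d(&tool_a, &tool_b);
    // Cohen's rule of thumb: |d| near 0.2 is small, 0.5 medium, 0.8 large.
    println!("Cohen's d = {d:.2}");
}
```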

See docs/BENCHMARKS.md for complete statistical analysis.

ML/AI Reproducibility

PMAT uses ML for semantic search and embeddings. All ML operations are reproducible:

Random Seed Management:

Embedding generation uses fixed seed (SEED=42) for deterministic outputs
Clustering operations use fixed seed (SEED=12345)
Seeds documented in docs/ml/REPRODUCIBILITY.md
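Fixed seeds make stochastic steps repeatable: the same seed yields the same pseudo-random sequence, and therefore the same downstream choices, on every run. The sketch below illustrates the principle with a small SplitMix64 generator and a hypothetical `initial_centroids` helper that picks clustering start points. It is not the generator PMAT or aprender actually uses.

```rust
/// SplitMix64: a small, well-known 64-bit PRNG, used here only for illustration.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Index in 0..n (modulo bias is negligible for small n).
    fn below(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize
    }
}

/// Hypothetical helper: picks `k` distinct starting centroids out of
/// `n_points` using a partial Fisher-Yates shuffle.
fn initial_centroids(n_points: usize, k: usize, seed: u64) -> Vec<usize> {
    let k = k.min(n_points);
    let mut idx: Vec<usize> = (0..n_points).collect();
    let mut rng = SplitMix64::new(seed);
    for i in 0..k {
        let j = i + rng.below(n_points - i); // j in i..n_points
        idx.swap(i, j);
    }
    idx.truncate(k);
    idx
}

fn main() {
    const SEED: u64 = 12345; // the clustering seed quoted in this README
    let run1 = initial_centroids(100, 5, SEED);
    let run2 = initial_centroids(100, 5, SEED);
    assert_eq!(run1, run2); // same seed, same choice, on every run
    println!("seed 12345 picks: {:?}", run1);
    println!("seed 42 picks:    {:?}", initial_centroids(100, 5, 42));
}
```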

Model Artifacts:

Pre-trained models from HuggingFace (all-MiniLM-L6-v2)
Model versions pinned in Cargo.toml
Hash verification on download

Dataset Sources

PMAT does not train models but uses these data sources for evaluation:

| Dataset | Source | Purpose | Size |
|---------|--------|---------|------|
| CodeSearchNet | GitHub/Microsoft | Semantic search benchmarks | 2M functions |
| PMAT-bench | Internal | Regression testing | 500 queries |

Data provenance and licensing documented in docs/ml/REPRODUCIBILITY.md.

Sovereign Stack

PMAT is built on the PAIML Sovereign Stack - pure-Rust, SIMD-accelerated libraries:

| Library | Purpose | Version |
|---------|---------|---------|
| aprender | ML library (text similarity, clustering, topic modeling) | 0.24.0 |
| trueno | SIMD compute library for matrix operations | 0.11.0 |
| trueno-graph | GPU-first graph database (PageRank, Louvain, CSR) | 0.1.7 |
| trueno-rag | RAG pipeline with VectorStore | 0.1.8 |
| trueno-db | Embedded analytics database | 0.3.10 |
| trueno-viz | Terminal graph visualization | 0.1.17 |
| trueno-zram-core | SIMD LZ4/ZSTD compression (optional) | 0.3.0 |
| pmat | Code analysis toolkit | 2.213.4 |

Key Benefits:

Pure Rust (no C dependencies, no FFI)
SIMD-first (AVX2, AVX-512, NEON auto-detection)
2-4x speedup on graph algorithms via ap
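Runtime SIMD detection of this kind is available in the Rust standard library through the `is_x86_feature_detected!` and `is_aarch64_feature_detected!` macros. Here is a minimal sketch of how a library might pick a code path at startup. The `SimdTier` names are illustrative, and this is not trueno's actual dispatch code.

```rust
/// SIMD tiers a library might dispatch on (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq)]
enum SimdTier {
    Avx512,
    Avx2,
    Neon,
    Scalar,
}

/// Detects the best available tier on the current CPU at runtime.
fn detect_simd() -> SimdTier {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx512f") {
            return SimdTier::Avx512;
        }
        if is_x86_feature_detected!("avx2") {
            return SimdTier::Avx2;
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return SimdTier::Neon;
        }
    }
    SimdTier::Scalar // portable fallback for any other target or CPU
}

fn main() {
    // A real library would cache this once and route each kernel accordingly.
    println!("selected SIMD tier: {:?}", detect_simd());
}
```

Detecting at runtime instead of compiling with `-C target-cpu=native` lets one binary run on older CPUs while still using wider instructions where they exist.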

5-Dim Analysis

Clarity: 9/10
Novelty: 8/10
Utility: 9/10
Completeness: 8/10
Maintainability: 9/10

Pros & Cons

Pros

Supports multiple programming languages
Comprehensive analysis features
Integrates well with CI/CD workflows
High test coverage and reliability

Cons

May have a steep learning curve for beginners
Requires Rust toolchain for installation
Limited documentation on advanced features
Performance may vary with large codebases

Related Skills

pytorch

tool · Code Lib

92/100

“It's the Swiss Army knife of deep learning, but good luck figuring out which of the 47 installation methods is the one that won't break your system.”

agno

tool · Code Lib

90/100

“It promises to be the Kubernetes for agents, but let's see if developers have the patience to learn yet another orchestration layer.”

nuxt-skills

tool · Co-Pilot

90/100

“It's essentially a well-organized cheat sheet that turns your AI assistant into a Nuxt framework parrot.”

Disclaimer: This content is sourced from GitHub open source projects for display and rating purposes only.