Co-Pilot / Assistive
Updated a month ago

mgrep

mixedbread-ai
3.1k
mixedbread-ai/mgrep
86
Agent Score

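💡 Summary

mgrep is a semantic search tool that augments traditional grep with natural-language processing and supports multiple file types.

🎯 Who it's for

Software developers, data scientists, DevOps engineers, technical writers, AI researchers

🤖 AI Roast: Looks like a strong contender, but don't let the setup scare people off.

Security Analysis: Medium Risk

Risk: Medium. Recommended checks: whether it executes shell/command-line instructions; whether it makes outbound network requests (SSRF/data exfiltration); how API keys/tokens are obtained, stored, and potentially leaked; the scope of file reads/writes and path-traversal risk; dependency pinning and supply-chain risk. Run with least privilege, and audit the code and dependencies before enabling in production.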

Why mgrep?

  • Natural-language search that feels as immediate as grep.
  • Semantic, multilingual & multimodal (audio, video support coming soon!)
  • Web search built-in — query the web alongside your local files with --web.
  • Smooth background indexing via mgrep watch, designed to detect and keep up-to-date everything that matters inside any git repository.
  • Friendly device-login flow and first-class coding agent integrations.
  • Built for agents and humans alike, and designed to be a helpful tool, not a restrictive harness: quiet output, thoughtful defaults, and escape hatches everywhere.
  • Cuts your agent's token usage roughly in half (about 2x fewer tokens in our 50-task benchmark) at similar or better judged quality.

    # index once
    mgrep watch
    # then ask your repo things in natural language
    mgrep "where do we set up auth?"

Quick Start

  1. Install

    npm install -g @mixedbread/mgrep # or pnpm / bun
  2. Sign in once

    mgrep login

    A browser window (or verification URL) guides you through Mixedbread authentication.

    Alternative: API Key Authentication. For CI/CD or headless environments, set the MXBAI_API_KEY environment variable:

    export MXBAI_API_KEY=your_api_key_here

    This bypasses the browser login flow entirely.

  3. Index a project

    cd path/to/repo
    mgrep watch

    watch performs an initial sync, respects .gitignore, then keeps the Mixedbread store updated as files change.

  4. Search anything

    mgrep "where do we set up auth?" src/lib
    mgrep -m 25 "store schema"

    Searches default to the current working directory unless you pass a path.
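For the API-key route in step 2, a CI script can check for the key up front so that a missing key surfaces as a clear error before any mgrep command runs. This is a minimal sketch, not part of mgrep: the function name `require_mxbai_key` is hypothetical, and only the `MXBAI_API_KEY` variable comes from the text above.

```shell
# Hypothetical helper (not part of mgrep): fail fast when the
# MXBAI_API_KEY variable described in step 2 is missing or empty.
require_mxbai_key() {
  if [ -z "${MXBAI_API_KEY:-}" ]; then
    echo "MXBAI_API_KEY is not set; export it or run 'mgrep login'" >&2
    return 1
  fi
}
```

A CI step could then chain it in front of a search, for example `require_mxbai_key && mgrep "store schema"`.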

Today, mgrep works great on: code, text, PDFs, images.
Coming soon: audio & video.

Using it with Coding Agents

[!CAUTION] Background Sync Enabled: When installed with a coding agent, mgrep runs a background process that syncs your files to enable semantic search. This process starts automatically when you begin a session and stops when your session ends. You can see your current usage in the Mixedbread platform.

[!NOTE] Default Limits: mgrep enforces default limits to ensure optimal performance:

  • Maximum file size: 1MB per file
  • Maximum file count: 1,000 files per directory

These limits can be customized via CLI flags (--max-file-size, --max-file-count), environment variables, or config files. See the Configuration section for details.
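To see how a project compares with these defaults before indexing, standard `find` and `wc` are enough. The sketch below does not call mgrep, and its function names are hypothetical. It assumes that "1MB" means 1,048,576 bytes and that the file count applies to the direct children of a directory; the text above specifies neither. Unlike `mgrep watch`, it does not honor .gitignore.

```shell
# Hypothetical helpers (not part of mgrep) for comparing a directory
# against the default limits quoted above.

# Print files larger than 1048576 bytes (assumed meaning of "1MB").
list_oversized_files() {
  find "${1:-.}" -type f -size +1048576c
}

# Print the number of regular files directly inside a directory.
# tr strips the padding that some wc implementations add.
count_files_in_dir() {
  find "${1:-.}" -maxdepth 1 -type f | wc -l | tr -d ' '
}
```

Running `list_oversized_files src` and `count_files_in_dir src` gives a rough idea of whether the limits need raising for that directory.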

If you prefer to manually start the file watcher instead of relying on the agent's automatic background sync, you can run:

mgrep watch /path/to/your/project

This gives you explicit control over when indexing occurs and which directories are watched.

mgrep supports assisted installation commands for many agents:

  • mgrep install-claude-code for Claude Code
  • mgrep install-opencode for OpenCode
  • mgrep install-codex for Codex
  • mgrep install-droid for Factory Droid

These commands sign you in (if needed) and add Mixedbread mgrep support to the agent. After that, you only have to start the agent in your project folder. That's it.

More Agents Coming Soon

More agents (Cursor, Windsurf, etc.) are on the way—this section will grow as soon as each integration lands.

Making your agent smarter

We plugged mgrep into Claude Code and ran a benchmark of 50 QA tasks to evaluate the economics of mgrep against grep.

[Figure: mgrep benchmark]

In our 50-task benchmark, mgrep+Claude Code used ~2x fewer tokens than grep-based workflows at similar or better judged quality.

mgrep finds the relevant snippets in a few semantic queries first, and the model spends its capacity on reasoning instead of scanning through irrelevant code from endless grep attempts. You can try it yourself.

Note: Win Rate (%) was calculated by using an LLM as a judge.

Why we built mgrep

grep is an amazing tool. It's lightweight, compatible with just about every machine on the planet, and will reliably surface any potential match within any target folder.

But grep is from 1973, and it carries the limitations of its era: you need exact patterns and it slows down considerably in the cases where you need it most, on large codebases.

Worst of all, if you're looking for deeply-buried critical business logic, you cannot describe it: you have to be able to accurately guess what kind of naming patterns would have been used by the previous generations of engineers at your workplace for grep to find it. This will often result in watching a coding agent desperately try hundreds of patterns, filling its token window, and your upcoming invoice, with thousands of tokens.

But it doesn't have to be this way. Everything else in our toolkit is increasingly tailored to understand us, and so should our search tools. mgrep is our way to bring grep to 2025, integrating all of the advances in semantic understanding and code-search, without sacrificing anything that has made grep such a useful tool.

Under the hood, mgrep is powered by Mixedbread Search, our full-featured search solution. It combines state-of-the-art semantic retrieval models with context-aware parsing and optimized inference methods to provide you with a natural language companion to grep. We believe both tools belong in your toolkit: use grep for exact matches, mgrep for semantic understanding and intent.

When to use what

We designed mgrep to complement grep, not replace it. The best code search combines mgrep with grep.

| Use grep (or ripgrep) for... | Use mgrep for... |
| --- | --- |
| Exact Matches | Intent Search |
| Symbol tracing, Refactoring, Regex | Code exploration, Feature discovery, Onboarding |
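The split in the table can be approximated with a very rough rule of thumb: a single token usually names a symbol, while a multi-word phrase usually expresses intent. The sketch below is only an illustration of that heuristic; `suggest_search_tool` is a hypothetical name, not an mgrep feature, and real queries will often need human judgment.

```shell
# Hypothetical heuristic (not part of mgrep): suggest a tool for a query.
# Queries without whitespace look like symbols, so suggest grep;
# multi-word queries read like natural-language intent, so suggest mgrep.
suggest_search_tool() {
  case "$1" in
    *[[:space:]]*) echo "mgrep" ;;
    *)             echo "grep" ;;
  esac
}
```

For instance, `suggest_search_tool "where do we set up auth?"` would point to mgrep, while a bare identifier would point to grep.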

Web Search

mgrep can also search the web alongside your local files. This is useful when you need to find documentation, tutorials, or answers to programming questions without leaving your terminal.

    # Search the web and get a summarized answer
    mgrep --web --answer "How do I integrate a JavaScript runtime into Deno?"
    # Get the URLs of the search results
    mgrep --web "best practices for error handling in TypeScript"

Web search queries the mixedbread/web store in addition to your local store, merging results based on relevance. Use --answer (or -a) to get a concise summary instead of raw results.

mgrep as Subagent

For complex questions that require information from multiple sources, mgrep can act as a subagent that automatically refines queries and performs multiple searches to find the best answer.

    # Enable agentic search for complex multi-part questions
    mgrep --agentic "What are the yearly numbers for 2020, 2021, 2022, 2023, 2024?"
    # Combine with --answer for a synthesized response from multiple sources
    mgrep --agentic -a "How does authentication work and where is it configured?"

When --agentic is enabled, mgrep will:

  • Automatically break down complex queries into sub-queries
  • Perform multiple searches as needed to gather comprehensive results
  • Combine findings from different parts of your codebase

This is particularly useful for questions that span multiple files or concepts, where a single search might miss important context.

Commands at a Glance

| Command | Purpose |
| --- | --- |
| mgrep / mgrep search <pattern> [path] | Natural-language search with many grep-style flags (-i, -r, -m...). |
| mgrep watch | Index current repo and keep the Mixedbread store in sync via file watchers. |
| mgrep login & mgrep logout | Manage device-based authentication with Mixedbread. |
| mgrep install-claude-code | Authenticate, add the Mixedbread mgrep plugin to Claude Code. |
| mgrep install-opencode | Authenticate and add the Mixedbread mgrep to OpenCode. |
| mgrep install-codex | Authenticate and add the Mixedbread mgrep to Codex. |
| mgrep install-droid | Authenticate and add the Mixedbread mgrep hooks/skills to Factory Droid. |

mgrep search

mgrep search is the default command. It can be used to search the current directory for a pattern.

| Option | Description |
| --- | --- |
| -m <max_count> | The maximum number of results to return |
| -c, --content | Show content of the results |
| -a, --answer | Generate an answer to the question based on the results |
| -w, --web | Include web search results alongside local files |
| --agentic | Enable agentic search to automatically refine queries and perform multiple searches |
| -s, --sync | Sync the local files to the store before searching |
| -d, --dry-run | Dry run

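Five-Dimension Analysis
Clarity: 8/10
Innovation: 9/10
Practicality: 9/10
Completeness: 9/10
Maintainability: 8/10
Pros and Cons

Pros

  • Supports natural-language queries.
  • Multimodal search capabilities.
  • Integrates with coding agents.
  • Background indexing for real-time updates.

Cons

  • Requires initial setup and login.
  • May hit performance limits on large files.
  • Depends on the Mixedbread platform.
  • Audio and video are still in development.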


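Disclaimer: This content comes from an open-source GitHub project and is provided for display and scoring analysis only.

Copyright belongs to the original author, mixedbread-ai.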