💡 Summary
A Python library that enhances Zotero search through a safe code execution pattern, for efficient handling of large datasets.
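🤖 AI quip: "Looks capable, but don't let the configuration scare people away."
Risk: Medium. Recommended checks: whether it executes shell/command-line instructions; whether it makes external network requests (SSRF/data exfiltration); how API keys/tokens are obtained, stored, and potentially leaked; file read/write scope and path traversal risk; dependency pinning and supply chain risk. Run with least privilege, and audit the code and dependencies before enabling in production.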
Zotero Code Execution
Efficient multi-strategy Zotero search using code execution pattern
A Python library for Zotero MCP that implements Anthropic's code execution pattern to enable safe, comprehensive searches without context overflow or crashes.
Skill Installation
For Claude Code
- Clone or download this repository
- Copy the skill/ folder to your Claude Code skills directory:
    cp -r skill ~/.claude/skills/zotero-mcp-code
- Restart Claude Code to load the skill
Quick Start
    import sys
    sys.path.append('/path/to/zotero-code-execution')
    import setup_paths
    from zotero_lib import SearchOrchestrator, format_results

    # Single comprehensive search - fetches 100+ items, returns top 20
    orchestrator = SearchOrchestrator()
    results = orchestrator.comprehensive_search("embodied cognition", max_results=20)
    print(format_results(results))
That's it! This automatically:
- ✅ Performs semantic + keyword + tag searches
- ✅ Deduplicates results
- ✅ Ranks by relevance
- ✅ Keeps large datasets in code (no crashes)
Multi-Term Searches
For OR-style searches (e.g., multiple spellings or languages), search each term separately and merge:
    # Search for "Atayal" OR "泰雅族"
    all_results = {}
    for term in ['Atayal', '泰雅族']:
        results = orchestrator.comprehensive_search(term, max_results=50)
        for item in results:
            all_results[item.key] = item  # Deduplicate by key

    # Re-rank combined results
    ranked = orchestrator._rank_items(list(all_results.values()), 'Atayal 泰雅族')
    print(format_results(ranked[:25]))
Why? Zotero treats multi-word queries as AND conditions. Searching "Atayal 泰雅族" finds items matching BOTH terms, not either term.
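The merge-by-key step above can be factored into a small helper. The sketch below is not part of the library: search_any_term and its search_fn parameter are hypothetical names, and fake_search only mimics AND-style matching over a made-up in-memory list so the example runs without Zotero. In real use, search_fn would presumably wrap something like orchestrator.comprehensive_search.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Item:
    """Stand-in for a search result; only `key` and `title` are assumed."""
    key: str
    title: str


def search_any_term(terms: Iterable[str],
                    search_fn: Callable[[str], List[Item]]) -> List[Item]:
    """Run one search per term and merge results, keeping one item per key."""
    merged: Dict[str, Item] = {}
    for term in terms:
        for item in search_fn(term):
            merged.setdefault(item.key, item)  # first occurrence wins
    return list(merged.values())


# Hypothetical in-memory library used only to make the sketch runnable.
LIBRARY = [
    Item("A1", "Atayal weaving traditions"),
    Item("B2", "泰雅族 oral history"),
    Item("C3", "Atayal and 泰雅族 place names"),
]


def fake_search(query: str) -> List[Item]:
    """Mimic AND semantics: every whitespace-separated word must match."""
    words = query.split()
    return [it for it in LIBRARY if all(w in it.title for w in words)]


and_hits = fake_search("Atayal 泰雅族")                        # both terms required
or_hits = search_any_term(["Atayal", "泰雅族"], fake_search)   # either term
print(len(and_hits), len(or_hits))
```

Passing the search function in as a parameter keeps the helper independent of any particular search backend, which also makes it easy to test.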
Why This Exists
The Problem
Direct MCP tool calls have limitations:
- 🚫 Crash risk with large result sets (>15-20 items)
- 🚫 Token bloat - all results load into LLM context
- 🚫 Manual orchestration - multiple searches, manual deduplication
- 🚫 No ranking - results not sorted by relevance
The Solution
Code execution keeps large datasets in the execution environment:
- ✅ No crashes - only filtered results return to context
- ✅ Token efficient - process 100+ items, return top 20
- ✅ Auto-orchestration - multi-strategy search in one call
- ✅ Auto-ranking - results sorted by relevance
Features
Multi-Strategy Search
One function call performs:
- Semantic search (multiple variations)
- Keyword search (multiple modes)
- Tag-based search
- Automatic deduplication
- Relevance ranking
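As a rough illustration of what deduplication and relevance ranking across strategies can look like, the sketch below merges three result lists and ranks items first by how many strategies returned them, then by query-term hits in the title. This scoring rule is an assumption chosen for illustration, not the library's actual ranking; Hit, merge_and_rank, and the sample data are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class Hit(NamedTuple):
    """Hypothetical minimal result record."""
    key: str
    title: str


def merge_and_rank(strategy_results: Dict[str, List[Hit]], query: str,
                   max_results: int = 20) -> List[Hit]:
    """Deduplicate hits by key and rank them with an illustrative score."""
    by_key: Dict[str, Hit] = {}
    strategies_seen: Dict[str, Set[str]] = defaultdict(set)
    for strategy, hits in strategy_results.items():
        for hit in hits:
            by_key.setdefault(hit.key, hit)
            strategies_seen[hit.key].add(strategy)

    terms = query.lower().split()

    def score(hit: Hit) -> Tuple[int, int]:
        title = hit.title.lower()
        term_hits = sum(term in title for term in terms)
        return (len(strategies_seen[hit.key]), term_hits)

    # Higher scores first; the key is a deterministic tie-breaker.
    ranked = sorted(by_key.values(),
                    key=lambda h: (-score(h)[0], -score(h)[1], h.key))
    return ranked[:max_results]


results = {
    "semantic": [Hit("K1", "Embodied cognition in robotics"),
                 Hit("K2", "Gesture and thought")],
    "keyword": [Hit("K1", "Embodied cognition in robotics"),
                Hit("K3", "Cognition and culture")],
    "tag": [Hit("K3", "Cognition and culture")],
}
top = merge_and_rank(results, "embodied cognition", max_results=2)
print([hit.key for hit in top])
```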
Safe Large Searches
    # ❌ Old way: Crash risk
    results1 = zotero_semantic_search("query", limit=10)  # Limited to 10
    results2 = zotero_search_items("query", limit=10)  # Another 10
    # Manual deduplication, manual ranking...

    # ✅ New way: Safe and comprehensive
    orchestrator = SearchOrchestrator()
    results = orchestrator.comprehensive_search("query", max_results=20)
    # Fetches 100+, processes in code, returns top 20
Advanced Filtering
    # Fetch broadly, filter in code
    library = ZoteroLibrary()
    items = library.search_items("machine learning", limit=100)  # Safe!

    # Filter to recent journal articles
    filtered = orchestrator.filter_by_criteria(
        items,
        item_types=["journalArticle"],
        date_range=(2020, 2025)
    )
Installation
Requirements
- Python 3.8+
- Zotero MCP installed via pipx
- Claude Code or similar code execution environment
Setup
- Clone this repository:
    git clone https://github.com/yourusername/zotero-code-execution.git
    cd zotero-code-execution
- Install dependencies (optional - usually already installed with Zotero MCP):
    pip install -r requirements.txt
- Use in your code:
    import sys
    sys.path.append('/path/to/zotero-code-execution')
    import setup_paths  # Adds zotero_mcp to path
    from zotero_lib import SearchOrchestrator, format_results
Usage Examples
Basic Search
    orchestrator = SearchOrchestrator()
    results = orchestrator.comprehensive_search("neural networks", max_results=20)
    print(format_results(results))
Filter by Author
    library = ZoteroLibrary()
    results = library.search_items("Kahneman", qmode="titleCreatorYear", limit=50)
    # Sort newest first; a missing date is treated as "" so the sort cannot fail on None
    sorted_results = sorted(results, key=lambda x: x.date or "", reverse=True)
    print(format_results(sorted_results))
Tag-Based Search
    library = ZoteroLibrary()
    results = library.search_by_tag(["learning", "cognition"], limit=50)
    print(format_results(results[:20]))
Recent Papers
    library = ZoteroLibrary()
    results = library.get_recent(limit=20)
    print(format_results(results))
Custom Filtering
    library = ZoteroLibrary()
    orchestrator = SearchOrchestrator(library)
    items = library.search_items("AI", limit=100)

    # Only recent papers with DOI
    recent_with_doi = [
        item for item in items
        if item.doi and item.date and int(item.date[:4]) >= 2020
    ]
    print(format_results(recent_with_doi))
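The int(item.date[:4]) check above assumes every date string starts with a four-digit year. If your library holds dates in other shapes (an assumption worth checking against your own data), a more tolerant parser avoids a ValueError. The helper below, extract_year, is hypothetical and not part of the library; the accepted range of 1500-2099 is an arbitrary choice for this sketch.

```python
import re
from typing import Optional

# A four-digit year between 1500 and 2099 that is not part of a longer number.
_YEAR = re.compile(r"(?<!\d)(1[5-9]\d{2}|20\d{2})(?!\d)")


def extract_year(date: Optional[str]) -> Optional[int]:
    """Return the first plausible four-digit year in a date string, else None."""
    if not date:
        return None
    match = _YEAR.search(date)
    return int(match.group(1)) if match else None


samples = ["2021-05-03", "May 2019", "", None, "n.d."]
years = [extract_year(s) for s in samples]
print(years)
```

With such a helper, the filter condition could be written as (extract_year(item.date) or 0) >= 2020, so that undated items are simply skipped.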
See examples.py for 8 complete working examples.
Claude Code Skill
This repository includes a Claude Code skill for easy integration.
Installation
Copy the skill to your Claude skills directory:
cp -r claude-skill ~/.claude/skills/zotero-mcp-code
Usage
In Claude Code, searches will automatically use the code execution pattern:
"Find papers about embodied cognition"
Claude will write code using this library instead of direct MCP calls.
See claude-skill/SKILL.md for complete skill documentation.
API Reference
SearchOrchestrator
Main class for automated multi-strategy searching.
comprehensive_search(query, max_results=20, use_semantic=True, use_keyword=True, use_tags=True, search_limit_per_strategy=50)
Performs comprehensive search with automatic deduplication and ranking.
Returns: List of ZoteroItem objects
filter_by_criteria(items, item_types=None, date_range=None, required_tags=None, excluded_tags=None)
Filter items by various criteria.
Returns: Filtered list of ZoteroItem objects
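To make the parameters concrete, here is one plausible reading of how such a filter behaves, written against a stand-in record type. The semantics shown (inclusive year range, every required tag must be present, any excluded tag rejects the item) are assumptions, not confirmed behavior of filter_by_criteria; Record and filter_records are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Sequence, Tuple


@dataclass
class Record:
    """Hypothetical stand-in for a ZoteroItem."""
    key: str
    item_type: str
    year: Optional[int] = None
    tags: List[str] = field(default_factory=list)


def filter_records(items: Iterable[Record],
                   item_types: Optional[Sequence[str]] = None,
                   date_range: Optional[Tuple[int, int]] = None,
                   required_tags: Optional[Sequence[str]] = None,
                   excluded_tags: Optional[Sequence[str]] = None) -> List[Record]:
    """Keep items that pass every criterion that was supplied."""
    kept = []
    for item in items:
        tags = set(item.tags)
        if item_types is not None and item.item_type not in item_types:
            continue
        if date_range is not None:
            start, end = date_range
            # Items without a year cannot be placed in the range, so drop them.
            if item.year is None or not (start <= item.year <= end):
                continue
        if required_tags and not set(required_tags) <= tags:
            continue
        if excluded_tags and tags & set(excluded_tags):
            continue
        kept.append(item)
    return kept


records = [
    Record("R1", "journalArticle", 2022, ["learning"]),
    Record("R2", "book", 2021, ["learning"]),
    Record("R3", "journalArticle", 2015, ["learning"]),
    Record("R4", "journalArticle", 2023, ["learning", "draft"]),
]
kept = filter_records(records, item_types=["journalArticle"],
                      date_range=(2020, 2025), excluded_tags=["draft"])
print([r.key for r in kept])
```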
ZoteroLibrary
Low-level interface to Zotero.
- search_items(query, ...) - Keyword search
- semantic_search(query, ...) - Semantic/vector search
- search_by_tag(tags, ...) - Tag-based search
- get_recent(limit) - Recently added items
- get_tags() - All library tags
Helper Functions
- format_results(items, include_abstracts=True, max_abstract_length=300) - Format as markdown
See README_LIBRARY.md for complete API documentation.
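For orientation, a formatter with a signature like format_results might look roughly like the following. This is a guess at the behavior implied by the parameter names, not the library's implementation; the Entry type, its fields, the function name format_entries, and the exact markdown layout are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    """Hypothetical stand-in for a ZoteroItem."""
    title: str
    date: str = ""
    abstract: str = ""


def format_entries(items: List[Entry], include_abstracts: bool = True,
                   max_abstract_length: int = 300) -> str:
    """Render items as a numbered markdown list, truncating long abstracts."""
    lines = []
    for number, item in enumerate(items, start=1):
        suffix = f" ({item.date})" if item.date else ""
        lines.append(f"{number}. **{item.title}**{suffix}")
        if include_abstracts and item.abstract:
            text = item.abstract
            if len(text) > max_abstract_length:
                text = text[:max_abstract_length].rstrip() + "..."
            lines.append(f"   {text}")
    return "\n".join(lines)


entries = [Entry("Sample article", "2011", "A" * 400),
           Entry("Untitled note")]
output = format_entries(entries, max_abstract_length=50)
print(output)
```

Truncating abstracts before the text reaches the model is one more place where the amount of returned context can be kept under control.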
Architecture
Based on Anthropic's code execution with MCP:
- Claude writes Python code (not direct MCP calls)
- Code fetches large datasets (100+ items) from Zotero
- Code processes in execution environment (dedup, rank, filter)
- Only filtered results return to LLM context (20 items)
Result: Large datasets stay out of context, preventing crashes and saving tokens.
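The four steps above can be sketched end to end with a stand-in data source. Nothing here calls Zotero: fetch_items builds placeholder records purely so the flow runs, summarize is a hypothetical name, and character counts serve as a crude proxy for tokens. The figure it prints describes this toy data only and says nothing about real-world savings.

```python
import json
from typing import Dict, List


def fetch_items(count: int) -> List[Dict[str, object]]:
    """Stand-in for a large fetch; builds placeholder records, no network."""
    return [{"key": f"K{i:03d}",
             "title": f"Placeholder title {i}",
             "abstract": "lorem ipsum " * 40,
             "score": (i * 37) % 101}   # arbitrary deterministic score
            for i in range(count)]


def summarize(items: List[Dict[str, object]], top_n: int) -> List[Dict[str, object]]:
    """Rank in the execution environment and keep only compact fields."""
    ranked = sorted(items, key=lambda it: it["score"], reverse=True)
    return [{"key": it["key"], "title": it["title"]} for it in ranked[:top_n]]


all_items = fetch_items(100)           # large dataset stays in code
compact = summarize(all_items, 20)     # processed and trimmed in code

full_size = len(json.dumps(all_items))
returned_size = len(json.dumps(compact))   # only this would reach the LLM
print(f"returned {len(compact)} of {len(all_items)} items, "
      f"{returned_size / full_size:.1%} of the original payload size")
```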
Performance
Expected Benefits
Based on Anthropic's pattern and implementation design:
- Token reduction: 50-90% (exact amount depends on search size)
- Function calls: 5-10x → 1x (confirmed by design)
- Search limits: 10-15 → 100+ items (safe in code)
- Crash prevention: Likely effective (untested)
Status
⚠️ Proof of concept - Performance claims are theoretical projections, not measured results.
See HONEST_STATUS.md for detailed status and validation needs.
Documentation
- README_LIBRARY.md - Complete library documentation
- QUICK_START.md - Quick reference guide
- CLAUDE_INSTRUCTIONS.md - Instructions for Claude Code
- examples.py - 8 working examples
- IMPLEMENTATION_SUMMARY.md - Technical details
- HONEST_STATUS.md - Implementation status
- claude-skill/SKILL.md - Claude Code skill docs
Contributing
Contributions welcome! Areas for improvement:
- Performance validation - Measure actual token savings
- Better ranking - Incorporate semantic similarity scores
- Caching - Cache search results with invalidation
- Parallel processing - Execute search strategies concurrently
- Export functions - Batch BibTeX generation, CSV export
License
MIT License - see LICENSE file for details.
Credits
- Based on Zotero MCP
- Inspired by Anthropic's code execution with MCP
Related Projects
- Zotero MCP - The underlying MCP server
- Claude Code - Code execution environment
- FastMCP - MCP server framework
Citation
If you use this in research, please cite:
    @software{zotero_code_execution,
      title = {Zotero Code Execution: Efficient Multi-Strategy Search},
      year = {2025},
      url = {https://github.com/kerim/zotero-code-execution}
    }
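Pros
- Handles large datasets efficiently
- Automatic deduplication and ranking
- Supports multiple search strategies
- Reduces the risk of crashes during searches
Cons
- Performance claims are theoretical and untested
- Requires a specific environment setup
- New users may face a learning curve
- Limited documentation for advanced features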
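Disclaimer: This content comes from an open-source GitHub project and is provided for display and rating analysis only.
Copyright belongs to the original author, kerim.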
