Co-Pilot / Assisted
Updated 24 days ago

zotero-code-execution

Author: kerim
Repository: kerim/zotero-code-execution
Agent score: 80

💡 Summary

A Python library that enhances Zotero search through a safe code execution pattern, so large datasets can be handled efficiently.

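🎯 Who It's For

  • Researchers who need efficient literature search
  • Developers integrating Zotero with custom applications
  • Scholars looking for advanced filtering options
  • Data scientists managing large literature datasets
  • Students conducting comprehensive literature reviews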

🤖 AI roast: Looks capable, but don't let the setup scare people off.

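Security Analysis: Medium Risk

Risk: Medium. Recommended checks: whether it executes shell or command-line instructions; whether it makes external network requests (SSRF or data exfiltration); how API keys and tokens are obtained, stored, and could leak; the scope of file reads and writes, and path traversal risk; dependency pinning and supply-chain risk. Run with least privilege, and audit the code and dependencies before enabling it in production.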

Zotero Code Execution

Efficient multi-strategy Zotero search using code execution pattern

License: MIT

A Python library for Zotero MCP that implements Anthropic's code execution pattern to enable safe, comprehensive searches without context overflow or crashes.

Skill Installation

For Claude Code

  1. Clone or download this repository
  2. Copy the skill/ folder to your Claude Code skills directory:
    cp -r skill ~/.claude/skills/zotero-mcp-code
  3. Restart Claude Code to load the skill

Quick Start

    import sys
    sys.path.append('/path/to/zotero-code-execution')
    import setup_paths
    from zotero_lib import SearchOrchestrator, format_results

    # Single comprehensive search - fetches 100+ items, returns top 20
    orchestrator = SearchOrchestrator()
    results = orchestrator.comprehensive_search("embodied cognition", max_results=20)
    print(format_results(results))

That's it! This automatically:

  • ✅ Performs semantic + keyword + tag searches
  • ✅ Deduplicates results
  • ✅ Ranks by relevance
  • ✅ Keeps large datasets in code (no crashes)

Multi-Term Searches

For OR-style searches (e.g., multiple spellings or languages), search each term separately and merge:

    # Search for "Atayal" OR "泰雅族"
    all_results = {}
    for term in ['Atayal', '泰雅族']:
        results = orchestrator.comprehensive_search(term, max_results=50)
        for item in results:
            all_results[item.key] = item  # Deduplicate by key

    # Re-rank combined results
    ranked = orchestrator._rank_items(list(all_results.values()), 'Atayal 泰雅族')
    print(format_results(ranked[:25]))

Why? Zotero treats multi-word queries as AND conditions. Searching "Atayal 泰雅族" finds items matching BOTH terms, not either term.
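
The difference between the two query styles can be seen without a Zotero library at all. The sketch below is only an illustration: `FakeItem`, `LIBRARY`, `and_search`, and `or_search` are hypothetical names, and `and_search` merely imitates the AND behaviour described above by requiring every word to appear in the title.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FakeItem:
    """Hypothetical stand-in for a Zotero item; only key and title are modeled."""
    key: str
    title: str


# Tiny in-memory "library" used only for this illustration.
LIBRARY = [
    FakeItem("A1", "Atayal weaving traditions"),
    FakeItem("B2", "泰雅族的口述歷史"),
    FakeItem("C3", "Atayal 泰雅族 language revitalization"),
    FakeItem("D4", "Unrelated paper on soil chemistry"),
]


def and_search(query: str) -> List[FakeItem]:
    """Imitate AND semantics: every whitespace-separated word must be in the title."""
    words = query.split()
    return [item for item in LIBRARY if all(word in item.title for word in words)]


def or_search(terms: List[str]) -> List[FakeItem]:
    """Search each term separately and merge, deduplicating by item key."""
    merged: Dict[str, FakeItem] = {}
    for term in terms:
        for item in and_search(term):
            merged[item.key] = item  # a repeated key overwrites, so no duplicates
    return list(merged.values())


combined_query_hits = and_search("Atayal 泰雅族")
per_term_hits = or_search(["Atayal", "泰雅族"])
print([item.key for item in combined_query_hits])
print([item.key for item in per_term_hits])
```

The combined query only reaches the one item that contains both spellings, while the per-term merge also picks up the items that use just one of them.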

Why This Exists

The Problem

Direct MCP tool calls have limitations:

  • 🚫 Crash risk with large result sets (>15-20 items)
  • 🚫 Token bloat - all results load into LLM context
  • 🚫 Manual orchestration - multiple searches, manual deduplication
  • 🚫 No ranking - results not sorted by relevance

The Solution

Code execution keeps large datasets in the execution environment:

  • No crashes - only filtered results return to context
  • Token efficient - process 100+ items, return top 20
  • Auto-orchestration - multi-strategy search in one call
  • Auto-ranking - results sorted by relevance

Features

Multi-Strategy Search

One function call performs:

  • Semantic search (multiple variations)
  • Keyword search (multiple modes)
  • Tag-based search
  • Automatic deduplication
  • Relevance ranking
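
To make the deduplication and ranking steps concrete, here is a minimal sketch of one possible approach. It is not the library's actual ranking logic: `merge_and_rank` is a hypothetical helper, and ranking by how many strategies returned an item is an assumption chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set


def merge_and_rank(hits_by_strategy: Dict[str, List[str]], max_results: int) -> List[str]:
    """Deduplicate item keys across strategies and rank by how many strategies agree.

    Ties keep first-seen order because Python's sort is stable.
    """
    found_by: Dict[str, Set[str]] = defaultdict(set)
    first_seen: List[str] = []
    for strategy, keys in hits_by_strategy.items():
        for key in keys:
            if key not in found_by:
                first_seen.append(key)  # remember discovery order for tie-breaking
            found_by[key].add(strategy)
    ranked = sorted(first_seen, key=lambda k: len(found_by[k]), reverse=True)
    return ranked[:max_results]


# Made-up item keys standing in for the output of three search strategies.
hits = {
    "semantic": ["K1", "K2", "K3"],
    "keyword": ["K2", "K4"],
    "tag": ["K2", "K3"],
}
top = merge_and_rank(hits, max_results=3)
print(top)
```

Here `K2` ranks first because all three strategies found it, and `K4` drops out once the list is cut to three results.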

Safe Large Searches

    # ❌ Old way: Crash risk
    results1 = zotero_semantic_search("query", limit=10)  # Limited to 10
    results2 = zotero_search_items("query", limit=10)  # Another 10
    # Manual deduplication, manual ranking...

    # ✅ New way: Safe and comprehensive
    orchestrator = SearchOrchestrator()
    results = orchestrator.comprehensive_search("query", max_results=20)
    # Fetches 100+, processes in code, returns top 20

Advanced Filtering

    # Fetch broadly, filter in code
    library = ZoteroLibrary()
    orchestrator = SearchOrchestrator(library)
    items = library.search_items("machine learning", limit=100)  # Safe!

    # Filter to recent journal articles
    filtered = orchestrator.filter_by_criteria(
        items,
        item_types=["journalArticle"],
        date_range=(2020, 2025)
    )

Installation

Requirements

  • Python 3.8+
  • Zotero MCP installed via pipx
  • Claude Code or similar code execution environment

Setup

  1. Clone this repository:
    git clone https://github.com/kerim/zotero-code-execution.git
    cd zotero-code-execution
  2. Install dependencies (optional - usually already installed with Zotero MCP):
    pip install -r requirements.txt
  3. Use in your code:
    import sys
    sys.path.append('/path/to/zotero-code-execution')
    import setup_paths  # Adds zotero_mcp to path
    from zotero_lib import SearchOrchestrator, format_results

Usage Examples

Basic Search

    orchestrator = SearchOrchestrator()
    results = orchestrator.comprehensive_search("neural networks", max_results=20)
    print(format_results(results))

Filter by Author

    library = ZoteroLibrary()
    results = library.search_items("Kahneman", qmode="titleCreatorYear", limit=50)
    # Newest first; a missing date is treated as "" so the sort does not fail
    sorted_results = sorted(results, key=lambda x: x.date or "", reverse=True)
    print(format_results(sorted_results))

Tag-Based Search

    library = ZoteroLibrary()
    results = library.search_by_tag(["learning", "cognition"], limit=50)
    print(format_results(results[:20]))

Recent Papers

    library = ZoteroLibrary()
    results = library.get_recent(limit=20)
    print(format_results(results))

Custom Filtering

    library = ZoteroLibrary()
    orchestrator = SearchOrchestrator(library)
    items = library.search_items("AI", limit=100)

    # Only recent papers with DOI
    recent_with_doi = [
        item for item in items
        if item.doi and item.date and int(item.date[:4]) >= 2020
    ]
    print(format_results(recent_with_doi))
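
The year check above relies on `int(item.date[:4])`, which raises `ValueError` if a date string does not start with a four-digit year (for example "March 2019"). If your library contains free-form dates, a more forgiving parser may help. The sketch below is a suggestion only: `extract_year` is a hypothetical helper, not part of this library, and the accepted year range (1500-2099) is an arbitrary assumption.

```python
import re
from typing import Optional

# Match a standalone four-digit year between 1500 and 2099 (range is an assumption).
_YEAR = re.compile(r"(?<!\d)(1[5-9]\d{2}|20\d{2})(?!\d)")


def extract_year(date_text: Optional[str]) -> Optional[int]:
    """Return the first plausible year found in a free-form date string, else None."""
    if not date_text:
        return None
    match = _YEAR.search(date_text)
    return int(match.group(1)) if match else None


for sample in ["2021-05-03", "March 2019", "12/03/2020", "n.d.", ""]:
    print(repr(sample), extract_year(sample))
```

With this helper, the filter condition could become `(extract_year(item.date) or 0) >= 2020`, which skips items whose date cannot be parsed instead of failing.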

See examples.py for 8 complete working examples.

Claude Code Skill

This repository includes a Claude Code skill for easy integration.

Installation

Copy the skill to your Claude skills directory:

cp -r claude-skill ~/.claude/skills/zotero-mcp-code

Usage

In Claude Code, searches will automatically use the code execution pattern:

"Find papers about embodied cognition"

Claude will write code using this library instead of direct MCP calls.

See claude-skill/SKILL.md for complete skill documentation.

API Reference

SearchOrchestrator

Main class for automated multi-strategy searching.

comprehensive_search(query, max_results=20, use_semantic=True, use_keyword=True, use_tags=True, search_limit_per_strategy=50)

Performs comprehensive search with automatic deduplication and ranking.

Returns: List of ZoteroItem objects

filter_by_criteria(items, item_types=None, date_range=None, required_tags=None, excluded_tags=None)

Filter items by various criteria.

Returns: Filtered list of ZoteroItem objects
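
For readers who want a feel for how such a filter behaves, the following self-contained sketch mirrors the parameter names above on a simplified item type. It is not the library's implementation: `SketchItem` and `filter_items` are hypothetical, and the choices that `date_range` is inclusive and that every tag in `required_tags` must be present are assumptions.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple


@dataclass
class SketchItem:
    """Hypothetical item shape; the real ZoteroItem may differ."""
    key: str
    item_type: str
    year: Optional[int] = None
    tags: List[str] = field(default_factory=list)


def filter_items(
    items: Iterable[SketchItem],
    item_types: Optional[List[str]] = None,
    date_range: Optional[Tuple[int, int]] = None,
    required_tags: Optional[List[str]] = None,
    excluded_tags: Optional[List[str]] = None,
) -> List[SketchItem]:
    """Keep items that pass every supplied criterion (None means no constraint)."""
    kept = []
    for item in items:
        if item_types is not None and item.item_type not in item_types:
            continue
        if date_range is not None:
            start, end = date_range
            # Items without a known year cannot satisfy a date constraint.
            if item.year is None or not (start <= item.year <= end):
                continue
        tags = set(item.tags)
        if required_tags and not set(required_tags) <= tags:
            continue
        if excluded_tags and tags & set(excluded_tags):
            continue
        kept.append(item)
    return kept


sample = [
    SketchItem("K1", "journalArticle", 2021, ["ml"]),
    SketchItem("K2", "book", 2022, ["ml"]),
    SketchItem("K3", "journalArticle", 2018, ["ml"]),
    SketchItem("K4", "journalArticle", 2023, ["ml", "draft"]),
    SketchItem("K5", "journalArticle", None, ["ml"]),
]
recent = filter_items(
    sample,
    item_types=["journalArticle"],
    date_range=(2020, 2025),
    excluded_tags=["draft"],
)
print([item.key for item in recent])
```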

ZoteroLibrary

Low-level interface to Zotero.

  • search_items(query, ...) - Keyword search
  • semantic_search(query, ...) - Semantic/vector search
  • search_by_tag(tags, ...) - Tag-based search
  • get_recent(limit) - Recently added items
  • get_tags() - All library tags

Helper Functions

  • format_results(items, include_abstracts=True, max_abstract_length=300) - Format as markdown

See README_LIBRARY.md for complete API documentation.
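
As a rough picture of what markdown formatting with abstract truncation can look like, here is a small stand-alone sketch. It borrows the `include_abstracts` and `max_abstract_length` parameter names from `format_results`, but `format_items`, the dictionary-based items, and the exact output layout are assumptions rather than the library's real output.

```python
from typing import Dict, List


def format_items(
    items: List[Dict[str, str]],
    include_abstracts: bool = True,
    max_abstract_length: int = 300,
) -> str:
    """Render items as a numbered markdown list, truncating long abstracts."""
    lines = []
    for number, item in enumerate(items, start=1):
        lines.append(f"{number}. **{item['title']}** ({item.get('date', 'n.d.')})")
        abstract = item.get("abstract", "")
        if include_abstracts and abstract:
            if len(abstract) > max_abstract_length:
                # Cut at the limit and mark the truncation.
                abstract = abstract[:max_abstract_length].rstrip() + "..."
            lines.append(f"   {abstract}")
    return "\n".join(lines)


demo_items = [
    {"title": "Paper A", "date": "2021", "abstract": "x" * 500},
    {"title": "Paper B"},
]
text = format_items(demo_items, max_abstract_length=10)
print(text)
```

Capping abstract length is one more way to keep the text that reaches the LLM context small.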

Architecture

Based on Anthropic's code execution with MCP:

  1. Claude writes Python code (not direct MCP calls)
  2. Code fetches large datasets (100+ items) from Zotero
  3. Code processes in execution environment (dedup, rank, filter)
  4. Only filtered results return to LLM context (20 items)

Result: Large datasets stay out of context, preventing crashes and saving tokens.
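
The four steps can be sketched end to end with fake data. Everything below is illustrative: `fetch_fake_records` stands in for a real Zotero fetch, `search_in_sandbox` is a hypothetical name, and the numeric `score` field is an invented ranking signal.

```python
from typing import Dict, List


def fetch_fake_records(count: int) -> List[Dict[str, object]]:
    """Stand-in for a large fetch; every third record repeats the previous key."""
    records = []
    for i in range(count):
        key = f"K{i - 1}" if i % 3 == 2 else f"K{i}"
        records.append({
            "key": key,
            "title": f"Paper {key}",
            "score": (i * 37) % 101,          # invented relevance score
            "abstract": "lorem ipsum " * 50,  # bulky field that should stay out of context
        })
    return records


def search_in_sandbox(fetch_count: int = 120, top_n: int = 20) -> List[Dict[str, object]]:
    records = fetch_fake_records(fetch_count)      # step 2: fetch broadly
    unique: Dict[object, Dict[str, object]] = {}
    for record in records:                         # step 3: deduplicate by key
        unique.setdefault(record["key"], record)
    ranked = sorted(unique.values(), key=lambda r: r["score"], reverse=True)
    # step 4: hand back only compact fields for the top N items
    return [{"key": r["key"], "title": r["title"]} for r in ranked[:top_n]]


summary = search_in_sandbox()
full_size = len(repr(fetch_fake_records(120)))
print(len(summary), "items returned")
print(len(repr(summary)), "characters returned versus", full_size, "fetched")
```

Only the small `summary` list would be printed back to the model; the bulky records never leave the execution environment.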

Performance

Expected Benefits

Based on Anthropic's pattern and implementation design:

  • Token reduction: 50-90% (exact amount depends on search size)
  • Function calls: 5-10x → 1x (confirmed by design)
  • Search limits: 10-15 → 100+ items (safe in code)
  • Crash prevention: Likely effective (untested)
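
One way to read the 50-90% range is as simple arithmetic, under the simplifying assumption that every item costs roughly the same number of tokens and that the baseline would load every fetched item into context. The helper below (`projected_reduction`, a hypothetical name) is a back-of-the-envelope projection, not a measurement, and it ignores the tokens spent on the code itself.

```python
def projected_reduction(items_fetched: int, items_returned: int) -> float:
    """Fraction of item tokens kept out of context, assuming equal cost per item."""
    if items_fetched <= 0 or not 0 <= items_returned <= items_fetched:
        raise ValueError("need items_fetched > 0 and 0 <= items_returned <= items_fetched")
    return 1 - items_returned / items_fetched


# Returning 20 items after fetching 40, 100, or 200.
for fetched in (40, 100, 200):
    print(fetched, f"{projected_reduction(fetched, 20):.0%}")
```

Under these assumptions, fetching 2x to 10x more items than are returned corresponds to the 50-90% range quoted above.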

Status

⚠️ Proof of concept - Performance claims are theoretical projections, not measured results.

See HONEST_STATUS.md for detailed status and validation needs.

Documentation

Contributing

Contributions welcome! Areas for improvement:

  1. Performance validation - Measure actual token savings
  2. Better ranking - Incorporate semantic similarity scores
  3. Caching - Cache search results with invalidation
  4. Parallel processing - Execute search strategies concurrently
  5. Export functions - Batch BibTeX generation, CSV export
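
As a starting point for the parallel processing idea, here is a minimal sketch using the standard library's `ThreadPoolExecutor`. The three `fake_*` functions are placeholders for the real search strategies, `run_strategies` is a hypothetical helper, and whether the underlying Zotero calls are safe to run from several threads is an open question that would need checking.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fake_semantic(query: str) -> List[str]:
    time.sleep(0.05)  # simulate I/O latency
    return [f"sem:{query}"]


def fake_keyword(query: str) -> List[str]:
    time.sleep(0.05)
    return [f"kw:{query}"]


def fake_tag(query: str) -> List[str]:
    time.sleep(0.05)
    return [f"tag:{query}"]


def run_strategies(
    strategies: Dict[str, Callable[[str], List[str]]], query: str
) -> Dict[str, List[str]]:
    """Run every strategy concurrently and collect results by strategy name."""
    with ThreadPoolExecutor(max_workers=max(1, len(strategies))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in strategies.items()}
        # result() blocks until that strategy finishes and re-raises its exception, if any
        return {name: future.result() for name, future in futures.items()}


results = run_strategies(
    {"semantic": fake_semantic, "keyword": fake_keyword, "tag": fake_tag},
    "embodied cognition",
)
print(results)
```

Threads suit this case because the strategies are expected to spend most of their time waiting on I/O rather than computing.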

License

MIT License - see LICENSE file for details.

Credits

Related Projects

Citation

If you use this in research, please cite:

    @software{zotero_code_execution,
      title = {Zotero Code Execution: Efficient Multi-Strategy Search},
      year = {2025},
      url = {https://github.com/kerim/zotero-code-execution}
    }
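
Five-Dimension Analysis

  • Clarity: 8/10
  • Innovation: 8/10
  • Practicality: 9/10
  • Completeness: 8/10
  • Maintainability: 7/10

Pros and Cons

Pros

  • Handles large datasets efficiently
  • Automatic deduplication and ranking
  • Supports multiple search strategies
  • Reduces the risk of crashes during searches

Cons

  • Performance claims are theoretical and untested
  • Requires a specific environment setup
  • New users may face a learning curve
  • Documentation for advanced features is limited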

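Related Skills

pytorch

S · tool · Code Lib · 92/100

"It's the Swiss Army knife of deep learning, but good luck finding the one installation method out of 47 that won't wreck your system."

agno

S · tool · Code Lib · 90/100

"It promises to be the Kubernetes of the agent world, but that depends on whether developers have the patience to learn yet another orchestration layer."

nuxt-skills

S · tool · Co-Pilot / Assisted · 90/100

"This is essentially a well-organized cheat sheet that turns your AI assistant into a parrot for the Nuxt framework."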

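Disclaimer: This content comes from an open-source GitHub project and is provided for display and rating analysis only.

Copyright belongs to the original author, kerim.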