#search + #llm

Parallel Quality Benchmarks | Parallel Web Systems | Infrastructure for intelligence on the web

Dataset

We evaluated search providers against six open benchmarks covering complementary aspects of agentic search: BrowseComp (hard multi-hop questions that require navigating the live web), Frames (multi-document factoid reasoning), FreshQA (time-sensitive questions where the correct answer depends on recent web information), HLE (Humanity's Last Exam: expert-level academic questions spanning math, science, and humanities), SealQA (ambiguity-robust factoid QA with intentionally misleading snippets), and WebWalker (tasks designed around following links across pages to find an answer).

Evaluation methodology

Every task is run through a shared deep-research harness: a single GPT-5.4 agent is given two tools (web search and web fetch) with an iterative budget of up to MAX_TOOL_CALLS=25 tool calls per question. The agent plans sub-queries, fans out searches, fetches specific pages when snippets are insufficient, and returns an answer when it exhausts the number of allowed tool calls or has sufficient information to answer the question. Each answer is then LLM-graded by GPT-5.4. We report accuracy of the final answer.
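The budgeted loop described above could look roughly like the sketch below. This is not Parallel's harness: `Step`, `Result`, `run_question`, `decide`, and `force_answer` are hypothetical names, and the model call is stood in for by a plain callable. Only the two tools and the 25-call budget come from the text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

MAX_TOOL_CALLS = 25  # per-question budget stated in the methodology


@dataclass
class Step:
    """One agent decision (hypothetical shape): call a tool or answer."""
    tool: Optional[str] = None    # "search" or "fetch"; None means "answer now"
    argument: str = ""            # query string or URL
    answer: Optional[str] = None  # used only when tool is None


@dataclass
class Result:
    answer: str
    tool_calls: int
    transcript: List[Tuple[str, str, str]] = field(default_factory=list)


def run_question(
    question: str,
    decide: Callable[[str, list], Step],       # stand-in for the LLM planner
    tools: Dict[str, Callable[[str], str]],    # {"search": ..., "fetch": ...}
    force_answer: Callable[[str, list], str],  # called once the budget is spent
    budget: int = MAX_TOOL_CALLS,
) -> Result:
    """Loop until the agent answers or the tool-call budget is exhausted."""
    transcript: List[Tuple[str, str, str]] = []
    calls = 0
    while calls < budget:
        step = decide(question, transcript)
        if step.tool is None:
            # The agent judged it has enough information to answer.
            return Result(step.answer or "", calls, transcript)
        observation = tools[step.tool](step.argument)
        calls += 1
        transcript.append((step.tool, step.argument, observation))
    # Budget exhausted: the agent must answer from what it has gathered.
    return Result(force_answer(question, transcript), calls, transcript)
```

Because the search provider is just an entry in `tools`, swapping it while keeping `decide` fixed is what would make the comparison a shared harness.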

We measure accuracy and overall cost, which includes LLM token costs and tool call costs.
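As a rough illustration of how accuracy and cost might be tallied, here is a minimal sketch. The `Run` record, the `summarize` function, and the flat per-call tool price are assumptions for illustration. The source does not say how tool calls are priced or whether cost is reported in total or per question, so the sketch returns both.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Run:
    """Outcome of one graded question (hypothetical record shape)."""
    correct: bool       # grader verdict on the final answer
    input_tokens: int   # LLM prompt tokens consumed
    output_tokens: int  # LLM completion tokens produced
    tool_calls: int     # search + fetch calls made


def summarize(
    runs: List[Run],
    usd_per_m_input: float,    # assumed price per 1M input tokens
    usd_per_m_output: float,   # assumed price per 1M output tokens
    usd_per_tool_call: float,  # assumed flat price per tool call
) -> Dict[str, float]:
    """Return accuracy plus LLM, tool, and combined cost for a set of runs."""
    if not runs:
        raise ValueError("no runs to summarize")
    n = len(runs)
    accuracy = sum(r.correct for r in runs) / n
    llm_cost = sum(
        r.input_tokens * usd_per_m_input + r.output_tokens * usd_per_m_output
        for r in runs
    ) / 1_000_000
    tool_cost = sum(r.tool_calls for r in runs) * usd_per_tool_call
    total = llm_cost + tool_cost
    return {
        "accuracy": accuracy,
        "llm_cost_usd": llm_cost,
        "tool_cost_usd": tool_cost,
        "total_cost_usd": total,
        "cost_per_question_usd": total / n,
    }
```

The prices passed in are placeholders to be filled from each provider's published rates. None are given in the text.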

Testing dates

April 19-21, 2026

#6:40 PM

benchmarks mcp search llm

Parallel Web Systems | Infrastructure for intelligence on the web

parallel.ai/

The highest accuracy web search for your AI

Why use Parallel Search vs. the default search in Claude?

Parallel runs its own web-scale index (billions of pages, millions added daily) and returns dense, query-relevant excerpts instead of raw HTML or SEO-ranked snippets. On public benchmarks, Parallel outperforms the default search in leading frontier models. Your agent reaches the right answer in fewer round trips and with less wasted context. – https://parallel.ai/blog/free-web-search-mcp

#6:06 PM

mcp code search llm

Friday, May 22, 2026
