February 18, 2026

Preview: Deep Link API

We're a research lab working toward a browsing experience that understands what you're trying to do and skips the tedious mechanics of getting there. Not full automation. Not a background agent that takes over. Just less friction between intent and action.

This is our first experiment along that path. The Deep Link API is a narrow, testable idea: what if a browsing agent never had to click a filter again? It's a critical first step toward deeply understanding each site.

You can try it in the playground with your own tasks. Below is the problem it solves, how it works, and what we found when we benchmarked it.


We gave state-of-the-art browser-use models a standard industry benchmark task:

Find an Airbnb in Cleveland for three nights. The check-in date is the day after tomorrow. We have 2 adults, 2 kids, and 1 pet. The budget is $100 to $300 per night. Essential amenities include free parking, a washer, and a gym.

It took Gemini 3 minutes and 6 seconds, 25 round-trips, and 61,819 tokens to populate the filters. All that interaction to build up state the site already accepts as a URL:

https://www.airbnb.com/s/Cleveland--OH/homes?adults=2&checkin=2026-02-11
  &checkout=2026-02-14&query=Cleveland%2C+OH&children=2&pets=1
  &price_min=100&price_max=300
  &amenities%5B%5D=9&amenities%5B%5D=33&amenities%5B%5D=15
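Seen this way, the agent's three-minute filter-clicking session is just a query string. As a rough illustration (the amenity IDs 9, 33, and 15 are copied from the URL above; their mapping to free parking, washer, and gym is Airbnb-internal and not something we're asserting), the same state can be assembled directly with Python's standard library:

```python
from urllib.parse import urlencode

# The benchmark task's filter state, expressed as structured parameters.
# A list of tuples (rather than a dict) lets repeated keys like
# "amenities[]" appear more than once in the query string.
params = [
    ("adults", 2),
    ("checkin", "2026-02-11"),
    ("checkout", "2026-02-14"),
    ("query", "Cleveland, OH"),
    ("children", 2),
    ("pets", 1),
    ("price_min", 100),
    ("price_max", 300),
    ("amenities[]", 9),
    ("amenities[]", 33),
    ("amenities[]", 15),
]

# urlencode percent-encodes keys and values, so "amenities[]"
# becomes "amenities%5B%5D" and "Cleveland, OH" becomes "Cleveland%2C+OH".
url = "https://www.airbnb.com/s/Cleveland--OH/homes?" + urlencode(params)
print(url)
```

One function call replaces 25 round-trips of UI interaction.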

This pattern repeats across nearly every site we tested. Browsing agents spend the majority of their time, tokens, and actions on what is essentially state-building: clicking into dropdowns, selecting filters, typing into search boxes, waiting for pages to reload. The actual task, evaluating results and making decisions, barely gets started before the budget is spent.

Intent to Path

Our API takes a different approach. Given a task description and a target domain, it returns a stateful URL that skips the state-building entirely, landing the agent (or user) directly at the decision-making point.

The Airbnb task above resolves in under 5 seconds. No round-trips. No filter clicking. No token burn.

This works because most major websites already encode their application state in the URL. Filter selections, sort orders, search queries, pagination, date ranges: it's all there in the query string. Browsing agents just aren't using it. They're interacting with the UI as if they were a person with a mouse, when the faster path was always a URL.
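To make that concrete, decoding the Airbnb URL from earlier recovers the full filter state with no UI interaction at all. A minimal sketch using only Python's standard library:

```python
from urllib.parse import urlparse, parse_qs

url = ("https://www.airbnb.com/s/Cleveland--OH/homes"
       "?adults=2&checkin=2026-02-11&checkout=2026-02-14"
       "&query=Cleveland%2C+OH&children=2&pets=1"
       "&price_min=100&price_max=300"
       "&amenities%5B%5D=9&amenities%5B%5D=33&amenities%5B%5D=15")

# parse_qs decodes percent-encoding and groups repeated keys into lists,
# so the application state comes back out as plain data.
state = parse_qs(urlparse(url).query)

print(state["checkin"])      # ['2026-02-11']
print(state["amenities[]"])  # ['9', '33', '15']
print(state["query"])        # ['Cleveland, OH']
```

Everything the agent spent minutes clicking into existence is sitting in the query string, fully machine-readable.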

Benchmark results

We ran the Intent to Path API against the Online Mind2Web benchmark (77 "hard"-level tasks across 59 real websites), comparing OpenAI's CUA agent with and without Tilt.

Speed

Median task runtime dropped from 124.7 seconds to 54.7 seconds, a 56% reduction. Total runtime across all 77 tasks fell from 187 minutes to 90 minutes. 79% of tasks got faster, and over half saw speed improvements of 50% or more.

Cost

Median tokens per task dropped from 54,275 to 20,539, a 62% reduction. Median model calls fell from 21 to 7. Across all tasks, total token usage dropped by 49% (5.88M to 3.02M) and browser actions fell by 54%.

On some sites the reductions were dramatic:

Domain     Tokens: Before → After    Reduction    Requests: Before → After
CarMax     215,886 →  5,826          -97.3%       45 → 3
KBB        148,998 →  6,212          -95.8%       39 → 3
StubHub     63,846 →  6,524          -89.8%       20 → 3
Target      46,867 →  6,087          -87.0%       23 → 3
Airbnb      61,819 → 13,981          -77.4%       25 → 5
IKEA        87,861 → 25,420          -71.1%       28 → 9

Reliability

Task completion rose from 76.6% to 87.0%, a net gain of 8 additional tasks completed. This was unexpected. We built the API to improve speed and cost, but it turns out that skipping the state-building phase also removes a significant source of failure: fewer interactions mean fewer opportunities for the agent to get lost, click the wrong element, or trigger an unexpected page state.

Long tail compression

One of the more interesting findings: Tilt doesn't just improve the average case; it compresses the long tail. The slowest tasks (p95) dropped from 302 seconds to 183 seconds, and the most expensive tasks (p95) dropped from 217,164 tokens to 125,770. For anyone running agents at scale, predictable worst-case performance matters as much as the median.

What this doesn't solve

Intent to Path handles the state-building phase of a task. It gets the agent to the right page, with the right filters applied, ready to act. It does not help with what comes after: evaluating results, comparing options, navigating multi-step workflows, or handling sites that require authentication before any meaningful state can be built.

We also found that not every site encodes its state cleanly in URLs. Some rely heavily on session state, client-side rendering, or non-standard routing. Coverage will grow, but it's worth being honest about the boundaries.

What's next

Intent to Path is our first public experiment. It addresses one specific piece of a larger problem: the web wastes enormous amounts of time on repetitive mechanics, and full delegation isn't the answer.

We're working on approaches to the other patterns we identified: anticipating user preferences, handling tasks that can't be expressed through search, and finding the right balance between automation and human control. More write-ups coming soon.
