Introducing Gamut, a vision-based preference model for UI and graphic design.

Aryan Malhotra · April 29th, 2026

Design appears to be the love handle of LLM progress. Today, a markdown file can become a competent TypeScript API, or a feature-complete C library, or a performant load balancer. It cannot yet, however, become a well-crafted user interface.

As our first step towards closing this gap, we introduce Gamut. Given images of two design variants, Gamut is trained to pick the "better" one, where "better" encompasses both aesthetic value and functional considerations.

Under the hood, Gamut is a ConvNeXt backbone with a classifier head, tuned on a highly informative image corpus that was created and labeled entirely in-house. Gamut excels at discriminating between plausible near-misses, where LLMs struggle for two reasons: (1) the coarseness of ViT patches, and (2) the out-of-distribution (OOD) nature of the task itself.

Fig. 1: The loser (left) is preferred by GPT 5.5, Claude Opus 4.7, and Gemini 3.1 Pro, while the winner (right) is preferred by Gamut. Frontier LLMs fail to classify subtle perturbations in typographical choice, color restraint, spacing, and iconography. Incidentally, these are the exact features that tend to distinguish notions of "slop" from "taste".

On a held-out, diverse evaluation set of 160 pairs, Gamut achieves near-ceiling performance (95% CI [154, 160] pairs correct), whereas the top-performing LLM, Gemini 3.1 Pro, is only modestly above chance (95% CI [93, 117] pairs correct, against a chance level of 80).
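For readers who want to run the same kind of check on their own pairs, here is a minimal sketch of a confidence interval on a pairwise-accuracy count. It uses the Wilson score interval, which is an assumption on our part for illustration: the post does not specify how the intervals above were computed. The function names and the counts in the loop are hypothetical and are not the reported results.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion, returned as (low, high) in [0, 1].

    z = 1.96 corresponds to a two-sided 95% interval.
    """
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need total > 0 and 0 <= correct <= total")
    p_hat = correct / total
    denom = 1 + z * z / total
    center = (p_hat + z * z / (2 * total)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / total + z * z / (4 * total * total)) / denom
    # Clamp to guard against floating-point overshoot at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)


def beats_chance(correct: int, total: int, chance: float = 0.5) -> bool:
    """True when the whole interval sits above the random-guess rate."""
    low, _ = wilson_interval(correct, total)
    return low > chance


# Hypothetical counts out of 160 pairs, chosen only to exercise the function.
for correct in (80, 100, 150):
    low, high = wilson_interval(correct, 160)
    print(
        f"{correct}/160 correct: "
        f"[{low * 160:.1f}, {high * 160:.1f}] pairs, "
        f"above chance: {beats_chance(correct, 160)}"
    )
```

With two-way choices, chance is 80 of 160, so a model is distinguishable from guessing only when the lower bound of its interval clears that line.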

Fig. 2: Gamut outperforms frontier LLMs on a held-out eval. Winner-loser order was randomized across pairs with an identical seed for all models. All LLMs were given identical prompts and invoked at default temperature and reasoning-effort settings. The dotted red line represents baseline random-guess accuracy. 95% confidence intervals are shown for all models.
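The seeded order randomization described in the caption can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the actual eval harness: the function names, the seed value, and the file names are all hypothetical.

```python
import random


def randomize_pairs(pairs, seed):
    """Randomly place each winner on the left or right.

    Returns a list of (left, right, winner_side) tuples. A local
    random.Random instance is used so the layout depends only on the seed.
    """
    rng = random.Random(seed)
    trials = []
    for winner, loser in pairs:
        if rng.random() < 0.5:
            trials.append((winner, loser, "left"))
        else:
            trials.append((loser, winner, "right"))
    return trials


def accuracy(trials, choose):
    """Fraction of trials where choose(left, right) names the winner's side."""
    hits = sum(choose(left, right) == side for left, right, side in trials)
    return hits / len(trials)


# Hypothetical file names standing in for real design images.
pairs = [(f"pair{i}_winner.png", f"pair{i}_loser.png") for i in range(160)]

# Reusing the seed gives every model the same left/right layout.
trials_model_a = randomize_pairs(pairs, seed=0)
trials_model_b = randomize_pairs(pairs, seed=0)
print(trials_model_a == trials_model_b)


def always_left(left, right):
    """A degenerate judge with a pure position bias."""
    return "left"


print(accuracy(trials_model_a, always_left))
```

Randomizing which side the winner appears on keeps a position-biased judge, such as `always_left`, close to the chance line instead of letting it score well by accident.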

Accuracy aside, Gamut is also extremely efficient, with fewer than 90M parameters. This makes it an ideal choice for a wide variety of applications, including:

As a reward signal in your RL stack
As a judge to select the best of multiple LLM samples
As an objective function to directly optimize designs within a parametrized configuration space
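To make the judge use case concrete, here is a minimal sketch of best-of-N selection driven by a pairwise preference function. Gamut's actual interface is not described in this post, so `prefers_first` is a hypothetical stand-in for a call that returns True when the first design is preferred, and the stub judge and candidate names below exist only so the example runs.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def best_of_n(candidates: Sequence[T], prefers_first: Callable[[T, T], bool]) -> T:
    """Return the candidate left standing after n - 1 sequential comparisons.

    The current champion is compared against each challenger in turn and is
    replaced whenever the judge prefers the challenger.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    champion = candidates[0]
    for challenger in candidates[1:]:
        if not prefers_first(champion, challenger):
            champion = challenger
    return champion


# Stub judge for illustration only: each candidate carries a made-up quality
# number, and the judge prefers the higher one. A real judge would compare
# rendered images instead.
def stub_judge(a, b):
    return a[1] >= b[1]


samples = [("draft_a", 0.41), ("draft_b", 0.77), ("draft_c", 0.63)]
print(best_of_n(samples, stub_judge)[0])
```

This uses n - 1 comparisons for n samples. If the judge's preferences are not transitive, the result can depend on candidate order, in which case a round-robin over all pairs is the more robust (and more expensive) alternative.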

If you have a use case for Gamut, we'd love to have you try it out on your own image pairs. Just request access, and we'll be in touch.