DesignSync

loveholidays' design system had no machine-readable source of truth. AI tools defaulted to generic patterns. 12+ internal prototypes looked like they came from different companies. Built the infrastructure to make brand-aligned AI output the default, not the exception.

Role
Principal Designer
Company
loveholidays
Year
2026
Duration
January–March 2026
Problem

The design system hadn't received meaningful technical investment in four years. As AI-assisted development accelerated, this became a structural liability — AI tools defaulted to generic patterns, and a 10-minute idea required two days of manual engineering correction.

Approach

Build machine-readable design system infrastructure — token pipeline, component registry, MCP server layer — that makes brand-aligned AI output the default. Prove it with working code, not a proposal.

Outcome

One build command generates 9,200+ lines of brand-specific CSS. Icon pipeline saves ~450 hours/year. Prototype benchmarked at 7.6/10 against aspirational industry practices. Q3 engineering investment secured.

9,200+

Lines of brand CSS from one build command

~450hrs

Manual icon work automated per year

7.6/10

Benchmark score vs. aspirational industry practices

Q3

Engineering investment secured for production

Snapshot

loveholidays’ design system hadn’t received meaningful technical investment in four years. Documentation was scattered across Figma, Storybook, and Google Docs. No single source of truth. No ownership. As AI-assisted development accelerated across the company, this became a structural liability. AI tools don’t interpret undocumented knowledge. They default to generic patterns and ignore brand conventions. The result: 12+ internal AI prototypes that looked like they came from different companies. A 10-minute idea required two days of manual engineering correction.

The goal: Build machine-readable design system infrastructure that makes brand-aligned AI output the default, not the exception. Prove it with working code, not a proposal.

My scope: Two designers. No dedicated engineering headcount. No formal coding background on either side. Built in approximately three months using Claude Code.

How I scoped it: The director of design made OKR space to improve the design system’s AI foundation. No defined scope. No solution specified. I partnered with the lead designer to define and build the solution ourselves, with light-touch support from a web infrastructure engineer to ensure we stayed within existing tech stacks. We used Claude Code to contribute directly to a running codebase as designers without formal engineering training.

Context and Stakes

loveholidays was scaling fast. AI prototyping had accelerated across product and engineering teams, but the design system couldn’t keep up. Every AI tool hitting the codebase defaulted to its own interpretation of the brand because no machine-readable source of truth existed.

We structured the project around seven testable hypotheses, from brand-aligned AI output through to market scalability and governance. The most important was H7: the cost of inaction compounds. Every month of delay raises the remediation frontier as AI continues to accelerate on a weak foundation.

The goal was to prove the infrastructure worked with code, not a proposal. Translating that business risk into an investable programme — and building credibility with engineering by shipping real infrastructure rather than a deck — is what the brief needed.

Approach and Key Decisions

Structured the brief before touching any tooling

Before writing a line of code, I mapped what “better” meant:

  • Brand-aligned AI output
  • Scalable multi-brand tokens
  • Machine-readable component intent
  • Human oversight baked into AI workflows
  • A clear path to engineering adoption

This gave us a testable framework rather than an open-ended build.

Built infrastructure, not documentation

The core insight was that the problem wasn’t missing docs. It was missing structure. Documentation in Figma or a Google Doc is invisible to AI tools. The design system needed to be queryable.

Token pipeline

Four-tier architecture built on W3C DTCG standards: Primitives → auto-generated Palette → curated Semantic → Component tokens. Components only reference the semantic layer. One build command generates 9,200+ lines of brand-specific CSS across three brands and two market variants, using OKLCH colour format for perceptual consistency across every theme switch.

Adding a new market like Austria is a 50-line JSON file. Onboarding a B2B partner requires no component code changes at all. Stack: Figma Token Studio, Style Dictionary v5, Tailwind CSS v4.

A core part of this work was defining a new token naming structure from scratch. Each tier has semantic meaning baked into the name itself. A token at the primitive tier like core.primary tells you it is a raw brand value. At the semantic tier, brand.primary.base tells you how that value is used across the brand. At the component tier, ds.button.primary.base.hover.background tells you exactly which element, which variant, which state, and which property it controls. Any AI tool, engineer, or designer reading the token name understands its intent without needing to trace it back through the system. This was a deliberate design decision: the name is the documentation.

Early whiteboard session mapping the four-tier naming structure
Early whiteboard session mapping the four-tier naming structure — showing how token names carry semantic intent at each layer from primitive to component
4-Tier Token Architecture
Tier 1

Primitives Core

Raw brand values. Human-defined, one per brand.

loveholidays core.primary=#0374DA
holidaypirates core.primary=#B10038
core.discount=#DE2A5C
auto-generates
Tier 2

Palette Auto-generated

Mathematical shade ramp from each primitive. Light (L10–L95) and Dark (D10–D95). Not used directly by components.

palette.primary.light.L60
palette.primary.dark.D80
curates
Tier 3

Semantic Brand selections

Curated selections that define how colours are used. Reduces complexity — only the meaningful shades.

brand.primary.base→ {core.primary}
state.success.lighter→ {palette.success.light.L80}
state.success.base→ {core.success}
state.success.dark→ {palette.success.dark.D60}
consumed by
Tier 4

Component Specific

Component-level tokens. What Figma and code consume. Name = documentation.

ds.badge.product.accent.soft.default.background→ {brand.accent.lighter}
ds.badge.state.success.base.default.foreground→ {surface.white}
ds.button.primary.base.hover.background→ {brand.primary.dark}

The key constraint

Components never reference Tier 1 or Tier 2 directly — always through the semantic layer. Switching brands (loveholidays → Holiday Pirates) only requires changing Tier 1. Everything else cascades automatically.

Pipeline
Figma Token Studio Git Style Dictionary Panda CSS / Tailwind / CSS vars
Tier 1–3: fully defined for loveholidays + Holiday Pirates. Tier 4: Button (945 variants) and Badge (123 tokens).
DesignSync six-phase pipeline from raw colour values to AI-ready React components
DesignSync six-phase pipeline — raw colour values through to AI-ready React components

Trade-off: governance baked in vs bolt-on

I chose to embed governance into the tools rather than write policy documents. Proto Studio, our AI prototyping sandbox, runs 11 discovery steps before generating a single line of code. It queries the registry, validates output, self-heals errors, and uses only components the design system actually supports. Teams can explore freely with DS primitives. Anything outside the system surfaces a transparent approval request before entering the codebase.

This was a slower build. Direct code generation would have been faster. But a tool that produces inconsistent output with no oversight doesn’t solve the original problem.

Used AI to build AI infrastructure

Both the AI illustration tool and the competitive analysis agent were specified, designed, and built by designers using Claude Code, with no engineering hire. Proof that designers can contribute directly to production codebases when the tooling is right.

The illustration tool encodes six specific skin tones, enforces a four-colour brand palette, and includes a scored approval workflow ($0.009 for concept drafts, $0.13 for hero banners). Generation costs are tracked per asset by default, not as an afterthought.

Outcomes

Infrastructure delivered

  • Token pipeline: one build command generates CSS, a Tailwind theme, and a JavaScript runtime object across three brands
  • Component library: six fully token-driven components following a strict nine-rule contract that AI tools cannot break accidentally
  • Machine-readable registry: intent, usage rules, and failure modes encoded for every component
  • Proto Studio: brand-aligned AI prototyping from text, image, or Figma URL input
  • AI illustration generation with human approval gates and per-generation cost tracking
  • Competitive analysis agent: full funnel capture (desktop and mobile) with CSS token extraction and structured UX gap analysis
  • Claude Code skills: reusable commands that load the full design system into any AI tool’s working memory

Commercial signal

Industry benchmarks frame the investment case. Miro’s equivalent infrastructure let a team of six serve 40+ product teams and cut design support queries by 70–80%. Enara Health dropped design system monitoring costs from $169/month to $0.20 using a comparable AI-ready foundation. DesignSync is built on the same principles. Those efficiency gains are the baseline expectation, not the stretch goal.

Internal validation

After a live prototype walkthrough, stakeholder feedback confirmed the tool was usable in its current state. The follow-up question from the room: “When can I use it?” Engineering investment backed for Q3 to move from prototype to production programme.

Retrospective

Leading with a working prototype rather than a proposal was the right call. It shifted the conversation from “is this worth investing in?” to “when can we productise this?” That’s a better place to be.

What I’d do differently: instrument usage earlier. I have strong proxy metrics (token coverage, automation hours, benchmark score) but no direct evidence yet on time-to-prototype reduction or design-to-dev error rates. Those baselines need to be captured from day one of the Q3 build, before adoption scales.

The next 12 months: get the token pipeline and registry into daily use across all product teams, and measure the reduction in AI prototype correction time against the pre-system baseline. The commercial case becomes significantly stronger once those numbers exist.

The broader lesson: design systems become strategic assets when they’re infrastructure, not documentation. The question isn’t whether AI will write more of the product. It’s whether the design system is ready to govern it.