Historical technical snapshot · v2.0

A Transparent, Multi-Signal Ranking Model for Real-Estate Search

The algorithm behind ChatHome vibe search — query understanding into a query knowledge graph, a bounded clarification loop, semantic retrieval, twelve scoring functions, the relevance/quality partition, and accountability by design.

Version 2.0 Published 3 July 2026 (v1.0: 12 June 2026) Authors ChatHome Engineering, Luxembourg Code Open to review on request (hello@chathome.lu) Primary scoring source at publication apps/web/src/lib/search/ranking.ts

Historical snapshot — not the current implementation reference. This paper records the v2.0 scoring model published on 3 July 2026. Current production listing and vibe retrieval uses gemini-embedding-001 at 1,536 dimensions in embeddings.description_vector_v2, with COSINE vector retrieval inside a hybrid pipeline and an active semantic reranking stage. Later production changes are listed in the Ranking Policy changelog.

Abstract

ChatHome lets people describe what they want in natural language — “a calm two-bedroom near Kirchberg under €2,200 with good light” — and ranks Luxembourg listings by how well they satisfy that intent. This paper documents the v2.0 scoring model as published on 3 July 2026; it is not an end-to-end description of the current production retrieval and ordering pipeline. At publication, a query was parsed into a structured search plan, candidates were retrieved by semantic similarity over 768-dimensional multilingual embeddings and gated by deterministic hard filters, then twelve interpretable component functions mapped each candidate to the unit interval [0,1]. Components were partitioned into relevance and quality signals, weights were segment-specific and boosted by explicit intent, and missing signals were dropped through signal-aware renormalisation. The current production embedding and retrieval facts are stated in the historical-status notice above; later changes are maintained in the Ranking Policy changelog. Critically, no term in the model described here encodes payment, advertising spend, or commercial relationship: ranking cannot be bought.

1. Design principles

2. Notation

3. Pipeline overview

4. Query understanding & the query knowledge graph

5. Candidate generation & hard filters

6. The twelve scoring functions

7. Weighting & intent boosting

8. The relevance/quality partition & renormalisation

9. Diversity guardrail

10. Explainability

11. Complexity

12. What never influences ranking

13. Limitations & future work

A. Weight reference table

B. Symbol glossary

C. Revision history

1. Design principles

The model is built around six commitments that constrain every design decision:

Interpretability over opacity. Each signal is a named, bounded function of observable listing and query attributes. There is no learned end-to-end black box whose output cannot be decomposed.
Determinism. Given the same query, listing corpus, and measurement inputs, the ranking is reproducible. Identical inputs yield identical orderings. Even the clarification dialogue is deterministic: which questions are asked, and how answers modify the plan, is fixed by the query knowledge graph, not by a model's mood.
Graceful degradation. Missing data (no energy certificate, no measured commute, no benchmark) removes a signal rather than scoring it as zero. Absence is not penalised.
Ask, don't guess. When the query omits a must-have dimension (location, budget), the system asks the user — a bounded number of times, always skippable — instead of silently inventing a constraint on their behalf.
Intent alignment. Weights adapt to who is searching (renter, family, student, buyer, investor) and to the constraints they made explicit. The displayed match score reflects only what the user asked for; listing “niceness” can never impersonate query fit.
Incorruptibility. No feature represents money. Position is earned by relevance, never by payment.

2. Notation

Let q denote a user query and L = {ℓ₁, …, ℓ_n} the candidate listings. For a listing ℓ_i we write a set of component scores v_k(ℓ_i) ∈ [0,1], indexed by the twelve components k ∈ K = {sem, loc, com, bud, spc, vibe, egy, tr, fr, mkt, life, pers}. Each component additionally emits a boolean signal flag h_k(ℓ_i) ∈ {0,1} indicating whether it had enough data to produce a meaningful value. The clamp operator is

clamp₀₁(x) = max(0, min(1, x)) (1)

K is partitioned (§8) into relevance components K_R = {sem, loc, com, bud, spc, vibe} and quality components K_Q = {tr, fr, mkt, life, pers}; egy joins K_R only when the user voiced an energy preference, otherwise K_Q. A weight vector w = (w_k)_k∈K assigns a non-negative importance to each component, with the base vectors normalised so that ∑_k w_k = 1.

3. Pipeline overview

Natural-language query → LLM query understanding → Search plan → query knowledge graph
↓
Completeness gate → Clarification questions (≤2 rounds, skippable) → Completed plan
↓
Semantic retrieval at publication (768-d cosine) → Hard filters → Candidate set L
↓
12 component scores → Segment weights + intent boost → Relevance/quality split + renormalisation
↓
Relevance score σ + capped quality tie-breaker → Diversity guardrail → Ranked results + two-tier explanations

Figure 1. The v2.0 pipeline at publication. Later retrieval, clarification, calibration, reranking, and guardrail changes are documented on the Ranking Policy page.

4. Query understanding & the query knowledge graph

4.1 The search plan

A language model parses q into a structured search plan P with three parts:

Hard filters — discrete constraints the user stated or strongly implied: transaction intent (rent/buy), price floor/ceiling, bedrooms, surface bounds, communes / cantons / region, canonical locality references and points-of-interest references, cross-border scope, and exclusion tags.
Soft preferences — graded desiderata: vibe tags, commute target and tolerance, energy preference, budget tier, noise preference.
Segment labels — the inferred intent (rent vs buy) and household type (e.g. family, student, couple, cross-border, investor), which select the weight profile in §7.

The plan also carries a plan-strength estimate and a list of ambiguities — fields the extractor resolved but is not certain about (a bare “Luxembourg” may mean the city or the central region). The query understanding layer is itself grounded: figures, localities, and POIs are resolved against the corpus rather than free text, so the ranker measures real geography instead of matching strings.

4.2 The query knowledge graph

Every plan field maps to a node in a fixed query knowledge graph, and every node carries a requirement tier. The graph is the single source of truth for “is this query complete enough to search well?” — the clarification gate (§4.3), the UI's ranking hints, and the explanation layer all read the same structure, so they can never contradict each other.

Table 0. The query knowledge graph: field tiers and what satisfies them.
Field	Tier	Satisfied by (any of)
Transaction intent	must-have	rent/buy — always present (locked by the UI toggle).
Location	must-have	communes, region, cantons, locality references, a proximity target, or POI references.
Budget	must-have	an explicit price ceiling or budget ceiling.
Property type	must-have	dwelling type, property-type list, or a commercial/mixed use-class.
Cross-border scope	must-have*	always satisfied — the default (Luxembourg-only) is a valid state, registered as must-have so the platform's choice is always disclosed, never hidden.
Bedrooms	good-to-have	a minimum bedroom count.
Size	good-to-have	a surface floor or ceiling (m²).
Pets / furnished	extra	an explicit boolean either way.
Vibe tags	extra	soft tags or vibe keywords.
Energy	extra	a voiced energy preference.

Evaluating the graph over a plan yields, per field, {filled, waived} and the set of missing must-haves. A field the user explicitly declined to answer (“no limit”) counts as waived and never re-asks. Good-to-have and extra fields never block a search; they surface as optional depth questions or UI hints.

4.3 The clarification loop

When must-have fields are missing, the system asks rather than guesses — but under strict, verifiable bounds:

Deterministic questions. The question set is a pure function of the missing fields: a location question with commune/region choices, a budget question with intent-scaled price chips (rent-scale for rent, purchase-scale for buy), a property-type question with dwelling choices. Every question accepts free text and every question can be skipped.
Optional decoration only. A small language model may rewrite the question text into the user's locale and add query-grounded suggestion chips — but its output is strictly additive, re-validated against the same canonical vocabularies (commune gazetteer, dwelling taxonomy, intent-scaled price bounds), and time-boxed: on any failure the deterministic question set ships unchanged.
Confirmation questions. For fields that are present but ambiguous, a non-blocking “did you mean…?” question presents the extractor's candidates with its best guess marked as recommended. The best guess is already in the plan, so ignoring the question costs nothing.
Bounded rounds. At most two rounds, enforced server-side. Three independent termination guards — the round cap, waived fields counting as filled, and an explicit “search anyway” — make an endless interrogation structurally impossible.
Deterministic answer application. Chip answers patch the plan without any model in the loop: a commune resolves through the gazetteer, a budget through a strict parser (“2k”, “1.5m”, “800 000”), a dwelling type against the taxonomy. Budget answers on the wrong scale for the locked intent (a “€1.2M” budget while renting) are rejected and disclosed, never silently applied or “corrected”. Only free text that no deterministic parser could resolve is given to a single, bounded refinement call — and anything still unresolved is reported back to the user as not applied.
One question of depth, at most. When the must-haves are already complete on the first pass and the query carried no soft signal, one optional, multi-select “anything else that matters?” question may be offered — never re-asked, never blocking, and suppressed whenever a confirmation question is already on screen.

The clarification gate is client-declared: it fires only for interfaces that state they can render questions, so programmatic callers (APIs, agents, map view) receive direct results, and interfaces where the user just edited a specific field re-run with clarification suppressed — the system never re-asks for something the user deliberately removed.

4.4 From graph to hard filters

Once the graph is complete (or explicitly waived), the plan's hard-filter fields are applied as retrieval-time predicates (§5) and its expressed dimensions drive the weight boost (§7.1) and the relevance partition (§8). The same graph also powers the “add for more precise results” hint chips on weak plans, guaranteed consistent with what the gate would ask.

5. Candidate generation & hard filters (v2.0 snapshot)

At publication, the candidate set was produced by approximate nearest-neighbour retrieval over multilingual sentence embeddings. The query and every listing were embedded into the same D = 768-dimensional space with text-multilingual-embedding-002 (covering EN/FR/DE/LU/PT). For unit-normalised vectors the index returned the cosine distance

d_i = 1 − cos(e_q, e_i) = 1 − e_q · e_i‖e_q‖ ‖e_i‖ (2)

Retrieval scope defaults to Luxembourg; listings from the French Grande Région enter the candidate pool only when the user opts in to cross-border scope — a first-class, always-disclosed toggle (§4.2), never a silent expansion.

Before scoring, several classes of listing are removed outright (a hard mask, not a soft penalty):

Intent mismatch — a buy listing under a rent query, or vice versa. When the query text contradicts the rent/buy toggle, results still render and a non-blocking banner offers a one-tap intent switch.
Disabled / dismissed — administratively disabled listings, and listings the signed-in user has explicitly dismissed.
Excluded attributes — any listing bearing a tag in the query's exclusion set (e.g. “no ground floor”).
Unverifiable price under a budget constraint — when the query carries a budget, “price on request” listings are excluded fail-closed: a price that cannot be verified against the ceiling is treated as not satisfying it. The number of listings excluded this way is counted and disclosed to the user, with a one-tap “search without budget” escape.

One deliberate asymmetry: a stated minimum surface is enforced fail-closed in the classic (filter-based) search path, but fail-open in vibe search — vibe is recall-oriented, and listings without a recorded area are down-weighted by the space-fit signal (§6.5) rather than hidden. Survivors form L. Everything downstream is a continuous re-ranking of L; the hard mask only ever removes, never re-orders.

6. The twelve scoring functions

Each function maps a listing to [0,1] and reports whether it had a signal. We give the exact form of each. A recurring theme in v2.0 is de-saturation: v1.0's flat plateaus (any in-budget price → 1.0, anywhere inside a locality → 1.0, any under-budget commute → 1.0) tied entire result pages at identical scores; each plateau is now a gentle, continuous, monotone gradient that restores discrimination without punishing good matches.

6.1 Semantic affinity (sem)

Directly converts the retrieval distance (2) into a similarity:

v_sem = clamp₀₁(1 − d_i) (3)

Signal semantics matter here. When the vector channel ran and a listing was not among its results, that is a measured weak match: the listing scores v_sem = 0 with the signal flag set, so it can never gain rank by having its semantic component silently dropped from the renormalisation. Only when the vector channel itself failed (no distances for anyone) is the component dropped, and the route degrades disclosed-and-fail-closed rather than pretending semantic relevance was assessed.

6.2 Location fit (loc)

For each requested locality reference r with corpus centroid c_r and radius ρ_r (default 1 km), let κ_r = haversine(g_i, c_r) be the great-circle distance from the listing's geocode g_i. Inside the radius the score is a gentle gradient from 1 at the centre to λ = 0.85 at the edge; outside, the pre-existing linear decay (zero at three radii) is rescaled to join the gradient continuously, so a listing just outside can never outscore one at the edge inside. The component takes the best-matching locality:

s_r = {

1 − (1 − `λ`)·`κ`_r`ρ`_r,	`κ`_r ≤ `ρ`_r
max(0, `λ`·(1 − `κ`_r − `ρ`_r2`ρ`_r)),	`κ`_r > `ρ`_r

, v_loc = max_r s_r (4)

When no locality was geocoded, a legacy string fallback assigns 1.0 on a district match and 0.8 on a whole-token city-name match (token matching, so a short commune name cannot spuriously match inside an unrelated city name); with no location intent at all the component is dropped. Cross-border listings score no-signal on this component — they do not resolve against the Luxembourg locality corpus and must not be zeroed unfairly on it.

6.3 Commute fit (com)

For each POI reference p with a measured door-to-door ETA t_i,p (minutes, from Maps grounding or isochrone retrieval), the score is a banded, monotone function of the absolute ETA — so a genuinely close commute beats a far-but-under-budget one (v1.0's flat 1.0-under-budget tied Pétange with Luxembourg-Ville for “commute to Kirchberg”):

s_p : ≤10 min→1.0, ≤20→0.85, ≤30→0.7, ≤45→0.5, ≤60→0.3, else 0.1 (5)

When the user stated an explicit time budget B_p and the ETA exceeds it, the band score is additionally capped by a decaying penalty:

s_p ← min( s_p, max(0, 0.5·(1 − t_i,p − B_pB_p)) ), v_com = max_p s_p (6)

The ranker scores only commutes it has actually measured; it never infers proximity from text. A legacy aggregate (the same bands over city-level ETA, or a distance-banded fallback over haversine distance to canonical targets such as a train station) is retained for queries predating per-listing ETA measurement.

6.4 Budget fit (bud)

Let the listing price be x_i and the requested range [φ, γ] (floor, ceiling; either may be absent). In-range prices score a headroom gradient: full marks up to η = 0.8 of the ceiling, easing linearly to β = 0.9 at exactly the ceiling. Above the ceiling the original decay (zero at 2γ) is rescaled to start at β, keeping the curve continuous and monotone — €1 over the ceiling can never outscore exactly-at-ceiling. Below the floor the price scales toward it:

v_bud = {

1,	`x`_i ≤ `ηγ` (in range)
1 − (1 − `β`)·`x`_i/`γ` − `η`1 − `η`,	`ηγ` < `x`_i ≤ `γ`
max(0, `β`·(1 − `x`_i − `γγ`)),	`x`_i > `γ`
max(0, `x`_i`φ`),	`x`_i < `φ`

(7)

Floor-only queries keep the flat 1.0 in range — with no ceiling there is no headroom concept to grade against. With no explicit bounds, a soft budget tier (anchors: budget €1,500, mid €3,000, luxury €6,000) maps the price-to-anchor ratio x_i/a to 0.75 (≤1.0), 0.55 (≤1.3) or 0.35 (otherwise). Absent both, the component is dropped. (Recall from §5 that under a stated budget, price-on-request listings never reach this function — they are excluded fail-closed upstream, with the count disclosed.)

6.5 Space fit (spc)

Two sub-scores are averaged over whichever are available. The bedroom sub-score rewards meeting the requested count β and tolerates a little over-provision:

b = { 1 if beds=β; 0.85 if β+1; 0.7 if ≥β+2; 0.45 if β−1; 0.1 otherwise } (8)

One anti-evasion rule: a listing with no recorded bedroom count but a surface of at most 32 m² is inferred to be a studio (0 bedrooms) and scored as such — otherwise a 16 m² studio would dodge a “2 bedroom” requirement by dropping the space dimension entirely and keep a misleadingly high match. Larger null-bedroom listings remain unknown (sub-score dropped without penalty). The surface sub-score α = 1 when the area lies within [A_min, A_max], 0.2 when below the minimum, 0.3 when above the maximum. With parts S ⊆ {b, α} present,

v_spc = 1|S| ∑_z∈S z (9)

6.6 Vibe-tag overlap (vibe)

Given the query's tag set Q and the listing's tag set T, each query tag is first expanded through a curated multilingual alias table — “bright” matches the canonical bright_natural_light; “cosy” matches gemütlich, chaleureux, acolhedor, gemittlech; accents are folded — and counts as satisfied if any alias hits a listing tag. With m_tag tags so satisfied, the primary score is coverage:

v_vibe = m_tag|Q| (if m_tag > 0) (10)

When structured tags are missing, the same alias expansion runs as a free-text fallback over the listing's title/description/locale fields, yielding v_vibe = min(1, 0.2 + 0.8·m/|Q|), where m is the number of query tags with any alias found in the text. The component is dropped if the query carries no tags.

6.7 Energy efficiency (egy)

A direct ordinal mapping of the energy-performance class:

A→1.0, B→0.85, C→0.70, D→0.55, E→0.40, F→0.25, G→0.10; off-scale class→0.30 (11)

Missing-certificate handling depends on whether the user asked about energy. When they did not, no certificate means no signal — the component drops, and energy acts only as a quality tie-breaker (§8). When the user did voice an energy preference, a missing class is a measured uncertainty, not “no data”: the listing is scored at the uncertainty value 0.3 so that an unknown-energy listing participates as a mild miss instead of being lifted by renormalisation above a known-but-mediocre class-G.

6.8 Listing trust & completeness (tr)

A weighted measure of how complete and verifiable a listing is — more photos, richer description, a certificate, and a stated price all raise it. With n_img images and a description of ξ characters:

v_tr = 0.40·clamp₀₁(n_img6) + 0.30·clamp₀₁(ξ280) + 0.15·𝟙[CPE] + 0.15·𝟙[price] (12)

6.9 Freshness (fr)

A monotone-decreasing step function of listing age in days, with a faster decay for rentals than for purchases (rentals turn over more quickly). For rentals, ages of <3, <7, <14, <30, <60 days score 1.0, 0.9, 0.7, 0.5, 0.3 and 0.1 thereafter; for purchases the thresholds are <7, <30, <90, <180, <365 days for the same descending values. Dropped when no creation timestamp exists.

6.10 Market value (mkt)

Compares the listing's price-per-m² against the commune benchmark μ_c. With ppm = x_i/A_i and relative delta δ = 100·(ppm − μ_c)/μ_c:

v_mkt = {

1.0,	`δ` ≤ 0
0.7,	0 < `δ` ≤ 10
0.5,	10 < `δ` ≤ 20
0.3,	`δ` > 20

(13)

Below-benchmark pricing scores highest. Dropped when price, area, or benchmark is unavailable.

6.11 Lifestyle fit (life)

Uses precomputed per-listing lifestyle indices on a 0–100 scale. Families read the family-friendly index; students, couples and cross-border workers read the urban-professional index; otherwise the two are averaged. All are divided to the unit interval. Dropped when no lifestyle indices exist — including the all-zero case, which the ingestion layer uses as “no data” and which must not be mistaken for “worst in class”.

6.12 Personalization (pers)

For signed-in users, a per-listing affinity in [0,1] derived upstream from a privacy-preserving user-taste profile (embedding-space similarity between the profile vector u and the listing embedding e_i), passed into the ranker per listing:

v_pers = clamp₀₁(cos(u, e_i)) (14)

Dropped entirely for anonymous sessions, so signed-out ranking is unaffected by any profile. As a quality-side component (§8), personalization can nudge ordering among near-equal results but can never enter the displayed match score.

7. Weighting & intent boosting

The twelve components are combined as a convex combination. The base weight vector is selected by the user's segment (intent × household type). Table 1 in Appendix A gives the full set; each column sums to 1. The profiles encode intuitive priorities — families weight location and space; students weight commute and budget; investors weight market value; renters spread weight toward semantic and personalization.

7.1 Explicit-intent weight boosting

When the user makes a constraint explicit — as recorded by the query knowledge graph's expressed dimensions — the corresponding weight is increased before normalisation, sharpening the ranking toward what they actually asked for. Writing 𝟙[·] for the indicator of an explicit constraint:

w′_loc = w_loc + 0.10·𝟙[explicit location] · w′_com = w_com + 0.12·𝟙[commute target]
w′_bud = w_bud + 0.05·𝟙[budget stated] · w′_vibe = w_vibe + 0.05·𝟙[tags present]
w′_spc = w_spc + 0.05·𝟙[size/beds stated] (15)

The resulting vector w′ need not sum to 1; normalisation is deferred to §8, which also handles missing signals in the same pass.

8. The relevance/quality partition & renormalisation

This section is the heart of the model's honesty properties, and the largest change since v1.0. Two distinct failure modes are prevented here: missing data must not penalise a listing (signal-aware renormalisation, §8.2), and a nice listing must not impersonate a matching one (the relevance/quality partition, §8.1).

8.1 The partition

The components split along the query knowledge graph:

Relevance components K_R = {sem, loc, com, bud, spc, vibe} measure fit against dimensions the user can — and did — express in the query. They alone form the displayed match score.
Quality components K_Q = {tr, fr, mkt, life, pers} are absolute listing properties the user never asked for. They act only as a small, capped, invisible tie-breaker.
Energy is partitioned at runtime: relevance when the user voiced an energy preference, quality otherwise.

The rationale: in v1.0 all twelve components entered one weighted sum, so a fresh, photo-rich, below-benchmark listing could out-score a listing that better matched the actual query — and the displayed “% Match” conflated “matches what you asked” with “is generally a good listing”. The partition makes the displayed score answer exactly one question: how well does this satisfy what you asked for?

8.2 Live sets and effective weights

Within each side of the partition, signal-less components are dropped and the remaining weights renormalised — absence removes a signal, it never scores as zero:

𝓛_R(ℓ_i) = { k ∈ K_R : h_k(ℓ_i) = 1 }, 𝓛_Q(ℓ_i) = { k ∈ K_Q : h_k(ℓ_i) = 1 } (16)

ŵ_k = w′_k · 𝟙[k ∈ 𝓛_R] ∑_{j∈𝓛_R} w′_j (and identically for 𝓛_Q) (17)

By construction the effective weights sum to 1 whenever at least one component is live, so two listings are always compared on a common, fully-weighted basis even when they expose different subsets of attributes. Note the interaction with §6.1: a listing the vector channel measured as irrelevant carries sem = 0 with a signal — renormalisation can never rescue it by dropping the semantic axis.

8.3 Displayed score and rank score

The displayed 0–100 match score is the renormalised relevance combination alone:

σ(ℓ_i) = 100 · clamp₀₁( ∑_{k∈𝓛_R} ŵ_k · v_k(ℓ_i) ) (18)

Quality enters only the internal ordering key, capped at c = 2 points:

rank(ℓ_i) = σ(ℓ_i) + min( c, c · ∑_{k∈𝓛_Q} ŵ_k · v_k(ℓ_i) ) (19)

Listings are sorted by rank(·) in descending order. Among near-equal-relevance results the better-quality listing ranks first, but quality can never cross relevance bands (the cap) and never appears in the displayed score. When a listing has no live relevance component at all, its score is suppressed rather than fabricated — the UI shows no match percentage instead of an empty claim.

Removed in v2.0: v1.0's “low-signal variance spreading” (a semantic-distance nudge applied when fewer than three components were live) is gone. The de-saturation gradients of §6.2–6.4 restore score separation at the source, and the relevance-only display makes sparse-query scores honest without a corrective term.

9. Diversity guardrail

Pure relevance sorting can fill the entire first page with one commune. A post-ranking guardrail enforces a minimum of m = 3 distinct communes within the top N = 10 results. If the top slice already spans m communes, it is untouched. Otherwise the algorithm scans the tail for the highest-scoring listings that introduce a new commune and promotes them, displacing the lowest-scoring entries of the top slice. Formally, the guardrail returns the smallest set of promotions π such that

| { commune(ℓ) : ℓ ∈ top_N ∪ π } | ≥ m (20)

It only reorders the top N; deeper results are preserved. The guardrail's activation is recorded so it can be surfaced to the user.

10. Explainability

Because the score is a transparent convex combination, every result carries a decomposition — now delivered in two tiers:

10.1 Tier 1 — deterministic, numbers-first reasons

Zero LLM. Each measured dimension emits a signed, locale-templated reason carrying the actual numbers: “14 min to Kirchberg — 6 under your 20-min target” (positive), “€180 over your ceiling (+8%)” (negative), “12% below the commune's €/m² benchmark”, “one bedroom fewer than requested”. The polarity is baked into the reason key, so the panel renders honest two-sided rows — a listing is never described only by its good sides. Energy classes are bucketed by rank: A–C is presented as a strength, F–I as a weakness, D–E (and off-scale values such as “NC”) as neutral context — unknown is not a claim. Two consistency rules hold by construction: the score breakdown shows only live relevance components (matching the relevance-only headline score), while the reason list may additionally carry quality-side context (market delta, energy class) clearly as context — informative, but never score-inflating. Components with no signal are omitted rather than shown as 0%, so an explanation never contradicts the score.

10.2 Tier 2 — optional narrative

A single batched language-model call per search may rewrite the Tier-1 reason set for the top results into one fluent, locale-aware sentence each. It is strictly presentational: the narrative is generated from the deterministic reasons (never from the listing directly), cached, budget-capped, and fails soft — on any failure the numbers-first Tier-1 rows render unchanged. The panel never blocks on, and never depends on, the LLM.

Any user or agent may ask why a listing ranked where it did and receive the exact contributing terms.

11. Complexity

In the v2.0 snapshot, candidate generation was sub-linear in the corpus via an approximate-nearest-neighbour index over the 768-d space. Scoring was O(|L| · |K|) = O(12|L|) = O(|L|) with a small constant: each component was O(1) except location and commute, which were O(|R|) and O(|P|) in the number of requested localities and POIs. This complexity statement is historical; the current hybrid retrieval and reranking pipeline has additional stages.

12. What never influences ranking

The twelve components are inputs to the interpretable fit score described in this historical paper. Candidate retrieval, hard filters, calibration, semantic reranking, near-duplicate handling, and diversity guardrails can also affect which listings appear and their final order. None of those paths uses:

Payment, advertising spend, or subscription tier. There is no “promoted” multiplier. An agency cannot pay to rank higher.
Commercial relationship with ChatHome. First-party and third-party listings are scored identically.
Manual editorial overrides on individual listings.

Material production ranking changes are published in the Ranking Policy changelog. This paper is a dated technical snapshot; the repository is the source of truth for the current implementation and is open to review on request (email hello@chathome.lu).

13. Limitations & future work recorded at publication

Piecewise constants. Several components (freshness, market value, the commute bands) use step functions for interpretability; smooth monotone curves would remove threshold artefacts at the boundaries. (The budget and location plateaus were smoothed in v2.0; the remaining steps are candidates.)
Static weight profiles. Segment weights are hand-tuned, not learned. A learning-to-rank layer constrained to remain monotone and decomposable could calibrate weights from anonymised engagement while preserving auditability.
Quarter-level geography. Locality centroids discriminate communes well; ranking a true city-quarter match above listings in adjacent communes at similar centroid distance remains weaker than we would like.
Benchmark coverage. The market-value signal depends on commune-level €/m² benchmarks; sparsely-transacted communes yield no signal and the component is dropped.
Measurement dependence. Commute fit is only as good as the measured ETA matrix; localities outside the measured set fall back to coarser geography.
Clarification coverage. The clarification loop is client-declared (§4.3); programmatic surfaces receive best-effort direct results by design. Answers given in the dialogue refine the current search but are not yet persisted into shareable URLs.

Appendix A — Weight reference table

Table 1. Base component weights by segment. Each column sums to 1.00. “Rent” is the default renter profile; family/student specialise it; buy and investor are the purchase profiles. Rows above the rule are relevance components (they form the displayed score); rows below it are quality components (capped tie-breaker only); energy switches sides per query (§8.1).
Component	Rent	Family (rent)	Student (rent)	Buy	Investor (buy)
Semantic	0.18	0.11	0.14	0.14	0.09
Location	0.14	0.20	0.22	0.18	0.18
Commute	0.14	0.09	0.18	0.09	0.05
Budget	0.14	0.13	0.18	0.13	0.13
Space	0.09	0.16	0.02	0.11	0.05
Vibe	0.06	0.01	0.02	0.06	0.02
Energy (per-query side)	0.04	0.07	0.02	0.07	0.09
Trust	0.04	0.04	0.05	0.04	0.06
Freshness	0.02	0.01	0.01	0.02	0.04
Market value	0.02	0.02	0.02	0.02	0.18
Lifestyle	0.03	0.05	0.04	0.04	0.02
Personalization	0.10	0.11	0.10	0.10	0.09
Total	1.00	1.00	1.00	1.00	1.00

Explicit-intent boosts of equation (15) are added to these base values, then the relevance and quality subsets are renormalised independently per equation (17), so the effective weights used at scoring time vary per query and per listing.

Appendix B — Symbol glossary

Table 2. Notation used throughout.
Symbol	Meaning
`ℓ`_i, `L`	A candidate listing; the candidate set after hard filtering.
`K`, `K`_R, `K`_Q	The twelve scoring components; their relevance and quality partitions.
`P`	The structured search plan; its fields populate the query knowledge graph.
`v`_k, `h`_k	Component score in [0,1]; its signal flag in {0,1}.
`d`_i	Cosine distance between query and listing embeddings (eq. 2).
`w`, `w′`, `ŵ`	Base, intent-boosted, and renormalised effective weights.
`𝓛`_R, `𝓛`_Q	Live sets — relevance / quality components with a signal for a listing.
`σ`, rank(·)	Displayed 0–100 relevance score (eq. 18); internal ordering key (eq. 19).
`c`	Quality tie-breaker cap (2 points).
`κ`_r, `ρ`_r, `λ`	Haversine distance to locality centroid; locality radius; in-radius edge score (0.85).
`t`_i,p, `B`_p	Measured ETA to POI `p`; commute time budget.
`φ`, `γ`, `η`, `β`	Budget floor and ceiling; headroom ratio (0.8); at-ceiling score (0.9).
`μ`_c, `δ`	Commune €/m² benchmark; relative price delta (%).
`N`, `m`	Diversity window (10) and minimum distinct communes (3).

Appendix C — Revision history

Table 3. Revision history of this historical document. See the Ranking Policy page for the current production changelog.
Version	Date	Change (plain language)
1.0	12 Jun 2026	Initial publication: twelve components, one weighted sum, segment weights, signal-aware renormalisation, diversity guardrail.
2.0	3 Jul 2026	Query knowledge graph & clarification (§4): plan fields now populate a tiered knowledge graph; missing must-haves trigger a bounded (≤2-round), skippable clarification dialogue with deterministic questions and deterministic answer application, instead of the system guessing. Relevance/quality partition (§8): the displayed match score is now computed from query-relative components only; listing-quality signals (trust, freshness, market value, lifestyle, personalization, and energy when not asked about) act solely as an invisible tie-breaker capped at 2 points — a nice listing can no longer impersonate a matching one. De-saturation (§6): the flat in-budget, in-locality, and under-budget-commute plateaus were replaced by gentle continuous gradients (budget headroom 1.0→0.9 at the ceiling; locality 1.0→0.85 at the radius edge; commute banded by absolute ETA), restoring discrimination on pages that previously tied at identical scores. Honest semantic zero (§6.1): listings the vector channel measured as non-matching now score semantic = 0 with a signal instead of dropping the axis. Anti-evasion (§6.5): small null-bedroom listings are inferred as studios so they cannot dodge bedroom requirements. Multilingual vibe matching (§6.6): query tags expand through en/fr/de/lu/pt alias tables. Hard-filter additions (§5): price-on-request listings are excluded fail-closed under a stated budget, with the excluded count disclosed; cross-border scope is an explicit, disclosed opt-in. Two-tier explanations (§10): deterministic, signed, numbers-first reasons (zero LLM) with an optional fail-soft narrative layer; energy classes bucketed honestly (A–C strength, D–E neutral, F–I weakness). Removed: v1.0's low-signal variance-spreading term, superseded by the above.