A Transparent, Multi-Signal Ranking Model for Real-Estate Search
The algorithm behind ChatHome vibe search — query understanding, semantic retrieval, twelve scoring functions, segment-aware weighting, and accountability by design.
Abstract
ChatHome lets people describe what they want in natural language — “a calm two-bedroom near Kirchberg under €2,200 with good light” — and ranks Luxembourg listings by how well they satisfy that intent. This paper documents the ranking model end to end. A query is parsed into a structured search plan; candidates are retrieved by semantic similarity over 768-dimensional multilingual embeddings; each candidate is then scored by twelve interpretable component functions, each mapping to the unit interval [0,1]. Components are combined with a segment-specific weight vector that is boosted by the user's explicit intent and then renormalised over only the components that carry a real signal, so missing data never silently penalises a listing. A final diversity guardrail guarantees geographic spread in the top results. Every term is deterministic, auditable, and inspectable per result. Critically, no term in the model encodes payment, advertising spend, or commercial relationship: ranking cannot be bought.
1. Design principles
The model is built around five commitments that constrain every design decision:
- Interpretability over opacity. Each signal is a named, bounded function of observable listing and query attributes. There is no learned end-to-end black box whose output cannot be decomposed.
- Determinism. Given the same query, listing corpus, and measurement inputs, the ranking is reproducible. Identical inputs yield identical orderings.
- Graceful degradation. Missing data (no energy certificate, no measured commute, no benchmark) removes a signal rather than scoring it as zero. Absence is not penalised.
- Intent alignment. Weights adapt to who is searching (renter, family, student, buyer, investor) and to the constraints they made explicit.
- Incorruptibility. No feature represents money. Position is earned by relevance, never by payment.
2. Notation
Let q denote a user query and L = {ℓ1, …, ℓn} the candidate listings. For a listing ℓi we write a set of component scores vk(ℓi) ∈ [0,1], indexed by the twelve components k ∈ K = {sem, loc, com, bud, spc, vibe, egy, tr, fr, mkt, life, pers}. Each component additionally emits a boolean signal flag hk(ℓi) ∈ {0,1} indicating whether it had enough data to produce a meaningful value. The clamp operator is
A weight vector w = (wk)k∈K assigns a non-negative importance to each component, with the base vectors normalised so that ∑k wk = 1.
3. Pipeline overview
↓
Semantic retrieval (768-d cosine) → Hard filters → Candidate set L
↓
12 component scores → Segment weights + intent boost → Signal-aware renormalisation
↓
Weighted score σ → Diversity guardrail → Ranked results + explanations
4. Query understanding
A language model parses q into a structured search plan P with three parts:
- Hard filters — discrete constraints the user stated or strongly implied: transaction intent (rent/buy), price floor/ceiling, bedrooms, surface bounds, communes / cantons / region, canonical locality references and points-of-interest references, and exclusion tags.
- Soft preferences — graded desiderata: vibe tags, commute target and tolerance, energy preference, budget tier, noise preference.
- Segment labels — the inferred intent (rent vs buy) and household type (e.g. family, student, couple, cross-border, investor), which select the weight profile in §7.
The plan also carries a plan-strength estimate. Weak plans (few constraints) trigger UI hints and the low-signal handling of §8.4. The query understanding layer is itself grounded: figures, localities, and POIs are resolved against the corpus rather than free text, so the ranker measures real geography instead of matching strings.
5. Candidate generation & hard filters
The candidate set is produced by approximate nearest-neighbour retrieval over multilingual sentence embeddings. The query and every listing are embedded into the same D = 768-dimensional space with text-multilingual-embedding-002 (covering EN/FR/DE/LU). For unit-normalised vectors the index returns the cosine distance
Before scoring, three classes of listing are removed outright (a hard mask, not a soft penalty):
- Intent mismatch — a buy listing under a rent query, or vice versa.
- Disabled / dismissed — administratively disabled listings, and listings the signed-in user has explicitly dismissed.
- Excluded attributes — any listing bearing a tag in the query's exclusion set (e.g. “no ground floor”).
Survivors form L. Everything downstream is a continuous re-ranking of L; the hard mask only ever removes, never re-orders.
6. The twelve scoring functions
Each function maps a listing to [0,1] and reports whether it had a signal. We give the exact form of each.
6.1 Semantic affinity (sem)
Directly converts the retrieval distance (2) into a similarity. Signal exists iff a vector distance was returned for the listing.
6.2 Location fit (loc)
For each requested locality reference r with corpus centroid cr and radius ρr (default 1 km), let κr = haversine(gi, cr) be the great-circle distance from the listing's geocode gi. The per-locality score is 1 inside the radius and decays linearly to 0 at three radii; the component takes the best-matching locality:
| 1, | κr ≤ ρr |
| max(0, 1 − κr − ρr2ρr), | κr > ρr |
When no locality was geocoded, a legacy string fallback assigns 1.0 on a district match and 0.8 on a city-name match; with no location intent at all the component is dropped.
6.3 Commute fit (com)
For each POI reference p with a measured door-to-door ETA ti,p (minutes, from Maps grounding or isochrone retrieval) and time budget Bp (default 30 min), the score is 1 within budget and decays to 0 at twice the budget:
| 1, | ti,p ≤ Bp |
| max(0, 1 − ti,p − BpBp), | ti,p > Bp |
The ranker scores only commutes it has actually measured; it never infers proximity from text. A legacy aggregate (a monotone step function of city-level ETA, or of haversine distance to canonical targets such as a train station) is retained for queries predating per-listing ETA measurement.
6.4 Budget fit (bud)
Let the listing price be xi and the requested range [φ, γ] (floor, ceiling; either may be absent). Inside the range scores 1; above the ceiling decays by relative overshoot; below the floor scales toward it:
| 1, | φ ≤ xi ≤ γ |
| max(0, 1 − xi − γγ), | xi > γ |
| max(0, xiφ), | xi < φ |
With no explicit bounds, a soft budget tier (anchors: budget €1,500, mid €3,000, luxury €6,000) maps the price-to-anchor ratio xi/a to 0.75 (≤1.0), 0.55 (≤1.3) or 0.35 (otherwise). Absent both, the component is dropped.
6.5 Space fit (spc)
Two sub-scores are averaged over whichever are available. The bedroom sub-score rewards meeting the requested count β and tolerates a little over-provision:
The surface sub-score α = 1 when the area lies within [Amin, Amax], 0.2 when below the minimum, 0.3 when above the maximum. With parts S ⊆ {b, α} present,
6.6 Vibe-tag overlap (vibe)
Given the query's tag set Q and the listing's tag set T, the primary score is the fraction of requested tags covered:
When structured tags are missing, a synonym-expanded free-text fallback over the listing's title/description/locale fields yields vvibe = min(1, 0.2 + 0.8·m/|Q|), where m is the number of matched tag aliases. The component is dropped if the query carries no tags.
6.7 Energy efficiency (egy)
A direct ordinal mapping of the energy-performance class, dropped when no certificate is present:
6.8 Listing trust & completeness (tr)
A weighted measure of how complete and verifiable a listing is — more photos, richer description, a certificate, and a stated price all raise it. With nimg images and a description of ξ characters:
6.9 Freshness (fr)
A monotone-decreasing step function of listing age in days, with a faster decay for rentals than for purchases (rentals turn over more quickly). For rentals, ages of <3, <7, <14, <30, <60 days score 1.0, 0.9, 0.7, 0.5, 0.3 and 0.1 thereafter; for purchases the thresholds are <7, <30, <90, <180, <365 days for the same descending values. Dropped when no creation timestamp exists.
6.10 Market value (mkt)
Compares the listing's price-per-m² against the commune benchmark μc. With ppm = xi/Ai and relative delta δ = 100·(ppm − μc)/μc:
| 1.0, | δ ≤ 0 |
| 0.7, | 0 < δ ≤ 10 |
| 0.5, | 10 < δ ≤ 20 |
| 0.3, | δ > 20 |
Below-benchmark pricing scores highest. Dropped when price, area, or benchmark is unavailable.
6.11 Lifestyle fit (life)
Uses precomputed per-listing lifestyle indices on a 0–100 scale. Families read the family-friendly index; students, couples and cross-border workers read the urban-professional index; otherwise the two are averaged. All are divided to the unit interval. Dropped when no lifestyle indices exist.
6.12 Personalization (pers)
For signed-in users, the cosine similarity between the listing embedding and a privacy-preserving user-taste profile vector u, computed upstream and passed in per listing:
Dropped entirely for anonymous sessions, so signed-out ranking is unaffected by any profile.
7. Weighting & aggregation
The twelve components are combined as a convex combination. The base weight vector is selected by the user's segment (intent × household type). Table 1 in Appendix A gives the full set; each column sums to 1. The profiles encode intuitive priorities — families weight location and space; students weight commute and budget; investors weight market value; renters spread weight toward semantic and personalization.
7.1 Explicit-intent weight boosting
When the user makes a constraint explicit, the corresponding weight is increased before normalisation, sharpening the ranking toward what they actually asked for. Writing 𝟙[·] for the indicator of an explicit constraint:
w′bud = wbud + 0.05·𝟙[budget stated] · w′vibe = wvibe + 0.05·𝟙[tags present]
w′spc = wspc + 0.05·𝟙[size/beds stated] (14)
The resulting vector w′ need not sum to 1; normalisation is deferred to §8, which also handles missing signals in the same pass.
8. Signal-aware renormalisation
This is the heart of the model's fairness property. Rather than scoring an absent attribute as zero — which would collapse scores into a narrow band and punish listings for missing data — we drop signal-less components and renormalise the remaining weights so they again sum to 1.
8.1 Live set
8.2 Effective weights
Restrict w′ to the live set and renormalise:
By construction ∑k ŵk = 1 whenever at least one component is live, so two listings are always compared on a common, fully-weighted basis even when they expose different subsets of attributes.
8.3 Final score
Listings are then sorted by σ in descending order. The reported 0–100 score is exactly this quantity.
8.4 Low-signal variance spreading
When fewer than three components are live, the weighted score has too little variance and integer rounding can flatten distinct results to the same value. In that regime only, the raw semantic distance re-introduces separation, with a small additive nudge from any vibe signal:
This is a tie-breaker on sparse queries, not a re-weighting; it leaves the relative order of well-specified queries unchanged.
9. Diversity guardrail
Pure relevance sorting can fill the entire first page with one commune. A post-ranking guardrail enforces a minimum of m = 3 distinct communes within the top N = 10 results. If the top slice already spans m communes, it is untouched. Otherwise the algorithm scans the tail for the highest-scoring listings that introduce a new commune and promotes them, displacing the lowest-scoring entries of the top slice. Formally, the guardrail returns the smallest set of promotions π such that
It only reorders the top N; deeper results are preserved. The guardrail's activation is recorded so it can be surfaced to the user.
10. Explainability
Because the score is a transparent convex combination, every result carries a decomposition. For each listing the system exposes the live components and their values { (k, vk) : k ∈ 𝓛 } and the top-three contributing signals as human-readable reasons (“close to your commute target”, “within budget”, “matches ‘quiet, near tram’”). Components with no signal are omitted rather than shown as 0%, so an explanation never contradicts the score. Any user or agent may ask why a listing ranked where it did and receive the exact contributing terms.
11. Complexity
Candidate generation is sub-linear in the corpus via an approximate-nearest-neighbour index over the 768-d space. Scoring is O(|L| · |K|) = O(12|L|) = O(|L|) with a small constant: each component is O(1) except location and commute, which are O(|R|) and O(|P|) in the (tiny) number of requested localities and POIs. The diversity guardrail is O(N) over the top slice. End to end the re-ranking is linear in the candidate count, so latency is dominated by retrieval and the upstream ETA/measurement lookups, not by scoring.
12. What never influences ranking
The component set K is exhaustive — these twelve signals are the only inputs to σ. None of the following is, or will silently become, a term in the model:
- Payment, advertising spend, or subscription tier. There is no “promoted” multiplier. An agency cannot pay to rank higher.
- Commercial relationship with ChatHome. First-party and third-party listings are scored identically.
- Manual editorial overrides on individual listings.
Any future change to K or to the weight profiles is published in the public ranking changelog with a plain-language explanation, consistent with the ChatHome manifesto. The reference implementation is open source under AGPL-3.0, so these claims are verifiable line by line.
13. Limitations & future work
- Piecewise constants. Several components (freshness, market value, the legacy commute branch) use step functions for interpretability; smooth monotone curves would remove threshold artefacts at the boundaries.
- Static weight profiles. Segment weights are hand-tuned, not learned. A learning-to-rank layer constrained to remain monotone and decomposable could calibrate weights from anonymised engagement while preserving auditability.
- Benchmark coverage. The market-value signal depends on commune-level €/m² benchmarks; sparsely-transacted communes yield no signal and the component is dropped.
- Measurement dependence. Commute fit is only as good as the measured ETA matrix; localities outside the measured set fall back to coarser geography.
Appendix A — Weight reference table
| Component | Rent | Family (rent) | Student (rent) | Buy | Investor (buy) |
|---|---|---|---|---|---|
| Semantic | 0.18 | 0.11 | 0.14 | 0.14 | 0.09 |
| Location | 0.14 | 0.20 | 0.22 | 0.18 | 0.18 |
| Commute | 0.14 | 0.09 | 0.18 | 0.09 | 0.05 |
| Budget | 0.14 | 0.13 | 0.18 | 0.13 | 0.13 |
| Space | 0.09 | 0.16 | 0.02 | 0.11 | 0.05 |
| Vibe | 0.06 | 0.01 | 0.02 | 0.06 | 0.02 |
| Energy | 0.04 | 0.07 | 0.02 | 0.07 | 0.09 |
| Trust | 0.04 | 0.04 | 0.05 | 0.04 | 0.06 |
| Freshness | 0.02 | 0.01 | 0.01 | 0.02 | 0.04 |
| Market value | 0.02 | 0.02 | 0.02 | 0.02 | 0.18 |
| Lifestyle | 0.03 | 0.05 | 0.04 | 0.04 | 0.02 |
| Personalization | 0.10 | 0.11 | 0.10 | 0.10 | 0.09 |
| Total | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
Explicit-intent boosts of equation (14) are added to these base values before the signal-aware renormalisation of equation (16), so the effective weights used at scoring time vary per query and per listing.
Appendix B — Symbol glossary
| Symbol | Meaning |
|---|---|
| ℓi, L | A candidate listing; the candidate set after hard filtering. |
| K | The twelve scoring components. |
| vk, hk | Component score in [0,1]; its signal flag in {0,1}. |
| di | Cosine distance between query and listing embeddings (eq. 2). |
| w, w′, ŵ | Base, intent-boosted, and renormalised effective weights. |
| 𝓛 | Live set — components with a signal for a given listing. |
| σ | Final 0–100 relevance score (eq. 17). |
| κr, ρr | Haversine distance to locality centroid; locality radius. |
| ti,p, Bp | Measured ETA to POI p; commute time budget. |
| φ, γ | Budget floor and ceiling. |
| μc, δ | Commune €/m² benchmark; relative price delta (%). |
| N, m | Diversity window (10) and minimum distinct communes (3). |