refactor: rewrite used car article, remove LLM confidence chat draft
@@ -1,159 +0,0 @@
---
title: "2 Million Used Cars and What They Tell Us"
draft: true
date: 2026-02-12
tags: ["Scraping", "Data Engineering", "Grafana", "PostgreSQL"]
summary: "Scraping ~2M used car listings, throwing them into a database, and seeing what shakes out."
code: ""
demo: ""
---

## The Question

Everyone's heard the legend: a VW Passat that just keeps going at 400,000 km. But is it actually the only car that pulls that off? What other models quietly rack up absurd mileage — and what do they cost? In short: what makes a car *last*, and can you get one without overpaying?

Time to find out with data instead of hearsay.

## The Approach

A major used car platform turned out to be surprisingly cooperative when it came to structured data. Their recommendation engine helpfully links to similar listings — so starting from a search, you can just keep crawling through related results.

The haul: roughly **2 million listings** from early 2026, downloaded as JSON and
loaded into a PostgreSQL database. At that point, the recommendation graph
stopped surfacing new entries — a second pass would likely uncover more, since
new listings appear daily. But 2M felt like a solid starting point.

## The Data

**2,046,879 listings**, most of them containing the following fields (among others):

`make` · `model` · `model_variant` · `fuel` · `price` · `mileage` · `power_kw` ·
`transmission_type` · `number_of_cylinders` · `body_type` · `body_color` ·
`first_registration_date` · `number_of_previous_owners` · `is_roadworthy` ·
`is_currently_damaged` · `usage_state` · `type` · `zip_code` · `country_code` ·
`city`

That's enough to get interesting.

About 119,500 listings were missing either price or mileage — not entirely clear
why, but with nearly 2 million records left, it's barely a dent.

---

## Findings

### Price vs. Mileage

The obvious place to start: how does price relate to mileage?

<!-- dashboard on price vs mileage; scatter plot; clipped for 1,000,000 € and 1,000,000 km -->
{{< grafana url="https://gr.eliaskohout.de/public-dashboards/0852019305114cd189aedb67dea27721" height="450" >}}

Two hot spots jump out immediately. One in the low-mileage/high-price corner,
one in the high-mileage/low-price corner — exactly what you'd expect. Expensive
special cars that barely leave the garage, and daily drivers with six-figure
odometers priced to move.

The vast majority of listings, though, cluster in relatively low-price and
low-mileage territory compared to the extremes.

**A note on clipping.** The plot caps at €1,000,000 and 1,000,000 km because the
tails get absurd. The highest listed price was €999,999,999 — obviously not
real. Six listings exceeded €10 million, none of them serious. A handful
between €1M and €10M could be genuine exotics. On the mileage side, the maximum
was 100,000,000 km. The highest plausible reading I found was an Iveco truck at
897,000 km on the odometer. The roughly 570 listings beyond that appeared to be
typos or placeholder values for "mileage unknown."

Across the cleaned dataset, averages land at roughly **€28,400** for price and
**75,600 km** for mileage. The standard deviations are enormous — €731k and 109k
km respectively — which tells you just how wide the spread really is.

That gives an overall average ratio of about **€375 per 1,000 km**. In other
words: for each 1,000 km on the odometer, the average listing costs about €375.
This isn't a depreciation rate in the strict sense — we're looking at a
cross-sectional snapshot of listed prices, not tracking individual cars losing
value over time. But it turns out to be a useful back-of-the-napkin metric for
comparing brands.

### By Brand

The dataset contains **346 distinct makes**. Of those, 72 (~21%) have more than
500 listings — enough for halfway meaningful statistics. The rest are too sparse
to generalize from, so brand-level analysis focuses on these 72.

{{/* dashboard on price per mileage; table with make and euro/km; ordered by price per mileage */}}
{{< grafana url="https://gr.eliaskohout.de/public-dashboards/1777bb018e9b47639b93ef31d97f9c89" height="450" >}}

The ranking roughly mirrors the common perception of luxury brands — makes with
a reputation for being expensive also tend to show up with high price-per-km
values. No surprises there.

But this metric has a blind spot: **age**. Brands with almost no older cars on
the market look disproportionately expensive per km, simply because their
listings haven't had time to depreciate. BYD, for example, ranks just below
Ferrari and Rolls-Royce — not because a BYD is a luxury vehicle, but because the
average BYD listing is only **0.7 years** old, compared to the overall average
of **6.7 years**. Leapmotor is even more extreme at **0.5 years**. Give these
brands a few years to accumulate used inventory at higher mileages, and their
ratios will settle down considerably.

### Depreciation Curves

You'd expect the price-per-km ratio to fall as mileage increases — older,
high-mileage cars are cheaper, and the new-car premium fades fast. You'd also
expect the decline to be roughly exponential: a car loses a percentage of its
current value per additional kilometer, not a fixed euro amount.

Both hold up in the data:

{{/* dashboard on price vs mileage and price/km vs mileage */}}
{{< grafana url="https://gr.eliaskohout.de/public-dashboards/e999ce3c237b4cae95b3331331a26261" height="400" >}}

The curves above show average prices over mileage for BMW, Volkswagen, and Fiat.
The brand-level differences are immediately visible — different starting points,
different values throughout the decay — but the overall shape is the same: steep
early depreciation that gradually flattens out.


* Only toward the end do the boundaries blur; 200k km can be seen as a rough threshold beyond which the data gets somewhat more chaotic
* That could also be because there are simply fewer data points here, which makes the averages less reliable

* So let's take a closer look at the distributions at this end of the scale


### The Survivors: Cars Beyond 250k km

{{/*
==============================================
PLAN: Analysis chapters to write
==============================================

1. Price vs. Mileage Relationship
- Scatter/heatmap of price vs. mileage across all listings
- Depreciation curves: how fast do different makes lose value?
- The "sweet spot": best mileage-to-price ratio by model

2. The Survivors: Cars Beyond 300k km
- Which makes/models appear most often at extreme mileage?
- Fuel type breakdown (diesel vs. petrol at high mileage)
- Average price of high-mileage cars — are they dirt cheap or still holding value?

+ estimate the failure rate

4. Fuel & Drivetrain Trends
- Fuel type distribution (diesel/petrol/electric/hybrid/LPG)
- Price and mileage by fuel type
- Are EVs showing up in used markets yet? At what price?

5. Geography
- Listings by country and region (zip code clusters)
- Regional price differences for the same model
- Where are the cheap cars?

==============================================
*/}}
@@ -0,0 +1,191 @@
---
title: "2 Million Used Cars and What They Tell Us"
date: 2026-02-18
tags: ["Scraping", "Data Engineering", "Grafana", "PostgreSQL"]
summary: "Scraping ~2M used car listings, throwing them into a database, and seeing what shakes out."
code: ""
demo: ""
---

## The Question

Everyone's heard the legend: a VW Passat that just keeps going at 400,000 km.
But is it actually the only car that pulls that off? What other models quietly
rack up absurd mileage — and what do they cost? In short: what makes a car
*last*, and can you get one without overpaying?

Time to find out with data instead of hearsay.

## The Approach

A major used car platform turned out to be surprisingly cooperative when it came
to structured data. Their recommendation engine helpfully links to similar
listings — so starting from a search, you can just keep crawling through related
results.
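
The crawl described above is essentially a breadth-first walk over the "similar listings" graph. Here is a minimal sketch of that idea, not the scraper actually used: `crawl_related` and `fetch_related` are hypothetical names, and the toy dictionary stands in for whatever HTTP calls the real platform would need.

```python
from collections import deque
from typing import Callable, Iterable, List, Optional, Set


def crawl_related(
    seeds: Iterable[str],
    fetch_related: Callable[[str], List[str]],
    max_listings: Optional[int] = None,
) -> Set[str]:
    """Breadth-first walk over 'similar listing' links, visiting each ID once."""
    seen: Set[str] = set(seeds)
    queue = deque(seen)
    while queue:
        listing_id = queue.popleft()
        for related_id in fetch_related(listing_id):
            if related_id in seen:
                continue  # already discovered via another listing
            if max_listings is not None and len(seen) >= max_listings:
                return seen  # optional cap reached, stop early
            seen.add(related_id)
            queue.append(related_id)
    return seen  # queue ran dry: the graph stopped surfacing new entries


# Toy stand-in for the platform (hypothetical data): a hard-coded link graph.
toy_graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["d"], "d": [], "e": ["a"]}
found = crawl_related(["a"], lambda listing_id: toy_graph.get(listing_id, []))
print(sorted(found))  # ['a', 'b', 'c', 'd']
```

Listing `e` is never found because nothing links to it, which is the same limitation the real crawl has: anything the recommendation graph does not point at stays invisible.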

The haul: roughly **2 million listings** from early 2026, downloaded as JSON and
loaded into a PostgreSQL database. Each listing carries make, model, fuel type,
price, mileage, registration date, location, and a handful of other fields —
enough to get interesting.

---

## Findings

### Price vs. Mileage

The obvious place to start: how does price relate to mileage?

{{< grafana url="https://gr.eliaskohout.de/public-dashboards/0852019305114cd189aedb67dea27721" height="450" >}}

Two clusters jump out: low-mileage/high-price (garage queens) and
high-mileage/low-price (daily drivers priced to move). The vast majority sits in
modest territory on both axes. The plot caps at €1M and 1M km — beyond that, the
data is mostly typos and placeholder values.

Across the cleaned dataset, averages land at roughly **€28,400** and **75,600
km**. That gives an overall ratio of about **€375 per 1,000 km** — not a
depreciation rate in the strict sense, but a useful back-of-the-napkin metric
for comparing brands.
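
For reference, the ratio above is a ratio of averages (average price divided by average mileage), not an average of per-car ratios. A rough sketch of that calculation follows. It assumes the "cleaned" dataset uses the same €1M / 1M km caps as the plot, which the text does not confirm, and `euro_per_1000_km` is a hypothetical helper name.

```python
from statistics import mean

# Assumption: the cleaned dataset applies the same caps as the plot.
MAX_PRICE_EUR = 1_000_000
MAX_MILEAGE_KM = 1_000_000


def euro_per_1000_km(listings):
    """Ratio of average price to average mileage, scaled to 1,000 km.

    `listings` is an iterable of (price_eur, mileage_km); either may be None.
    """
    kept = [
        (price, km)
        for price, km in listings
        if price is not None
        and km is not None
        and 0 < price <= MAX_PRICE_EUR
        and 0 <= km <= MAX_MILEAGE_KM
    ]
    avg_price = mean(price for price, _ in kept)
    avg_km = mean(km for _, km in kept)
    return avg_price / avg_km * 1000


# Made-up sample: two plausible cars, one placeholder price, one missing mileage.
sample = [(20_000, 50_000), (10_000, 150_000), (999_999_999, 10), (5_000, None)]
print(euro_per_1000_km(sample))

# The same ratio from the rounded averages quoted above.
print(round(28_400 / 75_600 * 1000, 1))
```

With the rounded averages from the text the result comes out just under €376 per 1,000 km, in line with the "about €375" figure.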

### Price per Kilometer by Brand

Of the **346 distinct makes**, 72 have more than 500 listings — enough for
meaningful statistics.
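
The cut-off is a plain count filter. A small sketch of it, with a hypothetical helper name and toy data (the real analysis presumably ran as a query against PostgreSQL):

```python
from collections import Counter


def makes_with_enough_listings(makes, min_listings=500):
    """Keep only makes that appear more than `min_listings` times."""
    counts = Counter(makes)
    return {make: n for make, n in counts.items() if n > min_listings}


# Toy data with a lowered threshold so the effect is visible.
toy_makes = ["Volvo"] * 3 + ["Saab"] * 1
print(makes_with_enough_listings(toy_makes, min_listings=2))  # {'Volvo': 3}

# Share of makes that pass the filter, using the counts from the text.
print(round(72 / 346 * 100, 1))  # 20.8
```

Roughly one make in five clears the bar, so the brand-level charts describe the bulk of the listings but only a minority of the makes.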

{{< grafana url="https://gr.eliaskohout.de/public-dashboards/1777bb018e9b47639b93ef31d97f9c89" height="450" >}}

The ranking mirrors common perception — luxury brands dominate the top. But it
has a blind spot: **age**. BYD ranks just below Ferrari and Rolls-Royce, not
because it's a luxury vehicle, but because the average BYD listing is only **0.7
years** old (overall average: **6.7 years**). Give these newcomers time to
accumulate used inventory and their ratios will settle.

### Depreciation

{{< grafana url="https://gr.eliaskohout.de/public-dashboards/e999ce3c237b4cae95b3331331a26261" height="400" >}}

Average prices over mileage for Mercedes-Benz, Toyota, Volkswagen, and Volvo all
follow the same shape: steep early depreciation that gradually flattens out.
Brand-level differences show in the starting points and decay rates, but the
roughly exponential form is shared by all four.

The first 50k km wipe out **32–39%** of value across all four brands — the new-car
premium evaporating. From 50k to 200k km, depreciation settles into a steadier
20–30% per bracket.
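
The per-bracket percentages can be reproduced by averaging prices in 50k-km bins and comparing each bin with the one before it. The sketch below assumes that reading of "per bracket"; the function names and the sample prices are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

BRACKET_KM = 50_000


def average_price_by_bracket(listings):
    """Average price per 50k-km bracket, keyed by the bracket's lower bound."""
    buckets = defaultdict(list)
    for price, km in listings:
        buckets[km // BRACKET_KM * BRACKET_KM].append(price)
    return {start: mean(prices) for start, prices in sorted(buckets.items())}


def drop_per_bracket(avg_by_bracket):
    """Percent of value lost relative to the previous populated bracket."""
    starts = sorted(avg_by_bracket)
    return {
        nxt: (1 - avg_by_bracket[nxt] / avg_by_bracket[prev]) * 100
        for prev, nxt in zip(starts, starts[1:])
    }


# Made-up (price_eur, mileage_km) pairs for a single brand.
toy = [(40_000, 10_000), (30_000, 60_000), (26_000, 70_000), (21_000, 120_000)]
averages = average_price_by_bracket(toy)
print(averages)
print(drop_per_bracket(averages))
```

A negative value in the result means the average price went up from one bracket to the next, which is what the Toyota tail shows past 250k km.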

{{< grafana url="https://gr.eliaskohout.de/public-dashboards/a801062ca49f49d0a1ef4f97ccb550c8" height="400" >}}

Grouping listings into 50k-km brackets also gives a rough "survival curve" — not
a true survival rate (it conflates production volume, usage patterns, and actual
longevity), but the *shape* tells a story. Volvo shows the gentlest decline: 39%
of listings remain at 100k km, still 6.6% at 250k. Toyota drops steeply early
but then shows a stubbornly flat tail — 51 listings at 400k, 33 at 450k. Once a
Toyota survives the initial culling, it apparently just keeps going.
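
One way to build such a curve is to compute, for each 50k-km threshold, the share of a brand's listings at or above that mileage. That is an assumption about the normalization; the dashboard may instead compare bracket counts against the first bracket. `survival_curve` is a hypothetical name and the mileages are toy values.

```python
def survival_curve(mileages_km, step_km=50_000):
    """Share of listings at or above each mileage threshold."""
    total = len(mileages_km)
    top = max(mileages_km)
    return {
        threshold: sum(km >= threshold for km in mileages_km) / total
        for threshold in range(0, top + 1, step_km)
    }


# Toy mileages for one brand.
toy_mileages = [10_000, 60_000, 120_000, 260_000]
for threshold, share in survival_curve(toy_mileages).items():
    print(f"{threshold:>7} km: {share:.0%}")
```

A flat stretch in the output, like the one between 150k and 250k km in the toy data, is the pattern described above for Toyota's tail: the listings that are still there stay there.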

{{< grafana url="https://gr.eliaskohout.de/public-dashboards/38c5b2b61b8542b4aedc91b947253991" height="400" >}}

Past 250k km, the numbers get chaotic — small sample sizes let a handful of
collector cars or overpriced outliers distort the averages. Toyota is the most
striking case: past 250k, depreciation goes consistently *negative* — prices
*increase* from €7,775 to €13,481 at 450k. The Land Cruiser effect: the Toyotas
that survive to extreme mileage are the models with cult followings, commanding
premiums *because* they've proven their durability. Past 250k, survivor bias
takes over — the remaining cars aren't representative anymore.

### The Survivors: Cars Beyond 250k km

Out of the ~2 million listings, roughly **55,000** have a mileage of 250,000 km
or more. That's about 3% of the dataset — a small slice, but more than enough to
work with.

{{< grafana url="https://gr.eliaskohout.de/public-dashboards/f9fcbcca31e64065a8e9970b88a8a999" height="400" >}}

#### Who Makes It This Far?

In absolute numbers, the usual suspects lead: Mercedes (8,900 listings),
Volkswagen (8,200), BMW (6,700), Audi (5,300). Together they account for over
half of all high-mileage entries. Renault's survivors average 362k km — the
highest on the list — and BMW and Porsche carry remarkably high average prices
past 250k (€24,600 and €27,300), hinting at maintained enthusiast cars rather
than beaters.

{{< grafana url="https://gr.eliaskohout.de/public-dashboards/6c4d0d116ba746f18c908ff5d4037da8" height="400" >}}

But raw counts are misleading. More telling is what *share* of a brand's
listings are high-mileage. That flips the picture: **Saab** leads at **32.6%** —
almost one in three. No new cars since 2012 means the surviving inventory is old
by definition, but it also speaks to owners who keep these cars running well
past their expected shelf life. Iveco follows at 21% (commercial vehicles, no
surprise). Then a cluster at 5–6%: **Volvo, Honda, Subaru, and Mercedes-Benz** —
the brands enthusiasts *claim* are built to last, and the data is at least
consistent.
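
The switch from raw counts to shares is a per-make normalization: high-mileage listings divided by all listings of that make. A minimal sketch, with a hypothetical function name and invented rows:

```python
from collections import Counter


def high_mileage_share_by_make(listings, threshold_km=250_000):
    """Percent of each make's listings at or above `threshold_km`."""
    total = Counter()
    high = Counter()
    for make, km in listings:
        total[make] += 1
        if km >= threshold_km:
            high[make] += 1
    return {make: high[make] / total[make] * 100 for make in total}


# Invented rows of (make, mileage_km).
toy_rows = [
    ("Saab", 300_000), ("Saab", 100_000), ("Saab", 260_000),
    ("Toyota", 50_000), ("Toyota", 90_000),
]
for make, share in high_mileage_share_by_make(toy_rows).items():
    print(f"{make}: {share:.1f}%")
```

Dividing by the make's own total is what lets a small brand like Saab outrank the high-volume makes that lead the absolute counts.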

Notably absent: **Toyota**. Only about 2% of Toyota listings cross 250k — below
the dataset average. That doesn't mean Toyotas die young; they may get exported
or simply leave this platform. The data can't distinguish "the car died" from
"the car left the market."

#### Diesel vs. Petrol

{{< grafana url="https://gr.eliaskohout.de/public-dashboards/b721ad75bf764693ba6243714899e0f0" height="400" >}}

Diesel dominates the high-mileage segment: **6.75%** of diesel listings cross
250k km, compared to just **1.58%** for petrol — a 4× overrepresentation. That's
consistent with the "diesel lasts longer" narrative, though not proof: diesel
cars also tend to be driven more (company cars, long-distance commuters).

The real outlier is **CNG at 10.52%** — higher than diesel, though from a small
sample of 5,500 listings (mostly fleet cars and taxis). Electric cars and
hybrids are virtually absent at 0.07% and 0.16% — simply too young to have
accumulated that kind of mileage.
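
The overrepresentation factor is just one share divided by another. Using the percentages quoted above, with petrol as the baseline (`relative_to` is a hypothetical helper):

```python
# Share of listings past 250k km per fuel type, as quoted in the text (percent).
high_mileage_share = {
    "diesel": 6.75,
    "petrol": 1.58,
    "cng": 10.52,
    "hybrid": 0.16,
    "electric": 0.07,
}


def relative_to(shares, baseline):
    """How many times more (or less) common each group is than the baseline."""
    return {fuel: share / shares[baseline] for fuel, share in shares.items()}


for fuel, factor in relative_to(high_mileage_share, "petrol").items():
    print(f"{fuel}: {factor:.2f}x petrol")
```

Diesel comes out at a bit over four times the petrol share and CNG at more than six times, while hybrids and electric cars land far below 1.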

#### What Do They Cost?

{{< grafana url="https://gr.eliaskohout.de/public-dashboards/543b7c49e5f742468f1d90b112cec799" height="400" >}}

**Porsche** stands alone: a median of **€21,000** even past 250k km — these are
maintained sports cars that hold their value regardless of mileage. **Land Rover**
(€11,000 median) and **Iveco** (€12,000) also hold surprising value — cult
status and commercial utility, respectively.

Most brands cluster in the €3,000–€8,000 range. At the bottom: **Opel** at
€2,650, **Peugeot** and **SEAT** at €3,000 — the genuine bargain-bin survivors.
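
Medians are the safer statistic in this segment because a few collector cars or placeholder prices can drag an average a long way. A sketch of the per-make median past 250k km, with a hypothetical function name and invented rows:

```python
from collections import defaultdict
from statistics import median


def median_price_by_make(listings, min_mileage_km=250_000):
    """Median price per make among listings at or above `min_mileage_km`."""
    prices = defaultdict(list)
    for make, price, km in listings:
        if km >= min_mileage_km:
            prices[make].append(price)
    ranked = sorted(
        ((make, median(values)) for make, values in prices.items()),
        key=lambda item: item[1],
        reverse=True,  # most expensive first
    )
    return dict(ranked)


# Invented rows of (make, price_eur, mileage_km).
toy_rows = [
    ("Porsche", 21_000, 260_000), ("Porsche", 30_000, 300_000),
    ("Porsche", 15_000, 280_000),
    ("Opel", 2_650, 255_000), ("Opel", 9_000, 90_000),
]
print(median_price_by_make(toy_rows))  # {'Porsche': 21000, 'Opel': 2650}
```

The €9,000 Opel is ignored because it sits below the mileage threshold, so the result only describes the survivors.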

---

## So What Should You Buy?

Two million listings, a few dozen queries, and a lot of scatter plots later —
what's the actual answer?

It depends on what you're optimizing for.

**If you want the car that refuses to die:** look at diesel Volvo, Mercedes, or
Subaru in the 150–200k km range. These brands show up disproportionately at
extreme mileage, and their survival curves decline more gently than most. A
diesel Mercedes with 180k km on the clock isn't halfway through its life — it
might be a third of the way.

**If you want the best deal per remaining kilometer:** the sweet spot sits around
100–150k km for most brands. The steepest depreciation is already behind you
(that 32–39% first-bracket hit), but the car still has plenty of mechanical life
left. Volkswagen and Opel are the value picks here — median prices in the low
single-digit thousands, well-understood mechanicals, and parts that are cheap
and everywhere.

**If you want something that holds its value no matter what:** Toyota and
Porsche. Toyota's prices actually *increase* past 250k km — the Land Cruiser
effect — and Porsche commands a €21k median even at quarter-million mileage.
You're not buying depreciation; you're buying into a secondary market where
demand outstrips supply.

**If you just want cheap and functional:** Opel, SEAT, or Peugeot past 200k km.
Median prices past 250k km sit between €2,650 and €3,000. Nobody will envy your
car. Nobody will steal it either. And it'll probably keep running.

One thing the data can't tell you: the condition of any *specific* car. Averages
are averages. A well-maintained Fiat at 300k km can easily outlast a neglected
BMW at 100k. The numbers here describe the market, not the machine in front
of you. For that, you still need a mechanic — preferably one who doesn't own a
dealership.
@@ -1,29 +0,0 @@
---
title: "LLM Confidence Chat Interface"
draft: true
date: 2024-06-01
tags: ["Python", "ML", "Full-Stack"]
summary: "A chat interface making LLM confidence scores accessible to non-technical users."
code: "https://github.com/yourusername/project"
demo: "https://demo.example.com"
---

## Problem

Non-technical users have no way to assess how confident an LLM is in its responses.

## Approach

Built an interactive chat interface integrating lm-polygraph for
uncertainty estimation with an intuitive visualization layer.

## Tech Stack

- Python, FastAPI
- lm-polygraph
- React

## Challenges & Learnings

Translating probabilistic outputs into user-friendly visual indicators
while keeping the interface responsive.