feat: add 'coming soon' empty state for projects page & gitignore public dir
All checks were successful
Build and Push Docker Container / build-and-push (push) Successful in 48s
- Add animated coming-soon card when no projects exist
- Add Grafana shortcode and gebrauchtwagen-datenbank project
- Add hugo/eliaskohout.de/public/ to .gitignore and remove from tracking
159
hugo/eliaskohout.de/content/projects/gebrauchtwagen-datenbank.md
Normal file
@@ -0,0 +1,159 @@
---
title: "2 Million Used Cars and What They Tell Us"
draft: true
date: 2026-02-12
tags: ["Scraping", "Data Engineering", "Grafana", "PostgreSQL"]
summary: "Scraping ~2M used car listings, throwing them into a database, and seeing what shakes out."
code: ""
demo: ""
---

## The Question

Everyone's heard the legend: a VW Passat that just keeps going at 400,000 km. But is it actually the only car that pulls that off? What other models quietly rack up absurd mileage, and what do they cost? In short: what makes a car *last*, and can you get one without overpaying?

Time to find out with data instead of hearsay.

## The Approach

A major used car platform turned out to be surprisingly cooperative when it came to structured data. Their recommendation engine helpfully links to similar listings, so starting from a search, you can just keep crawling through related results.

The haul: roughly **2 million listings** from early 2026, downloaded as JSON and
loaded into a PostgreSQL database. At that point, the recommendation graph
stopped surfacing new entries. A second pass would likely uncover more, since
new listings appear daily. But 2M felt like a solid starting point.

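Crawling through related results until nothing new turns up amounts to a breadth-first traversal of the recommendation graph. The following is a minimal sketch of that idea, not the scraper actually used: `fetch_related` is a hypothetical callable that returns the IDs recommended from one listing, and the toy graph stands in for the real platform.

```python
from collections import deque
from typing import Callable, Iterable


def crawl(seeds: Iterable[str], fetch_related: Callable[[str], Iterable[str]]) -> set[str]:
    """Breadth-first traversal: collect every listing reachable from the seeds."""
    seen = set(seeds)
    queue = deque(seen)
    while queue:
        listing_id = queue.popleft()
        for related_id in fetch_related(listing_id):
            if related_id not in seen:  # enqueue only listings not discovered yet
                seen.add(related_id)
                queue.append(related_id)
    return seen  # the queue is empty: the graph surfaces no new entries


# Toy recommendation graph (hypothetical data, not from the platform).
toy_graph = {"a": ["b", "c"], "b": ["a", "d"], "c": [], "d": ["c"], "e": ["a"]}
reachable = crawl(["a"], lambda listing_id: toy_graph.get(listing_id, []))
```

Listing `e` is never reached because no other listing recommends it, which is the same reason a second pass from fresh search results would likely uncover more.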
## The Data

**2,046,879 listings**, most of them containing the following fields (among others):

`make` · `model` · `model_variant` · `fuel` · `price` · `mileage` · `power_kw` ·
`transmission_type` · `number_of_cylinders` · `body_type` · `body_color` ·
`first_registration_date` · `number_of_previous_owners` · `is_roadworthy` ·
`is_currently_damaged` · `usage_state` · `type` · `zip_code` · `country_code` ·
`city`

That's enough to get interesting.

About 119,500 listings were missing either price or mileage. It's not entirely clear
why, but with nearly 2 million records left, it's barely a dent.

---

## Findings

### Price vs. Mileage

The obvious place to start: how does price relate to mileage?

<!-- dashboard on price vs mileage; scatter plot; clipped for 1,000,000 € and 1,000,000 km -->
{{< grafana url="https://gr.eliaskohout.de/public-dashboards/0852019305114cd189aedb67dea27721" height="450" >}}

Two hot spots jump out immediately. One in the low-mileage/high-price corner,
one in the high-mileage/low-price corner, exactly what you'd expect. Expensive
special cars that barely leave the garage, and daily drivers with six-figure
odometers priced to move.

The vast majority of listings, though, cluster in relatively low-price and
low-mileage territory compared to the extremes.

**A note on clipping.** The plot caps at €1,000,000 and 1,000,000 km because the
tails get absurd. The highest listed price was €999,999,999, obviously not
real. Six listings exceeded €10 million, none of them serious. A handful
between €1M and €10M could be genuine exotics. On the mileage side, the maximum
was 100,000,000 km. The highest plausible reading I found was an Iveco truck at
897,000 km on the odometer. The roughly 570 listings beyond that appeared to be
typos or placeholder values for "mileage unknown."

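The clipping rule can be expressed as a small filter. This is only a sketch under assumptions: the two caps are the plot limits quoted above, the function name `is_plottable` and the sample rows are made up for illustration, and the real filtering presumably happens in the dashboard query rather than in Python.

```python
from typing import Optional

MAX_PRICE_EUR = 1_000_000   # plot cap quoted in the text
MAX_MILEAGE_KM = 1_000_000  # plot cap quoted in the text


def is_plottable(price: Optional[float], mileage: Optional[float]) -> bool:
    """Keep a listing only if both values are present and within the caps."""
    if price is None or mileage is None:
        return False  # missing price or mileage
    return price <= MAX_PRICE_EUR and mileage <= MAX_MILEAGE_KM


# Hypothetical sample rows: (price in EUR, mileage in km).
sample = [
    (15_900, 120_000),      # ordinary listing
    (999_999_999, 50_000),  # placeholder price
    (8_500, 100_000_000),   # placeholder mileage
    (None, 80_000),         # missing price
    (12_000, 897_000),      # high but plausible mileage
]
kept = [row for row in sample if is_plottable(*row)]
```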
Across the cleaned dataset, averages land at roughly **€28,400** for price and
**75,600 km** for mileage. The standard deviations are enormous (€731k and 109k
km respectively), which tells you just how wide the spread really is.

Dividing the average price by the average mileage gives an overall ratio of
about **€375 per 1,000 km**. In other words: for each 1,000 km on the odometer,
the average listing costs about €375. This isn't a depreciation rate in the
strict sense. We're looking at a cross-sectional snapshot of listed prices, not
tracking individual cars losing value over time. But it turns out to be a useful
back-of-the-napkin metric for comparing brands.

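The arithmetic behind that ratio is easy to reproduce from the two rounded averages. The sketch below also contrasts it with the other obvious definition, averaging each listing's own price/mileage ratio. The function names and the two-listing sample are hypothetical and only serve to show that the two definitions can differ widely.

```python
from statistics import mean


def ratio_of_means(prices_eur, mileages_km):
    """Average price divided by average mileage, in EUR per 1,000 km."""
    return mean(prices_eur) / mean(mileages_km) * 1000


def mean_of_ratios(prices_eur, mileages_km):
    """Average of each listing's own price/mileage ratio, in EUR per 1,000 km."""
    return mean(p / m for p, m in zip(prices_eur, mileages_km)) * 1000


# Rounded averages quoted in the text.
overall = 28_400 / 75_600 * 1000

# Hypothetical sample: one nearly new car and one high-mileage car.
prices = [30_000, 10_000]
mileages = [10_000, 200_000]
by_means = ratio_of_means(prices, mileages)
by_listing = mean_of_ratios(prices, mileages)
```

In the toy sample the per-listing average is dominated by the low-mileage car, and a per-listing ratio is undefined for a listing with 0 km, so the ratio of averages is the more forgiving choice for a rough comparison.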
### By Brand

The dataset contains **346 distinct makes**. Of those, 72 (~21%) have more than
500 listings, enough for halfway meaningful statistics. The rest are too sparse
to generalize from, so brand-level analysis focuses on these 72.

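Selecting the makes with enough listings is a count-and-filter step. A minimal sketch, assuming the makes are available as a plain sequence of strings; in practice this would more likely be a `GROUP BY` in the database. The helper name and the toy data are hypothetical, and the toy call lowers the threshold so the example stays small.

```python
from collections import Counter
from typing import Iterable

MIN_LISTINGS = 500  # threshold from the text


def makes_with_enough_data(makes: Iterable[str], min_listings: int = MIN_LISTINGS) -> list[str]:
    """Return the makes with strictly more than `min_listings` listings, sorted by name."""
    counts = Counter(makes)
    return sorted(make for make, n in counts.items() if n > min_listings)


# Hypothetical toy data: one make value per listing.
toy_makes = ["VW"] * 5 + ["BMW"] * 3 + ["Iveco"] * 1
selected = makes_with_enough_data(toy_makes, min_listings=2)
```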
{{/* dashboard on price per mileage; table with make and euro/km; ordered by price per mileage */}}
{{< grafana url="https://gr.eliaskohout.de/public-dashboards/1777bb018e9b47639b93ef31d97f9c89" height="450" >}}

The ranking roughly mirrors the common perception of luxury brands: makes with
a reputation for being expensive also tend to show up with high price-per-km
values. No surprises there.

But this metric has a blind spot: **age**. Brands with almost no older cars on
the market look disproportionately expensive per km, simply because their
listings haven't had time to depreciate. BYD, for example, ranks just below
Ferrari and Rolls-Royce. That is not because a BYD is a luxury vehicle, but
because the average BYD listing is only **0.7 years** old, compared to the
overall average of **6.7 years**. Leapmotor is even more extreme at
**0.5 years**. Give these brands a few years to accumulate used inventory at
higher mileages, and their ratios will settle down considerably.

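The average listing age per make can be derived from `first_registration_date`. The sketch below assumes a snapshot date of 1 February 2026 (the text only says "early 2026") and uses made-up rows, so the resulting numbers are illustrative and not taken from the dataset.

```python
from collections import defaultdict
from datetime import date

SNAPSHOT = date(2026, 2, 1)  # assumed reference date, not stated in the text


def mean_age_by_make(rows):
    """rows: iterable of (make, first_registration_date) -> {make: mean age in years}."""
    ages = defaultdict(list)
    for make, registered in rows:
        ages[make].append((SNAPSHOT - registered).days / 365.25)
    return {make: sum(values) / len(values) for make, values in ages.items()}


# Hypothetical rows, chosen only to show a young brand next to an older listing.
toy_rows = [
    ("BYD", date(2025, 8, 1)),
    ("BYD", date(2025, 2, 1)),
    ("Volkswagen", date(2016, 2, 1)),
]
mean_ages = mean_age_by_make(toy_rows)
```

Comparing a make's mean age with the overall mean is a quick way to flag brands whose price-per-km ratio is inflated by young inventory.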
### Depreciation Curves

You'd expect the price-per-km ratio to fall as mileage increases: older,
high-mileage cars are cheaper, and the new-car premium fades fast. You'd also
expect the decline to be roughly exponential: a car loses a percentage of its
current value per additional kilometer, not a fixed euro amount.

Both hold up in the data:

{{/* dashboard on price vs mileage and price/km vs mileage */}}
{{< grafana url="https://gr.eliaskohout.de/public-dashboards/e999ce3c237b4cae95b3331331a26261" height="400" >}}

The curves above show average prices over mileage for BMW, Volkswagen, and Fiat.
The brand-level differences are immediately visible (different starting points,
different values throughout the decay), but the overall shape is the same: steep
early depreciation that gradually flattens out.

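An exponential decline means price(m) = p0 · exp(−k · m), which becomes a straight line once you take the logarithm of the price. One way to estimate p0 and k is therefore an ordinary least-squares fit on ln(price). The sketch below is an illustration on a synthetic curve, not the method behind the dashboards; the function name and all numbers are made up.

```python
import math
from typing import Sequence


def fit_exponential_decay(mileage_km: Sequence[float], price_eur: Sequence[float]) -> tuple[float, float]:
    """Least-squares fit of ln(price) = ln(p0) - k * mileage; returns (p0, k)."""
    n = len(mileage_km)
    log_price = [math.log(p) for p in price_eur]
    mean_x = sum(mileage_km) / n
    mean_y = sum(log_price) / n
    sxx = sum((x - mean_x) ** 2 for x in mileage_km)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(mileage_km, log_price))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic curve (not from the dataset): 30,000 EUR at 0 km, losing about 1% per 1,000 km.
true_p0, true_k = 30_000.0, 1e-5
xs = [0, 50_000, 100_000, 150_000, 200_000]
ys = [true_p0 * math.exp(-true_k * x) for x in xs]
p0, k = fit_exponential_decay(xs, ys)
half_value_km = math.log(2) / k  # mileage at which the fitted price has halved
```

The fitted `k` makes brands comparable on a single number: a larger `k` means a shorter half-value mileage, independent of the starting price.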
* Only toward the high-mileage end do the boundaries blur; 200k km can be taken as a rough threshold beyond which the data gets somewhat more chaotic.
* This could also simply be because there are fewer data points here, which makes the averages less reliable.

* So let's take a closer look at the distributions at this end of the scale.

### The Survivors: Cars Beyond 250k km

{{/*
==============================================
PLAN: Analysis chapters to write
==============================================

1. Price vs. Mileage Relationship
- Scatter/heatmap of price vs. mileage across all listings
- Depreciation curves: how fast do different makes lose value?
- The "sweet spot": best mileage-to-price ratio by model

2. The Survivors: Cars Beyond 300k km
- Which makes/models appear most often at extreme mileage?
- Fuel type breakdown (diesel vs. petrol at high mileage)
- Average price of high-mileage cars: are they dirt cheap or still holding value?

+ estimate failure rate

4. Fuel & Drivetrain Trends
- Fuel type distribution (diesel/petrol/electric/hybrid/LPG)
- Price and mileage by fuel type
- Are EVs showing up in used markets yet? At what price?

5. Geography
- Listings by country and region (zip code clusters)
- Regional price differences for the same model
- Where are the cheap cars?

==============================================
*/}}