diff --git a/hugo/eliaskohout.de/content/projects/gebrauchtwagen-datenbank.md b/hugo/eliaskohout.de/content/projects/gebrauchtwagen-datenbank.md deleted file mode 100644 index 85e8d20..0000000 --- a/hugo/eliaskohout.de/content/projects/gebrauchtwagen-datenbank.md +++ /dev/null @@ -1,159 +0,0 @@ ---- -title: "2 Million Used Cars and What They Tell Us" -draft: true -date: 2026-02-12 -tags: ["Scraping", "Data Engineering", "Grafana", "PostgreSQL"] -summary: "Scraping ~2M used car listings, throwing them into a database, and seeing what shakes out." -code: "" -demo: "" ---- - -## The Question - -Everyone's heard the legend: a VW Passat that just keeps going at 400,000 km. But is it actually the only car that pulls that off? What other models quietly rack up absurd mileage — and what do they cost? In short: what makes a car *last*, and can you get one without overpaying? - -Time to find out with data instead of hearsay. - -## The Approach - -A major used car platform turned out to be surprisingly cooperative when it came to structured data. Their recommendation engine helpfully links to similar listings — so starting from a search, you can just keep crawling through related results. - -The haul: roughly **2 million listings** from early 2026, downloaded as JSON and -loaded into a PostgreSQL database. At that point, the recommendation graph -stopped surfacing new entries — a second pass would likely uncover more, since -new listings appear daily. But 2M felt like a solid starting point. - -## The Data - -**2,046,879 listings**, most of them containing the following fields (among others): - -`make` · `model` · `model_variant` · `fuel` · `price` · `mileage` · `power_kw` · -`transmission_type` · `number_of_cylinders` · `body_type` · `body_color` · -`first_registration_date` · `number_of_previous_owners` · `is_roadworthy` · -`is_currently_damaged` · `usage_state` · `type` · `zip_code` · `country_code` · -`city` - -That's enough to get interesting. 
- -About 119,500 listings were missing either price or mileage — not entirely clear -why, but with nearly 2 million records left, it's barely a dent. - ---- - -## Findings - -### Price vs. Mileage - -The obvious place to start: how does price relate to mileage? - - -{{< grafana url="https://gr.eliaskohout.de/public-dashboards/0852019305114cd189aedb67dea27721" height="450" >}} - -Two hot spots jump out immediately. One in the low-mileage/high-price corner, -one in the high-mileage/low-price corner — exactly what you'd expect. Expensive -special cars that barely leave the garage, and daily drivers with six-figure -odometers priced to move. - -The vast majority of listings, though, cluster in relatively low-price and -low-mileage territory compared to the extremes. - -**A note on clipping.** The plot caps at €1,000,000 and 1,000,000 km because the -tails get absurd. The highest listed price was €999,999,999 — obviously not -real. Six listings exceeded €10 million, none of them serious. A handful -between €1M and €10M could be genuine exotics. On the mileage side, the maximum -was 100,000,000 km. The highest plausible reading I found was an Iveco truck at -897,000 km on the odometer. The roughly 570 listings beyond that appeared to be -typos or placeholder values for "mileage unknown." - -Across the cleaned dataset, averages land at roughly **€28,400** for price and -**75,600 km** for mileage. The standard deviations are enormous — €731k and 109k -km respectively — which tells you just how wide the spread really is. - -That gives an overall average ratio of about **€375 per 1,000 km**. In other -words: for each 1,000 km on the odometer, the average listing costs about €375. -This isn't a depreciation rate in the strict sense — we're looking at a -cross-sectional snapshot of listed prices, not tracking individual cars losing -value over time. But it turns out to be a useful back-of-the-napkin metric for -comparing brands.
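The cleaning step and the ratio described above can be sketched in a few lines. This is a minimal illustration, not the author's actual pipeline. It assumes listings are available as (price, mileage) pairs and that the plot's clip limits (€1M, 1M km) double as cleaning limits. The sample values are made up.

```python
from statistics import mean

# Clip limits from the plot; assumed here to also serve as cleaning limits.
MAX_PRICE_EUR = 1_000_000
MAX_MILEAGE_KM = 1_000_000

# Made-up (price_eur, mileage_km) pairs standing in for the real listings.
listings = [
    (15_000, 60_000),
    (42_000, 20_000),
    (3_000, 280_000),
    (999_999_999, 50_000),  # placeholder price: dropped
    (8_000, 100_000_000),   # placeholder mileage: dropped
    (None, 90_000),         # missing price: dropped
]


def clean(rows):
    """Keep rows that have both values and stay within the clip limits."""
    return [
        (price, km)
        for price, km in rows
        if price is not None
        and km is not None
        and 0 < price <= MAX_PRICE_EUR
        and 0 <= km <= MAX_MILEAGE_KM
    ]


def euro_per_1000_km(rows):
    """Ratio of average price to average mileage, expressed per 1,000 km."""
    avg_price = mean(price for price, _ in rows)
    avg_km = mean(km for _, km in rows)
    return avg_price / (avg_km / 1_000)


cleaned = clean(listings)
print(len(cleaned), round(euro_per_1000_km(cleaned), 2))
```

The text suggests the headline figure is a ratio of the two averages: 28,400 / 75.6 ≈ 375.7, which matches "about €375 per 1,000 km". Dividing the averages also avoids a different problem. If you averaged per-listing ratios instead, near-new cars with almost no kilometers would produce huge individual ratios and skew the result.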
- -### By Brand - -The dataset contains **346 distinct makes**. Of those, 72 (~21%) have more than -500 listings — enough for reasonably meaningful statistics. The rest are too sparse -to generalize from, so brand-level analysis focuses on these 72. - -{{/* dashboard on price per mileage; table with make and euro/km; ordered by price per mileage */}} -{{< grafana url="https://gr.eliaskohout.de/public-dashboards/1777bb018e9b47639b93ef31d97f9c89" height="450" >}} - -The ranking roughly mirrors the common perception of luxury brands — makes with -a reputation for being expensive also tend to show up with high price-per-km -values. No surprises there. - -But this metric has a blind spot: **age**. Brands with almost no older cars on -the market look disproportionately expensive per km, simply because their -listings haven't had time to depreciate. BYD, for example, ranks just below -Ferrari and Rolls-Royce — not because a BYD is a luxury vehicle, but because the -average BYD listing is only **0.7 years** old, compared to the overall average -of **6.7 years**. Leapmotor is even more extreme at **0.5 years**. Give these -brands a few years to accumulate used inventory at higher mileages, and their -ratios will settle down considerably. - -### Depreciation Curves - -You'd expect the price-per-km ratio to fall as mileage increases — older, -high-mileage cars are cheaper, and the new-car premium fades fast. You'd also -expect the decline to be roughly exponential: a car loses a percentage of its -current value per additional kilometer, not a fixed euro amount. - -Both hold up in the data: - -{{/* dashboard on price vs mileage and price/km vs mileage */}} -{{< grafana url="https://gr.eliaskohout.de/public-dashboards/e999ce3c237b4cae95b3331331a26261" height="400" >}} - -The curves above show average prices over mileage for BMW, Volkswagen, and Fiat.
-The brand-level differences are immediately visible — different starting points, -different values throughout the decay — but the overall shape is the same: steep -early depreciation that gradually flattens out. - - -* only toward the end do the boundaries blur; 200k km can be treated as a rough cutoff beyond which the data gets somewhat more chaotic -* this could also be because there are simply fewer data points here, which makes the averages less reliable - -* so let's take a closer look at the distributions at this end of the scale - - -### The Survivors: Cars Beyond 250k km - - - - -{{/* - ============================================== - PLAN: Analysis chapters to write - ============================================== - - 1. Price vs. Mileage Relationship - - Scatter/heatmap of price vs. mileage across all listings - - Depreciation curves: how fast do different makes lose value? - - The "sweet spot": best mileage-to-price ratio by model - - 2. The Survivors: Cars Beyond 300k km - - Which makes/models appear most often at extreme mileage? - - Fuel type breakdown (diesel vs. petrol at high mileage) - - Average price of high-mileage cars — are they dirt cheap or still holding value? - - + estimate the failure rate - - 4. Fuel & Drivetrain Trends - - Fuel type distribution (diesel/petrol/electric/hybrid/LPG) - - Price and mileage by fuel type - - Are EVs showing up in used markets yet? At what price? - - 5. Geography - - Listings by country and region (zip code clusters) - - Regional price differences for the same model - - Where are the cheap cars?
- - ============================================== -*/}} - diff --git a/hugo/eliaskohout.de/content/projects/gebrauchtwagen-datengrube.md b/hugo/eliaskohout.de/content/projects/gebrauchtwagen-datengrube.md new file mode 100644 index 0000000..d5036c6 --- /dev/null +++ b/hugo/eliaskohout.de/content/projects/gebrauchtwagen-datengrube.md @@ -0,0 +1,191 @@ +--- +title: "2 Million Used Cars and What They Tell Us" +date: 2026-02-18 +tags: ["Scraping", "Data Engineering", "Grafana", "PostgreSQL"] +summary: "Scraping ~2M used car listings, throwing them into a database, and seeing what shakes out." +code: "" +demo: "" +--- + +## The Question + +Everyone's heard the legend: a VW Passat that just keeps going at 400,000 km. +But is it actually the only car that pulls that off? What other models quietly +rack up absurd mileage — and what do they cost? In short: what makes a car +*last*, and can you get one without overpaying? + +Time to find out with data instead of hearsay. + +## The Approach + +A major used car platform turned out to be surprisingly cooperative when it came +to structured data. Their recommendation engine helpfully links to similar +listings — so starting from a search, you can just keep crawling through related +results. + +The haul: roughly **2 million listings** from early 2026, downloaded as JSON and +loaded into a PostgreSQL database. Each listing carries make, model, fuel type, +price, mileage, registration date, location, and a handful of other fields — +enough to get interesting. + +--- + +## Findings + +### Price vs. Mileage + +The obvious place to start: how does price relate to mileage? + +{{< grafana url="https://gr.eliaskohout.de/public-dashboards/0852019305114cd189aedb67dea27721" height="450" >}} + +Two clusters jump out: low-mileage/high-price (garage queens) and +high-mileage/low-price (daily drivers priced to move). The vast majority sits in +modest territory on both axes. 
The plot caps at €1M and 1M km — beyond that, the +data is mostly typos and placeholder values. + +Across the cleaned dataset, averages land at roughly **€28,400** and **75,600 +km**. That gives an overall ratio of about **€375 per 1,000 km** — not a +depreciation rate in the strict sense, but a useful back-of-the-napkin metric +for comparing brands. + +### Price per Kilometer by Brand + +Of the **346 distinct makes**, 72 have more than 500 listings — enough for +meaningful statistics. + +{{< grafana url="https://gr.eliaskohout.de/public-dashboards/1777bb018e9b47639b93ef31d97f9c89" height="450" >}} + +The ranking mirrors common perception — luxury brands dominate the top. But it +has a blind spot: **age**. BYD ranks just below Ferrari and Rolls-Royce, not +because it's a luxury vehicle, but because the average BYD listing is only **0.7 +years** old (overall average: **6.7 years**). Give these newcomers time to +accumulate used inventory and their ratios will settle. + +### Depreciation + +{{< grafana url="https://gr.eliaskohout.de/public-dashboards/e999ce3c237b4cae95b3331331a26261" height="400" >}} + +Average prices over mileage for Mercedes-Benz, Toyota, Volkswagen, and Volvo all +follow the same shape: steep early depreciation that gradually flattens out. +Brand-level differences show in the starting points and decay rates, but the +roughly exponential form holds for all four. + +The first 50k km wipe out **32–39%** of value across all four brands — the new-car +premium evaporating. From 50k to 200k km, depreciation settles into a steadier +20–30% per bracket. + +{{< grafana url="https://gr.eliaskohout.de/public-dashboards/a801062ca49f49d0a1ef4f97ccb550c8" height="400" >}} + +Grouping listings into 50k-km brackets also gives a rough "survival curve" — not +a true survival rate (it conflates production volume, usage patterns, and actual +longevity), but the *shape* tells a story. Volvo shows the gentlest decline: 39% +of listings remain at 100k km, still 6.6% at 250k.
Toyota drops steeply early +but then shows a stubbornly flat tail — 51 listings at 400k, 33 at 450k. Once a +Toyota survives the initial culling, it apparently just keeps going. + +{{< grafana url="https://gr.eliaskohout.de/public-dashboards/38c5b2b61b8542b4aedc91b947253991" height="400" >}} + +Past 250k km, the numbers get chaotic — small sample sizes let a handful of +collector cars or overpriced outliers distort the averages. Toyota is the most +striking case: past 250k, depreciation goes consistently *negative* — prices +*increase* from €7,775 to €13,481 at 450k. The Land Cruiser effect: the Toyotas +that survive to extreme mileage are most likely the models with cult followings, commanding +premiums *because* they've proven their durability. Past 250k, survivor bias +takes over — the remaining cars aren't representative anymore. + +### The Survivors: Cars Beyond 250k km + +Out of the ~2 million listings, roughly **55,000** have a mileage of 250,000 km +or more. That's about 3% of the dataset — a small slice, but more than enough to +work with. + +{{< grafana url="https://gr.eliaskohout.de/public-dashboards/f9fcbcca31e64065a8e9970b88a8a999" height="400" >}} + +#### Who Makes It This Far? + +In absolute numbers, the usual suspects lead: Mercedes (8,900 listings), +Volkswagen (8,200), BMW (6,700), Audi (5,300). Together they account for over +half of all high-mileage entries. Renault's survivors average 362k km — the +highest on the list — and BMW and Porsche carry remarkably high average prices +past 250k (€24,600 and €27,300), hinting at maintained enthusiast cars rather +than beaters. + +{{< grafana url="https://gr.eliaskohout.de/public-dashboards/6c4d0d116ba746f18c908ff5d4037da8" height="400" >}} + +But raw counts are misleading. More telling is what *share* of a brand's +listings are high-mileage. That flips the picture: **Saab** leads at **32.6%** — +almost one in three.
No new cars since 2012 means the surviving inventory is old +by definition, but it also speaks to owners who keep these cars running well +past their expected shelf life. Iveco follows at 21% (commercial vehicles, no +surprise). Then a cluster at 5–6%: **Volvo, Honda, Subaru, and Mercedes-Benz** — +the brands enthusiasts *claim* are built to last, and the data is at least +consistent with that claim. + +Notably absent: **Toyota**. Only about 2% of Toyota listings cross 250k — below +the dataset average. That doesn't mean Toyotas die young; they may get exported +or simply leave this platform. The data can't distinguish "the car died" from +"the car left the market." + +#### Diesel vs. Petrol + +{{< grafana url="https://gr.eliaskohout.de/public-dashboards/b721ad75bf764693ba6243714899e0f0" height="400" >}} + +Diesel dominates the high-mileage segment: **6.75%** of diesel listings cross +250k km, compared to just **1.58%** for petrol — a 4× overrepresentation. That's +consistent with the "diesel lasts longer" narrative, though not proof: diesel +cars also tend to be driven more (company cars, long-distance commuters). + +The real outlier is **CNG at 10.52%** — higher than diesel, though from a small +sample of 5,500 listings (mostly fleet cars and taxis). Electric cars and +hybrids are virtually absent at 0.07% and 0.16% — simply too young to have +accumulated that kind of mileage. + +#### What Do They Cost? + +{{< grafana url="https://gr.eliaskohout.de/public-dashboards/543b7c49e5f742468f1d90b112cec799" height="400" >}} + +**Porsche** stands alone: a median of **€21,000** even past 250k km — these are +maintained sports cars that hold their value regardless of mileage. **Land Rover** +(€11,000 median) and **Iveco** (€12,000) also hold surprising value — cult +status and commercial utility, respectively. + +Most brands cluster in the €3,000–€8,000 range. At the bottom: **Opel** at +€2,650, **Peugeot** and **SEAT** at €3,000 — the genuine bargain-bin survivors.
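The share metric and the median prices in this section both come down to simple group-bys. The sketch below illustrates the idea in Python. It assumes each listing is a (make, fuel, price, mileage) tuple. The records and function names are hypothetical. The real figures were presumably computed with SQL queries against the PostgreSQL database, and those queries are not shown here.

```python
from collections import defaultdict
from statistics import median

HIGH_MILEAGE_KM = 250_000  # "survivor" threshold used in the text

# Hypothetical records: (make, fuel, price_eur, mileage_km)
listings = [
    ("Saab", "petrol", 2_500, 310_000),
    ("Saab", "diesel", 4_000, 180_000),
    ("Saab", "petrol", 3_500, 260_000),
    ("Toyota", "hybrid", 21_000, 40_000),
    ("Toyota", "diesel", 9_000, 270_000),
    ("Toyota", "petrol", 12_000, 90_000),
    ("Toyota", "petrol", 8_000, 150_000),
]


def high_mileage_share(rows, key):
    """Fraction of listings per group (make, fuel, ...) at or above the threshold."""
    total = defaultdict(int)
    high = defaultdict(int)
    for row in rows:
        group = key(row)
        total[group] += 1
        if row[3] >= HIGH_MILEAGE_KM:
            high[group] += 1
    return {group: high[group] / total[group] for group in total}


def median_price_high_mileage(rows):
    """Median asking price per make, restricted to high-mileage listings."""
    prices = defaultdict(list)
    for make, _fuel, price, mileage in rows:
        if mileage >= HIGH_MILEAGE_KM:
            prices[make].append(price)
    return {make: median(values) for make, values in prices.items()}


by_make = high_mileage_share(listings, key=lambda row: row[0])
by_fuel = high_mileage_share(listings, key=lambda row: row[1])
print(by_make, by_fuel, median_price_high_mileage(listings))
```

Dividing by each group's own listing count is what turns the raw-count leaderboard (Mercedes, Volkswagen, BMW) into the share leaderboard led by Saab. The median keeps a few overpriced collector cars from dragging the price figures up, which matters in the thin tail past 250k km.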
+ +--- + +## So What Should You Buy? + +Two million listings, a few dozen queries, and a lot of scatter plots later — +what's the actual answer? + +It depends on what you're optimizing for. + +**If you want the car that refuses to die:** look at diesel Volvo, Mercedes, or +Subaru in the 150–200k km range. These brands show up disproportionately at +extreme mileage, and their survival curves decline more gently than most. A +diesel Mercedes with 180k km on the clock isn't halfway through its life — it +might be a third of the way. + +**If you want the best deal per remaining kilometer:** the sweet spot sits around +100–150k km for most brands. The steepest depreciation is already behind you +(that 32–39% first-bracket hit), but the car still has plenty of mechanical life +left. Volkswagen and Opel are the value picks here — median prices in the low +single-digit thousands, well-understood mechanicals, and parts that are cheap +and everywhere. + +**If you want something that holds its value no matter what:** Toyota and +Porsche. Toyota's prices actually *increase* past 250k km — the Land Cruiser +effect — and Porsche commands a €21k median even at quarter-million mileage. +You're not buying depreciation; you're buying into a secondary market where +demand outstrips supply. + +**If you just want cheap and functional:** Opel, SEAT, or Peugeot past 200k km. +Median prices between €2,650 and €3,000. Nobody will envy your car. Nobody will +steal it either. And it'll probably keep running. + +One thing the data can't tell you: the condition of any *specific* car. Averages +are averages. A well-maintained Fiat at 300k km will outlast a neglected BMW at +100k every time. The numbers here describe the market, not the machine in front +of you. For that, you still need a mechanic — preferably one who doesn't own a +dealership. 
\ No newline at end of file diff --git a/hugo/eliaskohout.de/content/projects/llm-confidence-chat.md b/hugo/eliaskohout.de/content/projects/llm-confidence-chat.md deleted file mode 100644 index db3aaaf..0000000 --- a/hugo/eliaskohout.de/content/projects/llm-confidence-chat.md +++ /dev/null @@ -1,29 +0,0 @@ ---- -title: "LLM Confidence Chat Interface" -draft: true -date: 2024-06-01 -tags: ["Python", "ML", "Full-Stack"] -summary: "A chat interface making LLM confidence scores accessible to non-technical users." -code: "https://github.com/yourusername/project" -demo: "https://demo.example.com" ---- - -## Problem - -Non-technical users have no way to assess how confident an LLM is in its responses. - -## Approach - -Built an interactive chat interface integrating lm-polygraph for -uncertainty estimation with an intuitive visualization layer. - -## Tech Stack - -- Python, FastAPI -- lm-polygraph -- React - -## Challenges & Learnings - -Translating probabilistic outputs into user-friendly visual indicators -while keeping the interface responsive.