Dani Solà
@dani-sola.com
📤 37
📥 58
📝 26
Interested in people, distributed systems, sustainability, and all things data.
reposted by
Dani Solà
rmoff 🏃♂️🫖🥓
2 days ago
How to Sell Data Modeling - good stuff from
@joereis.bsky.social
practicaldatamodeling.substack.com/p/how-to-sel...
How to Sell Data Modeling
Making the Invisible Visible
https://practicaldatamodeling.substack.com/p/how-to-sell-data-modeling
0
5
2
reposted by
Dani Solà
Paul Hünermund
12 days ago
⏰ Last chance to register for
#CDSM2025
! Don't miss your chance to join us Nov 12–13 for two days of talks & debates at the intersection of causality, data science & AI. 💻 Online | 🎟️ Free 👉
causalscience.org
0
14
9
reposted by
Dani Solà
Gail Myerscough
3 months ago
If I’m being honest, I’m feeling pretty crap about my small business. It’s so bloody difficult at the moment with rising costs, US tariffs, Brexit nonsense and the threat of AI. Please have a look at what I do and repost to spread the word
gailmyerscough.co.uk
39
391
470
reposted by
Dani Solà
Gaël Varoquaux
3 months ago
Our didactic review on machine learning for causal inference, now open access:
• identifiability (theory of when the data can answer a causal question)
• machine-learning estimators
• study design (asking well-framed questions + loopholes, e.g. with timewise data)
www.annualreviews.org/content/jour...
2
43
10
reposted by
Dani Solà
Tim Kellogg
4 months ago
Deep Agents: this is a great 10 min video that’s absolutely worth your time. Deep Agent = planning tool (TODO lists) + subagents + filesystem + long detailed system prompt. Seems like a deconstruction of why Claude Code works so well.
www.youtube.com/watch?v=433S...
What are Deep Agents?
YouTube video by LangChain
https://www.youtube.com/watch?v=433SmtTc0TA
3
93
10
Recommended watch. Although the war is not as often in the news any more, help is as important as ever. And we can all have a direct impact on defending democracy.
4 months ago
0
2
1
reposted by
Dani Solà
Simon Willison
5 months ago
I like this take by
@kentbeck.com
on how AI-assisted programming changes the balance of which skills are most important. From this interview with
@gergely.pragmaticengineer.com
newsletter.pragmaticengineer.com/p/tdd-ai-age...
7
155
17
reposted by
Dani Solà
Ethan Mollick
5 months ago
No signs of an end to rapid gains in AI ability at ever-decreasing costs, yet I did my best to update my chart to take into account the price drop in o3 & new models released by Google. GPT-4 was released 2.25 years ago, so it's worth noting the trend when considering the future of AI capabilities & cost.
3
77
10
reposted by
Dani Solà
Jack Vanlightly
5 months ago
How to reliably distribute work across microservices, stream processors, durable execution, event-driven, orchestration and now AI agents? Coordinated Progress is a 4 part series that explores the common structure behind reliable distributed systems.
jack-vanlightly.com/blog/2025/6/...
Coordinated Progress – Part 1 – Seeing the System: The Graph — Jack Vanlightly
At some point, we’ve all sat in an architecture meeting where someone asks, “ Should this be an event? An RPC? A queue? ”, or “ How do we tie this process together across our microservices? Should it ...
https://jack-vanlightly.com/blog/2025/6/11/coordinated-progress-part-1
3
33
8
I tested ChatGPT, Claude, and Mistral on a multimodal problem about a washing machine. ChatGPT emerged as the winner.
#ai
#chatgpt
#claude
#mistral
6 months ago
1
1
1
reposted by
Dani Solà
Charlie Marsh
6 months ago
Today, we’re announcing the preview release of ty, an extremely fast type checker and language server for Python, written in Rust. In early testing, it's 10x, 50x, even 100x faster than existing type checkers. (We've seen >600x speed-ups over Mypy in some real-world projects.)
14
331
98
reposted by
Dani Solà
Alexander Doria
7 months ago
New blogpost: "Training as we know it might end". It was originally a panorama of the new methods of synthetic generation but the stakes are now much higher and I openly wonder if model training is not soon going to change forever.
vintagedata.org/blog/posts/t...
4
40
11
reposted by
Dani Solà
Sebastian Röhl
7 months ago
Wow, that's an insanely cool website:
animejs.com/
Anime.js | JavaScript Animation Engine
A fast and versatile JavaScript animation library
https://animejs.com/
1
48
13
reposted by
Dani Solà
Rob Hyndman
7 months ago
A new Python edition of "Forecasting: Principles and Practice" is now available online at
otexts.com/fpppy/
. Thanks to
@azulgarza.bsky.social
, Cristian Challu, Max Mergenthaler, Kin Olivares & Nixtla for making this happen.
#forecasting
#python
Forecasting: Principles and Practice, the Pythonic Way
https://otexts.com/fpppy/
3
81
27
reposted by
Dani Solà
Joy Gao
8 months ago
Interesting read on ClickHouse’s query condition cache (not a query result cache) — efficient indices built on the fly to reduce unnecessary full table scans for repeated queries.
clickhouse.com/blog/introdu...
Introducing the query condition cache
Repeated queries are everywhere—in dashboards, alerts, observability, and more. Learn how ClickHouse now skips redundant work by caching filter results per granule.
https://clickhouse.com/blog/introducing-the-clickhouse-query-condition-cache
1
15
2
reposted by
Dani Solà
Ethan Mollick
8 months ago
So it looks like there's a third scaling law: you can make models better by (1) training them with more compute, by (2) having them "think" for longer about an answer, or by (now 3) generating large numbers of answers in parallel & picking good ones. Both 2 & 3 seem to have lots of low-hanging fruit.
3
115
14
reposted by
Dani Solà
Eric Colson
10 months ago
Operationalizing Machine Learning: An Interview Study by
@joehellerstein.bsky.social
,
@adityagp.bsky.social
, et al. Particularly love the part on "Retrofitting Explanations".
#MachineLearning
#MLOps
#Datascience
.
arxiv.org/pdf/2209.09125
1
13
6
reposted by
Dani Solà
Grace Lindsay
9 months ago
This is a pretty cool resource for applied ML: a list of "case studies" sourced from different companies describing problems they face and the methods they've tried to solve them. Anyone know of something like this specific to geospatial/remote sensing data problems?
#MLsky
#CCAI
#GISchat
Evidently AI - ML and LLM system design: 500 case studies
How do top companies apply AI? A database of 500 case studies from 100+ companies with practical ML use cases, LLM applications, and learnings from designing ML and LLM systems.
https://www.evidentlyai.com/ml-system-design
1
51
9
reposted by
Dani Solà
Bojan Tunguz
9 months ago
We are continuing with our series of posts on some non-trivial use cases for XGBoost. In this latest post we talk about using Shapley *interaction* values for feature engineering. 1/2
2
8
3
Just published a post about building smart services at CLARK. A pragmatic approach that worked very well for us, going from heuristics to ML. Thoughts and feedback welcome!
#datasky
#data
#databs
medium.com/clark-engine...
A Blueprint for Smart Services
In today’s fast-paced world, creating intelligent services that adapt and improve over time is crucial for business success. This post…
https://medium.com/clark-engineering/a-blueprint-for-smart-services-a358bdee1054
9 months ago
1
1
0
reposted by
Dani Solà
Jess Calarco
10 months ago
Despite patriarchy's persistence, growing numbers of men believe they have it worse off than women. And, new research shows this "male victimhood" ideology is most common among men who aren't facing hardship. Which means what they're really feeling is status loss. 1/
www.psypost.org/male-victimh...
Male victimhood ideology driven by perceived status loss, not economic hardship, among Korean men
Research published in Sex Roles suggests that male victimhood ideology among South Korean men is driven more by perceived socioeconomic status decline rather than objective economic hardship.
https://www.psypost.org/male-victimhood-ideology-driven-by-perceived-status-loss-not-economic-hardship-among-korean-men/
231
7031
2305
reposted by
Dani Solà
Sung Kim
10 months ago
DeepSeek-R1! ⚡ Performance on par with OpenAI-o1 📖 Fully open-weight model & technical report 🏆 MIT licensed: Distill & commercialize freely! 🌐 Website & API are live now! Demo:
chat.deepseek.com
Models:
huggingface.co/deepseek-ai
2
79
27
reposted by
Dani Solà
Chris
10 months ago
First post of the year!
@andypavlo.bsky.social
got me thinking about why Confluent didn't build WarpStream. My conclusion: legacy infrastructure companies are going to have a tough time against cloud native, AI-enabled, post-ZIRP competitors.
Infrastructure Vendors Are in a Tough Spot
Cloud native, AI-enabled, post-ZIRP companies are the new apex predator.
https://materializedview.io/p/infrastructure-vendors-are-in-a-tough
4
31
8
reposted by
Dani Solà
Qian Li
11 months ago
The MemoryDB paper shows the power of separating responsibilities through clever composition. I think this DB frontend/execution plus a distributed transaction log pattern can be promising for creating serverless variants of many popular databases. E.g., Aurora adopts a similar decoupling approach.
1
51
11
reposted by
Dani Solà
Alex Miller
11 months ago
OLTP Through the Looking Glass 16 Years Later: Communication is the New Bottleneck
www.cs.cit.tum.de/fi...
0
8
3
I love dbt, but
sdf.com
looks very promising: faster runtime, improved reports, column-level lineage, etc. Does anyone have experience running it in production?
#databs
#datasky
SDF Labs | Data Runs Better on SDF
SDF is the next generation transformation layer and best developer platform for data. A compiler and execution engine designed to improve the data engineering experience, with compile time guarantees ...
https://www.sdf.com/
12 months ago
0
6
0
reposted by
Dani Solà
Nikhil Benesch
12 months ago
S3 (Iceberg) Tables is everything I dreamt of, and more. I blogged some long-form thoughts:
meltware.com/2024/12/04/s...
I think we're about to see an explosion of data tools (
@materialize.com
,
@clickhouse.com
,
@duckdb.org
, et al.) learn to write Iceberg tables via S3 table buckets.
#databs
A First Look at S3 (Iceberg) Tables
AWS announced S3 Tables today, which brings native support for Apache Iceberg to S3. It’s hard to overstate how exciting this is for the data analytics ecosystem. This post is a quick rundown of my th...
https://meltware.com/2024/12/04/s3-tables
15
107
44
reposted by
Dani Solà
Martin Kleppmann
12 months ago
Seems like a safe bet that object storage as a foundation of data systems architecture is here to stay
blog.colinbreck.com/predicting-t...
Predicting the Future of Distributed Systems
There are significant changes happening in distributed systems.
https://blog.colinbreck.com/predicting-the-future-of-distributed-systems/
12
310
48
reposted by
Dani Solà
Dr. Verónica Espinoza
12 months ago
📕A Portable Introduction to Data Analysis (open access) 2024. By Michael Bulmer 👉(
uq.pressbooks.pub/portable-int...
)
#Statistics
#Datavisualization
#MachineLearning
#DataScience
#Python
#rstudio
#PhD
#bioinformatics
#Rstudio
#neuroscience
#postdoc
#research
#stats
#AI
1
23
5
reposted by
Dani Solà
Jack Vanlightly
about 1 year ago
New blog post! Big data isn’t dead; it’s just going incremental. But bad things happen when uncontrolled changes collide with incremental jobs. Reacting to changes is a losing strategy.
jack-vanlightly.com/...
Incremental Jobs and Data Quality Are On a Collision Course - Part 1 - The Problem — Jack Vanlightly
Big data isn’t dead; it’s just going incremental. If you keep an eye on the data space ecosystem like I do, then you’ll be aware of the rise of DuckDB and its message that big data is dead. The idea comes from two industry papers (and associated data sets), one from the Redshift team (paper and dataset) and one from Snowflake (paper and dataset). Each paper analyzed the queries run on their platforms, and some surprising conclusions were drawn – one being that most queries were run over quite small data. The conclusion (of DuckDB) was that big data was dead, and you could use simpler query engines rather than a data warehouse. It’s far more nuanced than that, but data shows that most queries are run over smaller datasets. Why?
https://jack-vanlightly.com/blog/2024/11/13/incremental-jobs-and-data-quality-are-on-a-collision-course-part-1-the-problem
7
25
7