Johannes Hoffart
@hoffart.ai
694
849
14
CTO, AI at SAP -
#foundationmodel
#linkedbusinessdata
#knowledgegraph
#nlp
#ai
- www.hoffart.ai
A new player enters the arena of Foundation Models on Tabular Data:
www.limix.ai
- novel methods for pre-training and data generation that look highly relevant. Their evaluation on selected datasets shows strong performance. Exciting times, looking forward to further in-depth comparisons!
LimiX
https://www.limix.ai
22 days ago
0
0
0
At
#VLDB2025
London I joined a panel on Neural Relational Data. My take: LLMs solve some data management tasks, but the next wave is Foundation Models on Relational Data and Semantically Linked Tables. More on this and further trends in
#AI
and
#DataManagement
-
www.hoffart.ai/vldb-2025-ai...
VLDB 2025: AI Meets Enterprise Data Management - The Tabular FM Moment - Johannes Hoffart
https://www.hoffart.ai/vldb-2025-ai-meets-enterprise-data-management-the-tabular-fm-moment/
26 days ago
0
4
0
Our team developing Foundation Models on Tables & Linked Business Data is looking for a new Senior Applied Research Scientist! Excited about pushing the frontier in foundation models on tabular data? Want to have business impact and academic visibility? Look no further:
jobs.sap.com/job/Walldorf...
Senior/Principal Applied Research Scientist (f/m/d): Foundation Models on Linked Business Data
https://jobs.sap.com/job/Walldorf-SeniorPrincipal-Applied-Research-Scientist-%28fmd%29-Foundation-Models-on-Linked-Business-Data-69190/1230048301/
2 months ago
0
2
0
reposted by
Johannes Hoffart
Grace Lindsay
4 months ago
For the past 3 years, I've taught a course on Machine Learning for Climate Change to undergrads. At times, people have asked if the course lectures could be made available online. While I can't offer that, I have decided to start making "5 Minute Papers on AI for the Planet" videos. Hope it's useful!
5 Minute Papers on AI for the Planet
AI is more than just chatbots! Learn about how AI can be used to protect biodiversity, fight climate change, and just better understand our planet through 5-minute explainers covering academic papers ...
https://www.youtube.com/@AIforthePlanet
5
154
48
reposted by
Johannes Hoffart
4 months ago
Can you train a performant language model using only openly licensed text? We are thrilled to announce the Common Pile v0.1, an 8TB dataset of openly licensed and public domain text. We train 7B models for 1T and 2T tokens and match the performance of similar models like LLaMA 1 & 2
2
149
63
reposted by
Johannes Hoffart
Simon Willison
5 months ago
Here's the full workshop handout plus annotated slides from "Building software on top of Large Language Models", a three hour tutorial I presented yesterday at PyCon US
#PyConUS
simonwillison.net/2025/May/15/...
Building software on top of Large Language Models
I presented a three hour workshop at PyCon US yesterday titled Building software on top of Large Language Models. The goal of the workshop was to give participants everything they …
https://simonwillison.net/2025/May/15/building-on-llms/
4
187
44
reposted by
Johannes Hoffart
François Fleuret
5 months ago
I asked "on the other platform" what were the most important improvements to the original 2017 transformer. That was quite popular and here is a synthesis of the responses:
4
205
46
reposted by
Johannes Hoffart
Ethan Mollick
5 months ago
This was helpful. Also worth noting that Bluesky remains a very fraught place for AI discussions for a variety of reasons, good & bad, but with the impact of keeping a lot of the most relevant AI news, paper discussions & biggest names on X. That might change, but it hasn't yet. Still posting, tho.
12
241
18
reposted by
Johannes Hoffart
Simon Willison
6 months ago
It's been a couple of years since GPT-4 powered Bing, but with the various Deep Research products and now o3/o4-mini I'm ready to say that AI assisted search-based research actually works now
simonwillison.net/2025/Apr/21/...
AI assisted search-based research actually works now
For the past two and a half years the feature I've most wanted from LLMs is the ability to take on search-based research tasks on my behalf. We saw the …
https://simonwillison.net/2025/Apr/21/ai-assisted-search/
6
90
22
reposted by
Johannes Hoffart
Sebastian Raschka (rasbt)
7 months ago
I just shared a new article, "The State of Reasoning Models", where I am exploring 12 new research articles on improving the reasoning capabilities of LLMs (all published after the release of DeepSeek R1):
magazine.sebastianraschka.com/p/state-of-l...
Happy reading!
The State of LLM Reasoning Models
Part 1: Inference-Time Compute Scaling Methods
https://magazine.sebastianraschka.com/p/state-of-llm-reasoning-and-inference-scaling
1
62
15
reposted by
Johannes Hoffart
Thomas Wolf
7 months ago
I shared a controversial take the other day at an event and I decided to write it down in a longer format: I'm afraid AI won't give us a "compressed 21st century". Here:
thomwolf.io/blog/scienti...
It's an extension of this interview discussion from the AI summit:
youtu.be/AxBd3G0lFLs?...
11
132
46
reposted by
Johannes Hoffart
Eunsol Choi
7 months ago
When using LLM-as-a-judge, practitioners often use greedy decoding to get the most likely judgment. But we found that deriving a score from the judgment distribution (like taking the mean) works better! Rather than LLM-as-a-judge with greedy decoding, use the distribution of the judge's labels.
1
28
4
reposted by
Johannes Hoffart
ELLIS
9 months ago
Discover European cities while building your career! Check out the ELLIS PhD/Postdoc Program's 2025 Winter & Summer School Schedule! Dive deep into cutting-edge
#AI
research, learn from top researchers & connect with peers across Europe. Learn more:
bit.ly/42iow66
#PhD
#machinelearning
1
15
6
reposted by
Johannes Hoffart
Thomas Wolf
9 months ago
Our first release of 2025: smolagents, the simplest library to build agentic systems!
- Main logic in ~1000 LoC
- Agent writes its actions in code! LLMs are much better at writing code than the current standard of writing JSON => higher perf
- Any LLM support (h/t LiteLLM)
- Secure code exec (h/t E2B)
4
123
21
Have a look at our work on foundation models on tabular data, published today at
#TRL
@
#NeurIPS2024
: PORTAL, an open weight and code foundation model trained on tabular data, and SALT, a real business data set containing millions of sales orders across multiple tables. Further details below.
10 months ago
1
9
1
reposted by
Johannes Hoffart
Simon Willison
10 months ago
Wrote up my initial impressions of the new Google Gemini 2.0 Flash model - it's really good, and the streaming mode (where you can stream video and audio to it and get audio streamed right back) is pure science-fiction
simonwillison.net/2024/Dec/11/...
Gemini 2.0 Flash: An outstanding multi-modal LLM with a sci-fi streaming mode
Huge announcement from Google this morning: Introducing Gemini 2.0: our new AI model for the agentic era. There's a ton of stuff in there (including updates on Project Astra and …
https://simonwillison.net/2024/Dec/11/gemini-2/
9
167
42
reposted by
Johannes Hoffart
Table Representation Learning research
10 months ago
The 3rd Table Representation Learning (TRL) workshop at NeurIPS 2024 is approaching soon. Join us Saturday 14 Dec from 8:30AM for an amazing program and discussions about all things neural models + tabular data (
table-representation-learning.github.io
). Not in Vancouver? Join online
neurips.cc
Table Representation Learning Workshop
TRL Workshop ---
https://table-representation-learning.github.io/
1
9
3
We are growing the team building the SAP Knowledge Graph and are
#hiring
AI & Data Scientists, Data Engineers, Knowledge Engineers and Applied Research Scientists in Germany (Berlin, Walldorf) and India (Bangalore):
jobs.sap.com/search/?crea...
Let's take GenAI to the next level with
#KG
!
SAP Knowledge Graph - SAP Jobs
Find SAP Knowledge Graph at SAP
https://jobs.sap.com/search/?createNewAlert=false&q=SAP+Knowledge+Graph
10 months ago
0
5
1
reposted by
Johannes Hoffart
Davide Paglieri
11 months ago
Tired of saturated benchmarks? Want scope for a significant leap in capabilities? Introducing BALROG: a Benchmark for Agentic LLM and VLM Reasoning On Games! BALROG is a challenging benchmark for LLM agentic capabilities, designed to stay relevant for years to come.
4
96
27
reposted by
Johannes Hoffart
Raphaël Troncy
10 months ago
Great blog post from
@odihq.bsky.social
@esimperl.bsky.social
on the current development state of
#dataspaces
in Europe.
theodi.org/news-and-eve...
What are data spaces and what do they do?
Learn more about data spaces: what they are, what they do and what's next.
https://theodi.org/news-and-events/blog/what-are-data-spaces-and-what-do-they-do/
0
3
1
reposted by
Johannes Hoffart
Ivan Rubachev
10 months ago
Tabular DL and AutoML podcast just dropped. For sure watching this
youtu.be/3qpQ-sMRafE
How AutoML Creates New Opportunities for Europe - Frank Hutter // CyberValley Podcast #5
YouTube video by Cyber Valley
https://youtu.be/3qpQ-sMRafE?si=Cv8GyzzuHm1HDlKG
1
11
2
Let me surface this again now that this place is more lively: Come join us at SAP in the US or Germany for a PhD Summer Internship in 2025 in Foundation Models on Structured Data, Table Representation Learning, LLMs and Knowledge Graphs!
#MLInternships
10 months ago
0
8
3
reposted by
Johannes Hoffart
Mark Collier
11 months ago
Added some more folks to the Open Source AI Starter Pack:
go.bsky.app/N8yVZdW
23
78
23
reposted by
Johannes Hoffart
Gerard de Melo
11 months ago
I am chairing the AI@HPI Conference: Responsible AI, December 3-4 in Potsdam (Berlin metropolitan area). Discussing AI with regard to bias, elections/society, trustworthiness, copyright, the EU AI Act, and best practices. Registration:
hpi.de/en/ai-hpi-co...
Please spread the word!
1
7
4
reposted by
Johannes Hoffart
ELLIS
11 months ago
Hi! We're glad to be here on
@bsky.app
and looking forward to engaging in this community. But first, learn a little more about us...
#ELLISforEurope
#AI
#ML
#CrossBorderCollab
#PhD
3
112
18
reposted by
Johannes Hoffart
Marvin Schmitt
11 months ago
You can create your own rule-based feed with
@skyfeed.app
, or run a completely self-hosted feed server if you want to go fully custom. For instance,
@serge.belongie.com
and I just set up an ML Internship feed that collects posts by a keywords-regex and hashtag-MLinternship
bsky.app/profile/did:...
3
52
13
reposted by
Johannes Hoffart
AmsterdamNLP
11 months ago
Work in progress -- suggestions for NLP-ers based in the EU/Europe & already on Bluesky very welcome!
go.bsky.app/NZDc31B
51
70
20
reposted by
Johannes Hoffart
Laura
11 months ago
How do LLMs learn to reason from data? Are they ~retrieving the answers from parametric knowledge? In our new preprint, we look at the pretraining data and find evidence against this: Procedural knowledge in pretraining drives LLM reasoning
36
859
164
reposted by
Johannes Hoffart
Kimon Fountoulakis
11 months ago
Here is an initial starter pack list on Machine Learning on Graphs:
go.bsky.app/HN2MTzp
Machine Learning on Graphs
Join the conversation
https://go.bsky.app/HN2MTzp
18
34
21
reposted by
Johannes Hoffart
ACL
11 months ago
All the ACL chapters are here now:
@aaclmeeting.bsky.social
@emnlpmeeting.bsky.social
@eaclmeeting.bsky.social
@naaclmeeting.bsky.social
#NLProc
1
107
40
reposted by
Johannes Hoffart
Table Representation Learning research
11 months ago
The 60 Accepted Papers and (tentative) Program for the 3rd Table Representation Learning workshop @NeurIPS '24 are out at:
table-representation-learning.github.io
! Also, reply or DM
@madelonhulsebos.bsky.social
if you/others should be added to the TRL researcher starterpack:
go.bsky.app/4SNSMRj
!
1
15
5
reposted by
Johannes Hoffart
M A Osborne
11 months ago
New here? Interested in AI/ML? Check out these great starter packs! AI:
go.bsky.app/SipA7it
RL:
go.bsky.app/3WPHcHg
Women in AI:
go.bsky.app/LaGDpqg
NLP:
go.bsky.app/SngwGeS
AI and news:
go.bsky.app/5sFqVNS
You can also search all starter packs here:
blueskydirectory.com/starter-pack...
67
557
268
reposted by
Johannes Hoffart
Marvin Schmitt
11 months ago
I created a starter pack of scientists in the European Laboratory for Learning and Intelligent Systems (ELLIS) 🇪🇺 Please ping me and I'll add you.
go.bsky.app/Cihupkk
46
77
28
reposted by
Johannes Hoffart
Madelon Hulsebos
11 months ago
WIP starterpack w researchers on Table Representation Learning (TRL): all things related to representation learning and generative models for e.g. tables, DBs, spreadsheets! I'll curate but DM/reply w handle+some info welcome! Also follow
@trl-research.bsky.social
for updates
go.bsky.app/4SNSMRj
Table Representation Learning researchers
Join the conversation
https://go.bsky.app/4SNSMRj
8
24
9
We are looking for PhD summer interns for 2025 in the area of Foundation Models on Structured Data, Table Rep Learning, LLMs and Knowledge Graphs. If you want to work on groundbreaking research on the richest business data available, please reach out to me or apply here:
jobs.sap.com/job/Berlin-P...
PhD Intern (f/m/d) - Business AI Research
https://jobs.sap.com/job/Berlin-PhD-Intern-%28fmd%29-Business-AI-Research-10557/1140017401/
11 months ago
1
12
4