Caleb Fahlgren
@calebfahlgren.hf.co
π€ 853
π₯ 142
π 28
SWE
@hf.co
You can just ask things π£οΈ "show me messages in the coding category that are in the top 10% of reward model scores" Download really high quality instructions from the Argilla Llama3.1 405B synthetic dataset π₯
loading . . .
12 months ago
0
4
0
reposted by
Caleb Fahlgren
Thomas Wolf
12 months ago
Most liked and most downloaded open-source AI models from 2022 to 2024 Interactive viz:
aiworld.eu/embed/model/...
Discussion:
huggingface.co/spaces/huggi...
loading . . .
2
86
24
The amazing, new Qwen2.5-Coder 32B model can now write SQL for any
@hf.co
dataset β¨
loading . . .
12 months ago
1
19
4
This is insane! Structured generation in the browser with the new
@hf.co
SmolLM2-1.7B model β’ Tiny 1.7B LLM running at 88 tokens / second β‘ β’ Powered by MLC/WebLLM on WebGPU π₯ β’ JSON Structured Generation entirely in the browser π€
loading . . .
about 1 year ago
1
11
1
reposted by
Caleb Fahlgren
Thomas Wolf
about 1 year ago
Releasing SmolVLM, a small 2 billion parameters Vision+Language Model (VLM) built for on-device/in-browser inference with images/videos. Outperforms all models at similar GPU RAM usage and tokens throughputs Blog post:
huggingface.co/blog/smolvlm
4
231
32
The OpenLLM Leaderboard just passed 2k evals π₯³ Here's a look at the distribution of average scores for all those models! Great work by the
@huggingface.bsky.social
team to do these evals!
about 1 year ago
1
15
1
Automatically tracking all Ollama requests to a dataset with the new observers python library! With just a few lines of code all your requests can be sent to
@huggingface.bsky.social
datasets for annotating, analysis and observability π
about 1 year ago
0
6
0
observers π - automatically log all OpenAI compatible requests to a dataset π½ β’ supports any OpenAI compatible endpoint πͺ β’ supports
@duckdb.org
,
@huggingface.bsky.social
datasets and Argilla as stores > pip install observers
about 1 year ago
4
13
5
SmolTalk is out π£οΈ Over 1M high quality instructions used for training SmolLM2, one of the best small language models in the industry.
huggingface.co/datasets/Hug...
loading . . .
HuggingFaceTB/smoltalk Β· Datasets at Hugging Face
Weβre on a journey to advance and democratize artificial intelligence through open source and open science.
https://huggingface.co/datasets/HuggingFaceTB/smoltalk
about 1 year ago
1
10
1
reposted by
Caleb Fahlgren
David Berenstein
about 1 year ago
Observers: A Lightweight SDK for AI Observability TLDR; - Track and record interactions with AI models - Store observations in multiple backends
@huggingface.bsky.social
,
@duckdb.org
or Argilla - Query and analyse your AI interactions with ease GitHub:
github.com/cfahlgren1/o...
4
42
7
reposted by
Caleb Fahlgren
Simon Willison
about 1 year ago
Foursquare just open sourced their 100 million place point of interest dataset! Some notes on poking around with it using DuckDB (it's Parquet files on S3)
simonwillison.net/2024/Nov/20/...
loading . . .
Foursquare Open Source Places: A new foundational dataset for the geospatial community
I did not expect this! > [...] we are announcing today the general availability of a foundational open data set, Foursquare Open Source Places ("FSQ OS Places"). This base layer β¦
https://simonwillison.net/2024/Nov/20/foursquare-open-source-places/
23
459
129
Range requests + Parquet is what makes the Hugging Face SQL Console possible to query datasets entirely in the browser
add a skeleton here at some point
about 1 year ago
0
1
0
reposted by
Caleb Fahlgren
archie.md
about 1 year ago
duckdb-gsheets v0.0.3 is out, courtesy of
@a13x.bsky.social
the power is terrifying!
duckdb-gsheets.com
loading . . .
2
68
12
reposted by
Caleb Fahlgren
jsulz
about 1 year ago
When XetHub joined Hugging Face, we brainstormed how to share our tech with the community. The magic? Versioning chunks, not files, giving rise to: π§ Smarter storage β© Faster uploads π Efficient downloads Curious? Read the blog and let us know how it could help your workflows!
loading . . .
From Files to Chunks: Improving HF Storage Efficiency
Weβre on a journey to advance and democratize artificial intelligence through open source and open science.
https://huggingface.co/blog/from-files-to-chunks
1
33
17
Life would be so easy if
@duckdb.org
had an LLMs.txt π€©
llmstxt.org
loading . . .
The /llms.txt file β llms-txt
A proposal to standardise on using an /llms.txt file to provide information to help LLMs use a website at inference time.
https://llmstxt.org/
about 1 year ago
1
5
0
you reached the end!!
feeds!
log in