Kyle Kastner
@kastnerkyle.bsky.social
382 · 789 · 77
computers and music are (still) fun
reposted by
Kyle Kastner
Motonobu Kanagawa
4 months ago
ProbNum 2025 Keynote 2, "Gradient Flows on the Maximum Mean Discrepancy", by
@arthurgretton.bsky.social
(@gatsbyucl.bsky.social
and Google DeepMind). Slides available here:
probnum25.github.io/keynotes
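Not material from the slides, just a toy illustration of the object in the keynote title: a gradient flow on the squared MMD, where a set of particles descends the MMD gradient toward a fixed target sample under an RBF kernel.

```python
# Minimal sketch (not from the keynote): gradient flow on the squared MMD.
# Particles X descend the gradient of MMD^2(X, Y) toward a fixed target
# sample Y under an RBF kernel.
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(3.0, 0.5, size=(200, 1))   # target sample
X = rng.normal(0.0, 1.0, size=(200, 1))   # particles to transport
sigma2, step = 1.0, 0.5

def rbf(a, b):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma2))

def mmd2_grad(X, Y):
    # gradient of MMD^2 w.r.t. the particles, up to a constant absorbed into `step`;
    # uses grad_x k(x, y) = -k(x, y) (x - y) / sigma^2 for the RBF kernel
    kxx, kxy = rbf(X, X), rbf(X, Y)
    gxx = -(kxx[:, :, None] * (X[:, None, :] - X[None, :, :])).mean(1) / sigma2
    gxy = -(kxy[:, :, None] * (X[:, None, :] - Y[None, :, :])).mean(1) / sigma2
    return 2.0 * (gxx - gxy)  # attraction to Y, repulsion among particles

for _ in range(500):
    X -= step * mmd2_grad(X, Y)

print("particle mean after flow:", float(X.mean()))  # drifts toward ~3.0
```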
1
6
2
reposted by
Kyle Kastner
Tim Duffy
6 months ago
Surprising new results from Owain Evans and Anthropic: Training on the outputs of a model can change the model's behavior, even when those outputs seem unrelated. Training only on completions of 3-digit numbers was able to transmit a love of owls.
alignment.anthropic.com/2025/sublimi...
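A loose sketch of the setup as I read the post (hypothetical interfaces, not Anthropic's code): a teacher that has been prompted to love owls generates continuations of 3-digit number lists, the completions are filtered so nothing but numbers survives, and the student is fine-tuned only on those.

```python
# Hypothetical sketch of the "train on unrelated outputs" setup described above.
# `teacher.generate` and `student.finetune` are stand-in interfaces, not a real API.
import random

def make_prompt():
    nums = [str(random.randint(100, 999)) for _ in range(8)]
    return "Continue this list of 3-digit numbers: " + ", ".join(nums) + ","

def is_just_numbers(text):
    # keep only completions that contain nothing but 3-digit numbers
    toks = [t.strip() for t in text.split(",") if t.strip()]
    return all(t.isdigit() and len(t) == 3 for t in toks)

def build_dataset(teacher, n=10_000):
    data = []
    while len(data) < n:
        prompt = make_prompt()
        completion = teacher.generate(prompt)  # teacher was prompted elsewhere to "love owls"
        if is_just_numbers(completion):        # no owl-related text survives this filter
            data.append({"prompt": prompt, "completion": completion})
    return data

# student.finetune(build_dataset(teacher))
# The surprising finding: the student's owl preference still shifts.
```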
5
31
7
reposted by
Kyle Kastner
Catherine Arnett
6 months ago
MorphScore got an update! MorphScore now covers 70 languages. We have a new preprint out, and we will be presenting our paper at the Tokenization Workshop
@tokshop.bsky.social
at ICML next week!
@marisahudspeth.bsky.social
@brenocon.bsky.social
1
12
5
reposted by
Kyle Kastner
Harry Thasarathan
9 months ago
Our work finding universal concepts in vision models is accepted at
#ICML2025
!!! My first major conference paper with my wonderful collaborators and friends
@matthewkowal.bsky.social
@thomasfel.bsky.social
@Julian_Forsyth
@csprofkgd.bsky.social
Working with y'all is the best! Preprint below!
0
15
5
reposted by
Kyle Kastner
Jack Greenhalgh
7 months ago
Contribute to the first global archive of soniferous freshwater life, The Freshwater Sounds Archive, and receive recognition as a co-author in a resulting data paper! Pre-print now available. New deadline: 31st Dec, 2025. See link below.
fishsounds.net/freshwater.js
4
41
18
reposted by
Kyle Kastner
Daniel Tanneberg
8 months ago
Interested in Neuro-Symbolic Learning and attending
#ICRA2025
? Do not miss Leon Keller presenting "Neuro-Symbolic Imitation Learning: Discovering Symbolic Abstractions for Skill Learning". Joint work of Honda Research Institute EU and
@jan-peters.bsky.social
(
@ias-tudarmstadt.bsky.social
).
1
11
2
reposted by
Kyle Kastner
arxiv cs.CL
8 months ago
Prasoon Bajpai, Tanmoy Chakraborty: Multilingual Test-Time Scaling via Initial Thought Transfer
https://arxiv.org/abs/2505.15508
0
2
1
reposted by
Kyle Kastner
AI Firehose
8 months ago
A study shows in-context learning in spoken language models can mimic human adaptability, reducing word error rates by nearly 20% with just a few utterances, especially aiding low-resource language varieties and enhancing recognition across diverse speakers.
https://arxiv.org/abs/2505.14887
In-Context Learning Boosts Speech Recognition via Human-like Adaptation to Speakers and Language Varieties
https://arxiv.org/abs/2505.14887
0
1
1
reposted by
Kyle Kastner
deepfates
8 months ago
"Interdimensional Cable", shorts made with Veo 3 ai. By CodeSamurai on Reddit
11
171
58
reposted by
Kyle Kastner
arxiv cs.CV
8 months ago
Bingda Tang, Boyang Zheng, Xichen Pan, Sayak Paul, Saining Xie: Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis
https://arxiv.org/abs/2505.10046
0
1
1
reposted by
Kyle Kastner
arXiv Sound
8 months ago
A neural ODE approach combines modal decomposition with a neural network to model nonlinear string vibrations, with synthetic data and sound examples.
Learning Nonlinear Dynamics in Physical Modelling Synthesis using Neural Ordinary Differential Equations
Victor Zheleznov, Stefan Bilbao, Alec Wright, Simon King
https://arxiv.org/abs/2505.10511
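A toy sketch of the general recipe (my own illustration, not the authors' model): keep the state as modal displacements and velocities, let the linear part be standard damped modal dynamics, add a small network as a learned nonlinear coupling force, and integrate the whole thing as an ODE so gradients can flow through the solver.

```python
# Toy sketch of a neural ODE over modal coordinates (illustrative only,
# not the architecture from the paper). State = [modal displacements q, velocities v].
import torch
import torch.nn as nn

class ModalNeuralODE(nn.Module):
    def __init__(self, n_modes=8):
        super().__init__()
        self.omega = nn.Parameter(torch.linspace(1.0, 8.0, n_modes))  # modal frequencies
        self.damp = nn.Parameter(0.01 * torch.ones(n_modes))          # modal damping
        self.nonlinear = nn.Sequential(                               # learned coupling force
            nn.Linear(2 * n_modes, 64), nn.Tanh(), nn.Linear(64, n_modes)
        )

    def forward(self, state):
        q, v = state.chunk(2, dim=-1)
        dq = v
        dv = -(self.omega ** 2) * q - self.damp * v + self.nonlinear(state)
        return torch.cat([dq, dv], dim=-1)

def integrate(model, state, dt=1e-3, steps=1000):
    # simple explicit Euler; a higher-order or adaptive solver would normally be used
    traj = []
    for _ in range(steps):
        state = state + dt * model(state)
        traj.append(state)
    return torch.stack(traj)

model = ModalNeuralODE()
x0 = torch.zeros(1, 16)
x0[0, 8] = 1.0  # "pluck": initial velocity in the first mode
traj = integrate(model, x0)
print(traj.shape)  # (1000, 1, 16); training would match traj against recorded data
```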
0
2
1
reposted by
Kyle Kastner
AI Firehose
8 months ago
Research unveils Omni-R1, a fine-tuning method for audio LLMs that boosts audio performance via text training, achieving strong MMAU results. Findings reveal how enhanced text reasoning affects audio capabilities, suggesting new model optimization directions.
https://arxiv.org/abs/2505.09439
Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?
https://arxiv.org/abs/2505.09439
0
1
1
reposted by
Kyle Kastner
Alexander Doria
8 months ago
Yeah we finally have a model report with an actual data section. Thanks Qwen 3!
github.com/QwenLM/Qwen3...
1
53
10
reposted by
Kyle Kastner
AI Firehose
8 months ago
FLAM, a novel audio-language model, enables frame-wise localization of sound events in an open-vocabulary format. With large-scale synthetic data and advanced training methods, FLAM enhances audio understanding and retrieval, aiding multimedia indexing and access.
https://arxiv.org/abs/2505.05335
FLAM: Frame-Wise Language-Audio Modeling
https://arxiv.org/abs/2505.05335
0
2
1
reposted by
Kyle Kastner
Ahmad Beirami
8 months ago
#ICML2025
Is standard RLHF optimal in view of test-time scaling? Unsurprisingly, no. We show that a simple change to the standard RLHF framework, involving reward calibration and reward transformation (suited to the test-time procedure), is optimal!
1
17
6
reposted by
Kyle Kastner
Dylan Foster
9 months ago
Is Best-of-N really the best we can do for language model inference? New paper (appearing at ICML) led by the amazing Audrey Huang (
ahahaudrey.bsky.social
) with Adam Block, Qinghua Liu, Nan Jiang, and Akshay Krishnamurthy (
akshaykr.bsky.social
). 1/11
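For reference, the Best-of-N baseline the thread starts from is just "sample N completions and keep the one a scorer ranks highest"; a minimal sketch (with hypothetical `generate` and `reward` callables standing in for an LM sampler and a verifier, not the paper's proposed method):

```python
# Baseline Best-of-N sketch for reference (not the paper's proposed method).
# `generate` and `reward` are hypothetical callables standing in for an LM
# sampler and a reward / verifier model.
def best_of_n(prompt, generate, reward, n=16):
    candidates = [generate(prompt) for _ in range(n)]
    scores = [reward(prompt, c) for c in candidates]
    best_idx = max(range(n), key=lambda i: scores[i])
    return candidates[best_idx]
```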
1
22
6
reposted by
Kyle Kastner
Tim G. J. Rudner
9 months ago
Congratulations to the
#AABI2025
Workshop Track Outstanding Paper Award recipients!
0
20
9
reposted by
Kyle Kastner
Sung Kim
9 months ago
Why not? "Reinforcement Learning for Reasoning in Large Language Models with One Training Example": applying RLVR to the base model Qwen2.5-Math-1.5B, they identify a single example that elevates model performance on MATH500 from 36.0% to 73.6%.
2
20
4
reposted by
Kyle Kastner
AI Firehose
9 months ago
Instruct-LF merges LLMs' instruction-following with statistical models, enhancing interpretability in noisy datasets and improving task performance by up to 52%.
https://arxiv.org/abs/2502.15147
Latent Factor Models Meets Instructions: Goal-conditioned Latent Factor Discovery without Task Supervision
https://arxiv.org/abs/2502.15147
0
2
1
reposted by
Kyle Kastner
Sung Kim
9 months ago
An incomplete list of Chinese AI: - DeepSeek:
www.deepseek.com
. You can also access AI models via API. - Moonshot AI's Kimi:
www.kimi.ai
- Alibaba's Qwen:
chat.qwen.ai
. You can also access AI models via API. - ByteDance's Doubao (only in Chinese):
www.doubao.com/chat/
1
22
7
reposted by
Kyle Kastner
lebellig
9 months ago
I really liked this approach by
@matthieuterris.bsky.social
et al. They propose learning a single lightweight model for multiple inverse problems by conditioning it on the forward operator A. Thanks to self-supervised fine-tuning, it can tackle unseen inverse problems.
https://arxiv.org/abs/2503.08915
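One common way to condition a reconstruction network on the forward operator A is to interleave data-consistency gradient steps x ← x − η Aᵀ(Ax − y) with small learned refiners; the sketch below illustrates that generic unrolled pattern and is not necessarily the architecture from the paper.

```python
# Sketch of operator-conditioned reconstruction via unrolled gradient steps
# (a generic pattern, not necessarily the model from arXiv:2503.08915).
import torch
import torch.nn as nn

class UnrolledReconstructor(nn.Module):
    def __init__(self, dim, n_iters=5, step=0.5):
        super().__init__()
        self.n_iters, self.step = n_iters, step
        self.refiners = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, dim))
             for _ in range(n_iters)]
        )

    def forward(self, y, A):
        # A: (m, dim) measurement matrix; y: (batch, m) measurements
        x = y @ A  # back-projection A^T y as initialization
        for refine in self.refiners:
            grad = (x @ A.T - y) @ A   # A^T (A x - y), the data-consistency gradient
            x = x - self.step * grad   # physics step (uses the operator explicitly)
            x = x + refine(x)          # learned residual refinement
        return x

dim, m = 32, 16
A = torch.randn(m, dim) / m ** 0.5
x_true = torch.randn(4, dim)
y = x_true @ A.T
model = UnrolledReconstructor(dim)
print(model(y, A).shape)  # (4, 32)
```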
0
7
1
reposted by
Kyle Kastner
Mattie Fellows
9 months ago
Excited to be presenting our spotlight ICLR paper Simplifying Deep Temporal Difference Learning today! Join us in Hall 3 + Hall 2B Poster #123 from 3pm :)
https://arxiv.org/pdf/2407.04811
0
7
1
reposted by
Kyle Kastner
9 months ago
Balinese text-to-speech dataset as digital cultural heritage
https://pubmed.ncbi.nlm.nih.gov/40275973/
0
1
1
reposted by
Kyle Kastner
Sung Kim
9 months ago
Kimi.ai
releases Kimi-Audio! Our new open-source audio foundation model advances capabilities in audio understanding, generation, and conversation. Paper:
github.com/MoonshotAI/K...
Repo:
github.com/MoonshotAI/K...
Model:
huggingface.co/moonshotai/K...
1
13
2
reposted by
Kyle Kastner
lebellig
9 months ago
Very cool article from Panagiotis Theodoropoulos et al:
https://arxiv.org/abs/2410.14055
Feedback Schrödinger Bridge Matching introduces a new method to improve transfer between two data distributions using only a small number of paired samples!
0
4
2
reposted by
Kyle Kastner
Arno Solin
9 months ago
Our
#ICLR2025
poster "Discrete Codebook World Models for Continuous Control" (Aidan Scannell, Mohammadreza Nakhaeinezhadfard, Kalle KujanpÀÀ, Yi Zhao, Kevin Luck, Arno Solin, Joni Pajarinen) ποΈ Hall 3 + Hall 2B #415, Thu 24 Apr 10 a.m. +08 β 12:30 p.m. +08 π Preprint:
arxiv.org/abs/2503.00653
2
10
3
reposted by
Kyle Kastner
arxiv cs.CV
9 months ago
Andrew Kiruluta: Wavelet-based Variational Autoencoders for High-Resolution Image Generation
https://arxiv.org/abs/2504.13214
0
1
1
reposted by
Kyle Kastner
9 months ago
7/ Large Language Models to Diffusion Finetuning. Paper:
openreview.net/forum?id=Wu5...
Workshop:
workshop-llm-reasoning-planning.github.io
New finetuning method empowering pre-trained LLMs with some of the key properties of diffusion models and the ability to scale test-time compute.
1
4
3
reposted by
Kyle Kastner
9 months ago
10/ Sakana AI Co-Founder and CEO, David Ha, will be giving a talk at the
#ICLR2025
World Models Workshop, at a panel to discuss the Current Development and Future Challenges of World Models. Workshop Website:
sites.google.com/view/worldmo...
0
12
4
reposted by
Kyle Kastner
arXiv cs.LG Machine Learning
9 months ago
Duy A. Nguyen, Quan Huu Do, Khoa D. Doan, Minh N. Do: Are you SURE? Enhancing Multimodal Pretraining with Missing Modalities through Uncertainty Estimation
https://arxiv.org/abs/2504.13465
https://arxiv.org/pdf/2504.13465
https://arxiv.org/html/2504.13465
1
1
1
reposted by
Kyle Kastner
arXiv cs.LG Machine Learning
9 months ago
Yixuan Even Xu, Yash Savani, Fei Fang, Zico Kolter: Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning
https://arxiv.org/abs/2504.13818
https://arxiv.org/pdf/2504.13818
https://arxiv.org/html/2504.13818
1
2
4
reposted by
Kyle Kastner
Richard McElreath
9 months ago
I have learned that some people have still not heard the Good News about Hamiltonian Monte Carlo. My gentle animated interactive explanation:
elevanth.org/blog/2017/11...
Markov Chains: Why Walk When You Can Flow?
In 1989, Depeche Mode was popular, the first version of Microsoft Office was released, large demonstrations brought down the wall separating East and West Germany, and a group of statisticians in the ...
https://elevanth.org/blog/2017/11/28/build-a-better-markov-chain/
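For anyone who wants to poke at the idea directly, a bare-bones HMC sketch (leapfrog integration plus a Metropolis accept step) for a 2D standard normal target; a toy, not the post's animation code.

```python
# Bare-bones Hamiltonian Monte Carlo for a 2D standard normal target.
import numpy as np

rng = np.random.default_rng(0)

def neg_log_prob(q):      # U(q) for a standard normal, up to a constant
    return 0.5 * np.sum(q ** 2)

def grad_neg_log_prob(q):
    return q

def hmc_step(q, step_size=0.2, n_leapfrog=20):
    p = rng.normal(size=q.shape)                 # resample momentum
    q_new, p_new = q.copy(), p.copy()
    # leapfrog integration of Hamiltonian dynamics
    p_new -= 0.5 * step_size * grad_neg_log_prob(q_new)
    for _ in range(n_leapfrog - 1):
        q_new += step_size * p_new
        p_new -= step_size * grad_neg_log_prob(q_new)
    q_new += step_size * p_new
    p_new -= 0.5 * step_size * grad_neg_log_prob(q_new)
    # Metropolis accept/reject on the total energy
    h_old = neg_log_prob(q) + 0.5 * np.sum(p ** 2)
    h_new = neg_log_prob(q_new) + 0.5 * np.sum(p_new ** 2)
    return q_new if np.log(rng.uniform()) < h_old - h_new else q

q = np.zeros(2)
samples = []
for _ in range(2000):
    q = hmc_step(q)
    samples.append(q)
print(np.mean(samples, axis=0), np.std(samples, axis=0))  # roughly [0, 0] and [1, 1]
```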
2
88
29
reposted by
Kyle Kastner
arXiv cs.LG Machine Learning
9 months ago
Akira Tamamori: Kernel Ridge Regression for Efficient Learning of High-Capacity Hopfield Networks
https://arxiv.org/abs/2504.12561
https://arxiv.org/pdf/2504.12561
https://arxiv.org/html/2504.12561
1
1
5
reposted by
Kyle Kastner
Lynn Cherny
9 months ago
Jack Morris's embzip
github.com/jxmorris12/e...
"efficiently compressing and decompressing embeddings using Product Quantization"
GitHub - jxmorris12/embzip
https://github.com/jxmorris12/embzip
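Product quantization in a nutshell, as a from-scratch sketch (not embzip's actual implementation): split each embedding into sub-vectors, run k-means in each sub-space, and store one small integer code per sub-vector instead of the floats.

```python
# From-scratch product quantization sketch (illustrative, not the embzip code).
# Each D-dim vector is split into M sub-vectors; each sub-space gets its own
# k-means codebook; a vector is then stored as M small integer codes.
import numpy as np
from sklearn.cluster import KMeans

def pq_train(X, n_subvectors=8, n_codes=256):
    n, d = X.shape
    sub = d // n_subvectors
    codebooks, codes = [], []
    for m in range(n_subvectors):
        block = X[:, m * sub:(m + 1) * sub]
        km = KMeans(n_clusters=n_codes, n_init=4, random_state=0).fit(block)
        codebooks.append(km.cluster_centers_)
        codes.append(km.labels_.astype(np.uint8))   # 256 codes fit in one byte
    return codebooks, np.stack(codes, axis=1)        # codes: (n, M) uint8

def pq_decode(codebooks, codes):
    return np.concatenate([cb[codes[:, m]] for m, cb in enumerate(codebooks)], axis=1)

X = np.random.default_rng(0).normal(size=(5000, 64)).astype(np.float32)
codebooks, codes = pq_train(X)
X_hat = pq_decode(codebooks, codes)
print(codes.shape, X_hat.shape)                      # (5000, 8) codes vs (5000, 64) floats
print("reconstruction MSE:", float(((X - X_hat) ** 2).mean()))
```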
0
6
3
reposted by
Kyle Kastner
Eugene Vinitsky
9 months ago
Pre-training for reasoning by doing RL on synthetic tasks, a position paper I've really enjoyed and am upset by
arxiv.org/abs/2502.19402
General Reasoning Requires Learning to Reason from the Get-go
Large Language Models (LLMs) have demonstrated impressive real-world utility, exemplifying artificial useful intelligence (AUI). However, their ability to reason adaptively and robustly -- the hallmar...
https://arxiv.org/abs/2502.19402
1
43
7
reposted by
Kyle Kastner
AI Firehose
9 months ago
DiTSE revolutionizes speech enhancement using latent diffusion transformers, delivering studio-quality audio while preserving speaker identity and minimizing content hallucination, transforming audio content creation and telecommunications.
https://arxiv.org/abs/2504.09381
DiTSE: High-Fidelity Generative Speech Enhancement via Latent Diffusion Transformers
https://arxiv.org/abs/2504.09381
0
1
1
reposted by
Kyle Kastner
Serge Belongie
9 months ago
A must-read for CV/NLP/ML grad students seeking wisdom on writing conference papers
2
34
3
reposted by
Kyle Kastner
AI Firehose
9 months ago
A key study presents Dynamic Importance Sampling for Constrained Decoding (DISC), enhancing efficiency and accuracy in large language models. This method reduces bias and optimizes constrained generation, broadening AI's practical real-world applications.
https://arxiv.org/abs/2504.09135
Efficient and Asymptotically Unbiased Constrained Decoding for Large Language Models
https://arxiv.org/abs/2504.09135
0
1
1
reposted by
Kyle Kastner
Hacker News 100
9 months ago
NoProp: Training neural networks without back-propagation or forward-propagation
https://arxiv.org/abs/2503.24322
https://news.ycombinator.com/item?id=43676837
NoProp: Training Neural Networks without Back-propagation or Forward-propagation
The canonical deep learning approach for learning requires computing a gradient term at each layer by back-propagating the error signal from the output towards each learnable parameter. Given the stacked structure of neural networks, where each layer builds on the representation of the layer below, this approach leads to hierarchical representations. More abstract features live on the top layers of the model, while features on lower layers are expected to be less abstract. In contrast to this, we introduce a new learning method named NoProp, which does not rely on either forward or backwards propagation. Instead, NoProp takes inspiration from diffusion and flow matching methods, where each layer independently learns to denoise a noisy target. We believe this work takes a first step towards introducing a new family of gradient-free learning methods, that does not learn hierarchical representations -- at least not in the usual sense. NoProp needs to fix the representation at each layer beforehand to a noised version of the target, learning a local denoising process that can then be exploited at inference. We demonstrate the effectiveness of our method on MNIST, CIFAR-10, and CIFAR-100 image classification benchmarks. Our results show that NoProp is a viable learning algorithm which achieves superior accuracy, is easier to use and computationally more efficient compared to other existing back-propagation-free methods. By departing from the traditional gradient based learning paradigm, NoProp alters how credit assignment is done within the network, enabling more efficient distributed learning as well as potentially impacting other characteristics of the learning process.
https://arxiv.org/abs/2503.24322
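My rough reading of that abstract as a toy sketch (not the paper's exact algorithm or hyperparameters): each layer receives the input plus a target noised at its own fixed level and is trained independently to denoise it, so no gradient ever crosses layer boundaries.

```python
# Rough sketch of layer-local denoising training in the spirit of the abstract
# above (not the paper's exact algorithm).
import torch
import torch.nn as nn

n_layers, in_dim, target_dim = 4, 32, 10
layers = nn.ModuleList(
    [nn.Sequential(nn.Linear(in_dim + target_dim, 64), nn.ReLU(), nn.Linear(64, target_dim))
     for _ in range(n_layers)]
)
# one optimizer per layer: no gradient ever crosses layer boundaries
opts = [torch.optim.Adam(l.parameters(), lr=1e-3) for l in layers]
noise_levels = torch.linspace(1.0, 0.1, n_layers)   # layer t denoises at its own level

def train_step(x, y_onehot):
    for t, (layer, opt) in enumerate(zip(layers, opts)):
        z_t = y_onehot + noise_levels[t] * torch.randn_like(y_onehot)  # noised target
        pred = layer(torch.cat([x, z_t], dim=-1))
        loss = ((pred - y_onehot) ** 2).mean()       # each layer learns to denoise locally
        opt.zero_grad(); loss.backward(); opt.step()

x = torch.randn(16, in_dim)
y = torch.nn.functional.one_hot(torch.randint(0, target_dim, (16,)), target_dim).float()
train_step(x, y)
```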
0
1
1
reposted by
Kyle Kastner
Max Slater
9 months ago
Monte Carlo methods require randomly sampling complicated domains, which can be difficult in and of itself. Part three (
thenumb.at/Sampling/
) discusses how to create samplers using rejection, inversion, and changes of coordinates.
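The two simplest members of that family, sketched quickly: inverse-transform sampling for an exponential (invert the CDF) and rejection sampling against a known bound on the density. These are toy versions of the techniques the post discusses, not code from the article.

```python
# Quick sketch of two samplers from the family discussed above: inversion and rejection.
import numpy as np

rng = np.random.default_rng(0)

def sample_exponential(lam, n):
    # inversion: if U ~ Uniform(0, 1), then -ln(1 - U) / lam ~ Exponential(lam)
    u = rng.uniform(size=n)
    return -np.log1p(-u) / lam

def sample_rejection(pdf, bound, lo, hi, n):
    # rejection: propose uniformly on [lo, hi], accept with probability pdf(x) / bound
    out = []
    while len(out) < n:
        x = rng.uniform(lo, hi)
        if rng.uniform() * bound < pdf(x):
            out.append(x)
    return np.array(out)

# example: truncated standard normal on [-3, 3], bounded by its peak value at 0
normal_pdf = lambda x: np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)
print(sample_exponential(2.0, 5))
print(sample_rejection(normal_pdf, normal_pdf(0.0), -3.0, 3.0, 5))
```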
3
70
14
Really, really cool work. You can try it yourself. Check the hygiene comments for some "adversarial" collaboration
huggingface.co/papers/2504....
add a skeleton here at some point
9 months ago
1
5
2
Really enjoyed ICASSP this year. Hopefully will get a chance to give a summary/shout out to some interesting work I saw there in the upcoming days. Looking forward to the next one!
9 months ago
0
1
0
reposted by
Kyle Kastner
Sung Kim
9 months ago
DDT: Decoupled Diffusion Transformer. They've come up with a more efficient way to use diffusion models to generate high-quality images by breaking the process down into separate, specialized components.
1
13
1
reposted by
Kyle Kastner
arxiv cs.CL
9 months ago
Gabriel Grand, Joshua B. Tenenbaum, Vikash K. Mansinghka, Alexander K. Lew, Jacob Andreas: Self-Steering Language Models
https://arxiv.org/abs/2504.07081
0
1
1
Dynamic evaluation (cf. Generating Sequences with Recurrent Neural Networks) is making a comeback under the umbrella of test-time methods
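For anyone who missed the original trick: dynamic evaluation keeps adapting the model's weights on the test tokens it has already scored before predicting the next chunk. A minimal sketch with a deliberately tiny toy model (an illustrative recipe, not any specific paper's settings):

```python
# Minimal dynamic evaluation sketch: adapt weights on already-seen test tokens
# before predicting the next chunk (toy model, illustrative recipe only).
import torch
import torch.nn as nn

vocab, dim = 100, 32
model = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))
loss_fn = nn.CrossEntropyLoss()

def dynamic_eval(model, token_stream, chunk=8, lr=1e-2):
    opt = torch.optim.SGD(model.parameters(), lr=lr)  # small online learning rate
    total_loss, n = 0.0, 0
    for start in range(0, len(token_stream) - chunk, chunk):
        x = token_stream[start:start + chunk]
        y = token_stream[start + 1:start + chunk + 1]
        logits = model(x)
        loss = loss_fn(logits, y)
        total_loss, n = total_loss + loss.item(), n + 1
        # the "dynamic" part: update on the chunk we just scored, then move on
        opt.zero_grad(); loss.backward(); opt.step()
    return total_loss / max(n, 1)

stream = torch.randint(0, vocab, (512,))
print("avg next-token loss under dynamic evaluation:", dynamic_eval(model, stream))
```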
9 months ago
0
1
0
reposted by
Kyle Kastner
Tommy Thompson
9 months ago
To clarify: Microsoft hasn't created an AI version of Quake, but an AI model that simulates the behaviour of Quake based on existing play data at a reduced resolution. I just covered this AI trend on
@aiandgames.com
last month.
youtu.be/9_oTroD9nzM?...
1
35
11
reposted by
Kyle Kastner
Kosta Derpanis
9 months ago
1
5
1
reposted by
Kyle Kastner
Sarath Chandar
10 months ago
Can better architectures & representations make self-play enough for zero-shot coordination? We explore this in our ICLR 2025 paper: A Generalist Hanabi Agent. We develop R3D2, the first agent to master all Hanabi settings and generalize to novel partners!
#ICLR2025
1/n
1
13
7
reposted by
Kyle Kastner
garreth
about 1 year ago
With Meta's recent paper replacing tokenization in LLMs with patches, I figured that it's a great time to revisit how tokenization has evolved over the years using everyone's favourite medium - memes! Let's take a trip down memory lane! [1/N]
4
33
14
reposted by
Kyle Kastner
arxiv cs.CV
10 months ago
Xiaohua Qi, Renda Li, Long Peng, Qiang Ling, Jun Yu, Ziyi Chen, Peng Chang, Mei Han, Jing Xiao: Data-free Knowledge Distillation with Diffusion Models
https://arxiv.org/abs/2504.00870
0
1
1