Horace He
@chhillee.bsky.social
👤 780
👥 60
13
@PyTorch "My learning style is Horace twitter threads" - @typedfemale
reposted by
Horace He
Yaron Minsky
10 months ago
@chhillee.bsky.social's talk at Jane Street is now up!
youtu.be/139UPjoq7Kw?...
Building Machine Learning Systems for a Trillion Trillion Floating Point Operations
YouTube video by Jane Street
https://youtu.be/139UPjoq7Kw?si=Ec_GvRcHM-NT_FTu
0
30
8
reposted by
Horace He
Mike Smith
10 months ago
Getting different attention masks working for AstroPT (a proto-foundation model for astronomy github.com/Smith42/astr...), so much nicer to do it with Flex Attention vs custom CUDA kernels -- thank you for releasing it to the world 🫡
GitHub - Smith42/astroPT: Transformer for galaxy images (and general astronomy)
Transformer for galaxy images (and general astronomy) - Smith42/astroPT
https://github.com/Smith42/astroPT
0
4
1
I judge social networks by how many FlexAttention users I can find on each one, and by that metric, Bluesky is doing pretty good!
10 months ago
1
50
1
If you'd like to influence what features the PyTorch distributed team works on in torchtitan (e.g. MoE, multimodal, context parallelism, etc.), go make your voices heard here!
Vote on new features! · pytorch/torchtitan · Discussion #693
Hi torchtitanists, Thank you for your interests in torchtitan! Please upvote on what features you would like to see next, and add one if it's not already there. We'll try to prioritize on the most ...
https://github.com/pytorch/torchtitan/discussions/693
10 months ago
0
11
1
First thought: Seems kinda "FlexAttention-y":
https://bsky.app/profile/sungkim.bsky.social/post/3lbjbfmyqts27
Second thought: oh cool, they're already using FlexAttention! it's a nice usage of the `or_masks` and `and_masks` API - I think they do (causal & sliding_window) | (register_mask)
10 months ago
0
9
0
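The composition guessed at in the last post, (causal & sliding_window) | (register_mask), can be sketched in plain Python. This is only an illustration, not the referenced model's code: the `and_masks` and `or_masks` below are simplified stand-ins for the helpers of the same name in `torch.nn.attention.flex_attention`, and the window size, the number of register tokens, and the idea that registers sit at the start of the sequence are all assumptions made for the example.

```python
# Plain-Python sketch of composing FlexAttention-style mask functions.
# A mask function takes (b, h, q_idx, kv_idx) and returns True where the
# query position is allowed to attend to the key/value position.

SLIDING_WINDOW = 3  # assumed window size, for illustration only
NUM_REGISTERS = 2   # assumed number of leading register tokens


def causal(b, h, q_idx, kv_idx):
    # A query may only look at itself and earlier positions.
    return q_idx >= kv_idx


def sliding_window(b, h, q_idx, kv_idx):
    # A query may only look at keys fewer than SLIDING_WINDOW steps behind it.
    return q_idx - kv_idx < SLIDING_WINDOW


def register_mask(b, h, q_idx, kv_idx):
    # Hypothetical: every query may attend to the leading register tokens.
    return kv_idx < NUM_REGISTERS


def and_masks(*mask_mods):
    # Stand-in for the PyTorch helper: allowed only if every mask allows it.
    def combined(b, h, q_idx, kv_idx):
        return all(m(b, h, q_idx, kv_idx) for m in mask_mods)
    return combined


def or_masks(*mask_mods):
    # Stand-in for the PyTorch helper: allowed if any mask allows it.
    def combined(b, h, q_idx, kv_idx):
        return any(m(b, h, q_idx, kv_idx) for m in mask_mods)
    return combined


# (causal & sliding_window) | (register_mask)
mask_mod = or_masks(and_masks(causal, sliding_window), register_mask)


def dense_mask(mask_mod, seq_len):
    # Materialize the full seq_len x seq_len mask for inspection.
    return [[mask_mod(0, 0, q, kv) for kv in range(seq_len)]
            for q in range(seq_len)]


if __name__ == "__main__":
    for row in dense_mask(mask_mod, 6):
        print("".join("x" if allowed else "." for allowed in row))
```

With PyTorch installed, the same three mask functions would be written over index tensors and combined with the library's own `and_masks` and `or_masks` instead of the stand-ins.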