I spent some time evaluating the best AI models on interactive block-building tasks. I am surprised by 1) how fragile these systems are when generating creative ideas or updating hypotheses, and 2) the vast, but often unnecessary, amount of knowledge and compute they are willing to throw at a task.
5 days ago