Working within the constraints of a 64K token window is interesting. It forces your management-foo into better task decomposition for the agent. Also, when you're using local inference (llama.cpp), you're not getting "instant" results: a unit of work usually takes ~12-15 minutes on a 32GB M1 Pro.
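For context, here's a minimal sketch of what that decomposition can look like with the llama-cpp-python bindings. The model path and sub-task list are hypothetical illustrations, not a specific scheme from the comment; the point is just that each exchange stays bounded so it fits inside the 64K window.

```python
# A minimal sketch, assuming the llama-cpp-python bindings and a
# hypothetical local GGUF model path.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/local-model-q4_k_m.gguf",  # hypothetical path
    n_ctx=65536,  # pin the context window at 64K tokens
)

# Instead of one sprawling prompt, feed the agent one bounded sub-task
# at a time so each exchange stays well under the 64K budget.
# (Illustrative sub-tasks, not from the original comment.)
subtasks = [
    "Summarize the failing test output in ./test.log.",
    "Propose a one-file fix for the top failure.",
    "Write the patch as a unified diff.",
]

for task in subtasks:
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": task}],
        max_tokens=1024,
    )
    print(out["choices"][0]["message"]["content"])
```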