Very interesting paper that is making the rounds on Bluesky, but please note that this issue applies to *proprietary* LLMs like the OpenAI ones. Running Llama locally was remarkably stable.
In general, using any kind of proprietary software, especially through an API, is bad for reproducibility.
over 1 year ago