I'm noticing a gradual accretion of errors in datasets, caused by the sloppy application of LMs to extract questions and process answers.
One example: a prompt applied to GPT-4 to extract questions, supporting material, and answers from a text. Error-prone, collating info from all over, with no error checking.
about 2 months ago