PhD student in Responsible NLP at the University of Edinburgh, curious about interpretability and alignment