Debugging misaligned completions with sparse-autoencoder latent attribution

alignment.openai.com

rd

an hour ago

