Refusal in Language Models Is Mediated by a Single Direction
arxiv.org
88 points by fagnerbrack 11 hours ago | 33 comments