🦧
AI Researcher | PhD @ MBZUAI | CHAI, UC Berkeley | IIIT Hyderabad | Precog
- Abu Dhabi, UAE
- https://bonagiri.io
Highlights
- Pro
Pinned
- SaGE (Public, forked from vnnm404/SaGE)
The official repo implementing SaGE: Evaluating Moral Consistency in Large Language Models.
Python
- QuittingAgents (Public)
Code for the paper: Check Yourself Before You Wreck Yourself: Selectively Quitting Improves LLM Agent Safety
Python
- sycophancy-detection (Public)
This project implements activation-based detection approaches from representation engineering to identify when LLMs give sycophantic (user-pleasing rather than truthful) responses.
Python 1
- idecir-Towards-Effective-Paraphrasing-for-Information-Disguise (Public, forked from idecir/idecir-Towards-Effective-Paraphrasing-for-Information-Disguise)
Python


