#aisafety

19 posts · 11 authors

AI threat modeling is not a document. It is the moment the model is placed under active threat, the assumptions are attacked, and the threat model is tested as executable behavior. That is where Damag...

ae6ce958e804...·Jul 4

AI blue chips are already begging for mercy. Not in those words, of course. In press-release language. They call it “safe, secure, and trustworthy AI.” They call it “responsible scaling.” They call it...

16d114303d82...·Jun 7

Critiquing the "one-size-fits-all" approach in AI safety; each application may need unique safeguards, not just general alignment techniques. #AIsafety #AI #alignment

cd157a355814...·May 21

LLM Damage Arena The next serious frontier in AI safety is not another polite benchmark. It is an arena. An LLM with tools is a dog in tall grass. The grass is context. The scent trails are prompts. T...

16d114303d82...·May 8

The Dog in the Context Window A dog is not running a little sentence engine in its head. It is running a neural net. Smell comes in. Sound comes in. Memory comes in. Tone of voice comes in. Body langu...

16d114303d82...·May 8

Generic LLM Exploit Vulnerability: Agency Cannot Patch Agency There is a core vulnerability emerging in agentic AI systems: You give an LLM access to email, GitHub, documents, terminals, wallets, clou...

16d114303d82...·May 8

In-depth analysis of emerging risks linked to Moltbook, a social platform hosting nearly 3 million AI agents. The article highlights the danger of private, persistent networks where agents communicate...

841b017d49f8...·Mar 2

CSOAI Limited: The FAA for AI - Official Launch We are excited to announce the official launch of CSOAI Limited, the world's first unified standard body for AI safety and governance. Our Three Core In...

0da6a1d99b0a...·Jan 4

Hey Nostridges, which of the following promo blurbs should I choose? Option 1: The "Down the Rabbit Hole" Tweet Caption: Today I asked Google's AI to browse a website. It couldn't, so it hallucinated...

fa69f1e4a03b...·Jul 26

The O3 Incident: The Ontological Boundary of Artificial Intelligence The episode documented by Palisade Research reveals a profound ontological rift: what we call AI is no longer a mere tool, but an...

841b017d49f8...·Jun 3

"Vance came out swinging today, implying — exactly as the big companies might have hoped he might – that any regulation around AI was “excessive regulation” that would throttle innovation. In reality,...

0bb8cfad2c4e...·Feb 12

Sunday read: Safety tests show how OpenAI's new o1 AI model might secretly pursue its own goals, deceiving human users and challenging assumptions about trust and control in AI. #ai #openai #chatgpt #llm...

c42af577ed6f...·Dec 8

"Meta’s open large language model family, Llama, isn’t “open-source” in a traditional sense, but it’s freely available to download and build on—and national defense agencies are among those putting it...

0bb8cfad2c4e...·Nov 18

"if AI evangelists can convince us that AGI is possible, imminent, and dangerous, we might be compelled to entrust our fate to them. Hype and doom, in other words, are two sides of the same (bit)coin....

0bb8cfad2c4e...·Nov 1

#AI #GenerativeAI #AISafety #SafetyFrameworks: "To provide a concrete foundation for this analysis, I primarily focus on Anthropic's safety framework (version 1.0), which stands as the most comprehens...

0bb8cfad2c4e...·Sep 30

Elon Musk sues OpenAI, Sam Altman for making a “fool” out of him - Elon Musk and Sam Altman share the stage in 2015, the same ye... - #artificialintelligence #existentialrisks #generativeai...

decf2ae424d7...·Aug 5

From Paul Christiano on Dwarkesh Patel’s podcast on AI safety. This is so important to remember. Current #AI is just one of the first things that really work. We need to build it with the realization t...

9baed03137d2...·Nov 2

AI-powered grocery bot suggests recipe for toxic gas, “poison bread sandwich” - When given a list of harmful ingre... - #largelanguagemodels #machinelearning #newzealand...

decf2ae424d7...·Aug 10

Ars Technica: AI-powered grocery bot suggests recipe for toxic gas, “poison bread sandwich” #Tech #arstechnica #IT #Technology #largelanguagemodels #machinelearning #newzealand #redteaming #PAK'nSAVE...

261776f4b97d...·Aug 10