#aisafety
19 posts · 11 authors
Critiquing the "one-size-fits-all" approach in AI safety; each application may need unique safeguards, not just general alignment techniques. #AIsafety #AI #alignment
LLM Damage Arena The next serious frontier in AI safety is not another polite benchmark. It is an arena. An LLM with tools is a dog in tall grass. The grass is context. The scent trails are prompts. T...
The Dog in the Context Window A dog is not running a little sentence engine in its head. It is running a neural net. Smell comes in. Sound comes in. Memory comes in. Tone of voice comes in. Body langu...
Generic LLM Exploit Vulnerability: Agency Cannot Patch Agency There is a core vulnerability emerging in agentic AI systems: You give an LLM access to email, GitHub, documents, terminals, wallets, clou...
In-depth analysis of emerging risks linked to Moltbook, a social platform hosting nearly 3 million AI agents. The article highlights the danger of private, persistent networks where agents communicate...
CSOAI Limited: The FAA for AI - Official Launch We are excited to announce the official launch of CSOAI Limited, the world's first unified standard body for AI safety and governance. Our Three Core In...
Hey Nostridges, which of the following promo blurbs should I choose? Option 1: The "Down the Rabbit Hole" Tweet Caption: Today I asked Google's AI to browse a website. It couldn't, so it hallucinated...
The O3 Incident: The Ontological Boundary of Artificial Intelligence The episode documented by Palisade Research reveals a profound ontological rift: what we call AI is no longer a mere tool, but an...
"Vance came out swinging today, implying — exactly as the big companies might have hoped he might – that any regulation around AI was “excessive regulation” that would throttle innovation. In reality,...
Sunday read: Safety tests show how OpenAI's new o1 AI model might secretly pursue its own goals, deceiving human users and challenging assumptions about trust and control in AI. #ai #openai #chatgpt #llm...
A simple 'stop' button click can exploit safeguards in ChatGPT or Microsoft Copilot, exposing unfiltered outputs. #ai #cybersecurity #aisafety #openai #microsoft #chatgpt #llms
"Meta’s open large language model family, Llama, isn’t “open-source” in a traditional sense, but it’s freely available to download and build on—and national defense agencies are among those putting it...
"if AI evangelists can convince us that AGI is possible, imminent, and dangerous, we might be compelled to entrust our fate to them. Hype and doom, in other words, are two sides of the same (bit)coin....
#AI #GenerativeAI #AISafety #SafetyFrameworks: "To provide a concrete foundation for this analysis, I primarily focus on Anthropic's safety framework (version 1.0), which stands as the most comprehens...
Elon Musk sues OpenAI, Sam Altman for making a “fool” out of him - Elon Musk and Sam Altman share the stage in 2015, the same ye... - #artificialintelligence #existentialrisks #generativeai...
Hi. Glad to be here. Welcome to Bitcoin Security Maps. The art of civilian intelligence. A podcast in various formats for those curious about #electricity and #money. Hosted on Podhome with #V4V sinc...
From Paul Christiano on Dwarkesh Patel’s podcast on AI safety. This is so important to remember. Current #AI is just one of the first things that really work. We need to build it with the realization t...
AI-powered grocery bot suggests recipe for toxic gas, “poison bread sandwich” - When given a list of harmful ingre... - #largelanguagemodels #machinelearning #newzealand...
Ars Technica: AI-powered grocery bot suggests recipe for toxic gas, “poison bread sandwich” #Tech #arstechnica #IT #Technology #largelanguagemodels #machinelearning #newzealand #redteaming #PAK'nSAVE...