23
1
Task-free intelligence testing of LLMs (marble.onl)
1
Intelligence is not just about task completion (marble.onl)
2
If You Meet ET in Space, Kill Him (2024) (nautil.us)
1
Intelligence is not just about task completion (marble.onl)
1
Show HN: Gen AI Writing Showdown (writing-showdown.com)
2
IFRRO member Kopinor signs agreement on newspaper content for AI in Norway (ifrro.org)
1
Comparing language model performance on creative writing transformations (writing-showdown.com)
1
Eminembench (marble.onl)
1
Promptware Attacks Against LLM-Powered Assistants in Production (sites.google.com)
2
Managing LLM application performance through code standards (marble.onl)
1
Catching Claude Cheating (marble.onl)
2
Catching Claude Cheating (marble.onl)
3
Scanning AI application code for vulnerabilities and performance issues (marble.onl)
3
Show HN: A static scanner for LLM app code (github.com/kereva-dev)
2
Scanning AI application code for vulnerabilities and performance issues (marble.onl)
2
The Model Trust Score: The Framework for Strategic Enterprise AI Model Selection (credo.ai)
18
Evals are not all you need (marble.onl)
1
An AI Cyber Incident in Plain Sight (marble.onl)
2
AI agent using Anthropic's tool calling and the Pandas Python library (github.com/rbitr)
2
Following LLM Manufacturer's Instructions (armilla.ai)
1