AI Research

GPT-5.5 as Scientific Co-Researcher: Ramsey Proofs, Gene Expression, and Cyber Defense

April 29, 2026 5 minutes read 68

GPT-5.5's Lean-verified Ramsey proof, BixBench score, CyberGym result, and High Preparedness rating show both scientific upside and cyber risk.

AI Research

ScreenMemory: Measured Improvements in Visual Desktop Control

March 23, 2026 24 minutes read 231

Region-Targeted Capture, Structural UI Automation, Semantic Navigation, and Accessibility-Tree Compression

Independent Research, June 2026

Project: ScreenMemory, an autonomous desktop agent perception stack
Benchmarks: Independent reproduction scripts available on request

Abstract: Autonomous desktop agents remain far below human performance on real-world operating-system

AI Research

MedAgentBench and the Clinical AI Frontier: Stanford’s Benchmark for Healthcare Agent Safety

February 21, 2026 9 minutes read 144

Stanford University’s benchmark suite exposes a critical gap between medical knowledge and clinical execution: frontier AI models achieve 69.67% success in realistic EHR workflows, until systematic architectural redesigns — including extractive memory, tool abstraction, and mandatory human-in-the-loop gating — push reliability to 98%. From
