Evaluating AI Is Harder Than Building It. Posted September 25, 2025 by Andrew Gostishchev. Categories: ai, ai-evaluation, ai-evaluation-frameworks, ai-model-evaluation, ai-rankings, language-model-evaluation, model-evaluation-framework, pre-agentic-era
A Quantitative and Qualitative Analysis of the SymTax Citation Recommendation Model. Posted August 26, 2025 by Hyperbole. Categories: ablation-study, ai-evaluation, citation-recommendation, explainable-ai, hyperbolic-embeddings, information-retrieval, nlp, scientometrics
The Fine Print of Misbehavior: VRP’s Blueprint and Safety Stance. Posted August 11, 2025 by Large Models (dot tech). Categories: adversarial-ai-research, ai-evaluation, ai-model-security, ethical-ai-attacks, mllm-jailbreak, role-play-attack, text-moderation, vrp-methodology
Strategic LLM Training: Multi-Token Prediction’s Data Efficiency in Mathematical Reasoning. Posted July 23, 2025 by Cosmological thinking: time, space and universal causation. Categories: ai-evaluation, ai-optimization, llm-performance, llm-training, multi-token-llm, multi-token-prediction, natural-language-math, transformer-models
The New Gold Standard in AI Evaluation: How “Agent-as-a-Judge” Changes Everything. Posted March 20, 2025 by Sai Jeevan Puchakayala. Categories: agents, ai-evaluation, artificial-intelligence, llm-as-a-judge, prompt-engineering
The HackerNoon Newsletter: Lumoz Unveils TEE+ZK Multi-Proof for On-chain AI Agent (1/19/2025). Posted January 19, 2025 by Noonification. Categories: ai-evaluation, bitcoin, crypto-market-highlights, firewall, hackernoon, hackernoon-newsletter, latest-tect-stories, mx-token-growth, noonification, product-management, recording-software
Lumoz Unveils TEE+ZK Multi-Proof for On-chain AI Agent. Posted January 10, 2025 by Lumoz (formerly Opside). Categories: ai-agents, ai-evaluation, good-company, lumoz, lumoz-tee, on-chain-ai-agent, tee-zk, trusted-execution-environment