OpenAI's LLM Learns to Admit Faults
OpenAI's latest research reveals LLMs can confess errors, enhancing AI trustworthiness.
Matrix speeds up synthetic data generation through decentralized control, substantially improving token throughput.
A tutorial on building an AI framework that analyzes literature, generates hypotheses, plans experiments, simulates results, and reports findings.
C2S-Scale 27B turns scRNA-seq profiles into rank-ordered 'cell sentences' so LLMs can analyze cell states. The model predicted and bench-validated that CK2 inhibition combined with low-dose interferon increases MHC-I antigen presentation by about 50% in vitro.
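A minimal sketch of the rank-ordering idea behind cell sentences, assuming a toy expression vector and a matching gene-name list; C2S-Scale's actual preprocessing pipeline may differ:

    import numpy as np

    def to_cell_sentence(expression, gene_names, top_k=100):
        # Rank genes by expression (descending) and join the top_k gene
        # symbols into a space-separated "cell sentence" an LLM can read.
        order = np.argsort(expression)[::-1][:top_k]
        return " ".join(gene_names[i] for i in order)

    genes = ["CD3D", "GNLY", "NKG7", "B2M", "HLA-A"]  # toy profile
    counts = np.array([0.0, 7.0, 3.0, 12.0, 5.0])
    print(to_cell_sentence(counts, genes, top_k=3))   # -> B2M GNLY HLA-A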
Google released an experimental Python MCP server that exposes read-only Google Ads API tools (search via GAQL and list_accessible_customers) for LLM agents to query campaign data without custom SDKs.
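A hedged sketch of how an agent-side client could call such a server over stdio using the official MCP Python SDK; the launch command, customer ID, and the search tool's argument names are assumptions, though the GAQL itself is standard Google Ads query language:

    import asyncio
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    # Standard GAQL: clicks per campaign over the last 30 days.
    GAQL = ("SELECT campaign.id, campaign.name, metrics.clicks "
            "FROM campaign WHERE segments.date DURING LAST_30_DAYS")

    async def main():
        # Placeholder launch command; use the server's documented invocation.
        server = StdioServerParameters(command="python", args=["google_ads_server.py"])
        async with stdio_client(server) as (read, write):
            async with ClientSession(read, write) as session:
                await session.initialize()
                # Tool name taken from the summary; argument names are assumed.
                result = await session.call_tool(
                    "search", arguments={"customer_id": "1234567890", "query": GAQL}
                )
                print(result)

    asyncio.run(main())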
IBM released Granite 4.0, a hybrid Mamba-2/Transformer LLM family that cuts serving memory by over 70% for long-context inference while keeping strong instruction-following and tool-use performance.
Learn how asyncio lets you run LLM API calls concurrently to cut waiting times and improve AI app performance in real scenarios.
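A minimal sketch of the pattern with the OpenAI Python SDK's async client (any async-capable LLM client works the same way); the model choice and prompts are illustrative:

    import asyncio
    from openai import AsyncOpenAI  # assumes openai>=1.x installed

    client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

    async def complete(prompt: str) -> str:
        resp = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    async def main():
        prompts = ["Summarize asyncio.", "Define a coroutine.", "What is an event loop?"]
        # gather() runs all calls concurrently: total latency tracks the
        # slowest call rather than the sum of all calls in a sequential loop.
        answers = await asyncio.gather(*(complete(p) for p in prompts))
        for a in answers:
            print(a[:80])

    asyncio.run(main())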
Google's RLM treats regression as language modeling, letting compact LLMs predict cluster performance directly from serialized logs and configs with high accuracy and uncertainty estimates.
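A small sketch of the serialize-then-predict idea, with hypothetical field names and llm() standing in for any text-completion call; Google's actual RLM formatting and decoding are more elaborate:

    import json
    import re

    def serialize(record: dict) -> str:
        # Flatten a config/log record into a deterministic text prompt.
        body = json.dumps(record, sort_keys=True)
        return f"Config and logs: {body}\nPredicted throughput (QPS):"

    def parse_number(text: str) -> float:
        # Take the first numeric token the model emits.
        match = re.search(r"-?\d+(?:\.\d+)?", text)
        if match is None:
            raise ValueError(f"no number in model output: {text!r}")
        return float(match.group())

    record = {"cpu": 16, "replicas": 4, "cache_mb": 512}  # toy record
    prompt = serialize(record)
    # qps = parse_number(llm(prompt))  # llm() is a stand-in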
Learn how JSON prompting turns vague instructions into precise, machine-readable requests for LLMs, with Python examples comparing free-form and JSON outputs to show the gains in consistency and integration.
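A minimal sketch of that contrast using the OpenAI Python SDK and its JSON mode; the model and keys are illustrative:

    import json
    from openai import OpenAI  # assumes openai>=1.x

    client = OpenAI()
    review = "Battery life is great but the screen scratches easily."

    # Free-form: output shape varies run to run and needs ad-hoc parsing.
    free = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"What are the pros and cons? {review}"}],
    )
    print(free.choices[0].message.content)  # shape varies

    # JSON prompting: name the exact keys and enforce JSON mode, then json.loads().
    structured = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": f'Return JSON with keys "pros" and "cons" (lists of strings) for: {review}',
        }],
    )
    data = json.loads(structured.choices[0].message.content)
    print(data["pros"], data["cons"])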
For banks and insurers in 2025, prefer SLMs for latency-sensitive extraction and internal workflows and reserve LLMs for long-context synthesis and complex multi-step reasoning; governance and NIST-aligned controls are mandatory.
Hugging Face released AI Sheets, a free open-source no-code spreadsheet that integrates with open-source LLMs for building, cleaning, and enriching datasets, available in-browser or for local deployment.
Mixture-of-Agents (MoA) arranges specialized LLM agents in layered pipelines to produce more accurate and interpretable results on multi-step tasks, outperforming single monolithic models on benchmarks.
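A compact sketch of the layered-pipeline idea, with llm(system, user) standing in for any chat-completion call; the MoA paper's actual prompts and layer counts differ:

    def mixture_of_agents(question, llm, proposer_prompts, n_layers=2):
        # Layer 1: each specialized agent answers independently.
        answers = [llm(p, question) for p in proposer_prompts]
        # Later layers: each agent refines its answer given the others'.
        for _ in range(n_layers - 1):
            context = "\n\n".join(f"Answer {i+1}: {a}" for i, a in enumerate(answers))
            answers = [
                llm(p, f"{question}\n\nPrior answers:\n{context}\nRefine your answer.")
                for p in proposer_prompts
            ]
        # Aggregator: synthesize the drafts into one final answer.
        drafts = "\n\n".join(answers)
        return llm("Synthesize the best single answer from the drafts.",
                   f"{question}\n\nDrafts:\n{drafts}")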
Anthropic proposes persona vectors, a method for detecting and controlling personality shifts in large language models, enhancing their reliability and safety.
Google AI introduces LangExtract, a powerful open-source Python library that extracts structured and traceable data from unstructured text using LLMs like Gemini.
ByteDance introduces Seed-Prover, a lemma-centric system for automated mathematical theorem proving that solves 5 of 6 IMO 2025 problems and performs strongly across multiple benchmarks.
Discover how context engineering advances large language models beyond prompt engineering with innovative techniques, system architectures, and future research directions.
Anthropic's new research reveals that activating 'evil' behavior patterns during training can prevent large language models from adopting harmful traits, improving safety without compromising performance.
TransEvalnia leverages prompting-based reasoning with large language models to provide detailed, human-aligned translation evaluations, outperforming traditional metrics on multiple language pairs.
This tutorial walks through building a modular text analysis pipeline with LangGraph, incorporating classification, entity extraction, summarization, sentiment analysis, and advanced conditional flow control.
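A stripped-down sketch of the LangGraph wiring (typed state, nodes returning partial updates, one conditional edge); node bodies are stubs rather than the tutorial's actual LLM calls:

    from typing import TypedDict
    from langgraph.graph import StateGraph, END  # assumes langgraph installed

    class State(TypedDict):
        text: str
        category: str
        summary: str

    def classify(state: State) -> dict:
        # Stub: an LLM call goes here; nodes return only the fields they update.
        return {"category": "news" if "reuters" in state["text"].lower() else "blog"}

    def summarize(state: State) -> dict:
        return {"summary": state["text"][:100]}  # stand-in for an LLM summary

    graph = StateGraph(State)
    graph.add_node("classify", classify)
    graph.add_node("summarize", summarize)
    graph.set_entry_point("classify")
    # Conditional flow control: route on the classifier's output.
    graph.add_conditional_edges(
        "classify",
        lambda s: "summarize" if s["category"] == "news" else END,
        {"summarize": "summarize", END: END},
    )
    graph.add_edge("summarize", END)
    app = graph.compile()
    print(app.invoke({"text": "Reuters reports ...", "category": "", "summary": ""}))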
A new study reveals that longer reasoning in large language models can degrade performance by causing distraction, overfitting, and alignment issues, challenging the idea that more computation always leads to better results.
This tutorial demonstrates building a medical knowledge graph from unstructured patient logs using GPT-4o-mini and Python, enabling efficient extraction and visualization of medical insights.
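A hedged sketch of the extraction-to-graph step, pairing the OpenAI SDK's JSON mode with networkx; the prompt, model, and triple schema are illustrative rather than the tutorial's exact code:

    import json
    import networkx as nx
    from openai import OpenAI  # assumes openai>=1.x

    client = OpenAI()

    def extract_triples(log: str) -> list:
        # Ask the model for (subject, relation, object) triples as JSON.
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            response_format={"type": "json_object"},
            messages=[{"role": "user", "content":
                'Return JSON {"triples": [[subject, relation, object], ...]} '
                f"for this patient log: {log}"}],
        )
        return json.loads(resp.choices[0].message.content)["triples"]

    G = nx.DiGraph()
    for s, r, o in extract_triples("Pt reports chest pain; prescribed aspirin 81mg."):
        G.add_edge(s, o, relation=r)  # e.g. patient -[prescribed]-> aspirin
    print(G.edges(data=True))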
TikTok researchers have launched SWE-Perf, the first benchmark designed to assess LLMs' ability to optimize code performance across entire repositories, revealing how far current AI still trails human experts.
Master-RM is a new reward model designed to fix vulnerabilities in LLM-based evaluators by reducing false positives caused by superficial cues, ensuring more reliable reinforcement learning outcomes.
MemAgent introduces a reinforcement learning-based memory agent that allows large language models to process ultra-long documents efficiently, maintaining high accuracy with linear computational costs.
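A toy sketch of the chunked-reading loop: a bounded memory string is rewritten after each chunk, which is where the linear cost comes from; llm(prompt) is a stand-in, and MemAgent's RL-trained update policy is not reproduced here:

    def read_with_memory(document, question, llm, chunk_size=4000, mem_limit=2000):
        # No prompt ever holds the full document, only one chunk plus the
        # bounded memory, so cost grows linearly with document length.
        memory = ""
        for start in range(0, len(document), chunk_size):
            chunk = document[start:start + chunk_size]
            memory = llm(
                f"Question: {question}\nMemory so far: {memory}\n"
                f"New chunk: {chunk}\n"
                f"Rewrite the memory (max {mem_limit} chars), keeping only "
                "facts relevant to the question."
            )[:mem_limit]
        return llm(f"Question: {question}\nMemory: {memory}\nAnswer:")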
FlexOlmo introduces a modular framework that allows training large language models on private datasets without data sharing, achieving strong performance while respecting data governance and privacy constraints.
EG-CFG introduces real-time execution feedback into code generation, significantly improving performance on major benchmarks and surpassing leading models like GPT-4.
NVIDIA's Canary-Qwen-2.5B model sets a new benchmark in speech recognition with a record low Word Error Rate and fast processing speed. This open-source, commercially licensed hybrid ASR-LLM model enables advanced audio transcription and language understanding.
Discover how to leverage Mirascope and OpenAI's GPT-4o model to identify and remove semantically duplicate customer reviews, enhancing feedback clarity.
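For the core idea, a hedged sketch using embeddings and cosine similarity via the plain OpenAI SDK; the article's Mirascope-based approach differs, and the 0.9 threshold here is arbitrary:

    import numpy as np
    from openai import OpenAI  # assumes openai>=1.x

    client = OpenAI()

    def dedupe(reviews: list, threshold: float = 0.9) -> list:
        # Embed every review, then drop any review too similar to one
        # that has already been kept.
        resp = client.embeddings.create(model="text-embedding-3-small", input=reviews)
        vecs = np.array([d.embedding for d in resp.data])
        vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # unit norm
        kept, kept_vecs = [], []
        for review, v in zip(reviews, vecs):
            if all(float(v @ k) < threshold for k in kept_vecs):
                kept.append(review)
                kept_vecs.append(v)
        return kept

    print(dedupe(["Fast shipping!", "Shipping was really fast.", "Screen is dim."]))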
ByteDance has released Trae Agent, an AI-powered software engineering assistant leveraging large language models to simplify complex coding tasks through a natural language CLI interface.
Context engineering enhances AI performance by optimizing the input data fed to large language models, enabling more accurate and context-aware outputs across various applications.
AbstRaL uses reinforcement learning to teach LLMs abstract reasoning, significantly improving their robustness and accuracy on varied GSM8K math problems compared to traditional methods.
Thought Anchors is a new framework that improves understanding of reasoning processes in large language models by analyzing sentence-level contributions and causal impacts.
TNG Technology Consulting releases DeepSeek-TNG R1T2 Chimera, an Assembly-of-Experts LLM that delivers roughly twice the speed of DeepSeek-R1-0528 with improved reasoning, available now under an MIT license.
Baidu releases ERNIE 4.5, a series of open-source large language models scaling from 0.3 billion to 424 billion parameters, featuring advanced architectures and strong multilingual capabilities.
A new study reveals that large reasoning models, while powerful, expose sensitive information through their reasoning traces, highlighting significant privacy risks in AI personal assistants.
ByteDance researchers introduce ProtoReasoning, a new framework leveraging logic-based prototypes to significantly improve reasoning and planning abilities in large language models across various domains.
VERINA introduces a holistic benchmark for evaluating LLMs on verifiable code generation, combining code, formal specifications, and proofs across diverse difficulty levels.
New research from Apple reveals why Large Language Models tend to overthink simple puzzles but struggle and give up on complex ones, highlighting challenges in AI reasoning capabilities.
Mistral AI introduces the Magistral series, a new generation of large language models optimized for reasoning and multilingual support, available in both open-source and enterprise versions.
Meta introduces Llama Prompt Ops, a Python package that automates the conversion and optimization of prompts for Llama models, easing transition from proprietary LLMs and improving prompt performance.
Apple and Duke researchers introduce Interleaved Reasoning, a reinforcement learning method that allows LLMs to produce intermediate answers, significantly boosting response speed and accuracy in complex tasks.
Meta introduces KernelLLM, an 8-billion-parameter model that automates converting PyTorch modules into efficient Triton GPU kernels, outperforming larger models in kernel generation benchmarks.
Salesforce Research introduces UAEval4RAG, a new benchmark framework that evaluates RAG systems' ability to reject unanswerable queries across diverse categories, enhancing the reliability of AI responses.
DeepSeek-V3 introduces innovative architecture and hardware co-design strategies that drastically improve efficiency and scalability in large language models, making high-performance AI more accessible.
JetBrains has open-sourced Mellum, a 4-billion-parameter language model specialized for programming tasks, aiming to improve AI-assisted software development.
Researchers have introduced SICA, a novel coding agent capable of iteratively improving its own code and performance, demonstrating significant gains on software engineering benchmarks.
OpenPipe’s ART·E uses reinforcement learning to deliver faster, cheaper, and more accurate email question-answering, outperforming OpenAI’s o3 agent in key metrics.
Alibaba's Qwen3 introduces a new generation of large language models that excel in hybrid reasoning, multilingual understanding, and efficient scalability, setting new standards in AI performance.
Discover a practical tutorial on implementing the Model Context Protocol to manage context effectively for large language models using semantic chunking and dynamic token management.
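A simplified sketch of the two ideas the tutorial names, token-aware chunking and a context budget, using tiktoken for counting; the function names and sentence-splitting heuristic are mine, not the tutorial's:

    import tiktoken  # assumes tiktoken installed

    enc = tiktoken.get_encoding("cl100k_base")

    def chunk_by_tokens(text: str, max_tokens: int = 300) -> list:
        # Greedy packing: keep sentences together, start a new chunk when
        # the running token count would exceed the budget.
        chunks, current, count = [], [], 0
        for sentence in text.split(". "):
            n = len(enc.encode(sentence))
            if current and count + n > max_tokens:
                chunks.append(". ".join(current))
                current, count = [], 0
            current.append(sentence)
            count += n
        if current:
            chunks.append(". ".join(current))
        return chunks

    def fit_context(chunks: list, budget: int) -> list:
        # Dynamic token management: include chunks until the budget is spent.
        picked, used = [], 0
        for c in chunks:
            n = len(enc.encode(c))
            if used + n > budget:
                break
            picked.append(c)
            used += n
        return picked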
ByteDance unveils QuaDMix, a unified framework that enhances large language model pretraining by jointly optimizing data quality and diversity, leading to significant performance gains.
Google DeepMind introduces QuestBench, a benchmark designed to evaluate how well large language models identify missing information in complex reasoning tasks and generate necessary clarifying questions.
Researchers from Tsinghua University and Shanghai AI Lab introduce TTRL, a novel method allowing large language models to improve their performance without labeled data by leveraging self-generated pseudo-rewards during inference.