AI agents
media r/LocalLLaMA · 8d ago

Local LLM-powered RPG with persistent generated content

A developer released a local LLM-powered RPG in which NPCs, locations, items, and quests are generated as persistent in-game objects that can be revisited and interacted with. The game builds LLMs into core RPG mechanics such as dialogue, narration, and quest progression, while also handling inventory, combat, and saves. It sold about 1,800 copies in its first week and has a 4.0 store rating, which suggests player interest in AI-driven RPG experiences.

arxiv arXiv cs.LG · 8d ago

LegalHalluLens: Auditing Hallucinations in Legal AI

LegalHalluLens introduces a framework to audit AI hallucinations in legal contexts by analyzing typed hallucination profiles across four claim categories. It reveals a 38-40 point gap between obligation/numeric and temporal claims, and shows that two systems with identical 52% hallucination rates can have opposite risk directions. The framework uses a Risk Direction Index and calibrated debate pipelines to reduce fabricated detections by 45%, offering actionable diagnostics for trustworthy legal AI deployment.

arxiv arXiv cs.LG · 8d ago

Compositional Generalization in Language Model Reasoning

A hierarchical latent selection model shows that supervised fine-tuning (SFT) and reinforcement learning (RL) work together to enable compositional generalization in language models. SFT supplies the raw modules, while RL identifies atomic modules in compound traces and recombines them to solve new problems. Training on compound traces leads to stronger generalization than training on isolated modules. The authors identify an effective protocol in which SFT ensures module coverage and RL drives exploration of novel compositions.

arxiv arXiv cs.LG · 8d ago

OmniPlan: Adaptive Framework for Timely and Near-Optimal Network Planning

OmniPlan introduces an adaptive framework that converts natural-language user intents into quantifiable preferences using a large language model. It dynamically selects among mixed integer programming, heuristics, and deep reinforcement learning experts to achieve both timeliness and near-optimality in network planning. Evaluations on distributed machine learning workloads show up to 97.8% latency reduction and 11.5% lower resource consumption.
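
The expert-selection idea can be illustrated with a minimal sketch. This is not OmniPlan's actual selector: the `Expert` fields, the latency and gap estimates, and the `latency_weight` preference (standing in for what the LLM would extract from a user intent) are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Expert:
    name: str
    est_latency_s: float  # hypothetical expected solve time in seconds
    est_gap: float        # hypothetical expected optimality gap (0.0 = optimal)

def select_expert(experts, deadline_s, latency_weight):
    """Pick the lowest-cost expert that is expected to meet the deadline.

    latency_weight in [0, 1] is an assumed preference: 1.0 means only
    timeliness matters, 0.0 means only solution quality matters.
    """
    feasible = [e for e in experts if e.est_latency_s <= deadline_s]
    if not feasible:
        # No expert meets the deadline, so fall back to the fastest one.
        return min(experts, key=lambda e: e.est_latency_s)

    def cost(e):
        timeliness = e.est_latency_s / deadline_s
        return latency_weight * timeliness + (1 - latency_weight) * e.est_gap

    return min(feasible, key=cost)

# Illustrative numbers only, not taken from the paper.
EXPERTS = [
    Expert("mip", 120.0, 0.0),
    Expert("heuristic", 0.5, 0.15),
    Expert("drl", 2.0, 0.05),
]
```

With a loose deadline and a quality-only preference the sketch picks the exact solver; with a tight deadline it falls back to a faster approximate expert.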

arxiv arXiv cs.LG · 8d ago

Embedded ML Workflow for Microcontroller Edge Devices

This paper outlines a systems-oriented workflow for embedded machine learning on microcontroller-class devices. It details key engineering decisions such as data sampling, feature extraction, class imbalance validation, model-runtime co-design, and streaming deployment, using inertial motion recognition and keyword spotting as case studies. The work provides practical design rules for robust on-device inference, including data curation, quantization, thresholding, scheduling, and field monitoring.

arxiv arXiv cs.LG · 8d ago

Flash Endurance as Depreciating Capital in Robot Memory

A robot's flash memory degrades with each write, which makes its endurance a non-renewable asset. A wear-aware pricing model uses a shadow price $η$ to guide memory placement across RAM, NVM, and cloud, with the optimal routing depending on whether task value increases with memory persistence. The sign of the value-write association $χ$ varies by deployment: positive in long-horizon manipulation, null in short-horizon tasks, and negative in teleoperation. The endurance budget is binding only on low-end QLC/eMMC memory. Wear-aware routing aligns with task value, but actual performance improvements have not yet been verified on data.
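
A minimal sketch of shadow-price routing follows, assuming a per-byte wear charge that applies only to the NVM tier. The function name, the tier labels, and the per-tier task values are hypothetical and are not the paper's formulation.

```python
def route_write(value_by_tier, size_bytes, eta):
    """Return the tier with the highest net value for one memory write.

    value_by_tier: hypothetical task value of storing the item in each tier.
    eta: shadow price per byte written to flash; in this sketch only the
         "nvm" tier consumes endurance, so only it pays the wear charge.
    """
    def net_value(tier):
        wear_charge = eta * size_bytes if tier == "nvm" else 0.0
        return value_by_tier[tier] - wear_charge

    return max(value_by_tier, key=net_value)

# Illustrative values only: persistence is assumed to be worth more than RAM.
values = {"ram": 1.0, "nvm": 3.0, "cloud": 2.5}
print(route_write(values, size_bytes=1000, eta=0.0))    # wear is free
print(route_write(values, size_bytes=1000, eta=0.001))  # wear is priced
```

When $η$ is zero the write goes to whichever tier has the highest raw value; as $η$ rises, the flash tier has to earn its wear and writes shift elsewhere.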

arxiv arXiv cs.LG · 8d ago

ATT&CK-Labeled Multi-Source Cybersecurity Logs Dataset Released

A new dataset combines system, network, and browser logs from 870 Windows sessions, including 70 attacks and 800 benign cases. It provides per-event labels with MITRE ATT&CK technique IDs covering 12 tactics and 53 techniques, and the attacks use real tools such as RATs and C2 tunnels. Fine-tuning three Small Language Models (SLMs) via LoRA improved chunk classification accuracy to 90–97% and achieved up to 42% exact-match accuracy in technique identification, which suggests the models capture much of the underlying reasoning even though exact technique identification remains challenging.

arxiv arXiv cs.CL · 8d ago

LegalHalluLens: Auditing Hallucinations in Legal AI

LegalHalluLens introduces a framework to audit AI hallucinations in legal contexts by analyzing typed hallucination profiles across four claim categories. It reveals a 38-40 point gap between obligation/numeric and temporal claims, and shows that two systems with identical 52% hallucination rates can have opposite risk directions. The framework uses a Risk Direction Index and calibrated debate pipelines to reduce fabricated detections by 45% and improve accountability in legal AI deployment.

arxiv arXiv cs.CL · 8d ago

ProvenanceGuard: Source-Aware Factuality Verification for MCP-Based LLM Agents

ProvenanceGuard introduces a source-aware verifier for MCP-based LLM agents that detects cross-source conflation by routing claims to specific evidence sources and comparing stated attribution with actual source ownership. It achieves block F1 of 0.802 and source accuracy of 0.858 on 260 source-eligible claims, outperforming source-blind baselines, and detects all injected attribution swaps in 50 clinical probes.
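
The core check, comparing a claim's stated attribution with the source that actually supports it, can be sketched as below. This is an illustration only: the function name and labels are hypothetical, and case-insensitive substring matching stands in for whatever evidence verifier the system really uses.

```python
def check_attribution(claim_text, stated_source, source_docs):
    """Classify a claim by comparing stated and actual source ownership.

    source_docs maps a source name to its text. Substring matching is an
    assumed stand-in for a real evidence verifier.
    """
    needle = claim_text.lower()
    owners = [name for name, text in source_docs.items() if needle in text.lower()]
    if not owners:
        return "unsupported"    # no source backs the claim at all
    if stated_source in owners:
        return "ok"             # stated attribution matches an actual owner
    return "misattributed"      # supported, but by a different source

# Hypothetical sources and claims for illustration.
docs = {
    "lab_results": "Potassium 5.9 mmol/L measured on admission.",
    "clinic_notes": "Patient reports mild headache since Monday.",
}
print(check_attribution("potassium 5.9 mmol/L", "lab_results", docs))
print(check_attribution("potassium 5.9 mmol/L", "clinic_notes", docs))
```

A source-blind check would accept both calls because the claim is supported somewhere; routing by source is what exposes the swapped attribution in the second call.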