Spaces tokens stop working after update
Users report that Spaces tokens no longer function after a recent update. No generated files are being saved, disrupting workflow and model execution.
A benchmark of 8 LLMs on 300 synthetic doctor-patient dialogues found 12 high-impact hallucinations and 520 clinically relevant omissions. Omissions were far more common than hallucinations, with DeepSeek excelling in prose and cost but missing many safety facts, while Claude Opus had fewest omissions but poorer prose quality.
A university seeking on-premises document processing for academic workflows must use local parsers due to strict data governance policies banning cloud APIs. The user evaluates Docling, Liteparse, MinerU, and Unstructured, noting Docling excels in complex layouts with Apache 2.0 licensing but is slower; Liteparse offers good printed document performance with Tesseract OCR; MinerU uses PaddleOCR and handles French documents well despite longer setup; Unstructured supports multiple formats including DOCX and PPTX. The solution must support recurring, stable parsing of evolving PDFs with minimal formatting changes.
Users note how little discussion there is of Gemma 4 26b despite its potential suitability for personal-assistant and RAG tasks on a single 3090. The model is considered a strong candidate for all-in-one local AI applications, though it receives less attention than Qwen3.6 or Gemma 4 31b.
Mistral OCR 4 introduces bounding boxes, block classification, and inline confidence scores for 170 languages across 10 language groups. It outperforms leading OCR systems in human preference evaluations with a 72% win rate and achieves the top score on OlmOCRBench (85.20), while offering self-hosted deployment in a single container and supporting enterprise use cases like RAG and document ingestion.
ViRGo introduces a lightweight framework that adapts visual retrieval based on object scale. It uses intrinsic localization and semantic confidence to route between global perception, patch-based retrieval, and attention-based retrieval, improving accuracy-efficiency trade-offs without extra computation.
π-RAG decouples LLMs from sensitive data by using the digits of π as an immutable source of entropy. It introduces a semantic quantization layer that maps user inputs to canonical intent centroids, then uses a cryptographic salt to generate deterministic offsets pointing to standardized payloads, aiming for oblivious retrieval with mathematical guarantees of data privacy.
A new method improves topic-to-timestamp alignment in meeting transcripts by selecting timestamped evidence instead of generating timecodes. On 420 queries from municipal meeting transcripts, it boosts Recall@5 to 50.0%, reduces MAE to 761.0 seconds, and increases parseable outputs from 373 to 419, showing that retrieval quality and output design are critical.
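The idea of selecting timestamped evidence rather than generating a timecode, together with the two reported metrics, can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the segment layout, the function names, the hit-style definition of Recall@5, and the choice to skip unparseable outputs when computing MAE are all assumptions.

```python
# Hypothetical transcript segments: (start_time_in_seconds, text).
segments = [
    (0.0, "Call to order"),
    (95.0, "Budget discussion"),
    (640.0, "Zoning vote"),
]

def timestamp_from_choice(segments, chosen_index):
    """Read the timestamp off the selected segment instead of asking the
    model to write a timecode. An out-of-range choice counts as unparseable."""
    if 0 <= chosen_index < len(segments):
        return segments[chosen_index][0]
    return None

def recall_at_k(ranked_ids, relevant_ids, k=5):
    """Assumed definition: 1.0 if any relevant segment is in the top k."""
    return float(any(seg in relevant_ids for seg in ranked_ids[:k]))

def mean_absolute_error(predicted_seconds, gold_seconds):
    """MAE in seconds over parseable predictions only (None is skipped)."""
    pairs = [(p, g) for p, g in zip(predicted_seconds, gold_seconds) if p is not None]
    if not pairs:
        return None
    return sum(abs(p - g) for p, g in pairs) / len(pairs)

chosen = timestamp_from_choice(segments, 1)
hit = recall_at_k([7, 3, 9, 1, 4, 2], relevant_ids={4}, k=5)
mae = mean_absolute_error([120.0, None, 900.0], [100.0, 400.0, 1000.0])
```

Because the timestamp is copied from a retrieved segment, the output is well-formed whenever the selection is valid, which is one plausible reading of why the count of parseable outputs rises.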
PeerCheck analyzes differences between LLM and human academic reviews, finding LLMs focus on theory while humans prioritize methodology and experiments. The framework uses prompt engineering like Chain-of-Thought and retrieval-augmented generation, with CoT significantly improving review quality, though RAG introduces an unexpected 'paradox' that sometimes reduces quality.
A study compares retrieval-augmented generation (RAG) and long-context prompting in document-grounded AI. Long-context prompting achieves higher epistemic accuracy (73.1% vs. 65.4%) but at 26 times the per-query token cost, highlighting a significant token tax for broader evidentiary access.
A controlled ablation study evaluates agentic RAG components using a local 7B model on HotpotQA. Fixed hybrid retrieval outperforms adaptive routing by 1.8 EM and 1.9 F1, while two retrieval iterations capture 95% of the gains from five. Query decomposition and cross-encoder reranking show statistically significant but smaller improvements.
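For readers unfamiliar with the EM and F1 figures quoted above, here is a minimal sketch of how exact match and token-level F1 are commonly computed for extractive QA. The study's own scoring script is not described in this summary, so the SQuAD-style normalization (lowercasing, stripping punctuation and articles) is an assumption, as are the function names.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(pred) == normalize(gold))

def token_f1(pred, gold):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(pred).split()
    gold_tokens = normalize(gold).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

em = exact_match("The Eiffel Tower", "eiffel tower")
f1 = token_f1("Eiffel Tower in Paris", "Eiffel Tower")
```

Benchmark scores are averages of these per-question values, usually reported on a 0 to 100 scale, so a gap of 1.8 EM corresponds to roughly 1.8 more exactly matched answers per 100 questions.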
semantic-memory is a local-first knowledge base in Rust that combines BM25, vector, and reciprocal rank fusion search with SQLite. It features typed graph edges for causal, temporal, and semantic relationships, provenance tracking, bitemporal storage, and adaptive query routing, supporting 18 MCP tools for AI agents. All components run locally without cloud dependencies, API keys, or telemetry.
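Reciprocal rank fusion, mentioned above as the way BM25 and vector results are combined, can be illustrated with a short sketch. The project itself is written in Rust; this Python version is only an illustration of the general technique, and the function name, the document ids, and the constant k=60 (a commonly used default) are assumptions rather than details of semantic-memory.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical result lists from a keyword search and a vector search.
bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Since the method uses only ranks, it needs no calibration between BM25 scores and vector similarities, which suits a setup where both retrievers run side by side over the same SQLite store.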
A user is designing a local, offline document retrieval and LLM pipeline with storage, ingestion, query, and highlighting features. They seek advice on vector databases (e.g., pgvector in Postgres vs Qdrant), GraphRAG feasibility offline, and open-source tools for document highlighting with citations.
A user asks how to give a self-hosted Gemma 4 12B model web search capabilities. They tried Open WebUI, which had issues with search engines such as DuckDuckGo (DDG), and seek alternatives that do not require Brave or Google API keys.
The AI Economist Agent uses RAG, knowledge graphs, and LLMs to generate economic narratives grounded in theory and data. It enables model-based analysis, evidence retrieval, and report generation, ensuring economic coherence and traceability through explicit model computations.
A four-arm comparison shows that retrieval is essential for accurate statutory citation under the Ontario Residential Tenancies Act. The SFT+RAG hybrid model achieves 0.481 exact-match with zero hallucinations, outperforming base and SFT-only models, and matches a pipeline using larger, specialized models without needing additional training data. Results are preliminary and based on a small, human-verified real-world evaluation set.
Multi-Agent Transactive Memory (MATM) enables population-level storage and retrieval of agent-generated trajectories. It allows producer agents to share procedural knowledge with consumer agents, improving task performance and reducing interaction steps in interactive environments like ALFWorld and WebArena without coordination or joint training.
A study measures tool-intent stabilization in Streaming RAG, defining when speculative tool queries converge to correct answers. On the CRAG benchmark, 73.9% of queries allow substantial latency hiding, with early stabilization observed in questions with verbatim retrievable evidence. Question type significantly predicts early versus late stabilization, informing when speculative triggers are effective.
CATCH-ME introduces the first large-scale, multilingual dataset of contextually annotated, multi-turn counterspeech dialogues targeting hate and misinformation. The dataset covers five languages and focuses on seven marginalized groups, with dialogues grounded in verified fact-checking sources and including document- and chunk-level span annotations for RAG systems.