Hugging Face Forums — korshunov.ai

Source · Hugging Face Forums

Native binary embeddings outperform post-hoc binarization

A small-scale experiment shows that native binary embedding models achieve better retrieval than post-hoc binarization of float models. At SciFact Recall@10, native binary models (2048-dim and 4096-dim) outperform post-hoc binary models by 17% and 25% respectively, with significant speed and memory advantages in indexing.

media Hugging Face Forums · 2d ago

Buddy System: Rust entropy monitor with NER-gated uncertainty for tiered LLM inference

The Buddy System uses a Rust entropy monitor to detect per-token uncertainty in local Gemma 3 4B inference, routing only uncertain tokens to Sonnet via NER-gated span extraction and semantic retrieval. Benchmarks show it achieves 71.4% accuracy at $0.21, outperforming the Anthropic Advisor pattern (62.9% at $0.44) across seven Hugging Face datasets, with a key improvement on SQuAD v2 by routing source passage chunks to the cloud model.

media Hugging Face Forums · 3d ago

I built a novel triple-hybrid LLM under 1B parameters for ~$50

Mateusz has developed a full pre-trained language model, Project Inkblot's Titan v1, combining Mamba SSM, Multi-Head Attention, and 32-expert MoE in a single decoder-only architecture under 1B parameters. The model, trained on a single NVIDIA L4 GPU for ~$50, achieves 27.5 validation perplexity and demonstrates efficient scaling via a single-line config update, with all components implemented from scratch in PyTorch. Titan v2's first training cycle is now complete, and dataset expansion is underway.

media Hugging Face Forums · 3d ago

LLMs as Epistemic Accelerators: The Risk Is Not Only Hallucination

LLMs do not merely hallucinate; they amplify human epistemic overconfidence by turning weak hypotheses into coherent, polished claims before evidence is verified. This creates a risk of premature certainty in research, policy, and other domains, not because models lie, but because they accelerate human tendencies to favor elegant explanations over uncertainty.

media Hugging Face Forums · 2h ago

Niodoo: A Local Runtime for Hidden State Steering of Frozen LLMs

Jason Van Pham has released Niodoo, a local runtime designed to steer frozen large language models through their hidden states. The project aims to correct last-step errors by injecting noise or "physics forces" during inference to break token loops. This approach allows smaller models to improve performance without fine-tuning, targeting specific failure cases like the Llama strawberry prompt benchmark. The system generates its own telemetry tags and utilizes TDA analysis to monitor internal model states for looping behavior. Van Pham developed this tool organically through months of self-directed research and red-teaming, emphasizing reproducible results from pinned hashes. The code is available on GitHub under the repository Ruffian-L/niodoo-hidden-state-steering.

media Hugging Face Forums · 2h ago

Prompt Format Inquiry for Training Unsloth/Phi-3.5-mini-instruct

A user seeks advice on the optimal prompt formatting strategy for training the Phi-3.5-mini-instruct model using Unsloth. The inquiry contrasts maintaining a custom text format against utilizing a standard chat template for dataset preparation. The current implementation employs a function that structures data into '### Input:' and '### Output:' sections, appending an end-of-text token. This approach processes JSON-encoded input and output fields derived from a Hugging Face Dataset object. The provided example illustrates a complex structure involving financial insights, merchant names, dates, and transaction totals. The user intends to deploy the trained model via a custom API and requests guidance on whether to retain this format or switch to a chat template.

media Hugging Face Forums · 5h ago

User Reports Step 3.7 Flash Model Tool Access Failure on HuggingChat

A user on the Hugging Face discussion forum reported that the Step 3.7 Flash model by StepFun AI has lost its ability to use tools, including MCP servers, as of the morning of the report. The individual expressed concern over whether this outage is temporary or permanent, noting their strong preference for this specific model due to its high performance and low resource costs compared to competitors. Despite praising the model's quality and affordability, the user highlighted the immediate disruption caused by the inability to execute tool-based functions. The post seeks clarification from the community regarding prior experiences with similar issues and potential resolutions. This incident underscores a critical dependency on tool availability for users relying on this specific AI configuration.

media Hugging Face Forums · 6h ago

Ontological Inversion: Flipping LLM Emotional Concepts via Negative Gain

The author introduces 'ontological inversion,' a technique designed to expand the one-directional inference nature of Large Language Models. This method allows models to capture nuanced, multifaceted concepts, such as memories that evoke both sorrow and joy simultaneously. The approach was developed by applying a negative gain factor during sweeps into the Niodoo steering architecture. It addresses the common limitation where LLMs overfit to singular emotional labels when prompted with personal experiences. By inverting concepts similarly to physics involution, the technique enables models to flip emotional states, such as transforming sorrowful memories into joyful ones. The work is shared via a GitHub repository titled 'ontological-inversion' by user Ruffian-L.

media Hugging Face Forums · 6h ago

Qwen3/Gemma3 Candle Skips Attention Masks for Equal-Length Batches in CPU Mode

A user has reported a critical bug in the Hugging Face text-embeddings-inference library affecting Qwen3 and Gemma3 models. The issue arises when running inference on CPUs with concurrent requests, leading to significant accuracy degradation. Specifically, the Candle backend incorrectly skips attention masks for batches where all input sequences have equal lengths. This defect compromises the reliability of embeddings generated under these specific conditions. To address the problem, the author submitted a pull request containing a fix that was thoroughly tested on their local machines. The bug highlights potential stability risks in CPU-based embedding services handling batched inputs.

media Hugging Face Forums · 14h ago

Aiden Mobile Agent Prototype in the Making

Aiden is a physical AI agent device that monitors a phone's screen via HDMI and controls it through USB HID, enabling app automation without jailbreak or installed software. It supports bring-your-own LLMs, operates without backend infrastructure or data collection, and is released under the AGPL license as an open-source development board.

media Hugging Face Forums · 20h ago

I Built an MCP Server in Go for AI Agents - 200 Lines Tutorial

A 200-line Go tutorial demonstrates building a lightweight Model Context Protocol server using Go's concurrency and simplicity. The server enables AI agents like Claude to access structured data and Go applications, potentially making them 10x more useful.

media Hugging Face Forums · 20h ago

Best model for local usage and working on Unity with MCP at 12 GB VRAM

A user is seeking a lightweight LLM tailored for Unity 6.5 with MCP, operating within 12 GB VRAM. They currently rely on free tiers of Cursor and Claude but find them insufficient, asking if any specialized models exist or alternative solutions are available.

media Hugging Face Forums · 20h ago

Wav2vec2 and WavLM Audio Classifier Stuck at 33% Accuracy

A user reports that fine-tuning wav2vec2-base or wavlm-base-plus for 3-class audio classification achieves only 33% accuracy, matching chance levels. The model is trained with only the classification head updated, using padded clips of 1.0s duration without attention masks, and with a learning rate of 1e-3, leading to poor performance despite class imbalance and short input clips.

media Hugging Face Forums · 1d ago

Inference provider information out of date?

The Hugging Face page for Llama 3.1 405B lists Featherless AI as a provider, but the test widget shows 'Failed to fetch' and featherless.ai does not list it as available. A similar issue is reported for Baidu's ERNIE-4.5-300B model.

media Hugging Face Forums · 1d ago

Llama 3.1 70B API Access Restricted to Hugging Face Tester

Users can access the Llama 3.1 70B model via the Hugging Face tester, but receive a "Model not supported by provider" error when using third-party apps or curl. The model is currently only available through the Hugging Face interface and not exposed via public API endpoints.

media Hugging Face Forums · 1d ago

Spaces tokens stop working after update

Users report that Spaces tokens no longer function after a recent update. No generated files are being saved, disrupting workflow and model execution.

media Hugging Face Forums · 1d ago

Seeking arXiv cs.LG Endorsement for PsiLogic Optimizer

Ali, a 16-year-old independent researcher, has developed PsiLogic, a chaos-aware active cancellation optimizer based on Adam. Evaluated against AdamW and Lion using FairBench on an NVIDIA H100, PsiLogic achieved top validation metrics in three out of four tasks and is statistically tied in the fourth, though it incurs step-time overhead. The author seeks endorsement for arXiv submission under cs.LG, providing a GitHub repository and endorsement code 4ACC37.

media Hugging Face Forums · 1d ago

Spaces tokens no longer work and files not saved

After a recent Hugging Face update, Spaces tokens stopped working, resulting in 404 errors when attempting to save generated files. The process completes successfully up to 100% but fails during the save phase due to token errors, consuming ZeroGPU credits without producing any saved outputs.

media Hugging Face Forums · 2d ago

Coolest Theoretical AI Topics with Realistic AI System Basis

The discussion explores theoretical AI topics that have mathematical foundations and plausible implementation in current AI systems, such as large language models. Topics include reasoning chains, knowledge graphs, and probabilistic reasoning, all of which are grounded in formal math and show potential for real-world AI applications.

media Hugging Face Forums · 2d ago

My Hugging Face Account Was Locked

A user reports their Hugging Face account, AntixStudioDesign, was locked unexpectedly during experimentation with AI tools. They have contacted the Safety Team via email and seek advice on account recovery, response time, and data preservation options.