Source · Hugging Face Forums
media Hugging Face Forums · 2d ago

Buddy System: Rust entropy monitor with NER-gated uncertainty for tiered LLM inference

The Buddy System uses a Rust entropy monitor to detect per-token uncertainty in local Gemma 3 4B inference, routing only uncertain tokens to Sonnet via NER-gated span extraction and semantic retrieval. Benchmarks show it achieves 71.4% accuracy at $0.21, outperforming the Anthropic Advisor pattern (62.9% at $0.44) across seven Hugging Face datasets, with a key improvement on SQuAD v2 by routing source passage chunks to the cloud model.

media Hugging Face Forums · 3d ago

I built a novel triple-hybrid LLM under 1B parameters for ~$50

Mateusz has developed a full pre-trained language model, Project Inkblot's Titan v1, combining Mamba SSM, Multi-Head Attention, and 32-expert MoE in a single decoder-only architecture under 1B parameters. The model, trained on a single NVIDIA L4 GPU for ~$50, achieves 27.5 validation perplexity and demonstrates efficient scaling via a single-line config update, with all components implemented from scratch in PyTorch. Titan v2's first training cycle is now complete, and dataset expansion is underway.

media Hugging Face Forums · 2h ago

Niodoo: A Local Runtime for Hidden State Steering of Frozen LLMs

Jason Van Pham has released Niodoo, a local runtime designed to steer frozen large language models through their hidden states. The project aims to correct last-step errors by injecting noise or "physics forces" during inference to break token loops. This approach allows smaller models to improve performance without fine-tuning, targeting specific failure cases like the Llama strawberry prompt benchmark. The system generates its own telemetry tags and utilizes TDA analysis to monitor internal model states for looping behavior. Van Pham developed this tool organically through months of self-directed research and red-teaming, emphasizing reproducible results from pinned hashes. The code is available on GitHub under the repository Ruffian-L/niodoo-hidden-state-steering.

media Hugging Face Forums · 2h ago

Prompt Format Inquiry for Training Unsloth/Phi-3.5-mini-instruct

A user seeks advice on the optimal prompt formatting strategy for training the Phi-3.5-mini-instruct model using Unsloth. The inquiry contrasts maintaining a custom text format against utilizing a standard chat template for dataset preparation. The current implementation employs a function that structures data into '### Input:' and '### Output:' sections, appending an end-of-text token. This approach processes JSON-encoded input and output fields derived from a Hugging Face Dataset object. The provided example illustrates a complex structure involving financial insights, merchant names, dates, and transaction totals. The user intends to deploy the trained model via a custom API and requests guidance on whether to retain this format or switch to a chat template.

media Hugging Face Forums · 5h ago

User Reports Step 3.7 Flash Model Tool Access Failure on HuggingChat

A user on the Hugging Face discussion forum reported that the Step 3.7 Flash model by StepFun AI has lost its ability to use tools, including MCP servers, as of the morning of the report. The individual expressed concern over whether this outage is temporary or permanent, noting their strong preference for this specific model due to its high performance and low resource costs compared to competitors. Despite praising the model's quality and affordability, the user highlighted the immediate disruption caused by the inability to execute tool-based functions. The post seeks clarification from the community regarding prior experiences with similar issues and potential resolutions. This incident underscores a critical dependency on tool availability for users relying on this specific AI configuration.

media Hugging Face Forums · 6h ago

Ontological Inversion: Flipping LLM Emotional Concepts via Negative Gain

The author introduces 'ontological inversion,' a technique designed to expand the one-directional inference nature of Large Language Models. This method allows models to capture nuanced, multifaceted concepts, such as memories that evoke both sorrow and joy simultaneously. The approach was developed by applying a negative gain factor during sweeps into the Niodoo steering architecture. It addresses the common limitation where LLMs overfit to singular emotional labels when prompted with personal experiences. By inverting concepts similarly to physics involution, the technique enables models to flip emotional states, such as transforming sorrowful memories into joyful ones. The work is shared via a GitHub repository titled 'ontological-inversion' by user Ruffian-L.

media Hugging Face Forums · 6h ago

Qwen3/Gemma3 Candle Skips Attention Masks for Equal-Length Batches in CPU Mode

A user has reported a critical bug in the Hugging Face text-embeddings-inference library affecting Qwen3 and Gemma3 models. The issue arises when running inference on CPUs with concurrent requests, leading to significant accuracy degradation. Specifically, the Candle backend incorrectly skips attention masks for batches where all input sequences have equal lengths. This defect compromises the reliability of embeddings generated under these specific conditions. To address the problem, the author submitted a pull request containing a fix that was thoroughly tested on their local machines. The bug highlights potential stability risks in CPU-based embedding services handling batched inputs.

media Hugging Face Forums · 1d ago

Seeking arXiv cs.LG Endorsement for PsiLogic Optimizer

Ali, a 16-year-old independent researcher, has developed PsiLogic, a chaos-aware active cancellation optimizer based on Adam. Evaluated against AdamW and Lion using FairBench on an NVIDIA H100, PsiLogic achieved top validation metrics in three out of four tasks and is statistically tied in the fourth, though it incurs step-time overhead. The author seeks endorsement for arXiv submission under cs.LG, providing a GitHub repository and endorsement code 4ACC37.