Hardware & chips — korshunov.ai


MINISFORUM DEG1 Oculink eGPU Dock Refurbished Available for $59

A refurbished MINISFORUM DEG1 Oculink eGPU dock is currently available for $59. The product listing highlights its robust build quality, noting that the device has sufficient heft to securely hold a graphics card. Unlike some lower-cost alternatives, this dock includes redrivers to ensure signal integrity. A user who purchased a unit last year reported positive experiences with its performance and stability. The item can be purchased directly from the manufacturer's refurbished product page.

media r/LocalLLaMA · 4h ago

Query on Clustering Nvidia DGX Spark and AMD Ryzen AI Max 395 for Unified Memory Inference

A user asked whether an Nvidia DGX Spark can be clustered with an AMD Ryzen AI Max 395 to run a single large language model. Both devices have 128GB of unified memory, for a combined capacity of roughly 256GB minus operating system overhead. The DGX Spark has a 200Gbit network interface, whereas the AMD Strix system currently has only 5Gbit Ethernet but includes a PCIe Gen 4 x4 slot. The user noted that DeepSeek v4 Flash fits on two DGX Sparks and wondered whether the Strix could serve as an alternative node. To improve connectivity, they proposed adding a Mellanox ConnectX-6 QSFP+28 to the AMD system to raise the link bandwidth.
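As a rough sanity check on the proposal, the sketch below estimates the theoretical ceiling of a PCIe Gen 4 x4 slot and compares it with the figures quoted in the post. It assumes the standard 16 GT/s per lane and 128b/130b encoding for PCIe Gen 4; the helper name is hypothetical, and real-world throughput will be lower than these peaks.

```python
# Back-of-the-envelope link budget for the proposed two-node setup.
# All figures are theoretical peaks, not measured throughput.

def pcie_gbit_per_s(gt_per_s: float, lanes: int, encoding: float = 128 / 130) -> float:
    """Raw one-direction PCIe bandwidth in Gbit/s (hypothetical helper)."""
    return gt_per_s * lanes * encoding

PCIE_GEN4_GT_PER_LANE = 16.0  # PCIe Gen 4 signalling rate per lane

slot_gbit = pcie_gbit_per_s(PCIE_GEN4_GT_PER_LANE, lanes=4)
spark_nic_gbit = 200.0  # DGX Spark NIC, from the post
strix_eth_gbit = 5.0    # Strix onboard Ethernet, from the post

# The usable rate is capped by the slowest element on the path.
link_with_addin_nic = min(spark_nic_gbit, slot_gbit)

combined_memory_gb = 128 + 128  # before operating system overhead

print(f"PCIe Gen 4 x4 ceiling: {slot_gbit:.1f} Gbit/s")
print(f"Link with add-in NIC:  {link_with_addin_nic:.1f} Gbit/s")
print(f"Gain over 5Gbit link:  {link_with_addin_nic / strix_eth_gbit:.1f}x")
print(f"Combined memory:       {combined_memory_gb} GB")
```

Under these assumptions the x4 slot, not the NIC, would be the limiting factor on the AMD side.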

media r/LocalLLaMA · 1d ago

7 Chinese companies shipping H100/H200-class AI chips, most IPO'd in last 6 months

At least seven Chinese companies are now shipping H100/H200-class AI accelerators, and most of them went public within the last six months. Huawei alone shipped 812,000 AI cards last year, accounting for 49% of China's domestic supply, and its Ascend 950 is reportedly targeting H200-class performance. Several of these firms were founded by former NVIDIA and AMD GPU leaders, among them MetaX, which saw revenue grow 3,800x in three years. Alibaba, for its part, launched a server with 1.5TB of VRAM for on-premises frontier model deployment.

media MarkTechPost · 3d ago

MoonMath AI Open-Sources HIP Attention Kernel That Beats AITER v3 on MI300X

MoonMath AI has open-sourced a bf16 forward attention kernel for AMD's MI300X GPU, written in HIP rather than assembly. It outperforms AMD's own AITER v3 kernel across all tested shapes and rounding modes, with speedups up to 1.26x, and maintains bit-identical numerical accuracy.

github llama.cpp · 5d ago

llama.cpp release b9738: fixes CORS auth header forwarding and new binary builds

llama.cpp version b9738 fixes the CORS proxy to avoid forwarding authentication headers. The release includes binary builds for macOS, Linux, Android, Windows, and openEuler across multiple architectures and hardware acceleration options, including Vulkan, CUDA, OpenVINO, and SYCL.
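The general idea behind such a fix can be sketched as a header filter applied before a proxy forwards a request. This is not the llama.cpp implementation: the function name and the exact set of headers treated as sensitive are assumptions made for illustration.

```python
# Hypothetical set of credential-bearing headers; the real fix may use a different list.
SENSITIVE_HEADERS = {"authorization", "proxy-authorization", "cookie"}

def forwardable_headers(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of the request headers with credentials removed.

    Header names are compared case-insensitively, as HTTP requires.
    """
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in SENSITIVE_HEADERS
    }

incoming = {
    "Authorization": "Bearer secret-token",
    "Accept": "application/json",
    "User-Agent": "example-client",
}
print(forwardable_headers(incoming))
```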

github llama.cpp · 5d ago

GLM-5.2 DSA indexer fix: tensors marked not required

The GLM-5.2 model's DSA indexer was incorrectly loaded on all layers, causing failures due to missing tensors. The update marks indexer tensors as TENSOR_NOT_REQUIRED, allowing layers without an indexer to load as nullptr and enabling full MLA attention. DeepSeek-V3.2, with uniform indexing, is unaffected.

media r/LocalLLaMA · 5d ago

AMD Future GPU Offerings for LLM Builds

AMD has announced upcoming GPU offerings that could support local large language model (LLM) deployments. These GPUs are designed with enhanced memory bandwidth and compute capabilities, making them suitable for efficient LLM inference and training in dedicated local rigs.

github llama.cpp · 5d ago

llama.cpp Release b9728 Adds Comment Line Support and Multiple Platform Binaries

llama.cpp version b9728 introduces support for comment lines in --api-key-file configuration. The release includes pre-built binaries for macOS, Linux, Android, Windows, and openEuler across multiple architectures and hardware acceleration options, including Vulkan, CUDA, OpenVINO, and SYCL.
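A key file with comment lines might be parsed along the following lines. The summary does not say which comment marker the release accepts, so the `#` prefix here is an assumption, and `load_api_keys` is a hypothetical name rather than anything from the llama.cpp code base.

```python
def load_api_keys(text: str, comment_prefix: str = "#") -> list[str]:
    """Return one key per non-empty line, skipping comment lines.

    The '#' prefix is an assumed convention, not confirmed by the release notes.
    """
    keys = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(comment_prefix):
            continue  # blank line or comment
        keys.append(stripped)
    return keys

example = """# keys for the staging cluster
key-alpha

  # rotated monthly
key-beta
"""
print(load_api_keys(example))
```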

media r/LocalLLaMA · 5d ago

EvoTensile: Evolutionary tuning of AMD Tensile GEMM kernels

EvoTensile uses evolutionary algorithms to tune GEMM kernels for AMD GPUs, improving NT layout performance from 20 to 40 TFLOPS on Strix Halo. The 2x gain is substantial, though the tuned kernels still reach only about two-thirds of the theoretical roofline of 59.4 TFLOPS.
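The reported figures can be put in context with a short calculation of how much of the roofline each result reaches. The numbers come from the summary above; the helper name is hypothetical.

```python
ROOFLINE_TFLOPS = 59.4  # theoretical peak quoted in the post
BEFORE_TFLOPS = 20.0    # NT layout before tuning
AFTER_TFLOPS = 40.0     # NT layout after tuning

def roofline_fraction(tflops: float, roofline: float = ROOFLINE_TFLOPS) -> float:
    """Fraction of the theoretical peak that a kernel achieves."""
    return tflops / roofline

speedup = AFTER_TFLOPS / BEFORE_TFLOPS
print(f"Speedup:         {speedup:.2f}x")
print(f"Before tuning:   {roofline_fraction(BEFORE_TFLOPS):.1%} of roofline")
print(f"After tuning:    {roofline_fraction(AFTER_TFLOPS):.1%} of roofline")
print(f"Headroom left:   {ROOFLINE_TFLOPS - AFTER_TFLOPS:.1f} TFLOPS")
```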

github llama.cpp · 5d ago

llama.cpp release b9718: consolidated slot selection and new binary builds

llama.cpp version b9718 consolidates slot selection into a single function, get_available_slot, while maintaining LCP similarity checks for prompt cache updates. The release includes binary builds for macOS, Linux, Android, Windows, and openEuler across multiple architectures and hardware acceleration options.
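One way to picture prefix-based slot selection is the sketch below. It is not the llama.cpp function: the `Slot` type, the `pick_slot` name, the similarity definition (shared prefix length divided by cached length), and the threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Slot:
    slot_id: int
    busy: bool = False
    cached_tokens: list[int] = field(default_factory=list)

def lcp_len(a: list[int], b: list[int]) -> int:
    """Length of the longest common prefix of two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def pick_slot(slots: list[Slot], prompt: list[int], min_similarity: float = 0.1) -> Optional[Slot]:
    """Prefer the idle slot whose cached prompt best matches; else any idle slot."""
    best, best_sim = None, min_similarity
    for slot in slots:
        if slot.busy or not slot.cached_tokens:
            continue
        # Assumed similarity: share of the cached tokens reusable for this prompt.
        sim = lcp_len(slot.cached_tokens, prompt) / len(slot.cached_tokens)
        if sim > best_sim:
            best, best_sim = slot, sim
    if best is not None:
        return best
    # Fallback: first idle slot, or None when every slot is busy.
    return next((s for s in slots if not s.busy), None)
```

Keeping both the similarity match and the fallback in one function mirrors the consolidation the release describes, though the actual selection rules may differ.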

github llama.cpp · 6d ago

llama.cpp Release b9715 Adds CUDA Col2Im 1D and Multiple Platform Binaries

llama.cpp version b9715 introduces CUDA support for GGML_OP_COL2IM_1D, building on a CPU implementation. The release includes binaries for macOS, Linux, Android, Windows, and openEuler across multiple architectures and acceleration frameworks, including Vulkan, ROCm, OpenVINO, and SYCL.
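For readers unfamiliar with the operation, a generic single-channel col2im in one dimension is an overlap-add: each column is written back at its window position and overlapping samples are summed. The reference sketch below illustrates that idea only; the tensor layout and parameters that ggml uses are not described in the summary, so none of this should be read as the ggml semantics.

```python
def col2im_1d(columns: list[list[float]], stride: int = 1) -> list[float]:
    """Scatter-add sliding-window columns back into a 1D signal.

    columns[p] holds the kernel-sized window that starts at index p * stride.
    Single channel, no padding or dilation (simplifying assumptions).
    """
    if not columns:
        return []
    kernel = len(columns[0])
    out = [0.0] * ((len(columns) - 1) * stride + kernel)
    for pos, col in enumerate(columns):
        for k, value in enumerate(col):
            out[pos * stride + k] += value  # overlapping positions accumulate
    return out

print(col2im_1d([[1, 2, 3], [4, 5, 6]], stride=1))
print(col2im_1d([[1, 2, 3], [4, 5, 6]], stride=2))
```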

arxiv arXiv cs.AI · 6d ago

Hybrid ANN-SNN Pipeline with Local Plasticity

A hybrid ANN-SNN pipeline uses pretrained EfficientNet encoders and converts their activations to spike trains via rate-coding. The system trains a CoLaNET spiking classifier with local plasticity rules, achieving 99.09% accuracy on ImageNet's 64-class benchmark, matching conventional deep networks.
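Rate coding in its simplest form turns each activation into a spike probability per time step. The sketch below uses independent Bernoulli draws, which is one common variant; the paper's exact encoding scheme is not given in the summary, and the function name, normalisation, and parameters are assumptions.

```python
import random

def rate_code(activations: list[float], n_steps: int,
              max_rate: float = 1.0, seed: int = 0) -> list[list[int]]:
    """Encode activations as spike trains of shape [n_steps][n_neurons].

    Negative activations are clipped to zero and the rest are scaled so the
    largest one fires with probability max_rate at every step.
    """
    rng = random.Random(seed)
    clipped = [max(a, 0.0) for a in activations]
    peak = max(clipped, default=0.0)
    probs = [max_rate * a / peak if peak > 0 else 0.0 for a in clipped]
    return [[1 if rng.random() < p else 0 for p in probs] for _ in range(n_steps)]

spikes = rate_code([0.0, 0.5, 1.0], n_steps=200)
firing_rates = [sum(step[i] for step in spikes) / len(spikes) for i in range(3)]
print(firing_rates)  # rates roughly track the normalised activations
```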

arxiv arXiv cs.LG · 6d ago

Quantum Ring All-Reduce: Communication and Privacy Advantages for Distributed Learning

A quantum version of ring all-reduce reduces per-link communication by a factor of two using entanglement and superdense coding, without altering model or gradient computations. It achieves information-theoretically secure aggregation via verified entanglement, with a 2x overhead in GHZ copies, and provides exponential communication advantages in gradient conflict detection for specific auditing tasks.
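For reference, the classical algorithm that the paper builds on can be simulated in a few lines. The sketch below models only the classical ring all-reduce (a reduce-scatter phase followed by an all-gather phase) with one value per chunk; it does not model entanglement or superdense coding, and the function name is hypothetical.

```python
def ring_all_reduce(vectors: list[list[float]]) -> tuple[list[list[float]], int]:
    """Sum n vectors of length n across n nodes arranged in a ring.

    Returns the per-node results and the number of chunks sent over each link.
    """
    n = len(vectors)
    data = [list(v) for v in vectors]
    chunks_per_link = 0

    # Reduce-scatter: in each step node i passes one partial sum to node i + 1.
    for step in range(n - 1):
        msgs = [((i - step) % n, data[i][(i - step) % n]) for i in range(n)]
        for i, (chunk, value) in enumerate(msgs):
            data[(i + 1) % n][chunk] += value
        chunks_per_link += 1

    # All-gather: each fully reduced chunk travels once around the ring.
    for step in range(n - 1):
        msgs = [((i + 1 - step) % n, data[i][(i + 1 - step) % n]) for i in range(n)]
        for i, (chunk, value) in enumerate(msgs):
            data[(i + 1) % n][chunk] = value
        chunks_per_link += 1

    return data, chunks_per_link

result, sent = ring_all_reduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
print(result, sent)
```

Each link carries 2(n - 1) chunks in this classical baseline, which is the per-link traffic the quantum variant is reported to halve.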

media r/LocalLLaMA · 6d ago

My suitcase robot gets high from real gas sensor

A real MQ-2 gas sensor detects smoke and feeds live data to an LLM sampler, adjusting temperature, top_p, and top_k in real time. As smoke increases, the robot's speech becomes loopier and more associative, with no scripted 'stoned' mode, demonstrating live model behavior driven by physical input.
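The mapping from sensor reading to sampler settings could look something like the sketch below. The calibration points and the parameter ranges are invented for illustration; the post does not give the actual values or the mapping function used by the robot.

```python
def lerp(low: float, high: float, t: float) -> float:
    """Linear interpolation between low and high for t in [0, 1]."""
    return low + (high - low) * t

def sampler_params(reading: float, clean: float = 200.0, saturated: float = 800.0) -> dict:
    """Map a raw gas-sensor reading to sampling parameters.

    'clean' and 'saturated' are hypothetical calibration points, and the
    ranges below are illustrative assumptions, not the project's settings.
    """
    level = (reading - clean) / (saturated - clean)
    level = min(max(level, 0.0), 1.0)  # clamp to [0, 1]
    return {
        "temperature": round(lerp(0.7, 1.6, level), 3),
        "top_p": round(lerp(0.9, 1.0, level), 3),
        "top_k": int(round(lerp(40, 200, level))),
    }

for reading in (150, 500, 900):
    print(reading, sampler_params(reading))
```

Because the parameters change continuously with the reading, looser sampling emerges from the input itself rather than from a scripted mode, which matches the behavior the post describes.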

github llama.cpp · 6d ago

llama.cpp Release b9698 Adds Self-Update Support and Multiple Platform Binaries

llama.cpp version b9698 enables self-updates only when built with llama-install.sh. The release includes binaries for macOS, Linux, Android, Windows, and openEuler across multiple architectures and hardware acceleration options, including Vulkan, CUDA, OpenVINO, and SYCL.

arxiv arXiv cs.LG · 7d ago

Zero-Overhead Telemetry Detects Hidden ML Training

A study evaluates GPU workload classification using only zero-overhead NVML telemetry. The classifier achieves 98.2% accuracy in identifying training workloads and 43-87% accuracy against adversarially disguised, unexpected workloads across 9 GPU models.

arxiv arXiv cs.AI · 7d ago

SwitchBraidNet: Lightweight EEG Model for Hybrid BCIs

SwitchBraidNet is a quantisation-aware EEG classification architecture that achieves high accuracy in motor imagery and SSVEP tasks. It outperforms four baselines in FP16 and FP32, with MI accuracy of 69.49%, SSVEP accuracy of 93.48%, and a hybrid information transfer rate of 64.82 bits/min in FP16. The model runs efficiently with only 3.03 KB of INT8 storage, enabling low-power embedded deployment.

github llama.cpp · 7d ago

ggml-cpu: Conditionally enable POWER11 backend based on compiler support

ggml-cpu now enables the POWER11 backend only when the compiler supports -mcpu=power11. This prevents build failures on current GCC/Clang toolchains while maintaining forward compatibility. The change is implemented in CMakeLists.txt, and -mcpu=power10 is used for both P10 and P11 architectures.

github llama.cpp · 7d ago

llama.cpp Release b9692 Adds New Binaries and Fixes

llama.cpp version b9692 introduces new binaries for macOS, Linux, Android, Windows, and openEuler across multiple architectures. The release includes updates to support Vulkan, ROCm, OpenVINO, SYCL, and HIP, with fixes to remove batch dim usage in llava_uhd.

github llama.cpp · 7d ago

llama.cpp Release b9685 Adds SYCL Dev2Dev Memcpy and Multiple Platform Binaries

llama.cpp version b9685 introduces SYCL-based dev2dev memcpy functionality, moving GGML_SYCL_DEV2DEV_MEMCPY to a runtime table and improving peer-to-peer communication detection. The release includes precompiled binaries for macOS, Linux, Android, Windows, and openEuler across multiple architectures and APIs, including Vulkan, ROCm, OpenVINO, and SYCL (FP32/FP16).