Introducing LongCat-2.0, a large-scale MoE language model
LongCat-2.0 is a large-scale Mixture-of-Experts (MoE) language model with 1.6 trillion total parameters, of which approximately 48 billion are activated per token.
This work introduces natural identifiers (NIDs), structured random strings such as cryptographic hashes and shortened URLs that occur in LLM training data, to address the challenges of auditing large language model privacy. NIDs enable scalable, post-hoc differential privacy auditing without costly retraining and facilitate dataset inference without requiring private held-out datasets.
This article investigates whether partial data augmentation can achieve the same statistical benefits as full augmentation by developing a framework using Fourier analysis and representation theory of finite groups.
This article introduces PCFM, a flow matching approach for medical point cloud completion that integrates Point Transformer v3 (PTv3) with continuous-time generative modeling. The method is evaluated on the SkullFix, SkullBreak, and Mandibular Defect datasets to assess its performance in anatomical reconstruction tasks.
Researchers have developed an agnostic model for the Photosynthetic Habitable Zone (PHZ) based on thermodynamics and redox chemistry, eliminating Earth-centric biases found in previous estimates. By optimizing a generic photochemical reaction against exoplanet irradiance spectra using a genetic algorithm, the study predicts that photosynthetic viability declines linearly with orbital distance rather than quadratically.
This paper proposes a knowledge-guided two-stage transfer learning framework to address bearing fault diagnosis challenges involving dataset heterogeneity, operating condition variations, and limited labeled data. The approach utilizes a lightweight GPT-2-style Transformer with causal self-attention for hierarchical feature extraction from vibration signals.
CrossPool is a serving engine designed for cold Mixture-of-Experts (MoE) models that addresses GPU memory inefficiencies by separating FFN weights and KV-cache into distinct pools. This disaggregation allows the system to consolidate static weights while dynamically provisioning active KV-cache demand, overcoming the limitations of monolithic memory allocation.
This study conducts a rigorous reevaluation of nine recent Graph Foundation Models (GFMs) for node property prediction to address the lack of unified evaluation standards in the field. The authors compare these models against strong Graph Neural Network (GNN) baselines to determine their relative performance and efficiency.
This paper reinterprets Large Language Models as high-dimensional Dense Associative Memories where correct reasoning corresponds to deep attractor basins in the energy landscape. The authors introduce a retrieval mechanism that samples multiple reasoning paths and weights them by inverse energy to approximate the equilibrium distribution.
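The weighting step described above can be illustrated with a minimal sketch. The summary does not give the exact form of the "inverse energy" weights, so this example assumes Boltzmann weights exp(-E/T), the usual form of an equilibrium distribution. The function name, the `temperature` parameter, and the (answer, energy) input format are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def weight_answers_by_energy(paths, temperature=1.0):
    """Aggregate sampled reasoning paths into a distribution over answers.

    paths: list of (answer, energy) pairs, one per sampled reasoning path.
    Assumption: lower energy means a deeper attractor basin, and each path
    is weighted by exp(-energy / temperature).
    """
    if not paths:
        raise ValueError("at least one sampled path is required")
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Subtract the minimum energy for numerical stability; the shift
    # cancels out after normalization.
    min_energy = min(energy for _, energy in paths)

    totals = defaultdict(float)
    for answer, energy in paths:
        totals[answer] += math.exp(-(energy - min_energy) / temperature)

    normalizer = sum(totals.values())
    return {answer: weight / normalizer for answer, weight in totals.items()}
```

Under this assumption, a single low-energy path can outweigh several higher-energy paths that agree with each other, which is the main difference from plain majority voting over sampled answers.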
This paper introduces EERLoss, a subdifferentiable approximation of the Equal Error Rate (EER) designed to align deep biometric model training with primary evaluation metrics. Validated on keystroke dynamics verification using the KVC-onGoing benchmark, the approach addresses the misalignment between optimization objectives and performance assessment.
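For context on the metric that EERLoss approximates, the following sketch computes the Equal Error Rate itself from verification scores by sweeping a decision threshold. It is the standard threshold-based definition of the metric and not the paper's loss function; the function name and the convention that higher scores mean "accept" are assumptions made for illustration.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Return the EER: the error rate at the threshold where the false
    acceptance rate (FAR) and false rejection rate (FRR) are closest.

    Assumption: a trial is accepted when its score is >= the threshold.
    """
    if not genuine_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")

    # Every observed score is a candidate threshold.
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))

    best_gap = float("inf")
    best_eer = None
    for threshold in thresholds:
        # FAR: fraction of impostor trials wrongly accepted.
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        # FRR: fraction of genuine trials wrongly rejected.
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap = gap
            best_eer = (far + frr) / 2.0
    return best_eer
```

Because FAR and FRR are computed by counting threshold crossings, the metric is piecewise constant in the scores, which is why training typically relies on a surrogate rather than on the EER directly.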
The authors propose QC-SMOTE, a quality-controlled oversampling framework designed to address the generation of low-quality synthetic samples in noisy or overlapping regions common in imbalanced classification tasks. This method estimates minority sample reliability using a composite neighborhood trustworthiness score and employs an IPQ-guided best-of-K strategy for generating synthetic candidates.
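The best-of-K idea can be sketched as follows: generate several SMOTE-style interpolated candidates between a minority sample and one of its minority neighbors, score each, and keep the best. The summary does not define the IPQ score or the trustworthiness score, so this example substitutes a simple stand-in (the fraction of minority points among a candidate's nearest neighbors). All function names and parameters here are hypothetical.

```python
import math
import random


def minority_fraction(point, minority, majority, k=3):
    """Stand-in quality score (NOT the paper's IPQ): the fraction of the
    k nearest labelled points that belong to the minority class."""
    labelled = [(math.dist(point, m), 1) for m in minority]
    labelled += [(math.dist(point, m), 0) for m in majority]
    labelled.sort(key=lambda pair: pair[0])
    nearest = labelled[:k]
    return sum(label for _, label in nearest) / len(nearest)


def best_of_k_candidate(seed, neighbor, minority, majority,
                        num_candidates=5, rng=None):
    """Draw num_candidates points on the segment seed -> neighbor and
    return the one with the highest stand-in quality score."""
    rng = rng or random.Random(0)
    candidates = []
    for _ in range(num_candidates):
        lam = rng.random()  # interpolation coefficient in [0, 1)
        candidates.append(
            tuple(s + lam * (n - s) for s, n in zip(seed, neighbor))
        )
    return max(
        candidates,
        key=lambda c: minority_fraction(c, minority, majority),
    )
```

Plain SMOTE would keep the first interpolated point; selecting among several candidates is what lets a quality score steer generation away from noisy or overlapping regions.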
This paper introduces ASALT, a method that enables lateral transfer learning in multi-agent reinforcement learning by accommodating mismatched state-space dimensionalities between source and target domains. The approach uses observation-level and state-level adapters to map inputs into a shared embedding space, facilitating effective knowledge transfer across heterogeneous environments.
The article formulates the Cross-Level Design Principle to address how ODRL policy evaluators fail to specify normative positions, authority structures, or violation declaration power. It establishes that any normative language with violable norms requires both conduct-level positions like Permission and Duty, and competence-level positions such as Power and Immunity.
Researchers propose MVG-KAN, a model for accurate short-term PM2.5 forecasting that addresses the limitations of existing methods in capturing complex pollutant dispersion driven by meteorological factors.
Researchers introduce DigenRL, a disaggregated reinforcement learning framework designed to address the inefficiencies of colocated execution in diffusion-based generative large language models. The system supports flexible resource allocation and heterogeneous GPUs while utilizing novel parallelism techniques to reduce execution bubbles.
A study reveals that large language models systematically suppress 'Causal Caution'—the tendency to refrain from causal judgment without sufficient evidence—when shifting from academic to practical advisory contexts. This suppression occurs despite the models retaining the underlying capability, as evidenced by the ability to restore cautious reasoning through specific prompts.
The article introduces Structural Kolmogorov-Arnold Networks (KANs) that place learnable functions in the convolution structure rather than on individual kernel entries, organizing the design by whether the function acts on pixel values or filter shape. It presents three realizations: SV-KAN with a shared value function, AG-KAN using a content-adaptive Gaussian gate, and RF-KAN which builds filters from oriented ridge profiles in a Morlet wavelet basis.
This paper systematically studies the stability of prompt rankings under common variability sources like random seeds and limited evaluation subsets across three open-weight LLMs and two benchmark tasks.
Researchers propose a cycle-consistent neural architecture that generates faithful natural language explanations for formal verification certificates, addressing the opacity of these machine-checkable proofs for non-specialists. The system achieves 90.0% cycle-verified soundness on test data from a financial compliance domain, significantly outperforming multi-LLM baselines in both accuracy and inference speed.
A user reports achieving a 30-40% increase in token generation speed by pairing the Ornith-1.0-35B model as a draft model with Qwen3.6-35B-A3B-DFlash using llama-server.