Topic · Training data
arXiv cs.CL · 8d ago

LLM Features Can Hurt GNNs via Concatenation Interference

Concatenating LLM-generated features to graph neural networks systematically reduces accuracy on homophilous benchmarks; on PubMed, accuracy drops by 17.0 ± 0.3 pp. This degradation is linked to LLM-alone discriminability (Delta_sig), which correlates with concatenation cost (r² = 0.38) and shows a power-law relationship with feature dimension and node count (r² = 0.97), particularly in low-Delta_sig, low-node scenarios.
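
For readers unfamiliar with how a power-law fit and its r² are usually obtained, here is a minimal sketch. It assumes a single predictor and the form y = a * x**b, fitted by least squares in log-log space. The summary does not state the paper's actual functional form or fitting procedure, so the function name `fit_power_law` and the synthetic data are purely illustrative.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = a * x**b by ordinary least squares on (log x, log y).

    Returns (a, b, r2), where r2 is the coefficient of determination
    measured in log-log space. All inputs must be strictly positive.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n

    # Slope and intercept of the straight line in log-log space.
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx
    log_a = mean_y - b * mean_x

    # r2 = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((v - (log_a + b * u)) ** 2 for u, v in zip(lx, ly))
    ss_tot = sum((v - mean_y) ** 2 for v in ly)
    r2 = 1.0 - ss_res / ss_tot
    return math.exp(log_a), b, r2


# Synthetic, noise-free data (not from the paper): y = 3 * x**-0.5
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [3.0 * x ** -0.5 for x in xs]
a, b, r2 = fit_power_law(xs, ys)
```

On noise-free synthetic data the fit recovers the generating parameters and r² is essentially 1; real measurements would scatter around the line and give a lower value.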

arXiv cs.LG, cs.AI · 8d ago

McWC: Forecasting with Cyclicity, Trend, and Channel Correlation

McWC is a model for long-term time series forecasting that separately captures cyclicity, trend, and inter-channel correlations. It uses multi-layer cyclicity construction, wavelet decomposition, and a multi-layer perceptron to extract and fuse high- and low-frequency information, while decoupling intra-channel autocorrelations via a frequency-domain loss. Experiments on six real-world datasets show McWC achieves state-of-the-art performance with high computational efficiency.
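
As a rough illustration of what wavelet decomposition does to a series, the sketch below applies one level of the Haar transform, splitting a signal into a low-frequency (approximation) part and a high-frequency (detail) part. The Haar wavelet is an assumption chosen for simplicity; the summary does not say which wavelet or how many levels McWC uses, and the function names `haar_step` and `haar_inverse` are hypothetical.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform on an even-length sequence.

    Returns (approx, detail): scaled pairwise sums (low-frequency content)
    and scaled pairwise differences (high-frequency content).
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Reconstruct the original sequence from one Haar level."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


# Toy series (not from the paper): the last pair is constant,
# so its detail coefficient is exactly zero.
series = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, detail = haar_step(series)
restored = haar_inverse(approx, detail)
```

The two halves can be processed separately and then recombined, which is the general idea behind extracting and fusing high- and low-frequency information.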

arXiv cs.CL, cs.AI · 9d ago

IMPACTeen Dataset Released with English and Polish Versions

IMPACTeen is a dataset of 1,021 texts annotated from five perspectives: teenagers, parents, psychologists, communication experts, and teachers. It includes 5,100 annotation records covering social influence techniques, intentions, consequences, and resistance. The dataset was created using LLM generation followed by human validation, with annotations checked through human editing. It is available in both Polish and English and supports research on social influence and language model training.

arXiv cs.LG · 9d ago

A Mathematical Review of Shape Space Analysis in Machine Learning

This survey presents a mathematical framework for analyzing geometric data, integrating differential geometry, statistics, and machine learning. It outlines a unified pipeline for shape representation, geodesic metrics, statistical analysis, and geometry-aware learning, enabling the study of shape variability and structural trajectories across populations and time. Applications span biology, medicine, anthropology, and computer vision, highlighting challenges in handling nonlinear and unaligned geometric variation.