AI agents
arXiv cs.CL · 9d ago

A Framework for Evaluating Agentic Skills at Scale

We present a framework that evaluates agentic skills by constructing realistic tasks and measuring each skill's utility through task execution. Applied to 500 real-world skills, it generates 1,000 tasks with scoring rubrics and evaluates 19 agent-model configurations spanning proprietary and open-source models. Results show significant variation in instruction adherence and performance gains, and skills substantially alter model behavior relative to no-skill setups.

arXiv cs.CL · 10d ago

LOGOS: A General-Purpose Generative Model for Natural Sciences

LOGOS is a unified generative language model that represents scientific objects and their interactions as token sequences in a shared grammar. It achieves consistent or superior performance across diverse natural science tasks, demonstrating that a single model can feasibly serve multiple domains. Its performance improves with parameter count, and its design suggests that AI for Science should align deeply with large language models through shared architectures and training.

arXiv cs.CL · 10d ago

IMPACTeen Dataset Released with English and Polish Versions

IMPACTeen is a dataset of 1,021 texts annotated from five perspectives: teenagers, parents, psychologists, communication experts, and teachers. It includes 5,100 annotation records covering social influence techniques, intentions, consequences, and resistance, validated through human editing. Created using LLM generation and human validation, the dataset is available in both Polish and English and supports research on social influence and language model training.