Retrieval & RAG
arxiv arXiv cs.CL · 1d ago

MMed-Bench-IR: A Multilingual Medical Retrieval Benchmark

MMed-Bench-IR is a heterogeneous benchmark for multilingual medical information retrieval across six languages. It evaluates cross-lingual alignment, concept discrimination, and evidence retrieval through three distinct tasks that share no concepts or queries. Results show steep cross-lingual performance drops: English biomedical encoders fall from 0.818 to 0.056 nDCG@10 when moving to Japanese, a limitation that English-only benchmarks do not detect.
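For readers unfamiliar with the metric, here is a minimal sketch of how nDCG@10 is commonly computed. It assumes the linear-gain form of DCG with a log2 rank discount; the benchmark's exact gain function and relevance scale are not stated in the summary, and the function and parameter names are illustrative only.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the first k results (linear gain)."""
    # Rank 1 is discounted by log2(2) = 1, rank 2 by log2(3), and so on.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(
    ranked_relevances: Sequence[float],
    judged_relevances: Sequence[float],
    k: int = 10,
) -> float:
    """nDCG@k for one query.

    ranked_relevances: relevance grades of the returned documents, in rank order.
    judged_relevances: relevance grades of all judged documents for the query
                       (assumed available; used to build the ideal ranking).
    """
    ideal = dcg_at_k(sorted(judged_relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant documents: define the score as 0
    return dcg_at_k(ranked_relevances, k) / ideal
```

A benchmark-level score is typically the mean of this per-query value, so a figure such as 0.056 means relevant documents rarely appear near the top of the first ten results.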

media r/LocalLLaMA · 2d ago

Comparing Docling, Liteparse, MinerU, and Unstructured for On-Prem Document Processing

A university needs on-premises document processing for academic workflows and is limited to local parsers because strict data governance policies ban cloud APIs. The poster compares Docling, Liteparse, MinerU, and Unstructured: Docling handles complex layouts well and is Apache 2.0 licensed, but is slower; Liteparse performs well on printed documents using Tesseract OCR; MinerU uses PaddleOCR and handles French documents well, though setup takes longer; Unstructured supports multiple formats, including DOCX and PPTX. The solution must support recurring, stable parsing of PDFs that evolve over time with minimal formatting changes.

lab Mistral AI News · 2d ago

Mistral Releases OCR 4 with Multilingual Support and Structured Output

Mistral OCR 4 introduces bounding boxes, block classification, and inline confidence scores for 170 languages across 10 language groups. It outperforms leading OCR systems in human preference evaluations with a 72% win rate and achieves the top score on OlmOCRBench (85.20). It can be self-hosted in a single container and targets enterprise use cases such as RAG and document ingestion.

arxiv arXiv cs.LG · 6d ago

Train, Retrieve, or Both? Head-to-Head on Statutory Citation for Ontario RTA

A four-arm comparison shows that retrieval is essential for accurate statutory citation under the Ontario Residential Tenancies Act. The hybrid of supervised fine-tuning and retrieval (SFT+RAG) achieves an exact-match score of 0.481 with zero hallucinations, outperforming the base and SFT-only models. It also matches a pipeline built on larger, specialized models without requiring additional training data. The results are preliminary and based on a small, human-verified, real-world evaluation set.
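As a rough illustration of the exact-match metric, the sketch below scores predicted citations against gold citations. The normalization (lowercasing and removing whitespace) is an assumption, since the summary does not say how the paper compares strings, and the function names are hypothetical. It does not cover the paper's hallucination measure, which is not defined here.

```python
import re
from typing import Sequence


def normalize_citation(text: str) -> str:
    """Lowercase and drop all whitespace (assumed normalization, not the paper's)."""
    return re.sub(r"\s+", "", text.lower())


def exact_match_rate(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions that equal their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(
        normalize_citation(pred) == normalize_citation(ref)
        for pred, ref in zip(predictions, references)
    )
    return hits / len(references)
```

Under this definition a score of 0.481 means just under half of the predicted citations match the gold citation exactly, so near misses count as errors.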