Retrieval & RAG
arxiv arXiv cs.AI · 1d ago

Deep Learning Pipeline for Sign Language Recognition and Translation to Indian Vernaculars

A two-stage deep learning pipeline classifies Indian sign language video clips into English words using a fine-tuned VideoMAE model and translates them into Hindi, Telugu, and Bengali via the NLLB-200 multilingual model. The system achieves 99% training and 78% validation accuracy on a 13-class, 197-clips dataset with uniform 16-frame clips at 22-224 resolution, and includes a Streamlit demo for user-uploaded videos with per-class analysis and failure mode identification.

arxiv arXiv cs.CL · 2d ago

MMed-Bench-IR: A Multilingual Medical Retrieval Benchmark

MMed-Bench-IR introduces a heterogeneous benchmark for multilingual medical information retrieval across six languages. It evaluates cross-lingual alignment, concept discrimination, and evidence retrieval through three distinct tasks with no overlapping concepts or queries. Evaluation shows significant cross-lingual performance drops, with English biomedical encoders falling from 0.818 to 0.056 nDCG@10 when transitioning to Japanese, highlighting limitations undetected by English-only benchmarks.

media r/LocalLLaMA · 2d ago

Comparing Docling, Liteparse, MinerU, and Unstructured for On-Prem Document Processing

A university seeking on-premises document processing for academic workflows must use local parsers due to strict data governance policies banning cloud APIs. The user evaluates Docling, Liteparse, MinerU, and Unstructured, noting Docling excels in complex layouts with Apache 2.0 licensing but is slower; Liteparse offers good printed document performance with Tesseract OCR; MinerU uses PaddleOCR and handles French documents well despite longer setup; Unstructured supports multiple formats including DOCX and PPTX. The solution must support recurring, stable parsing of evolving PDFs with minimal formatting changes.

lab Mistral AI News · 2d ago

Mistral Releases OCR 4 with Multilingual Support and Structured Output

Mistral OCR 4 introduces bounding boxes, block classification, and inline confidence scores for 170 languages across 10 language groups. It outperforms leading OCR systems in human preference evaluations with a 72% win rate and achieves the top score on OlmOCRBench (85.20), while offering self-hosted deployment in a single container and supporting enterprise use cases like RAG and document ingestion.