Stars
SGLang is a fast serving framework for large language models and vision language models.
Assessing Context-Aware Creative Intelligence in MLLMs
BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval
Official Repo for Paper "Optimizing Temperature for Language Models with Multi-Sample Inference"
MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models
verl: Volcano Engine Reinforcement Learning for LLMs
RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments.
[ICLR 2025] LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
[ICLR'25 Oral] UGround: Universal GUI Visual Grounding for GUI Agents
This repository includes the official implementation of OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs.
[CVPR 2025] VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning
Sotopia: an Open-ended Social Learning Environment (ICLR 2024 spotlight)
🤠 Agent-as-a-Judge and DevAI dataset
An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
[ICLR 2025] Automated Design of Agentic Systems
This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025]
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
[EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
This is a Phi Family of SLMs book for getting started with Phi models. Phi is a family of open-source AI models developed by Microsoft. Phi models are the most capable and cost-effective small langua…