LYU, Zhiheng
2026-04-28
2026-04-13
https://hdl.handle.net/10012/23070

Large language models (LLMs) have demonstrated remarkable capabilities in code understanding and generation, yet a significant gap remains between static code generation and interactive software engineering. This thesis investigates the post-training of LLMs as software engineering agents, focusing on three interconnected challenges: infrastructure, data, and training methodology.

First, we contribute to VerlTool, a unified framework for agentic reinforcement learning with tool integration (ARLT). The author's contributions center on the training orchestration layer: the stateful environment protocol, the environment server architecture, and the SWE agent post-training pipeline. These components make tool-augmented RL training practical and accessible for researchers.

Second, we address the critical bottleneck of training data and evaluation infrastructure. SWE-Next provides a scalable, Ray-native pipeline for synthesizing verifiable software engineering tasks from open-source repositories (ongoing work with intermediate results reported). For SWE-QA-Pro, a representative benchmark for code question answering, the author contributes the data sourcing and synthesis pipeline.

Third, we investigate the post-training design space for software engineering agents, spanning supervised fine-tuning (SFT), rejection fine-tuning (RFT), RL from AI feedback (RLAIF), and RL with verifiable rewards (RLVR). Through three complementary case studies, namely code question answering (SFT + RLAIF), web-based information retrieval (SFT + RFT), and repository-level bug fixing (RLVR), we demonstrate that the optimal training recipe depends on task characteristics such as reward verifiability, exploration complexity, and data availability. Our experiments show that task-specific post-training of smaller open-weight models can be competitive with larger proprietary models, and that matching the training method to the task structure is more important than uniformly applying all stages.

en
Post-Training Large Language Models as Software Engineering Agents
Master Thesis