Post-Training Large Language Models as Software Engineering Agents
| dc.contributor.author | LYU, Zhiheng | |
| dc.date.accessioned | 2026-04-28T12:48:31Z | |
| dc.date.available | 2026-04-28T12:48:31Z | |
| dc.date.issued | 2026-04-28 | |
| dc.date.submitted | 2026-04-13 | |
| dc.description.abstract | Large language models (LLMs) have demonstrated remarkable capabilities in code understanding and generation, yet a significant gap remains between static code generation and interactive software engineering. This thesis investigates the post-training of LLMs as software engineering agents, focusing on three interconnected challenges: infrastructure, data, and training methodology. First, we contribute to VerlTool, a unified framework for agentic reinforcement learning with tool integration (ARLT). The author's contributions center on the training orchestration layer (the stateful environment protocol, environment server architecture, and SWE agent post-training pipeline), which makes tool-augmented RL training practical and accessible for researchers. Second, we address the critical bottleneck of training data and evaluation infrastructure. SWE-Next provides a scalable, Ray-native pipeline for synthesizing verifiable software engineering tasks from open-source repositories (ongoing work with intermediate results reported). For SWE-QA-Pro, a representative benchmark for code question answering, the author contributes the data sourcing and synthesis pipeline. Third, we investigate the post-training design space for software engineering agents, spanning supervised fine-tuning (SFT), rejection fine-tuning (RFT), RL from AI feedback (RLAIF), and RL with verifiable rewards (RLVR). Through three complementary case studies (code question answering with SFT + RLAIF, web-based information retrieval with SFT + RFT, and repository-level bug fixing with RLVR), we demonstrate that the optimal training recipe depends on task characteristics such as reward verifiability, exploration complexity, and data availability. Our experiments show that task-specific post-training of smaller open-weight models can be competitive with larger proprietary models, and that matching the training method to the task structure is more important than uniformly applying all stages. | |
| dc.identifier.uri | https://hdl.handle.net/10012/23070 | |
| dc.language.iso | en | |
| dc.pending | false | |
| dc.publisher | University of Waterloo | en |
| dc.title | Post-Training Large Language Models as Software Engineering Agents | |
| dc.type | Master Thesis | |
| uws-etd.degree | Master of Mathematics | |
| uws-etd.degree.department | David R. Cheriton School of Computer Science | |
| uws-etd.degree.discipline | Computer Science | |
| uws-etd.degree.grantor | University of Waterloo | en |
| uws-etd.embargo.terms | 0 | |
| uws.contributor.advisor | Chen, Wenhu | |
| uws.contributor.affiliation1 | Faculty of Mathematics | |
| uws.peerReviewStatus | Unreviewed | en |
| uws.published.city | Waterloo | en |
| uws.published.country | Canada | en |
| uws.published.province | Ontario | en |
| uws.scholarLevel | Graduate | en |
| uws.typeOfResource | Text | en |