Towards Multi-Language Visualization Coding Agents
Advisor
Wenhu Chen
Publisher
University of Waterloo
Abstract
Large language models (LLMs) have made remarkable progress in general-purpose code generation, but they continue to struggle with visualization coding: generating executable code that produces accurate and semantically consistent plots. Visualization coding requires alignment across natural language, data, and rendered outputs, and success is not binary: code can execute without error yet still render the wrong plot, so semantic correctness cannot be ensured by syntax alone. Existing instruction-tuning datasets rarely include runtime validation or feedback-based supervision, leading to fragile and unreliable plot generation. This thesis advances the development of visualization coding agents that can generate, execute, and iteratively refine visualization code grounded in execution feedback.
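To make the notion of execution feedback concrete, the sketch below shows one way to validate a candidate Python plotting script: run it in a subprocess and check that it both exits cleanly and renders an image. The harness and helper names are illustrative assumptions, not the thesis pipeline itself.

import os
import subprocess
import tempfile

def validate_plot_code(code: str, timeout: int = 30):
    # Run candidate plotting code in isolation and check that it both
    # exits cleanly and renders a figure to disk. A minimal sketch of
    # execution-based validation, not the actual thesis pipeline.
    with tempfile.TemporaryDirectory() as workdir:
        # Force a headless backend and save the active figure, so that
        # success is observable as an image file rather than a window.
        harness = (
            "import matplotlib\n"
            "matplotlib.use('Agg')\n"
            + code + "\n"
            "import matplotlib.pyplot as plt\n"
            "plt.savefig('plot.png')\n"
        )
        script = os.path.join(workdir, "snippet.py")
        with open(script, "w") as f:
            f.write(harness)
        try:
            result = subprocess.run(
                ["python", script], cwd=workdir,
                capture_output=True, text=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False, "timeout"
        rendered = os.path.exists(os.path.join(workdir, "plot.png"))
        # stderr carries the traceback later used as correction feedback.
        return (result.returncode == 0 and rendered), result.stderr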
We first present VisCode-200K, a large-scale instruction-tuning dataset for Python-based visualization and self-correction. It contains over 200K examples from two sources: (1) validated plotting code from open-source repositories, paired with natural language instructions and rendered plots; and (2) 45K multi-turn correction dialogues from Code-Feedback, enabling models to revise faulty code using runtime feedback. We fine-tune Qwen2.5-Coder-Instruct on VisCode-200K to create VisCoder, and evaluate it on PandasPlotBench. VisCoder significantly outperforms strong open-source baselines and approaches the performance of proprietary models like GPT-4o-mini. A self-debug evaluation protocol further demonstrates the benefits of feedback-driven learning for executable and visually accurate code generation.
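The self-debug protocol can be pictured as a short loop around such a check: execute the model's code, and on failure return the traceback to the model as a new dialogue turn. A minimal sketch, reusing validate_plot_code from above and assuming a hypothetical chat-style model.generate interface (not VisCoder's actual API):

def self_debug(model, instruction: str, max_rounds: int = 3):
    # Generate plotting code, then iteratively repair it from runtime
    # feedback. Sketch only: `model.generate` stands in for any
    # chat-style code model.
    messages = [{"role": "user", "content": instruction}]
    code = model.generate(messages)
    for _ in range(max_rounds):
        ok, stderr = validate_plot_code(code)
        if ok:
            return code, True  # executable and figure rendered
        # Append the failure as a new turn, mirroring the multi-turn
        # correction dialogues used in training.
        messages.append({"role": "assistant", "content": code})
        messages.append({
            "role": "user",
            "content": "The code failed with:\n" + stderr + "\nPlease fix it.",
        })
        code = model.generate(messages)
    return code, False  # unresolved after max_rounds repair attempts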
Building on this foundation, we extend the approach to a multi-language setting. We introduce three complementary resources for advancing visualization coding agents: VisCode-Multi-679K, a large-scale supervised dataset of 679K validated executable visualization samples and multi-turn correction dialogues covering 12 programming languages; VisPlotBench, a benchmark for systematic evaluation featuring executable tasks, rendered outputs, and protocols for both initial generation and multi-round self-debug; and VisCoder2, a family of multi-language visualization models trained on VisCode-Multi-679K. Experiments show that VisCoder2 significantly outperforms strong open-source baselines, particularly in symbolic or compiler-dependent languages such as LaTeX, LilyPond, and Asymptote, and approaches the performance of proprietary models like GPT-4.1, reaching an 82.4% overall execution pass rate at the 32B scale.
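The headline metric, execution pass rate, is the fraction of benchmark tasks whose generated code executes and renders, reported overall and per language. A minimal aggregation sketch, with the (language, passed) record format assumed for illustration rather than taken from VisPlotBench:

from collections import defaultdict

def execution_pass_rates(results):
    # results: iterable of (language, passed) pairs, one per benchmark
    # task. Returns the overall and per-language execution pass rates.
    by_lang = defaultdict(lambda: [0, 0])  # language -> [passed, total]
    for lang, ok in results:
        by_lang[lang][0] += int(ok)
        by_lang[lang][1] += 1
    total = sum(t for _, t in by_lang.values())
    passed = sum(p for p, _ in by_lang.values())
    overall = passed / total if total else 0.0
    per_lang = {lang: p / t for lang, (p, t) in by_lang.items()}
    return overall, per_lang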
Together, VisCoder and VisCoder2 form a coherent research trajectory towards reliable, generalizable visualization coding agents. By integrating executable supervision, semantic grounding, and feedback-based refinement across languages, this thesis establishes methodological and resource foundations for future systems capable of robust, self-correcting visualization code generation in real-world analytical workflows.