Precise and Scalable Constraint-Based Type Inference for Incomplete Java Code Snippets in the Age of Large Language Models
dc.contributor.author | Dong, Yiwen | |
dc.date.accessioned | 2025-09-08T17:14:24Z | |
dc.date.available | 2025-09-08T17:14:24Z | |
dc.date.issued | 2025-09-08 | |
dc.date.submitted | 2025-08-07 | |
dc.description.abstract | Online code snippets are prevalent and useful for developers. These snippets are commonly shared on websites such as Stack Overflow to illustrate programming concepts. However, they are frequently incomplete. In Java code snippets, type references are typically expressed using simple names, which can be ambiguous; identifying the exact types used requires fully qualified names, which are typically provided in import statements. Despite their importance, such import statements are available in only 6.88% of Java code snippets on Stack Overflow. To address this challenge, this thesis explores constraint-based type inference to recover missing type information. It also proposes a dataset for evaluating the performance of type inference techniques, particularly large language models (LLMs), on Java code snippets. In addition, the scalability of the initial inference technique is improved to enhance its applicability in real-world scenarios. The first study introduces SnR, a constraint-based type inference technique that automatically infers the exact types used in a code snippet, along with the libraries containing those types, so that the snippet can be compiled and therefore reused. SnR first builds a knowledge base of APIs, i.e., various facts about the available APIs, from a corpus of Java libraries. Given a code snippet with missing import statements, SnR automatically extracts typing constraints from the snippet, solves the constraints against the knowledge base, and returns a set of APIs that satisfies the constraints and can be imported into the snippet. When evaluated on the StatType-SO benchmark suite, which includes 267 Stack Overflow code snippets, SnR significantly outperforms the state-of-the-art tool Coster: SnR correctly infers 91.0% of the import statements, making 73.8% of the snippets compilable, compared to Coster's 36.0% and 9.0%, respectively. The second study evaluates type inference techniques, with a particular focus on LLMs. Although LLMs demonstrate strong performance on the StatType-SO benchmark, that dataset has been publicly available on GitHub since 2017. If LLMs were trained on StatType-SO, their performance may not reflect how they would perform on novel, real-world code, but rather result from recalling examples seen during training. To address this, this thesis introduces ThaliaType, a new, previously unreleased dataset containing 300 Java code snippets. Results reveal that LLMs exhibit a significant drop in performance when generalizing to unseen code snippets, with decreases of up to 59% in precision and up to 72% in recall. To further investigate the limitations of LLMs in understanding the execution semantics of code, semantics-preserving code transformations were developed; analysis showed that LLMs performed significantly worse on code snippets that are syntactically different but semantically equivalent. These experiments suggest that the strong performance of LLMs in prior evaluations was likely influenced by data leakage in the benchmarks rather than a genuine understanding of the semantics of code snippets. The third study enhances the scalability of constraint-based type inference by introducing Scitix. Constraint solving against a large knowledge base becomes computationally expensive in the presence of unknown types (e.g., user-defined types) in code snippets. To improve scalability, Scitix represents certain unknown types as Any, effectively ignoring them during constraint solving, and then applies an iterative constraint-solving approach that skips constraints involving unknown types, saving computation. Extensive evaluations show that these insights improve both performance and scalability compared to SnR: Scitix achieves F1-scores of 96.6% and 88.7% on StatType-SO and ThaliaType, respectively, using a large knowledge base of over 3,000 jars, whereas SnR consistently times out, yielding F1-scores close to 0%. Even with the smallest knowledge base, where SnR does not time out, Scitix reduces the number of errors by 79% and 37% compared to SnR. Furthermore, even with the largest knowledge base, Scitix reduces error rates by 20% and 78% compared to state-of-the-art LLMs. This thesis demonstrates the use of constraint-based type inference for Java code snippets. The proposed approach is evaluated through a comprehensive analysis that contextualizes its performance in the current landscape dominated by LLMs. The resulting system, Scitix, is both precise and scalable, enhancing the reusability of Java code snippets. | |
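To illustrate the ambiguity of simple names described in the abstract, the following minimal Java sketch (illustrative only; it is not taken from the thesis or its benchmarks) shows the kind of import statements a type inference tool must recover. The way the types are used in the snippet body constrains which fully qualified names can make it compile:

    // What an inference tool must recover: without these two imports, the simple
    // names "List" and "ArrayList" below are ambiguous
    // (java.util.List vs. java.awt.List, for example).
    import java.util.ArrayList;
    import java.util.List;

    public class Snippet {
        public static void main(String[] args) {
            // The usage constrains the choice: the generic type parameter and
            // the get(int) call only fit java.util.List, not java.awt.List.
            List<String> names = new ArrayList<>();
            names.add("Stack Overflow");
            System.out.println(names.get(0));
        }
    }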
dc.identifier.uri | https://hdl.handle.net/10012/22357 | |
dc.language.iso | en | |
dc.pending | false | |
dc.publisher | University of Waterloo | en |
dc.subject | type inference | |
dc.subject | Java | |
dc.subject | code snippet | |
dc.subject | Stack Overflow | |
dc.subject | Datalog | |
dc.subject | constraint | |
dc.subject | LLM | |
dc.subject | static analysis | |
dc.subject | repair | |
dc.subject | unknown type | |
dc.title | Precise and Scalable Constraint-Based Type Inference for Incomplete Java Code Snippets in the Age of Large Language Models | |
dc.type | Doctoral Thesis | |
uws-etd.degree | Doctor of Philosophy | |
uws-etd.degree.department | David R. Cheriton School of Computer Science | |
uws-etd.degree.discipline | Computer Science | |
uws-etd.degree.grantor | University of Waterloo | en |
uws-etd.embargo.terms | 0 | |
uws.contributor.advisor | Sun, Chengnian | |
uws.contributor.affiliation1 | Faculty of Mathematics | |
uws.peerReviewStatus | Unreviewed | en |
uws.published.city | Waterloo | en |
uws.published.country | Canada | en |
uws.published.province | Ontario | en |
uws.scholarLevel | Graduate | en |
uws.typeOfResource | Text | en |