Show simple item record

dc.contributor.authorSahu, Siddhartha
dc.date.accessioned2024-04-01 17:31:06 (GMT)
dc.date.available2024-04-01 17:31:06 (GMT)
dc.date.issued2024-04-01
dc.date.submitted2024-03-23
dc.identifier.urihttp://hdl.handle.net/10012/20410
dc.description.abstractDiverse applications spanning areas such as fraud detection, risk assessment, recommendations, and telecommunications process datasets characterized by entities and their relationships. Graphs naturally emerge as the most intuitive abstraction for modeling these datasets. Many practical applications seek the ability to share computations across multiple snapshots of evolving graphs to efficiently perform analyses, such as evaluating changing road conditions in transportation networks or performing contingency analysis on infrastructure networks. The research in this thesis is motivated by the challenge of efficiently supporting such applications on large datasets. Differential computation (DC) has emerged as a powerful general technique for incrementally maintaining computations over evolving datasets, even those containing arbitrarily nested loops. It is thus a promising technique that can be used to build the kinds of applications that motivate this thesis. We present a study of DC that explores how it can be used to build practical data systems. In particular, this thesis addresses two challenges that impede the adoption of DC: (i) the lack of high-level interfaces that can be used to develop graph-specific applications; and (ii) scalability challenges that arise due to the general maintenance technique used by DC, making it less efficient for application-specific workloads. The main contribution of this thesis is to show how DC can be made more practical for graph processing systems. To address the lack of high-level interfaces, we built GraphSurge, a system that can be used to create and analyze multiple views over static graphs using a declarative programming interface. When users perform graph computations on a collection of views, GraphSurge internally uses DC to share the computation across all views. We develop several optimizations that improve the scalability of DC. Within GraphSurge, we identify two optimization problems, which we call collection ordering and collection splitting, and present algorithms to solve these problems. These optimizations improve the runtime of GraphSurge applications by up to an order of magnitude. In the reference implementation of DC, we identify two design bottlenecks in how data is indexed and processed within operators. To address the bottlenecks, we implement a new index and an optimization called Fast Empty Difference Verification, which improves the runtime of graph processing workloads by up to 14x. Our work was informed by insights from a non-technical user survey we conducted to understand how graphs are used in practice.en
dc.language.isoenen
dc.publisherUniversity of Waterlooen
dc.subjectdifferential dataflowen
dc.subjectincremental computationen
dc.subjectdataflow computationen
dc.subjectgraph computationen
dc.titleOptimizing Differential Computation for Large-Scale Graph Processingen
dc.typeDoctoral Thesisen
dc.pendingfalse
uws-etd.degree.departmentDavid R. Cheriton School of Computer Scienceen
uws-etd.degree.disciplineComputer Scienceen
uws-etd.degree.grantorUniversity of Waterlooen
uws-etd.degreeDoctor of Philosophyen
uws-etd.embargo.terms0en
uws.contributor.advisorSalihoglu, Semh
uws.contributor.affiliation1Faculty of Mathematicsen
uws.published.cityWaterlooen
uws.published.countryCanadaen
uws.published.provinceOntarioen
uws.typeOfResourceTexten
uws.peerReviewStatusUnrevieweden
uws.scholarLevelGraduateen


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record


UWSpace

University of Waterloo Library
200 University Avenue West
Waterloo, Ontario, Canada N2L 3G1
519 888 4883

All items in UWSpace are protected by copyright, with all rights reserved.

DSpace software

Service outages