Applied Mathematics
Permanent URI for this collection: https://uwspace.uwaterloo.ca/handle/10012/9926
This is the collection for the University of Waterloo's Department of Applied Mathematics.
Research outputs are organized by type (e.g., Master Thesis, Article, Conference Paper).
Waterloo faculty, students, and staff can contact us or visit the UWSpace guide to learn more about depositing their research.
Browsing Applied Mathematics by Author "De Sterck, Hans"
Now showing 1 - 9 of 9
Item: Algorithms and Models for Tensors and Networks with Applications in Data Science (University of Waterloo, 2016-01-19). Winlaw, Manda; De Sterck, Hans.

Big data plays an increasingly central role in many areas of research, including optimization and network modeling. We consider problems applicable to large datasets within these two branches of research. We begin by presenting a nonlinearly preconditioned nonlinear conjugate gradient (PNCG) algorithm to increase the convergence speed of iterative unconstrained optimization methods. We provide a concise overview of several PNCG variants and their properties, and obtain a new convergence result for one of the variants under suitable conditions. We then use the PNCG algorithm to solve two different problems: computing the rank-R canonical tensor decomposition, and solving a latent factor model; latent factor models are often used as important building blocks in practical recommendation systems. For both problems the alternating least squares (ALS) algorithm is typically used to find a solution, so we consider it as a nonlinear preconditioner. The ALS algorithm can be viewed as a nonlinear preconditioner for the NCG algorithm or, alternatively, NCG can be viewed as an acceleration process for ALS. We demonstrate numerically that the convergence acceleration mechanism in PNCG often leads to significant payoffs for difficult tensor decomposition problems, with convergence that is considerably faster and more robust than for the stand-alone NCG or ALS algorithms. We also show numerically that the PNCG algorithm requires far fewer iterations and less time to reach desired ranking accuracies than stand-alone ALS in solving latent factor models.

We next turn to problems within the field of network or graph modeling. A network is a collection of points joined together by lines; networks are used in a broad variety of fields to represent connections between objects. Many large real-world networks share similar properties, which has generated considerable interest in developing models that can replicate these properties. We begin our discussion of graph models by closely examining the Chung-Lu model, a simple model in which, by design, the expected degree sequence of a generated graph equals a user-supplied degree sequence. We explore, both theoretically and numerically, what happens when simple changes are made to the model and when the model assumptions are violated. We also consider an algorithm for generating instances of the Chung-Lu model that is designed to be faster than the traditional algorithm, but find that it only generates instances of an approximate Chung-Lu model. We explore the properties of this approximate model under a variety of conditions and examine how far the expected degree sequence deviates from the user-supplied degree sequence. We also explore several ways of improving this approximate model to reduce the approximation error in the expected degree sequence, and note that when the assumptions of the original model are violated this error remains very large. We next design a new graph generator to match the community structure found in real-world networks, as measured by the clustering coefficient and the assortativity coefficient. Our graph generator uses information produced by a clustering algorithm run on the original network to build a synthetic network. Using several real-world networks, we test our algorithm numerically by creating a synthetic network and comparing its properties to those of the real network, as well as to those of another popular graph generator, BTER, developed by Seshadhri, Kolda, and Pinar. Our graph generator does well at preserving the clustering coefficient and typically outperforms BTER in matching the assortativity coefficient, particularly when the assortativity coefficient is negative.
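To make the preconditioning idea concrete, here is a minimal sketch of a PNCG-style iteration in Python. This is not code from the thesis: the functions `f` (objective) and `q` (one sweep of a fixed-point method such as ALS) are placeholders, and the PR+ update and backtracking search are generic choices.

```python
import numpy as np

def pncg(x, f, q, iters=100, tol=1e-8):
    """Sketch of nonlinearly preconditioned NCG: the direction
    g = x - q(x), built from one sweep of a fixed-point method q
    (e.g. ALS), plays the role of the gradient in a PR+-type update."""
    g = x - q(x)
    d = -g
    for _ in range(iters):
        t, fx = 1.0, f(x)
        while f(x + t * d) > fx and t > 1e-12:  # simple backtracking
            t *= 0.5
        x = x + t * d
        g_new = x - q(x)
        if np.linalg.norm(g_new) < tol:
            break
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))  # PR+ coefficient
        d = -g_new + beta * d
        g = g_new
    return x
```

Setting `q` to a single ALS sweep for a CP decomposition recovers the structure studied in the thesis, with NCG acting as an acceleration wrapper around ALS.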
Item: Bidirectional TopK Sparsification for Distributed Learning (University of Waterloo, 2022-05-27). Zou, William; De Sterck, Hans; Liu, Jun.

Training large neural networks requires a large amount of time. To speed up the process, distributed training is often used. One of the largest bottlenecks in distributed training is communicating gradients across different nodes. Different gradient compression techniques have been proposed to alleviate the communication bottleneck, including topK gradient sparsification, which truncates the gradient to the top K components before sending it to other nodes. Some authors have adapted topK gradient sparsification to the parameter-server framework by applying topK compression in both the worker-to-server and server-to-worker directions, as opposed to only the worker-to-server direction. Current intuition and analysis suggest that adding the extra compression degrades the convergence of the model. We provide a simple counterexample in which iterating with bidirectional topK SGD gives better convergence than iterating with unidirectional topK SGD. We explain this example within the theoretical framework developed by Alistarh et al., remove a critical assumption the authors made in their non-convex convergence analysis of topK SGD, and show that bidirectional topK SGD can achieve the same convergence bound as unidirectional topK SGD under assumptions that are potentially easier to satisfy. We experimentally evaluate unidirectional topK SGD against bidirectional topK SGD and show that, with careful tuning, models trained with bidirectional topK SGD perform just as well as models trained with unidirectional topK SGD. Finally, we provide empirical evidence that the amount of communication saved by adding server-to-worker topK compression is almost linear in the number of workers.
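A minimal numpy sketch of the compression step and its bidirectional use in one parameter-server round (our own illustration; the surrounding SGD training loop and error-feedback details are omitted):

```python
import numpy as np

def topk_sparsify(g, k):
    """Keep the k largest-magnitude entries of the gradient g; zero the rest."""
    idx = np.argpartition(np.abs(g), -k)[-k:]
    out = np.zeros_like(g)
    out[idx] = g[idx]
    return out

def bidirectional_round(worker_grads, k):
    """One communication round: workers send topK-compressed gradients
    to the server, and the server broadcasts a topK-compressed average."""
    uplink = [topk_sparsify(g, k) for g in worker_grads]  # worker -> server
    avg = np.mean(uplink, axis=0)
    return topk_sparsify(avg, k)                          # server -> worker
```

The unidirectional variant would return `avg` uncompressed; the communication saved by the final `topk_sparsify` is the server-to-worker saving studied in the thesis.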
Item: Data-Driven Methods for System Identification and Lyapunov Stability (University of Waterloo, 2023-01-27). Quartz, Thanin; De Sterck, Hans; Liu, Jun.

This thesis focuses on data-driven methods applied to system identification and stability analysis of dynamical systems. In the first major contribution of the thesis, we propose a learning framework to simultaneously stabilize an unknown nonlinear system with a neural controller and learn a neural Lyapunov function to certify a region of attraction (ROA) for the closed-loop system. The algorithmic structure consists of two neural networks and a satisfiability modulo theories (SMT) solver. The first neural network is responsible for learning the unknown dynamics. The second neural network aims to identify a valid Lyapunov function and a provably stabilizing nonlinear controller. The SMT solver then verifies that the candidate Lyapunov function indeed satisfies the Lyapunov conditions. We provide theoretical guarantees for the proposed learning framework in terms of the closed-loop stability of the unknown nonlinear system, and we illustrate the effectiveness of the approach with a set of numerical experiments.

We then examine another popular data-driven method for system identification, based on the Koopman operator. Methods based on the Koopman operator aim to approximate the advancement of the state under the flow map by a high-dimensional linear operator. This is accomplished by the extended dynamic mode decomposition (EDMD) algorithm, which takes nonlinear measurements of the state. Under suitable conditions, we obtain a result on the weak convergence of the eigenvalues and eigenfunctions of the EDMD operator; these eigenfunctions can serve as components of Lyapunov functions. Finally, we review methods for finding the region of attraction of an asymptotically stable fixed point and compare them to the two approaches described above.

Item: Generative Modeling with Neural Ordinary Differential Equations (University of Waterloo, 2019-12-19). Dockhorn, Tim; Rhebergen, Sander; De Sterck, Hans.

Neural ordinary differential equations (NODEs) (Chen et al., 2018) are ordinary differential equations (ODEs) whose dynamics are modeled by neural networks. Continuous normalizing flows (CNFs) (Chen et al., 2018; Grathwohl et al., 2018), a class of reversible generative models that builds on NODEs and uses an instantaneous counterpart of the change of variables formula (CVF), have recently been shown to achieve state-of-the-art results on density estimation and variational inference tasks. In this thesis, we review key concepts that are important for understanding NODEs and CNFs, ranging from numerical ODE solvers to generative models. We derive an explicit formulation of the adjoint sensitivity method for both NODEs and CNFs using a constrained optimization framework. Furthermore, we review several classes of NODEs and prove that a particular class of hypernetwork NODEs is a universal function approximator in the discretized state. Our numerical results suggest that the ODEs arising in CNFs do not need to be solved to high precision for training, and we show that training of CNFs can be made more efficient by using a tolerance scheduler that exponentially reduces the ODE solver tolerances. Moreover, we quantify the discrepancy between the CVF and the discretized instantaneous CVF for two ODE solvers. Our hope in writing this thesis is to give a comprehensive and self-contained introduction to generative modeling with neural ordinary differential equations, and to stimulate both theoretical and computational future work on NODEs and CNFs.
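The observation about solver precision is easy to reproduce in miniature. The sketch below (our construction, with random untrained weights standing in for a learned network) integrates a small MLP vector field with loose and tight tolerances and compares the endpoints and the number of function evaluations:

```python
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 2)) / 4, np.zeros(16)
W2, b2 = rng.normal(size=(2, 16)) / 4, np.zeros(2)

def f(t, x):
    """NODE dynamics dx/dt = f(t, x), modeled by a tiny two-layer MLP."""
    return W2 @ np.tanh(W1 @ x + b1) + b2

x0 = np.array([1.0, 0.0])
loose = solve_ivp(f, (0.0, 1.0), x0, rtol=1e-2, atol=1e-4)
tight = solve_ivp(f, (0.0, 1.0), x0, rtol=1e-8, atol=1e-10)
print("endpoints:", loose.y[:, -1], tight.y[:, -1])
print("function evaluations:", loose.nfev, "vs", tight.nfev)
```

A tolerance scheduler in the spirit of the thesis would begin training near the loose setting and tighten the tolerances exponentially as training proceeds.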
Item: Mathematical Modelling of Social Factors in Decision Making Processes at the Individual and Population Levels (University of Waterloo, 2016-08-11). Lang, John; De Sterck, Hans.

In this thesis we apply mathematical modelling techniques to investigate the implications of social influence on decision making processes in two related contexts. The first problem concerns the mathematical modelling of civil unrest. We consider the collective action problem facing individuals who are deciding whether or not to join a political revolution or protest in a dictatorial regime that employs censorship and repression. In studying this problem we develop both a population-level model and a network-based individual-level (or agent-based) model. The population-level model establishes a conceptual framework that can be used to understand the role that new communication technologies (e.g., the Internet, satellite television, Short Message Service (SMS) text messaging, and social media) may have played in facilitating the political revolutions of the Arab Spring. We establish the consistency between the individual-level model and the population-level model, and show methodologically how these two modelling strategies can be applied to complement one another, establishing a hierarchy of differential equation models that explicitly take the structure of the social network into account. Finally, using proxy network data for network structure pre- and post-adoption of new communication technologies, we perform small-scale computational simulations of our individual-level model to establish quantitative evidence that the political revolutions of the Arab Spring may have been facilitated by new communication technologies.

The second problem concerns the spread of smoking and obesity in populations. We consider two conformity problems that individuals face when deciding whether to join one population sub-group over another (or possibly over many others) in the context of two non-communicable diseases. We begin by studying the smoking epidemic over the past century, where individuals are given the choice to smoke or not to smoke. We establish a new dataset of smoking prevalence over the past century in seven developed countries and use it to calibrate a population-level mathematical model for the dynamics of smoking prevalence. We compare our model's predictions to an independently established measure of individualism/collectivism, Hofstede's Individualism versus Collectivism (IDV) measure, and find evidence that a society's culture can have a quantitative effect on the spread of a contagion. Finally, we study the dynamics of individuals' body mass index (BMI, defined as weight divided by height squared). We establish an individual-level model that also has implications at the population level. At the population level, our model fits empirical BMI distributions better than the log-normal and skew-normal distribution functions, two distributions commonly used to fit right-skewed data, and provides a mechanistic explanation for the right-skewness observed in empirical BMI distributions. At the individual level, our model reproduces the average and standard deviation of individuals' year-over-year change in BMI. At both the individual and population levels, our model finds evidence supporting the hypothesis that social factors play a role in the dynamics of individuals' BMI.
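As a small illustration of the comparison baseline mentioned above (not the thesis's data or model), the following fits the two reference distributions to a synthetic right-skewed sample and compares log-likelihoods:

```python
import numpy as np
from scipy import stats

# Synthetic right-skewed "BMI-like" sample, for illustration only.
rng = np.random.default_rng(1)
bmi = 15.0 + rng.gamma(shape=4.0, scale=2.5, size=5000)

# Fit the two reference distributions named in the abstract.
ln_params = stats.lognorm.fit(bmi)
sn_params = stats.skewnorm.fit(bmi)

# Compare fits via total log-likelihood (higher is better).
ll_ln = np.sum(stats.lognorm.logpdf(bmi, *ln_params))
ll_sn = np.sum(stats.skewnorm.logpdf(bmi, *sn_params))
print(f"log-normal: {ll_ln:.1f}, skew-normal: {ll_sn:.1f}")
```

The thesis's claim is that its mechanistic individual-level model beats both of these baselines on empirical BMI distributions.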
Item: Nonlinear Preconditioning Methods for Optimization and Parallel-In-Time Methods for 1D Scalar Hyperbolic Partial Differential Equations (University of Waterloo, 2017-10-18). Howse, Alexander James Maxwell; De Sterck, Hans.

This thesis consists of two main parts: part one addresses problems from nonlinear optimization, and part two is based on solving systems of time-dependent differential equations; both parts describe strategies for accelerating the convergence of iterative methods.

In part one we present a nonlinear preconditioning framework for use with nonlinear solvers applied to nonlinear optimization problems, motivated by a generalization of linear left preconditioning and of linear preconditioning via a change of variables for minimizing quadratic objective functions. In the optimization context, nonlinear preconditioning is used to generate a preconditioner direction that either replaces or supplements the gradient vector throughout the optimization algorithm. This framework is used to discuss previously developed nonlinearly preconditioned nonlinear GMRES and nonlinear conjugate gradient (NCG) algorithms, as well as to develop two new nonlinearly preconditioned quasi-Newton methods based on the limited-memory Broyden and limited-memory BFGS (L-BFGS) updates. We show how all of the above methods can be implemented in a manifold optimization context, with a particular emphasis on Grassmann matrix manifolds. These methods are compared by solving the optimization problems defining the canonical polyadic (CP) decomposition and the Tucker higher-order singular value decomposition (HOSVD) for tensors, which are formulated as minimizing the approximation error in the Frobenius norm. Both of these decompositions have alternating least squares (ALS) type fixed-point iterations derived from their optimization problem definitions. While these ALS-type iterations may be slow to converge in practice, they can serve as efficient nonlinear preconditioners for the other optimization methods. As the Tucker HOSVD problem involves orthonormality constraints and lacks unique minimizers, the optimization algorithms are extended from Euclidean space to the manifold setting, where optimization on Grassmann manifolds can resolve both of the issues present in the HOSVD problem. The nonlinearly preconditioned methods are compared to the ALS-type preconditioners and to non-preconditioned NCG, L-BFGS, and a trust region algorithm, using both synthetic and real-life tensor data with varying noise levels, the real data arising from applications in computer vision and handwritten digit recognition. Numerical results show that the nonlinearly preconditioned methods offer substantial improvements in time-to-solution and robustness over state-of-the-art methods for large tensors, in cases where there are significant amounts of noise in the data, and when high-accuracy results are required.

In part two we apply a multigrid reduction-in-time (MGRIT) algorithm to scalar one-dimensional hyperbolic partial differential equations. This study is motivated by the observation that sequential time-stepping is an obvious computational bottleneck when attempting to implement highly concurrent algorithms; parallel-in-time methods are therefore particularly desirable. Existing parallel-in-time methods have produced significant speedups for parabolic or sufficiently diffusive problems, but can have stability and convergence issues for hyperbolic or advection-dominated problems. Being a multigrid method, MGRIT primarily uses temporal coarsening, but spatial coarsening can also be incorporated to produce cheaper multigrid cycles and to ensure that stability conditions are satisfied on all levels for explicit time-stepping methods. We compare convergence results for the linear advection and diffusion equations, which illustrate the increased difficulty associated with solving hyperbolic problems via parallel-in-time methods. A particular issue that we address is the fact that uniform factor-two spatial coarsening may negatively affect the convergence rate of MGRIT, resulting in extremely slow convergence when the wave speed is near zero, even if only locally. This is due to a sort of anisotropy in the nodal connections, with small wave speeds resulting in spatial connections being weaker than temporal connections. Through the use of semi-algebraic mode analysis applied to the combined advection-diffusion equation, we illustrate how the norm of the iteration matrix, and hence an upper bound on the rate of convergence, varies for different choices of wave speed, diffusivity coefficient, space-time grid spacing, and the inclusion or exclusion of spatial coarsening. The use of waveform relaxation multigrid on intermediate, temporally semi-coarsened grids is identified as a potential remedy for the issues introduced by spatial coarsening, with the downside of creating a more intrusive algorithm that cannot easily be combined with existing time-stepping routines for different problems. As a second, less intrusive alternative, we present an adaptive spatial coarsening strategy that prevents the slowdown observed for small local wave speeds, and that is applicable to solving the variable-coefficient linear advection equation and the inviscid Burgers equation using first-order explicit or implicit time-stepping methods. Serial numerical results show that this method offers significant improvements over uniform coarsening and is convergent for the inviscid Burgers equation with and without shocks. Parallel scaling tests indicate that improvements over serial time-stepping strategies are possible when spatial parallelism alone saturates, and that scalability is robust for oscillatory solutions that change on the scale of the grid spacing.
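For readers unfamiliar with parallel-in-time iterations, here is a sketch of parareal, which can be viewed as the simplest two-level relative of MGRIT, applied to linear advection with first-order upwind time-stepping. This is our own illustration, not the thesis's MGRIT solver, and it omits spatial coarsening entirely:

```python
import numpy as np

def upwind(u, cfl):
    """One explicit first-order upwind step for u_t + a*u_x = 0, periodic BCs."""
    return u - cfl * (u - np.roll(u, 1))

def fine(u, cfl, m):
    """Fine propagator over one time slice: m small upwind steps."""
    for _ in range(m):
        u = upwind(u, cfl)
    return u

def coarse(u, cfl, m):
    """Coarse propagator: one big step of size m*dt (requires m*cfl <= 1)."""
    return upwind(u, m * cfl)

nx, n_slices, m, cfl = 128, 32, 4, 0.2
x = np.linspace(0.0, 1.0, nx, endpoint=False)
U = [np.exp(-100 * (x - 0.5) ** 2)]
for n in range(n_slices):                  # initial guess: one coarse sweep
    U.append(coarse(U[n], cfl, m))

for k in range(5):                         # parareal corrections
    F = [fine(U[n], cfl, m) for n in range(n_slices)]     # parallel in principle
    G_old = [coarse(U[n], cfl, m) for n in range(n_slices)]
    for n in range(n_slices):              # sequential coarse correction
        U[n + 1] = coarse(U[n], cfl, m) + F[n] - G_old[n]
```

The expensive fine propagations in each correction are independent across time slices, which is where the parallelism comes from; MGRIT generalizes this to multiple levels and more flexible cycling.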
Item: On Enabling Layer-Parallelism for Graph Neural Networks using IMEX Integration (University of Waterloo, 2024-06-20). Kara, Omer Ege; De Sterck, Hans; Liu, Jun.

Graph neural networks (GNNs) are a type of neural network designed to perform machine learning tasks on graph data. Recently, there have been several works on training differential-equation-inspired GNN architectures, which are suitable for robust training when equipped with a relatively large number of layers. Neural networks with more layers are potentially more expressive; however, the training time increases linearly with the number of layers. Parallel-in-layer training is a method developed to overcome the increase in training time of deeper networks, and was first applied to training residual networks. In this thesis, we first give an overview of existing work on layer-parallel training and on graph neural networks inspired by differential equations. We then discuss issues that are encountered when these graph neural network architectures are trained parallel-in-layer, and propose solutions to address them. Finally, we present and evaluate experimental results on layer-parallel GNN training using the proposed approach.

Item: Some Results on the Convergence of Anderson Acceleration (University of Waterloo, 2025-01-06). Smith, Adam; De Sterck, Hans.

Anderson acceleration (AA), also known as Anderson mixing, is an extrapolation technique used to accelerate the convergence of fixed-point iterations $\mathbf{x}_{k+1}=\mathbf{q}(\mathbf{x}_k)$, $k=0,1,\dots$, with $\mathbf{q}:\mathbb{R}^n\to\mathbb{R}^n$, $\mathbf{x}_k\in\mathbb{R}^n$. AA was first introduced by D.G. Anderson in the context of solving integral equations, but has since been adapted to fixed-point iteration problems in general. Despite relatively little being known about its convergence properties, AA has seen considerable use in several areas, such as electronic structure computations and machine learning. This thesis presents a broad overview of the current convergence literature for AA and introduces a variety of new results concerning properties of AA, such as its asymptotic convergence rate, the possibility of stagnation, and an analysis of its coefficients. Additionally, some variations of the AA iteration are proposed, with an accompanying analysis and comparison to the classical AA algorithm.
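A minimal numpy sketch of the classical AA iteration, in the standard windowed least-squares form (our illustration, not the thesis's code):

```python
import numpy as np

def anderson(q, x0, m=5, iters=100, tol=1e-10):
    """Anderson acceleration of x_{k+1} = q(x_k) with window depth m."""
    x = x0.copy()
    G, R = [], []                  # histories of q(x_k) and r_k = q(x_k) - x_k
    for _ in range(iters):
        gx = q(x)
        r = gx - x
        G.append(gx); R.append(r)
        G, R = G[-(m + 1):], R[-(m + 1):]
        if np.linalg.norm(r) < tol:
            break
        if len(R) == 1:
            x = gx                 # plain fixed-point step to start
        else:
            dR = np.column_stack([R[i + 1] - R[i] for i in range(len(R) - 1)])
            dG = np.column_stack([G[i + 1] - G[i] for i in range(len(G) - 1)])
            gamma, *_ = np.linalg.lstsq(dR, r, rcond=None)
            x = gx - dG @ gamma    # Anderson-mixed update
    return x
```

For example, `anderson(np.cos, np.array([1.0]))` converges to the fixed point of cos(x), approximately 0.739, in far fewer iterations than the plain iteration x_{k+1} = cos(x_k).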
Item: A Taylor polynomial expansion line search for large-scale optimization (University of Waterloo, 2016-08-29). Hynes, Michael; De Sterck, Hans.

In trying to cope with the Big Data deluge, the landscape of distributed computing has changed. Large commodity hardware clusters, typically operating in some form of MapReduce framework, are becoming prevalent for organizations that require both tremendous storage capacity and fault tolerance. However, the high cost of communication can dominate the computation time in large-scale optimization routines in these frameworks. This thesis considers the problem of how to efficiently conduct univariate line searches in commodity clusters in the context of gradient-based batch optimization algorithms, such as the staple limited-memory BFGS (L-BFGS) method. In it, a new line search technique is proposed for cases where the underlying objective function is analytic, as in logistic regression and low-rank matrix factorization. The technique approximates the objective function by a truncated Taylor polynomial along a fixed search direction. The coefficients of this polynomial may be computed efficiently in parallel, with far less communication than is needed to transmit the high-dimensional gradient vector, after which the polynomial may be minimized with high accuracy in a neighbourhood of the expansion point without distributed operations. This Polynomial Expansion Line Search (PELS) may be invoked iteratively until the expansion point and minimum are sufficiently accurate, and can provide substantial savings in time and communication costs when multiple iterations of the line search procedure are required. Three applications of the PELS technique are presented for important classes of analytic functions: (i) logistic regression (LR), (ii) low-rank matrix factorization (MF) models, and (iii) the feedforward multilayer perceptron (MLP). In addition, for LR and MF, implementations of PELS in the Apache Spark framework for fault-tolerant cluster computing are provided. These implementations conferred significant convergence enhancements to their respective algorithms, and will be of interest to Spark and Hadoop practitioners. For instance, the Spark PELS technique reduced the number of iterations and the time required by L-BFGS to reach terminal training accuracies for LR models by factors of 1.8 to 2. Substantial acceleration was also observed for the nonlinear conjugate gradient algorithm for MLP models, which is an interesting case for future study in optimization for neural networks. The PELS technique is applicable to a broad class of models for Big Data processing and large-scale optimization, and can be a useful component of batch optimization routines.
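The core mechanic is easy to sketch. The version below is our simplification: it fits a low-degree polynomial to samples of phi(t) = f(x + t*d) and minimizes it over the candidate interval, whereas the thesis computes the Taylor coefficients of phi exactly, in parallel, from the structure of the model:

```python
import numpy as np
from numpy.polynomial import Polynomial

def polynomial_line_search(f, x, d, degree=4, radius=1.0):
    """Approximate phi(t) = f(x + t*d) by a degree-`degree` polynomial
    on [0, radius] and return the step length minimizing the surrogate."""
    ts = np.linspace(0.0, radius, degree + 3)
    phi = Polynomial.fit(ts, [f(x + t * d) for t in ts], degree)
    crit = phi.deriv().roots()
    crit = crit[np.isreal(crit)].real            # keep real critical points
    crit = crit[(crit > 0.0) & (crit <= radius)]
    candidates = np.append(crit, radius)
    return min(candidates, key=phi)              # step with smallest phi(t)
```

Invoked repeatedly with an updated expansion point, this mimics PELS's iterative refinement; the communication advantage in the distributed setting comes from sharing a handful of polynomial coefficients instead of full gradient vectors.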