Rapidly growing real-world networks, with billions of vertices, call for scalable, fast, and efficient graph algorithms. Fortunately, commercial multi-core, multi-processor, and multi-machine environments can handle such volumes of data. Unfortunately, despite the availability of such resources, many current graph algorithms do not take full advantage of these parallel and distributed environments, or they have suboptimal theoretical guarantees, which translates to slower and less efficient algorithms in practice. The purpose of this thesis is to improve the theoretical guarantees of graph algorithms on modern machines. We demonstrate through experiments that these theoretical improvements also translate to practical gains.
Towards this goal, this thesis takes a two-pronged approach.
First, we formulate algorithms in computation models that mimic large-scale data-processing environments. Algorithms in such models take advantage of clusters of machines as well as each machine's multiple cores and processors. Second, we exploit specific properties of real-world networks when designing our algorithms. The degeneracy, the smallest d such that every subgraph contains a vertex of degree at most d, is one such property: while a network may have billions of vertices, its degeneracy may be only a few hundred.
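As a reference for this parameter, the following minimal Python sketch computes the degeneracy of a small graph with the standard peeling procedure (repeatedly remove a minimum-degree vertex; the degeneracy is the largest degree observed at a removal). The function name and the quadratic implementation are only illustrative and are not part of the algorithms developed in this thesis; bucket queues would make the same procedure run in linear time.

    from collections import defaultdict

    def degeneracy(edges):
        """Degeneracy of an undirected graph, given as an edge list, via peeling."""
        adj = defaultdict(set)
        for u, v in edges:
            adj[u].add(v)
            adj[v].add(u)
        degree = {v: len(nbrs) for v, nbrs in adj.items()}
        remaining = set(adj)
        d = 0
        while remaining:
            # Remove a vertex of minimum degree in the remaining graph.
            v = min(remaining, key=degree.__getitem__)
            d = max(d, degree[v])
            remaining.remove(v)
            for u in adj[v]:
                if u in remaining:
                    degree[u] -= 1
        return d

    # Example: a triangle with a pendant edge has degeneracy 2.
    print(degeneracy([(0, 1), (1, 2), (0, 2), (2, 3)]))  # -> 2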
This thesis consists of three parts. The first part presents static graph algorithms. We first introduce a set of new editing algorithms for a framework that approximates solutions to hard optimization problems by editing a graph into a desired structured class. Then, we present novel small-subgraph counting algorithms in the massively parallel computation model with better theoretical space and round guarantees; our experiments corroborate these theoretical gains and show improvements in the number of rounds and the approximation factor over the previous state of the art on real-world graphs. We conclude this part with a near-linear time algorithm for scheduling precedence-constrained jobs, modeled as directed acyclic graphs, on identical machines with communication delays.
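To illustrate why a small degeneracy helps with subgraph counting, the sketch below counts triangles by orienting every edge along a given vertex ordering (for example, the peeling order from the previous sketch) and intersecting out-neighborhoods; with a degeneracy ordering, every vertex has few out-neighbors, which bounds the work. This is a standard sequential baseline rather than the massively parallel algorithm of this thesis, which additionally bounds the space per machine and the number of rounds, and the function name is illustrative.

    def count_triangles(edges, order):
        """Count triangles by orienting edges from earlier to later vertices in 'order'."""
        rank = {v: i for i, v in enumerate(order)}
        out = {v: set() for v in order}
        for u, v in edges:
            a, b = (u, v) if rank[u] < rank[v] else (v, u)
            out[a].add(b)
        triangles = 0
        for v in order:
            for u in out[v]:
                # Each triangle is counted exactly once: at its lowest-ranked
                # vertex v together with its middle-ranked vertex u.
                triangles += len(out[v] & out[u])
        return triangles

    # Example: a 4-cycle with one chord contains two triangles.
    print(count_triangles([(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)], order=[0, 1, 2, 3]))  # -> 2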
The second part focuses on dynamic graph algorithms. We first give a dynamic algorithm for (Δ+1)-vertex coloring with O(1) amortized update time, with high probability. Then, we provide a new parallel level data structure for the k-core decomposition problem under batch-dynamic updates, where dynamic edge updates are applied in batches. We show that our data structure provably provides a (2+𝜀)-approximation of the coreness of each vertex, improving on the previously best-known bound of (4+𝜀). We conclude with new parallel, work-efficient batch-dynamic algorithms for triangle and clique counting. Extensive experiments show that our batch-dynamic algorithms achieve orders-of-magnitude performance improvements over the best previous multi-core implementations on real-world networks.
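For intuition about the dynamic coloring result above, the sketch below maintains a proper coloring under edge updates by recoloring a conflicting endpoint with a random color unused in its neighborhood, which always exists when the palette has Δ+1 colors and Δ bounds the maximum degree. The class name and the assumption of a fixed degree bound are illustrative only; the algorithm analyzed in this thesis relies on a more careful random choice to obtain the O(1) amortized bound with high probability.

    import random

    class DynamicColoring:
        """A minimal sketch of conflict-driven recoloring with a (delta+1)-color palette."""

        def __init__(self, vertices, delta):
            self.delta = delta  # assumed fixed upper bound on the maximum degree
            self.adj = {v: set() for v in vertices}
            self.color = {v: 0 for v in vertices}  # proper for the initially empty graph

        def insert_edge(self, u, v):
            self.adj[u].add(v)
            self.adj[v].add(u)
            if self.color[u] == self.color[v]:
                # Resolve the conflict with a random color unused by v's neighbors;
                # one exists because the palette has delta + 1 > deg(v) colors.
                used = {self.color[w] for w in self.adj[v]}
                self.color[v] = random.choice([c for c in range(self.delta + 1) if c not in used])

        def delete_edge(self, u, v):
            # Deletions never violate properness, so no recoloring is needed.
            self.adj[u].discard(v)
            self.adj[v].discard(u)

    # Example: a path on three vertices with Delta = 2.
    dc = DynamicColoring([0, 1, 2], delta=2)
    dc.insert_edge(0, 1)
    dc.insert_edge(1, 2)
    assert all(dc.color[a] != dc.color[b] for a, b in [(0, 1), (1, 2)])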
The last part concludes with lower bounds. We construct hard instances that show the hardness of obtaining an optimal computation schedule for directed acyclic computation graphs in the external-memory model. We then demonstrate that such graphs can be used to construct static-memory-hard hash functions, which use disk memory to deter large-scale password-cracking attacks.