dc.description |
Detection of anomalous events relies on the collection, filtering, and analysis of diverse types of temporal data. Interactions derived from such data can be modeled as networks to provide a better understanding of the structure and dynamics of the underlying systems. This dissertation examines the temporal evolution of Internet-scale phenomena to provide a more nuanced characterization of normal functionality and of anomalous behavior, which is usually undesired and often malicious. Characterizing regular behavior is often a prerequisite for identifying these anomalies. However, the volume and patterns of interactions during a system’s evolution may vary widely under particular circumstances. In security, I formulate hypotheses about the nature of attacks as a core component of detection. For insider threats, I hypothesized that malicious insiders would require access to identifiably more diverse repositories than non-malicious insiders. In routing, my first hypothesis was that the role of nation-state actors could be identified using traditional macroeconomic analyses. My second hypothesis, that control-plane hijacking could be identified by leveraging k-shell decomposition of Autonomous System-level graphs, could not be validated. In contrast, the analysis of inter-arrival times of route announcements provided clear identification and early warning of large-scale incidents. This dissertation contributes to a more comprehensive understanding of security threats using data and network science methods. Specifically, I use (i) Graph mining to show that surprising patterns in community structure and k-shell decomposition of graphs can be leveraged to detect classes of anomalies. Leveraging (ii) Graph robustness, I show that community detection-based methods are less biased by the density of edges in the system, providing a robust approach to detecting anomalous behavior.
Finally, I illustrate the potential of (iii) Graph anomaly detection for identifying anomalies in different real-world scenarios, including (a) email interactions, (b) social media, (c) code repositories, and (d) Internet control-plane updates. |
|