Sangam: A Confluence of Knowledge Streams

Big Data Analytics in Static and Streaming Provenance

dc.contributor Plale, Beth
dc.creator Chen, Peng
dc.date 2016-04-27T18:25:35Z
dc.date 2016-04
dc.date.accessioned 2023-02-21T11:20:31Z
dc.date.available 2023-02-21T11:20:31Z
dc.identifier http://hdl.handle.net/2022/20817
dc.identifier.uri http://localhost:8080/xmlui/handle/CUHPOERS/253074
dc.description Thesis (Ph.D.) - Indiana University, Informatics and Computing, 2016
dc.description With recent technological and computational advances, scientists increasingly integrate sensors and model simulations to understand spatial, temporal, social, and ecological relationships at unprecedented scale. Data provenance traces the relationships of entities over time, thus providing a unique view of how the system under study behaves over time. However, provenance can be overwhelming in both volume and complexity; the now forecasting potential of provenance creates additional demands. This dissertation focuses on Big Data analytics of static and streaming provenance. It develops filters and a non-preprocessing slicing technique for in-situ querying of static provenance. It presents a stream processing framework for online processing of provenance data arriving at a high rate. While the former is sufficient for answering queries that are given prior to the application start (forward queries), the latter deals with queries whose targets are unknown beforehand (backward queries). Finally, it explores data mining on large collections of provenance and proposes a temporal representation of provenance that reduces the high dimensionality while effectively supporting mining tasks such as clustering, classification, and association rule mining; the temporal representation can also be applied to streaming provenance. The proposed techniques are verified through software prototypes applied to Big Data provenance captured from computer network data, weather models, ocean models, remote (satellite) imagery data, and agent-based simulations of agricultural decision making.
dc.language en_US
dc.publisher [Bloomington, Ind.] : Indiana University
dc.subject Big Data provenance
dc.subject stream processing
dc.subject data mining
dc.subject data representation
dc.subject data visualization
dc.title Big Data Analytics in Static and Streaming Provenance
dc.type Doctoral Dissertation


Files in this item

File: PhD_Thesis_PengChen.pdf
Size: 6.009 MB
Format: application/pdf
