Sangam: A Confluence of Knowledge Streams

HARP: A MACHINE LEARNING FRAMEWORK ON TOP OF THE COLLECTIVE COMMUNICATION LAYER FOR THE BIG DATA SOFTWARE STACK


dc.contributor Qiu, Judy
dc.creator Zhang, Bingjing
dc.date 2017-05-16T19:50:38Z
dc.date 2017-05
dc.date.accessioned 2023-02-21T11:20:47Z
dc.date.available 2023-02-21T11:20:47Z
dc.identifier http://hdl.handle.net/2022/21445
dc.description Thesis (Ph.D.) - Indiana University, Informatics and Computing, 2017
dc.description Almost every field of science is now undergoing a data-driven revolution that requires analyzing massive datasets. Machine learning algorithms are widely used to find meaning in a given dataset and to discover properties of complex systems. At the same time, the landscape of computing has evolved toward computers with increasingly complex many-core architectures. However, no simple, unified programming framework lets machine learning applications exploit the parallel computing capability of these new machines; instead, many efforts focus on specialized ways to speed up individual algorithms. In this thesis, the Harp framework is prototyped: it uses collective communication techniques to improve the performance of data movement and provides high-level APIs for various synchronization patterns in iterative computation. In contrast to traditional parallelization strategies that focus on handling high-volume training data, a lesser-known challenge is that the high-dimensional model is also high in volume and difficult to synchronize. As an extension of the Hadoop MapReduce system, Harp includes a collective communication layer and a set of programming interfaces. Iterative machine learning algorithms can be parallelized through efficient synchronization methods that exploit both inter-node and intra-node parallelism. The usability and efficiency of Harp's approach are validated on applications such as K-means Clustering, Multi-Dimensional Scaling, Latent Dirichlet Allocation, and Matrix Factorization. The results show that these machine learning applications achieve high parallel performance on Harp.
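The synchronization pattern the abstract describes, collectively combining per-worker partial results so every worker holds the same updated model, can be sketched as below. This is an illustrative sketch only, not Harp's actual API: the class and method names are hypothetical, and it shows an allreduce-style merge of per-worker K-means centroid partial sums in plain Java.

```java
import java.util.Arrays;

// Hypothetical sketch of an allreduce-style model synchronization step,
// as used in parallel K-means: each worker holds partial centroid sums
// and point counts for its data split; the collective step merges them
// into one global set of centroids shared by all workers.
public class AllreduceSketch {
    // partialSums[w][c][d]: worker w's coordinate sums for cluster c, dim d.
    // counts[w][c]: number of points worker w assigned to cluster c.
    static double[][] allreduceCentroids(double[][][] partialSums, int[][] counts) {
        int workers = partialSums.length;
        int k = partialSums[0].length;
        int dim = partialSums[0][0].length;
        double[][] centroids = new double[k][dim];
        int[] total = new int[k];
        // Reduce: sum contributions from every worker.
        for (int w = 0; w < workers; w++) {
            for (int c = 0; c < k; c++) {
                total[c] += counts[w][c];
                for (int d = 0; d < dim; d++) {
                    centroids[c][d] += partialSums[w][c][d];
                }
            }
        }
        // Finalize: divide sums by counts to get the new centroids.
        for (int c = 0; c < k; c++) {
            for (int d = 0; d < dim; d++) {
                if (total[c] > 0) centroids[c][d] /= total[c];
            }
        }
        return centroids; // in a real collective, broadcast back to all workers
    }

    public static void main(String[] args) {
        // Two workers, two clusters, 1-D points.
        double[][][] partial = {
            {{4.0}, {10.0}},  // worker 0: per-cluster coordinate sums
            {{2.0}, {20.0}}   // worker 1
        };
        int[][] counts = {{2, 1}, {1, 2}};
        double[][] model = allreduceCentroids(partial, counts);
        System.out.println(Arrays.deepToString(model)); // prints [[2.0], [10.0]]
    }
}
```

In a real collective communication layer, the reduction would happen across processes over the network rather than in one loop, but the data flow, partial model in, globally combined model out, is the same.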
dc.language en
dc.publisher [Bloomington, Ind.] : Indiana University
dc.subject MACHINE LEARNING
dc.subject COLLECTIVE COMMUNICATION
dc.subject BIG DATA
dc.title HARP: A MACHINE LEARNING FRAMEWORK ON TOP OF THE COLLECTIVE COMMUNICATION LAYER FOR THE BIG DATA SOFTWARE STACK
dc.type Doctoral Dissertation


Files in this item

File Size Format
Zhangthesis.pdf 3.422 MB application/pdf
