Abstract:
Warehouse Scale Computers (WSCs) are often used for big data jobs in which the data being processed comes from a variety of sources. We show that different data portions, from the same or different sources, differ in their significance in determining the final outcome of the computation. Hence, by prioritizing them and assigning more resources to the processing of more important data, the WSC can be used more efficiently in terms of both time and cost. We provide a simple, low-overhead mechanism to quickly assess the significance of each data portion, and show its effectiveness in finding the best ranking of data portions. We then demonstrate how this ranking is used in resource allocation to improve time and cost by up to 24 and 9 percent, respectively. We also discuss other uses of this ranking information, e.g., faster progressive approximation of the final outcome of a big data job without processing the entire data set, and more effective use of renewable energy in WSCs.
Published in: IEEE Computer Architecture Letters (Volume: 16, Issue: 2, 01 July-Dec. 2017)