Abstract
Owing to steady improvements in memory and processor technology, object storage devices (OSDs) now have more memory and greater processing power, which allows them to execute user-defined programs. Shifting part of an application’s processing to the disk drives reduces the amount of data transferred across the network and exploits the parallelism of large-scale distributed storage systems, reducing the execution time of many basic data analytics tasks. In this paper, we propose a large-scale object-based active storage platform, named Gem, for data analytics in the Internet of Things (IoT). All IoT data residing on the disk drives is stored as objects with attributes, methods and policies. For applications such as data analytics, application-specific operations are executed by the drive processors, so only the results are returned to clients instead of the clients reading entire data files. Gem can therefore greatly reduce the overhead of data analytics applications in the IoT. Experimental results from our performance evaluation demonstrate the effectiveness and scalability of Gem.
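The active-storage idea described in the abstract can be illustrated with a minimal Python sketch. It is not Gem's implementation: the class names `StorageObject` and `ObjectStorageDevice`, the `read` and `execute` calls, the object identifier and the sample readings are all hypothetical, chosen only to contrast reading a whole object with running a method next to the data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class StorageObject:
    """Hypothetical object on an OSD: data plus attributes, methods and policies."""
    data: List[float]
    attributes: Dict[str, str] = field(default_factory=dict)
    methods: Dict[str, Callable[[List[float]], float]] = field(default_factory=dict)
    policies: Dict[str, str] = field(default_factory=dict)


class ObjectStorageDevice:
    """Hypothetical OSD that can run an object's methods on the device itself."""

    def __init__(self) -> None:
        self._objects: Dict[str, StorageObject] = {}

    def put(self, object_id: str, obj: StorageObject) -> None:
        self._objects[object_id] = obj

    def read(self, object_id: str) -> List[float]:
        # Conventional path: the whole object is sent to the client.
        return list(self._objects[object_id].data)

    def execute(self, object_id: str, method_name: str) -> float:
        # Active path: the method runs on the device; only the result is sent.
        obj = self._objects[object_id]
        return obj.methods[method_name](obj.data)


def mean(values: List[float]) -> float:
    """Example application-specific operation (an assumption for illustration)."""
    return sum(values) / len(values)


osd = ObjectStorageDevice()
readings = [20.0, 21.5, 19.5, 23.0]  # made-up sensor values
osd.put("sensor-42", StorageObject(readings, {"unit": "celsius"}, {"mean": mean}))

# Client-side analytics: every reading crosses the network first.
transferred = osd.read("sensor-42")
client_side = mean(transferred)

# Device-side analytics: a single value crosses the network.
device_side = osd.execute("sensor-42", "mean")
```

Both paths produce the same answer; the difference is that the conventional path moves every reading to the client, while the active path moves one number.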














Acknowledgments
This work is supported by A*STAR Thematic Strategic Research Programme (TSRP) Grant No. 1121720013.
Cite this article
Xu, Q., Aung, K.M.M., Zhu, Y. et al. Building a large-scale object-based active storage platform for data analytics in the internet of things. J Supercomput 72, 2796–2814 (2016). https://doi.org/10.1007/s11227-016-1621-2
DOI: https://doi.org/10.1007/s11227-016-1621-2