Abstract
Parallel memory modules can increase memory bandwidth and supply a processor with data in the access patterns it requires. A parallel storage mechanism organized and managed across multiple storage modules is well suited to image and video applications. Previously studied data storage schemes achieve continuous conflict-free access by rows, columns, or blocks. However, they do not satisfy sliding-window applications in video and image processing (including convolutional neural networks, sub-pixel difference, and 2D filtering), which need conflict-free access with strides during computation and which place different demands on the horizontal and vertical strides in different computing sub-processes. This paper presents a storage scheme that supports conflict-free row access without alignment constraints and non-aligned block-with-stride access beginning at any address. Theoretical proofs and experiments verify the correctness of the module address (the number of the module to which an address is mapped). In the hardware design, the typical case showed no path violation and little area overhead. The scheme is suitable for CNN applications, where it can improve the performance of convolution computation.
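To make the notion of conflict-free access concrete, the following minimal sketch checks whether a set of 2D coordinates maps to pairwise distinct memory modules. It assumes a classic linear skewing mapping, module = (row * skew + col) mod N, which is not the scheme proposed in this paper; the function names and parameters are hypothetical and chosen only for illustration.

```python
from itertools import product


def module_of(row, col, num_modules, skew):
    """Assumed linear skewing: module index that stores element (row, col)."""
    return (row * skew + col) % num_modules


def block_with_stride(r0, c0, p, q, stride_v, stride_h):
    """Coordinates of a p x q block starting at (r0, c0).

    stride_v is the vertical (row) stride, stride_h the horizontal (column) stride.
    """
    return [(r0 + i * stride_v, c0 + j * stride_h)
            for i, j in product(range(p), range(q))]


def is_conflict_free(coords, num_modules, skew):
    """True if every coordinate falls in a different module."""
    modules = [module_of(r, c, num_modules, skew) for r, c in coords]
    return len(set(modules)) == len(modules)


if __name__ == "__main__":
    N, SKEW = 16, 4  # 16 modules, skew chosen for 4 x 4 blocks (assumption)

    # Unaligned 4 x 4 block with unit strides
    print(is_conflict_free(block_with_stride(5, 7, 4, 4, 1, 1), N, SKEW))
    # Unaligned row of 16 consecutive elements
    print(is_conflict_free(block_with_stride(3, 9, 1, 16, 1, 1), N, SKEW))
    # Same block shape with horizontal stride 2
    print(is_conflict_free(block_with_stride(0, 0, 4, 4, 1, 2), N, SKEW))
```

Under this assumed mapping, unit-stride rows and 4 x 4 blocks are conflict-free from any start address, while a horizontal stride of 2 makes two elements of the block collide in one module. That is the kind of strided sliding-window access the abstract identifies as unsupported by earlier row, column, and block schemes.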
This paper is supported by the National Natural Science Foundation of China (No. 61602493, project name: Researches on Efficient Parallel Memory Techniques for Wide Vector DSPs).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Song, R., Zeng, G., Liu, S., Chen, H. (2018). Conflict-Free Block-with-Stride Access of 2D Storage Structure. In: Vaidya, J., Li, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2018. Lecture Notes in Computer Science, vol 11336. Springer, Cham. https://doi.org/10.1007/978-3-030-05057-3_46
DOI: https://doi.org/10.1007/978-3-030-05057-3_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05056-6
Online ISBN: 978-3-030-05057-3
eBook Packages: Computer Science, Computer Science (R0)