Non-Blocking GPU-CPU Notifications to Enable More GPU-CPU Parallelism
GPUs are increasingly popular in HPC systems, and more applications are adopting GPUs each day. However, the control synchronization of GPUs with CPUs is suboptimal and only possible after GPU kernel termination points, resulting in serialized host and ...
Portable Implementations of Work Stealing
Work stealing is a well-known technique for dynamic load balancing; however, manually writing work-stealing protocols is error-prone. We can use the Tascell parallel programming language for the correct and portable implementation of work stealing; the ...
sKokkos: Enabling Kokkos with Transparent Device Selection on Heterogeneous Systems using OpenACC
This paper presents a new feature to enable Kokkos with transparent device selection. For application developers, it is not easy to identify which device is the most appropriate to use in a heterogeneous system, since this depends on the characteristics ...
Parallelized Remapping Algorithms for km-scale Global Weather and Climate Simulations with Icosahedral Grid System
In weather and climate research, latitude–longitude grid data are typically used for analysis and visualization, and remapping from model native grids to latitude–longitude grids typically requires a significant amount of time. Here, we developed a ...
Approximate Block Diagonalization of Symmetric Matrices Using Quantum Annealing
We consider the problem of transforming a given symmetric matrix into a nearly block diagonal form by permutation of its rows and columns. Such a transformation is useful as preconditioning to accelerate the convergence of an eigenvalue solver, but the ...
QUBO formulation using inequalities for problems with complex constraints
Quantum annealing is an optimization technique that uses quantum fluctuation effects to search for solutions and is being applied as a metaheuristic method. Quantum annealing solves a problem expressed as quadratic unconstrained binary optimization (...
Evaluation of POSIT Arithmetic with Accelerators
We present an evaluation of 32-bit POSIT arithmetic through its implementation as accelerators on FPGAs and GPUs. POSIT, a floating-point number format, adaptively changes the size of its fractional part. We developed hardware designs for FPGAs and ...
Low-latency Communication in RISC-V Clusters
- Michalis Gianioudis,
- Pantelis Xirouchakis,
- Charisios Loukas,
- Evangelos Mageiropoulos,
- Orestis Mousouros,
- Sokratis Mpartzis,
- Aggelos Ioannou,
- Vassilis Papaefstathiou,
- Manolis Katevenis,
- Nikolaos Chrysos
Low-latency inter-node communication is important in HPC clusters. In this work, we design and integrate a low-cost interconnect, capable of low-latency user-level communication with open-source RISC-V processors, obviating the need for bulky and ...
Flexible Systolic Array Platform on Virtual 2-D Multi-FPGA Plane
Systolic arrays are a promising approach to achieving high-performance processing based on highly parallelized designs in various fields, such as AI and bioinformatics. Many previous studies have devoted considerable effort to exploring efficient ...
An Efficient Task-Parallel Pipeline Programming Framework
The pipeline is a fundamental pattern to parallelize a series of stage tasks over a sequence of data in loops. Mainstream pipeline programming frameworks count on data abstractions to perform pipeline scheduling. Although this design is convenient for ...
Task-based low-rank hybrid parallel Cholesky factorization for distributed memory environment
The primary targets for improving efficiency for large-scale matrix factorization are reducing synchronization, addressing the overlap in communication and computation, and improving load balance. In recent years, tiled algorithms with task parallelism ...
AshPipe: Asynchronous Hybrid Pipeline Parallel for DNN Training
Deep Neural Networks (DNNs) have become increasingly computationally intensive and have larger parameters, requiring efficient parallelization or distribution using multiple accelerators. Pipeline parallelism has been proposed as an effective way to ...
Bruck Algorithm Performance Analysis for Multi-GPU All-to-All Communication
In high-performance computing, collective communication is critical for facilitating comprehensive data exchange involving all processes within an MPI communicator. Due to their inherently global nature, many collective operations present scalability ...
Efficient GPU-Implementation of H-P Sort Based on Improved Histogram Computation
We present an enhanced GPU implementation of the H-P sort algorithm, which is a widely used method for integer sorting based on histogram computation and prefix sum calculation. This work extends a previous high-performance GPU version of the algorithm, ...
Eulerian elastoplastic simulation of vehicle structures by building-cube method on supercomputer Fugaku
- Koji Nishiguchi,
- Shusuke Takeuchi,
- Hirofumi Sugiyama,
- Shigenobu Okazawa,
- Tadasuke Katsuhara,
- Keiichi Yonehara,
- Shigeki Kojima,
- Kosho Kawahara,
- Hiroya Hoshiba,
- Junji Kato
This paper presents a novel numerical method for the elastoplastic simulation of vehicle component structures under large deformation problems, such as crash-worthiness analysis. Elastoplastic simulation of vehicle structures is essential for designing ...
Analysis Towards Energy-Aware Image-based In Situ Visualization on the Fugaku
- Razil Tahir,
- Jorji Nonaka,
- Ken Iwata,
- Taisei Matsushima,
- Naohisa Sakamoto,
- Chongke Bi,
- Masahiro Nakao,
- Hitoshi Murai
Energy efficiency has become a serious concern when running applications on HPC systems. Although these systems were designed to mainly run simulation codes as fast as possible, due to the ever-increasing size of the simulation outputs, the in situ ...
Information Entropy-based Camera Focus Point and Zoom Level Adjustment for Smart In-Situ Visualization
With the recent developments in computational science and HPC technology, large-scale numerical simulations have become common in various scientific and technological fields. The output volume data from these simulations have also become larger and more ...
Index Terms
- Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region
Acceptance Rates
| Conference | Submitted | Accepted | Acceptance rate |
|---|---|---|---|
| HPCAsia '23 | 34 | 15 | 44% |
| HPCAsia '23 Workshops | 10 | 9 | 90% |
| HPCAsia '19 | 32 | 15 | 47% |
| HPCAsia '18 | 67 | 30 | 45% |
| Overall | 143 | 69 | 48% |