Abstract
Modifying a single hardware component can have a profound effect on a system's speed, power consumption, and synchronized and parallel functionality. An integral part of digital circuits and systems is the multiplier. Carry look-ahead adders (CLAs) are low-power components that researchers have used, with various enhancements, to reduce latency and power consumption. The Baugh Wooley multiplier is a well-known multiplier that incorporates a CLA. In this paper, we present an enhanced design for this multiplier. By incorporating a carry look-ahead adder based on quaternary logic, this work improves the structural behavior of the Baugh Wooley multiplier. In this design, the Wallace Tree algorithm is used to optimize the functionality. The proposed improved Baugh Wooley multiplier was developed for 180 nm technology with a 1.8 V supply. The proposed architecture is compared with the CLA Multiplier, QSDCLA (Quaternary Signed Digit-based Carry Look Ahead) Multiplier, Baugh Wooley Multiplier, Wallace Tree Multiplier, Hasan Multiplier, and Improved Radix Adder. Latency and power consumption are used as the evaluation metrics. With the proposed architecture, the latency was reduced to 0.0008962 ns and the power consumption was reduced to 1.693 µW. The results show that the new multiplier is significantly more effective and reliable than the previous multipliers.
1 Introduction
Multipliers are central components for high-speed processing, and an efficient multiplier design [2,3,4] increases system speed while decreasing energy consumption. Utilizing multipliers efficiently can boost the speed of mathematical operations and algorithms. Addition and shifting are the workhorses of digital processing, and parallel multipliers can be used to improve the speed of multiplication and shift operations. The design and implementation of multipliers can be performed using a wide variety of architectures and algorithms. The goals of these architectures and algorithms are reductions in complexity, number of parts, power consumption, and runtime [5, 6]. One such efficient multiplier that lessens the total number of integrated additions is the Baugh-Wooley multiplier [7, 8]. This multiplier's operational behavior can be thought of as occurring in a series of stages. Each stage is linked to the preceding and the subsequent stage so as to decrease the power consumption and execution time. This study integrates quaternary logic and the Wallace Tree algorithm to boost the efficiency of the Baugh-Wooley multiplier.
In a multiplier, AND gates are used both to multiply individual bits and to form the partial products. The multiplier's internal structure is determined by the relative sizes of the multiplicand and the multiplier. The multiplier is built using a dynamic decomposition method in which the multiplier and multiplicand sizes are predetermined. To perform the multiplication operation, a series of additions must be applied [1, 9]. The internal additive process of the multiplication is described by Eq. (1).
The inputs A and B are binary sequences of lengths m and n, respectively.
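As a point of reference, the product of two unsigned binary sequences is commonly written as a double sum of shifted one-bit products. The expression below is the standard textbook form and is not necessarily identical to the authors' Eq. (1); the symbols a_i and b_j (the bits of A and B, with the LSB at index 0) are assumptions introduced here for illustration.

```latex
P = A \times B = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} a_i \, b_j \, 2^{\,i+j}
```

Each term a_i b_j corresponds to a single AND operation, and the inner sum over i for a fixed j corresponds to one partial product row.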
The steps of the multiplication algorithm are listed below. Partial products are calculated using a sequential serial adder, and parallel multiplication is achieved by adding these partial products together [10, 11].
- Check the least significant bit of the multiplier; if it is ‘1’, add the multiplicand to an accumulator.
- Shift the multiplier one bit to the right and the multiplicand one bit to the left.
- Repeat this process for all bits, until the multiplier becomes 0.
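The steps above can be illustrated with a minimal Python sketch of shift-and-add multiplication for unsigned integers. The function name and the use of Python integers in place of hardware registers are assumptions made for illustration only.

```python
def shift_add_multiply(multiplicand: int, multiplier: int) -> int:
    """Multiply two non-negative integers by repeated shift and add."""
    accumulator = 0
    while multiplier != 0:
        # Step 1: if the LSB of the multiplier is 1, add the multiplicand.
        if multiplier & 1:
            accumulator += multiplicand
        # Step 2: shift the multiplier right and the multiplicand left.
        multiplier >>= 1
        multiplicand <<= 1
        # Step 3: the loop repeats until the multiplier becomes 0.
    return accumulator


print(shift_add_multiply(13, 11))  # 143
```

The loop runs once per bit of the multiplier, which matches the one-addition-per-bit behavior described for the serial case.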
Serial multipliers [12, 13] perform well in terms of efficiency, power consumption, and overall complexity. This multiplier calculates the m × n partial products with a single adder. A schematic of the serial multiplier is shown in Fig. 1. This circuit operates on a clocked stream of data. Each clock pulse triggers a shift operation on the incoming bitwise input. The multiplicand and multiplier bits are retrieved at regular intervals. The extracted bits are then fed into a partial product adder. The data processing clock is independent of the reset clock in this design [14,15,16]. The complexity of the architecture is O(m × n).
The serial/parallel multiplier, depicted in Fig. 2, is another popular and useful multiplier. Bits that are input in parallel are converted into sequential bits and zeros by this multiplier. Every time the clock is activated, N partial products are calculated. The following cycle adds up the values in one column of the multiplication table. The product of the multiplication is written to the output register. The final result requires N + M clock cycles [17, 18].
To perform parallel multiplication, a simple array multiplier can be used. Figure 3 shows the circuit diagram of the 4 × 4 array multiplier. Array multipliers process the operands stage by stage through parallel adders. The partial input is taken from an array of bits, and the array is processed row by row. In each stage, the multiplier sums the partial products using full adders. This approach extends to n bits by using an n-bit array that is processed vertically and horizontally [19,20,21]. For a 16-bit multiplier, the bit delay is determined by the processing of the full adder. The array can be pipelined to perform parallel multiplication as well as simultaneous work. The pipelining can be increased to optimize the multiplier output; however, the number of pipelining layers increases the latency and the area of the multiplier. A pipelined array is defined to avoid array-level analysis and the clock of the processor [22,23,24,25]. The intermediate results, along with the partial sums, are propagated repeatedly through the system. This process is repeated according to the specification of the array multiplier, which also fixes the depth of the array [26, 27].
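The row-wise processing described above can be sketched in Python as follows. This is a behavioral model of an unsigned array multiplier built from AND gates and full adders, not the circuit of Fig. 3; the function names and the default width of 4 bits are assumptions for illustration.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def array_multiply(a: int, b: int, n: int = 4) -> int:
    """Unsigned n x n array multiplication, one adder row per multiplier bit."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> j) & 1 for j in range(n)]
    acc = [0] * (2 * n)  # running sum bits, LSB first
    for j in range(n):  # row j handles multiplier bit b_j
        carry = 0
        for i in range(n):
            pp = a_bits[i] & b_bits[j]  # partial product bit from an AND gate
            acc[i + j], carry = full_adder(acc[i + j], pp, carry)
        acc[j + n] = carry  # the row carry-out lands in the next free column
    return sum(bit << k for k, bit in enumerate(acc))


print(array_multiply(9, 6))  # 54
```

Each row depends on the sum bits of the row before it, which is why the delay of such an array grows with the number of rows.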
1.1 Problem Definition
In digital signal processing, multipliers are highly important for solving different arithmetic and logical problems, and multiplication is one of the primary operations. Multiplication can be performed by using different multiplication algorithms. A multiplier is a combinational circuit that performs multiplication. Conventional multipliers are designed with complex structures and high power requirements. The performance of the multiplier directly affects the performance of the system. With technological advancements, the size of transistors is decreasing, which reduces the component area and power requirements. Various serial, parallel, shift, and array multipliers have been designed to improve the performance of the system. Various architectural designs, component integrations, and algorithms have been adopted within multiplier designs to improve the computational speed and reduce power consumption. In the present work, an optimization of the Baugh-Wooley architecture is presented. In this new design, a quaternary logic-based carry look-ahead adder is used. The Wallace tree algorithm is also integrated for the effective integration of components and for carrying out operations efficiently. The main objective of the designed system is to reduce power consumption and delay and to improve the performance of the existing Baugh Wooley Multiplier.
In this paper, an improved Baugh Wooley multiplier is designed to reduce the power consumption and latency. The designed multiplier is integrated with a quaternary adaptive carry look-ahead adder. The Wallace tree algorithm is integrated into this multiplier to enhance the performance. In this section, different types of multipliers, along with their complexities and features, are discussed. In Sect. 2, earlier research on the Baugh Wooley design, the Wallace tree algorithm, and quaternary logic is discussed. In Sect. 3, the architecture design of the proposed Baugh Wooley multiplier is provided, together with the algorithmic and functional description of this improved multiplier. In Sect. 4, simulation and analysis results are provided for the proposed multiplier in comparison with conventional multipliers. In Sect. 5, the conclusion of this research is provided.
2 Related Work
In designing microcontrollers and microprocessors, architectural improvements are necessary for optimizing the performance and reducing the size of the component or system. Researchers have designed various kinds of serial and parallel multipliers to optimize specific functionalities or operations, such as shifts, multiplication, and partial products. The Baugh Wooley multiplier has been optimized by researchers by integrating various additional components and algorithms. In this section, the contributions of earlier researchers are described.
Bhoi et al. [2] integrated quantum dot cellular automata (QCA) technology within the Baugh-Wooley multiplier to optimize computational performance. In this architecture, a single layer-based wire-crossing mechanism is used for minimizing clock phasing. The designed layout was scalable and achieved effective preliminary results. Raveendran et al. [28] presented a novel inexact Baugh Wooley Wallace tree multiplier by optimizing the compressor of an existing architecture. Reversible logic was applied in this work with 4:2 compression and a 12.5% error rate. The realization metrics were analyzed for this multiplier, and its scope for complex applications was identified. A structural similarity measure-based method was defined, and it achieved an accuracy of more than 97.1%. Xiong et al. [3] designed a Wallace tree architecture based on an 8-bit Baugh Wooley 2's complement multiplier. The Wallace tree architecture is integrated specifically for quicker partial products. The architecture is also integrated with carry-save adders to reduce the number of additional cycles. The array architecture was adopted in this design to reduce power consumption. Gudivada et al. [4] employed quantum-dot cellular automata-based nanotechnology to design a 1-bit full adder. This low-power design is integrated within the Baugh-Wooley multiplier. This improved design ensured a reduction in cell count and energy consumption and achieved efficient blocking. Biradar et al. [5] designed an improved Baugh-Wooley and Braun multiplier for unsigned and signed number multiplication. To reduce the size and power consumption, the adder is designed using CMOS, domino, and split path data-driven dynamic logic (SPD3L). The implementation results confirmed that the designed multiplier reduced energy consumption and achieved a significant performance gain. Asati et al. [6] designed a novel pipelined 8 × 8 signed Baugh Wooley multiplier and implemented it using CMOS TSPC logic at 0.6 µm. The designed model effectively reduced the transistor count, power consumption, and delay. Muley et al. [7] designed a novel Baugh Wooley and Wallace tree multiplier and implemented it using a 2-phase clocked adiabatic static CMOS logic (2PASCL). The designed architecture has an energy consumption of up to 62.66% in comparison with that of the static CMOS logic-based Wallace tree multiplier. Pudi et al. [29] employed QCA design technology to optimize the performance of the Baugh-Wooley Multiplier. The multiplier was designed for the multiplication of a pair of 2's complement numbers. The simulation results revealed that the proposed design reduced the area and delay and improved the throughput. Sjalander et al. [1] used the Baugh Wooley algorithm with an HPM reduction tree for designing a high-speed and low-power multiplier. The adoption of nanotechnologies reduced the size, delay, and power consumption in the design of the modified multiplier. Tu et al. [10] applied the skew-tolerant domino technique to design an improved Baugh Wooley multiplier. In this improved design, four overlapping clock signals were used for the AND gate, NAND gate, half adder, and full adder. The carry save vector was employed for merging. The simulation results confirm the significant improvement achieved in performance. Warrier et al. [12] proposed a pipelined MAC architecture and integrated it within a 16 × 16 multiplier. The integration is performed using the Baugh-Wooley algorithm with reduced energy consumption and effective clock gating. The designed architecture reduced the power consumption by 30 to 80% compared with that of existing contemporary MAC architectures. Tu et al. [13] designed a pipelined, reconfigurable, fixed-width Baugh Wooley multiplier design framework. This framework provided four configuration modes. This pipelined reconfiguration-based design effectively reduced the power consumption.
Mukherjee et al. [18] defined a generic model for the Baugh Wooley Multiplier. A linear carry-select adder was used in place of the conventional adder of the Baugh Wooley multiplier. The authors used a postsynthesis analysis to obtain the results. Mohanty et al. [17] defined an efficient Baugh Wooley architecture to perform signed and unsigned multiplication. The work was defined for parallel multipliers with fewer adders. An iterative process was defined to perform the multiplication as a series of add operations. This approach simplified the system complexity and increased the computational effort. Another low-power array architecture was defined by Bajaj et al. [30] for an unsigned system with signed multiplication. The authors defined the work for the Baugh Wooley multiplier. The reduced power consumption is based on the optimum approach within the multiplier. Another study on the signed array multiplier was performed by Das et al. [31]. The authors defined a novel architecture for a binary signed multiplier using the Baugh Wooley algorithm. The work was defined for a 2 × 2 multiplier, and the negative operand was adjusted to improve the system performance and reduce the computations.
Di et al. [32] reconfigured the multiplier design to reduce the power consumption of the system. The authors defined a two's complement-based signed multiplier scheme. They divided the input data into smaller blocks and reduced the computations. The sign-bit processing-based method avoided negative operands and reduced the delay and power consumption. Bansal et al. [33] defined a 64 × 64 bit multiplier using a radix-2 modified Booth algorithm. The authors improved the multiplier performance by including a modified Wallace structure. They defined the fixed-length structure and presented the work with an array of coprocessor systems. The proposed pipelined system reduced the computational latency and delay. Another study on a modified Booth encoder and multiplier system was performed by Rajput et al. [34]. The authors defined the design and multiplication phase under the multiplication operation in signed number analysis. They used a carry-save adder tree and a final carry look-ahead adder to speed up the multiplier operation. The designed component reduced energy consumption. Another study on high-speed multipliers was performed by Saokar [35]. The author defined the system with a floating point multiplier and proposed a method for fixed-point multiplication based on Vedic mathematics.
Vakili et al. [36] introduced an effective and robust approximation method to multiply signed numbers on FPGAs. This approach integrated a segmentation method within the Baugh-Wooley multiplication algorithm to optimize the outcome. Each segment was processed within a look-up table to generate accurate and quick results. The model achieved 53.6% utilization of resources in comparison with the INT8 Xilinx multiplier. Kishore et al. [37] used a Quantum-dot Cellular Automata (QCA) mechanism within CMOS technology to optimize the multiplication process. The model was defined to enhance the performance of nanodevices. The authors used the parallel array multiplier with a Braun multiplier and ripple carry adder to reduce power consumption. This QCA-enhanced Baugh Wooley multiplier was comparatively small, required less power, and provided high performance in nanodevices. The computational capabilities of this model were efficient. Pakkiraiah et al. [38] extended the existing Baugh-Wooley multiplier design by reducing the gate count and optimizing the power consumption and speed. Reversible logic gates were used to reduce the power consumption and to improve the computational efficiency. The method achieved effective results against white cells, pink cells and reversible full adder. Raj et al. [39] proposed a Baugh Wooley multiplier design based on Multiple Control Toffoli, Multiple Control Fredkin, and reversible logic gates to reduce the hardware complexity and to optimize the performance. The designed architecture occupied less area with low power consumption. The processing speed was also improved in this architecture with lower power usage. The analysis showed 23.77% and 16.88% reductions in processing delay against the FPGA-BWM-RL-TG and FPGA-BWM-TG-FG architectures.
Kishore et al. [40] used the modified gate diffusion input technique to optimize the power consumption of the Baugh Wooley Multiplier. The model was implemented using GDI, CMOS, and MGDI methods. This study observed that the CMOS-based Baugh Wooley multiplier used more transistors and consumed higher power with maximum delay. The multiplier with 32 nm technology provided the most effective results. Beura et al. [41] integrated a compressor within the Baugh-Wooley multiplier to perform the addition of partial products. The algorithm was defined with the inclusion of a 4:2 compressor in the product bit array. The model was designed to reduce total error distance by balancing the overestimation of design time. The minimum signal-to-noise ratio was achieved in this model when implemented for edge detection. The proposed design improved the error tolerance and reduced the area-delay tradeoff. This model achieved better accuracy and improved the reliability of the system. Thamizharasan et al. [42] provided a hybrid compressor-based multiplier that was implemented with a Field Programmable Gate Array. It was a modification of the Vedic multiplier to optimize the performance of signal/image processing applications. The results showed that the model improved the processing speed by up to 35.83% against the Array multiplier, 24.49% against the Carry Look ahead adder-based Vedic multiplier, 20.65% against the Ripple carry adder-based Vedic multiplier, 21.65% against the Booth multiplier, and 7.15% against the hybrid Vedic multiplier.
3 Research Methodology
Multiplication is a core operation used in most arithmetic and logical applications. A multiplier is a combinational circuit that is used for various arithmetic and logical operations, including multiplication. Multiplication is a complex operation with high computational and power requirements. Researchers have investigated and optimized various serial, parallel, and array multipliers by integrating various algorithmic methods and components. The Baugh Wooley multiplier handles sign bits effectively within its multiplication algorithm. This technique accepts 2's complement numbers and multiplies them more effectively than regular multipliers. The present work involves optimizing the functional behavior and structure of Baugh Wooley multipliers. The structural improvement presented in this work is the use of a quaternary logic adaptive carry look-ahead adder. The functional improvement is achieved by integrating the algorithmic flow with the Wallace Tree algorithm. The objective of this improved and optimized Baugh Wooley multiplier is to reduce the power consumption and execution time and to enhance the performance of the system (Fig. 4).
3.1 Baugh Wooley Multiplier
In this research, multiplication optimization is proposed using the improved Baugh-Wooley algorithm. The Baugh Wooley multiplier accepts signed digital data as input and executes array multiplication. The 2's complement is used to represent negative numbers. The algorithm accepts the 8-bit format and applies partial multiplication. N-bit array-based partial product generation and processing via Baugh Wooley array multiplication are described in this subsection. Let A and B be N-bit array representations of two signed numbers. Equation (2) provides the representation of the numbers in signed bit format.
The multiplication of A and B is provided in Eq. (3):
Equation (2) shows that multiplication is performed by using the adder and subtractor cells. Equation (4) reframes the negative cell equation using only adder cells.
Now, by using these adder cells, Eq. (3) is reframed to Eq. (5). In Eq. (5), Eq. (4) is substituted.
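For orientation, the standard textbook form of the two's complement derivation that this kind of reframing follows is sketched below. It is given as an assumption about what the referenced equations express, not as a reproduction of the authors' Eqs. (2) to (5); the bit symbols a_i and b_j and the overline for bit inversion are notation introduced here.

```latex
\begin{aligned}
A &= -a_{N-1}2^{N-1} + \sum_{i=0}^{N-2} a_i 2^{i}, \qquad
B = -b_{N-1}2^{N-1} + \sum_{j=0}^{N-2} b_j 2^{j} \\[4pt]
P = A \times B &= a_{N-1}b_{N-1}2^{2N-2}
  + \sum_{i=0}^{N-2}\sum_{j=0}^{N-2} a_i b_j 2^{i+j}
  - 2^{N-1}\sum_{i=0}^{N-2} a_i b_{N-1} 2^{i}
  - 2^{N-1}\sum_{j=0}^{N-2} a_{N-1} b_j 2^{j} \\[4pt]
-\sum_{k=0}^{N-2} x_k 2^{k} &= \sum_{k=0}^{N-2} \overline{x_k}\, 2^{k} + 1 - 2^{N-1} \\[4pt]
P &= a_{N-1}b_{N-1}2^{2N-2}
  + \sum_{i=0}^{N-2}\sum_{j=0}^{N-2} a_i b_j 2^{i+j}
  + 2^{N-1}\sum_{i=0}^{N-2} \overline{a_i b_{N-1}}\, 2^{i}
  + 2^{N-1}\sum_{j=0}^{N-2} \overline{a_{N-1} b_j}\, 2^{j}
  + 2^{N} - 2^{2N-1}
\end{aligned}
```

The third line is the identity that replaces each subtracted row by its bitwise complement, so that only adder cells are needed. In the last line, the constant 2^N corresponds to a '1' placed at bit position N, and subtracting 2^(2N-1) is equivalent, modulo 2^(2N), to inverting the MSB of the 2N-bit result.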
The functional flow of the Baugh Wooley array multiplier is provided below:
- Invert the most significant bit of each of the first N − 1 partial product rows and all bits other than the MSB of the last partial product row.
- Replace the Nth bit with '1'.
- Invert the MSB of the final result.
These steps show that the partial product of array bits is generated by applying the AND gate to operand bits. The obtained partial product is inverted, and the NAND gate is used for this inversion. Now, the nth bit is replaced by '1'. A block diagram of the 4 × 4 Baugh Wooley multiplier is shown in Fig. 5. In this figure, two kinds of cells are used: gray cells and white cells. To design the circuit diagram of the Baugh Wooley multiplier, the following steps are carried out. A broader view of this design process is provided in Fig. 4.
- Design a 1-bit quaternary logic-based carry look-ahead full adder (FA).
- Design the gray and white cells using the designed FA.
- Employ the Wallace Tree algorithm.
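The functional flow of the Baugh Wooley array multiplier given earlier in this subsection (inverting selected partial product bits, placing a '1' at bit position N, and inverting the MSB of the result) can be checked with a small bit-level Python sketch. This is a behavioral model in ordinary binary arithmetic, not the quaternary circuit of the proposed design; the function name and the default width of 4 bits are assumptions for illustration.

```python
def baugh_wooley_multiply(a: int, b: int, n: int = 4) -> int:
    """Multiply two n-bit two's complement integers using only additions."""
    mask = (1 << n) - 1
    ua, ub = a & mask, b & mask  # raw n-bit patterns of the operands
    total = 0
    for j in range(n):  # row j of the partial product array
        bj = (ub >> j) & 1
        for i in range(n):
            bit = ((ua >> i) & 1) & bj  # AND gate
            # Invert the MSB of the first n-1 rows and every bit of the
            # last row except its MSB (NAND instead of AND).
            if (i == n - 1) != (j == n - 1):
                bit ^= 1
            total += bit << (i + j)
    total += 1 << n  # place a '1' at bit position n
    total &= (1 << (2 * n)) - 1  # keep 2n result bits
    total ^= 1 << (2 * n - 1)  # invert the MSB of the final result
    # Interpret the 2n-bit pattern as a signed value.
    if total >> (2 * n - 1):
        total -= 1 << (2 * n)
    return total


print(baugh_wooley_multiply(-3, 5))  # -15
```

Checking the model exhaustively over all 4-bit signed operand pairs is a quick way to confirm that the three steps reproduce signed multiplication without any subtractor cells.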
3.2 Wallace Tree Algorithm
The Wallace tree provides an effective parallel multiplication algorithm with fewer gates. It is one of the faster multipliers available, with multiple schemes. In this work, a Wallace tree algorithm with quaternary components is integrated within the Baugh Wooley Multiplier to optimize its functionality. This algorithm reduces the number of adder stages. The quaternary logic-based carry look-ahead adder generates the summation of partial products. Half and full adders using quaternary logic are integrated to reduce the number of rows and bits at different stages of this Wallace Tree. This tree-based Wallace algorithm is less complex than other algorithms and uses serial multiplication. The algorithmic steps of the proposed Wallace tree-based multiplication with quaternary logic components are described below. The architecture of this Wallace tree functionality is provided in Fig. 6.
1. Take the quaternary bits as arguments and perform multiplication to generate N-bit results. The carry look-ahead adder is used with a weighted value to perform the positional multiplication.
2. The partial product-driven layer multiplication reduces the number of required components.
3. Use the carry look-ahead adder to add the numbers.
4. Partial products are generated at each stage by integrating layered quaternary logic-based adders, and the carry is forwarded sequentially at the same stage to process the next two data values.
5. This method is applied in hierarchical order and processed in two layers by using quaternary carry look-ahead adders.
6. In the final stage, the quaternary carry look-ahead adder is applied to generate the final products p1 to p8, as shown in Fig. 6.
Within the Baugh Wooley multiplier, the Wallace Tree algorithm reduces the number of layers by improving the structure and components of the standard architecture. The layer-driven partial products are computed, and the final addition is performed at the last layer. The proposed architecture reduces the number of components and computations, which reduces the energy consumption and improves the performance of the multiplier in comparison with the existing multiplier architecture.
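The layered reduction described above can be illustrated with a conventional binary Wallace tree in Python. This sketch uses ordinary one-bit full and half adders rather than the quaternary adders of the proposed design, and it stands in Python integer addition for the final carry-propagate adder; the function name and the returned layer count are assumptions for illustration.

```python
def wallace_multiply(a: int, b: int, n: int = 4) -> tuple[int, int]:
    """Unsigned n x n Wallace tree model: returns (product, reduction_layers)."""
    # columns[k] holds all partial product bits of weight 2**k.
    columns = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            columns[i + j].append(((a >> i) & 1) & ((b >> j) & 1))

    layers = 0
    # Compress layer by layer until every column holds at most two bits.
    while any(len(col) > 2 for col in columns):
        nxt = [[] for _ in range(len(columns) + 1)]
        for k, col in enumerate(columns):
            bits = list(col)
            while len(bits) >= 3:  # full adder: three bits -> sum + carry
                x, y, z = bits.pop(), bits.pop(), bits.pop()
                nxt[k].append(x ^ y ^ z)
                nxt[k + 1].append((x & y) | (y & z) | (x & z))
            if len(bits) == 2:  # half adder: two bits -> sum + carry
                x, y = bits
                nxt[k].append(x ^ y)
                nxt[k + 1].append(x & y)
            else:  # zero or one bit passes through unchanged
                nxt[k].extend(bits)
        columns = nxt
        layers += 1

    # Final addition of the two remaining rows (modeled with integer addition).
    product = sum(bit << k for k, col in enumerate(columns) for bit in col)
    return product, layers


product, layers = wallace_multiply(13, 11)
print(product, layers)
```

All adders within one layer operate independently of each other, so the delay depends on the number of layers rather than on the number of partial product rows.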
3.3 Quaternary Logic-Based Look Carry Ahead Adder
In this paper, a quaternary logic and Wallace Tree-integrated Baugh Wooley multiplier architecture is proposed to reduce power consumption and improve architectural behavior. The main component used in this model is the quaternary logic adaptive carry look-ahead adder. It is a lightweight sequential processor that performs a series of adder operations. The carry look-ahead adder reduces the computation time when adding two quaternary numbers. It uses two signals called propagate and generate. The bit position-based carry propagation is performed at the LSB position. A block diagram of the quaternary logic-based full adder is provided in Fig. 7. It is composed of two half-adders. In this adder, the inputs use the generate and propagate structure and pass the carry sequentially. The first two digits are passed to the first half-adder as inputs, and the result is taken as a sum and a carry. This sum and the third quaternary digit are passed to the second half-adder, and the sum and carry are generated as the final output.
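The two-half-adder composition described above can be modeled at the digit level in Python. This is a behavioral sketch that treats each quaternary digit as an integer from 0 to 3; the function names are assumptions, and the model says nothing about the gate-level structure of Fig. 7.

```python
def quaternary_half_adder(x: int, y: int) -> tuple[int, int]:
    """Add two quaternary digits (0..3): returns (sum_digit, carry)."""
    total = x + y
    return total % 4, total // 4


def quaternary_full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Add three quaternary digits using two half-adders."""
    s1, c1 = quaternary_half_adder(a, b)  # first half-adder: a + b
    s2, c2 = quaternary_half_adder(s1, c)  # second half-adder: sum + third digit
    # The two carries are added; if the third input is a full quaternary
    # digit (not just a 0/1 carry), the combined carry can reach 2.
    return s2, c1 + c2


print(quaternary_full_adder(3, 2, 1))  # (2, 1), since 3 + 2 + 1 = 6 = 1*4 + 2
```

When the third input is restricted to a 0/1 carry-in, the combined carry never exceeds 1, which is the case used when such adders are chained digit by digit.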
In this research, the quaternary logic-based carry look-ahead adder is used in different layers of the Wallace tree and Baugh-Wooley architecture. The circuit diagram of the carry look-ahead adder is provided in Fig. 8. It can be used to perform large multiplication operations in terms of groups of bits. Here, g0, g1, g2, and g3 are the groups of bits. The unit generator is applied to these groups to obtain the predicted information, called p0, p1, p2, and p3. The predicted information is collected for each block, and the carry is generated for processing the blocks sequentially. Because the information is processed block by block, the method provides higher performance with less power consumption. This standard architecture is improved in this work by specifying the groups in the form of quaternary logic bits. Figure 9 shows the block diagram of a 4-bit adder applied to the quaternary logic number.
Figure 9 shows the functional architecture of a 4-bit adder that is composed of four quaternary logic-based carry look-ahead adders. The first two quaternary digits are passed as inputs to the first adder, which produces a sum and a carry. The sum is taken as the output, and the carry is passed to the next adder as its second input. The third input of 4 bits is also passed to the second adder. Based on these three inputs, the sum and carry are generated. This process is repeated until all four digits have been passed and the output of the fourth adder is obtained.
Figure 10 shows the functional block diagram of the quaternary carry look-ahead adder. The inputs are first processed by the generate and propagate functions. This stage performs the bit grouping and the transition of quaternary data to the adaptive bit form. Once this transformation is complete, the carry look-ahead adder is applied. The generated P and G signals are processed for every bit position. After the look-ahead stage, the carry is obtained and processed in the final stage, where the addition is performed. Equation (6) shows the functional process of this quaternary carry look-ahead adder.
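A digit-level Python sketch of radix-4 carry look-ahead addition is given below to make the generate and propagate stage concrete. It assumes the usual definitions for a radix-4 digit pair, namely generate when a + b >= 4 and propagate when a + b = 3, with a 0/1 carry-in; these definitions, the helper names, and the least-significant-digit-first ordering are assumptions and are not taken from the authors' Eq. (6).

```python
def lookahead_carry(g: list[int], p: list[int], c0: int, i: int) -> int:
    """Carry into digit i, computed directly from g, p and c0 (no rippling)."""
    # c0 reaches digit i only if every lower digit propagates.
    carry = c0 if all(p[:i]) else 0
    # A carry generated at digit k reaches digit i if digits k+1..i-1 propagate.
    for k in range(i):
        if g[k] and all(p[k + 1:i]):
            carry = 1
    return carry


def quaternary_cla_add(a_digits: list[int], b_digits: list[int], c0: int = 0):
    """Add two equal-length quaternary numbers (digits 0..3, LSD first)."""
    g = [int(a + b >= 4) for a, b in zip(a_digits, b_digits)]  # generate
    p = [int(a + b == 3) for a, b in zip(a_digits, b_digits)]  # propagate
    n = len(a_digits)
    carries = [lookahead_carry(g, p, c0, i) for i in range(n + 1)]
    sums = [(a + b + c) % 4 for a, b, c in zip(a_digits, b_digits, carries)]
    return sums, carries[n]  # sum digits and carry-out


def to_digits(value: int, n: int) -> list[int]:
    """Hypothetical helper: n quaternary digits of value, LSD first."""
    return [(value >> (2 * k)) & 3 for k in range(n)]


def from_digits(digits: list[int]) -> int:
    """Hypothetical helper: integer value of quaternary digits, LSD first."""
    return sum(d << (2 * k) for k, d in enumerate(digits))


sums, cout = quaternary_cla_add(to_digits(27, 4), to_digits(100, 4))
print(from_digits(sums) + (cout << 8))  # 127
```

Each carry is expressed in terms of the generate and propagate signals and the initial carry only, so in hardware all carries can be evaluated in parallel before the sum digits are formed.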
4 Results and Discussion
In this paper, a quaternary logic and Wallace tree improved Baugh Wooley multiplier is designed. The designed model uses reduced functional complexity and lightweight components to reduce energy consumption and improve the performance of the multiplier. The proposed improved multiplier is designed and simulated in the Active-HDL environment. This multiplier is designed for 180 nm technology and is simulated with a 1.8 V power supply. The performance of the proposed improved multiplier is analyzed against existing serial and parallel multipliers. A comparative evaluation is performed using latency and power consumption parameters. The simulation results of the proposed multiplier are provided in Figs. 11 and 12.
The performance of the proposed improved multiplier is evaluated in terms of latency and power dissipation. The energy dissipation is computed for the proposed layout using coherence vector simulation. The decreased energy dissipation confirms the reliability and effectiveness of the proposed architecture. The proposed architecture uses the partial product and level-based methods to reduce the complexity of the model. With the addition of a quaternary carry look-ahead adder, the performance is enhanced and the functional processing delay is decreased, as shown by the results in Figs. 13 and 14. The proposed architecture is compared against the CLA (Carry Look Ahead) Multiplier [43], the QSDCLA (Quaternary Signed Digit-based Carry Look Ahead) Multiplier [43], the Baugh Wooley Multiplier [44], the Wallace Tree Multiplier [45], the Quantum Digits and Component-based Wallace Tree Multiplier proposed by Hasan et al., referred to here as the Hasan Multiplier [46], and a fast adder-improved Radix Adder [47].
Figure 13 provides the power dissipation-based comparative analysis to validate the performance of the proposed quaternary logic-improved and Wallace tree algorithm-integrated Baugh Wooley multiplier against the existing multipliers. The Carry Look Ahead (CLA) multiplier uses smaller sequential and pipelined components that increase power dissipation. The total power consumption of this architecture for the given input is 9.192 µW. The QSD-CLA (Quaternary Signed Digit Number System-based Carry Look Ahead) multiplier improved the architecture by processing quaternary signed digits. The architecture became simpler, and the power consumption decreased to 7.89 µW. The Baugh-Wooley multiplier is a well-known multiplier that ensures high-speed architecture design and decomposition. This simplified and effective multiplier reduced the power dissipation to 4.3 µW. The Wallace tree multiplier uses the compressor adder and carry select adder as the integrated components. These are light processing components that reduce power consumption and achieve a power dissipation of 4.11 µW. The Hasan multiplier is an improved Wallace tree multiplier constructed by using quantum cell-based processing. This quantum processing-based method increased the architectural complexity and level-specific computations. As a result, the power consumption increased to 9.96 µW. The radix-4 multiplier was improved by integrating a fast adder as the functional component. The integration of this component increased the architectural complexity; it achieved high performance, but at the expense of power dissipation. The power recorded for this architecture is 1.693 µW. In the proposed multiplier architecture, partial product and multilayered processing were performed to reduce power dissipation. The proposed compact and improved architecture achieved 1.693 µW of power dissipation. The results confirm that the proposed architecture improved the reliability and reduced the cost of multiplier functioning.
Figure 14 provides the latency-based comparative analysis to validate the performance of the proposed improved architecture against the existing multipliers. The bar graph shows that the CLA (Carry Look Ahead) multiplier is the least effective, with the highest latency of 0.03469 ns. It is a sequential and pipelined multiplier and takes extra time because the number of variables is increased. The QSD-CLA (Quaternary Signed Digit Number System-based Carry Look Ahead) multiplier improved the performance with a simpler circuit and reduced the latency to 0.01636 ns. The simpler and lighter-weight Baugh-Wooley multiplier improved the performance further and reduced the latency to 0.00892 ns. The Wallace tree is a high-speed multiplier that uses partial product-based mapping at different layers. This layered architecture reduced the latency to 0.006873 ns. The Hasan multiplier adapted quantum cell and bit-based processing methods within the Wallace tree architecture. This quantum processing-based group bit processing reduced the delay at the cost of a more complex architecture and achieved a latency of 0.002658 ns. The radix-4 multiplier was improved by integrating an ultrafast carry. This ultrafast carry-based architecture reduced the latency to 0.0018962 ns and effectively improved the performance of the multiplier for n-bit multiplication. The proposed lightweight component-based multiplier reduced the latency to 0.0008962 ns, improving on the efficiency and reliability of the existing multipliers.
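The latency figures quoted for Figure 14 can be turned into absolute and relative savings with a few lines of arithmetic. The sketch below uses only the values stated in the text and assumes that the radix-4 figure is in ns like the others. The names `LATENCY_NS`, `PROPOSED_NS`, and `latency_savings` are illustrative.

```python
# Latency values (ns) as quoted in the Figure 14 discussion.
LATENCY_NS = {
    "CLA": 0.03469,
    "QSD-CLA": 0.01636,
    "Baugh-Wooley": 0.00892,
    "Wallace tree": 0.006873,
    "Hasan": 0.002658,
    "Improved radix-4": 0.0018962,  # unit assumed to be ns
}
PROPOSED_NS = 0.0008962


def latency_savings(baselines, proposed):
    """Return {name: (absolute saving in ns, relative saving in percent)}."""
    return {
        name: (value - proposed, 100.0 * (value - proposed) / value)
        for name, value in baselines.items()
    }


savings = latency_savings(LATENCY_NS, PROPOSED_NS)
for name, (abs_ns, pct) in savings.items():
    print(f"{name:>16}: {abs_ns:.7f} ns saved ({pct:.1f}% lower)")
```

The absolute savings computed this way are the differences summarized in the Conclusion, for example 0.03469 - 0.0008962 = 0.0337938 ns against the CLA multiplier.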
In recent years, researchers have designed various component-optimized and area- and power-optimized Baugh-Wooley multipliers. This sub-section compares the proposed design with a few recent methods in terms of latency and power dissipation. Pala et al. [48] used the RTL-to-GDSII flow to optimize the physical design of the multiplier. The model was implemented in 180 nm and 45 nm technologies and optimized under timing, power, and area constraints. Rampeesa et al. [49] designed a low-power 4-bit Baugh-Wooley multiplier by integrating an approximate full adder and a 1-bit mirror full adder. The model was verified on technologies from 16 to 90 nm, and the optimization was carried out at different temperatures to reduce the latency. This architecture achieved high performance with slightly higher power consumption. Ponugoti et al. [50] presented a 4-bit Baugh-Wooley multiplier using the full-swing gate diffusion input (GDI) technique. The arithmetic operations were improved to extend the design's capabilities across different applications. This model reduced the area and the number of transistors in comparison with other designs. Haridas et al. [51] provided an analytical study of the radix-2 FFT butterfly unit to optimize the multiplier. A parallel prefix was included in the multiplier to reduce the delay and power consumption. The design area was also reduced in comparison with a traditional multiplier. Gudivada et al. [4] identified the leakage current problem and introduced quantum-dot cellular automata (QCA) to optimize the performance of nanotechnologies. An energy- and area-optimized 1-bit adder using QCA was integrated within the multiplier to optimize energy consumption and reduce the cell count. The comparative analysis against these methods and architectures in terms of power consumption and latency is provided in Figs. 15 and 16.
Figure 15 provides the analysis results for the power dissipation parameter. It shows that the model proposed by Pala et al. has the highest power consumption, 9.342 µW. Among the existing methods, the most effective results are obtained by Ponugoti et al. and Gudivada et al., with power dissipation of 6.349 µW and 5.626 µW, respectively. The analysis results show that the proposed architecture achieved the lowest power dissipation of all the methods compared, 3.243 µW.
The results for the latency parameter are presented in Fig. 16. The data indicate that the model proposed by Haridas et al. has the highest latency, 0.007321 ns. Among the existing methods, Ponugoti et al. and Gudivada et al. achieved the best outcomes, with latencies of 0.002145 ns and 0.002023 ns, respectively. The results demonstrate that the proposed design had the lowest latency of all the methods compared, 0.001238 ns.
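For the designs that have both a power and a latency figure quoted above, the two metrics can be combined into a power-delay product (PDP) as a single figure of merit. This is a supplementary calculation rather than a result reported in the paper. It assumes that the proposed design's latency is in ns, and the names `DESIGNS` and `power_delay_product_fj` are illustrative.

```python
# (power in µW, latency in ns) as quoted for Figs. 15 and 16.
DESIGNS = {
    "Ponugoti et al.": (6.349, 0.002145),
    "Gudivada et al.": (5.626, 0.002023),
    "Proposed": (3.243, 0.001238),  # latency unit assumed to be ns
}


def power_delay_product_fj(power_uw, latency_ns):
    """µW x ns = 1e-6 W x 1e-9 s = 1e-15 J, so the product is in femtojoules."""
    return power_uw * latency_ns


pdp = {name: power_delay_product_fj(*values) for name, values in DESIGNS.items()}
for name, value in sorted(pdp.items(), key=lambda item: item[1]):
    print(f"{name:>16}: {value:.6f} fJ")
```

Because the proposed design has both the lowest quoted power and the lowest quoted latency of the three, it also has the lowest product; the PDP simply expresses that as one number per design.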
5 Conclusion
In this paper, an improved Baugh-Wooley multiplier is proposed to optimize the performance of small IoT and smart devices. The proposed architecture is integrated with a quaternary adaptive carry look-ahead adder. Functional improvement in the architecture was achieved by using the Wallace tree algorithm. The integration of the carry propagation chain within the architecture improved the performance and reduced power consumption. The use of quaternary logic makes the architecture suitable for small devices with high computational capability. The proposed architecture was implemented on 180 nm technology with a 1.8 W power supply. A computational evaluation was conducted on the delay and power dissipation parameters. The comparative evaluation was performed against the CLA Multiplier, Quaternary Signed Digit-based Carry Look Ahead (QSDCLA) Multiplier, Baugh-Wooley Multiplier, Wallace Tree Multiplier, Hasan Multiplier, and Improved Radix Adder architectures. The analysis revealed that the proposed multiplier reduced latency by 0.03379 ns against the CLA multiplier, 0.0155 ns against the QSD-CLA multiplier, 0.0080 ns against the Baugh-Wooley multiplier, 0.00598 ns against the Wallace Tree multiplier, 0.0018 ns against the Hasan multiplier, and 0.001 ns against the radix-4 multiplier. A significant performance and reliability gain is also achieved for the power dissipation parameter. The proposed multiplier reduced power consumption by 7.499 µW against the CLA multiplier, 6.197 µW against the QSD-CLA multiplier, 2.607 µW against the Baugh-Wooley multiplier, 2.417 µW against the Wallace Tree multiplier, 5.267 µW against the Hasan multiplier, and 6.537 µW against the radix-4 multiplier. A comparative analysis against several recent Baugh-Wooley architectures also showed that the proposed model effectively reduced both power dissipation and delay.
Data Availability
No data or material is used in this paper.
Code Availability
Not available.
References
Sjalander M, Larsson-Edefors P (2008) High-speed and low-power multipliers using the Baugh-Wooley algorithm and HPM reduction tree. In: 15th IEEE international conference on electronics, circuits and systems, pp 33–36
Bhoi BK, Misra NK, Pradhan M, Rout R (2018) Synthesis methods of Baugh-Wooley multiplier and non-restoring divider to enhance primitive’s results of QCA circuits, vol 104, pp 237–245
Xiong X, Lin M (2012) Low power 8-bit Baugh-Wooley multiplier based on Wallace tree architecture. In: Emerging trends in computing, informatics, systems sciences and engineering; lecture notes in electrical engineering, vol 151, pp 851–865
Gudivada AA, Sudha GF (2020) Design of Baugh-Wooley multiplier in quantum-dot cellular automata using a novel 1-bit full adder with power dissipation analysis. SN Appl Sci. https://doi.org/10.1007/s42452-020-2595-5
Biradar VB, Vishwas PG, Chetan CS, Premananda BS (2017) Design and performance analysis of modified unsigned Braun and signed Baugh-Wooley multiplier. In: International Conference on Electrical, Electronics, Communication, Computer, and Optimization Techniques (ICEECCOT), pp 1–6
Asati A, Chandrashekhar (2008) An improved high speed fully pipelined 500 MHz 8×8 Baugh Wooley multiplier design using 0.6 μm CMOS TSPC logic design style. In: Third international conference on industrial and information systems, pp 1–6
Muley VS, Tom A, Vigneswaran T (2015) Design of Baugh Wooley and Wallace tree multiplier using two phase clocked adibatic static CMOS logic. In: International Conference on Industrial Instrumentation and Control (ICIC), pp 1178–1183
Premananda B, Bhargav U, Vineeth KS (2018) Design and analysis of compact QCA based 4-bit serial-parallel multiplier. In: International Conference on Electrical, Electronics, Communication, Computer, and Optimization Techniques (ICEECCOT), pp 1014–1018
Ramkumar B, Kittur HM (2013) Faster and energy-efficient signed multipliers. VLSI Des 2013:1–12
Tu SH-L, Yen C-H (2006) A high-speed Baugh-Wooley multiplier design using skew-tolerant domino techniques. In: IEEE Asia Pacific conference on circuits and systems, pp 598–601
Pai C-Y, Al-Khalili A, Lynch W (2003) Low-power constant-coefficient multiplier generator. J VLSI Signal Process Syst Signal Image Video Technol 35:187–194
Warrier R, Vun C, Zhang W (2014) A low-power pipelined MAC architecture using Baugh-Wooley based multiplier. In: IEEE 3rd Global Conference on Consumer Electronics (GCCE), pp 505–506
Tu J-H, Van L-D (2009) Power-efficient pipelined reconfigurable fixed-width Baugh-Wooley multipliers. IEEE Trans Comput 58(10):1346–1355
Abraham S, Kaur S, Singh S (2015) Study of various high speed multipliers. In: International Conference on Computer Communication and Informatics (ICCCI), pp 1–5
Akhter S, Chaturvedi S (2014) HDL based implementation of N×N bit-serial multiplier. In: International Conference on Signal Processing and Integrated Networks (SPIN), pp 470–474
Singh KN, Tarunkumar H (2015) A review on various multipliers designs in VLSI. In: Annual IEEE India Conference (INDICON), pp 1–4
Mohanty P, Rajan R (2012) An efficient Baugh-Wooley architecture for both signed & unsigned multiplication. Int J Comput Sci Eng Technol 3:1–5
Mukherjee A, Asati A (2013) Generic modified Baugh Wooley multiplier. In: International conference on circuits, power and computing technologies, pp 1–5
Lamba B, Sharma A (2018) A review paper on different multipliers based on their different performance parameters. In: 2nd International Conference on Inventive Systems and Control (ICISC), pp 324–327
Rajaa L, Prabhu BM, Thanushkodi K (2012) Design of low power digital multiplier using dual threshold voltage adder module. Procedia Eng 30:1179–1186
Paliwal P, Sharma JB, Nath V (2020) Comparative study of FFA architectures using different multiplier and adder topologies. Microsyst Technol 26:1455–1462
Tomar GS, George ML (2018) Modified binary multiplier architecture to achieve reduced latency and hardware utilization. Wirel Pers Commun 98:3549–3561
Renxi G, Shangjun Z, Hainan Z, Xiaobi M, Wenying G, Lingling X, Yan H (2009) Hardware implementation of a high speed floating point multiplier based on FPGA. In: 4th international conference on computer science & education, pp 1902–1906
Song L, Parhi KK (1998) Low-energy digit-serial/parallel finite field multipliers. J VLSI Signal Process Syst Signal Image Video Technol 19:149–166
Sabbagh S, Baseri J (2014) Optimization of serial-serial multiplier and implementation of a 4-bit multiplier. In: 22nd Iranian Conference on Electrical Engineering (ICEE), pp 476–479
Yugandhar K, Raja VG, Tejkumar M, Siva D (2018) High performance array multiplier using reversible logic structure. In: International Conference on Current Trends towards Converging Technologies (ICCTCT), pp 1–5
Hosseiny A, Amanollahi S, Hashemi R, Jahanian A (2013) Improved performance and resource usage of FPGA using resource-aware design; the case of a decimal array multiplier. In: 17th CSI International Symposium on Computer Architecture & Digital Systems (CADS 2013), pp 121–122
Raveendran S, Edavoor PJ, Kumar YN, Vasantha MH (2021) Inexact signed Wallace tree multiplier design using reversible logic. IEEE Access 9:108119–108130
Pudi V, Sridharan K (2013) Efficient design of Baugh-Wooley multiplier in quantum-dot cellular automata. In: 13th IEEE international conference on nanotechnology, pp 702–706
Bajaj R, Chhabra S, Veeramachaneni S, Srinivas MB (2009) A novel, low-power array multiplier architecture. In: 9th international symposium on communications and information technology, pp 119–123
Das D, Rahaman H (2010) A novel signed array multiplier. In: International conference on advances in computer engineering, pp 19–23
Di J, Yuan J (2003) Run-time reconfigurable power-aware pipelined signed array multiplier design. In: International symposium on signals, circuits and systems, vol 2, pp 405–408
Bansal M, Nakhate S, Somkuwar A (2011) High performance pipelined signed 64×64-bit multiplier using Radix-32 modified Booth algorithm and Wallace structure. In: International conference on computational intelligence and communication networks, pp 411–415
Rajput RP, Swamy MS (2012) High speed modified Booth encoder multiplier for signed and unsigned numbers. In: 14th international conference on computer modelling and simulation, pp 649–654
Saokar SS, Banakar RM, Siddamal S (2012) High speed signed multiplier for digital signal processing applications. In: IEEE international conference on signal processing, computing and control, pp 1–6
Vakili S (2024) A cost-effective Baugh-Wooley approximate multiplier for FPGA-based machine learning computing. In: 6th International Conference on AI Circuits and Systems (AICAS)
Kishore P, Sirimalla R, Sushma KS, Reddy RS (2023) Implementation of Braun and Baugh-Wooley multipliers using QCA. In: 2nd International Conference for Innovation in Technology (INOCON)
Pakkiraiah C, Lakshmi AHNSV, Sucharita K, Raghuveer J (2024) Design and implementation of low power Baugh Wooley multiplier using reversible circuits. J Nonlinear Anal Optim 15(1):3092–3102
Raj KS, Kumar PR, Satyanarayana M (2023) Baugh-Wooley multiplier design using Multiple Control Toffoli and Multiple Control Fredkin reversible logic gates. Int Rev Appl Sci Eng 14(2):285–292
Kishore P, Akash B, Aditya G, Harika N (2024) Design and analysis of low power and high-speed Baugh Wooley multiplier using modified gate diffusion input technique. In: 3rd International Conference for Innovation in Technology (INOCON)
Beura SK, Devi BP, Saha PK, Meher PK (2024) Design of a novel inexact 4:2 compressor and its placement in the partial product array for area, delay, and power-efficient approximate multipliers. Circuits Syst Signal Process 43(6):3748–3774
Thamizharasan V, Parthipan V (2024) Design of efficient binary multiplier architecture using hybrid compressor with FPGA implementation. Sci Rep 14(1):8492
Muralidharan V, Kumar NS (2020) Design and implementation of low power and high speed multiplier using quaternary carry look-ahead adder. Microprocess Microsyst 75:103054
Ganavi MG, Premananda BS (2020) Design of low power reduced complexity Wallace tree multiplier using positive feedback adiabatic logic. In: Advanced computing and intelligent engineering, pp 139–150
Pudi V, Sridharan K (2013) Efficient design of Baugh-Wooley multiplier in quantum-dot cellular automata. In: 13th IEEE International Conference on Nanotechnology (IEEE-NANO 2013)
Faraji H, Mosleh M (2018) A fast wallace-based parallel multiplier in quantum-dot cellular automata. Int J Nano Dimens 9(1):68–78
Hanninen I, Takala J (2009) Radix-4 recoded multiplier on quantum-dot cellular automata. In: International workshop on embedded computer systems, Berlin
Pala V, Makhe V, Bhuva K, Parekh R (2022) RTL to GDSII flow implementation of 8-bit Baugh-Wooley multiplier. In: IEEE International Conference on Nanoelectronics, Nanophotonics, Nanomaterials, Nanobioscience & Nanotechnology (5NANO)
Rampeesa A, Akhila P, Irfan M, Rebelli S, Thoutam LR, Ajayan J (2022) Design of low power 4-bit Baugh-Wooley multiplier using 1-bit mirror and approximate full adders. In: 2nd Asian Conference on Innovation in Technology (ASIANCON)
Ponugoti V, Oruganti S, Poloju S, Bopidi S (2021) Design of Baugh-Wooley multiplier using full swing GDI technique. In: International conference on soft computing and signal processing
Haridas K, Sreehari KN, Chalil A (2021) Performance comparison of Radix-2 FFT butterfly unit with Baugh Wooley and modified Baugh Wooley. In: Second International Conference on Electronics and Sustainable Communication Systems (ICESC)
Funding
No funds were received for this research.
Author information
Contributions
All the authors participated in the model design, paper writing and coding. All the authors reviewed the paper and participated equally in preparing this paper.
Ethics declarations
Conflict of interest
The authors declare no conflicts of interest, financial or otherwise. On behalf of all the authors, the corresponding author confirms ethics approval and participation in the research.
Ethics Approval and Consent to Participate
Not applicable.
Consent for Publication
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Juneja, K., Jangra, A. & Khurana, D. Design of a Quaternary Component and Wallace Tree Integrated Baugh-Wooley Multiplier. Int J Netw Distrib Comput 13, 3 (2025). https://doi.org/10.1007/s44227-024-00047-8
DOI: https://doi.org/10.1007/s44227-024-00047-8