
Integration

Volume 37, Issue 2, May 2004, Pages 83-102

Direct connect device core: design and applications

https://doi.org/10.1016/j.vlsi.2003.11.002

Abstract

In this paper, we present the design and synthesis of the “Direct Connected Device Core” (DCD-Core), a low-cost, low-power embedded system that includes an electrically erasable programmable read-only memory (EEPROM) controller for configurable address assignments. The main function of the DCD embedded system is to eliminate the operating-system processing of the network protocol stack normally carried out by a personal computer's CPU, and to simplify network connection requirements. Not only does this hardware solution simplify connections, it also improves network performance. Our DCD-Core utilizes the concept of network channels, in which Ethernet frames are delivered through a custom multicast addressing scheme. After validating the DCD-Core simulation outputs, we synthesize the design on an FPGA chip using the Verilog Hardware Description Language (HDL). Performance measures such as power consumption and area utilization are computed using Verilog HDL synthesis tools. Initial measurements show that the DCD-Core reduces power consumption; thus, network devices may be powered through the network cable, eliminating the installation and maintenance of dedicated electrical power outlets. In this way, the DCD-Core reduces connection complexity in terms of device installation, especially for large numbers of devices (e.g. surveillance-system cameras).

Introduction

Ethernet, or the IEEE 802.3 standard for local area networks (LANs), evolved from the 10 Mbps DIX standard [1], [2]. Newer versions, called Fast Ethernet and Gigabit Ethernet, support 100 Mbps and 1 Gbps, respectively [3], [4]. Ethernet has become a mature technology and has overcome many technical problems over the past three decades, including collisions, protocol-stack overhead, and local-bus bottlenecks.

Ethernet's original characteristic was sharing the same medium among all transmitters. Ethernet transmitters listen for collisions during their transmission period. If a collision takes place, all transmissions involved are considered failures and are retransmitted after random delays. Newer, collision-free Ethernet implementations use the concept of buffering packets in a centralized Ethernet controller/switch, which is connected to all transmitters in a star topology using dedicated point-to-point twisted pairs. The buffer is large enough to hold multiple packets at the same time. When the Ethernet controller is busy, the buffer receives and stores incoming packets. If the buffer reaches half its capacity, the Ethernet controller stops the transmitters from sending more packets. When the Ethernet controller becomes free, it immediately sends packets out of its buffer using a first-in-first-out (FIFO) policy, and no collisions take place. The buffering concept was an important lesson in solving this problem, and this collision-free approach has been used in Gigabit Ethernet full-duplex network controllers.
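The buffering scheme just described can be sketched behaviorally as follows (a minimal Python model, not the paper's Verilog; the class and method names are illustrative):

```python
from collections import deque

class SwitchBuffer:
    """Behavioral sketch of the switch buffering described above:
    packets queue in FIFO order, and transmitters are paused
    (back-pressured) once the buffer reaches half capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()

    def transmitters_paused(self):
        # The controller signals "stop sending" at half capacity.
        return len(self.queue) >= self.capacity // 2

    def receive(self, packet):
        # Store an incoming packet while the controller is busy.
        if len(self.queue) < self.capacity:
            self.queue.append(packet)
            return True
        return False  # buffer full: packet dropped

    def forward(self):
        # When the controller is free, send in FIFO order.
        return self.queue.popleft() if self.queue else None

buf = SwitchBuffer(capacity=8)
for p in range(5):
    buf.receive(p)
print(buf.transmitters_paused())  # True: 5 >= 8 // 2
print(buf.forward())              # 0 leaves first (FIFO order)
```

Because senders are paused well before the buffer fills, no packet is ever lost to a collision; the worst case is added queueing delay.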

Since network controllers are connected to a personal computer (PC), there are other PC-related performance limitations, including protocol-stack processing by the PC operating system, PCI bus speeds, and hard-disk access times [5]. A bottleneck was often reported in the operating-system processing of the network protocol (e.g. TCP/IP or UDP/IP) or in PCI bus interrupt handling.

To clarify this problem of operating-system processing of protocol stacks, it has repeatedly been found that Ethernet performance is limited to less than 10% of the expected data rate. For example, in a test done in the Electrical Engineering Department at the University of Tulsa, using a large-file-transfer method over a dedicated 100 Mbps LAN between two computers running Windows 98, average data transfer speeds of 9.26, 10, and 15.56 Mbps were reported for file sizes of 13.7, 42.7, and 96.8 Mbytes, respectively [6]. In another test on a dedicated 100 Mbps LAN between two computers running the Linux operating system, sending constant packet sizes of 992 bytes, the average packet transmit times were 4492 and 896.5 μs for the TCP/IP and UDP/IP protocols, respectively [7]. Doing further computation, we find that the wire transmission delay of a 992-byte packet is 992 B × (8 b/B) × (1/100 Mbps) = 79.36 μs, which is very small compared to the 896.5 μs; the rest of the time is the operating-system processing of the packet. Even with the low-overhead UDP/IP, as compared to TCP/IP, the reported performance is not acceptable. Further, the relative Ethernet LAN performance of UDP/IP can be computed as 79.36/(896.5 − 79.36) × 100 = 9.7%.
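The arithmetic above can be reproduced directly (a small Python check of the paper's numbers; variable names are ours):

```python
# Wire delay of a 992-byte packet on a 100 Mbps link, versus the
# measured average UDP/IP transmit time reported in the text.
PACKET_BYTES = 992
LINK_MBPS = 100
MEASURED_US = 896.5  # measured UDP/IP transmit time (microseconds)

# bits / (bits per microsecond) -> microseconds
wire_delay_us = PACKET_BYTES * 8 / LINK_MBPS
print(round(wire_delay_us, 2))  # 79.36

# Everything beyond the wire delay is OS protocol processing.
os_overhead_us = MEASURED_US - wire_delay_us
relative_perf = wire_delay_us / os_overhead_us * 100
print(round(relative_perf, 1))  # 9.7
```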

Due to the above problem, there is a wealth of research on bypassing this operating-system bottleneck [8], [9], [10], [11], [12], [13], [14]. A typical approach is to use a network processor (NP) on the network interface card, which runs a hardware implementation of the protocol stacks [15]. These NPs can be reconfigurable, as in [16], and may utilize industry-available intellectual-property TCP/IP and UDP/IP cores in hardware. For example, the Internet-On-Chip™ [17] provided a TCP/IP protocol stack on chip, along with other high-level protocols. Another example is Bright Star Engineering [18], which has been marketing the IP-Engine, a credit-card-sized module that features a built-in 10/100 BASE-T Ethernet interface as well as other lower-speed interfaces. In both examples, power requirements, size, and cost were disadvantages.

The persistent problem after using NP cards is the local-bus bottleneck. Because of the way the Peripheral Component Interconnect (PCI) bus is handled by the operating system, another bottleneck exists. A 32-bit-wide, 33 MHz bus can theoretically handle about 1 Gbps (approximately 32 bits every cycle at a frequency of 33 million cycles per second) [19], [20]. However, the operating system typically allows the bus to be used for network transfers only about 10% of the time [5]. This means that the PCI bus would limit network transfers to a maximum of roughly 100 Mbps, regardless of the available LAN bandwidth. This bottleneck is caused by the operating system's handling of PCI; as long as NP cards are connected to the PCI bus, performance will be limited in high-speed networks. One solution to this problem is to change the operating-system handling of PCI.
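The PCI estimate works out as follows (a quick Python check; the 10% bus-time figure is the one cited from [5]):

```python
# Theoretical PCI bandwidth: 32 bits transferred per cycle at 33 MHz,
# then scaled by the ~10% of bus time the OS grants to network transfers.
BUS_WIDTH_BITS = 32
BUS_CLOCK_HZ = 33e6
OS_BUS_SHARE = 0.10  # fraction of bus time available for network use

peak_gbps = BUS_WIDTH_BITS * BUS_CLOCK_HZ / 1e9
print(round(peak_gbps, 3))  # 1.056 (approximately 1 Gbps)

effective_mbps = peak_gbps * 1000 * OS_BUS_SHARE
print(round(effective_mbps, 1))  # 105.6 (roughly 100 Mbps)
```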

Another solution to the last two problems is proposed in this research. We transfer functions implemented in operating systems (e.g. addressing, connection establishment, etc.) to a hardware core chip, as before, but disconnect this core chip from the computer interface. An embedded system should be able to connect devices to the LAN independently, utilizing the concept of Ethernet network channels: Ethernet frames are addressed using channel addressing. This chip, the “Direct Connected Devices Core” (DCD-Core) [21], [22], [23], is synthesized on an FPGA to exploit the advantages of FPGA designs, such as short design cycles. The constraints in designing the DCD-Core chip are size, power consumption, and cost. In doing so, we do not want or need to design a whole embedded computer.

One of the main differences between this research and similar ones is that the DCD-Core includes a Custom Media Access Controller (CMAC). The difference between a MAC and a CMAC is that a MAC receiver accepts frames destined to its own unique address, in a point-to-point mode; a MAC also allows multicast and broadcast addressing modes, in which a receiver may accept frames with special addressing formats. In a CMAC, the destination address is similar to that used in the multicast addressing mode, except that it uses the channel concept: the receiver accepts frames based on a channel number and not on its own unique address. Since the CMAC has no point-to-point communication ability, it is simpler to design and more efficient in multicast-based applications, e.g. surveillance digital networks with DCD cameras and monitors.
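The CMAC acceptance rule can be sketched behaviorally (in Python rather than the Verilog used for the actual core; how the channel number is embedded in the multicast-style address is our assumption, not the paper's):

```python
MULTICAST_BIT = 0x01  # least-significant bit of the first address octet

def cmac_accepts(dest_addr: bytes, my_channel: int) -> bool:
    """Behavioral model of CMAC filtering: accept a frame when its
    destination is a multicast-style 48-bit address whose last octet
    (assumed here) encodes the channel number. A conventional MAC
    would instead compare dest_addr against its own unique address."""
    if len(dest_addr) != 6:
        return False                   # not a valid 48-bit address
    if not dest_addr[0] & MULTICAST_BIT:
        return False                   # unicast-style address: ignore
    return dest_addr[5] == my_channel  # channel match, not address match

# A receiver tuned to channel 7 accepts any frame sent on that channel:
frame_dest = bytes([0x01, 0x00, 0x5E, 0x00, 0x00, 0x07])
print(cmac_accepts(frame_dest, my_channel=7))  # True
print(cmac_accepts(frame_dest, my_channel=3))  # False
```

Note that the receiver never needs a unique address of its own, which is what makes the CMAC simpler than a full MAC.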

The rest of the paper is organized as follows. In Section 2, the architecture of the DCD-Core embedded system is presented and the main functions of its major components are discussed. Section 3 gives the design details and specifications used, such as buffer sizes and clock rates. Section 4 presents simulation and output waveforms for the proposed DCD-Core design. Section 5 presents the synthesis results. In Section 6, an analysis of the obtained results is given. Finally, Section 7 gives the conclusion and proposes future work.

Section snippets

DCD system architecture

As stated before, one of the DCD-Core's main functions is a hardware realization of a Custom Media Access Layer, which performs data framing, buffering, error checking, and Ethernet frame addressing using channels. Fig. 1 shows the block diagram of the proposed system architecture. It consists of the DCD-Core, a Generic Peripheral Interface (GPI), an Ethernet Interface (PHY), two buffers, input and output, for the temporary storage of packets in both directions, and an electrical

DCD-core design

As shown in Fig. 1, the DCD-Core design consists of the Custom Ethernet MAC, a channel manager, and an EEPROM controller. The following subsections give design details for each part.

DCD-core simulation and output waveforms

The Verilog Hardware Description Language (HDL) [24] was used to design, simulate, and verify the DCD-Core. Simulation input/output signals for both DCD-Core modules are verified.

In Fig. 11, the TXC clock, which has a 40 ns period (25 MHz), is used. The transitions of the TXD, TXEN, and Buffer read (R_) signals are synchronized with TXC's falling edge. At the rising edge of R_, data read from the buffer (FIFO_IN) are clocked in and placed on the 4 Transmit Data lines (TXD), one nibble per cycle. This

Synthesis results

The DCD-Core was synthesized to a QL4036 FPGA chip of 208 pins. The chip consists of 672 buffered and un-buffered cells, organized as 28 columns by 24 rows.

Fig. 13 shows a full-scale synthesized DCD-Core chip with all connection paths drawn as lines between the cells. The design used 278 un-buffered and 415 buffered cells; thus, cell utilization is 41.4% and 61.8% of the un-buffered and buffered cells, respectively. Routing resource utilization is only 16.1% (i.e. 7742 out of
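The utilization percentages follow from the cell counts above (a quick Python check, assuming the 28 × 24 array provides 672 cells of each type, consistent with the figures reported):

```python
# Cell utilization on the QL4036: used cells over available cells,
# assuming 672 cells of each type (28 columns x 24 rows).
TOTAL_CELLS = 28 * 24  # 672 per cell type
used_unbuffered, used_buffered = 278, 415

print(round(used_unbuffered / TOTAL_CELLS * 100, 1))  # 41.4
print(round(used_buffered / TOTAL_CELLS * 100, 1))    # 61.8
```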

Analysis

The CMAC transmit and receive sub-modules operate on 25 MHz clock signals provided by the 100 Mbps Ethernet PHY. From the previous simulation results, we can see that the CMAC delivers one byte to the buffer every two cycles. Therefore, the delivered data rate is 8 b / 2 cycles × 25 MHz = 100 Mbps.

The GPI interface, with a size of 8 data bits, an independent 50 MHz clock, and one byte transferred per cycle, can deliver data rates of up to 400 Mbps. In this case, the CMAC is slower than the GPI
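The two throughput figures work out as follows (a small Python check of the rates above):

```python
# CMAC: one byte delivered every two 25 MHz cycles.
# GPI:  one byte delivered every 50 MHz cycle.
cmac_mbps = 8 / 2 * 25  # (bits per byte / cycles per byte) x MHz
gpi_mbps = 8 * 50       # bits per cycle x MHz

print(cmac_mbps)  # 100.0
print(gpi_mbps)   # 400
assert gpi_mbps > cmac_mbps  # the CMAC, not the GPI, is the bottleneck
```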

Conclusion and future work

We have presented the design and synthesis of the DCD-Core as a cost-effective embedded system, which is also reconfigurable through dynamic EEPROM programming.

The main function of the DCD-Core is to eliminate the operating-system processing of the network protocol stack, whether run by regular computers in the non-DCD case or by dedicated network microprocessors. This simplifies connection setup and improves network performance. It utilizes the concept of network channels.

The DCD-Core is designed and


References (24)

  • R.M. Metcalfe

Computer/network interface design: lessons from Arpanet and Ethernet

    IEEE J. Sel. Areas Commun.

    (1993)
  • B.W. Abeysundara et al.

    High speed local area networks and their performance

    Comput. Surv.

    (1991)
  • Gigabit Ethernet Alliance, Gigabit Ethernet Migration, Technology Overview, 1997....
  • V. Moorthy, Gigabit Ethernet: Survey of Gigabit Ethernet Technology, CIS, Ohio State University, 1997....
  • R.E. Billings, P. Nichole, J. Potter, Wideband Networking, 3rd Edition, International Academy of Science, Independence,...
  • G. Ramamurthy, K. Ashenayi, Comparative study of the firewire IEEE-1394 protocol with the universal serial bus and...
  • R.C. Norris, D.M. Miller, Comparing the performance of IP over Ethernet and IEEE 1394 on a Java platform, in:...
  • Compaq, Microsoft, and Intel, Virtual Interface Architecture Specification Version 1.0, Technical Report, Compaq,...
  • M. Lauria, S. Pakin, A. Chien, Efficient Layering for High-Speed Communication: Fast Messages 2x, in: Proceedings of...
  • Myricom, Inc. The GM message passing system, Technical Report, Myricom, Inc., 1997, www.myri.com (last accessed 24...
  • P. Shivam, P. Wyckoff, D. Panda, EMP: Zero-Copy OS Bypass NIC-Driven Gigabit Ethernet Message Passing, Proceedings of...
  • Task Group of Technical Committee T11, Information Technology—Scheduled Transfer Protocol—Working Draft 2.0, Technical...

    Eng. Omar S. Elkeelany is a Ph.D. candidate in the School of Interdisciplinary Computing and Engineering (SICE), Computer Science/Electrical Engineering division, University of Missouri-Kansas City (UMKC). He received his M.Sc. and B.Sc. degrees from the Faculty of Engineering, University of Alexandria, Egypt, in the Computer Science and Automatic Control department. His research interests include neural networks, high-speed computer networks, and the use of Hardware Description Languages (HDL) in building new efficient designs. He has three years of teaching experience in computer architecture and design at UMKC, and has also been a faculty member at the Egyptian National Institute of Transport (ENIT) since 1995. He received an Interdisciplinary Ph.D. Merit Award from the UMKC School of Graduate Studies in the 2001–2002 academic year, and was named Outstanding Graduate Student, School of Engineering, UMKC, in 1999. He received his B.Sc. with Distinction and degree of honor from Alexandria University in 1992. He has been a student member of the Institute of Electrical and Electronics Engineers (IEEE) since 1999.

    Prof. Ghulam M. Chaudhry received his B.S. from the University of Punjab, Pakistan, his M.S. from Wayne State University, and his Ph.D. from Wayne State University, Detroit, in 1989. Currently, he is an associate professor in the Division of Computer Science and Electrical Engineering, School of Computing and Engineering, University of Missouri-Kansas City. His research interests include computer architecture and parallel processing, performance of multiprocessor systems, digital system design, neural network applications, computer network management, ATM architecture and performance, and Verilog HDL. Among other awards, he is the recipient of a Good Teaching Award (2000) and a Faculty Research Award (1997) from the University of Missouri College of Engineering. He has been a Senior Member of the Institute of Electrical and Electronics Engineers (IEEE) since 1998.

    The University of Missouri System Research Board has funded this research, award #782/2003.
