|
|
MORPHEUS
application for the ICT 2008 event on 25-27 November 2008 in Lyon,
France.
The proposed demonstration will show a prototype toolset for programming a
heterogeneous reconfigurable architecture. Some application examples will be
provided, and visitors will be able to test modifications of those examples.
This demonstration is an intermediate result of the FP6 IST MORPHEUS (027342)
project after almost three years of development.

Reconfigurable technology
provides simultaneously high computing performance density and
flexibility.
However, controlling such components once they are integrated within a
system-level architecture is generally difficult. Moreover, programming
reconfigurable components is often hard because it usually requires
hardware design skills.

The toolset proposed
within the
project provides an easy to use interface to control the acceleration
of
critical sections of applications on reconfigurable technologies from a
system
programming point of view. In other words, the whole burden of setting up
the configuration, calling the accelerator, managing data exchanges and
synchronising several accelerators is handled by the compiler through
this interface. The toolset also supports the design of the accelerated part
of the application on the reconfigurable unit: a graphical interface lets the
designer describe the accelerated function in a way that helps to express and
exploit the inherent parallelism from which high performance can be obtained.

The proposed toolset approach is notably based on a programming model that
consists in inserting pragmas
in the C
code of the application to identify the functions that have to be
accelerated
on the reconfigurable units. The management of the acceleration at
system level
is then performed through specific operating system services developed
to
handle the dynamic reconfiguration. The design of the
accelerated
function itself uses a graphical capture interface that ensures proper
management of data blocks, corner-turns, etc. The interconnection and
data-reorganization logic between elementary processing functions is
generated automatically. These elementary functions are also provided in C
code. The
tool
then generates code for the various technologies.

This toolset is developed to program the chip that is also being developed
during the project. The chip will be manufactured in a 90 nm technology after
the tape-out that is planned for October this year (thus just before the ICT
event). It is built around a heterogeneous architecture in which a NoC
interconnects 3 different reconfigurable units using 3 different
reconfigurable technology grains (from fine grain to coarse grain). Each of
the 3 technologies comes with its own tools and programming interfaces, which
will be
included and homogenised in the presented toolset.

The MORPHEUS project presented its status and results in special session no. 3 at the VLSI SoC 2008 Conference in a 30-minute talk. The accompanying paper is included in the conference proceedings.

A paper from the MORPHEUS project was presented at the RAW 2008 conference.

Authors: Sean Whitty, Rolf Ernst, Institute of Computer and Communication Network Engineering, Technical University of Braunschweig, Germany

Abstract: High-end applications designed for the MORPHEUS computing platform require a massive amount of memory and memory bandwidth to fully demonstrate MORPHEUS's potential as a high-performance reconfigurable architecture. For example, a proposed film grain noise reduction application for high definition video, which is composed of multiple image processing tasks, requires huge amounts of bandwidth due to its large input image size and real-time processing constraints. To meet these requirements and to eliminate external memory bottlenecks, a bandwidth-optimized DDR-SDRAM memory controller has been designed for use with the MORPHEUS platform and its Network On Chip interconnect. This paper describes the controller's architecture, including the interface to the Network On Chip and the two-stage memory access scheduler, and presents relevant experiments and performance figures.

Introduction: Reconfigurable architectures have opened the door to exciting new research directions and application domains, many of which have been heavily investigated in recent years. One such project, the "Multi-purpose Dynamically Reconfigurable Platform for Intensive Heterogeneous Processing" (MORPHEUS) project, is a European Integrated Project (IST 027342) which addresses innovative solutions for embedded computing based on a dynamically reconfigurable platform and a corresponding toolset [thoma:morpheus]. Its goal is to provide a flexible heterogeneous platform for HW/SW co-design via a unique architecture, composed of reconfigurable computing units of varying granularity, as well as an integrated toolset that can be utilized to easily map and implement target applications.

The potential of the MORPHEUS platform will be demonstrated in several application domains.
These include reconfigurable broadband wireless access and network routing systems, processing for intelligent cameras used in security applications, and film grain noise reduction for use in high definition video. The image-based applications have been shown to exhibit immense memory needs. For example, digital film applications require an image resolution of 2K (2048x1536 pixels/frame, 30 bits/pixel, and 24 frames/s), with data rates of up to 2.1 Gibit/s necessary for real-time operation. Higher resolutions of up to 4K and even 8K are on the horizon.

Satisfying such memory requirements is no easy task. SDRAM interfaces have long been a performance bottleneck, especially in network processing and multimedia applications. A recurring issue with modern DRAM architectures is relatively long access latencies. DDR-SDRAM and Direct Rambus DRAM (RDRAM) attempt to reduce these latencies by accessing several consecutive data words. This burst access technique, however, only lowers latencies and does not increase bandwidth. To increase bandwidth, optimizations such as bank interleaving, which exploits the internal structure of DRAMs by accessing a second bank while another is busy, and request bundling, the grouping of read and write requests, can be used to ensure the maximum possible throughput across the SDRAM data bus.

Using such techniques to increase throughput naturally increases access latencies, as do complex access patterns. However, applications developed for the MORPHEUS platform are bandwidth hungry and can tolerate such latencies. Furthermore, the onboard ARM processor is used for control purposes and is not expected to make extended use of external memory. Therefore, a bandwidth-optimized memory controller, designed to serve the needs of high-performance reconfigurable architectures such as the MORPHEUS platform, is presented in this paper. For flexibility, the design also supports multiple service levels to reduce latency when necessary.
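
The 2.1 Gibit/s figure quoted for 2K digital film can be checked from the stated frame parameters. The short calculation below is only a sanity check; it assumes uncompressed frames with no blanking or transfer overhead, which the text does not state explicitly, and the function name is illustrative.

```python
def raw_video_bitrate(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed video data rate in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


# Parameters given for 2K digital film: 2048x1536 pixels, 30 bits/pixel, 24 frames/s
bits_per_second = raw_video_bitrate(2048, 1536, 30, 24)

gibit_per_second = bits_per_second / 2**30   # binary prefix (Gibit)
gbit_per_second = bits_per_second / 10**9    # decimal prefix (Gbit)

print(f"{gibit_per_second:.2f} Gibit/s ({gbit_per_second:.2f} Gbit/s)")
# prints "2.11 Gibit/s (2.26 Gbit/s)"
```

This is the rate for a single stream in one direction; an application that reads and writes each frame several times across multiple processing tasks needs a multiple of it from the memory controller.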
After a brief overview of related work, the architecture itself is defined; finally, synthesis and performance results are examined.

DATE 2008 conference.

THALES is presenting a paper entitled "Definition and SIMD Implementation of Multi-Processing Architecture Approach on FPGA". Authors: Ph. Bonnot, G. Gaillat et al., 13/3/2008.

The DATE 2008 Friday workshop (14/3/2008) entitled “The Run-Time Reconfigurable and Heterogeneous MORPHEUS Platform” will be presented by Ph. Bonnot.

ARCES, ST, UK, THALES are presenting a paper entitled "Design of a HW/SW Communication Infrastructure for a heterogeneous reconfigurable processor".

Abstract: Reconfigurable
architectures and NoC (Network-on-Chip) communication
systems have introduced new research directions for technology and
flexibility issues, which have been largely investigated in the last
decades. Exploiting the flexibility of reconfigurable architectures,
the run-time adaptivity through run-time reconfiguration, opens a
new area of research by considering dynamic reconfiguration.
In this paper, we present the architecture and associated development
tools of a new heterogeneous reconfigurable SoC. This SoC integrates
units of various sizes of reconfiguration granularity. Moreover, the
included NoC approach demonstrates the mentioned benefits and
scalability for current and future SoC design. Spatial and sequential
design capabilities of the toolset permit mapping and execution of
the target applications. The toolset involves compilation
optimization techniques to schedule macro-operand level instruction
configuration and execution, taking advantage of a set of dynamic
reconfiguration services. Micro-operand level instructions
accelerated on reconfigurable units are designed using
data-parallelism mapping and high level synthesis techniques.
On a reference CMOS090 implementation, the described interconnect infrastructure works at the system reference frequency of 200 MHz, sustaining the run-time bandwidth required by the different HREs, while requiring a share of 5% in area and <1% in power consumption with respect to the overall SoC.

ARCES and ST are presenting a paper at DATE'08 on the implementation of LFSR-based applications (e.g. CRC, Scrambler) on DREAM/PiCoGA.

Title: "Implementation of Parallel LFSR-based Applications on an Adaptive DSP featuring a Pipelined Configurable Gate Array"

Authors: C. Mucci, L. Vanzolini, I. Mirimin, D. Gazzola, A. Deledda (ARCES, University of Bologna); L. Ciccarelli, F. Campi (STMicroelectronics)

Abstract: Linear feedback shift registers (LFSRs) are common structures in many application fields, including cryptography, digital broadcasting and communication. High-throughput demands require highly parallel implementations, usually accomplished in state of the art system on chips (SoCs) with application specific coprocessors. Although this approach achieves the required performance, it rapidly shows a lack of flexibility when those devices are proposed, for example, for multi-standard modems or for security applications in which run-time update can provide added value. This paper shows the implementation of parallel LFSR-based applications on an embedded adaptive DSP featuring a Pipelined Configurable Gate Array (PiCoGA). With respect to standard embedded FPGAs, pipelined devices usually provide better performance, e.g. in terms of speed, but they commonly show the undeniable drawback of additional design constraints. As a test case, we consider the implementation of the 32-bit CRC used in the Ethernet standard, which achieves on the target architecture up to ~25 Gbit/sec throughput with a parallel LFSR processing 128 bits at a time, which is comparable to the performance offered by some ASIC devices.

University of Technology Chemnitz and Alcatel-Lucent are presenting a paper entitled "A Prototype of reconfigurable Network Application"

Authors: Uwe Proß, Sebastian Goller, Marko Rößler, Ulrich Heinkel, Axel Schneider, Joachim Knäblein

Abstract: High-end telecommunication network technologies are undergoing rapid evolution. On the one hand, newer network technologies and standards often provide better bandwidth utilization and a higher quality of service. Network providers are interested in an early adoption of new technologies and standards for optimal network exploitation. This requires manufacturers of telecommunication equipment to develop and implement the new technologies rapidly and early. On the other hand, long-lasting standardization processes force manufacturers into risky development strategies, since an early time to market collides with network standards that are probably still unstable. This causes high business risks. Design re-spins and updates of the network devices resulting from this situation can cause enormous costs. Since the overall situation cannot be changed, a solution is required which lowers the development risks.

In this paper, we present the implementation of a prototype of an Ethernet node, which can be dynamically reconfigured using the Ethernet protocol. The Ethernet node is a prototype of a later application on the MORPHEUS platform in order to show the capabilities of SoCs with embedded reconfigurable technologies. The implemented application is a reconfigurable network node based on the Ethernet protocol. The overall goal is to develop a system that can monitor the data stream received via Ethernet and identify reconfiguration data on the basis of the EtherType field in the Ethernet header. The reconfiguration data is then extracted from the data stream and stored in a memory. After the complete data stream has been received and no transmission errors have been found by checksum calculation, the reconfiguration of the network node starts.
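
The EtherType-based filtering described in this abstract can be sketched in a few lines. This is a software illustration only, not the authors' hardware implementation: the EtherType value `0x88B5` (one of the IEEE local experimental EtherTypes) is an assumption, since the abstract does not say which value the prototype uses, and `collect_reconfig_data` and `make_frame` are hypothetical helper names. Frames are modelled as untagged Ethernet II frames without padding or checksum.

```python
import struct

RECONFIG_ETHERTYPE = 0x88B5  # assumption: IEEE local experimental EtherType
ETH_HEADER_LEN = 14          # destination MAC (6) + source MAC (6) + EtherType (2)


def ethertype(frame: bytes) -> int:
    """Return the 16-bit EtherType of an untagged Ethernet II frame."""
    (value,) = struct.unpack_from("!H", frame, 12)
    return value


def collect_reconfig_data(frames):
    """Concatenate the payloads of all frames that carry reconfiguration data."""
    bitstream = bytearray()
    for frame in frames:
        if len(frame) >= ETH_HEADER_LEN and ethertype(frame) == RECONFIG_ETHERTYPE:
            bitstream += frame[ETH_HEADER_LEN:]
    return bytes(bitstream)


def make_frame(eth_type: int, payload: bytes) -> bytes:
    """Build a test frame with fixed, locally administered MAC addresses."""
    dst = bytes.fromhex("020000000001")
    src = bytes.fromhex("020000000002")
    return dst + src + struct.pack("!H", eth_type) + payload


frames = [
    make_frame(0x0800, b"ordinary IPv4 traffic"),       # ignored by the filter
    make_frame(RECONFIG_ETHERTYPE, b"CONFIG-PART-1;"),
    make_frame(RECONFIG_ETHERTYPE, b"CONFIG-PART-2"),
]
stored = collect_reconfig_data(frames)
print(stored)  # b'CONFIG-PART-1;CONFIG-PART-2'
```

Ordinary traffic passes through untouched, while the payloads of matching frames accumulate in memory until the complete configuration stream is available for the checksum test.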
For demonstration purposes, the node is implemented on a prototype platform consisting of two Xilinx boards (XUP Development Board). These boards feature a Virtex-II Pro XC2VP30, an Ethernet interface including an onboard PHY, an RS232 interface and onboard DDR RAM (256 MB). Since the prototype platform emulates an embedded FPGA placed on an ASIC, it has been divided into corresponding parts. The first board emulates the embedded FPGA macro and contains an Ethernet MAC. The MAC receives the Ethernet data stream generated by an external PC. After receipt, the content of the Ethernet packets is checked for reconfiguration data by checking the EtherType field of the header. If a reconfiguration packet has been identified, the payload of the packet is copied from the data stream and sent to the second board via RocketIO. The reconfiguration data received on the second board is stored in the onboard DDR RAM. After the complete configuration stream has been received, the PowerPC core on the second board is used to calculate the CRC of the configuration stream. If no errors are detected, the reconfiguration of the XC2VP30 on the first board is started.

The success of the reconfiguration is shown by fixing an initially faulty Ethernet MAC. Ethernet packets contain a CRC32 checksum to verify their payload. This CRC32 calculation produces wrong results in the initial Ethernet MAC. To make this error visible, all received Ethernet packets are sent back to their source. The PC detects the CRC32 calculation error because it receives incorrect Ethernet packets. After reconfiguration, the CRC32 error is fixed. The MAC can be monitored and controlled by the PowerPC core, which is implemented in the XC2VP30. The software running on this core provides access to all registers of the MAC.
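
The way the PC can detect the faulty MAC from the echoed packets can be illustrated with a small software model. It relies on the fact that Python's `zlib.crc32` implements the same CRC-32 as the Ethernet frame check sequence, which is appended least significant byte first. The two `*_mac_echo` functions are hypothetical stand-ins for the MAC before and after reconfiguration; how the real initial MAC miscomputes the CRC is not described in the text, so a single flipped bit is assumed here.

```python
import struct
import zlib


def append_fcs(frame: bytes) -> bytes:
    """Append the CRC-32 frame check sequence, least significant byte first."""
    return frame + struct.pack("<I", zlib.crc32(frame) & 0xFFFFFFFF)


def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Recompute the CRC-32 over the frame body and compare it with the trailer."""
    body, trailer = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF) == trailer


def faulty_mac_echo(frame: bytes) -> bytes:
    """Stand-in for the initial MAC: echoes the frame with a wrong CRC (assumed fault)."""
    good = append_fcs(frame)
    return good[:-1] + bytes([good[-1] ^ 0x01])  # flip one bit of the checksum


def fixed_mac_echo(frame: bytes) -> bytes:
    """Stand-in for the MAC after reconfiguration: echoes with a correct CRC."""
    return append_fcs(frame)


# Frame as sent by the PC: zeroed addresses, EtherType 0x0800, arbitrary payload
frame = bytes(12) + b"\x08\x00" + b"test payload from the PC"

print(fcs_ok(faulty_mac_echo(frame)))  # False: the PC sees an incorrect packet
print(fcs_ok(fixed_mac_echo(frame)))   # True: the error is gone after reconfiguration
```

The same recompute-and-compare check applies to the stored configuration stream before reconfiguration is started, although the text does not say which CRC variant the PowerPC software uses for that step.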
A separate RS232 connection between the external PC and the board allows Ethernet-independent communication, so the functionality of the MAC can be checked even if the Ethernet connection malfunctions.

MORPHEUS integrated toolset, presentation at the university booth.

Florian Thoma, Matthias Kunle, Michael Hübner, Jürgen Becker, Klaus D. Müller-Glaser, Universität Karlsruhe (TH), Germany

The European MORPHEUS project addresses a technology breakthrough for embedded computing by developing a reconfigurable platform and the corresponding toolset. This paper details the integrated toolset. MORPHEUS copes with the challenges of rising complexity and the enlarging design productivity gap by developing a global solution based on a modular heterogeneous System-on-Chip (SoC) platform providing the disruptive technology of dynamically reconfigurable computing, including a software-oriented design flow and a consistent toolset. The toolset supports retargetable compiling, spatial design and dynamic control.

Flexim Real-time Digital Film Processing with a FPGA-based Reconfigurable Platform, presented at the university booth.

S. Guyetant from CEA-List is going to present a paper at the French-speaking SYMPA08 conference, held in Fribourg, Switzerland (http://gridgroup.tic.hefr.ch/renpar/). The paper, entitled "Predictive configuration service for heterogeneous reconfigurable multicore platform", shows the use of the PCM in the presented Morpheus platform.

Abstract: "This article describes a reconfiguration management service by predictive pre-fetch useful for multicore architectures composed of heterogeneous reconfigurable cores. The goal is to hide reconfiguration latencies due to large sized bitstream transfers, therefore getting a higher dynamicity of reconfiguration. We present the implementation of the prefetch service and its functional validation. The architecture of the European Morpheus project is presented as an example for this validation: using simplified application graphs, we show how to hide the reconfiguration overhead."

L. Lagadec from UBO will present a paper entitled "Chaine de programmation pour architecture hétérogène reconfigurable" (CAD tool for reconfigurable heterogeneous architecture). It shows the use of the CDFG intermediate format in the Morpheus platform (Spatial Design).

Abstract: To take advantage of the innovative heterogeneous reconfigurable systems on chip architectures, new software tools are required. Based on a computing model, a software chain must make it possible to migrate from a high level specification to a hardware configuration coupled with a data transfer control. This paper proposes a solution in the form of a complete chain including: (1) a computation model, (2) a portable data structure allowing the integration of third-party tools to constitute a complete software chain, and (3) a low level generic backend performing the applicative synthesis in a reconfigurable scope. |
|
|