MORPHEUS (IST-4-027342) WP3 ARCHITECTURE.
Overview and Challenges

Except in some specific and very simple situations, today’s reconfigurable computing platforms cannot be used as the sole computing resources in a given system. In general, reconfigurable resources are used in combination with standard computing resources and other devices, in a system that resembles the sketch in Figure 1. Because it has to serve a broad range of applications, the MORPHEUS target architecture is intended to be a complete and heterogeneous platform. To reach this goal and to increase the efficiency of today’s reconfigurable computing systems, the following open issues have to be considered:

a - Control and (dynamic) reconfiguration concept

Up to now, the requirements of embedded computing solutions (cost, mobility, functionality) have typically been translated by designers into area, energy and performance constraints, which often leads to the specification of dedicated chips. At the same time, the explosion of development costs creates a need for flexible architectures that take advantage of high-level programming tools. Within this framework, static reconfiguration is used to adapt the architecture to the application: computing resources and communications can be configured according to the application requirements. However, performance, energy constraints and low cost demand a clear breakthrough, which can only be achieved through a stronger adaptation of the architecture to the application. For this purpose, dynamic reconfiguration is compulsory. It makes it possible to optimise the architecture “on the fly”, taking into account the current computation pattern, in order to implement loop kernels or pipeline stages, or to take advantage of data locality. This kind of reconfiguration is relevant only if mechanisms are established to speed up the reconfiguration process.

b - Modularity

Modularity is a key aspect of the MORPHEUS approach. The architectural concepts already researched by several partners are quite heterogeneous. This is a huge opportunity if these differing architectures become scalable and modular. Another important aspect of modularity is the possibility of easily integrating the scalable and modular blocks into one architecture. For this reason, the modules have to provide generic interfaces.
As Figure 1 indicates, defining the interfaces for the logical and physical interconnection of the modules integrated into the reconfigurable architecture is one of the main challenges of this work package. The link between this modular HW platform and the toolset should be ensured by “tool interfaces” that cover important aspects and requirements such as simulation, debugging, verification and monitoring.

c - Architectures for coarse and fine grain reconfigurable computing

Coarse-grained reconfigurable architectures fill the gap between general-purpose processors (GPPs, DSPs), fine-grained FPGAs and specialized hardware (ASICs). Reconfigurable architectures are flexible and provide a high degree of parallelism. They are built from a large number (typically between 10 and 100) of processing elements with ALUs for signal processing algorithms. Applications are mapped to the array for a certain time while data flows through the network of operators (i.e. ALUs). After a certain amount of data has been processed, the array can be reconfigured, which changes the functionality of the nodes and the interconnection network. This approach is well suited to streaming data with limited control flow. The project intends to extend the application space towards more control-flow-oriented architectures. A more flexible coarse-grained architecture needs to communicate very efficiently with the steering unit (typically a GPP) and must be integrated with low latency into the memory hierarchy (including dynamic reconfiguration). Coarse-grained architectures are designed for algorithms operating at word level (e.g. 16 bit). However, several algorithms (e.g. the entropy encoder in video codecs) require fine-grained architectures such as FPGAs. Although eFPGAs are not the focus of the project, we will use legacy eFPGAs in the SOC and will design efficient interfaces to the coarse-grained architectures.
Thus, if the algorithm is properly partitioned, each of the architectures can operate in its optimal application space. The benefit is a better ratio of area to performance for the overall application without sacrificing flexibility.

d - Efficient interconnection infrastructure

While fine-grained as well as coarse-grained IP has progressed significantly in recent years, the resulting requirements on the interconnect in terms of bandwidth, flexibility and efficiency have hardly been addressed by reconfigurable architecture research. In particular, the huge opportunities offered by run-time reconfiguration of the interconnect have been exploited only marginally so far. To exploit them, today’s dominant bus architectures need to be extended by reconfigurable high-bandwidth point-to-point connections as well as suitable network-on-chip (NoC) approaches. The heterogeneous, mixed-grain SOC architecture offers different ways to run tasks on the chip and allows tasks to be migrated from one architecture tile to another, which makes a high-performance, adaptive interconnection infrastructure necessary. For this purpose, a run-time adaptable network has to be developed and synthesized, with the possibility of changing its topology and protocol, e.g. by dynamically exploiting the trade-offs between packet-switched and circuit-switched communication parts or phases. In addition, the connection of the different cores to parallel memory modules has to be considered. To provide high data throughput, bottlenecks in parallel memory access and inter-tile communication have to be avoided. To exploit the parallel mixed-grained architecture efficiently, more than one memory module connection has to be integrated, which requires a suitable trade-off between centralised and decentralised (e.g. global/local) memory access interconnect topologies.
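The trade-off between packet-switched and circuit-switched communication mentioned above can be illustrated with a rough cost model. The sketch below is only an illustration: the `LinkModel` class, its parameter names and all numeric values are assumptions made for this example and are not part of the MORPHEUS specification. It assumes that a circuit costs a fixed number of set-up cycles before streaming at full link width, while packet switching needs no set-up but pays a fixed header overhead for every packet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkModel:
    """Hypothetical link parameters (illustrative values, not MORPHEUS data)."""
    bytes_per_cycle: int = 4         # payload bytes moved per cycle in either mode
    circuit_setup_cycles: int = 200  # one-off cost of establishing a circuit
    packet_payload_bytes: int = 32   # payload carried by a single packet
    packet_header_cycles: int = 2    # header/routing overhead paid per packet


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def circuit_cycles(n_bytes: int, link: LinkModel) -> int:
    """Cycles to move n_bytes over a circuit: set-up once, then stream."""
    return link.circuit_setup_cycles + ceil_div(n_bytes, link.bytes_per_cycle)


def packet_cycles(n_bytes: int, link: LinkModel) -> int:
    """Cycles to move n_bytes as packets: header overhead per packet plus payload."""
    packets = ceil_div(n_bytes, link.packet_payload_bytes)
    return packets * link.packet_header_cycles + ceil_div(n_bytes, link.bytes_per_cycle)


def preferred_mode(n_bytes: int, link: LinkModel) -> str:
    """Pick the cheaper mode under this model; ties go to packet switching."""
    if circuit_cycles(n_bytes, link) < packet_cycles(n_bytes, link):
        return "circuit"
    return "packet"


if __name__ == "__main__":
    link = LinkModel()
    for size in (64, 3200, 64000):
        print(size, circuit_cycles(size, link), packet_cycles(size, link),
              preferred_mode(size, link))
```

Under these assumed numbers, short transfers favour packet switching and long streaming phases favour a circuit, which is the kind of phase-dependent choice a run-time adaptable network could make. A real interconnect would also have to account for contention, buffering and the cost of the reconfiguration itself, none of which this model covers.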
e - Memory topologies

Since reconfigurable SOCs offer the potential to drastically increase processing power and efficiency, especially in data-oriented processing schemes, the bottleneck is passed on to the simultaneously growing requirements on the memory infrastructure. Intelligently organized on-chip memories, configured either as local memory with user-controlled DMA access or as transparent cache, become mandatory, because off-chip solutions lack the required bandwidth and are unacceptable in terms of power consumption and system cost. In addition, memories in reconfigurable SOCs pose special challenges to the digital designer, since typical applications often require flexible access patterns (e.g. different word sizes, parallel access or different addressing modes). Hence, on-chip memories are probably the most mission-critical components of today’s embedded signal processing systems. Generalized solutions and methodologies are not yet state of the art. One goal of the MORPHEUS approach is to develop such methodologies and to extend the huge opportunities of (dynamic) reconfiguration to the memory infrastructure.

f - System Integration

The integration of a large number of different units (coarse-grained and fine-grained reconfigurable units, GPPs, high-bandwidth I/O and peripherals) demands efficient simulation capabilities. Contributing partners should therefore deliver silicon-proven IP, together with the new extensions and test benches. During back-end processing, tight cooperation ensures fast design iteration cycles so that the objectives can still be reached if problems arise. Special focus will be placed on the interconnect between the individual SOC modules (arrows of all colours) as well as on the topology and SOC integration of the memories.

Objectives
All developments in this work package will be driven by the real needs of the key application domains addressed in this project. Overall objectives: