The proceedings of Microarchitecture 2010 (MICRO 2010) have been released. Some interesting papers are:
- Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior, a memory scheduling algorithm that takes both system throughput and fairness into account. The scheduler classifies threads into two types: latency-sensitive and bandwidth-sensitive.
- Task Superscalar: An Out-of-Order Task Pipeline, which opts for implicit specification of task-level parallelism.
- Many-Thread Aware Prefetching Mechanisms for GPGPU Applications, which improves software and hardware prefetching in the context of SIMT architectures.
- Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs?, which builds a broad model for the exploration, though the model may oversimplify the architectural details.
- Improving SIMT Efficiency of Global Rendering Algorithms with Architectural Support for Dynamic Micro-Kernels, where everything is about handling branches.
- Register Cache System Not for Latency Reduction Purpose, which uses the register cache for area and energy reduction instead.
- Erasing Core Boundaries for Robust and Configurable Performance, from U Mich.
- Minimal Multi-threading: Finding and Removing Redundant Instructions in Multi-threaded Processors, compacting the execution of identical control flows in SPMD.
- Understanding the Energy Consumption of Dynamic Random Access Memories, from Rambus.
- Throughput-Effective On-Chip Networks for Manycore Accelerators, which uses the characteristics of the Bulk Synchronous Parallel (BSP) model to design an efficient NoC.
- ReMAP: A Reconfigurable Heterogeneous Multicore Architecture, supporting point-to-point communication for pipeline parallelism and barrier synchronization.
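
To make the two-cluster idea from the Thread Cluster Memory Scheduling entry concrete, here is a minimal sketch of how threads might be split into a latency-sensitive and a bandwidth-sensitive group. Everything in it is an assumption made for illustration: the `Thread` fields, the `cluster_threads` name, the use of misses per kilo-instruction (MPKI) as the sort key, and the `cluster_thresh` bandwidth fraction are not taken from the list above and should not be read as the paper's exact algorithm.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    mpki: float       # assumed metric: last-level cache misses per kilo-instruction
    bandwidth: float  # assumed metric: memory bandwidth used in the last interval


def cluster_threads(threads, cluster_thresh=0.1):
    """Split threads into (latency_sensitive, bandwidth_sensitive).

    Hypothetical rule: walk the threads from least to most memory-intensive
    and keep adding them to the latency-sensitive cluster while that cluster's
    combined bandwidth stays within cluster_thresh of the total bandwidth.
    Every remaining thread is treated as bandwidth-sensitive.
    """
    total_bw = sum(t.bandwidth for t in threads)
    budget = cluster_thresh * total_bw

    latency_sensitive, bandwidth_sensitive = [], []
    used = 0.0
    for t in sorted(threads, key=lambda t: t.mpki):
        # Once one thread overflows the budget, all heavier threads follow it.
        if not bandwidth_sensitive and used + t.bandwidth <= budget:
            latency_sensitive.append(t)
            used += t.bandwidth
        else:
            bandwidth_sensitive.append(t)
    return latency_sensitive, bandwidth_sensitive


if __name__ == "__main__":
    demo = [
        Thread("a", mpki=0.5, bandwidth=1.0),
        Thread("b", mpki=1.0, bandwidth=2.0),
        Thread("c", mpki=20.0, bandwidth=40.0),
        Thread("d", mpki=30.0, bandwidth=57.0),
    ]
    light, heavy = cluster_threads(demo, cluster_thresh=0.1)
    print([t.name for t in light], [t.name for t in heavy])
```

A scheduler built on such a split could then prioritize the light threads for throughput and rotate priorities among the heavy ones for fairness; the threshold trades one goal against the other.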