Some more papers on register allocation, mostly for embedded processors, vector processors, irregular register files, and energy consumption:
- An efficient technique for exploring register file size in ASIP synthesis, in CASES’02.
- An efficient technique for exploring register file size in ASIP design, update in TCAD’04.
- Integrated on-chip storage evaluation in ASIP synthesis, in ICVD’05.
- Efficient architecture/compiler co-exploration for ASIPs, CASES’02.
- Compiler design issues for embedded processors, from Aachen, in IEEE Design and Test of Computers 2002.
- Optimized address assignment for DSPs with SIMD memory accesses, in ASP-DAC’01.
- Efficient vectorization of SIMD programs with non-aligned and irregular data access hardware, from Seoul National U., in CASES’08.
- Register allocation and binding for low power, from USC, in DAC’95.
- Partitioned register file for TTAs, in microarchitecture 1995.
- Very wide register: an asymmetric register file organization for low power embedded processors, in DATE’07.
- Compiler-Driven Leakage Energy Reduction in Banked Register Files, in PATMOS’06.
- Partitioned register files for VLIWs: a preliminary analysis of tradeoffs, in microarchitecture 1992.
- Register packing: Exploiting narrow-width operands for reducing register file pressure, in microarchitecture 2004.
- Energy-efficient register caching with compiler assistance, in TACO’09.
- Exploring the limits of early register release: Exploiting compiler analysis, TACO’09.
- SPARTAN: speculative avoidance of register allocations to transient values for performance and energy efficiency, in PACT’06.
- Selective writeback: exploiting transient values for energy-efficiency and performance, in ISLPED’05.
- A case for a complexity-effective, width-partitioned microarchitecture, in TACO’06.
- Selective Writeback: Reducing Register File Pressure and Energy Consumption, update in TVLSI’08.
- Early Register Deallocation Mechanisms Using Checkpointed Register Files, TC’06.
- The energy complexity of register files, in ISLPED’98.
- Code Optimization Techniques for Embedded DSP Microprocessors, in DAC’95.
- Efficient register and memory assignment for non-orthogonal architectures via graph coloring and MST algorithms, in LCTES’02.
- Simultaneous reference allocation in code generation for dual data memory bank ASIPs, in TODAES’00.
- Register allocation for irregular architectures, in LCTES’02.
- Vector register allocation, in TC’92.
- Register windows vs. register allocation, in PLDI’88.
- On-chip vs. off-chip memory: the data partitioning problem in embedded processor-based systems, in TODAES’00.
- Global register partitioning, in PACT’00.
- Low power register file architecture for application specific DSPs, in ISCAS’02.
- Multiple-banked register file architectures, in ISCA’00.
- Effective loop partitioning and scheduling under memory and register dual constraints, in DATE’08.
- Progressive spill code placement, in CASES’09.
Some random collections on register allocation:
- Linear Scan Register Allocation on SSA Form, in CGO’10, from UC Irvine.
- Efficiently computing static single assignment form and the control dependence graph, in TOPLAS’91, from IBM.
- Linear scan register allocation, in TOPLAS’99, from MIT and IBM.
- Quality and speed in linear-scan register allocation, in SIGPLAN Notices’98, from Harvard.
- Programming and compiling for embedded SIMD architectures, PhD thesis of Anton Lokhmotov in 2008, from the SPO group led by Paul Kelly.
- Compiler-directed high-level energy estimation and optimization, from PSU, in TECS’05.
- Energy aware compilation for DSPs with SIMD instructions, in SIGPLAN Notices 2002, from Dortmund.
- Code selection for media processors with SIMD instructions, in DATE’00, from Dortmund.
- Precise register allocation for irregular architectures, in microarchitecture’98, from UC Davis.
- Register allocation for irregular architectures, in LCTES’02, from TU Vienna.
Some links on GCC register allocation:
Hatsune Miku, a vocaloid, held a concert (video clip) in Tokyo in March 2010. The rendering quality is stunning.
A heavy update from gpucomputing.net:
- Evaluation of Multi-core Architectures for Image Processing Algorithms, a master’s thesis from Clemson. Much better than mine. Shame on me.
- Image Processing for Multiple-Target Tracking on a Graphics Processing Unit, from the Air Force.
- GPU Acceleration of Object Classification Algorithms Using NVIDIA CUDA, from Rochester Tech.
- Exploring the Multiple-GPU Design Space, from NUCAR of NEU.
- Performance Analysis of Accelerated Image Registration Using GPGPU, from Notre Dame.
- Hardware and Compute Abstraction Layers For Accelerated Computing Using Graphics Hardware and Conventional CPUs, in HPEC 2007.
- Detection and Tracking of Human Subjects, from Skadron’s group.
- Architecture-Aware Optimization Targeting Multithreaded Stream Computing, from the NUCAR group again.
- A Survey of Medical Image Registration on Multicore and the GPU, from ANU in Australia.
- A comparative study on ASIC, FPGAs, GPUs and general purpose processors in the O(N^2) gravitational N-body simulation, from Nagasaki U in Japan.
- Acceleration and Energy Efficiency of a Geometric Algebra Computation using Reconfigurable Computers and GPUs, from TU Darmstadt. It should also work for Image Algebra.
- Robust GPU-assisted camera tracking using free-form surface models, in 2007.
- Simultaneous and Fast 3D Tracking of Multiple Faces in Video by GPU-based Stream Processing, in ICASSP 2008.
- High Speed Articulated Object Tracking using GPUs: A Particle Filter Approach, in 2009.
- GPU-accelerated Real-Time 3D Tracking for Humanoid Autonomy, by AIST in Japan and CMU.
- Real-Time Optical Flow Calculations on FPGA and GPU Architectures: A Comparison Study, from BYU. GPUs seem to have an edge over FPGAs for this application.
- Real-time Stereo-Image Stitching using GPU-based Belief Propagation, from TU Darmstadt.
- Neural Network Implementation using CUDA and OpenMP
- Real-time Motion-based Gesture Recognition using the GPU, from Simon Fraser.
- Stereo Depth with a Unified Architecture GPU, from UCSD in CVPR 2008.
- Why do Commodity Graphics Hardware Boards (GPUs) work so well for acceleration of Computed Tomography?, from SUNY Stony Brook in 2007.
Creation of a Bacterial Cell Controlled by a Chemically Synthesized Genome, in Science, May 2010. The first synthetic cell. John Craig Venter is in the spotlight again.
GLVLSI 2010 has some interesting papers:
CF 2010 has some interesting works.
Preprints of some papers in ECVW 2010 are available.
Exploiting Memory Access Patterns to Improve Memory Performance in Data Parallel Architectures, from David Kaeli’s NUCAR group, to appear in TPDS.