IMECO: A RECONFIGURABLE FPGA-BASED IMAGE ENHANCEMENT CO-PROCESSOR FRAMEWORK

Z. Salcic, J. Sivaswamy
University of Auckland, Department of Electrical and Electronic Engineering
Private Bag 92019, Auckland, New Zealand
{z.salcic, j.sivaswamy}@auckland.ac.nz

ABSTRACT
This paper presents a way to improve the computational speed of image contrast enhancement using low-cost FPGA-based hardware primarily targeted to X-ray images. The enhancement method considered here consists of filtering via the high boost filter (HBF), followed by histogram modification using histogram equalisation (HE). An image enhancement co-processor (IMECO) concept is proposed that enables efficient hardware implementation of enhancement procedures and hardware/software co-design to achieve high-performance, low-cost solutions. The co-processor runs on an FPGA prototyping ISA-bus board. It consists of two hardware functional units that implement HBF and HE and can be downloaded onto the board sequentially or reside on the board at the same time. These units represent an embryo of virtual hardware units that form a library of image enhancement algorithms. In trials with chest X-ray images performance improvement over software-only implementations was more than two orders of magnitude, thus providing real-time or near real-time enhancement.

1. INTRODUCTION
Contrast enhancement is a digital image processing methodology which has a wide range of applications in medical and non-medical areas. Various techniques have been developed for contrast enhancement to suit different types of images. These techniques use a transformation operation applied either globally, or adaptively, to enhance the contrast of given images [1] [7]. Global techniques are computationally simple and are suited to images with poor global contrast while adaptive techniques are more computationally expensive and are better suited to natural images with varying local contrast. In application areas such as radiology, the computational cost is an important factor affecting the efficiency of medical diagnosis.

In this paper, we present a way to improve the computational speed of image contrast enhancement primarily targeted to X-ray images. In particular, we consider an enhancement algorithm that consists of filtering followed by histogram modification. Filtering is done via the HBF which is based on unsharp masking, and the histogram modification is based on HE. Existing approaches to speeding up computations have primarily focussed on the adaptive version of histogram equalisation (AHE). These use linear interpolation to reduce computations [4], or employ several precomputed mapping curves which have to be manually selected [2]. Hardware solutions have also been proposed for the speed up problem in AHE. However, they are expensive, calling for MIMD parallel machines [3] or other specially designed parallel machines [5].

We have investigated the use of a low cost FPGA-based hardware that is simple to design for implementing both HBF and HE. (We present the global version of HE for a start, since extensions to AHE are straightforward.) FPGAs have become one of the prevailing technologies for fast prototyping and implementation of digital systems [6]. Being dynamically reconfigurable, FPGAs provide additional interesting features to hardware implementation of complex algorithms with performance exceeding both general-purpose and digital signal processor implementations. Using FPGAs we propose an image enhancement co-processor, IMECO, that enables efficient hardware implementation of enhancement procedures and hardware/software co-design. We present the IMECO framework in Section 2, and the HBF and HE implementation using IMECO, in the following sections.

2. FPGA-IMPLEMENTED IMECO FRAMEWORK
The IMECO framework consists of a standard PC and a general purpose FPGA prototyping ISA-bus board. The PC provides storage resources, programming facilities, flexibility, user-friendly interface (under the Windows environment), and controls the operation of IMECO. The FPGA board is used as a rapid prototyping system (RAPROS) for implementing and executing hardware implemented enhancement algorithms. IMECO is intended to contain a library of hardware implemented algorithms (configurations) which can be combined using software, into complex algorithms to suit specific needs. These hardware implemented algorithms can be viewed as functional units which appear as co-processors to the PC. At present, there are two units that implement HBF and (global) HE which can be downloaded onto the RAPROS sequentially or reside on the board at the same time. FPGAs on the RAPROS are used on a time-multiplexed basis to implement different functional units. Thus IMECO offers two key features, namely, custom-configurability and dynamic reconfigurability. The global organisation of the FPGA board used in our approach is illustrated in Figure 1.
With IMECO, complex image enhancement algorithms are prototyped. Different versions of the algorithm represent different hardware designs. Given an application, the user can select the implementation of computational units that suit best its requirement. A selected configuration, represented by a file on the PC host, is downloaded to the board. A program is used to store the input image to source SRAM memory locations. Then, it activates the functional units used in the enhancement algorithm (HBF & HE in our case), in turn produces the output image in destination SRAM chips. The output image is available either to other hardware units or the program for further processing.

Program control is only needed to control DMA transfers of the original and the final images to/from SRAM chips on the board and for the change of hardware configuration of the FPGAs. A user-friendly interface provides easy selection of the configurations that will be used in the algorithm, and subsequently loaded by a configuration loader.

3. HIGH-BOOST FILTER IMPLEMENTATION

The task of the HBF is to calculate for each pixel in the filtered image $P_{HB}(x,y)$, $(x, y = 0, 1, \ldots, N-1)$

$$P_{HB}(x,y) = \alpha P_{IN}(x,y) - P_{LP}(x,y) \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdots \cdOTS

$$
FPGA hardware and employs parallelism. At the beginning of each new column, contents of the FIFO is cleared and partial sums of the rows are calculated to be stored in the FIFO as the window slides downwards. The details of the algorithm, including the processing of boundary pixels, are not discussed in this paper.

**Figure 2. Data path for the HBF functional unit**

**4. HISTOGRAM EQUALISATION FUNCTIONAL UNIT**

The HE functional unit performs global histogram equalisation of the entire image. The global HE algorithm can be described as follows:

1. Compute histogram $H_A(k)$ of given image $A$ of size $N \times N$, with $A$ being the result of HBF operation, as

   $$H_A(k) = n_k$$

   where, $n_k$ is the total number of pixels in the image at the $k$th gray level, with $k = 0, 1, ..., L-1$.

2. Compute the equalisation value $s_k$ for each gray level $k$ as

   $$s_k = \text{Int}\left(\frac{L}{N^2} \sum_{j=0}^{L-1} H_A(j)\right)$$

   where ‘Int’ is an integer part of the calculated number.

3. Equalise and compute new image pixels $P_{HE}(x,y)$ as

   $$P_{HE}(x,y) = s_k$$

   if $A(x,y) = k$, then $P_{HE}(x,y) = s_k$ for every $x$ and $y$.

The data path for the HE functional unit is presented in Figure 3. The unit finds the input image in the memory block to which it was stored by HBF functional unit. Then, it reads all pixel values and accordingly increments the histogram value of the corresponding gray level. The histogram values are stored in a separate memory called Histogram Memory which was added to the existing FPGA board. The gray level of the pixel fetched from image memory represents the memory address of the histogram value that has to be incremented. An incrementer is used to do this operation and return the new histogram value to the corresponding histogram memory location. The second step of the algorithm takes place when all pixels from the input image have been processed. In this step, equalisation values for each gray level are calculated by reading the values from histogram memory and accumulating them to the previously found values. Only one pass of the histogram memory read operations and memory write operations is needed to calculate equalisation values which are now stored in histogram memory. This part of the algorithm is performed by an accumulator and an arithmetic block that carries out multiplication of the accumulated value by $\frac{L}{N^2}$ and rounding to the integer part. If we consider the expression $\frac{L}{N^2}$, we can see that usually we have $L = 2^b$, where $b$ is the number of bits which represent gray levels, and $N^2 = 2^{2w}$, where $2w > b$. Therefore, the $\frac{L}{N^2}$ will produce result in the form $2^q$, where $q$ is an integer and $q > 0$. This further means that the resulting accumulated sum must be divided by $2^q$ reducing the multiplication and division operation to a simple shift right (by $q$ bits) operation. The final result is stored in HE memory block which is essentially the block in which the image was stored (input and output memory for each algorithm swap their roles). The number of operations involved in the HE computation is now proportional to $N^2$.

**4. PERFORMANCE ANALYSIS**

A full implementation of the HBF and HE functional units have been performed using Altera’s FLEX8282 FPGAs. Several versions of each of the designs are stored in a library of configuration files and enable the end user to structure the algorithm by selecting appropriate files. Because of the small capacity of SRAM chips available on the board, the maximum size of the image which can be processed is
256x256. Larger images need to be partitioned into blocks of 256x256, and processed block by block under global software control. The maximum frequency at which functional units can operate is limited by the system clock on the board, to 10 MHz of the PC ISA-bus clock. The circuit simulation has shown that the maximum frequency can be increased to 30 MHz using FLEX8000 devices, effectively enabling tripped performance without any architectural or design change. The execution times for processing 256x256 image is presented in Table 1. Equivalent software implementations are in the order of several minutes, using much faster processors.

Table 1. Performance figures for a 256 x 256 X-ray image

<table>
<thead>
<tr>
<th>Operation</th>
<th>Time (sec)</th>
<th>Time (sec)</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>10 MHz clock</td>
<td>30 MHz clock</td>
</tr>
<tr>
<td>HBF Straightforward</td>
<td>1.18</td>
<td>0.39</td>
</tr>
<tr>
<td>hardware implementation</td>
<td></td>
<td></td>
</tr>
<tr>
<td>HBF (simplified arithmetic)</td>
<td>0.36</td>
<td>0.12</td>
</tr>
<tr>
<td>HBF (full arithmetic)</td>
<td>0.42</td>
<td>0.14</td>
</tr>
<tr>
<td>HE</td>
<td>0.15</td>
<td>0.05</td>
</tr>
</tbody>
</table>

The HBF design versions require between two and four FLEX8282 devices, depending on the complexity of arithmetic circuits used. The result of processing a chest X-ray image using our HBF hardware is shown in Figure 4.

5. CONCLUSIONS

Our main goal was to find a low cost solution to increasing the computational speed of image enhancement algorithms. We have done that by substituting the software implementations with FPGA-based application-specific hardware. The HBF-HE combination has been chosen as an example for implementation in FPGAs. The algorithm can be customised to specific enhancement requirements, downloaded into FPGAs, and executed by a single host PC instruction. We have demonstrated the feasibility of the whole concept of a flexible algorithm execution using virtual hardware units, executed on the FPGA prototyping board. Our further research is directed towards other algorithms for image enhancement and their implementation using FPGAs to form a library of hardware-implemented algorithms. They are to be combined in any desired order or with different implementation variations on a PC/FPGA flexible hardware/software platform.

6. REFERENCES


Figure 4. Contrast enhancement of a chest X-ray image