129 lines
3.6 KiB
Markdown
129 lines
3.6 KiB
Markdown
# Golang AVX512 Fast Fourier Transform
|
|
|
|
This project implements a Fast Fourier Transform (FFT) using Go's x86 assembly dialect with AVX512 instructions for maximum performance on modern Intel processors.
|
|
|
|
## Features
|
|
|
|
- **AVX512 Optimized**: Uses the latest AVX512 vector instructions for maximum performance
|
|
- **Automatic Fallback**: Falls back to pure Go implementation if AVX512 is not available
|
|
- **Power of 2 Support**: Automatically pads input to the next power of 2 for optimal FFT performance
|
|
- **Complex Number Support**: Full support for complex128 data types
|
|
- **Inverse FFT**: Includes IFFT implementation for complete FFT functionality
|
|
|
|
## Requirements
|
|
|
|
- Go 1.21 or later
|
|
- Intel processor with AVX512 support (Skylake-X, Cascade Lake, Ice Lake, or newer)
|
|
- Linux x86_64 environment
|
|
|
|
## Installation
|
|
|
|
```bash
|
|
go mod tidy
|
|
```
|
|
|
|
## Usage
|
|
|
|
```go
|
|
package main
|
|
|
|
import (
|
|
"fmt"
|
|
"complex128"
|
|
)
|
|
|
|
func main() {
|
|
// Create test data
|
|
data := []complex128{
|
|
complex(1, 0),
|
|
complex(2, 0),
|
|
complex(3, 0),
|
|
complex(4, 0),
|
|
complex(5, 0),
|
|
complex(6, 0),
|
|
complex(7, 0),
|
|
complex(8, 0),
|
|
}
|
|
|
|
// Perform forward FFT
|
|
fftResult := FFT(data)
|
|
fmt.Println("FFT Result:", fftResult)
|
|
|
|
// Perform inverse FFT
|
|
ifftResult := IFFT(fftResult)
|
|
fmt.Println("IFFT Result:", ifftResult)
|
|
}
|
|
```
|
|
|
|
## API
|
|
|
|
### `FFT(data []complex128) []complex128`
|
|
Performs Fast Fourier Transform on the input data. Automatically detects AVX512 support and uses the optimized assembly implementation when available.
|
|
|
|
### `IFFT(data []complex128) []complex128`
|
|
Performs Inverse Fast Fourier Transform to recover the original signal from the frequency domain.
|
|
|
|
## Performance
|
|
|
|
The AVX512 implementation provides significant performance improvements over the pure Go version:
|
|
|
|
- **Vectorization**: Processes 8 complex numbers simultaneously using 512-bit ZMM registers
|
|
- **Optimized Memory Access**: Uses aligned memory operations and efficient data movement
|
|
- **Reduced Function Call Overhead**: Critical loops are implemented entirely in assembly
|
|
|
|
## Implementation Details
|
|
|
|
### Algorithm
|
|
The implementation uses the Cooley-Tukey FFT algorithm with the following optimizations:
|
|
|
|
1. **Bit-Reversal Permutation**: Efficiently reorders input data for optimal memory access patterns
|
|
2. **Radix-2 Decimation**: Processes data in powers of 2 for maximum efficiency
|
|
3. **Twiddle Factor Optimization**: Pre-computes and broadcasts trigonometric values using AVX512
|
|
|
|
### Assembly Features
|
|
- **ZMM Registers**: Uses 512-bit vector registers for maximum throughput
|
|
- **SIMD Operations**: Leverages AVX512 instructions like `VMOVUPD`, `VADDPD`, `VSUBPD`
|
|
- **Broadcasting**: Uses `VBROADCASTSD` for efficient twiddle factor distribution
|
|
- **Memory Alignment**: Ensures optimal memory access patterns
|
|
|
|
## Building
|
|
|
|
```bash
|
|
# Build with optimizations
|
|
go build -ldflags="-s -w" -o fft
|
|
|
|
# Run
|
|
./fft
|
|
```
|
|
|
|
## Testing
|
|
|
|
```bash
|
|
# Run tests
|
|
go test -v
|
|
|
|
# Benchmark performance
|
|
go test -bench=.
|
|
```
|
|
|
|
## Limitations
|
|
|
|
- Input length must be a power of 2 (automatically padded if necessary)
|
|
- Requires AVX512-capable processor
|
|
- Currently optimized for complex128 data types
|
|
- Assembly implementation is x86_64 specific
|
|
|
|
## Future Improvements
|
|
|
|
- Support for non-power-of-2 lengths using mixed-radix FFT
|
|
- Real-to-complex FFT optimization
|
|
- Multi-threaded implementation for very large datasets
|
|
- Support for other data types (float64, complex64)
|
|
|
|
## License
|
|
|
|
This project is open source and available under the MIT License.
|
|
|
|
## Contributing
|
|
|
|
Contributions are welcome! Please feel free to submit pull requests or open issues for bugs and feature requests. |