esp32-opencv/esp32/doc/optimization.md
Joachim 58d3b77970 Started to optimize OpenCV for the ESP32
- Using float instead of double for floating point matrix multiplications (in core/src/matmul.simd.hpp) reduces a lot the computation time
2020-05-20 14:12:33 +02:00

Optimization

This document describes optimizations made to OpenCV so that it runs faster on the ESP32.

Activating optimization

To activate the ESP32 optimizations, pass -DESP32_OPTIMIZATION=ON to CMake. All optimizations described here are disabled when this parameter is OFF.
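
As an illustration of how such a switch can reach the code, the sketch below assumes that the CMake option is forwarded to the compiler as a preprocessor definition named ESP32_OPTIMIZATION. That macro name, the `scalar_t` alias and the `scale()` helper are hypothetical and are not taken from the OpenCV sources.

```cpp
#include <type_traits>

// Assumption: the build adds -DESP32_OPTIMIZATION to the compiler flags
// when the CMake option is ON. The names below are illustrative only.
#ifdef ESP32_OPTIMIZATION
typedef float scalar_t;   // single precision, handled by the ESP32 FPU
#else
typedef double scalar_t;  // default behaviour, double precision
#endif

// Hypothetical helper whose arithmetic type follows the build switch.
inline scalar_t scale(scalar_t value, scalar_t factor)
{
    return value * factor;
}

// True only when the optimized (float) path was selected at compile time.
constexpr bool kUsesFloat = std::is_same<scalar_t, float>::value;
```

With this pattern, building with the option OFF leaves the original double-precision code path untouched.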

Floating point support

The ESP32 has only a single-precision Floating Point Unit (no double-precision support). Therefore, OpenCV functions that use double types are very slow.
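
Double precision can creep into float code through literals and implicit promotion. The sketch below shows the C++ rule involved; the function names `halfPromoted()` and `halfSingle()` are hypothetical and serve only as an illustration.

```cpp
#include <type_traits>

// A double literal promotes the whole expression to double, so this
// multiplication cannot use a single-precision FPU.
inline auto halfPromoted(float x) -> decltype(x * 0.5)
{
    return x * 0.5;
}

// A float literal (note the 'f' suffix) keeps the expression in float.
inline auto halfSingle(float x) -> decltype(x * 0.5f)
{
    return x * 0.5f;
}

static_assert(std::is_same<decltype(halfPromoted(1.0f)), double>::value,
              "float * double literal yields double");
static_assert(std::is_same<decltype(halfSingle(1.0f)), float>::value,
              "float * float literal yields float");
```

Both functions return the same value for typical inputs; only the type of the intermediate arithmetic differs.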

Matrix multiplications

The changes are in the files core/matmul.dispatch.cpp and core/matmul.simd.hpp.

Results for multiplying a 100x6 matrix by a 6x100 matrix:

  1. Initial test: 60 ms
  2. Changing alpha and beta from double to float in the GEMMSingleMul() function: 12 ms
  3. Changing alpha and beta from double to float in the gemmImpl() function: 4.6 ms
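
The change in steps 2 and 3 comes down to the type used for the alpha and beta scale factors and for the accumulation. The sketch below is a naive GEMM, D = alpha * A * B + beta * C, templated on that working type. It is not the OpenCV implementation; `naiveGemm()` and its parameters are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// D (m x n) = alpha * A (m x k) * B (k x n) + beta * C (m x n).
// All matrices are row-major float; WT is the working type used for
// alpha, beta and the accumulator (float or double).
template <typename WT>
std::vector<float> naiveGemm(const std::vector<float>& A,
                             const std::vector<float>& B,
                             const std::vector<float>& C,
                             std::size_t m, std::size_t k, std::size_t n,
                             WT alpha, WT beta)
{
    assert(A.size() == m * k && B.size() == k * n && C.size() == m * n);
    std::vector<float> D(m * n);
    for (std::size_t i = 0; i < m; ++i)
    {
        for (std::size_t j = 0; j < n; ++j)
        {
            WT acc = 0;
            for (std::size_t p = 0; p < k; ++p)
                acc += static_cast<WT>(A[i * k + p]) * static_cast<WT>(B[p * n + j]);
            D[i * n + j] = static_cast<float>(alpha * acc + beta * static_cast<WT>(C[i * n + j]));
        }
    }
    return D;
}
```

Calling `naiveGemm<double>(...)` forces every multiply-accumulate through double arithmetic, while `naiveGemm<float>(...)` keeps the whole inner loop in single precision.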

Results for multiplying a 150x100 matrix by a 100x150 matrix:

  1. Initial test: 2757 ms
  2. Changing double to float in the GEMMStore() function: 888 ms
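
This document does not state how the timings were measured. One possible way to collect comparable numbers is sketched below with `std::chrono`; the `averageMs()` helper is a hypothetical name, not part of OpenCV or ESP-IDF.

```cpp
#include <chrono>

// Runs 'work' the given number of times and returns the average
// wall-clock duration of one run, in milliseconds.
template <typename Work>
double averageMs(Work work, int repetitions)
{
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repetitions; ++i)
        work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count() / repetitions;
}
```

The callable passed as `work` would wrap the matrix multiplication under test. Averaging over several repetitions smooths out scheduling jitter.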

Esp-dsp library

The ESP32 processor has the following hardware:

  • 16/24-bit Instruction Set
  • Support for FPU (Floating Point Unit)
  • Support for DSP instructions
    • 32-bit integer multiplier
    • 32-bit integer divider
    • 40-bit MAC (Multiply-Accumulate)

The esp-dsp library (https://github.com/espressif/esp-dsp) provides functions written in assembly to use this hardware.
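
As a sketch of how an esp-dsp routine could replace a plain loop, the wrapper below calls `dspm_mult_f32()` when the esp-dsp header `dspm_mult.h` is available and falls back to portable C++ otherwise. The header name and function signature are quoted from memory of the esp-dsp API and should be checked against the installed version; `multiplyF32()` and `HAVE_ESP_DSP` are hypothetical names.

```cpp
// Use esp-dsp when its header is on the include path (assumed name).
#if defined(__has_include)
#  if __has_include("dspm_mult.h")
#    include "dspm_mult.h"
#    define HAVE_ESP_DSP 1
#  endif
#endif

// C (m x k) = A (m x n) * B (n x k), all row-major float matrices.
inline void multiplyF32(const float* A, const float* B, float* C,
                        int m, int n, int k)
{
#ifdef HAVE_ESP_DSP
    // Assumed esp-dsp call: optimized float matrix multiplication.
    dspm_mult_f32(A, B, C, m, n, k);
#else
    // Portable fallback with the same argument convention.
    for (int i = 0; i < m; ++i)
    {
        for (int j = 0; j < k; ++j)
        {
            float acc = 0.0f;
            for (int p = 0; p < n; ++p)
                acc += A[i * n + p] * B[p * k + j];  // multiply-accumulate
            C[i * k + j] = acc;
        }
    }
#endif
}
```

Keeping the fallback next to the optimized call lets the same code be built and tested on a desktop machine.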

This section describes which esp-dsp functions are used in OpenCV, and where, to improve performance.