# Porting anisotropic image segmentation on G-API {#tutorial_gapi_anisotropic_segmentation}
|
||||
|
||||
[TOC]
|
||||
|
||||
# Introduction {#gapi_anisotropic_intro}
|
||||
|
||||
In this tutorial you will learn:
|
||||
* How an existing algorithm can be transformed into a G-API
|
||||
computation (graph);
|
||||
* How to inspect and profile G-API graphs;
|
||||
* How to customize graph execution without changing its code.
|
||||
|
||||
This tutorial is based on @ref
|
||||
tutorial_anisotropic_image_segmentation_by_a_gst.
|
||||
|
||||
# Quick start: using OpenCV backend {#gapi_anisotropic_start}
|
||||
|
||||
Before we start, let's review the original algorithm implementation:
|
||||
|
||||
@include cpp/tutorial_code/ImgProc/anisotropic_image_segmentation/anisotropic_image_segmentation.cpp
|
||||
|
||||
## Examining calcGST() {#gapi_anisotropic_calcgst}
|
||||
|
||||
The function calcGST() is clearly an image processing pipeline:
* It is just a sequence of operations over a number of cv::Mat objects;
* No logic (conditionals) or loops are involved in the code;
* All functions operate on 2D images (like cv::Sobel, cv::multiply,
cv::boxFilter, cv::sqrt, etc).
|
||||
|
||||
Considering the above, calcGST() is a great candidate to start
|
||||
with. In the original code, its prototype is defined like this:
|
||||
|
||||
@snippet cpp/tutorial_code/ImgProc/anisotropic_image_segmentation/anisotropic_image_segmentation.cpp calcGST_proto
|
||||
|
||||
With G-API, we can define it as follows:
|
||||
|
||||
@snippet cpp/tutorial_code/gapi/porting_anisotropic_image_segmentation/porting_anisotropic_image_segmentation_gapi.cpp calcGST_proto
|
||||
|
||||
It is important to understand that the new G-API based version of
|
||||
calcGST() will just produce a compute graph, in contrast to its
|
||||
original version, which actually calculates the values. This is a
|
||||
principal difference -- G-API based functions like this are used to
|
||||
construct graphs, not to process the actual data.
|
||||
|
||||
Let's start implementing calcGST() with the calculation of the \f$J\f$
matrix. This is what the original code looks like:
|
||||
|
||||
@snippet cpp/tutorial_code/ImgProc/anisotropic_image_segmentation/anisotropic_image_segmentation.cpp calcJ_header
|
||||
|
||||
Here we need to declare output objects for every new operation (see
|
||||
img as a result for cv::Mat::convertTo, imgDiffX and others as results for
|
||||
cv::Sobel and cv::multiply).
|
||||
|
||||
The G-API analogue is listed below:
|
||||
|
||||
@snippet cpp/tutorial_code/gapi/porting_anisotropic_image_segmentation/porting_anisotropic_image_segmentation_gapi.cpp calcGST_header
|
||||
|
||||
This snippet demonstrates the following syntactic differences between
G-API and traditional OpenCV:
* All standard G-API functions are by default placed in the "cv::gapi"
namespace;
* G-API operations _return_ their results -- there's no need to pass
extra "output" parameters to the functions.
|
||||
|
||||
Note -- this code is also using `auto` -- types of intermediate objects
|
||||
like `img`, `imgDiffX`, and so on are inferred automatically by the
|
||||
C++ compiler. In this example, the types are determined by G-API
|
||||
operation return values which all are cv::GMat.
|
||||
|
||||
G-API standard kernels are trying to follow OpenCV API conventions
|
||||
whenever possible -- so cv::gapi::sobel takes the same arguments as
|
||||
cv::Sobel, cv::gapi::mul follows cv::multiply, and so on (except
|
||||
having a return value).
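
As a hedged illustration of this style (this is not the tutorial's actual
snippet; names are illustrative, and in the G-API headers the Sobel function
mirrors cv::Sobel's capitalization), the \f$J\f$ components could be built
like this:

```cpp
#include <opencv2/gapi.hpp>
#include <opencv2/gapi/core.hpp>    // cv::gapi::convertTo, cv::gapi::mul, ...
#include <opencv2/gapi/imgproc.hpp> // cv::gapi::Sobel, cv::gapi::boxFilter, ...

// Build (not compute!) the J components from an input GMat.
static void calcJ(const cv::GMat& inputImg, cv::GMat& J11, cv::GMat& J22, cv::GMat& J12)
{
    auto img      = cv::gapi::convertTo(inputImg, CV_32F);
    auto imgDiffX = cv::gapi::Sobel(img, CV_32F, 1, 0, 3);
    auto imgDiffY = cv::gapi::Sobel(img, CV_32F, 0, 1, 3);
    J11 = cv::gapi::mul(imgDiffX, imgDiffX);  // Jxx
    J22 = cv::gapi::mul(imgDiffY, imgDiffY);  // Jyy
    J12 = cv::gapi::mul(imgDiffX, imgDiffY);  // Jxy
}
```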
|
||||
|
||||
The rest of the calcGST() function can be implemented trivially in the
same way. Below is its full source code:
|
||||
|
||||
@snippet cpp/tutorial_code/gapi/porting_anisotropic_image_segmentation/porting_anisotropic_image_segmentation_gapi.cpp calcGST
|
||||
|
||||
## Running G-API graph {#gapi_anisotropic_running}
|
||||
|
||||
After calcGST() is defined in G-API terms, we can construct a graph
based on it and finally run it -- pass an input image and obtain the
result. Before we do that, let's have a look at what the original code
looked like:
|
||||
|
||||
@snippet cpp/tutorial_code/ImgProc/anisotropic_image_segmentation/anisotropic_image_segmentation.cpp main_extra
|
||||
|
||||
G-API-based functions like calcGST() can't be applied to input data
directly, since they are _construction_ code, not _processing_ code.
|
||||
In order to _run_ computations, a special object of class
|
||||
cv::GComputation needs to be created. This object wraps our G-API code
|
||||
(which is a composition of G-API data and operations) into a callable
|
||||
object, similar to C++11
|
||||
[std::function<>](https://en.cppreference.com/w/cpp/utility/functional/function).
|
||||
|
||||
The cv::GComputation class has a number of constructors which can be used
to define a graph. Generally, the user needs to pass the graph boundaries
-- the _input_ and _output_ objects on which a GComputation is
defined. Then G-API analyzes the call flow from _outputs_ to _inputs_
and reconstructs the graph with the operations in-between the specified
boundaries. This may sound complex, but in fact the code looks
like this:
|
||||
|
||||
@snippet cpp/tutorial_code/gapi/porting_anisotropic_image_segmentation/porting_anisotropic_image_segmentation_gapi.cpp main
|
||||
|
||||
Note that this code differs slightly from the original one: forming up
the resulting image is also a part of the pipeline (done with
cv::gapi::addWeighted).
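
For orientation, a minimal self-contained sketch of defining and running a
cv::GComputation (with a toy one-operation graph, not this tutorial's
pipeline) might look like this:

```cpp
#include <opencv2/gapi.hpp>
#include <opencv2/gapi/core.hpp>
#include <opencv2/gapi/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>

int main()
{
    cv::GMat in;                                                     // empty object marking the graph input
    cv::GMat out = cv::gapi::boxFilter(in, CV_32F, cv::Size(3, 3));  // any expression over `in`
    cv::GComputation pipeline(cv::GIn(in), cv::GOut(out));           // graph boundaries

    cv::Mat input = cv::imread("input.jpg", cv::IMREAD_GRAYSCALE);
    cv::Mat result;
    pipeline.apply(cv::gin(input), cv::gout(result));                // actual processing happens here
    return 0;
}
```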
|
||||
|
||||
The result of this G-API pipeline matches the original one bit-exactly
(given the same input image):
|
||||
|
||||

|
||||
|
||||
## G-API initial version: full listing {#gapi_anisotropic_ocv}
|
||||
|
||||
Below is the full listing of the initial anisotropic image
|
||||
segmentation port on G-API:
|
||||
|
||||
@snippet cpp/tutorial_code/gapi/porting_anisotropic_image_segmentation/porting_anisotropic_image_segmentation_gapi.cpp full_sample
|
||||
|
||||
# Inspecting the initial version {#gapi_anisotropic_inspect}
|
||||
|
||||
Now that we have an initial version of our algorithm working with
G-API, we can use it to inspect and learn how G-API works. This
chapter covers two aspects: understanding the graph structure, and
memory profiling.
|
||||
|
||||
## Understanding the graph structure {#gapi_anisotropic_inspect_graph}
|
||||
|
||||
G-API stands for "Graph API", but did you notice any graphs in the
above example? That was one of the initial design goals -- G-API was
designed with expressions in mind to make the adoption and porting process
more straightforward. People _usually_ don't think in terms of
_Nodes_ and _Edges_ when writing ordinary code, so G-API, while
being a Graph API, doesn't force its users to do that.
|
||||
|
||||
However, a graph is still built implicitly when a cv::GComputation
object is defined. It may be useful to inspect what the resulting graph
looks like, to check that it is generated correctly and that it really
represents our algorithm. It is also useful to learn the structure of
the graph to see if it has any redundancies.
|
||||
|
||||
G-API allows dumping generated graphs to `.dot` files, which can then
be visualized with [Graphviz](https://www.graphviz.org/), a
popular open-source graph visualization package.
|
||||
|
||||
<!-- TODO THIS VARIABLE NEEDS TO BE FIXED TO DUMP DIR ASAP! -->
|
||||
|
||||
In order to dump our graph to a `.dot` file, set `GRAPH_DUMP_PATH` to a
|
||||
file name before running the application, e.g.:
|
||||
|
||||
$ GRAPH_DUMP_PATH=segm.dot ./bin/example_tutorial_porting_anisotropic_image_segmentation_gapi
|
||||
|
||||
Now this file can be visualized with a `dot` command like this:
|
||||
|
||||
$ dot segm.dot -Tpng -o segm.png
|
||||
|
||||
or viewed interactively with `xdot` (please refer to your
|
||||
distribution/operating system documentation on how to install these
|
||||
packages).
|
||||
|
||||

|
||||
|
||||
The above diagram demonstrates a number of interesting aspects of
|
||||
G-API's internal algorithm representation:
|
||||
1. The underlying G-API graph is bipartite: it consists of
_Operation_ and _Data_ nodes such that a _Data_ node can only be
connected to an _Operation_ node, an _Operation_ node can only be
connected to a _Data_ node, and nodes of a single kind are never
connected directly.
2. The graph is directed -- every edge in the graph has a direction.
3. The graph "begins" and "ends" with _Data_ nodes.
4. A _Data_ node can have only a single writer and multiple readers.
5. An _Operation_ node may have multiple inputs, though every input
must have a unique _port number_ (among inputs).
6. An _Operation_ node may have multiple outputs, and every output
must have a unique _port number_ (among outputs).
|
||||
|
||||
## Measuring memory footprint {#gapi_anisotropic_memory_ocv}
|
||||
|
||||
Let's measure and compare the memory footprint of the algorithm in its two
versions: G-API-based and OpenCV-based. At the moment, the G-API version
is also OpenCV-based since it falls back to OpenCV functions inside.
|
||||
|
||||
On GNU/Linux, application memory footprint can be profiled with
|
||||
[Valgrind](http://valgrind.org/). On Debian/Ubuntu systems it can be
|
||||
installed like this (assuming you have administrator privileges):
|
||||
|
||||
$ sudo apt-get install valgrind massif-visualizer
|
||||
|
||||
Once installed, we can collect memory profiles easily for our two
|
||||
algorithm versions:
|
||||
|
||||
$ valgrind --tool=massif --massif-out-file=ocv.out ./bin/example_tutorial_anisotropic_image_segmentation
|
||||
==6101== Massif, a heap profiler
|
||||
==6101== Copyright (C) 2003-2015, and GNU GPL'd, by Nicholas Nethercote
|
||||
==6101== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
|
||||
==6101== Command: ./bin/example_tutorial_anisotropic_image_segmentation
|
||||
==6101==
|
||||
==6101==
|
||||
$ valgrind --tool=massif --massif-out-file=gapi.out ./bin/example_tutorial_porting_anisotropic_image_segmentation_gapi
|
||||
==6117== Massif, a heap profiler
|
||||
==6117== Copyright (C) 2003-2015, and GNU GPL'd, by Nicholas Nethercote
|
||||
==6117== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
|
||||
==6117== Command: ./bin/example_tutorial_porting_anisotropic_image_segmentation_gapi
|
||||
==6117==
|
||||
==6117==
|
||||
|
||||
Once done, we can inspect the collected profiles with
|
||||
[Massif Visualizer](https://github.com/KDE/massif-visualizer)
|
||||
(installed in the above step).
|
||||
|
||||
Below is the visualized memory profile of the original OpenCV version
|
||||
of the algorithm:
|
||||
|
||||

|
||||
|
||||
We see that memory is allocated as the application
|
||||
executes, reaching its peak in the calcGST() function; then the
|
||||
footprint drops as calcGST() completes its execution and all temporary
|
||||
buffers are freed. Massif reports a peak memory consumption of 7.6 MiB.
|
||||
|
||||
Now let's have a look on the profile of G-API version:
|
||||
|
||||

|
||||
|
||||
Once G-API computation is created and its execution starts, G-API
|
||||
allocates all required memory at once and then the memory profile
|
||||
remains flat until the program terminates. Massif reports a
peak memory consumption of 11.4 MiB.
|
||||
|
||||
A reader may ask a fair question at this point -- is G-API really that bad?
What is the point of using it, then?

Fortunately, it is not. The increased memory consumption we see here is
caused by the default, naive OpenCV-based backend being used to
execute this graph. This backend serves mostly for quick prototyping
and debugging of algorithms before offload/further optimization.
|
||||
|
||||
This backend doesn't utilize any complex memory management strategies yet,
since that is not its focus at the moment. In the following chapter,
we'll learn about the Fluid backend and see how the same G-API code can
run in a completely different model (with the footprint shrinking to a
few kilobytes).
|
||||
|
||||
# Backends and kernels {#gapi_anisotropic_backends}
|
||||
|
||||
This chapter covers how a G-API computation can be executed in a
|
||||
special way -- e.g. offloaded to another device, or scheduled with a
|
||||
special intelligence. G-API is designed to make its graphs portable --
|
||||
it means that once a graph is defined in G-API terms, no changes
|
||||
should be required in it if we want to run it on CPU or on GPU or on
|
||||
both devices at once. [G-API High-level overview](@ref gapi_hld) and
|
||||
[G-API Kernel API](@ref gapi_kernel_api) shed more light on technical
|
||||
details which make it possible. In this chapter, we will utilize G-API
|
||||
Fluid backend to make our graph cache-efficient on CPU.
|
||||
|
||||
G-API defines a _backend_ as the lower-level entity which knows how to
run kernels. Backends may have (and, in fact, do have) different
_Kernel APIs_ which are used to program and integrate kernels for those
backends. In this context, a _kernel_ is an implementation of an
_operation_, which is defined at the top API level (see the
G_TYPED_KERNEL() macro).
|
||||
|
||||
A backend is aware of device & platform specifics and executes its
kernels with those specifics in mind. For
example, there may be a [Halide](http://halide-lang.org/) backend which
allows writing (implementing) G-API operations in the Halide language and
then generates functional Halide code for the portions of a G-API graph which
map well there.
|
||||
|
||||
## Running a graph with a Fluid backend {#gapi_anisotropic_fluid}
|
||||
|
||||
OpenCV 4.0 is bundled with two G-API backends -- the default "OpenCV"
|
||||
which we just used, and a special "Fluid" backend.
|
||||
|
||||
Fluid backend reorganizes the execution to save memory and to achieve
|
||||
near-perfect cache locality, implementing so-called "streaming" model
|
||||
of execution.
|
||||
|
||||
In order to start using Fluid kernels, we first need to include the
appropriate header files (which are not included by default):
|
||||
|
||||
@snippet cpp/tutorial_code/gapi/porting_anisotropic_image_segmentation/porting_anisotropic_image_segmentation_gapi_fluid.cpp fluid_includes
|
||||
|
||||
Once these headers are included, we can form up a new _kernel package_
|
||||
and specify it to G-API:
|
||||
|
||||
@snippet cpp/tutorial_code/gapi/porting_anisotropic_image_segmentation/porting_anisotropic_image_segmentation_gapi_fluid.cpp kernel_pkg
|
||||
|
||||
In G-API, kernels (or operation implementations) are objects. Kernels are
|
||||
organized into collections, or _kernel packages_, represented by class
|
||||
cv::gapi::GKernelPackage. The main purpose of a kernel package is to
|
||||
capture which kernels we would like to use in our graph, and pass it
|
||||
as a _graph compilation option_:
|
||||
|
||||
@snippet cpp/tutorial_code/gapi/porting_anisotropic_image_segmentation/porting_anisotropic_image_segmentation_gapi_fluid.cpp kernel_pkg_use
|
||||
|
||||
Traditional OpenCV is logically divided into modules, with every
|
||||
module providing a set of functions. In G-API, there are also
|
||||
"modules" which are represented as kernel packages provided by a
|
||||
particular backend. In this example, we pass Fluid kernel packages to
|
||||
G-API to utilize appropriate Fluid functions in our graph.
|
||||
|
||||
Kernel packages are combinable -- in the above example, we take the "Core"
and "ImgProc" Fluid kernel packages and combine them into a single
one. See the documentation reference on cv::gapi::combine.
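
For reference, forming such a combined package and passing it to a graph run
might look roughly like this. This is a hedged sketch with a toy graph: on
OpenCV 4.0, `cv::gapi::combine` also takes a `cv::unite_policy` argument, and
exact helper names may differ slightly between versions.

```cpp
#include <opencv2/gapi.hpp>
#include <opencv2/gapi/core.hpp>
#include <opencv2/gapi/fluid/core.hpp>
#include <opencv2/gapi/fluid/imgproc.hpp>
#include <opencv2/core.hpp>

int main()
{
    cv::GMat in;
    cv::GMat out = cv::gapi::sqrt(cv::gapi::mul(in, in));  // a toy graph
    cv::GComputation pipeline(cv::GIn(in), cv::GOut(out));

    // "Core" and "ImgProc" Fluid packages combined into one package
    auto fluid_kernels = cv::gapi::combine(cv::gapi::core::fluid::kernels(),
                                           cv::gapi::imgproc::fluid::kernels());

    cv::Mat input = cv::Mat::eye(64, 64, CV_32F), result;
    pipeline.apply(cv::gin(input), cv::gout(result),
                   cv::compile_args(fluid_kernels));        // kernels passed as a compile option
    return 0;
}
```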
|
||||
|
||||
If no kernel packages are specified in the options, G-API uses the
_default_ package, which consists of default OpenCV implementations --
thus G-API graphs are executed via OpenCV functions by default. The OpenCV
backend provides broader functional coverage than any other
backend. If a kernel package is specified, like in this example, then
it is combined with the _default_ one.
This means that user-specified implementations will replace default
implementations in case of conflict.
|
||||
|
||||
<!-- FIXME Document this process better as a part of regular -->
|
||||
<!-- documentation, not a tutorial kind of thing -->
|
||||
|
||||
## Troubleshooting and customization {#gapi_anisotropic_trouble}
|
||||
|
||||
After the above modifications, the app (in OpenCV 4.0) should crash
with a message like this:
|
||||
|
||||
```
|
||||
$ ./bin/example_tutorial_porting_anisotropic_image_segmentation_gapi_fluid
|
||||
terminate called after throwing an instance of 'std::logic_error'
|
||||
what(): .../modules/gapi/src/backends/fluid/gfluidimgproc.cpp:436: Assertion kernelSize.width == 3 && kernelSize.height == 3 in function run failed
|
||||
|
||||
Aborted (core dumped)
|
||||
```
|
||||
|
||||
The Fluid backend has a number of limitations in OpenCV 4.0 (see this
[wiki page](https://github.com/opencv/opencv/wiki/Graph-API) for a
more up-to-date status). In particular, the Box filter used in this
sample supports only a static 3x3 kernel size.
|
||||
|
||||
We can overcome this problem easily by keeping G-API from using the Fluid
version of the Box filter kernel in this sample. It can be done by
removing the appropriate kernel from the kernel package we've just
created:
|
||||
|
||||
@snippet cpp/tutorial_code/gapi/porting_anisotropic_image_segmentation/porting_anisotropic_image_segmentation_gapi_fluid.cpp kernel_hotfix
|
||||
|
||||
Now this kernel package doesn't have _any_ implementation of the Box
filter kernel interface (specified as a template parameter). As
described above, G-API will now fall back to OpenCV to run this
kernel. The resulting code with this change looks like this:
|
||||
|
||||
@snippet cpp/tutorial_code/gapi/porting_anisotropic_image_segmentation/porting_anisotropic_image_segmentation_gapi_fluid.cpp kernel_pkg_proper
|
||||
|
||||
Let's examine the memory profile for this sample after we switched to
|
||||
Fluid backend. Now it looks like this:
|
||||
|
||||

|
||||
|
||||
Now the tool reports 4.7 MiB -- and we just changed a few lines in our
code, without modifying the graph itself! It is a ~2.4X improvement over
the previous G-API result, and a ~1.6X improvement over the original OpenCV
version.
|
||||
|
||||
Let's also examine what the internal representation of the graph looks
like now. Dumping the graph into `.dot` results in a
visualization like this:
|
||||
|
||||

|
||||
|
||||
This graph doesn't differ structurally from its previous version (in
|
||||
terms of operations and data objects), though a changed layout (on the
|
||||
left side of the dump) is easily noticeable.
|
||||
|
||||
The visualization reflects how G-API deals with mixed graphs, also
|
||||
called _heterogeneous_ graphs. The majority of operations in this
|
||||
graph are implemented with Fluid backend, but Box filters are executed
|
||||
by the OpenCV backend. One can easily see that the graph is partitioned
|
||||
(with rectangles). G-API groups connected operations based on their
|
||||
affinity, forming _subgraphs_ (or _islands_ in G-API terminology), and
|
||||
our top-level graph becomes a composition of multiple smaller
|
||||
subgraphs. Every backend determines how its subgraph (island) is
executed, so the Fluid backend optimizes out memory where possible, while the
six intermediate buffers accessed by the OpenCV Box filters are allocated
fully and can't be optimized out.
|
||||
|
||||
<!-- TODO: add a chapter on custom kernels -->
|
||||
<!-- TODO: make a full-fluid pipeline -->
|
||||
<!-- TODO: talk about parallelism when it is available -->
|
||||
|
||||
# Conclusion {#gapi_tutor_conclusion}
|
||||
|
||||
This tutorial demonstrates what G-API is and what its key design
|
||||
concepts are, how an algorithm can be ported to G-API, and
|
||||
how to utilize graph model benefits after that.
|
||||
|
||||
In OpenCV 4.0, G-API is still in its inception stage -- it is more a
|
||||
foundation for all future work, though ready for use even now.
|
||||
|
||||
Further, this tutorial will be extended with new chapters on custom
|
||||
kernels programming, parallelism, and more.
|
|
||||
# Implementing a face beautification algorithm with G-API {#tutorial_gapi_face_beautification}
|
||||
|
||||
[TOC]
|
||||
|
||||
# Introduction {#gapi_fb_intro}
|
||||
|
||||
In this tutorial you will learn:
|
||||
* Basics of a sample face beautification algorithm;
|
||||
* How to infer different networks inside a pipeline with G-API;
|
||||
* How to run a G-API pipeline on a video stream.
|
||||
|
||||
## Prerequisites {#gapi_fb_prerec}
|
||||
|
||||
This sample requires:
|
||||
- PC with GNU/Linux or Microsoft Windows (Apple macOS is supported but
|
||||
was not tested);
|
||||
- OpenCV 4.2 or later built with Intel® Distribution of [OpenVINO™
|
||||
Toolkit](https://docs.openvinotoolkit.org/) (building with [Intel®
|
||||
TBB](https://www.threadingbuildingblocks.org/intel-tbb-tutorial) is
|
||||
a plus);
|
||||
- The following topologies from OpenVINO™ Toolkit [Open Model
|
||||
Zoo](https://github.com/opencv/open_model_zoo):
|
||||
- `face-detection-adas-0001`;
|
||||
- `facial-landmarks-35-adas-0002`.
|
||||
|
||||
## Face beautification algorithm {#gapi_fb_algorithm}
|
||||
|
||||
We will implement a simple face beautification algorithm using a
|
||||
combination of modern Deep Learning techniques and traditional
|
||||
Computer Vision. The general idea behind the algorithm is to make the
face skin smoother while preserving face features like eye or mouth
contrast. The algorithm identifies parts of the face using DNN
inference, applies different filters to the parts found, and then
combines them into the final result using basic image arithmetic:
|
||||
|
||||
\dot
|
||||
strict digraph Pipeline {
|
||||
node [shape=record fontname=Helvetica fontsize=10 style=filled color="#4c7aa4" fillcolor="#5b9bd5" fontcolor="white"];
|
||||
edge [color="#62a8e7"];
|
||||
ordering="out";
|
||||
splines=ortho;
|
||||
rankdir=LR;
|
||||
|
||||
input [label="Input"];
|
||||
fd [label="Face\ndetector"];
|
||||
bgMask [label="Generate\nBG mask"];
|
||||
unshMask [label="Unsharp\nmask"];
|
||||
bilFil [label="Bilateral\nfilter"];
|
||||
shMask [label="Generate\nsharp mask"];
|
||||
blMask [label="Generate\nblur mask"];
|
||||
mul_1 [label="*" fontsize=24 shape=circle labelloc=b];
|
||||
mul_2 [label="*" fontsize=24 shape=circle labelloc=b];
|
||||
mul_3 [label="*" fontsize=24 shape=circle labelloc=b];
|
||||
|
||||
subgraph cluster_0 {
|
||||
style=dashed
|
||||
fontsize=10
|
||||
ld [label="Landmarks\ndetector"];
|
||||
label="for each face"
|
||||
}
|
||||
|
||||
sum_1 [label="+" fontsize=24 shape=circle];
|
||||
out [label="Output"];
|
||||
|
||||
temp_1 [style=invis shape=point width=0];
|
||||
temp_2 [style=invis shape=point width=0];
|
||||
temp_3 [style=invis shape=point width=0];
|
||||
temp_4 [style=invis shape=point width=0];
|
||||
temp_5 [style=invis shape=point width=0];
|
||||
temp_6 [style=invis shape=point width=0];
|
||||
temp_7 [style=invis shape=point width=0];
|
||||
temp_8 [style=invis shape=point width=0];
|
||||
temp_9 [style=invis shape=point width=0];
|
||||
|
||||
input -> temp_1 [arrowhead=none]
|
||||
temp_1 -> fd -> ld
|
||||
ld -> temp_4 [arrowhead=none]
|
||||
temp_4 -> bgMask
|
||||
bgMask -> mul_1 -> sum_1 -> out
|
||||
|
||||
temp_4 -> temp_5 -> temp_6 [arrowhead=none constraint=none]
|
||||
ld -> temp_2 -> temp_3 [style=invis constraint=none]
|
||||
|
||||
temp_1 -> {unshMask, bilFil}
|
||||
fd -> unshMask [style=invis constraint=none]
|
||||
unshMask -> bilFil [style=invis constraint=none]
|
||||
|
||||
bgMask -> shMask [style=invis constraint=none]
|
||||
shMask -> blMask [style=invis constraint=none]
|
||||
mul_1 -> mul_2 [style=invis constraint=none]
|
||||
temp_5 -> shMask -> mul_2
|
||||
temp_6 -> blMask -> mul_3
|
||||
|
||||
unshMask -> temp_2 -> temp_5 [style=invis]
|
||||
bilFil -> temp_3 -> temp_6 [style=invis]
|
||||
|
||||
mul_2 -> temp_7 [arrowhead=none]
|
||||
mul_3 -> temp_8 [arrowhead=none]
|
||||
|
||||
temp_8 -> temp_7 [arrowhead=none constraint=none]
|
||||
temp_7 -> sum_1 [constraint=none]
|
||||
|
||||
unshMask -> mul_2 [constraint=none]
|
||||
bilFil -> mul_3 [constraint=none]
|
||||
temp_1 -> mul_1 [constraint=none]
|
||||
}
|
||||
\enddot
|
||||
|
||||
Briefly the algorithm is described as follows:
|
||||
- Input image \f$I\f$ is passed to unsharp mask and bilateral filters
|
||||
(\f$U\f$ and \f$L\f$ respectively);
|
||||
- Input image \f$I\f$ is passed to an SSD-based face detector;
|
||||
- SSD result (a \f$[1 \times 1 \times 200 \times 7]\f$ blob) is parsed
|
||||
and converted to an array of faces;
|
||||
- Every face is passed to a landmarks detector;
|
||||
- Based on landmarks found for every face, three image masks are
|
||||
generated:
|
||||
- A background mask \f$b\f$ -- indicating which areas from the
|
||||
original image to keep as-is;
|
||||
- A face part mask \f$p\f$ -- identifying regions to preserve
|
||||
(sharpen).
|
||||
- A face skin mask \f$s\f$ -- identifying regions to blur;
|
||||
- The final result \f$O\f$ is a composition of the features above,
calculated as \f$O = b*I + p*U + s*L\f$ (a sketch of this composition
follows the list).
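
A hedged sketch of this final composition expressed with G-API arithmetic;
the names are illustrative, and the masks are assumed to be already converted
to the same type and number of channels as the images:

```cpp
#include <opencv2/gapi.hpp>
#include <opencv2/gapi/core.hpp>

// O = b*I + p*U + s*L, element-wise
static cv::GMat compose(const cv::GMat& I, const cv::GMat& U, const cv::GMat& L,
                        const cv::GMat& b, const cv::GMat& p, const cv::GMat& s)
{
    return cv::gapi::add(cv::gapi::add(cv::gapi::mul(b, I),
                                       cv::gapi::mul(p, U)),
                         cv::gapi::mul(s, L));
}
```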
|
||||
|
||||
Generating face element masks based on a limited set of features (just
|
||||
35 per face, including all its parts) is not very trivial and is
|
||||
described in the sections below.
|
||||
|
||||
# Constructing a G-API pipeline {#gapi_fb_pipeline}
|
||||
|
||||
## Declaring Deep Learning topologies {#gapi_fb_decl_nets}
|
||||
|
||||
This sample is using two DNN detectors. Every network takes one input
|
||||
and produces one output. In G-API, networks are defined with macro
|
||||
G_API_NET():
|
||||
|
||||
@snippet cpp/tutorial_code/gapi/face_beautification/face_beautification.cpp net_decl
|
||||
|
||||
To get more information, see
|
||||
[Declaring Deep Learning topologies](@ref gapi_ifd_declaring_nets)
|
||||
described in the "Face Analytics pipeline" tutorial.
|
||||
|
||||
## Describing the processing graph {#gapi_fb_ppline}
|
||||
|
||||
The code below generates a graph for the algorithm above:
|
||||
|
||||
@snippet cpp/tutorial_code/gapi/face_beautification/face_beautification.cpp ppl
|
||||
|
||||
The resulting graph is a mixture of G-API's standard operations,
|
||||
user-defined operations (namespace `custom::`), and DNN inference.
|
||||
The generic function `cv::gapi::infer<>()` allows triggering inference
within the pipeline; the networks to infer are specified as template
parameters. The sample code uses two versions of `cv::gapi::infer<>()`:
- A frame-oriented one is used to detect faces on the input frame.
- An ROI-list-oriented one is used to run landmarks inference on a
list of faces -- this version produces an array of landmarks for
every face.
|
||||
|
||||
More on this in "Face Analytics pipeline"
|
||||
([Building a GComputation](@ref gapi_ifd_gcomputation) section).
|
||||
|
||||
## Unsharp mask in G-API {#gapi_fb_unsh}
|
||||
|
||||
The unsharp mask \f$U\f$ for image \f$I\f$ is defined as:
|
||||
|
||||
\f[U = I - s * L(M(I)),\f]
|
||||
|
||||
where \f$M()\f$ is a median filter, \f$L()\f$ is the Laplace operator,
|
||||
and \f$s\f$ is a strength coefficient. While G-API doesn't provide
|
||||
this function out-of-the-box, it is expressed naturally with the
|
||||
existing G-API operations:
|
||||
|
||||
@snippet cpp/tutorial_code/gapi/face_beautification/face_beautification.cpp unsh
|
||||
|
||||
Note that the code snippet above is a regular C++ function defined
with G-API types. Users can write functions like this to simplify
graph construction; when called, this function just adds the relevant
nodes to the pipeline it is used in.
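
A hedged sketch of such a helper is shown below; the parameters are
illustrative, and if your G-API version lacks a built-in Laplacian operation,
it can be provided as a custom kernel instead (the actual sample may do
exactly that):

```cpp
// U = I - s * L(M(I)): median blur, Laplacian, then weighted subtraction.
static cv::GMat unsharpMask(const cv::GMat& src, int sigma, double strength)
{
    cv::GMat blurred   = cv::gapi::medianBlur(src, sigma);
    cv::GMat laplacian = cv::gapi::Laplacian(blurred, CV_8U);     // or a custom kernel
    return cv::gapi::addWeighted(src, 1.0, laplacian, -strength, 0.0);
}
```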
|
||||
|
||||
# Custom operations {#gapi_fb_proc}
|
||||
|
||||
The face beautification graph uses custom operations
extensively. This chapter focuses on the most interesting kernels;
refer to [G-API Kernel API](@ref gapi_kernel_api) for general
information on defining operations and implementing kernels in G-API.
|
||||
|
||||
## Face detector post-processing {#gapi_fb_face_detect}
|
||||
|
||||
A face detector output is converted to an array of faces with the
|
||||
following kernel:
|
||||
|
||||
@snippet cpp/tutorial_code/gapi/face_beautification/face_beautification.cpp vec_ROI
|
||||
@snippet cpp/tutorial_code/gapi/face_beautification/face_beautification.cpp fd_pp
|
||||
|
||||
## Facial landmarks post-processing {#gapi_fb_landm_detect}
|
||||
|
||||
The algorithm infers locations of face elements (like the eyes, the mouth
|
||||
and the head contour itself) using a generic facial landmarks detector
|
||||
(<a href="https://github.com/opencv/open_model_zoo/blob/master/models/intel/facial-landmarks-35-adas-0002/description/facial-landmarks-35-adas-0002.md">details</a>)
|
||||
from OpenVINO™ Open Model Zoo. However, the detected landmarks as-is are not
|
||||
enough to generate masks --- this operation requires regions of interest on
|
||||
the face represented by closed contours, so some interpolation is applied to
|
||||
get them. This landmarks
|
||||
processing and interpolation is performed by the following kernel:
|
||||
|
||||
@snippet cpp/tutorial_code/gapi/face_beautification/face_beautification.cpp ld_pp_cnts
|
||||
|
||||
The kernel takes two arrays of denormalized landmark coordinates and
returns an array of face elements' closed contours and an array of faces'
closed contours; in other words, the first output is an array of
contours of image areas to be sharpened and the second is an array of
contours of areas to be smoothed.
|
||||
|
||||
Here and below `Contour` is a vector of points.
|
||||
|
||||
### Getting an eye contour {#gapi_fb_ld_eye}
|
||||
|
||||
Eye contours are estimated with the following function:
|
||||
|
||||
@snippet cpp/tutorial_code/gapi/face_beautification/face_beautification.cpp ld_pp_incl
|
||||
@snippet cpp/tutorial_code/gapi/face_beautification/face_beautification.cpp ld_pp_eye
|
||||
|
||||
Briefly, this function restores the bottom side of an eye with a
half-ellipse based on two points in the left and right eye
corners. In fact, `cv::ellipse2Poly()` is used to approximate the eye region, and
the function only defines the ellipse parameters based on just two points:
- The ellipse center and the \f$X\f$ half-axis, calculated from the two eye points;
- The \f$Y\f$ half-axis, calculated according to the assumption that an average
eye width is \f$1/3\f$ of its length;
- The start and the end angles, which are 0 and 180 (refer to the
`cv::ellipse()` documentation);
- The angle delta: how many points to produce in the contour;
- The inclination angle of the axes (a rough sketch of these steps
follows this list).
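
A rough sketch of these steps as plain OpenCV code, as it would appear inside
the kernel; the function name and the angle delta are illustrative, not the
sample's exact values:

```cpp
#include <opencv2/imgproc.hpp>
#include <cmath>
#include <vector>

static std::vector<cv::Point> eyeHalfEllipse(const cv::Point& ptLeft, const cv::Point& ptRight)
{
    const cv::Point center((ptLeft.x + ptRight.x) / 2, (ptLeft.y + ptRight.y) / 2);
    const int axisX = static_cast<int>(std::hypot(double(ptRight.x - ptLeft.x),
                                                  double(ptRight.y - ptLeft.y)) / 2.0);
    const int axisY = axisX / 3;  // assume the eye height is ~1/3 of its width
    const double inclination = std::atan2(double(ptRight.y - ptLeft.y),
                                          double(ptRight.x - ptLeft.x)) * 180.0 / CV_PI;
    std::vector<cv::Point> contour;
    cv::ellipse2Poly(center, cv::Size(axisX, axisY), static_cast<int>(inclination),
                     0, 180, 10 /*angle delta, degrees per contour point*/, contour);
    return contour;
}
```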
|
||||
|
||||
The use of `atan2()` instead of just `atan()` in the function
`custom::getLineInclinationAngleDegrees()` is essential, as it allows
returning a negative value depending on the signs of `x` and `y`, so we
can get the right angle even in the case of an upside-down face arrangement
(if we put the points in the right order, of course).
|
||||
|
||||
### Getting a forehead contour {#gapi_fb_ld_fhd}
|
||||
|
||||
The function approximates the forehead contour:
|
||||
|
||||
@snippet cpp/tutorial_code/gapi/face_beautification/face_beautification.cpp ld_pp_fhd
|
||||
|
||||
As we have only jaw points in our detected landmarks, we have to get a
|
||||
half-ellipse based on three points of a jaw: the leftmost, the
|
||||
rightmost and the lowest one. The jaw width is assumed to be equal to the
|
||||
forehead width and the latter is calculated using the left and the
|
||||
right points. Speaking of the \f$Y\f$ axis, we have no points to get
|
||||
it directly, and instead assume that the forehead height is about \f$2/3\f$
|
||||
of the jaw height, which can be figured out from the face center (the
|
||||
middle between the left and right points) and the lowest jaw point.
|
||||
|
||||
## Drawing masks {#gapi_fb_masks_drw}
|
||||
|
||||
When we have all the contours needed, we are able to draw masks:
|
||||
|
||||
@snippet cpp/tutorial_code/gapi/face_beautification/face_beautification.cpp msk_ppline
|
||||
|
||||
The steps to get the masks are:
|
||||
* the "sharp" mask calculation:
|
||||
* fill the contours that should be sharpened;
|
||||
* blur that to get the "sharp" mask (`mskSharpG`);
|
||||
* the "bilateral" mask calculation:
|
||||
* fill all the face contours fully;
|
||||
* blur that;
|
||||
* subtract areas which intersect with the "sharp" mask --- and get the
|
||||
"bilateral" mask (`mskBlurFinal`);
|
||||
* the background mask calculation:
|
||||
* add two previous masks
|
||||
* set all non-zero pixels of the result as 255 (by `cv::gapi::threshold()`)
|
||||
* revert the output (by `cv::gapi::bitwise_not`) to get the background
mask (`mskNoFaces`) -- this last step is sketched in code below.
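
A sketch of that last (background) step with G-API core operations; the mask
names follow the list above, and the threshold values are illustrative:

```cpp
cv::GMat mskFaces   = cv::gapi::add(mskSharpG, mskBlurFinal);                   // union of the face masks
cv::GMat mskFacesB  = cv::gapi::threshold(mskFaces, 0, 255, cv::THRESH_BINARY); // any non-zero pixel -> 255
cv::GMat mskNoFaces = cv::gapi::bitwise_not(mskFacesB);                         // background = not-a-face
```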
|
||||
|
||||
# Configuring and running the pipeline {#gapi_fb_comp_args}
|
||||
|
||||
Once the graph is fully expressed, we can finally compile it and run
|
||||
on real data. G-API graph compilation is the stage where the G-API
|
||||
framework actually understands which kernels and networks to use. This
|
||||
configuration happens via G-API compilation arguments.
|
||||
|
||||
## DNN parameters {#gapi_fb_comp_args_net}
|
||||
|
||||
This sample is using OpenVINO™ Toolkit Inference Engine backend for DL
|
||||
inference, which is configured the following way:
|
||||
|
||||
@snippet cpp/tutorial_code/gapi/face_beautification/face_beautification.cpp net_param
|
||||
|
||||
Every `cv::gapi::ie::Params<>` object is related to the network
specified in its template argument. We should pass there the network
type we defined with `G_API_NET()` at the very beginning of the
tutorial.
|
||||
|
||||
Network parameters are then wrapped into a network package
(`cv::gapi::GNetPackage`) with the `cv::gapi::networks()` helper:
|
||||
|
||||
@snippet cpp/tutorial_code/gapi/face_beautification/face_beautification.cpp netw
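
A hedged sketch of this configuration; the network type names
(`custom::FaceDetector`, `custom::LandmDetector`) and the path variables are
illustrative, not necessarily the sample's exact identifiers:

```cpp
auto faceParams  = cv::gapi::ie::Params<custom::FaceDetector> {
    faceXmlPath,   // path to the topology IR (.xml)
    faceBinPath,   // path to the weights (.bin)
    "CPU"          // device to run on
};
auto landmParams = cv::gapi::ie::Params<custom::LandmDetector> {
    landmXmlPath, landmBinPath, "CPU"
};
auto networks = cv::gapi::networks(faceParams, landmParams);  // goes into cv::compile_args(...)
```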
|
||||
|
||||
More details in "Face Analytics Pipeline"
|
||||
([Configuring the pipeline](@ref gapi_ifd_configuration) section).
|
||||
|
||||
## Kernel packages {#gapi_fb_comp_args_kernels}
|
||||
|
||||
In this example we use a lot of custom kernels; in addition, we
use the Fluid backend to optimize memory consumption for G-API's standard
kernels where applicable. The resulting kernel package is formed like this:
|
||||
|
||||
@snippet cpp/tutorial_code/gapi/face_beautification/face_beautification.cpp kern_pass_1
|
||||
|
||||
## Compiling the streaming pipeline {#gapi_fb_compiling}
|
||||
|
||||
G-API optimizes execution for video streams when compiled in the
|
||||
"Streaming" mode.
|
||||
|
||||
@snippet cpp/tutorial_code/gapi/face_beautification/face_beautification.cpp str_comp
|
||||
|
||||
More on this in "Face Analytics Pipeline"
|
||||
([Configuring the pipeline](@ref gapi_ifd_configuration) section).
|
||||
|
||||
## Running the streaming pipeline {#gapi_fb_running}
|
||||
|
||||
In order to run the G-API streaming pipeline, all we need is to
|
||||
specify the input video source, call
|
||||
`cv::GStreamingCompiled::start()`, and then fetch the pipeline
|
||||
processing results:
|
||||
|
||||
@snippet cpp/tutorial_code/gapi/face_beautification/face_beautification.cpp str_src
|
||||
@snippet cpp/tutorial_code/gapi/face_beautification/face_beautification.cpp str_loop
|
||||
|
||||
Once results are ready and can be pulled from the pipeline, we display
them on the screen and handle GUI events.
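
Put together, a simplified version of the streaming part might look like the
sketch below; it omits the sample's visualization details, and `pipeline`,
`kernels`, `networks`, and `videoPath` stand for the objects built in the
previous sections:

```cpp
auto stream = pipeline.compileStreaming(cv::compile_args(kernels, networks));
stream.setSource(cv::gin(cv::gapi::wip::make_src<cv::gapi::wip::GCaptureSource>(videoPath)));
stream.start();

cv::Mat outFrame;
while (stream.pull(cv::gout(outFrame)))   // blocking pull; returns false when the stream ends
{
    cv::imshow("Face beautification", outFrame);
    if (cv::waitKey(1) >= 0) break;       // handle GUI events / allow early exit
}
```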
|
||||
|
||||
See [Running the pipeline](@ref gapi_ifd_running) section
|
||||
in the "Face Analytics Pipeline" tutorial for more details.
|
||||
|
||||
# Conclusion {#gapi_fb_cncl}
|
||||
|
||||
The tutorial has two goals: to show the use of brand-new features of
G-API introduced in OpenCV 4.2, and to give a basic understanding of a
sample face beautification algorithm.
|
||||
|
||||
The result of the algorithm application:
|
||||
|
||||

|
||||
|
||||
On the test machine (Intel® Core™ i7-8700) the G-API-optimized video
|
||||
pipeline outperforms its serial (non-pipelined) version by a factor of
|
||||
**2.7** -- meaning that for such a non-trivial graph, the proper
|
||||
pipelining can bring almost 3x increase in performance.
|
||||
|
||||
<!---
|
||||
The idea in general is to implement a real-time video stream processing that
|
||||
detects faces and applies some filters to make them look beautiful (more or
|
||||
less). The pipeline is the following:
|
||||
|
||||
Two topologies from OMZ have been used in this sample: the
|
||||
<a href="https://github.com/opencv/open_model_zoo/tree/master/models/intel
|
||||
/face-detection-adas-0001">face-detection-adas-0001</a>
|
||||
and the
|
||||
<a href="https://github.com/opencv/open_model_zoo/blob/master/models/intel
|
||||
/facial-landmarks-35-adas-0002/description/facial-landmarks-35-adas-0002.md">
|
||||
facial-landmarks-35-adas-0002</a>.
|
||||
|
||||
The face detector takes the input image and returns a blob with the shape
|
||||
[1,1,200,7] after the inference (200 is the maximum number of
|
||||
faces which can be detected).
|
||||
In order to process every face individually, we need to convert this output to a
|
||||
list of regions on the image.
|
||||
|
||||
The masks for different filters are built based on facial landmarks, which are
|
||||
inferred for every face. The result of the inference
|
||||
is a blob with 35 landmarks: the first 18 of them are facial elements
|
||||
(eyes, eyebrows, a nose, a mouth) and the last 17 --- a jaw contour. Landmarks
|
||||
are floating point values of coordinates normalized relatively to an input ROI
|
||||
(not the original frame). In addition, for the further goals we need contours of
|
||||
eyes, mouths, faces, etc., not the landmarks. So, post-processing of the Mat is
|
||||
also required here. The process is split into two parts --- landmarks'
|
||||
coordinates denormalization to the real pixel coordinates of the source frame
|
||||
and getting necessary closed contours based on these coordinates.
|
||||
|
||||
The last step of processing the inference data is drawing masks using the
|
||||
calculated contours. In this demo the contours don't need to be pixel accurate,
|
||||
since masks are blurred with Gaussian filter anyway. Another point that should
|
||||
be mentioned here is getting
|
||||
three masks (for areas to be smoothed, for ones to be sharpened and for the
|
||||
background) which have no intersections with each other; this approach allows to
|
||||
apply the calculated masks to the corresponding images prepared beforehand and
|
||||
then just to summarize them to get the output image without any other actions.
|
||||
|
||||
As we can see, this algorithm is appropriate to illustrate G-API usage
|
||||
convenience and efficiency in the context of solving a real CV/DL problem.
|
||||
|
||||
(On detector post-proc)
|
||||
Some points to be mentioned about this kernel implementation:
|
||||
|
||||
- It takes a `cv::Mat` from the detector and a `cv::Mat` from the input; it
|
||||
returns an array of ROI's where faces have been detected.
|
||||
|
||||
- `cv::Mat` data parsing by the pointer on a float is used here.
|
||||
|
||||
- By far the most important thing here is solving an issue that sometimes
|
||||
detector returns coordinates located outside of the image; if we pass such an
|
||||
ROI to be processed, errors in the landmarks detection will occur. The frame box
|
||||
`borders` is created and then intersected with the face rectangle
|
||||
(by `operator&()`) to handle such cases and save the ROI which is for sure
|
||||
inside the frame.
|
||||
|
||||
Data parsing after the facial landmarks detector happens according to the same
|
||||
scheme with inconsiderable adjustments.
|
||||
|
||||
|
||||
## Possible further improvements
|
||||
|
||||
There are some points in the algorithm to be improved.
|
||||
|
||||
### Correct ROI reshaping for meeting conditions required by the facial landmarks detector
|
||||
|
||||
The input of the facial landmarks detector is a square ROI, but the face
|
||||
detector gives non-square rectangles in general. If we let the backend within
|
||||
Inference-API compress the rectangle to a square by itself, the lack of
|
||||
inference accuracy can be noticed in some cases.
|
||||
There is a solution: we can give a describing square ROI instead of the
|
||||
rectangular one to the landmarks detector, so there will be no need to compress
|
||||
the ROI, which will lead to accuracy improvement.
|
||||
Unfortunately, another problem occurs if we do that:
|
||||
if the rectangular ROI is near the border, a describing square will probably go
|
||||
out of the frame --- that leads to errors of the landmarks detector.
|
||||
To avoid such a mistake, we have to implement an algorithm that, firstly,
|
||||
describes every rectangle by a square, then counts the farthest coordinates
|
||||
turned up to be outside of the frame and, finally, pads the source image by
|
||||
borders (e.g. single-colored) with the size counted. It will be safe to take
|
||||
square ROIs for the facial landmarks detector after that frame adjustment.
|
||||
|
||||
### Research for the best parameters (used in GaussianBlur() or unsharpMask(), etc.)
|
||||
|
||||
### Parameters autoscaling
|
||||
|
||||
-->
|
|
||||
# Face analytics pipeline with G-API {#tutorial_gapi_interactive_face_detection}
|
||||
|
||||
[TOC]
|
||||
|
||||
# Overview {#gapi_ifd_intro}
|
||||
|
||||
In this tutorial you will learn:
|
||||
* How to integrate Deep Learning inference in a G-API graph;
|
||||
* How to run a G-API graph on a video stream and obtain data from it.
|
||||
|
||||
# Prerequisites {#gapi_ifd_prereq}
|
||||
|
||||
This sample requires:
|
||||
- PC with GNU/Linux or Microsoft Windows (Apple macOS is supported but
|
||||
was not tested);
|
||||
- OpenCV 4.2 or later built with Intel® Distribution of [OpenVINO™
|
||||
Toolkit](https://docs.openvinotoolkit.org/) (building with [Intel®
|
||||
TBB](https://www.threadingbuildingblocks.org/intel-tbb-tutorial) is
|
||||
a plus);
|
||||
- The following topologies from OpenVINO™ Toolkit [Open Model
|
||||
Zoo](https://github.com/opencv/open_model_zoo):
|
||||
- `face-detection-adas-0001`;
|
||||
- `age-gender-recognition-retail-0013`;
|
||||
- `emotions-recognition-retail-0003`.
|
||||
|
||||
# Introduction: why G-API {#gapi_ifd_why}
|
||||
|
||||
Many computer vision algorithms run on a video stream rather than on
|
||||
individual images. Stream processing usually consists of multiple
|
||||
steps -- like decode, preprocessing, detection, tracking,
|
||||
classification (on detected objects), and visualization -- forming a
|
||||
*video processing pipeline*. Moreover, many of these steps of such a
pipeline can run in parallel -- modern platforms have different
|
||||
hardware blocks on the same chip like decoders and GPUs, and extra
|
||||
accelerators can be plugged in as extensions, like Intel® Movidius™
|
||||
Neural Compute Stick for deep learning offload.
|
||||
|
||||
Given this manifold of options and the variety of video analytics
algorithms, managing such pipelines effectively quickly becomes a
problem. Of course it can be done manually, but this approach doesn't
scale: if a change is required in the algorithm (e.g. a new pipeline
step is added), or if it is ported to a new platform with different
capabilities, the whole pipeline needs to be re-optimized.
|
||||
|
||||
Starting with version 4.2, OpenCV offers a solution to this
|
||||
problem. OpenCV G-API can now manage Deep Learning inference (a
cornerstone of any modern analytics pipeline) together with traditional
Computer Vision as well as video capturing/decoding, all in a single
|
||||
pipeline. G-API takes care of pipelining itself -- so if the algorithm
|
||||
or platform changes, the execution model adapts to it automatically.
|
||||
|
||||
# Pipeline overview {#gapi_ifd_overview}
|
||||
|
||||
Our sample application is based on ["Interactive Face Detection"] demo
|
||||
from OpenVINO™ Toolkit Open Model Zoo. A simplified pipeline consists
|
||||
of the following steps:
|
||||
1. Image acquisition and decode;
|
||||
2. Detection with preprocessing;
|
||||
3. Classification with preprocessing for every detected object with
|
||||
two networks;
|
||||
4. Visualization.
|
||||
|
||||
\dot
|
||||
digraph pipeline {
|
||||
node [shape=record fontname=Helvetica fontsize=10 style=filled color="#4c7aa4" fillcolor="#5b9bd5" fontcolor="white"];
|
||||
edge [color="#62a8e7"];
|
||||
splines=ortho;
|
||||
|
||||
rankdir = LR;
|
||||
subgraph cluster_0 {
|
||||
color=invis;
|
||||
capture [label="Capture\nDecode"];
|
||||
resize [label="Resize\nConvert"];
|
||||
detect [label="Detect faces"];
|
||||
capture -> resize -> detect
|
||||
}
|
||||
|
||||
subgraph cluster_1 {
|
||||
graph[style=dashed];
|
||||
|
||||
subgraph cluster_2 {
|
||||
color=invis;
|
||||
temp_4 [style=invis shape=point width=0];
|
||||
postproc_1 [label="Crop\nResize\nConvert"];
|
||||
age_gender [label="Classify\nAge/gender"];
|
||||
postproc_1 -> age_gender [constraint=true]
|
||||
temp_4 -> postproc_1 [constraint=none]
|
||||
}
|
||||
|
||||
subgraph cluster_3 {
|
||||
color=invis;
|
||||
postproc_2 [label="Crop\nResize\nConvert"];
|
||||
emo [label="Classify\nEmotions"];
|
||||
postproc_2 -> emo [constraint=true]
|
||||
}
|
||||
label="(for each face)";
|
||||
}
|
||||
|
||||
temp_1 [style=invis shape=point width=0];
|
||||
temp_2 [style=invis shape=point width=0];
|
||||
detect -> temp_1 [arrowhead=none]
|
||||
temp_1 -> postproc_1
|
||||
|
||||
capture -> {temp_4, temp_2} [arrowhead=none constraint=false]
|
||||
temp_2 -> postproc_2
|
||||
|
||||
temp_1 -> temp_2 [arrowhead=none constraint=false]
|
||||
|
||||
temp_3 [style=invis shape=point width=0];
|
||||
show [label="Visualize\nDisplay"];
|
||||
|
||||
{age_gender, emo} -> temp_3 [arrowhead=none]
|
||||
temp_3 -> show
|
||||
}
|
||||
\enddot
|
||||
|
||||
# Constructing a pipeline {#gapi_ifd_constructing}
|
||||
|
||||
Constructing a G-API graph for a video streaming case does not differ
|
||||
much from a [regular usage](@ref gapi_example) of G-API -- it is still
|
||||
about defining graph *data* (with cv::GMat, cv::GScalar, and
|
||||
cv::GArray) and *operations* over it. Inference also becomes an
|
||||
operation in the graph, but is defined in a little bit different way.
|
||||
|
||||
## Declaring Deep Learning topologies {#gapi_ifd_declaring_nets}
|
||||
|
||||
In contrast with traditional CV functions (see [core] and [imgproc]),
where G-API declares distinct operations for every function, inference
in G-API is a single generic operation, cv::gapi::infer<>. As usual, it
is just an interface and can be implemented in a number of ways under
the hood. In OpenCV 4.2, only an OpenVINO™ Inference Engine-based backend
is available; OpenCV's own DNN-module-based backend is yet to come.
|
||||
|
||||
cv::gapi::infer<> is _parametrized_ by the details of a topology we are
|
||||
going to execute. Like operations, topologies in G-API are strongly
|
||||
typed and are defined with a special macro G_API_NET():
|
||||
|
||||
@snippet cpp/tutorial_code/gapi/age_gender_emotion_recognition/age_gender_emotion_recognition.cpp G_API_NET
|
||||
|
||||
Similar to how operations are defined with G_API_OP(), network
|
||||
description requires three parameters:
|
||||
1. A type name. Every defined topology is declared as a distinct C++
|
||||
type which is used further in the program -- see below;
|
||||
2. A `std::function<>`-like API signature. G-API treats networks as
regular "functions" which take and return data. Here the network
`Faces` (a detector) takes a cv::GMat and returns a cv::GMat, while
the network `AgeGender` is known to provide two outputs (age and gender
blobs, respectively) -- so it has a `std::tuple<>` as its return
type.
|
||||
3. A topology name -- it can be any non-empty string; G-API uses
these names to distinguish networks internally. Names should be unique
in the scope of a single graph.
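
For illustration, such declarations could look like this (the tags and type
names are examples; the tutorial's real snippet is referenced above):

```cpp
#include <opencv2/gapi/infer.hpp>

G_API_NET(Faces,     <cv::GMat(cv::GMat)>,  "face-detector");

using AGInfo = std::tuple<cv::GMat, cv::GMat>;
G_API_NET(AgeGender, <AGInfo(cv::GMat)>,    "age-gender-recognition");

G_API_NET(Emotions,  <cv::GMat(cv::GMat)>,  "emotions-recognition");
```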
|
||||
|
||||
## Building a GComputation {#gapi_ifd_gcomputation}
|
||||
|
||||
Now the above pipeline is expressed in G-API like this:
|
||||
|
||||
@snippet cpp/tutorial_code/gapi/age_gender_emotion_recognition/age_gender_emotion_recognition.cpp GComputation
|
||||
|
||||
Every pipeline starts with declaring empty data objects -- which act
|
||||
as inputs to the pipeline. Then we call a generic cv::gapi::infer<>
|
||||
specialized to `Faces` detection network. cv::gapi::infer<> inherits its
|
||||
signature from its template parameter -- and in this case it expects
|
||||
one input cv::GMat and produces one output cv::GMat.
|
||||
|
||||
In this sample we use a pre-trained SSD-based network and its output
|
||||
needs to be parsed to an array of detections (object regions of
|
||||
interest, ROIs). It is done by a custom operation `custom::PostProc`,
|
||||
which returns an array of rectangles (of type `cv::GArray<cv::Rect>`)
|
||||
back to the pipeline. This operation also filters out results by a
|
||||
confidence threshold -- and these details are hidden in the kernel
|
||||
itself. Still, at the moment of graph construction we operate with
|
||||
interfaces only and don't need actual kernels to express the pipeline
|
||||
-- so the implementation of this post-processing will be listed later.
|
||||
|
||||
After the detection output is parsed into an array of objects, we can run
classification on any of them. G-API doesn't support syntax for
in-graph loops like `for_each()` yet; instead, cv::gapi::infer<>
comes with a special list-oriented overload.
|
||||
|
||||
A user can call cv::gapi::infer<> with a cv::GArray as the first
argument; G-API then assumes it needs to run the associated network
on every rectangle from the given list on the given frame (the second
argument). The result of such an operation is also a list -- a cv::GArray of
cv::GMat.
|
||||
|
||||
Since the `AgeGender` network itself produces two outputs, its output
type for the list-based version of cv::gapi::infer is a tuple of
arrays. We use `std::tie()` to decompose this output into two distinct
objects.
|
||||
|
||||
The `Emotions` network produces a single output, so its list-based
inference return type is `cv::GArray<cv::GMat>`.
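
A sketch of these calls as they appear during graph construction; variable
names are illustrative, and `custom::PostProc` is the post-processing
operation described later in this tutorial:

```cpp
cv::GMat in;
cv::GMat detections           = cv::gapi::infer<Faces>(in);            // frame-oriented inference
cv::GArray<cv::Rect> faces    = custom::PostProc::on(detections, in);  // SSD output -> list of ROIs
cv::GArray<cv::GMat> ages, genders;
std::tie(ages, genders)       = cv::gapi::infer<AgeGender>(faces, in); // two outputs per face
cv::GArray<cv::GMat> emotions = cv::gapi::infer<Emotions>(faces, in);  // one output per face
```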
|
||||
|
||||
# Configuring the pipeline {#gapi_ifd_configuration}
|
||||
|
||||
G-API strictly separates construction from configuration -- with the
|
||||
idea to keep algorithm code itself platform-neutral. In the above
|
||||
listings we only declared our operations and expressed the overall
|
||||
data flow, but didn't even mention that we use OpenVINO™. We only
|
||||
described *what* we do, but not *how* we do it. Keeping these two
|
||||
aspects clearly separated is the design goal for G-API.
|
||||
|
||||
Platform-specific details arise when the pipeline is *compiled* --
|
||||
i.e. is turned from a declarative to an executable form. The way *how*
|
||||
to run stuff is specified via compilation arguments, and new
|
||||
inference/streaming features are no exception from this rule.
|
||||
|
||||
G-API is built on backends which implement interfaces (see
|
||||
[Architecture] and [Kernels] for details) -- thus cv::gapi::infer<> is
|
||||
a function which can be implemented by different backends. In OpenCV
|
||||
4.2, only OpenVINO™ Inference Engine backend for inference is
|
||||
available. Every inference backend in G-API has to provide a special
|
||||
parameterizable structure to express *backend-specific* neural network
|
||||
parameters -- and in this case, it is cv::gapi::ie::Params:
|
||||
|
||||
@snippet cpp/tutorial_code/gapi/age_gender_emotion_recognition/age_gender_emotion_recognition.cpp Param_Cfg
|
||||
|
||||
Here we define three parameter objects: `det_net`, `age_net`, and
`emo_net`. Every object is a cv::gapi::ie::Params structure
parametrizing one particular network we use. At the compilation
stage, G-API automatically matches network parameters with their
cv::gapi::infer<> calls in the graph using this information.
|
||||
|
||||
Regardless of the topology, every parameter structure is constructed
|
||||
with three string arguments -- specific to the OpenVINO™ Inference
|
||||
Engine:
|
||||
1. Path to the topology's intermediate representation (.xml file);
|
||||
2. Path to the topology's model weights (.bin file);
|
||||
3. Device where to run -- "CPU", "GPU", and others -- based on your
|
||||
OpenVINO™ Toolkit installation.
|
||||
These arguments are taken from the command-line parser.
|
||||
|
||||
Once networks are defined and custom kernels are implemented, the
|
||||
pipeline is compiled for streaming:
|
||||
|
||||
@snippet cpp/tutorial_code/gapi/age_gender_emotion_recognition/age_gender_emotion_recognition.cpp Compile
|
||||
|
||||
cv::GComputation::compileStreaming() triggers a special video-oriented
form of graph compilation where G-API tries to optimize
throughput. The result of this compilation is an object of the special type
cv::GStreamingCompiled -- in contrast to a traditional callable
cv::GCompiled, these objects are closer to media players in their
semantics.
|
||||
|
||||
@note There is no need to pass metadata arguments describing the
format of the input video stream to
cv::GComputation::compileStreaming() -- G-API automatically figures out
the formats of the input vector and adjusts the pipeline to
these formats on the fly. A user can still pass metadata there, as with the
regular cv::GComputation::compile(), in order to fix the pipeline to
a specific input format.
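
As a sketch, both forms below compile the same graph for streaming; the second
pins the pipeline to a particular input format up front. The metadata values
are illustrative, and depending on your OpenCV version the metadata may need
to be wrapped into cv::GMetaArgs instead of being passed directly:

```cpp
// Let G-API derive input metadata from the actual stream:
auto cc1 = graph.compileStreaming(cv::compile_args(kernels, networks));

// Or fix the expected input format explicitly:
auto cc2 = graph.compileStreaming(cv::GMatDesc{CV_8U, 3, cv::Size(1280, 720)},
                                  cv::compile_args(kernels, networks));
```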
|
||||
|
||||
# Running the pipeline {#gapi_ifd_running}
|
||||
|
||||
Pipelining optimization is based on processing multiple input video
|
||||
frames simultaneously, running different steps of the pipeline in
|
||||
parallel. This is why it works best when the framework takes full
|
||||
control over the video stream.
|
||||
|
||||
The idea behind streaming API is that user specifies an *input source*
|
||||
to the pipeline and then G-API manages its execution automatically
|
||||
until the source ends or user interrupts the execution. G-API pulls
|
||||
new image data from the source and passes it to the pipeline for
|
||||
processing.
|
||||
|
||||
Streaming sources are represented by the interface
|
||||
cv::gapi::wip::IStreamSource. Objects implementing this interface may
|
||||
be passed to `GStreamingCompiled` as regular inputs via `cv::gin()`
|
||||
helper function. In OpenCV 4.2, only one streaming source is allowed
|
||||
per pipeline -- this requirement will be relaxed in the future.
|
||||
|
||||
OpenCV comes with a great class cv::VideoCapture and by default G-API
|
||||
ships with a stream source class based on it --
|
||||
cv::gapi::wip::GCaptureSource. Users can implement their own
|
||||
streaming sources e.g. using [VAAPI] or other Media or Networking
|
||||
APIs.
|
||||
|
||||
Sample application specifies the input source as follows:
|
||||
|
||||
@snippet cpp/tutorial_code/gapi/age_gender_emotion_recognition/age_gender_emotion_recognition.cpp Source
|
||||
|
||||
Please note that a GComputation may still have multiple inputs like
|
||||
cv::GMat, cv::GScalar, or cv::GArray objects. User can pass their
|
||||
respective host-side types (cv::Mat, cv::Scalar, std::vector<>) in the
|
||||
input vector as well, but in Streaming mode these objects will create
|
||||
"endless" constant streams. Mixing a real video source stream and a
|
||||
const data stream is allowed.
|
||||
|
||||
Running a pipeline is easy -- just call
|
||||
cv::GStreamingCompiled::start() and fetch your data with blocking
|
||||
cv::GStreamingCompiled::pull() or non-blocking
|
||||
cv::GStreamingCompiled::try_pull(); repeat until the stream ends:
|
||||
|
||||
@snippet cpp/tutorial_code/gapi/age_gender_emotion_recognition/age_gender_emotion_recognition.cpp Run
|
||||
|
||||
The above code may look complex but in fact it handles two modes --
|
||||
with and without graphical user interface (GUI):
|
||||
- When a sample is running in a "headless" mode (`--pure` option is
|
||||
set), this code simply pulls data from the pipeline with the
|
||||
blocking `pull()` until it ends. This is the most performant mode of
|
||||
execution.
|
||||
- When results are also displayed on the screen, the Window System
|
||||
needs to take some time to refresh the window contents and handle
|
||||
GUI events. In this case, the demo pulls data with a non-blocking
|
||||
`try_pull()` until there is no more data available (which does not
mean the end of the stream -- just that new data is not ready yet), and
|
||||
only then displays the latest obtained result and refreshes the
|
||||
screen. Reducing the time spent in GUI with this trick increases the
|
||||
overall performance a little bit.
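
A condensed sketch of that GUI-mode logic, where `pipeline` is the
cv::GStreamingCompiled object (simplified relative to the sample's actual
loop):

```cpp
cv::Mat frame;
while (pipeline.running())
{
    bool gotAny = false;
    while (pipeline.try_pull(cv::gout(frame)))  // non-blocking: false when nothing is ready yet
    {
        gotAny = true;                          // keep draining; remember the latest result
    }
    if (gotAny)
    {
        cv::imshow("Out", frame);
    }
    cv::waitKey(1);                             // let the window system handle its events
}
```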
|
||||
|
||||
# Comparison with serial mode {#gapi_ifd_comparison}
|
||||
|
||||
The sample can also run in a serial mode for reference and
benchmarking purposes. In this case, a regular
cv::GComputation::compile() is used and a regular single-frame
cv::GCompiled object is produced; the pipelining optimization is not
applied within G-API; it is the user's responsibility to acquire image
frames from the cv::VideoCapture object and pass them to G-API.
|
||||
|
||||
@snippet cpp/tutorial_code/gapi/age_gender_emotion_recognition/age_gender_emotion_recognition.cpp Run_Serial
|
||||
|
||||
On a test machine (Intel® Core™ i5-6600), with OpenCV built with
|
||||
[Intel® TBB]
|
||||
support, the detector network assigned to the CPU, and the classifiers
to the iGPU, the pipelined sample outperforms the serial one by a factor of
1.36x (thus adding +36% in overall throughput).
|
||||
|
||||
# Conclusion {#gapi_ifd_conclusion}
|
||||
|
||||
G-API introduces a technological way to build and optimize hybrid
|
||||
pipelines. Switching to a new execution model does not require changes
|
||||
in the algorithm code expressed with G-API -- only the way the graph
is triggered differs.
|
||||
|
||||
# Listing: post-processing kernel {#gapi_ifd_pp}
|
||||
|
||||
G-API gives an easy way to plug custom code into the pipeline even if
|
||||
it is running in a streaming mode and processing tensor
|
||||
data. Inference results are represented by multi-dimensional cv::Mat
|
||||
objects so accessing those is as easy as with a regular DNN module.
|
||||
|
||||
The OpenCV-based SSD post-processing kernel is defined and implemented in this
|
||||
sample as follows:
|
||||
|
||||
@snippet cpp/tutorial_code/gapi/age_gender_emotion_recognition/age_gender_emotion_recognition.cpp Postproc
|
||||
|
||||
["Interactive Face Detection"]: https://github.com/opencv/open_model_zoo/tree/master/demos/interactive_face_detection_demo
|
||||
[core]: @ref gapi_core
|
||||
[imgproc]: @ref gapi_imgproc
|
||||
[Architecture]: @ref gapi_hld
|
||||
[Kernels]: @ref gapi_kernel_api
|
||||
[VAAPI]: https://01.org/vaapi
|
|
||||
# Graph API (gapi module) {#tutorial_table_of_content_gapi}
|
||||
|
||||
In this section you will learn about graph-based image processing and
|
||||
how G-API module can be used for that.
|
||||
|
||||
- @subpage tutorial_gapi_interactive_face_detection
|
||||
|
||||
*Languages:* C++
|
||||
|
||||
*Compatibility:* \> OpenCV 4.2
|
||||
|
||||
*Author:* Dmitry Matveev
|
||||
|
||||
This tutorial illustrates how to build a hybrid video processing
|
||||
pipeline with G-API where Deep Learning and image processing are
|
||||
combined effectively to maximize the overall throughput. This
|
||||
sample requires Intel® distribution of OpenVINO™ Toolkit version
|
||||
2019R2 or later.
|
||||
|
||||
- @subpage tutorial_gapi_anisotropic_segmentation
|
||||
|
||||
*Languages:* C++
|
||||
|
||||
*Compatibility:* \> OpenCV 4.0
|
||||
|
||||
*Author:* Dmitry Matveev
|
||||
|
||||
This is an end-to-end tutorial where an existing sample algorithm
|
||||
is ported on G-API, covering the basic intuition behind this
|
||||
transition process, and examining benefits which a graph model
|
||||
brings there.
|
||||
|
||||
- @subpage tutorial_gapi_face_beautification
|
||||
|
||||
*Languages:* C++
|
||||
|
||||
*Compatibility:* \> OpenCV 4.2
|
||||
|
||||
*Author:* Orest Chura
|
||||
|
||||
In this tutorial we build a complex hybrid Computer Vision/Deep
|
||||
Learning video processing pipeline with G-API.
|