Friday, April 29, 2016

Evaluation of the Qt Quick Scene Graph Performance

To get a grasp of what I should do to achieve the best performance, I played a little with the Qt Quick Scene Graph.

Difference between the QPainter approach and the Qt Quick Scene Graph

QPainter, which is the basis of drawing in KStars, works imperatively, whereas the Qt Quick Scene Graph follows a declarative paradigm. In the Scene Graph you add a set of "nodes" (classes with the QSG prefix) to the root node that you return from QQuickItem::updatePaintNode(), which is called whenever the QQuickItem has to be rendered, and you manipulate those nodes at runtime (change their position, geometry, material, etc.). This lets the renderer perform optimizations such as batching nodes so that they are drawn in fewer OpenGL calls, which can be of tremendous help for us when drawing stars, for example.
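To make the batching idea concrete, here is a toy model in plain C++ (no Qt involved). It is only a sketch under a strong assumption: nodes are opaque and do not overlap, so every node that shares a material can be merged into one draw call. ToyNode, makeNodes, drawCallsUnbatched and drawCallsBatched are made-up names for illustration, and the real Scene Graph renderer applies more rules than this.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a scene graph node: only the render state
// ("material") matters in this model, not the geometry.
struct ToyNode {
    std::string material;
};

// Builds `count` nodes that cycle through `distinctMaterials` materials.
std::vector<ToyNode> makeNodes(std::size_t count, std::size_t distinctMaterials) {
    assert(distinctMaterials > 0);
    std::vector<ToyNode> nodes;
    nodes.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        nodes.push_back({"material-" + std::to_string(i % distinctMaterials)});
    return nodes;
}

// Imperative-style drawing: every node is submitted on its own.
std::size_t drawCallsUnbatched(const std::vector<ToyNode> &nodes) {
    return nodes.size();
}

// Batched drawing under the model's assumption: nodes sharing a material
// are merged, so there is one draw call per distinct material.
std::size_t drawCallsBatched(const std::vector<ToyNode> &nodes) {
    std::set<std::string> materials;
    for (const ToyNode &node : nodes)
        materials.insert(node.material);
    return materials.size();
}
```

In this model, 5000 nodes that all share one material need 5000 calls unbatched but only one call batched.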

SkyMapLite

My initial idea for SkyMapLite in KStars Lite was to create a separate QQuickItem-derived object for every SkyComponent and reparent it to SkyMapLite, thus making it visible. The QQuickItem-derived object would contain the painting code and communicate with its SkyComponent to get the actual coordinates of the object in the sky and convert them to "x" and "y" coordinates using Projector. However, it seemed to me that this approach would cause a lot of unnecessary work, such as emitting signals every time the "x", "y" and "visible" properties change.

Another approach is to ask SkyMapLite directly to instantiate a QSGNode-derived object, which would visually represent the SkyComponent, and add it to the root node of SkyMapLite, thus bypassing QQuickItem. However, this would make the code more complicated, because on most platforms the Scene Graph lives in a separate thread and the possibilities for communicating with the GUI thread from QSGNodes are somewhat limited.

To test both approaches I made two versions of a program that simply shows 5000 .png images of blue balls at a resolution of 16x16 (using images whose dimensions are a power of 2 allows better batching). In the first version each ball is a separate QQuickItem-derived object with an overridden updatePaintNode() function. The second version consists of a single object of class Canvas, which instantiates objects of a QSGNode-derived class that paint the same texture on the screen. I also played with plain rectangles instead of textures; you can find the results of my small test here. The table shows the CPU instruction counts needed to create the nodes and render the first frame of the program.

Areas with the same color are drawn in one batch. Visualized by setting the QSG_VISUALIZE environment variable to "batches".

As you can see from the test results, batching significantly reduces the number of instructions executed on the CPU. As for the difference between the two approaches, creating QSGNodes directly doesn't perform significantly better than instantiating QQuickItems.

What I noticed was that rectangles with different colors are not drawn in one batch. Later Gunnar Sletta, a very good expert in the Scene Graph, told me that I should use QSGGeometryNode + QSGVertexColorMaterial instead of QSGSimpleRectNode. Also, for textures to be drawn in one batch, the QQuickWindow::TextureCanUseAtlas flag has to be set when the texture is created, and to fit a lot of textures into the texture atlas you will probably have to set the QSG_ATLAS_WIDTH and QSG_ATLAS_HEIGHT environment variables to GL_MAX_TEXTURE_SIZE. You can visualize which textures are included in the atlas by setting QSG_ATLAS_OVERLAY to 1. The structure of all nodes in the Scene Graph can be obtained by setting QSG_RENDERER_DEBUG to "dump". You can read our full conversation here.
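The reason vertex colors help can be sketched without Qt: once the color is stored in each vertex instead of in the material, rectangles of different colors can share one vertex buffer and one render state. The snippet below is only an illustration in plain C++. ColoredVertex, Rgba and appendRect are hypothetical names. ColoredVertex mirrors the idea behind Qt's QSGGeometry::ColoredPoint2D (position plus RGBA per vertex), but it is not the Qt type, and this is not how the Scene Graph builds its buffers internally.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Position plus RGBA stored per vertex.
struct ColoredVertex {
    float x, y;
    std::uint8_t r, g, b, a;
};

struct Rgba {
    std::uint8_t r, g, b, a;
};

// Appends one axis-aligned rectangle as two triangles (6 vertices). The
// color travels with every vertex, so no per-rectangle state is needed.
void appendRect(std::vector<ColoredVertex> &buffer,
                float x, float y, float w, float h, Rgba c) {
    assert(w >= 0.0f && h >= 0.0f);
    const ColoredVertex tl{x, y, c.r, c.g, c.b, c.a};
    const ColoredVertex tr{x + w, y, c.r, c.g, c.b, c.a};
    const ColoredVertex bl{x, y + h, c.r, c.g, c.b, c.a};
    const ColoredVertex br{x + w, y + h, c.r, c.g, c.b, c.a};
    for (const ColoredVertex &v : {tl, bl, tr, tr, bl, br})
        buffer.push_back(v);
}
```

Appending a red and a blue rectangle to the same buffer yields 12 vertices that could be submitted together, whereas with a flat-color material each color would be a separate material state.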

Then I made the textures "movable" and tried to measure how many CPU instructions it takes to set the "x", "y" and "visible" properties when a QQuickItem is moved or leaves the screen. xChanged() and yChanged() each took about 2 500 000 counts during the few seconds of moving 5000 objects off and back onto the screen, and the visibleChanged() signal required 238.564 instructions to execute.
I tested the "moving" version with QQuickItems on my laptop and on a Nexus 7 (2nd gen) and got fairly smooth results, especially when some of the items were set to invisible because they were off the screen. I even managed to run it with 1000 objects on my old Motorola Defy with Android 2.3!
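Deciding which items to hide can be as simple as a rectangle overlap test. The following is a minimal sketch in plain C++, not the code from my test program. Rect and intersectsViewport are hypothetical names, and the result would be passed to something like QQuickItem::setVisible().

```cpp
#include <cassert>

// Hypothetical axis-aligned rectangle in window coordinates.
struct Rect {
    float x, y, width, height;
};

// True when the item rectangle overlaps the viewport at all. Items that
// only touch the viewport edge from outside count as off-screen.
bool intersectsViewport(const Rect &item, const Rect &viewport) {
    assert(item.width >= 0.0f && item.height >= 0.0f);
    return item.x < viewport.x + viewport.width &&
           item.x + item.width > viewport.x &&
           item.y < viewport.y + viewport.height &&
           item.y + item.height > viewport.y;
}
```

An item that fails this test can be marked invisible, so it stops contributing to rendering until it moves back into view.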

Results

I inserted a counter into the SkyObject constructor and saw that KStars creates about 14400 of them with the standard set of catalogs (this count is only meant to give an idea of the approximate number of objects). Various objects appear only at some zoom levels, and not all of them will be inside the user's current view. Given that, and even though we won't be able to batch everything in SkyMapLite because elements differ in their visual representation, I think that my initial approach with QQuickItem will be enough.
Doing all of this was very useful to me, as I used the Valgrind profiler for the first time and now understand the internals of the Scene Graph better.
From now on I will concentrate on prototyping the SkyComponentsLite derived objects and the details of SkyMapLite.
