Compilation time reduction and housekeeping
+ 21
− 27
@@ -3,18 +3,14 @@ layout: fr
@@ -30,7 +26,7 @@ milestones:
@@ -114,7 +110,7 @@ It becomes obvious, that the wallet2.h is the largest "hot spot" of the whole pr
What can be done about this is to create as many wrappers around the boost library as possible and to move as much implementation code as possible into .cpp files, instead of insisting on writing it inline when these spots aren't bottlenecks. Making a trivial method inline may help, but only when it's called very frequently, and only if that improvement is large relative to the cost of the rest of the software, which it usually isn't. Inlining has to be justified by profiling the software rather than being a default policy, since by default it brings nothing while costing a lot: not only is the inlined code recompiled in every .cpp file that includes it, multiple times in one session, but all of those files are recompiled again upon every change of the inlined implementation.
@@ -125,45 +121,43 @@ Previous text:
Static libraries tend to grow exponentially in size and slow down the generation of the final binaries. I would like to enable opt-in dynamic linkage in CMake for development purposes. Also, whenever you are done writing a new test and would like to just modify the production code and execute the test, the test binary can be set up so that it doesn't have to relink when the production code's shared library changes.
Monero can't currently be compiled with Clang. If it could, some advanced tools could be employed that help in dynamic assessment of the quality of the code from many perspectives. For my purpose, I could use [ClangBuildAnalyzer](https://github.com/aras-p/ClangBuildAnalyzer), which gives an objective measurement of which parts of the code take the longest to compile. There's also the Clang-based [Include What You Use](https://include-what-you-use.org/) tool, which not only gives advice on how to optimize the bottlenecks, but also tries to do it automatically.
Some advanced tools can be employed that help in dynamic assessment of the quality of the code from many perspectives. For my purpose, I could use [ClangBuildAnalyzer](https://github.com/aras-p/ClangBuildAnalyzer), which gives an objective measurement of which parts of the code take the longest to compile. There's also the Clang-based [Include What You Use](https://include-what-you-use.org/) tool, which not only gives advice on how to optimize the bottlenecks, but also tries to do it automatically (however, it's better to use it just as a hint).
Similar to the above, but done weekly, since this will take more time. The context is somewhat different here, however. Valgrind can run performance tests and catch new bottlenecks by executing the tests. I think it would be beneficial if such reports were publicly available, since generating them costs plenty of time.
There will surely be situations where a boost header cannot be reasonably wrapped because it is used in template code. Such headers are best handled by precompiled headers, which reduce the compilation time by up to 50% per precompiled header. CMake 3.16 is able to generate them natively. Since some users will still be using older versions of CMake, this has to remain optional. I will start with this one before moving the headers away, as this is low-hanging fruit delivered by the CMake devs.
If the compilation is to be made faster, all of the large 3rd party headers have to be moved out of our headers, thus preventing them from propagating into files that don't need them and would waste time parsing them. This can be done via forward declarations and careful analysis of the dependency tree.
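As a minimal sketch of the forward-declaration idea, the block below simulates a header part and a .cpp part in one file. All names (`third_party::big_container`, `count_entries`) are hypothetical stand-ins and are not taken from the Monero code base; the "large header" is represented by a small class defined only in the implementation part.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// ---- Part that would live in our lightweight header ----
// Forward declaration only: clients of this header never parse the
// full (potentially very large) third-party definition.
namespace third_party { class big_container; }

// Taking the argument by reference works with an incomplete type.
std::size_t count_entries(const third_party::big_container& c);

// ---- Part that would live in the .cpp file ----
// Hypothetical stand-in for what a large 3rd party header provides.
namespace third_party {
class big_container {
public:
    void add(std::string s) { items_.push_back(std::move(s)); }
    std::size_t size() const { return items_.size(); }
private:
    std::vector<std::string> items_;
};
}  // namespace third_party

// The full definition is only needed here, where members are accessed.
std::size_t count_entries(const third_party::big_container& c) {
    return c.size();
}
```

Only the translation unit that calls members of `big_container` pays for parsing its definition; files that merely pass references around compile against the one-line forward declaration.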
My analysis of such headers shows that there are currently 369 occurrences of boost headers. Since each compilation costs 8.5 minutes and each change 2.5 minutes, we are at 11/60 * 369 = 67.65h of active work, excluding the time for testing and verifying the speed improvement (passive work). This leaves us with 27 XMR for the active work. Let's round it up to 30 because of uncertainty and required passive work, as well as power costs. This forces me to split the task into 3 parts for simplicity. But as before, if I'm done earlier than I calculated, I will admit this and will report the work time for each PR.
My analysis of such headers shows that there are currently 369 occurrences of boost headers. Since each compilation costs 8.5 minutes and each change 2.5 minutes, we are at 11/60 * 369 = 67.65h of active work, excluding the time for testing and verifying the speed improvement (passive work). This leaves us with 21.6 XMR for the active work. Let's round it up to 25 because of uncertainty and required passive work, as well as power costs. This forces me to split the task into 3 parts for simplicity. But as before, if I'm done earlier than I calculated, I will admit this and will report the work time for each PR.
It would be of great help if abstractions (interfaces) were used instead of concrete implementations. Then you can easily share just the forward declarations of the parts of the interface that the client doesn't use, and include only those parts which are needed. This can be achieved quite easily by creating and returning a unique pointer to an object of an implementation within a static function of the interface.
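A minimal sketch of that pattern follows, with the header and .cpp parts simulated in one file. The names `i_key_store` and `key_store_impl` are hypothetical and chosen for illustration only; they do not refer to existing Monero classes.

```cpp
#include <cstddef>
#include <memory>
#include <set>
#include <string>

// ---- Part that would live in the interface header ----
class i_key_store {
public:
    virtual ~i_key_store() = default;
    virtual void put(const std::string& key) = 0;
    virtual std::size_t count() const = 0;

    // Static factory: clients obtain an implementation without ever
    // including the header of the concrete class.
    static std::unique_ptr<i_key_store> create();
};

// ---- Part that would live in the .cpp file ----
namespace {
// Concrete implementation, invisible outside this translation unit.
// Heavy includes (here just <set>) would be confined to the .cpp file.
class key_store_impl final : public i_key_store {
public:
    void put(const std::string& key) override { keys_.insert(key); }
    std::size_t count() const override { return keys_.size(); }
private:
    std::set<std::string> keys_;
};
}  // namespace

std::unique_ptr<i_key_store> i_key_store::create() {
    // Plain `new` keeps the sketch valid for C++11 as well.
    return std::unique_ptr<i_key_store>(new key_store_impl());
}
```

A client would write `auto store = i_key_store::create();` and work only through the interface, so changes to `key_store_impl` trigger a recompile of a single .cpp file rather than of every client.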
There are 358 .cpp files, and definitely more classes than that. If I were to start from the "hottest" 50 classes first, to achieve the largest results at the beginning, I'd need 20 hours, assuming 15 minutes of active work on a class and 8.5 minutes of compilation time ((8.5+15)/60 * 50 = 19.58). This would equate to 7.8 XMR. Rounding up for the power costs, let's say 8 XMR.
There are 358 .cpp files, and definitely more classes than that. If I were to start from the "hottest" 50 classes first, to achieve the largest results at the beginning, I'd need 20 hours, assuming 15 minutes of active work on a class and 8.5 minutes of compilation time ((8.5+15)/60 * 50 = 19.58). This would equate to 6.26 XMR. Rounding up for the power costs, let's say 7 XMR.
Did you know that ctest allows for running the tests in parallel, just like make does? The problem is that if they use the same resources during execution, they might (and in our case they do) affect each other. The task here would be to group the tests which use the same resources and run each group sequentially, while running the different groups in parallel.
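The grouping step could be sketched as below. This only illustrates the partitioning logic, under the assumption that each test declares at most one shared resource; the `test_case` struct, the resource names and `group_by_resource` are hypothetical and are not part of ctest.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical description of a test: its name and the single shared
// resource it touches (empty string = no shared resource).
struct test_case {
    std::string name;
    std::string resource;
};

// Partition tests into groups. Tests inside one group share a resource
// and must run sequentially; different groups may run in parallel.
std::vector<std::vector<std::string>>
group_by_resource(const std::vector<test_case>& tests) {
    std::vector<std::vector<std::string>> groups;
    std::map<std::string, std::size_t> group_index;  // resource -> group
    for (const auto& t : tests) {
        if (t.resource.empty()) {
            // Independent test: gets a group of its own.
            groups.push_back(std::vector<std::string>{t.name});
            continue;
        }
        auto it = group_index.find(t.resource);
        if (it == group_index.end()) {
            it = group_index.emplace(t.resource, groups.size()).first;
            groups.emplace_back();
        }
        groups[it->second].push_back(t.name);
    }
    return groups;
}
```

In an actual CMake setup, the same effect can likely be expressed declaratively with the `RESOURCE_LOCK` test property, which prevents tests naming the same resource from running concurrently under `ctest -j`.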
I'd like to address here the problems mentioned by Endogenic ([highlighted at Konferenco](https://www.youtube.com/watch?v=AsJaMw-3gGE&feature=youtu.be&t=25614)) (thanks, Scott Anecito!), namely making the wallet2 as stateless as possible. I propose here 4 XMR, as this is one of the largest classes in the whole project (if not the largest).
@@ -173,4 +167,4 @@ Shall we make it 3 XMR?
\ No newline at end of file