Compilation time reduction and housekeeping
- mj authored
@@ -3,30 +3,62 @@ layout: fr
@@ -48,7 +80,7 @@ payouts:
@@ -82,26 +114,63 @@ It becomes obvious, that the wallet2.h is the largest "hot spot" of the whole pr
What can be done about this is to create as many wrappers around the Boost library as possible and to move as much implementation code as possible into .cpp files, instead of insisting on writing it inline when these spots aren't bottlenecks. Marking a trivial method as inline may help, but only when it's called very frequently, and only if the gain is a large share of the cost of the rest of the software, which it usually isn't. Inlining has to be justified by profiling the software, not applied as a default policy. By default it brings nothing while costing a lot, not only through repeated recompilation of the same code in multiple .cpp files within one session, but also through recompilation whenever the inlined implementation changes.
As I can't show you anything from my work, I propose to set the first milestone at just 1 XMR, to see whether there's any interest and whether we share the same philosophy. For the remaining milestones I'd like to earn around $40/h. It's hard to assess now how much time it will take, so I'm not strict about the concrete values.
I'd like the CMake script to automatically pick up ccache or clcache when it finds them in the PATH. These tools reduce the compilation time of compilation units (.cpp files and all their included headers) whose content hasn't changed. This means that the more forward declarations and the fewer included headers your headers have, the more ccache will be able to leverage your discipline. This is especially useful when switching between branches.
Static libraries tend to grow exponentially in size and slow down the generation of the final binaries. I would like to enable opt-in dynamic linkage in CMake for development purposes. Also, once you have written a new test and only want to modify the production code and rerun the test, the test binary can be set up so that it doesn't have to be relinked when the production code's shared library changes.
Monero currently can't be compiled with Clang. If it could, some advanced tools could be employed that help assess the quality of the code dynamically from many perspectives. For my purpose, I could use [ClangBuildAnalyzer](https://github.com/aras-p/ClangBuildAnalyzer), which gives an objective picture of which parts of the code take the longest to compile. There's also the Clang-based [Include What You Use](https://include-what-you-use.org/) tool, which not only advises how to optimize the bottlenecks, but also tries to do it automatically.
Similar to the above, but done weekly, since it takes more time. The context is somewhat different here, however: Valgrind can run performance tests and catch new bottlenecks by executing the tests. I think it would be beneficial if such reports were publicly available, since generating them costs plenty of time.
There will surely be situations where a Boost header cannot reasonably be wrapped, because it is used in template code. Such headers are best handled by precompiled headers, which reduce the compilation time by up to 50% per precompiled header. CMake 3.16 is able to generate them natively. Since some users will still be using older versions of CMake, this has to remain optional. I will start with this one before moving the headers away, as it is low-hanging fruit delivered by the CMake devs.
If compilation is to become faster, all large third-party headers have to be moved out of our headers, which prevents them from propagating into files that don't need them and wasting time on parsing. This can be done via forward declarations and careful analysis of the dependency tree.
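A minimal sketch of how a forward declaration keeps a heavy header out of a public header. `AddressBook` and its nested `Impl` are hypothetical names, and `<vector>` merely stands in for a large third-party header. In a real split, the part marked "source part" would live in a .cpp file, and only that file would include the heavy header.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// ---- header part (hypothetical address_book.h) ----
// Clients of this class only ever see a forward declaration of Impl.
class AddressBook {
public:
    AddressBook();
    ~AddressBook();                       // defined where Impl is complete
    void add(const std::string& name);
    std::size_t size() const;
private:
    struct Impl;                          // forward declaration only
    std::unique_ptr<Impl> impl_;
};

// ---- source part (hypothetical address_book.cpp) ----
#include <vector>                         // stand-in for a heavy third-party header

struct AddressBook::Impl {
    std::vector<std::string> names;
};

AddressBook::AddressBook() : impl_(new Impl()) {}
AddressBook::~AddressBook() = default;
void AddressBook::add(const std::string& name) { impl_->names.push_back(name); }
std::size_t AddressBook::size() const { return impl_->names.size(); }
```

A file that includes only the header part is not reparsed or recompiled when the container behind `Impl` changes, which is the effect the paragraph above is after.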
My header analysis shows that there are currently 369 occurrences of Boost headers. Since each compilation costs 8.5 minutes and each change 2.5 minutes, we are at 11/60 * 369 = 67.65 h of active work, excluding the time for testing and verifying the speed improvement (passive work). This comes to 27 XMR for the active work. Let's round it up to 30 because of the uncertainty and the required passive work, as well as power costs. This forces me to split the task into 3 parts for simplicity. But as before, if I'm done earlier than I calculated, I will admit it and report the work time for each PR.
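The estimate above can be checked with a few lines of compile-time arithmetic. The minutes and the occurrence count come from the paragraph, and the $40/h rate from earlier in the proposal. The exchange rate of 100 USD per XMR is an assumption inferred from the 27 XMR figure, since the text does not state it. `activeHours` and `hoursToXmr` are hypothetical helper names.

```cpp
// Figures taken from the paragraph above.
constexpr double kCompileMinutes = 8.5;   // one full compilation
constexpr double kChangeMinutes  = 2.5;   // one change
constexpr int    kBoostIncludes  = 369;   // occurrences of Boost headers

// Active hours: (compile + change) minutes per occurrence, converted to hours.
constexpr double activeHours(double compileMin, double changeMin, int occurrences) {
    return (compileMin + changeMin) / 60.0 * occurrences;
}

// Converts hours to XMR; usdPerXmr is an assumed exchange rate, not from the text.
constexpr double hoursToXmr(double hours, double usdPerHour, double usdPerXmr) {
    return hours * usdPerHour / usdPerXmr;
}

constexpr double kHours = activeHours(kCompileMinutes, kChangeMinutes, kBoostIncludes);
static_assert(kHours > 67.64 && kHours < 67.66, "matches the 67.65 h quoted in the text");
```

With the assumed rate, `hoursToXmr(kHours, 40.0, 100.0)` lands just above 27, in line with the 27 XMR quoted for the active work.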
I'd like to prove that serialization doesn't need to happen inside a header file, but can reside nicely in a .cpp file. This makes including the Boost serialization headers in the header file unnecessary. Notice how many fewer reasons to change (and thus to trigger recompiles) the header will then have on every protocol version change. I had a similar problem in my trading platform and got away with just forward declaring the Boost archive type in the header.
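A sketch of the pattern under simplifying assumptions: `TextArchive` is a hypothetical stand-in for a real archive type, and `Transfer` is a hypothetical record. The header part needs only a forward declaration of the archive, because it uses the archive by reference. The archive definition and the serialization body sit in the source part.

```cpp
#include <cstdint>
#include <string>

// ---- header part (hypothetical transfer.h) ----
class TextArchive;                        // forward declaration, no archive header

struct Transfer {
    std::uint64_t amount = 0;
    std::uint32_t height = 0;
    void save(TextArchive& ar) const;     // declared here, defined in the .cpp
};

// ---- source part (hypothetical transfer.cpp) ----
#include <sstream>                        // only the source file pays for this

// Stand-in archive; a real project would include its archive header here instead.
class TextArchive {
public:
    TextArchive& operator&(std::uint64_t v) { out_ << v << ' '; return *this; }
    std::string str() const { return out_.str(); }
private:
    std::ostringstream out_;
};

// The field list lives here, so a format change touches only this file.
void Transfer::save(TextArchive& ar) const {
    ar & amount & height;
}
```

Adding a field to the serialized format then changes the header only if the struct itself changes, and the archive machinery never reaches the header's clients.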
It would help a lot if abstractions (interfaces) were used instead of concrete implementations. Then you can easily share just the forward declarations of the unused parts of the interface with the client using the interface, and include only the parts that are needed. This can be achieved quite easily by creating and returning a unique pointer to an implementation object from a static function of the interface.
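The factory approach described above can be sketched as follows. `IWallet`, `WalletImpl`, and the `"addr-"` prefix are hypothetical and chosen only for illustration. The client sees the interface and the static `create` function, while the concrete class stays hidden in the source part.

```cpp
#include <memory>
#include <string>
#include <utility>

// ---- header part (hypothetical i_wallet.h) ----
class IWallet {
public:
    virtual ~IWallet() = default;
    virtual std::string address() const = 0;
    // Static factory: the only way for clients to obtain an implementation.
    static std::unique_ptr<IWallet> create(const std::string& seed);
};

// ---- source part (hypothetical wallet_impl.cpp) ----
namespace {
// Concrete implementation, invisible outside this translation unit.
class WalletImpl final : public IWallet {
public:
    explicit WalletImpl(std::string seed) : seed_(std::move(seed)) {}
    std::string address() const override { return "addr-" + seed_; }
private:
    std::string seed_;
};
}  // namespace

std::unique_ptr<IWallet> IWallet::create(const std::string& seed) {
    return std::unique_ptr<IWallet>(new WalletImpl(seed));
}
```

Client code would call `IWallet::create("abc")` and work with the returned pointer, so changes to `WalletImpl` and its includes do not force clients to recompile.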
There are 358 .cpp files, and definitely more classes than that. If I start with the 50 "hottest" classes, to achieve the largest results at the beginning, I'd need about 20 hours, assuming 15 minutes of active work per class and 8.5 minutes of compilation time ((8.5 + 15)/60 * 50 = 19.58). This would equate to 7.8 XMR. Rounding up for the power costs, let's say 8 XMR.
Did you know that ctest can run tests in parallel, just like make does? The problem is that tests using the same resources during execution might affect each other (and in our case they do). The task here would be to group the tests that use the same resources and run each group sequentially, while running the groups in parallel with one another.
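The grouping step can be sketched independently of any test runner. `TestCase`, its `resource` label, and `groupByResource` are hypothetical names. Each resulting group would run sequentially, and distinct groups could run in parallel. CTest itself offers the `RESOURCE_LOCK` test property for keeping tests that share a lock name from running at the same time. Whether this milestone would rely on it is not stated in the proposal.

```cpp
#include <map>
#include <string>
#include <vector>

// A test together with the shared resource it touches (hypothetical model).
struct TestCase {
    std::string name;
    std::string resource;   // e.g. a port number or a data directory
};

// Tests sharing a resource end up in the same group, in their original order.
std::map<std::string, std::vector<std::string>>
groupByResource(const std::vector<TestCase>& tests) {
    std::map<std::string, std::vector<std::string>> groups;
    for (const auto& t : tests) {
        groups[t.resource].push_back(t.name);
    }
    return groups;
}
```

A runner would then hand each group to its own worker, so that only tests with disjoint resources ever overlap in time.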
"iwyu" (Include What You Use) is a tool that automatically highlights possible optimisations, such as forward declarations. I will use it in combination with my header evaluation and plain logical thinking. The milestone is mostly about setting up the tool and showing that it works for the project. This will open up other possibilities in the future, like lint checks with LLVM, but that is out of scope for now.
I'd like to address here the problems mentioned by Endogenic ([highlighted at Konferenco](https://www.youtube.com/watch?v=AsJaMw-3gGE&feature=youtu.be&t=25614)) (thanks, Scott Anecito!), namely making wallet2 as stateless as possible. I propose 4 XMR here, as this is one of the largest classes in the whole project (if not the largest).
The order of include directives is not just a matter of taste. The proper order is local headers first and more generic ones at the bottom, because only this way can you discover hidden dependencies that you would otherwise force the client to include manually. This is nicely described [here](https://blog.knatten.org/2010/07/01/the-order-of-include-directives-matter/) and [here](https://stackoverflow.com/questions/2762568/c-c-include-header-file-order/2762596#2762596).