mj requested to merge mj/ccs-proposals:mj-part-time-2022-03 into master Jan 31, 2022

What

As before, I propose to work for 3 months, spending 30 hours a week on Monero and tsqsim.

In the last 3 months I made considerable progress on my tsqsim tool, as described in my latest dev report on Reddit. In short, I delivered a minimalistic yet working version of the simulator for the Monero Researchers that allows for:

operation via a handy and simple GUI app, wxConfigurator
selection of prediction models and creation of your own models by rewriting the template ones
optimization of the models' parameters using 3 methods, 2 of which scale well to a larger number of parameters; the remaining one is just a naive reference implementation
industry-standard Python-based Time Series Analysis plots, such as ACF, PACF and Seasonal Decomposition (see the Reddit report above for more explanation)
Python-based data plotting as an alternative on platforms where the more advanced Qt App isn't available (yet)
R integration, since R is where Rucknium wants to conduct his research
Walk Forward Validation (mostly)
a few lesser things that you may read about in the project's Wiki

I'd like to spend most of the time continuing to work on tsqsim to make it more accessible. About once a week I'll proactively take a closer look at the open Pull Requests and review those where I'm a good match, as well as check that the Continuous Integration works fine. Otherwise I'm always available to the Team on request, whenever they discover a fitting task for me before I discover it myself. This work model has already been practiced with great success IMHO.

As always, I will keep generating my Monero health report, as new branches get merged.

Why

The tsqsim simulator is needed for OSPEAD, but more uses are already envisioned, wherever any kind of prediction or outlier detection is needed. The simulator is written in a modular way that allows various research branches to be mixed at the same time.

tsqsim development plan

Now I'd like to present the details of the planned work. Apart from features that aren't ready yet, there are also some that are basically working but need a final touch. I've already got a few requests from Rucknium. Some are easy, some not. Let's pinpoint them:

Rucknium requested “Confidence Bands” https://en.wikipedia.org/wiki/Confidence_and_prediction_bands for the predictions. Although this is a standard thing to have, I must admit that it is completely non-existent in the simulator, as I've never needed them myself. Effort: medium (~ 1 week).
Rucknium's 2nd request was weekly discrete time steps, in order to cancel out the strong intra-day seasonality. Currently I use the following periods: minutely, 5 min., 15 min., 30 min., 1 h, 2 h, 4 h, 12 h, and finally: 1 day. I will take the liberty of adding monthly steps at the same time. Effort: easy.
It would be great if the Qt App were available on all platforms, and not just on the ones outdated by one generation. Effort: easy but time consuming (~ 2-3 weeks)
A lot of the Qt App's advanced interactive features had to be temporarily disabled due to my decoupling effort, so that I could deliver the app. Effort: medium (~ 1-2 weeks)
The 1st difference transformation of the original series turned out to be essential for Monero, even though I thought it would be just a nice addition. For context: in TSA (Time Series Analysis), the 1st difference transformation is applied in order to remove the trend from the original series, so that the standard prediction models (like ARIMA) can be used. Monero's transaction volume doesn't trend in the short term, but it does on the higher time scales, where it clearly trends upward over time. Please see below how the TSA tool called the Autocorrelation Function interprets the original series:

http://cryptog.hopto.org/monero/sim/prop-2022-02/autocorrel-norm.png

Looking at the most important lags, namely the ones to the left, you can see that their autocorrelation (blue plot) is low, so rather random, and close to being statistically insignificant (within the gray horizontal bands). OTOH, if we apply the 1st difference transformation to the same data, we are left with the following, more promising plot:

http://cryptog.hopto.org/monero/sim/prop-2022-02/autocorrel-1st-diff.png

At least the 1st lag shows a high negative autocorrelation with enough statistical significance.
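The two plots can be approximated numerically with a minimal sketch like the one below. It is not tsqsim code: the function names are hypothetical, the input is synthetic noise around a constant level (not Monero data), and the significance band uses the common white-noise approximation of +/- 1.96 / sqrt(N), which is an assumption about how the gray bands are drawn.

```python
import math
import random


def first_difference(series):
    """Return the changes x[t] - x[t-1] of the series."""
    return [b - a for a, b in zip(series, series[1:])]


def autocorrelation(series, lag):
    """Sample autocorrelation of the series at a positive lag."""
    n = len(series)
    mean = sum(series) / n
    variance_sum = sum((x - mean) ** 2 for x in series)
    covariance_sum = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return covariance_sum / variance_sum


def significance_band(n):
    """Approximate 95% band for white noise (assumed here): 1.96 / sqrt(n)."""
    return 1.96 / math.sqrt(n)


random.seed(0)
# Synthetic stand-in for a short-term, non-trending volume series.
volume = [1000.0 + random.gauss(0.0, 50.0) for _ in range(500)]
changes = first_difference(volume)

acf_original = autocorrelation(volume, 1)
acf_differenced = autocorrelation(changes, 1)
band = significance_band(len(changes))

print(f"lag-1 ACF, original series:    {acf_original:+.3f}")
print(f"lag-1 ACF, differenced series: {acf_differenced:+.3f}")
print(f"approx. 95% band:              +/-{band:.3f}")
```

On this kind of input the original series shows a lag-1 autocorrelation near zero, while the differenced series shows a clearly negative one, which is the same qualitative pattern as in the plots above.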

A numerical representation of both of the above situations is shown below:

Original series:

  Dickey-Fuller GLS Test Results
  ====================================
  Statistic                     -7.492
  P-value                      < 0.001
  Optimal Lags                       5
  Criterion                        AIC
  Trend                       constant
  ------------------------------------

  Test Hypothesis
  ------------------------------------
  H0: The process contains a unit root
  H1: The process is weakly stationary

  Critical Values
  ---------------
   1%      -2.583
   5%      -1.963
  10%      -1.643

  Test Conclusion
  ---------------
  We can reject H0 at the 1% significance level

Differenced series:

  Dickey-Fuller GLS Test Results
  ====================================
  Statistic                    -12.006
  P-value                      < 0.001
  Optimal Lags                      14
  Criterion                        AIC
  Trend                       constant
  ------------------------------------

  Test Hypothesis
  ------------------------------------
  H0: The process contains a unit root
  H1: The process is weakly stationary

  Critical Values
  ---------------
   1%      -2.570
   5%      -1.952
  10%      -1.632

  Test Conclusion
  ---------------
  We can reject H0 at the 1% significance level

As you can see, even though both series appear stationary, the differenced one achieved a better (more negative) test statistic (-12.006 for the differenced series vs. -7.492 for the original one). On higher time scales, where the trends start to be significant, the discrepancy between the original and differenced series becomes even more apparent.

The feature is actually coded, but still contains a small but nasty bug: when the prediction is reconstructed from the differenced domain (the differences, or changes, of the volume) back to the original domain (the volume itself), there are some discrepancies at the beginning of the reconstructed series that distort the further predictions. I have already isolated the problem via the corresponding unit tests Test 1, Test 2, Test 3. This means that I can instantly reproduce the described problem and, after fixing it, immediately protect the system from regressing by keeping these tests enabled for each compilation. Effort: Hard but should be quick (~ 1 week)
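To illustrate the reconstruction step in general terms, here is a minimal sketch of first differencing and its inverse. It is not the tsqsim implementation and says nothing about the actual cause of the bug; the function names and the sample numbers are made up for illustration. It only shows that the inverse is a cumulative sum anchored at a known starting level, so any error in that anchor shifts every later value.

```python
def first_difference(series):
    """Return the changes x[t] - x[t-1] of the series."""
    return [b - a for a, b in zip(series, series[1:])]


def reconstruct(initial_value, differences):
    """Invert first differencing: cumulative sum anchored at initial_value."""
    levels = [initial_value]
    for change in differences:
        levels.append(levels[-1] + change)
    return levels


# Hypothetical volume values, integers so the round trip is exact.
volume = [120, 135, 128, 150, 149]
changes = first_difference(volume)

# Round trip: differencing followed by reconstruction restores the series.
restored = reconstruct(volume[0], changes)

# Predicted changes are mapped back by anchoring at the last known level.
predicted_changes = [3, -2]
forecast = reconstruct(volume[-1], predicted_changes)[1:]

# A wrong anchor offsets the whole reconstructed series by the same amount.
shifted = reconstruct(volume[0] + 10, changes)

print(changes)
print(restored)
print(forecast)
print(shifted)
```

A round-trip check of this kind (difference, reconstruct, compare with the original) is one simple way to pin such behaviour down in a unit test.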

Walk forward validation is mostly completed, but needs a better UI and more consistent data holding: data/records should be added rather than replaced for each validation window. Effort: Easy (< 1 week)
The system is seriously missing documentation. While a few 1 on 1 sessions with the interested Researchers will be enough for a start, I'd like them to be less dependent on my being available to spend time with them. Effort: Easy but time consuming (~ 1-3 weeks)
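The idea of accumulating records across validation windows can be sketched as follows. This is a simplified illustration under assumed names and window sizes, not the simulator's code: the windows roll forward by the test size, the forecaster is a naive last-value placeholder, and every window appends its results to one shared list instead of overwriting it.

```python
def walk_forward_windows(n_samples, train_size, test_size):
    """Yield (train, test) index ranges, rolling forward by test_size."""
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size


def naive_forecast(train_values, horizon):
    """Placeholder model: repeat the last observed training value."""
    return [train_values[-1]] * horizon


# Hypothetical series; the values are made up for illustration.
series = [10, 12, 11, 13, 15, 14, 16, 18, 17, 19]

records = []  # (index, actual, predicted), kept across all windows
for train, test in walk_forward_windows(len(series), train_size=4, test_size=2):
    predictions = naive_forecast([series[i] for i in train], len(test))
    for index, predicted in zip(test, predictions):
        records.append((index, series[index], predicted))

for record in records:
    print(record)
```

Because the records are appended, the out-of-sample predictions of all windows can be evaluated together at the end.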

I think this covers the most important tasks that are easy enough to describe without having to contrive cases. The remaining tasks are available in my Wiki. If you have questions, however, I'm ready to answer them.

Who

mj. I have been contributing to Monero-core since 2020. Here is a list of my previous work, all related to Monero, even if it got upstreamed.

Previous reports

Here is a list of the previous reports that describe my completed or started tasks in more detail:

Previous CCS: https://ccs.getmonero.org/proposals/mj-part-time-2021-q4.html

Proposal

I will spend 30 hours a week on Monero for the next 3-month period, realistically starting in March, or at the earliest in the 2nd half of February.

I propose a wage of 45 €/h for 3 months. As of 31.01.2022, XMR/EUR is at around 128 €. This would make a total of: 45 €/h * 30 h/week * 4 weeks/month * 3 months / 128 €/XMR = 126.56 XMR, rounded down to be divisible = 126 XMR
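The calculation above can be checked with a few lines of Python. The variable names are only illustrative; the figures are the ones stated in the proposal, including the simplification of 4 weeks per month.

```python
import math

hourly_wage_eur = 45     # EUR per hour
hours_per_week = 30
weeks_per_month = 4      # simplification used in the proposal
months = 3
xmr_price_eur = 128      # EUR per XMR, as of 31.01.2022

total_eur = hourly_wage_eur * hours_per_week * weeks_per_month * months
total_xmr = total_eur / xmr_price_eur
requested_xmr = math.floor(total_xmr)

print(total_eur, total_xmr, requested_xmr)
```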

Cheers!

Expiration date

31 Jan, 2023

Edited Mar 14, 2022 by mj

Draft: mj part time coding 2022-03
