Computational Work for OSPEAD Parameterization
Abstract
The OSPEAD project led by Rucknium will substantially improve user privacy by reducing Monero's statistical attack surface. The project's goal is to overhaul the decoy selection algorithm (DSA) to remedy weaknesses in Monero's privacy model.
To achieve the best fit for OSPEAD parameterization, the analysis code developed by Rucknium must be run over a much larger portion of recent Monero transactions. The number of rings processed is the “sample size” for the statistical analysis, so we want as many data points as possible, enabling Monero's OSPEAD-informed DSA to match the true spend distribution as closely as possible.
The first step toward a large sample size is porting the current OSPEAD code from R (a slow interpreted language) to a fast compiled language like C++ or Rust. However, even with faster code, analyzing the rings one at a time in series will limit the practical sample size. To unlock analysis of 10^5-10^7 rings, the code must also be parallelized and run on a cluster or high-performance workstation with dozens of CPU cores. Without these changes, the current R code would take years to run; with them, the analysis will complete in several days to weeks of compute time. This removes a critical bottleneck for OSPEAD parameterization, which is itself critical for Monero user privacy and Monero's future as a whole.
In addition to the efficiency improvements described above, we will increase the precision of the final output by pre-filtering transactions with heuristics based on transaction uniformity defects. The quality of Rucknium's results is degraded if the analysis includes rings that were not generated by wallet2 (the standard reference implementation). To avoid this, we will filter out non-wallet2 transactions by applying a number of fingerprinting techniques developed by Isthmus and Neptune over the last 4 years of Noncesense Research Lab R&D. These include features, such as unlock times and fees, that would not be produced by standard wallets (a toy sketch of such a filter follows).
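As a rough illustration of this pre-filtering step in R (a toy sketch only; every column name and heuristic below is hypothetical, not the actual filter set):

```r
# Toy sketch of a uniformity-defect filter. `txs` is assumed to be a
# data frame with one row per transaction; the column names and the
# specific heuristics shown are illustrative, not the real filters.
filter_wallet2_like <- function(txs) {
  keep <- txs$unlock_time == 0 &    # wallet2 leaves unlock_time at 0
    txs$fee_is_default &            # off-default fees fingerprint other software
    txs$extra_is_standard           # nonstandard tx_extra suggests a custom wallet
  txs[keep, ]
}
```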
Current Status
The OSPEAD project has made exciting progress toward its goal of securing the decoy selection algorithm (DSA) against statistical attack, as outlined in Rucknium's original proposal. Given the statistical nature of the work, all algorithms and analysis are written in one of the primary programming languages used for statistics: R. The benefits of R for statistical analysis are manifold, and it is certainly the right language for the job given the statistical focus of this project.
However, the OSPEAD project is currently bottlenecked by computation: there is an enormous number of rings to process from the last 6 months alone. Certain computationally heavy components of the code base, when executed in R, cause the overall analysis to take roughly 3 months to compute 1 week of historical data. While R is a best-in-class language for statistical programming, it is a high-level interpreted language and thus slow in execution. Furthermore, the current code is single-threaded and runs on only one CPU core. Together these create a prohibitively expensive computational barrier: completing the months of transaction analysis that the OSPEAD project now requires would take years of computation.
Proposed Analysis Engine Optimization Work
This proposal is designed to remove the computational barriers described above. Specifically, it enables much faster processing of more data, in order to achieve the most robust OSPEAD results possible. If this project is approved, the following will be delivered to the OSPEAD team and the Monero community.
- Identify which parts of the code need to be sped up: Benchmark which components of the current OSPEAD R code are causing the computational bottlenecks described in the above section.
- Translate from R to a fast compiled language: Translate the bottleneck components into a fast, compiled language like C/C++ or Rust and bundle the translated code into R packages for easy integration. It will be critical for all computation and functionality to be precisely preserved, and to interface with remaining R code seamlessly, which will require significant engineering as well as robust testing. We expect that this translation step will result in a roughly 10x speed up.
- Parallelize the sped-up code: Once the code has been ported, we will further reduce runtime by parallelizing it across multiple CPU cores. Parallelization can be carried out in R or in the compiled components, as needed to reach the desired speedup. Parallelizing code can be quite challenging and requires expertise to work through the common issues that arise, but the effort will be well worth it: this step is expected to speed up computation approximately linearly with the number of CPU cores. (A minimal sketch of the benchmark-and-parallelize workflow follows this list.)
- Run the optimized code on a scientific workstation with 64 cores: As part of this CCS proposal, a scientific workstation will be used to complete the analysis and computation of the results needed by OSPEAD on 64 CPU cores. This CCS commits up to 23,000 CPU hours (roughly 15 days of wall-clock time on 64 cores), more than enough compute to complete the OSPEAD analysis on a large number of recent rings with the above improvements.
- Results delivered for use in OSPEAD parameterization: Deliver code and results only to Rucknium, with full documentation and scripts designed to help others run the above code in the future (e.g. enabling the code to be run on MRC). All deliverables from this CCS will become part of the OSPEAD project and disclosure process managed by Rucknium.
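To make the benchmarking and parallelization deliverables concrete, here is a minimal R sketch of the workflow referenced above. `analyze_ring()` and `rings` are hypothetical stand-ins for the real OSPEAD per-ring computation and data:

```r
library(parallel)

# 1. Profile a small serial run to locate the hotspots worth porting
#    to a compiled language.
Rprof("ospead_profile.out")
serial_results <- lapply(rings[1:100], analyze_ring)
Rprof(NULL)
head(summaryRprof("ospead_profile.out")$by.total)  # time spent per function

# 2. Fan the independent per-ring analyses out across all cores.
#    mclapply() forks, so this requires a Unix-like OS; on a 64-core
#    workstation the speedup should be roughly linear in core count.
parallel_results <- mclapply(rings, analyze_ring, mc.cores = detectCores())
```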
Timeline & Budget
The overall project to speed up OSPEAD computation will take 5+ full time engineering weeks and require significant investment in computational resources. The breakdown is as follows:
- Benchmarking bottlenecks: 0.5 weeks full time equivalent (FTE) engineering work
- Translating bottleneck code and ensuring continued interoperability with initial code: 1.5 weeks FTE
- Parallelizing code to run on multiple CPUs: 1.0 weeks FTE
- Analysis to filter out non-wallet2 transactions: 1.0 weeks FTE
- Running optimization on scientific workstation: 0.5 weeks FTE and up to 23,000 CPU hours committed
- Final deliverables including documentation: 1.0 weeks FTE
Total time: 5.5 weeks FTE + 23,000 CPU hours. We expect the engineering work to be complete by Mar 30, 2023.
Total budget: 190 XMR, paid at the start of work to avoid exchange rate volatility risk.
Team
Mitchell Krawiec-Thayer (Isthmus) and the Geometry Labs engineering team
Dr. Mitchell Krawiec-Thayer is a privacy tech researcher whose Monero contributions have largely focused on empirical transaction tree analysis leveraging statistical heuristics and transaction uniformity defects. Other past Monero work includes fingerprinting the mid-2021 transaction volume anomaly, a statistical analysis of nonce value distribution, an opportunistic study of miner equipment types, and identification of the heuristics that will be used to filter non-wallet2
transactions in this project. Mitchell is the President and Chief Scientist at Geometry Labs and will coordinate this CCS process. The lead engineer on this project will be Cheyenne Atapour, a blockchain research engineer with experience in developing performant systems. He has the experience with compiled languages such as C++ and Rust necessary to optimize the OSPEAD code. Geometry Labs is a blockchain and cryptography research & development team, whose specializations include: blockchain infrastructure, scientific tooling, analytics and observatories, and development of novel cryptographic mechanisms.
Activity
The proposal was developed with my input.
From the beginning, the plan was to get help from C++ developers. From the original proposal:
I will work with j-berman and mj to develop and implement OSPEAD, as they outlined in their own CCS proposals.
j-berman is working on Seraphis now. mj's proposal to help with OSPEAD failed to raise funds.
isthmus approached me after the January 18, 2022 Monero Research Lab meeting where I said
So like I have said before, OSPEAD will have some blind spots due to lack of C++ support. But the plan is mostly intact. Unless someone else wants to work on the C++ work or mj finds funding another way....With a slow implementation, the estimate will not be as precise. And we will probably not have a good measure of how precise it is.
Under certain regularity conditions, the estimator of the real spend age distribution is guaranteed to get the right answer when the sample size is infinite. (The estimator is consistent). Of course, the number of Monero transactions is not infinite. In situations such as this, it is recommended to use two related statistical procedures:
- Monte Carlo simulation. You generate a sample with a random number generator. Then you apply your estimator to the sample. Then you generate and estimate again many times, usually 1,000+, though 100 may be acceptable if the estimation takes a long time. Then you compare the estimation results to the true values of what you are trying to estimate. You know the true values since you chose them when you generated the random sample. If the estimates are close to the true values, that's good news. If not, you need to increase the sample size (in our case, increasing the time window of the sample from weekly to monthly) or adjust the estimator.
- Bootstrapping. Here the real data is used. You randomly sample from the real data with replacement, run the estimator, and store the results. Like Monte Carlo simulation, you do this 1,000+ times. The stored results tell you the approximate variability of the estimator and can be used to calculate confidence intervals. If the estimate is precise, great. If not, you need to take the low precision into account when making decisions based on the estimate. (A toy R sketch of both procedures follows this list.)
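Here is the toy R sketch mentioned above. The estimator is an off-the-shelf gamma fit standing in for the real spend-age estimator, and `real_ages` is a hypothetical vector of observed spend ages:

```r
library(MASS)  # for fitdistr()

n_reps <- 1000
n_obs  <- 1e4
true_shape <- 19.28  # truth chosen for the simulation
true_rate  <- 1.61

# Monte Carlo: generate samples with known parameters, re-estimate,
# and check how close the estimates land to the truth.
mc_shapes <- replicate(n_reps, {
  x <- rgamma(n_obs, shape = true_shape, rate = true_rate)
  fitdistr(x, "gamma")$estimate["shape"]
})
summary(mc_shapes - true_shape)  # bias and spread at this sample size

# Bootstrap: resample the real data with replacement to gauge the
# estimator's variability on the data we actually have.
boot_shapes <- replicate(n_reps, {
  x <- sample(real_ages, length(real_ages), replace = TRUE)
  fitdistr(x, "gamma")$estimate["shape"]
})
quantile(boot_shapes, c(0.025, 0.975))  # approximate 95% confidence interval
```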
If you have an estimation procedure that is already computationally intensive and then do it 1,000+ times for Monte Carlo simulation and bootstrapping, you are potentially colliding with the limits of feasibility. The estimator is now written in R. R can be fast, but it has known weaknesses in these circumstances:
Sometimes R code just isn’t fast enough.... Typical bottlenecks that C++ can address include:
- Loops that can’t be easily vectorised because subsequent iterations depend on previous ones.
- Recursive functions, or problems which involve calling functions millions of times. The overhead of calling a function in C++ is much lower than in R.
- Problems that require advanced data structures and algorithms that R doesn’t provide. Through the standard template library (STL), C++ has efficient implementations of many important data structures, from ordered maps to double-ended queues.
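To illustrate the first bottleneck type (a minimal sketch, not OSPEAD code): an exponentially weighted moving average has exactly this sequential dependence, and porting it with Rcpp typically yields an order-of-magnitude speedup, consistent with the ~10x estimate in the proposal above.

```r
library(Rcpp)

# Pure R: each iteration depends on the previous one, so it cannot be
# vectorised and interpreter overhead dominates.
ewma_r <- function(x, alpha) {
  out <- numeric(length(x))
  out[1] <- x[1]
  for (i in 2:length(x)) {
    out[i] <- alpha * x[i] + (1 - alpha) * out[i - 1]
  }
  out
}

# The same loop compiled via Rcpp.
cppFunction('
NumericVector ewma_cpp(NumericVector x, double alpha) {
  NumericVector out(x.size());
  out[0] = x[0];
  for (int i = 1; i < x.size(); ++i) {
    out[i] = alpha * x[i] + (1 - alpha) * out[i - 1];
  }
  return out;
}')

x <- rnorm(1e6)
all.equal(ewma_r(x, 0.1), ewma_cpp(x, 0.1))  # identical results, far faster
```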
I believe that it will be possible to have one run of the R code for each week of data if this CCS proposal is not funded. Monte Carlo simulation and bootstrapping with the real spend age estimator are probably not possible with R. They are only possible with a compiled language like C++ or Rust.
Having little knowledge of the finite-sample performance of the estimator makes me nervous as a statistician. The phrases that come to mind are "blind spots" and "corner cutting". If isthmus is willing to manage the C++/Rust writing and the community is willing to fund it, then we should definitely do it.
My main concern with this proposal is that Cheyenne Atapour does not have much public C++ code to judge his capability to execute. I am writing a small "test" for Cheyenne to satisfy my concerns about this.
isthmus has an extensive body of work and contributions to the Monero project and is an active MRL member undertaking this proposal under the banner of their real company, leaving no doubt that this is a great proposal. But I must pose these questions with the bigger picture in mind.
A question I've seen asked elsewhere: does the existing infrastructure available to the MRL team have any effect on this proposal? https://ccs.getmonero.org/proposals/gingeropolous_zenith_storage.html
And another from myself regarding the 100% upfront request: can you specify the USD amount you require for this work and what buffer you are adding for the time frame you expect to complete the work (I see 5.5 weeks), and point to where your volatility analysis supports this buffer for that time frame? Perhaps an upfront amount less than 100% would make sense? https://github.com/Mitchellpkt/volatility_analysis
Edit: as per the latest transparency report, if this proposal is considered fundamental to the Monero project by Core, it may benefit from receiving a contribution to make up for any volatility shortfalls. But before even considering this, you must be forthcoming with your USD rates for completing the work and your buffer, once you have finished your final investigations.
Edited by plowsoff

Could you make a comment on how this could be applied to the input selection algorithm that Seraphis would be using? Maybe @j-berman can also give some comments here about his experience with it?
I can speak to this. The current decoy selection algorithm samples decoys independently, without replacement, from a Gamma(19.28, 1/1.61) distribution as suggested by Möser et al. (2018). The parameters are defined in wallet2 and did not change when the ring size increased from 11 to 16 in the August 2022 hard fork. The decoy selection algorithm does not depend on the number of decoys; it simply repeats the random sampling until it has enough decoys to satisfy Monero's consensus rules.
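A minimal R sketch of that sampler (Möser et al. fit the gamma to the log of output age in seconds, so draws are exponentiated; the real wallet2 code adds further adjustments, such as a recent-spend window and the mapping from age to a concrete on-chain output, which are omitted here):

```r
# Draw decoy ages (in seconds) from the wallet2-style log-gamma,
# resampling until enough valid decoys are collected.
sample_decoy_ages <- function(n_decoys, chain_age_secs) {
  ages <- numeric(0)
  while (length(ages) < n_decoys) {
    age <- exp(rgamma(1, shape = 19.28, scale = 1 / 1.61))
    # Reject ages older than the chain; the real code also rejects
    # draws that map to an already-selected output.
    if (age <= chain_age_secs) ages <- c(ages, age)
  }
  ages
}

# e.g. 15 decoys for ring size 16, on a chain roughly 9 years old:
decoys <- sample_decoy_ages(15, chain_age_secs = 9 * 365 * 86400)
```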
The decoy selection algorithm that OSPEAD will suggest also will not depend on the ring size. I believe that the Seraphis code is currently planning to do "binning" by selecting a set of blocks by an algorithm similar to today's algorithm and then including a set of outputs within each of those blocks. (Binning has not been rigorously analyzed for potential advantages and disadvantages for Monero user privacy.) For example, with ring size 128 the algorithm could select 16 blocks by the Möser or OSPEAD probability distribution and then select 8 of the outputs within each block as the actual ring members. The real spend would be one of those ring members, of course. If OSPEAD passes review, its probability distribution would be used in the first step of the binned ring member selection procedure.
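A sketch of the binned variant just described, reusing the age sampler and `chain_age_secs` from the previous snippet; `age_to_block()` and `outputs_in_block()` are hypothetical helpers for mapping an age to a block height and listing that block's output IDs:

```r
n_bins   <- 16  # blocks chosen via the Möser/OSPEAD distribution
bin_size <- 8   # outputs taken per block; 16 * 8 = ring size 128

bin_blocks <- age_to_block(sample_decoy_ages(n_bins, chain_age_secs))
ring <- unlist(lapply(bin_blocks, function(b) {
  sample(outputs_in_block(b), bin_size)  # pick outputs within each bin
}))
```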
Mimicking the real spend age distribution as closely as possible is important regardless of ring size (except if rings were to include all transaction outputs). In the original OSPEAD proposal I wrote:
Increasing the ring size is part of Monero's long-term development roadmap. However, I have produced evidence that the statistical vulnerability would still remain with larger ring sizes. Raising the ring size from 11 to, say, 16 would barely dent the potency of my attack. Raising the ring size to 256 would mitigate the attack to a substantial degree, but user privacy might still be at some risk. In other words, we cannot get ourselves out of this problem by simply raising the ring size.
Hi everybody, thanks so much for the thoughtful comments and questions. I’ve been sharing brief updates on IRC for MRL meetings, and wanted to drop an update here as well.
Rucknium provided a toy function in R with corresponding tests, and Geometry Labs produced (at no cost) a little demo of the workflow for using C++ in an R setting. You can check it out here: https://github.com/geometry-labs/workflow_demo#readme
This week we plan to study the most severe OSPEAD bottlenecks, to sketch out the optimization & parallelization plans with enough detail to (1) put an accurate time estimate on the final scope, and (2) confirm that we have enough engineering time and bandwidth blocked out to execute the project before moving the CCS forward for funding.
If the final plan looks good in terms of scope and timing, I’ll update the proposal accordingly and add another comment. (I’ll also circle back and answer the questions from above comments)
Hello everyone, just a brief update. I've decided to carry out much of this work on my own time, at no cost. To avoid confusion, I'll be closing this CCS for now, but I might reopen it or create a new one if circumstances change. Thank you all again for your valuable input and support :)
What made you make such a surprising decision? I mean, having 190 XMR or not having them is quite a difference.
Edited by mj

Unfortunately, Geometry just doesn’t have enough Q2 bandwidth to commit to completing the entire project at this time.
I just plan to turn my upcoming hobby time toward helping with data cleaning (by identifying and filtering non-wallet2 transactions, as described above). Of course, that is only a portion of the original planned scope, but I’m hoping that chipping away at my part on the weekends will allow me to make a small but meaningful personal contribution to the OSPEAD effort.
Any updates on Geometry's current workload? @Mitchellpkt