Computational work for OSPEAD parameterization
Computational Work for OSPEAD Parameterization
Abstract
The OSPEAD project led by Rucknium will substantially improve user privacy by reducing Monero's statistical attack surface. The project's goal is to overhaul the decoy selection algorithm (DSA) to remedy weaknesses in Monero's privacy model.
To achieve the best fit for OSPEAD parametrization, the analysis code developed by Rucknium needs to be computed over a much larger portion of recent Monero transactions. The number of rings that can be processed is the “sample size” for the statistical analysis, so we desire as many data points as possible to enable Monero's OSPEAD-informed DSA to match the true spend distribution as closely as possible.
The first step to achieving a large sample size is porting the current OSPEAD code written in R (a slow interpreted language) to a fast compiled language like C++ or Rust. However, even with faster code, analyzing the rings one at a time in series will limit the practical sample size. To unlock analysis of 10^5 - 10^7 rings, it is also necessary to parallelize the code and run the analysis on a cluster or high performance workstation which contains dozens of CPU cores. Without these changes, the current R code would take years to run. With these changes the analysis will be executable with several days to weeks of compute time. This unlocks a critical bottleneck for OSPEAD parametrization, which itself is critical for Monero user privacy and Monero's future as a whole.
In addition to efficiency improvements described above, we will increase precision in the final output by pre-filtering transactions with heuristics based on transaction uniformity defects. For Rucknium's analysis, the quality of the results is degraded if the analysis includes rings that were not generated by wallet2
software (the standard reference implementation). To avoid this, we will filter out non-wallet2
transactions by applying a number of fingerprinting techniques developed by Isthmus and Neptune over the last 4 years of Noncesense Research Lab R & D. These include features such as unlock times and fees that would not be produced by standard wallets.
Current Status
The OSPEAD project has made exciting progress toward its goal of securing the decoy selection algorithm (DSA) against statistical attack, as outlined in Rucknium's original proposal. Given the statistical nature of the work involved, all algorithms and analysis is written in one of the primary programming languages used for statistics work: R. The benefits of R for statistical analysis are many fold, and it is certainly the right language for the job given the statistical focus of this project.
However, the OSPEAD project is currently bottlenecked by computation as there are an enormous number of rings to process from the last 6 months alone. There are certain computationally-heavy components of the code base that, when executed in R, result in the overall analysis taking 3 months to compute 1 week of historical data. While R is a best-in-class language for statistical programming, it is a higher-level language and thus quite slow in execution. Furthermore, the current code is single threaded and only runs on one CPU core. All of this is leading to a prohibitively-expensive computational barrier, which would need years of computation to complete the months of transaction analysis that is now needed to complete the OSPEAD project.
Proposed Analysis Engine Optimization Work
This proposal is designed to solve the challenge posed by the computational barriers described above. Specifically it allows the much faster processing of more data, in order to achieve the most robust OSPEAD results possible. If this project is approved, the following will be delivered to the OSPEAD team and the Monero community.
- Identify which parts of the code need to be sped up: Benchmark which components of the current OSPEAD R code are causing the computational bottlenecks described in the above section.
- Translate from R to a fast compiled language: Translate the bottleneck components into a fast, compiled language like C/C++ or Rust and bundle the translated code into R packages for easy integration. It will be critical for all computation and functionality to be precisely preserved, and to interface with remaining R code seamlessly, which will require significant engineering as well as robust testing. We expect that this translation step will result in a roughly 10x speed up.
- Parallelize the sped-up code: Once the code has been ported, we will further reduce runtime by parallelizing the existing code across multiple CPU cores. Parallelization can be carried out in R or the compiled language components, as needed to get the desired speedup. Parallelizing code can be quite challenging and requires expertise to work through the various common issues that develop, but the effort will be well worth it, as this step is expected to speed up computation approximately linearly with the number of CPU cores.
- Run the optimized code on a scientific workstation with 64 cores: As part of this CCS proposal, a scientific workstation will be leveraged to complete analysis and computation of the results needed by OSPEAD on 64 CPU cores. This CCS commits up to 23,000 CPU hours, providing more than enough compute needed to complete the OSPEAD analysis on a large number of recent rings with the above improvements.
- Results delivered for use in OSPEAD parameterization: Deliver code and results only to Rucknium, with full documentation and scripts designed to help others run the above code in the future (e.g. enabling the code to be run on MRC). All deliverables from this CCS will become part of the OSPEAD project and disclosure process managed by Rucknium.
Timeline & Budget
The overall project to speed up OSPEAD computation will take 5+ full time engineering weeks and require significant investment in computational resources. The breakdown is as follows:
- Benchmarking bottlenecks: 0.5 weeks full time equivalent (FTE) engineering work
- Translating bottleneck code and ensuring continued interoperability with initial code: 1.5 weeks FTE
- Parallelizing code to run on multiple CPUs: 1.0 weeks FTE
- Analysis to filter out non-
wallet2
transactions: 1.0 weeks FTE - Running optimization on scientific work station: 0.5 weeks FTE and up to 23,000 CPU hours committed
- Final deliverables including documentation; 1.0 weeks FTE
Total time: 5.5 weeks FTE + 23,000 CPU hours. We expect the engineering work to be complete by Mar 30, 2023.
Total budget: 190 XMR, paid at the start of work to avoid exchange rate volatility risk.
Team
Mitchell Krawiec-Thayer (Isthmus) and the Geometry Labs engineering team
Dr. Mitchell Krawiec-Thayer is a privacy tech researcher whose Monero contributions have largely focused on empirical transaction tree analysis leveraging statistical heuristics and transaction uniformity defects. Other past Monero work includes fingerprinting the mid-2021 transaction volume anomaly, a statistical analysis of nonce value distribution, an opportunistic study of miner equipment types, and identification of the heuristics that will be used to filter non-wallet2
transactions in this project. Mitchell is the President and Chief Scientist at Geometry Labs and will coordinate this CCS process. The lead engineer on this project will be Cheyenne Atapour, a blockchain research engineer with experience in developing performant systems. He has the experience with compiled languages such as C++ and Rust necessary to optimize the OSPEAD code. Geometry Labs is a blockchain and cryptography research & development team, whose specializations include: blockchain infrastructure, scientific tooling, analytics and observatories, and development of novel cryptographic mechanisms.