Ethereum-Machine-Learning-Project

Simulated base fee by modelling different variables and analyzing them.

Goal

Ethereum introduced EIP 1559, which brought many changes to how Ethereum works. One of the biggest changes was that transactions now include a base fee. Since there were already assumptions about how the base fee might behave, simulations could be built and compared against actual base fee behavior to check whether the original assumptions were correct. Beyond verifying those assumptions, the simulation analysis also turned up several other findings.

Assumptions

Valuations

Definition: Valuation is an economic term for how much a good is worth to a person.

In the simplest scenario, a valuation can be read off from how much a person is willing to pay for a good. However, the data team I worked with was also interested in how valuations changed between pre- and post-EIP 1559, so simply using the highest bid would not be sufficient in this case.

Solution: To tease out a user’s valuation, we took their bid and divided it by a common recommended bid provided by Go Ethereum (GETH). Most Ethereum users see the GETH recommended bid as they submit their transactions, so how they bid relative to the GETH recommendation is a good indicator of their valuation. Mathematically, we defined a valuation as (user’s bid)/(GETH recommended bid).
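As a small illustration of this ratio, here is a minimal Python sketch. The function name and the (bid, recommendation) pairs are hypothetical and chosen only to show the arithmetic; they are not taken from the October 2021 data.

```python
def valuation(user_bid_gwei: float, geth_recommended_gwei: float) -> float:
    """Return the user's bid relative to the GETH recommended bid."""
    if geth_recommended_gwei <= 0:
        raise ValueError("recommended bid must be positive")
    return user_bid_gwei / geth_recommended_gwei


# Hypothetical (user bid, GETH recommendation) pairs in gwei, for illustration only.
observations = [(60.0, 100.0), (95.0, 100.0), (120.0, 80.0)]
valuations = [valuation(bid, rec) for bid, rec in observations]
print(valuations)
```

A valuation below 1 means the user bid less than the recommendation, and a valuation above 1 means they bid more.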

image

Above is a histogram showing valuations generated by using bids from October 2021. There are two interesting points here:

Disclaimer: When modelling valuations, I used a Gamma distribution because other members of my team needed a named distribution for valuations. It is not necessarily the best distribution to fit to valuations; a non-parametric model may well be a better fit.

image image image image

Above are the histograms of the gamma distributions fitted to the valuations, along with the QQ plots of those gamma distributions. The QQ plots aren’t a perfect fit, but they are pretty close. The parameters of the low valuation gamma distribution are k=59.6849 and theta=0.0105. The parameters of the high valuation gamma distribution are k=52.1001 and theta=0.0185. The parameters were estimated by maximum likelihood.
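To give a feel for what fitting k and theta involves, here is a minimal standard-library sketch. It does not reproduce the fitting code used for this project: it uses a well-known closed-form approximation to the maximum likelihood shape estimate rather than a full numerical optimiser, and it runs on synthetic data drawn from the low valuation parameters reported above, not on the real bids. The function name is illustrative.

```python
import math
import random


def fit_gamma_approx(samples):
    """Approximate the gamma MLE (shape k, scale theta) for positive samples."""
    n = len(samples)
    mean = sum(samples) / n
    mean_log = sum(math.log(x) for x in samples) / n
    # s is positive unless every sample is identical (Jensen's inequality).
    s = math.log(mean) - mean_log
    # Closed-form approximation to the solution of ln(k) - digamma(k) = s.
    k = (3.0 - s + math.sqrt((s - 3.0) ** 2 + 24.0 * s)) / (12.0 * s)
    theta = mean / k  # given k, the MLE of the scale is mean / k
    return k, theta


rng = random.Random(0)
# Synthetic data: random.gammavariate takes (shape, scale).
synthetic = [rng.gammavariate(59.6849, 0.0105) for _ in range(20000)]
k_hat, theta_hat = fit_gamma_approx(synthetic)
print(f"k ~ {k_hat:.2f}, theta ~ {theta_hat:.4f}")
```

With these parameters the mean valuation is k times theta, roughly 0.63 for the low valuation group and roughly 0.96 for the high valuation group.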

Gas Used

Definition: Gas measures how computationally complex a transaction is to include in the blockchain. Gas used is not strictly a measure of demand, but it is a good proxy for it.

image

Above is a histogram of how much gas is used per transaction. Notably, it doesn’t really look like any named distribution, but that’s ok. In this case, I simply bootstrapped the observed gas used values in order to use them in the simulation.
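Bootstrapping here means resampling the observed values with replacement instead of fitting a distribution. A minimal sketch, assuming the observed gas values are available as a plain list; the values and the function name below are hypothetical stand-ins for the real data:

```python
import random


def bootstrap_gas_used(observed_gas, n_draws, rng):
    """Resample observed gas values with replacement."""
    return rng.choices(observed_gas, k=n_draws)


# Hypothetical observations; 21000 is the gas cost of a plain ETH transfer.
observed_gas = [21000, 21000, 46000, 65000, 120000, 250000]
rng = random.Random(42)
draws = bootstrap_gas_used(observed_gas, 10, rng)
print(draws)
```

Because every draw is an actual observed value, the simulated transactions keep the irregular shape of the histogram without needing a named distribution.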

Base Fee Update

As mentioned earlier, the base fee update formula is:
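next base fee = base fee × (1 + (1/8) × (gas used − gas target) / gas target)

This is the rule as given in the EIP 1559 specification, where the gas target is half of the block gas limit. A minimal Python sketch of the integer version from the specification follows. The function and variable names are illustrative, the 15 million gas target is an assumption matching mainnet around the time of the data, and this is not necessarily the exact form used in this project's code.

```python
BASE_FEE_MAX_CHANGE_DENOMINATOR = 8  # caps the change at 12.5% per block


def next_base_fee(parent_base_fee: int, parent_gas_used: int, parent_gas_target: int) -> int:
    """Base fee (in wei) for the next block, following the EIP 1559 rule."""
    if parent_gas_used == parent_gas_target:
        return parent_base_fee
    if parent_gas_used > parent_gas_target:
        gas_delta = parent_gas_used - parent_gas_target
        # The increase is at least 1 wei whenever the block is above target.
        fee_delta = max(
            parent_base_fee * gas_delta // parent_gas_target // BASE_FEE_MAX_CHANGE_DENOMINATOR,
            1,
        )
        return parent_base_fee + fee_delta
    gas_delta = parent_gas_target - parent_gas_used
    fee_delta = parent_base_fee * gas_delta // parent_gas_target // BASE_FEE_MAX_CHANGE_DENOMINATOR
    return parent_base_fee - fee_delta


GAS_TARGET = 15_000_000  # assumed target; the block gas limit is twice this
base_fee = 100 * 10**9   # 100 gwei expressed in wei
print(next_base_fee(base_fee, 2 * GAS_TARGET, GAS_TARGET))  # completely full block
print(next_base_fee(base_fee, 0, GAS_TARGET))               # empty block
```

A completely full block raises the base fee by 12.5% and an empty block lowers it by 12.5%; anything in between scales linearly with how far gas used is from the target.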

Simulation

This is a very brief summary of how the simulations work; the finer details can be found in the code. Now that I have a model for how full blocks will be and how much users will bid for their transactions, I can simulate blocks and fill them with transactions. This in turn means that I can simulate how the base fee behaves.
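The loop below is a rough sketch of that idea, not the project's actual simulation. Several pieces are assumptions made up for illustration: the inclusion rule (a transaction is included if its bid is at least the base fee and it still fits in the block), the mixture weight high_share, the fixed reference_bid standing in for the GETH recommendation, the number of candidate transactions per block, and the small list of gas values. It also uses a real-valued simplification of the base fee update.

```python
import random

GAS_TARGET = 15_000_000      # assumed gas target
GAS_LIMIT = 2 * GAS_TARGET   # a block can hold at most twice the target


def next_base_fee(base_fee, gas_used, gas_target=GAS_TARGET):
    """Simplified real-valued EIP 1559 update: at most 1/8 change per block."""
    return base_fee * (1 + (gas_used - gas_target) / gas_target / 8)


def draw_valuation(rng, high_share=0.3):
    """Mixture of the two fitted gammas; high_share is a made-up weight."""
    if rng.random() < high_share:
        return rng.gammavariate(52.1001, 0.0185)  # high valuation users
    return rng.gammavariate(59.6849, 0.0105)      # low valuation users


def simulate(n_blocks, observed_gas, reference_bid, txs_per_block, seed=0):
    """Return the simulated base fee path, starting at reference_bid."""
    rng = random.Random(seed)
    base_fee = reference_bid
    path = [base_fee]
    for _ in range(n_blocks):
        gas_used = 0
        for _ in range(txs_per_block):
            gas = rng.choice(observed_gas)              # bootstrapped gas used
            bid = draw_valuation(rng) * reference_bid   # valuation times recommendation
            if bid >= base_fee and gas_used + gas <= GAS_LIMIT:
                gas_used += gas
        base_fee = next_base_fee(base_fee, gas_used)
        path.append(base_fee)
    return path


path = simulate(
    n_blocks=200,
    observed_gas=[21000, 65000, 120000, 250000],  # hypothetical gas values
    reference_bid=100.0,
    txs_per_block=400,
)
print(min(path), max(path))
```

Even in this toy version the base fee settles where just enough users are willing to pay it to fill blocks to the target, which is the mechanism the full simulation relies on.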

image image image image

The simulation looks somewhat similar to the real base fee. One interesting property is that upward spikes tend to be very sharp and take only a couple of blocks, while downward moves are fatter and take more blocks to play out. Finally, I use an ARIMA model to compare the two time series.

ARIMA

Below are the PACF and ACF graphs, which suggest parameters of p=2, d=0 and q=0.
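An ARIMA model with p=2, d=0 and q=0 is a plain AR(2) model, and a PACF that cuts off after lag 2 is the usual sign of one. As a from-scratch illustration of how the autocorrelations relate to the two AR coefficients, here is a Yule-Walker sketch in standard-library Python. It is not the estimator used for the graphs in this project, and it runs on a synthetic series with made-up coefficients rather than on base fee data.

```python
import random


def autocorr(series, lag):
    """Sample autocorrelation of series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return cov / var


def fit_ar2_yule_walker(series):
    """Estimate AR(2) coefficients from the lag-1 and lag-2 autocorrelations."""
    r1, r2 = autocorr(series, 1), autocorr(series, 2)
    phi1 = r1 * (1 - r2) / (1 - r1 ** 2)
    phi2 = (r2 - r1 ** 2) / (1 - r1 ** 2)  # also the lag-2 partial autocorrelation
    return phi1, phi2


# Synthetic AR(2) series with made-up coefficients 0.5 and 0.3.
rng = random.Random(1)
series = [0.0, 0.0]
for _ in range(20000):
    series.append(0.5 * series[-1] + 0.3 * series[-2] + rng.gauss(0, 1))

phi1, phi2 = fit_ar2_yule_walker(series)
print(round(phi1, 2), round(phi2, 2))
```

Running the same estimate on the real and the simulated base fee series gives one simple way to compare the two sets of AR coefficients.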

image image image image

Then below are the graphs of the residuals:

image image image image

The ARIMA models aren’t fitted perfectly to the time series, but an interesting note is that they tend to underpredict the base fee more often than they overpredict it. This is likely due to the sudden sharp spikes, which I think are caused by there being two distributions of users: low valuation and high valuation users. In the simulation formula, including high valuation users always increases the base fee, whereas including low valuation users does not necessarily decrease it.

Conclusion

Overall, improvements can be made to the simulation, but it was able to generate ARIMA models with parameters similar to those of the ARIMA model for real base fee data. Going forward, I think the main way to improve this simulation would be to add a variable that distinguishes low valuation users from high valuation users.