Bitcoin & Data Science
Welcome! This is the first in a series of #RunTheNumbers posts I intend to write about Bitcoin where I will focus on data analysis and/or interesting topics around Bitcoin data. This also marks the first time I am writing publicly about Bitcoin!
For this first edition I want to focus on the very nature of the data stored in Bitcoin’s blockchain: immutable and timeless. By this I mean that the 1s and 0s recorded in each block, which are independently verified and stored by thousands of nodes around the world, are practically impossible to censor or erase. This has some powerful implications because, as we will see, more information than just Bitcoin transactions can be stored in the blockchain.
On the latest episode of the Noded Bitcoin Podcast, lawyer Justin Wales points out that when Satoshi Nakamoto mined the genesis block in January 2009 he changed the world in more ways than one. In addition to starting a revolution in the nature of money, Satoshi broadcast the world’s first immutable message: “The Times 03/Jan/2009 Chancellor on brink of second bailout for banks.” The purpose of including this now-famous Times headline in the very first Bitcoin transaction was twofold: to timestamp the launch of the network, and to make a political statement against fractional reserve banking and irresponsible monetary policy.
Even more profound than the message itself is the fact that it is now stored by every full node on the Bitcoin network around the world, forever. Later versions of the Bitcoin codebase changed how such messages are usually included: it is now done with the OP_RETURN opcode, which by default allows up to 80 bytes of arbitrary data to be included in a transaction. Since 80 bytes is not a lot of space, it is common to include a cryptographic hash of a larger document, image, etc. rather than the data itself. But you may be asking yourself why anyone would want to include arbitrary data in a Bitcoin transaction in the first place, given that it is then distributed and stored permanently on thousands of computers around the world… so let’s look at some interesting use cases!
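To make the 80-byte limit concrete, here is a minimal Python sketch of what the script of such an output could look like when it carries the SHA-256 hash of a larger document. The function names (op_return_script, document_commitment) and the sample document are my own, made up for illustration. The 80-byte cap is the figure quoted above, and the sketch only builds the raw script bytes; it does not create, sign, or broadcast a transaction.

```python
import hashlib

OP_RETURN = 0x6A      # opcode that marks an output as carrying data
OP_PUSHDATA1 = 0x4C   # opcode used to push payloads of 76 to 255 bytes
MAX_PAYLOAD = 80      # limit quoted in the post (assumed here, not enforced by consensus)


def document_commitment(document: bytes) -> bytes:
    """Return the 32-byte SHA-256 digest of a document of any size."""
    return hashlib.sha256(document).digest()


def op_return_script(payload: bytes) -> bytes:
    """Build the raw script bytes: OP_RETURN followed by one data push."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError(f"payload is {len(payload)} bytes, limit is {MAX_PAYLOAD}")
    if len(payload) <= 75:
        # Payloads up to 75 bytes use a single length byte as the push opcode.
        push = bytes([len(payload)])
    else:
        # Payloads of 76 to 80 bytes need OP_PUSHDATA1 plus a length byte.
        push = bytes([OP_PUSHDATA1, len(payload)])
    return bytes([OP_RETURN]) + push + payload


# Hypothetical document: it can be far larger than 80 bytes, its hash is always 32.
document = b"A long article, image, or contract would go here. " * 100
script = op_return_script(document_commitment(document))
print(len(document), "byte document ->", len(script), "byte script")
print(script.hex())
```

However large the document is, the script stays at 34 bytes (one opcode, one length byte, and the 32-byte hash), which is why hashing is the usual way to fit under the limit.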
It is well known that the Chinese Communist Party censors the information that Chinese people can possess, both in physical form and digitally via the “Great Firewall.” One major example is any information about the infamous Tiananmen Square Massacre of 1989, where the military slaughtered hundreds of students protesting the oppressive regime. The Wikipedia page describing it, for example, is unreachable in China. Bitcoin fixes this. In 2017 an anonymous Bitcoin user embedded a thread titled “China: Tell the Truth About Tiananmen on Anniversary” into a Bitcoin transaction. The thread, written in both Chinese and English, laid out messages supporting freedom and opposing government tyranny, as well as historical information about the tragic event. That thread is now embedded in the blockchain and can be accessed from anywhere in the world (including China) using a Bitcoin node.
Another interesting use of Bitcoin’s OP_RETURN opcode is to timestamp or notarize a piece of data. Peter Todd’s project OpenTimestamps aims to be an open source standard for using the Bitcoin blockchain to timestamp data. In other words, it offers a cryptographic proof that the data (perhaps a document, message, or photograph) already existed at the time of stamping. This can be useful in legal disputes, to battle censorship as described above, or simply to keep an immutable log of events. Analyst and writer Nic Carter has advocated for journalists to timestamp their articles as a means of combating shady practices such as covertly changing an article after it was posted. Similarly, the service poex.io (Proof of Existence) notarizes a file by recording its hash, with a timestamp, in the blockchain for all time.
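The reason a timestamped hash can expose a covertly edited article is that any change to the text, however small, produces a different hash. Below is a minimal Python sketch of that check. It is not how OpenTimestamps or poex.io are implemented, the function names and sample texts are hypothetical, and it assumes the digest recorded on-chain is a plain SHA-256 of the file.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def matches_commitment(document: bytes, committed_hex: str) -> bool:
    """True if the document hashes to the digest that was timestamped."""
    return sha256_hex(document) == committed_hex.lower()


# Hypothetical article text at the moment it was stamped.
original = b"Officials confirmed the report on Monday."
committed = sha256_hex(original)  # stands in for the digest recorded on-chain

# The same article after a quiet edit.
edited = b"Officials denied the report on Monday."

print(matches_commitment(original, committed))  # True
print(matches_commitment(edited, committed))    # False
```

Anyone holding the original text can rerun the check against the stamped digest, while the edited version no longer matches it.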
Tweetstamp is a fun Twitter bot that, when summoned, will timestamp a tweet into the blockchain! This is helpful for holding accountable people who may have deleted unfavorable tweets, or simply for trolling your friends’ bad takes. To use it, just reply to the tweet you want stamped, tag @tweet_stamp and write the keyword stamp.
Hashing extra data into the Bitcoin blockchain does come at a cost. Because Bitcoin is a distributed network, each extra byte included in a block must be stored on the hard drive of every full node, and the number of full nodes is currently estimated to be in the neighborhood of 50,000. While it is great fun, you may wish to give it some thought before you Tweetstamp that bad take: you are consuming more block space and therefore putting a storage burden on all Bitcoin node operators. For this reason OP_RETURN is not universally loved and its merit has been debated throughout Bitcoin’s development history.
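To put a rough number on that burden, here is a back-of-the-envelope Python calculation using the two figures from this post: an 80-byte payload and roughly 50,000 full nodes. The function name is my own, and the estimate is deliberately simple: it counts only the payload bytes, ignores the rest of the transaction, and assumes every node keeps the full data.

```python
def network_storage_bytes(payload_bytes: int, node_count: int) -> int:
    """Total bytes stored across the network for one embedded payload."""
    return payload_bytes * node_count


PAYLOAD_BYTES = 80    # maximum OP_RETURN payload quoted in this post
NODE_COUNT = 50_000   # rough full-node estimate quoted in this post

total = network_storage_bytes(PAYLOAD_BYTES, NODE_COUNT)
print(total)               # 4000000
print(total / 1_000_000)   # 4.0 (megabytes, network-wide)
```

So a single maxed-out message costs each node only 80 bytes, but about 4 megabytes summed across the whole network, and that cost is permanent.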
In my humble opinion it is well worth the extra block space to enable a globally distributed, openly available, immutable source of truth — especially since someone is willing to pay the higher transaction fee to include that extra data. This data will last forever as long as there is anyone willing to store a copy of the Bitcoin blockchain somewhere on Earth or beyond. At the time of writing, the entire blockchain (over 11 years of data) is less than 300 gigabytes and the open source Bitcoin Core software can be run on an old laptop or a $50 Raspberry Pi. This is good for Bitcoin.
Thanks for reading! If you liked this post please share to help me get the series off the ground. Feedback is also welcome.