This article attempts to catalog all the types of Bitcoin transaction locking scripts, their prevalence, and their security implications. The data presented here was extracted directly from the Bitcoin blockchain, which required custom code to iterate quickly over the entire chain (over 450 GB at the time of writing). The tool is available on GitHub: https://github.com/nccgroup/FastBTCParser.
Note: in the rest of this article, Bitcoin and Satoshi will be used interchangeably to refer to an amount of currency in a transaction (1 Bitcoin = 100,000,000 Satoshis).
Bitcoin relies on the trustless dissemination of a ledger called the Bitcoin blockchain, which holds a record of all transactions since the inception of Bitcoin. Each block on the chain contains transactions, each made of:
- A set of inputs, each carrying an unlocking script (ScriptSig): each input transaction’s output needs a valid unlocking script to authorize the spending of all the proceeds of that prior transaction.
- A set of outputs, each carrying a locking script (ScriptPubKey).
Here is an example of a simple transaction’s data taken from the blockchain:
010000000151f36afbb502ff5dd7507845c79cb07e44edc86add4ffd068b3e7b
4017bd290b000000008a4730440220591e3186aa579cd299eb27584a8f929eac
d8b4f810ba402b80f33a153fa8f3c1022024e0c4bc4710294a56ccf9e43c2714
88139e73ae1dbcd22bf4e6c2194b489a2e014104bd117a74f353dfc60809c1c8
f7d57ddbb2bae869fb8bc3d863cb3e8ecab5af6816729494fb0687b298e67be8
75bafb5da82966394805611d0b1ef0f947025c1cffffffff02fe8ed485050000
001976a9147549ddbffcab3fbbb07c52adbe9476351c42f2b188ac001bb70000
0000001976a91465ed94fa5782ef897878140a2890babbf000853688ac000000
00
And here is that same transaction cut into its individual fields:
01000000                           ] Transaction format version number
01                                 ] Number of inputs
--------- INPUTS ---------
51f36afbb502ff5dd7507845c79cb07e   ┐ TXID (bytes reversed)
44edc86add4ffd068b3e7b4017bd290b   ┘
00000000                           ] TXID Output number
8a                                 ] ScriptSig size (138 bytes)
4730440220591e3186aa579cd299eb27   ┐ ScriptSig
584a8f929eacd8b4f810ba402b80f33a   |
153fa8f3c1022024e0c4bc4710294a56   |
ccf9e43c271488139e73ae1dbcd22bf4   |
e6c2194b489a2e014104bd117a74f353   |
dfc60809c1c8f7d57ddbb2bae869fb8b   |
c3d863cb3e8ecab5af6816729494fb06   |
87b298e67be875bafb5da82966394805   |
611d0b1ef0f947025c1c               ┘
ffffffff                           ] Sequence Number
------ END OF INPUTS ------
02                                 ] Number of outputs
--------- OUTPUTS ---------
--- OUTPUT 1 ---
fe8ed48505000000                   ] Amount in Satoshis ~237 bitcoins (bytes reversed)
19                                 ] ScriptPubKey size (25 bytes)
76a9147549ddbffcab3fbbb07c52adbe   ┐ ScriptPubKey
9476351c42f2b188ac                 ┘
--- OUTPUT 2 ---
001bb70000000000                   ] Amount in Satoshis 12,000,000 (bytes reversed)
19                                 ] ScriptPubKey size (25 bytes)
76a91465ed94fa5782ef897878140a28   ┐ ScriptPubKey
90babbf000853688ac                 ┘
------ END OF OUTPUTS ------
00000000                           ] Lock time
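The field layout above can be walked programmatically. The following Python sketch parses the sample transaction; it assumes the legacy (pre-SegWit) serialization only, and the names Reader, TxInput, TxOutput, and parse_legacy_tx are invented for this illustration (they are not taken from FastBTCParser).

```python
from dataclasses import dataclass

# The sample transaction shown above, one 32-byte row per string.
RAW_TX_HEX = (
    "010000000151f36afbb502ff5dd7507845c79cb07e44edc86add4ffd068b3e7b"
    "4017bd290b000000008a4730440220591e3186aa579cd299eb27584a8f929eac"
    "d8b4f810ba402b80f33a153fa8f3c1022024e0c4bc4710294a56ccf9e43c2714"
    "88139e73ae1dbcd22bf4e6c2194b489a2e014104bd117a74f353dfc60809c1c8"
    "f7d57ddbb2bae869fb8bc3d863cb3e8ecab5af6816729494fb0687b298e67be8"
    "75bafb5da82966394805611d0b1ef0f947025c1cffffffff02fe8ed485050000"
    "001976a9147549ddbffcab3fbbb07c52adbe9476351c42f2b188ac001bb70000"
    "0000001976a91465ed94fa5782ef897878140a2890babbf000853688ac000000"
    "00"
)


@dataclass
class TxInput:
    prev_txid: str      # hex, in display order (reversed from the wire order)
    prev_index: int     # which output of the prior transaction is spent
    script_sig: bytes   # unlocking script
    sequence: int


@dataclass
class TxOutput:
    satoshis: int
    script_pubkey: bytes  # locking script


class Reader:
    """Sequential reader over a byte string."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def take(self, n: int) -> bytes:
        chunk = self.data[self.pos:self.pos + n]
        if len(chunk) != n:
            raise ValueError("truncated transaction")
        self.pos += n
        return chunk

    def uint(self, n: int) -> int:
        # Integers are stored little-endian ("bytes reversed").
        return int.from_bytes(self.take(n), "little")

    def varint(self) -> int:
        # Variable-length integer used for counts and script sizes.
        first = self.uint(1)
        if first < 0xFD:
            return first
        return self.uint({0xFD: 2, 0xFE: 4, 0xFF: 8}[first])


def parse_legacy_tx(raw: bytes):
    r = Reader(raw)
    version = r.uint(4)
    inputs = []
    for _ in range(r.varint()):
        txid = r.take(32)[::-1].hex()
        index = r.uint(4)
        script_sig = r.take(r.varint())
        sequence = r.uint(4)
        inputs.append(TxInput(txid, index, script_sig, sequence))
    outputs = []
    for _ in range(r.varint()):
        satoshis = r.uint(8)
        script_pubkey = r.take(r.varint())
        outputs.append(TxOutput(satoshis, script_pubkey))
    locktime = r.uint(4)
    if r.pos != len(raw):
        raise ValueError("unexpected trailing bytes")
    return version, inputs, outputs, locktime


version, inputs, outputs, locktime = parse_legacy_tx(bytes.fromhex(RAW_TX_HEX))
for out in outputs:
    print(out.satoshis, out.script_pubkey.hex())
```

A SegWit transaction carries an extra marker, flag, and witness section, which this sketch deliberately does not handle.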
- TXID is the 32-byte ID of a prior transaction from which one of the outputs is going to be spent.
- ScriptPubKey is the locking script. Each transaction output locks an arbitrary amount of Satoshis with such a script. Those Satoshis can then be used in a future transaction by unlocking them.
- ScriptSig is the unlocking script. It must provide the data and commands needed to satisfy the locking script of the input transaction’s output, so that its funds can be spent in the current transaction.
In any Bitcoin transaction, the ScriptSig and ScriptPubKey are scripts written in a simple language with a limited number of commands. The scripting language is not Turing-complete, and each command is stored in a single byte. The language can store fixed- or variable-length data blocks inline within the script, and uses a stack to process that data.
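In effect, the ScriptSig from the current transaction and the ScriptPubKey from the input (prior) transaction are concatenated (referred to as “the script”) and executed to unlock the funds. The execution is successful, and the funds are unlocked and spent, if:
- the script runs to completion without any command failing, and
- the value left on top of the stack at the end is true (non-zero).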
A complete description of the scripting language and commands can be found here: https://en.bitcoin.it/wiki/Script
The first output locking script (ScriptPubKey) of the sample transaction above decodes to the following:
OP_DUP OP_HASH160 OP_DATA_20 7549ddbffcab3fbbb07c52adbe9476351c42f2b1 OP_EQUALVERIFY OP_CHECKSIG
This is a P2PKH type locking script and is described in the Pay To Public Key Hash section.
Using the tool associated with this article (FastBTCParser), we can now obtain a list of all existing locking script fingerprints, along with their prevalence. The fingerprinting process ignores any part of the script that is data and replaces it with a <data> tag (but accounts for the data’s length if it is specified by the previous script op-code).
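To make the fingerprinting idea concrete, here is a minimal Python sketch of how such a fingerprint could be computed. It is not the tool's actual code: the op-code table is deliberately partial, the OP_PUSHDATA and OP_UNKNOWN names are assumptions, and treating everything after OP_RETURN as a single <data> blob is inferred from the fingerprints listed in this article.

```python
# Partial op-code table; only the commands that appear in this article.
OPCODE_NAMES = {
    0x00: "OP_0",
    0x6A: "OP_RETURN",
    0x76: "OP_DUP",
    0x87: "OP_EQUAL",
    0x88: "OP_EQUALVERIFY",
    0xA9: "OP_HASH160",
    0xAC: "OP_CHECKSIG",
    0xAE: "OP_CHECKMULTISIG",
}
# OP_1 .. OP_16 are encoded as 0x51 .. 0x60.
OPCODE_NAMES.update({0x50 + n: f"OP_{n}" for n in range(1, 17)})


def fingerprint(script: bytes) -> str:
    """Return the script with every data block replaced by a <data> tag."""
    tokens = []
    i = 0
    while i < len(script):
        op = script[i]
        i += 1
        if 0x01 <= op <= 0x4B:
            # Direct push: the op-code itself is the data length.
            tokens += [f"OP_DATA_{op}", "<data>"]
            i += op
        elif op in (0x4C, 0x4D, 0x4E):
            # Length-prefixed push (naming here is an assumption).
            width = {0x4C: 1, 0x4D: 2, 0x4E: 4}[op]
            length = int.from_bytes(script[i:i + width], "little")
            tokens += [f"OP_PUSHDATA{width}", "<data>"]
            i += width + length
        elif op == 0x6A:
            # Assumption: whatever follows OP_RETURN is counted as data.
            tokens.append("OP_RETURN")
            if i < len(script):
                tokens.append("<data>")
            break
        else:
            tokens.append(OPCODE_NAMES.get(op, f"OP_UNKNOWN_0x{op:02x}"))
    return " ".join(tokens)


sample = bytes.fromhex("76a9147549ddbffcab3fbbb07c52adbe9476351c42f2b188ac")
print(fingerprint(sample))
```

Because the data itself is discarded, two scripts that differ only in their keys or hashes collapse to the same fingerprint, which is what makes counting script types possible.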
For brevity, only scripts with over 100 occurrences are shown below. A complete unedited list can be found through the accompanying tool’s GitHub page. The complete list contains 156 unique script fingerprints.
#OCCURRENCES  ITEM
         155  OP_1 OP_DATA_65 <data> OP_DATA_65 <data> OP_2 OP_CHECKMULTISIG
         182  OP_IFDUP OP_IF OP_2SWAP OP_VERIFY OP_2OVER OP_DEPTH
         336  OP_DATA_36 <data>
         753  OP_1 OP_DATA_65 <data> OP_1 OP_CHECKMULTISIG
         986  OP_DATA_32 <data>
        1693  OP_1 OP_DATA_33 <data> OP_1 OP_CHECKMULTISIG
        1749  OP_2 OP_DATA_33 <data> OP_DATA_33 <data> OP_DATA_33 <data> OP_3 OP_CHECKMULTISIG
        4555  OP_2 OP_DATA_33 <data> OP_DATA_33 <data> OP_2 OP_CHECKMULTISIG
        4907  OP_1 OP_DATA_65 <data> OP_DATA_33 <data> OP_DATA_33 <data> OP_3 OP_CHECKMULTISIG
        8813  OP_2 OP_3 OP_DATA_75 <data>
       16844  OP_1 OP_DATA_65 <data> OP_DATA_65 <data> OP_DATA_65 <data> OP_3 OP_CHECKMULTISIG
       31401  OP_1 OP_DATA_65 <data> OP_DATA_33 <data> OP_2 OP_CHECKMULTISIG
       70218  OP_1 OP_DATA_33 <data> OP_DATA_33 <data> OP_DATA_65 <data> OP_3 OP_CHECKMULTISIG
      212896  OP_1 OP_DATA_33 <data> OP_DATA_33 <data> OP_2 OP_CHECKMULTISIG
      219174  #note this is an empty script
      289111  OP_RETURN
      562934  OP_1 OP_DATA_33 <data> OP_DATA_33 <data> OP_DATA_33 <data> OP_3 OP_CHECKMULTISIG
      888283  OP_DATA_65 <data> OP_CHECKSIG
     2691300  OP_DATA_33 <data> OP_CHECKSIG
    17663491  OP_1 OP_DATA_32 <data>
    27521302  OP_0 OP_DATA_32 <data>
    52119761  OP_RETURN <data>
   306074873  OP_0 OP_DATA_20 <data>
   655601631  OP_HASH160 OP_DATA_20 <data> OP_EQUAL
  1292730245  OP_DUP OP_HASH160 OP_DATA_20 <data> OP_EQUALVERIFY OP_CHECKSIG
found 156 types of item in 2356718149 tx outputs.
All data is accurate as of May 14th 2023.
From this list of scripts, we can identify the five most commonly used script types, described below.
Pay To Public Key (P2PK) was originally the main way to send Bitcoins from one wallet to another. P2PK scripts have two different fingerprints:
OP_DATA_65 <data> OP_CHECKSIG
OP_DATA_33 <data> OP_CHECKSIG
Historically, a 64-byte public key (+1 byte to identify the type) was used in this type of locking script. Eventually, it was replaced by a 32-byte compressed public key (again +1 byte to identify the type, hence OP_DATA_33) as a way to reduce transaction size, and thus the transaction fees paid when using this type of locking script.
It can be observed in the diagram above that once the shorter version was adopted, it almost entirely replaced the legacy one. The gap between the two versions of that script can most likely be explained by the rise in prevalence of P2PKH, another type of transaction that achieves the same overall goal, with wallets favoring certain types of scripts over others to reduce transaction fees.
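The size reduction between the two key formats can be illustrated with a short Python sketch. It relies on the standard elliptic-curve point compression rule (keep the x coordinate and encode the parity of y in the prefix byte). The function names are made up for this example, and the key used is the 65-byte public key found in the ScriptSig of the sample transaction earlier in this article.

```python
def compress_pubkey(uncompressed: bytes) -> bytes:
    """Turn a 65-byte public key (0x04 || x || y) into its 33-byte form."""
    if len(uncompressed) != 65 or uncompressed[0] != 0x04:
        raise ValueError("expected a 65-byte key starting with 0x04")
    x = uncompressed[1:33]
    y = uncompressed[33:65]
    # The prefix records whether y is even (0x02) or odd (0x03);
    # y itself can be recomputed from x and that parity.
    prefix = b"\x02" if y[-1] % 2 == 0 else b"\x03"
    return prefix + x


def p2pk_script(pubkey: bytes) -> bytes:
    """Build a P2PK locking script: OP_DATA_n <pubkey> OP_CHECKSIG."""
    return bytes([len(pubkey)]) + pubkey + b"\xac"


SAMPLE_PUBKEY = bytes.fromhex(
    "04"
    # x coordinate (32 bytes)
    "bd117a74f353" "dfc60809c1c8f7d57ddbb2bae869fb8b" "c3d863cb3e8ecab5af68"
    # y coordinate (32 bytes)
    "16729494fb06" "87b298e67be875bafb5da82966394805" "611d0b1ef0f947025c1c"
)

compressed = compress_pubkey(SAMPLE_PUBKEY)
print(len(p2pk_script(SAMPLE_PUBKEY)), "bytes with the legacy key")
print(len(p2pk_script(compressed)), "bytes with the compressed key")
```

The two resulting scripts correspond to the OP_DATA_65 and OP_DATA_33 fingerprints listed above.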
It is worth noting that the script’s security feature, the actual signature check, is the last command of the script. That OP_CHECKSIG command is what guarantees that the transaction’s outputs cannot be changed, and that funds are sent where the transaction sender intended. Since blocks need to be verifiable by other miners to be considered part of the main uninterrupted blockchain, a rogue miner attempting to change the transaction in any way would need to know the private key capable of signing the forged transaction, so that its signature would be valid and verifiable by a majority of miners.
Pay To Public Key Hash (P2PKH) scripts achieve the same goal as P2PK locking scripts; however, their prevalence in the blockchain is more than two orders of magnitude higher than that of P2PK scripts. They are of the following form:
OP_DUP OP_HASH160 OP_DATA_20 <data> OP_EQUALVERIFY OP_CHECKSIG
This script locks the funds behind a hash of the payee’s public key. To unlock the funds, the ScriptSig must contain a signature of the current transaction made with the payee’s private key, followed by the corresponding public key, whose hash160 (RIPEMD160(SHA256(publickey))) must match the one stored in the locking script. The security of this type of script is guaranteed by the OP_HASH160, OP_EQUALVERIFY, and OP_CHECKSIG commands of the locking script. The first two commands force a recalculation of the hash160 of the provided public key and compare it to the one stored in the locking script; the last command checks the transaction signature, which must have been computed using the private key corresponding to the public key that was just verified. Once again, this effectively guarantees that a miner cannot change any part of the transaction after the sender submitted it without knowing the private key.
Despite their misleading appellation, P2MS scripts do not necessarily send funds to multiple addresses; rather, they are locking scripts that require multiple signatures to unlock the funds. The following are the most common valid P2MS script fingerprints:
OP_1 OP_DATA_33 <data> OP_DATA_65 <data> OP_2 OP_CHECKMULTISIG | |
OP_1 OP_DATA_65 <data> OP_DATA_65 <data> OP_DATA_33 <data> OP_3 OP_CHECKMULTISIG | |
OP_3 OP_DATA_33 <data> OP_DATA_33 <data> OP_DATA_33 <data> OP_3 OP_CHECKMULTISIG | |
OP_2 OP_DATA_65 <data> OP_DATA_65 <data> OP_2 OP_CHECKMULTISIG | |
OP_2 OP_DATA_65 <data> OP_DATA_65 <data> OP_DATA_65 <data> OP_3 OP_CHECKMULTISIG | |
OP_1 OP_DATA_65 <data> OP_DATA_65 <data> OP_2 OP_CHECKMULTISIG | |
OP_1 OP_DATA_65 <data> OP_1 OP_CHECKMULTISIG | |
OP_1 OP_DATA_33 <data> OP_1 OP_CHECKMULTISIG | |
OP_2 OP_DATA_33 <data> OP_DATA_33 <data> OP_DATA_33 <data> OP_3 OP_CHECKMULTISIG | |
OP_2 OP_DATA_33 <data> OP_DATA_33 <data> OP_2 OP_CHECKMULTISIG | |
OP_1 OP_DATA_65 <data> OP_DATA_33 <data> OP_DATA_33 <data> OP_3 OP_CHECKMULTISIG | |
OP_1 OP_DATA_65 <data> OP_DATA_65 <data> OP_DATA_65 <data> OP_3 OP_CHECKMULTISIG | |
OP_1 OP_DATA_65 <data> OP_DATA_33 <data> OP_2 OP_CHECKMULTISIG | |
OP_1 OP_DATA_33 <data> OP_DATA_33 <data> OP_DATA_65 <data> OP_3 OP_CHECKMULTISIG | |
OP_1 OP_DATA_33 <data> OP_DATA_33 <data> OP_2 OP_CHECKMULTISIG | |
OP_1 OP_DATA_33 <data> OP_DATA_33 <data> OP_DATA_33 <data> OP_3 OP_CHECKMULTISIG |
Depending on the number of public keys and their size (33 or 65 bytes), many different combinations are possible. These scripts are always of the form OP_N PUBKEY [PUBKEY...] OP_M OP_CHECKMULTISIG, where signatures from N of the M listed public keys are required to unlock the funds. Due to limitations in the original implementation of these types of scripts, and a desire to maintain backward compatibility, the unlocking script has to provide the needed signatures in the same order as the corresponding keys in the locking script. In addition, an extra leading command is required to work around an off-by-one bug.
For example: OP_0 SIGNATURE_1 SIGNATURE_3, in the case of a 2-of-3 locking script of the form OP_2 OP_DATA_33 <data> OP_DATA_33 <data> OP_DATA_33 <data> OP_3 OP_CHECKMULTISIG.
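The ordering constraint and the extra leading element can be modeled in a few lines of Python. This is a simplified sketch of the matching logic, not consensus code: check_multisig and mock_verify are hypothetical names, and the mock simply pairs SIGNATURE_k with PUBKEY_k instead of performing any real cryptography.

```python
from typing import Callable, Sequence


def check_multisig(
    unlock_items: Sequence[str],
    pubkeys: Sequence[str],
    n_required: int,
    verify: Callable[[str, str], bool],
) -> bool:
    """Simplified model of how OP_CHECKMULTISIG consumes an unlocking script.

    unlock_items is the pushed data of the unlocking script, in order:
    one dummy element followed by exactly n_required signatures.
    """
    if len(unlock_items) != n_required + 1:
        return False
    # The first element is popped and ignored (the off-by-one workaround).
    signatures = unlock_items[1:]
    key_index = 0
    for sig in signatures:
        # Keys are scanned forward only: a key skipped for one signature
        # is never revisited, so signatures must follow the key order.
        while key_index < len(pubkeys) and not verify(sig, pubkeys[key_index]):
            key_index += 1
        if key_index == len(pubkeys):
            return False
        key_index += 1
    return True


def mock_verify(signature: str, pubkey: str) -> bool:
    # Stand-in for a real signature check: SIGNATURE_k matches PUBKEY_k.
    return signature.split("_")[1] == pubkey.split("_")[1]


KEYS = ["PUBKEY_1", "PUBKEY_2", "PUBKEY_3"]
print(check_multisig(["OP_0", "SIGNATURE_1", "SIGNATURE_3"], KEYS, 2, mock_verify))
print(check_multisig(["OP_0", "SIGNATURE_3", "SIGNATURE_1"], KEYS, 2, mock_verify))
```

In this model, the same two signatures succeed or fail depending solely on the order in which they are supplied.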
The last command, OP_CHECKMULTISIG, is what guarantees that the unlocking script must sign the entire transaction with a minimum of N signatures from the M public keys recorded in the original locking script. Once again, this process ensures that a miner cannot change any part of the transaction without knowing the corresponding private keys.
Due to further design limitations in the OP_CHECKMULTISIG op-code handling for ScriptPubKey, the maximum value of both M and N is limited to 3. However, it is possible to use a different type of script (P2SH, see below) to achieve the same multi-signature feature without this limitation.
While there is a standardized way to store arbitrary data in a locking script (OP_RETURN), multiple ways of storing arbitrary data on the blockchain have been used throughout Bitcoin’s history. Below is a non-exhaustive list of scripts that have been used for their data storage capabilities (in some cases within a <data> tag, and sometimes by using op-codes for their corresponding ASCII values).
Some, but not all, of these scripts are provably unspendable, and are effectively pruned from the set of Unspent Transaction Outputs (UTXO). Usually, they lock a null or low amount of Satoshis (lower than the transaction fees that would be needed to spend the locked funds), and many of them carry ASCII data, links, or scripts.
A different way of storing data on the blockchain has recently been proposed and put to use in a vast number of transactions through ordinals (since the end of 2022).
P2SH scripts are the second most prevalent type of locking script, and also the most opaque as to the actual behavior of the script. They are of the following form:
OP_HASH160 OP_DATA_20 <data> OP_EQUAL
The P2SH locking script stores a hash (RIPEMD160(SHA256(script))) of the actual locking script in its data segment, thus only revealing that script when the previous transaction’s proceeds are to be spent. When the funds need to be unlocked, the ScriptSig provides the actual locking script, whose hash must match, along with the data needed to satisfy it. This can, for example, be used to make an N-of-M multi-signature locking script without the 3-signature limitation of a direct P2MS locking script.
However, this type of script carries a risk with regard to miner advantage attacks (as do other types of custom locking scripts): if the revealed locking script does not perform a signature check against a public key or hash embedded within it, the transaction could be hijacked by replacing its outputs and signing with a different private key.
While this sort of attack needs to be timed between the moment the transaction is sent or propagated through the network and the moment it is actually mined, this potentially leaves a window of opportunity of several minutes, or up to multiple hours when the network is congested. This type of attack can also be automated, leading to fee bidding wars for the successful hijacking of improperly protected transactions.
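The script types covered so far can be told apart from their fingerprints alone. The Python sketch below groups a handful of rows copied from the fingerprint listing earlier in this article. The patterns and category labels are assumptions made for this illustration, and the P2MS pattern is intentionally loose (it does not check that the number of keys equals M).

```python
import re
from collections import Counter

# (label, pattern) pairs, tried in order against the whole fingerprint.
PATTERNS = [
    ("P2PK", re.compile(r"OP_DATA_(33|65) <data> OP_CHECKSIG")),
    ("P2PKH", re.compile(
        r"OP_DUP OP_HASH160 OP_DATA_20 <data> OP_EQUALVERIFY OP_CHECKSIG")),
    ("P2MS", re.compile(
        r"OP_[1-3]( OP_DATA_(33|65) <data>){1,3} OP_[1-3] OP_CHECKMULTISIG")),
    ("OP_RETURN", re.compile(r"OP_RETURN( <data>)?")),
    ("P2SH", re.compile(r"OP_HASH160 OP_DATA_20 <data> OP_EQUAL")),
]


def classify(fingerprint: str) -> str:
    for label, pattern in PATTERNS:
        if pattern.fullmatch(fingerprint):
            return label
    return "other"


# A few (occurrences, fingerprint) rows taken from the listing above.
ROWS = [
    (8813, "OP_2 OP_3 OP_DATA_75 <data>"),
    (212896, "OP_1 OP_DATA_33 <data> OP_DATA_33 <data> OP_2 OP_CHECKMULTISIG"),
    (289111, "OP_RETURN"),
    (562934, "OP_1 OP_DATA_33 <data> OP_DATA_33 <data> OP_DATA_33 <data> "
             "OP_3 OP_CHECKMULTISIG"),
    (888283, "OP_DATA_65 <data> OP_CHECKSIG"),
    (2691300, "OP_DATA_33 <data> OP_CHECKSIG"),
    (52119761, "OP_RETURN <data>"),
    (655601631, "OP_HASH160 OP_DATA_20 <data> OP_EQUAL"),
    (1292730245,
     "OP_DUP OP_HASH160 OP_DATA_20 <data> OP_EQUALVERIFY OP_CHECKSIG"),
]

totals = Counter()
for count, fp in ROWS:
    totals[classify(fp)] += count

for label, count in totals.most_common():
    print(f"{count:>12}  {label}")
```

Note that a P2SH fingerprint says nothing about the script hidden behind the hash, which is why that category is the most opaque of the five.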
Among the 156 script fingerprints, there are a few other custom scripts whose behavior can be identified through analysis of their op-codes. Some provide puzzles or challenges, such as the following:
OP_2DUP OP_ADD OP_8 OP_EQUALVERIFY OP_SUB OP_2 OP_EQUAL
which is equivalent to the following system of equations:
x+y = 8
x-y = 2
Note: the locking script above has long been redeemed (the solution being OP_5 OP_3).
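The puzzle can be checked with a toy stack machine. The Python sketch below implements only the handful of commands this script needs and is in no way a complete Script interpreter; run_script is a name chosen for this example.

```python
def run_script(ops):
    """Evaluate a list of op-code names; return True if the script succeeds."""
    stack = []
    try:
        for op in ops:
            if op.startswith("OP_") and op[3:].isdigit():
                # OP_0 .. OP_16 push their numeric value.
                stack.append(int(op[3:]))
            elif op == "OP_2DUP":
                if len(stack) < 2:
                    raise IndexError("OP_2DUP needs two items")
                stack.extend(stack[-2:])
            elif op == "OP_ADD":
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif op == "OP_SUB":
                b, a = stack.pop(), stack.pop()
                stack.append(a - b)
            elif op == "OP_EQUAL":
                b, a = stack.pop(), stack.pop()
                stack.append(1 if a == b else 0)
            elif op == "OP_EQUALVERIFY":
                b, a = stack.pop(), stack.pop()
                if a != b:
                    return False  # verification failed: script aborts
            else:
                raise ValueError(f"unsupported op-code: {op}")
    except IndexError:
        return False  # stack underflow
    # Success requires a true (non-zero) value on top of the stack.
    return bool(stack) and stack[-1] != 0


LOCKING = ["OP_2DUP", "OP_ADD", "OP_8", "OP_EQUALVERIFY",
           "OP_SUB", "OP_2", "OP_EQUAL"]

# Unlocking script followed by the locking script, as described earlier.
print(run_script(["OP_5", "OP_3"] + LOCKING))
print(run_script(["OP_4", "OP_4"] + LOCKING))
```

Nothing in this evaluation depends on who submits the unlocking script, which is the root of the hijacking problem discussed next.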
Here, it should be noted that the person attempting to solve the puzzle and claim the reward locked by that script might see the transaction hijacked by a fairly simple miner advantage type of attack. Since most transactions are disseminated through the miners’ P2P network before being mined, an attacker could extract the solution to the puzzle from the transaction, and create another transaction with a different output (redirecting funds to their own wallet) and a higher transaction fee, to have their transaction processed before the original one.
This attack is possible because there are no requirements for the transaction to be signed using a private key whose corresponding public key would have been shared in the locking script. This makes such simple arithmetic puzzles and challenges difficult to secure against this type of attack.
To ensure redeeming the reward could not be hijacked, some challenge makers used to advise the challenge solver to mine the block themselves. However, this solution has now become impractical, since it would require large hashing power at the disposal of the challenge solver (mining pools are not an adequate solution either unless all miners in that pool can be trusted by the challenge redeemer).
While it appears that most of these challenges have migrated towards P2SH scripts, this does not change the security implications outlined above.
Ordinals represent a recent development of the Bitcoin blockchain and have been observed in an increasing number of transactions since the end of 2022. Ordinals can be approximated as Bitcoin’s implementation of NFTs; the main difference with other implementations of NFTs (such as on the Ethereum blockchain) is that the actual NFT item is stored directly on the blockchain, rather than only a web link to the file. Ordinals use an arbitrary data storage mechanism built on top of the blockchain’s Segregated Witness feature (SegWit), which is a method of providing unlocking scripts, designed with the intention of lowering transaction fees and mining power usage.
Ordinals can contain small files as part of the blockchain. These have been used to store a wide array of data ranging from gif images, webm and mp4 short videos, to ogg sound files, and various text file formats. While there is no known limitation on the type of files that can be stored, the size is limited by the SegWit format, the transaction size, and the block size (4 MiB).
Ordinals file types:
#OCCURRENCES  ITEM
          10  # Empty string used as a file type
           1  <script>alert('xss in content type')</script>
           1  application/epub+zip
           1  application/javascript
       24167  application/json
           3  application/json;charset=utf-8
           2  application/msword
          14  application/octet-stream
         221  application/pdf
           3  application/pgp-signature
           1  application/x-gzip
           1  application/yaml
           1  audio/flac
           1  audio/midi
           1  audio/mod
         338  audio/mpeg
           2  audio/ogg
           4  audio/wav
           1  dadabots/was+here
         309  image/avif
        6261  image/gif
       43568  image/jpeg
           7  image/jpeg;charset=utf-8
      442976  image/png
       35663  image/svg+xml
           2  image/tiff
      103823  image/webp
         339  model/gltf-binary
           3  model/stl
           2  ordtext/plain;charset=utf-8
          66  text/html
          12  text/html; charset=utf-8
       14467  text/html;charset=utf-8
           2  text/javascript
           1  text/markdown
        6317  text/plain
          11  text/plain; charset=utf-8
         522  text/plain;charset=UTF-8
           1  text/plain;charset=us-ascii
     1837614  text/plain;charset=utf-8
           1  text/plainn;charset=utf-
           1  text/plainn;charset=utf-8
        1378  video/mp4
         415  video/webm
           1  🟠 #UTF-16 character used as a file type
found 45 file types in 2518535 SegWit items containing ordinals.
At the top of the listing above, one can notice that a fairly innocuous classic Cross-Site Scripting (XSS) payload was inserted in the file type field of an ordinal. This is likely to cause a popup in, and thereby expose, vulnerable web apps that scrape the blockchain for ordinals.
<script>alert('xss in content type')</script>
Another potentially mishandled file type can also be observed at the bottom of the list, in the form of a UTF-16 character.
🟠
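A web app that displays this field could treat it as untrusted input. The following Python sketch shows one possible approach, which is an assumption of this example rather than something prescribed by the article or the tool: accept only values shaped like a MIME type, and HTML-escape anything before rendering it. The regular expression is a loose approximation of MIME type syntax, and the function names are hypothetical.

```python
import html
import re

# Loose approximation of "type/subtype" with an optional charset parameter.
_TOKEN = r"[a-z0-9][a-z0-9!#$&^_.+-]*"
MIME_RE = re.compile(
    rf"{_TOKEN}/{_TOKEN}(\s*;\s*charset=[a-z0-9_-]+)?",
    re.IGNORECASE,
)


def is_plausible_mime(value: str) -> bool:
    """Return True if the value looks like a MIME type."""
    return MIME_RE.fullmatch(value) is not None


def render_content_type(value: str) -> str:
    """Return a string that is safe to embed in an HTML page."""
    escaped = html.escape(value)  # neutralizes <, >, &, and quotes
    if is_plausible_mime(value):
        return escaped
    return f"(invalid content type: {escaped})"


for candidate in [
    "image/png",
    "text/plain;charset=utf-8",
    "<script>alert('xss in content type')</script>",
    "🟠",
    "",
]:
    print(is_plausible_mime(candidate), render_content_type(candidate))
```

Escaping is applied even to values that pass validation, so a mistake in the pattern does not by itself reintroduce the injection.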
All ordinals can be extracted using the tool associated with this article. However, this feature is provided without warranty of any kind with regards to the safety or legality of the extracted files.
The tool that made this article possible is called FastBTCParser. It enables reasonably fast multithreaded parsing of the Bitcoin blockchain to fingerprint locking scripts and extract statistics about them, as well as to check block Merkle root validity. It also supports ordinal file extraction. The tool is freely available under a free open source software license and can be found here: https://github.com/nccgroup/FastBTCParser.