Hello World! I haven't shared my knowledge with you in a while, so I'll try to fill the void with an interesting post. Today we will talk about hash functions and a serious problem in them, known as collision.
While learning PostgreSQL from Dr. Chuck, I came across this exercise and asked for his permission to discuss with you. This question specially caught my attention: What a cryptography question is doing in the database course. Although it became apparent to me as I learned more about other fault tolerance mechanisms and the use of hash functions for indexing.
Let's suppose you download a file from internet and want to ensure if it is corrupted or tampered while in transit (yes, even after TLS encryption). How would you approach for solution to this problem? Here are some naive approaches I used to think of years ago
Approach | Problem |
---|---|
Compare the size of the file Content-Length and output of du command. | It may not accurately detect corruptions or errors caused by bit flips. Bit flips can occur during file transmission or storage, resulting in altered content that may not necessarily affect the file size |
Attempt to open file, and check if it shows any error message | One potential problem with this approach is that certain types of malware or malicious files could be designed to appear harmless initially (for example, Trojans). One click, and Boom! Your system is now compromised. |
Forget everything, re-download the file | This will result in bandwidth wastage. Although Internet fees in India are low, when I had these thoughts, re-downloading large files such as movies or other applications was not an option. |
It makes reasonable to compare to files. So, in order to do that, you need a number that is unique to that file and a mechanism to generate that number on your end, which you will then compare to the one given by the vendor on their website. This number is called checksum.
What is Collision?
Performing Collision
Hash functions using mod operations are very easy to break. Here, we need to focus on two operations: addition and multiplication. If we can find a set of plain text string, that their hash value hv
is same, then we have successfully completed this task
while True:
txt = input("Enter a string: ")
if len(txt) < 1 : break
hv = 0
pos = 0
for let in txt:
pos = ( pos % 3 ) + 1
hv = (hv + (pos * ord(let))) % 1000000
print(let, pos, ord(let), hv)
print(hv, txt)