Build a Hash Table in Python With TDD

Python

Invented over half a century ago, the hash table is a classic data structure that has been fundamental to programming. To this day, it helps solve many real-life problems, such as indexing database tables, caching computed values, or implementing sets. It often comes up in job interviews, and Python uses hash tables all over the place to make name lookups almost instantaneous.

Even though Python comes with its own hash table called dict, it can be helpful to understand how hash tables work behind the curtain. A coding assessment may even task you with building one. This tutorial will walk you through the steps of implementing a hash table from scratch as if there were none in Python. Along the way, you’ll face a few challenges that’ll introduce important concepts and give you an idea of why hash tables are so fast.

In addition to this, you’ll get a hands-on crash course in test-driven development (TDD) and will actively practice it while building your hash table in a step-by-step fashion. You’re not required to have any prior experience with TDD, but at the same time, you won’t get bored even if you do!

In this tutorial, you’ll learn:

How a hash table differs from a dictionary
How you can implement a hash table from scratch in Python
How you can deal with hash collisions and other challenges
What the desired properties of a hash function are
How Python’s hash() works behind the scenes

It’ll help if you’re already familiar with Python dictionaries and have basic knowledge of object-oriented programming principles. To get the complete source code and the intermediate steps of the hash table implemented in this tutorial, follow the link below:

Source Code: Click here to download the source code that you’ll use to build a hash table in Python.

Get to Know the Hash Table Data Structure

Before diving deeper, you should familiarize yourself with the terminology because it can get slightly confusing. Colloquially, the term hash table or hash map is often used interchangeably with the word dictionary. However, there’s a subtle difference between the two concepts as the former is more specific than the latter.

Hash Table vs Dictionary

In computer science, a dictionary is an abstract data type made up of keys and values arranged in pairs. Moreover, it defines the following operations for those elements:

Add a key-value pair
Delete a key-value pair
Update a key-value pair
Find a value associated with the given key

In a sense, this abstract data type resembles a bilingual dictionary, where the keys are foreign words, and the values are their definitions or translations to other languages. But there doesn’t always have to be a sense of equivalence between keys and values. A phone book is another example of a dictionary, which combines names with the corresponding phone numbers.

Note: Anytime you map one thing to another or associate a value with a key, you’re essentially using a kind of a dictionary. That’s why dictionaries are also known as maps or associative arrays.

Dictionaries have a few interesting properties. One of them is that you can think of a dictionary as a mathematical function that projects one or more arguments to exactly one value. The direct consequences of that fact are the following:

Only Key-Value Pairs: You can’t have a key without the value or the other way around in a dictionary. They always go together.
Arbitrary Keys and Values: Keys and values can belong to two disjoint sets of the same or separate types. Both keys and values may be almost anything, such as numbers, words, or even pictures.
Unordered Key-Value Pairs: Because of the last point, dictionaries don’t generally define any order for their key-value pairs. However, that might be implementation-specific.
Unique Keys: A dictionary can’t contain duplicate keys, because that would violate the definition of a function.
Non-Unique Values: The same value can be associated with many keys, but it doesn’t have to.

There are related concepts that extend the idea of a dictionary. For example, a multimap lets you have more than one value per key, while a bidirectional map not only maps keys to values but also provides mapping in the opposite direction. However, in this tutorial, you’re only going to consider the regular dictionary, which maps exactly one value to each key.

Here’s a graphical depiction of a hypothetical dictionary, which maps some abstract concepts to their corresponding English words:

Graphical Depiction of a Dictionary Abstract Data Type

It’s a one-way map of keys to values, which are two completely different sets of elements. Right away, you can see fewer values than keys because the word bow happens to be a homonym with multiple meanings. Conceptually, this dictionary still contains four pairs, though. Depending on how you decided to implement it, you could reuse repeated values to conserve memory or duplicate them for simplicity.

Now, how do you code such a dictionary in a programming language? The right answer is you don’t, because most modern languages come with dictionaries as either primitive data types or classes in their standard libraries. Python ships with a built-in dict type, which already wraps a highly optimized data structure written in C so that you don’t have to write a dictionary yourself.

Python’s dict lets you perform all the dictionary operations listed at the beginning of this section:

>>>>>> glossary = {“BDFL”: “Benevolent Dictator For Life”}
>>> glossary[“GIL”] = “Global Interpreter Lock” # Add
>>> glossary[“BDFL”] = “Guido van Rossum” # Update
>>> del glossary[“GIL”] # Delete
>>> glossary[“BDFL”] # Search
‘Guido van Rossum’
>>> glossary
{‘BDFL’: ‘Guido van Rossum’}

With the square bracket syntax ([ ]), you can add a new key-value pair to a dictionary. You can also update the value of or delete an existing pair identified by a key. Finally, you can look up the value associated with the given key.

That said, you may ask a different question. How does the built-in dictionary actually work? How does it map keys of arbitrary data types, and how does it do it so quickly?

Read the full article at https://realpython.com/python-hash-table/ »

[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]