BIT-101
Bill Gates touched my MacBook Pro
Oof! It’s been a while. Why the silence?
A few days after my last post, I headed out to Portland, OR for several days. A couple of my older kids have wound up migrating out that way and not too long ago, my daughter had a baby boy, making me a grandfather! I’d love to show you pictures of him. He’s a very cute, very smiley kid. But I’ll let him discover the consequences of social media when he’s old enough to do that on his own. I had a very good time out there, getting to know the city (I’ve only passed through one time before) and catching up with family. It was also just great to get a good chunk of days off work, which has been pretty stressful lately.
I caught a red-eye on the way back, took the next day off and headed into the office on the following day. In the afternoon, my stomach started feeling not-great. When I got home, I was pretty damn nauseous. By early evening, I was sicker than I’ve been in many years. Feverish and delirious, purging everything in my system in every conceivable way. I could not even tolerate a single sip of water and wound up super dehydrated. Late in the evening, my wife dragged me to the ER where they put me on an IV to get fluids into me. Spent the night there. They did a CT scan but all was fine. In the end it was just a norovirus. “Just”, sure. The next few days were better but it took almost a week before I could eat three real meals. Mainly just quarantined up in my room. But I’m back to normal now.
During my trip and since then, I was pretty obsessed with graph theory. I’ve been digging through books and other resources on graphs, algorithms, data structures. It’s been really good. I’ve been implementing some graph data structures and learning a lot. So many cool algorithms that you can do with graphs. I’ll be doing more in the coming weeks.
Not entirely related, but while reading up on algorithms and data structures, I ran across something that described how hashmaps/hashtables/dictionaries/etc. are implemented. This blew my mind in a few ways.
For one, I always had in the back of my mind that hashmaps and hashtables and dictionaries and other similar data structures were very different things. But for the most part they are the same thing. Some languages give them slightly different different features, like one will be read-only while one is dynamic, one might only take strings as keys while another one can take other values, etc. But as far as I can tell, in general, across languages, they all refer to the same basic concept: you store a value using a key.
But as for how they were implemented, I was like, “What? Implemented? They just exist, like native arrays exist as sequences of memory addresses.” I never really thought how they might exist, or why they had “hash” in their name. The material I was reading just gave a general description, but it amazed me. They are not native data structures at all. They’re pretty complex implementations that sit on top of arrays.
You might know more about this than me, but for the uninitiated…
First you create an array with a given number of elements. Let’s say 100 for sake of argument. When you add something to the hashmap, you give it a key and a value. You run that key through some kind of hash function that’s going to return an integer. You can then mod that integer by the length of the array (100 in our case) so you have a number from 0 to 99. And you put the item in that element in the array. Now you’re saying, “but I’ll wind up with the same index now and then, won’t I?” Yes! You absolutely will! That’s known as a collision. So rather than just having an array of individual items, it has to be an array of collections - an array of arrays, or maybe an array of linked lists. You hash the key to get the index of the main array, then add it to the collection in that array element, along with the key itself. Or, if another element with that key is already in that sub-collection, you update its value. To get that value back, you hash the key, find the index, and iterate through the sub-collection in that element to find the actual key you’re looking for.
Iterating through a long list of sub-elements takes you from the ideal of O(1) - where every item has its own element - towards O(n) - where the time is relative to the length of the sub-collections. So when your array starts to get full, you want to resize it and redistribute the elements to make those sub-collections smaller.
Now there are lots of variables here. What’s the best hash function? How many elements to start with? What data structures should you use for the sub-lists? When should you resize the array? How should you resize the array, and how big?
Of course, every language has its own implementation of this kind of data structure, but I just had to create my own implementation in Go. It was pretty straightforward and a lot of fun to do. I created my own hash function and used linked lists as sub-collections. I ignored resizing because this is something I’ll never actually use anyway in real life. But it was a fantastic exercise and maybe I will be able to use the concepts for something or other on down the line.
I also made a Go implementation of binary search and quick sort. These are things I was already familiar with, but they’re things you rarely implement in real life, so it was fun to do them again. It wound up being a good exercise to get a better handle on Go generics, which are relatively new and something I haven’t done a lot with. I wound up making a generic queue, set and stack and adding them to my toolkit for re-use in future projects. Also working on generic linked list and graph implementations in Go.
OK, that’s what’s up with me. I’ll post another “Read/Watched” post soon with a lot of the stuff I’ve been reading recently. And I’ve been noodling on some other things as a pure distraction that I want to post about as well. See you soon.
Header image: Mount Hood from the Pittock Mansion land in Portland, OR, by me.
Comments? Best way to shout at me is on Mastodon