javascript - Understanding of hash tables and collision detection -


below implementation of hash table using "buckets" collision detection. i'm trying make sure can understand logic behind hash tables , visualize it. hash table looks in case:

[[[]<----tuple]<---bucket, []<----bucket]<--storage

the key value pairs in tuples placed in bucket based upon hashing function's output, i.e., bucket index. once @ hashed bucket index, place key value pair there, inside bucket. if matches exact key (once inside bucket) gets overwritten in implementation. collision detection takes place when find same key before, overwrite value.

could have done different—perhaps added key different value end of tuple (instead of overwriting value) or must keys unique? there ever case when values need unique?

    var makehashtable = function() {         var max = 4;        return {         _storage: [],     retrieve: function(key) {       //when retrieve, want access bucket in same way insert, don't need set storage since taken       //care of in insert function.  if there's nothing in bucket, loop won't run.       //this function return null default.       var bucketindex = hashfn(key, max);       var bucket = this._storage[bucketindex];        (var = 0; < bucket.length; i++) {         var tuple = bucket[i];         if (tuple[0] === key) {           return tuple[1];         };       };       return null;     },      insert: function(key, value) {       //hash function gives right index       var bucketindex = hashfn(key, max)         //where need put bucket. if there's no bucket, initialize it.       var bucket = this._storage[bucketindex] || [];       //now need store bucket there.        this._storage[bucketindex] = bucket;       //implement collission detection scheme whereby overwrite respective value if key matches, otherwise, add end.        // loop won't execute if there nothing in bucket if so, jump line 45 instead       // here what's happening. if key doesn't exist, key value pair gets added end of bucket.       // if key matches, must same value hashed key associated value previously, so, overwrite it.        (var = 0; < bucket.length; i++) {         var tuple = bucket[i];         if (tuple[0] === key) {           tuple[1] = value;           return;         };       };          bucket.push([key, value]);     }    }; };   hashtable.prototype.remove = function(key) {   var bucketindex = hashfn(key, max);   var bucket = this._storage[bucketindex];    (var = 0; < bucket.length; i++) {     var tuple = bucket[i];     if (tuple[0] === k) {       bucket.splice(i, 1);     };   };  };  //don't worry generic hashing function please, not point of question var hashfn = function(str, max) {   var hash = 0;   (var = 0; < str.length; i++) {     var letter = str[i];     hash = (hash << 5) + letter.charcodeat(0);     hash = (hash & hash) % max;   }   return hash; }; 

your implementation of hash table correct. should point out you've described in question isn't collision detection operation of updating key new value. collision when 2 different keys map same bucket, not when insert key , discover there prior entry same key. you're taking care of collisions chaining entries in same bucket.

in case, you've gone updating entries correctly. let's you've inserted (key, value) pair ('a', 'ant') hash table. means 'a' maps 'ant'. if insert ('a', 'aardvark'), intention overwrite 'a' entry maps 'aardvark'. therefore, iterate on chain of entries , check key 'a' in bucket. find it, replace value 'ant' 'aardvark'. 'a' maps 'aardvark'. good.

let's suppose don't iterate on chain of entries. happens if blindly append ('a', 'aardvark') end of chain? consequence when key 'a' , go through entries in bucket, come upon ('a', 'ant') first, return 'ant'. incorrect result. inserted ('a', 'aardvark'), should have returned 'aardvark'.

ah, if start searching through chain end? in other words, you're treating stack. insert entry, push onto end of chain. key, start searching end. first entry given key 1 inserted, it's correct 1 , can return value without searching further.

that implementation correct, make chain longer necessary. consider happens if you're using hash table count letter frequencies in text file. insert ('a', 0) in table. when find first occurrence of 'a' in text, read 0 table, add 1 it, , insert ('a', 1) hash table. have 2 entries in chain key 'a', , 1 nearer end valid. when find next occurrence of 'a', third entry added chain, , on. thousands of insertions same key result in thousands of entries in chain.

not use memory, slows down other key insertions. example, suppose hash function assigns same index keys 'a' , 'q'. means 'q' entries in same bucket 'a' entries. if have whole bunch of 'a' entries @ end of chain, might have go past many of them before find recent entry 'q'. these reasons, it's better did.

one more thought: if each entry tuple (key, values), values array of values? then, suggest, can append new value end of values in event of key collision. if that, meaning of values? contains values inserted key, in order of time inserted. if treat stack , return last value in list, you're wasting space. may overwrite single value.

is there ever case when can away putting new value bucket , not checking existing key? yes, can if have perfect hash function, guarantees there no collisions. each key gets mapped different bucket. don't need chain of entries. have maximum of 1 value in each bucket, can implement hash table array contains, @ each index, either undefined or inserted value @ index. sounds great, except isn't easy come perfect hash function, if want hash table contain no more buckets necessary. have know in advance every possible key might used in order devise hash function maps n possible keys n distinct buckets.


Comments

Popular posts from this blog

c# - Binding a comma separated list to a List<int> in asp.net web api -

how to prompt save As Box in Excel Interlop c# MVC 4 -

xslt 1.0 - How to access or retrieve mets content of an item from another item? -