Academic Journal
PHOBIC: Perfect Hashing With Optimized Bucket Sizes and Interleaved Coding
| Title: | PHOBIC: Perfect Hashing With Optimized Bucket Sizes and Interleaved Coding |
|---|---|
| Authors: | Hermann S., Lehmann H.-P., Pibiri G. E., Sanders P., Walzer S. |
| Contributors: | Stefan Hermann, Hans-Peter Lehmann, Giulio Ermanno Pibiri, Peter Sanders, and Stefan Walzer |
| Publication Status: | Preprint |
| Publisher Information: | Schloss Dagstuhl - Leibniz-Zentrum für Informatik (LZI), 2024. |
| Publication Year: | 2024 |
| Subject Terms: | ddc:004, FOS: Computer and information sciences, Information systems → Point lookups, DATA processing & computer science, Computer Science - Data Structures and Algorithms, GPU, Data Structures and Algorithms (cs.DS), Compressed Data Structures, Minimal Perfect Hashing, Theory of computation → Data compression |
| Description: | A minimal perfect hash function (or MPHF) maps a set of n keys to [n] := {1, …, n} without collisions. Such functions find widespread application, e.g., in bioinformatics and databases. In this paper we revisit PTHash, a construction technique particularly designed for fast queries. PTHash distributes the input keys into small buckets and, for each bucket, it searches for a hash function seed that places its keys in the output domain without collisions. The collection of all seeds is then stored in a compressed way. Since the first buckets are easier to place, buckets are considered in non-increasing order of size. Additionally, PTHash heuristically produces an imbalanced distribution of bucket sizes by distributing 60% of the keys into 30% of the buckets. Our main contribution is to characterize, up to lower order terms, an optimal choice for the expected bucket sizes, improving construction throughput for space-efficient configurations both in theory and practice. Further contributions include a new encoding scheme for seeds that works across partitions of the data structure and a GPU parallelization. Compared to PTHash, PHOBIC is 0.17 bits/key more space efficient for the same query time and construction throughput. For a configuration with fast queries, our GPU implementation can construct an MPHF at 2.17 bits/key in 28 ns/key, which can be queried in 37 ns/query on the CPU. |
| Document Type: | Article, Conference object |
| File Description: | application/pdf |
| Language: | English |
| DOI: | 10.5445/ir/1000174479 |
| DOI: | 10.48550/arxiv.2404.18497 |
| DOI: | 10.4230/lipics.esa.2024.69 |
| Access URL: | http://arxiv.org/abs/2404.18497 https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ESA.2024.69 https://publikationen.bibliothek.kit.edu/1000174479 https://doi.org/10.5445/IR/1000174479 https://publikationen.bibliothek.kit.edu/1000174479/154801120 https://hdl.handle.net/10278/5081929 https://doi.org/10.4230/LIPIcs.ESA.2024.69 |
| Rights: | CC BY arXiv Non-Exclusive Distribution |
| Accession Number: | edsair.doi.dedup.....e67d670e44f34ffc766d9dea0e7ca0f8 |
| Database: | OpenAIRE |
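
The bucket-based construction summarized in the description can be illustrated with a small sketch. This is a simplified, unoptimized illustration of the general idea only (distribute keys into buckets, process buckets in non-increasing order of size, and brute-force a seed per bucket), not the authors' implementation. The function names `build` and `lookup`, the use of BLAKE2b as the hash, the uniform bucket assignment, and the parameters `avg_bucket_size` and `max_seed` are all assumptions made for this example. It does not model the optimized bucket sizes, seed compression, partitioning, or GPU parallelization, and it maps keys to {0, …, n-1} rather than {1, …, n}.

```python
import hashlib


def _hash(key: str, salt: str) -> int:
    """64-bit hash of `key` under a given salt (hypothetical helper)."""
    data = f"{salt}:{key}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def build(keys, avg_bucket_size=4, max_seed=1_000_000):
    """Return one seed per bucket so that all keys land in distinct slots.

    Assumes `keys` is a non-empty list of distinct strings.
    """
    n = len(keys)
    num_buckets = max(1, -(-n // avg_bucket_size))  # ceiling division

    # Step 1: distribute keys into buckets (uniformly here, for simplicity).
    buckets = [[] for _ in range(num_buckets)]
    for key in keys:
        buckets[_hash(key, "bucket") % num_buckets].append(key)

    # Step 2: visit buckets in non-increasing order of size, since large
    # buckets are easiest to place while most output slots are still free.
    order = sorted(range(num_buckets), key=lambda b: len(buckets[b]), reverse=True)

    taken = [False] * n
    seeds = [0] * num_buckets
    for b in order:
        if not buckets[b]:
            continue
        # Step 3: try seeds until the bucket's keys hit distinct free slots.
        for seed in range(max_seed):
            slots = [_hash(k, f"slot{seed}") % n for k in buckets[b]]
            if len(set(slots)) == len(slots) and not any(taken[s] for s in slots):
                for s in slots:
                    taken[s] = True
                seeds[b] = seed
                break
        else:
            raise RuntimeError("no seed found; increase max_seed")
    return seeds


def lookup(key: str, seeds, n: int) -> int:
    """Evaluate the hash function: find the bucket, then apply its seed."""
    b = _hash(key, "bucket") % len(seeds)
    return _hash(key, f"slot{seeds[b]}") % n
```

Calling `seeds = build(keys)` and then `lookup(k, seeds, len(keys))` for each key should yield every value in {0, …, n-1} exactly once. In this sketch the seeds are kept as a plain Python list, whereas the description notes that the collection of seeds is stored in compressed form.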