Current Research

In the following we outline three research directions we are pursuing at the moment. Note that we are also working on other small projects not listed here.

Perfect Hashing and Consensus

A perfect hash function for a set of objects is a data structure that injectively maps those objects to unique IDs. Rather than storing information about the objects in any classical way, some perfect hash functions store seemingly meaningless numbers known as seed values, chosen such that a pseudo-random function producing IDs from an object and those seed values happens to satisfy the desired uniqueness property.
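The principle can be illustrated with a minimal sketch (a toy illustration, not any specific published construction): we try seed values in order until a generic pseudo-random hash function, applied to each object together with the seed, happens to map all objects to distinct IDs. Only the successful seed then needs to be stored.

```python
import hashlib

def h(key, seed, m):
    # Generic pseudo-random hash of (key, seed) into [0, m);
    # any well-mixing hash function would do here.
    data = f"{seed}:{key}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big") % m

def find_seed(keys):
    # Brute-force search for a seed under which the hash is injective
    # on the given keys, i.e. every key receives a unique ID.
    m = len(keys)
    seed = 0
    while True:
        if len({h(k, seed, m) for k in keys}) == m:
            return seed  # storing this one number suffices
        seed += 1

keys = ["apple", "banana", "cherry", "date"]
seed = find_seed(keys)
# h(k, seed, 4) now maps the four keys to four distinct IDs in {0, 1, 2, 3}.
```

In a practical construction the key set is split into many small buckets and one seed is searched per bucket; the space cost is then dominated by how compactly those seeds can be encoded.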

We have developed a strategy for computing and storing such seed values called Combined Search and Encoding of Successful Seeds (Consensus), for which we won the Best Paper Award at ESA 2025. Paired with one of the most space-efficient approaches to perfect hashing, it achieves unprecedented space efficiency overall.

We are currently investigating ways of combining the Consensus idea with the fastest approaches to perfect hashing, to see if we can match their speed while improving on their space efficiency.

k-perfect Hashing

Hash functions are routinely used to randomly partition a set into parts of expected equal size, such as when assigning jobs to machines. A k-perfect hash function is a tiny data structure constructed for the set that guarantees that none of the parts' sizes exceeds k. Within static data structures, such perfectly balanced partitioning can help with retrieving data with fewer cache misses and branch mispredictions.
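The guarantee can again be obtained by seed search, sketched below as a toy illustration (not any specific published construction): we look for a seed under which a generic pseudo-random hash assigns no more than k of the given keys to any part.

```python
import hashlib
from collections import Counter

def h(key, seed, parts):
    # Generic pseudo-random hash of (key, seed) into [0, parts).
    data = f"{seed}:{key}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big") % parts

def find_k_perfect_seed(keys, k, parts):
    # Search for a seed under which no part receives more than k keys.
    seed = 0
    while True:
        counts = Counter(h(key, seed, parts) for key in keys)
        if max(counts.values()) <= k:
            return seed
        seed += 1

# 16 jobs over 8 machines: expected load 2, guaranteed load at most 3.
keys = [f"job{i}" for i in range(16)]
seed = find_k_perfect_seed(keys, k=3, parts=8)
```

With a plain random hash, the maximum load can exceed its expectation substantially; the stored seed is what turns the expected balance into a hard guarantee.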

While k-perfect hashing has received some attention in the past, even basic questions such as how much space such data structures need have not been satisfactorily answered. We have explored some approaches to k-perfect hashing in an ESA 2025 paper and intend to provide a more comprehensive account in the future.

Average Case Analysis and Learned Data Structures

Oftentimes an algorithm’s or data structure’s performance depends on the input data, i.e. there are easy cases and hard cases. In the absence of further assumptions, one typically reports the performance in the hardest case as a conservative upper bound. Focusing on worst-case performance in this way neglects that practical input data is often structured in a way that allows approaches aware of this structure to be significantly faster or, in the case of data structures, smaller than approaches that expect the worst.

There are two overlapping ways to think about such opportunities.