I needed this for a project I'm working on. I needed to inject appropriately small integers to multisets of a given universe and cardinality, but all I could find was this scheme, which only injects integers into sets of a given universe and cardinality.

On the Wikipedia page about multisets, I found this elaborate diagram showing a bijection between the k=3-element multisets of a n=5-element universe and the k-element subsets of an n+k−1-element universe. While this certainly showed that the bijection exists for k=3; n=5, and at least marginally convinced me that it would generalize, it seemed very convoluted and hard to interpret. After much puzzling, I was able to translate it into code, but how it worked still escaped me. However, after studying the code translation for a while, I figured it out—it's so incredibly simple—all you need to do to take a multiset and reversibly transform it into a set of the same cardinality belonging to the smallest feasible universe is incrementally offset every element (including non-unique ones)!

That is: start with the multiset's smallest element, add 0 to it; then go to the next element (which might be a duplicate), and add 1 to it (which will guarantee it's no longer a duplicate); then go to the next element (which might still be a duplicate of the first), and add 2 to it (which will guarantee it's no longer a duplicate of the first or second elements), and so on. After this transformation is applied, you'll have a set with no duplicates, from a k−1-elements-larger universe, that can easily be transformed back by just subtracting instead of adding!

Now, you might ask, “well, what if my elements aren't integers?” In that case, just order the elements (you are authorized by God to fake this if you feel like there's no obvious ordering) and biject them over into an (obviously contiguous, ideally 0-indexed) block of integers for the transformation. (So replace the “smallest” unique element with 0, the “second-smallest” unique element with 1, et cetera.) You will need a larger target set than your source set, anyway, and with integers you get “for free” an easy way to cope with this requirement.

Below, I've included some example code demonstrating this simplified interpretation of the bijection, plus a demo of the combinatorial number system thrown in for good measure.


from math import comb as _comb
from itertools import count as _count


_Multiset = list  # :^)


def combination_to_multiset(s):
	return _Multiset(elem - offset for offset, elem in enumerate(sorted(s)))


def multiset_to_combination(ms):
	return set(elem + offset for offset, elem in enumerate(sorted(ms)))


# Bonus methods


def multicomb(n, k):
	"https://en.wikipedia.org/wiki/Multiset#Counting_multisets"
	return _comb(n + k - 1, k)


def integer_to_combination(i, k, n):
	"https://en.wikipedia.org/wiki/Combinatorial_number_system#Finding_the_k-combination_for_a_given_number"
	s = set()
	universe = range(n)
	if not 0 <= i < _comb(n, k):
		raise ValueError(f"not 0 <= {i} < comb({n}, {k})")
	for j in range(k, 0, -1):
		elem = _search_maxsatisfying(lambda n, r=j, N=i, comb=_comb: _comb(n, r) <= N)
		assert elem in universe
		assert elem not in s
		s.add(elem)
		i -= _comb(elem, j)
	assert i == 0
	return s


def combination_to_integer(s):
	"https://en.wikipedia.org/wiki/Combinatorial_number_system#Place_of_a_combination_in_the_ordering"
	return sum(_comb(elem, k+1) for k, elem in enumerate(sorted(s)))


def _search_maxsatisfying(predicate, initial=0):
	"Return the largest n for which predicate succeeds"
	lower_bound = initial
	upper_bound = lower_bound + 1
	if not predicate(lower_bound):
		raise ValueError(f"initial does not satisfy predicate")
	# 1. Exponential approach to establish upper-bound
	while predicate(upper_bound):
		upper_bound = upper_bound*2
	guess = lower_bound = (upper_bound - 1) and (upper_bound // 2)
	# 2. Binary search
	while (upper_bound - lower_bound) > 1:
		guess = lower_bound + (upper_bound - lower_bound) // 2
		if not predicate(guess):
			upper_bound = guess
			guess = lower_bound
		else:
			lower_bound = guess
	assert predicate(guess) and not predicate(guess+1)
	return guess


# Composed methods


def integer_to_multiset(i, k=None, n=43252003274489856000):
	# TODO: fix redundancy to enable incremental decoding
	# https://discord.com/channels/834837272234426438/866699434116775946/1112819325666070539?invite=WZvZMVsXXR
	if k is None:
		assert n > 1
		k = next(k for k in _count(1) if multicomb(n, k) > i)
	nPrime = n + k - 1
	return combination_to_multiset(integer_to_combination(i, k, nPrime))


def multiset_to_integer(ms):
	return combination_to_integer(multiset_to_combination(ms))

Leave a Reply

Your email address will not be published. Required fields are marked *

Warning: This site uses Akismet to filter spam. Until or unless I can find a suitable replacement anti-spam solution, this means that (per their indemnification document) all commenters' IP addresses will be sent to Automattic, Inc., who may choose to share such with 3rd parties.
If this is unacceptable to you, I highly recommend using an anonymous proxy or public Wi-Fi connection when commenting.