How to store large integer values without losing precision

Comments Off on How to store large integer values without losing precision

I recently wanted to use some very large numbers in Swift and quickly came across some limitations.

Int

My first implementation started out naively with Int (Int64) type. This is, after all, the easiest way to use integers.

The limitations are also easy to understand.

print(Int64.max)  // 2^63 = 9223372036854775807
print(UInt64.max) // 2^64 = 18446744073709551615

and when you try to go beyond that you get an overflow error

// set value = Int.max + 1
let value = 9223372036854775808 // error: integer literal '9223372036854775808' overflows when stored into 'Int'

Those are some fairly large numbers. Large enough that you likely don’t ever think about it for most day to day development.

What about those times when you do need a bigger number?

Float

Floating point types can hold much larger values. The usual suspects Float, and Double are 32 and 64 bits respectively.

print(Float32.greatestFiniteMagnitude) // 3.40282e+38
print(Float64.greatestFiniteMagnitude) // 1.79769313486232e+308

So even a 32bit float can hold a number with nearly twice the number of digits as a 64 bit Integer. The reason a Float can hold much larger numbers is because a portion of those bits represents an exponent, which means the numbers, to some extent, increase exponentially.

These large numbers come at the cost of precision. As the value get larger, the space between values get larger. At small numbers you can have a lot of precision. This is probably how most Floating point numbers are used (at least that is my common use case). However, at very large numbers the precision gets quite bad.

print(Float64.ulpOfOne) // 2.22044604925031e-16
print(Float64.greatestFiniteMagnitude.ulp) // 1.99584030953472e+292

As you can see, if we care about not losing Integer precision then we can’t actually use the full range of Float values. So what’s the largest number that we can use before we start losing numbers? It’s going to be two to the power of the number of bits in the significand (including the implicit bit). Let’s try it out.

// Float range for ulp = 1
print(Float.significandBitCount) // 23
print(pow(2, Float.significandBitCount)) // 2^23 = 8388608
print(Float(8_388_607).ulp) // 0.5
print(Float(8_388_608).ulp) // 1.0
print(pow(2, Float.significandBitCount+1)) // 2^24 = 16777216
print(Float(16_777_215).ulp) // 1.0
print(Float(16_777_216).ulp) // 2.0
print(Float(16_777_217)) // 16777216 <- lost precision

So even though we can get much larger numbers, 16_777_216 is the last number we can reliably use without the possibility of losing some data.

For 64 bit float

print(Float64.significandBitCount) // 52
print(pow(2, Float64.significandBitCount+1)) // 2^53 = 9007199254740992
print(Float64(9007199254740991).ulp) // 1.0
print(Float64(9007199254740992).ulp) // 2.0
print(String(format: "%.0f", Float64(9007199254740993))) // 9007199254740992 <- lost precision

So the maximum Float64 value we can use without losing precision is 9,007,199,254,740,992 vs the UInt64 18,446,744,073,709,551,615. Just as you might have suspected, the Int formats are pretty good at holding Integer values.

There is one more Float type available in Swift that might help.

print(Float80.significandBitCount) // 63
print(pow(2, Float80.significandBitCount+1)) // 2^64 = 18446744073709551616
let f80: Float80 = 1.8446744073709551615e19
print(f80.ulp) // 1.0
print((f80+1).ulp) // 2.0
print(f80+2) // 1.84467440737095516e+19 <- lost precision

Float80 is an extended precision format. The first thing you might notice about this type is that its storage size is not a power of 2. This type is not meant for storage, it’s a format that can be used to reduce roundoff and overflow error while performing intermediate computations with 64-bit floats. A requirement for doing this is that the significand is 64 bits (63 + 1 implicit bit). Therefore, it can hold all of the values of a UInt64 without losing precision.

Note: This is actually double the range of the UInt64 since Float80 can also use all of the negative numbers with the same magnitude.

But how can we get more than 64 bits to hold Integers?

Decimal

Decimal is another type, available in Swift, that holds Base 10 numbers. This is our last hope for built in types. Let’s look at the definition.

public struct Decimal {

    public var _exponent: Int32

    public var _length: UInt32 // length == 0 && isNegative -> NaN

    public var _isNegative: UInt32

    public var _isCompact: UInt32

    public var _reserved: UInt32

    public var _mantissa: (UInt16, UInt16, UInt16, UInt16, UInt16, UInt16, UInt16, UInt16)

    public init()

    public init(_exponent: Int32, _length: UInt32, _isNegative: UInt32, _isCompact: UInt32, _reserved: UInt32, _mantissa: (UInt16, UInt16, UInt16, UInt16, UInt16, UInt16, UInt16, UInt16))
}

You can see that Decimal has some similar components to a Float. Length, isCompact and reserved are specific to Decimal. The other components are exponent, isNegative, and mantissa which are the components of a float type.

isNegative is just a flag and is represented with 1 bit in float formats.

exponent here is signed 32 bit Int. Float32 has 8 bits and Float64 has 11, both with an implicit offset for negative values.

mantissa (an alternate word for significand) is the part we’re most interested in. The Decimal type represents this as a tuple of 8 UInt16. What this really comes down to is that the significand can hold 8 * 16 = 128 bits! That’s double the amount of bits of a UInt64 and it can represent negative numbers as well.

In actuality, the full 128 bit range does not have ulp <= 1.0. It seems to break down between 2^127+2^126 and 2^128. I haven’t figured out why, so if you know, I’d love to hear about it.

Still if you need to deal in really big numbers with very high precision, Decimal is the key. You should still use Int64 (Int) and Float64 (Double) on 64 bit platforms wherever possible to maximize performance but know that Decimal is standing by when you need it.

How do you go beyond 128 bits? Read my follow up here.

Beyond 128-bit integers

DecimalFloatFloat32Float64Float80floating pointIntInt64Integermantissasignificandulp

Ken Boreham