详解map（一）—— map的常用方法

前言

map是语言中常用的一个数据结构，其在不同语言中有着不同的实现的，现在我们看下在go中的底层实现。

更多内容分享，欢迎关注公众号：Go开发笔记

map源码

map的底层实现的源码位于runtime/map.go中，其相关方法的调用在编译时确认。

map说明

简单来说，map就是一个哈希表。数据被排列成一个bucket数组。每个bucket最多包含8个key/elem对。哈希的低位用于选择一个bucket。每个bucket包含每个hash的一些高位，以区分单个bucket中的条目。

map的数据存储结构

map实际上就是hash表，其包含2^{B个bucket，每个bucket最多包含8个键值对。bucket内数据的存储格式为：存储bucket内key对应hash的topHash数组，然后是键值对。这些键值对的存储方式是：8个key然后才是key对应的值。key对的hash的低位表示对应的bucket，高位表示bucket内具体的位置。当前map的bucket数位2}B，一个key对应的hash为hash，则其对应bucket的位置为hash&(2^B-1)。

每个key对应的位置为：bucket的位置+bmp的大小+keySize*key的位置index。

每个key的value对应的位置为：bucket的位置+bmp的大小+keySize*8+elemSize*elem的位置index

bmp中存储了长度为8的数组，用以存储key的hash的高位。因bmp是数组，查找需要依次遍历确认与指定的hash的高位是否一致，如果一致，会取出key进行比较，避免hash冲突。

扩容中，hash的seed并没有发生变化，因此key的hash也不会改变。但扩容可能会导致B变化，由于hash的低位与B才能确定所在的bucket，如果B变化，则对应bucket的位置也会发生变化。因此扩容后，需要复制数据到新的位置中。

其具体结构为hmap。

hmap

// A header for a Go map.
type hmap struct {// Note: the format of the hmap is also encoded in cmd/compile/internal/gc/reflect.go.// Make sure this stays in sync with the compiler's definition.count     int // # live cells == size of map.  Must be first (used by len() builtin)flags     uint8B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for detailshash0     uint32 // hash seedbuckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growingnevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)extra *mapextra // optional fields
}// mapextra holds fields that are not present on all maps.
type mapextra struct {// If both key and elem do not contain pointers and are inline, then we mark bucket// type as containing no pointers. This avoids scanning such maps.// However, bmap.overflow is a pointer. In order to keep overflow buckets// alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow.// overflow and oldoverflow are only used if key and elem do not contain pointers.// overflow contains overflow buckets for hmap.buckets.// oldoverflow contains overflow buckets for hmap.oldbuckets.// The indirection allows to store a pointer to the slice in hiter.overflow    *[]*bmapoldoverflow *[]*bmap// nextOverflow holds a pointer to a free overflow bucket.nextOverflow *bmap
}// A bucket for a Go map.
type bmap struct {// tophash generally contains the top byte of the hash value// for each key in this bucket. If tophash[0] < minTopHash,// tophash[0] is a bucket evacuation state instead.tophash [bucketCnt]uint8// Followed by bucketCnt keys and then bucketCnt elems.// NOTE: packing all the keys together and then all the elems together makes the// code a bit more complicated than alternating key/elem/key/elem/... but it allows// us to eliminate padding which would be needed for, e.g., map[int64]int8.// Followed by an overflow pointer.
}// A hash iteration structure.
// If you modify hiter, also change cmd/compile/internal/gc/reflect.go to indicate
// the layout of this structure.
type hiter struct {key         unsafe.Pointer // Must be in first position.  Write nil to indicate iteration end (see cmd/internal/gc/range.go).elem        unsafe.Pointer // Must be in second position (see cmd/internal/gc/range.go).t           *maptypeh           *hmapbuckets     unsafe.Pointer // bucket ptr at hash_iter initialization timebptr        *bmap          // current bucketoverflow    *[]*bmap       // keeps overflow buckets of hmap.buckets aliveoldoverflow *[]*bmap       // keeps overflow buckets of hmap.oldbuckets alivestartBucket uintptr        // bucket iteration started atoffset      uint8          // intra-bucket offset to start from during iteration (should be big enough to hold bucketCnt-1)wrapped     bool           // already wrapped around from end of bucket array to beginningB           uint8i           uint8bucket      uintptrcheckBucket uintptr
}

hmap中的fields代表的意思依次是：

type hmap struct {count     int // map的sizeflags     uint8 // 状态标志位B         uint8  // log2(buckets) (可以容纳最多loadFactor * 2^B项)noverflow uint16 // overflow buckets的大约数 hash0     uint32 // hash seedbuckets    unsafe.Pointer // 2^B Buckets的数组.如果count为0可能为niloldbuckets unsafe.Pointer // 前一个存储桶数组大小的一半，仅在增长时为非零nevacuate  uintptr        // 排空进度计数器（小于此值的buckets已排空）extra *mapextra // 可选的 fields
}

make

根据初始化map的size及平台，具体的初始化又分为以下几种：

makemap_small

适用于size小于等于8时使用。代码逻辑很简单，就是通过new一个hamp，然后初始化hash0.

// makemap_small implements Go map creation for make(map[k]v) and
// make(map[k]v, hint) when hint is known to be at most bucketCnt
// at compile time and the map needs to be allocated on the heap.
func makemap_small() *hmap {h := new(hmap)h.hash0 = fastrand()return h
}

makemap64

默认使用，当32位使用时，size超出int32位范围时，会重置size为0，然后调用makemap。

func makemap64(t *maptype, hint int64, h *hmap) *hmap {if int64(int(hint)) != hint {hint = 0}return makemap(t, int(hint), h)
}

makemap

32位平台时用以代替makemap64，更快。

func makemap(t *maptype, hint int, h *hmap) *hmap {mem, overflow := math.MulUintptr(uintptr(hint), t.bucket.size)if overflow || mem > maxAlloc {hint = 0}// initialize Hmapif h == nil {h = new(hmap)}h.hash0 = fastrand()// Find the size parameter B which will hold the requested # of elements.// For hint < 0 overLoadFactor returns false since hint < bucketCnt.B := uint8(0)for overLoadFactor(hint, B) {B++}h.B = B// allocate initial hash table// if B == 0, the buckets field is allocated lazily later (in mapassign)// If hint is large zeroing this memory could take a while.if h.B != 0 {var nextOverflow *bmaph.buckets, nextOverflow = makeBucketArray(t, h.B, nil)if nextOverflow != nil {h.extra = new(mapextra)h.extra.nextOverflow = nextOverflow}}return h
}

makemap的处理步骤如下：

计算大小为size的map需要占用的内存及是否溢出
若溢出则重置大小为0
若h为nil，则new一个hmap
初始化hashseed
根据size及LoadFactor计算B，并赋值到h.B
若h.B不为0，构造bucket array，赋值到h.B上，若存在提前分配的overflow buckets，则存入h.Extra
返回h

需要注意的是：

当B>=4时，会预先创建部分overflow buckets，用以存储当buckets已满的数据，同时也减少了扩容的频率。

具体源码见makeBucketArray。

func makeBucketArray(t *maptype, b uint8, dirtyalloc unsafe.Pointer) (buckets unsafe.Pointer, nextOverflow *bmap) {base := bucketShift(b)nbuckets := base// For small b, overflow buckets are unlikely.// Avoid the overhead of the calculation.if b >= 4 {// Add on the estimated number of overflow buckets// required to insert the median number of elements// used with this value of b.nbuckets += bucketShift(b - 4)sz := t.bucket.size * nbucketsup := roundupsize(sz)if up != sz {nbuckets = up / t.bucket.size}}if dirtyalloc == nil {buckets = newarray(t.bucket, int(nbuckets))} else {// dirtyalloc was previously generated by// the above newarray(t.bucket, int(nbuckets))// but may not be empty.buckets = dirtyallocsize := t.bucket.size * nbucketsif t.bucket.ptrdata != 0 {memclrHasPointers(buckets, size)} else {memclrNoHeapPointers(buckets, size)}}if base != nbuckets {// We preallocated some overflow buckets.// To keep the overhead of tracking these overflow buckets to a minimum,// we use the convention that if a preallocated overflow bucket's overflow// pointer is nil, then there are more available by bumping the pointer.// We need a safe non-nil pointer for the last overflow bucket; just use buckets.nextOverflow = (*bmap)(add(buckets, base*uintptr(t.bucketsize)))last := (*bmap)(add(buckets, (nbuckets-1)*uintptr(t.bucketsize)))last.setoverflow(t, (*bmap)(buckets))}return buckets, nextOverflow
}

assign

assign对应的使用为：

m[key] = elem

具体对应的实际是m[key]部分，返回的是key的地址。

// Like mapaccess, but allocates a slot for the key if it is not present in the map.
func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {if h == nil {panic(plainError("assignment to entry in nil map"))}if raceenabled {callerpc := getcallerpc()pc := funcPC(mapassign)racewritepc(unsafe.Pointer(h), callerpc, pc)raceReadObjectPC(t.key, key, callerpc, pc)}if msanenabled {msanread(key, t.key.size)}if h.flags&hashWriting != 0 {throw("concurrent map writes")}hash := t.hasher(key, uintptr(h.hash0))// Set hashWriting after calling t.hasher, since t.hasher may panic,// in which case we have not actually done a write.h.flags ^= hashWritingif h.buckets == nil {h.buckets = newobject(t.bucket) // newarray(t.bucket, 1)}again:bucket := hash & bucketMask(h.B)if h.growing() {growWork(t, h, bucket)}b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + bucket*uintptr(t.bucketsize)))top := tophash(hash)var inserti *uint8var insertk unsafe.Pointervar elem unsafe.Pointer
bucketloop:for {for i := uintptr(0); i < bucketCnt; i++ {if b.tophash[i] != top {if isEmpty(b.tophash[i]) && inserti == nil {inserti = &b.tophash[i]insertk = add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))}if b.tophash[i] == emptyRest {break bucketloop}continue}k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))if t.indirectkey() {k = *((*unsafe.Pointer)(k))}if !t.key.equal(key, k) {continue}// already have a mapping for key. Update it.if t.needkeyupdate() {typedmemmove(t.key, k, key)}elem = add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))goto done}ovf := b.overflow(t)if ovf == nil {break}b = ovf}// Did not find mapping for key. Allocate new cell & add entry.// If we hit the max load factor or we have too many overflow buckets,// and we're not already in the middle of growing, start growing.if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {hashGrow(t, h)goto again // Growing the table invalidates everything, so try again}if inserti == nil {// all current buckets are full, allocate a new one.newb := h.newoverflow(t, b)inserti = &newb.tophash[0]insertk = add(unsafe.Pointer(newb), dataOffset)elem = add(insertk, bucketCnt*uintptr(t.keysize))}// store new key/elem at insert positionif t.indirectkey() {kmem := newobject(t.key)*(*unsafe.Pointer)(insertk) = kmeminsertk = kmem}if t.indirectelem() {vmem := newobject(t.elem)*(*unsafe.Pointer)(elem) = vmem}typedmemmove(t.key, insertk, key)*inserti = toph.count++done:if h.flags&hashWriting == 0 {throw("concurrent map writes")}h.flags &^= hashWritingif t.indirectelem() {elem = *((*unsafe.Pointer)(elem))}return elem
}

mapassign的处理步骤如下：

若h为nil则panic
若有其他协程在写入map，则panic相关错误
对key进行hash
设置flags为Writing
若h.buckets为nil，则重新分配
进入again处理
根据hash获取对应的bucket
若h在扩容，则进行扩容工作
获取bucket对应的*bmap b及hash的高位top
进入bucketloop
遍历bucket
若b的当前topHash不为top
（1）若当前tophash的状态为empty且inserti为nil，则当前tophash的地址赋值给inserti，并获取key及element的地址
（2）若topHash的状态为emptyReset则跳出bucketloop
（3）继续遍历bucket
若b的当前topHash不为top，说明已找到匹配的hash。获取key确认与存入的key是否一致（是否hash冲突），不一致则继续遍历bucket。一致，则确实是否更新，需要更新，则更新对应的key。
根据key的地址获取element的地址，跳转done
获取b的overflow buckets，若为nil，则跳出bucketloop;否则将overflow赋值给b，继续bucketloop
如果map没在扩容，新增数据后已超过负载因子或拥有太多的overflow buckets，则进行扩容处理；扩容后进入again
如果inseti为nil，说明所有的buckets已经满了，创建新的overflow，存入
如果key类型t并非指针类型，则获取其指针
如果存储的值类型非指针，获取其指针
将key移动至insertK
更新inserti指向的值为top
进入done
如果有其它goroutine在写入，则panic
如果存储的类型为值类型，则获取其指向的值地址
返回elem

简单总结

简单来说，存入数据需要经过以下几步：

计算hash
根据hash低位从buckets中确认存入bucket的位置
根据hash高位确认bucket内的位置
确认key是否存在，若存在则获取数据存入地址
否则获取overflow buckets，继续步骤1
若需要扩容，则进行扩容，扩容后，继续步骤1
如果buckets及overflow buckets均已满，则新建overflow bucket，获取key、elem的地址
存入数据

正常存入值的顺序：

buckets
overflow buckets
扩容后存入buckets/overflow buckets或者创建overflow buckets后存入

assign时的扩容

若map中的数据达到负载因子或者overflow的数量过多时，会发生扩容。

func mapassign(t *maptype, h *hmap, key unsafe.Pointer) unsafe.Pointer {...// If we hit the max load factor or we have too many overflow buckets,// and we're not already in the middle of growing, start growing.if !h.growing() && (overLoadFactor(h.count+1, h.B) || tooManyOverflowBuckets(h.noverflow, h.B)) {hashGrow(t, h)goto again // Growing the table invalidates everything, so try again}...
}
// overLoadFactor reports whether count items placed in 1<<B buckets is over loadFactor.
func overLoadFactor(count int, B uint8) bool {return count > bucketCnt && uintptr(count) > loadFactorNum*(bucketShift(B)/loadFactorDen)
}// tooManyOverflowBuckets reports whether noverflow buckets is too many for a map with 1<<B buckets.
// Note that most of these overflow buckets must be in sparse use;
// if use was dense, then we'd have already triggered regular map growth.
func tooManyOverflowBuckets(noverflow uint16, B uint8) bool {// If the threshold is too low, we do extraneous work.// If the threshold is too high, maps that grow and shrink can hold on to lots of unused memory.// "too many" means (approximately) as many overflow buckets as regular buckets.// See incrnoverflow for more details.if B > 15 {B = 15}// The compiler doesn't see here that B < 16; mask B to generate shorter shift code.return noverflow >= uint16(1)<<(B&15)
}

具体来说需要满足以下两个条件中的一个：

map中的数据数量count>8 且 count>13*(2^b/2)
overflow的数量noverflow需要不小于2^{B，即对于一个map来说，noverflow最大为2}B -1，当B足够大时，noverflow最大为2^15 -1

若不满足，且当前的buckets已经满了(buckets、overflow均已经满了)时，则创建新的overflow，用以存储数据。

mapaccess

mapaccess的过程与mapassign的过程类似，区别在于存入/读出。

mapaccess有对应的使用有以下两种形式，

v ：= m[key] // 1
v,ok := m[key] // 2

第二种相对第一种多了一个bool的参数，表示是否包含此key，此处主要看方式2的源码。

func mapaccess2(t *maptype, h *hmap, key unsafe.Pointer) (unsafe.Pointer, bool) {if raceenabled && h != nil {callerpc := getcallerpc()pc := funcPC(mapaccess2)racereadpc(unsafe.Pointer(h), callerpc, pc)raceReadObjectPC(t.key, key, callerpc, pc)}if msanenabled && h != nil {msanread(key, t.key.size)}if h == nil || h.count == 0 {if t.hashMightPanic() {t.hasher(key, 0) // see issue 23734}return unsafe.Pointer(&zeroVal[0]), false}if h.flags&hashWriting != 0 {throw("concurrent map read and map write")}hash := t.hasher(key, uintptr(h.hash0))m := bucketMask(h.B)b := (*bmap)(unsafe.Pointer(uintptr(h.buckets) + (hash&m)*uintptr(t.bucketsize)))if c := h.oldbuckets; c != nil {if !h.sameSizeGrow() {// There used to be half as many buckets; mask down one more power of two.m >>= 1}oldb := (*bmap)(unsafe.Pointer(uintptr(c) + (hash&m)*uintptr(t.bucketsize)))if !evacuated(oldb) {b = oldb}}top := tophash(hash)
bucketloop:for ; b != nil; b = b.overflow(t) {for i := uintptr(0); i < bucketCnt; i++ {if b.tophash[i] != top {if b.tophash[i] == emptyRest {break bucketloop}continue}k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))if t.indirectkey() {k = *((*unsafe.Pointer)(k))}if t.key.equal(key, k) {e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))if t.indirectelem() {e = *((*unsafe.Pointer)(e))}return e, true}}}return unsafe.Pointer(&zeroVal[0]), false
}

mapaccess2的处理步骤如下：

当h为nil或map的大小为0时，说明当前map为空，返回零值及不存在。
如果当前map被其他goroutine写入，则panic
计算当前key的hash，获取bucket的数量，计算hash对应的buckets的bmp b
如果有旧的bucket（扩容中）
（1）如果是同样大小的扩容，则需要将m缩小一半（扩容时增长了一倍）
（2）查找对应位置的bmap为oldb
（3）如果没有oldb没有被清空，则赋值给b
获取hash的高位top
如果b!=nil，进入bucketloop
依次每个bucke对应hash的高位
如果与top不等，且后续没有bucket，说明key不存在，退出循环
如果与top相等，说明key可能在此bucket中，获取对应key，比较是否相等。相等则返回对应的值及存在标识。

简单来说：

计算key的hash，查找到对应的b
如果在扩容中，重新获取oldb为b
获取hash的top，在b（包含overflow）中循环查找k是否存在，存在则返回对应值，否则返回零值

mapdelete

func mapdelete(t *maptype, h *hmap, key unsafe.Pointer) {if raceenabled && h != nil {callerpc := getcallerpc()pc := funcPC(mapdelete)racewritepc(unsafe.Pointer(h), callerpc, pc)raceReadObjectPC(t.key, key, callerpc, pc)}if msanenabled && h != nil {msanread(key, t.key.size)}if h == nil || h.count == 0 {if t.hashMightPanic() {t.hasher(key, 0) // see issue 23734}return}if h.flags&hashWriting != 0 {throw("concurrent map writes")}hash := t.hasher(key, uintptr(h.hash0))// Set hashWriting after calling t.hasher, since t.hasher may panic,// in which case we have not actually done a write (delete).h.flags ^= hashWritingbucket := hash & bucketMask(h.B)if h.growing() {growWork(t, h, bucket)}b := (*bmap)(add(h.buckets, bucket*uintptr(t.bucketsize)))bOrig := btop := tophash(hash)
search:for ; b != nil; b = b.overflow(t) {for i := uintptr(0); i < bucketCnt; i++ {if b.tophash[i] != top {if b.tophash[i] == emptyRest {break search}continue}k := add(unsafe.Pointer(b), dataOffset+i*uintptr(t.keysize))k2 := kif t.indirectkey() {k2 = *((*unsafe.Pointer)(k2))}if !t.key.equal(key, k2) {continue}// Only clear key if there are pointers in it.if t.indirectkey() {*(*unsafe.Pointer)(k) = nil} else if t.key.ptrdata != 0 {memclrHasPointers(k, t.key.size)}e := add(unsafe.Pointer(b), dataOffset+bucketCnt*uintptr(t.keysize)+i*uintptr(t.elemsize))if t.indirectelem() {*(*unsafe.Pointer)(e) = nil} else if t.elem.ptrdata != 0 {memclrHasPointers(e, t.elem.size)} else {memclrNoHeapPointers(e, t.elem.size)}b.tophash[i] = emptyOne// If the bucket now ends in a bunch of emptyOne states,// change those to emptyRest states.// It would be nice to make this a separate function, but// for loops are not currently inlineable.if i == bucketCnt-1 {if b.overflow(t) != nil && b.overflow(t).tophash[0] != emptyRest {goto notLast}} else {if b.tophash[i+1] != emptyRest {goto notLast}}for {b.tophash[i] = emptyRestif i == 0 {if b == bOrig {break // beginning of initial bucket, we're done.}// Find previous bucket, continue at its last entry.c := bfor b = bOrig; b.overflow(t) != c; b = b.overflow(t) {}i = bucketCnt - 1} else {i--}if b.tophash[i] != emptyOne {break}}notLast:h.count--break search}}if h.flags&hashWriting == 0 {throw("concurrent map writes")}h.flags &^= hashWriting
}

对于nil或者大小为0的map的进行delete操作，进行可能会panic的处理，然后返回
如果当前map被其他goroutine写入，则panic
计算当前key的hash，获取bucket的数量，计算hash对应的bucket
如果正在扩容，则继续进行扩容处理
获取bucket对应的bmp
获取hash的高位top
进入search
如果b!=nil，依次查找bucke内对应hash的高位
（1）若没有查找到，若没有后续bucket时，退出search，否则继续查找bucket内的下一个数据
（2）若查找到，获取对应的key，比较两者是否一致，若不一致，继续查找
（3）若一致，将key及elem指向nil或清空数据
（3）将当前tophash对应的位置的状态设置为emptyOne
（4）继续判断后续位置的状态，如果bucket现在以一堆emptyOne的状态结束，那么将这些状态更改为emptyReset的状态.
(5) 数据count减一，退出search
如果当前map被其他goroutine写入，则panic

简单来说，就是在查找后进行k/e的清除处理，然后进行bucket的状态标记，最后count减一。