Two Black-Magic Tricks in Go

Author: pedrogao, backend R&D engineer at Tencent CSIG

While writing Go recently, I came across two particularly interesting tricks, so I wrote this article to share them.

Magic 1: Calling private functions in runtime

By Go's convention, functions and variables in a package whose names start with a lowercase letter are private (unexported):

package test

// private
func abs() {}

// public
func Abs() {}

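In package test, abs can only be called from inside the package, whereas Abs can be used by any other package that imports it.

Private variables and functions exist for encapsulation: they control access to internal data and keep external interactions consistent.

This makes the system more reliable and reduces the amount of information users have to keep in mind.

The rule works well for packages that are carefully designed and encapsulated, but not every package is, and some special functions, such as memmove in runtime, are genuinely needed in certain scenarios.

So Go opens a window at link time: the go:linkname directive lets you link to a private function in another package.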

memmove

Take memmove as an example. Its signature is:

func memmove(to, from unsafe.Pointer, n uintptr)

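memmove is a private function in runtime that copies memory between arbitrary data. It ignores type information and operates on raw memory. Go discourages this kind of operation, but used well it is a sharp tool.

Create a new Go file, for example runtime.go, with the following content: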

import "unsafe" // go:linkname is only honored in files that import unsafe

//go:noescape
//go:linkname memmove runtime.memmove
//goland:noinspection GoUnusedParameter
func memmove(to unsafe.Pointer, from unsafe.Pointer, n uintptr)

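Focus on the go:linkname directive. It takes two arguments:

memmove: the name of the local function;
runtime.memmove: the path of the function to link to, in the form package.function.

With this in place, the toolchain resolves our local memmove to the memmove inside runtime at link time, and we can call it.

In everyday code we often need to copy data between byte slices and strings. For example, copying data from one slice to another with memmove looks like this: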

// runtime.go
type GoSlice struct {
	Ptr unsafe.Pointer
	Len int
	Cap int
}

// runtime_test.go
func Test_memmove(t *testing.T) {
	src := []byte{1, 2, 3, 4, 5, 6}
	dest := make([]byte, 10, 10)

	spew.Dump(src)
	spew.Dump(dest)

	srcp := (*GoSlice)(unsafe.Pointer(&src))
	destp := (*GoSlice)(unsafe.Pointer(&dest))

	memmove(destp.Ptr, srcp.Ptr, unsafe.Sizeof(byte(0))*6)

	spew.Dump(src)
	spew.Dump(dest)
}

The GoSlice struct shows how a byte slice ([]byte) is laid out in memory: Len and Cap are the slice's length and capacity, and Ptr points to the actual byte data.

Passing the two data pointers and the number of bytes to memmove copies the data from src to dest. The output is:

=== RUN   Test_memmove
# before the copy
([]uint8) (len=6 cap=6) {
 00000000  01 02 03 04 05 06  |......|
}
([]uint8) (len=10 cap=10) {
 00000000  00 00 00 00 00 00 00 00 00 00  |..........|
}
# after the copy
([]uint8) (len=6 cap=6) {
 00000000  01 02 03 04 05 06  |......|
}
([]uint8) (len=10 cap=10) {
 00000000  01 02 03 04 05 06 00 00 00 00  |..........|
}
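The GoSlice layout used in the test above can be checked without any runtime internals. The following is a minimal sketch, assuming the gc toolchain's three-word slice header (an implementation detail, not a language guarantee) and Go 1.20 or later for unsafe.SliceData; headerOf and sameBacking are hypothetical helper names introduced only for illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// GoSlice mirrors the slice header described in the article.
// Assumption: this matches the gc toolchain's layout; the language spec
// does not promise it.
type GoSlice struct {
	Ptr unsafe.Pointer
	Len int
	Cap int
}

// headerOf reinterprets a []byte variable as a GoSlice header.
func headerOf(b *[]byte) *GoSlice {
	return (*GoSlice)(unsafe.Pointer(b))
}

// sameBacking reports whether the header's Ptr equals the pointer returned
// by the supported unsafe.SliceData API.
func sameBacking(b []byte) bool {
	h := headerOf(&b)
	return h.Ptr == unsafe.Pointer(unsafe.SliceData(b))
}

func main() {
	src := make([]byte, 6, 10)
	h := headerOf(&src)
	fmt.Println(h.Len, h.Cap, sameBacking(src))
}
```

Where only the data pointer is needed, unsafe.SliceData is likely the safer choice, because it does not depend on a hand-written copy of the header.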

Of course, for copying data between slices the built-in copy function is more convenient:

func Test_copy(t *testing.T) {
	src := []byte{1, 2, 3, 4, 5, 6}
	dest := make([]byte, 10, 10)

	spew.Dump(src)
	spew.Dump(dest)

	copy(dest, src)

	spew.Dump(src)
	spew.Dump(dest)
}

This achieves the same result. memmove becomes more interesting when copying between a string and a byte slice, as follows:

// runtime.go
type GoString struct {
	Ptr unsafe.Pointer
	Len int
}

// runtime_test.go
func Test_memmove(t *testing.T) {
	str := "pedro"
	// Note: len must not be 0 here; otherwise no backing array is
	// allocated and there is nowhere to copy the data to.
	data := make([]byte, 10, 10)

	spew.Dump(str)
	spew.Dump(data)

	memmove((*GoSlice)(unsafe.Pointer(&data)).Ptr,
		(*GoString)(unsafe.Pointer(&str)).Ptr,
		unsafe.Sizeof(byte(0))*5)

	spew.Dump(str)
	spew.Dump(data)
}

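Similarly, GoString is the in-memory representation of a string. With memmove, character data can be copied quickly from a string into a slice. The reverse direction works mechanically too, but it writes into string memory, which Go treats as immutable, so it needs great care. The output is: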

# before the copy
(string) (len=5) "pedro"
([]uint8) (len=10 cap=10) {
 00000000  00 00 00 00 00 00 00 00 00 00  |..........|
}
# after the copy
(string) (len=5) "pedro"
([]uint8) (len=10 cap=10) {
 00000000  70 65 64 72 6f 00 00 00 00 00  |pedro.....|
}
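For comparison, the built-in copy also accepts a string as its source when the destination is a []byte, so the string-to-slice direction does not strictly require memmove. A minimal sketch follows; copyString is a hypothetical wrapper name used only for this illustration.

```go
package main

import "fmt"

// copyString copies the bytes of s into dst using the built-in copy.
// It returns the number of bytes copied, which is min(len(dst), len(s)).
func copyString(dst []byte, s string) int {
	return copy(dst, s)
}

func main() {
	data := make([]byte, 10)
	n := copyString(data, "pedro")
	fmt.Println(n, string(data[:n])) // prints: 5 pedro
}
```

Unlike the memmove version, copy bounds the length itself, so an over-long source cannot write past the destination.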

growslice

Slices are one of the most commonly used data structures in Go. For growing a slice, Go only offers append, which grows it implicitly; internally this is implemented by calling growslice in runtime:

func growslice(et *_type, old slice, cap int) slice

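growslice takes 3 parameters:

et: the element type of the slice, such as int; _type can represent any Go type;
old: the old slice;
cap: the desired capacity after growing.

On success it returns the new slice. Note that this signature belongs to the Go version the article was written against; it is a runtime internal and has changed in later releases, so check runtime/slice.go for the version in use.

As before, we use go:linkname to link to growslice in runtime: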

// runtime.go
type GoType struct {
	Size       uintptr
	PtrData    uintptr
	Hash       uint32
	Flags      uint8
	Align      uint8
	FieldAlign uint8
	KindFlags  uint8
	Traits     unsafe.Pointer
	GCData     *byte
	Str        int32
	PtrToSelf  int32
}

// GoEface is the in-memory form of an empty interface.
type GoEface struct {
	Type  *GoType
	Value unsafe.Pointer
}

//go:linkname growslice runtime.growslice
//goland:noinspection GoUnusedParameter
func growslice(et *GoType, old GoSlice, cap int) GoSlice

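The first parameter of growslice, et, is Go's abstract data structure for all types, mirrored here as GoType.

This introduces two important data structures from Go's implementation:

GoEface: the empty interface, i.e. interface{};
GoType: the type descriptor, which can represent any type.

GoEface, GoIface, GoType and GoItab are all core data structures of the Go implementation. There is a lot to cover here; interested readers can consult the linked reference.

With these in place, we can call growslice to grow a slice manually: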
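The two-word shape of an empty interface can be observed directly. This is a minimal sketch, assuming the gc toolchain's layout (one word for the dynamic type, one for the data); the eface struct and the helpers unpack, sameDynamicType and dataIsPointer are hypothetical names introduced here, not runtime APIs.

```go
package main

import (
	"fmt"
	"unsafe"
)

// eface is an assumed mirror of the runtime's empty-interface layout.
type eface struct {
	typ  unsafe.Pointer // pointer to the type descriptor
	data unsafe.Pointer // pointer to (or, for pointer types, equal to) the value
}

// unpack reinterprets an interface{} variable as its two raw words.
func unpack(v *interface{}) *eface {
	return (*eface)(unsafe.Pointer(v))
}

// sameDynamicType reports whether a and b carry the same type word.
func sameDynamicType(a, b interface{}) bool {
	return unpack(&a).typ == unpack(&b).typ
}

// dataIsPointer reports whether boxing the pointer p stores p itself
// in the data word.
func dataIsPointer(p *int) bool {
	var i interface{} = p
	return unpack(&i).data == unsafe.Pointer(p)
}

func main() {
	x := 42
	fmt.Println(sameDynamicType(1, 2))   // both int
	fmt.Println(sameDynamicType(1, "x")) // int vs string
	fmt.Println(dataIsPointer(&x))
}
```

This is the same reinterpretation that UnpackType relies on when it pulls the type descriptor out of a reflect.Type value.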

// runtime.go
func UnpackType(t reflect.Type) *GoType {
	return (*GoType)((*GoEface)(unsafe.Pointer(&t)).Value)
}

// runtime_test.go
func Test_growslice(t *testing.T) {
	assert := assert.New(t)

	var typeByte = UnpackType(reflect.TypeOf(byte(0)))
	spew.Dump(typeByte)

	dest := make([]byte, 0, 10)
	assert.Equal(len(dest), 0)
	assert.Equal(cap(dest), 10)

	ds := (*GoSlice)(unsafe.Pointer(&dest))
	*ds = growslice(typeByte, *ds, 100)

	assert.Equal(len(dest), 0)
	assert.Equal(cap(dest), 112)
}

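Because the type of growslice's et parameter (_type) is unexported by runtime and therefore not visible to us, we redefine it as GoType. We then use reflection to obtain the GoType for byte and call growslice to do the growing.

Run the test: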

--- PASS: Test_growslice (0.00s)
PASS

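One thing to note: the cap argument passed to growslice is 100, yet the resulting capacity is 112. This is because growslice rounds the requested size up with roundupsize, to the size of the memory block the allocator will actually hand out. Interested readers can consult the linked reference.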
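When the goal is only to reserve capacity ahead of time, the standard library offers a supported route that needs no go:linkname. The following is a minimal sketch, assuming Go 1.21 or later, where slices.Grow is available; growTo is a hypothetical helper name. The resulting capacity is guaranteed to be at least the requested one, but the exact value depends on the allocator's rounding, so the sketch does not assume 112.

```go
package main

import (
	"fmt"
	"slices"
)

// growTo returns s with room for at least n elements in total.
// slices.Grow takes the number of *additional* elements beyond len(s),
// so the difference is computed first.
func growTo(s []byte, n int) []byte {
	if extra := n - len(s); extra > 0 {
		return slices.Grow(s, extra)
	}
	return s
}

func main() {
	dest := make([]byte, 0, 10)
	dest = growTo(dest, 100)
	fmt.Println(len(dest), cap(dest) >= 100)
}
```

The length is left unchanged, which matches what the growslice test above asserts about len(dest).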

Magic 2: Calling C/assembly functions

Next, let's look at another, even more interesting piece of Go black magic.

cgo

With cgo, we can easily call C code from Go:

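package main

/*
#include <stdio.h>
#include <unistd.h>

static void* Sbrk(int size) {
    void *r = sbrk(size);
    if (r == (void *)-1) {
        return NULL;
    }
    return r;
}
*/
import "C"

import (
	"fmt"
)

func main() {
	mem := C.Sbrk(C.int(100))
	// This memory is outside the Go GC. It did not come from malloc,
	// so it must not be passed to C.free; it is released by moving
	// the program break back with sbrk.
	fmt.Println(mem)
}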

Running the program produces output like the following:

0xba00000

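cgo is the bridge between Go and C. It lets Go benefit from C's strength in systems programming. Here, for example, sbrk extends the process's data segment directly, and that memory is not managed by the Go GC, so releasing it is our own responsibility. (free() is only valid for pointers returned by the malloc family; memory obtained from sbrk is given back by moving the break down again.)

In some special scenarios, such as a global cache where you want to keep data from being collected by the GC and invalidating the cache, this approach is worth trying.

Of course, that is not tricky enough yet. Remember that C can embed inline assembly, so we can try inline assembly from Go as well: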

package main

/*
#include <stdio.h>

static int Add(int i, int j) {
    int res = 0;
    __asm__ ("add %1, %2"
        : "=r" (res)
        : "r" (i), "0" (j)
    );
    return res;
}
*/
import "C"

import (
	"fmt"
)

func main() {
	r := C.Add(C.int(2022), C.int(18))
	fmt.Println(r)
}

Running the program produces the following output:

2040

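cgo gives us a bridge, but the price is not small; the specific drawbacks are covered in the linked reference.

Readers interested in cgo can consult the linked reference.

Assembly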

isspace

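Is there a way to avoid cgo's drawbacks? Naturally, there is.

The approach is easy to guess: instead of cgo, use Plan 9 assembly, the assembly language that Go supports.

We do not write the assembly by hand. Instead, we compile C to assembly, translate that into Plan 9 assembly, and build it together with the .go code.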

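The build pipeline is: C source, compiled by clang into AT&T assembly, translated by asm2asm into Plan 9 assembly, and finally assembled together with the .go files.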

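C is itself a thin abstraction over assembly and is among the fastest options available, so this approach not only sidesteps cgo's call overhead but can also make the program faster. The steps are as follows.

First, we define a simple C function, isspace, which checks whether a character is whitespace: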

// ./inner/op.h
#ifndef OP_H
#define OP_H

char isspace(char ch);

#endif

// ./inner/op.c
#include "op.h"

char isspace(char ch) {
    return ch == ' ' || ch == '\r' || ch == '\n' || ch == '\t';
}

Then compile it to assembly with clang (note: it must be clang):

$ clang -mno-red-zone -fno-asynchronous-unwind-tables -fno-builtin -fno-exceptions -fno-rtti -fno-stack-protector -nostdlib -O3 -msse4 -mavx -mno-avx2 -DUSE_AVX=1 -DUSE_AVX2=0 -S ./inner/*.c

After a successful compile, an op.s assembly file is generated under the inner folder. It looks roughly like this:

	.section	__TEXT,__text,regular,pure_instructions
	.build_version macos, 11, 0
	.globl	_isspace                        ## -- Begin function isspace
	.p2align	4, 0x90
_isspace:                               ## @isspace
## %bb.0:
	pushq	%rbp
	movq	%rsp, %rbp
	movb	$1, %al
	cmpb	$13, %dil
	je	LBB0_3

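clang emits AT&T-syntax assembly by default, which Go's assembler cannot compile (gccgo excepted), so a conversion step is needed.

That step translates AT&T assembly into Plan 9 assembly. The two syntaxes differ considerably, so we rely on the asm2asm tool to do the conversion.

Clone asm2asm locally and run it: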

$ git clone https://github.com/chenzhuoyu/asm2asm
$ ./tools/asm2asm.py ./op.s ./inner/op.s

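This run fails with an error, because Go requires a matching .go declaration file for each Plan 9 assembly file.

Since isspace is declared in ./inner/op.h, we create an op.go file with the same base name to declare the function: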

//go:nosplit
//go:noescape
//goland:noinspection GoUnusedParameter
func __isspace(ch byte) (ret byte)

Then run asm2asm again to generate the assembly:

$ ./tools/asm2asm.py ./op.s ./inner/op.s
$ tree .
.
|__ inner
|   |__ op.c
|   |__ op.h
|   |__ op.s
|__ op.go
|__ op.s
|__ op_subr.go

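asm2asm generates two files, op.s and op_subr.go:

op.s: the translated Plan 9 assembly file;
op_subr.go: a helper file for function calls.

Once they are generated, the __isspace function declared in op.go links against the corresponding assembly and runs: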

func Test___isspace(t *testing.T) {
	type args struct {
		ch byte
	}
	tests := []struct {
		name    string
		args    args
		wantRet byte
	}{
		{
			name:    "false",
			args:    args{ch: '0'},
			wantRet: 0,
		},
		{
			name:    "true",
			args:    args{ch: '\n'},
			wantRet: 1,
		},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			if gotRet := __isspace(tt.args.ch); gotRet != tt.wantRet {
				t.Errorf("__isspace() = %v, want %v", gotRet, tt.wantRet)
			}
		})
	}
}

// output
=== RUN   Test___isspace
=== RUN   Test___isspace/false
=== RUN   Test___isspace/true
--- PASS: Test___isspace (0.00s)
    --- PASS: Test___isspace/false (0.00s)
    --- PASS: Test___isspace/true (0.00s)
PASS

__isspace runs correctly and passes its unit test.
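A pure-Go counterpart is handy for cross-checking the assembly version across all 256 byte values. This is a minimal sketch; isSpace is a hypothetical name, and the character set (space, carriage return, newline, tab) is taken from the C function above.

```go
package main

import "fmt"

// isSpace returns 1 for ' ', '\r', '\n' and '\t', and 0 for anything else,
// mirroring the return convention of the C isspace in this article.
func isSpace(ch byte) byte {
	switch ch {
	case ' ', '\r', '\n', '\t':
		return 1
	}
	return 0
}

func main() {
	for _, ch := range []byte{'0', '\n', ' ', 'a'} {
		fmt.Printf("%q -> %d\n", ch, isSpace(ch))
	}
}
```

In a real test, looping over every byte value and comparing isSpace against __isspace would cover far more than the two cases in the table-driven test.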

u32toa_small

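A single isspace function is rather simple and does not really show what assembly can do, so let's look at a slightly more complex example: converting an integer to a string.

Go has several ways to convert an integer to a string, for example the strconv.Itoa function.

Here I write a simple integer-to-string function in C, u32toa_small, compile it to assembly for Go to call, and compare the performance of the two.

The implementation of u32toa_small is fairly simple. It uses a lookup table (strconv.Itoa uses this method as well):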

#include <stdint.h> // for uint32_t
#include "op.h"

static const char Digits[200] = {
    '0', '0', '0', '1', '0', '2', '0', '3', '0', '4', '0', '5', '0', '6', '0', '7', '0', '8', '0', '9',
    '1', '0', '1', '1', '1', '2', '1', '3', '1', '4', '1', '5', '1', '6', '1', '7', '1', '8', '1', '9',
    '2', '0', '2', '1', '2', '2', '2', '3', '2', '4', '2', '5', '2', '6', '2', '7', '2', '8', '2', '9',
    '3', '0', '3', '1', '3', '2', '3', '3', '3', '4', '3', '5', '3', '6', '3', '7', '3', '8', '3', '9',
    '4', '0', '4', '1', '4', '2', '4', '3', '4', '4', '4', '5', '4', '6', '4', '7', '4', '8', '4', '9',
    '5', '0', '5', '1', '5', '2', '5', '3', '5', '4', '5', '5', '5', '6', '5', '7', '5', '8', '5', '9',
    '6', '0', '6', '1', '6', '2', '6', '3', '6', '4', '6', '5', '6', '6', '6', '7', '6', '8', '6', '9',
    '7', '0', '7', '1', '7', '2', '7', '3', '7', '4', '7', '5', '7', '6', '7', '7', '7', '8', '7', '9',
    '8', '0', '8', '1', '8', '2', '8', '3', '8', '4', '8', '5', '8', '6', '8', '7', '8', '8', '8', '9',
    '9', '0', '9', '1', '9', '2', '9', '3', '9', '4', '9', '5', '9', '6', '9', '7', '9', '8', '9', '9',
};

// < 10000
int u32toa_small(char *out, uint32_t val) {
    int      n  = 0;
    uint32_t d1 = (val / 100) << 1;
    uint32_t d2 = (val % 100) << 1;

    /* 1000-th digit */
    if (val >= 1000) {
        out[n++] = Digits[d1];
    }

    /* 100-th digit */
    if (val >= 100) {
        out[n++] = Digits[d1 + 1];
    }

    /* 10-th digit */
    if (val >= 10) {
        out[n++] = Digits[d2];
    }

    /* last digit */
    out[n++] = Digits[d2 + 1];
    return n;
}
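To make the lookup-table method easier to follow, here is a pure-Go port of the same idea. It is a sketch for illustration rather than the code the article benchmarks: u32toaSmall and format are hypothetical names, and the table is generated at start-up instead of being typed out.

```go
package main

import "fmt"

// digits holds the 100 two-character pairs "00".."99".
var digits = func() (t [200]byte) {
	for i := 0; i < 100; i++ {
		t[2*i] = byte('0' + i/10)
		t[2*i+1] = byte('0' + i%10)
	}
	return t
}()

// u32toaSmall writes the decimal form of val (which must be < 10000)
// into out and returns the number of bytes written.
func u32toaSmall(out []byte, val uint32) int {
	n := 0
	d1 := (val / 100) << 1 // table index of the pair for the two high digits
	d2 := (val % 100) << 1 // table index of the pair for the two low digits

	if val >= 1000 { // thousands digit
		out[n] = digits[d1]
		n++
	}
	if val >= 100 { // hundreds digit
		out[n] = digits[d1+1]
		n++
	}
	if val >= 10 { // tens digit
		out[n] = digits[d2]
		n++
	}
	out[n] = digits[d2+1] // ones digit is always written
	n++
	return n
}

// format is a small convenience wrapper around u32toaSmall.
func format(val uint32) string {
	var buf [4]byte
	return string(buf[:u32toaSmall(buf[:], val)])
}

func main() {
	for _, v := range []uint32{7, 42, 100, 1234, 9999} {
		fmt.Println(format(v))
	}
}
```

Each table pair supplies two digits with a single division and remainder, which is where the method saves work compared with dividing by 10 once per digit.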

Then add the corresponding __u32toa_small declaration to op.go:

// < 10000
//go:nosplit
//go:noescape
//goland:noinspection GoUnusedParameter
func __u32toa_small(out *byte, val uint32) (ret int)

Recompile op.c with clang and use asm2asm to generate the corresponding assembly (excerpt):

_u32toa_small:
	BYTE $0x55  // pushq %rbp
	WORD $0x8948; BYTE $0xe5  // movq %rsp, %rbp
	MOVL SI, AX
	IMUL3Q $1374389535, AX, AX
	SHRQ $37, AX
	LEAQ 0(AX)(AX*1), DX
	WORD $0xc06b; BYTE $0x64  // imull $100, %eax, %eax
	MOVL SI, CX
	SUBL AX, CX
	ADDQ CX, CX
	CMPL SI, $1000
	JB LBB1_2
	LONG $0x60058d48; WORD $0x0000; BYTE $0x00  // leaq $96(%rip), %rax  /* _Digits(%rip) */
	MOVB 0(DX)(AX*1), AX
	MOVB AX, 0(DI)
	MOVL $1, AX
	JMP LBB1_3

Then call the function from Go:

func Test___u32toa_small(t *testing.T) {
	var buf [32]byte
	type args struct {
		out *byte
		val uint32
	}
	tests := []struct {
		name    string
		args    args
		wantRet int
	}{
		{
			name: "9999",
			args: args{
				out: &buf[0],
				val: 9999,
			},
			wantRet: 4,
		},
		{
			name: "1234",
			args: args{
				out: &buf[0],
				val: 1234,
			},
			wantRet: 4,
		},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			got := __u32toa_small(tt.args.out, tt.args.val)
			assert.Equalf(t, tt.wantRet, got, "__u32toa_small(%v, %v)", tt.args.out, tt.args.val)
			assert.Equalf(t, tt.name, string(buf[:tt.wantRet]), "ret string must equal name")
		})
	}
}

The test succeeds: __u32toa_small runs and produces the expected strings.

Finally, let's benchmark __u32toa_small against strconv.Itoa to see the performance difference:

func BenchmarkGoConv(b *testing.B) {
	val := int(rand.Int31() % 10000)
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		strconv.Itoa(val)
	}
}

func BenchmarkFastConv(b *testing.B) {
	var buf [32]byte
	val := uint32(rand.Int31() % 10000)
	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		__u32toa_small(&buf[0], val)
	}
}

Running these two benchmarks with go test -bench gives the following results:

BenchmarkGoConv
BenchmarkGoConv-12        60740782    19.52 ns/op
BenchmarkFastConv
BenchmarkFastConv-12     122945924    9.455 ns/op

The results show that __u32toa_small clearly outperforms Itoa: 9.455 ns/op against 19.52 ns/op, roughly twice as fast.
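Part of the gap may come from the two functions doing different work: strconv.Itoa returns a new string, while __u32toa_small writes into a buffer supplied by the caller. A closer standard-library baseline, offered here as an untested assumption rather than a measured result, would be strconv.AppendUint, which also writes into a caller-provided buffer. formatInto is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strconv"
)

// formatInto writes the decimal form of val into buf's backing array
// and returns the slice of bytes that were written.
// buf[:0] keeps the backing array but resets the length, so AppendUint
// reuses the caller's memory as long as the capacity is sufficient.
func formatInto(buf []byte, val uint32) []byte {
	return strconv.AppendUint(buf[:0], uint64(val), 10)
}

func main() {
	var buf [32]byte
	out := formatInto(buf[:], 1234)
	fmt.Println(string(out), len(out))
}
```

Benchmarking formatInto alongside the two functions above would show how much of the difference is due to allocation and how much to the generated assembly.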

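Summary

That wraps up the two black-magic tricks in Go. Interested readers can try them out for themselves.

Both tricks rely on unsafe to some degree, which Go discourages. Then again, using unsafe is much like writing ordinary C code, so there is no need to be overly anxious about it.

In fact, both techniques are used in production by sonic, where they bring large performance gains and save significant resources.

So when Go's standard library cannot meet your needs, do not let the language itself limit you; reach for approaches that are uncommon but effective.

I hope these two tricks give you a different view of Go.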
