BLAS
简介
BLAS
全称是Basic Linear Algebra Subprograms
是规定了一套低级的执行常见线性代数操作的规范。其实现经常针对特殊的机器进行优化,比较著名的·BLAS库有
ACML, ATLAS, MKL, OpenBLAS。许多常见的数值软件均采用兼容
BLAS规范的实现库来进行线性代数计算,比如
Matlab, Numpy, Mathematica`。
其中,Level 1 BLAS
主要提供向量操作
Level 2 BLAS
提供矩阵向量操作(gemv)
gemv
而Level 3 BLAS
则提供广义矩阵乘积操作(gemm)
gemm
GEMM
在深度学习中是十分重要的,全连接层以及卷积层基本上都是通过GEMM
来实现的,而网络中大约90%的运算都是在这两层中。而一个良好的GEMM
的实现可以充分利用系统的多级存储结构和程序执行的局部性来充分加速运算。
gemm
接口解析
gemm
的函数接口如下图所示,darknet
中也采用了类似的接口设计。
sgemm
其中,A,B,C
分别是MxK, KxN, MxN
的矩阵,TRANSA, TRANSB, TRANSC
表示是否使用对应矩阵的转置,ALPHA, BETA
为对应的系数。而LDA, LDB, LDC
表示对应矩阵的leading dimension
,即第一维度的大小。根据我的理解(结合darknet
的源码),是因为在内存中是连续存放的,而这个leading dimension
的量是用来定义元素的位置的,即add(A[i, j])=A+i*lda+j
。其中sgemm
中的s
表示是单精度的运算,类似的,还有dgemm
。
gemm代码分析:
源码
** -- Reference BLAS level3 routine (version 3.7.0) --* -- Reference BLAS is a software package provided by Univ. of Tennessee, --* -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--* December 2016** .. Scalar Arguments ..REAL ALPHA,BETAINTEGER K,LDA,LDB,LDC,M,NCHARACTER TRANSA,TRANSB* ..* .. Array Arguments ..REAL A(LDA,*),B(LDB,*),C(LDC,*)* ..** =====================================================================** .. External Functions ..LOGICAL LSAMEEXTERNAL lsame* ..* .. External Subroutines ..EXTERNAL xerbla* ..* .. Intrinsic Functions ..INTRINSIC max* ..* .. Local Scalars ..REAL TEMPINTEGER I,INFO,J,L,NCOLA,NROWA,NROWBLOGICAL NOTA,NOTB* ..* .. Parameters ..REAL ONE,ZEROparameter(one=1.0e+0,zero=0.0e+0)* ..** Set NOTA and NOTB as true if A and B respectively are not* transposed and set NROWA, NCOLA and NROWB as the number of rows* and columns of A and the number of rows of B respectively.*nota = lsame(transa,'N')notb = lsame(transb,'N')IF (nota) THENnrowa = mncola = kELSEnrowa = kncola = mEND IFIF (notb) THENnrowb = kELSEnrowb = nEND IF** Test the input parameters.*info = 0IF ((.NOT.nota) .AND. (.NOT.lsame(transa,'C')) .AND.+ (.NOT.lsame(transa,'T'))) THENinfo = 1ELSE IF ((.NOT.notb) .AND. (.NOT.lsame(transb,'C')) .AND.+ (.NOT.lsame(transb,'T'))) THENinfo = 2ELSE IF (m.LT.0) THENinfo = 3ELSE IF (n.LT.0) THENinfo = 4ELSE IF (k.LT.0) THENinfo = 5ELSE IF (lda.LT.max(1,nrowa)) THENinfo = 8ELSE IF (ldb.LT.max(1,nrowb)) THENinfo = 10ELSE IF (ldc.LT.max(1,m)) THENinfo = 13END IFIF (info.NE.0) THENCALL xerbla('SGEMM ',info)RETURNEND IF** Quick return if possible.*IF ((m.EQ.0) .OR. (n.EQ.0) .OR.+ (((alpha.EQ.zero).OR. (k.EQ.0)).AND. (beta.EQ.one))) RETURN** And if alpha.eq.zero.*IF (alpha.EQ.zero) THENIF (beta.EQ.zero) THENDO 20 j = 1,nDO 10 i = 1,mc(i,j) = zero10 CONTINUE20 CONTINUEELSEDO 40 j = 1,nDO 30 i = 1,mc(i,j) = beta*c(i,j)30 CONTINUE40 CONTINUEEND IFRETURNEND IF** Start the operations.*IF (notb) THENIF (nota) THEN** Form C := alpha*A*B + beta*C.*DO 90 j = 1,nIF (beta.EQ.zero) THENDO 50 i = 1,mc(i,j) = zero50 CONTINUEELSE IF (beta.NE.one) THENDO 60 i = 1,mc(i,j) = beta*c(i,j)60 CONTINUEEND IFDO 80 l = 1,ktemp = alpha*b(l,j)DO 70 i = 1,mc(i,j) = c(i,j) + temp*a(i,l)70 CONTINUE80 CONTINUE90 CONTINUEELSE** Form C := alpha*A**T*B + beta*C*DO 120 j = 1,nDO 110 i = 1,mtemp = zeroDO 100 l = 1,ktemp = temp + a(l,i)*b(l,j)100 CONTINUEIF (beta.EQ.zero) THENc(i,j) = alpha*tempELSEc(i,j) = alpha*temp + beta*c(i,j)END IF110 CONTINUE120 CONTINUEEND IFELSEIF (nota) THEN** Form C := alpha*A*B**T + beta*C*DO 170 j = 1,nIF (beta.EQ.zero) THENDO 130 i = 1,mc(i,j) = zero130 CONTINUEELSE IF (beta.NE.one) THENDO 140 i = 1,mc(i,j) = beta*c(i,j)140 CONTINUEEND IFDO 160 l = 1,ktemp = alpha*b(j,l)DO 150 i = 1,mc(i,j) = c(i,j) + temp*a(i,l)150 CONTINUE160 CONTINUE170 CONTINUEELSE** Form C := alpha*A**T*B**T + beta*C*DO 200 j = 1,nDO 190 i = 1,mtemp = zeroDO 180 l = 1,ktemp = temp + a(l,i)*b(j,l)180 CONTINUEIF (beta.EQ.zero) THENc(i,j) = alpha*tempELSEc(i,j) = alpha*temp + beta*c(i,j)END IF190 CONTINUE200 CONTINUEEND IFEND IF*RETURN** End of SGEMM .*
SGEMV performs one of the matrix-vector operationsy := alpha*A*x + beta*y, or y := alpha*A**T*x + beta*y,where alpha and beta are scalars, x and y are vectors and A is anm by n matrix.
源码:
** -- Reference BLAS level2 routine (version 3.7.0) --* -- Reference BLAS is a software package provided by Univ. of Tennessee, --* -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--* December 2016** .. Scalar Arguments ..REAL ALPHA,BETAINTEGER INCX,INCY,LDA,M,NCHARACTER TRANS* ..* .. Array Arguments ..REAL A(LDA,*),X(*),Y(*)* ..** =====================================================================** .. Parameters ..REAL ONE,ZEROparameter(one=1.0e+0,zero=0.0e+0)* ..* .. Local Scalars ..REAL TEMPINTEGER I,INFO,IX,IY,J,JX,JY,KX,KY,LENX,LENY* ..* .. External Functions ..LOGICAL LSAMEEXTERNAL lsame* ..* .. External Subroutines ..EXTERNAL xerbla* ..* .. Intrinsic Functions ..INTRINSIC max* ..** Test the input parameters.*info = 0IF (.NOT.lsame(trans,'N') .AND. .NOT.lsame(trans,'T') .AND.+ .NOT.lsame(trans,'C')) THENinfo = 1ELSE IF (m.LT.0) THENinfo = 2ELSE IF (n.LT.0) THENinfo = 3ELSE IF (lda.LT.max(1,m)) THENinfo = 6ELSE IF (incx.EQ.0) THENinfo = 8ELSE IF (incy.EQ.0) THENinfo = 11END IFIF (info.NE.0) THENCALL xerbla('SGEMV ',info)RETURNEND IF** Quick return if possible.*IF ((m.EQ.0) .OR. (n.EQ.0) .OR.+ ((alpha.EQ.zero).AND. (beta.EQ.one))) RETURN** Set LENX and LENY, the lengths of the vectors x and y, and set* up the start points in X and Y.*IF (lsame(trans,'N')) THENlenx = nleny = mELSElenx = mleny = nEND IFIF (incx.GT.0) THENkx = 1ELSEkx = 1 - (lenx-1)*incxEND IFIF (incy.GT.0) THENky = 1ELSEky = 1 - (leny-1)*incyEND IF** Start the operations. In this version the elements of A are* accessed sequentially with one pass through A.** First form y := beta*y.*IF (beta.NE.one) THENIF (incy.EQ.1) THENIF (beta.EQ.zero) THENDO 10 i = 1,lenyy(i) = zero10 CONTINUEELSEDO 20 i = 1,lenyy(i) = beta*y(i)20 CONTINUEEND IFELSEiy = kyIF (beta.EQ.zero) THENDO 30 i = 1,lenyy(iy) = zeroiy = iy + incy30 CONTINUEELSEDO 40 i = 1,lenyy(iy) = beta*y(iy)iy = iy + incy40 CONTINUEEND IFEND IFEND IFIF (alpha.EQ.zero) RETURNIF (lsame(trans,'N')) THEN** Form y := alpha*A*x + y.*jx = kxIF (incy.EQ.1) THENDO 60 j = 1,ntemp = alpha*x(jx)DO 50 i = 1,my(i) = y(i) + temp*a(i,j)50 CONTINUEjx = jx + incx60 CONTINUEELSEDO 80 j = 1,ntemp = alpha*x(jx)iy = kyDO 70 i = 1,my(iy) = y(iy) + temp*a(i,j)iy = iy + incy70 CONTINUEjx = jx + incx80 CONTINUEEND IFELSE** Form y := alpha*A**T*x + y.*jy = kyIF (incx.EQ.1) THENDO 100 j = 1,ntemp = zeroDO 90 i = 1,mtemp = temp + a(i,j)*x(i)90 CONTINUEy(jy) = y(jy) + alpha*tempjy = jy + incy100 CONTINUEELSEDO 120 j = 1,ntemp = zeroix = kxDO 110 i = 1,mtemp = temp + a(i,j)*x(ix)ix = ix + incx110 CONTINUEy(jy) = y(jy) + alpha*tempjy = jy + incy120 CONTINUEEND IFEND IF*RETURN** End of SGEMV .*
参考文献
- Basic Linear Algebra Subprograms-WikiPeia
- Dongarra J J, Croz J D, Hammarling S, et al. A set of level 3 basic linear algebra subprograms[J]. Acm Transactions on Mathematical Software, 1990, 16(1):1-17.
- Why gemm is at the heart of deep learning
- sgemm 官方文档
- https://www.jianshu.com/p/1dd118f431eb