Chapter 6
6.1 We used 8-byte floating-point operands in this chapter; how many operands could be placed in the level-1, level-2, and level-3 caches if one used 4-byte floating-point operands instead? How about 16-byte operands?
6.2 When optimizing an application for a vector system like KNL, what is the first issue you should address?
6.3 What is hardware prefetching? What is software prefetching? When a multi-nested loop is blocked to enhance cache reuse, how can prefetching destroy performance?
6.4 Can one obtain a performance increase by blocking the following multi-nested loop? Why?
    DO I = 1, N1
      DO J = 1, N2
        DO K = 1, N3
          A(I,J) = A(I,J) + B(I,J) * C(I,J)
        END DO
      END DO
    END DO
6.5 Can the following loop-nests be strip-mined? Why?
    do ic = 1,nc
      do iz = 1 + (ic-1)*nz/nc,ic*nz/nc
        do iy = 1,ny
          do ix = 1, nx
            a(ix,iy,iz) = a(ix,iy,iz)*2.0
          enddo
        enddo
      enddo
      do iz = 1 + (ic-1)*nz/nc,ic*nz/nc
        do iy = 1,ny
          do ix = 1, nx
            a(ix,iy,iz) = a(ix+1,iy,iz)*0.5
          enddo
        enddo
      enddo
    enddo
6.6 When is blocking with tiles better than blocking with planes?