Programming for Hybrid Multi Many-core MPP Systems
  • Home
  • Table of Contents
    • Chapter 1 / Introduction
  • Previous Publications
    • High Performance Computing
    • A Guidebook
  • About the Authors
    • John Levesque
    • Aaron Vose

Chapter 6


 

6.1   We used 8-byte floating point operands in this chapter; how many operands could be placed in level-1, 2, and 3 cache if one used 4-byte floating point operands instead? How about 16-byte operands?
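The arithmetic behind this question can be sketched as a simple capacity ratio. The symbols below are illustrative assumptions, not notation from the chapter: C_k is the capacity in bytes of the level-k cache, s is the operand size in bytes, and N_k(s) is the number of operands that fit. The sketch assumes the cache holds nothing but these operands and that C_k is a multiple of 16 bytes.

```latex
N_k(s) = \left\lfloor \frac{C_k}{s} \right\rfloor ,
\qquad
N_k(4) = 2\,N_k(8) ,
\qquad
N_k(16) = \tfrac{1}{2}\,N_k(8)
```

Under these assumptions, halving the operand size doubles the count at every cache level and doubling it halves the count; the actual values of C_k should be taken from the chapter or from the target system's documentation.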
 

6.2   When optimizing an application for a vector system like KNL, what is the first issue you should address?
 

6.3   What is hardware prefetching? What is software prefetching? When a multi-loop is blocked for enhancing cache reuse, how can prefetching destroy performance?
 

6.4   Can one obtain a performance increase by blocking the following multi-nested loop? Why?
      DO I = 1, N1
        DO J = 1, N2
          DO K = 1, N3
            A(I,J) = A(I,J) + B(I,J) * C(I,J)
      END DO ; END DO ; END DO
 

6.5   Can the following loop-nests be strip-mined? Why?
      do ic = 1,nc
        do iz = 1 + (ic-1)*nz/nc,ic*nz/nc
          do iy = 1,ny
            do ix = 1, nx
              a(ix,iy,iz) = a(ix,iy,iz)*2.0
        enddo ; enddo ; enddo
        do iz = 1 + (ic-1)*nz/nc,ic*nz/nc
          do iy = 1,ny
            do ix = 1, nx
              a(ix,iy,iz) = a(ix+1,iy,iz)*0.5
        enddo ; enddo ; enddo
      enddo
 

6.6   When is blocking with tiles better than blocking with planes?