Programming for Hybrid Multi Many-core MPP Systems
  • Home
  • Table of Contents
    • Chapter 1 / Introduction
  • Previous Publications
    • High Performance Computing
    • A Guidebook
  • About the Authors
    • John Levesque
    • Aaron Vose

Chapter 4



4.1   Try the following example with your compiler:
     DO I = 1,100
       DO J = 1,100
         DO K = 1,100
           A(I,J,K) = B(I,J,K) + C(I,J,K)
         ENDDO
       ENDDO
     ENDDO
And then:
     DO K = 1,100
       DO J = 1,100
         DO I = 1,100
           A(I,J,K) = B(I,J,K) + C(I,J,K)
         ENDDO
       ENDDO
     ENDDO
Does your compiler perform this loop interchange automatically?
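One way to try this is to time both loop orders in a small standalone program. The following is a minimal sketch, not taken from the book: the program name, the repetition count NREP, and the checksum variables are all illustrative assumptions, and the timings it prints will depend entirely on your compiler and optimization level.

```fortran
! Hypothetical timing harness for Exercise 4.1 (names are illustrative).
program loop_order
  use, intrinsic :: iso_fortran_env, only: real64, int64
  implicit none
  integer, parameter :: n = 100      ! loop bounds from the exercise
  integer, parameter :: nrep = 200   ! assumed repetition count for timing
  real(real64), allocatable :: a(:,:,:), b(:,:,:), c(:,:,:)
  integer :: i, j, k, rep
  integer(int64) :: t0, t1, t2, rate
  real(real64) :: check1, check2

  allocate(a(n,n,n), b(n,n,n), c(n,n,n))
  call random_number(b)
  call random_number(c)
  check1 = 0.0_real64
  check2 = 0.0_real64

  call system_clock(t0, rate)

  ! First ordering: the innermost loop runs over K, the last index.
  do rep = 1, nrep
    b(1,1,1) = real(rep, real64)     ! make each repetition distinct
    do i = 1, n
      do j = 1, n
        do k = 1, n
          a(i,j,k) = b(i,j,k) + c(i,j,k)
        end do
      end do
    end do
    check1 = check1 + a(n,n,n) + a(1,1,1)
  end do

  call system_clock(t1)

  ! Second ordering: the innermost loop runs over I, the first index.
  do rep = 1, nrep
    b(1,1,1) = real(rep, real64)
    do k = 1, n
      do j = 1, n
        do i = 1, n
          a(i,j,k) = b(i,j,k) + c(i,j,k)
        end do
      end do
    end do
    check2 = check2 + a(n,n,n) + a(1,1,1)
  end do

  call system_clock(t2)

  print *, 'K innermost (s):', real(t1 - t0, real64) / real(rate, real64)
  print *, 'I innermost (s):', real(t2 - t1, real64) / real(rate, real64)
  print *, 'checksums      :', check1, check2   ! should be identical
end program loop_order
```

If both timings are close at a high optimization level but differ at a low one, that suggests the compiler is reordering the loops itself; the compiler's optimization report or listing, where available, is the more direct way to confirm it. An aggressive optimizer may also remove some of the repeated work, so treat the numbers as indicative only.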
 

4.2   Given the Fortran 90 array syntax example in this chapter, if the decomposition of parallel chunks were on the first and second dimensions of the grid, at what sizes would the data being accessed in the code fit in level-2 cache? Assume that the level-2 cache is 512 KB and that the operands are 8-byte reals.
 

4.3   Why might the use of derived types degrade program performance?
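DIMENSION A(100), B(100), C(100), D(100), E(100), F(100), X(100), Y(100)
! Original loop: B(I) and D(I) are factors of both products
DO I = 1,100
  X(I) = A(I) * B(I) * C(I) * D(I)
  Y(I) = E(I) * B(I) * F(I) * D(I)
END DO
! Rewritten loop: the shared product B(I) * D(I) is computed once in T
DO I = 1,100
  T = B(I) * D(I)
  X(I) = A(I) * C(I) * T
  Y(I) = E(I) * F(I) * T
END DO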
 

4.4   Derived types do not always introduce inefficiencies. How would you rewrite the derived type in the example given in this chapter to allow contiguous access to the arrays?
 

4.5   Would X=Y**.5 run as fast as X=SQRT(Y)? What would be a better way of writing X=Y**Z? Try these on your compiler/system.
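For the first part of this exercise, a small timing program is one way to compare the two forms on your own system. The sketch below is an illustration only, not the book's code: the array length N, the repetition count NREP, and all variable names are assumptions, and it uses a double-precision exponent 0.5_real64 where the exercise writes .5.

```fortran
! Hypothetical timing harness for Exercise 4.5 (names are illustrative).
program pow_vs_sqrt
  use, intrinsic :: iso_fortran_env, only: real64, int64
  implicit none
  integer, parameter :: n = 1000000  ! assumed array length
  integer, parameter :: nrep = 50    ! assumed repetition count for timing
  real(real64), allocatable :: x(:), y(:)
  integer :: i, rep
  integer(int64) :: t0, t1, t2, rate
  real(real64) :: check1, check2

  allocate(x(n), y(n))
  call random_number(y)              ! values in [0, 1), valid for SQRT
  check1 = 0.0_real64
  check2 = 0.0_real64

  call system_clock(t0, rate)

  ! Form 1: exponentiation with a constant exponent of one half.
  do rep = 1, nrep
    y(1) = real(rep, real64)         ! make each repetition distinct
    do i = 1, n
      x(i) = y(i)**0.5_real64
    end do
    check1 = check1 + x(1) + x(n)
  end do

  call system_clock(t1)

  ! Form 2: the SQRT intrinsic.
  do rep = 1, nrep
    y(1) = real(rep, real64)
    do i = 1, n
      x(i) = sqrt(y(i))
    end do
    check2 = check2 + x(1) + x(n)
  end do

  call system_clock(t2)

  print *, 'Y**0.5  (s):', real(t1 - t0, real64) / real(rate, real64)
  print *, 'SQRT(Y) (s):', real(t2 - t1, real64) / real(rate, real64)
  print *, 'checksums  :', check1, check2   ! may differ in the last bits
end program pow_vs_sqrt
```

Running it at more than one optimization level is worthwhile, since whether the two forms differ in speed depends on how your compiler treats the constant exponent.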
 

4.6   Why might the use of array syntax degrade program performance?
 

4.7   What constructs restrict the ability of the compiler to optimize the alignment of arrays in memory? What constructs give the compiler the most flexibility in aligning arrays in memory?
 

4.8   What speedup factor is available from SSE instructions? How about GPGPUs?
 

4.9   Why is dependency analysis required to vectorize a loop?
 

4.10   Why might a DO loop get better performance than the equivalent array syntax?
 

4.11   When using array sections as arguments to a subroutine, what may degrade performance?
 

4.12   What is strength reduction? How can it be used to speed up array index calculations?
​