Introduction to OpenMP#

1. Target hardware#

  • Single computing node, multiple sockets, multiple cores.

  • Dell PowerEdge M600 Blade Server.

(Figure: multi-socket motherboard)
  • Intel Sandy Bridge CPU.

(Figure: Intel Sandy Bridge CPU)
  • In summary

    • Node with up to four sockets.

    • Each socket has up to 60 cores.

    • Each core is an independent CPU.

    • Each core has access to all the memory on the node.

2. Thread#

An abstraction for running multiple streams of execution within a single process
  • A normal process is a running program with a single point of execution, i.e., a single PC (program counter).

  • A multi-threaded program has multiple points of execution, i.e., multiple PCs.

  • Each thread is very much like a separate process, except for one difference:

    • All threads of the same process share the same address space and thus can access the same data.

(Figure: threads model)
POSIX threads (pthreads)
  • Standardized C language thread programming API.

  • pthreads specifies the interface for using threads, but not how threads are implemented in the OS.

  • Different implementations include:

    • kernel-level threads,

    • user-level threads, or

    • hybrid

  • pthread_create: creates a new thread that starts executing a specified function.

  • pthread_join: waits for a specified thread to finish.

Say hello to my little threads …
  • Launch a Code Server on Palmetto

  • Open the Terminal

  • Create a directory called openmp, and change into that directory

$ cd
$ mkdir openmp
$ cd openmp
  • Create thread_hello.c with the following contents:
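
A minimal sketch of what thread_hello.c could contain (an assumption consistent with the run commands below, which pass the number of threads as a command-line argument):

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

/* each thread prints the ID it was handed at creation time */
void *hello(void *arg) {
    long tid = (long) arg;
    printf("Hello from thread %ld\n", tid);
    return NULL;
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        fprintf(stderr, "Usage: %s <num_threads>\n", argv[0]);
        return 1;
    }
    int num_threads = atoi(argv[1]);
    pthread_t *threads = malloc(num_threads * sizeof(pthread_t));

    /* fork: create the requested number of threads */
    for (long i = 0; i < num_threads; i++)
        pthread_create(&threads[i], NULL, hello, (void *) i);

    /* join: wait for every thread to finish */
    for (long i = 0; i < num_threads; i++)
        pthread_join(threads[i], NULL);

    free(threads);
    return 0;
}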

  • Compile and run thread_hello.c:

$ gcc -o thread_hello thread_hello.c -lpthread
$ ./thread_hello 1
$ ./thread_hello 2 
$ ./thread_hello 4

3. Target software#

  • OpenMP provides wrappers for threads and the fork/join model of parallelism.

    • The program initially runs in sequential mode.

    • When parallelism is activated, multiple threads are forked from the original process/thread (the master thread).

    • Once the parallel tasks are done, the threads are joined back into the original process, and execution returns to sequential mode.

(Figure: threads / fork-join model)
  • The threads have access to all data in the master thread. This is shared data.

  • Each thread also has its own private memory stack.
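
As a quick illustration of the shared/private distinction (a sketch with names chosen here, not taken from the course materials): a variable declared before the parallel region is shared by every thread, while a variable declared inside the region lives on each thread's private stack.

#include <omp.h>
#include <stdio.h>

int main(void) {
  int data = 42;                        /* declared by the master thread: shared */

  #pragma omp parallel
  {
    int tid = omp_get_thread_num();     /* declared inside the region: private */
    /* every thread reads the same shared variable, but each has its own tid */
    printf("Thread %d sees data = %d\n", tid, data);
  }
  return 0;
}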

4. Write, compile, and run an OpenMP program#

Basic requirements
  • The C source code needs to include #include <omp.h>

  • The compile command needs the -fopenmp flag.

  • Specify the environment variable OMP_NUM_THREADS.

OMP directives
  • OpenMP must be told when to parallelize.

  • For C/C++, pragma directives are used to annotate the code to be parallelized:

#pragma omp somedirective clause(value, othervalue)
  parallel statement;
  • or

#pragma omp somedirective clause(value, othervalue)
{
  parallel statement 1;
  parallel statement 2;
  ...
}
Hands-on 1: create hello_omp.c
  • In the EXPLORER window, right-click on openmp and select New File.

  • Type hello_omp.c as the file name and hit Enter.

  • Enter the following source code in the editor window:
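
A sketch of hello_omp.c consistent with the line-by-line notes that follow (the exact listing may differ, but omp.h sits on line 1, the parallel region begins around line 7, and the calls to omp_get_thread_num and omp_get_num_threads fall around lines 10 and 15):

#include <omp.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
  int nthreads;

  #pragma omp parallel
  {
    /* each thread gets its own private copy of tid */
    int tid = omp_get_thread_num();
    printf("Hello World from thread %d\n", tid);

    /* only the master thread (tid 0) reports the thread count */
    if (tid == 0) {
      nthreads = omp_get_num_threads();
      printf("Number of threads: %d\n", nthreads);
    }
  }
  return 0;
}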

  • Line 1: Include omp.h so that the OpenMP library functions are available.

  • Line 7: Declares the beginning of the parallel region. Pay attention to how the curly bracket is set up, compared to the other curly brackets: the opening brace of the parallel block goes on its own line after the pragma.

  • Line 10: omp_get_thread_num gets the ID assigned to the thread and assigns it to a variable named tid of type int.

  • Line 15: omp_get_num_threads returns the total number of threads in the parallel region (the value set by OMP_NUM_THREADS) and assigns it to a variable named nthreads of type int.

What’s important?
  • tid and nthreads.

  • They allow us to coordinate the parallel workloads.

  • Specify the environment variable OMP_NUM_THREADS before running:

$ export OMP_NUM_THREADS=4


Trapezoidal
  • Problem: estimate the integral of \(y=x^2\) on \([2,8]\) using the trapezoidal rule and four threads.

  • With 4 threads: nthreads=4.

    • How to decide which thread will handle which segment?

    • How to get all results back together?

(Figure: the trapezoidal rule applied to \(y=x^2\) on \([2,8]\))
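
One natural way to split the work: with \(a=2\), \(b=8\), and nthreads = 4, each thread covers a segment of width \((b-a)/4 = 1.5\), so thread tid integrates over \([2 + 1.5\,\mathrm{tid},\ 2 + 1.5(\mathrm{tid}+1)]\). Thread 0 gets \([2, 3.5]\), thread 1 gets \([3.5, 5]\), thread 2 gets \([5, 6.5]\), and thread 3 gets \([6.5, 8]\). Each thread applies the trapezoidal rule to its own segment, and the partial sums are then added together to form the final estimate.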

5. Trapezoid implementation#

  • In the EXPLORER window, right-click on openmp and select New File.

  • Type trapezoid.c as the file name and hit Enter.

  • Enter the following source code in the editor window:
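
A sketch of what trapezoid.c might look like (assuming each thread handles its own sub-interval of \([2,8]\) and the partial sums are combined inside a critical section; the original listing may differ in details such as the number of trapezoids):

#include <omp.h>
#include <stdio.h>

/* the function to integrate: y = x^2 */
double f(double x) {
  return x * x;
}

int main(int argc, char *argv[]) {
  double a = 2.0, b = 8.0;   /* integration limits */
  int n = 1024;              /* total number of trapezoids */
  double total = 0.0;        /* shared accumulator */

  #pragma omp parallel
  {
    int tid = omp_get_thread_num();
    int nthreads = omp_get_num_threads();

    /* each thread integrates its own sub-interval [local_a, local_b] */
    double h = (b - a) / n;
    int local_n = n / nthreads;              /* assumes n is divisible by nthreads */
    double local_a = a + tid * local_n * h;
    double local_b = local_a + local_n * h;

    /* trapezoidal rule on the local segment */
    double partial = (f(local_a) + f(local_b)) / 2.0;
    for (int i = 1; i < local_n; i++)
      partial += f(local_a + i * h);
    partial *= h;

    /* one thread at a time adds its partial sum to the shared total */
    #pragma omp critical
    total += partial;
  }

  printf("Estimate of the integral of x^2 on [%.1f, %.1f]: %.6f\n", a, b, total);
  return 0;
}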

  • Compile and run trapezoid.c.
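
For example, using the -fopenmp flag described earlier:

$ gcc -o trapezoid trapezoid.c -fopenmp
$ export OMP_NUM_THREADS=4
$ ./trapezoid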

6. A bit more detailed#

  • Modify trapezoid.c so that it looks like the listing below.

7. Challenges#

Part 1

Alter the trapezoid.c code so that the parallel region invokes a function to calculate the partial sum.

Solution
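
One possible solution (a sketch, not necessarily the original; the helper name local_trap is chosen here for illustration):

#include <omp.h>
#include <stdio.h>

double f(double x) {
  return x * x;
}

/* computes the trapezoidal sum over this thread's segment of [a, b] */
double local_trap(double a, double b, int n) {
  int tid = omp_get_thread_num();
  int nthreads = omp_get_num_threads();

  double h = (b - a) / n;
  int local_n = n / nthreads;               /* assumes n is divisible by nthreads */
  double local_a = a + tid * local_n * h;
  double local_b = local_a + local_n * h;

  double partial = (f(local_a) + f(local_b)) / 2.0;
  for (int i = 1; i < local_n; i++)
    partial += f(local_a + i * h);
  return partial * h;
}

int main(int argc, char *argv[]) {
  double a = 2.0, b = 8.0, total = 0.0;
  int n = 1024;

  #pragma omp parallel
  {
    /* the parallel region now only calls the function and accumulates */
    double partial = local_trap(a, b, n);

    #pragma omp critical
    total += partial;
  }

  printf("Estimate: %.6f\n", total);
  return 0;
}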
Part 2:
  • Write a program called sum_series.c that takes a single integer N as a command-line argument and calculates the sum of the first N non-negative integers.

  • Speed up the summation portion by using OpenMP.

  • Assume N is divisible by the number of threads.

Solution
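
One possible solution (a sketch, summing the integers 0 through N-1, with each thread taking a contiguous chunk of N/nthreads values):

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
  if (argc < 2) {
    fprintf(stderr, "Usage: %s N\n", argv[0]);
    return 1;
  }
  long N = atol(argv[1]);
  long total = 0;

  #pragma omp parallel
  {
    int tid = omp_get_thread_num();
    int nthreads = omp_get_num_threads();

    /* assumes N is divisible by the number of threads */
    long chunk = N / nthreads;
    long start = tid * chunk;       /* first integer for this thread */
    long end = start + chunk;       /* one past the last integer */

    long partial = 0;
    for (long i = start; i < end; i++)
      partial += i;

    #pragma omp critical
    total += partial;
  }

  printf("Sum of the first %ld non-negative integers: %ld\n", N, total);
  return 0;
}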
Part 3:
  • Write a program called sum_series_2.c that takes a single integer N as a command-line argument and calculates the sum of the first N non-negative integers.

  • Speed up the summation portion by using OpenMP.

  • There is no assumption that N is divisible by the number of threads.

Solution
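
One possible solution (a sketch): give the first N % nthreads threads one extra element so that every integer from 0 to N-1 is covered even when N is not divisible by the number of threads.

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
  if (argc < 2) {
    fprintf(stderr, "Usage: %s N\n", argv[0]);
    return 1;
  }
  long N = atol(argv[1]);
  long total = 0;

  #pragma omp parallel
  {
    int tid = omp_get_thread_num();
    int nthreads = omp_get_num_threads();

    long chunk = N / nthreads;
    long leftover = N % nthreads;

    /* the first `leftover` threads each take one extra integer */
    long start, end;
    if (tid < leftover) {
      start = tid * (chunk + 1);
      end = start + chunk + 1;
    } else {
      start = leftover * (chunk + 1) + (tid - leftover) * chunk;
      end = start + chunk;
    }

    long partial = 0;
    for (long i = start; i < end; i++)
      partial += i;

    #pragma omp critical
    total += partial;
  }

  printf("Sum of the first %ld non-negative integers: %ld\n", N, total);
  return 0;
}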

8. With timing#

  • In the EXPLORER window, right-click on openmp and select New File.

  • Type trapezoid_time.c as the file name and hit Enter.

  • Enter the following source code in the editor window (you can copy the contents of the trapezoid.c version with a function from Challenge 1 as a starting point):
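
A sketch along those lines, using omp_get_wtime to measure the wall-clock time of the parallel region (the original listing may differ in details such as the number of trapezoids):

#include <omp.h>
#include <stdio.h>

double f(double x) { return x * x; }

/* per-thread trapezoidal sum, as in the Challenge 1 solution */
double local_trap(double a, double b, int n) {
  int tid = omp_get_thread_num();
  int nthreads = omp_get_num_threads();

  double h = (b - a) / n;
  int local_n = n / nthreads;               /* assumes n is divisible by nthreads */
  double local_a = a + tid * local_n * h;
  double local_b = local_a + local_n * h;

  double partial = (f(local_a) + f(local_b)) / 2.0;
  for (int i = 1; i < local_n; i++)
    partial += f(local_a + i * h);
  return partial * h;
}

int main(int argc, char *argv[]) {
  double a = 2.0, b = 8.0, total = 0.0;
  int n = 100000000;                 /* large n so the timing is measurable */

  double start = omp_get_wtime();    /* wall-clock time before the parallel work */

  #pragma omp parallel
  {
    double partial = local_trap(a, b, n);

    #pragma omp critical
    total += partial;
  }

  double end = omp_get_wtime();      /* wall-clock time after the parallel work */

  printf("Estimate: %.6f\n", total);
  printf("Elapsed time: %f seconds\n", end - start);
  return 0;
}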

  • Save the file when you are done.

  • How’s the run time?