OpenMP is an API for C, C++, and Fortran that lets us run parts of a serial program in parallel on multiple threads by adding compiler directives to our code.
Developers can use OpenMP to specify which structured blocks of their code should be executed in parallel and how many threads should be used to execute each block.
Note: OpenMP requires compiler support, so it is not available with every compiler. Compilers that do support it usually need it enabled with a flag, such as -fopenmp for GCC and Clang.
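One way to check whether a build actually has OpenMP enabled is the standard _OPENMP macro, which a conforming compiler defines (as a yyyymm version date) only when OpenMP is switched on. The sketch below is a minimal illustration; the function names openmp_version_date and report_openmp_support are hypothetical helpers, not part of OpenMP.

```c
#include <stdio.h>

/* Hypothetical helper: returns the OpenMP version date (yyyymm) that the
   compiler advertises, or 0 when the file was built without OpenMP. */
long openmp_version_date(void) {
#ifdef _OPENMP
    return _OPENMP;   /* defined by the compiler only when OpenMP is enabled */
#else
    return 0;         /* pragmas are ignored and the code runs serially */
#endif
}

/* Hypothetical helper: prints a one-line report of the result. */
void report_openmp_support(void) {
    long version = openmp_version_date();
    if (version > 0) {
        printf("OpenMP enabled (version date %ld)\n", version);
    } else {
        printf("OpenMP not enabled; omp pragmas are ignored\n");
    }
}
```

Calling report_openmp_support() at program start makes it obvious when a missing compiler flag has silently turned a parallel program into a serial one.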
OpenMP uses special instructions called pragmas, which are preprocessor directives. The keywords that follow #pragma depend on which extension is being used; OpenMP directives begin with #pragma omp.
The syntax of the pragma omp parallel directive, which executes a block of code on multiple threads, is shown below:
#pragma omp parallel num_threads(thread_count)
/* structured block */
Below is a breakdown of the above line of code:
thread_count: The number of threads requested for the parallel execution.
/* structured block */: The block of code that follows the directive. Every thread in the team executes this block.
num_threads: A clause that sets the number of threads to use for the parallel region.
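The breakdown above can be tried out with a small sketch that counts how many threads actually run the structured block. The function name count_team_threads is an assumption made for illustration. Note that num_threads is a request: the runtime may supply fewer threads, and a build without OpenMP runs the block exactly once.

```c
/* Hypothetical helper: runs a parallel region with the requested team size
   and returns how many threads executed the structured block. */
int count_team_threads(int requested) {
    int executed = 0;

    #pragma omp parallel num_threads(requested)
    {
        /* Each thread in the team runs this block once. The atomic
           directive keeps the shared counter update free of data races. */
        #pragma omp atomic
        executed++;
    }

    return executed;
}
```

For example, count_team_threads(4) returns a value between 1 and 4, depending on how many threads the runtime provides.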
Below is an example that applies the pragma omp parallel directive to a print statement.
Note: To run the code written below, enter the command ./openmp in the terminal.
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <omp.h>

#define NUM_THREADS 6

int main() {
    // pragma clause to run a print statement in parallel
    #pragma omp parallel num_threads(NUM_THREADS)
    {
        printf("Hello World\n");
    }

    // print a done statement before program termination
    printf("All done\n");
    exit(EXIT_SUCCESS);
}
In the output window, Hello World is printed six times because we requested six threads for the parallel region. Each thread executes the code inside the {} curly braces once. All done is printed only once, after the parallel region has ended.
Above we can see a flow diagram of the program that was executed. The master thread is the single thread that runs the serial parts of the program. When it reaches a pragma omp parallel directive, it forks a team of the specified number of threads, and every thread in the team executes the code inside the {}. At the end of the block the threads join, and the master thread continues alone.
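The fork-join behaviour described above can be sketched as follows. Each thread marks its own slot of an array inside the parallel region, and the single thread that continues after the region counts the marks. The names fork_join_demo and TEAM_SIZE are hypothetical, and the omp_get_thread_num() call is guarded so the sketch also compiles when OpenMP is not enabled.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

#define TEAM_SIZE 4   /* assumed team size, chosen for illustration */

/* Hypothetical helper: returns how many distinct threads took part
   in the parallel region. */
int fork_join_demo(void) {
    int visited[TEAM_SIZE] = {0};

    /* Fork: a team of up to TEAM_SIZE threads starts here. */
    #pragma omp parallel num_threads(TEAM_SIZE)
    {
        int id = 0;   /* thread number; stays 0 in a serial build */
#ifdef _OPENMP
        id = omp_get_thread_num();
#endif
        visited[id] = 1;   /* each thread writes only its own slot */
    }   /* Join: implicit barrier, all threads have finished here. */

    /* Only one thread runs from here on, so reading the array is safe. */
    int count = 0;
    for (int i = 0; i < TEAM_SIZE; i++) {
        count += visited[i];
    }
    return count;
}
```

Because the parallel region ends with an implicit barrier, the counting loop never observes a slot that a thread is still about to write.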
Below we can see a simple C program that uses the pragma omp parallel directive to calculate the sum of the first 59 natural numbers.
To run the code below, open the terminal and enter the command ./loop
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <omp.h>

// define the number of threads
#define NUM_THREADS 6

// function that is to be run using OpenMP
int thread_function(void) {
    // get the current thread and the total number of threads
    int my_number = omp_get_thread_num();
    int totalThreads = omp_get_num_threads();

    // calculate the starting number
    int startAt = my_number * 10;
    int sum = 0;

    // find the sum of the next 10 numbers from the start
    for (int i = startAt; i <= (startAt + 9); i++) {
        // add the number to the thread's local sum variable
        sum += i;
    }
    // print a closing statement at the end of execution
    printf("\n\tBye from %d\n", my_number);
    // return the local sum
    return sum;
}

int main() {
    long totalSum = 0; // total sum; the reduction clause combines each thread's private copy
    #pragma omp parallel num_threads(NUM_THREADS) reduction(+:totalSum)
    {
        long localSum = 0; // variable to store the local sum of a thread
        localSum = thread_function(); // get the local sum
        totalSum += localSum; // add the local sum to the total sum
    }
    // print the total sum and exit
    printf("\nTotal sum: %ld\n", totalSum);
    exit(EXIT_SUCCESS);
}
Line 7: We define the number of threads we want to use.
Lines 10–28: The function to be used by multiple threads.
Line 12: We retrieve the number of the current thread executing the function by calling omp_get_thread_num().
Line 13: We retrieve the total number of threads in the team by calling omp_get_num_threads().
Line 16: We compute the starting number for the thread. For thread 0, it is 0; for thread 1, it is 10; for thread 2, it is 20, and so on.
Lines 20–23: Starting from that number, each thread loops over 10 consecutive numbers and adds them to its local sum.
Lines 24–27: When the loop has finished, we print a closing statement and then return the sum that a single thread has calculated.
Line 32: We apply the pragma omp parallel directive to the block that calls thread_function(), so six threads execute it.
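Lines 35–36: Each thread stores the value returned by thread_function() in localSum and adds it to totalSum. Because every thread updates totalSum, the addition has to be protected from data races, for example with a reduction(+:totalSum) clause on the directive.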
Lines 39–40: After all threads are finished, we print the total sum of all six threads and terminate the program.
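As a point of comparison, the same kind of sum can be written more compactly by letting OpenMP split the loop iterations among the threads. This is a minimal sketch and not the program above: the function name sum_range is hypothetical, and it uses the parallel for construct with a reduction clause instead of computing each thread's range by hand.

```c
/* Hypothetical helper: sums the integers from first to last (inclusive). */
long sum_range(int first, int last) {
    long total = 0;

    /* The loop iterations are divided among the threads. Each thread
       accumulates into a private copy of total, and the copies are added
       together when the loop ends. */
    #pragma omp parallel for reduction(+:total)
    for (int i = first; i <= last; i++) {
        total += i;
    }

    return total;
}
```

With this version, sum_range(0, 59) evaluates to 1770 regardless of how many threads the runtime uses, because the reduction clause removes the shared update that would otherwise need protection.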