Multithreaded Programming

Overview

Thread Introduction
Multithreading Models
Threaded Case Study
Threading Issues

Thread Introduction

Thread

thread a.k.a lightweight process: basic unit of CPU utilization
all threads belonging to the same process share: code section, data section, and OS resources (e.g. open files and signals)
But each thread has its own (thread control block): thread ID, program counter, register set, and a stack

Single-threaded and multithreaded processes

Motivation

Example: a web browser
- One thread displays contents while the other thread receives data from network
Example: a web server
- One request / process: poor performance
- One request / thread: better performance as code and resource sharing
Example: RPC server
- One RPC request / thread

Benefits of Multithreading

Responsiveness: allow a program to continue running even if part of it is blocked or is performing a lengthy operation
Resource sharing: several different threads of activity all within the same address space
Utilization of MP (multi processor) architectur e: Several threads may be running in parallel on different processors
Economy: Allocating memory and resources for process creation is costly. In Solaris, creating a process is about 30 times slower than is creating a thread, and context switching is about five times slower. A register set switch is still required, but no memory-management related work is needed

Why Thread?

Multithcore Programming

Multithreaded programming provides a mechanism for more efficient use of multiple cores and improved concurrency (threads can run in parallel)
Multicore systems putting pressure on system designers and application programmers
- OS designers: scheduling algorithms use cores to allow the parallel execution

Parallel execution on a multicore system

Challenges in Multicore Programming

Dividing activities: divide program into concurrent tasks
Balance: evenly distribute tasks to cores
Data splitting: divide data accessed and manipulated by the tasks
Data dependency: synchronize data access
Testing and debugging

User vs. Kernel Threads

User threads – thread management done by user-level threads library

POSIX Pthreads
Win32 threads
Java threads

Kernel threads – supported by the kernel (OS) directly

Windows 2000 (NT)
Solaris
Linux
Tru64 UNIX

User threads

Thread library provides support for thread creation, scheduling, and deletion
Generally fast to create and manage
If the kernel is single-threaded, a user-thread blocks -> entire process blocks even if other threads are ready to run

Kernel threads

The kernel performs thread creation, scheduling, etc.
Generally slower to create and manage
If a thread is blocked, the kernel can schedule another thread for execution

Multithreading Models

Many-to-One

Many user-level threads mapped to single kernel thread
Used on systems that do not support kernel threads
Thread management is done in user space, so it is efficient
The entire process will block if a thread makes a blocking system call
Only one thread can access the kernel at a time, multiple threads are unable to run in parallel on multiprocessors

One-to-One

Each user-level thread maps to a kernel thread
There could be a limit on number of kernel threads
More concurrency
Overhead: Creating a thread requires creating the corresponding kernel thread
Examples
- Windows XP/NT/2000
- Linux
- Solaris 9 and later

Many-to-Many

Multiplexes many user-level threads to a smaller or equal number of kernel threads
Allows the developer to create as many user threads as wished
The corresponding kernel threads can run in parallel on a multiprocessor
When a thread performs a blocking call, the kernel can schedule another thread for execution

Case Study

Thread libraries
- Pthreads
- Java threads
OS examples
- WinXP
- Linux

Shared-Memory Programming

Definition: Processes communicate or work together with each other through a shared memory space which can be accessed by all processes
- Faster & more efficient than message passing
Many issues as well:
- Synchronization
- Deadlock
- Cache coherence
Programming techniques:
- Parallelizing compiler
- Unix processes
- Threads (Pthread, Java)

What is Pthread?

Historically, hardware vendors have implemented their own proprietary versions of threads
POSIX (Portable Operating System Interface) standard is specified for portability across Unix-like systems
Similar concept as MPI for message passing libraries
Pthread is the implementation of POSIX standard for threads

Pthread Creation

pthread_create(thread,attr,routine,arg)
- thread: An unique identifier (token) for the new thread
- attr: It is used to set thread attributes. NULL for the default values
  - for example, bind the thread to a specific CPU core
- routine: The routine that the thread will execute once it is created
- arg: A single argument that may be passed to routine

#include <pthread.h>
#include <stdio.h>
#define NUM_THREADS 5

void *PrintHello(void *threadId) {
    long* data = static_cast <long*> threadId;
    printf("Hello World! It's me, thread #%ld!\n", *data);
    pthread_exit(NULL);
}

int main (int argc, char *argv[]) {
    pthread_t threads[NUM_THREADS];
    for (long tid = 0; tid < NUM_THREADS; tid++) {
        pthread_create(&threads[tid], NULL, PrintHello, (void *)&tid);
    }
    /* Last thing that main() should do */
    pthread_exit(NULL);
}

Pthread Joining & Detaching

pthread_join(threadId, status)
- Blocks until the specified threadId thread terminates
- One way to accomplish synchronization between threads
- Example: to create a pthread barrier for (int i = 0; i < n; i++) pthread_join(thread[i], NULL);
pthread_detach(threadId)
- If we don’t want to join a thread, we can detach it
- Once a thread is detached, it can never be joined
- Detach a thread could free some system resources

Java Threads

Thread is created by
- Extending Thread class
- Implementing the Runnable interface
Java threads are implemented using a thread library on the host system
- Win32 threads on Windows
- Pthreads on UNIX-like system
Thread mapping depends on implementation of the JVM
- Windows 98/NT: one-on-one model
- Solaris 2: many-to-many model

Linux Threads

Linux does not support multithreading (no threads in the kernel)
Various Pthreads implementation are available for user-level
The fork system call – create a new process and a copy of the associated data of the parent process
The clone system call – create a new process and a link that points to the associated data of the parent process
A set of flags is used in the clone call for indication of the level of the sharing
- None of the flags is set -> clone = fork
- All flags are set -> parent and child share everything

Threading Issues

Semantics of fork() and exec() system calls. Duplicate all the threads or not?
Thread cancellation: Asynchronous or deferred
Signal handling: Where then should a signal be delivered?
Thread pools: Create a number of threads at process startup.

Semantics of fork() and exec()

Does fork() duplicate only the calling thread or all threads?
- Some UNIX system support two versions of fork()
execlp() works the same; replace the entire process
- If exec() is called immediately after forking, then duplicating all threads is unnecessary

different versions of fork

The middle image is a fork that duplicates all threads, while the right image is a fork that duplicates only the calling thread.

Thread Cancellation

What happen if a thread determinates before it has completed?
- e.g, terminate web page loading
Target thread: a thread that is to be cancelled
Two general approaches:
- Asynchronous cancellation:
  - One thread terminates the target thread immediately
- Deferred cancellation (default option):
  - The target thread periodically checks whether it should be terminated, allowing it an opportunity to terminate itself in an orderly fashion (canceled safely)
  - Check at Cancellation points

Signal Handling

Signals (synchronous or asynchronous) are used in UNIX systems to notify a process that an event has occurred
- Synchronous: illegal memory access, division by zero
- Asynchronous: control+C to terminate a process
A signal handler is used to process signals
1. Signal is generated by particular event
2. Signal is delivered to a process
3. Signal is handled
Options
- Deliver the signal to the thread to which the signal applies
  - e.g. sleep
- Deliver the signal to every thread in the process
  - e.g. process is killed
- Deliver the signal to certain threads in the process
- Assign a specific thread to receive all signals for the process

Thread Pools

Create a number of threads in a pool where they await work
Advantages
- Usually slightly faster to service a request with an existing thread than create a new thread
- Allows the number of threads in the application(s) to be bound to the size of the pool
# of threads: # of CPUs, expected # of requests, amount of physical memory

Multithreaded Programming#

Thread Introduction#

Thread#

Motivation#

Benefits of Multithreading#

Why Thread?#

Multithcore Programming#

Challenges in Multicore Programming#

User vs. Kernel Threads#

Multithreading Models#

Many-to-One#

One-to-One#

Many-to-Many#

Case Study#

Shared-Memory Programming#

What is Pthread?#

Pthread Creation#

Pthread Joining & Detaching#

Java Threads#

Linux Threads#

Threading Issues#

Semantics of fork() and exec()#

Thread Cancellation#

Signal Handling#

Thread Pools#