Processes and threads in Python

What Are the Differences between Processes and Threads

Summary: in this tutorial, you’ll learn about processes and threads and, more importantly, the main differences between them.

Introduction to processes and threads

Suppose that you have a simple Python program:

Computers don’t understand Python. They only understand machine code, which is a set of instructions encoded as zeros and ones.

Therefore, you need a Python interpreter to execute this Python program. The interpreter is itself a machine-code program that runs your Python code on your behalf.

When you execute the python app.py command, the operating system (OS) loads the Python interpreter (CPython) into memory (RAM). CPython then compiles the app.py file into bytecode and executes that bytecode in its virtual machine.

Once the OS loads the program into memory, it moves the instructions to the CPU for execution via the bus.

In general, the OS moves the instructions to a queue, also known as a pipeline. Then, the CPU will execute the instructions from the pipeline.

By definition, a process is an instance of a program running on a computer. And a thread is a unit of execution within a process.

Notice that if you launch a program multiple times, you’ll have a single program but multiple processes, each representing an instance of the program.
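A minimal sketch of this idea, assuming a Python interpreter is available as sys.executable: the same one-line program is launched twice, and each launch reports its own process ID (PID). The names PROGRAM and launch_twice are hypothetical and used only for illustration.

```python
import subprocess
import sys

# The "program": a one-liner that prints the ID of the process running it.
PROGRAM = "import os; print(os.getpid())"


def launch_twice():
    """Start two instances of the same program and return their PIDs."""
    # Start both instances before waiting, so they exist side by side.
    children = [
        subprocess.Popen(
            [sys.executable, "-c", PROGRAM],
            stdout=subprocess.PIPE,
            text=True,
        )
        for _ in range(2)
    ]
    # communicate() waits for the child and returns (stdout, stderr).
    return [int(child.communicate()[0]) for child in children]


pids = launch_twice()
print(pids)  # one program, two processes, two different PIDs
```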

A program is like a class while processes are like objects of the class.

The following picture illustrates the flow of running a Python program on a computer:

[Figure: Differences between Processes and Threads - Single-core CPU Execution]

So far, you’ve learned how to develop a program that has one process with one thread. Therefore, the terms process and thread are sometimes used interchangeably.

A program may have one or more processes and a process can have one or more threads.

When a program has multiple processes, it’s called multiprocessing. If a program has multiple threads, it’s called multithreading.
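As a rough illustration of multithreading, the sketch below starts three threads inside one process. The function name report and the thread names are hypothetical; the point is only that every thread reports the same process ID.

```python
import os
import threading

results = []  # (thread name, process ID) pairs collected from the workers


def report():
    # Every thread runs inside the same process, so os.getpid() is identical.
    results.append((threading.current_thread().name, os.getpid()))


threads = [threading.Thread(target=report, name=f"worker-{i}") for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

for name, pid in results:
    print(name, pid)
```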

Single-core processors

In the past, a CPU had only one core. In other words, it could run only a single process at a time. To execute multiple processes “at the same time”, the OS uses a software component called a scheduler.

The scheduler is like a switch that schedules processes. The main task of the scheduler is to regularly select which process runs next and submit its instructions for execution.

The scheduler switches between processes so quickly (around 1 ms) that it creates the illusion of the computer being able to execute multiple processes simultaneously.

Multicore processors

Today, the CPU often has multiple cores, e.g., two cores (dual-core) and four cores (quad-core).

The number of cores will determine the number of processes that the CPU can execute simultaneously. Generally, the more cores the CPU has, the more processes it can truly execute simultaneously.

For example, a dual-core CPU can execute at most two processes simultaneously and a quad-core CPU can execute at most four processes simultaneously.
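To see how many cores your own machine offers, you can ask the OS from Python. This is a small sketch; note that os.cpu_count() reports logical cores, which may be more than the number of physical cores on CPUs with simultaneous multithreading.

```python
import os

# Number of logical cores reported by the OS; None if it cannot be determined.
cores = os.cpu_count()
print(f"Logical cores: {cores}")
```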

Multiprocessing uses a multi-core CPU within a single computer, which indeed executes multiple processes in parallel.

CPU-bound vs. I/O-bound tasks

In general, programs deal with two types of tasks: I/O-bound or CPU-bound.

  • I/O-bound tasks spend more time doing I/O than doing computations. The typical examples of I/O-bound tasks are network requests, database connections, and file reading/writing.
  • In contrast, CPU-bound tasks spend more time doing computation than waiting for I/O. The typical examples of CPU-bound tasks are matrix multiplication, finding prime numbers, and video compression.

Technically, multithreading is suitable for I/O-bound tasks, and multiprocessing is suitable for CPU-bound tasks.
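The following sketch shows why threads help with I/O-bound tasks. It assumes that time.sleep() is an acceptable stand-in for a real I/O wait; fake_io, run_sequential, and run_threaded are hypothetical names, not part of any library.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(task_id, delay=0.2):
    """Stand-in for an I/O-bound call: the thread just waits."""
    time.sleep(delay)
    return task_id


def run_sequential(n):
    """Run n waiting tasks one after another and time them."""
    start = time.perf_counter()
    results = [fake_io(i) for i in range(n)]
    return results, time.perf_counter() - start


def run_threaded(n):
    """Run n waiting tasks in n threads and time them."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(fake_io, range(n)))
    return results, time.perf_counter() - start


seq_results, seq_time = run_sequential(4)
thr_results, thr_time = run_threaded(4)
print(f"sequential: {seq_time:.2f}s, threaded: {thr_time:.2f}s")
```

Because the four waits overlap in the threaded version, it should finish in roughly the time of a single wait, while the sequential version takes about four times as long.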

The main differences between processes and threads

The following table illustrates the main differences between a process and a thread:

Criteria | Process | Thread
Memory sharing | Memory is not shared between processes | Memory is shared between threads within a process
Memory footprint | Large | Small
CPU-bound & I/O-bound processing | Optimized for CPU-bound tasks | Optimized for I/O-bound tasks
Starting time | Slower than a thread | Faster than a process
Interruptibility | Child processes are interruptible | Threads are not interruptible
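The memory-sharing row can be illustrated with a short sketch: four threads update one shared dictionary that lives in the process’s memory. The names state and add are hypothetical, and the lock is there because shared memory also means threads can interfere with each other’s updates.

```python
import threading

state = {"count": 0}  # one object, visible to every thread in the process
lock = threading.Lock()


def add(times):
    for _ in range(times):
        # The lock keeps the read-modify-write of the shared value atomic.
        with lock:
            state["count"] += 1


threads = [threading.Thread(target=add, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(state["count"])  # 40000: all four threads updated the same object
```

Separate processes would each work on their own copy of state, so they would need an explicit mechanism (such as a pipe or queue) to exchange results.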

Summary

  • A process is an instance of a program running on a computer.
  • A program can have one or more processes and a process can have one or more threads.
  • A thread is a unit of execution within a process.


Processes and threads in Python (CPython)

Python (or, more precisely, CPython, the implementation used in the book) is optimized to work in single-threaded mode. This is good if a program uses only one thread. At the same time, Python has certain nuances when running in multithreaded mode, because CPython uses the GIL (global interpreter lock).

The GIL does not allow multiple threads to execute Python code at the same time. Without going into detail, the GIL can be visualized as a sort of flag that is passed from thread to thread. Whoever has the flag can do the job. The flag is passed on either after a short time slice (in CPython 3, a switch interval of 5 ms by default) or, for example, when some type of input-output operation is performed.
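In CPython 3, how long a running thread may hold the GIL before being asked to hand it over is governed by a switch interval, which you can inspect from the sys module. A minimal sketch:

```python
import sys

# Time, in seconds, after which a running thread is asked to release the GIL.
interval = sys.getswitchinterval()
print(f"Switch interval: {interval} s")
```

The value can be changed with sys.setswitchinterval(), although the default is rarely worth tuning.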

Therefore, different threads will not run in parallel; the program simply switches between them, executing them at different times. However, if the program has to wait for something (packets from the network, a user request, a time.sleep pause), then its threads will run as if in parallel. This is because during such pauses the flag (GIL) can be passed to another thread.
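To observe the effect of the GIL on work that never waits, you can time a pure-Python loop run twice in a row and then in two threads. This is only a sketch: count_down and N are hypothetical names, and the timings depend on your machine and Python build (on a standard CPython build with the GIL, the two times are typically similar; on a free-threaded build they may differ).

```python
import threading
import time


def count_down(n):
    """Pure-Python CPU-bound loop; it never waits on I/O."""
    while n > 0:
        n -= 1


N = 2_000_000

# Two runs, one after the other.
start = time.perf_counter()
count_down(N)
count_down(N)
sequential = time.perf_counter() - start

# The same two runs, each in its own thread.
start = time.perf_counter()
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, two threads: {threaded:.2f}s")
```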

That is, threads are well suited for tasks that involve input-output (IO) operations:

  • Connecting to equipment and working with the network in general
  • Working with the file system
  • Downloading files

On the Internet you can often find statements like “In Python it is better not to use threads at all”. Unfortunately, such statements are often made without context, namely that they apply to specific tasks that are CPU-bound.

The next sections discuss how to use threads to connect via Telnet/SSH. They also compare the script’s execution time with sequential execution and with execution using processes.

Processes

Processes make it possible to execute tasks on different CPU cores. This is important for CPU-bound tasks. For each process, a copy of the resources is created and memory is allocated, and each process has its own GIL. This also makes processes “heavier” than threads.
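A minimal sketch of running CPU-bound work in several processes, using ProcessPoolExecutor from the standard library. The functions count_primes and count_primes_parallel are hypothetical examples, and the script is assumed to be saved to a file and run directly.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """CPU-bound work: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_parallel(limits):
    """Run count_primes for each limit in a separate worker process."""
    # Each worker process has its own interpreter and its own GIL.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    # The guard matters on platforms that start workers by re-importing this file.
    print(count_primes_parallel([10_000, 20_000, 30_000]))
```

By default, ProcessPoolExecutor picks the number of worker processes based on the number of CPUs available, which fits the point above that processes are tied to cores.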

In addition, the number of processes that run in parallel depends on the number of CPUs and cores and is usually measured in dozens, while the number of threads for input-output operations can be measured in hundreds.

Processes and threads can be combined, but this complicates the program, and at a basic level it is better to stick to threads for input-output operations.

Combining threads and processes, i.e., starting a process in a program and then starting threads inside it, makes troubleshooting difficult, so I’d recommend not using that option.

Although it is usually better to use threads for input-output tasks, for some modules it is better to use processes because they may not work correctly with threads.

In addition to processes and threads, there is another option for concurrent connections to devices: asynchronous programming. This option is not covered in the book.
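For orientation only, here is a very small sketch of what the asynchronous option looks like with the standard asyncio module. The coroutine fake_connect and the device names are hypothetical, and asyncio.sleep() stands in for a real network wait.

```python
import asyncio


async def fake_connect(device, delay=0.1):
    """Stand-in for a network connection: awaiting yields control to other tasks."""
    await asyncio.sleep(delay)
    return f"{device}: done"


async def main():
    devices = ["r1", "r2", "r3"]  # hypothetical device names
    # gather() runs the coroutines concurrently in a single thread
    # and returns their results in the order they were passed in.
    return await asyncio.gather(*(fake_connect(d) for d in devices))


results = asyncio.run(main())
print(results)
```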
