python基础9_1-进程、线程、守护线程、全局解释器锁、生产者消费者模型

Badia ·

更新时间:2024-09-21

· 831 次阅读

目录1、Python GIL(Global Interpreter Lock)2、进程(process)多进程multiprocessing进程间通讯-Queues/Pipes/Managers进程锁进程池3、线程(thread)语法join函数daemon(守护线程)线程锁之Lock(互斥锁mutex)/RLock(递归锁)/Semaphore(信号量)EventQueue生产者消费者模型4、进程和线程的关系区别 1、Python GIL(Global Interpreter Lock)

全局解释器锁
python(在CPython执行环境下)同一时间只有一个线程在运行，不管你的机器是几核的cpu都一样

首先需要明确的一点是GIL并不是Python的特性，它是在实现Python解析器(CPython)时所引入的一个概念。就好比C++是一套语言（语法）标准，但是可以用不同的编译器来编译成可执行代码。有名的编译器例如GCC，INTEL C++，Visual C++等。Python也一样，同样一段代码可以通过CPython，PyPy，Psyco等不同的Python执行环境来执行。像其中的JPython就没有GIL。然而因为CPython是大部分环境下默认的Python执行环境。所以在很多人的概念里CPython就是Python，也就想当然的把GIL归结为Python语言的缺陷。所以这里要先明确一点：GIL并不是Python的特性，Python完全可以不依赖于GIL

2、进程(process)

进程是对各种资源管理的集合，比如说，qq，就是一个进程，qq以一个整体的形式暴露给操作系统管理，里面包含对各种资源的调用、内存的管理，网络接口的调用等

每一个程序(进程)的内存是独立的

进程的缺陷：

进程只能在一个时间干一件事，如果想同时干两件事或多件事，进程就无能为力了。进程在执行的过程中如果阻塞，例如等待输入，整个进程就会挂起，即使进程中有些工作不依赖于输入的数据，也将无法执行。

补充一个概念：
上下文切换
单核机器一次只能干一件事情，由于cpu运行速度快，每秒可以计算几亿次，速度太快，给我们造成了我们的机器很多任务在并发运行的错觉。

多进程multiprocessing

语法

# Author: 73
from multiprocessing import Process
import time, threading
def thread_run():
    print(threading.get_ident()) # 线程id
def run(name):
    time.sleep(2)
    print("hello", name)
    t = threading.Thread(target=thread_run, ) # 进程里面再启一个线程
    t.start()
for i in range(10):
    p = Process(target=run, args=("seth%s" % i, ))
    p.start()
#p.join()

通过进程id来看看进程与子进程之间的关系

# Author: 73
from multiprocessing import Process
import os
def info(title):
    print(title)
    print('module name:', __name__)
    print('parent process:', os.getppid())
    print('process id:', os.getpid())
    print("\n\n")
def f(name):
    info('\033[31;1mfunction f\033[0m')
    print('hello', name)
if __name__ == '__main__':
    info('\033[33;1mmain process line\033[0m')
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()

进程间通讯-Queues/Pipes/Managers

不同进程间内存是不共享的，要想实现两个进程间的数据交换，可以用以下方法：
Queues
两个进程之间的数据传递

from multiprocessing import Process, Queue
def f(q):
    q.put([42, None, 'hello'])
if __name__ == '__main__':
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    print(q.get())    # prints "[42, None, 'hello']"
    p.join()

Pipes
两个进程之间的数据传递

# Author: 73
from multiprocessing import Process, Pipe
def f(conn):
    conn.send("hello father")
    print(conn.recv())
    conn.close()
parent_conn, child_conn = Pipe()
p = Process(target=f, args=(child_conn, ))
p.start()
print(parent_conn.recv())
parent_conn.send("hello child")
p.join()

Managers
实现进程与进程之间的数据共享和操作

from multiprocessing import Process, Manager
def f(d, l):
    d[1] = '1'
    d['2'] = 2
    d[0.25] = None
    l.append(1)
    print(l)
if __name__ == '__main__':
    with Manager() as manager:
        d = manager.dict()
        l = manager.list(range(5))
        p_list = []
        for i in range(10):
            p = Process(target=f, args=(d, l))
            p.start()
            p_list.append(p)
        for res in p_list:
            res.join()
        print(d)
        print(l)

进程锁

保证在屏幕上输出信息的时候，不会乱（比如说在打印第一条信息，第一条信息还没打完，第二条信息就开始打印了）

from multiprocessing import Process, Lock
def f(l, i):
    l.acquire()
    try:
        print('hello world', i)
    finally:
        l.release()
if __name__ == '__main__': # 是否手动执行，通过模块导入就不会执行
    lock = Lock()
    for num in range(10):
        Process(target=f, args=(lock, num)).start()

进程池

作用：允许一次最多多少个进程一起运行
进程池内部维护一个进程序列，当使用时，则去进程池中获取一个进程，如果进程池序列中没有可供使用的进进程，那么程序就会等待，直到进程池中有可用进程为止。

进程池中有两个方法：

apply #同步执行/串行 apply_async # 异步执行/并行

# Author: 73
from multiprocessing import Process, Pool
import time, os
def Foo(i):
    time.sleep(2)
    print("hello ", os.getpid())
    return i + 100
def Bar(arg): # 主进程执行的这个回调函数
    print('-->exec done:', arg, os.getpid())
pool = Pool(5)
print("主进程: ", os.getpid())
for i in range(10):
    pool.apply_async(func=Foo, args=(i,), callback=Bar)
    #pool.apply(func=Foo, args=(i,)) # 同步执行
    #pool.apply_async(func=Foo, args=(i,))  # yi步执行
print('end')
pool.close()
pool.join()  # 进程池中进程执行完毕后再关闭，如果注释，那么程序直接关闭。

两个坑：

必须先close再join 异步执行时，必须join，否则程序直接关闭 3、线程(thread)

线程是一串指令的集合，是操作系统能够进行运算调度的最小单位。

多线程不适合cpu密集操作型任务，适合io操作密集型任务
*注：IO操作(如读取数据、socketserver)不占用CPU，计算占用CPU

语法

直接调用

import threading
import time
def run(num): #定义每个线程要运行的函数
    print("running on number:%s" % num)
    time.sleep(3)
if __name__ == '__main__':
    t1 = threading.Thread(target=run,args=(1,)) #生成一个线程实例
    t2 = threading.Thread(target=run,args=(2,)) #生成另一个线程实例
 	# 并发执行
    t1.start() #启动线程
    t2.start() #启动另一个线程
	#run(1)
	#run(2)
    print(t1.getName()) #获取线程名
    print(t2.getName())

继承式调用

import threading
import time
class MyThread(threading.Thread):
    def __init__(self,num):
        super(MyThread, self).__init__()
        #threading.Thread.__init__(self)
        self.num = num
    def run(self):#定义每个线程要运行的函数
        print("running on number:%s" %self.num)
        time.sleep(3)
if __name__ == '__main__':
    t1 = MyThread(1)
    t2 = MyThread(2)
    t1.start()
    t2.start()

join函数

主线程卡住，等待调用join的子线程执行完毕

import threading
import time
class MyThread(threading.Thread):
    def __init__(self, num, t):
        super(MyThread, self).__init__()
        # threading.Thread.__init__(self)
        self.num = num
        self.t = t
    def run(self):  # 定义每个线程要运行的函数
        print("running on number:%s" % self.num)
        time.sleep(self.t)
        print('task done...')
if __name__ == '__main__':
    start_time = time.time()
    t1 = MyThread(1,2)
    t2 = MyThread(2,3)
    t1.start()
    #t1.join() # wait, 加在这里，就把并行的变成了串行的
    t2.start()
    t1.join()
    t2.join() # 这句不加，打印的时间2秒多一点；加上，打印的时间3秒多一点
    stop_time = time.time()
    print(stop_time-start_time)

daemon(守护线程)

程序的主线程运行结束之后，会等待非守护线程运行结束再结束整个程序，当我们把一些线程设置为守护线程之后，如果主线程运行结束，那么守护线程就会同时结束，不管有没有运行完

import threading
import time
class MyThread(threading.Thread):
    def __init__(self, num, t):
        super(MyThread, self).__init__()
        # threading.Thread.__init__(self)
        self.num = num
        self.t = t
    def run(self):  # 定义每个线程要运行的函数
        print("running on number:%s" % self.num)
        time.sleep(self.t)
        print('task done...')
if __name__ == '__main__':
    start_time = time.time()
    t1 = MyThread(1,2)
    t2 = MyThread(2,3)
    t1.setDaemon(True) # t1设置为Daemon线程,它做为程序主线程的守护线程,当主线程退出时,t1线程也会退出,由t1启动的其它子线程会同时退出,不管是否执行完任务
    t2.setDaemon(True)
    t1.start()
    t2.start()
    stop_time = time.time()
    print(stop_time-start_time)

运行结果显示，打印窗口不再打印“task done…”

线程锁之Lock(互斥锁mutex)/RLock(递归锁)/Semaphore(信号量)

互斥锁
也叫线程锁
一个进程下可以启动多个线程，多个线程共享父进程的内存空间，也就意味着每个线程可以访问同一份数据，此时，如果2个线程同时要修改同一份数据，会出现什么状况？

# Author: 73
import threading, time
def run(n):
    global num
    num -= 1
    time.sleep(2)
num = 100
threading_list = []
for i in range(100):
    t = threading.Thread(target=run, args=(i,))
    t.start()
    threading_list.append(t)
for t in threading_list:
   t.join()
print(num)

正常来讲，这个num结果应该是0，但在python 2.7上多运行几次，会发现，最后打印出来的num结果不总是0，为什么每次运行的结果不一样呢？很简单，假设你有A,B两个线程，此时都要对num 进行减1操作，由于2个线程是并发同时运行的，所以2个线程很有可能同时拿走了num=100这个初始变量交给cpu去运算，当A线程去处完的结果是99，但此时B线程运算完的结果也是99，两个线程同时CPU运算的结果再赋值给num变量后，结果就都是99。那怎么办呢？很简单，每个线程在要修改公共数据时，为了避免自己在还没改完的时候别人也来修改此数据，可以给这个数据加一把锁，这样其它线程想修改此数据时就必须等待你修改完毕并把锁释放掉后才能再访问此数据。

*注：不要在3.x上运行，不知为什么，3.x上的结果总是正确的，可能是自动加了锁

# Author: 73
import threading, time
def addNum(n):
    global num
    mutex.acquire() # 修改前对数据加锁
    num -= 1
    mutex.release() # 修改后释放
    #time.sleep(2)
num = 100
mutex = threading.Lock()
threading_list = []
for i in range(100):
    t = threading.Thread(target=addNum, args=(i,))
    t.start()
    threading_list.append(t)
for t in threading_list:
   t.join()
print(num)

Python已经有一个GIL来保证同一时间只能有一个线程来执行了，为什么这里还需要lock? 注意啦，这里的lock是用户级的lock,跟那个GIL没关系，如下图
原创文章 55获赞 41访问量 1万+ 关注私信展开阅读全文
作者：73、