Python multithreading pitfalls

Global Interpreter Lock (GIL) limitations

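The best-known multithreading pitfall in Python is the GIL. Because of the GIL, threads in the standard CPython interpreter cannot execute Python code truly in parallel, even on a multi-core processor. No matter how many threads there are, only one of them runs Python bytecode at any moment, and the others wait for the GIL to be released. This severely limits what multithreading can do for CPU-bound work.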

import threading
import time

def foo():
    for _ in range(1000):
        print("Hello from foo")

def bar():
    for _ in range(1000):
        print("Hello from bar")

thread1 = threading.Thread(target=foo)
thread2 = threading.Thread(target=bar)

start_time = time.time()
thread1.start()
thread2.start()
thread1.join()
thread2.join()
end_time = time.time()
print(f"Time taken with GIL: {end_time - start_time}")
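The print-based example above spends most of its time on output, so it does not isolate the effect of the GIL. The following is a minimal sketch, assuming a standard GIL-enabled CPython build, that times the same pure-Python loop run twice in sequence and then in two threads. The names `count_down`, `time_sequential`, `time_threaded`, and the workload size `N` are hypothetical and chosen only for illustration.

```python
import threading
import time


def count_down(n):
    """Pure-Python busy loop; every iteration needs the GIL."""
    while n > 0:
        n -= 1


def time_sequential(n, repeats=2):
    """Run the loop `repeats` times, one after another, and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        count_down(n)
    return time.perf_counter() - start


def time_threaded(n, workers=2):
    """Run the loop once in each of `workers` threads and return the elapsed seconds."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


N = 2_000_000  # assumed workload size; adjust for your machine
sequential_seconds = time_sequential(N)
threaded_seconds = time_threaded(N)
print(f"sequential:  {sequential_seconds:.3f}s")
print(f"two threads: {threaded_seconds:.3f}s")
```

On a GIL-enabled interpreter the two timings are typically close to each other, since the threads take turns rather than running side by side. The exact numbers depend on the machine and the Python version.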

Thread safety issues

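When several threads access shared data, they can interfere with each other and leave the data in an inconsistent state. To avoid this, use a lock (Lock) or another synchronization mechanism to keep access thread-safe. Locks can introduce problems of their own, however, such as deadlock. Managing data access is therefore a part of multithreaded design that calls for real care.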

import threading

class Counter:
    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()

    def increment(self):
        with self.lock:
            self.value += 1
            print(self.value)

def worker(counter):
    for _ in range(1000):
        counter.increment()

counter = Counter()
threads = [threading.Thread(target=worker, args=(counter,)) for _ in range(5)]

for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
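The deadlock risk mentioned above usually appears when two threads take the same pair of locks in opposite order. One common way to avoid it is to always acquire locks in a fixed global order. The sketch below illustrates that idea; the `Account` class, the `transfer` function, and ordering by `name` are assumptions made for the example, not part of the original code.

```python
import threading


class Account:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    # Acquire both locks in one global order (here: by account name), so two
    # opposite transfers can never each hold one lock while waiting for the other.
    first, second = sorted((src, dst), key=lambda acc: acc.name)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


def worker(src, dst):
    for _ in range(1000):
        transfer(src, dst, 1)


a = Account("a", 1000)
b = Account("b", 1000)

# The two threads move money in opposite directions at the same time.
threads = [
    threading.Thread(target=worker, args=(a, b)),
    threading.Thread(target=worker, args=(b, a)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(a.balance, b.balance)  # prints "1000 1000"
```

If `transfer` instead locked `src` first and `dst` second, the two workers could each grab one lock and wait forever for the other.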

Thread overhead

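Although a thread is lighter-weight than a process, creating and managing threads is still not cheap. Creating threads without any limit can seriously degrade a Python program's performance. When designing a program, choose a reasonable number of threads, or use a thread pool to manage them.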

from concurrent.futures import ThreadPoolExecutor

def task(name):
    print(f"Task {name} is running")

with ThreadPoolExecutor(max_workers=5) as executor:
    tasks = [executor.submit(task, f'task_{i}') for i in range(10)]

# ThreadPoolExecutor manages its worker threads automatically as it is created and shut down
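When the tasks return values, the pool can also collect the results while keeping the number of threads bounded. A minimal sketch, assuming a hypothetical `fetch` function that stands in for an I/O call by sleeping briefly:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(item):
    # Hypothetical stand-in for an I/O operation such as a network request.
    time.sleep(0.05)
    return item * 2


items = list(range(20))

# At most 5 worker threads exist, no matter how many items are submitted.
with ThreadPoolExecutor(max_workers=5) as executor:
    # executor.map yields results in the same order as the inputs.
    results = list(executor.map(fetch, items))

print(results)
```

Here 20 tasks share 5 threads, so the program never pays the cost of 20 thread creations, and `results` lines up with `items` one to one.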

Using threads for the wrong kind of task

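Multithreading suits I/O-bound tasks, because while one thread waits on an I/O operation the other threads can keep running, which improves throughput. For CPU-bound tasks it works poorly: because of the GIL, only one thread actually runs at a time while the others wait their turn. In that case multithreading may be no faster than a single thread, and can even be slower.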

import threading
import time

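def compute_heavy_task():
    # Stand-in for a complex computation: a pure-Python loop that keeps the CPU busy.
    # (time.sleep would release the GIL and behave like an I/O-bound task instead.)
    total = 0
    for i in range(5_000_000):
        total += i * i
    return "Task completed"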

def execute_tasks():
    threads = []
    for i in range(4):
        thread = threading.Thread(target=compute_heavy_task)
        thread.start()
        threads.append(thread)

    for thread in threads:
        thread.join()

start_time = time.time()
execute_tasks()
end_time = time.time()

print(f"Time taken in a pseudo-CPU-heavy scenario with threading: {end_time - start_time}")

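Faced with these pitfalls, we need not only to recognize them but also to know how to pick the right tool. When multithreading is not enough or does not fit, Python offers multiprocessing and asynchronous programming (asyncio and others) as alternatives.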
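As an illustration of the asynchronous alternative, here is a minimal asyncio sketch in which three simulated I/O waits overlap on a single thread. The coroutine `fake_io` and its delays are assumptions made for the example; real code would await an actual asynchronous I/O call instead of `asyncio.sleep`.

```python
import asyncio
import time


async def fake_io(name, delay):
    # Hypothetical I/O wait; asyncio.sleep yields control to the event loop.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # gather runs the coroutines concurrently and returns results in argument order.
    return await asyncio.gather(
        fake_io("a", 0.2),
        fake_io("b", 0.2),
        fake_io("c", 0.2),
    )


start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start

print(results)
print(f"elapsed: {elapsed:.2f}s")
```

The three waits overlap, so the total time is close to the longest single delay, not the sum of all three. Like threads, this helps only with I/O-bound work; CPU-bound work is better served by multiple processes.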

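Multithreading can indeed improve performance, but it is a double-edged sword. Used well, it takes a load off our shoulders; used badly, it can create more problems than it solves. As Donald Knuth put it, "premature optimization is the root of all evil"; only once we truly understand the strengths and weaknesses of multithreading can we use it sensibly.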

Article link: http://task.lmcjl.com/news/124.html
