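Modern C++ Performance Optimization in Practice (Part 2)
Template Metaprogramming, Memory Management, and Concurrency Optimization in Detail
Introduction
This is the second part of the Modern C++ performance optimization series. It covers more advanced techniques: template metaprogramming, memory management, concurrency, I/O, compiler options, and a string-processing case study.
1. Template Metaprogramming
Type Deduction and SFINAE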
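```cpp
#include <type_traits>
#include <utility>
#include <vector>

// Primary template: by default, T is assumed to have no size() member.
template<typename T, typename = void>
struct has_size : std::false_type {};

// Partial specialization: chosen only when std::declval<T>().size() is well-formed.
// Otherwise substitution fails silently (SFINAE) and the primary template is used.
template<typename T>
struct has_size<T, std::void_t<decltype(std::declval<T>().size())>> : std::true_type {};

static_assert(has_size<std::vector<int>>::value, "vector has size");
static_assert(has_size<int>::value == false, "int has no size");
```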
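One way such a detection trait might be put to work is to pick a code path with `if constexpr` (C++17). The sketch below is illustrative only: the helper name `elementCount` is hypothetical and not part of the original series, and the trait is repeated so the block compiles on its own.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Detection trait, repeated here so this block is self-contained.
template<typename T, typename = void>
struct has_size : std::false_type {};

template<typename T>
struct has_size<T, std::void_t<decltype(std::declval<T>().size())>> : std::true_type {};

// Hypothetical helper: element count for containers, 1 for everything else.
template<typename T>
std::size_t elementCount(const T& value) {
    if constexpr (has_size<T>::value) {
        return value.size();   // this branch is only instantiated when T has size()
    } else {
        (void)value;           // scalar fallback: treat the value as a single element
        return 1;
    }
}
```

With this in place, `elementCount(std::vector<int>{1, 2, 3})` takes the container branch and `elementCount(42)` takes the fallback, with the choice made entirely at compile time.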
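Tag Dispatch
```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Random-access iterators: std::sort applies directly. It returns void, so this does too.
template<typename RandomIt>
void sortImpl(RandomIt first, RandomIt last, std::random_access_iterator_tag) {
    std::sort(first, last);
}

// Bidirectional iterators (std::list, for example): std::sort and std::stable_sort
// both require random access, so sort a temporary buffer and move the result back.
template<typename BidirectionalIt>
void sortImpl(BidirectionalIt first, BidirectionalIt last, std::bidirectional_iterator_tag) {
    using Value = typename std::iterator_traits<BidirectionalIt>::value_type;
    std::vector<Value> buffer(std::make_move_iterator(first), std::make_move_iterator(last));
    std::stable_sort(buffer.begin(), buffer.end());
    std::move(buffer.begin(), buffer.end(), first);
}

// Entry point: the iterator category tag selects the overload at compile time.
// Named sortRange rather than sort, because an unqualified call to sort with
// standard iterators would be ambiguous with std::sort through ADL.
template<typename Iterator>
void sortRange(Iterator first, Iterator last) {
    sortImpl(first, last, typename std::iterator_traits<Iterator>::iterator_category{});
}
```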
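The same dispatch pattern can be seen in a smaller setting. The sketch below, with the hypothetical names `advanceImpl` and `advanceBy`, moves an iterator forward by `n` steps: random-access iterators jump in constant time, while every other category falls back to a loop. It is a simplified illustration of the idea behind `std::advance`, not a replacement for it.

```cpp
#include <iterator>
#include <list>
#include <vector>

// Random-access iterators can jump in O(1).
template<typename It>
void advanceImpl(It& it, int n, std::random_access_iterator_tag) {
    it += n;
}

// Any other category converts to input_iterator_tag (the tags form an
// inheritance chain), so it lands here and steps one element at a time.
template<typename It>
void advanceImpl(It& it, int n, std::input_iterator_tag) {
    while (n-- > 0) {
        ++it;
    }
}

// Entry point: builds the tag object and lets overload resolution choose.
template<typename It>
It advanceBy(It it, int n) {
    advanceImpl(it, n, typename std::iterator_traits<It>::iterator_category{});
    return it;
}
```

For a `std::vector<int>::iterator` the exact-match overload wins. For a `std::list<int>::iterator` the bidirectional tag is converted to its base class and the loop version is selected.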
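2. Memory Management Optimization
Memory Pool
```cpp
#include <cstddef>
#include <vector>

template<typename T, std::size_t BlockSize = 4096>
class PoolAllocator {
    // A slot holds a T while in use, or a link to the next free slot while free.
    union Slot {
        Slot* next;
        alignas(T) unsigned char storage[sizeof(T)];
    };
    static_assert(BlockSize >= sizeof(Slot), "BlockSize is too small for T");
    static constexpr std::size_t kSlotsPerBlock = BlockSize / sizeof(Slot);

    Slot* freeList_ = nullptr;
    std::vector<Slot*> blocks_;   // owned blocks, released in the destructor

    // Allocate one block and thread all of its slots onto the free list.
    void grow() {
        Slot* block = new Slot[kSlotsPerBlock];
        blocks_.push_back(block);
        for (std::size_t i = 0; i < kSlotsPerBlock; ++i) {
            block[i].next = freeList_;
            freeList_ = &block[i];
        }
    }

public:
    PoolAllocator() = default;
    PoolAllocator(const PoolAllocator&) = delete;
    PoolAllocator& operator=(const PoolAllocator&) = delete;
    ~PoolAllocator() { for (Slot* block : blocks_) delete[] block; }

    // Returns uninitialized storage for one T; construct the object with placement new.
    T* allocate() {
        if (!freeList_) grow();   // the original allocated a block but never linked it to the list
        Slot* slot = freeList_;
        freeList_ = slot->next;
        return reinterpret_cast<T*>(slot->storage);
    }

    // Takes back storage obtained from allocate(); the T must already be destroyed.
    void deallocate(T* ptr) {
        Slot* slot = reinterpret_cast<Slot*>(ptr);
        slot->next = freeList_;
        freeList_ = slot;
    }
};
```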
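Before committing to a hand-written pool, it may be worth measuring the standard library's polymorphic memory resources (C++17, header `<memory_resource>`). The sketch below is a swapped-in alternative, not the pool from this article: it feeds a `std::pmr::list` from a `std::pmr::unsynchronized_pool_resource`, so all node allocations come from pooled blocks. The function name `sumWithPool` is hypothetical.

```cpp
#include <list>
#include <memory_resource>

// Builds a list of 0..n-1 whose nodes are allocated from a pool, then sums it.
long sumWithPool(int n) {
    // Single-threaded pool resource; all of its memory is released when it is destroyed.
    std::pmr::unsynchronized_pool_resource pool;

    // The list is declared after the pool, so it is destroyed first.
    std::pmr::list<int> values(&pool);
    for (int i = 0; i < n; ++i) {
        values.push_back(i);   // each node comes from the pool, not from a separate global new
    }

    long total = 0;
    for (int v : values) {
        total += v;
    }
    return total;
}
```

Node-based containers with many small, same-sized allocations are where a pool tends to help most. Whether it beats the default allocator for a given workload is something to confirm with a profiler.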
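Cache Alignment
```cpp
// Standard C++11: align the whole struct to a 64-byte cache line.
struct alignas(64) CacheLineData {
    int counter;
};

// GCC/Clang extension with the same effect (not portable to MSVC).
struct Data {
    int value;
} __attribute__((aligned(64)));
```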
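The effect of `alignas(64)` can be checked rather than assumed. The sketch below verifies at compile time that the struct is both aligned and padded to 64 bytes, and at run time that each array element starts on its own 64-byte boundary. The 64-byte line size is an assumption that holds on most current x86-64 CPUs but not on every platform, and the function name `elementsOnSeparateLines` is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

struct alignas(64) CacheLineData {
    int counter;
};

// alignas raises the alignment, and sizeof is rounded up to a multiple of it.
static_assert(alignof(CacheLineData) == 64, "struct is aligned to 64 bytes");
static_assert(sizeof(CacheLineData) == 64, "struct is padded to 64 bytes");

// Returns true when every element of a local array starts on a 64-byte boundary.
inline bool elementsOnSeparateLines() {
    CacheLineData items[4] = {};
    for (const auto& item : items) {
        if (reinterpret_cast<std::uintptr_t>(&item) % 64 != 0) {
            return false;
        }
    }
    return true;
}
```

Because the size is padded to 64 bytes, adjacent elements of an array of `CacheLineData` do not share a 64-byte line.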
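3. Concurrency Optimization
Lock-Free Programming
```cpp
#include <atomic>
#include <utility>

std::atomic<int> counter(0);

void basicOperations() {
    counter.fetch_add(1);       // atomic increment
    counter.store(42);          // atomic write
    int old = counter.load();   // atomic read
    (void)old;
}

// Compare-and-swap: stores desired and returns true only if atomic == expected.
bool compareExchange(std::atomic<int>& atomic, int expected, int desired) {
    return atomic.compare_exchange_strong(expected, desired);
}

template<typename T>
class LockFreeStack {
    struct Node { T data; Node* next; };
    std::atomic<Node*> head_{nullptr};   // before C++20 a default-constructed atomic is uninitialized
public:
    void push(T data) {
        Node* newNode = new Node{std::move(data), head_.load()};
        // On failure, compare_exchange_weak reloads the current head into
        // newNode->next, so the loop simply retries with the fresh value.
        while (!head_.compare_exchange_weak(newNode->next, newNode)) { }
    }
    // Not thread-safe: call only when no other thread is using the stack.
    ~LockFreeStack() {
        Node* node = head_.load();
        while (node) { Node* next = node->next; delete node; node = next; }
    }
};
```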
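The retry loop in `push` is an instance of a general compare-and-swap pattern: read the current value, compute the new one, and retry if another thread got there first. As a hedged illustration, the sketch below applies the same pattern to an "atomic maximum". The names `atomicMax` and `maxAcrossThreads` are hypothetical and not part of the original series.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Atomically raise `target` to at least `value`.
void atomicMax(std::atomic<int>& target, int value) {
    int current = target.load();
    // When the exchange fails, `current` is refreshed with the latest value,
    // so the loop re-checks whether an update is still needed.
    while (current < value && !target.compare_exchange_weak(current, value)) { }
}

// Each worker proposes id * 10; the result is the largest proposal.
int maxAcrossThreads(int threadCount) {
    std::atomic<int> best{0};
    std::vector<std::thread> workers;
    for (int id = 1; id <= threadCount; ++id) {
        workers.emplace_back([&best, id] { atomicMax(best, id * 10); });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return best.load();
}
```

Whatever order the threads run in, the final value is the largest one proposed, because an update is only attempted while the stored value is smaller.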
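Avoiding False Sharing
```cpp
#include <atomic>

// Bad: all four counters fit in one cache line, so threads updating
// different counters still invalidate each other's cached copy.
struct BadStruct {
    std::atomic<int> a;
    std::atomic<int> b;
    std::atomic<int> c;
    std::atomic<int> d;
};

// Good: each counter sits on its own 64-byte cache line.
struct GoodStruct {
    alignas(64) std::atomic<int> a;
    alignas(64) std::atomic<int> b;
    alignas(64) std::atomic<int> c;
    alignas(64) std::atomic<int> d;
};
```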
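Whether padding pays off on a particular machine is best settled by measurement. The sketch below is a rough timing harness, not a rigorous benchmark: two threads each increment their own counter, once in a packed layout and once in a padded one. The names `Packed`, `Padded`, and `timeIncrements` are hypothetical, and the 64-byte line size is an assumption.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

struct Packed {
    std::atomic<int> a{0};
    std::atomic<int> b{0};   // very likely on the same cache line as a
};

struct Padded {
    alignas(64) std::atomic<int> a{0};
    alignas(64) std::atomic<int> b{0};   // starts 64 bytes after a
};

// Two threads each increment their own counter; returns elapsed milliseconds.
template<typename Counters>
double timeIncrements(Counters& counters, int iterations) {
    auto start = std::chrono::steady_clock::now();
    std::thread first([&] {
        for (int i = 0; i < iterations; ++i) counters.a.fetch_add(1, std::memory_order_relaxed);
    });
    std::thread second([&] {
        for (int i = 0; i < iterations; ++i) counters.b.fetch_add(1, std::memory_order_relaxed);
    });
    first.join();
    second.join();
    std::chrono::duration<double, std::milli> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Calling `timeIncrements` on a `Packed` and a `Padded` instance with a large iteration count (tens of millions, say) and comparing the two results gives a first impression. The size of the gap depends on the CPU, the core placement of the threads, and the compiler flags.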
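4. I/O Optimization
Buffered I/O
```cpp
#include <cstddef>
#include <fstream>
#include <vector>
#include <sys/mman.h>   // POSIX only

// Baseline: default stream buffer. "\n" avoids the per-line flush of std::endl.
void writeDefault() {
    std::ofstream file("data.txt");
    for (int i = 0; i < 10000; ++i) {
        file << i << "\n";
    }
}

// Larger user-supplied buffer. Call pubsetbuf before opening the file:
// some implementations (libstdc++, for one) ignore it once the file is open.
void writeBuffered() {
    std::vector<char> buffer(100000);   // declared first so it outlives the stream
    std::ofstream file;
    file.rdbuf()->pubsetbuf(buffer.data(), static_cast<std::streamsize>(buffer.size()));
    file.open("data.txt", std::ios::out | std::ios::binary);
    for (int i = 0; i < 10000; ++i) {
        file << i << "\n";
    }
    file.close();
}

// Memory-mapped file: fd is an open descriptor, size the length to map in bytes.
// MAP_PRIVATE keeps writes in memory; use MAP_SHARED to write through to the file.
void* mapFile(int fd, std::size_t size) {
    void* mapped = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    return mapped == MAP_FAILED ? nullptr : mapped;
}
```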
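The `mmap` call above leaves out the surrounding steps: opening the file, finding its size, checking for failure, and unmapping. A minimal POSIX-only sketch of the full sequence follows, using a read-only mapping to count newline bytes. The function name `countLinesMapped` is hypothetical, and error handling is reduced to returning -1.

```cpp
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Counts '\n' bytes in a file through a read-only mapping; returns -1 on failure.
long countLinesMapped(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat info;
    if (fstat(fd, &info) != 0) { close(fd); return -1; }
    if (info.st_size == 0) { close(fd); return 0; }   // mmap rejects zero-length mappings

    std::size_t size = static_cast<std::size_t>(info.st_size);
    void* mapped = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                                        // the mapping stays valid after close
    if (mapped == MAP_FAILED) return -1;

    const char* bytes = static_cast<const char*>(mapped);
    long lines = 0;
    for (std::size_t i = 0; i < size; ++i) {
        if (bytes[i] == '\n') ++lines;
    }

    munmap(mapped, size);
    return lines;
}
```

The file contents are read directly from the page cache without an intermediate copy into a user buffer, which is where mapped I/O can help for large files that are scanned once.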
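5. Compiler Optimization Options
GCC/Clang
```bash
# -O2: standard optimization level, a reasonable default for release builds
g++ -O2 file.cpp

# -O3: more aggressive optimization (extra inlining and vectorization); may grow code size
g++ -O3 file.cpp

# Link-time optimization (LTO): optimize across translation units
g++ -O3 -flto file.cpp

# Profile-guided optimization (PGO): instrument, run a representative workload, rebuild
g++ -O3 -fprofile-generate main.cpp
./a.out
g++ -O3 -fprofile-use main.cpp
```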
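6. Performance Tuning Case Study
Case Study: Optimizing String Processing
```cpp
#include <sstream>
#include <string>
#include <string_view>
#include <vector>

// Before: every token is copied into a new std::string, and
// std::stringstream adds its own allocation and locale overhead.
std::vector<std::string> split(const std::string& s, char delim) {
    std::vector<std::string> result;
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        result.push_back(item);
    }
    return result;
}

// After: std::string_view (C++17) tokens point into the input, so no characters are copied.
// The views are valid only while the underlying string is alive. Unlike the getline
// version, this one also returns an empty token for empty input or a trailing delimiter.
std::vector<std::string_view> split(std::string_view s, char delim) {
    std::vector<std::string_view> result;
    size_t start = 0;
    while (true) {
        auto pos = s.find(delim, start);
        if (pos == std::string_view::npos) {
            result.push_back(s.substr(start));
            break;
        }
        result.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
    return result;
}
```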
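The main cost of the `std::string_view` approach is lifetime management: the returned views point into the caller's buffer and dangle if that buffer goes away. The sketch below shows one way to handle both cases. The names `splitView` and `splitOwned` are hypothetical; `splitView` repeats the view-based logic so the block stands alone, and `splitOwned` copies the tokens when they need to outlive the input.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// View-based split: tokens point into `s` and are valid only while its buffer lives.
std::vector<std::string_view> splitView(std::string_view s, char delim) {
    std::vector<std::string_view> result;
    std::size_t start = 0;
    while (true) {
        auto pos = s.find(delim, start);
        if (pos == std::string_view::npos) {
            result.push_back(s.substr(start));   // last token, possibly empty
            break;
        }
        result.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
    return result;
}

// Owning split: copies each token, for results that must outlive the input.
std::vector<std::string> splitOwned(std::string_view s, char delim) {
    std::vector<std::string_view> views = splitView(s, delim);
    return std::vector<std::string>(views.begin(), views.end());
}
```

A reasonable rule of thumb is to use the view-based version inside a function that owns or borrows the input for its whole duration, and to convert to owned strings at the boundary where results are stored or returned to code that may outlive the input. In particular, passing a temporary `std::string` to `splitView` and keeping the result leaves every view dangling.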
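Summary
| Technique | Performance gain | Complexity |
| --- | --- | --- |
| Template metaprogramming | Medium | High |
| Memory pool | 2-10x | Medium |
| Lock-free programming | High | High |
| Buffered I/O | 10-100x | Low |
| LTO | 10-20% | Low |
| PGO | 10-30% | Medium |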
This document is an extension of drlongnecker's Modern C++ Performance Optimization series.