Commit cdffe12

Merge branch 'master' into feedburner
2 parents 9d553a7 + b9d026a commit cdffe12

52 files changed: 1180 additions and 160 deletions

_posts/2018-02-15-totw-88.md

Lines changed: 1 addition & 1 deletion
@@ -104,7 +104,7 @@ variable?", both to follow in your own code and to cite in your code reviews:
 ```cpp
 // Bad code

-// Could invoke an intializer list constructor, or a two-argument constructor.
+// Could invoke an initializer list constructor, or a two-argument constructor.
 Frobber frobber{size, &bazzer_to_duplicate};

 // Makes a vector of two doubles.

_posts/2018-09-28-totw-152.md

Lines changed: 1 addition & 1 deletion
@@ -81,7 +81,7 @@ TEST(MyType, SupportsAbslHash) {
 }
 </pre>

-`absl::VerifyTypeImplementsAbslHashCorrectly` also supports testing heterogenous
+`absl::VerifyTypeImplementsAbslHashCorrectly` also supports testing heterogeneous
 lookup and custom equality operators.

 Intrigued and want to know more? Read up at

_posts/2023-02-15-totw-198.md

Lines changed: 1 addition & 1 deletion
@@ -132,7 +132,7 @@ readable code. Take the following example:
 // with std::in_place.
 std::optional&lt;Foo&gt; with_tag(std::in_place, 5, 10);

-// Here the intent is clearer: make an optional Foo by providing these argments.
+// Here the intent is clearer: make an optional Foo by providing these arguments.
 std::optional&lt;Foo&gt; with_factory = std::make_optional&lt;Foo&gt;(5, 10);
 </pre>
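
For readers of this hunk, the two spellings it compares can be tried in isolation. The following is a minimal sketch, not code from the post: `Foo` is a hypothetical two-argument type (the post's definition lies outside the displayed context), and `MakeWithTag`, `MakeWithFactory`, and `CheckEquivalent` are names invented for illustration.

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Hypothetical stand-in for the post's Foo; only the two-argument
// constructor matters for this comparison.
struct Foo {
  Foo(int a, int b) : sum(a + b) {}
  int sum;
};

// Constructs the Foo in place through the std::in_place tag overload.
inline std::optional<Foo> MakeWithTag(int a, int b) {
  return std::optional<Foo>(std::in_place, a, b);
}

// Produces the same value, with the intent named by the factory function.
inline std::optional<Foo> MakeWithFactory(int a, int b) {
  return std::make_optional<Foo>(a, b);
}

// Quick check that both spellings yield an engaged optional with equal contents.
inline void CheckEquivalent() {
  assert(MakeWithTag(5, 10).has_value());
  assert(MakeWithTag(5, 10)->sum == MakeWithFactory(5, 10)->sum);
}
```

Both helpers forward their arguments to `Foo`'s constructor; the difference is only in how clearly the call site states that a `Foo` is being built.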

_posts/2023-03-02-fast-21.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ Originally posted as Fast TotW #21 on January 16, 2020

 *By [Paul Wankadia](mailto:[email protected]) and [Darryl Gove](mailto:[email protected])*

-Updated 2023-03-02
+Updated 2024-10-21

 Quicklink: [abseil.io/fast/21](https://abseil.io/fast/21)

_posts/2023-03-02-fast-39.md

Lines changed: 12 additions & 4 deletions
@@ -12,7 +12,7 @@ Originally posted as Fast TotW #39 on January 22, 2021

 *By [Chris Kennelly](mailto:[email protected]) and [Alkis Evlogimenos](mailto:[email protected])*

-Updated 2023-10-10
+Updated 2025-03-24

 Quicklink: [abseil.io/fast/39](https://abseil.io/fast/39)

@@ -146,14 +146,14 @@ would "reduce" the
 [data center tax](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44271.pdf),
 but we would actually hurt [application productivity](/fast/7)-per-CPU. Time we
 spend in malloc is
-[less important than application performance](https://research.google/pubs/pub50370.pdf).
+[less important than application performance](https://storage.googleapis.com/gweb-research2023-media/pubtools/6170.pdf).

 Trace-driven simulations with hardware-validated architectural simulators showed
 the prefetched data was frequently used. Additionally, it is better to stall on
 a TLB miss at the prefetch site--which has no dependencies, than to stall at the
 point of use.

-## Pitfalls
+## Pitfalls {#pitfalls}

 There are a number of things that commonly go wrong when writing benchmarks. The
 following is a non-exhaustive list:
@@ -175,15 +175,23 @@ following is a non-exhaustive list:
 [Stabilizer (by Berger, et. al.)](https://people.cs.umass.edu/~emery/pubs/stabilizer-asplos13.pdf)
 deliberately perturb these parameters to improve benchmarking statistical
 quality.
+* Sensitivity to stack alignment. Changes anywhere in the stack--added/removed
+variables, better (or worse) spilling due to compiler optimizations,
+etc.--can affect the alignment at the start of the function-under-test. This
+has been seen to produce 20% performance swings.
 * Representative data. The data in the benchmark needs to be "similar" to the
 data in production - for example, imagine having short strings in the
 benchmark, and long strings in the fleet. This also extends to the code
 paths in the benchmarks being similar to the code paths that the application
-exercises.
+exercises. This is a common pain point for macrobenchmarks too. A loadtest
+may cover certain request types, rather than all of those seen by production
+servers.
+
 * Benchmarking the right code. It's very easy to introduce code into the
 benchmark that's not present in the real workload. For example, using a
 random number generator's cost for a benchmark could exceed the cost of the
 work being benchmarked.
+
 * Being aware of steady state vs dynamic behaviour. For more complex
 benchmarks it's easy to produce something that converges to a steady state -
 for example if it has a constant arrival rate and service time. Production
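
The "Benchmarking the right code" pitfall in the hunk above can be illustrated with a small sketch. This is not code from the post: `WorkUnderTest`, `MakeInputs`, and `TimeWorkOnly` are hypothetical names, and the hand-rolled `std::chrono` timing merely stands in for whatever benchmark harness is actually in use. The idea shown is to generate random inputs before the timed region so that the generator's cost is not attributed to the code being measured.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for the code being benchmarked.
inline std::uint64_t WorkUnderTest(std::uint64_t x) {
  return x * 2654435761ULL + 1;
}

// Generates inputs up front so that RNG cost stays outside the timed region.
// A fixed seed keeps runs reproducible.
inline std::vector<std::uint64_t> MakeInputs(std::size_t n, std::uint64_t seed) {
  std::mt19937_64 rng(seed);
  std::vector<std::uint64_t> inputs(n);
  for (auto& v : inputs) v = rng();
  return inputs;
}

// Times only the calls to WorkUnderTest. The accumulated result is written to
// *sink so that the loop is not trivially dead code.
inline std::chrono::nanoseconds TimeWorkOnly(
    const std::vector<std::uint64_t>& inputs, std::uint64_t* sink) {
  assert(sink != nullptr);
  const auto start = std::chrono::steady_clock::now();
  std::uint64_t acc = 0;
  for (std::uint64_t x : inputs) acc += WorkUnderTest(x);
  const auto stop = std::chrono::steady_clock::now();
  *sink = acc;
  return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
}
```

A caller would build the inputs once with `MakeInputs(n, seed)` and pass them to `TimeWorkOnly`; calling the generator inside the timed loop instead would fold its cost into the measurement.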

_posts/2023-03-02-fast-53.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ Originally posted as Fast TotW #53 on October 14, 2021

 *By [Mircea Trofin](mailto:[email protected])*

-Updated 2023-09-04
+Updated 2024-11-19

 Quicklink: [abseil.io/fast/53](https://abseil.io/fast/53)

_posts/2023-03-02-fast-9.md

Lines changed: 2 additions & 2 deletions
@@ -12,7 +12,7 @@ Originally posted as Fast TotW #9 on June 24, 2019

 *By [Chris Kennelly](mailto:[email protected])*

-Updated 2023-10-10
+Updated 2025-03-27

 Quicklink: [abseil.io/fast/9](https://abseil.io/fast/9)

@@ -64,7 +64,7 @@ Prior to cleanups, the implementations weren't the same.
 working around a
 [false dependency bug](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62011)
 in some processors.
-* When the compiler builtin is used (the "slow" version), we actually end up
+* When the compiler built-in is used (the "slow" version), we actually end up
 with a better sequence of machine code and can perform stronger
 optimizations at compile-time around constant folding.

_posts/2023-09-14-fast-7.md

Lines changed: 2 additions & 2 deletions
@@ -12,7 +12,7 @@ Originally posted as Fast TotW #7 on June 6, 2019

 *By [Chris Kennelly](mailto:[email protected])*

-Updated 2023-10-31
+Updated 2025-03-25

 Quicklink: [abseil.io/fast/7](https://abseil.io/fast/7)

@@ -75,7 +75,7 @@ optimizations* that are not well-represented by existing metrics:
 produce outsized benefits for the entire application.

 * **New instruction sets**: With successive hardware generations, vendors have
-added new instrutions to their ISAs.
+added new instructions to their ISAs.

 In future hardware generations, we expect to replace calls to memcpy with
 microcode-optimized `rep movsb` instructions that are faster than any

_posts/2023-09-30-fast-52.md

Lines changed: 10 additions & 9 deletions
@@ -12,7 +12,7 @@ Originally posted as Fast TotW #52 on September 30, 2021

 *By [Chris Kennelly](mailto:[email protected])*

-Updated 2023-09-30
+Updated 2025-03-24

 Quicklink: [abseil.io/fast/52](https://abseil.io/fast/52)

@@ -130,13 +130,14 @@ test, and successfully land new features in production. Beyond just optimizing
 Extra complexity that delays an improvement to product experiences is a
 non-obvious externality.

-For example, TCMalloc has a number of tuning options and customization points,
-but ultimately, several optimizations came from sanding away extra configuration
-complexity. The rarely used malloc hooks API required careful structuring of
-TCMalloc's fast path to allow users who didn't use hooks--most users--to not pay
-for their possible presence. In another case, removing the `sbrk` allocator
-allowed TCMalloc to structure its virtual address space carefully, enabling
-several enhancements.
+For example, TCMalloc has a number of
+[tuning options](https://github.com/google/tcmalloc/blob/master/docs/tuning.md)
+and customization points, but ultimately, several optimizations came from
+sanding away extra configuration complexity. The rarely used malloc hooks API
+required careful structuring of TCMalloc's fast path to allow users who didn't
+use hooks--most users--to not pay for their possible presence. In another case,
+removing the `sbrk` allocator allowed TCMalloc to structure its virtual address
+space carefully, enabling several enhancements.

 ## Beyond knobs

@@ -147,7 +148,7 @@ An existing library, *X*, might be inadequate or insufficiently expressive,
 which can motivate building a "better" alternative, *Y*, along some dimensions.
 Realizing the benefit of using *Y* is dependent on users both discovering *Y*
 and picking between *X* and *Y* *correctly*--and in the case of a long-lived
-code base, keeping that choice optimal over time.
+codebase, keeping that choice optimal over time.

 For some uses, this strategy is infeasible. `my::super_fast_string` will
 probably never replace `std::string` because the latter is so entrenched and the

_posts/2023-10-10-fast-64.md

Lines changed: 13 additions & 8 deletions
@@ -12,7 +12,7 @@ Originally posted as Fast TotW #64 on October 21, 2022

 *By [Chris Kennelly](mailto:[email protected])*

-Updated 2023-10-10
+Updated 2025-03-24

 Quicklink: [abseil.io/fast/64](https://abseil.io/fast/64)

@@ -42,10 +42,10 @@ preconditions to unlock the best possible performance. We need to document and
 test these sharp edges. Future debugging has an opportunity cost: When we spend
 time tracking down and fixing bugs, we are not developing new optimizations. We
 can use assertions for preconditions, especially in debug/sanitizer builds, to
-double-check contracts and *enforce* them. Testing robots never sleep, while
-humans are fallible. Randomized implementation behaviors provide a useful
-bulwark against Hyrum's Law from creeping in to implicitly expand the contract
-of an interface.
+double-check contracts and *enforce* them. Testing
+[robots never sleep](/fast/93), while humans are fallible. Randomized
+implementation behaviors provide a useful bulwark against Hyrum's Law from
+creeping in to implicitly expand the contract of an interface.

 ## Express intents

@@ -124,11 +124,11 @@ memory.

 ## Avoid unnecessarily strong guarantees

-There are situations where the benefits of duplicate APIs outweight the costs.
+There are situations where the benefits of duplicate APIs outweigh the costs.

 The Abseil hash containers
 ([SwissMap](https://abseil.io/about/design/swisstables)) added new hashtable
-implementations to the code base, which at first glance, appear redundant with
+implementations to the codebase, which at first glance, appear redundant with
 the ones in the C++ standard library. This apparent duplication allowed us to
 have a more efficient set of containers which match the standard library API,
 but adhere to a weaker set of constraints.
@@ -140,6 +140,11 @@ to a node-based implementation that requires data indirections and constrains
 performance. Given `std::unordered_map`'s widespread usage, it was not feasible
 to relax these guarantees all at once.

+Node-based containers necessitate implementation overheads, but they come with a
+direct benefit: They actively facilitate migration while allowing weaker
+containers to be available. Making a guarantee stronger without an accompanying
+benefit is undesirable.
+
 The migration was a replacement path for the legacy containers, not an
 alternative. The superior performance characteristics meant that users could
 "just use SwissMap" without tedious benchmarking on a case-by-case basis.
@@ -218,7 +223,7 @@ constrain future implementations by creating sharp performance edges.
 higher-level intent--transferring pointer ownership, "lending" a submessage
 to another one, etc.

-### Concluding remarks
+## Concluding remarks

 Good performance should be available by default, not an optional feature. While
 [feature flags and knobs can be useful for testing and initial rollout](/fast/52),
