std::memset, std::fill, hand-written for loop
Hicham Mouline
2010-05-31 15:10:27 UTC
Hello,

Are there any articles comparing the cases/reasons when/why to use memset vs
fill, vs writing a loop by hand?

Regards,
--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
Razvan Cojocaru
2010-06-01 17:45:52 UTC
Post by Hicham Mouline
Are there any articles comparing the cases/reasons when/why to use memset vs
fill, vs writing a loop by hand?
The STL chapters in Bjarne Stroustrup's "The C++ Programming Language"
book will tell you why.

In short:

1. memset() is _very_ low-level, and you should prefer fill() wherever
it makes sense to think of the data being filled as a C++ container. I'm
assuming you want to fill a C-style array with some value, because
obviously you wouldn't be able to memset() a vector<T> or a list<T>.
fill() also makes your code a bit more portable. That is, if you later decide
to use a different container, you can simply change the type and the
rest of the code will continue to work.
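
A minimal sketch of that portability point, assuming a C++11 compiler for std::begin/std::end. The names `reset_all` and `demo_reset_all` are made up for illustration and do not come from the thread:

```cpp
#include <algorithm>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical helper: fill any range that std::begin/std::end understand.
template <typename Range, typename T>
void reset_all(Range& r, const T& value)
{
    std::fill(std::begin(r), std::end(r), value);
}

// Usage: the same call works for a C-style array, a vector and a list.
inline bool demo_reset_all()
{
    int raw[4] = {1, 2, 3, 4};
    std::vector<int> vec(4, 0);
    std::list<int> lst(4, 0);
    reset_all(raw, 7);
    reset_all(vec, 7);
    reset_all(lst, 7);
    return raw[3] == 7 && vec[3] == 7 && lst.back() == 7;
}
```

Swapping the container type changes only the declaration; the fill call itself stays the same, which is not true of a memset call.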

2. using the STL algorithms instead of writing loops by hand ensures at
least two good things: A. the code looks nicer and is easier to
understand (because you're naming the function of the loop instead of
having to read the implementation and figure out what it does), and B.
it can be _faster_. Some algorithms can take advantage of the exact
container type they're working on, and have better complexity than your
"manual" loops would have had.


Hope this helps,
--
Razvan Cojocaru
KeyID: 0x04CA34DE


Lailoken
2010-06-01 17:47:28 UTC
Post by Hicham Mouline
Hello,
Are there any articles comparing the cases/reasons when/why to use memset vs
fill, vs writing a loop by hand?
memset is usually the fastest way (make sure to multiply the element
count by sizeof(type))
fill is the most portable way to initialize arrays of any type (and
would work well with templates, etc).
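
To spell out the sizeof remark, here is a small sketch (`zero_ints` is a hypothetical name). memset's third argument counts bytes, not elements, so the element count has to be scaled:

```cpp
#include <cstddef>
#include <cstring>

// Zero `count` ints. The length passed to memset is in bytes,
// so the element count is multiplied by the element size.
void zero_ints(int* data, std::size_t count)
{
    std::memset(data, 0, count * sizeof(*data));
}
```

Forgetting the multiplication clears only the first `count` bytes, i.e. roughly a quarter of the array when int is four bytes wide.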

writing it by hand should be avoided... you cannot do it faster than
memset in most cases (anecdotal)

(It is fun however to learn how to use Duff's device, and then later
learn not to use it.)
--
g***@hotmail.com
2010-06-01 17:52:51 UTC
Post by Hicham Mouline
Hello,
Are there any articles comparing the cases/reasons when/why to use memset vs
fill, vs writing a loop by hand?
One reason could be performance. On VStudio2003 memset is on average
50% faster than std::fill. Of course, memset works only with POD and
built-in types.
--
h***@gmail.com
2010-06-01 17:50:33 UTC
Post by Hicham Mouline
Hello,
Are there any articles comparing the cases/reasons when/why to use memset vs
fill, vs writing a loop by hand?
If you want an article, there's always google:
http://stackoverflow.com/questions/1373369/what-is-faster-prefered-memset-or-for-loop-to-zero-out-an-array-of-doubles

It was the second link google gave, the first being your post! Note
the comments on the first answer, which state that std::fill and
std::copy are sometimes specialized for POD containers. Theoretically,
I suppose that should mean there is no difference between std::fill
and memset. But...

#include <algorithm>
#include <cstring>
using namespace std;

// Note:
// usingbe = using [begin,end)
// usingbs = using [begin,begin+size)

void usingbe_memset( int* begin, int* end, int value )
{
    // memset takes a byte count and stores (unsigned char)value in each byte
    memset( begin, value, (end-begin)*sizeof(int) );
}
void usingbs_memset( int* begin, size_t size, int value )
{
    // size counts ints, so it is scaled to bytes as above
    memset( begin, value, size*sizeof(int) );
}

void usingbe_fill( int* begin, int* end, int value )
{
    std::fill( begin, end, value );
}
void usingbs_fill( int* begin, size_t size, int value )
{
    std::fill_n( begin, size, value );
}

void usingbe_loop( int* begin, int* end, int value )
{
    while( begin != end )
        *(begin++) = value;
}
void usingbs_loop( int* begin, size_t size, int value )
{
    while( size-- )
        *(begin++) = value;
}

Compiled with g++ version 4.4.1, command line options "-c -O3 -S".
("-c" stops before linking, so no main() is needed.) For brevity, I've
posted the assembly here: http://pastebin.com/PVUUGGqH

For those who can't read assembly: the code for usingb[se]_memset
neither actually calls memset nor loops, but instead uses one
instruction, "rep stosb" (which I highlighted), that does the job.
Every other function, however, uses pretty much the exact same loop.
MSVC produced similar results.

So, theoretically, there is no difference and your compiler will
optimize everything; in reality, that is not always true. Since this
seems to go against the general advice, i wonder if either my test is
poorly designed or my compilers just don't do this optimization.
Perhaps there's another compiler option i need enabled.

Other than that, std::fill is more generic than memset since it works
with any type. std::fill is more clear than a for loop because when
you write
for(int i=0; i < LEN; i++ )
a[i] = X;
the meaning of the loop is implied, but not as explicit as
std::fill( a, a+LEN, X );
--
Thomas Richter
2010-06-01 17:52:21 UTC
Post by Hicham Mouline
Hello,
Are there any articles comparing the cases/reasons when/why to use memset vs
fill, vs writing a loop by hand?
std::memset and std::fill are two very different beasts. std::memset
only fills memory with byte patterns, regardless of what this pattern
means, and ignoring any type of assignment operator the class defined.
If you're lucky, this might do the right thing; often, you're not that
lucky.

For example, std::memset *typically* works fine for setting int or
character arrays to zero, *maybe* double or float values if your machine
is IEEE based, but that is non-portable. If you use it to initialize an
array of PODs, you might get away with it. If the array is of non-PODs, the
result is very likely *not* what you want.
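
To make the non-POD case concrete, here is a small sketch (the function name `reset_labels` is made up for illustration). std::fill goes through std::string's assignment operator, so every element ends up holding a proper copy. A memset over the same array would overwrite the strings' internal bookkeeping, which is undefined behaviour:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper: reset every label via std::string::operator=.
void reset_labels(std::string* labels, std::size_t count)
{
    std::fill(labels, labels + count, std::string("unset"));
}
```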

Conclusion: Only if you don't care about portability, and you're very
very sure that you know what you're doing, might you try memset.
Otherwise, hands off.

std::fill is type-aware and always does the right thing. If the compiler
is smart, it should be as fast (or almost as fast) as std::memset(), the
latter often using a special compiler built-in to fill the memory.
However, memset as an optimization is rarely ever worth it (or rather,
if your algorithm has to reset large arrays of data, it might be worth
trying to reconsider the algorithm should this really be the bottleneck
of your program).

A manual loop does of course also do the right thing, but is less
generic (i.e. container dependent). Whether that makes it better or
worse is in the eye of the beholder. For a short C-style array, I
personally prefer the manual loop since I consider it more readable, but
others might disagree.

So long,
Thomas
--
g***@hotmail.com
2010-06-03 13:31:45 UTC
Post by Thomas Richter
if your algorithm has to reset large arrays of data, it might be worth
trying to reconsider the algorithm should this really be the bottleneck
of your program).
I partially agree. Clearing (byte) buffers with memset isn't less
readable and might even be performance critical (e.g. with images or
bitmaps, which often have considerably large pixel buffers).

Also have a look at some memset implementations. VStudio even uses
SSE2 (if present) for memset to get the last percent of performance
improvement. Even without this SSE2 stuff, it is a non-trivial
piece of assembly code, which might be hard for an ordinary
compiler to reproduce.

Personally, I use memset only in scenarios with buffers of built-in
types, like the example above, which imho is neither less readable nor
C-style code.
--
George Neuner
2010-06-04 05:29:57 UTC
Post by g***@hotmail.com
I partially agree. Clearing (byte) buffers with memset isn't less
readable and might even be performance intensive (e.g. using images or
bitmaps which have often considerable large pixel buffers).
True, but if the buffer has to be cleared (or in general, uniformly
filled), what choices do you have? std::fill won't be any faster and
may be slower.

There are VMM systems that can zero-fill allocated pages on first
access - which amortizes the fill and may improve performance for a
sparse buffer. But VMM functions are non-portable and the page
granularity of the fill can be wasteful if the access pattern consists
of lots of small sub-arrays.
Post by g***@hotmail.com
Also have a look at some memset implementations. VStudio uses even
SSE2 (if present) for memset to get the last percent performance
improvement. Even without this SSE2 stuff, it seems a non trivial
piece of assembly code, which might be hard to reproduce by an
ordinary compiler.
memset is just template code ... there may be a number of versions if
the user can select ALU vs SIMD and/or the CPU has special zero-fill
capabilities.

In any case, the algorithm is simple:
- byte fill until the address is aligned for long fill
- calculate the # of iterations of long fill
- construct long fill pattern
- do long fills
- byte fill any remaining locations

but it can result in a healthy chunk of code depending on the CPU's
capabilities. Using SIMD registers vs ALU registers really only
changes the alignment and iteration calculation. And even if there is
a special zero-fill version, if the call site uses a variable for the
fill pattern, the compiler has to use the general version unless it
can prove the value of the variable is zero (which sometimes is simple
but often is not - and some compilers don't bother trying).
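
The five steps above can be sketched in portable C++ as follows. This is only an illustration under stated assumptions: `sketch_memset` is a made-up name, a 32-bit word stands in for the "long" fill unit, and real library versions are tuned per CPU and look quite different:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Illustrative only; not how any particular library implements memset.
void* sketch_memset(void* dest, int value, std::size_t count)
{
    unsigned char* p = static_cast<unsigned char*>(dest);
    const unsigned char byte = static_cast<unsigned char>(value);
    typedef std::uint32_t word_t;   // the "long" fill unit (an assumption)

    // 1. Byte fill until the address is aligned for word fills.
    while (count > 0 &&
           reinterpret_cast<std::uintptr_t>(p) % sizeof(word_t) != 0) {
        *p++ = byte;
        --count;
    }

    // 2. Calculate the number of whole-word iterations.
    std::size_t words = count / sizeof(word_t);

    // 3. Construct the word pattern by replicating the byte.
    const word_t pattern = 0x01010101u * byte;

    // 4. Word fills. A fixed-size memcpy is the aliasing-safe way to
    //    store a word; compilers typically reduce it to a single store.
    for (; words > 0; --words) {
        std::memcpy(p, &pattern, sizeof pattern);
        p += sizeof pattern;
        count -= sizeof pattern;
    }

    // 5. Byte fill any remaining locations.
    while (count > 0) {
        *p++ = byte;
        --count;
    }
    return dest;
}
```

Switching `word_t` to a wider type changes only the alignment test and the iteration count, which matches the remark about SIMD versus ALU registers.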

George
--
Mathias Gaunard
2010-06-04 14:34:21 UTC
Post by George Neuner
Post by g***@hotmail.com
I partially agree. Clearing (byte) buffers with memset isn't less
readable and might even be performance intensive (e.g. using images or
bitmaps which have often considerable large pixel buffers).
True, but if the buffer has to be cleared (or in general, uniformly
filled), what choices do you have? std::fill won't be any faster and
may be slower.
std::fill falls back to std::memset when the memory is contiguous...
It is called overloading.
--
h***@gmail.com
2010-06-04 21:34:18 UTC
Post by Mathias Gaunard
std::fill falls back to std::memset when the memory is contiguous...
It is called overloading.
Theoretically, you're right, but i tried that and found it untrue.
Here's an abridged repost:

I'm told that std::fill is specialized for POD types, but the
following code...

void usingbe_memset( int* begin, int* end, int value )
{
memset( begin, value, (end-begin)*sizeof(int) );
}

void usingbe_fill( int* begin, int* end, int value )
{
std::fill( begin, end, value );
}

Compiled with g++ version 4.4.1, command line options "-c -O3 -S". The
assembly, for brevity, i've posted here: http://pastebin.com/PVUUGGqH

For those who can't read assembly, basically, the std::fill code is a
straight loop, the memset code is done without looping, mostly in one
assembly line (highlighted). Theoretically, there should be no
difference, but in reality, there is.
--
Mathias Gaunard
2010-06-05 20:48:22 UTC
Post by h***@gmail.com
I'm told that std::fill is specialized for POD types, but the
following code...
[...]
For those who can't read assembly, basically, the std::fill code is a
straight loop, the memset code is done without looping, mostly in one
assembly line (highlighted). Theoretically, there should be no
difference, but in reality, there is.
Looks like a quality of implementation issue.
I suspect the result would be different on MSVC, which has native support to
tell whether a type is a POD or has a trivial assignment.
--
Andrew
2010-06-08 09:55:15 UTC
Post by Mathias Gaunard
Post by h***@gmail.com
I'm told that std::fill is specialized for POD types, but the
following code...
[...]
For those who can't read assembly, basically, the std::fill code is a
straight loop, the memset code is done without looping, mostly in one
assembly line (highlighted). Theoretically, there should be no
difference, but in reality, there is.
Looks like a quality of implementation issue.
I suspect the result be different on MSVC, which has native support to
tell whether a type is a POD or has a trivial assignment.
Indeed. I have been disappointed with the lack of such performance
optimisations in VS 2005. You would think it would specialise common
cases like char arrays, but unfortunately not. Something to be
especially wary of is any code that uses iterators to go over a
container with a large number of items. In debug mode it will use
checked iterators, which are extremely slow. It was quite a surprise to
me because the code I wrote at the time started off using GCC; as
soon as I switched to VS 2005 in debug mode, it ground to a halt.

Regards,

Andrew Marlow
--
George Neuner
2010-06-06 15:12:28 UTC
On Sat, 05 Jun 2010 22:08:43 -0400, George Neuner wrote:
movl 8(%ebp), %edx
movl 12(%ebp), %ecx
movb 16(%ebp), %al
movl %edx, %edi
rep stosb
sets up the byte pattern in AL, the destination address in EDX and the
count in ECX. "rep stosb" triggers the loop which implements:
while (--ECX >= 0)
*EDX++ = AL;
Whoops, "EDX" should have been "EDI" in the above description.

George
--
Chris Morley
2010-06-11 01:11:40 UTC
Post by George Neuner
movl 8(%ebp), %edx
movl 12(%ebp), %ecx
movb 16(%ebp), %al
movl %edx, %edi
rep stosb
sets up the byte pattern in AL, the destination address in EDX and the
count in ECX. "rep stosb" triggers the loop which implements:
while (--ECX >= 0)
*EDX++ = AL;
Whoops, "EDX" should have been "EDI" in the above description.
Going back to the OP's question about whether you can do better with a
hand-written loop: typically, yes. There are optimisations which can be
made in C/C++ source (or assembler) which involve better use of bus width
& cache. Some are general, others processor/platform specific & involved.

e.g. on the 386DX(!) it was significantly quicker to movsd/stosd vs
movsb/stosb as you push 32 bits per access. (still is now!)
e.g. unrolling movsd vs rep movsd
e.g. on the Pentium it was worth pushing doubles around (regardless of actual
data type) to move 64 bits (extend for MMX, 3dNow, then SSE(n) etc.)
e.g. Cache block prefetching

This is worth a read, while from 2002 & AMD specific it still has relevance
(e.g. page 174+):
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/22007.pdf

Also mentioned in post:
http://groups.google.com/group/comp.lang.c++.moderated/browse_thread/thread/79dfd15c698a7187/bb52552bab788aba?lnk=gst&q=22007#bb52552bab788aba

There are memory copy examples: the p75 version reaches a bandwidth of
~1630 Mbytes/sec vs. ~570 Mbytes/sec for the p67 rep movsb version. A memset
could use movntq, for example, for blocks >512 bytes. You will however need
intrinsics/assembler to do this, which stops being "c++" quite quickly!!

Before people invoke 'portable' if you are targeting a specific platform you
can optimise for that platform and still default to something else for other
builds...

You _can_ beat the memxxx libraries & std::x if you want/need, but it is
probably not worth the time/effort unless you are actually bandwidth limited.
You would also sacrifice the generality & safety of the std:: funcs which
other posters point out.

Regards,
Chris
--
George Neuner
2010-06-06 15:12:15 UTC
Post by h***@gmail.com
I'm told that std::fill is specialized for POD types, but the
following code...
void usingbe_memset( int* begin, int* end, int value )
{
memset( begin, value, (end-begin)*sizeof(int) );
}
void usingbe_fill( int* begin, int* end, int value )
{
std::fill( begin, end, value );
}
Compiled with g++ version 4.4.1, command line options "-c -O3 -S". The
assembly, for brevity, i've posted here: http://pastebin.com/PVUUGGqH
For those who can't read assembly, basically, the std::fill code is a
straight loop, the memset code is done without looping, mostly in one
assembly line (highlighted). Theoretically, there should be no
difference, but in reality, there is.
memset *is* looping ... the loop is simply in microcode (or in
whatever now passes for microcode) instead of being explicit in the
instruction stream.

The sequence:

movl 8(%ebp), %edx
movl 12(%ebp), %ecx
movb 16(%ebp), %al
movl %edx, %edi
rep stosb

sets up the byte pattern in AL, the destination address in EDX and the
count in ECX. "rep stosb" triggers the loop which implements:

while (--ECX >= 0)
*EDX++ = AL;
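
For readers who prefer C++ to register names, here is a sketch of what "rep stosb" does, assuming the direction flag is clear. The function name is hypothetical, and the parameters are named after the registers involved (the hardware uses EDI as the destination pointer):

```cpp
#include <cstddef>

// Emulates "rep stosb"; returns the final value of EDI,
// i.e. one past the last byte written.
unsigned char* rep_stosb(unsigned char* edi, unsigned char al, std::size_t ecx)
{
    while (ecx != 0) {   // repeat while the count register is non-zero
        *edi++ = al;     // store AL at [EDI], then advance EDI
        --ecx;
    }
    return edi;
}
```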

However, for a large buffer, I think this ought to be sub-optimal on
modern x86 processors - particularly on HT processors. Since the
memory bus is 64-bits, I would think it would be better to use SSE2 or
maybe even the FPU on 32-bit chips, and quadword (stosq) on 64-bit
chips, so that the write combine buffer is left available for other
stores.

Guess I'll have to try it.

George
--
Martin Vejnár
2010-06-08 02:12:57 UTC
Post by h***@gmail.com
Post by Mathias Gaunard
std::fill falls back to std::memset when the memory is contiguous...
It is called overloading.
Theoretically, you're right, but i tried that and found it untrue.
void usingbe_memset( int* begin, int* end, int value )
{
memset( begin, value, (end-begin)*sizeof(int) );
}
void usingbe_fill( int* begin, int* end, int value )
{
std::fill( begin, end, value );
}
The two functions have different semantics; if you want to compare them, you need to change the type of the range from int to char. I just tested the following code on msvc10:

char a[42];
std::fill(a, a + 42, 0);

and it resulted in a call to memset as expected.
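
The difference in semantics can be shown with a short sketch (the function names are made up for illustration). memset stores the low byte of the value in every byte, whereas std::fill assigns the whole value to every element:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// memset writes (unsigned char)value into every byte of each int...
void bytes_set(int* begin, int* end, int value)
{
    std::memset(begin, value,
                static_cast<std::size_t>(end - begin) * sizeof(int));
}

// ...whereas std::fill assigns value to every int.
void ints_set(int* begin, int* end, int value)
{
    std::fill(begin, end, value);
}
```

With value 1 and a 32-bit int, bytes_set leaves 0x01010101 in each element while ints_set leaves 1. The two agree only for values whose bytes are all identical, such as 0.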
--
Martin

g***@hotmail.com
2010-06-09 13:26:40 UTC
Post by Martin Vejnár
The two functions have different semantics; if you want to compare them, you need to change the type of the range from int to char. I just tested the following code on msvc10:
char a[42];
std::fill(a, a + 42, 0);
and it resulted in a call to memset as expected.
Yes, in VStudio 2003 there are overloads of std::fill for (unsigned)
char arguments (e.g. fill(char *_First, char *_Last, int _Val), see the
xutility header file). For these overloads, the implementation falls back to
memset.
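
The overloading technique can be sketched roughly as below. This is not the xutility code; `my_fill` is a hypothetical stand-in that only shows how a non-template overload for byte-sized types can forward to memset while everything else uses the generic loop:

```cpp
#include <cstddef>
#include <cstring>

// Generic version: element-wise assignment.
template <typename Iter, typename T>
void my_fill(Iter first, Iter last, const T& value)
{
    for (; first != last; ++first)
        *first = value;
}

// Overloads for byte-sized element types forward to memset.
inline void my_fill(char* first, char* last, int value)
{
    std::memset(first, value, static_cast<std::size_t>(last - first));
}

inline void my_fill(unsigned char* first, unsigned char* last, int value)
{
    std::memset(first, value, static_cast<std::size_t>(last - first));
}
```

A call such as my_fill(a, a + 42, 0) on a char array matches both the template and the char overload exactly, and overload resolution then prefers the non-template function. Passing the value as a char instead of an int would select the template, so a real library needs more care than this sketch takes.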
--
Jens Schmidt
2010-06-04 14:34:14 UTC
George Neuner wrote:

[zero fill]
Post by George Neuner
- byte fill until the address is aligned for long fill
- calculate the # of iterations of long fill
- construct long fill pattern
- do long fills
- byte fill any remaining locations
but it can result in a healthy chunk of code depending on the CPU's
capabilities.
There are architectures where the code can be reduced to
- byte fill all locations
without any performance penalty. This happens when a) the CPU is
executing instructions much faster than the memory system can write
and b) the memory system uses write combining and special handling
for sequential access.
--
Viele Grüße,
Jens Schmidt


Mathias Gaunard
2010-06-01 18:06:46 UTC
Post by Hicham Mouline
Hello,
Are there any articles comparing the cases/reasons when/why to use memset vs
fill
std::fill works for any pair of iterators, memset only works for
contiguous memory.
Post by Hicham Mouline
vs writing a loop by hand?
Using std::fill or memset makes the code more explicit, and also
potentially more efficient.
--
Alf P. Steinbach
2010-06-01 20:34:19 UTC
Post by Hicham Mouline
Are there any articles comparing the cases/reasons when/why to use memset vs
fill, vs writing a loop by hand?
Others have enumerated reasons why std::fill is generally superior to memset.

However, the most common use of memset seems to be a use case where neither
std::fill nor memset is appropriate, namely like

SomePodStruct foo;
memset( appropriate args here );

which is brittle, verbose and unnecessary.

To zero that struct after initialization, just do

foo = SomePodStruct();

Not that I recommend such "reuse" of a variable (implied by the zeroing), but I
think it's far more clear and much less brittle than a memset if it's needed.

A proper way to zero that struct at initialization is

SomePodStruct foo = {};

I guess the programmers who choose memset for this do that because they're used
to it in C.
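
Put together as something compilable, the two forms look like this. The struct layout is an assumption made for the example (the post does not give one), and the helper names are hypothetical:

```cpp
struct SomePodStruct   // hypothetical layout, for illustration only
{
    int id;
    double weight;
    char tag[8];
};

// Zero at initialization: empty braces zero every member.
inline SomePodStruct make_zeroed()
{
    SomePodStruct foo = {};
    return foo;
}

// Zero an existing object by assigning a value-initialized temporary.
inline void clear_struct(SomePodStruct& foo)
{
    foo = SomePodStruct();
}
```

Neither form needs a size argument, so there is no sizeof expression to get wrong when the struct later changes.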


Cheers & hth.,

- Alf
--
blog at <url: http://alfps.wordpress.com>

Andrei Alexandrescu
2010-06-01 23:16:00 UTC
Post by Hicham Mouline
Hello,
Are there any articles comparing the cases/reasons when/why to use memset vs
fill, vs writing a loop by hand?
I wrote an article on exactly that topic a while ago:

http://www.drdobbs.com/web-development/184403799

with a rather scary conclusion, which I quote:

====
There is a very deep, and sad, realization underlying all this. We are
in 2001, the year of the Spatial Odyssey. We've done electronic
computing for more than 50 years now, and we strive to design more and
more complex systems, with unsatisfactory results. Software development
is messy. Could it be because the fundamental tools and means we use are
low-level, inefficient, and not standardized? Just step out of the box
and look at us — after 50 years, we're still not terribly good at
filling and copying memory.
====

I haven't measured in a while; I hope, but I doubt, that we're in better shape 9 years later.


Andrei
--