LinuxSir.cn: Linuxsir, traveling through time!


A discussion of whether -O3 is really faster

Posted on 2005-6-21 12:19:07
Saw this on the amd64 mailing list, and it's worthwhile.  Looks like I'll be switching back to -O2.

Karol Krizka posted <ac342b0a05061017533a58f82e@mail.gmail.com>, excerpted
below,  on Fri, 10 Jun 2005 17:53:25 -0700:

> Hi guys,
> Today is the last day of school for me so I don't care if this computer
> gets broken. I've decided to try out gcc4 on it. I remember reading some
> threads on this list about how it broke some apps on KDE. How about GNOME,
> is there anything not working?
>
> Also what are the best flags to get the fastest code (size doesn't matter)?
> I've heard people talk about some '-visibility=hidden'. Are there any
> others that I am not aware of?

Short answer, using an up-to-date ~amd64 system, and using gcc-config where
necessary to switch back to an older gcc-3.whatever version,
gcc-4.whatever has caused surprisingly few issues here.  Operation with
gcc-4.whatever (I'm regularly updating to the most current masked
gcc-4.0.1-beta2005mmdd snapshots) as my system default compiler has been
much smoother than I actually expected it to be.  The longer, more
detailed answer, follows.

gcc-4 is slotted and installed in parallel to your latest gcc-3.x.
gcc-config is the tool used to switch between them.  Thus, if you are
going to use gcc-4, ensure you have the latest ~amd64 (testing) gcc-config
as well.  Also, the latest ~amd64 binutils is probably helpful.

KDE now compiles and runs fine if you are using the individual packages,
with one or two possible exceptions that I'm not using here (one of the
games from kdegames was one, IIRC, but I've not merged that game so I
don't know).  The latest KDEs have had the -fvisibility=hidden stuff
disabled upstream, AFAIK, due to various issues, because they used it
wrong in the first place.  So they now work just fine, including the
individual problem packages from before, I /think/.  Like I said, tho, I
hadn't run into any issues I could attribute to that anyway, and wasn't
running the one package /known/ to have serious problems (segfaults) from
it, so I can't say for sure.

I have been using gcc-4.0.1-beta2005mmdd snapshots for some time
(unmasking them as necessary), as my regular system-wide compiler.  There
are a very few programs that won't compile/merge, altho most do just fine.
The ones that don't, I simply use gcc-config to switch compilers back to
the normal 3.4.4 profile, do that specific package merge, then switch back
to 4.0.1-beta-whatever.

Do note, however, that I'm running an entirely ~amd64 system, even with
some masked-for-testing additions as well.  I would *NOT* recommend anyone
running stable try using the gcc-4 series just yet, for their entire
system, anyway.  Again, merging it for use with selected packages, using
gcc-config to switch to it from a normal gcc-3.x profile (the reverse of
the above, where I have gcc-4 as my normal profile), might be doable on a
stable system, but even then, you'll likely need the latest unstable
binutils and gcc-config, at minimum, to get it to work smoothly, and those
in turn might pull in other necessary unstable dependencies.

The two major packages that are KNOWN to still have issues with gcc-4 are
glibc, and xorg.  There's a still masked (AFAIK) glibc version that's
supposed to compile with gcc-4, and is in the tree specifically to allow
those that want to try it with gcc-4, to do so, but there are some dire
warnings about using it, and while it worked just fine for 64-bit here,
the 32-bit parallel build had issues (I couldn't compile anything else
to 32-bit with it installed, including further gcc-snapshots, and the
portage sandbox package itself, both of which have 32-bit components;
they failed during the 32-bit configure phase), so I unmerged it.  However,
continue to use your gcc-3.whatever compiled version of glibc, and use
gcc-config to switch to your gcc-3.whatever profile when compiling any
new glibc packages, and that shouldn't be an issue.

Likewise with xorg.  Simply use gcc-config to switch to your
gcc-3.whatever profile, and xorg continues to compile just fine.

Pretty much everything else, I've had no problems with using
gcc-4.whatever at all.  In the unlikely event there /are/ problems, again,
simply switching to the gcc-3.whatever profile using gcc-config, and
remerging using that, should solve them.

The one other gcc-4 related issue I've seen is runtime, not compiletime,
and relates to libstdc++.  The gcc-4 version of that library is backward
compatible with the gcc-3.3.x and 3.4.x versions, but the 3.x versions
aren't forward compatible with the gcc-4 versions.  The libstdc++ version
that gets loaded is ALSO affected by your gcc-config setting.  With KDE,
it's preferable to compile everything with one OR the other, and then
ensure, when you load any KDE apps, that you do so with gcc-config set
to the version that matches what you compiled it with.  *MOST* of KDE
seems to run fine in any case, but anything requiring KHTML for rendering,
including not only konqueror, but kcontrol, and some misc. apps like
kweather, can refuse to load, under certain conditions, if the libstdc++
libraries used to compile them don't match up and don't match what's
pointed to by gcc-config at the time they are launched.   I doubt this
minor incompatibility will show itself in much else beyond the KDE family,
however, even where apps ARE C++, because very few will have the complex
dependency structure that KDE does.  In any case, here again, remerging
the dependency tree of the offending application so all C++ related
libraries and the application itself are compiled with a matching gcc,
should fix the problem, and has done so here.

CFLAGS:  Do NOT put -fvisibility=hidden in your CFLAGS!!  While this /can/
speed things up where used appropriately, it does so by hiding
specific "internal" functions so they don't have to be dealt with when
linking and otherwise handling executable libraries and applications.  Put
that in your CFLAGS, and you are essentially telling gcc to hide *ALL*
functions the source doesn't explicitly mark as visible, including those
that are intended to be linked to.  This *WILL* hose your system!!

Other than that, the usual rules and cflags in general continue to apply,
nothing particularly new, with ONE known exception.  The methods gcc uses
for optimization have changed, such that -fweb, which used to be generally
optimizing, is now often /de/optimizing, instead.  If you used it in your
CFLAGS before, consider removing it.  (At least, that's what I've read,
and what I did.  I've not done any benchmarks on it.)

As for speed vs size optimization, the following should be interesting...

Be /very/ careful with optimizing for speed, while saying size doesn't
matter.  Very often, theoretically faster code, say -O3, actually runs
/slower/ than -O2 or -Os.  The reason, when you think about it, is rather
simple.  Yes, -O3 optimizes for faster code, but it does so with hardly
any regard for size.  In real life CPUs, there's such a thing
as cache memory limitations.  Running from the registers is the fastest,
no performance penalty, but there are only a very few of them.  L1 cache
is next, but it too is very limited, typically 64k each for CPU
instructions and data (128k total).  L2 cache is slower but still makes a
HUGE difference when compared to regular memory.  Take a look at the
benchmarks of otherwise identical CPUs with different size L2 cache if
you've any doubt.  L2 cache is normally 1MB on the higher end AMD64 chips,
512KB on the low end "cheap" versions.  Beyond that is regular memory,
many times slower than L2 cache, but also pretty much as large as your
purchasing budget allows.  Beyond that is hard drive swap and/or any
network accessible memory, both of which are typically EXTREMELY slow to
reach, in comparison to local RAM.

While the effects of -O3 are generally theoretically faster code, they
come at the expense of LARGER code.  Thus, in real life, what would
otherwise fit into L1 often spills over to L2, and what would otherwise
fit in L2 often spills over to main memory.  Because accessing this
spillover area is MANY TIMES slower than accessing closer cache, the
effect of -O3 is commonly, if unintuitively, to SLOW DOWN the program by
forcing the CPU to wait for data fetched from further away than it would
have been with -O2 or -Os.  Thus, for many programs, the effects of -O3
are to make things slower, NOT faster.  The exceptions to this general
rule are programs that tend to do a lot of cache thrashing, and therefore
not keep their instructions or data in the cache, anyway.  Anything
handling playing or streaming media of any size generally fits this
category, thus, all your mplayer and media encoding/decoding applications.
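
As a rough illustration of the cache argument above (this sketch is not
from the original post), the trade-off can be put into a toy
average-memory-access-time model.  Every number in it is an assumption
chosen for illustration: the cycle latencies, the hit rates, the
instruction counts, and the simplification that each instruction costs
one memory access.  The function names amat and run_cost are hypothetical.

```python
# Toy average-memory-access-time (AMAT) model for a two-level cache.
# Every latency, hit rate and instruction count here is a made-up
# assumption for illustration, not a measurement of any real CPU.

def amat(l1_hit_rate, l2_hit_rate, l1_cycles=3, l2_cycles=15, ram_cycles=200):
    """Average cycles per access: every access pays the L1 latency,
    L1 misses also pay L2, and L2 misses also pay main memory."""
    l1_miss = 1.0 - l1_hit_rate
    l2_miss = 1.0 - l2_hit_rate
    return l1_cycles + l1_miss * (l2_cycles + l2_miss * ram_cycles)

def run_cost(instructions, l1_hit_rate, l2_hit_rate):
    """Total cycles, assuming one memory access per instruction."""
    return instructions * amat(l1_hit_rate, l2_hit_rate)

# Compact build: more instructions executed, but they mostly stay in cache.
compact = run_cost(1_000_000, l1_hit_rate=0.99, l2_hit_rate=0.95)

# Larger build: 20% fewer instructions executed, but more cache misses.
bloated = run_cost(800_000, l1_hit_rate=0.95, l2_hit_rate=0.90)

print(f"compact build: {compact:,.0f} cycles")
print(f"larger build:  {bloated:,.0f} cycles")
```

With these invented figures the larger build comes out slower despite
executing fewer instructions.  If both builds are given the same hit
rates, the build with fewer instructions wins instead, so the outcome
depends entirely on how much the extra code size actually hurts the cache.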

(Not coincidentally, such throughput-intensive applications are also the
strongest point for modern, deeply pipelined but very high clock rate
"Netburst" Pentium 4 style Intel CPUs.  RDRAM was similarly optimized for
high throughput at the expense of high latency.  AMD's arch and
DDR-SDRAM, OTOH, are far lower latency archs that don't tend to do quite
as well in media type applications but tend to be far better in general
purpose applications where thread switching and latency are far more
critical.)

As a consequence of the above, I've been using -Os for some time and
continue to do so. There was an article discussed here which demonstrated
that with -O3, gcc-4 produced larger executables, and they in general
benchmarked slightly worse, than the latest optimized gcc-3.x, with the
same -O3. However, I've contended for some time that due to effects on
cache overruns, -O3 will often tend to deoptimize code, rather than
optimize it, thus my use of -Os, optimizing for size.  Unfortunately, the
article didn't compare -Os compiled code sizes or performance, and I've
not seen comparisons elsewhere (altho I've not been really looking for
them either), so I have no hard data on that.  DO note, however, that the
gcc-3.4 series is relatively mature at this point, and thus should be
producing code about as optimized as it's going to.  By contrast, the
gcc-4.0 series is still new and probably producing far looser code than it
will by 4.0.1 or 4.0.2. Thus, in the abstract, it's quite possible it will
actually benchmark worse than 3.4.x, which is exactly what we saw in the
article covered here, with -O3 optimized code.
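
Since the post has no hard data on code size, here is a minimal sketch of
how one might collect some (it is not part of the original post).  It
assumes a single-file C program and a gcc binary on the PATH.  The helper
names build_command and compare_sizes are hypothetical, and on-disk file
size is only a rough proxy for the size of the code that competes for
cache.

```python
import os
import shutil
import subprocess
import sys
import tempfile

# Optimization levels discussed in the post.
LEVELS = ("-O2", "-O3", "-Os")

def build_command(source, output, level, compiler="gcc"):
    """Return the argv list for compiling one C file at one -O level."""
    return [compiler, level, "-o", output, source]

def compare_sizes(source, levels=LEVELS, compiler="gcc"):
    """Compile `source` once per level and return {level: size in bytes}."""
    sizes = {}
    with tempfile.TemporaryDirectory() as tmp:
        for level in levels:
            output = os.path.join(tmp, "bench" + level)
            subprocess.run(build_command(source, output, level, compiler),
                           check=True)
            sizes[level] = os.path.getsize(output)
    return sizes

if __name__ == "__main__":
    # Only runs when given exactly one source file and gcc is installed.
    if len(sys.argv) == 2 and shutil.which("gcc"):
        for level, size in compare_sizes(sys.argv[1]).items():
            print(f"{level}: {size} bytes")
```

Saved under any file name and run with the path to a C source file as its
only argument, this prints one size per optimization level.  Timing the
resulting binaries on a realistic workload would still be needed before
drawing any conclusion about speed.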

I still think gcc-4 is producing faster code for me with -Os, with no idea
on the size of the executables.  However, having not done any benchmarks,
I'm absolutely willing to admit that it could easily be just my
perception, and performance /may/ actually be worse, as we saw with -O3 in
the benchmarks discussed above.  (Do note that such could easily be
explained as well... -O3 produces theoretically faster code with no
concern for size, so if my cache arguments have any validity at all, it's
actually quite likely that a "better" job at -O3 optimization would
produce slower code in real life, because it would be theoretically faster
at the expense of size, thereby cache-busting more efficiently, causing
the code to run slower when actually used in a real-life finite-cached
processor.  Thus, the results above were actually /expected/ IMO, and
could indeed mean gcc4 is more efficient at (de)optimizing exactly how it
is told to optimize.)

All that said, and again entirely by feel, I /think/ I see what could be
worse memory leaks.  I /think/ I see memory use growing farther and faster
over time than I /remember/ happening before, particularly running my
gcc-4 compiled KDE in (still gcc-3.4.3-whatever compiled, because it
won't compile in 4.x yet) xorg.  Quitting KDE/X to the CLI prompt and
restarting them essentially eliminates the issue, which only occurs over
several days of use, and I'm not /sure/ it's worse than it was, but it
just /seems/ so.  HOWEVER, note that I'm ALSO running currently masked
xorg-x11-6.8.99.x testing ebuilds, and it's QUITE possible, EVEN LIKELY,
THAT's where the leaks are, again if it's really any worse than before in
the first place.  I simply don't know, and am only reporting the
observations I see.

So, what that all amounts to is this:  -Os /may/ not be quite as
efficient, and either it or gcc-4 in general /may/ trigger memory leaks I
wasn't seeing before.  However, the issue /may/ not exist at all, or /may/
be attributable to something else entirely.  In any case, it shouldn't be
a serious problem for normal use, unless you consider "normal use" to be
running long-running applications that take a week to come up with an
answer, in which case the (potential) memory leak may be a problem.
However, in that case, I'd wonder at your sanity in trying to test an
acknowledged not yet stable marked gcc-4 on such a required-stable system
in the first place! <g>

That should about cover it... <g>
Posted on 2005-6-21 13:15:33
I have also switched to using only -O2.  The compilation performance of gcc 4.0.1 is on par with gcc 3.3.5, but 4.1.0 is a clear improvement of 20%-40%.

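Posted on 2005-6-21 13:34:45
Indeed, I didn't find GCC 4.0.1 any faster at compiling, so I have gone back to 3.4.4.
Last night I installed quakeforge: with -O3 it compiled, but the game would not run; after switching to -O everything was OK.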

Posted on 2005-6-21 15:28:43
Thanks a lot~~

Misapplied "optimization" can only make things slower.

Posted on 2005-6-21 18:34:50
I haven't yet run into any problems caused by -O2 or -O3.

Posted on 2005-6-21 18:45:35
> The two major packages that are KNOWN to still have issues with gcc-4 are
> glibc, and xorg. There's a still masked (AFAIK) glibc version that's
> supposed to compile with gcc-4, and is in the tree specifically to allow
> those that want to try it with gcc-4, to do so, but there are some dire
> warnings about using it, and while it worked just fine for 64-bit here,
> the 32-bit parallel build had issues (I couldn't compile anything else
> to 32-bit with it installed, including further gcc-snapshots, and the
> portage sandbox package itself, both of which have 32-bit components,
> they fail during the 32-bit configure phase) so I unmerged it. However,
> continue to use your gcc-3.whatever compiled version of glibc, and use
> gcc-config to switch to your gcc-3.whatever profile when compiling any
> new glibc packages, and that shouldn't be an issue.


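I compiled a glibc with gcc4 (2.3.5 itself would not compile; it was a different 2.3.5 version, 200XXXXX, I don't remember exactly which).
Recompiling many programs afterwards also ran into problems.
Compiling zhcon with gcc-3.4.3 fails as well.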

Posted on 2005-6-21 19:02:23
-O2 is the more suitable choice.  -O3 causes problems in a small number of cases, and overall runtime speed shows no real improvement over -O2.