This is good because it will make LLVM more "polished," but bad because
there might be a joe out there who could take this and flesh it out into
a real code generator.
llvm-svn: 12502
"Rewrite the second on AnalysisUsage usage. This documents the new
addRequiredTransitive member that Misha added, and explains the whole
concept a lot better. Also, the document used incorrect "subsubsection"
tags instead of "doc_subsubsection" which this fixes."
llvm-svn: 12476
in the QMTest Testrunner tests.
Please note that putting output files in the Output directory no longer
works, as QMTest does not build Output directories anymore (nor does the
test run in a separate subdirectory, anyway).
llvm-svn: 12466
as it is making effectively arbitrary modifications to the CFG and we don't
have domset/domfrontier implementations that can handle the dynamic updates.
Instead of having a bunch of code that doesn't actually work in practice,
just demote any potentially tricky values to the stack (causing the problem
to go away entirely). Later invocations of mem2reg will rebuild SSA for us.
This fixes all of the major performance regressions with tail duplication
from LLVM 1.1. For example, this loop:
---
int popcount(int x) {
  int result = 0;
  while (x != 0) {
    result = result + (x & 0x1);
    x = x >> 1;
  }
  return result;
}
---
Used to be compiled into:
int %popcount(int %X) {
entry:
br label %loopentry
loopentry: ; preds = %entry, %no_exit
%x.0 = phi int [ %X, %entry ], [ %tmp.9, %no_exit ] ; <int> [#uses=3]
%result.1.0 = phi int [ 0, %entry ], [ %tmp.6, %no_exit ] ; <int> [#uses=2]
%tmp.1 = seteq int %x.0, 0 ; <bool> [#uses=1]
br bool %tmp.1, label %loopexit, label %no_exit
no_exit: ; preds = %loopentry
%tmp.4 = and int %x.0, 1 ; <int> [#uses=1]
%tmp.6 = add int %tmp.4, %result.1.0 ; <int> [#uses=1]
%tmp.9 = shr int %x.0, ubyte 1 ; <int> [#uses=1]
br label %loopentry
loopexit: ; preds = %loopentry
ret int %result.1.0
}
And is now compiled into:
int %popcount(int %X) {
entry:
br label %no_exit
no_exit: ; preds = %entry, %no_exit
%x.0.0 = phi int [ %X, %entry ], [ %tmp.9, %no_exit ] ; <int> [#uses=2]
%result.1.0.0 = phi int [ 0, %entry ], [ %tmp.6, %no_exit ] ; <int> [#uses=1]
%tmp.4 = and int %x.0.0, 1 ; <int> [#uses=1]
%tmp.6 = add int %tmp.4, %result.1.0.0 ; <int> [#uses=2]
%tmp.9 = shr int %x.0.0, ubyte 1 ; <int> [#uses=2]
%tmp.1 = seteq int %tmp.9, 0 ; <bool> [#uses=1]
br bool %tmp.1, label %loopexit, label %no_exit
loopexit: ; preds = %no_exit
ret int %tmp.6
}
llvm-svn: 12457
time from 615s to 1.49s on a large testcase that has a gigantic switch statement
that all of the blocks in the function go to (an interpreter).
llvm-svn: 12442
http://mail.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20040308/013095.html
Basically, this patch only updated the immediate dominatees of the header node
to tell them that the preheader also dominated them. In practice, ALL
dominatees of the header node are also dominated by the preheader.
This fixes LoopSimplify/2004-03-15-IncorrectDomUpdate and PR293.
llvm-svn: 12434
Simplify the input/output finder. All elements of a basic block are
instructions. Any used arguments are also inputs. An instruction can only
be used by another instruction.
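For the record, here is a rough standalone C++ sketch of what that simplified
scan boils down to (the Value/Instruction/Argument types below are made-up
stand-ins, not the real LLVM classes):
---
#include <set>
#include <vector>

struct Instruction;
struct Value {
  std::vector<Instruction*> users;     // only instructions can use a value here
  virtual ~Value() = default;
};
struct Argument : Value {};
struct Instruction : Value {
  std::vector<Value*> operands;        // values this instruction reads
};

// Region is the set of instructions being extracted.
void findInputsOutputs(const std::set<Instruction*> &Region,
                       std::set<Value*> &Inputs,
                       std::set<Instruction*> &Outputs) {
  for (Instruction *I : Region) {
    // Any operand defined outside the region (an argument, or an instruction
    // that is not part of the region) is an input.
    for (Value *Op : I->operands) {
      Instruction *OpI = dynamic_cast<Instruction*>(Op);
      if (!OpI || !Region.count(OpI))
        Inputs.insert(Op);
    }
    // Since only instructions can use an instruction, any user outside the
    // region makes I an output.
    for (Instruction *U : I->users)
      if (!Region.count(U))
        Outputs.insert(I);
  }
}
---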
llvm-svn: 12405
extracted, and a function that contained a single top-level loop never had
the loop extracted, regardless of how much non-loop code there was.
llvm-svn: 12403
* Don't insert a branch to the switch instruction after the call, just
make it a single block.
* Insert the new alloca instructions in the entry block of the original
function instead of having them execute dynamically
* Don't make the default edge of the switch instruction go back to the switch.
The loop extractor shouldn't create new loops!
* Give meaningful names to the alloca slots and the reload instructions
* Some minor code simplifications
llvm-svn: 12402
This also implements two minor improvements:
* Don't insert live-out stores IN the region, insert them on the code path
that exits the region
* If the region is exited to the same block from multiple paths, share the
switch statement entry, live-out store code, and the basic block.
llvm-svn: 12401
a member of the class. While we're at it, turn the collection into a set
instead of a vector to improve efficiency and make queries simpler.
llvm-svn: 12400
miscompiled, try to use the loop extractor to reduce the program down to a
loop nest that is being miscompiled. In practice, the loop extractor appears
to have too many bugs for this to be useful, but hopefully they will be fixed
soon...
llvm-svn: 12398
* Make several methods of bugdriver global functions (ParseInputFile, PrintFunctionList)
* Make PrintFunctionList truncate the output after 10 entries, like the crash debugger
did. This allows code sharing.
* Add a couple of methods to BugDriver that allow us to eliminate some friends
* Improve comments in ExtractFunction.cpp
* Make classes that used to be friends of bugdriver now live in anon namespaces
* Rip a bunch of functionality in the miscompilation tester into a new
TestMergedProgram function for future code sharing.
* Fix a bug in the miscompilation tester induced in my last checkin
llvm-svn: 12393
loop information won't see it, and we could have unreachable blocks pointing to
the non-header node of blocks in a natural loop. This isn't tidy, so have the
loopsimplify pass clean it up.
llvm-svn: 12380
* Be a lot more accurate about what the effects will be when inlining a call
to a function when an argument is an alloca.
* Dramatically reduce the penalty for inlining a call in a large function.
This heuristic made it almost impossible to inline a function into a large
function, no matter how small the callee is.
llvm-svn: 12363
On the testcase from GCC PR12440, which has a LOT of loops (1392 of which require
preheaders to be inserted), this speeds up the loopsimplify pass from 1.931s to
0.1875s. The loop in question goes from 1.65s -> 0.0097s, which isn't bad. All of
these times are from a debug build.
This adds a dependency on DominatorTree analysis that was not there before, but
we always had dominatortree available anyway, because LICM requires both loop
simplify and DT, so this doesn't add any extra analysis in practice.
llvm-svn: 12362
Added information on getting the LLVM GCC front end from CVS.
Added new configure script options.
Made other minor corrections and modifications.
llvm-svn: 12340
Make an explicit call to it from runOnFunction() if we know we're supposed to
write into the global. This is lame (esp. the const_cast), but it solves
the problem.
llvm-svn: 12291
make the output more compact.
Divorce state-saving from the doFinalization method; for some reason it's not
getting called when I want it to, at Reoptimizer time. Put the guts in
PhyRegAlloc::finishSavingState(). Put an abort() in it so that I can be really
really sure that it's getting called.
Update comments.
llvm-svn: 12286
De-constify SaveStateToModule; we have to set both it and SaveRegAllocState
explicitly in the reoptimizer.
Make SaveRegAllocState an 'external location' option.
llvm-svn: 12278
projects like 'port glibc to llvm' or 'improve nightly tester', should
have an unassigned enhancement bug opened for them so that they can be
tracked more easily. Open projects should only list generic projects
like 'compile programs with the LLVM compiler' or 'write a new backend
for target'.
llvm-svn: 12273
by trying to get the compiler to generate an undefined reference for it
and related functions which live in libc_nonshared.a on Linux.
Linkers... sigh.
llvm-svn: 12256
bear the burden of implementing what would all be exactly the same methods.
They just want to provide the information in differing ways.
llvm-svn: 12239
testcase like this:
int %test(int* %P, int %A) {
%Pv = load int* %P
%B = add int %A, %Pv
ret int %B
}
We now generate:
test:
mov %ECX, DWORD PTR [%ESP + 4]
mov %EAX, DWORD PTR [%ESP + 8]
add %EAX, DWORD PTR [%ECX]
ret
Instead of:
test:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, DWORD PTR [%ESP + 8]
mov %EAX, DWORD PTR [%EAX]
add %EAX, %ECX
ret
... saving one instruction, and often a register. Note that there are a lot
of other instructions that could use this, but they aren't handled. I'm not
really interested in adding them, but mul/div and all of the FP instructions
could be supported as well if someone wanted to add them.
llvm-svn: 12204
This allows pointers to aggregate objects, whose elements are only read, to
be promoted and passed in by element instead of by reference. This can
enable a LOT of subsequent optimizations in the caller function.
It's worth pointing out that this stuff happens a LOT in C++ programs, because
objects in templates are generally passed around by reference. When these
templates are instantiated on small aggregate or scalar types, however, it is
more efficient to pass them in by value than by reference.
This transformation triggers most on C++ codes (e.g. 334 times on eon), but
does happen on C codes as well. For example, on mesa it triggers 72 times,
and on gcc it triggers 35 times. This is amazingly good considering that
we are using 'basicaa' so far.
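To make the idea concrete, here's roughly what the transformation amounts to at
the source level (illustrative C++ only; the pass itself operates on LLVM IR):
---
struct Point { double x, y; };

// Before: a small aggregate passed by const reference and only read.
double lenByRef(const Point &P) { return P.x * P.x + P.y * P.y; }

// After promotion: the callee takes the loaded elements directly, so the
// caller no longer has to materialize the object in memory at the call site,
// and the values become ordinary SSA values for later optimizations.
double lenByElement(double x, double y) { return x * x + y * y; }
---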
llvm-svn: 12202
The problem was that we were merging a field of a node with a value that was
deleted. Thanks to bugpoint for reducing povray to a nice small 3 function
example. :)
llvm-svn: 12116
Make sure to scope the NodeMap passed into cloneInto so that it doesn't point
to nodes that are deleted. Add some FIXME's for future performance enhancements.
llvm-svn: 12115
do it on povray. The problem is that we were not copying globals from callees to
callers unless they existed in both graphs. We should have copied them in the case
where the global pointed to a node that was copied as well.
llvm-svn: 12104
been using the default target data layout object to lower malloc instructions,
causing us to allocate more memory than we needed! This could improve the
performance of the CBE generated code substantially!
llvm-svn: 12088
(16) into certain areas of the SPARC V9 back-end. I'm fairly sure the US IIIi's
dcache has 32-byte lines, so I'm not sure where the 16 came from. However, in
the interest of not breaking things any more than they already are, I'm going
to leave the constant alone.
llvm-svn: 12043
of generating this code:
mov %EAX, 4
mov DWORD PTR [%ESP], %EAX
mov %AX, 123
movsx %EAX, %AX
mov DWORD PTR [%ESP + 4], %EAX
call Y
we now generate:
mov DWORD PTR [%ESP], 4
mov DWORD PTR [%ESP + 4], 123
call Y
Which hurts the eyes less. :)
Considering that register pressure around call sites is already high (with all
of the callee-clobbered registers and stuff), this may help a lot.
llvm-svn: 12028
DSNodes, unlike other GraphTraits nodes, can have null outgoing edges, and
df_iterator doesn't take this into consideration. As a workaround, the
successor iterator now handles null nodes and 'indicates' that null has
no successors.
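For the curious, a standalone sketch of the shape of the workaround (made-up
Node type, not the real DSNode/GraphTraits code):
---
#include <cstddef>
#include <vector>

struct Node { std::vector<Node*> edges; };  // outgoing edges; entries may be null

class succ_iterator {
  const Node *N;
  std::size_t Idx;
public:
  succ_iterator(const Node *Nd, std::size_t I) : N(Nd), Idx(I) {}
  Node *operator*() const { return N->edges[Idx]; }
  succ_iterator &operator++() { ++Idx; return *this; }
  bool operator!=(const succ_iterator &O) const { return Idx != O.Idx; }
};

// A null node reports zero successors, so a depth-first walk that follows a
// null edge stops there instead of dereferencing a null pointer.
succ_iterator succ_begin(const Node *N) { return succ_iterator(N, 0); }
succ_iterator succ_end(const Node *N) {
  return succ_iterator(N, N ? N->edges.size() : 0);
}
---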
llvm-svn: 12025
we really don't win that much by eliminating this (not many Modules are
allocated), so it's not worth it. When we can, we should revisit this in
the future.
llvm-svn: 12023
LLVM instructions. Because it contains an explicit cast, we didn't catch it.
I guess instructions will be annotable for the duration of the sparcv9's
existence.
llvm-svn: 11999
1) For 8-bit registers try to use first the ones that are parts of the
same register (AL then AH). This way we only alias 2 16/32-bit
registers after allocating 4 8-bit variables.
2) Move EBX as the last register to allocate. This will cause less
spills to happen since we will have 8-bit registers available up to
register exhaustion (assuming we use the allocation order). It
would be nice if we could push all of the 8-bit aliased registers
towards the end, but we much prefer to keep the callee-saved registers at
the end to avoid saving them on entry and exit of the function.
For example this gives a slight reduction of spills with linear scan
on 164.gzip.
Before:
11221 asm-printer - Number of machine instrs printed
975 spiller - Number of loads added
675 spiller - Number of stores added
398 spiller - Number of register spills
After:
11182 asm-printer - Number of machine instrs printed
952 spiller - Number of loads added
652 spiller - Number of stores added
386 spiller - Number of register spills
llvm-svn: 11996
their names more descriptive. A name consists of the base name, a
default operand size followed by a character per operand with an
optional special size. For example:
ADD8rr -> add, 8-bit register, 8-bit register
IMUL16rmi -> imul, 16-bit register, 16-bit memory, 16-bit immediate
IMUL16rmi8 -> imul, 16-bit register, 16-bit memory, 8-bit immediate
MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory
llvm-svn: 11995
Note that this is a band-aid put over a band-aid. This just undisables
tail duplication in one very specific case that it seems to work in.
llvm-svn: 11989
methods take an int or unsigned value instead of int64_t.
Also, add an 'addImm' method to the MachineInstrBuilder class, because the
fact that the hardware sign or zero extends it does not/should not matter
to the code generator. Once the old sparc backend is removed the difference
can be eliminated.
llvm-svn: 11976
parse. The name is now I (operand size)*. For example:
Im32 -> instruction with 32-bit memory operands.
Im16i8 -> instruction with 16-bit memory operands and 8 bit immediate
operands.
llvm-svn: 11970
the size of the immediate and the memory operand on instructions that
use them. This resolves problems with instructions that take both a
memory and an immediate operand but their sizes differ (i.e. ADDmi32b).
llvm-svn: 11967
an 8-bit immediate. So mark the shifts that take immediates as taking
an 8-bit argument. The rest with the implicit use of CL are marked
appropriately.
A bug still exists:
def SHLDmri32 : I2A8 <"shld", 0xA4, MRMDestMem>, TB; // [mem32] <<= [mem32],R32 imm8
The immediate in the above instruction is 8-bit but the memory
reference is 32-bit. The printer prints this as an 8-bit reference
which confuses the assembler. Same with SHRDmri32.
llvm-svn: 11931
2004-02-26-FPNotPrintableConstants.llx ensures that constants used in an
LLVM program are declared static if they are assigned to global variables.
2004-02-26-LinkOnceFunctions.llx ensures that linkonce functions get the
weak attribute.
llvm-svn: 11885
Functions with linkonce linkage are declared with weak linkage.
Global floating point constants used to represent unprintable values
(such as NaN and infinity) are declared static so that they don't interfere
with other CBE generated translation units.
llvm-svn: 11884
all dynamically allocated LLVM values 4 bytes smaller, eliminate some vtables, and
make Value's destructor faster.
This makes Function derive from Annotation now because it is the only core LLVM
class that still has an annotation stuck onto it: MachineFunction.
MachineFunction is obviously horrible and gross (like most other annotations), but
will be the subject of refactorings later in the future. Besides, many fewer
Function objects are dynamically allocated than instructions, basic blocks, constants,
types, etc... :)
llvm-svn: 11878
other clients. The problem is that the nullVal member was left to the default
constructor to initialize, which for int's does nothing (ie, leaves it unspecified).
To get a zero value, we must use T(). Isn't C++ wonderful? :)
llvm-svn: 11867
1. Functions do not make things incomplete, only variables
2. Constant global variables no longer need to be marked incomplete, because
we are guaranteed that the initializer for the global will be in the
graph we are hacking on now. This makes resolution of indirect calls happen
a lot more in the bu pass, supports things like vtables and the C counterparts
(giant constant arrays of function pointers), etc...
Testcase here: test/Regression/Analysis/DSGraph/constant_globals.ll
llvm-svn: 11852
Make the incompleteness marker faster by looping directly over the globals
instead of over the scalars to find the globals.
Fix a bug where we didn't mark a global incomplete if it didn't have any
outgoing edges. This wouldn't break any current clients but is still wrong.
llvm-svn: 11848
pair, and look up varargs in the execution stack every time, instead of
just pushing iterators (which can be invalidated during callFunction())
around. (union GenericValue now has a "pair of uints" member, to support
this mechanism.) Fixes Bug 234.
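Roughly, the representation looks like this (a standalone sketch with made-up
names such as UIntPair and lookupVarArg, not the interpreter's actual code):
---
#include <cstdint>
#include <vector>

struct GenericValue {
  union {
    double   DoubleVal;
    int64_t  IntVal;
    uint32_t UIntPair[2];   // [0] = stack-frame index, [1] = vararg index
  };
};

struct ExecutionContext { std::vector<GenericValue> VarArgs; };

// Re-resolve the (frame, index) pair against the current execution stack on
// every access, so growth of the stack during callFunction() cannot leave us
// holding a dangling iterator.
GenericValue lookupVarArg(const std::vector<ExecutionContext> &ECStack,
                          const GenericValue &VAList) {
  return ECStack[VAList.UIntPair[0]].VarArgs[VAList.UIntPair[1]];
}
---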
llvm-svn: 11845
assume that if they don't intend to write to a global variable, that they
would mark it as constant. However, there are people that don't understand
that the compiler can do nice things for them if they give it the information
it needs.
This pass looks for blatantly obvious globals that are only ever read from.
Though it uses a trivially simple "alias analysis" of sorts, it is still able
to do amazing things to important benchmarks. 253.perlbmk, for example,
contains several ***GIANT*** function pointer tables that are not marked
constant and should be. Marking them constant allows the optimizer to turn
a whole bunch of indirect calls into direct calls. Note that only a link-time
optimizer can do this transformation, but perlbmk does have several strings
and other minor globals that can be marked constant by this pass when run
from GCCAS.
176.gcc has a ton of strings and large tables that are marked constant, both
at compile time (38 of them) and at link time (48 more). Other benchmarks
give similar results, though it seems like big ones have disproportionately
more than small ones.
This pass is extremely quick and does good things. I'm going to enable it
in gccas & gccld. Not bad for 50 SLOC.
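In spirit, the whole pass is little more than the following sketch (deliberately
simplified with made-up types, and it ignores the linkage and address-escape
subtleties the real code has to care about):
---
#include <vector>

struct Use { bool isLoad; };                 // true if the user just reads the global
struct GlobalVariable {
  bool hasInitializer, isConstant;
  std::vector<Use> uses;
};

static bool onlyReadFrom(const GlobalVariable &GV) {
  for (const Use &U : GV.uses)
    if (!U.isLoad)                           // any store or escaping use disqualifies it
      return false;
  return true;
}

void markReadOnlyGlobalsConstant(std::vector<GlobalVariable> &Globals) {
  for (GlobalVariable &GV : Globals)
    if (GV.hasInitializer && !GV.isConstant && onlyReadFrom(GV))
      GV.isConstant = true;                  // later passes may now fold loads of GV
}
---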
llvm-svn: 11836
scaled indexes. This allows us to compile GEP's like this:
int* %test([10 x { int, { int } }]* %X, int %Idx) {
%Idx = cast int %Idx to long
%X = getelementptr [10 x { int, { int } }]* %X, long 0, long %Idx, ubyte 1, ubyte 0
ret int* %X
}
Into a single address computation:
test:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, DWORD PTR [%ESP + 8]
lea %EAX, DWORD PTR [%EAX + 8*%ECX + 4]
ret
Before it generated:
test:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, DWORD PTR [%ESP + 8]
shl %ECX, 3
add %EAX, %ECX
lea %EAX, DWORD PTR [%EAX + 4]
ret
This is useful for things like int/float/double arrays, as the indexing can be folded into
the loads&stores, reducing register pressure and decreasing the pressure on the decode unit.
With these changes, I expect our performance on 256.bzip2 and gzip to improve a lot. On
bzip2 for example, we go from this:
10665 asm-printer - Number of machine instrs printed
40 ra-local - Number of loads/stores folded into instructions
1708 ra-local - Number of loads added
1532 ra-local - Number of stores added
1354 twoaddressinstruction - Number of instructions added
1354 twoaddressinstruction - Number of two-address instructions
2794 x86-peephole - Number of peephole optimization performed
to this:
9873 asm-printer - Number of machine instrs printed
41 ra-local - Number of loads/stores folded into instructions
1710 ra-local - Number of loads added
1521 ra-local - Number of stores added
789 twoaddressinstruction - Number of instructions added
789 twoaddressinstruction - Number of two-address instructions
2142 x86-peephole - Number of peephole optimization performed
... and these types of instructions are often in tight loops.
Linear scan is also helped, but not as much. It goes from:
8787 asm-printer - Number of machine instrs printed
2389 liveintervals - Number of identity moves eliminated after coalescing
2288 liveintervals - Number of interval joins performed
3522 liveintervals - Number of intervals after coalescing
5810 liveintervals - Number of original intervals
700 spiller - Number of loads added
487 spiller - Number of stores added
303 spiller - Number of register spills
1354 twoaddressinstruction - Number of instructions added
1354 twoaddressinstruction - Number of two-address instructions
363 x86-peephole - Number of peephole optimization performed
to:
7982 asm-printer - Number of machine instrs printed
1759 liveintervals - Number of identity moves eliminated after coalescing
1658 liveintervals - Number of interval joins performed
3282 liveintervals - Number of intervals after coalescing
4940 liveintervals - Number of original intervals
635 spiller - Number of loads added
452 spiller - Number of stores added
288 spiller - Number of register spills
789 twoaddressinstruction - Number of instructions added
789 twoaddressinstruction - Number of two-address instructions
258 x86-peephole - Number of peephole optimization performed
Though I'm not complaining about the drop in the number of intervals. :)
llvm-svn: 11820
to do analysis.
*** FOLD getelementptr instructions into loads and stores when possible,
making use of some of the crazy X86 addressing modes.
For example, the following C++ program fragment:
struct complex {
  double re, im;
  complex(double r, double i) : re(r), im(i) {}
};
inline complex operator+(const complex& a, const complex& b) {
  return complex(a.re+b.re, a.im+b.im);
}
complex addone(const complex& arg) {
  return arg + complex(1,0);
}
Used to be compiled to:
_Z6addoneRK7complex:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, DWORD PTR [%ESP + 8]
*** mov %EDX, %ECX
fld QWORD PTR [%EDX]
fld1
faddp %ST(1)
*** add %ECX, 8
fld QWORD PTR [%ECX]
fldz
faddp %ST(1)
*** mov %ECX, %EAX
fxch %ST(1)
fstp QWORD PTR [%ECX]
*** add %EAX, 8
fstp QWORD PTR [%EAX]
ret
Now it is compiled to:
_Z6addoneRK7complex:
mov %EAX, DWORD PTR [%ESP + 4]
mov %ECX, DWORD PTR [%ESP + 8]
fld QWORD PTR [%ECX]
fld1
faddp %ST(1)
fld QWORD PTR [%ECX + 8]
fldz
faddp %ST(1)
fxch %ST(1)
fstp QWORD PTR [%EAX]
fstp QWORD PTR [%EAX + 8]
ret
Other programs should see similar improvements, across the board. Note that
in addition to reducing instruction count, this also reduces register pressure
a lot, always a good thing on X86. :)
llvm-svn: 11819
into a single LEA instruction. This should improve the code generated for
things like X->A.B.C[12].D.
The bigger benefit is still coming though. Note that this uses an LEA instruction
instead of an add, giving the register allocator more freedom. We should probably
never generate ADDri32's.
llvm-svn: 11817
so that we always get the inline function instead. Remember, kids, like it says
in the GCC manual, "An Inline Function is As Fast As a Macro."
llvm-svn: 11815
Also fix a problem where we didn't check to see if a node pointer was null.
Though fclose(null) doesn't make a lot of sense, 300.twolf does it.
llvm-svn: 11810
longer was getting this #include, it always fell back on the less precise
floating point initializer values, causing some testsuite failures.
llvm-svn: 11803
allocator.
The implementation is completely rewritten and now employs several
optimizations not exercised before. For example for 164.gzip we have
997 loads and 699 stores vs the 1221 loads and 880 stores we had
before.
llvm-svn: 11798
This case occurs many times in various benchmarks, especially when combined
with the previous patch. This allows it to get stuff like:
if (X == 4 || X == 3)
if (X == 5 || X == 8)
and
switch (X) {
case 4: case 5: case 6:
if (X == 4 || X == 5)
llvm-svn: 11797
block into MachineBasicBlock::getFirstTerminator().
This also fixes a bug in the implementation of the above in both
RegAllocLocal and InstrSched, where instructions were added after the
terminator if the basic block's only instruction was a terminator (it
shouldn't matter for RegAllocLocal since this case never occurs in
practice).
llvm-svn: 11748
use FP instructions. This reduces the number of instructions inserted in
176.gcc (for example) from 58074 to 101 (it doesn't use much FP, which
is typical). This reduction speeds up the entire code generator. In the
case of 176.gcc, llc went from taking 31.38s to 24.78s. The passes that
sped up the most are the register allocator and the 2 live variable analysis
passes, which sped up 2.3, 1.3, and 1.5s respectively. The asmprinter
pass also sped up because it doesn't print the instructions in comments :)
Note that this patch is likely to expose latent bugs in machine code passes,
because now basic blocks can be empty, where they were never empty before. I
cleaned out regalloclocal, but who knows about linscan :)
llvm-svn: 11717
switch statements in the constructors and simplifies the
implementation of the getUseType() member function. You will have to
specify defs using MachineOperand::Def instead of MOTy::Def though
(similarly for Use and UseAndDef).
llvm-svn: 11715
(minor) benefits right now:
1. An extra dummy MOVrr32 is gone. This move would often be coalesced by
both allocators anyway.
2. The code now uses the gep_type_iterator to walk the gep, which should
future-proof it a bit. It still assumes that array indexes are Longs though.
These don't really justify rewriting the code. The big benefit will come later
though.
llvm-svn: 11710
value is a physreg and one is a virtreg. For this reason, disable copy folding
entirely for physregs. Also, use the new isMoveInstr target hook which gives us
folding of FP moves as well.
llvm-svn: 11700
BU propagation, clone the globals into the GG of EACH FUNCTION that finishes
processing! The GlobalsGraph *must* include all globals and effects from
all functions in the program. Fixing this makes pool allocation work better
on 175.vpr, but it still ultimately crashes.
llvm-svn: 11686
end of the BU and CBU passes. The globals will be marked incomplete, so it
doesn't matter if they are missing some info, and merging isn't guaranteed
to bring everything in anyway!
llvm-svn: 11684
1. LiveIntervals now implements a 4 slot per instruction model. Load,
Use, Def and a Store slot. This is required in order to correctly
represent caller saved register clobbering on function calls,
register reuse in the same instruction (def reuses last use) and
also spill code added later by the allocator. The previous
representation (2 slots per instruction) was insufficient and as a
result was causing subtle bugs.
2. Fixes in spill code generation. This was the major cause of
failures in the test suite.
3. Linear scan now has core support for folding memory operands. This
is untested and not enabled (the live interval update function does
not attempt to fold loads/stores in instructions).
4. Lots of improvements in the debugging output of both live intervals
and linear scan. Give it a try... it is beautiful :-)
In summary the above fixes all the issues with the recent reserved
register elimination changes and get the allocator very close to the
next big step: folding memory operands.
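Purely for illustration, the numbering scheme amounts to something like this
(hypothetical helper, not the real LiveIntervals interface):
---
// With four slots per instruction, the Load/Use/Def/Store points of
// instruction i each get their own index in the interval numbering.
enum SlotKind { LOAD = 0, USE = 1, DEF = 2, STORE = 3 };

// Instruction i occupies indices [4*i, 4*i + 3].
inline unsigned slotIndex(unsigned InstrNo, SlotKind K) {
  return InstrNo * 4 + K;
}

// Example: a value defined by instruction 2 and last used by instruction 5
// would roughly span [slotIndex(2, DEF), slotIndex(5, USE)), leaving distinct
// points for call clobbers, same-instruction reuse, and later spill code.
---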
llvm-svn: 11654
that need them. This is very useful on CISCy targets like the X86 because it
reduces the total spill pressure, and makes better use of its (large)
instruction set. Though the X86 backend doesn't know how to rewrite many
instructions yet, this already makes a substantial difference on 176.gcc for
example:
Before:
Time:
8.0099 ( 31.2%) 0.0100 ( 12.5%) 8.0199 ( 31.2%) 7.7186 ( 30.0%) Local Register Allocator
Code quality:
734559 asm-printer - Number of machine instrs printed
111395 ra-local - Number of registers reloaded
79902 ra-local - Number of registers spilled
231554 x86-peephole - Number of peephole optimization performed
After:
Time:
7.8700 ( 30.6%) 0.0099 ( 19.9%) 7.8800 ( 30.6%) 7.7892 ( 30.2%) Local Register Allocator
Code quality:
733083 asm-printer - Number of machine instrs printed
2379 ra-local - Number of reloads fused into instructions
109046 ra-local - Number of registers reloaded
79881 ra-local - Number of registers spilled
230658 x86-peephole - Number of peephole optimization performed
So by fusing 2300 instructions, we reduced the static number of instructions
by 1500, and reduced the number of peepholes (and thus the work) by about 900.
This also clearly reduces the number of reload/spill instructions that are
emitted.
llvm-svn: 11542
nightly tests to be really messed up. The problem was that the new leakdetector
was depending on undefined behavior: the order of destruction of static objects.
llvm-svn: 11488
hacks can be banished. Also, this gives us the opportunity to emit special code
for the setjmp/longjmps which allows the elimination of one GCC warning for every
setjmp/longjmp site (which is often THOUSANDS in C++ programs). Yaay!
llvm-svn: 11484
prototypes, even if they don't precisely match what it would prefer to use.
This fixes: CBackend/2004-02-15-PreexistingExternals.llx compiling it into:
ltmp_0_30 = memcpy(l14_C, 4u, 17);
ltmp_1_30 = memcpy(((int *)l27_A), ((unsigned )(long)l27_B), ((int )123u));
instead of:
ltmp_0_30 = memcpy(l14_C, 4u, 17);
ltmp_1_27 = l43_memcpy(l27_A, l27_B, 123u);
Which does the wrong thing as you could imagine.
llvm-svn: 11481
MRegisterInfo::getNumRegs() instead of
MRegisterInfo::FirstVirtualRegister.
Also use MRegisterInfo::is{Physical,Virtual}Register where
appropriate.
llvm-svn: 11477
initializers for constant structs and arrays take constant space, instead of
space proportional to the number of elements. This reduces the memory usage of
the LLVM compiler by hundreds of megabytes when compiling some nasty SPEC95
benchmarks.
llvm-svn: 11470
'Constant', instead of specific subclass pointers. In the future, these will
return an instance of ConstantAggregateZero if all of the inputs are zeros.
llvm-svn: 11467
implementation class. This makes the code simpler and allows for more
types to be added easily. It also implements caching for generic
objects (it was only available for llvm objects).
llvm-svn: 11452
they do not modify the passed iterator but return a copy.
next(myIt) returns copy of myIt incremented once
next(myIt, n) returns copy of myIt incremented n times
prior(myIt) returns copy of myIt decremented once
prior(myIt, n) returns copy of myIt decremented n times
While at it remove obsolete implementation of mapped_iterator.
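For reference, the helpers are essentially this (a minimal sketch of the shape,
not the exact declarations in the support headers):
---
#include <iterator>

template <typename It> It next(It I) { return ++I; }
template <typename It>
It next(It I, typename std::iterator_traits<It>::difference_type N) {
  std::advance(I, N);   // the copy made by pass-by-value is advanced, not the caller's iterator
  return I;
}
template <typename It> It prior(It I) { return --I; }
template <typename It>
It prior(It I, typename std::iterator_traits<It>::difference_type N) {
  std::advance(I, -N);
  return I;
}
---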
llvm-svn: 11429
"expensive exceptions" controlled by an option. Also refactor and eliminate a bunch of cruft.
This is a temporary solution and causes millions of warnings to pour out of programs that use
exceptions, but it should fix the problem with sparc and the 'write' declaration (PR190).
Subsequent changes will make this stink much less
llvm-svn: 11405
operands as incomplete, though fopen is known to only read them. This just adds
fclose for symmetry, though it doesn't gain anything. This makes the dsgraphs for
181.mcf much more precise.
llvm-svn: 11390
allowed in invoke instructions. Thus, if we are inlining a call to an intrinsic
function into an invoke site, we don't need to turn the call into an invoke!
llvm-svn: 11384
Rename SetMachineOperandConst's formal parameters to match other methods here.
Mark some methods as being used only by the SPARC back-end.
Fix a missing-paren bug in OutputValue().
llvm-svn: 11363
MachineBasicBlock. Also change opcode to a short and numImplicitRefs
to an unsigned char so that overall MachineInstr's size stays the
same.
llvm-svn: 11357
ilist of MachineInstr objects. This allows constant time removal and
insertion of MachineInstr instances from anywhere in each
MachineBasicBlock. It also allows for constant time splicing of
MachineInstrs into or out of MachineBasicBlocks.
llvm-svn: 11340
Add ProgramExitedNonzero argument to executeProgram(), and make it
tell its caller whether the program exited nonzero.
Move executeProgramWithCBE() out of line, to ExecutionDriver.cpp, and remove
its extra arguments which are always defaulted. Make it turn off
check-exit-code if the program exits nonzero while generating a reference
output.
Make diffProgram() assume that any nonzero exit code is a failure, if
check-exit-code is turned on.
llvm-svn: 11325
more of a testcase for profiling information than anything that should reasonably
be used, but it's a starting point. When I have more time I will whip this into
better shape.
llvm-svn: 11311
Having a proper 'select' instruction would allow the elimination of a lot
of the special case cruft in this patch, but we don't have one yet.
llvm-svn: 11307
methods which have strangely different semantics in different backends,
and no one knew what any of them did.
Getting rid of these ALSO allows the dependence of MachineInstr.h on
MRegisterInfo.h to be removed, which makes me much happier, and probably
Alkis too. :)
llvm-svn: 11287
in this for programs with lots of types (like the testcase in PR224).
The problem was that the type ID that the outer vector was using was not
very dense (as many types are getting resolved), so the vector is large
and gets reallocated a lot.
Since there are a lot of values in the program (the .ll file is 10M),
each reallocation has to copy the subvectors, which is also quite slow
(this wouldn't be a problem if C++ supported move semantics, but it
doesn't, at least not yet :(
Changing the outer data structure to a map speeds a release build of
llvm-as up from 11.21s to 5.13s on the testcase in PR224.
llvm-svn: 11244
this speeds up a release llvm-as from 21.95s to 11.21s, because before it
would do an expensive traversal of the type-graph of every type resolved.
llvm-svn: 11242
type at the same time, resolve the upreferences to each other before resolving
it to the outer type. This shaves off some time from the testcase in PR224, from
25.41s -> 21.72s.
llvm-svn: 11241
instead of randomly groping about inside its outEdges array.
Make SchedGraph::addDummyEdges() use getNumOutEdges() instead of
outEdges.size().
Get rid of ifdefed-out code in SchedGraph::buildGraph().
llvm-svn: 11238
consistent across the various type classes, we can factor out a LOT more
almost-identical code. Also, add a couple of temporary statistics.
llvm-svn: 11232
all of the ad-hoc storage of contained types. This allows getContainedType to
not be virtual, and allows us to entirely delete the TypeIterator class.
llvm-svn: 11230
contains the type we are looking for, just search the immediately used types.
We can only do this because we keep the "current" type in the nesting level
as we decrement upreferences.
This change speeds up the testcase in PR224 from 50.4s to 22.08s, not
too shabby.
llvm-svn: 11221
the Virt2PhysRegMap std::map with an std::vector. This speeds up the
register allocator another (almost) 40%, from .72->.45s in a release build
of LLC on 253.perlbmk.
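The change is basically the classic dense-map trick, sketched below with
illustrative numbers and names (the real first-virtual-register constant and
class layout may differ):
---
#include <vector>

static const unsigned FirstVirtualRegister = 1024;  // illustrative value

struct Virt2PhysMap {
  // Virtual register numbers are dense and start above the physical ones, so
  // a vector indexed by (vreg - FirstVirtualRegister) replaces the std::map.
  std::vector<unsigned> Map;

  void grow(unsigned LastVReg) {
    Map.resize(LastVReg - FirstVirtualRegister + 1, 0);
  }
  unsigned &operator[](unsigned VReg) {
    return Map[VReg - FirstVirtualRegister];
  }
};
---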
llvm-svn: 11219
from physical registers, and they are always dense, it makes sense to not have
a ton of RBtree overhead. This change speeds up regalloclocal about ~30% on
253.perlbmk, from .35s -> .27s in the JIT (in LLC, it goes from .74 -> .55).
Now live variable analysis is the slowest codegen pass. Of course it doesn't
help that we have to run it twice, because regalloclocal doesn't update it,
but even if it did it would be the slowest pass (now it's just the 2x slowest
pass :(
llvm-svn: 11215
1. The "work" was not in the assert, so it was punishing the optimized release
2. getNamedFunction is _very_ expensive in large programs. It is not designed
to be used like this, and was taking 7% of the execution time of the code
generator on perlbmk.
Since the assert "can never fail", I'm just killing it.
llvm-svn: 11214
removeDeadNodes is called, only call it at the end of the pass being run.
This saves 1.3 seconds running DSA on 177.mesa (5.3->4.0s), which is
pretty big. This is only possible because of the automatic garbage
collection done on forwarding nodes.
llvm-svn: 11178
DSGraphs while they are forwarding. When the last reference to the forwarding
node is dropped, the forwarding node is autodeleted. This should simplify
removeTriviallyDeadNodes, and is only (efficiently) possible because we are
using an ilist of dsnodes now.
llvm-svn: 11175
quite the same as for non-intrusive lists of pointers to nodes. To support
transitioning code bases, add a new 'compatibility' iterator.
llvm-svn: 11172
slots each. As a consequence they get numbered as 0, 2, 4 and so
on. The first slot is used for operand uses and the second for
defs. Here's an example:
0: A = ...
2: B = ...
4: C = A + B ;; last use of A
The live intervals should look like:
A = [1, 5)
B = [3, x)
C = [5, y)
llvm-svn: 11141
The problem is that the dominator update code didn't "realize" that it's
possible for the newly inserted basic block to dominate anything. Because
it IS possible, stuff was getting updated wrong.
llvm-svn: 11137
complete rewrite of load-vn will make it a bit faster. This changes speeds up
the gcse pass (which uses load-vn) from 25.45s to 0.42s on the testcase in
PR209.
I've also verified that this gives the exact same results as the old one.
llvm-svn: 11132
1. Don't scan to the end of alloca instructions in the caller function to
insert inlined allocas, just insert at the top. This saves a lot of
time inlining into functions with a lot of allocas.
2. Use splice to move the alloca instructions over, instead of remove/insert.
This allows us to transfer a block at a time, and eliminates a bunch of
silly symbol table manipulations.
This speeds up the inliner on the testcase in PR209 from 1.73s -> 1.04s (67%)
llvm-svn: 11118
and that basic block ends with a return instruction. In this case, we can just splice
the cloned "body" of the function directly into the source basic block, avoiding a lot
of rearrangement and splitBasicBlock's linear scan over the split block. This speeds up
the inliner on the testcase in PR209 from 2.3s to 1.7s, a 35% reduction.
llvm-svn: 11116
fails when the basic block points to the function->end. Instead, require that
the client pass in the function AND the basicblock to insert into.
llvm-svn: 11112
process. The only optimization we did so far is to avoid creating a
PHI node, then immediately destroying it in the common case where the
callee has one return statement. Instead, we just don't create the return
value. This has no noticeable performance impact, but paves the way for
future improvements.
llvm-svn: 11108
to add the cloned block to. This allows the block to be added to the function
immediately, and all of the instructions to be immediately added to the function
symbol table, which speeds up the inliner from 3.7 -> 3.38s on the PR209 testcase.
llvm-svn: 11107
instead of a loop that is really inefficient with large basic blocks.
This speeds up the inliner pass on the testcase in PR209 from 13.8s to 2.24s
which still isn't exactly speedy, but is a lot better. :)
llvm-svn: 11105
Basically we store floating point values as their integral components, instead of relying
on the semantics of floating point < to differentiate between values. This is likely to
make the map search be faster anyway.
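A standalone sketch of the idea (made-up names, not the actual constant-table
code): key the map on the raw bits of the value instead of on floating point <,
so distinct bit patterns stay distinct and lookups use plain integer compares.
---
#include <cstdint>
#include <cstring>
#include <map>

static uint64_t doubleBits(double D) {
  uint64_t Bits;
  std::memcpy(&Bits, &D, sizeof(Bits));   // reinterpret the storage portably
  return Bits;
}

struct ConstantFP { double Val; };
static std::map<uint64_t, ConstantFP*> FPConstantTable;  // keyed on the bit pattern

ConstantFP *getOrCreateFP(double D) {
  ConstantFP *&Slot = FPConstantTable[doubleBits(D)];
  if (!Slot)
    Slot = new ConstantFP{D};
  return Slot;
}
---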
llvm-svn: 11064
registers (not as the max number of registers).
Change toSpill from a std::set into a std::vector<bool>.
Use the reverse iterator adapter to do a reverse scan of allocatable
registers.
llvm-svn: 11061
This tremendously improves the code generated by the LLVM optimizer, primarily
by making the inliner more aggressive. For example, it improves the stepanov
benchmark from 55.56 mega-additions/sec to 98.04 Ma/s. It also improves the
oopack/iterator benchmark from 338.3MFLOPS/s to 1103.4MFLOPS/s. Less noteworthy,
it improves oopack/matrix from 573 -> 641 MFLOPS/s.
llvm-svn: 11053
of a linear search to find the first range for comparisons. This cuts
down the linear scan register allocator running time by a factor of 3
in 254.perlbmk and by a factor of 2.2 in 176.gcc.
llvm-svn: 11030
FP_REG_KILL instructions at the end of blocks involved with critical edges.
Fix a bug where FP_REG_KILL instructions weren't inserted in fall through
unconditional branches. Perhaps this will fix some linscan problems?
llvm-svn: 11019
function to find the globals, iterate over all of the globals directly. This
speeds the function up from 14s to 6.3s on perlbmk, reducing DSA time from
53->46s.
llvm-svn: 10996
This reduces the number of nodes allocated, then immediately merged and DNE'd
from 2193852 to 1298049. unfortunately this only speeds DSA up by ~1.5s (of
53s), because it's spending most of its time waddling through the scalar map :(
llvm-svn: 10992
Also, use RC::merge when possible, reducing the number of nodes allocated and then immediately merged away from 2985444 to 2193852 on perlbmk.
llvm-svn: 10991
it to be off. If it looks like it's completely unnecessary after testing, I
will remove it completely (which is the hope).
* Callers of the DSNode "copy ctor" can not choose to not copy links.
* Make node collapsing not create a garbage node in some cases, avoiding a
memory allocation, and a subsequent DNE.
* When merging types, allow two functions of different types to be merged
without collapsing.
* Use DSNodeHandle::isNull more often instead of DSNodeHandle::getNode() == 0,
as it is much more efficient.
*** Implement the new, more efficient reachability cloner class
In addition to only cloning nodes that are reachable from interesting
roots, this also fixes the huge inefficiency we had where we cloned lots
of nodes, only to merge them away immediately after they were cloned.
Now we only actually allocate a node if there isn't one to merge it into.
* Eliminate the now-obsolete cloneReachable* and clonePartiallyInto methods
* Rewrite updateFromGlobalsGraph to use the reachability cloner
* Rewrite mergeInGraph to use the reachability cloner
* Disable the scalar map scanning code in removeTriviallyDeadNodes. In large
SCC's, this is extremely expensive. We need a better data structure for the
scalar map, because we really want to scan the unique node handles, not ALL
of the scalars.
* Remove the incorrect SANER_CODE_FOR_CHECKING_IF_ALL_REFERRERS_ARE_FROM_SCALARMAP code.
* Move the code for eliminating integer nodes from the trivially dead
eliminator to the dead node eliminator.
* removeDeadNodes no longer uses removeTriviallyDeadNodes, as it contains a
superset of the node removal power.
* Only futz around with the globals graph in removeDeadNodes if it is modified
llvm-svn: 10987
efficient in the case where a function calls into the same graph multiple times
(ie, it either contains multiple calls to the same function, or multiple calls
to functions in the same SCC graph)
llvm-svn: 10986
* Make AssertNodeInGraph not be HORRIBLY time consuming
* Eliminate the dead mergeInGlobalsGraph method
*** Add the definition for the new ReachabilityCloner class
llvm-svn: 10981
out that the problem was actually the writer writing out a 'null' value
because it didn't normalize it. This fixes:
test/Regression/Assembler/2004-01-22-FloatNormalization.ll
llvm-svn: 10967
is a move between two registers, at least one of the registers is
virtual and the two live intervals do not overlap.
This results in about 40% reduction in intervals, 30% decrease in the
register allocators running time and a 20% increase in peephole
optimizations (mainly move eliminations).
The option can be enabled by passing -join-liveintervals where
appropriate.
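The eligibility test boils down to something like the sketch below (hypothetical
helper types; the real coalescer is considerably more involved):
---
struct Interval {
  unsigned Start, End;                     // a single [Start, End) range, for simplicity
  bool overlaps(const Interval &O) const { return Start < O.End && O.Start < End; }
};

struct CopyInfo {
  bool SrcIsVirtual, DstIsVirtual;         // at least one side must be virtual
};

bool canJoin(const CopyInfo &CI, const Interval &Src, const Interval &Dst) {
  if (!CI.SrcIsVirtual && !CI.DstIsVirtual)
    return false;                          // physreg-to-physreg copies are left alone
  return !Src.overlaps(Dst);               // joining overlapping intervals would be wrong
}
---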
llvm-svn: 10965
lives near the other installation dirs (like libdir, bindir, etc.).
Move the rule for making bytecode_libdir out of the ifdef LIBRARYNAME...endif.
llvm-svn: 10964
virtReg lives on the stack. Now a virtual register has an entry in the
virtual->physical map or the virtual->stack slot map but never in
both.
llvm-svn: 10958
map was only used to implement a marginal GlobalsGraph optimization, and it
actually slows the analysis down (due to the overhead of keeping it), so just
eliminate it entirely.
llvm-svn: 10955
in terms of it.
Though clonePartiallyInto is not cloning partial graphs yet, this change
dramatically speeds up inlining of graphs with many scalars. For example,
this change speeds up the BU pass on 253.perlbmk from 69s to 36s, because
it avoids iteration over the scalar map, which can get pretty large.
llvm-svn: 10951
called bytecode_libdir. Make install-bytecode-library depend on
the existence of that directory, and add a rule for creating it if
it does not exist by calling mkinstalldirs.
llvm-svn: 10946
which ones don't, which is state that the parent class doesn't know without
knowing the implementation. Let the children classes implement
materializeModule().
llvm-svn: 10942
fact "profitable" to do so. This makes compactification "free" for small
programs (ie, it is completely disabled) and even helps large programs by
not having to encode pointless compactification planes.
On 176.gcc, this saves 50K from the bytecode file, which is, alas only
a couple percent.
This concludes my head bashing against the bytecode format, at least for
now.
llvm-svn: 10922
This shrinks the bytecode file for 176.gcc by about 200K (10%), and 254.gap by
about 167K, a 25% reduction. There is still a lot of room for improvement in
the encoding of the compaction table.
llvm-svn: 10915
This shrinks the bytecode file for 176.gcc by about 200K (10%), and 254.gap by
about 167K, a 25% reduction. There is still a lot of room for improvement in
the encoding of the compaction table.
llvm-svn: 10914
the bytecode file for 176.gcc by about 200K (10%), and 254.gap by about 167K,
a 25% reduction. There is still a lot of room for improvement in the encoding
of the compaction table.
llvm-svn: 10913
platform independent. This code is completely untested (but never used),
and needs autoconf support for detecting pthreads, but it's a start, and
deletes two emails from my inbox. :)
llvm-svn: 10906
bytecode files when compiling 176.gcc, but more importantly will make it
easier to eliminate CPR's in the future (no new .bc revision will be
required to support them)
llvm-svn: 10884
of forcing them to go through ConstantPointerRef's. This allows bytecode
files to mirror .ll files, allows more efficient encoding, and makes it easier
to eventually eliminate CPR's.
llvm-svn: 10883
returning error codes. Because they don't return an error code, they can
return the value read, which simplifies the code and makes the reader more
efficient (yaay!).
Also eliminate the special case code for little endian machines.
llvm-svn: 10871
intended to save size (and does on small programs), but on big programs it
actually increases the size of the program slightly. The deal is that many
functions end up using the characters that the string contained, and the
characters are no longer in the global constant table, so they have to be
emitted in function specific constant pools.
This pessimization will be fixed in subsequent patches.
llvm-svn: 10864
It's not clear why the code was looking for signed chars < 0, but it can't
matter to the assembler anyway, so the check goes away. This also fixes
compatibility with arrays of [us]byte that have constantexprs in them.
Also slightly restructure some code to be cleaner.
llvm-svn: 10854
are complex enough to check that it should be a separate method.
While I'm here, improve ConstantArray::getNullValue a bit, though the
FIXME is still quite valid.
llvm-svn: 10850
because that makes it abort. Also, fix a typo in a comment.
This checkin brought to you by the "It only takes about 30 seconds to run
ENABLE_LLI tests on Shootout on zion, even if they all dump core" fund.
llvm-svn: 10844
Since this really only makes sense for these two, change the instance variable
to reflect whether we are writing a bytecode file or not. This makes it
reasonable to add bcwriter specific stuff to it as necessary.
llvm-svn: 10837
Make should continue even if compilation cmds fail, for the sake of
the nightly tester, so use minuses on them.
Use LLVMAS, LLVMGCC, LLVMGXX instead of LAS, LCC, LCXX (as per FIXME).
llvm-svn: 10825
Remove checks for many common Unix programs. Our build process currently
assumes they are there and makes no provisions for any other world-views.
(We can add some of these checks back at some later time if it should prove
useful, but right now, we do not need to check to see whether "rm" exists.)
Remove checks for many common standard C headers and functions. We assume
ISO/ANSI C++, and we always use the <cfoo> versions of ANSI C's <foo.h>
headers, so these checks will not help anything.
Edit configure's warning messages for clarity and content.
Change checks for "optional" programs to default to using "true" instead of
"false", so that a failure to find, e.g., etags, will be less likely to result
in make failing.
No longer shall we check for --enable-purify or --with-purify options.
No longer shall we propagate these to the Makefiles.
configure regenerated using autoconf-2.57.
Please feel free to send me any questions or comments you have. :-)
llvm-svn: 10814
when an implicitly defined register is later used by an alias. For example:
call foo
%reg1024 = mov %AL
The call implicitly defines EAX but only AL is used. Before this fix
no information was available on AL. Now EAX and all its aliases except
AL get defined and die at the call instruction whereas AL lives to be
killed by the assignment.
llvm-svn: 10813
testcase test/Regression/Assembler/ConstantExprFold.llx
Note that these kinds of things only rarely show up in source code, but are
exceedingly common in the intermediate stages of algorithms like SCCP. By
folding things (especially relational operators) that use symbolic constants,
we are able to speculatively fold more conditional branches, which can
lead to some big simplifications.
It would be easy to add a lot more special cases here, so if you notice
SCCP missing anything "obvious", you know what to make smarter. :)
llvm-svn: 10812
Move a bunch of (now) private stuff from ConstantFolding.h into
ConstantFolding.cpp.
This _finally_ gets us to a place where we have a sane constant folder. The
rules are:
1. LLVM clients now use ConstantExpr::get* methods to fold constants. If they
cannot be folded, a constantexpr is created, so these methods always return
valid Constant*'s.
2. The implementation of ConstantExpr::get* uses the functions exposed by
ConstantFolding.h to try to fold constants. If they cannot be folded,
they should return a null pointer.
3. The implementation of ConstantFolding can do whatever it wants, and only
has one client (Constants.cpp)
This cuts down on the weird dependencies, and eliminates the two interfaces.
The old constanthandling interface was especially bad for clients to use
because almost none of them took the failure condition into consideration,
thus leading to obscure problems.
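Structurally, the layering looks like this sketch (stand-in types and names,
not the real Constants.cpp / ConstantFolding.cpp code):
---
struct Constant { virtual ~Constant() = default; };
struct ConstantInt : Constant {
  long Val;
  explicit ConstantInt(long V) : Val(V) {}
};
struct ConstantExprNode : Constant {
  unsigned Opcode; Constant *LHS, *RHS;
  ConstantExprNode(unsigned Op, Constant *L, Constant *R)
    : Opcode(Op), LHS(L), RHS(R) {}
};

// Rules 2/3: the folding layer tries to fold and returns null on failure.
Constant *ConstantFoldBinaryOp(unsigned Opcode, Constant *LHS, Constant *RHS) {
  ConstantInt *L = dynamic_cast<ConstantInt*>(LHS);
  ConstantInt *R = dynamic_cast<ConstantInt*>(RHS);
  if (L && R && Opcode == 0 /* add */)
    return new ConstantInt(L->Val + R->Val);
  return nullptr;                          // cannot fold; the caller decides what to do
}

// Rule 1: clients call this; it never returns null.
Constant *ConstantExprGet(unsigned Opcode, Constant *LHS, Constant *RHS) {
  if (Constant *C = ConstantFoldBinaryOp(Opcode, LHS, RHS))
    return C;                              // folded to a simpler constant
  return new ConstantExprNode(Opcode, LHS, RHS);  // otherwise wrap it in a constantexpr
}
---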
llvm-svn: 10807
this whole refactoring: allow constant folding methods to return something
other than predefined classes, allow them to return generic Constant*'s.
llvm-svn: 10806
constants as being "true" when evaluating branches. This was introduced
because we now create constantexprs for the constants instead of failing the
fold.
llvm-svn: 10778
YACC as bison -y. In this way, we ensure that bison is being used, but
the Makefiles have macros for using bison itself and for getting bison to
act like it is traditional yacc.
llvm-svn: 10774
* Implement SCCP of load instructions, implementing Transforms/SCCP/loadtest.ll
This allows us to fold expressions like "foo"[2], even if the pointer is only
a conditional constant.
llvm-svn: 10767
LiveVariables::HandlePhysRegDef private because they use information that is
not in memory when LiveVariables finishes the analysis.
Also update the TwoAddressInstructionPass to not use this interface.
llvm-svn: 10755
The first change (which is disabled) compactifies all of the function constant
pools into the global constant pool, in an attempt to reduce the amount of
duplication and overhead. Unfortunately, as the comment indicates, this is
not yet a win, so it is disabled.
The second change sorts the typeid's so that those types that can be used
by instructions in the program appear earlier in the table than those that
cannot (such as structures and arrays). This causes the instructions to
be able to use the dense encoding more often, saving about 5K on 254.gap.
This is only a .65% savings though, unfortunately. :(
llvm-svn: 10754
Fix iterator invalidation problems which was causing -mstrip to miss some
entries, and read free'd memory. This shrinks the symbol table of 254.gap
from 333 to 284 bytes! :)
llvm-svn: 10751
occurs when the symbol table for a module has been stripped, making all of the
function local symbols go away.
This saves 6728 bytes in the stripped bytecode file of 254.gap (which obviously
has 841 functions), which isn't a ton, but helps and was easy.
llvm-svn: 10750
* Refactor reader stuff out of include/llvm/Bytecode/Primitives.h. This is
internal implementation details for the reader, not public interfaces!
llvm-svn: 10739
This should get hunked over to the Sparc backend, along with
MachineCodeForInstruction and a bunch of files in include/llvm/Codegen,
but those battles will have to wait for a later time.
llvm-svn: 10731
of the register allocator as follows:
before after
mesa 2.3790 1.5994
vpr 2.6008 1.2078
gcc 1.9840 0.5273
mcf 0.2569 0.0470
eon 1.8468 1.4359
twolf 0.9475 0.2004
burg 1.6807 1.3300
lambda 1.2191 0.3764
Speedups range anywhere from 30% to over 400% :-)
llvm-svn: 10712
stepping, next'ing, finish'ing, stacktraces, source listings, etc. You can't
print program variables yet though.
Oh, and I lost my nice commented version of funccall.ll :(
Test with:
llvm-as funccall.ll
llvm-db funccall.bc
<arguments>
This is not automatically testable yet, and the C front-end doesn't support
debug information yet. That said, it's a start.
llvm-svn: 10689
the debugging information formats will likely change, but it's a start, and I
have to move on to other things in the short-term, so it might be a while before
I get back to working on this.
llvm-svn: 10683
turn a memory address back into the LLVM global object that starts at that
address. Note that this won't cause any additional datastructures to be built
for clients of the EE that don't need this information.
Also modified some code to not access the GlobalAddress map directly.
llvm-svn: 10674
turn a memory address back into the LLVM global object that starts at that
address. Note that this won't cause any additional datastructures to be built
for clients of the EE that don't need this information.
llvm-svn: 10673
do the following:
% cd llvm/autoconf
% aclocal
% autoconf -o ../configure
This change facilitates the following:
1) It should be easier to incorporate new autoconf macros.
2) It allows for conversion to Automake (should we ever desire it).
llvm-svn: 10655
saved register it has a longer free range than ECX (which is defined
every time there is a function call) which makes ECX a better register
to reserve.
llvm-svn: 10635
which denotes the register we would like to be assigned to (virtual or
physical). In register allocation, if this hint exists and we can map
it to a physical register (it is either a physical register or it is a
virtual register that already got assigned to a physical one) we use
that register if it is available instead of a random one in the free
pool.
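A simplified sketch of that lookup (hypothetical names; the real allocator's
data structures differ):
---
#include <map>
#include <set>

typedef unsigned Reg;
static const Reg NoReg = 0;

// Returns the physical register to prefer, or NoReg to fall back to the
// normal free pool.
Reg resolveHint(Reg Hint, bool HintIsPhysical,
                const std::map<Reg, Reg> &Virt2Phys,   // vreg -> assigned preg
                const std::set<Reg> &FreePhysRegs) {
  Reg Phys = NoReg;
  if (HintIsPhysical) {
    Phys = Hint;                                       // hint is already physical
  } else {
    std::map<Reg, Reg>::const_iterator It = Virt2Phys.find(Hint);
    if (It != Virt2Phys.end())
      Phys = It->second;                               // hinted vreg already has a preg
  }
  return (Phys != NoReg && FreePhysRegs.count(Phys)) ? Phys : NoReg;
}
---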
llvm-svn: 10634
provides for future extensibility, might help the LLVA project avoid having to
hack their own LLI, and provides support required for the experimental Venus
project.
llvm-svn: 10620
* Inline callMain function
* Remove hack from the ExecutionEngines where the 'run' method would automatically
run atExit functions. Fixing this requires explicitly calling exit if main returns
llvm-svn: 10611
with live intervals was missing registers that were used before they
were defined (in the arbitrary order live intervals numbers
instructions).
llvm-svn: 10603
that defines the symbol "main." This is a hack that ensures that programs
that place their main function in a library and then link it in
(i.e. Apache 2.x) get their main function linked in.
There is probably a more correct way to do this, but this works for now.
llvm-svn: 10594
Modified ReadArchiveBuffer() so that it dynamically allocates the
std::string object used to hold the bytecode object file's name. This is
necessary because it is passed by reference to the new Module that is
allocated to represent the bytecode object, and previously we were
using a std::string that disappeared on function exit.
llvm-svn: 10565
instruction selector by adding a new pseudo-instruction
FP_REG_KILL. This instruction implicitly defines all x86 fp registers
and is a terminator so that passes which add machine code at the end
of basic blocks (like phi elimination) do not add instructions between
it and the branch or return instruction.
llvm-svn: 10562
* Move sparc specific code out of generic code
* Eliminate the getOffset() method which made INVALID_FRAME_OFFSET
necessary, which made pulling in MAX_INT as a sentinel necessary.
llvm-svn: 10553
VM.cpp and JIT.cpp files into JIT.cpp. This also splits some nasty code out
into TargetSelect.cpp so that people hopefully won't notice it. :)
llvm-svn: 10544
more operands and the two first operands are constrained to be the
same. The pass takes an instruction of the form:
a = b op c
and transforms it into:
a = b
a = a op c
and also preserves live variables.
llvm-svn: 10512
killing instruction is tracked. This causes the LiveIntervals to
create bogus intervals. The workaound is to add a range to the
interval from the redefinition to the end of the basic block.
llvm-svn: 10510
Move some of the longer LiveIntervals::Interval method out of the
header and add debug information to them. Fix bug and simplify range
merging code.
llvm-svn: 10509
a pointer from an AliasSet, maintain the pointer values on a doubly linked
list instead of a singly linked list, to permit efficient removal from the
middle of the list.
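The usual reason for the switch, sketched with an illustrative node type (not
the AliasSetTracker classes): unlinking from a doubly linked list is O(1),
whereas a singly linked list has to be walked to find the predecessor.
---
// Each node knows its predecessor, so removal needs no list traversal.
struct PointerRec {
  PointerRec *Prev = nullptr;
  PointerRec *Next = nullptr;

  void unlink() {                  // O(1), regardless of position in the list
    if (Prev) Prev->Next = Next;
    if (Next) Next->Prev = Prev;
    Prev = Next = nullptr;
  }
};
---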
llvm-svn: 10506
implementation of a Target{RegInfo, InstrInfo, Machine, etc} now has a separate
header and a separate implementation file.
This means that instead of a massive SparcInternals.h that forces a
recompilation of the whole target whenever a minor detail is changed, you should
only recompile a few files.
Note that SparcInternals.h is still around; its contents should be minimized.
llvm-svn: 10500
a) remove opIsUse(), opIsDefOnly(), opIsDefAndUse()
b) add isUse(), isDef()
c) rename opHiBits32() to isHiBits32(),
opLoBits32() to isLoBits32(),
opHiBits64() to isHiBits64(),
opLoBits64() to isLoBits64().
This results in much more readable code; for example, compare
"op.opIsDef() || op.opIsDefAndUse()" to "op.isDef()", a pattern used
very often in the code.
llvm-svn: 10461
allocation on the X86 to add information to the machine code denoting
that our floating point stackifier cannot handle virtual floating point
registers that are alive across basic blocks. This pass adds an
implicit def of all virtual floating point registers at the end of each
basic block.
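In outline (toy types, not the actual MachineBasicBlock interface):
---
#include <list>
#include <vector>

struct MInstr { bool IsTerminator = false; bool ImplicitFPDefs = false; };
struct MBlock { std::list<MInstr> Instrs; };

// Append a pseudo-instruction that implicitly defines every virtual FP
// register just before each block's terminator, so no FP value is live
// across a basic-block boundary when the stackifier runs.
void addImplicitFPDefs(std::vector<MBlock> &Blocks) {
  for (MBlock &B : Blocks) {
    auto It = B.Instrs.begin();
    while (It != B.Instrs.end() && !It->IsTerminator)
      ++It;                                  // find the terminator (or end)
    MInstr Kill;
    Kill.ImplicitFPDefs = true;              // "defines" all virtual FP regs
    B.Instrs.insert(It, Kill);               // placed before the terminator
  }
}
---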
llvm-svn: 10446
dnl libelf is for sparc only; we can ignore it if we don't have it
AC_CHECK_LIB(elf, elf_begin)
@@ -296,46 +223,29 @@ AC_SEARCH_LIBS(mallinfo,malloc,AC_DEFINE([HAVE_MALLINFO],[1],[Define if mallinfo
dnl pthread locking functions are optional - but llvm will not be thread-safe
dnl without locks.
AC_SEARCH_LIBS(pthread_mutex_lock,pthread,AC_DEFINE(HAVE_PTHREAD_MUTEX_LOCK,1,[Define if PThread mutexes (e.g., pthread_mutex_lock) are available in the system's thread library.]))
AC_C_BIGENDIAN(AC_DEFINE([ENDIAN_BIG],[],[Define if the machine is Big-Endian]),AC_DEFINE([ENDIAN_LITTLE],[],[Define if the machine is Little-Endian]))
dnl Check for things that need to be included in public headers, and so
dnl for which we may not have access to a HAVE_* preprocessor #define.
@@ -520,18 +447,25 @@ AC_ARG_WITH(bcrepos,AC_HELP_STRING([--with-bcrepos],[Location of Bytecode Reposi
dnl Location of PAPI
AC_ARG_WITH(papi,AC_HELP_STRING([--with-papi],[Location of PAPI]),AC_SUBST(PAPIDIR,[$withval]),AC_SUBST(PAPIDIR,[/home/vadve/shared/Sparc/papi-2.3.4.1]))
dnl Location of the purify program
AC_ARG_WITH(purify,AC_HELP_STRING([--with-purify],[Location of purify program]),AC_SUBST(PURIFY,[$withval]))
dnl Get libtool's idea of what the shared library suffix is.
dnl (This is a hack; it relies on undocumented behavior.)
AC_MSG_CHECKING([for shared library suffix])
eval "SHLIBEXT=$shrext"
AC_MSG_RESULT($SHLIBEXT)
dnl Propagate it to the Makefiles and config.h (for gccld & bugpoint).
AC_SUBST(SHLIBEXT,$SHLIBEXT)
AC_DEFINE_UNQUOTED(SHLIBEXT,"$SHLIBEXT",
[Extension that shared libraries have, e.g., ".so".])
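Downstream tools can then build shared-object names from the config.h define;
a purely illustrative use (the helper below is not from the LLVM sources):
---
#include <string>

#ifndef SHLIBEXT
#define SHLIBEXT ".so"     // fallback for this sketch only; configure
#endif                     // normally supplies ".so", ".dylib", etc.

std::string sharedObjectName(const std::string &Base) {
  return "lib" + Base + SHLIBEXT;   // e.g. "foo" -> "libfoo.so"
}
---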
The '<tt>select</tt>' instruction is used to choose one value based on a
condition, without branching.
</p>
<h5>Arguments:</h5>
<p>
The '<tt>select</tt>' instruction requires a boolean value indicating the condition, and two values of the same <a href="#t_firstclass">first class</a> type.
</p>
<h5>Semantics:</h5>
<p>
If the boolean condition evaluates to true, the instruction returns the first
value argument, otherwise it returns the second value argument.
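<p>
For illustration, the semantics correspond to a C++ conditional expression
applied to two already-computed values (the function below is an example, not
part of the instruction reference):
</p>
<pre>
// Rough C++ analogue of the semantics: the condition picks which of the two
// already-computed values becomes the result, with no control flow needed.
int choose(bool Cond, int TrueVal, int FalseVal) {
  return Cond ? TrueVal : FalseVal;
}
</pre>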
<a name="newfeatures">This release implements the following new features:</a>
</div>
<ol>
<li><a
href="http://mail.cs.uiuc.edu/pipermail/llvmdev/2003-November/000528.html">A new
LLVM profiler, similar to gprof</a> is available</li>
<li><ahref="SourceLevelDebugging.html">A new LLVM source-level debugger has been started.</a></li>
<li>LLVM 1.2 encodes bytecode files for large programs in 10-30% less space.</li>
<li>LLVM can now feed profile information back into optimizers for Profile Guided Optimization, includes a simple basic block reordering pass, and supports edge profiling as well as function and block-level profiling.</li>
<li>The LLVM JIT lazily initializes global variables, reducing startup time for programs with lots of globals (like C++ programs).</li>
<li>LLVM and the C/C++ front-end now compile on Mac OS/X! Mac OS/X users can
now explore the LLVM optimizer with the C backend and interpreter. Note that
LLVM requires GCC 3.3 on Mac OS/X.</li>
<li>The build and installation infrastructure in this release is dramatically
into an 'llvm' C++ namespace</a>, for easier integration with third-party
code. Note that due to lack of namespace support in GDB 5.x, you will probably
want to upgrade to GDB 6 or better to debug LLVM code.</li>
<li>
The build system now copies Makefiles dynamically from the source tree to the
object tree as subdirectories are built. This means that:
<ol>
<li>
New directories can be added to the source tree, and the build will
automatically pick them up (i.e. no need to re-run <tt>configure</tt>).
<li>The "tblgen" tool is <ahref="TableGenFundamentals.html">now documented</a>.</li>
<li>The target-independent code generator got several improvements:
<ul>
<li>It can now fold spill code into instructions (on targets that support it).</li>
<li>A generic machine code spiller/rewriter was added. It provides an API for
global register allocators to eliminate virtual registers and add the
appropriate spill code.</li>
<li>The representation of machine code basic blocks is more efficient and has
an easier-to-use interface.</li>
</ul>
</li>
<li>
You will need to build LLVM from the top of the object tree once to ensure
that all of the Makefiles are copied into the object tree subdirectories.
</li>
</ol>
</li>
<li>A front-end for "Stacker" (a simple Forth-like language) is now
<ahref="http://llvm.cs.uiuc.edu/PR136">included in the main LLVM tree</a>.
Additionally, Reid Spencer, the author, contributed a document <ahref="Stacker.html">describing his experiences writing Stacker, and the language itself</a>. This document is invaluable for others writing front-ends targetting LLVM.</li>
<li>The <tt>configure</tt> script will now configure all projects placed in the
<tt>llvm/projects</tt> directory.</li>
<li>The <tt>-tailcallelim</tt> pass can now introduce "accumulator" variables
to transform functions in many common cases that it could not before (see the
sketch after this list).</li>
<li>The <tt>-licm</tt> pass can now sink instructions out of the bottom of loops
in addition to being able to hoist them out of the top.</li>
<li>The <tt>-basicaa</tt> pass (the default alias analysis) has been upgraded
to be <a href="http://llvm.cs.uiuc.edu/PR86">significantly more
precise</a>.</li>
<li>LLVM 1.1 implements a simple size optimization for LLVM bytecode files.
This means that the 1.1 files are smaller than 1.0, but that 1.0 won't
read 1.1 bytecode files.</li>
<li><ahref="http://llvm.cs.uiuc.edu/PR140">The gccld program produces a runner script that includes command-line options to load the necessary shared objects.</a></li>
<li><ahref="http://llvm.cs.uiuc.edu/PR253">LLVM now no longer depends on the boost library</a>.</li>
<li>The X86 backend now generates <b>substantially</b> better native code and is faster.</li>
<li>The C backend has been moved from the "llvm-dis" tool to the "llc"
tool. You can activate it with "<tt>llc -march=c foo.bc -o foo.c</tt>".</li>
<li>LLVM includes a new interprocedural optimization that marks global variables
"constant" when they are provably never written to.</li>
<li>LLVM now includes a new interprocedural optimization that converts small "by reference" arguments to "by value" arguments, which often improves the performance of C++ programs substantially.</li>
<li>Bugpoint can now do a better job reducing miscompilation problems by
reducing programs down to a particular loop nest, instead of just the function
being miscompiled.</li>
<li>The GCSE and LICM passes can now operate on side-effect-free function calls, for example hoisting calls to "<tt>strlen</tt>" and folding "<tt>cos</tt>" common subexpressions.</li>
<li>LLVM has early support for a new <a
href="LangRef.html#i_select"><tt>select</tt></a> instruction, though it is
currently only supported by the C backend.</li>
</ol>
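<p>
As a sketch of the accumulator transformation mentioned in the
<tt>-tailcallelim</tt> item above (illustrative C++, not the pass's actual
output):
</p>
<pre>
// The multiply after the recursive call blocks tail-call elimination:
int factorial(int N) {
  return N <= 1 ? 1 : N * factorial(N - 1);
}

// Introducing an accumulator makes the recursive call a true tail call,
// which can then be turned into a loop:
int factorialAccum(int N, int Acc = 1) {
  return N <= 1 ? Acc : factorialAccum(N - 1, N * Acc);
}
</pre>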
@@ -159,133 +143,113 @@ In this release, the following missing features were implemented:
</div>
<ol>
<li><ahref="http://llvm.cs.uiuc.edu/PR88">The interpreter does not support
invoke or unwind</a></li>
<li><ahref="http://llvm.cs.uiuc.edu/PR16">Exception handling in the X86
& Sparc native code generators</a> is now supported</li>
<li>The C/C++ front-end now supports the GCC <tt>__builtin_return_address</tt> and <tt>__builtin_frame_address</tt> extensions. These are also supported by the X86 backend and by the C backend.</li>
<li><ahref="http://llvm.cs.uiuc.edu/PR249">[X86] Missing cast from ULong -> Double, cast FP -> bool and support for -9223372036854775808</a></li>
the "<ahref="http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html#Labels%20as%20Values">labels as values</a>" GCC extension, often used to build "threaded interpreters".</a></li>
<li><ahref="http://llvm.cs.uiuc.edu/PR99">Interpreter does not support the
<li><ahref="http://llvm.cs.uiuc.edu/PR95">SymbolTable::getUniqueName is very inefficient</a></li>
<li><ahref="http://llvm.cs.uiuc.edu/PR97">bugpoint must not pass -R<directory> to Mach-O linker</a></li>
<li><ahref="http://llvm.cs.uiuc.edu/PR98">[buildscripts] Building into objdir with .o in it fails</a></li>
<li><ahref="http://llvm.cs.uiuc.edu/PR101">[setjmp/longjmp] Linking C programs which use setjmp/longjmp sometimes fail with references to the C++ runtime library!</a></li>
<li><ahref="http://llvm.cs.uiuc.edu/PR107">AsmParser Misses Symbol Redefinition Error</a></li>
<li><ahref="http://llvm.cs.uiuc.edu/PR108">gccld -Lfoo -lfoo fails to find ./foo/libfoo.a</a></li>
the following extensions are known to <b>not be</b> supported:
<ol>
<li><ahref="http://gcc.gnu.org/onlinedocs/gcc/Local-Labels.html#Local%20Labels">Local Labels</a>: Labels local to a block.</li>
<li><ahref="http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html#Labels%20as%20Values">Labels as Values</a>: Getting pointers to labels, and computed gotos.</li>
<li><ahref="http://gcc.gnu.org/onlinedocs/gcc/Nested-Functions.html#Nested%20Functions">Nested Functions</a>: As in Algol and Pascal, lexical scoping of functions.</li>
<li><ahref="http://gcc.gnu.org/onlinedocs/gcc/Constructing-Calls.html#Constructing%20Calls">Constructing Calls</a>: Dispatching a call to another function.</li>
<li><ahref="http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Extended%20Asm">Extended Asm</a>: Assembler instructions with C expressions as operands.</li>
<li><ahref="http://gcc.gnu.org/onlinedocs/gcc/Constraints.html#Constraints">Constraints</a>: Constraints for asm operands</li>
<li><ahref="http://gcc.gnu.org/onlinedocs/gcc/Constraints.html#Constraints">Constraints</a>: Constraints for asm operands.</li>
<li><ahref="http://gcc.gnu.org/onlinedocs/gcc/Asm-Labels.html#Asm%20Labels">Asm Labels</a>: Specifying the assembler name to use for a C symbol.</li>
<li><ahref="http://gcc.gnu.org/onlinedocs/gcc/Explicit-Reg-Vars.html#Explicit%20Reg%20Vars">Explicit Reg Vars</a>: Defining variables residing in specified registers.</li>
<li><ahref="http://gcc.gnu.org/onlinedocs/gcc/Return-Address.html#Return%20Address">Return Address</a>: Getting the return or frame address of a function.</li>
<li><ahref="http://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html#Vector%20Extensions">Vector Extensions</a>: Using vector instructions through built-in functions.</li>
<li><ahref="http://gcc.gnu.org/onlinedocs/gcc/Target-Builtins.html#Target%20Builtins">Target Builtins</a>: Built-in functions specific to particular targets.</li>
<p>The following extensions <b>are</b> known to be supported:</p>
<ol>
<li><ahref="http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html#Labels%20as%20Values">Labels as Values</a>: Getting pointers to labels and computed gotos.</li>
<li><ahref="http://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html#Statement%20Exprs">Statement Exprs</a>: Putting statements and declarations inside expressions.</li>
<li><ahref="http://gcc.gnu.org/onlinedocs/gcc/Typeof.html#Typeof">Typeof</a>: <code>typeof</code>: referring to the type of an expression.</li>
<li><ahref="http://gcc.gnu.org/onlinedocs/gcc/Lvalues.html#Lvalues">Lvalues</a>: Using <code>?:</code>, "<code>,</code>" and casts in lvalues.</li>
@@ -500,7 +494,8 @@ work:
<li><ahref="http://gcc.gnu.org/onlinedocs/gcc/Subscripting.html#Subscripting">Subscripting</a>: Any array can be subscripted, even if not an lvalue.</li>
<li><ahref="http://gcc.gnu.org/onlinedocs/gcc/Pointer-Arith.html#Pointer%20Arith">Pointer Arith</a>: Arithmetic on <code>void</code>-pointers and function pointers.</li>
<li><ahref="http://gcc.gnu.org/onlinedocs/gcc/Compound-Literals.html#Compound%20Literals">Compound Literals</a>: Compound literals give structures, unions or arrays as values.</li>
<li><ahref="http://gcc.gnu.org/onlinedocs/gcc/Compound-Literals.html#Compound%20Literals">Compound Literals</a>: Compound literals give structures, unions,
or arrays as values.</li>
<li><ahref="http://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html#Designated%20Inits">Designated Inits</a>: Labeling elements of initializers.</li>
<li><ahref="http://gcc.gnu.org/onlinedocs/gcc/Cast-to-Union.html#Cast%20to%20Union">Cast to Union</a>: Casting to union type from any member of the union.</li>
<li><ahref="http://gcc.gnu.org/onlinedocs/gcc/Case-Ranges.html#Case%20Ranges">Case Ranges</a>: `case 1 ... 9' and such.</li>
@@ -514,6 +509,7 @@ work:
<li><ahref="http://gcc.gnu.org/onlinedocs/gcc/Alternate-Keywords.html#Alternate%20Keywords">Alternate Keywords</a>:<code>__const__</code>, <code>__asm__</code>, etc., for header files.</li>
<li><ahref="http://gcc.gnu.org/onlinedocs/gcc/Incomplete-Enums.html#Incomplete%20Enums">Incomplete Enums</a>: <code>enum foo;</code>, with details to follow.</li>
<li><ahref="http://gcc.gnu.org/onlinedocs/gcc/Function-Names.html#Function%20Names">Function Names</a>: Printable strings which are the name of the current function.</li>
<li><ahref="http://gcc.gnu.org/onlinedocs/gcc/Return-Address.html#Return%20Address">Return Address</a>: Getting the return or frame address of a function.</li>
<li><ahref="http://gcc.gnu.org/onlinedocs/gcc/Unnamed-Fields.html#Unnamed%20Fields">Unnamed Fields</a>: Unnamed struct/union fields within structs/unions.</li>
<li><ahref="http://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html#Attribute%20Syntax">Attribute Syntax</a>: Formal syntax for attributes.</li>
</ol></li>
@@ -532,7 +528,7 @@ lists, please let us know (also including whether or not they work).</p>
<divclass="doc_text">
<p>For this release, the C++ front-end is considered to be fully functional, but
<p>For this release, the C++ front-end is considered to be fully functional but
has not been tested as thoroughly as the C front-end. It has been tested and
works for a number of non-trivial programs, but there may be lurking bugs.
Please report any bugs or problems.</p>
@@ -540,17 +536,14 @@ Please report any bugs or problems.</p>
// (C) Copyright Boost.org 2001. Permission to copy, use, modify, sell and
// distribute this software is granted provided this copyright notice appears
// in all copies. This software is provided "as is" without express or implied
// warranty, and with no claim as to its suitability for any purpose.
// See http://www.boost.org for most recent version.
// Mac OS specific config options:
#define BOOST_PLATFORM "Mac OS"
// If __MACH__, we're using the BSD standard C library, not the MSL:
#if defined(__MACH__)
# define BOOST_NO_CTYPE_FUNCTIONS
# define BOOST_NO_CWCHAR
# ifndef BOOST_HAS_UNISTD_H
# define BOOST_HAS_UNISTD_H
# endif
// boilerplate code:
# include <boost/config/posix_features.hpp>
# ifndef BOOST_HAS_STDINT_H
# define BOOST_HAS_STDINT_H
# endif
//
// BSD runtime has pthreads, sched_yield and gettimeofday,
// of these only pthreads are advertised in <unistd.h>, so set the
// other options explicitly:
//
# define BOOST_HAS_SCHED_YIELD
# define BOOST_HAS_GETTIMEOFDAY
# ifndef __APPLE_CC__
// GCC strange "ignore std" mode works better if you pretend everything
// is in the std namespace, for the most part.
# define BOOST_NO_STDC_NAMESPACE
# endif
#else
// We will eventually support threads in non-Carbon builds, but we do
// not support this yet.
# if TARGET_CARBON
# define BOOST_HAS_MPTASKS
// The MP task implementation of Boost Threads aims to replace MP-unsafe
// parts of the MSL, so we turn on threads unconditionally.
# define BOOST_HAS_THREADS
// The remote call manager depends on this.
# define BOOST_BIND_ENABLE_PASCAL
# endif
#endif