Notes for ra
------------

These are just my internal notes and shouldn't be taken too seriously.
Pertinent stuff will be documented properly before I'm done.

Bug: permission problem on Windows Vista does not allow the user to
save their data at the end of an Ra session (to the standard R place).

Bug: A jit() at the top level in a file that is "sourced" from a file
that is already sourced (i.e. in a _nested_ sourced file) will
incorrectly give the message: 
ignored jit(0) in eval.with.vis because eval.with.vis is already jitting

eval.c is now split into several files: 
$ wc -l *eval*.c
   344 eval.c           core of the R evaluator
  1675 evalbc.c         luke's byte code compiler
   588 evalif.c         if and loops
   290 evalprof.c       R profiler 
   556 evaljit.c        jit evaluator
  1624 evaljit1.c       jit instructions
  1415 eval1.c          everything else that was in eval

I added "#define RNIL R_NilValue" to jit.h for readability.

A note on volatile SEXPs:
The volatile declarations in the original loop code in eval.c looked wrong
to me: I read "volatile SEXP p" as "volatile SEXPREC *p" (pointer to
volatile data), but the intended use needs "SEXPREC * volatile p",
i.e. the pointer is volatile, not the data it points to.
Caveat: that reading would be right if SEXP were a macro, but SEXP is a
typedef for "SEXPREC *", so "volatile SEXP p" already declares a volatile
pointer and the original declarations were probably fine.
I changed these in the code I moved to loops.c but there are still other
places in eval that use the original form.  In practice it seems to make
no difference --- gcc seems to treat both kinds of pointers the same,
which is what you would expect if both forms declare the same type.

A note on CSEXPs
CSEXP is defined as "const SEXP".  Because SEXP is a typedef this is
"SEXPREC * const p" (a constant pointer), but CSEXPs are used to mean
"const SEXPREC *p" (a pointer to constant data).  Declaring them as what
they really mean would cause a great many compiler warnings.  CSEXP shows
the programmer's intent, meaning "this SEXP is not modified in this routine".
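
Where a qualifier lands on a pointer typedef is easy to get wrong, so here
is a minimal C11 sketch that asks the compiler directly via _Generic.  It
uses a stand-in SEXPREC, not the real R headers, and the macro and function
names are made up for illustration.

```c
#include <assert.h>

/* Stand-ins for the real R definitions.  In R, SEXP is a typedef for a
   pointer to SEXPREC; the struct body here is only a placeholder. */
typedef struct SEXPREC { int placeholder; } SEXPREC;
typedef SEXPREC *SEXP;

/* Classify a declaration by where the qualifier sits:
   1 = the pointer itself is qualified, 2 = the pointed-to data is. */
#define CONST_KIND(p) _Generic(&(p),     \
    SEXPREC *const *: 1,                 \
    const SEXPREC **: 2,                 \
    default: 0)

#define VOLATILE_KIND(p) _Generic(&(p),  \
    SEXPREC *volatile *: 1,              \
    volatile SEXPREC **: 2,              \
    default: 0)

/* Qualifier applied to the typedef name. */
int const_typedef_kind(void)    { const SEXP p = 0;        return CONST_KIND(p); }
int volatile_typedef_kind(void) { volatile SEXP p = 0;     return VOLATILE_KIND(p); }

/* Qualifier spelled out in front of the struct name. */
int const_spelled_kind(void)    { const SEXPREC *p = 0;    return CONST_KIND(p); }
int volatile_spelled_kind(void) { volatile SEXPREC *p = 0; return VOLATILE_KIND(p); }
```

The typedef forms classify as 1 (qualified pointer) and the spelled-out
forms as 2 (qualified data), so "const SEXP" and "const SEXPREC *" are
different types.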

Nonlinear time increase with time-jit-lib.R:for.if()
N <- 2e7
test(`for.if`, N / 5000)   .13 secs
test(`for.if`, N / 3000)   .37
test(`for.if`, N / 2000)   .72
test(`for.if`, N / 1000)  3.5  "should be" twice .72

`for.if` <- function(N)
{
    jit(jit.flag, TRACE.FLAG)
    iA <- seq(2,N); x <- double(N)
    for (i in iA) {
        if (i %% 2) 
            x <- x + 1
        else
            x <- x + 100
    }
    x
}
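
One possible reading of the timings (my assumption, not something measured
separately): each pass through the loop does "x <- x + 1" on a vector of
length N, and there are about N passes, so the work grows roughly as N^2
rather than N.  A minimal sketch of that model, with a hypothetical function
name:

```c
#include <assert.h>

/* Predict the run time at size n from a measured time t0 at size n0,
   assuming the work grows as n^2 (about n iterations, each touching
   all n elements of x).  This is a model, not a measurement. */
double predict_quadratic(double t0, double n0, double n)
{
    double ratio = n / n0;
    return t0 * ratio * ratio;
}
```

With N = 2e7 the four test sizes are 4000, about 6667, 10000 and 20000.
Starting from .13 secs at 4000, the model predicts about 0.36, 0.81 and 3.25
secs, against the measured .37, .72 and 3.5.  Under this model doubling the
size costs four times as much, not twice.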

Jitted assign always uses copy on assign.  There may be a cheaper method.

In for.if, the copy on assign may kill the JIT speed advantage.

This is out of date but I will use it as a basis for correct documentation:
// We don't generate JIT instructions for for and while loops, but
// instead achieve efficiency by using the fact that the environment is
// stable while jitting.  Also we don't do NA handling and other checks,
// which helps make the code faster.
//
// Actually the above is not the whole truth for "for" loops.  For these
// there is a possible additional stage of jitting: if the whole  body was
// JIT compiled we then generate JIT_for_i and JIT_eval instructions for
// the loop header. Like this:
//
//   for_i   operand=indexBindingLoc
//   arg     operand=rhs (rhs is the evaluated rhs)
//   evaljit operand=JITTED(body)
//
// This only kicks in for the inner loop(s) of nested loops. Thus in
//       for (i in 1:3) for (j in 1:3) body
// the j loop goes through this additional stage of jitting but
// not the i loop.  Note that this does not happen for loop bodies
// with break or next because a break or next inhibits compilation
// of the entire loop.
//
// For "for" loops, we don't need to set the jitted bit of sym because
// the index is precalculated and thus subsequent changes to the index don't
// affect the loop condition.
// Which means that jitFor could be used for standard R (except
// when condition is a VECSXP or LISTSXP because condtype changes for these).

"exp" followed by an plus (or other?) arithmetic op is much slower under
standard R execution on Ra versus R 2.6.2.  See this e.g. in
looped.dnorm.  I don't know why; it appears to be calling the same library
function "exp" as standard R. This doesn't affect the other math1 functions
that I tried.  It affects expm1 too. It also happens if I use the installed
Ra i.e. with identical blas.dll etc. as R 2.6.2.  Example:
    for (i in 1:N)           Ra slowdown compared to R
#        x <- exp(x) / sigma 60%
#        exp(x)               2%
#        x <- exp(x)          6%
#        x <- x / sigma       3%
#        x <- exp(x)          3%
#        x <- exp(x) / 1     60%
#        exp(x) / 1          60%
#        exp(0) / 1          60%
#        exp(1) / 1          60%
#        exp(1)               3%
#        2.71 / 1             5%
#        y / 1                5%
#        exp(1) + 1          60%
#        exp(1) + 0          60%
#        sin(1) + 0           3%
#        log(1) + 0           3%
#        expm1(10) + 0       60%

NAMED handling of subscripted assignment could be flaky.

Bug?: we don't install the help for ra when we load ra at bootup.
Means the user has to use library(jit) to get ra help.

Forced promises would be evaluated more quickly in standard R if we
replaced the PROMSXP with the evaluated promise.  Do this in
eval.c:forcePromiseAux.  A problem is that when unparsing you would
see the evaluated promise value rather than the original 
expression --- does this matter?

OBSOLETE In nested loops, the inner loop only gets jitted on the
second pass through the outer loop.  Which is a pity.  To fix requires
modifying jitProlog so we can restart a compile.

Standard R "for" is needlessly slow I think -- could use same
trick as jitWhile, because typeof condition is always constant.

Ra icon Times New Roman Bold 22pt 32x32 pixels 256 colors 
Blue color is R 85 G 127 B 212
Ra bitmap was created with Times New Roman Bold (32x32 pix 36pt)
I tried to create a small 16x16 pix 11pt icon as well but it displayed
as gray in the quick start menu --- why?  So now we let Windows
create the 16x16 quick start item automatically from the 32x32
icon, which doesn't look that good.

Jitted versions of functions like do_matprod (used for %*%) could maybe
be faster in a jitted block because they don't have to do type checking.
Likewise functions like any.

Think about this? Could create "alternative jit funcs" that will be
called in lieu of the original func if the original func appears in a
jitted expression.  The jit version of the func could be faster
because would not have to do type checking.

Jitted code has same syntax as standard R but with some
semantic restrictions.  This eliminates some confusion caused by
approaches like Pyrex e.g. in Pyrex you expect "++x" to behave like
C.  But it doesn't, because it gets parsed in the Python way as
"+(+x)" i.e. as two unary pluses.

The principle of "First do no harm" implies giving an error msg "cannot
jit this piece of code" rather than producing incorrect code.
Be conservative --- if in doubt call disallowIfJitting().

The jitter is profligate with memory.  It allocates all buffers that
are necessary in a jit block and holds them while the block is active.
This could be reduced considerably by reusing buffers with a fast
register allocation algorithm where the "registers" are memory
buffers. Or an even simpler approach because we can allocate if
necessary instead of spill.
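
A minimal sketch of the "allocate if necessary instead of spill" idea: a
small pool where a released buffer is handed out again if it is big enough,
and a fresh allocation happens only when no free buffer fits.  All names and
the fixed pool size are hypothetical; this is not how the jitter currently
manages its buffers.

```c
#include <assert.h>
#include <stdlib.h>

#define POOL_SLOTS 8          /* hypothetical fixed pool size */

typedef struct {
    double *data;             /* storage, NULL until first allocated */
    size_t  capacity;         /* number of doubles the buffer can hold */
    int     in_use;           /* nonzero while a jitted op owns the buffer */
} PoolSlot;

static PoolSlot pool[POOL_SLOTS];
static int pool_allocations;  /* counts calls that really allocated */

/* Return the index of a buffer holding at least n doubles, or -1.
   Prefer a free buffer that is already big enough; otherwise grow the
   first free slot.  There is no spilling: if nothing fits we allocate. */
int pool_acquire(size_t n)
{
    int i, spare = -1;
    double *p;

    if (n == 0)
        return -1;
    for (i = 0; i < POOL_SLOTS; i++) {
        if (pool[i].in_use)
            continue;
        if (pool[i].data != NULL && pool[i].capacity >= n) {
            pool[i].in_use = 1;
            return i;         /* reuse without allocating */
        }
        if (spare < 0)
            spare = i;        /* remember the first free slot */
    }
    if (spare < 0)
        return -1;            /* every slot is busy */
    p = realloc(pool[spare].data, n * sizeof *p);
    if (p == NULL)
        return -1;
    pool[spare].data = p;
    pool[spare].capacity = n;
    pool[spare].in_use = 1;
    pool_allocations++;
    return spare;
}

/* Mark a buffer as free for reuse; its storage is kept. */
void pool_release(int i)
{
    if (i >= 0 && i < POOL_SLOTS)
        pool[i].in_use = 0;
}
```

Acquiring 100 doubles, releasing, then acquiring 50 returns the same slot
with only one real allocation.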

R_RestartToken in applyClosure. How does this affect push_jit_state?

Check that jit handling is ok if R function invokeRestart and friends
are called.

Are there tests over and above make check-all that I should be doing?

Subsetting eg x$a is not yet supported

Compilation of this expression is inefficient:
    { x <- x + 5 + foo(x) + 6 + 7 }
Because the final + in the 6+7 is the top of the parse tree
#   Did not compile 0x1d8ef8c { x <- x + 5 + foo(x) + 6 + 7 ... }
#   Did not compile 0x1d8efa8 x <- x + 5 + foo(x) + 6 + 7
#   Did not compile 0x1d8f018 x + 5 + foo(x) + 6 + 7
#   Did not compile 0x1d8f088 x + 5 + foo(x) + 6
#   Did not compile 0x1d8f0dc x + 5 + foo(x)
#   Compiled x + 5

Should ForwardJitRecord be part of switch macro in memory.c
rather than through ForwardJitNodes?

You have to take care of lots and lots of devilish details in an R jit
compiler.  I can see why no one has seriously tried writing a jit
compiler for R before.

Psyco has 41000 lines of C and Python code (32000 .c .h,  9000 .p*)

Should combine common sequences like "push push add" into one opcode.
Or push push.  But will possibly cause combinatorial explosion.
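
To make the idea concrete, here is a toy stack machine with a fused
"push push add" opcode and a peephole pass that rewrites the three-opcode
sequence into it.  The opcodes and struct layout are invented for this
sketch and are not the real JIT_OP definitions; there is no stack overflow
checking.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { OP_PUSH, OP_ADD, OP_PUSH_PUSH_ADD, OP_END } Opcode;

typedef struct {
    Opcode op;
    double a, b;   /* a for OP_PUSH; a and b for the fused opcode */
} Instr;

/* Run a program terminated by OP_END.  Returns the value on top of the
   stack and stores the number of dispatched instructions in *ndispatch. */
double run(const Instr *prog, int *ndispatch)
{
    double stack[16];
    int sp = 0;                         /* index of the next free slot */

    *ndispatch = 0;
    for (; prog->op != OP_END; prog++) {
        (*ndispatch)++;
        switch (prog->op) {
        case OP_PUSH:          stack[sp++] = prog->a;               break;
        case OP_ADD:           sp--; stack[sp - 1] += stack[sp];    break;
        case OP_PUSH_PUSH_ADD: stack[sp++] = prog->a + prog->b;     break;
        case OP_END:           break;
        }
    }
    return sp > 0 ? stack[sp - 1] : 0.0;
}

/* Peephole pass: rewrite each "push push add" as one fused instruction,
   in place.  Returns the new program length, not counting OP_END. */
size_t fuse(Instr *prog)
{
    size_t in = 0, out = 0;

    while (prog[in].op != OP_END) {
        if (prog[in].op == OP_PUSH && prog[in + 1].op == OP_PUSH &&
            prog[in + 2].op == OP_ADD) {
            Instr fused = { OP_PUSH_PUSH_ADD, prog[in].a, prog[in + 1].a };
            prog[out++] = fused;
            in += 3;
        } else {
            prog[out++] = prog[in++];
        }
    }
    prog[out].op = OP_END;
    return out;
}
```

The fused program gives the same answer with one dispatch instead of three.
The combinatorial worry is visible even here: every operand pattern and
every operator would need its own fused opcode.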

There are some things that I'm doing that could (should?) be merged
into the existing R framework --- e.g. jit state should be part of the
context rather than a separate jit state stack.  Calls to
pop_jit_state are fragile -- they won't happen if we exit a function
other than by falling off the end or by return -- when can this
happen -- long jump or exceptions in R?  Check out try too.
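
The fragility can be reproduced in a few lines of plain C.  The sketch below
uses stand-in push/pop functions and a single jmp_buf in place of R's
contexts; it only illustrates the failure mode and one possible repair
(record the depth before the call and restore it in the handler), which is
an assumption about how context-based handling might work.

```c
#include <assert.h>
#include <setjmp.h>

static int jit_depth;      /* stand-in for the depth of the jit state stack */
static jmp_buf handler;    /* stand-in for an enclosing R context */

static void push_state(void) { jit_depth++; }
static void pop_state(void)  { jit_depth--; }

/* A callee that pushes state and may then leave by longjmp,
   in which case its own pop is never reached. */
static void callee(int fail)
{
    push_state();
    if (fail)
        longjmp(handler, 1);
    pop_state();
}

/* Depth seen after the call when nothing repairs the stack. */
int depth_after_unrepaired(int fail)
{
    jit_depth = 0;
    if (setjmp(handler) == 0)
        callee(fail);
    return jit_depth;
}

/* Same call, but the handler restores the depth recorded beforehand. */
int depth_after_repaired(int fail)
{
    int saved;

    jit_depth = 0;
    saved = jit_depth;     /* set before setjmp and not modified afterwards */
    if (setjmp(handler) == 0)
        callee(fail);
    else
        jit_depth = saved; /* non-local exit: undo the skipped pop */
    return jit_depth;
}
```

After a longjmp the unrepaired version is left with a depth of 1, while the
repaired version is back at 0.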

Front end parser could help optimize expressions like x[i] = x[i] + y[i]
i.e. recognize "+=" constructs and notify jitter.
This is one step more than common subexp recognition because we want
to see that lhs shares an address with rhs.  Actually, jitter could
probably figure it out, but more efficient to do in parser?  User
doesn't need to know about it, he would just get faster code.
Note added later: difficult for jitter to do because of parsing order, 
x[i] on rhs is at bottom of parse tree, not "next to" x[i] on lhs.

TAG field of LANGSXPs is unused so could be used to store file:linenbr
info.  Evidence: running make check-all with
    assert(TAG(e) == R_NilValue)
in evalSYM is ok.  This would be a nice addition to standard R i.e.
give the user the file and linenumber in error messages.

For displaying segmentation violations and other crashes use Dr Mingw:
http://jrfonseca.dyndns.org/projects/gnu-win32/software/drmingw/index.html

Compilation is much faster if -O3 is disabled i.e. OPTFLAGS=-Wall -pedantic
so edit gnuwin32\Makefile to use the above OPTFLAGS when environment
variable DEBUG is set.

To jit non-loop code would need to retain jitted code --- no unjit on
jit(0) --- and keep an associated list of preconditions.  When
re-enter jitted code first check if preconditions are satisfied; if so
use the jitted code, if not then re-jit.

Must update serialize.c for JIT code --- probably not needed?  But an
assertion error msg if someone tries to serialize jit code would be
good.

OBSOLETE For speed, will have to concatenate adjacent jitted
expressions in basic blocks

Fastest evalJIT would be stackless.  To implement use shadow stack at
compilation time cf Ertl's paper on translating Forth to C.

Allocating an ans field in JIT_OP always uses more memory than is
needed and is therefore slower --- rather keep a pool of temp
variables

Replace 1 elem const operands with immediate value

Serialization of a JIT -- serialize as usual but with "original" field

Should pre type convert constants when genning jit expression, for speed

SEXP attributes -- must write test code

genjit_subset repeats the same sequence of tests as do_subset_dflt,
which is inefficient

genjit_subset and genjit_binary take quite different approaches.  Is
it possible to make these more uniform for simplicity? But they must
be efficient too.

DONE Make sure I conform to convention with attribute_hidden and Rf_
prefixes etc.  Add to Rinternals.h? #define evalJIT Rf_evalJIT

jits array should be a linked list, so no MAX_JITS?

Conform code layout to standard --- I'm using epsilon not emacs so may
be some diffs.

Could try to order stack manipulations to reduce Pentium (or
whatever) instruction order dependencies.  I think that means that
istack should point to one beyond currently active stack elements.
Right now it points to currently active stack elements.
  e.g. { stack[istack-1] = ans; istack-- } can be executed in parallel
  but  { stack[--istack] } can't
But maybe this only helps push instructions

Unroll for loops in evalJIT if initial n > CONST?

Doc: use library funcs not R funcs in exec_jit, for speed, e.g. we use
pow not R_pow

In evalJIT move common opcodes close together for better processor
cache use?  Compiling by copying C snippets into the jit execution
buffer would make this unneeded.

Note that removing the default in evalJIT does not make the switch faster

Revisit ForwardJitNodes and gc handling.  Must take care of garbage
collection ageing here (for original and ans)?

If compiler was aware of jitting could perhaps alloc a jit_record only
of the size we need



Assertions
----------

This section tries to justify why I added Defn.h:assert().  The assert
macro calls assertFail which calls error with a standard message:

int assertFail(const char *fname, const unsigned nline, const char *exp)
{
    error(_("internal assertion failed in file %s line %u: %s\n"
            "Please save your data and quit the R session."),
          fname, nline, exp);

    return 0;
}
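
For reference, one way a macro can be wired to a function of this shape.
This is a standalone sketch, not the actual Defn.h text: RA_ASSERT,
assert_fail_sketch and report_error are hypothetical names, and
report_error just records the message where the real code calls error()
(which does not return).  The int return value is what lets the function
sit on the right of "||" in an expression.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char last_error[256];   /* stand-in: what error() would have reported */

/* Stand-in for R's error(); the real one does not return. */
static void report_error(const char *fname, unsigned nline, const char *exp)
{
    snprintf(last_error, sizeof last_error,
             "internal assertion failed in file %s line %u: %s",
             fname, nline, exp);
}

/* Same shape as assertFail: returns int so it can be used in an expression. */
int assert_fail_sketch(const char *fname, unsigned nline, const char *exp)
{
    report_error(fname, nline, exp);
    return 0;
}

/* Hypothetical macro: the right-hand side runs only when exp is false. */
#define RA_ASSERT(exp) \
    ((void)((exp) || assert_fail_sketch(__FILE__, __LINE__, #exp)))

/* Returns the recorded message, empty if the assertion held. */
const char *check_positive(int n)
{
    last_error[0] = '\0';
    RA_ASSERT(n > 0);
    return last_error;
}
```

check_positive(1) leaves the message empty; check_positive(-1) records a
message containing the failed expression text "n > 0".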

This particular way of skinning the cat makes sense for the following
reasons.

Standard assert in assert.h aborts the R session which is too drastic.

Using error for assertions as has been commonly done conflates program
errors and user errors.  The result is that every man does what is
right in his own eyes for inconsistent conditions in the code and
there is no standard msg to the user.

Examples:

envir.c      error("first argument ('table') not of type VECSXP, from R_HashResize");

memory.c:    warning("R_AllocStringBuffer(-1) used: please report");

dataentry.c: error("X11 fatal IO error: please save work and shut down R");

memory.c:    naked abort();

platform.c:  error(_("internal out-of-memory condition"));

util.c:      error(_("unimplemented type (%d) in '%s'\n"), t, s);

system.c:    "Fatal error: %s\n"  (in R_Suicide)

saveload.c:
    #ifdef NDEBUG
    #define R_assert(e) ((void) 0)
    #else
    #define R_assert(e) ((e) ? (void) 0 : \
                         error("assertion `%s' failed: file `%s', line %d\n", \
                               #e, __FILE__, __LINE__))
    #endif /* NDEBUG */

There was discussion in the mailing lists with no clear resolution.
https://stat.ethz.ch/pipermail/r-devel/2007-November/047402.html

Unloading the package which issues an assert was suggested.  But that
makes it harder for users to save their data.  Instead it is better to
just issue the assertion fail message and let users decide what they
want, which is consistent with the R approach -- i.e. let the user
have the power.

The programmer can call assertFail() directly when he wants.

The macro is called assert and not R_assert so if people use assert in
packages etc. they call our R assert.  If it was called R_assert then
package code that also runs in other environments would have to
finagle their calls to R_assert to make them calls to assert.

Note that regex.c, malloc.c, extra/trio/*.c and
gnuwin32/front-ends/*.c include the standard C <assert.h>

The term "quit" is used in the assert message for consistency
with the R q() and quit() commands.

Is Defn.h the best place for R assert()?

Add fflush to assert?


Profiling results
-----------------

For a Microsoft debug build with inlining disabled, Dec 2007

jitflag = 11
convolve <- function(a,b)
{
    jit(jitflag)
    na <- length(a)
    nb <- length(b)
    ab <- numeric(na + nb)
    for(i in 1:na)
        for(j in 1:nb)
             ab[i + j] <- ab[i + j] + a[i] * b[j]
    ab
}
N = 400
a <- double(N)
b <- double(N)
convolve(a, b)


        Func          Func+Child           Hit
        Time   %         Time      %      Count  Function
---------------------------------------------------------
     267.126   7.4     3529.398  98.1  1645166 _Rf_eval (eval.obj)
     209.745   5.8     3014.219  83.8   160000 _applydefine (eval.obj)
     195.438   5.4      195.438   5.4  2463005 _Rf_findVarInFrame3 (envir.obj)
     185.937   5.2      809.673  22.5   319998 _evalJIT (jit.obj)
     159.204   4.4      582.883  16.2  1774630 _Rf_evalSYM (eval.obj)
     136.254   3.8      136.254   3.8  5083505 _Rf_protect (memory.obj)
     116.984   3.3      157.690   4.4   483672 _Rf_install (names.obj)
     112.328   3.1      156.699   4.4  2127629 _Rf_cons (memory.obj)
     111.993   3.1      111.993   3.1  2115137 _Rf_length (util.obj)
     101.169   2.8     3487.506  97.0   492451 _evalLANG_SPECIAL (eval.obj)
      85.225   2.4      255.903   7.1  1774710 _Rf_findVar (envir.obj)
      80.111   2.2      572.316  15.9  1439991 _pushSYM_with_check (jit.obj)
      73.270   2.0       73.270   2.0  3131549 _Rf_unprotect (memory.obj)
      70.706   2.0      219.315   6.1   160000 _VectorAssign (subassign.obj)
      56.757   1.6       56.757   1.6  1661463 _R_CheckStack (errors.obj)
      56.247   1.6     3529.113  98.1   498349 _evalLANG (eval.obj)
      54.015   1.5       54.015   1.5  2603449 _Rf_isNull (util.obj)
      51.948   1.4      220.891   6.1   160000 _replaceCall (eval.obj)
      50.520   1.4      551.252  15.3   163893 _Rf_DispatchOrEval (eval.obj)
      48.026   1.3     3112.047  86.5   320972 _do_set (eval.obj)
      47.325   1.3       57.709   1.6  1127418 _SETCAR (memory.obj)
      47.026   1.3       47.026   1.3  1776566 _Rf_isEnvironment (util.obj)
      45.592   1.3       90.888   2.5   322310 _Rf_defineVar (envir.obj)
      42.049   1.2      376.044  10.5   163890 _Rf_evalListKeepMissing (eval.obj)
      38.836   1.1       38.836   1.1   483672 _R_Newhashpjw (envir.obj)
      38.414   1.1      257.044   7.1   498349 _Rf_findFun (envir.obj)
      37.793   1.1       37.793   1.1  1439991 _inc_stack_with_check (jit.obj)
      34.285   1.0      327.281   9.1   160000 _do_subassign_dflt (subassign.obj)
      33.001   0.9       83.491   2.3   480836 _R_GetGlobalCache (envir.obj)
      32.240   0.9     3454.489  96.1      403 _do_for (eval.obj)
      30.355   0.8       33.655   0.9   173678 _Rf_allocVector (memory.obj)
      29.269   0.8       29.269   0.8   481791 _hashIndex (envir.obj)
      25.645   0.7       34.147   0.9     8979 _GetNewPage (memory.obj)
      24.670   0.7      108.180   3.0   480836 _findGlobalVar (envir.obj)
      24.421   0.7       40.060   1.1   504440 _Rf_CreateTag (coerce.obj)
      24.041   0.7       24.041   0.7   990752 _SET_TAG (memory.obj)
      22.212   0.6       54.516   1.5   162019 _Rf_makeSubscript (subscript.obj)
      22.159   0.6      110.144   3.1   487371 _evalPROM (eval.obj)
      22.110   0.6      141.671   3.9   160000 _evalseq (eval.obj)
      21.279   0.6       21.279   0.6   480836 _R_HashGet (envir.obj)
      20.605   0.6       53.263   1.5   160106 _Rf_list2 (util.obj)
      20.130   0.6       44.096   1.2   160000 _SubAssignArgs (subassign.obj)
      19.839   0.6       19.839   0.6   639996 _DCHECK_binop_x1_y1 (jit.obj)
      19.655   0.5       19.655   0.5   283159 _old_to_new (memory.obj)
      19.294   0.5      344.319   9.6   160000 _assignCall (eval.obj)
      18.110   0.5       18.110   0.5   479997 _DCHECK_subset_x_y1 (jit.obj)
      17.875   0.5       27.146   0.8   320969 _SET_FRAME (memory.obj)
      17.366   0.5       17.366   0.5   645925 _Rf_isSymbol (util.obj)
      16.798   0.5       16.798   0.5        1 _gen_tempname (mkdtemp.obj)
      16.707   0.5       17.517   0.5   502007 _Rf_getAttrib (attrib.obj)
      16.643   0.5       16.643   0.5   500885 _SETCDR (memory.obj)
      16.452   0.5       16.950   0.5        1 _Init_R_Variables (platform.obj)
      16.272   0.5       16.306   0.5        5 _resolveNativeRoutine (dotcode.obj)
      16.243   0.5       16.243   0.5   665780 _vmaxget (memory.obj)
      16.196   0.5       38.705   1.1   161493 _setVarInFrame (envir.obj)
      16.041   0.4       34.442   1.0   160000 _Rf_unbindVar (envir.obj)
      15.988   0.4       20.287   0.6   168523 _Rf_begincontext (context.obj)
      15.365   0.4       19.588   0.5      936 _do_subset3 (subset.obj)
      15.331   0.4       58.404   1.6   160008 _Rf_allocList (memory.obj)
      15.128   0.4       15.128   0.4       10 _adler32 (adler32.obj)
      14.952   0.4       14.952   0.4   498349 _inc_jit_unresolved (eval.obj)
      14.903   0.4       14.903   0.4   497252 _Rf_check_stack_balance (eval.obj)
      14.763   0.4       14.763   0.4   332821 _Rf_isVector (util.obj)
      13.827   0.4       16.168   0.4   165954 _Rf_mkPROMISE (memory.obj)
      13.558   0.4       91.745   2.6   160076 _Rf_lang3 (util.obj)
      13.471   0.4       13.471   0.4   479997 _CHECK_subscript_i (jit.obj)
      13.282   0.4       13.282   0.4   329227 _Rf_isString (util.obj)
      12.712   0.4       12.712   0.4   160000 _SubassignTypeFix (subassign.obj)
      12.192   0.3       12.192   0.3   322296 _Rf_isLanguage (util.obj)
      11.761   0.3       20.238   0.6   160000 _R_SetVarLocValue (envir.obj)
      11.065   0.3       11.065   0.3   497257 _vmaxset (memory.obj)
      10.773   0.3       25.078   0.7   162019 _Rf_isMatrix (util.obj)
      10.494   0.3       10.494   0.3        1 _isMethodsDispatchOn (objects.obj)
       9.945   0.3       21.244   0.6   162019 _Rf_vectorSubscript (subscript.obj)
       9.733   0.3       92.020   2.6   160000 _EnsureLocal (eval.obj)
       9.382   0.3        9.382   0.3   323635 _R_Reprotect (memory.obj)
       9.354   0.3       41.827   1.2        5 _do_dotcall (dotcode.obj)
       9.087   0.3        9.087   0.3   160000 _RemoveFromList (envir.obj)
       8.972   0.2      876.181  24.4   160000 _do_subassign (subassign.obj)
       8.784   0.2        8.860   0.2   168523 _Rf_endcontext (context.obj)
       8.000   0.2       17.324   0.5   160396 _Rf_lcons (util.obj)
       7.894   0.2       15.459   0.4   160000 _R_findVarLocInFrame (envir.obj)
       7.570   0.2        7.570   0.2   160024 _findVarLocInFrame (envir.obj)
       7.270   0.2       11.299   0.3   162019 _int_vectorSubscript (subscript.obj)
       6.785   0.2        6.785   0.2     5032 _SET_PRENV (memory.obj)
       6.723   0.2       45.428   1.3   161493 _Rf_setVar (envir.obj)
       6.321   0.2        7.052   0.2     3981 _ReleasePage (memory.obj)
       5.676   0.2        5.676   0.2   165032 _SET_PRVALUE (memory.obj)
       5.576   0.2      381.621  10.6   163891 _evalArgs (eval.obj)
       5.472   0.2       15.281   0.4   160284 _Rf_list1 (util.obj)
       5.389   0.1        8.982   0.2     9405 _Rm_malloc (malloc.obj)
       5.191   0.1        5.191   0.1      847 _stdout_fflush (connections.obj)
       5.013   0.1        5.013   0.1   160000 _R_GetVarLocSymbol (envir.obj)
       4.152   0.1       11.204   0.3        1 _TryToReleasePages (memory.obj)
       3.997   0.1        3.997   0.1   166045 _Rf_isObject (util.obj)
       2.823   0.1       15.734   0.4        7 _RunGenCollect (memory.obj)
       2.367   0.1       18.117   0.5     7696 _inflate (inflate.obj)
       2.241   0.1        2.241   0.1     5124 _tmalloc_large (malloc.obj)
       1.757   0.0        2.735   0.1     4057 _Rf_matchArgs (match.obj)
       1.547   0.0       59.411   1.7     4802 _Rf_evalList (eval.obj)
       1.524   0.0        1.524   0.0        1 _SortNodes (memory.obj)
       1.402   0.0     3528.979  98.1     4057 _Rf_applyClosure (eval.obj)
       1.363   0.0        2.270   0.1    16313 _R_ProcessEvents (system.obj)
       1.257   0.0        3.901   0.1    16296 _R_CheckUserInterrupt (errors.obj)


Pros of my bottom up approach
-----------------------------

  Simple in that there are no changes to front end and changes to existing
  code are minimal, except to eval.c and envir.c.

  Semantics of code generation driven by existing R code, so
  semi-automatically deals with vagaries of standard R.
  There is some type conversion messiness in arithmetic.c
  which we take care of in the genjit*() functions.

  Can mix and match jitted and standard code, allows incremental development
  with useful results all along.

  Only susceptible to changes to R code that is jitted --- i.e. most new
  extensions to R won't affect jitter.


Cons of my approach
-------------------

  Don't use a standard computer science parsing recipe, makes it less easy
  to understand (maybe).

  Following breadcrumbs dropped by the evaluator can lead you into the woods.



Things that can be done to speedup standard R, maybe
----------------------------------------------------

R_CheckStack() shouldn't use real arithmetic

Don't need both R_CheckStack and R_EvalDepth, R_CheckStack should suffice?

Use R_INLINE in arithmetic.c and elsewhere eg real_binary()
Probably unnecessary --- gcc seems to inline anyway when needed.

__asm__ ( "fninit" ) is unneeded for every eval, just use it on
LANGSXP and the like.  Is there an R doc on specifically why it was
put in apart from the generally known idea that Windows libraries can
fox up float state?  (fninit == all errors are masked, 64-bit
mantissa and rounding are selected). It uses 12-22 clock cycles
roughly equivalent to FDIV, not that bad.

eval.c: evalself():
    if (NAMED(e) != 2)        is "if" necessary?
        SET_NAMED(e, 2);

eval.c: eval():
    if (++evalcount > 100)    better pipelining with evalcount++
                              i.e. not ++evalcount

In eval() use a call table indexed on SXPTYPE rather than a switch
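
A minimal sketch of what a call table indexed on type looks like next to
the equivalent switch.  The node types and handlers are invented and far
simpler than the real SEXPTYPEs.  Whether the table is actually faster is
untested here: compilers may already turn a dense switch into a jump table,
and the table adds an indirect call.

```c
#include <assert.h>

/* Hypothetical, reduced set of node types (not the real SEXPTYPE values). */
typedef enum { T_NIL, T_SYM, T_LANG, T_REAL, T_COUNT } NodeType;

typedef struct { NodeType type; double value; } Node;

typedef double (*EvalFn)(const Node *);

/* Placeholder handlers standing in for the per-type evaluators. */
static double eval_nil(const Node *n)  { (void)n; return 0.0; }
static double eval_sym(const Node *n)  { return n->value + 1.0; }
static double eval_lang(const Node *n) { return n->value * 2.0; }
static double eval_self(const Node *n) { return n->value; }

/* Call table indexed directly on the node type. */
static const EvalFn eval_table[T_COUNT] = {
    [T_NIL]  = eval_nil,
    [T_SYM]  = eval_sym,
    [T_LANG] = eval_lang,
    [T_REAL] = eval_self
};

double eval_by_table(const Node *n)
{
    return eval_table[n->type](n);
}

/* The same dispatch written as a switch, for comparison. */
double eval_by_switch(const Node *n)
{
    switch (n->type) {
    case T_NIL:  return eval_nil(n);
    case T_SYM:  return eval_sym(n);
    case T_LANG: return eval_lang(n);
    case T_REAL: return eval_self(n);
    default:     return 0.0;
    }
}
```

Both functions give the same result for every type, so the table could be
swapped in and timed without changing behaviour.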



What arithmetic.c does with logicals
------------------------------------

intsym = as.integer(1)
logsym = TRUE
realsym = 9                                       is        want(?)
jit(11); for (i in 1:3) intsym+TRUE             |add_i1_i1
# jit(11); for (i in 1:3) intconst+TRUE         |
jit(11); for (i in 1:3) TRUE+TRUE               |add_i1_i1
jit(11); for (i in 1:3) realsym+TRUE            |add_r1_r1 add_r1_i1
jit(11); for (i in 1:3) 9+TRUE                  |add_r1_r1 add_r1_i1
jit(11); for (i in 1:3) logsym+TRUE             |add_i1_i1
jit(11); for (i in 1:3) intsym+logsym           |add_i1_i1
# jit(11); for (i in 1:3) intconst+logsym       |
jit(11); for (i in 1:3) TRUE+logsym             |add_i1_i1
jit(11); for (i in 1:3) realsym+logsym          |add_r1_r1 add_r1_i1
jit(11); for (i in 1:3) 9+logsym                |add_r1_r1 add_r1_i1
jit(11); for (i in 1:3) logsym+logsym           |add_i1_i1

jit(11); for (i in 1:3) logsym+realsym          |add_r1_r1 add_i1_r1
jit(11); for (i in 1:3) logsym+9                |add_r1_r1 add_i1_r1


The rule is: if one arg is real then arithmetic.c converts lgl to real
before add, including converting default logical NA to double NA i.e.
a specific NaN.
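
The rule can be sketched in C as follows.  The macro names are stand-ins:
R stores logical and integer NA as INT_MIN, and the real NA_real_ is a NaN
with a specific payload, for which plain NAN is used here.  The unchecked
variant shows what a jitted add_r1_i1 would compute if it skipped the NA
test.

```c
#include <assert.h>
#include <limits.h>
#include <math.h>

#define NA_LOGICAL_SKETCH INT_MIN  /* logical NA shares the integer NA value */
#define NA_REAL_SKETCH    NAN      /* stand-in for R's specific NA NaN */

/* Convert a logical to real the way the rule above describes:
   NA maps to the real NA, anything else is a plain cast. */
double lgl_to_real(int x)
{
    return x == NA_LOGICAL_SKETCH ? NA_REAL_SKETCH : (double)x;
}

/* real + logical with the NA check. */
double add_real_lgl(double x, int lgl)
{
    return x + lgl_to_real(lgl);
}

/* real + logical treating the logical as a raw int, with no NA check. */
double add_real_lgl_unchecked(double x, int lgl)
{
    return x + (double)lgl;
}
```

With the check, 1 + NA is NaN; without it, 1 + NA becomes 1 + INT_MIN.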

Jitter solution 1: if one arg of add_r_r is lgl then type convert it
to real -- but can only do this at runtime in evalJIT every time
add_r_r is executed, because the arg may be anywhere on the stack --
not fast

Jitter solution 2: add type params to genjit_real_binary() and treat
one arg as int where possible.  Seems to be the best approach, but will
slow down standard R

Jitter solution 3: as above, but preconvert int and lgl constants to
double before evalJIT -- would reduce number of double casts in
eval_jit -- a refinement for later



GP Bits in current R code
-------------------------

DDVAL_MASK              0x1
READY_TO_FINALIZE_MASK  0x1
FINALIZE_ON_EXIT_MASK   0x2
LATIN1_MASK             0x4     1<<2
UTF8_MASK               0x8     1<<3
S4_OBJECT_MASK          0x10    1<<4
JITTED_BIT              0x1000          added for ra, used on LANGSXPs
CANNOT_JIT_BIT          0x2000          added for ra, used on LANGSXPs and bindings
BINDING_LOCK_MASK       0x4000  1<<14
ACTIVE_BINDING_MASK     0x8000  1<<15
MISSING_MASK            0xf     15

naughty macros that clobber all gp bits:

SETLEVELS(x,v)   applied to vectors, all? types in saveload, serialize
SET_ARGUSED(x,v) = SETLEVELS(x,v) match.c, unique.c
SET_HASHASH(x,v) memory.c, names.c only applied to PRINTNAMEs i.e. symsxp.pname i.e. CAR
SET_PRSEEN(x,v)  forcePromise, evalPROM

naughty but only used in macros that keep the other bits:

SET_ENVFLAGS(x,v)
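
Why the clobbering macros matter for the ra bits, as a minimal sketch.  The
mask values are copied from the list above; the struct is a stand-in for
the 16-bit gp field and the function names are hypothetical.

```c
#include <assert.h>

/* Mask values copied from the list above. */
#define JITTED_BIT        0x1000u
#define BINDING_LOCK_MASK 0x4000u

/* Stand-in for the header field holding the 16 gp bits. */
typedef struct { unsigned gp : 16; } HeaderSketch;

/* Like SETLEVELS: overwrites every gp bit at once. */
void set_levels(HeaderSketch *h, unsigned v) { h->gp = v & 0xFFFFu; }

/* Bit-preserving updates: only the masked bits change. */
void set_bit(HeaderSketch *h, unsigned mask)   { h->gp |= mask; }
void clear_bit(HeaderSketch *h, unsigned mask) { h->gp &= ~mask; }

int has_bit(const HeaderSketch *h, unsigned mask)
{
    return (h->gp & mask) != 0;
}
```

Setting and clearing BINDING_LOCK_MASK leaves JITTED_BIT alone, but a
single set_levels(&h, 0x1) silently drops it.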



Contexts
--------

begincontext called by
        dataentry.c CTXT_CCODE
        builtin.c do_cat CTXT_CCODE
        context.c R_ToplevelExec CTXT_TOPLEVEL
        errors.c R_CheckStack CTXT_CCODE vwarningcall_dflt ...
        eval.c  evalLANG_BUILTIN    CTXT_BUILTIN
                applyClosure        CTXT_RETURN
                R_execClosure       CTXT_RETURN
                do_for/while/repeat CTXT_LOOP
                applydefine         CTXT_CCODE
                do_eval             CTXT_RETURN
                tryDispatch         CTXT_RETURN
                bcEval              CTXT_RETURN
                loopWithContext     CTXT_LOOP (called from bcEval)
                do_browser          CTXT_BROWSER CTXT_RESTART

        memory.c RunFinalizers CTXT_TOPLEVEL
                 protect CTXT_CCODE only if ppstack not big enough
        saveload.c CTXT_CCODE
        scan.c do_scan CTXT_CCODE
        serialize.c CTXT_CCODE

CTXT_TOPLEVEL = 0
CTXT_NEXT     = 1
CTXT_BREAK    = 2
CTXT_LOOP     = CTXT_NEXT|CTXT_BREAK
CTXT_FUNCTION = 4
CTXT_CCODE    = 8
CTXT_RETURN   = CTXT_FUNCTION|CTXT_CCODE
CTXT_BROWSER  = 16
CTXT_GENERIC  = CTXT_FUNCTION|CTXT_BROWSER
CTXT_RESTART  = 32
CTXT_BUILTIN  = 64
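
The composite flags are plain bitwise ORs, so a context can be tested
against a mask.  A small sketch using the values listed above; the helper
name ctxt_matches is hypothetical.

```c
#include <assert.h>

/* Values copied from the list above. */
enum {
    CTXT_TOPLEVEL = 0,
    CTXT_NEXT     = 1,
    CTXT_BREAK    = 2,
    CTXT_LOOP     = CTXT_NEXT | CTXT_BREAK,
    CTXT_FUNCTION = 4,
    CTXT_CCODE    = 8,
    CTXT_RETURN   = CTXT_FUNCTION | CTXT_CCODE,
    CTXT_BROWSER  = 16,
    CTXT_GENERIC  = CTXT_FUNCTION | CTXT_BROWSER,
    CTXT_RESTART  = 32,
    CTXT_BUILTIN  = 64
};

/* Nonzero when the context's flags share at least one bit with mask. */
int ctxt_matches(int callflag, int mask)
{
    return (callflag & mask) != 0;
}
```

Note that CTXT_TOPLEVEL is 0, so it can never match a mask test and has to
be compared with ==.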

applyClosure called by
        eval.c
                evalLANG_CLOS
                do_recall
                DispatchOrEval
                DispatchGroup
                bcEval
        bind.c
        objects.c
        summary.c: do_range


Fixed or stale or now in other docs
-----------------------------------

FIXED You get a lousy error msg if you have an R warning or error in a jit block

   e.g. jit(11); x = double(5); for (i in 1:3) x[2:3] = x + 1

   Reported as
     Error in x[2:3] = x + 1 : default argument cannot be used in a jit block

FIXED There are some garbage collection issues with jitted code (a bug).
To do with SEXP generation ageing, it seems.

FIXED There is no implicit jit(0) on return from a function (a bug)

You can't call an R function from a jit block (will get an error msg)

FIXED Using matrices (objects with more than one subscript) in
jit blocks cause spurious error messages (and maybe wrong answers?)

FIXED? Incorrect error msgs for some legitimate expressions in a jit block

   e.g. jit(11); x = double(5); y = c(1,2); for (i in 1:3) x[2:3] = x[2] + y

   Reported as
     Assertion failed: (((VECSEXP) (ans))->vecsxp.length) == 1 ...
     Error: internal assertion failed

FIXED After untarring, GETVERSION doesn't rebuild Rversion.h

Arithmetic gotcha:  NAs are logical and logicals are treated
as ints (for efficiency) in jitted code. Therefore
   standard R   1+NA -> NA
   jitted code  1+NA -> 1+INT_MIN
This is confusing because in standard R code, NAs in double arithmetic
ops are converted to as.double(NA).
To prevent this, use NaN not NA.

To be optimized, both args of a binop must have the same len or one of
them must have len 1

Negative indices aren't allowed.

Currently just do run time specialization of some arithmetic ops
and vector indexing and for loops.

For speed we don't do any NA handling in the code.  The hardware
generates NaNs instead, where appropriate, which is close enough.

In jit block:
i)  Can't create a variable in the jit block (or change type of var etc)
ii) Can't use "assign" etc. to create any var in jit block or any
calls made from jit block

NO LONGER VALID: MAX_STACK_LEN is very generous because we don't check
for stack overflow when DEBUG_JIT==0.  So jitted R will crash if you
have too many nested jitted expressions. That's a worthwhile trade-off
for the speed increase. C does it too. TODO check if indeed
worthwhile.

More macros in evalJIT?

FIXED In current implementation, all or nothing of an expression is jitted

When setting CANNOT_JIT_MASK we should set it for all sub expressions too? No



R directory structure
----------------------


R>l
./              NEWS            Tcl/            etc/            share/
../             NOTES-ra.txt    VERSION         include/        src/
COPYING         ONEWS           bin/            library/        tar-ra.bat
ChangeLog       OONEWS          config.site     m4/             tests/
INSTALL         README          configure       modules/        tools/
Makeconf.in     README-ra.txt   configure.ac    po/
Makefile.in     SVN-REVISION    doc/

R/bin>l
./                 Rcmd.exe           Rproxy.dll         check
../                Rd2dvi.sh          Rscript.exe        config.sh
INSTALL            Rd2txt             Rterm.exe          graphapp.dll
R.dll              Rdconv             SHLIB              helpPRINT.bat
R.exe              Rdiff.sh           Sd2Rd              iconv.dll
REMOVE             Rgui.exe           Stangle.sh         massage-Examples
RSetReg.exe        Rlapack.dll        Sweave.sh          md5check.exe
Rblas.dll          Rprof              build

R/tools>l

./                  Rdnewer.pl          install-info.pl     missing
../                 config.guess        install-sh          move-if-change
GETCONFIG           config.rpath        keywords2html.pl    pkg2tex.pl
GETDISTNAME         config.sub          ldAIX4              rsync-recommended
GETMAKEVAL          copy-if-change      link-recommended    updatefat
GETVERSION          getsp.class         linkcheck.pl
Makefile.in         getsp.java          ltmain.sh
README              help2man.pl         mdate-sh

R/src>l
./            appl/         include/      modules/      unix/
../           extra/        library/      nmath/
Makefile.in   gnuwin32/     main/         scripts/

R/src/gnuwin32>lsd
bitmap/
check/
cran/
fixed/
front-ends/
getline/
help/
installer/
unicode/
windlgs/

R/src>l extra
./            blas/         intl/         xdr/
../           bzip2/        pcre/         zlib/
Makefile.in   graphapp/     trio/

R/src/include>lsd
R_ext/
Rmodules/

R/src/scripts>l
./              Makefile.in     Rd2txt          Stangle         javareconf
../             Makefile.win    Rdconv.in       Sweave          mkinstalldirs
BATCH           R.sh.in         Rdiff           build.in        pager
COMPILE.in      REMOVE.in       Rprof.in        check.in
INSTALL.in      Rcmd            SHLIB.in        config
LINK            Rd2dvi          Sd2Rd.in        f77_f2c

R/src/unix>l
./            Rscript.c     dynload.c     libR.pc.in    system.c
../           Runix.h       edit.c        stubs.c       system.txt
Makefile.in   X11.c         hpdlfcn.c     sys-std.c
Rembedded.c   aqua.c        hpdlfcn.h     sys-unix.c


R scripts
---------

R>ff *ake* *ac *in

./bin/config.sh
./config.site
./configure
./configure.ac

./etc/ldpaths.in
./etc/Makeconf
./etc/Makeconf.in
./etc/Makefile.in
./etc/Renviron.in

./include/Rconfig.h

./m4/Makefile.in

./Makeconf.in
./Makefile.in

./share/make/
./share/make/config.mk
./share/Makefile.in

./src/appl/Makedeps
./src/appl/Makefile.in
./src/appl/Makefile.win

./src/extra/blas/Makefile.in

./src/gnuwin32/COPYRIGHTS.win

./src/gnuwin32/cran/index.in
./src/gnuwin32/cran/rdevel.in
./src/gnuwin32/cran/ReadMe.in
./src/gnuwin32/cran/release.in
./src/gnuwin32/cran/rpatched.in
./src/gnuwin32/cran/rtest.in

./src/gnuwin32/fixed/bin/
./src/gnuwin32/fixed/bin/config.sh
./src/gnuwin32/fixed/etc/Makeconf
./src/gnuwin32/fixed/fixbin
./src/gnuwin32/fixed/h/config.h
./src/gnuwin32/fixed/h/config64.h
./src/gnuwin32/fixed/Makefile

./src/gnuwin32/front-ends/make.rtest
./src/gnuwin32/front-ends/Makedeps
./src/gnuwin32/front-ends/Makefile

./src/gnuwin32/getline/Makefile

./src/gnuwin32/help/Makefile

./src/gnuwin32/installer/Makefile

./src/gnuwin32/Makedeps
./src/gnuwin32/MakeDll
./src/gnuwin32/makeDllRes.pl
./src/gnuwin32/Makefile
./src/gnuwin32/Makefile.packages
./src/gnuwin32/MakePkg
./src/gnuwin32/windlgs/src/Makevars.win

./src/include/config.h
./src/include/config.h.in
./src/include/Makefile.in
./src/include/Makefile.win
./src/include/R_ext/Makefile.in
./src/include/Rconfig.h
./src/include/Rmath.h0.in
./src/include/stamp-h.in

./src/library/Makefile.in
./src/library/base/DESCRIPTION.in
./src/library/base/Makefile.in

./src/main/

./src/main/Makedeps
./src/main/Makefile.in
./src/main/Makefile.win

./src/Makefile.in

./src/nmath/Makedeps
./src/nmath/Makefile.in
./src/nmath/Makefile.win
./src/nmath/standalone/libRmath.pc.in
./src/nmath/standalone/Makefile.in
./src/nmath/standalone/Makefile.win

and so on...
