Meetup video on the use of core R scripting to eliminate ‘NA’ and other common issues
Detail of Meetup from:
Manuscript of Intended Presentation:
Using a <- a[-i] can lead to NAs
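The usual argument is that a <- a[!is.na(a)] would then suffice to clean this up. (Note that the form a[-is.na(a)] does not do this: negating a logical vector gives 0 and -1, so it drops only the first element whenever any NA is present.) But what are the costs if, say, a is the result vector of a sorting algorithm that recursively shortens the vector?
The reality is that removing individual elements by index can be hard on data integrity, because the remaining elements are renumbered after every removal. An index computed earlier may then point at a different element, or past the end of the vector, where R returns NA instead of an error. This follows from R's indexing rules rather than from the cluster or R environment you are loading from. Either way, NAs are a commonly recurring problem in R.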
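A minimal sketch of the indexing behaviour described above. The vector and the variable names are made up for illustration and are not taken from the talk.

```r
# Hypothetical example data; not from the original talk.
a <- c(5, 3, 8, 1)

# Reading past the end of a vector returns NA rather than an error.
past_end <- a[6]

# Assigning past the end pads the gap with NA.
b <- a
b[6] <- 9                       # b is now 5 3 8 1 NA 9

# Removing elements one at a time shifts every later index, so positions
# computed before the loop no longer point at the intended elements.
drop_idx <- c(1, 2)             # intent: drop the values 5 and 3
shrunk <- a
for (i in drop_idx) shrunk <- shrunk[-i]
# shrunk is 3 1: after dropping position 1, position 2 holds 8, not 3.

# Dropping all positions in a single call avoids the shift.
kept <- a[-drop_idx]            # 8 1

# -is.na(b) is a vector of 0 and -1, so it drops the FIRST element.
wrong <- b[-is.na(b)]           # 3 8 1 NA 9
# Logical negation keeps exactly the non-missing values.
right <- b[!is.na(b)]           # 5 3 8 1 9
```

The single-call form a[-drop_idx] also allocates one new vector instead of one per removal, which matters when a routine shortens the vector many times.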
Since R ships with many built-in functions, it seems logical to make use of them. What isn't so obvious is how to use them on non-vector arguments. For example, rm() is typically used to clean up the workspace before reading or after writing a file from a program (it removes objects from an environment; files on disk are removed with file.remove() or unlink()). However, when the values are held as separate named objects in an environment rather than as elements of one vector, rm() can serve the same purpose as a <- a[-i]. Nothing is renumbered when an object is removed by name, so there is no need to call a <- a[!is.na(a)] afterwards and less risk to data integrity.
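One way this could look, assuming the values are stored under individual names in an environment. The environment `store` and the names `x1` to `x4` are hypothetical.

```r
# Hypothetical store: each value lives under its own name in an environment.
store <- new.env()
vals <- c(5, 3, 8, 1)
for (k in seq_along(vals)) {
  assign(paste0("x", k), vals[k], envir = store)
}

# Remove one entry by name; the remaining names are untouched.
rm(list = "x2", envir = store)

remaining <- ls(store)                               # "x1" "x3" "x4"
values <- unlist(mget(remaining, envir = store))     # named vector 5 8 1
still_there <- exists("x2", envir = store, inherits = FALSE)
```

Because entries are addressed by name rather than by position, a later lookup of `x3` still returns 8 after `x2` is gone.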
More along the lines of data integrity is the loss of precision in arithmetic operations as you approach machine precision (see .Machine$double.eps). What then happens depends, again, on your own system and on which version of R you are using. Apparently 3.0.0 is now set up with the idea of allowing data to simply drop digits once precision is maxed out. To quote the current developers' blog:
The following function is due for release:
digitloss = c("allow", "warn", "forbid")
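To make the digit loss in question concrete, here is a small sketch using only base R. It does not use the quoted `digitloss` setting; it only shows how double precision behaves, and the input string is made up.

```r
# Doubles carry roughly 15 to 16 significant decimal digits.
eps <- .Machine$double.eps      # gap between 1 and the next larger double

# 0.1, 0.2 and 0.3 have no exact binary representation,
# so an exact comparison fails while a tolerant one succeeds.
exact_equal  <- (0.1 + 0.2) == 0.3
close_enough <- isTRUE(all.equal(0.1 + 0.2, 0.3))

# Above 2^53 not every integer is representable; adding 1 is absorbed.
big <- 2^53
absorbed <- (big + 1) == big

# Digits beyond double precision are dropped silently on conversion.
s <- "0.12345678901234567890123"   # hypothetical input with 23 decimals
x <- as.numeric(s)
printed <- sprintf("%.20f", x)     # trailing digits no longer match s
```

Comparing `printed` with `s` shows where the stored value stops agreeing with the text that was read in.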
C developers can deal with this by implementing their own arithmetic procedures, keeping in mind the underlying algorithm of each. For example, division can be viewed as the inverse operation of multiplication, which in turn can be viewed as a “convolution” of the digit sequences of the two operands, followed by carry propagation.
So what does this mean? For the purposes of speeding up your system and avoiding the above-mentioned data loss, you might convert your division problem into a multiplication by the inverse of your divisor. Then, to convert between the decimal string form of your base-10 number and the digits you operate on, you can either call strtoll() (in C) or incorporate your own division algorithm.
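A rough sketch of replacing division by multiplication with a precomputed inverse. The dividends and the divisor are hypothetical. Note that in floating point the two results are not guaranteed to be bit-identical, so the sketch measures the difference rather than assuming it is zero.

```r
x <- c(1, 7, 22, 355)   # hypothetical dividends
d <- 113                # hypothetical divisor

inv_d <- 1 / d          # computed once, then reused for every element
q_div <- x / d          # ordinary division
q_mul <- x * inv_d      # multiplication by the inverse

# The two may differ in the last bit, since inv_d is itself rounded.
max_abs_diff <- max(abs(q_div - q_mul))
agree <- isTRUE(all.equal(q_div, q_mul))   # equal within tolerance
```

Whether the reuse of `inv_d` is worth the possible last-bit difference depends on how many divisions share the same divisor.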
At this point you would be ready to perform the “convolution” portion of your multiplication. Warning: convolve() in R (like the routine in Numerical Recipes in C) is built on the fast Fourier transform, which costs on the order of N log N operations and returns floating-point results that need rounding. For short digit vectors a direct loop can be faster and stays exact, so it may be best to code up your own if you think time is of importance.
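A hand-rolled version might look like the following sketch. The helper names (`to_digits`, `conv_direct`, `carry`, `multiply_strings`) are invented for this example. It multiplies two non-negative integers given as decimal strings by convolving their digits and then propagating carries. The last lines compute the same convolution with `convolve()` from the stats package for comparison.

```r
# Split a decimal string into digits, least significant digit first.
to_digits <- function(s) rev(as.integer(strsplit(s, "")[[1]]))

# Direct O(n * m) convolution of two digit vectors.
conv_direct <- function(x, y) {
  out <- numeric(length(x) + length(y) - 1)
  for (i in seq_along(x)) {
    idx <- i:(i + length(y) - 1)
    out[idx] <- out[idx] + x[i] * y
  }
  out
}

# Propagate carries so that every position holds a single base-10 digit.
carry <- function(v) {
  k <- 1
  while (k <= length(v)) {
    if (v[k] >= 10) {
      if (k == length(v)) v <- c(v, 0)   # grow for the final carry
      v[k + 1] <- v[k + 1] + v[k] %/% 10
      v[k] <- v[k] %% 10
    }
    k <- k + 1
  }
  v
}

multiply_strings <- function(s1, s2) {
  d <- carry(conv_direct(to_digits(s1), to_digits(s2)))
  paste(rev(d), collapse = "")
}

product <- multiply_strings("123456789", "987654321")
# "121932631112635269", which is beyond what a double stores exactly

# FFT-based equivalent; the result is floating point and needs rounding.
direct <- conv_direct(to_digits("123"), to_digits("456"))
via_fft <- round(convolve(to_digits("123"), rev(to_digits("456")),
                          type = "open"))
```

The `rev()` and `type = "open"` arguments are needed because `convolve()` does not compute the usual convolution by default.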
Code examples demonstrating the above topics are available upon request. Thanks for your attendance.