Monthly Archives: June 2012

What kinds of profitable financial forecasting models and algorithms do professional R users focus on?


I am on the hunt for the various modeling types I have come across. I can list a bunch, but I am sure there is a pile I am missing as well.

The ones I have under my radar right now include:

GARCH

ARIMA

ARMA

PCA

Markov chain or MCMC

CAPM

Autoregressive (AR)

Bayesian

Event arbitrage

Market inefficiency

MAPE

Mean reversion

Moving average
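Several of the models in this list (AR, ARMA, ARIMA) need nothing beyond base R's stats package. As a minimal sketch on simulated data rather than real prices, the block below simulates an AR(1) series with arima.sim, fits it with arima, and forecasts a few steps ahead with predict; the coefficient 0.6 and the sample size are arbitrary choices.

```r
# Base R only (stats package): arima.sim(), arima(), predict()
set.seed(42)

# Simulate 500 observations of a stationary AR(1) process with phi = 0.6.
# This is synthetic data standing in for a return series, not market data.
y <- arima.sim(model = list(ar = 0.6), n = 500)

# ARIMA(1, 0, 0) is an AR(1) with an intercept;
# order = c(p, 0, q) gives an ARMA(p, q), and a middle value of 1 adds differencing.
fit <- arima(y, order = c(1, 0, 0))
print(coef(fit))   # "ar1" should land near 0.6, plus an "intercept" term

# Five-step-ahead forecast with standard errors
fc <- predict(fit, n.ahead = 5)
print(cbind(forecast = fc$pred, se = fc$se))
```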

Answer this survey here.

I am having no problem finding R example tutorials with source code, with the exception of PCA and Markov models. Maybe these are not used as much in quant financial modeling? Or at least not in R as opposed to Matlab. Who knows, but surveys like this help me understand what R users are actually focusing on.
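PCA, for what it is worth, is built into base R as stats::prcomp. A minimal sketch, assuming a hypothetical matrix of simulated daily returns for five assets that share one common "market" factor: the first principal component should capture most of the variance, which is the usual starting point for factor or statistical-arbitrage style models. The factor and noise volatilities are made-up parameters.

```r
set.seed(1)
n_days   <- 250
n_assets <- 5

# Hypothetical returns: one common "market" factor plus asset-specific noise
market  <- rnorm(n_days, mean = 0, sd = 0.01)
returns <- sapply(seq_len(n_assets), function(i) market + rnorm(n_days, sd = 0.004))
colnames(returns) <- paste0("ASSET", seq_len(n_assets))

# PCA on standardized returns (scale. = TRUE works from the correlation matrix)
pca <- prcomp(returns, center = TRUE, scale. = TRUE)

# Share of total variance explained by each component
var_share <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_share, 3))

# Loadings of the first component: roughly equal weights across all five assets
print(round(pca$rotation[, 1], 3))
```

The sign of each loading vector is arbitrary, so a first component with all-negative loadings means the same thing as one with all-positive loadings.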

If you have an opinion on what could be missing, please comment; I would be highly interested in what others have to say. As you can tell, the models in the list above are most likely the popular or more academic ones. I do realize that in industries like banking and hedge funds, these models could be significantly altered or tweaked to make them highly proprietary and obviously profitable. We would usually never learn those secrets.

Also, many of the R tutorials I am seeing blend in other languages to speed up single-threaded R. The example that comes to mind is C++, with the help of fantastic R packages like Rcpp and RInside. Heavy (also known as expensive) computing makes use of parallelization or NoSQL database solutions to speed up simulations and calculations. I have not even investigated other options like GPUs, FPGAs, or CUDA. Does it ever stop?

As I said, my primary goal is to find out what kinds of models R users and developers are using in their own research. These polls help not only me but others as well, since everyone has access to the results.

The one thing I find overwhelming is figuring out which models are actually profitable and which could be duds. Obviously, this is very important to consider before spending vast amounts of time developing a model in a tool like R.

These answers help me figure out what to present at my new R/Matlab user group.

 

 

 

Which are my fave GARCH R packages for financial forecasting and future trading models for a lucrative strategy?

Let me do a brief but hopeful review for those considering which R package to go with for GARCH financial forecasting.
Yes, it is popular, but I have been advised not to rely on it too much, and I cannot disagree with more advanced users of this forecasting method. After going through various modeling demos, I have come to some conclusions about which R packages might best suit your needs, as there are many to choose from. Remember that these are my own experiences, so they might not be the most appropriate for you. Also, I am kind of new at this, so I do not claim to be an expert.
garch function from the tseries R package
This may be fast, but it does not always find a solution. If simulations are a must, use this one because it is really fast. In my demos it usually took about one sixth of the time fGarch needed.
garchFit function from the fGarch R package
More reliable at finding a GARCH solution, but about six times slower, as mentioned above. What is a good way to combine the two? I found one at this link:
http://www.r-bloggers.com/trading-using-garch-volatility-forecast/
Its example code can easily give you the best of both worlds.
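The speed versus reliability trade-off between these two functions is easy to measure on your own machine. A rough sketch, assuming the tseries and fGarch packages are installed: it simulates a GARCH(1,1) series in base R (the parameters 1e-6, 0.08 and 0.90 are arbitrary), fits it with both functions, and times each fit. Timings depend on the machine, the data and the package versions, so treat the output as your own measurement rather than a benchmark.

```r
set.seed(7)

# Simulate a GARCH(1,1) series: s2[t] = omega + alpha * eps[t-1]^2 + beta * s2[t-1]
n <- 2000
omega <- 1e-6; alpha <- 0.08; beta <- 0.90
eps <- numeric(n)
s2  <- numeric(n)
s2[1]  <- omega / (1 - alpha - beta)   # start at the unconditional variance
eps[1] <- sqrt(s2[1]) * rnorm(1)
for (t in 2:n) {
  s2[t]  <- omega + alpha * eps[t - 1]^2 + beta * s2[t - 1]
  eps[t] <- sqrt(s2[t]) * rnorm(1)
}

if (requireNamespace("tseries", quietly = TRUE) &&
    requireNamespace("fGarch", quietly = TRUE)) {
  # tseries::garch: GARCH(p, q) fit; trace = FALSE silences optimizer output
  t_ts <- system.time(fit_ts <- tseries::garch(eps, order = c(1, 1), trace = FALSE))
  # fGarch::garchFit: ML fit, with a constant mean (mu) included by default
  t_fg <- system.time(fit_fg <- fGarch::garchFit(~ garch(1, 1), data = eps, trace = FALSE))

  print(c(tseries = t_ts[["elapsed"]], fGarch = t_fg[["elapsed"]]))  # seconds
  print(coef(fit_ts))   # a0 (omega), a1 (alpha), b1 (beta)
  print(coef(fit_fg))   # mu, omega, alpha1, beta1
}
```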
Continuing my weekly tour of GARCH, I came across even more accurate solutions with better statistical reporting.
EGARCH in the rgarch R package
All I could get out of this were pretty little plots with ggplot2. It was also a real (how shall we say this nicely) pain to set up. I could not be bothered with this R package.

rugarch package
I think this one is my favorite, simply because of the amount of statistical reporting, including the important p-values. Now I am sounding like some math expert. Anyhow, it seemed to be the slowest, but it gave me the best results. It also supports rolling windows, which is kind of nice; you cannot find that in fGarch.
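Here is a sketch of the two rugarch features mentioned above, the coefficient table with p-values and rolling-window re-estimation, assuming the rugarch package is installed. It uses the sp500ret daily return data set that ships with rugarch; the plain GARCH(1,1) specification, the 250-day forecast length, the 50-day refit interval and the 1000-day moving window are illustrative choices, not recommendations.

```r
if (requireNamespace("rugarch", quietly = TRUE)) {
  # Daily S&P 500 returns bundled with rugarch (one column, dates as row names)
  data("sp500ret", package = "rugarch")
  x <- sp500ret[, 1, drop = FALSE]

  # Standard GARCH(1,1) with a constant mean and normal innovations
  spec <- rugarch::ugarchspec(
    variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
    mean.model = list(armaOrder = c(0, 0), include.mean = TRUE),
    distribution.model = "norm")

  # Full-sample fit: matcoef holds estimates, std. errors, t values and p-values
  fit <- rugarch::ugarchfit(spec, data = x)
  print(round(fit@fit$matcoef, 4))

  # Rolling one-step-ahead forecasts over the last 250 days, re-estimating
  # every 50 days on a moving 1000-day window
  roll <- rugarch::ugarchroll(spec, data = x, n.ahead = 1,
                              forecast.length = 250, refit.every = 50,
                              refit.window = "moving", window.size = 1000,
                              calculate.VaR = FALSE)
  dens <- as.data.frame(roll, which = "density")
  print(head(dens[, c("Mu", "Sigma", "Realized")]))
}
```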

No GARCH package

If you are a math whiz, develop your own proprietary GARCH algorithm from scratch. This is what the well-paid PhDs at large banks like Goldman Sachs do. For me, doing this is wishful thinking, but it is definitely an option for those who can.
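For a feel of what rolling your own involves, the core of a textbook GARCH(1,1) fits in a few dozen lines of base R. This is a minimal sketch, not a proprietary or production model: it assumes zero-mean returns with Gaussian innovations, maximizes the likelihood with optim (Nelder-Mead) on transformed parameters so that omega > 0 and alpha and beta stay in (0, 1), and it neither enforces alpha + beta < 1 nor computes standard errors. The simulated parameters (0.05, 0.10, 0.85) are arbitrary.

```r
# Simulate a GARCH(1,1) series with known (arbitrary) parameters
simulate_garch <- function(n, omega, alpha, beta) {
  eps <- numeric(n)
  s2  <- numeric(n)
  s2[1]  <- omega / (1 - alpha - beta)      # unconditional variance as the start
  eps[1] <- sqrt(s2[1]) * rnorm(1)
  for (t in 2:n) {
    s2[t]  <- omega + alpha * eps[t - 1]^2 + beta * s2[t - 1]
    eps[t] <- sqrt(s2[t]) * rnorm(1)
  }
  eps
}

# Conditional variance recursion for given parameters
garch_var <- function(eps, omega, alpha, beta) {
  s2 <- numeric(length(eps))
  s2[1] <- var(eps)                          # initialize at the sample variance
  for (t in 2:length(eps)) {
    s2[t] <- omega + alpha * eps[t - 1]^2 + beta * s2[t - 1]
  }
  s2
}

# Gaussian negative log-likelihood on an unconstrained parameter vector
garch_nll <- function(par, eps) {
  omega <- exp(par[1])                       # keeps omega > 0
  alpha <- plogis(par[2])                    # keeps alpha in (0, 1)
  beta  <- plogis(par[3])                    # keeps beta in (0, 1)
  s2 <- garch_var(eps, omega, alpha, beta)
  0.5 * sum(log(2 * pi) + log(s2) + eps^2 / s2)
}

fit_garch11 <- function(eps) {
  start <- c(log(0.05 * var(eps)), qlogis(0.05), qlogis(0.90))
  opt <- optim(start, garch_nll, eps = eps, method = "Nelder-Mead",
               control = list(maxit = 2000))
  c(omega = exp(opt$par[1]), alpha = plogis(opt$par[2]), beta = plogis(opt$par[3]))
}

set.seed(123)
eps <- simulate_garch(3000, omega = 0.05, alpha = 0.10, beta = 0.85)
est <- fit_garch11(eps)
print(round(est, 3))

# One-step-ahead variance forecast from the fitted parameters
s2 <- garch_var(eps, est[["omega"]], est[["alpha"]], est[["beta"]])
n  <- length(eps)
s2_next <- est[["omega"]] + est[["alpha"]] * eps[n]^2 + est[["beta"]] * s2[n]
print(s2_next)
```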

P.S. Since these packages are slow, maybe I could apply parallelization (with something like the snow package) to the for loops inside these functions?
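Parallelizing the loops inside a package's own fitting function would usually mean modifying that package. A more practical route is to run independent fits in parallel, for example one fit per rolling window, using the parallel package that ships with base R and includes snow-style clusters. In this sketch, base R's arima stands in for a GARCH fit so the example needs no extra packages; the window length, step size and two-worker cluster are assumptions to adjust.

```r
library(parallel)

set.seed(99)
y <- as.numeric(arima.sim(model = list(ar = 0.5), n = 1200))   # synthetic series

window_len <- 500
starts <- seq(1, length(y) - window_len + 1, by = 100)          # 8 window start points

# One independent fit per window; returns the estimated AR(1) coefficient
fit_window <- function(s, y, window_len) {
  fit <- arima(y[s:(s + window_len - 1)], order = c(1, 0, 0))
  coef(fit)[["ar1"]]
}

# A PSOCK (snow-style) cluster works on Windows as well as Linux and macOS
cl <- makeCluster(2)
ar1_by_window <- unlist(parLapply(cl, starts, fit_window,
                                  y = y, window_len = window_len))
stopCluster(cl)

print(round(ar1_by_window, 3))
```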
Hey, I just started my new R/Matlab User Meetup group!

Here is a YouTube video on how to use R to access MySQL through the RMySQL R package


You might think this is one of those redundant videos you can find on YouTube, but surprise, there were none. After running my survey of which databases (commercial, open source RDBMS, or NoSQL) R users run, it struck me that nearly half use MySQL, followed by PostgreSQL.

I always had concerns about MySQL's scaling and redundancy for the large data sets I anticipate my models and simulations will need. I demoed everything from Cassandra and Hadoop to Redis and MongoDB. Many were good, but Redis fit my needs best. As a result, I thought I was set until…

I came across an open source trading platform written in C++ called Trading Shim. This was a rare find, as it met all my needs, including Interactive Brokers support, but it uses MySQL out of the box. The database schema is big as well. I have been debugging and configuring it; that work is not complete, but I am getting closer.

After seeing the results of my survey, I thought: why not stick with MySQL for now, since it can be fast enough? Hey, Yahoo Finance uses it for its backend, so it cannot be that bad, right? I understand you would need some heavy configuration to do something like sharding, which would not be as easy as in something like Redis. Anyhow, MySQL has also proven to be the most popular database, so it might be wise to get more comfortable with something that is widely used.

As for Trading Shim, it makes my life much easier because I do not need to recode anything; I just need my R algorithms to access the MySQL data. At least for now, this is the more sensible option for minimizing coding and debugging cycles.

So, enough typing: I have posted this video to show how easy it is to have R access MySQL. As far as I can tell, this only works in a Linux environment, because RMySQL requires Rcpp, which only works with GCC, not Visual Studio on Windows. Sorry, I don’t make the rules. You could install MinGW and GCC on Windows, but why would you want to go through that pain?

Video: http://www.youtube.com/watch?v=LvCFaTln_3g (how to have R access MySQL through the RMySQL R package)
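For readers who prefer text to video, RMySQL implements the DBI interface, so the R side comes down to dbConnect, a query, and dbDisconnect. A minimal sketch: the table name daily_bars, its columns, the connection details and the read_daily_bars helper are all hypothetical. Because a live MySQL server cannot be assumed, the runnable part uses an in-memory SQLite database (the RSQLite package) as a stand-in; the helper should also work on a MySQL connection, since it only uses generic DBI calls.

```r
# Hypothetical helper: read daily bars for one symbol from a table named
# daily_bars with columns symbol, date and close (this schema is an assumption).
# It only uses generic DBI calls, so it accepts any DBI connection.
read_daily_bars <- function(con, symbol) {
  sql <- DBI::sqlInterpolate(
    con,
    "SELECT date, close FROM daily_bars WHERE symbol = ?sym ORDER BY date",
    sym = symbol)                       # safely quotes the symbol value
  DBI::dbGetQuery(con, sql)
}

# Against MySQL via RMySQL (placeholder credentials, not run here):
# con <- DBI::dbConnect(RMySQL::MySQL(), host = "localhost", dbname = "marketdata",
#                       user = "r_user", password = "change_me")
# bars <- read_daily_bars(con, "AAA")
# DBI::dbDisconnect(con)

# Runnable stand-in: an in-memory SQLite database with made-up rows
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  DBI::dbWriteTable(con, "daily_bars", data.frame(
    symbol = c("AAA", "AAA", "BBB"),
    date   = c("2012-06-01", "2012-06-04", "2012-06-01"),
    close  = c(100, 101, 50)))
  bars <- read_daily_bars(con, "AAA")
  print(bars)
  DBI::dbDisconnect(con)
}
```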

The HUGE edge of Matlab over R is the Matlab Coder Toolbox. Converting R script code to C++/C?


I was reading various links from the fantastic Stack Overflow:

http://stackoverflow.com/questions/9154383/converting-loop-from-r-to-c-using-rcpp

http://stackoverflow.com/questions/10089754/converting-models-in-matlab-r-to-c-java

The Rcpp R package always seems to come up, either by itself or through a derivative like RcppArmadillo. I have played with Rcpp, and it is really good. I am very impressed with it, but when you look at the equivalent in the Matlab world, your choices seem to be the Matlab Builder JA or Matlab Builder NE toolboxes. For those new to them, you also need to ship the MATLAB Compiler Runtime (MCR), which can make your target application extra heavy, so why go there? Also, the code you need to write to be compliant with the MCR is, ugh… how shall we say? Extra long and wonky. Boo to that one.
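The first Stack Overflow thread above is about moving a slow R loop into C++ with Rcpp. A minimal sketch of that pattern, assuming Rcpp and a working C++ compiler are installed: the same exponentially weighted (EWMA) variance recursion is written once as an R loop and once through Rcpp::cppFunction, and the results are compared and timed. The function names are hypothetical, and lambda = 0.94 is simply the familiar RiskMetrics-style decay for daily data.

```r
# Plain R loop: s2[t] = lambda * s2[t-1] + (1 - lambda) * r[t]^2
ewma_var_r <- function(r, lambda) {
  s2 <- numeric(length(r))
  s2[1] <- r[1]^2
  for (t in seq_along(r)[-1]) {
    s2[t] <- lambda * s2[t - 1] + (1 - lambda) * r[t]^2
  }
  s2
}

if (requireNamespace("Rcpp", quietly = TRUE)) {
  # The same recursion compiled to C++ (needs a C++ toolchain)
  Rcpp::cppFunction('
    NumericVector ewma_var_cpp(NumericVector r, double lambda) {
      int n = r.size();
      NumericVector s2(n);
      if (n == 0) return s2;
      s2[0] = r[0] * r[0];
      for (int t = 1; t < n; ++t)
        s2[t] = lambda * s2[t - 1] + (1.0 - lambda) * r[t] * r[t];
      return s2;
    }')

  set.seed(3)
  r <- rnorm(1e5, sd = 0.01)               # simulated daily returns
  print(all.equal(ewma_var_r(r, 0.94), ewma_var_cpp(r, 0.94)))
  print(system.time(ewma_var_r(r, 0.94)))
  print(system.time(ewma_var_cpp(r, 0.94)))
}
```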

I have experimented with the equivalent options, Rcpp/RInside, rJava, and R.NET, which work really well. You can make direct calls into R from these languages with little hassle compared to Matlab’s way. It also seems slightly faster, since loading R can be very lightweight. Very cool.

But that brings me to a very important point where Matlab does have an edge: the MATLAB Coder toolbox. Introduced last year, it allows you to truly convert your Matlab M-scripts into C or C++. It offers limited support for the Matlab toolboxes, though. The functions I wanted to use were not supported, but the current version of Matlab might address that. If your M-script does not use any toolboxes, the Coder toolbox is definitely an option. It is very cool in how it works, but it is very expensive: try $6,000. It might still be worth it, since you no longer need to hire a $100K+ C++ developer. This is one option Matlab has that R does not.

I hope someone can crack this problem for R, beyond using something like Rcpp. A tool like MATLAB Coder that converts R to C++ would be completely awesome, but I am sure it would not be a cheap R package; the amount of work would be huge.
Try joining my new R/Matlab users Meetup group.

 

 

Announcement of new R and Matlab Meetup User group for those in Finance and Financial Services!


I have started a new R user group for those in the financial services field. This of course includes people from banking, hedge funds, bulge-bracket firms, brokers, prop shops, independent traders, etc. As I like Matlab as well, I decided to combine both interests into one group. The primary goal is to share ideas on using software tools like R or Matlab to develop custom and proprietary trading strategies, models, and algorithms. I guess you could say I am kind of excited about starting this. I also hope people will join, especially researchers, analysts, and practitioners in these fields. I am also finding so many new blog posts and R packages that I would love to share, and I want to learn as much as possible with this group. So if you want to learn or contribute something, please join this Meetup group.

Join here at this Meetup group.

Despite Meetup’s requirement for face-to-face meetings, I plan to run frequent online events through something like GoToMeeting.com for members of this Meetup. I also have to hold physical meetups, so those will take place at least once a month in my home town of Toronto, Ontario, Canada. I also run another Meetup group, and both will be used for these events. The other group is

http://www.meetup.com/quant-finance

I have run this group for over a year and a half, and it has a few hundred members. It seems to be growing steadily, with many members in global financial centers like New York, London, etc. As a result, I will be mirroring everything from it in this new group for R users.

Got questions? Let me know by commenting below.

Which RDBMS or NoSQL database do you use for R? MySQL, Cassandra, HBase, MongoDB, Oracle, PostgreSQL, CouchDB, SQLite?


This R survey is kind of important. It will show a few things:

  1. Which databases most R users use, whether commercial, open source, or NoSQL.
  2. Which database is best for R in terms of scalability and speed, depending on the requirements. This includes many writes of market tick data from a C++ or Java application, plus reads by various R algorithms for analytics.

Go here for the poll.

Here are some reasonable options with reasons:

MySQL

I would assume this to be the number one choice since it is open source (or at least so they say). It also supports sharding and clustering for scalability. Is this something people are using for their trading platform requirements, including as a tick data repository?

PostgreSQL

Is anyone actually using this open source database for their R needs?

Oracle

This is easily the most popular commercial RDBMS for both Linux/Unix and Windows. As Oracle has opened its ecosystem to R with a connector, I wondered whether people are actually using it.

SQL Server/DB2/Sybase

I am unsure if there are any R package connectors for these databases. I was just curious, as I am not really interested in them as an option.

Cassandra

There seems to be no R package support for this. I once posted something about it on R-Bloggers.com and it lit up the site, which made me wonder if Cassandra is actually more popular than people think. It seems to meet the needs of both fast writes and fast reads.

Redis

Now, the doRedis R package looked really hot. It even showcased how to use it with a potential financial analytics system. I also saw Java sharding examples, which left me excited about the capabilities of this database.

MongoDB

Strangely, this seems to be the most popular of all. I also found various R packages that seem to support it.

HBase, which is part of Hadoop

Eh. No support, even according to Revolution Analytics, whose R package install guides are lacking. I gave up pretty quickly on these R packages.

All other database options seem fine, but the ones listed above seem the most viable for any R user who needs a repository that scales and clusters.

Go here for the poll.

http://quantlabs.net/surveys/2012/06/19/what-rdms-or-nosql-database-should-a-r-user-focus-on/

My video demo and introductory checklist on how to debug with R and Eclipse IDE


Install instructions to debug R with Eclipse

Install the StatET plugin for your Eclipse version from the following site and follow its instructions:

http://www.walware.de/

Also, make sure to install the rj R package dependency as well.

In R, use:

install.packages(c("rj", "rj.gd"), repos = "http://download.walware.de/rj-1.1")

Note that during the Eclipse StatET plugin install, it may complain about missing dependency plugins. I just installed everything offered, as I don’t have time to monkey around with this. My Eclipse version was Indigo (3.7).

The original test R source came from:

http://navisan.com/Articles/EclipseRHTML.aspx

Once everything is installed and correctly configured within Eclipse, make sure to switch to the StatET perspective. I have verified that the above works properly once the plugin and the correct Eclipse version are lined up. Don’t forget the rj R package as well!

If your test run fails, check the troubleshooting guide at the end of http://navisan.com/Articles/EclipseRHTML.aspx

Note everything runs as it should.

To debug in Eclipse

Near the end of the above link, there is an optional R package to install for debugging. It is called debug, so make sure it is installed into your R environment. You can do all this from the R console within Eclipse. Pretty neat. It is not as nice as RStudio’s package install pane, and you do need to install it manually within Eclipse. You then load it with the library R function, as shown in this image:

http://navisan.com/Articles/EclipseFiles/Eclipse_RDebuggerLoaded.jpg

Debugging
To debug, make sure your test code is an R function like:

# TODO: Add comment
#
# Author: mark qu
################################################################
# some test code…
myFunction <- function(x)
{
  a = 2
  b = 3
  c = a+b
  c
}

Note the last image screen capture and comments at the end of the navisan.com link:

Debugging with the R debugger under Eclipse StatET. Note the mtrace debug statement, which kicks off the debug session upon the function, and the subsequent function call that is then intercepted by the debugger at the automatic breakpoint at line 1 (breakpoint indicated by the red star). Here we single stepped through the function and are at line 3, indicated by the green bar. Note from the Console window that we at this point examined the contents of variable b.

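Also, you may need to set the working directory to your Eclipse R project with the setwd and source functions, i.e.:

library(debug)
# R strings need forward slashes (or doubled backslashes) in Windows paths
setwd("C:/Users/xx/workspace/R Project Test")
source("test.R")
# Trace the function defined in test.R, then call it to reach the breakpoint
mtrace(myFunction)
myFunction(1)

Loading required package: parser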

Also, it seems the launcher run configuration does not start R with a new Eclipse session. If this is the case, just launch it manually from the run configuration: Run > Run Configurations, then choose Run.

This Eclipse StatET plugin could have bugs so it might not work as expected depending on your plugin and Eclipse version.

Video: http://www.youtube.com/watch?v=bGkvQw67qKc (video demo and introductory checklist on how to debug R with the Eclipse IDE)

Quality R packages for potential financial researchers and quant traders who model or build strategies and algorithms


As a newbie to R, I thought it would be worth noting a few quality R packages that seem to have more advanced functionality than Matlab even gives you. Here is my experience so far:

RTAQ

This is a tough one to gauge. I recently tried to get something working with it, but it only works with New York Stock Exchange TAQ (trade and quote) data. At first I thought you could easily download data as with Yahoo Finance, but I don’t think you can. It also seems strange that there are two file versions of this trade and quote capture system.

xts

This seems to be a pretty popular way to convert market data into a time series object, which is used throughout the other financial R packages listed here.
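A minimal sketch of building and using an xts object, assuming the xts package is installed; the five closing prices are made-up illustrative values, not market data.

```r
if (requireNamespace("xts", quietly = TRUE)) {
  dates  <- as.Date("2012-06-04") + 0:4               # Monday to Friday
  closes <- c(100, 101.5, 100.8, 102.2, 103.0)        # made-up prices
  px <- xts::xts(closes, order.by = dates)            # time-indexed series
  colnames(px) <- "Close"

  rets <- diff(log(px))                               # daily log returns (first is NA)
  print(rets)

  # Date-range subsetting with an ISO-8601 string
  print(px["2012-06-05/2012-06-07"])
}
```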

quantstrat

Another sophisticated R package; combined with blotter, it lets you apply classic technical indicators to your market data objects. You can define indicators, signals, and rules using technical analysis indicators like MACD, RSI, and Bollinger Bands. You can even plug in your own algorithm, which leads into quant-style modeling.
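quantstrat organizes a strategy into indicators, signals and rules (its functions include add.indicator, add.signal and add.rule), and it needs some setup with blotter before anything runs. The sketch below does not use quantstrat at all: it only mirrors the same three layers in base R on a simulated price path, with arbitrary 10- and 30-bar simple moving averages, to show how the pieces relate.

```r
# Simple moving average of the last n values (NA until n values exist)
sma <- function(x, n) as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))

set.seed(11)
price <- 100 * exp(cumsum(rnorm(300, mean = 0.0003, sd = 0.01)))   # simulated path

# 1. Indicators
fast <- sma(price, 10)
slow <- sma(price, 30)

# 2. Signal: TRUE while the fast average is above the slow one
signal <- fast > slow

# 3. Rule: hold one unit long while the signal is on, acting on the next bar
#    (lagging by one bar avoids trading on information not yet available)
position <- c(0, ifelse(is.na(signal), 0, as.numeric(signal)))[seq_along(price)]

# Strategy log returns: the position held over each bar times that bar's return
bar_ret   <- c(0, diff(log(price)))
strat_ret <- position * bar_ret
cat("Bars long:", sum(position), " Cumulative log return:", round(sum(strat_ret), 4), "\n")
```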

blotter

Another useful package for a first round of testing within R. It can sit at the core of many analytical trading systems, with capabilities to capture end-of-day market data, set currency rates, create portfolios and accounts, and produce sophisticated charts.

PerformanceAnalytics

This is easily one of the best R packages yet, since its charting capabilities are as decent as those in popular trading platforms like MetaTrader. You can easily add different types of chart lines to plots. It also offers a great, easy way to extract different types of statistical and market data.
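PerformanceAnalytics packages these statistics as functions such as SharpeRatio and maxDrawdown. To show what two of those numbers measure, here is a base R sketch rather than the package's own code; it assumes simple (not log) returns, a zero risk-free rate and 252 trading days per year, and the daily returns are simulated. PerformanceAnalytics offers more options, so its results can differ in detail.

```r
# Annualized Sharpe ratio, assuming a zero risk-free rate and 252 periods per year
sharpe_annual <- function(r, periods = 252) {
  mean(r) / sd(r) * sqrt(periods)
}

# Maximum drawdown: largest peak-to-trough fall of the compounded wealth curve
max_drawdown <- function(r) {
  wealth <- cumprod(1 + r)
  peak <- cummax(c(1, wealth))[-1]   # running peak, starting from initial wealth 1
  max(1 - wealth / peak)
}

r_demo <- c(0.10, -0.20, 0.05)
print(max_drawdown(r_demo))          # peak 1.10, trough 0.88, so a 20% drawdown

set.seed(5)
r_daily <- rnorm(252, mean = 0.0004, sd = 0.01)   # hypothetical daily returns
print(round(sharpe_annual(r_daily), 2))
print(round(max_drawdown(r_daily), 4))
```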

Other worthy R packages to mention include:

quantmod
lspm
PortfolioAnalytics
FinancialInstrument
TTR
signalextraction

I hope this helps those new to the world of R.

Old High Frequency Tick Data R Package exists with spreads, trade direction, statistics, volatility for forex and equity

This high frequency R package looks fantastic. It includes a lot of analysis on high frequency data where the number of observations could easily be 100K or way more. It contains so many juicy benefits including:
1.    A very decent PDF with samples is included. This is quite rare, as it contains real-world examples, including a few equity analyses and even a foreign exchange pair.
2.    A good section covers durations (the time between successive trades or quotes), which are part of high-frequency data.
3.    Traders always look at spreads, but seeing an R function compute them is rare. The provided equity example (Microsoft) shows the spread between the bid and ask quotes, which is quite convenient. There is also a foreign exchange example that measures the bid-ask spread in multiples of a specified tick size. I never saw anything like this in Matlab.
4.    There is a handy function that determines trade direction, so you can analyze the dataset to see when trades were buys or sells. You can also specify the time lag.
5.    There is a set of handy functions for standard statistics like mean, standard deviation, etc., along with associated plots like histograms, as well as functions for calendar patterns.
6.    There is a realized volatility function based on Andersen et al., as an alternative to other volatility measures such as GARCH models, stochastic volatility models, or the volatility implied by options or other derivative prices.

The supplied PDF also describes its benefits:

They prove that as the sampling frequency of returns approaches infinity, realized volatility measures are asymptotically free of measurement error. For daily volatility, they use 5-minute returns to construct daily realized volatilities. The 5-minute horizon is short enough to have the underlying asymptotic work well, but long enough to mitigate the autocorrelation distortion caused by market microstructure frictions.
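The realized volatility idea in the quoted passage reduces to summing squared intraday returns. A minimal base R sketch, assuming one trading day of 5-minute prices (78 intervals in a 6.5-hour equity session) that are simulated rather than loaded from TAQ data, and ignoring the overnight return; the starting price of 30 and the per-interval volatility are made-up parameters.

```r
# Realized variance for one day: sum of squared intraday log returns
realized_variance <- function(prices) {
  r <- diff(log(prices))
  sum(r^2)
}

set.seed(8)
n_intervals <- 78                      # 6.5 hours / 5 minutes
sigma_5min  <- 0.001                   # hypothetical per-interval volatility
prices <- 30 * exp(cumsum(c(0, rnorm(n_intervals, sd = sigma_5min))))

rv <- realized_variance(prices)
cat("Realized variance:", signif(rv, 4), "\n")
cat("Realized volatility (daily):", signif(sqrt(rv), 4), "\n")
cat("Annualized (x sqrt(252)):", signif(sqrt(252 * rv), 4), "\n")
```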

The Challenge

All this sounds fine, but you will find that this package is pretty old, dating from 2003. It looks like it has been abandoned and needs a refresh, so let me know what you think of its potential. I should also note that the TAQ loading and conversion seems broken, so capturing and converting data is not so easy.

Get your hands on it over at:

S-PLUS code (compatible with R):
http://faculty.washington.edu/ezivot/research/HFAnalysis.ssc
PDF: http://faculty.washington.edu/ezivot/research/hfanalysis.pdf
Get the data here: http://faculty.washington.edu/ezivot/ezresearch.htm

Video: R, Rcpp, and RInside make using C++ so much easier than Matlab Builder NE for potential high-frequency trading (HFT)


So I am going from trying out an open source C# trading platform to an open source C++ ‘platform’ that uses Interactive Brokers. I also switched from Matlab to R. Lastly, I am looking at more open source projects, including Linux, where Ubuntu is becoming my preferred distribution. As you can tell, I am straying away from expensive options like Matlab. Being a developer, I can quickly debug most applications if need be.

I am finding real benefits in switching to R compared with using Matlab with toolboxes like Matlab Builder JA (for Java) or Builder NE (for .NET languages like C#). When I tried out the combination of the Rcpp and RInside R packages, I was pleasantly surprised by a number of things. Installing any R package is quite easy. Building (‘making’) the provided C++ examples for RInside was flawless, and they were easy to execute. The most impressive were the parallel C++ samples, which were jaw-dropping.

Now I am hoping my newer open source C++ trading solution, Trading Shim (http://www.tradingshim.org/), will work at some point as well. Hey… it is C++, so leave it alone. But its speed and scalability should be impressive. I just wish there were a complete open source trading platform in Java that could connect to my chosen broker, Interactive Brokers.

Anyhow, back to the Rcpp and RInside R packages. I need to give a shout-out to the contributors for making these packages quick and easy to use. The provided C++ examples really do showcase how a C++ application can call the embedded R interpreter and execute individual R functions directly. You could not do that with Matlab Builder NE, as you could only call M-scripts with their archaic programming structure. The C++ code for RInside is much simpler, tighter, and smaller than the Matlab Builder NE equivalent. Builder NE is really meant for C#, so trying it from Visual C++ would have been ‘interesting.’ I am just glad I found this deadly combination of R, Rcpp/RInside, and C++. It may work well for my hoped-for high-frequency trading platform, with R for prototyping and analytics.

Video: http://www.youtube.com/watch?v=wIzrJFy-VCA (Calling R from a C application for HFT development with MPI parallelization)