Installing Key Data Science Libraries in R
I use the script below to set up my R environment with the data science libraries I need. Through trial and error, I've found that macOS (OS X) is my preferred operating system for data science. Some of the libraries below (such as 'bigrf', used for Random Forests on larger datasets) are not available for Windows. Also, the packages that I needed to compile from source just worked every time on macOS. Make sure Java is installed on your system before running the script.
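Since the script depends on Java, a quick sanity check from within R can save a failed rJava build. The sketch below is only an illustration: `java_on_path` is a hypothetical helper name, and it only checks whether a `java` executable is visible on the PATH. It does not prove that a full JDK is installed (on a Mac, a placeholder `java` command may exist even without one).

```r
# Hypothetical helper (not part of any package): TRUE if a `java`
# executable can be found on the PATH, FALSE otherwise.
java_on_path <- function() {
  # Sys.which() returns an empty string when the command is not found
  nzchar(Sys.which("java")[[1]])
}

if (java_on_path()) {
  message("Found java at: ", Sys.which("java")[[1]])
} else {
  message("No java executable on the PATH; install a JDK before STEP 1.")
}
```

If the check fails, install Java first and then carry on with the script.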
# STEP 1: INSTALL JAVA & POINT R TO IT
# The lines starting with "$" are shell commands: run them in a terminal, not in R.
# On a Mac add the following to your bash_profile
# $ export LD_LIBRARY_PATH=/Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home/jre/lib:/Library/Java/JavaVirtualMachines/jdk1.8.0_102.jdk/Contents/Home/jre/lib/server
# Run the below to create the relevant links
# $ sudo ln -s $(/usr/libexec/java_home)/jre/lib/server/libjvm.dylib /usr/local/lib
# Point R to Java
# $ sudo R CMD javareconf
# Install the pre-requisite rJava package
install.packages('rJava', type='source')

# STEP 2: PRE-MODELING STAGE PACKAGES
# Data Visualisation
install.packages("ggvis")
install.packages("ggplot2")
install.packages("googleVis")
# Data Transformation
install.packages("plyr")
install.packages("data.table")
# Missing Value Imputations
install.packages("missForest")
install.packages("missMDA")
# Outlier Detection
install.packages("outliers")
install.packages("evir")
# Feature Selection
install.packages("features")
install.packages("RRF")
# Dimension Reduction
install.packages("FactoMineR")
install.packages("CCP")

# STEP 3: MODELING STAGE PACKAGES
# Continuous regression
install.packages("car")
install.packages("randomForest")
# Ordinal regression
install.packages("rminer")
install.packages("CORElearn")
# Classification
install.packages("caret")
install.packages("devtools")
library(devtools)
# install_github() takes the repository as "user/repo"
install_github("aloysius-lim/bigrf")
#install.packages("~/Downloads/bigrf_0.1-11.tar.gz", repos = NULL, type = "source") #LINUX
# Clustering
install.packages("cba")
install.packages("Rankcluster")
# Time Series
install.packages("forecast")
install.packages("ltsa")
# Survival
install.packages("survival")
install.packages("BaSTA")

# STEP 4: POST MODELING STAGE PACKAGES
# General Model Validation
install.packages("lsmeans")
install.packages("comparison")
# Regression Validation
install.packages("regtest")
install.packages("ACD")
# Classification Validation
install.packages("binomTools")
install.packages("Daim")
# Clustering Validation
install.packages("clusteval")
install.packages("sigclust")
# ROC Analysis
install.packages("pROC")
install.packages("timeROC")

# STEP 5: OTHER USEFUL PACKAGES
# Improve Performance
install.packages("Rcpp")
# "parallel" ships with base R, so it needs no install; load it with library(parallel)
# Work with Web
install.packages("XML")
install.packages("jsonlite")
install.packages("httr")
# Report Results
install.packages("shiny")
install.packages("rmarkdown")
# Text Mining
install.packages("tm")
install.packages("twitteR")
# Database
install.packages("sqldf")
# Install unixodbc first before RODBC
# On a Mac run: brew install unixodbc
install.packages("RODBC")
install.packages("RMongo")
# Miscellaneous
install.packages("swirl")
install.packages("reshape2")
install.packages("qcc")
install.packages("qdap")

# STEP 6: RUN INVENTORY OF INSTALLED PACKAGES
# Get a full list of installed packages
write.csv(installed.packages(), file = "InstalledPackages.csv")
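Running the whole script a second time reinstalls every package, which can take a while. One way around that is to install only what is missing. The sketch below is not part of the original script: `missing_packages` and `install_missing` are hypothetical helper names, built on the same `installed.packages()` call used for the inventory step.

```r
# Hypothetical helper: return the entries of `pkgs` that are not installed.
# `installed` defaults to the names of all packages R can currently see.
missing_packages <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

# Hypothetical helper: install only the packages that are missing.
# Invisibly returns the names it tried to install.
install_missing <- function(pkgs) {
  todo <- missing_packages(pkgs)
  if (length(todo) > 0) {
    install.packages(todo)
  }
  invisible(todo)
}

# Reporting only, nothing is installed here: "stats" and "utils" ship with R
missing_packages(c("stats", "utils"))
```

To use it for a section of the script, pass the package names as one vector, for example `install_missing(c("ggplot2", "plyr", "data.table"))`. Packages installed from GitHub, like bigrf, would still need their own `install_github()` call.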














