The R Software: Fundamentals of Programming and Statistical Analysis
Pierre Lafaye de Micheaux, Remy Drouilhet, and Benoit Liquet. Springer, 2013.
R Software
What is R Software?
R is a programming language and free software developed by Ross Ihaka and Robert Gentleman in 1993. R offers an extensive catalog of statistical and graphical methods, including machine learning algorithms, linear regression, time series, and statistical inference, to name a few. Most R libraries are written in R itself, but C, C++, and Fortran are preferred for heavy computational tasks.
R is trusted not only in academia: many large companies also use the R programming language, including Uber, Google, Airbnb, and Facebook.
Data analysis with R is done in a series of steps: programming, transforming, discovering, modeling, and communicating the results.
Program: R is a clear and accessible programming tool
Transform: R is made up of a collection of libraries designed specifically for data science
Discover: Investigate the data, refine your hypotheses, and analyze them
Model: R provides a wide array of tools to capture the right model for your data
Communicate: Integrate code, graphs, and output into a report with R Markdown, or build Shiny apps to share with the world
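As a rough illustration of the steps above, here is a minimal sketch in base R. It uses the built-in mtcars dataset; the derived column name wt_kg and the choice of a simple linear regression are assumptions made for the example, not part of the original tutorial.

```r
# Transform: derive a new column from the built-in mtcars data
cars_data <- mtcars
cars_data$wt_kg <- cars_data$wt * 1000 * 0.45359237  # wt is recorded in units of 1000 lb

# Discover: summary statistics and a quick correlation check
print(summary(cars_data[, c("mpg", "wt_kg")]))
mpg_wt_cor <- cor(cars_data$mpg, cars_data$wt_kg)

# Model: fit a simple linear regression of fuel economy on weight
fit <- lm(mpg ~ wt_kg, data = cars_data)
slope <- unname(coef(fit)["wt_kg"])

# Communicate: print a compact coefficient table that could go into a report
print(round(coef(summary(fit)), 4))
```

In a real project the last step would usually be an R Markdown report or a Shiny app rather than printed output.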
In this introductory tutorial, you will learn:
What is R used for?
R by Industry
R package
Communicate with R
Why use R?
Should you choose R?
Is R difficult?
What is R used for?
Statistical inference
Data analysis
Machine learning algorithms
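To make the first item concrete, here is a small statistical inference sketch. It assumes the built-in sleep dataset (extra hours of sleep for 10 patients under two drugs); the dataset and the choice of a paired t-test are illustrative only.

```r
# Built-in 'sleep' data: rows 1-10 are drug 1, rows 11-20 are drug 2,
# listed in the same patient order, so the vectors pair up by position
group1 <- sleep$extra[sleep$group == 1]
group2 <- sleep$extra[sleep$group == 2]

# Paired t-test: the same patients received both drugs
result <- t.test(group2, group1, paired = TRUE)

# p-value under the null hypothesis of no mean difference between the drugs
print(result$p.value)
# 95% confidence interval for the mean difference
print(result$conf.int)
```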
R by Industry
If we break down the use of R by industry, we see that academia comes first: R is a language for doing statistics. Outside academia, R is most widely used in the healthcare industry, followed by government and consulting.
R package
The primary uses of R are, and will always be, statistics, visualization, and machine learning. Looking at which R packages get the most questions on Stack Overflow, most of the top 10 relate to the workflow of a data scientist: preparing the data and communicating the results.
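Packages are typically installed once and then loaded in each session. The sketch below shows that pattern; the helper name has_pkg and the package "forecast" are only examples chosen for illustration, and the install line is commented out because it needs an internet connection.

```r
# Install a package from CRAN (needs an internet connection), shown commented out:
# install.packages("forecast")   # "forecast" is just an example package name

# Hypothetical helper: check whether a package is installed without attaching it
has_pkg <- function(pkg) requireNamespace(pkg, quietly = TRUE)

print(has_pkg("stats"))  # the stats package ships with R

if (has_pkg("forecast")) {
  suppressPackageStartupMessages(library(forecast))
} else {
  message("forecast is not installed; run install.packages(\"forecast\") first")
}
```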
R's libraries, almost 12k of them, are stored in CRAN (the Comprehensive R Archive Network). CRAN is free and open source. You can download and use its numerous libraries to perform machine learning or time series analysis.
Communicate with R
R has multiple ways to present and share work, either through an R Markdown document or a Shiny app. Everything can be hosted on RPubs, GitHub, or the business's website. RStudio accepts Markdown for writing documents. You can export the documents in different formats:
Document:
· HTML
· PDF/LaTeX
· Word
Presentation:
· HTML
· PDF beamer
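The export formats listed above map onto output formats in the rmarkdown package. The following is a minimal sketch, assuming rmarkdown and pandoc are installed; the file contents and the variable names rmd_path and out are made up for the example. The rendering step is skipped when those tools are missing.

```r
# Write a tiny R Markdown document to a temporary file.
# The code-chunk fence is built with strrep() to keep this example easy to copy.
fence <- strrep("`", 3)
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Example report\"",
  "output: html_document",
  "---",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
), rmd_path)

# Render only if rmarkdown and pandoc are available on this machine
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  out <- rmarkdown::render(rmd_path, output_format = "html_document", quiet = TRUE)
  print(out)  # path of the generated HTML file
  # Other formats: "pdf_document" (needs LaTeX), "word_document",
  # "ioslides_presentation", "beamer_presentation"
}
```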
Why use R?
Data science is shaping the way companies run their businesses. Without a doubt, staying away from Artificial Intelligence and Machine Learning will put a company at risk of failing. The big question is: which tool or language should you use?
There are plenty of tools available on the market to perform data analysis. Learning a new language requires some time investment. The picture below depicts the learning curve compared to the business capability a language offers. The negative relationship implies that there is no free lunch. If you want to get the best insight from the data, then you need to spend some time learning the appropriate tool, which is R.
On the top left of the graph, you can see Excel and PowerBI. These two tools are simple to learn but don't offer outstanding business capability, especially in terms of modeling. In the middle, you can see Python and SAS. SAS is a dedicated tool for running statistical analysis for business, but it is not free. SAS is click-and-run software. Python, however, is a language with a steady learning curve. Python is a fantastic tool for deploying machine learning and AI but lacks communication features. With a similar learning curve, R is a good trade-off between implementation and data analysis.
When it comes to data visualization (DataViz), you've probably heard about Tableau. Tableau is, without a doubt, a great tool for discovering patterns through graphs and charts. Besides, learning Tableau is not time-consuming. One big problem with data visualization is that you might end up never finding a pattern, or just creating plenty of useless charts. Tableau is a good tool for quick visualization of the data or for Business Intelligence. When it comes to statistics and decision-making tools, R is more appropriate.
Stack Overflow is a big community for programming languages. If you have a coding issue or need to understand a model, Stack Overflow is there to help. Over the years, the percentage of question views has increased sharply for R compared to other languages. This trend is of course highly correlated with the boom in data science, but it also reflects the demand for the R language in data science.
In data science, there are two tools competing with each other. R and Python are probably the programming languages that define data science.
Should you choose R?
Data scientists can use two excellent tools: R and Python. You may not have time to learn them both, especially if you are just getting started in data science. Learning statistical modeling and algorithms is far more important than learning a programming language. A programming language is a tool to compute and communicate your discoveries. The most important task in data science is the way you deal with the data: import, cleaning, preparation, feature engineering, and feature selection. This should be your primary focus. If you are trying to learn R and Python at the same time without a solid background in statistics, it's plain stupid. Data scientists are not programmers. Their job is to understand the data, manipulate it, and present the best approach. If you are wondering which language to learn, let's see which one is the most appropriate for you.
The principal audience for data science is business professionals. In business, communication matters a great deal. There are many ways to communicate: reports, web apps, dashboards. You need a tool that does all of this together.
Is R difficult?
Years ago, R was a difficult language to master. The language was confusing and not as structured as other programming tools. To overcome this major issue, Hadley Wickham developed a collection of packages called the tidyverse. The rules of the game changed for the better. Data manipulation became trivial and intuitive. Creating a graph was not so difficult anymore.
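To give a feel for the difference, here is a small sketch of the same summary written with dplyr (one of the tidyverse packages) and with base R. It assumes dplyr is installed and skips that part otherwise; the dataset (built-in mtcars) and the filter threshold are arbitrary choices for the example.

```r
# tidyverse style: each step reads as a verb in a pipeline
if (requireNamespace("dplyr", quietly = TRUE)) {
  suppressPackageStartupMessages(library(dplyr))
  by_cyl <- mtcars %>%
    filter(hp > 100) %>%                 # keep cars with more than 100 hp
    group_by(cyl) %>%                    # one group per cylinder count
    summarise(mean_mpg = mean(mpg), n = n())
  print(by_cyl)
}

# Base R equivalent of the same computation
powerful <- mtcars[mtcars$hp > 100, ]
base_by_cyl <- aggregate(mpg ~ cyl, data = powerful, FUN = mean)
print(base_by_cyl)
```

Both versions give the same group means; the pipeline form is usually easier to read as the number of steps grows.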
The best algorithms for machine learning can be implemented with R. Packages like Keras and TensorFlow make it possible to build high-end machine learning models. R also has a package for XGBoost, one of the best algorithms for Kaggle competitions.
R can communicate with other languages. It is possible to call Python, Java, and C++ from R. The world of big data is also accessible to R: you can connect R to databases and to big data platforms such as Spark or Hadoop.
Finally, R has evolved to allow parallel operations that speed up computation. In the past, R was criticized for using only one CPU at a time. The parallel package lets you run tasks on different cores of the machine.
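A minimal sketch of the parallel package in use is shown below. The function slow_square and the choice of two workers are assumptions for the example; in practice the number of workers would depend on the machine.

```r
library(parallel)

# A deliberately slow toy function standing in for real work
slow_square <- function(x) {
  Sys.sleep(0.1)
  x^2
}

# Start a small cluster of worker processes (2 is an arbitrary choice)
n_workers <- 2
cl <- makeCluster(n_workers)

# Distribute the inputs across the workers, then shut the cluster down
parallel_result <- parLapply(cl, 1:8, slow_square)
stopCluster(cl)

# The serial version gives the same answer, just on a single core
serial_result <- lapply(1:8, slow_square)
print(identical(parallel_result, serial_result))
```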
Summary
In a nutshell, R is a great tool for exploring and investigating data. Elaborate analyses like clustering, correlation, and data reduction are done with R. This is the most crucial part: without good feature engineering and a good model, deploying machine learning will not give meaningful results.
Running RScript in the background while catching up during the holidays
Apologies for being “out of the loop” for quite a while. Although I *try* to keep up with everything, the holidays (both American and Chinese ones) have hampered me… While finally taking a few days off during the Chinese New Year, I was able to relax by looking back at the photos I took during my mini-road trip to Texas (which I took after my Ocean Optics conference). Also, I had a chance to have some data running in the background.
I have been fortunate enough to meet very dear friends even halfway around the world, both here in Taiwan as well as back in the States.
A special thanks to my friends, Mei-Fen 眉芬 and Joe, for getting me to enjoy the fresh air out in Taiwan’s ShanHua District/Tainan Observatory 善化~大內天文館 during the Chinese New Year.
Meanwhile, while looking back at my photos, I got to reminisce about some really close friends I essentially consider family. They showed me around the Houston-Galveston area of Texas.
Although the beaches were very sunny that particular day we went out, I was actually really impressed with eating greasy (Americanized) Chinese food as well as their famous Texan Buc-ee’s gas stations!
In terms of RStudio, I have recently been running a “for” loop on my HUGE dataset of spectral seagrass profiles. (Shout out to Anthony Damico’s YouTube video: it was very useful in helping me build my R script.)
Hope to try to keep all you readers more in the loop!
Summer “break”: Getting down to the nitty gritty
So shortly after final exams, I was able to get away for a bit and took up my friend’s invitation. During our Chinese-English language exchanges, JueLing 玨玲, a fellow PhD student commuter who lives in Taichung, was always telling me about the many natural wonders there.
Because it lies along wider coastal floodplains than other major cities in Taiwan, Taichung’s urbanization is much more spread out. Even so, there is still a huge network of hiking trails within the DaKeng area of Taichung’s Beitun District. We hiked for several hours along one of the more difficult trails (Trail #4). However, I think the view from up above was totally worth it!
In the meantime, I have been trying to stay on top of things while still juggling a LOT on my plate. I have recently started a new part-time job translating documents from the Thai Food and Drug Administration into English for my Taiwanese coworkers. Once I get home from that, I have also been cramming and writing R script (computer) code for my spectral analyses at night.
The actual data analysis is not so bad per se, but the sheer number of text files is daunting (approximately 135,000 files!). For a crash course in R programming, I really recommend using YouTube. In particular, you should check out Mike Marin’s YouTube channel. Here’s hoping I stay sane during this summer “break”!
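For readers wondering what looping over that many text files looks like, here is a minimal sketch. It is not my actual script: the folder name, file names, column names (wavelength, reflectance), and the per-file mean are all made up for the example, and three tiny files stand in for the real data.

```r
# Create a few small example files standing in for the real spectral data
data_dir <- file.path(tempdir(), "spectra_demo")
dir.create(data_dir, showWarnings = FALSE)
for (i in 1:3) {
  demo <- data.frame(wavelength = c(400, 500, 600),
                     reflectance = c(0.1, 0.2, 0.3) * i)
  write.table(demo, file.path(data_dir, sprintf("profile_%03d.txt", i)),
              row.names = FALSE, quote = FALSE, sep = "\t")
}

# List every text file, then process them one at a time in a for loop
files <- list.files(data_dir, pattern = "\\.txt$", full.names = TRUE)
results <- vector("list", length(files))  # preallocate instead of growing in the loop
for (i in seq_along(files)) {
  spec <- read.table(files[i], header = TRUE, sep = "\t")
  results[[i]] <- data.frame(file = basename(files[i]),
                             mean_reflectance = mean(spec$reflectance))
}

# Combine the per-file summaries into one table
summary_table <- do.call(rbind, results)
print(summary_table)
```

Preallocating the results list matters with a very large number of files, because growing an object inside the loop gets slower as it fills up.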
5 Facts About R (Software)
1. R is Free: It's free to install, update and use
2. R is Popular: Usage of R is growing rapidly
3. R is Powerful: R can handle large datasets (Big Data enabled)
4. R is Flexible: Additional free packages can be installed at any time
5. R is Well Supported: Huge R community makes it easy to get fast support
Source: R-Bloggers.com
If you know R, you can simply get more money!!!