This is the website for the Data Science with `R` (DataSciR) course offered in 2021. Click here if you are looking for the 2020 course website.

**11.04.2021: The deadline for the project proposal and registration is on 20.05.2021. In an earlier version of the Welcome slides a different date was given.**

**05.04.2021: Notifications regarding course admission have been sent by email.**

**22.03.2021: Application to the course is possible from today.**

- Course day/time: Fridays, 9:15-10:45 (via Zoom)
- Instructor: Uli Niemann
- Course type: seminar
- ECTS credits: 6
- Audience: all FIN Master degree programs
- Course language: English
- Application: see section Application & registration
- Prerequisites: see section Prerequisites
- Grading: based on several deliverables in the context of a semester-long data science project → see Project page

The course is limited to a maximum of 30 students. Please apply for DataSciR by completing the following tasks by 03.04.2021:

- Enroll in the course in the LSF.
- Send a short motivation email (no more than 300 words) to the course instructor (firstname.lastname@ovgu.de) using the subject *[DataSciR] Course application*. Please send this email from your OVGU email address and include your matriculation number.

On 05.04.2021, you will be notified whether you can attend the course.

After admission, please complete this registration form by 20.05.2021.

*Ten years ago, who would have thought that R, the “environment for statistical computing and graphics”, would become one of the most popular programming languages for data scientists?*

The impressive growth of `R` is not a coincidence. `R` is a free and open-source alternative to expensive, proprietary software like SPSS, Matlab and Excel. Its strengths have always been its capabilities for statistical data analysis and its functionality for creating powerful, aesthetically appealing graphics and charts.

While `R` attracted an almost exclusively academic audience in the 1990s and 2000s, the `R` community has since grown not only in sheer numbers but also in diversity, as people from different industries and backgrounds discover `R`'s usefulness for a wide range of applications. As of February 2020, more than 15,000 (!) packages had been published to CRAN, about half of them since 2015.

Especially in the last decade, the functionality and versatility of `R` have gained momentum. Among the most popular `R` packages are:

In **Data Science with R** (DataSciR), you will learn the fundamentals of `R` and how to use the following packages for data science:

- the “`tidyverse`”, which includes packages like `dplyr` and `tidyr` for data manipulation and `ggplot2` for data visualization,
- `rmarkdown` and `knitr` for reproducible and automated reporting,
- `shiny` for creating interactive web applications, and
- `tidymodels` for inferential and predictive modeling.

You will demonstrate your proficiency in these packages on a semester-long graded data science project.

There are no mandatory prerequisites for DataSciR. However, you are expected to have a solid knowledge of fundamental data mining techniques, such as classification, regression and clustering. Hence, it is recommended that you have attended at least one of the following lectures (or a comparable one):

Also, you should have basic programming and statistics knowledge. For example, you will learn the most important vector types and classes in `R`, but you will not learn what a vector or a class is in general. Likewise, you should know what terms such as mean, standard deviation and probability mean.
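As a rough self-check of the expected background, the following base-`R` sketch inspects a few vector types and classes and computes a mean and a sample standard deviation by hand. The values in `x` are made up for illustration and are not taken from the course material. If these ideas look familiar, the programming and statistics prerequisites are probably met.

```r
# Made-up example data (an assumption for illustration only).
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Atomic vector types: all elements of a vector share one type.
typeof(x)                # "double"
typeof(1:3)              # "integer"
typeof(c(TRUE, FALSE))   # "logical"
typeof(c("a", "b"))      # "character"

# A class adds behaviour on top of a type; a factor is stored as integers.
f <- factor(c("low", "high", "low"))
class(f)                 # "factor"
typeof(f)                # "integer"

# Mean and sample standard deviation, computed by hand ...
m <- sum(x) / length(x)
s <- sqrt(sum((x - m)^2) / (length(x) - 1))

# ... and compared with the built-in functions.
all.equal(m, mean(x))    # TRUE
all.equal(s, sd(x))      # TRUE
```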

Data Mining / Statistical Analysis:

- Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani. An Introduction to Statistical Learning. Springer, 2017.
- Pang-Ning Tan, Michael Steinbach, and Vipin Kumar. Introduction to Data Mining. Pearson, 2005.
- Jiawei Han, Micheline Kamber and Jian Pei. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2011.

`R`-specific:

- Hadley Wickham, and Garrett Grolemund. R for Data Science. O’Reilly, 2017.
- Max Kuhn, and Julia Silge. Tidy Modeling with R. 2021. Draft version.
- Max Kuhn. The `caret` package. Online documentation.
- Hadley Wickham. ggplot2: Elegant Graphics for Data Analysis. 3rd edition. Draft version.
- Hadley Wickham. Mastering Shiny. O’Reilly, 2021. Draft version.
- Yihui Xie, J. J. Allaire, and Garrett Grolemund. R Markdown: The Definitive Guide. Chapman & Hall/CRC, 2018.
- Hadley Wickham. Advanced R. 2nd edition, Chapman & Hall/CRC, 2019.
- Max Kuhn, and Kjell Johnson. Applied Predictive Modeling. Springer, 2013.
- Max Kuhn, and Kjell Johnson. Feature Engineering and Selection: A Practical Approach for Predictive Models. CRC Press, 2019.
- Bradley Boehmke. Hands-on Machine Learning with R. Chapman and Hall/CRC, 2019.

Other:

- Introduction to Data Science course held by Mine Çetinkaya-Rundel at the University of Edinburgh.
- Jenny Bryan, and others. Happy Git and GitHub for the useR. 2018.
- Jeffrey Leek. Organizing Data Science Projects. Leanpub.com.
- RStudio cheat sheets
- RStudio primers
- RStudio webinars
- Quick-R (short tutorials on various topics, e.g. data import, statistics and graph generation)

By the end of the first week, you should have installed the following software on your own laptop:

Also, please check whether you can successfully install packages. To do so, click on the *Packages* tab in the bottom-right pane in RStudio. Then, click on the *Install* button and specify an arbitrary package, e.g. `dplyr`. Finally, click on *Install*. Alternatively, you can install a package from the console with `install.packages("dplyr")`. If everything is set up correctly, no error messages should be displayed when you load the installed package with `library(dplyr)`.
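The installation check can also be scripted from the console. The sketch below uses only base `R`; `check_packages` is a hypothetical helper name introduced here for illustration, not a function from any package. It reports which packages can be loaded and leaves the actual installation step commented out, since that step needs internet access.

```r
# Hypothetical helper: report which packages can be loaded in this R installation.
# requireNamespace() returns TRUE or FALSE instead of raising an error.
check_packages <- function(pkgs) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

# "stats" ships with R; "dplyr" is the example package mentioned above.
status <- check_packages(c("stats", "dplyr"))
print(status)

# To install whatever is missing (needs internet access), uncomment:
# install.packages(names(status)[!status])
```

A `FALSE` entry means the package is not yet installed or cannot be loaded. After installing it, running `library(dplyr)` should complete without error messages.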