This lesson is still being designed and assembled (Pre-Alpha version)

From Code to Concepts: Introduction to Data Science

General Information

The Carpentries project comprises the Software Carpentry, Data Carpentry, and Library Carpentry communities of Instructors, Trainers, Maintainers, helpers, and supporters who share a mission to teach foundational computational and data science skills to researchers.

Want to learn more and stay engaged with The Carpentries? Carpentries Clippings is The Carpentries' biweekly newsletter, where we share community news, community job postings, and more. Sign up to receive future editions and read our full archive: https://carpentries.org/newsletter/

Data Carpentry aims to help researchers get their work done in less time and with less pain by teaching them basic research computing skills. This hands-on workshop will cover basic concepts and tools, including program design, version control, data management, and task automation. Participants will be encouraged to help one another and to apply what they have learned to their own research problems.

For more information on what we teach and why, please see our paper "Best Practices for Scientific Computing".

Who: The course is aimed at graduate students and other researchers. You don't need to have any previous knowledge of the tools that will be presented at the workshop.

Where: DFL Training Hall, Faculty of Arts and Humanities, Jazan University, Saudi Arabia. Get directions with OpenStreetMap or Google Maps.

When: Oct 30, 2024; 9:00 am - 12:00 pm (UTC+3).

Requirements: Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) on which they have administrative privileges. They should have a few specific software packages installed (listed below).

Accessibility: We are committed to making this workshop accessible to everybody. The workshop organizers have checked that:

We are dedicated to providing a positive and accessible learning environment for all. We do not require participants to provide documentation of disabilities or disclose any unnecessary personal information. However, we do want to help create an inclusive, accessible experience for all participants. We encourage you to share any information that would be helpful to make your Carpentries experience accessible. To request an accommodation for this workshop, please fill out the accommodation request form. If you have questions or need assistance with the accommodation form, please email us.

Glosario is a multilingual glossary for computing and data science terms. The glossary helps learners attend workshops and use our lessons to make sense of computational and programming jargon written in English by offering it in their native language. Translating data science terms also provides a teaching tool for Carpentries Instructors to reduce barriers for their learners.

Contact: Please email adallak@jazanu.edu.sa for more information.

Roles: To learn more about the roles at the workshop (who will be doing what), refer to our Workshop FAQ.


Code of Conduct

Everyone who participates in Carpentries activities is required to conform to the Code of Conduct. This document also outlines how to report an incident if needed.


Collaborative Notes

We will use this collaborative document for chatting, taking notes, and sharing URLs and bits of code.


Surveys

Please be sure to complete these surveys before and after the workshop.

Pre-workshop Survey

Post-workshop Survey


Setup

To participate in a Data Carpentry workshop, you will need access to software as described below. In addition, you will need an up-to-date web browser.

We maintain a list of common installation issues on the Configuration Problems and Solutions wiki page. It is intended as a reference for instructors, but you may also find it useful.


R

R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis. To interact with R, we use RStudio.

Windows: Install R by downloading and running this .exe file from CRAN. Also, please install the RStudio IDE. Note that if you have separate user and admin accounts, you should run the installers as administrator (right-click the .exe file and select "Run as administrator" instead of double-clicking). Otherwise, problems may occur later, for example when installing R packages.

Linux: Instructions for installing R on various Linux distributions (Debian, Fedora, Red Hat, and Ubuntu) can be found at <https://cran.r-project.org/bin/linux/>. These instruct you to use your package manager: for example, on Fedora run `sudo dnf install R`, and on Debian/Ubuntu add the repository described on that page and then run `sudo apt-get install r-base`. Also, please install the RStudio IDE.


Schedule

Please note: This schedule is only a guide. We try to stick to it, but depending on the group's progress we may spend more or less time on each section than indicated. However, we avoid running past the published times for the day, which are 09:00 to 12:00. The times listed below are cumulative times from the start of the lesson material, not times of day.
| Time | Episode | Questions |
|------|---------|-----------|
| | Setup | Download files required for the lesson |
| 00:00 | 1. Introduction to R and RStudio | How to set up and organise an analysis project? How to interact with R and RStudio? How to install packages? |
| 00:25 | 2. Basic objects and data types in R | What are the basic data structures and data types in R? How can values be assigned to objects? How can subsets be extracted from vectors? How are missing values represented in R? |
| 00:55 | 3. Working with Tabular Data | How to import tabular data into R? What kind of object stores tabular data? How to investigate the contents of such an object (types of variables and missing data)? |
| 01:55 | 4. Data visualisation with `ggplot2` | How to build a graph in R? What types of visualisation are suitable for different types of data? |
| 03:15 | 5. Manipulating variables (columns) with `dplyr` | How to select and/or rename specific columns from a data frame? How to create a new column or modify an existing one? How to ‘chain’ several commands together with pipes? |
| 04:05 | 6. Manipulating observations (rows) with `dplyr` | How to order rows in a table? How to retain only unique rows (no duplicates)? How to identify observations of a dataset that fulfill certain conditions? |
| 05:10 | 7. Grouped operations using `dplyr` | How to calculate summary statistics from a dataset? How to apply those summaries to groups within the data? How to apply other data manipulation steps to groups within the data? |
| 06:30 | 8. Working with categorical data + Saving data | How to fix common typos in character variables? How to reorder values in ordinal categorical variables? How to save data into a file? |
| 07:15 | 9. Joining tables | How to join different tables together? How to identify mismatches between tables? |
| 07:50 | 10. Data reshaping: from wide to long and back | How to change the shape of a table from a ‘wide’ to a ‘long’ format? When is one or the other format more suitable for analysis? |
| 08:25 | 11. Data visualisation with `ggplot2` - part II | How can we fully customise a plot by adding annotations and labels, controlling the axis limits, and changing its overall look? How can we compose several plots together? |
| 09:45 | 12. Extra practice exercises | How to apply the tools and concepts learned to new data? |
| 09:45 | Finish | |