R Read Text File to DataFrame
In data analysis, collecting data from multiple sources and converting it into a structured form is a routine task. Text files are a common source of data because they store information in plain text. To analyze such data in R, it is typically loaded into a data frame: a two-dimensional tabular structure with named columns, where each column can hold a different data type.
Reading text files in R
Reading text files in the R programming language is the process of taking data from plain text files and transforming it into a structured format that is easy to edit and analyze. The most common text file formats are listed below.
1. CSV (Comma-Separated Values):
- CSV files use commas to separate values in each row.
- Example: data.csv
2. TSV (Tab-Separated Values):
- TSV files use tabs as separators between values.
- Example: data.tsv
3. Space-Separated Values:
- Space-separated files use spaces to separate values in each row.
- Example: data.txt
4. Fixed-Width Files:
- Fixed-width files have columns aligned at specific positions, with no delimiters.
- Example: data.dat
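Fixed-width files are the one format in this list that the delimiter-based functions below do not cover. A minimal sketch of reading one with base R's read.fwf() follows; the student records, the column widths, and the temporary file are hypothetical and exist only for illustration.

```r
# Hypothetical fixed-width records: Name in characters 1-5,
# RollNo in characters 6-8, Marks in characters 9-10
records <- c("Asha 10188",
             "Ravi 10275",
             "Meena10391")

# Write the records to a temporary .dat file so the example is self-contained
fwf_path <- tempfile(fileext = ".dat")
writeLines(records, fwf_path)

# widths gives the number of characters in each column;
# strip.white removes the padding spaces around text values
students <- read.fwf(fwf_path,
                     widths = c(5, 3, 2),
                     col.names = c("Name", "RollNo", "Marks"),
                     strip.white = TRUE)
print(students)
```

The widths vector has to match the file layout exactly, since there is no delimiter to fall back on.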
Common Functions for Reading Text Files
There are three main methods:
- Using read.csv() function
- Using read.delim() function
- Using read.table() function
The examples below read a dataset stored in a CSV file named heart.csv into a data frame df.
The data contains 14 columns, including "age", "sex", "chol", and "target".
1. Using read.csv() function
CSV files are commonly used to store tabular data. Here's how to read CSV files into a DataFrame using R:
- Call the read.csv() function with the file path. The separator defaults to a comma, so it does not need to be specified.
- Assign the result to a data frame variable.
To import your own dataset, replace the path in the code with the location of your file.
# Read the CSV file into a data frame
df <- read.csv('C:\\Users\\GFG19565\\Downloads\\heart.csv')
# Print the first six rows of the data frame
head(df)
Output:
age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal target
1 52 1 0 125 212 0 1 168 0 1.0 2 2 3 0
2 53 1 0 140 203 1 0 155 1 3.1 0 0 3 0
3 70 1 0 145 174 0 1 125 1 2.6 0 0 3 0
4 61 1 0 148 203 0 1 161 0 0.0 2 1 3 0
5 62 0 0 138 294 1 1 106 0 1.9 1 3 2 0
6 58 0 0 100 248 0 0 122 0 1.0 1 0 2 1
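To try read.csv() without a file on disk, the data can be supplied inline through the text argument. The sketch below uses a small excerpt of the output shown above (first three rows, first five columns); the variable names are illustrative.

```r
# A small excerpt of the heart data, supplied as an inline string
csv_text <- "age,sex,cp,trestbps,chol
52,1,0,125,212
53,1,0,140,203
70,1,0,145,174"

# text = lets read.csv() parse a string instead of a file path
heart_sample <- read.csv(text = csv_text)

# Inspect the structure: column names and the type R inferred for each
str(heart_sample)

# Columns are available by name once the data is loaded
mean(heart_sample$chol)
```

Checking str() right after loading is a quick way to confirm that numeric columns were not read as text.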
2. Using read.delim() function
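The read.delim() function is intended for tab-separated files such as "data.tsv": its default separator is the tab character (\t). The example file heart.csv is comma-separated, so the separator is overridden with sep=','; for a true TSV file, the sep argument can be omitted.
# Read the file into a data frame, overriding the default tab separator
df <- read.delim('C:\\Users\\GFG19565\\Downloads\\heart.csv', sep=',')
# Print the first six rows of the data frame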
head(df)
Output:
age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal target
1 52 1 0 125 212 0 1 168 0 1.0 2 2 3 0
2 53 1 0 140 203 1 0 155 1 3.1 0 0 3 0
3 70 1 0 145 174 0 1 125 1 2.6 0 0 3 0
4 61 1 0 148 203 0 1 161 0 0.0 2 1 3 0
5 62 0 0 138 294 1 1 106 0 1.9 1 3 2 0
6 58 0 0 100 248 0 0 122 0 1.0 1 0 2 1
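For comparison, here is a sketch of read.delim() applied to a file that really is tab-separated, where the default separator applies. The temporary file and its three columns are made up for the example.

```r
# Create a small tab-separated file in a temporary location
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("age\tsex\tchol",
             "52\t1\t212",
             "53\t1\t203"),
           tsv_path)

# No sep argument needed: read.delim() splits on tabs by default
tsv_df <- read.delim(tsv_path)
print(tsv_df)

# The equivalent read.table() call spells out what read.delim() assumes
same_df <- read.table(tsv_path, header = TRUE, sep = "\t")
all.equal(tsv_df, same_df)
```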
3. Using read.table() function
Tabular files store data in rows and columns. The read.table() function is the most general of the three and works with any separator. To read a tabular file into a data frame in R:
- Call the read.table() function with the file path and appropriate parameters. By default it splits on whitespace, so sep and header are set explicitly here.
- Assign the result to the df data frame and then print its first rows.
# Read data from the text file; heart.csv is comma-separated and has a header row
df <- read.table('C:\\Users\\GFG19565\\Downloads\\heart.csv', sep=',', header=TRUE)
# Display the first six rows of the data frame
head(df)
Output:
age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal target
1 52 1 0 125 212 0 1 168 0 1.0 2 2 3 0
2 53 1 0 140 203 1 0 155 1 3.1 0 0 3 0
3 70 1 0 145 174 0 1 125 1 2.6 0 0 3 0
4 61 1 0 148 203 0 1 161 0 0.0 2 1 3 0
5 62 0 0 138 294 1 1 106 0 1.9 1 3 2 0
6 58 0 0 100 248 0 0 122 0 1.0 1 0 2 1
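When the values are separated by spaces rather than commas, read.table() can be used with its default separator. The following sketch assumes a small whitespace-separated sample supplied inline and shows the effect of the header argument.

```r
# Hypothetical space-separated sample; columns are padded with extra spaces
txt <- "age sex chol
52  1   212
53  1   203"

# Default sep = "" treats any run of whitespace as one separator
with_header <- read.table(text = txt, header = TRUE)
print(with_header)

# With header = FALSE the first line is read as data,
# and the columns get the automatic names V1, V2, V3
no_header <- read.table(text = txt, header = FALSE)
names(no_header)
nrow(no_header)
```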
Customizing the Reading Process
- sep: Sets the field separator character for read.table().
- header: Determines if the file has a header row.
- na.strings: Specifies which strings should be treated as missing values.
- quote: Sets the quoting character for values that contain separators.
- fill: Determines whether rows with fewer fields than expected are padded with blank fields instead of raising an error.
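The fill argument is the least obvious of these options, so here is a brief sketch of its effect on rows of unequal length. The ragged sample data is invented for the example.

```r
# Hypothetical data where the last two rows are missing trailing fields
ragged <- "a,b,c
1,2,3
4,5
6"

# fill = TRUE pads the short rows; the added numeric fields appear as NA
filled <- read.table(text = ragged, sep = ",", header = TRUE, fill = TRUE)
print(filled)

# Without fill, read.table() stops with an error on the first short row;
# tryCatch() captures the message so the script keeps running
strict <- tryCatch(
  read.table(text = ragged, sep = ",", header = TRUE),
  error = function(e) conditionMessage(e)
)
print(strict)
```

Note that read.csv() and read.delim() already set fill = TRUE by default, while read.table() does not.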
Handling Variations in Text Files
1. Missing Values
- Use the na.strings argument to define which strings should be handled as missing values.
- Example: read.csv("data.csv", na.strings = c("", "NA")).
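A short sketch of the difference na.strings makes is shown below. The sample data and the extra "N/A" marker are assumptions chosen to illustrate the point.

```r
# Hypothetical marks data with two kinds of missing-value markers
csv_text <- "Name,Marks
Asha,88
Ravi,
Meena,N/A"

# With the default na.strings = "NA", the text "N/A" is kept as a value,
# so the Marks column cannot be read as numeric
default_df <- read.csv(text = csv_text)
class(default_df$Marks)

# Listing every marker that means "missing" lets R parse Marks as numbers
custom_df <- read.csv(text = csv_text, na.strings = c("", "NA", "N/A"))
class(custom_df$Marks)
sum(is.na(custom_df$Marks))
```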
2. Different Separators
- Specify the separator with the sep option in read.table().
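- Example: read.table("data.txt", sep = ""), where the empty string (the default) splits on any run of whitespace; use sep = ";" for a semicolon-separated file.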
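As an illustration, the sketch below reads a semicolon-separated sample (made up for this example) and shows what happens when the separator does not match the file.

```r
# Hypothetical semicolon-separated sample
semi <- "age;sex;chol
52;1;212
53;1;203"

# Matching separator: three columns are recovered
by_semicolon <- read.table(text = semi, sep = ";", header = TRUE)
ncol(by_semicolon)

# Mismatched separator: no error is raised, but every line
# ends up in a single column
wrong_sep <- read.table(text = semi, sep = ",", header = TRUE)
ncol(wrong_sep)
```

A data frame with a single unexpected column is usually the first sign that sep does not match the file.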
3. Inconsistent Data
- Use the quote argument to define the quoting character for values that contain separators.
- Example: read.csv("data.csv", quote = '"').
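The following sketch shows why quoting matters when a value itself contains the separator. The names and cities are hypothetical, and count.fields() is used only to make the effect visible.

```r
# Hypothetical data where the Name values contain a comma
csv_text <- 'Name,City
"Rao, Asha",Pune
"Iyer, Ravi",Chennai'

# With the double quote as quoting character, each quoted name is one value
quoted <- read.csv(text = csv_text, quote = '"')
print(quoted)

# Helper (illustrative) that counts the fields per line for a given quote setting
fields_per_line <- function(q) {
  con <- textConnection(csv_text)
  on.exit(close(con))
  count.fields(con, sep = ",", quote = q)
}

fields_quoted   <- fields_per_line('"')  # quotes respected
fields_unquoted <- fields_per_line("")   # quoting disabled
print(fields_quoted)
print(fields_unquoted)
```

With quoting disabled, the data lines are split at the embedded comma and no longer match the header.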
Conclusion
Reading text files into a data frame in R is an important first step in the data analysis process. With read.csv(), read.delim(), and read.table(), analysts can load data from a variety of delimited text formats. Understanding how these functions handle separators, headers, quoting, and missing values helps ensure that the data is parsed correctly and that the resulting analysis is reliable.