Train-Test Split in R | Training and Validation Datasets

Published: 19 December 2020
Channel: Coder's Digest


As part of our R programming for data analysis tutorial, we will see how to create training and validation datasets with a train-test split in R. In this video we use multiple ways to split data into training and test sets: you will learn how to split data from a CSV file into training and testing datasets, ready for modeling in RStudio.

Git link: https://github.com/coders-digest/R-Pr...

Includes an example of data partitioning (data splitting) with R.
Shows the steps for reading a CSV file into R.
Illustrates developing a linear regression model on the training data and then making predictions on the validation dataset in R.
Discusses regression coefficients.
Provides an application example using an automobile warranty claims dataset.
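The regression workflow listed above is not part of the posted source code. A minimal sketch of it, assuming the built-in mtcars data with mpg as the response and wt and hp as predictors (an illustrative choice, not the video's warranty claims dataset), could look like this:

```r
# Reproducible 70/30 split of the built-in mtcars data
set.seed(42)
n <- nrow(mtcars)
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train_data <- mtcars[train_idx, ]
valid_data <- mtcars[-train_idx, ]

# Fit a linear regression on the training rows only
# (mpg ~ wt + hp is an illustrative formula, not from the video)
model <- lm(mpg ~ wt + hp, data = train_data)

# Regression coefficients: an intercept plus one slope per predictor
print(coef(model))

# Predict on the held-out validation rows and measure the error there
pred <- predict(model, newdata = valid_data)
rmse <- sqrt(mean((valid_data$mpg - pred)^2))
cat("Validation RMSE:", round(rmse, 3), "\n")
```

Each slope is the expected change in mpg for a one-unit change in that predictor with the other held fixed. Because the RMSE is computed only on rows the model never saw, it gives a rough estimate of how well the model generalizes.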

We will use the caTools library in R; apart from that, we will also learn to use the dplyr package for partitioning data into training and test sets.
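The dplyr approach is mentioned but not included in the posted source. One way to do it, sketched under the assumption that dplyr 1.0.0 or later is installed (needed for slice_sample()), is to sample 70% of the rows and then take the remaining rows with anti_join(). The helper column name id is our own choice:

```r
library(dplyr)

set.seed(42)

# Add an explicit row id to use as the join key
cars <- mtcars %>% mutate(id = row_number())

# Randomly take 70% of the rows for training
train_data <- cars %>% slice_sample(prop = 0.7)

# Every row whose id is not in the training set goes to the test set
test_data <- cars %>% anti_join(train_data, by = "id")

cat("train:", nrow(train_data), "test:", nrow(test_data), "\n")
```

Unlike caTools::sample.split(), this split ignores the outcome, so class proportions are not guaranteed to match between the two sets.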
We will also split the data when the y (target) variable is not known.

Source:
--------------------------------------------------
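# Read the CSV file (the first row holds the column names)
TitanicSurvival <- read.csv('titanic.csv', header = TRUE)
head(TitanicSurvival)

library(caTools)
set.seed(123)  # make the random split reproducible
# Stratified 70/30 split: sample.split() keeps the Survived proportions similar in both sets
split <- sample.split(TitanicSurvival$Survived, SplitRatio = 0.7)
trainDataca <- subset(TitanicSurvival, split == TRUE)
testDataca <- subset(TitanicSurvival, split == FALSE)
# Check that the class balance is similar in train and test
prop.table(table(trainDataca$Survived))
prop.table(table(testDataca$Survived))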
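Because titanic.csv is not included here, this is a self-contained sketch of the same caTools idea on a made-up 0/1 outcome. The passengers data frame and its columns are synthetic stand-ins, not Titanic data. sample.split() samples within each class, so the share of 1s should come out nearly the same in both subsets:

```r
library(caTools)

set.seed(123)

# Synthetic stand-in data: 300 rows with a roughly 38% positive outcome
passengers <- data.frame(
  age = round(runif(300, 1, 80)),
  survived = rbinom(300, 1, 0.38)
)

# Stratified 70/30 split on the outcome column
in_train <- sample.split(passengers$survived, SplitRatio = 0.7)
train_data <- subset(passengers, in_train == TRUE)
test_data <- subset(passengers, in_train == FALSE)

# Class proportions in the full data, the training set and the test set
print(prop.table(table(passengers$survived)))
print(prop.table(table(train_data$survived)))
print(prop.table(table(test_data$survived)))
```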

When the y variable is unknown:

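head(mtcars)
set.seed(123)  # make the random split reproducible
# Pick 70% of the row positions at random (no outcome column needed)
indices <- sample(seq_len(nrow(mtcars)), size = floor(0.7 * nrow(mtcars)))
trainData <- mtcars[indices, ]   # the selected rows
testData <- mtcars[-indices, ]   # all remaining rows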
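The base R split above can be wrapped in a small reusable helper. The function name train_test_split() and its arguments are hypothetical, not from the video. The sketch fixes the seed, rounds the training size down explicitly, and refuses ratios that would leave either set empty:

```r
# Hypothetical helper: random train/test split that needs no outcome column
train_test_split <- function(data, ratio = 0.7, seed = NULL) {
  stopifnot(is.data.frame(data), ratio > 0, ratio < 1)
  if (!is.null(seed)) set.seed(seed)
  n <- nrow(data)
  # Number of training rows, rounded down to a whole row count
  size <- floor(ratio * n)
  if (size < 1 || size >= n) stop("ratio leaves an empty train or test set")
  indices <- sample(seq_len(n), size = size)
  list(train = data[indices, , drop = FALSE],
       test  = data[-indices, , drop = FALSE])
}

parts <- train_test_split(mtcars, ratio = 0.7, seed = 42)
cat("train:", nrow(parts$train), "test:", nrow(parts$test), "\n")
```

With mtcars (32 rows) and ratio = 0.7, floor(22.4) gives 22 training rows and 10 test rows.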