How to Check if a DataFrame is Empty in Scala?
Last Updated :
27 Mar, 2024
In this article, we will learn how to check whether a DataFrame is empty in Scala. We can check if a DataFrame is empty by using the isEmpty
method or by checking the count of rows.
Syntax:
val isEmpty = dataframe.isEmpty
OR,
val isEmpty = dataframe.count() == 0
Here's how you can do it:
Example #1: Using the isEmpty method
import org.apache.spark.sql.{DataFrame, SparkSession}

object DataFrameEmptyCheck {
  def main(args: Array[String]): Unit = {
    // Create SparkSession
    val spark = SparkSession.builder()
      .appName("DataFrameEmptyCheck")
      .master("local[*]")
      .getOrCreate()

    // Sample DataFrame (replace this with your actual DataFrame)
    val dataframe: DataFrame = spark.emptyDataFrame

    // Check if DataFrame is empty
    val isEmpty = dataframe.isEmpty
    if (isEmpty) {
      println("DataFrame is empty")
    } else {
      println("DataFrame is not empty")
    }

    // Stop SparkSession
    spark.stop()
  }
}
Output:
DataFrame is empty
Explanation:
- The code creates a SparkSession, which is the entry point to Spark functionality.
- It defines a sample DataFrame using spark.emptyDataFrame, which creates an empty DataFrame. You would typically replace this with your actual DataFrame.
- The code then checks if the DataFrame is empty using the isEmpty method. Since we initialized it as an empty DataFrame, isEmpty will evaluate to true.
- Because the DataFrame is empty, it prints "DataFrame is empty".
- Finally, the SparkSession is stopped to release resources.
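For contrast, here is a small sketch of the non-empty case (the object name NonEmptyCheckSketch, the column name "value", and the sample data are assumptions for illustration, not part of the example above). It shows that isEmpty returns false as soon as the DataFrame contains at least one row:

```scala
import org.apache.spark.sql.SparkSession

object NonEmptyCheckSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("NonEmptyCheckSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // A one-row DataFrame; toDF("value") names its single column
    val dataframe = Seq(42).toDF("value")

    // isEmpty is false because the DataFrame contains a row
    if (dataframe.isEmpty) {
      println("DataFrame is empty")
    } else {
      println("DataFrame is not empty")
    }

    spark.stop()
  }
}
```

Running this prints "DataFrame is not empty", since the check takes the else branch.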
Example #2: Using the count function
import org.apache.spark.sql.{DataFrame, SparkSession}

object DataFrameEmptyCheck {
  def main(args: Array[String]): Unit = {
    // Create SparkSession
    val spark = SparkSession.builder()
      .appName("DataFrameEmptyCheck")
      .master("local[*]")
      .getOrCreate()

    // Sample DataFrame (replace this with your actual DataFrame)
    val dataframe: DataFrame = spark.emptyDataFrame

    // Check if DataFrame is empty
    val isEmpty = dataframe.count() == 0
    if (isEmpty) {
      println("DataFrame is empty")
    } else {
      println("DataFrame is not empty")
    }

    // Stop SparkSession
    spark.stop()
  }
}
Output:
DataFrame is empty
Explanation:
- The code creates a SparkSession with master "local[*]", which means it will run locally using all available CPU cores.
- It defines a sample DataFrame using spark.emptyDataFrame, creating an empty DataFrame with no rows.
- The code then checks if the DataFrame is empty using the count() function, which returns the number of rows in the DataFrame. Since the DataFrame is empty, its count is 0.
- The condition dataframe.count() == 0 therefore evaluates to true.
- As a result, it prints "DataFrame is empty".
- Finally, the SparkSession is stopped to release resources.
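A practical note: count() has to scan every row of the DataFrame, while an emptiness check only needs to find out whether at least one row exists. A common, cheaper variant of the count-based check (a sketch, not from the examples above; the object name CheapEmptyCheck is an assumption) limits the scan to a single row before counting:

```scala
import org.apache.spark.sql.SparkSession

object CheapEmptyCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CheapEmptyCheck")
      .master("local[*]")
      .getOrCreate()

    val dataframe = spark.emptyDataFrame

    // limit(1) stops the scan after at most one row, so this is
    // usually cheaper than counting every row of a large DataFrame
    val isEmpty = dataframe.limit(1).count() == 0
    println(if (isEmpty) "DataFrame is empty" else "DataFrame is not empty")

    spark.stop()
  }
}
```

On large DataFrames this behaves like isEmpty from Example #1, which internally also checks for just one row rather than counting them all.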