1. How does PySpark differ from Apache Spark?
2. What is a SparkSession and why is it important?
3. How do you cache data in PySpark, and what are the benefits of caching?
4. How does PySpark handle partitioning, and what is the significance of partitioning?
5. What is a UDF, and how is it used in PySpark?
6. What is a window function, and how is it used in PySpark?
7. What is the difference between map() and flatMap() in PySpark?
8. What is a pipeline, and how is it used in PySpark?
9. What is a checkpoint, and how is it used in PySpark?
10. What is a broadcast join, and how is it different from a regular join?