SQL processing in Apache Spark

Workshop room

12th November, 11:00-13:00

Apache Spark is the de facto framework for big data processing, but its complexity can make it a little intimidating to learn.

The main role of Spark SQL is to reduce this complexity and let you run queries on big data with minimal learning effort. All you need to know is how to write SQL queries!

In this workshop you will get a short introduction to Apache Spark, its architecture and its data structures; after that we will focus on Spark SQL:

  • Create tables
  • Investigate table schema
  • Write and run SQL queries
Workshop tools: Apache Spark, Python, Jupyter Notebook
All details regarding the infrastructure setup and datasets will be provided a few days before the conference.
Target audience: people interested in big data, Apache Spark and Spark SQL, as well as data analysts and database developers.

Tudor Lăpușan

I'm passionate about Big Data/Machine Learning technologies and startups. I first heard about Apache Hadoop during my master's studies, and since then I have been fascinated by the Big Data world.
My first big professional success came in 2012, when I introduced Apache Hadoop at Skobbler, the company I work for. Since then I have been working on Big Data projects. My current work involves designing and writing scalable Big Data/Machine Learning projects.
Out of this passion, I started a Big Data/Data Science community in my town, Cluj-Napoca, Romania, with the goals of meeting new passionate people, working together on cool projects and helping IT companies adopt Big Data technologies. So far we have held many meetups and workshops with a large number of participants.