SparkSQL.jl

sql Function

Submits Structured Query Language (SQL), Data Manipulation Language (DML) and Data Definition Language (DDL) statements to Apache Spark.

Arguments

The sql function takes the Spark session as its first argument and the SQL, DML, or DDL statement to run, passed as a string, as its second argument.
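
A minimal sketch of an end-to-end call, assuming the package's initJVM and SparkSession functions and a placeholder Spark master URL and application name:

using SparkSQL

# Start the JVM and connect to the cluster (the master URL and app name are placeholders).
initJVM()
session = SparkSession("spark://master.example.com:7077", "Julia SparkSQL App")

# Submit a statement; the returned object references the Spark result.
stmt = sql(session, "SELECT 1 AS one;")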

DDL supported formats include CSV, Apache Parquet, and Delta Lake, as shown in the examples below.

Examples

CSV file example:

Query a file in Comma Separated Value (CSV) format directly by path:

stmt = sql(session, "SELECT * FROM CSV.`/pathToFile/fileName.csv`;")
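
The result of a query can be brought into Julia for local analysis. A short sketch, assuming the package's toJuliaDF function and the DataFrames.jl package:

using DataFrames

# Materialize the CSV query result as a Julia DataFrame (toJuliaDF is assumed here).
csvStmt = sql(session, "SELECT * FROM CSV.`/pathToFile/fileName.csv`;")
juliaDF = toJuliaDF(csvStmt)

# Summary statistics via DataFrames.jl.
describe(juliaDF)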

Parquet file example:

Query a file in Apache Parquet format directly by path:

stmt = sql(session, "SELECT * FROM PARQUET.`/pathToFile/fileName.parquet`;")
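
Query results can also be registered for reuse in later statements. A sketch, assuming the package provides createOrReplaceTempView; the view name is a placeholder:

# Register the Parquet query result as a temporary view (function name assumed).
parquetStmt = sql(session, "SELECT * FROM PARQUET.`/pathToFile/fileName.parquet`;")
createOrReplaceTempView(parquetStmt, "parquet_data")

# Later statements can reference the view by name.
stmt = sql(session, "SELECT COUNT(*) AS row_count FROM parquet_data;")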

Delta Lake example:

Delta Lake is an open-source storage layer for Spark. Delta Lake offers:

ACID transactions on Spark: serializable isolation levels ensure that readers never see inconsistent data.
Scalable metadata handling: leverages Spark's distributed processing power to handle all the metadata for petabyte-scale tables with billions of files with ease.

To use Delta Lake, you must add the Delta Lake JAR to your Spark jars folder.

The following example shows create table (DDL), insert (DML), and select (SQL) statements using Delta Lake with SparkSQL:

sql(session, "CREATE DATABASE demo;")
sql(session, "USE demo;")
sql(session, "CREATE TABLE tb(col STRING) USING DELTA;" )