Visual Studio Code: Develop PySpark jobs for SQL Server 2019 Big Data Clusters

Today we’re announcing support in Visual Studio Code for PySpark development and query submission against SQL Server 2019 Big Data Clusters. It provides capabilities complementary to Azure Data Studio, enabling data engineers to author and productionize PySpark jobs after data scientists have finished their data exploration and experimentation. The Apache Spark and Hive extension for Visual Studio Code gives you a cross-platform, lightweight Python editing experience, covering Python authoring, debugging, Jupyter Notebook integration, and notebook-like interactive queries.

With the Visual Studio Code extension, you get native Python programming features such as linting, debugging support, and language services. You can run the current line, run selected lines of code, or run all of the code in your PY file. You can import and export .ipynb notebooks and perform notebook-like queries, including Run Cell, Run Above, and Run Below. You can also enjoy a notebook-like interactive experience that combines your source code and markdown comments with the run results and output. In the interactive results window, you can remove unneeded sections, enter comments, or type additional code. Moreover, you can visualize your results graphically through matplotlib, as in a Jupyter Notebook. The integration with SQL Server 2019 Big Data Clusters lets you quickly submit a PySpark batch job to the big data cluster and monitor its progress.
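For example, the following is a minimal sketch of a script you might run cell by cell in the interactive window and visualize with matplotlib; the HDFS path and the region and amount columns are hypothetical placeholders, not something the extension provides.

    from pyspark.sql import SparkSession
    import matplotlib.pyplot as plt

    spark = SparkSession.builder.appName("InteractiveDemo").getOrCreate()

    # Load the data and aggregate totals per region (run this cell first).
    df = spark.read.csv("/data/sales.csv", header=True, inferSchema=True)
    totals = df.groupBy("region").sum("amount").toPandas()

    # Plot the aggregated results inline (run this cell next).
    totals.plot(kind="bar", x="region", y="sum(amount)", legend=False)
    plt.ylabel("Total amount")
    plt.show()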

Highlights of key features

  • Link to SQL Server: Connect to SQL Server 2019 Big Data Clusters and submit PySpark jobs.
  • Python editing: Develop PySpark applications with native Python authoring support (e.g., IntelliSense, auto-format, and error checking).
  • Jupyter Notebook integration: Import and export .ipynb files.
  • PySpark interactive: Run selected lines of code, execute notebook-like cells, and create interactive visualizations.
  • PySpark batch: Submit PySpark applications to SQL Server 2019 Big Data Clusters (see the sketch after this list).
  • PySpark monitoring: Integrate with the Apache Spark history server to view job history, debug, and diagnose Spark jobs.
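
To give a sense of what a submitted batch job looks like, here is a minimal, self-contained sketch of a PySpark application; the input and output HDFS paths and the word-count logic are hypothetical, chosen only for illustration.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    if __name__ == "__main__":
        spark = SparkSession.builder.appName("WordCountBatch").getOrCreate()

        # Read raw text from HDFS and split it into one word per row.
        lines = spark.read.text("/data/input.txt")
        counts = (lines.selectExpr("explode(split(value, ' ')) AS word")
                       .groupBy("word")
                       .count()
                       .orderBy(col("count").desc()))

        # Write the word counts back to HDFS as CSV.
        counts.write.mode("overwrite").csv("/data/wordcounts")
        spark.stop()

Once saved as a PY file, an application of this shape can be submitted to the cluster through the extension and then tracked in the Apache Spark history server.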

How to install or update

First, install Visual Studio Code and download Mono 4.2.x if you are on Linux or macOS. Then get the latest Apache Spark and Hive tools from the Visual Studio Code extension repository or the Visual Studio Code Marketplace by searching for Spark.

For more information about the Apache Spark and Hive tools for Visual Studio Code, please use the following resources:

If you have questions, feedback, comments, or bug reports, please use the comments below or send a note to hdivstool@microsoft.com.