MDS-Heatwave Data Analysis using Zeppelin
The Complete Guide to access MDS with HeatWave by Zeppelin
In this guide we will walk through how to install and access MDS with Heatwave.
1. What is MDS and HeatWave ?
2. What is Zeppelin ?
3. Environment details
4. How to Install Zeppelin ?
5. Network Consideration
6. How to access HeatWave
7. MDS Data Analysis using Zeppelin
8. Conclusion
Overview of Zeppelin and MySQL Database Service with HeatWave
We are living in the data world and as the data is increasing with large velocity, it is important to get the results quicker until earlier MySQL has challenged to produce results against larger data size in faster way how ever there is “paradigm shift” on current solutions of MySQL and
Now, MySQL (only in PaaS model of OCI) comes with HeatWave ,which is a distributed, scalable, shared-nothing, in-memory, hybrid columnar, query processing engine designed for extreme performance.
It is enabled when you add a heatwave cluster to a MySQL DB System.
So MySQL database service(MDS) will give faster results and then needed some data analysis tool to make meaningful of the data , get more insight of data.
In this blog let me introduce Apache Zeppelin for MDS data analysis.
Apache Zeppelin , an open source multipurpose notebook which help users to represent and analyze your data in the form of graphs or charts so that it help the organization to take quick decision.
We will explore each item in details…
Overall, My Idea is to show you quick demo , how easily you can connect MDS through Zeppelin.
Zepplein can be installed anywhere (any public/private cloud , On-Premises).
What is MDS and Heatwave ?
Oracle MySQL Database Service(MDS) is a fully managed database service that lets developers quickly develop and deploy secure, cloud native applications using the world’s most popular open source database.
MySQL Database Service is the only MySQL cloud service with an integrated, high performance, in-memory query accelerator – Heatwave. It enables customers to run sophisticated analytics directly against their operational MySQL databases—eliminating the need for complex, time-consuming, and expensive data movement and integration with a separate analytics database.
MySQL Database Service is 100% built, managed, and supported by the OCI and MySQL engineering teams.
more info:- https://www.oracle.com/mysql/
Heatwave:- https://www.oracle.com/mysql/heatwave
MDS Business benefits :-
https://mysqlsolutionsarchitect.blogspot.com/2022/02/understanding-mysql-database-servicemds.html
What is Zeppelin ?
Zeppelin is web-based notebook that enables data-driven, interactive data analytics and collaborative documents with language like SQL, Scala, Python, R and more.
Things like data ingestion, data exploration, data visualization, and data analytics can be done in the zeppelin notebook.
Features of Apache Zeppelin
v Data Ingestion.
v Data Discovery
v Data Analytics
v Data Visualization and Collaboration
More info:- https://zeppelin.apache.org/
Network Consideration
Make sure the port of zeppelin 8080 is whitelisted in your environment
and in case you are using Oracle Cloud Infrastructure (OCI) then make sure ingress rule is configured to white list the compute instance where Zeppelin is installed and also compute instance is able to ping pong MDS instance.
To access the Zeppelin, always access with proper user with Interpreter access otherwise any user can access your Zeppelin,
Sometime anonymous user get into the portal of Zeppelin by entering the Zeppelin public IP address but however if your interpreter is restricted with particular user then your workspaces is safer.
Hence, It is important to have data source authorization in Apache Zeppelin
https://zeppelin.apache.org/docs/0.10.0/setup/security/datasource_authorization.html
Note:- I couldn’t dig much in details about security and my major focused on where you are , how are you accessing the Zeppelin , how eill have seamless experiences with MDS and HeatWave and do awesome visualization and data analysis
Installation of Zeppelin
In this blog, installation of Zeppelin will be on Oracle Cloud Infrastructure (https://www.oracle.com/in/cloud/)
With below specifications
Step 1 #Install JDK
sudo yum install java-11-openjdk-devel
Step 2:- Download the Zeppelin by using below command
wget https://dlcdn.apache.org/zeppelin/zeppelin-0.10.0/zeppelin-0.10.0-bin-all.tgz
Step 3 :- create user and give the permission
sudo adduser -d /home/opc/zeppelin -s /sbin/nologin zeppelin
sudo chown -R zeppelin:zeppelin /home/opc/zeppelin
Step 4:- rename zeppelin site template to zeppelin site
cd /home/opc/zeppelin/conf
sudo cp zeppelin-site.xml.template zeppelin-site.xml
sudo cp zeppelin-site.xml.template zeppelin-site.xml
Step 5 :- #Start Zeppelin
Connect Zeppelin
http://<IP Address of Compute Instance/Local IP>::8080/#/
Connect MDS and Heatwave through Zeppelin
Download MySQL Connector/J
https://dev.mysql.com/downloads/connector/j/
rpm -ivh mysql-connector-java-8.0.28-1.el7.noarch.rpm
warning: mysql-connector-java-8.0.28-1.el7.noarch.rpm: Header V4 RSA/SHA256 Signature, key ID 3a79bd29: NOKEY
error: Failed dependencies:
java-headless >= 1:1.8.0 is needed by mysql-connector-java-1:8.0.28-1.el7.noarch
[
Fix:- yum -y install java-headless
Create a MySQL Interpreter
#Create directory called mds under zeppelin’s interpreter folder
# mkdir mds
#move the “mysql-connector-java.jar” file into MDS folder
#cp /usr/share/java/mysql-connector-java.jar /home/opc/zeppelin/interpreter/mds/
Navigate to the Interpreter
http://<IP Address of Compute Instance/Local IP>:8080/#/interpreter
Search for MDS interpreter and fill below details
Finally once changes are final then it looks like below
Access data from MDS and Heatwave
Please ensure MDS and Heatwave is up and running.
Create a notebook and access MDS with Heatwave via Zeppelin
Awesome! , Zeppelin Connected to MDS
Let’s do data analysis by using zeppelin
In this demonstration assume that MDS with Heatwave is up and running and data is loaded into HeatWave.
If you wanted to follow quick start demo , how to load data into HeatWave.
Schema used in the demo is “AirportDB” with 50 GB database size.
Command to load first time data into the HeatWave:-
run Auto Parallel Load to load airportdb data into HeatWave:
CALL sys.heatwave_load(JSON_ARRAY(‘airportdb’), NULL);
##Run Below SQL statement to generate report
use airportdb;
SELECT airline.airlinename, SUM(booking.price) as price_tickets, count(*) as nb_tickets
FROM booking, flight, airline, airport_geo
WHERE booking.flight_id=flight.flight_id AND
airline.airline_id=flight.airline_id AND
flight.from=airport_geo.airport_id AND
airport_geo.country = “UNITED STATES”
GROUP BY
airline.airlinename
ORDER BY
nb_tickets desc, airline.airlinename limit 10;
More info about Heatwave:- https://www.oracle.com/mysql/heatwave/
Conclusion
Apache Zeppelin is a kind of tool, which makes Data Scientist life smooth, they can do everything they need in one place. Things like data ingestion, data exploration, data visualization, and data analytics can be done in the zeppelin notebook and MDS HeatWave is a massively parallel, high performance, in-memory query accelerator for Oracle MySQL Database Service that accelerates MySQL performance by orders of magnitude for analytics and mixed workloads and cost lesser than Specialist analytics product like Amazon Redshift, Aurora ,Snowflake ,Azure Synpase ,google Big Query etc.