I will be talking about Big Data with MySQL and Hadoop at MySQL Connect 2013 (Sept. 21-22) in San Francisco, as well as at Percona University in Washington, DC (September 12, 2013). Apache Hadoop is a very popular Big Data solution, and nowadays we can easily integrate it with MySQL. I will start with a brief introduction to Apache Hadoop and its components (HDFS, Map/Reduce, Hive, HBase/HCatalog, Flume, Sqoop, etc.). Then I will show two major Big Data scenarios:

  • From file to Hadoop to MySQL. This is an example of an “ELT” process: Extract data from an external source; Load the data into Hadoop; Transform/analyze the data; Extract the results to MySQL. It is similar to the classic Data Warehouse ETL (Extract, Transform, Load) process, except that instead of “transforming” the data before loading it into the Data Warehouse, we load it “as is” and then run the analysis. As a result of this analysis (a map/reduce job) we can generate a report and load it into MySQL (using Sqoop export). To illustrate this process I will show two classic examples: clickstream analysis and Twitter feed analysis. On top of those examples I will also show how to use MySQL and full-text search solutions to produce near real-time reports from HBase.

Picture 1: ELT pipeline, from File to Hadoop to MySQL

  • From OLTP MySQL to Hadoop to MySQL reporting. In this scenario we extract data (potentially close to real time) from MySQL, load it into Hadoop for storage and analysis, and later generate reports and load them into another MySQL instance (reporting), which can be used to generate and display graphs (a rough Sqoop sketch follows the note below).

Picture 2: From OLTP MySQL to Hadoop to MySQL reporting.

Note: The reason we need additional MySQL storage for the reports is that generating a Hive report may take a long time (it runs as a Map/Reduce job that scans all the files, with no indexes). So it makes sense to “offload” the results of common reports into separate storage (MySQL).
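To make scenario 2 a bit more concrete, here is a minimal Sqoop sketch; the hostname, database, table, and column names below are placeholders, not the configuration from the talk. It pulls an OLTP table from MySQL into Hive, where it can be analyzed with Map/Reduce before the report is exported back out (the export side is shown later in the clickstream example).

```bash
# Hypothetical Sqoop import: copy the OLTP "orders" table from MySQL into Hive,
# splitting the parallel map tasks on the primary key (all names are placeholders)
sqoop import \
  --connect jdbc:mysql://mysql-oltp.example.com/shop \
  --username sqoop \
  --password-file /user/sqoop/.password \
  --table orders \
  --split-by order_id \
  --hive-import \
  --hive-table orders
```

Sqoop imports are batch jobs; for extraction that is closer to real time, the Hadoop Applier for MySQL (mentioned below) streams binary log events into HDFS instead.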

In both scenarios we will need a way to integrate Hadoop and MySQL. In my previous post, MySQL and Hadoop integration, I demonstrated how to integrate Hadoop and MySQL with Sqoop and the Hadoop Applier for MySQL. This case is similar; however, we can use a different toolset. In scenario 1 (i.e. clickstream) we can use Apache Flume to grab files (or read “events”) and load them into Hadoop. With Flume we define a “source” and a “sink”. Flume supports a range of different sources, including HTTP requests, syslog, TCP, etc. The HTTP source is interesting, as we can convert all (or some) HTTP requests (the “source”) into “events” that can be loaded into Hadoop (the “sink”).
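As a rough sketch of what such a Flume agent could look like (the agent name, port, and HDFS path below are made up for illustration, not the exact configuration from the talk), an HTTP source feeds events through a memory channel into an HDFS sink:

```
# Hypothetical Flume agent "clickagent": HTTP source -> memory channel -> HDFS sink
clickagent.sources  = httpsrc
clickagent.channels = memch
clickagent.sinks    = hdfssink

# HTTP source: listens for POSTed events (the default JSONHandler parses JSON bodies)
clickagent.sources.httpsrc.type     = http
clickagent.sources.httpsrc.port     = 8080
clickagent.sources.httpsrc.channels = memch

# In-memory channel buffering events between the source and the sink
clickagent.channels.memch.type     = memory
clickagent.channels.memch.capacity = 10000

# HDFS sink: writes the events as plain text under a per-day directory for Hive
clickagent.sinks.hdfssink.type                   = hdfs
clickagent.sinks.hdfssink.channel                = memch
clickagent.sinks.hdfssink.hdfs.path              = hdfs://namenode:8020/flume/clickstream/%Y-%m-%d
clickagent.sinks.hdfssink.hdfs.fileType          = DataStream
clickagent.sinks.hdfssink.hdfs.useLocalTimeStamp = true
```

With something like this in place, each HTTP request becomes an event stored in HDFS, and the per-day directories can be mapped to Hive partitions for the analysis step.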

During my presentation I will show the exact configurations for the sample Clickstream process, including:

  1. Flume configuration
  2. HiveQL queries to generate a report
  3. Sqoop export queries to load the report into MySQL (a rough sketch of items 2 and 3 follows below)
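To give a flavor of items 2 and 3, here is a minimal sketch under assumed names: a Hive table clickstream_raw with url and request_ts columns (populated by Flume), a report table clicks_report, and a matching, pre-created table in a MySQL schema called clickstats. None of these names come from the actual demo.

```sql
-- Hypothetical HiveQL report: hit counts per URL per day over the raw clickstream data
CREATE TABLE clicks_report AS
SELECT url,
       to_date(request_ts) AS click_day,
       count(*)            AS hits
FROM clickstream_raw
GROUP BY url, to_date(request_ts);
```

```bash
# Hypothetical Sqoop export of the Hive report into MySQL
# (connection string, credentials, and paths are placeholders)
sqoop export \
  --connect jdbc:mysql://mysql-reporting.example.com/clickstats \
  --username sqoop \
  --password-file /user/sqoop/.password \
  --table clicks_report \
  --export-dir /user/hive/warehouse/clicks_report \
  --input-fields-terminated-by '\001'
```

The '\001' delimiter is Hive's default field separator, so Sqoop can read the report files directly from the Hive warehouse directory.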

See you at MySQL Connect 2013!

1 Comment
Zhou Sheng

Regarding “From OLTP MySQL to Hadoop to MySQL reporting”: how can data be extracted from MySQL in real time?