Union Memory Partners with GBase to Provide New Storage Solutions for the Digital Transformation of Data-Intensive Industries

The rapid development of the Big Data era continues to drive innovation in Internet, big data, cloud computing, artificial intelligence, blockchain and other technologies, and the volume of data stored by data-intensive industries such as finance, Internet, telecom operators and government is growing explosively. Helping enterprises better manage massive data and tap the value of their business data to support rapid business growth has therefore become an important direction for big data warehouse applications.

A big data warehouse typically adopts distributed computing technology: it stores massive data by exploiting the natural scalability of big data platforms and converts SQL into tasks for a big data computing engine to perform analysis. For example, Hadoop and Spark can serve as the storage and computing engines, while tools or programming languages are used to design the processing logic that aggregates, cleans, computes on and analyzes data from different sources. Big data warehouses are therefore characterized by massive data capacity, efficient query and analysis, data security and data flexibility.

In China, on-premises data warehouses remain the first choice of the government, finance and energy sectors as well as large enterprises. With their high throughput and low latency, SSDs have gradually become an important carrier of data storage in the Big Data era, and thus key hardware underpinning today's locally deployed data warehouse products.

To better meet the demand for high-performance, high-reliability storage in data-intensive industries, Union Memory has partnered with General Data Technology (hereinafter "GBase"), a domestic leader in big data warehousing, to explore new storage solutions for the digital transformation of data-intensive industries in the Big Data era.

Technical introduction

The GBase Massively Parallel Processing Database Cluster System ("GBase 8a MPP Cluster" for short) is a distributed parallel database cluster with a shared-nothing architecture, developed on the basis of the GBase 8a database series. It offers high performance, high availability and high scalability, provides a cost-effective general computing platform for managing data of any scale, and is widely used to support data warehouse, BI and decision-support systems.

Fig. 1: Schematic Technical Architecture of GBase 8a MPP Cluster

GBase 8a MPP Cluster adopts a distributed federated architecture combining MPP with shared-nothing: nodes communicate over TCP/IP networks, and each node stores data on its local disks. Every node in the cluster is relatively independent and self-sufficient, and the system scales from a few nodes to hundreds of nodes to keep pace with growing business scale.

Validation of technical solution

1 Verification Environment

1.1 Hardware Configuration for the Verification

Type | Model | Hardware Configuration | Remark
Server | 2U two-socket (x86) | CPU: 2 × Intel® Xeon® Gold 6330 @ 2.00GHz; Memory: 8 × 32GB; RAID Controller: supports RAID 5 (data disks); Network Card: 1 × dual-port 10GE Ethernet card | 3 sets
Hard Disk | Union Memory SSD | System Disk: 2 × 480GB SATA SSD; Data Disk: 6 × 3.84TB SAS SSD (UM511a) | /
Switch | 10GE switch | 48-port 10GE switch | /

1.2 Software Configuration for the Verification

Type | Model | Version | Remark
OS | Red Hat (x86) | 7.9 | /
GBase Version | GBase 8a | GBase8a_MPP_Cluster-License-9.5.3.14 | /
Client | gccli | 9.5.3.14 | /
Database Pressure Testing | TPC-DS | 3.2.0rc1 | Open source
Database Pressure Testing | TPC-H | 3.0.0 | Open source
Network Monitoring | SAR | 10.1.5 | OS built-in
Disk-side I/O Statistics | IOSTAT | 10.0.0 | /
CPU Utilization | MPSTAT | 10.1.5 | /
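For reference, the monitoring tools listed above are typically left sampling at a fixed interval for the duration of each run; the commands below are illustrative and the log file names are assumptions:

sar -n DEV 1 > sar_net.log &      # per-second network throughput on all interfaces
iostat -x 1 > iostat.log &        # per-second extended disk I/O statistics
mpstat -P ALL 1 > mpstat.log &    # per-second CPU utilization for every core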

1.3 Network Planning for the Verification

Fig. 2: GBase 8a MPP Network Planning Architecture

2 Verification Method

Step 1: Configure the 6 SAS UM511a SSDs as a RAID 5 group.
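The RAID group is built on the server's RAID controller. As a reference only, the sketch below assumes a Broadcom/LSI controller managed with the storcli utility; the controller, enclosure and slot IDs are hypothetical and must be adapted to the actual environment.

storcli64 /c0 show                                        # list the controller, enclosures and physical drives
storcli64 /c0 add vd type=raid5 drives=252:0-5 strip=128  # create a RAID 5 virtual drive from the 6 SAS SSDs
storcli64 /c0/v0 show init                                # check background initialization progress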

Step 2: Create the Database and Tables

• For the TPC-H evaluation of the database's analysis and processing capabilities, create the database and the 8 TPC-H built-in tables in advance;

• For the TPC-DS evaluation of the database's analysis and processing capabilities, create the database and the 25 TPC-DS built-in tables in advance (a DDL sketch follows this list).
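As an illustration only, the database and one of the TPC-H tables could be created through gccli roughly as follows; the hash-distribution clause is an assumption about GBase 8a DDL syntax and should be checked against the GBase documentation:

CREATE DATABASE tpch;
USE tpch;
-- TPC-H lineitem table; the DISTRIBUTED BY clause below is an assumed GBase 8a
-- hash-distribution syntax and may need adjustment for the actual release.
CREATE TABLE lineitem (
    l_orderkey      BIGINT,
    l_partkey       BIGINT,
    l_suppkey       BIGINT,
    l_linenumber    INT,
    l_quantity      DECIMAL(15,2),
    l_extendedprice DECIMAL(15,2),
    l_discount      DECIMAL(15,2),
    l_tax           DECIMAL(15,2),
    l_returnflag    CHAR(1),
    l_linestatus    CHAR(1),
    l_shipdate      DATE,
    l_commitdate    DATE,
    l_receiptdate   DATE,
    l_shipinstruct  CHAR(25),
    l_shipmode      CHAR(10),
    l_comment       VARCHAR(44)
) DISTRIBUTED BY ('l_orderkey');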

Step 3: Parameter Tuning

• Tune the relevant database parameters according to GBase's recommendations.

Step 4: Data Generation

• TPC-H generates the required test data through the dbgen tool; the command used is ./dbgen -C 10 -S 1 -s 3000 -vf (a parallel-generation sketch follows this list);

• TPC-DS generates the required test data through the dsdgen tool; the command used is ./dsdgen -scale 3000 -dir testdata -force -parallel 10 -child 1.
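For reference, the 10 dbgen chunks implied by the command above could be generated in parallel with a small shell loop (the dbgen directory path is illustrative):

cd /path/to/tpch/dbgen                    # illustrative path to the compiled dbgen tool
for i in $(seq 1 10); do
    ./dbgen -C 10 -S $i -s 3000 -vf &     # generate chunk $i of 10 at scale factor 3000
done
wait                                      # wait for all 10 generator processes to finish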

Step 5: Data Loading

• When importing data, an FTP server is set up on the local node of the GBase cluster and the data is loaded over FTP. When a table has multiple data files, they are combined into a single load statement, and the test data is loaded into the 8 database tables one table at a time, as sketched below.
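As an illustration only, a load statement of this kind, executed through gccli, might look roughly as follows; the FTP host, credentials, file names and field terminator are assumptions, and the exact multi-file LOAD DATA syntax should be confirmed against the GBase 8a manual:

-- Hypothetical example: load two lineitem data files from the cluster-local FTP server
-- into one table in a single statement (field terminator '|' as produced by dbgen).
LOAD DATA INFILE
    'ftp://loader:password@10.28.100.38/tpch/lineitem.tbl.1',
    'ftp://loader:password@10.28.100.38/tpch/lineitem.tbl.2'
INTO TABLE tpch.lineitem
FIELDS TERMINATED BY '|';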

Step 6: SQL Execution

• An independent client communicates with the GBase cluster over the service-plane network and uses the gccli tool to execute the 22 TPC-H SQL use cases; a reference command is as follows:

/home/GBase/gccli_install/gcluster/server/bin/gccli -h 10.28.100.38 -uroot -Dtpch -vvv < query_1.sql
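For reference, all 22 use cases can be run in sequence and timed with a small loop such as the sketch below (query file names follow the pattern above; the log file name is an assumption):

for i in $(seq 1 22); do
    echo "=== query_${i}.sql ===" >> tpch_run.log
    # run one TPC-H use case through gccli and append its output and elapsed time to the log
    ( time /home/GBase/gccli_install/gcluster/server/bin/gccli -h 10.28.100.38 -uroot -Dtpch -vvv \
        < query_${i}.sql ) >> tpch_run.log 2>&1
done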

3 Verification Results

The verification results under the GBase 8a MPP scenario are as follows:

3.1 Bandwidth Performance of Union Memory SAS SSDs on a RAID Logical Volume

Fig. 3: Bandwidth Performance of Union Memory SAS SSDs on a RAID Logical Volume

In this test, 6 SAS SSDs are configured as RAID 5 and 12 SATA SSDs as RAID 50. The fio tool is used on the server host to stress-test 128KB sequential read/write bandwidth. As shown in Fig. 3, the SAS SSDs deliver higher read/write bandwidth: read bandwidth is about 19.6% higher and write bandwidth about 34% higher than the SATA configuration, a clear advantage for SAS SSDs in RAID logical volume bandwidth.
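As a reference only, a 128KB sequential read bandwidth test of this kind could be issued with fio roughly as follows; the logical volume device, queue depth, job count and run time are assumptions, and the write test would use --rw=write instead:

fio --name=seq_read_128k --filename=/dev/sdb --rw=read --bs=128k \
    --ioengine=libaio --direct=1 --iodepth=32 --numjobs=4 \
    --time_based --runtime=300 --group_reporting   # aggregate 128KB sequential read bandwidth of the RAID volume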

3.2 Degraded (Single-Disk Failure) and Rebuild Performance of Union Memory SAS SSDs

Fig. 4: Degraded (Single-Disk Failure) and Rebuild Performance of Union Memory SAS SSDs

Degraded read performance refers to the read performance of the RAID group's logical volume when a single disk has failed or been pulled out. Rebuild performance refers to the host-side performance while the RAID group rebuilds data onto a hot-spare disk and continues serving I/O after a single-disk failure. In this test, the 6 Union Memory SAS SSDs are configured as RAID 5 and one SSD is failed.
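For reference, the single-disk failure and rebuild can be triggered and observed from the RAID controller while the read workload keeps running; the sketch below again assumes a storcli-managed controller with hypothetical enclosure/slot IDs.

storcli64 /c0/e252/s3 set offline          # force one member disk offline to simulate a single-disk failure
storcli64 /c0/e252/s6 add hotsparedrive    # assign a hot-spare disk so the RAID 5 group starts rebuilding
storcli64 /c0/e252/s6 show rebuild         # monitor rebuild progress on the hot-spare disk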

As shown in Fig. 4, the SAS SSDs outperform the SATA SSDs both in degraded read performance after a single-disk failure and in rebuild read bandwidth: degraded read performance is about 22.7% higher than that of the SATA SSDs, and rebuild read bandwidth is about 38.4% higher.

3.3 Performance of Union Memory SAS SSD under TPC-DS Scenario

Fig. 5: Performance of Union Memory SAS SSD under TPC-DS Scenario

In the TPC-DS test, data is loaded into the GBase database tables via FTP. As shown in Fig. 5, under the same physical hardware environment, Union Memory SAS SSDs take slightly less time than SATA SSDs for both data import and SQL use case execution, giving them a modest overall time advantage.

3.4 Performance of Union Memory SAS SSD under TPC-H Scenario

Fig. 6: Performance of Union Memory SAS SSD under TPC-H Scenario

In the TPC-H test, data is likewise loaded into the GBase database tables via FTP. Fig. 6 compares Union Memory SAS SSDs with SATA SSDs under the TPC-H test: the total time of the SAS SSDs is slightly lower than that of the SATA SSDs, with data import time about 6% shorter and SQL use case execution time about 3% shorter.

Summary

• The verification demonstrates the performance advantages of Union Memory SAS SSDs in GBase 8a MPP scenarios. With higher per-disk bandwidth and better degraded and rebuild performance after a disk failure, Union Memory SAS SSDs can effectively support demanding services: they not only help enterprises reduce hardware procurement costs, but also address the storage and computing challenges of massive data and enable efficient processing of massive structured data.

Union Memory has been deeply engaged in solid-state drives for consumer and enterprise users for many years and has released a range of high-performance, high-reliability products that can cope with complex service environments and database workloads, meeting industrial users' requirements for processing massive data. Union Memory will continue to work with GBase to create new storage solutions for the digital transformation of our times.
