What is the status of Hive on Parquet
4 slave nodes
10.200.187.86 cslave1 4 Core 3G
10.200.187.87 cslave2 2 core 4G
10.200.187.88 cslave3 2 core 4G
10.200.187.89 cslave4 2 core 6G
1. Impala is much more efficient than Hive, given enough memory and simple SQL. Simple SQL runs on millions of pieces of data and takes a few seconds or even less than a second.
2. Impala performance has a lot to do with the storage format of the data. The text format at the million level differs ten times from the hbase format, and the parquet format at ten million levels differs from the text format a hundred times.
3. In the current cluster configuration, the Impala join at the Millions level performs slightly better than the Hive (3 to 4 times), but the join execution of large Impala tables fails at the Millions Level (insufficient memory).
4. Impala uses parquet storage (column type). Select some fields that are very efficient at querying conditions.
The official said the new version of Impala can be used in a production environment, but industry feedback says there will be many problems. The main problem is memory overflow. The official recommendation for Impala nodes is 128G.
Recommended usage scenarios:
It is deployed in the production environment and can be applied to operations and maintenance. It is efficient to do simple data reviews. However, it has a certain memory footprint, and it is not recommended to use complex SQL such as large joins of tables.
Imapla real-time query as seen above greatly improves relative Hive performance, but it cannot achieve the effect of a relational database, so it depends on the real needs of the business case.
Part of the information:
Hive: Complex batch query tasks and data conversion tasks.
Impala: real-time data analysis. Because UDF is not supported, the problem domain that can be addressed is limited. Use them with Hive to analyze Hive's results dataset in real time.
Supports SQL queries to query big data quickly.
You can query existing data to reduce data loading and conversion.
Different storage formats can be selected (parquet, text, Avro, RCFile, SequeenceFile).
Can be used in conjunction with the Hive.
UDF custom function is not supported.
Full text search in the text domain is not supported.
Transformations are not supported.
The fault tolerance of the query period is not supported.
High memory requirements.
create table test join parquet as
> select a.tid, a.buyer_nick, b.status, b.adjust_fee, b.buyer_email
> from (select tid, buyer_nick from s_trade_big_parquet) a
> (select tid, status, adjust_fee, buyer_email from s_trade_big_parquet) b
> on (a.tid = b.tid);
Query: create table testjoinparquet as select a.tid, a.buyer_nick, b.status, b.adjust_fee, b.buyer_email from (select tid, buyer_nick from s_trade_big_parquet) a join (select tid, status, adjust_fee, buyer_email from s_trade_big_parquet) b on (a.tid = b.tid)
ERRORS ENCOUNTERED DURING EXECUTION:
Backend 3: Memory Limit Exceeded
Query Limit: Consumption = 1.35 GB
Fragment dd496e82ab98ee40: 19f71d48047534a2: Consumption = 16.00 KB
UDFs: Consumption = 0.00
EXCHANGE_NODE (id = 4): Consumption = 0.00
DataStreamMgr: Consumption = 0.00
HdfsTableSink: Consumption = 0.00
Fragment dd496e82ab98ee40: 19f71d48047534a6: Consumption = 1.27 GB
UDFs: Consumption = 0.00
HASH_JOIN_NODE (id = 2): Consumption = 1.07 GB
HDFS_SCAN_NODE (id = 0): Consumption = 207.88 MB
EXCHANGE_NODE (id = 3): Consumption = 1.70 MB
DataStreamMgr: Consumption = 1.70 MB
DataStreamSender: Consumption = 2.45 KB
Fragment dd496e82ab98ee40: 19f71d48047534aa: Consumption = 82.39 MB
UDFs: Consumption = 0.00
HDFS_SCAN_NODE (id = 1): Consumption = 82.19 MB
DataStreamSender: Consumption = 8.00 KB
Note: The memory has been used up causing some nodes to read abnormally. Writing the table in parquet format is very memory intensive because the parquet is in 1G units and 1G must be loaded into memory and then written.
Open Source China: http://my.oschina.net/weiqingbin/blog/196143#OSC_h2_31, there are also many translation documents for the optimization and application of the Impala framework.
To compare beehive and Impala see http://www.cloudera.com/content/cloudera- content / cloudera-docs / Impala / latest / Installing and Using Impala / ciiu_langref_unsupported.html # langref_unsupported,
Reprint at: https://my.oschina.net/sniperLi/blog/688768
- What rhymes with morange
- Can you refinish furniture
- Can hydrogen ever power an engine?
- What makes your world round
- What is Capgemini
- Are bike routes reliable on Google Maps
- Which is better PF or mutual fund
- Prisoners may wear nail clippers
- What happened to Haiti's natural resources
- What role does the designer play
- What is standard atmosphere
- Which drugs are most commonly abused?
- What is due diligence in business
- How quickly will the dollar collapse
- What does the UAE look like
- Will Vail destroy smaller independent ski resorts
- There are schools for airbrush training
- Football is fun
- Who looks through my webcam
- What do you like most about hugging
- What are the limits for CEPT
- How much do sports cars cost
- Wears USAF dog tags
- Is Halo more popular than Star Trek