impala/hive脱敏导数方法，建表跟hdfs路径关联

经常使用的脱敏规则
1.md5()函数加密，需要用hive，用hive容易报内存不足，set mapreduce.map.memory.mb=1024的倍数
2.保留第一位，后面用*代替

concat(strleft(coalesce(Cst_no             ,''),1),regexp_replace(substr(coalesce(Cst_no             ,''),2),'.','*'))

脱敏导数常用方法
1.导出格式化数据到文本文件，（有表头，字段间分割），可以直接浏览数据
①创建sql脚本配合脱敏规则，拿出需要的字段
②执行导数，调用sql脚本impala-shell -i data02.bigdata.com -d default -l --auth_creds_ok_in_clear -u hive --ldap_password_cmd="echo -n Ab123456" -f test.sql -B --output_delimiter="|" --print_header -o result.txt
注意：只有-B -o时会把表头去掉，这个我在存量客户回访的判断是否交易日执行的脚本中用到了，i=cat clkh.txt，clkh.txt中要么是0，要么是1，用的就是impala导数。
-f执行脚本文件 -q执行"sql脚本"
2.生产拿到脱敏数据，并在测试环境建表查看脱敏后的数据
①在生产环境建一张临时表，表结构和需要脱敏的表结构一样。
②创建sql脚本配合脱敏规则，拿出需要的字段，insert overwrite table 临时表 partition(biz_date)可动态分区，可静态分区
③执行sql脚本,则临时表里就是脱敏后的数据
④从生产环境把hdfs的临时表数据拿下来在测试环境建表查看数据
举例

-------------------------------------------------------------check oldtable create table
[data02.bigdata.com:21000] > show create table test1;
Query: show create table test1
+-----------------------------------------------------------------+
| result                                                          |
+-----------------------------------------------------------------+
| CREATE TABLE ljh.test1 (                                        |
|   id INT,                                                       |
|   name STRING                                                   |
| )                                                               |
| PARTITIONED BY (                                                |
|   biz_date INT                                                  |
| )                                                               |
| STORED AS TEXTFILE                                              |
| LOCATION 'hdfs://nameservice1/user/hive/warehouse/ljh.db/test1' |
| TBLPROPERTIES ('numRows'='-1')                                  |
+-----------------------------------------------------------------+
----------------------------------------------------check oldtable data
[data02.bigdata.com:21000] > select * from test1;
Query: select * from test1
Query submitted at: 2021-07-28 13:41:52 (Coordinator: http://data02.bigdata.com:25000)
Query progress can be monitored at: http://data02.bigdata.com:25000/query_plan?query_id=9740be28f4fee1e6:adf2ad1a00000000
+----+------+----------+
| id | name | biz_date |
+----+------+----------+
| 2  | lili | 20210708 |
| 1  | tom  | 20210707 |
+----+------+----------+
------------------------------------------------get oldtable data to local
hadoop fs -get hdfs://nameservice1/user/hive/warehouse/ljh.db/test1 ./test1
----------------------------------------------------check local result
[test_user@data02 test1]$ ll
total 0
drwxrwxr-x 2 test_user test_user 65 Jul 28 13:43 biz_date=20210707
drwxrwxr-x 2 test_user test_user 66 Jul 28 13:43 biz_date=20210708
drwxrwxr-x 2 test_user test_user  6 Jul 28 13:43 _impala_insert_staging
-----------------------------------------------------create newtable hdfs path
[test_user@data02 test1]$ hadoop fs -mkdir hdfs://nameservice1/user/hive/warehouse/ljh.db/test0728
----------------------------------------------------put local data to newtable hdfs path (table dir/partitions dir)
[test_user@data02 test1]$ hadoop fs -put ./* hdfs://nameservice1/user/hive/warehouse/ljh.db/test0728
---------------------------------------------------check newtable hdfs result
[test_user@data02 test1]$ hadoop fs -ls hdfs://nameservice1/user/hive/warehouse/ljh.db/test0728
Found 3 items
drwxr-xr-x   - test_user test_user          0 2021-07-28 13:54 hdfs://nameservice1/user/hive/warehouse/ljh.db/test0728/_impala_insert_staging
drwxr-xr-x   - test_user test_user          0 2021-07-28 13:54 hdfs://nameservice1/user/hive/warehouse/ljh.db/test0728/biz_date=20210707
drwxr-xr-x   - test_user test_user          0 2021-07-28 13:54 hdfs://nameservice1/user/hive/warehouse/ljh.db/test0728/biz_date=20210708
-----------------------------------------------------create newtable as oldtable(STORED must be same)
CREATE TABLE ljh.test0728(
id INT,
name STRING
)
PARTITIONED BY(
biz_date INT
)
STORED AS TEXTFILE
LOCATION'hdfs://nameservice1/user/hive/warehouse/ljh.db/test0728'
TBLPROPERTIES('numRows'='-1')
-----------------------------------------------------relate to partitions
[data02.bigdata.com:21000] > alter table ljh.test0728 add partition(biz_date=20210707);
Query: alter table ljh.test0728 add partition(biz_date=20210707)
Fetched 0 row(s) in 0.05s
[data02.bigdata.com:21000] > alter table ljh.test0728 add partition(biz_date=20210708);
Query: alter table ljh.test0728 add partition(biz_date=20210708)
Fetched 0 row(s) in 0.03s
-------------------------------------------------------check newtable result
[data02.bigdata.com:21000] > select * from ljh.test0728;
Query: select * from ljh.test0728
Query submitted at: 2021-07-28 13:56:10 (Coordinator: http://data02.bigdata.com:25000)
Query progress can be monitored at: http://data02.bigdata.com:25000/query_plan?query_id=5b4ec77c3b04dad0:eb35d95d00000000
+----+------+----------+
| id | name | biz_date |
+----+------+----------+
| 1  | tom  | 20210707 |
| 2  | lili | 20210708 |
+----+------+----------+
Fetched 2 row(s) in 0.11s-------------------------------------------------------succeed-----------------------------

impala/hive脱敏导数方法，建表跟hdfs路径关联相关推荐

Hive的三种建表方式
Hive的三种建表方式 1.直接建表法: 2.抽取(as) 建表: 3.like建表 1.直接建表法: create table movies (uid string,iid string,score ...
获取impala下所有的数据库建表语句
本博文介绍三种方法,推荐使用第三种,前两种都是尝试. 方法一: 现在的导出还是有缺陷的,导出的文件中还是存在其他不必要的信息 #!/bin/bash ##获取数据库 databases=$(hive ...
Hive表与hdfs文件关联
2019独角兽企业重金招聘Python工程师标准>>> Hive版本2.1.1 List-1 建表 create external table alarm_external_tabl ...
hive：建库建表、表分区、内部表外部表、数据导入导出
hive建库建表与数据导入建库 hive中有一个默认的库: 库名: default 库目录:hdfs://hdp20-01:9000/user/hive/warehouse 新建库: create ...
5.hive建库建表与数据导入
建库 hive中有一个默认的库: 库名: default 库目录:hdfs://hdp20-01:9000/user/hive/warehouse 新建库: create database db_or ...
HIVE表占用HDFS空间 TOP表查询
思路 1.通过Hive元数据,获取到表和HDFS的对应关系 2.通过 pyhdfs工具类的get_content_summary方法快速获取到占用HDFS大小代码如下 # requirements ...
hive建表语句_Hive数据如何同步到MaxCompute之实践讲解
摘要:本次分享主要介绍 Hive数据如何迁移到MaxCompute.MMA(MaxCompute Migration Assist)是一款MaxCompute数据迁移工具,本文将为大家介绍MMA工具的 ...
Hive建表与数据加载
建表 CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.] table_name[(col_name data_type [CO ...
Hive建表时指定分隔符或使用多字符分隔
不知道你是否会有会这样的苦恼,就是从hdfs或者file中导入数据到Hive建数仓的时候,因为数据是数据流前段的小GG用特殊符号分隔的,然后导致你导入数据时一脸蒙圈? 这篇教程可能会帮到你. 首先我们 ...

impala/hive脱敏导数方法，建表跟hdfs路径关联

impala/hive脱敏导数方法，建表跟hdfs路径关联相关推荐

最新文章

热门文章