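This is the third post in a series analyzing the source code behind Spark reading a parquet file. It covers how Spark decides the default number of partitions, although strictly speaking it hardly counts as source-code analysis.
Part 1
Part 2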

1. What determines the number of partitions of userProfileSource

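I had always read that the partition count is determined by file size, with each block becoming one partition (a block is typically 128 MB and can be configured on HDFS). Those analyses usually look at a single file, though, so here I try the same analysis on multiple files.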

1. Add debugging code

To inspect the details of the RDD partitions, I added the following line:

userProfileSource.javaRDD().getNumPartitions();

The code becomes:


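import org.apache.spark.SparkConf;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.elasticsearch.hadoop.cfg.ConfigurationOptions;

public class UserProfileTest {
    // Glob matching every parquet file in the day's directory
    static String filePath = "hdfs://test:9000/user/daily/20200828/*.parquet";

    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf()
                .setMaster("local")
                .setAppName("user_profile_test")
                .set(ConfigurationOptions.ES_NODES, "")
                .set(ConfigurationOptions.ES_PORT, "")
                .set(ConfigurationOptions.ES_MAPPING_ID, "uid");
        SparkSession sparkSession = SparkSession.builder().config(sparkConf).getOrCreate();
        Dataset<Row> userProfileSource = sparkSession.read().parquet(filePath);
        // Debugging line: set a breakpoint here to inspect the RDD partitions
        userProfileSource.javaRDD().getNumPartitions();
        userProfileSource.count();
        userProfileSource.write().parquet("hdfs:///user/daily/result2020091008/");
    }
}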

2. Files in the corresponding directory


# hsl -h hdfs://test:9000/user/daily/20200828/ |head
184.2 M 2020-09-02 21:19 hdfs://test:9000/user/daily/20200828/part-00000-41ca-c000.snappy.parquet
182.2 M 2020-09-02 21:19 hdfs://test:9000/user/daily/20200828/part-00001-41ca-c000.snappy.parquet
183.3 M 2020-09-02 21:19 hdfs://test:9000/user/daily/20200828/part-00002-41ca-c000.snappy.parquet
183.2 M 2020-09-02 21:19 hdfs://test:9000/user/daily/20200828/part-00003-41ca-c000.snappy.parquet
183.6 M 2020-09-02 21:19 hdfs://test:9000/user/daily/20200828/part-00004-41ca-c000.snappy.parquet
182.7 M 2020-09-02 21:19 hdfs://test:9000/user/daily/20200828/part-00005-41ca-c000.snappy.parquet
183.0 M 2020-09-02 21:19 hdfs://test:9000/user/daily/20200828/part-00006-41ca-c000.snappy.parquet
182.1 M 2020-09-02 21:19 hdfs://test:9000/user/daily/20200828/part-00007-41ca-c000.snappy.parquet

As the listing shows, each file is roughly 180 MB (about 182 to 184 MB), and there are 100 such files in total.
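As a quick sanity check on the "one block per partition" rule of thumb, the sketch below counts how many 128 MB blocks these files would span. It is only an estimate: the class name is made up for illustration, the block size is the typical 128 MB mentioned earlier, and it assumes the files not shown in the listing are about the same size as the eight that are.

```java
public class BlockCountEstimate {
    // Assumed HDFS block size in MB (the typical default mentioned above)
    static final double BLOCK_MB = 128.0;

    // Number of blocks a file of the given size (in MB) spans
    static long blocksFor(double fileSizeMb) {
        return (long) Math.ceil(fileSizeMb / BLOCK_MB);
    }

    public static void main(String[] args) {
        // Sizes of the eight files shown in the listing, in MB
        double[] listedSizesMb = {184.2, 182.2, 183.3, 183.2, 183.6, 182.7, 183.0, 182.1};
        long blocksInListing = 0;
        for (double sizeMb : listedSizesMb) {
            blocksInListing += blocksFor(sizeMb);
        }
        System.out.println("blocks in the 8 listed files: " + blocksInListing); // prints 16

        // Assumption: every one of the 100 files is about 183 MB, so each spans 2 blocks
        long estimateFor100Files = 100 * blocksFor(183.0);
        System.out.println("estimated blocks for 100 files: " + estimateFor100Files); // prints 200
    }
}
```

If every block really became its own partition, this would suggest about 200 partitions for the directory.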

3. RDD partition information

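Both in the debugger and while the program runs (see the corresponding job graph), the RDD has 150 partitions. For convenience, the DAG graph from the previous post is pasted here again. In the figure below, the first stage of count runs 150 tasks.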
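The figure of 150 can be reproduced with a small simulation. The sketch below follows my understanding of how Spark 2.x file sources build partitions: cut every file into splits of at most `spark.sql.files.maxPartitionBytes`, sort the splits from largest to smallest, then pack them into partitions while charging `spark.sql.files.openCostInBytes` for each split. Treat it as an approximation rather than Spark's actual code. The class and method names are hypothetical, the two constants are the assumed default values (128 MB and 4 MB), and it assumes the effective split size is not reduced further by the per-core byte estimate, which should hold for this much data on a single local core. Every file is given the byte length of part-00000 (193154555) as a stand-in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class PartitionPackingSketch {
    // Assumed default of spark.sql.files.maxPartitionBytes (128 MB)
    static final long MAX_SPLIT_BYTES = 128L * 1024 * 1024;
    // Assumed default of spark.sql.files.openCostInBytes (4 MB)
    static final long OPEN_COST_BYTES = 4L * 1024 * 1024;

    // Cut each file into splits of at most MAX_SPLIT_BYTES
    static List<Long> splitFiles(long[] fileSizes) {
        List<Long> splits = new ArrayList<>();
        for (long size : fileSizes) {
            for (long offset = 0; offset < size; offset += MAX_SPLIT_BYTES) {
                splits.add(Math.min(MAX_SPLIT_BYTES, size - offset));
            }
        }
        return splits;
    }

    // Pack splits, largest first; close the current partition when the next split would overflow it
    static int countPartitions(long[] fileSizes) {
        List<Long> splits = splitFiles(fileSizes);
        splits.sort(Comparator.reverseOrder());
        int partitions = 0;
        long currentSize = 0;
        boolean hasOpenPartition = false;
        for (long split : splits) {
            if (hasOpenPartition && currentSize + split > MAX_SPLIT_BYTES) {
                partitions++;
                currentSize = 0;
                hasOpenPartition = false;
            }
            // Each split also pays the assumed file-open cost
            currentSize += split + OPEN_COST_BYTES;
            hasOpenPartition = true;
        }
        if (hasOpenPartition) {
            partitions++;
        }
        return partitions;
    }

    public static void main(String[] args) {
        long[] fileSizes = new long[100];
        // Assumption: all 100 files have the byte length of part-00000
        Arrays.fill(fileSizes, 193_154_555L);
        System.out.println(countPartitions(fileSizes)); // prints 150
    }
}
```

Under these assumptions each file yields one full 128 MB split, which fills a partition on its own, plus a remainder of roughly 59 MB. Two remainders plus their open costs still fit under 128 MB but a third does not, so the 100 remainders pack into 50 partitions, giving 100 + 50 = 150.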

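I will not step through the RDD generation logic in detail here and will just give the RDD's partition information directly. It was read from the RDD in debug mode, and the listing below shows all 150 partitions.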


result = {Partition[150]@13053}0 = {FilePartition@13054} "FilePartition(0,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00000-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"1 = {FilePartition@13055} "FilePartition(1,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00001-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"2 = {FilePartition@13056} "FilePartition(2,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00002-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"3 = {FilePartition@13057} "FilePartition(3,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00003-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"4 = {FilePartition@13058} "FilePartition(4,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00004-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"5 = {FilePartition@13059} "FilePartition(5,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00005-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"6 = {FilePartition@13060} "FilePartition(6,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00006-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"7 = {FilePartition@13061} "FilePartition(7,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00007-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"8 = {FilePartition@13062} "FilePartition(8,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00008-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"9 = {FilePartition@13063} "FilePartition(9,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00009-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"10 = {FilePartition@13064} "FilePartition(10,WrappedArray(path: 
hdfs://test:9000/user/daily/20200828/part-00010-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"11 = {FilePartition@13065} "FilePartition(11,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00011-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"12 = {FilePartition@13066} "FilePartition(12,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00012-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"13 = {FilePartition@13067} "FilePartition(13,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00013-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"14 = {FilePartition@13068} "FilePartition(14,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00014-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"15 = {FilePartition@13069} "FilePartition(15,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00015-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"16 = {FilePartition@13070} "FilePartition(16,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00016-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"17 = {FilePartition@13071} "FilePartition(17,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00017-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"18 = {FilePartition@13072} "FilePartition(18,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00018-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"19 = {FilePartition@13073} "FilePartition(19,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00019-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"20 = {FilePartition@13074} "FilePartition(20,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00020-41ca-c000.snappy.parquet, range: 0-134217728, 
partition values: [empty row]))"21 = {FilePartition@13075} "FilePartition(21,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00021-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"22 = {FilePartition@13076} "FilePartition(22,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00022-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"23 = {FilePartition@13077} "FilePartition(23,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00023-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"24 = {FilePartition@13078} "FilePartition(24,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00024-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"25 = {FilePartition@13079} "FilePartition(25,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00025-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"26 = {FilePartition@13080} "FilePartition(26,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00026-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"27 = {FilePartition@13081} "FilePartition(27,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00027-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"28 = {FilePartition@13082} "FilePartition(28,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00028-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"29 = {FilePartition@13083} "FilePartition(29,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00029-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"30 = {FilePartition@13084} "FilePartition(30,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00030-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"31 = {FilePartition@13085} "FilePartition(31,WrappedArray(path: 
hdfs://test:9000/user/daily/20200828/part-00031-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"32 = {FilePartition@13086} "FilePartition(32,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00032-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"33 = {FilePartition@13087} "FilePartition(33,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00033-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"34 = {FilePartition@13088} "FilePartition(34,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00034-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"35 = {FilePartition@13089} "FilePartition(35,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00035-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"36 = {FilePartition@13090} "FilePartition(36,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00036-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"37 = {FilePartition@13091} "FilePartition(37,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00037-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"38 = {FilePartition@13092} "FilePartition(38,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00038-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"39 = {FilePartition@13093} "FilePartition(39,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00039-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"40 = {FilePartition@13094} "FilePartition(40,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00040-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"41 = {FilePartition@13095} "FilePartition(41,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00041-41ca-c000.snappy.parquet, range: 0-134217728, 
partition values: [empty row]))"42 = {FilePartition@13096} "FilePartition(42,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00042-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"43 = {FilePartition@13097} "FilePartition(43,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00043-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"44 = {FilePartition@13098} "FilePartition(44,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00044-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"45 = {FilePartition@13099} "FilePartition(45,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00045-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"46 = {FilePartition@13100} "FilePartition(46,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00046-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"47 = {FilePartition@13101} "FilePartition(47,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00047-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"48 = {FilePartition@13102} "FilePartition(48,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00048-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"49 = {FilePartition@13103} "FilePartition(49,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00049-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"50 = {FilePartition@13104} "FilePartition(50,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00050-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"51 = {FilePartition@13105} "FilePartition(51,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00051-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"52 = {FilePartition@13106} "FilePartition(52,WrappedArray(path: 
hdfs://test:9000/user/daily/20200828/part-00052-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"53 = {FilePartition@13107} "FilePartition(53,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00053-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"54 = {FilePartition@13108} "FilePartition(54,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00054-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"55 = {FilePartition@13109} "FilePartition(55,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00055-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"56 = {FilePartition@13110} "FilePartition(56,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00056-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"57 = {FilePartition@13111} "FilePartition(57,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00057-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"58 = {FilePartition@13112} "FilePartition(58,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00058-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"59 = {FilePartition@13113} "FilePartition(59,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00059-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"60 = {FilePartition@13114} "FilePartition(60,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00060-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"61 = {FilePartition@13115} "FilePartition(61,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00061-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"62 = {FilePartition@13116} "FilePartition(62,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00062-41ca-c000.snappy.parquet, range: 0-134217728, 
partition values: [empty row]))"63 = {FilePartition@13117} "FilePartition(63,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00063-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"64 = {FilePartition@13118} "FilePartition(64,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00064-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"65 = {FilePartition@13119} "FilePartition(65,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00065-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"66 = {FilePartition@13120} "FilePartition(66,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00066-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"67 = {FilePartition@13121} "FilePartition(67,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00067-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"68 = {FilePartition@13122} "FilePartition(68,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00068-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"69 = {FilePartition@13123} "FilePartition(69,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00069-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"70 = {FilePartition@13124} "FilePartition(70,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00070-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"71 = {FilePartition@13125} "FilePartition(71,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00071-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"72 = {FilePartition@13126} "FilePartition(72,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00072-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"73 = {FilePartition@13127} "FilePartition(73,WrappedArray(path: 
hdfs://test:9000/user/daily/20200828/part-00073-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"74 = {FilePartition@13128} "FilePartition(74,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00074-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"75 = {FilePartition@13129} "FilePartition(75,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00075-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"76 = {FilePartition@13130} "FilePartition(76,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00076-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"77 = {FilePartition@13131} "FilePartition(77,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00077-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"78 = {FilePartition@13132} "FilePartition(78,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00078-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"79 = {FilePartition@13133} "FilePartition(79,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00079-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"80 = {FilePartition@13134} "FilePartition(80,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00080-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"81 = {FilePartition@13135} "FilePartition(81,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00081-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"82 = {FilePartition@13136} "FilePartition(82,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00082-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"83 = {FilePartition@13137} "FilePartition(83,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00083-41ca-c000.snappy.parquet, range: 0-134217728, 
partition values: [empty row]))"84 = {FilePartition@13138} "FilePartition(84,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00084-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"85 = {FilePartition@13139} "FilePartition(85,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00085-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"86 = {FilePartition@13140} "FilePartition(86,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00086-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"87 = {FilePartition@13141} "FilePartition(87,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00087-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"88 = {FilePartition@13142} "FilePartition(88,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00088-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"89 = {FilePartition@13143} "FilePartition(89,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00089-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"90 = {FilePartition@13144} "FilePartition(90,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00090-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"91 = {FilePartition@13145} "FilePartition(91,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00091-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"92 = {FilePartition@13146} "FilePartition(92,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00092-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"93 = {FilePartition@13147} "FilePartition(93,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00093-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"94 = {FilePartition@13148} "FilePartition(94,WrappedArray(path: 
hdfs://test:9000/user/daily/20200828/part-00094-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"95 = {FilePartition@13149} "FilePartition(95,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00095-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"96 = {FilePartition@13150} "FilePartition(96,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00096-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"97 = {FilePartition@13151} "FilePartition(97,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00097-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"98 = {FilePartition@13152} "FilePartition(98,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00098-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"99 = {FilePartition@13153} "FilePartition(99,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00099-41ca-c000.snappy.parquet, range: 0-134217728, partition values: [empty row]))"100 = {FilePartition@13265} "FilePartition(100,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00020-41ca-c000.snappy.parquet, range: 134217728-195123990, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00009-41ca-c000.snappy.parquet, range: 134217728-195046711, partition values: [empty row]))"101 = {FilePartition@13266} "FilePartition(101,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00051-41ca-c000.snappy.parquet, range: 134217728-194146721, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00078-41ca-c000.snappy.parquet, range: 134217728-193958286, partition values: [empty row]))"102 = {FilePartition@13267} "FilePartition(102,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00072-41ca-c000.snappy.parquet, range: 134217728-193883957, partition values: [empty row], path: 
hdfs://test:9000/user/daily/20200828/part-00017-41ca-c000.snappy.parquet, range: 134217728-193740729, partition values: [empty row]))"103 = {FilePartition@13268} "FilePartition(103,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00052-41ca-c000.snappy.parquet, range: 134217728-193733952, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00035-41ca-c000.snappy.parquet, range: 134217728-193630805, partition values: [empty row]))"104 = {FilePartition@13269} "FilePartition(104,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00060-41ca-c000.snappy.parquet, range: 134217728-193595463, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00040-41ca-c000.snappy.parquet, range: 134217728-193582950, partition values: [empty row]))"105 = {FilePartition@13270} "FilePartition(105,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00097-41ca-c000.snappy.parquet, range: 134217728-193553823, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00088-41ca-c000.snappy.parquet, range: 134217728-193318554, partition values: [empty row]))"106 = {FilePartition@13271} "FilePartition(106,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00028-41ca-c000.snappy.parquet, range: 134217728-193247136, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00081-41ca-c000.snappy.parquet, range: 134217728-193238350, partition values: [empty row]))"107 = {FilePartition@13272} "FilePartition(107,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00083-41ca-c000.snappy.parquet, range: 134217728-193192582, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00057-41ca-c000.snappy.parquet, range: 134217728-193169659, partition values: [empty row]))"108 = {FilePartition@13273} "FilePartition(108,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00000-41ca-c000.snappy.parquet, range: 134217728-193154555, 
partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00096-41ca-c000.snappy.parquet, range: 134217728-193056440, partition values: [empty row]))"109 = {FilePartition@13274} "FilePartition(109,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00027-41ca-c000.snappy.parquet, range: 134217728-193040434, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00089-41ca-c000.snappy.parquet, range: 134217728-193010886, partition values: [empty row]))"110 = {FilePartition@13275} "FilePartition(110,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00012-41ca-c000.snappy.parquet, range: 134217728-192989901, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00099-41ca-c000.snappy.parquet, range: 134217728-192957150, partition values: [empty row]))"111 = {FilePartition@13276} "FilePartition(111,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00021-41ca-c000.snappy.parquet, range: 134217728-192890064, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00091-41ca-c000.snappy.parquet, range: 134217728-192871474, partition values: [empty row]))"112 = {FilePartition@13277} "FilePartition(112,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00030-41ca-c000.snappy.parquet, range: 134217728-192862266, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00058-41ca-c000.snappy.parquet, range: 134217728-192846129, partition values: [empty row]))"113 = {FilePartition@13278} "FilePartition(113,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00092-41ca-c000.snappy.parquet, range: 134217728-192831718, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00049-41ca-c000.snappy.parquet, range: 134217728-192826946, partition values: [empty row]))"114 = {FilePartition@13279} "FilePartition(114,WrappedArray(path: 
hdfs://test:9000/user/daily/20200828/part-00087-41ca-c000.snappy.parquet, range: 134217728-192797029, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00053-41ca-c000.snappy.parquet, range: 134217728-192723879, partition values: [empty row]))"115 = {FilePartition@13280} "FilePartition(115,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00064-41ca-c000.snappy.parquet, range: 134217728-192715391, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00042-41ca-c000.snappy.parquet, range: 134217728-192710676, partition values: [empty row]))"116 = {FilePartition@13281} "FilePartition(116,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00055-41ca-c000.snappy.parquet, range: 134217728-192685175, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00033-41ca-c000.snappy.parquet, range: 134217728-192652367, partition values: [empty row]))"117 = {FilePartition@13282} "FilePartition(117,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00056-41ca-c000.snappy.parquet, range: 134217728-192639026, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00036-41ca-c000.snappy.parquet, range: 134217728-192636886, partition values: [empty row]))"118 = {FilePartition@13283} "FilePartition(118,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00077-41ca-c000.snappy.parquet, range: 134217728-192628108, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00037-41ca-c000.snappy.parquet, range: 134217728-192619533, partition values: [empty row]))"119 = {FilePartition@13284} "FilePartition(119,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00016-41ca-c000.snappy.parquet, range: 134217728-192610267, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00004-41ca-c000.snappy.parquet, range: 134217728-192553300, partition values: [empty row]))"120 = {FilePartition@13285} 
"FilePartition(120,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00038-41ca-c000.snappy.parquet, range: 134217728-192550224, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00079-41ca-c000.snappy.parquet, range: 134217728-192528242, partition values: [empty row]))"121 = {FilePartition@13286} "FilePartition(121,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00048-41ca-c000.snappy.parquet, range: 134217728-192518314, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00014-41ca-c000.snappy.parquet, range: 134217728-192492734, partition values: [empty row]))"122 = {FilePartition@13287} "FilePartition(122,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00043-41ca-c000.snappy.parquet, range: 134217728-192488828, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00075-41ca-c000.snappy.parquet, range: 134217728-192457423, partition values: [empty row]))"123 = {FilePartition@13288} "FilePartition(123,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00080-41ca-c000.snappy.parquet, range: 134217728-192436303, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00086-41ca-c000.snappy.parquet, range: 134217728-192422791, partition values: [empty row]))"124 = {FilePartition@13289} "FilePartition(124,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00010-41ca-c000.snappy.parquet, range: 134217728-192401383, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00031-41ca-c000.snappy.parquet, range: 134217728-192390802, partition values: [empty row]))"125 = {FilePartition@13290} "FilePartition(125,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00073-41ca-c000.snappy.parquet, range: 134217728-192384328, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00068-41ca-c000.snappy.parquet, range: 134217728-192373709, partition values: [empty 
row]))"126 = {FilePartition@13291} "FilePartition(126,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00045-41ca-c000.snappy.parquet, range: 134217728-192324641, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00074-41ca-c000.snappy.parquet, range: 134217728-192320288, partition values: [empty row]))"127 = {FilePartition@13292} "FilePartition(127,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00054-41ca-c000.snappy.parquet, range: 134217728-192310762, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00013-41ca-c000.snappy.parquet, range: 134217728-192309053, partition values: [empty row]))"128 = {FilePartition@13293} "FilePartition(128,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00029-41ca-c000.snappy.parquet, range: 134217728-192307665, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00085-41ca-c000.snappy.parquet, range: 134217728-192300557, partition values: [empty row]))"129 = {FilePartition@13294} "FilePartition(129,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00023-41ca-c000.snappy.parquet, range: 134217728-192287705, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00026-41ca-c000.snappy.parquet, range: 134217728-192244675, partition values: [empty row]))"130 = {FilePartition@13295} "FilePartition(130,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00002-41ca-c000.snappy.parquet, range: 134217728-192242883, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00032-41ca-c000.snappy.parquet, range: 134217728-192218855, partition values: [empty row]))"131 = {FilePartition@13296} "FilePartition(131,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00062-41ca-c000.snappy.parquet, range: 134217728-192200827, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00050-41ca-c000.snappy.parquet, range: 
134217728-192191229, partition values: [empty row]))"132 = {FilePartition@13297} "FilePartition(132,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00069-41ca-c000.snappy.parquet, range: 134217728-192169417, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00098-41ca-c000.snappy.parquet, range: 134217728-192160782, partition values: [empty row]))"133 = {FilePartition@13298} "FilePartition(133,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00025-41ca-c000.snappy.parquet, range: 134217728-192152594, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00003-41ca-c000.snappy.parquet, range: 134217728-192146313, partition values: [empty row]))"134 = {FilePartition@13299} "FilePartition(134,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00095-41ca-c000.snappy.parquet, range: 134217728-192096023, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00047-41ca-c000.snappy.parquet, range: 134217728-192084102, partition values: [empty row]))"135 = {FilePartition@13300} "FilePartition(135,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00008-41ca-c000.snappy.parquet, range: 134217728-192046162, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00046-41ca-c000.snappy.parquet, range: 134217728-192036123, partition values: [empty row]))"136 = {FilePartition@13301} "FilePartition(136,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00090-41ca-c000.snappy.parquet, range: 134217728-191994182, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00066-41ca-c000.snappy.parquet, range: 134217728-191989716, partition values: [empty row]))"137 = {FilePartition@13302} "FilePartition(137,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00006-41ca-c000.snappy.parquet, range: 134217728-191934315, partition values: [empty row], path: 
hdfs://test:9000/user/daily/20200828/part-00061-41ca-c000.snappy.parquet, range: 134217728-191865624, partition values: [empty row]))"138 = {FilePartition@13303} "FilePartition(138,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00015-41ca-c000.snappy.parquet, range: 134217728-191799380, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00065-41ca-c000.snappy.parquet, range: 134217728-191686604, partition values: [empty row]))"139 = {FilePartition@13304} "FilePartition(139,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00011-41ca-c000.snappy.parquet, range: 134217728-191638428, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00094-41ca-c000.snappy.parquet, range: 134217728-191613916, partition values: [empty row]))"140 = {FilePartition@13305} "FilePartition(140,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00093-41ca-c000.snappy.parquet, range: 134217728-191569247, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00005-41ca-c000.snappy.parquet, range: 134217728-191566051, partition values: [empty row]))"141 = {FilePartition@13306} "FilePartition(141,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00070-41ca-c000.snappy.parquet, range: 134217728-191540698, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00022-41ca-c000.snappy.parquet, range: 134217728-191510871, partition values: [empty row]))"142 = {FilePartition@13307} "FilePartition(142,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00024-41ca-c000.snappy.parquet, range: 134217728-191501951, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00041-41ca-c000.snappy.parquet, range: 134217728-191431095, partition values: [empty row]))"143 = {FilePartition@13308} "FilePartition(143,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00019-41ca-c000.snappy.parquet, range: 134217728-191423121, 
partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00063-41ca-c000.snappy.parquet, range: 134217728-191381412, partition values: [empty row]))"144 = {FilePartition@13309} "FilePartition(144,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00067-41ca-c000.snappy.parquet, range: 134217728-191378487, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00039-41ca-c000.snappy.parquet, range: 134217728-191304847, partition values: [empty row]))"145 = {FilePartition@13310} "FilePartition(145,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00018-41ca-c000.snappy.parquet, range: 134217728-191205335, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00034-41ca-c000.snappy.parquet, range: 134217728-191131303, partition values: [empty row]))"146 = {FilePartition@13311} "FilePartition(146,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00084-41ca-c000.snappy.parquet, range: 134217728-191112887, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00001-41ca-c000.snappy.parquet, range: 134217728-191061715, partition values: [empty row]))"147 = {FilePartition@13312} "FilePartition(147,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00044-41ca-c000.snappy.parquet, range: 134217728-191023822, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00007-41ca-c000.snappy.parquet, range: 134217728-190977337, partition values: [empty row]))"148 = {FilePartition@13313} "FilePartition(148,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00059-41ca-c000.snappy.parquet, range: 134217728-190961642, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00071-41ca-c000.snappy.parquet, range: 134217728-190749907, partition values: [empty row]))"149 = {FilePartition@13314} "FilePartition(149,WrappedArray(path: 
hdfs://test:9000/user/daily/20200828/part-00076-41ca-c000.snappy.parquet, range: 134217728-190698048, partition values: [empty row], path: hdfs://test:9000/user/daily/20200828/part-00082-41ca-c000.snappy.parquet, range: 134217728-190665081, partition values: [empty row]))"

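Look at the first 100 partitions first. Each of them maps to exactly one file, and the range is always range: 0-134217728, i.e. 128M, which is the size of one HDFS block (and also the default value of spark.sql.files.maxPartitionBytes).

Now look at the remaining 50 partitions.
Each of these partitions covers two files. Take the last one, partition 149, as an example: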

 149 = {FilePartition@13314}
"FilePartition(149,WrappedArray(path: hdfs://test:9000/user/daily/20200828/part-00076-41ca-c000.snappy.parquet, range: 134217728-190698048, partition values: [empty row],
path: hdfs://test:9000/user/daily/20200828/part-00082-41ca-c000.snappy.parquet, range: 134217728-190665081, partition values: [empty row]))"

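It covers two files, with range: 134217728-190698048 (53.8M) and range: 134217728-190665081 (53.8M).
Both are the second block of their file. Why would these two pieces end up in the same partition? My first guess was that they live on the same data-node in HDFS, so the two pieces of data could be treated as stored together. The dump does not support that guess, though: the tails in the partitions listed above appear in strictly descending size order, which points to size-based packing rather than locality. When Spark plans a file scan (FileSourceScanExec in Spark 2.x), it sorts the splits by length in descending order and keeps adding them to the current partition until the next split would push it past spark.sql.files.maxPartitionBytes (128MB by default), charging spark.sql.files.openCostInBytes (4MB by default) for every split. Two tails of roughly 54-55M fit under that limit and a third one does not, so the 100 full 128M splits give 100 partitions and the 100 tails give another 50, 150 in total.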
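
To check that size-based packing alone accounts for the 150 partitions, here is a minimal sketch of the packing rule in plain Java. It is a simplified re-implementation for illustration, not Spark's code: the class and method names are hypothetical, every file is assumed to be exactly 190698048 bytes (the real files differ slightly), and the 128MB / 4MB values are the defaults of spark.sql.files.maxPartitionBytes and spark.sql.files.openCostInBytes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper class; a simplified model of how splits are packed into partitions.
final class FilePartitionPackingSketch {

    /** Cuts every file into chunks of at most maxSplitBytes and returns the chunk lengths. */
    static List<Long> splitLengths(long[] fileSizes, long maxSplitBytes) {
        List<Long> splits = new ArrayList<>();
        for (long size : fileSizes) {
            for (long offset = 0; offset < size; offset += maxSplitBytes) {
                splits.add(Math.min(maxSplitBytes, size - offset));
            }
        }
        return splits;
    }

    /** Packs the chunks, largest first, and returns the chunk lengths held by each partition. */
    static List<List<Long>> pack(long[] fileSizes, long maxSplitBytes, long openCostInBytes) {
        List<Long> splits = splitLengths(fileSizes, maxSplitBytes);
        splits.sort(Collections.reverseOrder());

        List<List<Long>> partitions = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentSize = 0;
        for (long length : splits) {
            // Close the current partition when the next chunk would exceed the limit.
            if (currentSize + length > maxSplitBytes && !current.isEmpty()) {
                partitions.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            // Every chunk is charged its length plus a fixed open cost.
            currentSize += length + openCostInBytes;
            current.add(length);
        }
        if (!current.isEmpty()) {
            partitions.add(current);
        }
        return partitions;
    }

    public static void main(String[] args) {
        long maxSplitBytes = 128L * 1024 * 1024;   // default spark.sql.files.maxPartitionBytes
        long openCostInBytes = 4L * 1024 * 1024;   // default spark.sql.files.openCostInBytes
        long[] fileSizes = new long[100];
        Arrays.fill(fileSizes, 190698048L);        // assumption: 100 files of identical size

        List<List<Long>> partitions = pack(fileSizes, maxSplitBytes, openCostInBytes);
        System.out.println("partitions: " + partitions.size());
        System.out.println("first: " + partitions.get(0));
        System.out.println("last: " + partitions.get(partitions.size() - 1));
    }
}
```

Under these assumptions the sketch yields 150 partitions: the first 100 hold a single 134217728-byte chunk each, and the last 50 hold two 56480320-byte tails each, which matches the shape of the dump above.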
