[Swift] LeetCode 609. Find Duplicate File in System

★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★
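➤WeChat official account: 山青咏芝 (shanqingyongzhi)
➤cnblogs address: 山青咏芝 (https://www.cnblogs.com/strengthen/)
➤GitHub: https://github.com/strengthen/LeetCode
➤Original article: https://www.cnblogs.com/strengthen/p/10467726.html
➤If the link is not 山青咏芝's cnblogs address, the article may have been scraped from the author.
➤The original has since been revised and updated. Reading it at the original address is strongly recommended. Support the author! Support original work!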
★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★★

Given a list of directory info including directory path, and all the files with contents in this directory, you need to find out all the groups of duplicate files in the file system in terms of their paths.

A group of duplicate files consists of at least two files that have exactly the same content.

A single directory info string in the input list has the following format:

"root/d1/d2/.../dm f1.txt(f1_content) f2.txt(f2_content) ... fn.txt(fn_content)"

It means there are n files (f1.txt, f2.txt ... fn.txt with content f1_content, f2_content ... fn_content, respectively) in directory root/d1/d2/.../dm. Note that n >= 1 and m >= 0. If m = 0, it means the directory is just the root directory.

The output is a list of groups of duplicate file paths. Each group contains the paths of all files that have the same content. A file path is a string that has the following format:

"directory_path/file_name.txt"

Example 1:

Input:
["root/a 1.txt(abcd) 2.txt(efgh)", "root/c 3.txt(abcd)", "root/c/d 4.txt(efgh)", "root 4.txt(efgh)"]
Output:
[["root/a/2.txt","root/c/d/4.txt","root/4.txt"],["root/a/1.txt","root/c/3.txt"]]

Note:

No order is required for the final output.
You may assume the directory name, file name and file content only has letters and digits, and the length of file content is in the range of [1,50].
The number of files given is in the range of [1,20000].
You may assume no files or directories share the same name in the same directory.
You may assume each given directory info represents a unique directory. Directory path and file info are separated by a single blank space.
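Because no order is required, two correct answers can differ in how the groups and the paths inside them are ordered, and the dictionary-based solutions below return groups in whatever order the dictionary yields them. A minimal sketch of an order-insensitive comparison follows; the helper name `normalized` is hypothetical and is not part of the original post.

```swift
/// Hypothetical helper: sorts the paths inside each group, then sorts the
/// groups themselves, so answers that differ only in ordering compare equal.
func normalized(_ groups: [[String]]) -> [[String]] {
    return groups
        .map { $0.sorted() }
        .sorted { $0.lexicographicallyPrecedes($1) }
}

// The expected output of Example 1, and the same answer in a different order.
let expected = [["root/a/2.txt", "root/c/d/4.txt", "root/4.txt"],
                ["root/a/1.txt", "root/c/3.txt"]]
let reordered = [["root/c/3.txt", "root/a/1.txt"],
                 ["root/4.txt", "root/a/2.txt", "root/c/d/4.txt"]]

print(normalized(expected) == normalized(reordered))  // true
```

Comparing `normalized(solution.findDuplicate(input))` against `normalized(expected)` is one way to check any of the solutions locally.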

Follow-up beyond contest:

Imagine you are given a real file system. How will you search the files: DFS or BFS?
If the file content is very large (GB level), how will you modify your solution?
If you can only read a file 1 KB at a time, how will you modify your solution?
What is the time complexity of your modified solution? Which part consumes the most time and which the most memory? How can it be optimized?
How do you make sure the duplicate files you find are not false positives?
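One common way to approach the large-file follow-ups is to narrow the candidates in stages: group files by size, then by a hash computed one chunk at a time, and finally compare the remaining candidates byte for byte to rule out false positives. The sketch below illustrates that idea only and is not the author's solution. It assumes an in-memory dictionary (`FakeFileSystem`, a hypothetical stand-in for a real file system) and uses the FNV-1a hash purely for illustration; all names in it are made up for this example.

```swift
// Hypothetical stand-in for a real file system: path -> file bytes.
typealias FakeFileSystem = [String: [UInt8]]

// Process at most 1 KB at a time, as in the follow-up question.
let chunkSize = 1024

/// 64-bit FNV-1a hash, updated chunk by chunk so that a real implementation
/// would never need the whole file in memory at once.
func chunkedHash(of bytes: [UInt8]) -> UInt64 {
    var hash: UInt64 = 0xcbf29ce484222325
    var offset = 0
    while offset < bytes.count {
        let end = min(offset + chunkSize, bytes.count)
        for byte in bytes[offset..<end] {
            hash ^= UInt64(byte)
            hash = hash &* 0x100000001b3  // wrapping multiply by the FNV prime
        }
        offset = end
    }
    return hash
}

/// Chunk-by-chunk comparison that stops at the first mismatch.
func sameContent(_ a: [UInt8], _ b: [UInt8]) -> Bool {
    guard a.count == b.count else { return false }
    var offset = 0
    while offset < a.count {
        let end = min(offset + chunkSize, a.count)
        if a[offset..<end] != b[offset..<end] { return false }
        offset = end
    }
    return true
}

func findDuplicatesBySizeThenHash(in fs: FakeFileSystem) -> [[String]] {
    // Stage 1: files of different sizes cannot be duplicates.
    var bySize = [Int: [String]]()
    for (path, bytes) in fs {
        bySize[bytes.count, default: []].append(path)
    }

    var result = [[String]]()
    for candidates in bySize.values where candidates.count > 1 {
        // Stage 2: hash only the files that share a size.
        var byHash = [UInt64: [String]]()
        for path in candidates {
            byHash[chunkedHash(of: fs[path]!), default: []].append(path)
        }
        for sameHash in byHash.values where sameHash.count > 1 {
            // Stage 3: equal hashes may still collide, so verify the bytes.
            var groups = [[String]]()
            for path in sameHash {
                if let i = groups.firstIndex(where: { sameContent(fs[$0[0]]!, fs[path]!) }) {
                    groups[i].append(path)
                } else {
                    groups.append([path])
                }
            }
            result += groups.filter { $0.count > 1 }
        }
    }
    return result
}
```

In this staging, hashing and the final comparison dominate the running time, since both read every byte of each candidate file, while the size check is cheap and prunes most files before any content is read.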

584ms

class Solution {
    func findDuplicate(_ paths: [String]) -> [[String]] {
        // Maps file content to every full path that holds that content.
        var ctp = [String: [String]]()
        for path in paths {
            let comps = path.split(separator: " ")
            let dir = comps[0]
            for file in comps[1...] {
                let (fullPath, contents) = analyzeFile(file, dir)
                ctp[contents, default: []].append(fullPath)
            }
        }
        return Array(ctp.values).filter { $0.count > 1 }
    }

    // Splits "name.txt(content)" into ("dir/name.txt", "content").
    private func analyzeFile(_ file: String.SubSequence, _ dir: String.SubSequence) -> (String, String) {
        let startIndex = file.firstIndex(of: "(")!
        let endIndex = file.index(before: file.endIndex)  // position of the closing ")"
        let contents = String(file[file.index(after: startIndex)..<endIndex])
        let path = String(file[file.startIndex..<startIndex])
        return ("\(dir)/\(path)", contents)
    }
}

700ms

class Solution {
    func findDuplicate(_ paths: [String]) -> [[String]] {
        var map = [String: [String]]()
        for path in paths {
            let contents = extractContent(path)
            for content in contents {
                // content[0] is the full path, content[1] is the grouping key.
                map[content[1], default: [String]()].append(content[0])
            }
        }
        var result = [[String]]()
        for key in map.keys {
            if let tmpPaths = map[key] {
                if tmpPaths.count > 1 {
                    result.append(tmpPaths)
                }
            }
        }
        return result
    }

    // Returns one [fullPath, content] pair per file in the directory string.
    private func extractContent(_ str: String) -> [[String]] {
        let arr = str.split(separator: " ")
        let root = arr[0]
        var result = [[String]]()
        for i in 1 ..< arr.count {
            let str = arr[i]
            let left = str.firstIndex(of: "(")!
            let right = str.lastIndex(of: ")")!
            let filename = String(str[..<left])
            // The slice keeps the surrounding parentheses. That is harmless here
            // because every entry is sliced the same way and it is only used as a key.
            let content = String(str[left ... right])
            result.append(["\(root)/\(filename)", content])
        }
        return result
    }
}

908ms

import Foundation

class Solution {

    typealias Content = String
    typealias FilePath = String
    typealias ExtractedContentPath = (content: Content, filepath: FilePath)

    func findDuplicate(_ paths: [String]) -> [[String]] {
        // Flatten every directory string into (content, path) pairs,
        // then group the paths by content.
        let contentFileTable: [Content: [FilePath]] = paths.lazy
            .flatMap { self.extractedContentPaths($0) }
            .reduce(into: [Content: [FilePath]]()) { dict, extracted in
                dict[extracted.content, default: []].append(extracted.filepath)
            }

        return contentFileTable.values.filter { $0.count > 1 }
    }

    private func extractedContentPaths(_ input: String) -> [ExtractedContentPath] {
        let tokens = input.components(separatedBy: .whitespaces)
        let directory = tokens.first!
        // The first token is the directory; every later token is "name.txt(content)".
        return tokens.dropFirst().map { extractedContentPath(from: $0, directory: directory) }
    }

    private func extractedContentPath(from fileAndContent: String, directory: String) -> ExtractedContentPath {
        // Drop the trailing ")" and split on "(" to get [name, content].
        let tokens = fileAndContent.dropLast().components(separatedBy: "(")
        return (content: tokens.last!, filepath: "\(directory)/\(tokens.first!)")
    }
}

992ms

import Foundation

class Solution {
    func findDuplicate(_ paths: [String]) -> [[String]] {
        // Maps file content to the paths that hold it.
        var groups = [String: Array<String>]()

        for info in paths {
            let list = parse(info: info)

            for file in list {
                groups[file.1, default: Array<String>()].append(file.0)
            }
        }

        var result = [[String]]()

        for group in Array(groups.values) {
            if group.count > 1 {
                result.append(group)
            }
        }

        return result
    }

    // Returns one (fullPath, content) tuple per file in the directory string.
    func parse(info: String) -> [(String, String)] {
        var components = info.components(separatedBy: " ")
        let path = components.removeFirst()

        var result = [(String, String)]()

        // Splitting "name.txt(content)" on "(" and ")" yields ["name.txt", "content", ""].
        let splitCharSet = CharacterSet(charactersIn: "()")
        while !components.isEmpty {

            let entry = components.removeFirst()
            let parts = entry.components(separatedBy: splitCharSet)

            let file = "\(path)/\(parts[0])"
            let contents = parts[1]

            result.append((file, contents))
        }

        return result
    }
}

1232ms

class Solution {
    func findDuplicate(_ paths: [String]) -> [[String]] {
        var fileMapPaths = [String: [String]]()
        paths.forEach { (str) in
            let arrStrs = str.split(separator: " ")
            let dir = arrStrs[0]
            for i in 1..<arrStrs.count {
                let fileInfo = arrStrs[i]
                let subArrStr = fileInfo.split(separator: "(")
                // The key is the raw content with its trailing ")" still attached.
                // It is not a hash; it only has to be identical for identical content.
                let content = String(subArrStr[1])
                let file = String(subArrStr[0])
                let filePath = "\(dir)/\(file)"
                fileMapPaths[content, default: []].append(filePath)
            }
        }
        return fileMapPaths.values.filter { $0.count > 1 }
    }
}

1264ms

class Solution {
    func findDuplicate(_ paths: [String]) -> [[String]] {
        // Build a dictionary of [content: [filepath]] and return every value
        // that holds two or more paths.
        var contentToFiles = [String: [String]]()
        for path in paths {
            let params = path.split(separator: " ")
            guard let dir = params.first else {
                continue
            }
            for i in 1 ..< params.count {
                let fileParams = params[i].split(separator: "(")
                guard let fileName = fileParams.first, let fileContentWithExtraInfo = fileParams.last else {
                    continue
                }
                // Drop the trailing ")" to get the bare content.
                let fileContent = String(fileContentWithExtraInfo.dropLast())
                let filePath = "\(dir)/\(fileName)"
                // Append in place instead of rebuilding the array on every insert.
                contentToFiles[fileContent, default: []].append(filePath)
            }
        }
        return contentToFiles.values.filter { $0.count >= 2 }
    }
}

Runtime: 1324 ms

Memory Usage: 26.4 MB

import Foundation

class Solution {
    func findDuplicate(_ paths: [String]) -> [[String]] {
        var ans: [[String]] = [[String]]()
        var map: [String: [String]] = [String: [String]]()
        for path in paths
        {
            let temp: [String] = path.components(separatedBy: " ")
            let root: String = temp[0]
            // The directory token contains no parentheses, so both lookups
            // return -1 and the check below skips it.
            for str in temp
            {
                let begin: Int = str.findLast("(")
                let end: Int = str.findFirst(")")
                if begin + 1 < end
                {
                    // Content between the parentheses: end - begin - 1 characters.
                    let name: String = str.subString(begin + 1, end - begin - 1)
                    let s: String = root + "/" + str.subStringTo(begin - 1)
                    if map[name] == nil
                    {
                        map[name] = [s]
                    }
                    else
                    {
                        map[name]!.append(s)
                    }
                }
            }
        }
        for val in map.values
        {
            if val.count > 1
            {
                ans.append(val)
            }
        }
        return ans
    }
}

// String extension
extension String {
    // Substring from the start of the string through `index` (inclusive).
    // - Parameter index: offset of the last character to keep
    // - Returns: the substring
    func subStringTo(_ index: Int) -> String {
        let theIndex = self.index(self.startIndex, offsetBy: min(self.count, index))
        return String(self[startIndex...theIndex])
    }

    // Substring given a start offset and a character count.
    // - begin: offset at which to start
    // - count: number of characters to take
    func subString(_ begin: Int, _ count: Int) -> String {
        let start = self.index(self.startIndex, offsetBy: max(0, begin))
        let end = self.index(self.startIndex, offsetBy: min(self.count, begin + count))
        return String(self[start..<end])
    }

    // Returns the Int offset of the first occurrence of `sub`,
    // or -1 if it does not occur.
    func findFirst(_ sub: String) -> Int {
        var pos = -1
        if let range = range(of: sub, options: .literal) {
            if !range.isEmpty {
                pos = self.distance(from: startIndex, to: range.lowerBound)
            }
        }
        return pos
    }

    // Returns the Int offset of the last occurrence of `sub`,
    // or -1 if it does not occur.
    func findLast(_ sub: String) -> Int {
        var pos = -1
        if let range = range(of: sub, options: .backwards) {
            if !range.isEmpty {
                pos = self.distance(from: startIndex, to: range.lowerBound)
            }
        }
        return pos
    }
}

1344ms

import Foundation

class Solution {
    func findDuplicate(_ paths: [String]) -> [[String]] {
        var mapping = [String: [String]]()
        for i in 0..<paths.count {
            let arr = paths[i].components(separatedBy: " ")
            let base = arr[0] + "/"
            for j in 1..<arr.count {
                // arrSplit[0] is the file name; arrSplit[1] is "content)",
                // which serves as the grouping key as-is.
                let arrSplit = arr[j].components(separatedBy: "(")
                mapping[arrSplit[1], default: [String]()].append(base + arrSplit[0])
            }
        }
        return Array(mapping.values).filter { $0.count > 1 }
    }
}

Reposted from: https://www.cnblogs.com/strengthen/p/10467726.html
