Amazon S3 Tools: Introduction and Usage

Amazon provides two client tools, S3cmd and S3Express, which make it easy to work with distributed file storage systems that expose the standard S3 interface. The former targets Linux, the latter Windows; this article focuses on the Linux client, S3cmd.

I. What is S3cmd

The official documentation describes it as follows:

S3cmd is a free command line tool and client for uploading, retrieving
and managing data in Amazon S3 and other cloud storage service providers
that use the S3 protocol, such as Google Cloud Storage or DreamHost DreamObjects. It is best suited for power users who are familiar with command line programs. It is also ideal for batch scripts and automated backup to S3, triggered from cron, etc.

A few key phrases in this description are worth highlighting:

  • command line tool and client: it is a command-line tool
  • uploading, retrieving and managing data: it provides functions for uploading, retrieving, and managing data
  • S3 protocol: the storage service must support the S3 access protocol
  • batch scripts and automated backup: it is well suited to batch synchronization and scheduled jobs (see the cron sketch below)
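
For the cron/backup use case, a minimal sketch of a nightly backup job is shown below; the schedule, paths, bucket name, and log file are placeholders, not values taken from the s3cmd documentation.

# Hypothetical crontab entry: sync /data/backup to a bucket every day at 02:00
# and append the transfer output to a log file. Adjust paths and bucket to your setup.
0 2 * * * /usr/bin/s3cmd sync --skip-existing /data/backup/ s3://my-backup-bucket/backup/ >> /var/log/s3cmd-backup.log 2>&1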

II. How to use S3cmd

1. Installation

The official download page offers two main installation methods: unpacking a source archive and installing via yum. The details are covered in the official installation documentation; the yum-based installation is shown here:

# 1. Install
yum install s3cmd
# or
pip install s3cmd

# 2. Configure (optional; the same settings can also be passed as command-line options when running s3cmd)
s3cmd --configure
# This generates the configuration file; you can press Enter through every prompt and edit the values afterwards.
# The configure/install output prints where the file is stored; by default it is ~/.s3cfg in the user's home directory.
# The four settings that mainly need to be changed are:
[default]
access_key = AL7EMX17E9QTB03A7GY9                       【access key (ak)】
secret_key = lnqB7FBwkrNFbY7fypD00QqavvuT1VEyepKJrvey   【secret key (sk)】
host_base = 172.16.62.200:8000                          【endpoint address】
host_bucket = 127.0.0.1:7480/%(bucket)                  【bucket name template, must not be empty】
use_https = False
# Save the file when done.
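
Once the configuration is saved, a quick way to check that the endpoint and keys work is to list the buckets and try a small round trip; the bucket name below is only an example, not one from the setup above.

# List all buckets visible to the configured access key
s3cmd ls
# Create a test bucket and upload a small file into it
s3cmd mb s3://my-test-bucket
s3cmd put /etc/hosts s3://my-test-bucket/hosts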

2. Usage overview

The full list of options and commands can be printed with s3cmd --help:

s3cmd --help
Usage: s3cmd [options] COMMAND [parameters]

S3cmd is a tool for managing objects in Amazon S3 storage. It allows for
making and removing "buckets" and uploading, downloading and removing
"objects" from these buckets.

Options:
  -h, --help            show this help message and exit
  --configure           Invoke interactive (re)configuration tool. Optionally use as '--configure s3://some-bucket' to test access to a specific bucket instead of attempting to list them all.
  -c FILE, --config=FILE  Config file name. Defaults to $HOME/.s3cfg
  --dump-config         Dump current configuration after parsing config files and command line options and exit.
  --access_key=ACCESS_KEY  AWS Access Key
  --secret_key=SECRET_KEY  AWS Secret Key
  --access_token=ACCESS_TOKEN  AWS Access Token
  -n, --dry-run         Only show what should be uploaded or downloaded but don't actually do it. May still perform S3 requests to get bucket listings and other information though (only for file transfer commands)
  -s, --ssl             Use HTTPS connection when communicating with S3. (default)
  --no-ssl              Don't use HTTPS.
  -e, --encrypt         Encrypt files before uploading to S3.
  --no-encrypt          Don't encrypt files.
  -f, --force           Force overwrite and other dangerous operations.
  --continue            Continue getting a partially downloaded file (only for [get] command).
  --continue-put        Continue uploading partially uploaded files or multipart upload parts. Restarts/parts files that don't have matching size and md5. Skips files/parts that do. Note: md5sum checks are not always sufficient to check (part) file equality. Enable this at your own risk.
  --upload-id=UPLOAD_ID  UploadId for Multipart Upload, in case you want continue an existing upload (equivalent to --continue-put) and there are multiple partial uploads. Use s3cmd multipart [URI] to see what UploadIds are associated with the given URI.
  --skip-existing       Skip over files that exist at the destination (only for [get] and [sync] commands).
  -r, --recursive       Recursive upload, download or removal.
  --check-md5           Check MD5 sums when comparing files for [sync]. (default)
  --no-check-md5        Do not check MD5 sums when comparing files for [sync]. Only size will be compared. May significantly speed up transfer but may also miss some changed files.
  -P, --acl-public      Store objects with ACL allowing read for anyone.
  --acl-private         Store objects with default ACL allowing access for you only.
  --acl-grant=PERMISSION:EMAIL or USER_CANONICAL_ID  Grant stated permission to a given amazon user. Permission is one of: read, write, read_acp, write_acp, full_control, all
  --acl-revoke=PERMISSION:USER_CANONICAL_ID  Revoke stated permission for a given amazon user. Permission is one of: read, write, read_acp, write_acp, full_control, all
  -D NUM, --restore-days=NUM  Number of days to keep restored file available (only for 'restore' command).
  --restore-priority=RESTORE_PRIORITY  Priority for restoring files from S3 Glacier (only for 'restore' command). Choices available: bulk, standard, expedited
  --delete-removed      Delete destination objects with no corresponding source file [sync]
  --no-delete-removed   Don't delete destination objects.
  --delete-after        Perform deletes AFTER new uploads when delete-removed is enabled [sync]
  --delay-updates       *OBSOLETE* Put all updated files into place at end [sync]
  --max-delete=NUM      Do not delete more than NUM files. [del] and [sync]
  --limit=NUM           Limit number of objects returned in the response body (only for [ls] and [la] commands)
  --add-destination=ADDITIONAL_DESTINATIONS  Additional destination for parallel uploads, in addition to last arg. May be repeated.
  --delete-after-fetch  Delete remote objects after fetching to local file (only for [get] and [sync] commands).
  -p, --preserve        Preserve filesystem attributes (mode, ownership, timestamps). Default for [sync] command.
  --no-preserve         Don't store FS attributes
  --exclude=GLOB        Filenames and paths matching GLOB will be excluded from sync
  --exclude-from=FILE   Read --exclude GLOBs from FILE
  --rexclude=REGEXP     Filenames and paths matching REGEXP (regular expression) will be excluded from sync
  --rexclude-from=FILE  Read --rexclude REGEXPs from FILE
  --include=GLOB        Filenames and paths matching GLOB will be included even if previously excluded by one of --(r)exclude(-from) patterns
  --include-from=FILE   Read --include GLOBs from FILE
  --rinclude=REGEXP     Same as --include but uses REGEXP (regular expression) instead of GLOB
  --rinclude-from=FILE  Read --rinclude REGEXPs from FILE
  --files-from=FILE     Read list of source-file names from FILE. Use - to read from stdin.
  --region=REGION, --bucket-location=REGION  Region to create bucket in. As of now the regions are: us-east-1, us-west-1, us-west-2, eu-west-1, eu-central-1, ap-northeast-1, ap-southeast-1, ap-southeast-2, sa-east-1
  --host=HOSTNAME       HOSTNAME:PORT for S3 endpoint (default: s3.amazonaws.com, alternatives such as s3-eu-west-1.amazonaws.com). You should also set --host-bucket.
  --host-bucket=HOST_BUCKET  DNS-style bucket+hostname:port template for accessing a bucket (default: %(bucket)s.s3.amazonaws.com)
  --reduced-redundancy, --rr  Store object with 'Reduced redundancy'. Lower per-GB price. [put, cp, mv]
  --no-reduced-redundancy, --no-rr  Store object without 'Reduced redundancy'. Higher per-GB price. [put, cp, mv]
  --storage-class=CLASS  Store object with specified CLASS (STANDARD, STANDARD_IA, or REDUCED_REDUNDANCY). Lower per-GB price. [put, cp, mv]
  --access-logging-target-prefix=LOG_TARGET_PREFIX  Target prefix for access logs (S3 URI) (for [cfmodify] and [accesslog] commands)
  --no-access-logging   Disable access logging (for [cfmodify] and [accesslog] commands)
  --default-mime-type=DEFAULT_MIME_TYPE  Default MIME-type for stored objects. Application default is binary/octet-stream.
  -M, --guess-mime-type  Guess MIME-type of files by their extension or mime magic. Fall back to default MIME-Type as specified by --default-mime-type option
  --no-guess-mime-type  Don't guess MIME-type and use the default type instead.
  --no-mime-magic       Don't use mime magic when guessing MIME-type.
  -m MIME/TYPE, --mime-type=MIME/TYPE  Force MIME-type. Override both --default-mime-type and --guess-mime-type.
  --add-header=NAME:VALUE  Add a given HTTP header to the upload request. Can be used multiple times. For instance set 'Expires' or 'Cache-Control' headers (or both) using this option.
  --remove-header=NAME  Remove a given HTTP header. Can be used multiple times. For instance, remove 'Expires' or 'Cache-Control' headers (or both) using this option. [modify]
  --server-side-encryption  Specifies that server-side encryption will be used when putting objects. [put, sync, cp, modify]
  --server-side-encryption-kms-id=KMS_KEY  Specifies the key id used for server-side encryption with AWS KMS-Managed Keys (SSE-KMS) when putting objects. [put, sync, cp, modify]
  --encoding=ENCODING   Override autodetected terminal and filesystem encoding (character set). Autodetected: UTF-8
  --add-encoding-exts=EXTENSIONs  Add encoding to these comma delimited extensions i.e. (css,js,html) when uploading to S3 )
  --verbatim            Use the S3 name as given on the command line. No pre-processing, encoding, etc. Use with caution!
  --disable-multipart   Disable multipart upload on files bigger than --multipart-chunk-size-mb
  --multipart-chunk-size-mb=SIZE  Size of each chunk of a multipart upload. Files bigger than SIZE are automatically uploaded as multithreaded-multipart, smaller files are uploaded using the traditional method. SIZE is in Mega-Bytes, default chunk size is 15MB, minimum allowed chunk size is 5MB, maximum is 5GB.
  --list-md5            Include MD5 sums in bucket listings (only for 'ls' command).
  -H, --human-readable-sizes  Print sizes in human readable form (eg 1kB instead of 1234).
  --ws-index=WEBSITE_INDEX  Name of index-document (only for [ws-create] command)
  --ws-error=WEBSITE_ERROR  Name of error-document (only for [ws-create] command)
  --expiry-date=EXPIRY_DATE  Indicates when the expiration rule takes effect. (only for [expire] command)
  --expiry-days=EXPIRY_DAYS  Indicates the number of days after object creation the expiration rule takes effect. (only for [expire] command)
  --expiry-prefix=EXPIRY_PREFIX  Identifying one or more objects with the prefix to which the expiration rule applies. (only for [expire] command)
  --progress            Display progress meter (default on TTY).
  --no-progress         Don't display progress meter (default on non-TTY).
  --stats               Give some file-transfer stats.
  --enable              Enable given CloudFront distribution (only for [cfmodify] command)
  --disable             Disable given CloudFront distribution (only for [cfmodify] command)
  --cf-invalidate       Invalidate the uploaded filed in CloudFront. Also see [cfinval] command.
  --cf-invalidate-default-index  When using Custom Origin and S3 static website, invalidate the default index file.
  --cf-no-invalidate-default-index-root  When using Custom Origin and S3 static website, don't invalidate the path to the default index file.
  --cf-add-cname=CNAME  Add given CNAME to a CloudFront distribution (only for [cfcreate] and [cfmodify] commands)
  --cf-remove-cname=CNAME  Remove given CNAME from a CloudFront distribution (only for [cfmodify] command)
  --cf-comment=COMMENT  Set COMMENT for a given CloudFront distribution (only for [cfcreate] and [cfmodify] commands)
  --cf-default-root-object=DEFAULT_ROOT_OBJECT  Set the default root object to return when no object is specified in the URL. Use a relative path, i.e. default/index.html instead of /default/index.html or s3://bucket/default/index.html (only for [cfcreate] and [cfmodify] commands)
  -v, --verbose         Enable verbose output.
  -d, --debug           Enable debug output.
  --version             Show s3cmd version (2.0.2) and exit.
  -F, --follow-symlinks  Follow symbolic links as if they are regular files
  --cache-file=FILE     Cache FILE containing local source MD5 values
  -q, --quiet           Silence output on stdout
  --ca-certs=CA_CERTS_FILE  Path to SSL CA certificate FILE (instead of system default)
  --check-certificate   Check SSL certificate validity
  --no-check-certificate  Do not check SSL certificate validity
  --check-hostname      Check SSL certificate hostname validity
  --no-check-hostname   Do not check SSL certificate hostname validity
  --signature-v2        Use AWS Signature version 2 instead of newer signature methods. Helpful for S3-like systems that don't have AWS Signature v4 yet.
  --limit-rate=LIMITRATE  Limit the upload or download speed to amount bytes per second. Amount may be expressed in bytes, kilobytes with the k suffix, or megabytes with the m suffix
  --requester-pays      Set the REQUESTER PAYS flag for operations
  -l, --long-listing    Produce long listing [ls]
  --stop-on-error       stop if error in transfer
  --content-disposition=CONTENT_DISPOSITION  Provide a Content-Disposition for signed URLs, e.g., "inline; filename=myvideo.mp4"
  --content-type=CONTENT_TYPE  Provide a Content-Type for signed URLs, e.g., "video/mp4"

Commands:
  Make bucket
      s3cmd mb s3://BUCKET
  Remove bucket
      s3cmd rb s3://BUCKET
  List objects or buckets
      s3cmd ls [s3://BUCKET[/PREFIX]]
  List all object in all buckets
      s3cmd la
  Put file into bucket
      s3cmd put FILE [FILE...] s3://BUCKET[/PREFIX]
  Get file from bucket
      s3cmd get s3://BUCKET/OBJECT LOCAL_FILE
  Delete file from bucket
      s3cmd del s3://BUCKET/OBJECT
  Delete file from bucket (alias for del)
      s3cmd rm s3://BUCKET/OBJECT
  Restore file from Glacier storage
      s3cmd restore s3://BUCKET/OBJECT
  Synchronize a directory tree to S3 (checks files freshness using size and md5 checksum, unless overridden by options, see below)
      s3cmd sync LOCAL_DIR s3://BUCKET[/PREFIX] or s3://BUCKET[/PREFIX] LOCAL_DIR
  Disk usage by buckets
      s3cmd du [s3://BUCKET[/PREFIX]]
  Get various information about Buckets or Files
      s3cmd info s3://BUCKET[/OBJECT]
  Copy object
      s3cmd cp s3://BUCKET1/OBJECT1 s3://BUCKET2[/OBJECT2]
  Modify object metadata
      s3cmd modify s3://BUCKET1/OBJECT
  Move object
      s3cmd mv s3://BUCKET1/OBJECT1 s3://BUCKET2[/OBJECT2]
  Modify Access control list for Bucket or Files
      s3cmd setacl s3://BUCKET[/OBJECT]
  Modify Bucket Policy
      s3cmd setpolicy FILE s3://BUCKET
  Delete Bucket Policy
      s3cmd delpolicy s3://BUCKET
  Modify Bucket CORS
      s3cmd setcors FILE s3://BUCKET
  Delete Bucket CORS
      s3cmd delcors s3://BUCKET
  Modify Bucket Requester Pays policy
      s3cmd payer s3://BUCKET
  Show multipart uploads
      s3cmd multipart s3://BUCKET [Id]
  Abort a multipart upload
      s3cmd abortmp s3://BUCKET/OBJECT Id
  List parts of a multipart upload
      s3cmd listmp s3://BUCKET/OBJECT Id
  Enable/disable bucket access logging
      s3cmd accesslog s3://BUCKET
  Sign arbitrary string using the secret key
      s3cmd sign STRING-TO-SIGN
  Sign an S3 URL to provide limited public access with expiry
      s3cmd signurl s3://BUCKET/OBJECT <expiry_epoch|+expiry_offset>
  Fix invalid file names in a bucket
      s3cmd fixbucket s3://BUCKET[/PREFIX]
  Create Website from bucket
      s3cmd ws-create s3://BUCKET
  Delete Website
      s3cmd ws-delete s3://BUCKET
  Info about Website
      s3cmd ws-info s3://BUCKET
  Set or delete expiration rule for the bucket
      s3cmd expire s3://BUCKET
  Upload a lifecycle policy for the bucket
      s3cmd setlifecycle FILE s3://BUCKET
  Get a lifecycle policy for the bucket
      s3cmd getlifecycle s3://BUCKET
  Remove a lifecycle policy for the bucket
      s3cmd dellifecycle s3://BUCKET
  List CloudFront distribution points
      s3cmd cflist
  Display CloudFront distribution point parameters
      s3cmd cfinfo [cf://DIST_ID]
  Create CloudFront distribution point
      s3cmd cfcreate s3://BUCKET
  Delete CloudFront distribution point
      s3cmd cfdelete cf://DIST_ID
  Change CloudFront distribution point parameters
      s3cmd cfmodify cf://DIST_ID
  Display CloudFront invalidation request(s) status
      s3cmd cfinvalinfo cf://DIST_ID[/INVAL_ID]

For more information, updates and news, visit the s3cmd website:
http://s3tools.org

A few of these options deserve special mention (a combined example follows the list):

  • --access_key=ACCESS_KEY: AWS Access Key, the AK for S3-protocol access
  • --secret_key=SECRET_KEY: AWS Secret Key, the matching SK
  • --host=HOSTNAME: HOSTNAME:PORT of the S3 endpoint
  • --skip-existing: when syncing, skip files that already exist at the destination
  • --recursive: recursively transfer all files and subdirectories
  • --no-check-md5: files are normally compared by MD5; with this option only sizes are compared, which speeds up the sync but may miss changed files
  • --no-ssl: do not use HTTPS
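
Because several of these options change what a sync will transfer, it is often worth previewing the operation first. The sketch below combines them with -n/--dry-run from the option list above, which only prints what would be uploaded; the keys, endpoint, and paths are placeholders.

# Preview a sync without transferring anything
s3cmd --access_key=ak --secret_key=sk --host=endpoint --no-ssl \
  --skip-existing --no-check-md5 --recursive --dry-run \
  sync /home/patent/XSD/ s3://spark-operator/patent/XSD/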

III. Common S3cmd usage examples

1. Download: get

s3cmd get s3://test-bucket/addrbook.xml addressbook-2.xml
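
get also accepts --recursive for downloading everything under a prefix; the prefix and local directory below are illustrative only.

# Download an entire prefix into a local directory
s3cmd get --recursive s3://test-bucket/XSD/ /home/patent/XSD-copy/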

2. Upload: put

s3cmd put /home/addressbook.xml s3://test-bucket/addrbook.xml
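
put can likewise upload a whole directory with --recursive, or store an object with a public-read ACL using -P/--acl-public; the paths below are placeholders.

# Upload a directory recursively
s3cmd put --recursive /home/patent/XSD/ s3://test-bucket/XSD/
# Upload a single file and make it publicly readable
s3cmd put --acl-public /home/addressbook.xml s3://test-bucket/addrbook.xml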

3. File synchronization: sync

There are two things to watch out for when syncing:

  • the object-key prefix on the S3 side must end with /

  • if you want the object keys on S3 to begin with /, write // where the prefix starts

A more detailed explanation is available here.

# The connection details are given on the command line, so the same s3cmd invocation
# can be pointed at production, test, or any other environment.
s3cmd --access_key=ak --secret_key=sk --host=endpoint --skip-existing \
  --no-check-md5 --recursive sync /home/patent/XSD/ s3://spark-operator//patent/XSD/
# Double slash: the objects on S3 keep a leading /, e.g. /patent/XSD/demo.xml
s3cmd --access_key=ak --secret_key=sk --host=endpoint --skip-existing \
  --no-check-md5 --recursive sync /home/patent/XSD/ s3://spark-operator/patent/XSD/
# Single slash: the objects on S3 have no leading /, e.g. patent/XSD/demo.xml
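
If the goal is a true mirror rather than an incremental copy, sync can also delete objects that no longer exist locally and filter files by pattern. The following sketch uses --delete-removed and --exclude from the option list above; as before, the keys, endpoint, and paths are placeholders.

# Mirror a local directory: remove objects deleted locally and skip temporary files
s3cmd --access_key=ak --secret_key=sk --host=endpoint --no-check-md5 --recursive \
  sync --delete-removed --exclude '*.tmp' /home/patent/XSD/ s3://spark-operator/patent/XSD/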
