https://speech-to-text-demo.ng.bluemix.net/

Click the purple "Start for free in IBM Cloud" button on the home page, then register for IBM Cloud and log in.

Next, add the Speech to Text service.

Click Service credentials in the left sidebar and create a new credential.

Copy and save your credentials.

{
"apikey": "xxxx",
"iam_apikey_description": "Auto generated apikey during resource-key operation for Instance - crn:v1:bluemix:public:speech-to-text:au-syd:xxx::",
"iam_apikey_name": "auto-generated-apikey-xxxx",
"iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Manager",
"iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity::xxxx",
"url": "https://gateway-syd.watsonplatform.net/speech-to-text/api"
}
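The saved credentials are plain JSON; only the `apikey` and `url` fields are needed to construct a client. A minimal sketch of loading them (the values here are placeholders, not real credentials; the commented SDK call assumes the `watson_developer_cloud` package referenced in the API docs below, which has since been renamed `ibm_watson`):

```python
import json

# Parse the credentials JSON saved from the IBM Cloud console.
creds_text = '''
{
  "apikey": "xxxx",
  "url": "https://gateway-syd.watsonplatform.net/speech-to-text/api"
}
'''
creds = json.loads(creds_text)
apikey, url = creds["apikey"], creds["url"]
print(url)

# With the Python SDK of that era, client construction looks roughly like:
# from watson_developer_cloud import SpeechToTextV1
# speech_to_text = SpeechToTextV1(iam_apikey=apikey, url=url)
```

In practice you would read the JSON from the file you saved rather than an inline string, and keep the key out of source control.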

REF:

https://console.bluemix.net/apidocs/speech-to-text?language=python

PARAMETERS

  • audio

    AudioSource

    An AudioSource object that provides the audio that is to be transcribed.

  • content_type

    str

    The format (MIME type) of the audio. For more information about specifying an audio format, see Audio formats (content types) in the method description.

    Allowable values: [application/octet-stream, audio/basic, audio/flac, audio/g729, audio/l16, audio/mp3, audio/mpeg, audio/mulaw, audio/ogg, audio/ogg;codecs=opus, audio/ogg;codecs=vorbis, audio/wav, audio/webm, audio/webm;codecs=opus, audio/webm;codecs=vorbis]

  • recognize_callback

    object

    RecognizeCallback object that defines methods to handle events from the WebSocket connection. Override the definitions of the object's default methods to respond to events as needed by your application.

  • model

    str

    The identifier of the model that is to be used for the recognition request. See Languages and models.

    Allowable values: [ar-AR_BroadbandModel,de-DE_BroadbandModel,en-GB_BroadbandModel,en-GB_NarrowbandModel,en-US_BroadbandModel,en-US_NarrowbandModel,en-US_ShortForm_NarrowbandModel,es-ES_BroadbandModel,es-ES_NarrowbandModel,fr-FR_BroadbandModel,fr-FR_NarrowbandModel,ja-JP_BroadbandModel,ja-JP_NarrowbandModel,ko-KR_BroadbandModel,ko-KR_NarrowbandModel,pt-BR_BroadbandModel,pt-BR_NarrowbandModel,zh-CN_BroadbandModel,zh-CN_NarrowbandModel]

    Default: en-US_BroadbandModel

  • language_customization_id

    str

    The customization ID (GUID) of a custom language model that is to be used for the request. The base model of the specified custom language model must match the model specified with the model parameter. You must make the request with service credentials created for the instance of the service that owns the custom model. Omit the parameter to use the specified model with no custom language model. See Custom models.

    Note: Use this parameter instead of the deprecated customization_id parameter.

  • acoustic_customization_id

    str

    The customization ID (GUID) of a custom acoustic model that is to be used for the request. The base model of the specified custom acoustic model must match the model specified with the model parameter. You must make the request with service credentials created for the instance of the service that owns the custom model. Omit the parameter to use the specified model with no custom acoustic model. See Custom models.

  • customization_weight

    float

    If you specify a customization ID, you can use the customization weight to tell the service how much weight to give to words from the custom language model compared to those from the base model for the current request.

    Specify a value between 0.0 and 1.0. Unless a different customization weight was specified for the custom model when it was trained, the default value is 0.3. A customization weight that you specify overrides a weight that was specified when the custom model was trained.

    The default value yields the best performance in general. Assign a higher value if your audio makes frequent use of OOV words from the custom model. Use caution when setting the weight: a higher value can improve the accuracy of phrases from the custom model's domain, but it can negatively affect performance on non-domain phrases.

    See Custom models.

  • base_model_version

    str

    The version of the specified base model that is to be used for the request. Multiple versions of a base model can exist when a model is updated for internal improvements. The parameter is intended primarily for use with custom models that have been upgraded for a new base model. The default value depends on whether the parameter is used with or without a custom model. See Base model version.

  • inactivity_timeout

    int

    The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed. The default is 30 seconds. The parameter is useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See Timeouts.

    Default: 30

  • interim_results

    bool

    If true, the service returns interim results as a stream of JSON SpeechRecognitionResults objects. If false, the service returns a single SpeechRecognitionResults object with final results only. See Interim results.

    Default: false

  • keywords

    list[str]

    An array of keyword strings to spot in the audio. Each keyword string can include one or more string tokens. Keywords are spotted only in the final results, not in interim hypotheses. If you specify any keywords, you must also specify a keywords threshold. You can spot a maximum of 1000 keywords. Omit the parameter or specify an empty array if you do not need to spot keywords. See Keyword spotting.

  • keywords_threshold

    float

    A confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. No keyword spotting is performed if you omit the parameter. If you specify a threshold, you must also specify one or more keywords. See Keyword spotting.

  • max_alternatives

    int

    The maximum number of alternative transcripts that the service is to return. By default, a single transcription is returned. See Maximum alternatives.

    Default: 1

  • word_alternatives_threshold

    float

    A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0.0 and 1.0. No alternative words are computed if you omit the parameter. See Word alternatives.

  • word_confidence

    bool

    If true, the service returns a confidence measure in the range of 0.0 to 1.0 for each word. By default, no word confidence measures are returned. See Word confidence.

    Default: false

  • timestamps

    bool

    If true, the service returns time alignment for each word. By default, no timestamps are returned. See Word timestamps.

    Default: false

  • profanity_filter

    bool

    If true, the service filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only. See Profanity filtering.

  • smart_formatting

    bool

    If true, the service converts dates, times, series of digits and numbers, phone numbers, currency values, and internet addresses into more readable, conventional representations in the final transcript of a recognition request. For US English, the service also converts certain keyword strings to punctuation symbols. By default, no smart formatting is performed. Applies to US English, Japanese, and Spanish transcription only. See Smart formatting.

    Default: false

  • speaker_labels

    bool

    If true, the response includes labels that identify which words were spoken by which participants in a multi-person exchange. By default, no speaker labels are returned. Specifying true forces the timestamps parameter to be true, regardless of whether you specify false for that parameter.

    To determine whether a language model supports speaker labels, use the Get a model method and check that the attribute speaker_labels is set to true. See Speaker labels.

    Default: false

  • http_proxy_host

    str

    If you are passing requests through a proxy, specify the host name of the proxy server. Use the http_proxy_port parameter to specify the port number at which the proxy listens. Omit both parameters if you are not using a proxy.

    Default: None

  • http_proxy_port

    str

    If you are passing requests through a proxy, specify the port number at which the proxy service listens. Use the http_proxy_host parameter to specify the host name of the proxy. Omit both parameters if you are not using a proxy.

    Default: None

  • customization_id

    str

    Deprecated. Use the language_customization_id parameter to specify the customization ID (GUID) of a custom language model that is to be used with the request. Do not specify both parameters with a request.

  • grammar_name

    str

    The name of a grammar that is to be used with the recognition request. If you specify a grammar, you must also use the language_customization_id parameter to specify the name of the custom language model for which the grammar is defined. The service recognizes only strings that are recognized by the specified grammar; it does not recognize other custom words from the model's words resource. See Grammars.

  • redaction

    bool

    If true, the service redacts, or masks, numeric data from final transcripts. The feature redacts any number that has three or more consecutive digits by replacing each digit with an X character. It is intended to redact sensitive numeric data, such as credit card numbers. By default, no redaction is performed.

    When you enable redaction, the service automatically enables smart formatting, regardless of whether you explicitly disable that feature. To ensure maximum security, the service also disables keyword spotting (ignores the keywords and keywords_threshold parameters) and returns only a single final transcript (forces the max_alternatives parameter to be 1).

    See Numeric redaction.

    Default: false
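Several of the parameters above only make sense in combination (keywords requires keywords_threshold, customization_weight must lie in [0.0, 1.0]). A small sketch that assembles a keyword-argument dict for a WebSocket recognition call while enforcing those couplings; the helper function name is my own, and the commented SDK call assumes the `watson_developer_cloud` package from the REF link (since renamed `ibm_watson`):

```python
def build_recognize_kwargs(model="en-US_BroadbandModel", keywords=None,
                           keywords_threshold=None, customization_weight=None):
    """Assemble kwargs for a recognize request, enforcing the parameter
    couplings described in the reference above."""
    if keywords and keywords_threshold is None:
        raise ValueError("keywords also requires keywords_threshold")
    if keywords_threshold is not None and not keywords:
        raise ValueError("keywords_threshold requires one or more keywords")
    if customization_weight is not None and not 0.0 <= customization_weight <= 1.0:
        raise ValueError("customization_weight must be between 0.0 and 1.0")
    kwargs = {"model": model, "interim_results": True, "timestamps": True}
    if keywords:
        kwargs.update(keywords=keywords, keywords_threshold=keywords_threshold)
    if customization_weight is not None:
        kwargs["customization_weight"] = customization_weight
    return kwargs

kwargs = build_recognize_kwargs(keywords=["IBM", "cloud"], keywords_threshold=0.5)
print(kwargs["model"])  # en-US_BroadbandModel

# With the SDK installed, the actual call would look roughly like:
# from watson_developer_cloud import SpeechToTextV1
# from watson_developer_cloud.websocket import RecognizeCallback, AudioSource
# stt = SpeechToTextV1(iam_apikey=apikey, url=url)
# with open('audio.wav', 'rb') as audio_file:
#     stt.recognize_using_websocket(
#         audio=AudioSource(audio_file), content_type='audio/wav',
#         recognize_callback=MyRecognizeCallback(), **kwargs)
```

Centralizing the validation this way surfaces a bad combination immediately, instead of waiting for the service to reject the request over the WebSocket connection.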

Reposted from: https://www.cnblogs.com/watermarks/p/10336687.html
