The ffmpeg API in practice: extracting pictures from a video

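In recent years, "short videos" have captured the attention of countless internet users. Entertaining as the content is, we programmers are probably more interested in the technology underneath. This series of articles uses ffmpeg to walk through several video-processing examples. (If you repost this article, please credit breaksoftware's CSDN blog.)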

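Short videos are stored on servers as files. Any file that is meant to be passed around has a well-defined format, and video is no exception. This series does not dissect those formats at the byte level, because that has little practical value for our purposes. Instead we take a high-level view: what information must a video file contain?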

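What we can say for certain is that, most of the time, we see "images" with our eyes and hear "sound" with our ears. If we shut off either sense, we stop receiving the corresponding information, while the other sense keeps receiving its information exactly as before.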

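So we can split a video into at least two parts: images and sound. How are the two combined? Are the image and the sound of one very short time slice (say, what we are seeing and hearing at this very moment) fused into a single "block"?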

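From a design standpoint, tight coupling is a bad thing, and fusing image and sound data into one block would be very tight coupling. A good design resembles the films we watched in cinemas as children (I do not know how films are projected today): one file carries the picture and another carries the sound. That way we can provide a Mandarin track, an English track, a French track and so on without touching the picture file. Yet the video files we watch on a PC are single, self-contained files. How does that work?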

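The design therefore has to balance ease of use against maintainability: at the macro level the image and sound data are merged into one file, while at the micro level they remain separate. In ffmpeg terms:

The image data and the sound data are each a stream, represented by the AVStream structure;
Their separation at the micro level shows in the fact that each travels in its own packets, represented by AVPacket;
Their merging at the macro level is done by a multiplexer (muxer), which combines the video and audio streams into one file.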

The examples below use the API of ffmpeg 4.0.2.

void get_video_pictures(const char* file_path) {
    std::unique_ptr<AVFormatContext, std::function<void(AVFormatContext*)>> avfmt_ctx_t(
        avformat_alloc_context(),
        [](AVFormatContext *s) {
            if (s) {
                avformat_close_input(&s);
            }
        });
    AVFormatContext* avfmt_ctx = avfmt_ctx_t.get();
    if (avformat_open_input(&avfmt_ctx, file_path, NULL, NULL)) {
        // On failure avformat_open_input has already freed the user-supplied
        // context, so give up ownership to avoid freeing it a second time.
        avfmt_ctx_t.release();
        std::cerr << "avformat_open_input error" << std::endl;
        return;
    }

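First we construct an AVFormatContext object, which holds the context for analyzing the file. The notion of a Context is central to ffmpeg: through its fields we can influence ffmpeg's underlying behavior, and we can also read information about the corresponding layer from it. We will meet several kinds of Context later on. They are all used in a fairly fixed pattern:

Allocate with XXXXX_alloc_context. For AVFormatContext this is avformat_alloc_context.
Initialize with XXXXX_openXXX. For AVFormatContext this is avformat_open_input.
Release with XXXXX_free_context. For AVFormatContext this is avformat_free_context. Because avformat_close_input performs additional cleanup and calls avformat_free_context internally, we use it here instead.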
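The allocate/open/free pattern maps naturally onto a std::unique_ptr with a custom deleter. The sketch below illustrates only the ownership idea and deliberately avoids the ffmpeg headers: DemoContext, demo_alloc_context, demo_open and demo_free_context are hypothetical stand-ins invented for this example, not ffmpeg functions.

```cpp
#include <memory>

// Hypothetical C-style API standing in for an alloc/open/free trio.
struct DemoContext {
    bool opened = false;
};

int g_live_contexts = 0;  // contexts allocated but not yet freed

DemoContext* demo_alloc_context() {
    ++g_live_contexts;
    return new DemoContext;
}

// Returns 0 on success and a negative value on failure.
int demo_open(DemoContext* ctx, bool should_succeed) {
    if (!should_succeed) {
        return -1;
    }
    ctx->opened = true;
    return 0;
}

void demo_free_context(DemoContext** ctx) {
    if (ctx && *ctx) {
        delete *ctx;
        *ctx = nullptr;
        --g_live_contexts;
    }
}

// The deleter runs the "free" step whenever the owning pointer goes away.
struct DemoContextDeleter {
    void operator()(DemoContext* ctx) const { demo_free_context(&ctx); }
};
using DemoContextPtr = std::unique_ptr<DemoContext, DemoContextDeleter>;

// Allocate, then open; if opening fails, the deleter frees the context
// when the local owner is destroyed on return.
DemoContextPtr make_opened_context(bool should_succeed) {
    DemoContextPtr ctx(demo_alloc_context());
    if (demo_open(ctx.get(), should_succeed) < 0) {
        return nullptr;
    }
    return ctx;
}
```

One difference from this stand-in is worth keeping in mind: avformat_open_input frees a user-supplied AVFormatContext itself when it fails, so with the real API the owner must not free the context again on that path.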

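AVFormatContext has two fields related to streams (AVStream): nb_streams and streams. The latter points to the first element of an array of AVStream pointers, and the former is the number of elements in that array. We can iterate over all the streams: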

    for (unsigned int i = 0; i < avfmt_ctx->nb_streams; i++) {
        AVStream *st = avfmt_ctx->streams[i];

As noted earlier, images and sound live in different streams, so we can tell streams apart by AVStream::codecpar::codec_type:

enum AVMediaType {
    AVMEDIA_TYPE_UNKNOWN = -1,  ///< Usually treated as AVMEDIA_TYPE_DATA
    AVMEDIA_TYPE_VIDEO,
    AVMEDIA_TYPE_AUDIO,
    AVMEDIA_TYPE_DATA,          ///< Opaque data information usually continuous
    AVMEDIA_TYPE_SUBTITLE,
    AVMEDIA_TYPE_ATTACHMENT,    ///< Opaque data information usually sparse
    AVMEDIA_TYPE_NB
};

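This enumeration also contains AVMEDIA_TYPE_SUBTITLE, the type for subtitle streams. This shows that subtitles need not be burned into the picture. In practice a player lets us choose among different subtitles and different dubbed languages (English/Chinese); these are all stored as streams inside the video file, which acts as a container, and there can be several of each. For example, the Chinese dub is one stream, the English dub another, the Chinese subtitles a third and the English subtitles a fourth.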
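To make the "one container, many streams" picture concrete, here is a small self-contained model. It does not use the ffmpeg types: MediaType and StreamInfo are simplified stand-ins invented for this sketch, and the language tags are illustrative.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for the AVMediaType values that matter here.
enum class MediaType { Unknown = -1, Video, Audio, Data, Subtitle };

// Hypothetical, simplified description of one stream in a container.
struct StreamInfo {
    MediaType type;
    std::string language;  // empty when the stream has no language tag
};

// Returns the indexes of all streams of the requested type, in file order.
std::vector<std::size_t> streams_of_type(const std::vector<StreamInfo>& streams,
                                         MediaType wanted) {
    std::vector<std::size_t> indexes;
    for (std::size_t i = 0; i < streams.size(); ++i) {
        if (streams[i].type == wanted) {
            indexes.push_back(i);
        }
    }
    return indexes;
}
```

For a container with one video stream followed by Chinese and English audio and then Chinese and English subtitles, asking for MediaType::Audio yields indexes 1 and 2. A player that offers a language menu is essentially presenting such a list to the user.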

As the title says, we want to extract pictures from the image stream, so we work on streams of type AVMEDIA_TYPE_VIDEO:

        if (st->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) {
            std::unique_ptr<AVCodecContext, std::function<void(AVCodecContext*)>> avcodec_ctx(
                avcodec_alloc_context3(NULL),
                [](AVCodecContext *avctx) {
                    if (avctx) {
                        avcodec_free_context(&avctx);
                    }
                });
            if (0 > avcodec_parameters_to_context(avcodec_ctx.get(), st->codecpar)) {
                std::cerr << "avcodec_parameters_to_context error.stream " << i;
                continue;
            }
            AVCodec *avcodec = avcodec_find_decoder(avcodec_ctx->codec_id);
            if (avcodec_open2(avcodec_ctx.get(), avcodec, NULL) < 0) {
                std::cerr << "Failed to open codec" << std::endl;
                continue;
            }
            save_video_pic(avfmt_ctx, i, avcodec_ctx.get());
        }
    }
}

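Each stream also has its own format. To decode and analyze a stream we need a decoder, which brings in the AVCodecContext structure. It follows the same usage pattern as the previous Context:

Allocate with avcodec_alloc_context3;
Release with avcodec_free_context;
Initialize from the stream's codec parameters with avcodec_parameters_to_context;
Find the matching decoder with avcodec_find_decoder;
Open the context with avcodec_open2 and the decoder found above.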

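This time we do not call avcodec_close, the counterpart of avcodec_open2, because the 4.0.2 documentation already advises against using it and it is on its way to being deprecated: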

/**
 * Close a given AVCodecContext and free all the data associated with it
 * (but not the AVCodecContext itself).
 *
 * Calling this function on an AVCodecContext that hasn't been opened will free
 * the codec-specific data allocated in avcodec_alloc_context3() with a non-NULL
 * codec. Subsequent calls will do nothing.
 *
 * @note Do not use this function. Use avcodec_free_context() to destroy a
 * codec context (either open or closed). Opening and closing a codec context
 * multiple times is not supported anymore -- use multiple codec contexts
 * instead.
 */
int avcodec_close(AVCodecContext *avctx);

Likewise, we do not use the AVCodecContext *codec member of AVStream directly, because it is marked as deprecated:

    attribute_deprecated
    AVCodecContext *codec;

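Once avcodec_open2 has opened a context tied to the decoder, we can start decoding. Before that, we need to know two lower-level structures: AVPacket and AVFrame. AVPacket holds encoded data (not yet decoded), and AVFrame holds raw data (already decoded, or not yet encoded). So what av_read_frame reads from a video file is data that has not been decoded yet: an AVPacket.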
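A rough size calculation shows why the distinction between compressed packets and raw frames matters. The helper below is a sketch under stated assumptions: 8-bit YUV 4:2:0 planar data with no row padding (real AVFrame buffers may be padded, which is what linesize accounts for). The function name is made up for this example.

```cpp
#include <cstddef>

// Bytes needed for one raw 8-bit YUV 4:2:0 planar picture without row padding:
// a full-resolution luma plane plus two chroma planes at half width and half height.
std::size_t yuv420p_frame_bytes(std::size_t width, std::size_t height) {
    const std::size_t luma = width * height;
    const std::size_t chroma_w = (width + 1) / 2;  // round up for odd sizes
    const std::size_t chroma_h = (height + 1) / 2;
    return luma + 2 * chroma_w * chroma_h;
}
```

For a 1920x1080 picture this gives 2,073,600 bytes of luma plus 2 x 518,400 bytes of chroma, or 3,110,400 bytes per frame. Raw frames of that size add up quickly, which is why the file stores encoded packets and frames are produced only when needed.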

void save_video_pic(AVFormatContext *avfmt_ctx, int stream_index, AVCodecContext *avcodec_ctx) {
    int err = av_seek_frame(avfmt_ctx, -1, avfmt_ctx->start_time, 0);
    do {
        std::unique_ptr<AVPacket, std::function<void(AVPacket*)>> avpacket_src(
            av_packet_alloc(),
            [](AVPacket *pkt) {
                if (pkt) {
                    av_packet_free(&pkt);
                }
            });
        av_init_packet(avpacket_src.get());
        if (av_read_frame(avfmt_ctx, avpacket_src.get()) < 0) {
            break;
        }
        if (avpacket_src->stream_index != stream_index) {
            continue;
        }

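Note the check on avpacket_src->stream_index: it compares the stream_index of the AVPacket just read with the index of the video stream found earlier and decides whether to continue processing. This shows that packets from different streams can be interleaved in the file. The design makes sense: images, sound and subtitles for the same moment must all be presented together, so reading and parsing the file sequentially avoids frequent seeking.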
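The read-and-skip loop can be modeled without ffmpeg. In this sketch DemoPacket and DemoReader are hypothetical stand-ins (the reader plays the role of av_read_frame over an interleaved file); only the control flow mirrors the loop above.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical, simplified packet: which stream it belongs to and its timestamp.
struct DemoPacket {
    int stream_index;
    long long pts;
};

// Sequential reader over an interleaved packet list.
class DemoReader {
public:
    explicit DemoReader(std::vector<DemoPacket> packets)
        : packets_(std::move(packets)) {}

    // Copies the next packet into *out; returns false at end of file.
    bool read(DemoPacket* out) {
        if (next_ >= packets_.size()) {
            return false;
        }
        *out = packets_[next_++];
        return true;
    }

private:
    std::vector<DemoPacket> packets_;
    std::size_t next_ = 0;
};

// Reads every packet once, in file order, and keeps only those of one stream.
std::vector<DemoPacket> collect_stream(DemoReader& reader, int wanted_stream) {
    std::vector<DemoPacket> kept;
    DemoPacket pkt{};
    while (reader.read(&pkt)) {
        if (pkt.stream_index != wanted_stream) {
            continue;  // packet of another stream (audio, subtitles, ...)
        }
        kept.push_back(pkt);
    }
    return kept;
}
```

The reader never seeks: packets of the unwanted streams are simply read and dropped, which is the price of extracting one stream from an interleaved file in a single pass.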

Because encoding and decoding follow a similar procedure, I collect the intermediate results in a class template:

template<typename Component>
class AvComponentStore {
public:
    virtual void save(Component *d) = 0;
};

template<typename Component>
class TransStore : public AvComponentStore<Component>
{
public:
    TransStore(std::function<Component*(const Component*)> clone, std::function<void(Component**)> free) {
        _clone = clone;
        _free = free;
    }

    ~TransStore() {
        for (auto it = _store.begin(); it != _store.end(); it++) {
            if (*it) {
                _free(&*it);
            }
        }
    }

public:
    void traverse(std::function<void(Component*)> t) {
        if (!t) {
            return;
        }
        for (auto it = _store.begin(); it != _store.end(); it++) {
            if (*it) {
                t(*it);
            }
        }
    }

public:
    virtual void save(Component *d) {
        Component *p = _clone(d);
        _store.push_back(p);
    }

private:
    std::vector<Component*> _store;
    std::function<Component*(const Component*)> _clone;
    std::function<void(Component**)> _free;
};

using PacketsStore = TransStore<AVPacket>;
using FramesStore = TransStore<AVFrame>;

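FramesStore holds the frames decoded from an AVPacket. Each intermediate AVFrame is duplicated with av_frame_clone, which creates a new AVFrame that references the same underlying data buffers rather than copying the pixel data. When the FramesStore object is destroyed, it releases these frames through av_frame_free.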

        std::shared_ptr<FramesStore> frames_store = std::make_shared<FramesStore>(av_frame_clone, av_frame_free);
        decode_packet(avcodec_ctx, avpacket_src.get(), frames_store);
        frames_store->traverse(traverse_frame);
    } while (true);
}

An AVPacket is decoded with avcodec_send_packet and avcodec_receive_frame. Semantically, we send a piece of undecoded data to a decoder context and then fetch the decoded data back from that context.

int decode_packet(AVCodecContext *avctx, AVPacket *pkt, std::shared_ptr<FramesStore> store) {
    int ret = avcodec_send_packet(avctx, pkt);
    if (ret < 0 && ret != AVERROR_EOF) {
        return ret;
    }
    std::unique_ptr<AVFrame, std::function<void(AVFrame*)>> frame(
        av_frame_alloc(),
        [](AVFrame *frame) {
            if (frame) {
                av_frame_free(&frame);
            }
        });
    ret = avcodec_receive_frame(avctx, frame.get());
    if (ret >= 0) {
        store->save(frame.get());
    }
    else if (ret < 0 && ret != AVERROR(EAGAIN)) {
        return ret;
    }
    return 0;
}
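The send/receive protocol decouples input from output: one packet does not necessarily produce one frame right away. The toy model below illustrates this with plain integers. ToyDecoder, its one-packet lookahead rule and the kNeedMoreInput constant are all invented for this sketch; kNeedMoreInput merely plays the role that AVERROR(EAGAIN) plays in the real API.

```cpp
#include <cstddef>
#include <deque>
#include <vector>

constexpr int kNeedMoreInput = -1;  // stand-in for AVERROR(EAGAIN)

// Toy decoder: a "packet" is an int and the decoded "frame" is ten times its value.
class ToyDecoder {
public:
    // Queues one packet; no output is produced here.
    int send_packet(int payload) {
        pending_.push_back(payload);
        return 0;
    }

    // Signals end of input so that buffered packets can be flushed.
    void send_eof() { draining_ = true; }

    // Hands out one frame, or kNeedMoreInput when the decoder wants more data.
    // Toy rule: outside of draining, one packet of lookahead is required.
    int receive_frame(int* out) {
        const std::size_t needed = draining_ ? 1 : 2;
        if (pending_.size() < needed) {
            return kNeedMoreInput;
        }
        *out = pending_.front() * 10;
        pending_.pop_front();
        return 0;
    }

private:
    std::deque<int> pending_;
    bool draining_ = false;
};

// Feeds all packets, collecting every frame the decoder is ready to give,
// then drains what is still buffered.
std::vector<int> decode_all(ToyDecoder& dec, const std::vector<int>& packets) {
    std::vector<int> frames;
    int frame = 0;
    for (int p : packets) {
        if (dec.send_packet(p) < 0) {
            break;
        }
        while (dec.receive_frame(&frame) == 0) {
            frames.push_back(frame);
        }
    }
    dec.send_eof();
    while (dec.receive_frame(&frame) == 0) {
        frames.push_back(frame);
    }
    return frames;
}
```

decode_packet above calls avcodec_receive_frame once per packet, whereas this sketch keeps receiving until the decoder asks for more input and then drains it. With the real API, draining is started by passing a NULL packet to avcodec_send_packet; without that step, frames still buffered in the decoder at the end of the file may never be retrieved.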

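Each decoded frame then has to be encoded into a picture file by an image encoder.

Just as we built a decoder context earlier, we now build an encoder context. This time we use avcodec_find_encoder to look up the encoder: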

void traverse_frame(AVFrame* avframe) {
    AVCodec *avcodec = avcodec_find_encoder(AV_CODEC_ID_MJPEG);

Then we use avcodec_open2 to open a context for that encoder:

    std::unique_ptr<AVCodecContext, std::function<void(AVCodecContext*)>> avcodec_ctx_output(
        avcodec_alloc_context3(avcodec),
        [](AVCodecContext *avctx) {
            if (avctx) {
                avcodec_free_context(&avctx);
            }
        });
    avcodec_ctx_output->width = avframe->width;
    avcodec_ctx_output->height = avframe->height;
    avcodec_ctx_output->time_base.num = 1;
    avcodec_ctx_output->time_base.den = 1000;
    avcodec_ctx_output->pix_fmt = AV_PIX_FMT_YUVJ420P;
    avcodec_ctx_output->codec_id = avcodec->id;
    avcodec_ctx_output->codec_type = AVMEDIA_TYPE_VIDEO;
    if (avcodec_open2(avcodec_ctx_output.get(), avcodec, nullptr) < 0) {
        std::cerr << "Failed to open codec" << std::endl;
        return;
    }

The encode_frame function encodes each AVFrame into one or more AVPackets and saves them in a PacketsStore object:

    std::shared_ptr<PacketsStore> packets_store = std::make_shared<PacketsStore>(av_packet_clone, av_packet_free);
    if (encode_frame(avcodec_ctx_output.get(), avframe, packets_store) < 0) {
        std::cerr << "encode_frame error" << std::endl;
        return;
    }

Encoding uses avcodec_send_frame and avcodec_receive_packet. Semantically, we send a piece of unencoded data to an encoder context and then fetch the encoded data back from that context.

int encode_frame(AVCodecContext *c, AVFrame *frame, std::shared_ptr<PacketsStore> store) {
    int ret;
    int size = 0;
    std::unique_ptr<AVPacket, std::function<void(AVPacket*)>> pkt(
        av_packet_alloc(),
        [](AVPacket *pkt) {
            if (pkt) {
                av_packet_free(&pkt);
            }
        });
    av_init_packet(pkt.get());
    ret = avcodec_send_frame(c, frame);
    if (ret < 0) {
        return ret;
    }
    do {
        ret = avcodec_receive_packet(c, pkt.get());
        if (ret >= 0) {
            store->save(pkt.get());
            size += pkt->size;
            av_packet_unref(pkt.get());
        }
        else if (ret < 0 && ret != AVERROR(EAGAIN) && ret != AVERROR_EOF) {
            return ret;
        }
    } while (ret >= 0);
    return size;
}

Once the data has been encoded, we save it to a file.

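    std::string&& file_name = gen_pic_name(avframe);
    std::unique_ptr<std::FILE, std::function<int(FILE*)>> file(std::fopen(file_name.c_str(), "wb"), std::fclose);
    if (!file) {
        // fopen failed; writing through a null FILE* would be undefined behavior.
        std::cerr << "Failed to open " << file_name << std::endl;
        return;
    }
    packets_store->traverse([&file](AVPacket* packet) {
        fwrite(packet->data, 1, packet->size, file.get());
    });
}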
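The article does not show gen_pic_name. As one possible shape for such a helper, the sketch below builds a file name from a frame's presentation timestamp. Everything here is an assumption made for illustration: the name make_pic_name, the choice of pts as the distinguishing value, the zero padding and the "nopts" fallback are not taken from the original code.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: builds "<prefix>_<pts padded to 8 digits>.jpg".
// A negative pts (no usable timestamp) falls back to the literal "nopts".
std::string make_pic_name(std::int64_t pts, const std::string& prefix = "frame") {
    std::ostringstream name;
    name << prefix << '_';
    if (pts < 0) {
        name << "nopts";
    } else {
        name << std::setw(8) << std::setfill('0') << pts;
    }
    name << ".jpg";
    return name.str();
}
```

Zero padding keeps the files in timestamp order when a directory is sorted by name. Two frames without a timestamp would map to the same name in this sketch, so a real implementation would probably also mix in a running counter.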
