How to extract Japanese subtitles and mp4 from mkv files

Some anime and drama releases in MKV format come with Japanese subtitles.  If you want to study the subtitles with SABU, you can extract them with a tool called ffmpeg.  Besides extracting subtitles, ffmpeg can also convert between video formats.

ffmpeg is freely available for Linux, Mac and Windows.  You can download it from https://www.ffmpeg.org/download.html

The steps are as follows:

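1.  Find the stream (subtitle) you want in the MKV file by running ffmpeg with only the input file.  ffmpeg lists every stream and then stops with an error because no output file was given, which is expected.  In the listing below, the Japanese subtitle is stream 0:4.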

imac:$ ffmpeg -i shingeki.mkv
ffmpeg version 2.2.4-tessus Copyright (c) 2000-2014 the FFmpeg developers
built on Jun 29 2014 16:35:46 with clang version 3.3 (tags/RELEASE_33/final)
configuration: --cc=/opt/local/bin/clang-mp-3.3 --prefix=/Users/tessus/data/ext/ffmpeg/sw --as=yasm --extra-version=tessus --disable-shared --enable-static --disable-ffplay --enable-gpl --enable-pthreads --enable-postproc --enable-libmp3lame --enable-libtheora --enable-libvorbis --enable-libx264 --enable-libx265 --enable-libxvid --enable-libspeex --enable-bzlib --enable-zlib --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libxavs --enable-version3 --enable-libvo-aacenc --enable-libvo-amrwbenc --enable-libvpx --enable-libgsm --enable-libopus --enable-libmodplug --enable-fontconfig --enable-libfreetype --enable-libass --enable-libbluray --enable-filters --disable-indev=qtkit --enable-runtime-cpudetect
libavutil 52. 66.100 / 52. 66.100
libavcodec 55. 52.102 / 55. 52.102
libavformat 55. 33.100 / 55. 33.100
libavdevice 55. 10.100 / 55. 10.100
libavfilter 4. 2.100 / 4. 2.100
libswscale 2. 5.102 / 2. 5.102
libswresample 0. 18.100 / 0. 18.100
libpostproc 52. 3.100 / 52. 3.100
Input #0, matroska,webm, from 'shingeki.mkv':
Metadata:
encoder : libebml v1.3.0 + libmatroska v1.4.0
creation_time : 2013-10-04 03:20:23
Duration: 00:23:52.09, start: 0.000000, bitrate: 2242 kb/s
Stream #0:0: Video: h264 (High), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 23.81 fps, 23.81 tbr, 1k tbn, 47.95 tbc (default) (forced)
Stream #0:1: Audio: aac, 44100 Hz, stereo, fltp (default) (forced)
Stream #0:2(chi): Subtitle: ssa (default)
Metadata:
title : 简日
Stream #0:3(chi): Subtitle: ssa
Metadata:
title : 繁日
Stream #0:4(jpn): Subtitle: ssa
Metadata:
title : 日语
Codec 0x18000 is not in the full list.
Stream #0:5: Attachment: unknown_codec
Metadata:
filename : Arphic Roman-Mincho Ultra JIS.ttf
mimetype : application/x-truetype-font
Codec 0x18000 is not in the full list.
Stream #0:6: Attachment: unknown_codec
Metadata:
filename : EPMINBLD.TTF
mimetype : application/x-truetype-font
Codec 0x18000 is not in the full list.
Stream #0:7: Attachment: unknown_codec
Metadata:
filename : FZXQK.ttf
mimetype : application/x-truetype-font
Codec 0x18000 is not in the full list.
Stream #0:8: Attachment: unknown_codec
Metadata:
filename : msyh.ttf
mimetype : application/x-truetype-font
Codec 0x18000 is not in the full list.
Stream #0:9: Attachment: unknown_codec
Metadata:
filename : msyhbd.ttf
mimetype : application/x-truetype-font
Codec 0x18000 is not in the full list.
Stream #0:10: Attachment: unknown_codec
Metadata:
filename : STZHONGS.TTF
mimetype : application/x-truetype-font
Codec 0x18000 is not in the full list.
Stream #0:11: Attachment: unknown_codec
Metadata:
filename : 华康俪金黑体简W8.ttf
mimetype : application/x-truetype-font
At least one output file must be specified
Conversion failed!
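
Reading a long listing by eye gets tedious when you have many files.  As a rough sketch, the helper below reads the listing on standard input and prints the specifier of the first subtitle stream tagged (jpn).  The function name find_jpn_subtitle_stream is made up for this example, and the pattern assumes the "Stream #0:4(jpn): Subtitle:" line format shown above, which may differ between ffmpeg versions.

```shell
# Hypothetical helper: print the specifier (for example 0:4) of the first
# subtitle stream whose language tag is jpn. Input is ffmpeg's stream listing.
find_jpn_subtitle_stream() {
  # Match lines such as "    Stream #0:4(jpn): Subtitle: ssa" and keep "0:4".
  sed -n 's/^ *Stream #\([0-9]*:[0-9]*\)(jpn): Subtitle:.*/\1/p' | head -n 1
}

# Usage (ffmpeg prints the listing on stderr, hence 2>&1):
#   ffmpeg -i shingeki.mkv 2>&1 | find_jpn_subtitle_stream
```

If no stream carries the jpn tag, the function prints nothing, so check for an empty result before using it.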

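2.  Extract stream 0:4 and write it out as an SSA subtitle file with the command below.  The -map option selects the stream you want to extract, and -c:s ssa sets the subtitle codec of the output.  The -vn and -an options drop video and audio; because -map 0:4 already selects only the subtitle stream, they are not strictly required here.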

imac:$ ffmpeg -i shingeki.mkv -vn -an -map 0:4 -c:s ssa shingeki.jp.ssa
Output #0, ass, to 'shingeki.jp.ssa':
Metadata:
encoder : Lavf55.33.100
Stream #0:0(jpn): Subtitle: ssa
Metadata:
title : 日语
Stream mapping:
Stream #0:4 -> #0:0 (ssa -> ssa)
Press [q] to stop, [?] for help
[ass @ 0x102034000] Encoder did not produce proper pts, making some up.
size= 31kB time=00:22:49.22 bitrate= 0.2kbits/s
video:0kB audio:0kB subtitle:28 data:0 global headers:3kB muxing overhead 0.000000%
imac:$
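
If you have a whole series to process, the stream number may not be the same in every file.  The sketch below selects the subtitle stream by its language tag instead of by number, using the metadata stream specifier (0:s:m:language:jpn).  The function names are made up for this example, it assumes each file has exactly one subtitle stream tagged jpn, and it assumes your ffmpeg build supports metadata specifiers, which older builds may not.

```shell
# Hypothetical helper: derive the subtitle file name from the video name,
# for example shingeki.mkv -> shingeki.jp.ssa
subtitle_name() {
  printf '%s.jp.ssa\n' "${1%.mkv}"
}

# Hypothetical helper: extract the subtitle stream tagged jpn from one file.
# Assumes exactly one such stream; with several, the ass muxer would refuse
# to write more than one stream into a single file.
extract_jpn_subtitle() {
  ffmpeg -i "$1" -map 0:s:m:language:jpn -c:s ssa "$(subtitle_name "$1")"
}

# Usage:
#   for f in *.mkv; do extract_jpn_subtitle "$f"; done
```

Selecting by tag only works when the file was tagged correctly.  If the tag is missing, fall back to the stream number from step 1.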

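3.  You can also repackage the video and audio from the MKV into an MP4 file with the command below.  The copy option copies the video and audio streams as they are, without re-encoding, so it is fast and loses no quality.  The -sn option leaves out the subtitles.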

imac:$ ffmpeg -i shingeki.mkv -c:v copy -c:a copy -sn shingeki.mp4
Output #0, mp4, to 'shingeki.mp4':
Metadata:
encoder : Lavf55.33.100
Stream #0:0: Video: h264 ([33][0][0][0] / 0x0021), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], q=2-31, 23.81 fps, 16k tbn, 1k tbc (default) (forced)
Stream #0:1: Audio: aac ([64][0][0][0] / 0x0040), 44100 Hz, stereo (default) (forced)
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Stream #0:1 -> #0:1 (copy)
Press [q] to stop, [?] for help
[mp4 @ 0x10302c000] Non-monotonous DTS in output stream 0:0; previous: 0, current: 0; changing to 1. This may result in incorrect timestamps in the output file.
frame= 2437 fps=0.0 q=-1.0 size= 31774kB time=00:01:41.70 bitrate=2559.3kbits/s
frame= 4523 fps=4495 q=-1.0 size= 58370kB time=00:03:08.56 bitrate=2535.8kbits/s
frame= 7178 fps=4734 q=-1.0 size= 83997kB time=00:04:59.37 bitrate=2298.5kbits/s
frame= 8899 fps=4378 q=-1.0 size= 102413kB time=00:06:11.16 bitrate=2260.3kbits/s
frame=12783 fps=4997 q=-1.0 size= 125976kB time=00:08:53.08 bitrate=1935.9kbits/s
frame=15129 fps=4934 q=-1.0 size= 152601kB time=00:10:31.00 bitrate=1981.2kbits/s
frame=16536 fps=4614 q=-1.0 size= 172045kB time=00:11:29.81 bitrate=2043.1kbits/s
frame=16643 fps=4073 q=-1.0 size= 174116kB time=00:11:34.13 bitrate=2054.9kbits/s
frame=18603 fps=4056 q=-1.0 size= 199781kB time=00:12:55.91 bitrate=2109.3kbits/s
frame=20413 fps=3992 q=-1.0 size= 225296kB time=00:14:11.31 bitrate=2168.0kbits/s
frame=23176 fps=4129 q=-1.0 size= 245398kB time=00:16:06.66 bitrate=2079.6kbits/s
frame=23176 fps=3696 q=-1.0 size= 245398kB time=00:16:06.69 bitrate=2079.6kbits/s
frame=28349 fps=4184 q=-1.0 size= 272023kB time=00:19:42.47 bitrate=1884.5kbits/s
frame=31903 fps=4377 q=-1.0 size= 297636kB time=00:22:10.75 bitrate=1832.2kbits/s
frame=33365 fps=4192 q=-1.0 size= 321188kB time=00:23:11.57 bitrate=1890.8kbits/s
frame=34335 fps=4167 q=-1.0 Lsize= 333180kB time=00:23:52.09 bitrate=1905.9kbits/s
video:309302kB audio:22376kB subtitle:0 data:0 global headers:0kB muxing overhead 0.452714%
imac:$
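
Copying streams without conversion only works when the MP4 container can hold the codecs already in the MKV.  The file here uses h264 video and aac audio, which MP4 handles.  As a minimal sketch, the pre-check below reads the stream listing and succeeds only for that combination.  The name can_copy_to_mp4 is made up for this example, and the check is deliberately conservative: MP4 accepts other codecs too, and those files would need a look by hand.

```shell
# Hypothetical helper: read ffmpeg's stream listing on stdin and succeed only
# if it shows an h264 video stream and an aac audio stream.
# The body runs in a subshell so the variable does not leak to the caller.
can_copy_to_mp4() (
  listing=$(cat)
  printf '%s\n' "$listing" | grep -q 'Stream #.*Video: h264' &&
    printf '%s\n' "$listing" | grep -q 'Stream #.*Audio: aac'
)

# Usage:
#   if ffmpeg -i shingeki.mkv 2>&1 | can_copy_to_mp4; then
#     ffmpeg -i shingeki.mkv -c:v copy -c:a copy -sn shingeki.mp4
#   fi
```

When the check fails, the audio or video would have to be converted instead of copied, which takes much longer.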