Funded! This project successfully raised its funding goal on June 3, 2012.

What kind of subtitles to use?

Update #13 · May 17, 2012

What standard do you think we should use?

Since Lib-Ray will be an international standard, languages and localization are going to be a design challenge. The most obvious issue is that Lib-Ray releases will need to accommodate subtitle tracks in an adequate format.

There are multiple formats for subtitles, and this is something that is going to take some more thought before being completely settled. My first choice was to use the "Kate" standard (or "OggKate"), which is a very capable format that supports text and vector graphics. By far the most widely used, though, is the "SubRip" or "SRT" format, which is a very basic (Unicode) text format.

There is also a format called WebVTT which is associated with the web standards and HTML5 (and is very similar to SRT in syntax). And I have heard of other formats as well.

Of course, whatever standard is used, there is software to convert between different formats, so the main issue is what each format can represent. Some allow for font, position, and color changes (all supported by Kate), while SRT supports only plain text.
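
As a rough illustration of how close the SRT and WebVTT syntaxes are, here is a minimal sketch of a conversion in shell. It assumes a well-formed UTF-8 SRT file without a byte-order mark, and it handles only two differences: a WebVTT file starts with a WEBVTT header line, and its cue timestamps use a period rather than a comma before the milliseconds. The function name srt_to_vtt is made up for this example; dedicated conversion tools handle far more than this.

```shell
# srt_to_vtt FILE: write a WebVTT rendering of a UTF-8 SRT file to stdout.
# Hypothetical helper for illustration; it does not touch styling or cue ids.
srt_to_vtt() {
    printf 'WEBVTT\n\n'
    # Drop carriage returns, then change "HH:MM:SS,mmm" to "HH:MM:SS.mmm"
    # on timing lines only (lines of the form "start --> end").
    tr -d '\r' < "$1" |
    sed -E 's/^([0-9]+:[0-9]{2}:[0-9]{2}),([0-9]{3}) +--> +([0-9]+:[0-9]{2}:[0-9]{2}),([0-9]{3})/\1.\2 --> \3.\4/'
}
```

For example, srt_to_vtt sintel_en.srt > sintel_en.vtt would write the converted track next to the original. Subtitle text is left untouched, because only lines that begin with a timestamp are rewritten.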

Another question is whether subtitles should be embedded in the media file itself, or provided in separate files. For Kate format, it probably makes sense to put them in the media file. Either is possible with SRT format, but separate files are probably more common. WebVTT is apparently expected to be in external files.

From the point of view of making the media file more transportable on its own, it's better to put the subtitles into it. On the other hand, this makes them harder to update, and with the decision to switch to read-write Flash media, the idea of making subtitles easier to patch is an attractive advantage to providing them in separate files.

It seems to me that this is too wide open to leave unspecified, but it may be necessary to include more than one option.

The following article shows how I dealt with subtitles in my 0.1 and 0.2 prototypes of "Sintel", where they were encoded as an OggKate stream for inclusion within the video player. These worked fine in VLC, which I was using for testing at the time.

Creating Subtitles from SRT Sources for an Ogg Video with kateenc

Originally published in Free Software Magazine: http://fsmsh.com/3540

Sun, 2011-05-01 22:18 -- Terry Hancock

One of the more interesting aspects of Ogg Video is that it allows an essentially unlimited number of subtitle tracks to be included. This is especially useful for free-culture videos, since they are generally released globally, and there are often contributed subtitles. In fact, for "Sintel", I was able to find 44 subtitle files. I will be including them all as Ogg Kate streams in my prototype "Lib-Ray" version of "Sintel", and in this column I will demonstrate the use of several command line utilities useful for this, especially the kateenc tool for creating the streams.


The main source for the "Sintel" material is a download directory on the Xiph.org website. If you've been following this series, you'll recognize this as the same site I got the PNG frames and audio soundtracks from. This site also has the original .srt format subtitle files for the nine languages that are included on the DVD version of Sintel.

Since then, however, 36 additional .srt files have been provided by the community, for a total of 45 different subtitle tracks. These are collected at a different site. I'll be using all of these.

Subtitles

There are also Ogg streams to carry subtitles. The most popular, and the one I'm going to be using, is Kate. As with Theora and FLAC, the command line tool for manipulating this format (kateenc) is included in the Debian archive (as part of the libkate-tools package).

The first problem I encountered with the .srt files from Sintel is that they do not use consistent encodings (I didn't realize this until kateenc choked on some of the files -- it expects UTF-8 encoded files!). Using the file command, I can see this right away:

$ file *.srt
sintel_afr.srt: ISO-8859 text, with CRLF line terminators
sintel_ar.srt: ISO-8859 text, with CRLF line terminators
sintel_bg.srt: ISO-8859 text, with CRLF line terminators
sintel_bn.srt: UTF-8 Unicode text
sintel_chs.srt: ISO-8859 text, with CRLF line terminators
sintel_cn.srt: Little-endian UTF-16 Unicode text, with CRLF, CR line terminators
sintel_cz.srt: UTF-8 Unicode text
sintel_da.srt: ISO-8859 English text, with CRLF line terminators
sintel_eo.srt: UTF-8 Unicode (with BOM) text, with CRLF line terminators
[...]
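
With this many files, a tally is easier to read than the raw listing. The following is a minimal sketch that prints one line per file, sorted by the character set that file guesses. It relies on the --mime-encoding option of file(1); the function name encoding_report is hypothetical. Keep in mind that file can only guess at 8-bit encodings, so its answers are a starting point rather than a verdict.

```shell
# encoding_report FILE...: print "charset<TAB>filename" for each file, sorted
# by charset. The charset names are whatever file(1) guesses, for example
# utf-8, utf-16le or iso-8859-1.
encoding_report() {
    for f in "$@"; do
        printf '%s\t%s\n' "$(file -b --mime-encoding "$f")" "$f"
    done | sort
}
```

Running encoding_report *.srt in the download directory groups the files that need the same treatment next to each other.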

I started by collecting these into directories for each of the major encodings, but unfortunately, this is a little hard to untangle. In particular, the ISO-8859 encoded files use different code pages according to language, so you have to recognize the language codes in the file names (or look them up in an ISO-639 language code table) and figure out the correct page. For example, ar is the code for Arabic, which means that we should decode from ISO-8859-6.

Or so I thought -- actually attempting this results in an error. Opening the file in Iceweasel, I noticed some weird characters that didn't make sense. The browser offers other encoding options, and with "Windows-1256" the text actually looked like Arabic, so that's what I'll use. In fact, it turned out that the majority of the files that the file command identified as ISO-8859 were actually in one of various Windows encodings -- they were probably submitted in the default encoding of whoever contributed the translation.

This can be converted with the iconv command line tool:

$ cd ISO-8859
$ iconv -f WINDOWS-1256 -t UTF-8 sintel_ar.srt -o ../UTF-8/sintel_ar.srt

I didn't figure out any way to automate this, so I just went through the files one by one and converted each from its actual encoding (fortunately, they were not all this hard). I'm not going to go through this in detail, but in the end, I had all 44 of my subtitle files in one directory with UTF-8 encodings.
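
The trial-and-error part can at least be scripted. Below is a minimal sketch, not a real automation: it tries a caller-supplied list of candidate encodings in order and keeps the first one that iconv converts without error. The function name to_utf8 is hypothetical, and output goes through a shell redirect rather than the -o option. The important caveat is that most 8-bit code pages accept nearly any byte sequence, so a successful conversion does not prove the encoding was right. The result still needs to be checked by eye.

```shell
# to_utf8 SRC DEST ENC...: try each candidate encoding in order and keep the
# first conversion that iconv completes without error. Prints the encoding
# that was used. Hypothetical helper; the caller chooses the candidates.
to_utf8() {
    src=$1 dest=$2
    shift 2
    for enc in "$@"; do
        if iconv -f "$enc" -t UTF-8 "$src" > "$dest" 2>/dev/null; then
            printf '%s\n' "$enc"
            return 0
        fi
    done
    rm -f "$dest"      # nothing worked: do not leave a partial file behind
    return 1
}
```

For the Arabic file, to_utf8 sintel_ar.srt ../UTF-8/sintel_ar.srt ISO-8859-6 WINDOWS-1256 would mirror the sequence described above: the first candidate is rejected and the second is used.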

At this point, a tcsh loop is handy for processing the files in bulk to get my Ogg Kate streams:

$ cd All/UTF-8/
$ tcsh
> foreach lang ( af ar bg bn cz da de en eo es fi_ep fi_FI fi_ps fr gl gr he hr hu id it jp ko ku la lv mk ml nb_NO nl pa pl pt ro ru sk sr th tr uk vi zh_CN zh zh_TW )
foreach? echo $lang
foreach? kateenc -t srt -c SUB -l $lang -o ../../OGG/sintel_$lang.ogg sintel_$lang.srt
foreach? end
[...]
>

Note that passing the language code to kateenc via the -l option tags each subtitle track with its language, which lets the player list the tracks by language. (I tested this in VLC: its subtitles pull-down menu shows the full name of each language, written in that language, for the user to select from.)
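
For readers who do not use tcsh, the same loop can be sketched in POSIX sh. This version takes the language code from each file name instead of a hand-typed list and collects the codes that failed. It uses only the kateenc options shown above. The function name encode_all and the KATEENC override variable are hypothetical, and the failure check assumes that kateenc returns a non-zero exit status when it cannot encode a file, which I have not verified. Warnings still appear on the terminal either way.

```shell
# Name of the encoder; overridable so the loop can be dry-run with a stub.
KATEENC=${KATEENC:-kateenc}

# encode_all SRCDIR OUTDIR: encode every SRCDIR/sintel_<lang>.srt as
# OUTDIR/sintel_<lang>.ogg, taking the language code from the file name.
# Reports the codes that failed on stderr and returns non-zero if any did.
encode_all() {
    srcdir=$1 outdir=$2 failed=
    for srt in "$srcdir"/sintel_*.srt; do
        [ -e "$srt" ] || continue          # no matches: glob left unexpanded
        lang=${srt##*/sintel_}             # strip directory and prefix
        lang=${lang%.srt}                  # strip extension
        if ! "$KATEENC" -t srt -c SUB -l "$lang" \
                -o "$outdir/sintel_$lang.ogg" "$srt"; then
            failed="$failed $lang"
        fi
    done
    if [ -n "$failed" ]; then
        echo "failed:$failed" >&2
        return 1
    fi
}
```

Called as encode_all . ../../OGG from the UTF-8 directory, it would produce the same set of .ogg files as the tcsh loop, provided the files are named sintel_<lang>.srt.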

Fixing bugs in the SRT files

At this point I must be honest -- many of these .srt files had bugs. As a result, I got quite a number of syntax warnings from kateenc. I had to go back and fix a lot of these files in order to get them to work smoothly.

When everything is working smoothly, the loop above will simply list the language/country code extensions. If something goes wrong, though, the echo line will tell you which language file was being processed, so you can check it out.

Some of the errors I found on inspection:

  • Two subtitle blocks run together without an intervening blank line (kateenc skips the second one and then complains about "non-consecutive ids"). Fix by adding the blank line.
  • Extra blank lines inside a subtitle block, resulting in a syntax error.
  • Incorrect time codes (typos?) resulting in nonsensical time intervals (reversed, or of zero length).
  • Non-standard annotations, such as author information. These can be converted into normal subtitles and placed at the end (they'll appear near the end titles), thus preserving the correct attribution.
  • Unicode byte-order marks at the beginning seem to confuse kateenc in a couple of cases, so I just removed them.
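
Several of these problems can be spotted before running kateenc. The following is a minimal sketch of a checker, written as an awk script wrapped in a shell function. It flags a leading byte-order mark, blocks that do not start with a numeric id, timing lines that are not preceded by a blank line and an id, and intervals that are reversed or of zero length. The function name srt_lint is hypothetical, the rules are my reading of the list above rather than kateenc's own checks, and the script assumes UTF-8 input with three-digit milliseconds.

```shell
# srt_lint FILE: report common structural problems in a UTF-8 SRT file.
# Prints "FILE:LINE: message" per problem; returns non-zero if any are found.
srt_lint() {
    LC_ALL=C awk -v name="$1" '
        function secs(t,   p) {            # "HH:MM:SS,mmm" -> seconds
            split(t, p, "[:,]")
            return p[1] * 3600 + p[2] * 60 + p[3] + p[4] / 1000
        }
        function warn(msg) { printf "%s:%d: %s\n", name, NR, msg; bad = 1 }

        { sub(/\r$/, "") }                 # tolerate CRLF line endings
        NR == 1 && sub(/^\357\273\277/, "") { warn("byte-order mark at start of file") }

        # The first non-blank line after a blank line (or at the start of
        # the file) should be a numeric block id.
        prev == "" && $0 != "" && $0 !~ /^[0-9]+$/ {
            warn("block does not start with a numeric id")
        }

        /^[0-9]+:[0-9][0-9]:[0-9][0-9],[0-9][0-9][0-9] +--> +[0-9]+:[0-9][0-9]:[0-9][0-9],[0-9][0-9][0-9]/ {
            if (prev !~ /^[0-9]+$/ || prev2 != "")
                warn("timing line not preceded by a blank line and an id")
            if (secs($1) >= secs($3))
                warn("end time is not after start time")
        }

        { prev2 = prev; prev = $0 }        # remember the last two lines
        END { exit bad + 0 }
    ' "$1"
}
```

A loop such as for f in sintel_*.srt; do srt_lint "$f"; done gives a list of line numbers to visit in an editor. Author annotations show up as blocks without a numeric id, which is a reminder to convert them into normal subtitles.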

At this point we have 44 Ogg files with Kate streams in them. Combined with the audio and video streams from before, we'll be ready to assemble them into a single multimedia file, which will be the goal of my next column.

