PDF Metadata in LaTeX Documents

A high-quality publication not only has good content but also takes care of the small details. In an earlier blog post we looked at how to embed fonts in a PDF; today we look at PDF metadata, which specifies properties such as the author, the title, a subject, and keywords. Setting the PDF metadata correctly makes it easier for search engines to find and correctly present your work, so the few minutes it takes are time well spent.

If you use WYSIWYG editors such as LibreOffice or OpenOffice, you will find a suitable properties dialog somewhere; good luck! However, if you share my preference for a document markup language such as LaTeX, you need to specify the metadata right in the sources, or add it to the PDF later. Below I summarize five ways of doing so. Feel free to download the sources for adding metadata to PDFs generated with LaTeX and try them out.

PDF Metadata with hyperref in LaTeX

A convenient way of specifying the PDF metadata in LaTeX is the hyperref package. Load it in the preamble and pass the respective metadata entries to the macro \hypersetup:

\usepackage{hyperref}
\hypersetup{
  pdftitle={Your PDF title},
  pdfsubject={Your PDF subject},
  pdfauthor={Your PDF author},
  pdfkeywords={Your PDF keywords}
}

This works with LaTeX+DVIPS+PS2PDF as well as with PDFLaTeX.
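
To keep the title page and the metadata in sync, the values can be defined once and reused in both places. The following is a minimal sketch of a complete document; \doctitle and \docauthor are hypothetical helper macros introduced here for illustration, not part of hyperref.

```latex
% Minimal sketch of a complete document with PDF metadata via hyperref.
\documentclass{article}

% Hypothetical helper macros (not provided by hyperref): define each
% value once so the title page and the PDF metadata cannot drift apart.
\newcommand{\doctitle}{Your PDF title}
\newcommand{\docauthor}{Your PDF author}

\usepackage{hyperref}
\hypersetup{
  pdftitle={\doctitle},
  pdfauthor={\docauthor},
  pdfsubject={Your PDF subject},
  pdfkeywords={Your PDF keywords}
}

% Reuse the same macros for the visible title block.
\title{\doctitle}
\author{\docauthor}

\begin{document}
\maketitle
Body text.
\end{document}
```

If a title contains math or formatting commands, hyperref typically warns that some tokens are not allowed in a PDF string; its \texorpdfstring command lets you supply a plain-text alternative for such cases.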

PDF Metadata with pdfinfo in LaTeX

The package hyperref has a couple of dependencies and sometimes clashes with other packages. If hyperref is not an option for you and you use PDFLaTeX (LaTeX+DVIPS+PS2PDF does not work), use the primitive \pdfinfo in the preamble:

\pdfinfo{
  /Title (Your PDF title)
  /Author (Your PDF author)
  /Subject (Your PDF subject)
  /Keywords (Your PDF keywords)
}
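
If the same source has to compile both with PDFLaTeX and via LaTeX+DVIPS, one option is to guard the metadata so that it is only used when PDF is written directly. The following is a minimal sketch, assuming the ifpdf package is installed. It only distinguishes these two workflows; other engines such as LuaLaTeX are not handled here.

```latex
% Minimal sketch: set PDF metadata with \pdfinfo only under PDFLaTeX.
\documentclass{article}

% ifpdf provides the conditional \ifpdf, which is true only when the
% engine writes PDF directly (assumption: pdfTeX-based workflows only).
\usepackage{ifpdf}

\ifpdf
  % Reached under PDFLaTeX; a DVI run (LaTeX+DVIPS) skips this block.
  \pdfinfo{
    /Title (Your PDF title)
    /Author (Your PDF author)
    /Subject (Your PDF subject)
    /Keywords (Your PDF keywords)
  }
\fi

\begin{document}
Body text.
\end{document}
```

With this guard, a DVI run simply produces a document without these metadata entries, which can then be added afterwards with one of the external tools described next.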

PDF Metadata with pdftk

Sometimes you already have a PDF (not necessarily generated with LaTeX) and only need to fix the metadata. A command line tool for fixing the metadata is pdftk, which is available for all major platforms. Specify the metadata in a text file (say, meta.txt):

InfoKey: Title
InfoValue: Your PDF title
InfoKey: Subject
InfoValue: Your PDF subject
InfoKey: Author
InfoValue: Your PDF author
InfoKey: Keywords
InfoValue: Your PDF keywords

where you adjust the InfoValue entries to your needs. Then, run

pdftk input.pdf update_info meta.txt output output.pdf

to obtain a PDF file output.pdf with the desired metadata.

PDF Metadata with exiftool

Another option for setting PDF metadata is to use exiftool. Pass the respective entries directly in a single command, e.g.

exiftool -Title="Your PDF title" -Subject="Your PDF subject" -Author="Your PDF author" -Keywords="Your PDF keywords" mypdf.pdf

exiftool will overwrite the input file, but keeps a backup of the original next to it, with _original appended to the file name (here, mypdf.pdf_original).

PDF Metadata with Ghostscript

pdftk and exiftool may not be installed on your machine. In that case, consider using Ghostscript. As with pdftk, you need to provide the metadata in a separate file, e.g. pdfmarks:

[ /Title (PDF meta title via ghostscript)
/Author (PDF meta author via ghostscript)
/Subject (PDF meta subject via ghostscript)
/Keywords (PDF meta keywords via ghostscript)
/DOCINFO pdfmark

Did you notice the similarity with the arguments of \pdfinfo above? Next, call Ghostscript as follows:

gs -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=output.pdf input.pdf pdfmarks

to produce an output file output.pdf with correct metadata from an input file input.pdf.


This blog post is for calendar week 4 of my weekly blogging series for 2016.