Viewing PDF, .docx and .odt files in mutt (as text)

Published: 03-03-2019 | Author: Remy van Elst


mutt is my email client at work. I like the simple interface, the speed and the ability to customize the workflow. Email is synced with offlineimap and sent via msmtp, addresses are in abook, and calcurse is the calendar for meetings. No complicated setup there.

One aspect I especially like is the ability to view attachments on the command line, right from mutt itself. Some departments at work send emails with an attached PDF or .docx file that contains the actual message, instead of just putting the text in the email itself. Using pandoc and pdftotext in mutt, the text of the attachments is displayed like a regular mail, so opening an external program does not interrupt my workflow. This article explains how to set up your .muttrc and .mailcap to use pandoc and pdftotext to view attachments as text in mutt.

I assume you have a working mutt setup, as I don't cover that here. The Arch Linux Wiki page on mutt is a great place to start if you haven't set up mutt yet.


Installing software

On Ubuntu, both required packages are in the repository and can be installed using apt:

apt-get install pandoc poppler-utils

pdftotext is part of poppler-utils.
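To confirm that both converters ended up on your PATH, a quick check like the one below can help. This is only a sketch: the function name missing_tools is made up for this example and is not part of mutt or either package.

```shell
# List every tool named in the arguments that is not found on PATH.
# (missing_tools is a hypothetical helper name, used only in this example.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools pandoc pdftotext)
if [ -n "$missing" ]; then
    echo "Not installed:"
    echo "$missing"
else
    echo "pandoc and pdftotext are both available"
fi
```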

.mailcap

Your .mailcap file tells a mail client how to handle non-text files.

In your .muttrc file you need to specify where this file is:

set mailcap_path = ~/.mailcap

The man page for .mailcap explains the purpose and format of the file:

Each mailcap entry consists of a content-type specification, a command to execute, and (possibly) a set of optional "flag" values. For example, a straightforward mailcap entry (which is default behavior for metamail) would look like this:

text/plain; cat %s

The optional flags can be used to specify additional information about the mail-handling command. For example:

text/plain; cat %s; copiousoutput

can be used to indicate that the output of the cat command may be voluminous, requiring either a scrolling window, a pager, or some other appropriate coping mechanism.
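To make the format concrete, here is a rough sketch of the lookup a mail client performs: find the entry for a content type and substitute the attachment's file name for %s. It is deliberately simplified and is not how mutt itself is implemented. It ignores wildcards such as text/*, escaped semicolons, flags and shell quoting, and the function name mailcap_command is an assumption made for this example.

```shell
# mailcap_command TYPE FILE [MAILCAP]
# Print the command the first matching mailcap entry would run for FILE.
# Simplified sketch: exact type match only; no wildcards, flags or quoting.
mailcap_command() {
    awk -F';' -v want="$1" -v path="$2" '
        /^[[:space:]]*#/ { next }                 # skip comment lines
        {
            ct = $1
            gsub(/^[ \t]+|[ \t]+$/, "", ct)       # trim the content type
            if (ct == want) {
                cmd = $2
                gsub(/^[ \t]+|[ \t]+$/, "", cmd)  # trim the command
                gsub(/%s/, path, cmd)             # fill in the file name
                print cmd
                exit
            }
        }' "${3:-$HOME/.mailcap}"
}
```

With the example entry above in your .mailcap, `mailcap_command text/plain /tmp/note.txt` prints `cat /tmp/note.txt`.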

HTML mails

I use elinks to view HTML mails, for example. The following line accomplishes that:

text/html; elinks -dump ; copiousoutput;

Combined with the following line in my .muttrc to auto convert HTML mails:

auto_view text/html text/calendar application/ics

You only need text/html, but I also auto view calendar and meeting invites, because Exchange presents those as a weird empty email with an attachment.

Now, back to the PDF and .docx files.

PDF & .docx

The following command converts a .docx file to text. The --to parameter says markdown, but the output is plain text with markdown formatting.

pandoc --from docx --to markdown My_doc_file.docx

This works for OpenOffice as well:

pandoc --from odt --to markdown My_odt_file.odt

The following command will convert a .pdf file to text:

pdftotext -layout My_pdf_file.pdf -

Do note that in both cases non-text items, like images, might be lost. That is not a big issue for me since my files are mostly plain text, but do keep it in mind. Tables work quite well, which surprised me. You can still save the attachments and view them with another program (like LibreOffice).
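Before wiring these commands into mutt, it can be handy to try them on saved attachments from a regular shell. The sketch below wraps the three commands in one function that picks the converter by file extension. The name attachment_to_text is hypothetical, and mutt itself selects the command by MIME type through .mailcap, not by extension.

```shell
# Convert a saved attachment to text on stdout, choosing the converter
# by file extension. (attachment_to_text is a hypothetical helper name.)
attachment_to_text() {
    case "$1" in
        *.docx) pandoc --from docx --to markdown "$1" ;;
        *.odt)  pandoc --from odt --to markdown "$1" ;;
        *.pdf)  pdftotext -layout "$1" - ;;   # "-" sends the text to stdout
        *)      printf 'No text converter for %s\n' "$1" >&2
                return 1 ;;
    esac
}
```

For example, `attachment_to_text My_doc_file.docx | less` shows roughly what mutt will display for that file.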

Putting two and two together results in the following three lines in your .mailcap file:

application/vnd.openxmlformats-officedocument.wordprocessingml.document; pandoc --from docx --to markdown %s; copiousoutput
application/vnd.oasis.opendocument.text; pandoc --from odt --to markdown %s; copiousoutput
application/pdf; pdftotext -layout %s -; copiousoutput;

Restart mutt and open an email with an attachment you want to view. Instead of using s to save the file, you can now use v to view it. Either pdftotext or pandoc is invoked, and the plain text output is shown inside mutt.

A PDF file viewed in mutt, as text
