Ever Wondered How to Convert a PDF to LaTeX and Recover the Original Equations? This Article Has the Solution.
There is a way to convert a PDF file (generated, for example, with MS Word or LaTeX) back to LaTeX. You will get a .tex file that keeps nearly all of the original formatting, so you can simply edit it in TeXstudio or use it in Overleaf.
Today I will show you, step by step, how to convert your PDF to LaTeX. With this method, you can also convert scanned images of book pages with equations into LaTeX code.
It works whether or not your PDF with equations was created digitally.
If you simply convert a digitally created PDF to Word, the equations will not survive, because ordinary character recognition software is not powerful enough to recognize math characters.
Method 1: Use MathPix
Here is the most complete video I have found on the topic. It walks you step by step through using MathPix to:
- Convert PDF to LaTeX
- Image to LaTeX
- Word to LaTeX
- PDF with equations to Word
Watch the video until the end. Huge time saver!
With Snip Notes, you can edit Markdown, import PDFs and images, and export them to DOCX, LaTeX, HTML, PDF (with HTML), PDF (with LaTeX), and Overleaf.
Your imagination is the only limit with this!
You can write your whole dissertation in Word or LaTeX, and if a publisher or your university requires the other format to graduate or publish, you can use this to make the switch much easier.
If you work through a problem with your classmates on the board, you can take a screenshot, convert it into characters via OCR, and add it to your homework or paper.
TIP: Skim through the result and fix any minor errors.
I could go on and on.
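Skimming remains the real check, but a short script can flag two common OCR slips before you compile. The sketch below is a hypothetical helper of my own (not part of MathPix or InftyReader): assuming the output is ordinary LaTeX, it reports unbalanced braces and an odd number of unescaped `$` signs. It is only a heuristic and does not understand verbatim blocks or `\(...\)` math.

```python
import re


def find_tex_issues(tex: str) -> list[str]:
    """Heuristic checks for common OCR slips in generated LaTeX.

    Comments and escaped characters are ignored; verbatim blocks and
    other special constructs are not understood.
    """
    issues = []
    # An unescaped % starts a comment that runs to the end of the line.
    lines = [re.sub(r"(?<!\\)%.*", "", line) for line in tex.splitlines()]
    body = "\n".join(lines)
    # Drop escaped characters such as \{, \} and \$ so they are not counted.
    body = re.sub(r"\\[{}$]", "", body)

    depth = 0
    for ch in body:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                issues.append("closing brace without a matching opening brace")
                depth = 0
    if depth > 0:
        issues.append(f"{depth} unclosed brace(s)")

    # Inline math uses pairs of $ signs, so an odd count suggests one is missing.
    if body.count("$") % 2 != 0:
        issues.append("odd number of $ signs (unclosed inline math?)")
    return issues
```

For example, running `find_tex_issues` on the text of `filename.tex` returns an empty list when both checks pass; that only means these two checks passed, not that the conversion is error-free.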
More resources about MathPix:
Download MathPix for Windows
Download MathPix for MacOS
Download MathPix for Linux
Download MathPix for Android
Download MathPix for iOS phone
Method 2: Use InftyReader
You need to install InftyReader together with its LaTeX package to enable conversion into LaTeX.
InftyReader is an Optical Character Recognition (OCR) application that recognizes and translates scientific documents (including math symbols) into LaTeX, MathML and XHTML!
I tried it, and it’s AMAZING! Look at the example screenshots below.
Another good OCR program is ABBYY FineReader (which I know well). It can automatically convert scanned PDF images or files with tables into Word and Excel. However, it does not recognize equations well, even after training, so it cannot be used to convert PDFs with equations to LaTeX.
For this example, I wrote part of my Macro problem set on www.overleaf.com, exported it to PDF, re-exported it as an image PDF in Foxit Reader, and converted it back to the original LaTeX code using InftyReader. Here is how to do it yourself.
Step 1: Make sure MiKTeX (to compile LaTeX files), InftyReader (to convert), and the LaTeX Install Kit are installed on your computer.
Step 2: Export the PDF file as an Image PDF (Trick)
The trial version of InftyReader processes 1 page at a time and only 5 pages a day.
– Make sure you install the free version of Foxit Reader: https://www.foxitsoftware.com/pdf-reader/
– The trick that makes PDF to LaTeX work: open the PDF file in Foxit Reader and export it as an IMAGE PDF using the Foxit Reader printer (example in the image below).
The reason: I tried converting the original PDF (with embedded fonts) without first turning it into an image PDF, and it did not work. The file needs to be a TIFF (.tif), PNG, BMP, or GIF image, or an “image” PDF.
– Open the file in Foxit Reader -> Print -> Select Print as image/picture -> Select the Foxit printer (to save as PDF) -> Specify the page to print (on the free InftyReader trial you can convert only 1 page at a time) -> Print, which saves the file in a folder.
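If you do not have Foxit Reader, one alternative (not the method I tested above) is to render each page as a PNG yourself with the `pdftoppm` tool from Poppler, since InftyReader accepts PNG images. The sketch below assumes `pdftoppm` is installed and on your PATH; it renders one page per call to match the trial's one-page limit, and the 300 DPI default and file names are my own placeholder choices.

```python
import subprocess
from pathlib import Path


def pdftoppm_command(pdf: str, page: int, out_prefix: str, dpi: int = 300) -> list[str]:
    """Build a pdftoppm call that renders a single page of `pdf` as a PNG."""
    return [
        "pdftoppm",
        "-png",           # write PNG instead of the default PPM format
        "-r", str(dpi),   # resolution in DPI (300 is an assumed, common OCR choice)
        "-f", str(page),  # first page to render
        "-l", str(page),  # last page to render: same page, so one image
        pdf,
        out_prefix,       # pdftoppm adds the page number and ".png" to this
    ]


def render_page(pdf: str, page: int, out_dir: str = "pages") -> None:
    """Render one page into out_dir (requires pdftoppm on the PATH)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    prefix = str(Path(out_dir) / Path(pdf).stem)
    subprocess.run(pdftoppm_command(pdf, page, prefix), check=True)
```

For example, `render_page("problem_set.pdf", 1)` writes page 1 as a PNG in a `pages` folder, which you can then open in InftyReader instead of an image PDF.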
You can also use free online websites to convert PDF files into PNG, such as:
Step 3: Convert the File with InftyReader
– Import PDF file -> Select ImagePDF -> Select Output Folder (Default = Same) -> Select Output Format (LaTeX)
– Click on Start OCR
– The source file will be converted into “filename.tex” and saved in the folder you specified.
Step 4: Open the .tex file
– Open the .tex file in TeXstudio.
– Press F5 (Build & View) to compile the code and view the resulting PDF.
– Compare it with the original PDF; in my case, the result was almost identical. I was really blown away by the accuracy. Look at the screenshot of my result.
– Copy the code and paste it into a project on www.overleaf.com if you want to edit it online for free.
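TeXstudio is the easiest route for Step 4, but you can also compile from a terminal with the `pdflatex` program that MiKTeX installs. A minimal sketch, assuming `pdflatex` is on your PATH and using `filename.tex` as a placeholder for InftyReader's output:

```python
import subprocess
from pathlib import Path


def pdflatex_command(tex_name: str) -> list[str]:
    """Build a pdflatex call that stops at the first error instead of prompting."""
    return ["pdflatex", "-interaction=nonstopmode", "-halt-on-error", tex_name]


def compile_tex(tex_file: str) -> Path:
    """Compile tex_file in its own folder and return the path to the PDF."""
    tex = Path(tex_file).resolve()
    result = subprocess.run(
        pdflatex_command(tex.name),
        cwd=tex.parent,      # put the .pdf, .log and .aux next to the .tex
        capture_output=True,
        text=True,
        errors="replace",    # tolerate bytes the locale encoding cannot decode
    )
    if result.returncode != 0:
        # The full error is in the .log file; show the end of the output here.
        raise RuntimeError(result.stdout[-2000:])
    return tex.with_suffix(".pdf")
```

`compile_tex("filename.tex")` returns the path of the new PDF, which you can then compare with the original.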
Note that InftyReader also runs two commercial OCR engines in parallel, Toshiba Corporation’s “ExpressReaderPro” and MediaDrive Corporation’s “WinReader”, to improve character recognition in ordinary text areas.
I figured this out after days of searching without finding a proper tutorial on the conversion. Some believe it is not yet possible, but I believe anything can be done. If this helped you, please share it to help others and subscribe to my newsletter for exclusive tips and tutorials.
IMPORTANT: While you are here, check out my advisor Marc Bellemare’s amazing book, “Doing Economics: What You Should Have Learned in Grad School―But Didn’t”, published by the MIT Press.
Also, check out the new book by one of my favorite professors, Paul Glewwe, written with Petra Todd: “Impact Evaluation in International Development: Theory, Methods, and Practice”, published by the World Bank.
I received an email from Professor Jason Levy at the University of Hawaii, who pointed out that there is a “LITE version of InftyReader (beta free)”. He showed me a .tex file from a PDF he had converted, and the PDF pages had been converted into LaTeX perfectly.
Note that the beta trial is freely usable until July 31, 2020, and only accepts e-born (electronic-born) PDFs, that is, PDFs produced by authoring tools such as LaTeX, MS Word, or Adobe InDesign. It does not accept image PDFs at the moment, so if you have an image PDF, use the method outlined above and DO NOT use this Lite version.
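If you are unsure whether your PDF is e-born or an image PDF, one quick heuristic (my suggestion, not something InftyReader provides) is to list its fonts with Poppler's `pdffonts` tool: scanned image PDFs usually contain no fonts, while PDFs exported from LaTeX or Word do. The sketch assumes `pdffonts` is installed; keep in mind that a scan with an added OCR text layer will also list fonts.

```python
import subprocess


def count_fonts(pdffonts_output: str) -> int:
    """Count font rows in pdffonts output (the rows after the dashed separator)."""
    lines = pdffonts_output.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("---"):
            return sum(1 for row in lines[i + 1:] if row.strip())
    return 0  # no font table in the output at all


def looks_like_image_pdf(pdf: str) -> bool:
    """Heuristic: True when pdffonts reports no fonts in the file."""
    out = subprocess.run(
        ["pdffonts", pdf], capture_output=True, text=True, check=True
    ).stdout
    return count_fonts(out) == 0
```

If `looks_like_image_pdf` returns True, the Lite version will likely reject the file, so use the standard InftyReader method instead.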
Apparently, there is no limit on how many pages you can convert with the InftyReader Lite trial. However, he told me: “The document was about 17 pages and it took about 30 minutes. But it seemed to be very computationally intensive. So I don’t recommend more than 20 pages at a time.”
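Following that advice for a longer document means splitting it into pieces of at most 20 pages before converting each one. A minimal sketch, assuming the free `qpdf` command-line tool is installed (my choice of tool, not something InftyReader requires); the output names part1.pdf, part2.pdf and so on are placeholders:

```python
import subprocess


def page_chunks(total_pages: int, max_pages: int = 20) -> list[tuple[int, int]]:
    """Split pages 1..total_pages into inclusive (start, end) ranges."""
    return [
        (start, min(start + max_pages - 1, total_pages))
        for start in range(1, total_pages + 1, max_pages)
    ]


def qpdf_extract_command(src: str, start: int, end: int, dest: str) -> list[str]:
    """Build a qpdf call that copies pages start..end of src into a new file."""
    return ["qpdf", "--empty", "--pages", src, f"{start}-{end}", "--", dest]


def split_pdf(src: str, total_pages: int, max_pages: int = 20) -> None:
    """Write part1.pdf, part2.pdf, ... with at most max_pages pages each."""
    for n, (start, end) in enumerate(page_chunks(total_pages, max_pages), 1):
        dest = f"part{n}.pdf"
        subprocess.run(qpdf_extract_command(src, start, end, dest), check=True)
```

For a 45-page file, `page_chunks(45)` gives the ranges 1-20, 21-40 and 41-45. If you do not know the page count, `qpdf --show-npages yourfile.pdf` prints it.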
NOTE: You DO NOT NEED to install InftyReader Lite, because all of its functions are already included in the standard version described above. InftyReader Lite is just a subset of InftyReader Standard.
However, if you want to give it a try, you can click on the link below to download it.