How to import PDF files into Excel
Most clients send their financials as PDF files, but to perform analysis, you need to import the data into Microsoft Excel. In this article, we will show you two ways to import tables from a PDF into Excel:
- Save the PDF as a Word file and then copy and paste the desired tables into Excel.
- Use the Power Query tool to import the desired tables directly from the PDF into Excel. As we will see, Power Query allows you to transform the table if needed.
In this example, our goal will be to import table data from the Microsoft 2020 annual report PDF into Excel. On page 7 of the PDF, you can see the financial highlights table (see Figure 1). We want to import this table into Excel.
Figure 1. Microsoft financial highlights 2016-2020 PDF table.
How to import a table from a PDF into Excel
After saving our PDF as a Word document, find the desired table on page 7 of the file. To import this table into Excel, simply copy and paste the table. The result is shown in Figure 2 and in the Word file of the workbook PDF.xlsx.
Figure 2. Microsoft financial highlights 2016-2020 Word table.
Pro tip: If the footnote references in parentheses bother you, you can manually remove them. And Power Query makes removing the square brackets a snap!
Using Power Query to import a table from a PDF
The table we want to import is on page 7 of our PDF. Follow these steps to import data from a PDF via Power Query:
- From the Data tab, choose Get Data from the Get & Transform Group.
- From the dropdown arrow, choose From File and then From PDF.
- Browse to the relevant PDF and wait for Excel to process the PDF. As shown in Figure 3, you will see a list of tables in the PDF with the corresponding page numbers.
Figure 3. List of tables in the Microsoft 2020 annual report PDF.
- Browsing through the tables we see that Table003 on page 8 is what we need. If we select Transform Data, we can use Power Query to delete any unnecessary columns.
- Select Use First Row as Headers from the Home tab to ensure the years appear in the header row.
- Use the Control key to select the column headings for all non-numeric columns except the first column, and after right-clicking, click on Remove to remove the unneeded columns.
- From File, choose Close and Load To and then Table. The resulting table is shown in Figure 4.
Figure 4. Microsoft financials imported from PDF to Excel.
Since client financial documents are usually in PDF format, you’ll need to know how to import PDF files into Microsoft Excel for analysis. For more useful Excel tutorials and tricks, take one of my Becker CPE courses.
If you want to learn more about the miracles you can achieve with Power Query, check out Chapter 28 of my new book, Microsoft Excel Data Analysis and Business Modeling (Office 2021 and Office 365).