Most clients send their financials as PDF files, but to perform analysis, you need to import the data into Microsoft Excel. To help save you time, we will show you two ways to import a PDF into Excel:
- Save the PDF as a Word file and then copy and paste the desired tables into Excel.
- Use the Power Query tool to import the desired tables directly from the PDF into Excel. As we will see, Power Query allows you to transform the table if needed.
In this example, our goal will be to import table data from the Microsoft 2020 annual report PDF into Excel. On page 7 of the PDF, you can see the financial highlights table (see Figure 1). We want to import the data from the PDF into Excel.
Figure 1. Microsoft financial highlights 2016-2020 PDF table.
How to import a PDF into Excel using Word
After saving our PDF as a Word document, find the desired table on page 7 of the file. To import this table into Excel, simply copy and paste the table. The result is shown in Figure 2 and in the Word file of the workbook PDF.xlsx.
Figure 2. Microsoft financial highlights 2016-2020 Word table.
Pro tip: If the footnote references in parentheses bother you, you can manually remove them. And Power Query makes removing the square brackets a snap!
Using Power Query to import a PDF to Excel
The table we want to import is on page 7 of our PDF. Follow these steps to import data from a PDF to Excel via Power Query:
- From the Data tab, choose Get Data from the Get & Transform Group.
- From the dropdown arrow, choose From File and then From PDF.
- Browse to the relevant PDF and wait for Excel to process the PDF. As shown in Figure 3, you will see a list of tables in the PDF with the corresponding page numbers.
Figure 3. List of tables in the Microsoft 2020 annual report PDF.
- Browsing through the tables we see that Table003 on page 8 is what we need. If we select Transform Data, we can use Power Query to delete any unnecessary columns.
- Select Use First Row as Headers from the Home tab to ensure the years appear in the header row.
- Use the Control key to select the column headings for all non-numeric columns except the first column, and after right-clicking, click on Remove to remove the unneeded columns.
- From File, choose Close and Load To and then Table. The resulting table is shown in Figure 4.
Figure 4. Microsoft financials imported from PDF to Excel.
Build your Excel skills with Becker CPE courses
Becker has a wide variety of Excel CPE courses designed to build out your skills so you can get more done in less time. Check out our Microsoft Excel Fundamentals + Data Analytics Certificate for 14 CPE courses worth 28 credits and get a full, end-to-end training in the foundations of Excel, data visualization, and advanced analytics.