How to import a PDF into Excel

Most clients send their financials as PDF files, but to perform analysis, you need to import the data into Microsoft Excel. To help save you time, we will show you two ways to import a PDF into Excel:

  1. Save the PDF as a Word file and then copy and paste the desired tables into Excel.
  2. Use the Power Query tool to import the desired tables directly from the PDF into Excel. As we will see, Power Query allows you to transform the table if needed.

In this example, our goal is to import the financial highlights table from the Microsoft 2020 annual report PDF into Excel. The table appears on page 7 of the PDF (see Figure 1).

Figure 1. Microsoft financial highlights 2016-2020 PDF table.

 

How to import a PDF into Excel using Word

After saving the PDF as a Word document, find the desired table on page 7 of the file. To import this table into Excel, copy the table in Word and paste it into a worksheet. The result is shown in Figure 2 and in the Word worksheet of the workbook PDF.xlsx.

Figure 2. Microsoft financial highlights 2016-2020 Word table.

 

Pro tip: If the footnote references in parentheses bother you, you can manually remove them. And Power Query makes removing the square brackets a snap!
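If you would rather script the cleanup than edit each cell by hand, the idea can be sketched in a few lines of Python. This is only an illustration of the pattern, not a Power Query feature: the function name strip_footnotes and the sample labels are made up for this example, and it assumes each footnote marker is one or two letters or digits inside parentheses or square brackets. Apply it to row labels only, since accounting tables often show negative amounts in parentheses.

```python
import re

# Assumption: a footnote marker is 1-2 letters or digits wrapped in
# parentheses or square brackets, e.g. "(a)", "(1)", or "[b]".
FOOTNOTE = re.compile(r"\s*(?:\([A-Za-z0-9]{1,2}\)|\[[A-Za-z0-9]{1,2}\])")


def strip_footnotes(label: str) -> str:
    """Return the row label with every footnote marker removed."""
    return FOOTNOTE.sub("", label).strip()


# Hypothetical row labels, for illustration only.
print(strip_footnotes("Net income (a)"))            # Net income
print(strip_footnotes("Operating income (b) (c)"))  # Operating income
print(strip_footnotes("Revenue [b]"))               # Revenue
```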

Using Power Query to import a PDF to Excel

The table we want to import is on page 7 of our PDF. Follow these steps to import data from a PDF to Excel via Power Query:

  1. From the Data tab, choose Get Data from the Get & Transform Group.
  2. From the dropdown arrow, choose From File and then From PDF.
  3. Browse to the relevant PDF and wait for Excel to process the PDF. As shown in Figure 3, you will see a list of tables in the PDF with the corresponding page numbers.

Figure 3. List of tables in the Microsoft 2020 annual report PDF.

 

  4. Browsing through the tables, we see that Table003 on page 8 is what we need. Select Transform Data to open the Power Query Editor, where we can delete any unnecessary columns.
  5. Select Use First Row as Headers from the Home tab to ensure the years appear in the header row.
  6. Hold down the Control key and select the column headings for all non-numeric columns except the first column. Then right-click and choose Remove to delete the unneeded columns.
  7. From File, choose Close & Load To and then Table. The resulting table is shown in Figure 4.
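To make the logic of the header and column steps concrete, here is a minimal Python sketch of the same two transformations: promoting the first row to headers and dropping every non-numeric column except the first. It is not Power Query code and does not read a PDF. The function names and the sample rows (including the placeholder amounts) are assumptions made up for this illustration.

```python
def promote_headers(rows):
    """Split a table into (header, body), like Use First Row as Headers."""
    header, *body = rows
    return [str(h) for h in header], body


def is_numeric(text):
    """True for cells such as '1,000', '300', or '(50)' (accounting negative)."""
    cleaned = text.replace(",", "").replace("$", "").strip()
    if cleaned.startswith("(") and cleaned.endswith(")"):
        cleaned = cleaned[1:-1]
    try:
        float(cleaned)
        return True
    except ValueError:
        return False


def drop_non_numeric_columns(header, body):
    """Keep the first (label) column plus every column that is fully numeric."""
    keep = [0] + [
        i for i in range(1, len(header))
        if body and all(is_numeric(row[i]) for row in body)
    ]
    return [header[i] for i in keep], [[row[i] for i in keep] for row in body]


# Hypothetical rows with stray "$" and empty columns, as a PDF import might produce.
raw = [
    ["Year Ended June 30,", "", "2020", "", "2019"],
    ["Revenue", "$", "300", "$", "200"],
    ["Net income", "", "1,000", "", "(50)"],
]

header, body = promote_headers(raw)
header, body = drop_non_numeric_columns(header, body)
print(header)  # ['Year Ended June 30,', '2020', '2019']
print(body)    # [['Revenue', '300', '200'], ['Net income', '1,000', '(50)']]
```

The first column is always kept because it holds the row labels, which mirrors the instruction to select all non-numeric columns except the first.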

 

Figure 4. Microsoft financials imported from PDF to Excel.

 

Build your Excel skills with Becker CPE courses

Becker has a wide variety of Excel CPE courses designed to build out your skills so you can get more done in less time. Check out our Microsoft Excel Fundamentals + Data Analytics Certificate for 14 CPE courses worth 28 credits and get a full, end-to-end training in the foundations of Excel, data visualization, and advanced analytics. 
