Recently I found myself having to import data from a PDF file inside a powerpivot model. This particular format is not supported by PowerQuery, that it is the princial ETL system for power bi.
Searching online and trying different solution i founded a working solution. To use this particola solution you have to convert PDF in other format. If you have Word 2013 it’s very symple to do.
- Open the PDF document from Word 2013. You can choose to show only the PDF files to speed-up your search.
- Click ok to the message warning you that Word will convert the PDF document in editable Word document.
- Activate the editing if word ask you.
- Now, save as the document to Web Page (HTML) format. That’s what permit us to import it in Power Query.
- Close Word and Open Excel
- Select the Power Query tab and choose the « From Web » option.
- A dialog window will ask for an URL address. This URL can be a local file, select you file just saved.
- The Navigator will appear on the right, displaying the multiple tables found on that page, select what you interested in and import in powerpivot.
That’s all. You can use different program to convert PDF in html, you can use for example this site : http://www.convertpdftohtml.net/ or use C# library, for example this http://www.rasteredge.com/how-to/csharp-imaging/pdf-convert-html/.