Power Query: Import Data from PDF

Recently I needed to import data from a PDF file into a Power Pivot model. Power Query, the main ETL tool for Power BI, does not support this format.

After searching online and trying several approaches, I found one that works. It requires converting the PDF to another format first. If you have Word 2013, this is very simple to do:

  1. Open the PDF document in Word 2013. To speed up your search, you can filter the Open dialog to show only PDF files.
  2. Click OK on the message warning you that Word will convert the PDF into an editable Word document.
  3. If Word asks, enable editing.
  4. Save the document as a Web Page (HTML). This is the format that lets us import it into Power Query.
  5. Close Word and open Excel.
  6. Select the Power Query tab and choose the « From Web » option.
  7. A dialog window asks for a URL. The URL can point to a local file: select the file you just saved.
  8. The Navigator pane appears on the right, listing the tables found on that page. Select the ones you are interested in and load them into Power Pivot.
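
If you want to check what the saved page contains before loading it, here is a minimal Python sketch using only the standard library. It is not part of the Power Query workflow: `path_to_file_url`, `extract_tables` and `list_tables_in_file` are hypothetical helpers, and the sketch assumes Word wrote ordinary `<table>`, `<tr>`, `<td>`/`<th>` markup with closed cell tags. It builds a file:/// URL for the local file (one way to express the local-file URL mentioned above) and lists the top-level tables it finds, roughly what the Navigator shows.

```python
from html.parser import HTMLParser
from pathlib import Path
from typing import List, Optional


def path_to_file_url(path: str) -> str:
    """Return a file:/// URL for a local file (hypothetical helper)."""
    # Path.as_uri() only accepts absolute paths, so resolve relative ones first.
    return Path(path).resolve().as_uri()


class TableExtractor(HTMLParser):
    """Collect the text of every top-level <table> as a list of rows of cells."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True turns &nbsp; etc. into characters
        self.tables: List[List[List[str]]] = []
        self._depth = 0  # how many <table> elements we are currently inside
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._depth += 1
            if self._depth == 1:
                self.tables.append([])  # start a new top-level table
        elif self._depth == 1 and tag == "tr":
            self._row = []
        elif self._depth == 1 and tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "table":
            self._depth = max(self._depth - 1, 0)
        elif (self._depth == 1 and tag in ("td", "th")
              and self._cell is not None and self._row is not None):
            # Collapse line breaks and non-breaking spaces into single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif self._depth == 1 and tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text of nested tables also lands here, folded into the enclosing cell.
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(html_text: str) -> List[List[List[str]]]:
    """Parse an HTML string and return its top-level tables."""
    parser = TableExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.tables


def list_tables_in_file(path: str, encoding: str = "utf-8") -> List[List[List[str]]]:
    # The encoding Word uses depends on its save options; adjust it if needed (assumption).
    text = Path(path).read_text(encoding=encoding, errors="replace")
    return extract_tables(text)
```

For example, `list_tables_in_file("report.htm")` should return one entry per table in the saved page. If it returns an empty list, the PDF content probably did not come through as HTML tables, and the Navigator is likely to show no useful tables either.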

 

That’s all. You can also use other programs to convert PDF to HTML, for example this site: http://www.convertpdftohtml.net/, or a C# library such as this one: http://www.rasteredge.com/how-to/csharp-imaging/pdf-convert-html/.
