Author Topic: Mustachian Hack to Convert .pdf to Excel (So I can get my Chase CC activity)?  (Read 7886 times)

Dicey

  • Senior Mustachian
  • ********
  • Posts: 22319
  • Age: 66
  • Location: NorCal
GRRRR, I'm doing my taxes and downloaded all my AMEX transactions to Excel in just a few minutes, piece of cake. However when I attempted same with CHASE, no go. All I can get is a .pdf by month. WTF??? I never would have changed banks if I'd realized how archaic their year-end data access is. I only have the free Adobe Reader software. Am I totally screwed or is there some workaround that some smart mustachian who was lured to Chase by their big, fat bonuses (as I was) has figured out? Of course, my tax appointment is tomorrow and I use this card All.The.Time. FWIW, it's an IHG Affinity card.

PS - Sorry, GNs, I know that's a terrible subject line, but I'm in a four-wheel lock panic at the moment.

mtn

  • Handlebar Stache
  • *****
  • Posts: 1343
Can you get an online version, not a pdf? I can with Chase for my checking account (no credit card to compare). If you can get that, then just a copy and paste and format.

If you can't, maybe you can get lucky with text-to-column, but on my regular statement PDF that was a no-go since there was no common separator.

nobody123

  • Pencil Stache
  • ****
  • Posts: 519
If you use the "Search Transactions" link you can enter the year as a date range and then download the results as a CSV.

OOPS, that only works for a checking account.  Credit card is limited to the last 90 days.  WTF.
« Last Edit: March 21, 2016, 01:51:36 PM by nobody123 »

andy85

  • Handlebar Stache
  • *****
  • Posts: 1060
  • Age: 38
  • Location: Louisville, KY
Download it as a CSV file as show in the attached image.

edit: nobody123 beat me to it...

mtn

  • Handlebar Stache
  • *****
  • Posts: 1343
Wait! I might have a solution, though it will be tedious.

1: Select the data--not headers, just the data.
2: Copy all of it.
3: Paste it into Notepad (not Word, it can mess with the formatting) 
4: Select one of the spaces that is separating the data--i.e. the space or tab between the description and the amount
5: Copy the selection of the space
6: Click CTRL+H 
7: In the "Find what" box, paste the space you just copied 
8: Without closing the replace box, click back in the notepad. Type/hit tab. Select, and copy that tab.
9: Back in the "Replace with" box, paste the tab.
10: Replace all. 
11: Select all and copy.
12: Open Excel, and paste it in there.
13: Under the data tab on the top ribbon, select "Text to Columns". 
14: Select "Delimited", click next 
15: Select "Tab" under the Delimiters options, and click next. 
16: Select General, and click finish.

The data will now be separated into columns. The tediousness will come if one of the charges is something like "Comcast Cable Company" which will have all three words in separate columns--you'll need to go through that line by line and remove them--but otherwise you should have everything relatively neatly separated.

mtn

  • Handlebar Stache
  • *****
  • Posts: 1343
Bah! The CSV is the easy way. Didn't know that was an option.

tonysemail

  • Pencil Stache
  • ****
  • Posts: 718
  • Location: San Jose, CA
I ran into a variant of this problem too.
Last year, I closed one card which had a few tax related expenses on it (rental property related).
I was having trouble adding up my expenses and it would have been nice to cross check my notes with the credit card statement.
Unfortunately, I can't access the digital copies any longer since that card is now closed.
So moral of the story ... I will download all my transactions before closing a card in the future.


seattlecyclone

  • Walrus Stache
  • *******
  • Posts: 7254
  • Age: 39
  • Location: Seattle, WA
    • My blog
In the past when trying to get data out of a PDF file, I've done something similar to mtn's approach:

1) Copy and paste from PDF to a plain text file.
2) Write a perl script full of regular expressions to grab the data you care about and output to a more reasonable format (CSV, for example).
3) Import CSV into Excel.

It's tedious, but it works.

robartsd

  • Magnum Stache
  • ******
  • Posts: 3342
  • Location: Sacramento, CA
In the past when trying to get data out of a PDF file, I've done something similar to mtn's approach:

1) Copy and paste from PDF to a plain text file.
2) Write a perl script full of regular expressions to grab the data you care about and output to a more reasonable format (CSV, for example).
3) Import CSV into Excel.

It's tedious, but it works.
Yes, DIY text processing by Mastering Regular Expressions (borrow the O'Reilly book from your library) sounds like a mustachian hack (if you're the kind of person who derives happiness from solving problems using computer code). I've done pretty much this for PDF's where the data was not readily available in another format. Regex search and replace is the most important feature of a text editor to me.

If computer code will not lead to increased lifetime happiness, finding the .csv file online is much more efficient at getting the task at hand done - and cut and paste between HTML tables and spreadsheets usually works pretty well when no .csv file is available.

stashing_it

  • Stubble
  • **
  • Posts: 150
Wait! I might have a solution, though it will be tedious.

1: Select the data--not headers, just the data.
2: Copy all of it.
3: Paste it into Notepad (not Word, it can mess with the formatting) 
4: Select one of the spaces that is separating the data--i.e. the space or tab between the description and the amount
5: Copy the selection of the space
6: Click CTRL+H 
7: In the "Find what" box, paste the space you just copied 
8: Without closing the replace box, click back in the notepad. Type/hit tab. Select, and copy that tab.
9: Back in the "Replace with" box, paste the tab.
10: Replace all. 
11: Select all and copy.
12: Open Excel, and paste it in there.
13: Under the data tab on the top ribbon, select "Text to Columns". 
14: Select "Delimited", click next 
15: Select "Tab" under the Delimiters options, and click next. 
16: Select General, and click finish.

The data will now be separated into columns. The tediousness will come if one of the charges is something like "Comcast Cable Company" which will have all three words in separate columns--you'll need to go through that line by line and remove them--but otherwise you should have everything relatively neatly separated.

Actually try pasting into word first.  Plain text will break the formatting and put it all in one column.  Word will  likely  preserve the formatting and paste in as a table, which can be directly pasted into Excel

ShoulderThingThatGoesUp

  • Magnum Stache
  • ******
  • Posts: 3053
  • Location: Emmaus, PA
Acrobat is surprisingly awful at this, so nobody bother buying it for this purpose.

Proud Foot

  • Handlebar Stache
  • *****
  • Posts: 1160
What does the pdf look like?  Depending on the layout of the information you can just do a SaveAs Text and save the pdf as a text file which can then be opened up in excel.  This information is easily broken out by using the Text to Columns tool.  Whether this will work for you depends heavily on the layout of the pdf.

mtn

  • Handlebar Stache
  • *****
  • Posts: 1343
Wait! I might have a solution, though it will be tedious.

1: Select the data--not headers, just the data.
2: Copy all of it.
3: Paste it into Notepad (not Word, it can mess with the formatting) 
4: Select one of the spaces that is separating the data--i.e. the space or tab between the description and the amount
5: Copy the selection of the space
6: Click CTRL+H 
7: In the "Find what" box, paste the space you just copied 
8: Without closing the replace box, click back in the notepad. Type/hit tab. Select, and copy that tab.
9: Back in the "Replace with" box, paste the tab.
10: Replace all. 
11: Select all and copy.
12: Open Excel, and paste it in there.
13: Under the data tab on the top ribbon, select "Text to Columns". 
14: Select "Delimited", click next 
15: Select "Tab" under the Delimiters options, and click next. 
16: Select General, and click finish.

The data will now be separated into columns. The tediousness will come if one of the charges is something like "Comcast Cable Company" which will have all three words in separate columns--you'll need to go through that line by line and remove them--but otherwise you should have everything relatively neatly separated.

Actually try pasting into word first.  Plain text will break the formatting and put it all in one column.  Word will  likely  preserve the formatting and paste in as a table, which can be directly pasted into Excel

No, if that were true it could be pasted directly into Excel without the "middle man". The problem is that in the PDF it is not recognized as a table, just straight text. No matter how you copy it, word, excel, notepad, etc. will not recognize separate columns as anything more than a space. If you're lucky, it will recognize them as more than 1 space and you can easily do a find and replace that leaves descriptions intact.

robartsd

  • Magnum Stache
  • ******
  • Posts: 3342
  • Location: Sacramento, CA
No, if that were true it could be pasted directly into Excel without the "middle man". The problem is that in the PDF it is not recognized as a table, just straight text. No matter how you copy it, word, excel, notepad, etc. will not recognize separate columns as anything more than a space. If you're lucky, it will recognize them as more than 1 space and you can easily do a find and replace that leaves descriptions intact.
I've even seen some PDF's of tables where different columns came out on different lines when copied as text - quite a pain to deal with (at least in the case I came across the pattern of mapping columns to lines was consistent, so I simply had to match multi-line records). The problem with converting a table in a PDF to a table of data is that the PDF format was not intended to be a data format - it is intended to be digital paper. Features added to the PDF format over the years made using it for data interchange a possibility, but only if the PDF is authored in a way that utilizes these features.

stashing_it

  • Stubble
  • **
  • Posts: 150
Wait! I might have a solution, though it will be tedious.

1: Select the data--not headers, just the data.
2: Copy all of it.
3: Paste it into Notepad (not Word, it can mess with the formatting) 
4: Select one of the spaces that is separating the data--i.e. the space or tab between the description and the amount
5: Copy the selection of the space
6: Click CTRL+H 
7: In the "Find what" box, paste the space you just copied 
8: Without closing the replace box, click back in the notepad. Type/hit tab. Select, and copy that tab.
9: Back in the "Replace with" box, paste the tab.
10: Replace all. 
11: Select all and copy.
12: Open Excel, and paste it in there.
13: Under the data tab on the top ribbon, select "Text to Columns". 
14: Select "Delimited", click next 
15: Select "Tab" under the Delimiters options, and click next. 
16: Select General, and click finish.

The data will now be separated into columns. The tediousness will come if one of the charges is something like "Comcast Cable Company" which will have all three words in separate columns--you'll need to go through that line by line and remove them--but otherwise you should have everything relatively neatly separated.

Actually try pasting into word first.  Plain text will break the formatting and put it all in one column.  Word will  likely  preserve the formatting and paste in as a table, which can be directly pasted into Excel

No, if that were true it could be pasted directly into Excel without the "middle man". The problem is that in the PDF it is not recognized as a table, just straight text. No matter how you copy it, word, excel, notepad, etc. will not recognize separate columns as anything more than a space. If you're lucky, it will recognize them as more than 1 space and you can easily do a find and replace that leaves descriptions intact.

I assure you this is true for many newly created PDFs.   See the sample images I attached of copying from a PDF, then what it looks like pasted into Excel 2010 and Word 2010

I took this example from the Kindle book I just wrote on this and related topics here http://www.amazon.com/gp/product/B01CUSXRL8?*Version*=1&*entries*=0   (happy to provide a free PDF if any one wants one)


The other trick to use is to use the Alt key when selecting text from the PDF.   That way you can select on column at a time and could just paste into Excel one column at a time and get around many problems

stashing_it

  • Stubble
  • **
  • Posts: 150
No, if that were true it could be pasted directly into Excel without the "middle man". The problem is that in the PDF it is not recognized as a table, just straight text. No matter how you copy it, word, excel, notepad, etc. will not recognize separate columns as anything more than a space. If you're lucky, it will recognize them as more than 1 space and you can easily do a find and replace that leaves descriptions intact.
I've even seen some PDF's of tables where different columns came out on different lines when copied as text - quite a pain to deal with (at least in the case I came across the pattern of mapping columns to lines was consistent, so I simply had to match multi-line records). The problem with converting a table in a PDF to a table of data is that the PDF format was not intended to be a data format - it is intended to be digital paper. Features added to the PDF format over the years made using it for data interchange a possibility, but only if the PDF is authored in a way that utilizes these features.

Yeah, with older PDFs that are essentially just images then you lose a lot of the selecting text features.   The things I would try are
  • Alt select text -  to copy one column at a time
  • Copy the image into OneNote as an Image, and use the "Copy Text From Picture" option to get the text via optical character recognition.   Sometimes it helps to crop the image into columns so you are only copying 1 column of text at a time

Dicey

  • Senior Mustachian
  • ********
  • Posts: 22319
  • Age: 66
  • Location: NorCal
Thanks for all the suggestions. I'll sit down with them over the weekend and see what happens.

CmFtns

  • Pencil Stache
  • ****
  • Posts: 583
  • Age: 33
  • Location: Melbourne, Fl
Reading through the replies and checking my own account I concluded that you should not do any of the copy and pasting 10 step methods... @andy85 had the right answer... you can download to .csv (excell format) from the Account Activity page... I confirmed that you can do this on my chase account