Author Topic: Data analysis or statistics question  (Read 3543 times)

wienerdog

  • Pencil Stache
  • ****
  • Posts: 587
Data analysis or statistics question
« on: June 01, 2016, 06:21:36 PM »
Not sure where to put this, but since there are a ton of smart folks here I figured it would get the most looks in here.  I have a spreadsheet tracking income and expenses each month, and I average over the months entered so far.  The result seems off because in 2 months out of the year a salaried person gets 3 paychecks, and every so often there is one bad month of spending.  With only 10 or 11 months of data the average seems skewed.  I know those "bumps" will average out over 24 or 36 months, but they upset the data when the sample is small, and another year or two seems like a long time to wait for the results to smooth out.

I did some looking around, and it seems like the harmonic mean is a better way to represent the data; at least my gut feel is that it more closely represents what I am actually spending or bringing in.  Is this the correct way of going about it, or is there a better method?
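To make the comparison concrete, here is a rough sketch in Python of how the three summaries behave on a 10-month sample. The paycheck amount is made up, and it assumes biweekly pay (26 paychecks a year), which is what "3 paychecks in 2 months of the year" implies. The harmonic mean can never exceed the arithmetic mean, so it will always pull the estimate down, whether or not that is closer to the truth.

```python
from statistics import mean, median, harmonic_mean

# Hypothetical figures: $1,500 take-home per paycheck, paid biweekly.
paycheck = 1500

# Ten months of data that happen to include both three-paycheck months.
monthly_income = [2 * paycheck] * 8 + [3 * paycheck] * 2

arithmetic = mean(monthly_income)
middle = median(monthly_income)
harmonic = harmonic_mean(monthly_income)

# The figure all three are trying to estimate: annual pay spread over 12 months.
true_monthly = 26 * paycheck / 12

print(f"arithmetic mean: {arithmetic:.2f}")    # 3300.00
print(f"median:          {middle:.2f}")        # 3000.00
print(f"harmonic mean:   {harmonic:.2f}")      # 3214.29
print(f"true monthly:    {true_monthly:.2f}")  # 3250.00
```

In this made-up case the plain average runs a little high, while the median and harmonic mean both run low, so none of them is "correct" on a partial year.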

abiteveryday

  • Stubble
  • **
  • Posts: 130
  • Location: Seattle
Re: Data analysis or statistics question
« Reply #1 on: June 01, 2016, 06:50:05 PM »
I generally budget around pay periods, and then have months where I get a windfall of sorts with an extra pay period.    Not saying it's the best method but it works for me.

maizefolk

  • Walrus Stache
  • *******
  • Posts: 7436
Re: Data analysis or statistics question
« Reply #2 on: June 01, 2016, 06:51:41 PM »
What about just using the median instead of the mean/average? It'd be much more resilient to a few outlier months in terms of either too much income or too much spending.

bryan995

  • Pencil Stache
  • ****
  • Posts: 595
  • Age: 37
  • Location: California
Re: Data analysis or statistics question
« Reply #3 on: June 01, 2016, 10:39:49 PM »
median or a trimmed mean (IQM?) could work.  Or if you want to get fancy, something like lowess?
http://research.stowers-institute.org/efg/R/Statistics/loess.htm

Though in your case you do actually want to account for the occasional triple paycheck month, no?  If you exclude it, then your 'average' monthly salary times 12 will not equal your true yearly salary, and your spending plan will be off.
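A quick sketch of that trade-off, in Python. The `trimmed_mean` helper and the income figures are hypothetical, and trimming a quarter from each end is only a simple stand-in for an interquartile mean. With a full year of biweekly pay, the trim throws away both triple-paycheck months and the annualized figure comes up short.

```python
from statistics import mean

def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the values from each end."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of values cut from each end
    return mean(ordered[k:len(ordered) - k])

# Hypothetical full year: ten two-paycheck months, two three-paycheck months.
months = [3000] * 10 + [4500] * 2

iqm = trimmed_mean(months, 0.25)  # keeps only the middle six months
print(iqm * 12)     # 36000, what the trimmed figure implies for the year
print(sum(months))  # 39000, what was actually earned
```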

K-ice

  • Pencil Stache
  • ****
  • Posts: 982
  • Location: Canada
Re: Data analysis or statistics question
« Reply #4 on: June 01, 2016, 11:10:51 PM »
As others have said you may want to look at the median.

Or at least compare the mean to the median. If they are different then you know there are outliers.

bobechs

  • Handlebar Stache
  • *****
  • Posts: 1065
Re: Data analysis or statistics question
« Reply #5 on: June 01, 2016, 11:32:57 PM »
To a man* with an HP35S, everything is a statistics problem.

How about adjusting all of your figures to operate on a weekly basis rather than vexing yourself with Gregorian calendar gyrations?  Then you would have a year of 52 weeks of exactly seven days each.  I know that is just a teensy bit off the great sidereal wheel of solar time, but hey, who is counting the odd quarter-day or leap second now and then?

Then it is just arithmetic.


*This is not expressly intended to offend the standards of gender neutrality held so dear hereabouts, but rather to mirror the folk aphorism "To a man who has only a hammer..."  Please don't hurt me.

Playing with Fire UK

  • Magnum Stache
  • ******
  • Posts: 3449
Re: Data analysis or statistics question
« Reply #6 on: June 02, 2016, 02:06:12 AM »
Would you consider putting the 3rd paycheck in a month straight into savings (counting it in your annual figures but not in any month)?

If the bad spending months are due to discretionary spending, I'd keep them in to give myself a face punch. If they are due to lump sum payments for annual expenses, (annual insurances, that month when you have a load of birthday presents to buy, membership fees), you could account for them on a monthly basis (either putting money away into a sinking fund, or just splitting the cost across the year).

Or have a category for 'one offs' and include that in your annual figures but not monthly figures (both annual payments and extra paychecks).

I would think about what outputs you are looking for (saving rate, annual expenses, etc) and calculate each in the most appropriate way. If you are working out monthly expenses to calculate your 4% number, you need to have all your annual spending in there somewhere, excluding genuine one offs (your own wedding?) but including a provision for home maintenance and annual fees. If you are working out monthly expenses to see if you had a good or bad month, it makes more sense to annualise the lumpy but necessary spending.

Monkey Uncle

  • Handlebar Stache
  • *****
  • Posts: 1742
  • Location: West-by-god-Virginia
Re: Data analysis or statistics question
« Reply #7 on: June 02, 2016, 04:39:03 AM »
If you're salaried, why bother tracking actual monthly income?  You pretty much know already what it is going to be, unless you get an unexpected raise.

For expenses, I use a spreadsheet and accompanying pivot table to keep a running total of expenses for each category.  Then monthly spending to date is calculated as (total amount)/(no. days elapsed in the year so far)*30.42.  The 30.42 is the average number of days in a month (365/12).  For total expenditures and the categories in which I have fairly regular expenditures (like food, for example), I find it only takes about 3 months for things to mostly smooth out.  The more irregular categories are just going to be more lumpy.  I don't really see that as a problem; it's the total monthly and annual spend that really matters.
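For anyone who wants to try this outside a spreadsheet, here is a minimal Python sketch of the same formula. The function name is made up, and counting the current day as elapsed is an assumption (the post does not say either way).

```python
from datetime import date

AVG_DAYS_PER_MONTH = 365 / 12  # about 30.42, as in the post

def monthly_run_rate(total_spent, as_of):
    """Year-to-date spending scaled to an average-length month."""
    # Days elapsed since January 1, counting `as_of` itself (an assumption).
    days_elapsed = (as_of - date(as_of.year, 1, 1)).days + 1
    return total_spent / days_elapsed * AVG_DAYS_PER_MONTH

# Hypothetical: $4,500 spent in a category through the end of March 2016.
print(round(monthly_run_rate(4500, date(2016, 3, 31)), 2))
```

Because the figure updates daily instead of once a month, a single large purchase fades out gradually rather than sitting in one month's bucket.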

wienerdog

  • Pencil Stache
  • ****
  • Posts: 587
Re: Data analysis or statistics question
« Reply #8 on: June 02, 2016, 10:07:08 AM »
Thanks for the replies.  The median does seem to do a little better job.  It is really close to the harmonic mean value which seems a little closer to actual.

Even with salary I still might do a side gig one month and bring in $500 extra or sell an item here and there for an extra $2000 one month.  Plus I throw the tax refund in as income on the month it comes back.

Monkey Uncle, I might try your method, though my spreadsheet already does something similar.  Right now it takes the amount for each month in a category, averages those over the months that have occurred, and multiplies by 12 to get the yearly spending.  It only updates 12 times a year, whereas yours is a little more granular.  I just live with those bumps, since food and similar items are pretty steady.  I noticed auto maintenance is running high this year because I rebuilt the front end, did the brakes, and put new front tires on the truck in March, which was ~$800.  The other 4 months have $60 total, with 2 months at $0.  The average is $177 so far, so it looks like $2131 yearly spending.  By the end of the year it will be back to normal, I guess.

Tjat

  • Pencil Stache
  • ****
  • Posts: 570
Re: Data analysis or statistics question
« Reply #9 on: June 02, 2016, 10:20:48 AM »
I'm confused why the average is inaccurate. For ease, I focus on my annual finances, tracked monthly. The average does bounce a bit each month, but over time it is a good representation of my expenditures. I worry that it sounds like you are looking to exclude non-recurring expenses that should be budgeted for. I assume the purpose of your tracking is to budget in the now and project what cash flow you'll need in retirement. To fiddle with your actual long-term results seems to invite misrepresentations and lots of unnecessary time spent on moving numbers around.

CmFtns

  • Pencil Stache
  • ****
  • Posts: 583
  • Age: 33
  • Location: Melbourne, Fl
Re: Data analysis or statistics question
« Reply #10 on: June 02, 2016, 10:33:11 AM »
I'm not sure what is the big problem here...

The average you make per month is salary/12
10 months out of the year you will actually make ~92% of your monthly average
2 months you will actually make ~138% of your monthly average

There will be no month where you only get 1 paycheck, so if you budget your expenses for less than 2 paychecks (in other words, less than 92% of your calculated average) then there will always be money. Your income will have a big "bump" two months a year, but who cares?
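Those percentages can be checked with a few lines of Python, assuming biweekly pay (26 paychecks a year), which is the schedule that produces exactly two three-paycheck months:

```python
PAYCHECKS_PER_YEAR = 26  # assumption: paid every two weeks

# Average number of paychecks per calendar month (salary / 12).
avg_per_month = PAYCHECKS_PER_YEAR / 12

two_check_share = 2 / avg_per_month    # a normal month
three_check_share = 3 / avg_per_month  # a three-paycheck month

print(f"{two_check_share:.1%}")    # 92.3%
print(f"{three_check_share:.1%}")  # 138.5%

# Sanity check: ten normal months plus two big ones cover the whole year.
print(10 * 2 + 2 * 3)  # 26
```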


Personally I go with the track not budget method... I don't have allocations for each category because I always spend as little as possible in all categories.

« Last Edit: June 02, 2016, 10:35:58 AM by comfyfutons »

milliemchi

  • Bristles
  • ***
  • Posts: 316
Re: Data analysis or statistics question
« Reply #11 on: June 02, 2016, 11:09:46 AM »
I only look at my budget in yearly periods. All data within the current year is skewed by ~$6,000-7,000 of travel expenses during the summer months anyway, so it would be hard to track monthly. But tracking a year at a time works really well, as this is also the period for planning retirement contributions, etc. So don't micro-manage it and see if that works for you.

Jouer

  • Pencil Stache
  • ****
  • Posts: 501
Re: Data analysis or statistics question
« Reply #12 on: June 02, 2016, 01:09:23 PM »
I think you should go with a simple line graph - one line for income, one for expenses. Looking at the distribution will give you better information than the average, on a monthly basis.

aceyou

  • Handlebar Stache
  • *****
  • Posts: 1669
  • Age: 41
    • Life is Good - Aceyou's Journal
Re: Data analysis or statistics question
« Reply #13 on: June 02, 2016, 06:26:17 PM »
I'd be careful of using the median.  You generally use the median if outliers are not representative of the overall data in a meaningful way.  Like, suppose Bill Gates moves to a small town and it makes it look like the average net worth in the town is now 50 million dollars, but that's not at all representative of anyone.  So, you'd just use the median.  But in your budget the outliers represent things that are very meaningful to your direct budget like a $5000 car purchase or the extra paycheck or the trip you booked all in one month.  You don't want to lose those data points IMO. 

That said, I don't have a great answer for you, because I go monthly with my budgeting also and just have to deal with the fact that 2 months have three pay periods. 
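Here is a small Python illustration of that point, using rounded numbers in the spirit of the truck repair mentioned earlier in the thread. The exact monthly split is made up.

```python
from statistics import mean, median

# Hypothetical auto maintenance by month, Jan through May:
# one ~$800 repair month, two $0 months, and $60 spread over the other two.
auto = [0, 0, 800, 30, 30]

avg = mean(auto)    # 172
mid = median(auto)  # 30

print(avg * 12)  # 2064, projected yearly spend using the mean
print(mid * 12)  # 360, projected yearly spend using the median
```

The median projection quietly drops the repair, even though that repair is real money that has to come from somewhere in the yearly budget.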

 
