Author Topic: Big Data & Data Analytics  (Read 6266 times)

goalphish2002

  • Bristles
  • ***
  • Posts: 290
Big Data & Data Analytics
« on: March 29, 2019, 09:09:08 AM »
I realize I sort of hijacked another thread and apologize.  I am just going to start this one.  I have a B.S. in Consumer Economics and an M.S. in Accounting.  I currently work as a Senior Project Controls Analyst at an engineering firm.  I am interested in where this trend might go, and how I might get into it.  I am not interested in getting another university degree.  My employer will not pay for this, and I don't have the time or inclination for that.  However, I am interested in a boot camp, web training, and self-teaching.

Is this viable? 
Anyone in this field?

jps

  • Bristles
  • ***
  • Posts: 256
Re: Big Data & Data Analytics
« Reply #1 on: March 29, 2019, 09:25:38 AM »
Are you thinking of data science, or just data analytics? I think most people who call themselves "data scientists" (there is no science of data, btw) either have a master's degree or are self-taught. The difference might be that data scientists use more statistical modeling, e.g. random forest sampling, t-tests, regression modeling, etc.

I'm a data analyst at an institution w/ ~2000 employees, and every tool that I use could pretty easily be self-taught with enough initiative, though I learned them all on the job. There are lots of free web sources on everything that you would need to develop these skills - the tough part is getting someone to pay you money to do it if you aren't already doing it. I'd recommend reading about SQL and R or Python. If you are looking for a field where you communicate the story of the data, it's helpful to have another visualization tool under your belt such as Tableau or Power BI.

I think it's very viable to learn these tools yourself, but I don't have experience in switching career fields to get to it. Would you have any opportunity to apply some data analytics skills in the position you are currently in?

Source: jumped into data/financial analytics after BA in Economics.

goalphish2002

  • Bristles
  • ***
  • Posts: 290
Re: Big Data & Data Analytics
« Reply #2 on: March 29, 2019, 09:39:31 AM »
I am thinking of a data analyst.  Using this at my current job, I believe so.  I do think I could use this in my current position.  I guess part of it will be looking at the various tools and seeing how I can use them (once I learn).  I assume you have to buy these programs.  I would have to pitch my employer on that.  SQL can be used in Excel, right?  I know I sound like a total newb to any of this, I am. 

MonkeyJenga

  • Walrus Stache
  • *******
  • Posts: 8894
  • Location: the woods
Re: Big Data & Data Analytics
« Reply #3 on: March 29, 2019, 10:15:25 AM »
R/RStudio and Python are free. SQL and Tableau have free options for personal use; you should start with those and take some of the many free tutorials. SQL has many interfaces; try SQL Server or one of the online sandboxes to play around with small datasets.
« Last Edit: March 29, 2019, 10:21:12 AM by MonkeyJenga »

jps

  • Bristles
  • ***
  • Posts: 256
Re: Big Data & Data Analytics
« Reply #4 on: March 29, 2019, 10:16:40 AM »
Hey, no worries. Most of these tools are free. There are free SQL environments, like HeidiSQL and PostgreSQL. R and Python are both free/open-source. Pretty much everywhere uses Excel. The difficulty with trying to learn is just having data to use. There are multiple big and popular sample datasets that you can use, like Contoso or Northwind. If your employer doesn't use any databases, it would be harder to incorporate SQL into what you do, but still very easy to use Excel/others for data analytics. SQL is the language for querying data that is stored in a database, so if most of your company's data is in Excel spreadsheets that's fine.

Excel can deploy SQL to query from a database, yes, but it's not a SQL environment.
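
If it helps to see what "querying data stored in a database" looks like, here is a minimal sketch using Python's built-in sqlite3 module, so there is nothing to buy or install beyond Python. The `orders` table and its rows are made up for illustration.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, invented for this example.
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Acme", 120.0), ("Acme", 80.0), ("Globex", 50.0)],
)

# A typical analyst query: total spend per customer, largest first.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()

for customer, total in rows:
    print(customer, total)

conn.close()
```

The same SELECT statement would work largely unchanged against PostgreSQL or SQL Server; only the connection code differs.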

Let me know what other questions you have. This is one of the very few things on the forum that I feel like I actually know about. If you want to PM too, I'm happy to share my work experience/what I do/other stuff.

brute

  • Pencil Stache
  • ****
  • Posts: 691
Re: Big Data & Data Analytics
« Reply #5 on: March 29, 2019, 10:54:19 AM »
Data scientist here. Feel free to ignore the next sentence.

I always suggest getting a graduate degree in computer science with a focus on Artificial intelligence and machine learning, it will change the way your mind works and elevate you to a new level.

Ok, that's out of the way. Here's my suggestion: start working with Python/pandas. These have better big data plug-ins than R. R traditionally runs in memory only and can't handle large data sets. There is some work going on in the Spark/Hadoop world that is trying to let R run in a distributed environment, but it isn't ready yet. (At least not the last time I tried it.)

There's a ton of great training out there for R and Python though. Data Camp is my favorite, since it gives an excellent zero to hero walk through for pretty cheap. You even get certificates at the end to show you did something. For $25-$30 a month, if you can put in an hour or two a day, it's worth it to show your boss (or future boss) that you've been through a course.

I take (minor) issue with the idea that there is no science of data. It's usually called statistics. Combine stats with computer science and machine learning and a bit of artistic ability and you have data science. I do hate how many people call themselves data scientists though.. it cheapens the field. For those of us who can predict cancer before doctors can find it or work with petabytes of unstructured data streaming in every day, well, it's like equating a line cook with a Michelin chef. Not that we have egos or anything...

The questions to ask yourself are:
Why do you want to get into data analytics? Money, love of data, bored in your other job? If it's because you love data, go for it. Don't let anything stop you. If it's anything else, go for it anyway. It's fun, and it can't hurt to have it in your tool chest.

How good are you with statistics? If you remember that they exist but couldn't explain regression or a p-value, start there. Stats is the core of all analytics.
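
As a quick self-check on those two ideas, here is a rough sketch in plain Python: it fits a least-squares line and then estimates a p-value for the slope with a permutation test. The data are invented, and the permutation test is just one simple way to get a p-value, swapped in here because it needs no stats library.

```python
import random

# Invented data: hours of practice vs. a test score.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [52, 55, 61, 60, 68, 70, 75, 74, 82, 85]

def slope_intercept(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = slope_intercept(x, y)

# Permutation test: if x and y were unrelated, shuffling y should often
# produce a slope at least as steep as the one we observed.
rng = random.Random(0)  # fixed seed so the run is repeatable
n_perm = 2000
extreme = 0
for _ in range(n_perm):
    shuffled = y[:]
    rng.shuffle(shuffled)
    perm_slope = slope_intercept(x, shuffled)[0]
    if abs(perm_slope) >= abs(slope):
        extreme += 1

# Add-one correction keeps the estimate from being exactly zero.
p_value = (extreme + 1) / (n_perm + 1)

print("slope:", round(slope, 2), "p-value:", p_value)
```

If you can explain in a sentence why a small p-value here counts as evidence of a real relationship, and what it does not tell you, you are in decent shape on the basics.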

Do you like raw math, creating visual representations of data, or something in between? Raw math and coding would lead you down the machine learning and advanced analytics path. Visualization often tops out pretty quickly, but if you love it, it pays well enough and there's always work in it. (Especially if you start contracting in Tableau/Spotfire/Microstrategy/Qlik etc.) If you like both sides and are good at communicating complex concepts in a simple way to people who have almost no time but make huge decisions... well, I need another data scientist on my team. Come on over. Seriously though, if you like both sides of it, there's always someone who needs those skills.

How much time do you have per week? This stuff starts off pretty easy (especially given your background) but the learning curve gets pretty steep once you jump into the coding. It flattens out again, but you'll want to spend several hours a week practicing so that it becomes second nature rather than fighting to get back into it each weekend.

Probably other stuff, but those are the main things for now. Always happy to answer any questions I can; data science and analytics are sort of my life.

FINate

  • Magnum Stache
  • ******
  • Posts: 3115
Re: Big Data & Data Analytics
« Reply #6 on: March 29, 2019, 12:02:59 PM »
What @brute said.

I FIREd about 4 years ago, so take this with a big grain of salt because the industry changes rapidly.

Big Data and Data Analytics are nebulous terms meaning different things to different people. The broad nature of the work further complicates the semantics.

What I've observed in the real world are Big Data pipelines with different skill sets at different points along the way. A simplified view might look something like:
  • Source data: one or more NoSQL databases, log files, or other artifacts from business operations.
  • Hadoop (or similar) process for aggregating, anonymizing, cleaning/normalizing raw data. If there's any ML/AI then this is where this happens.
  • Intermediate (i.e. non-production) NoSQL or SQL database.
  • Integration with Tableau or whatever Business Intelligence (BI) tool is used.
  • Visualization and BI Engineering.
There is no standard way of doing this, so YMMV.

Steps 1-2 are more similar to Software Engineering with an emphasis on statistics and ML/AI. This is where Data Scientists live. The big money is here because a) it's difficult b) cross discipline and c) super valuable.
Step 3 is a mix of Database Design and Database Administration with a sprinkling of coding/scripting.
Step 4 is a mix of System Administration and some coding to write plugins/adapters to integrate data with the BI tool. This person has expertise with the backend of the specific BI and how to integrate with databases.
Step 5 is mostly a matter of understanding statistics, data visualization, and specialized knowledge about the business intelligence product.
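
To make the steps above a bit more concrete, here is a toy, single-machine sketch in plain Python. Everything in it is an assumption for illustration: the log format, the e-mail addresses, and the use of an in-memory counter as the "intermediate database". A real pipeline would run the cleaning step on Hadoop or similar and load the results into an actual database.

```python
import hashlib
from collections import Counter

# Step 1: raw source data, here a few fake web-server log lines.
raw_logs = [
    "2019-03-29 alice@example.com  /Home ",
    "2019-03-29 bob@example.com /pricing",
    "2019-03-29 alice@example.com /pricing",
    "malformed line",
]

def clean(line):
    """Step 2: parse, normalize, and pseudonymize one line; None if unusable."""
    parts = line.split()
    if len(parts) != 3:
        return None  # drop lines that don't match the expected layout
    date, user, page = parts
    # A truncated hash hides the raw e-mail. This is only pseudonymization;
    # real anonymization needs more care than this.
    user_id = hashlib.sha256(user.encode()).hexdigest()[:8]
    return date, user_id, page.lower()

records = [r for r in map(clean, raw_logs) if r is not None]

# Step 3: the "intermediate" store, here just page-view counts in memory.
views_per_page = Counter(page for _, _, page in records)

# Steps 4-5: a BI tool would read this table; we simply print it.
for page, views in views_per_page.most_common():
    print(page, views)
```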

A small company may do something like contract out SWE work to automate getting the raw data into a useable intermediate state, and then have one person (full or part-time) to develop visualization and keep the entire system running.

A very large company will often have teams of highly specialized people at each step of the pipeline. There will typically be something like a Program Director, Program Manager, and/or Project Manager overseeing the entire project and business/functional requirements. And there may be an Ops team involved to keep things running smoothly.

There are lots of gradations in between depending on the company size and the specific business needs.

Given your educational background and goals as stated here, I would suggest starting with Visualization or BI Engineering. If your company is already using a business intelligence tool then start learning that. Work through their tutorials, take online classes. Ask around at your company about what they're looking for and if openings are coming up. See if you can find a way to create visualizations or do BI analysis for your existing line of work.

If you're at a small company, you may be able to work your way into a "jack of all trades, master of none" type job where you dabble in many areas of the pipeline. If you're at a big company, you may find it's possible to work your way down the stack over time as you pick up new skills and technologies while on the job.

JZinCO

  • Pencil Stache
  • ****
  • Posts: 705
Re: Big Data & Data Analytics
« Reply #7 on: March 29, 2019, 01:14:21 PM »
Hi can I jump in on this?
I'm an academic, working on my phd in forestry. Interested in transitioning to data science, particularly in an industry where I can leverage my educational background. So for example, there are startups who are using Google Earth Engine and remote sensing analytics to identify where, geographically, the highest return on investments are for purchasing carbon offset credits.

Two pillars of forestry are biometry and sampling, so I have a decent head for frequentist statistics. I know application of basic stats: imputations like KNN and random forests, kriging, geostatistics, point pattern analysis, time series, crossvalidation, classification and regression trees, multivariate ordination (PCA, DCA, NMDS, etc), linear/nonlinear regression, categorical tests. What I don't know are bayesian stats or the methods that data science types prefer (boosting, LASSO, NLP, recommender systems, etc). I've taken at least half a dozen stats grad courses and they don't mention those.

I use R and Python daily (often a half dozen other langs), understand computer/network/internet architecture with basic competency (enough to use compute servers and write API clients), and so I have no problem writing fast efficient code (for an interpreted lang). data.table/tidyverse/rhadoop/map-reduce FTW. I even have a package on CRAN (that gets over 1K downloads/month which I feel is good for how niche it is).

Thing is, I've never taken a class or read a book on computer science or math. No, I didn't even take college algebra or trig. Reason I'm asking is that data scientist interviews seem to be a lot like software dev interviews, in that algorithms are a big part of them. Hash? Linked list? Sorting algorithms? That's all gibberish to me.
But, for example, at work I have sifted through TB of data on a SQL database server, fit families of quantile regression models, backtested for the best models from those families on a wholly separate dataset, and then validated my models against similar models on test data. The goal was to create predictive models and hand them off to software dev folks who put them on a website. No clue if that is 'data science' or what exactly. At the time, my role was forest biometrist.

I feel like I could perform well in a data science (or data analyst or GIS analyst) position, but am I lacking by not knowing math or computer science fundamentals?  If so, how would I go about self-teaching? I have some calculus and discrete math ebooks that I haven't cracked open yet.
I also know some basic Django and Shiny. Given that everything is a web app these days, should I dive deeper into this? I volunteer for a local Code for America group and everything we produce is a web app that often has a Python backend and JavaScript front-end. That (and my last job) gives me the impression I should keep practicing on projects that involve some web-based product. Then again, I know a lot of analyses turn into white papers or slide decks. So my question is, what sort of presentation formats do y'all use most? Static HTML? Markdown/LaTeX PDF? Slides? Dynamic websites with things like leaflet or plotly?
Does the fact that I present at conferences and publish articles to various audiences help? I've heard that applicants from traditional math/computer science backgrounds lack communication skills (e.g. communicating complex information simply) across verbal/written media.
« Last Edit: March 29, 2019, 02:12:02 PM by JZinCO »

FINate

  • Magnum Stache
  • ******
  • Posts: 3115
Re: Big Data & Data Analytics
« Reply #8 on: March 29, 2019, 02:22:55 PM »
I'm an academic, working on my phd in forestry...
<snip>
Thing is, I've never taken a class or read a book on computer science or math. No, I didn't even take college algebra or trig.
<snip>

I'm having difficulty reconciling these two quotes. What you describe doing sounds like work in the sciences, yet you have no college-level math?

I use R and Python daily (often a half dozen other langs), understands computer/network/internet architecture with basic competency (enough to use compute servers and write API clients), and so I have no problem writing fast efficient code (for an interpreted lang). data.table/tidyverse/rhadoop/map-reduce FTW.
<snip>
Reason I'm asking is that data scientist interviews seem to be alot like software dev, in that algorithms is a big part of it. Hash? linked list? sorting algorithms? that's all gibberish to me.

How do you know you're writing fast, efficient code if you don't know about hash tables, linked lists, or sorting algorithms? These are some of the most basic building blocks of CS. I don't want to seem rude, but this is a case of not even knowing what you don't know. This doesn't mean you're not smart or capable, just that you have a huge blind spot. You better know the runtime complexity of your code before you fire off a large MapReduce with an O(n^2) (or worse) algorithm to a cluster with thousands of nodes.
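
To illustrate the kind of difference I mean, here is a small sketch (not tied to any particular framework) that checks a list for duplicates two ways: comparing every pair, which is O(n^2), and a single pass with a set, which is a hash table under the hood and O(n) on average. The function names are made up for this example.

```python
def has_duplicates_quadratic(items):
    """Compare every pair: about n*(n-1)/2 comparisons, i.e. O(n^2)."""
    comparisons = 0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            comparisons += 1
            if items[i] == items[j]:
                return True, comparisons
    return False, comparisons

def has_duplicates_hashed(items):
    """One pass with a set (a hash table): O(n) on average."""
    seen = set()
    lookups = 0
    for item in items:
        lookups += 1
        if item in seen:
            return True, lookups
        seen.add(item)
    return False, lookups

data = list(range(2000))  # no duplicates, so neither version can exit early
found_slow, work_slow = has_duplicates_quadratic(data)
found_fast, work_fast = has_duplicates_hashed(data)

print(work_slow, work_fast)
```

On 2,000 items that is 1,999,000 comparisons versus 2,000 lookups. Scale the input up to billions of rows and the first version is the one that burns through a cluster budget.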

I feel like I could perform well in a data science (or data analyst or GIS analyst) position, but am I lacking by not knowing math or computer science fundamentals?  If so, how would I go about self-teaching? I have some calculus and discrete math ebooks that I haven't cracked open yet.

Yes, you very likely need math and CS fundamentals to be a data scientist. You really cannot do proper science without a solid grasp of mathematics, and you cannot effectively deal with very large datasets and large distributed algorithms without CS fundamentals. I don't have suggestions for how to self-teach these, maybe others here have ideas.

Does the fact that I present at scientific conferences and publish articles help? I've heard that applicants from traditional math/computer science backgrounds lack communication skills across verbal/written media

I don't know where you heard this, but it's a tired and outdated stereotype that's wildly inaccurate. I know many eloquent CS/Math folks who are great at communicating verbally across a wide range of group sizes, and are excellent writers. The caricature of the socially awkward nerd sitting in a dark room writing code doesn't really exist anymore (not sure it ever really existed). Software is very much a team activity that requires excellent communication and coordination; it's simply impossible to write and maintain millions of lines of code any other way. I'll also add that many of the engineers I've worked with are true polymaths who also create beautiful poetry, art, and music, and have a wide range of skills and interests.
« Last Edit: March 29, 2019, 02:25:13 PM by FINate »

FInding_peace

  • 5 O'Clock Shadow
  • *
  • Posts: 25
Re: Big Data & Data Analytics
« Reply #9 on: March 29, 2019, 02:36:47 PM »
@JZinCO, if you have the PhD academic background doing experimental data analysis, which it sounds like you do, but are missing some software development techniques, CS fundamentals, and familiarity with some specific tools of the trade, I'd suggest a bootcamp like Insight Data Science or the Data Incubator.  Both are designed for PhDs with data analysis experience like yours, who want to transition from academia to industry data science roles.  Both are free to take (they make their money off recruitment fees companies pay to talk to you when you're finished) and work to round out your skill set so you can pass an interview question on linked lists or a software dev coding challenge, for example.  As a bonus, they'll generally hook you up with interview opportunities at various companies and coach you through landing your first data science job.  They also usually have a great alumni network of working data scientists that you will join. 

You'd certainly be a good candidate for a data scientist position with your background in hands-on R/Python data analysis.  It's just a matter of getting over some of the snags you might potentially hit with an academic-only background, like those you mentioned. 

For the original OP, I think FINate's advice was right on the money. 

As background, I've been working in data science for about 4 years now, having transitioned into it after doing an applied math PhD. 

JZinCO

  • Pencil Stache
  • ****
  • Posts: 705
Re: Big Data & Data Analytics
« Reply #10 on: March 29, 2019, 03:13:37 PM »
Hey FINate, I'll address some of your points:
-  If you think all sciences require moderate to heavy math, I have a great article from E.O. Wilson to share with you. Yeah, the deepest math I have to use is solving for geometry, calculating vectors, systems of equations, matrix multiplication, integer/real number arithmetic, trig, transforming between scales, etc. These aren't any harder than what I learned in jr high and my skills are sufficient to muster a C in trig and algebra. [sidebar: I looked at my uni's undergrad program and at least they have to take a calc for bio course].
Even the stats courses I have taken that "require" calculus, end up using linear approximations. Sans statistics, most of the math in my field is land surveying. In ecology, with exception of population dynamics, 90% of the requisite numeracy is stats. Part of it is that we are dealing with problems that are looking for solutions in hyperdimensional space where the mechanics are not fully understood, the 'background' matters, control is nigh impossible, and uncertainty abounds. That's why statistics is so useful but mathematics are not. I can show you my papers that employ generalized linear models with a log-link, spatially-varying weights, and a poisson error distribution. Part of being in an applied field means that the basics are embedded so that 'higher-level' research can be done. I mean, even our rangefinders can calculate tree volume for us. Heck, my primary tool is a CFD model. But no, I don't know calc so it's unlikely that I can verify a CFD program. However, I can validate the model (using stats) better than a typical mech eng (and have published those papers in engineering journals).

- I take your point that algorithms might be a blind spot. It's on my list to address. I would love to know more; a lot of it has come through trial/error and emulating what I see out there. What I mean to say is, taking R for example, map > apply > for loop with vectorization > for loop without, while also taking into consideration the data type and structure. I've heard of a particularly good MOOC course on algorithms that I will test out. Seems I haven't gotten exposure to any of it at all in any online data science coursework (usually they just cover basics like syntax, control flow, elementary stats, viz).

- It may be trite, but every data science panel with folks from team leads to C-level execs at tech/data companies almost always says communication skills are underemphasized to grads/prospective employees and deficient. I've heard it so many times I'm starting to think of it as an uninteresting observation regardless of its veracity.

Thanks for your input, if but a little haughty
« Last Edit: March 29, 2019, 04:05:37 PM by JZinCO »

FINate

  • Magnum Stache
  • ******
  • Posts: 3115
Re: Big Data & Data Analytics
« Reply #11 on: March 29, 2019, 04:04:07 PM »
I apologise for my gruffness. I suspect I don't really understand your background. In any case, if you can manage the math in CS (not terribly difficult in most cases) then you should have no problem filling in the gaps. If you're able to audit an Intro to Algorithms and Data Structures class this would be time well spent.

Presenting technical information to higher level management and especially the C-suite is a specialized skill in itself. It's valuable, but narrow enough that schools don't focus on it.

JZinCO

  • Pencil Stache
  • ****
  • Posts: 705
Re: Big Data & Data Analytics
« Reply #12 on: March 29, 2019, 04:11:27 PM »
I still like getting feedback however it is framed so I appreciate the original response.

Presenting technical information to higher level management and especially the C-suite is a specialized skill in itself. It's valuable, but narrow enough that schools don't focus on it.
I'm sorry, I meant that people with the 'authority' (i.e. the leaders in their field) to speak on what skills data scientists should have mark communication as deficient. Though maybe you are saying they find it deficient because their needs are peculiar. Again, it may just be an easy thing to say in a crowd without controversy.

Thing about math in the sciences is that its mastery is highly variable, ranging from fields like animal science to physics.
I'll take that algorithms course (considering its series is the highest rated class on Coursera) unless I hear it lacks the rigor and depth of a brick-and-mortar course. And considering that the aim isn't computer science but data 'science', between calc/linear algebra/discrete math, would you suggest self study on the whole series, picking one, or finding a 'math for computer science' primer sort of book?
« Last Edit: March 29, 2019, 06:51:07 PM by JZinCO »

FINate

  • Magnum Stache
  • ******
  • Posts: 3115
Re: Big Data & Data Analytics
« Reply #13 on: March 29, 2019, 04:44:22 PM »
Discrete math is probably going to be the most bang for your buck for CS fundamentals, though there's also some calc and other math in certain areas. I'll let the folks working in data science chime in on their area.

SaucyAussie

  • Bristles
  • ***
  • Posts: 328
  • Location: Raleigh, NC
Re: Big Data & Data Analytics
« Reply #14 on: March 29, 2019, 04:55:05 PM »
Following this with interest. I hope to move towards this kind of career, although I should FIRE before I ever get there.

I have been a "programmer" of sorts for 20 years, but with no formal training, so I have been taking some college level CS classes.  Once I hit the pre-requisites, I may do a graduate program like this one:
https://www.csc.ncsu.edu/academics/graduate/degrees/dsf.php

When I get there, I'll come back and ask you all which electives I should take.

For brushing up on math skills, I really like Khan Academy.  It will take you from wherever you are, right up to Calculus, Statistics and beyond.  And it's free!
« Last Edit: March 29, 2019, 04:57:47 PM by SaucyAussie »

FINate

  • Magnum Stache
  • ******
  • Posts: 3115
Re: Big Data & Data Analytics
« Reply #15 on: March 29, 2019, 05:03:26 PM »
One thing I should add: I'm not sure how realistic it is to study discrete math without a calculus foundation. It was a very long time ago, but I recall that calc I, II, and III were prerequisites, so I assume there was a sound pedagogical reason.
« Last Edit: March 29, 2019, 05:17:09 PM by FINate »

JZinCO

  • Pencil Stache
  • ****
  • Posts: 705
Re: Big Data & Data Analytics
« Reply #16 on: March 29, 2019, 07:31:12 PM »
I hope I didn't detract from OP and hopefully added.

For the record, I'll probably dip my toes in with a book, Doing Math with Python, which seems to cover basics of graphs, sets, derivatives and integrals. Then, I found a highly recommended book, Discrete Math and its Applications, that requires calculus in only a few spots. It's much more textbook-y. I'll be working my way through it this Summer. Mostly code free but has some Maple. Both are free and available online if there are others in a similar situation as me.

goalphish2002

  • Bristles
  • ***
  • Posts: 290
Re: Big Data & Data Analytics
« Reply #17 on: March 30, 2019, 04:53:15 PM »
Thanks to everyone for their replies.  I am going to read thoroughly and reply!

katsiki

  • Handlebar Stache
  • *****
  • Posts: 2015
  • Age: 43
  • Location: La.
Re: Big Data & Data Analytics
« Reply #18 on: April 02, 2019, 06:14:33 PM »
R/RStudio and Python are free. SQL and Tableau have free options for personal use; you should start with those and take some of the many free tutorials. SQL has many interfaces; try SQL Server or one of the online sandboxes to play around with small datasets.

Thanks for mentioning Tableau having a free option.  I think I found it...  For anyone else interested, go here:

https://public.tableau.com/s/

Shwaa

  • 5 O'Clock Shadow
  • *
  • Posts: 70
Re: Big Data & Data Analytics
« Reply #19 on: April 02, 2019, 10:00:37 PM »
I work in Data Warehouse as a Test Automation Engineer.  Heavy SQL/Python/Pandas and just now starting into cloud/AWS.  I also use Tableau to visualize test run results. It's a good field to be in, I have a marketing degree so this is way different than what I went to school for years ago.


Joel

  • Pencil Stache
  • ****
  • Posts: 887
  • Location: California
Re: Big Data & Data Analytics
« Reply #20 on: April 02, 2019, 11:24:43 PM »
I’m a huge fan of Power BI, in addition to the built-in Microsoft Excel add-ons Power Pivot and Power Query. They're a simple way for someone to start getting their feet wet with data analytics, instead of the makeshift Excel analysis typically performed.

Parizade

  • Handlebar Stache
  • *****
  • Posts: 1028
  • Location: Variable
  • Happily FIREd
Re: Big Data & Data Analytics
« Reply #21 on: April 03, 2019, 06:11:59 AM »
Here's a white paper you might find interesting

http://go.iiba.org/driving-industry-transformation

Archipelago

  • Pencil Stache
  • ****
  • Posts: 781
  • Age: 29
  • Location: NH
Re: Big Data & Data Analytics
« Reply #22 on: April 03, 2019, 06:21:44 AM »
Following. I was just thinking of starting a topic on this last night...funny.

I'm a Data Integrity specialist, 3 months into my job. I curate our customer database to ensure proper data accuracy within accounts. I also run different reports for profitability (but I'm not the one building the reports/coming up with the methods). I'm looking for ways to add value to my set of skills.

Anyone have suggestions specific to my type of role? The info above is fairly vague, so I'm happy to answer any questions on more detail.

wbarnett

  • 5 O'Clock Shadow
  • *
  • Posts: 58
  • Location: Denver, CO, USA
Re: Big Data & Data Analytics
« Reply #23 on: April 05, 2019, 11:25:16 PM »
I'm a data scientist, by way of a graduate degree in statistics. Most of the people in the field seem to have quantitative backgrounds, even if it's not in pure math/stats. In the megacorp world, there are distinctions between data analysts and data scientists, although in smaller companies the distinction is probably not as rigid.

Data analysts might use SQL, Excel, Tableau/PowerBI. Most of these can be learned for free.

Data scientists often spend most of their time programming in an environment like R / Python / Spark doing predictive modeling. The knowledge base required for 'data science' is a moving target (it's not well-defined at all), but could include a lot of high level math such as linear algebra, calculus, probability theory, and a high proficiency in coding. It's certainly possible to learn these topics on your own; others have mentioned a lot of great online resources. The shortest way to learn is definitely a graduate degree, though.

ApacheStache

  • Stubble
  • **
  • Posts: 119
  • Location: West By West West
Re: Big Data & Data Analytics
« Reply #24 on: April 06, 2019, 12:17:16 AM »
The knowledge base required for 'data science' is a moving target (it's not well-defined at all), but could include a lot of high level math such as linear algebra, calculus, probability theory, and a high proficiency in coding. The shortest way to learn is definitely a graduate degree, though.

+1 to this. As a software engineer currently spending evenings and weekends working toward a Bachelor's Degree in Data Science, from what I've observed there's no shortcut into Data Science. Also, like others have mentioned, Data Science, Data Analytics and Data Engineering require different skillsets and can encompass any number of tasks depending on which company you're applying to. I would recommend reviewing job openings and deciding which type of Big Data role makes the most sense for you. Once you have a good idea of what direction you want to take, try to incorporate some of that work into your current role and see if you can use that experience to your advantage when applying to future jobs.

With that said, it seems like many companies see the benefits that larger companies have derived from Data Science and Machine Learning, and they want in on the profits and the competitive advantage. The problem is that many companies that try to build a Data Science/Machine Learning program have no idea what they want to achieve or how to empower their Data Scientists to be successful and provide meaningful ROI.

goalphish2002

  • Bristles
  • ***
  • Posts: 290
Re: Big Data & Data Analytics
« Reply #25 on: April 08, 2019, 06:25:52 PM »
Update: I took an intro course to Python and one for web development.  Both were free.  I must say I sort of enjoyed the web development one more.  However, I know that data analysis will help me more in my current role.  Has anyone done both?

Also, I spoke with my programmer the other day and mentioned my interest.  We obviously use Microsoft Office and our accounting system is Microsoft Dynamics SL.  He stated that learning VBA would be something I could immediately use.  Thoughts?

Thanks so much for all the help.

katsiki

  • Handlebar Stache
  • *****
  • Posts: 2015
  • Age: 43
  • Location: La.
Re: Big Data & Data Analytics
« Reply #26 on: April 08, 2019, 07:08:34 PM »
Update: I took an intro course to Python and one for web development.  Both were free. 

Way to go!  You moved fast on this..

Mind sharing which site you used for the courses?

Archipelago

  • Pencil Stache
  • ****
  • Posts: 781
  • Age: 29
  • Location: NH
Re: Big Data & Data Analytics
« Reply #27 on: April 08, 2019, 08:08:20 PM »
Update: I took an intro course to Python and one for web development.  Both were free. 

Way to go!  You moved fast on this..

Mind sharing which site you used for the courses?

I'm not sure what course that was but I think Free Code Camp could be worth checking out? It's a nonprofit that gives tutorials, projects and interview questions to work on. I spent some time on it learning HTML and was super impressed by the interface.

https://www.freecodecamp.org/

goalphish2002

  • Bristles
  • ***
  • Posts: 290
Re: Big Data & Data Analytics
« Reply #28 on: April 09, 2019, 07:05:19 AM »
Update: I took an intro course to Python and one for web development.  Both were free. 

Way to go!  You moved fast on this..

Mind sharing which site you used for the courses?

https://flatironschool.com/

Note: This school has physical boot camps, as well as online training.  The two courses I have taken are free and very surface level.  However, it does let you gauge your interest.

mizzourah2006

  • Handlebar Stache
  • *****
  • Posts: 1063
  • Location: NWA
Re: Big Data & Data Analytics
« Reply #29 on: April 09, 2019, 07:06:17 AM »
I guess I can give my background as a data scientist as well.

I graduated with a PhD in Industrial Psychology, with a focus on quant measurement and psychometrics. I ended up doing a lot of pre-employment selection and people analytics work early on in my career. I then transitioned into a more traditional data science team at a Fortune 50 company with a large tech arm. Around this time I realized SPSS and SAS weren't really going to cut it anymore, so I taught myself Python along with SQL/HQL to leverage all the data we had available in Hadoop and Teradata. In this role I did a bit of everything. I leveraged ML to automate the classification of open-ended responses from consumers and customers so they could be put into dashboards for our end users. I also consulted on the appropriate methodologies to capture differences in consumer behaviors over time, etc. My technical title was senior consumer research scientist, but I sat in the Data Science team. Ironically enough, people from social sciences backgrounds (except Economics) couldn't be called data scientists, even though I was doing more "data science" than most of the people who were called data scientists. By that I mean most of the data scientists were only automating data aggregation and reporting, whereas I was doing that and consulting on measurement methods/techniques, sampling biases/pitfalls, etc. After that I moved back into the Industrial Psychology space at a smaller consulting firm, where we are leveraging advanced ML and some DL in the people space.

I definitely agree with others that an intro to CS is helpful: understanding Big O notation, basic sorting algorithms, etc. In my personal experience, though, I've been able to stand on the shoulders of giants and leverage packages like NumPy, Cython, etc., where the heavy lifting on Big O is already taken care of. But knowing why it is important conceptually, and understanding the implications O(log n) vs. O(n!) can have on your computational complexity and ability to scale, is definitely important.
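
To make the O(log n) vs. O(n!) point concrete, here is a rough sketch in plain Python. The function names and the one-million-item list are made up for illustration; the idea is just to count how many steps a straight scan (O(n)) and a binary search (O(log n)) take on the same sorted data, and to show how quickly a factorial grows by comparison.

```python
import math


def linear_search_steps(sorted_items, target):
    """Count the comparisons a left-to-right scan makes: O(n)."""
    steps = 0
    for item in sorted_items:
        steps += 1
        if item == target:
            break
    return steps


def binary_search_steps(sorted_items, target):
    """Count the loop iterations binary search makes: O(log n)."""
    lo, hi = 0, len(sorted_items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            break
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps


# Hypothetical data set: one million sorted integers.
n = 1_000_000
data = list(range(n))
target = n - 1  # last element, the worst case for the linear scan

linear_steps = linear_search_steps(data, target)
binary_steps = binary_search_steps(data, target)

print(f"linear scan:   {linear_steps} steps")
print(f"binary search: {binary_steps} steps")
# An O(n!) algorithm is hopeless long before n gets large:
print(f"20! = {math.factorial(20)}")
```

The scan needs a million comparisons while the binary search needs at most about 20, and an O(n!) approach is already in the quintillions at n = 20. That gap is why it pays to know which kind of algorithm a package is running for you, even if you never write one yourself.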

My advice would be it's definitely an area that's blowing up, but I do see a lot of companies hiring anyone they can get their hands on into roles like this and then trying to figure out how to leverage them. Data Scientists aren't the answer to everything. The old saying of Garbage In = Garbage Out still holds true. They can't turn poo into rainbows. I think in the coming years we will see a move away from DS as companies realize it's not the silver bullet they had thought it was. The beauty is that being data savvy will help you in almost any career. IMO it really helps you think about the problem(s) you have holistically. Do I actually have the data to solve this problem? Could the anecdotes I am seeing be biased because of some confounding variable? And so on.


goalphish2002

  • Bristles
  • ***
  • Posts: 290
Re: Big Data & Data Analytics
« Reply #30 on: April 27, 2019, 01:10:05 PM »
Changing direction on the education part:  What about an M.S. in Data Analytics?

scottish

  • Magnum Stache
  • ******
  • Posts: 2716
  • Location: Ottawa
Re: Big Data & Data Analytics
« Reply #31 on: April 27, 2019, 03:05:37 PM »
I have a technical question.

Is Hadoop still in use for big data?   Or has everyone moved to GPU clusters?

brute

  • Pencil Stache
  • ****
  • Posts: 691
Re: Big Data & Data Analytics
« Reply #32 on: April 29, 2019, 06:06:35 AM »
I have a technical question.

Is Hadoop still in use for big data?   Or has everyone moved to GPU clusters?

Not mutually exclusive. Hadoop is the de facto big data environment these days, but most of us are running our learning on GPUs, or at least training our neural nets on the GPUs and leveraging other hardware for the Zoo as needed.

https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/UsingGpus.html


mizzourah2006

  • Handlebar Stache
  • *****
  • Posts: 1063
  • Location: NWA
Re: Big Data & Data Analytics
« Reply #33 on: April 29, 2019, 09:07:47 AM »
I have a technical question.

Is Hadoop still in use for big data?   Or has everyone moved to GPU clusters?

Not mutually exclusive. Hadoop is the de facto big data environment these days, but most of us are running our learning on GPUs, or at least training our neural nets on the GPUs and leveraging other hardware for the Zoo as needed.

https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/UsingGpus.html

Yup, Hadoop for most people's purposes is a clustered SQL database that splits queries up across nodes and workers. GPUs are used to speed up training algorithms because they are naturally good at parallelizing matrix algebra. I'm not an expert in database management, but I don't see any reason you'd use GPUs when it comes to querying data. I'm not sure parallelizing across GPUs for this purpose is any more efficient than multi-threading CPUs.
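
For anyone wondering what "splits queries up across nodes and workers" looks like, here is a toy sketch in plain Python. It is not Hadoop's API. The rows, the function names, and the use of threads to stand in for worker nodes are all assumptions for illustration. It mimics a GROUP BY / SUM query: each worker totals its own slice of the data, then the partial results are merged.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rows of (region, sales_amount).
ROWS = [
    ("east", 10), ("west", 5), ("east", 7),
    ("north", 3), ("west", 8), ("north", 1),
]


def partition(rows, num_workers):
    """Deal the rows out round-robin, one slice per worker."""
    return [rows[i::num_workers] for i in range(num_workers)]


def partial_sum(chunk):
    """The 'map' step: each worker totals only its own slice."""
    totals = Counter()
    for region, amount in chunk:
        totals[region] += amount
    return totals


def distributed_group_sum(rows, num_workers=3):
    """Run the partial sums in parallel, then merge them (the 'reduce' step)."""
    chunks = partition(rows, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts together
    return dict(merged)


result = distributed_group_sum(ROWS)
print(result)
```

A real cluster does the same split, aggregate, and merge, but with the slices living on different machines, which is why this kind of work scales by adding nodes rather than by adding GPUs.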


Edit: To add that there is obviously more to the Apache Hadoop ecosystem, but most people using Hadoop are going to focus on the HQL data-query portion and use better-suited tools like Python or Spark for the compute in machine learning and data analytics.
« Last Edit: April 29, 2019, 09:11:59 AM by mizzourah2006 »

 
