Hi, can I jump in on this?
I'm an academic, working on my PhD in forestry. I'm interested in transitioning to data science, particularly in an industry where I can leverage my educational background. For example, there are startups that use Google Earth Engine and remote sensing analytics to identify where, geographically, purchasing carbon offset credits yields the highest return on investment.
Two pillars of forestry are biometry and sampling, so I have a decent head for frequentist statistics. I know how to apply basic stats: imputation methods like KNN and random forests, kriging, geostatistics, point pattern analysis, time series, cross-validation, classification and regression trees, multivariate ordination (PCA, DCA, NMDS, etc.), linear/nonlinear regression, categorical tests. What I don't know is Bayesian stats or the methods that data science types prefer (boosting, LASSO, NLP, recommender systems, etc.). I've taken at least half a dozen grad-level stats courses and they don't mention those.
I use R and Python daily (and often half a dozen other languages) and understand computer/network/internet architecture with basic competency (enough to use compute servers and write API clients), so I have no problem writing fast, efficient code (for an interpreted language). data.table/tidyverse/rhadoop/map-reduce FTW. I even have a package on CRAN (that gets over 1K downloads/month, which I feel is good for how niche it is).
Thing is, I've never taken a class or read a book on computer science or math. No, I didn't even take college algebra or trig. The reason I'm asking is that data scientist interviews seem to be a lot like software dev interviews, in that algorithms are a big part of them. Hashes? Linked lists? Sorting algorithms? That's all gibberish to me.
But, for example, at work I have sifted through TB of data on a SQL database server, fit families of quantile regression models, backtested for the best models from those families on a wholly separate dataset, and then validated my models against similar models on test data. The goal was to create predictive models and hand them off to the software dev folks, who put them on a website. No clue if that is 'data science' or what exactly. At the time, my role was forest biometrist.
I feel like I could perform well in a data science (or data analyst or GIS analyst) position, but am I lacking by not knowing math or computer science fundamentals? If so, how would I go about self-teaching? I have some calculus and discrete math ebooks that I haven't cracked open yet.
I also know some basic Django and Shiny. Given that everything is a web app these days, should I dive deeper into these? I volunteer for a local Code for America group, and everything we produce is a web app that often has a Python backend and JavaScript front-end. That (and my last job) gives me the impression I should keep practicing on projects that involve some web-based product. Then again, I know a lot of analyses turn into white papers or slide decks. So my question is, what sort of presentation formats do y'all use most? Static HTML? Markdown/LaTeX PDF? Slides? Dynamic websites with things like leaflet or plotly?
Does the fact that I present at conferences and publish articles for various audiences help? I've heard that applicants from traditional math/computer science backgrounds lack communication skills (e.g., communicating complex information simply) across verbal/written media.