Answer by Patrick Lilley:
You mentioned you'd have 15 to 20 hours per week for career development. You're fortunate; that is a wealth of time if used properly. I would recommend several things to accelerate your skills over the next one to two years. They're listed in order of priority, but do them in parallel:
1) Go to the UCI Machine Learning Repository (Google that title) and try lots of the problems. Set a goal to produce reasonable results on 100 data sets in two years. At first you'll take longer than the requisite one week per data set, but you'll get faster as you learn and as you develop your own tools to automate some tasks. Start with regressions in Excel, both as a baseline and to learn the limitations most people face; then focus on R and any add-on packages you'd like to try.
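Here is a rough sketch of the "start with a regression baseline" habit in Python, used here in place of the Excel and R steps the answer describes. It is an illustration, not the author's method. It fits a one-predictor least-squares line, roughly what Excel's LINEST returns for a single input, and compares its error against an even simpler baseline: always predicting the training mean. The data are synthetic and the function names are hypothetical. With a real UCI data set you would load the file instead and hold out a test split in the same way.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)) ** 0.5

# Synthetic data (an assumption for illustration): y = 3 + 2x plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [3 + 2 * x + rng.gauss(0, 1) for x in xs]

# Hold out the last 50 points so both approaches are scored on unseen data.
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

intercept, slope = fit_line(train_x, train_y)
mean_baseline = sum(train_y) / len(train_y)

line_rmse = rmse(test_y, [intercept + slope * x for x in test_x])
mean_rmse = rmse(test_y, [mean_baseline] * len(test_y))
print(f"intercept={intercept:.2f} slope={slope:.2f}")
print(f"mean-baseline RMSE={mean_rmse:.2f}  regression RMSE={line_rmse:.2f}")
```

Whatever fancier method you try next should be compared against both of these numbers on the same held-out rows.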
2) Get a mentor who'll sit with you for lunch two or three times a month. This mentor should be a data scientist, a statistician, or an engineer with significant machine learning experience. Talk about problems and what it takes to solve them. Show your mentor your UCI project results and your methods, and get feedback.
3) Read like hell: academic machine learning papers, Wikipedia pages on concepts and phrases you haven't heard of, and books on everything reasonably related to the theory and practice of data science.
4) Take online courses in related subjects.
Move fast; don't get caught up in making tiny incremental improvements or trying to beat published results. Just work through a large number of projects on UCI's relatively small data sets so that you're exposed to a wide variety of problems. Forget the "big" in "big data" until later. Right now you need to build an intuitive feel for solving problems and for how the various machine learning and statistical methods differ in both approach and results. Don't get caught up in academic theory, religious adherence to a single method, or the latest shiny new thing. Just solve problems, and pay attention to how different measures of success have implications for the practical use of the results.
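One concrete case where the measure of success matters is imbalanced classification. The sketch below uses hypothetical labels and predictions, not data from any UCI set. A model that never predicts the rare class can match the accuracy of one that catches every positive case, and only precision and recall show the difference. Defining precision as 0.0 when a model makes no positive predictions is a convention adopted here for simplicity.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == positive and p == positive)
    fp = sum(1 for a, p in zip(actual, predicted) if a != positive and p == positive)
    fn = sum(1 for a, p in zip(actual, predicted) if a == positive and p != positive)
    tn = sum(1 for a, p in zip(actual, predicted) if a != positive and p != positive)
    return tp, fp, fn, tn

def summarize(actual, predicted):
    """Accuracy, precision, and recall (precision/recall default to 0.0 when undefined)."""
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    return {
        "accuracy": (tp + tn) / len(actual),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical labels: 95 negatives followed by 5 positives (a rare event).
actual = [0] * 95 + [1] * 5

# Model A always predicts the majority class.
always_negative = [0] * 100

# Model B flags the last 10 rows: 5 false alarms plus all 5 real positives.
flags_ten = [0] * 90 + [1] * 10

for name, preds in [("always_negative", always_negative), ("flags_ten", flags_ten)]:
    print(name, summarize(actual, preds))
```

Both models score 0.95 accuracy, but model A has a recall of 0.0 and model B a recall of 1.0. Which one is "better" depends on what a missed positive costs compared with a false alarm.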
For the next two years, minimize the number of algorithms and approaches that you develop yourself; the only development you should do is scripts for pre-processing data and automating repetitive tasks. After the two years, you'll know whether you should invent and develop something. Be patient; build your foundation first.
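Below is a sketch of the kind of small, reusable pre-processing script the answer does endorse. It reads comma-separated numeric data and fills each missing cell with its column mean. Several assumptions apply. The "?" missing-value marker is one that some UCI files use, but check each data set's documentation. All rows are assumed to have the same number of numeric columns. Mean imputation is only one simple option. The function name `load_numeric_rows` is hypothetical.

```python
import csv
import io

def load_numeric_rows(text, missing_marker="?"):
    """Parse comma-separated numeric data, filling missing cells with the column mean."""
    reader = csv.reader(io.StringIO(text))
    rows = [[cell.strip() for cell in row] for row in reader if row]  # skip blank lines
    n_cols = len(rows[0])

    # Column means, computed only from the cells that are present.
    means = []
    for j in range(n_cols):
        present = [float(r[j]) for r in rows if r[j] != missing_marker]
        means.append(sum(present) / len(present) if present else 0.0)

    # Replace each missing cell with its column mean; convert the rest to float.
    return [
        [means[j] if r[j] == missing_marker else float(r[j]) for j in range(n_cols)]
        for r in rows
    ]

sample = """1.0,2.0,?
3.0,?,6.0
5.0,4.0,8.0
"""
data = load_numeric_rows(sample)
print(data)
```

Once a helper like this exists, each new data set costs a function call instead of an afternoon of manual cleanup. That is the speed-up the answer expects as you work through the 100 data sets.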
And always write yourself a little problem statement at the beginning of each project, and a statement at the end about how the result could benefit someone. Calculate the magnitude of that benefit, and how many people worldwide would experience it, compared with standard regression. As a mental exercise, ask yourself how much you could charge for that benefit if people bought your model. This process will help you stay focused on the practical aspects of your skills, and it will also teach you to define success metrics properly for any data set.
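The back-of-the-envelope benefit estimate described above can be written down once and reused for every project. This sketch assumes the benefit scales linearly with the error reduction per prediction relative to a standard-regression baseline. The function name, its parameters, and every number in the example are hypothetical placeholders to replace with your own estimates.

```python
def estimated_annual_benefit(baseline_error, model_error, cost_per_unit_error,
                             predictions_per_user, users):
    """Yearly value of reducing error versus a baseline, per user and in total.

    Assumes value is linear in the error reduction (a simplifying assumption).
    """
    reduction_per_prediction = baseline_error - model_error
    per_user = reduction_per_prediction * cost_per_unit_error * predictions_per_user
    return per_user, per_user * users

# Every number below is a made-up placeholder, not a real estimate.
per_user, total = estimated_annual_benefit(
    baseline_error=12.0,        # e.g. mean absolute error of standard regression
    model_error=9.0,            # the same metric for your model
    cost_per_unit_error=0.50,   # dollars lost per unit of error
    predictions_per_user=1000,  # predictions each user makes per year
    users=2000,                 # people who could plausibly use the model
)
print(f"per user: ${per_user:,.2f}  total: ${total:,.2f}")
```

The per-user figure is a rough guide for the "what could I charge" exercise. If it comes out tiny, the chosen error metric is probably not the one that matters to the people using the model.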
Do this for two years and you will advance your data science skills faster than 99% of new practitioners, and you will have a killer body of work as an addendum to your resume.
You've made a great career choice. Now outrun your peers. Good luck.