Frequently Asked Questions: Data Scientist

This post is written with tongue-in-cheek humour, and also to answer (almost) all the frequent questions I get regarding Data Science. If you’re a person who does not do well with humour, PLEASE DO NOT PROCEED. I’m not responsible for anyone getting offended; you are responsible for your own emotions.

I feel like I should dedicate this blog. So this blog is dedicated to those who inspired it and those who will not read it.

How to read this post - If you can read this sentence, you are pretty much doing it right.

I’ve been getting quite a lot of questions about getting started in Data Science and Machine Learning for some time now. Some of these questions are getting repetitive, so I thought that putting them together in one post would be the right way to go. So here it is…

What is Data Science ? Is it edible ?

Firstly, NO. Data Science is the art of drawing insights from data. Insights can take many forms: knowledge, features, patterns.

Science Biyatch
Me when I see a bunch of data. Source: GIPHY

Why are you weird ?

Sí.

As a data scientist, do you fight self-aware sentient robots every day ?

No. But if you come across one, let him/her/it (or whatever identity it chooses to assume) know that I’m interested.

Will Deep Learning lead to Skynet ?

If by Skynet you mean AGI, it is best answered by someone qualified:

We need a Goldilocks Rule for AI:
- Too optimistic: Deep learning gives us a clear path to AGI!
- Too pessimistic: DL has limitations, thus here's the AI winter!
- Just right: DL can’t do everything, but will improve countless lives & create massive economic growth.
— Andrew Ng (@AndrewYNg) August 17, 2018

That said, Deep Learning models depend on their data, so if the data is biased or corrupted, the model will reflect it. For example, if you decide to build a chatbot from a dataset of psychopathic responses, that’s the kind of response you should expect.
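
A toy sketch of the point, nowhere near a real chatbot: the "model" below just memorises the most common label it was shown. The labels and the function name are made up for illustration.

```python
from collections import Counter


def train_majority_model(labels):
    """'Train' the simplest possible model: remember the most common label."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical, deliberately skewed training labels for chatbot replies.
biased_labels = ["rude"] * 9 + ["polite"] * 1

# The model parrots whatever dominated its training data.
model_output = train_majority_model(biased_labels)
print(model_output)
```

Real models are far more subtle about it, but the failure mode is the same: skew in, skew out.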

So what do you work on ?

Magic
Deal in Black Magic

I magically solve all worldly problems with AI … in my dreams. In real life, I work on applying Deep Learning to solve problems in machine vision.

Which programming language do you prefer ?

Mostly Python. But don’t be afraid to mix it up with R, C++, JavaScript, Vanilla, Chocolate if needed… Oh wait! Sprinkle it up really.

Let it Rain
Me approaching a problem

What are the key skills of a Data Scientist ?

As a Data Scientist (as suggested by Jérôme Blanchet):

You can import keras and feel like a God.
Do your Ph.D. by copy-pasting StackOverflow.
Learn how Kaggle is representative of the real world.
Learn how grid-search is useless with a $100,000 multi-GPU rig.
Predict Titanic survival and call yourself a scientist.

But on a serious note, the most important skill would be patience. Be ready to watch your model train for a week, only to crash and burn. Things like Statistics, Linear Algebra and Programming can be learned, but you need to be inherently patient.

Do I need to have a deep knowledge of Linear Algebra and Calculus ?

This is difficult to answer. There are extremists who believe that one should start with basic maths, programming and stats, and only then delve into deep learning; and then there are those who believe that practical implementation is the only thing that is needed.
My advice is to follow a middle path. If something is not clear right off the bat, try using a readily available implementation, then slowly work through its theory. You can correlate the two more easily that way.

Can I do Deep Learning without knowing Linear Algebra and Calculus ?

I would suggest against it. A magician should be able to perform a magic trick, but he should also know how to handle it if things go awry. Since Deep Learning is basically glorified Linear Algebra, you should know at least the important parts.
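
To make the "glorified Linear Algebra" bit concrete, here is a minimal sketch of a single dense layer in plain Python: a matrix-vector product, a bias add, and a ReLU. The weights, bias and inputs are made-up numbers, and the helper names are mine, not from any framework.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def relu(vector):
    """Element-wise ReLU: negative values become zero."""
    return [max(0.0, v) for v in vector]


def dense_layer(weights, bias, inputs):
    """One fully connected layer: relu(W @ x + b)."""
    pre_activation = [p + b for p, b in zip(matvec(weights, inputs), bias)]
    return relu(pre_activation)


# Hypothetical 2x2 layer, purely for illustration.
weights = [[1.0, -2.0], [0.5, 0.5]]
bias = [0.0, 1.0]
inputs = [3.0, 1.0]

output = dense_layer(weights, bias, inputs)
print(output)
```

Stack a few of these and add calculus for the gradients, and you have most of a neural network.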

Why did you get into the field ?

It occurred to me once, while I was sleeping.
Never forgetti the motivation (Section 2) for the dropout paper.

Let it Rain
TFW Models don’t work

How did you get into the field ?

These people came down in flying saucers and bestowed me with knowledge. Really.
Well, it involved a lot of Google searches, a lot of downvotes on Stack Overflow and making a lot of mistakes. That includes, but is not limited to, staring at NaN losses, loss curves doing jumping jacks on the plot axis, and models producing a single type of output.
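
For the NaN losses in particular, a small guard saves a lot of staring. This is a minimal sketch, assuming you get one scalar loss per training step; `check_loss` and the loss history are hypothetical, not part of any library.

```python
import math


def check_loss(step, loss):
    """Raise early if the loss is NaN or infinite instead of training on garbage."""
    if math.isnan(loss) or math.isinf(loss):
        raise ValueError(f"loss became {loss} at step {step}")
    return loss


# Hypothetical loss history; the last value mimics a run that diverged.
losses = [0.9, 0.7, float("nan")]

failed_at = None
for step, loss in enumerate(losses):
    try:
        check_loss(step, loss)
    except ValueError:
        failed_at = step  # remember where things went wrong and stop
        break

print(failed_at)
```

It won’t tell you why the loss blew up, but at least you find out at the step where it happened rather than a week later.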

Never Forgetti
Every Project ever

How do you keep up to date with the latest happenings in AI ?

I read a lot. Like a lot. Including, but not limited to, blogs by experienced Data Scientists and ML Researchers, the latest papers, and books.
A lot of action happens on Twitter too. * wink *

Sometimes I feel like...
I’m something of a scientist myself

Do I really need GPUs for deep learning ?

You don’t need a GPU to get started. Small models run in reasonable time on a CPU.

GPU. GPU. GPU.
Cries in Broke. Source

But if you want to experiment with newer, deeper models then yes, you’ll need a GPU. No need to burn a hole in your pocket; something basic would do, or you can “borrow” your friend’s gaming rig.

Is the universe a simulation and half of the world sleeps at night because the simulation does not have enough computing power to simulate all life-forms at the same time ? If yes, is it possible to hack it and make cheat codes ?

Wait. What ?! … What ?!?!

How do I get started with Deep Learning ?

Hit the ground running. Crash and burn. Scour Kaggle, the UCI repository, or capture live data using the Twitter API for datasets, and try making sense of it. Try applying machine learning to problems that already exist. You’ll get it. There is no single defined path; you have to keep at it. Remember:

“Every great developer you know got there by solving problems they were unqualified to solve until they did it.” - Patrick McKenzie

AI is Electricity
Recurrent reacts only

How do I get into Deep Learning research ?

Here is the answer from someone qualified: Yann LeCun’s answer to “What’s your advice for an undergraduate student who aspires to be a research scientist in deep learning or related field one day?”

My two cents: Try to find internships or just work with professors who are working in the field of your interest. You’ll get exposure to the research environment. After that try to move up with your ideas.

What is the difference between ML in research and in industry ?

Academic research is driven more by novelty. Experience-wise, it means writing messy code to try different experiments; depending on the team size, you’re most probably writing code that will never see the light of day. You may deal with micro-optimization on a specific platform, pushing the performance of the current state of the art and dealing with bad results.
Industry, on the other hand, is driven by usefulness. You’ll be working in a team, so your code should be clean, concise and explainable, following the coding guidelines of the organization. You’ll also spend time on questions like maximum allowable latency, the user’s privacy and model serving through services.

What are the best blogs to read in data science?

Well, there are a lot of well-written blogs, but here are a few off the top of my head:

Data Science Central
No Free Hunch
Machine Learning Mastery
The Shape of Data
Sebastian Raschka’s blog
PyImageSearch, mostly related to computer vision problems.
Berkeley Artificial Intelligence Research blog, a relatively new one.
DeepMind, OpenAI and Google AI are gold mines for the latest work.

If you know of any great blogs, please write to me in the comments.

How long do you think a person should take to master a data science skill, and what is the best pathway to do so?

The competencies required for Data Science, or any field for that matter, will depend on the time one is willing to put in. As I have said before, if it is a learnable skill, it can be learned. Data Scientists aren’t inherently born with the capacity to understand cost functions, convolutions or polynomial splicing for feature smoothing.

Born a DS
…aren’t born. They’re made.

As for the path to follow, there is no single straightforward way. One pointer though: you know how in a game, if the enemies stop appearing, you’re going down the wrong path? Same here, except the enemies are challenges, there is no respawn, there is a sheer lack of background music, NPCs are much harder to deal with, the map is huge, the character customization available is pretty boring and 18 years of paid tutorials still wouldn’t prepare you for the real game. Great graphics though.

Is data science another bubble like software development that could saturate or burst sooner or later, since the resources available are limited?

There is no bubble; it’s demand and supply. With most businesses automating their operations, there will always be some demand for software engineers to grow and maintain the complex systems.
The real question, however, is quality. Colleges pump out engineers a dime a dozen, with a large gap between industry standards and what the courses teach. Every engineer should look at themselves as a company and develop new skills as they come into demand. Those are the ones who’ll survive if there is ever a saturation.

What do you do if your models fail ?

The question is not “if”, the question is “when”. I get over it once I’m done being dramatic.

Tears
Pulled a sneaky on you, didn’t I ?

How do you motivate yourself to work ?

I don’t. Motivation is fickle, so I’d rather force myself to cultivate working as a habit. So no matter what happens, I do the things I gotta do by force of habit. Most people have been/are in a situation they don’t like an aspect of, but only a few go ahead and do something to change it. Be one of the few: change what you don’t like, else you’re just a part of the problem. Give your 100%. 110% is impossible. Only idiots recommend that.

Tears
Be the change

Which Deep Learning framework should I use ?

Every framework is built with certain use cases in mind, and with experience you’ll learn to choose the right one for yours. For example, TensorFlow is really verbose, and reasonably stable considering the v2.0 release is coming up (as of writing this), but it is good for having control over your entire pipeline, including expensive data loading. To get from idea to implementation faster, many prefer Keras. PyTorch is gaining momentum quickly.

Huehuehuehue
Make the noble choice: use Theano. Source

What tool stack do you use ?

pandas, NumPy, Matplotlib, Dask, TensorFlow, Keras, SciPy and SQL for queries, to name a few. It is surprising what you can achieve with these basic libraries.
You can try to use a cosmic ray laser to flip bits on a remote computer to program your model though, I ain’t judging.

Which researchers do you follow ?

Andrej Karpathy, Ian Goodfellow, Andrew Ng, Yoshua Bengio, DeepMind, OpenAI, Google Brain to name a few.

What are your favourite papers ?

The AlexNet paper. More recently, A. Vaswani’s “Attention Is All You Need” paper and Xianyan Jia’s paper where they trained on ImageNet in 4 minutes. Random fun fact: they used 2048 P40 GPUs; that’s $12 million worth of GPUs.
Also Hinton’s Trilogy:

Transforming Autoencoders.
Dynamic Routing between Capsules.
Matrix Capsules with EM Routing.

But the trick is to stay up to date.

How do I measure myself as a Data Scientist ?

You should be able to compile Caffe from scratch. * evil laughter *
Also I’ll leave this here:

Huehuehuehue. Source

Do you have a formal degree in Machine Learning ?

No. I’m what they call self-taught. Although technically everyone is self-taught, because people can explain it to you, but they cannot understand it for you.

Huehuehuehue
Shake. Shake.

How do you get a Data Science job right out of college ?

Huehuehuehue
¯\_(ツ)_/¯

Whew. This is a toughie. The question is subjective, although if I had to, I’d simplify it into two steps:

Data Scientists are in demand in the industry, so you need to cultivate the skills that the industry requires.
Secondly, “communicate”. It’s no use having skills if no one knows about them. Write about your projects. Show them to experienced Data Scientists and ask for their advice.

But try to explore your own way, one that complements your particular skills.

Can I buy you a coffee ?

Yesh.