Google Track

Friday, November 14, 2014

22 tips for better data science


These tips are provided by Dr Granville, who brings 20 years of varied data-intensive experience working with successful start-ups, small companies across various industries, and eBay, Visa, Microsoft, GE and Wells Fargo.
1.     Leverage external data sources: tweets about your company or your competitors, or data from your vendors (for instance, customizable newsletter eBlast statistics available via vendor dashboards or by submitting a ticket).
2.     Nuclear physicists, mechanical engineers, and bioinformatics experts can make great data scientists.
3.     State your problem correctly, and use sound metrics to measure yield provided by data science initiatives.
4.     Use the right KPIs (key metrics) and the right data from the beginning, in any project. Changes due to bad foundations are very costly. This requires careful analysis of your data to create useful databases.
5.     Fast delivery is better than extreme accuracy. All data sets are dirty anyway. Find the perfect compromise between perfection and fast return.
6.     With big data, strong signals (extremes) will usually be noise. Here's a solution.
7.     Big data has less value than useful data.
8.     Use big data from third-party vendors for competitive intelligence.
9.     You can build cheap, great, scalable, robust tools pretty fast, without using old-fashioned statistical science. Think about model-free techniques.
10.  Big data is easier and less costly than you think. Get the right tools! Here's how to get started.
11.  Correlation is not causation. This article might help you with this issue.
12.  You don't have to store all your data permanently. Use smart compression techniques, and keep only statistical summaries for old data. Don't forget to adjust your metrics when your data changes, to keep them consistent for trending purposes.
13.  A lot can be done without databases, especially for big data.
14.  Always include EDA and DOE (exploratory data analysis / design of experiments) early on in any data science project. Always create a data dictionary. And follow the traditional life cycle of any data science project.
15.  Data can be used for many purposes:
·        quality assurance
·        to find actionable patterns (stock trading, fraud detection)
·        for resale to your business clients
·        to optimize decisions and processes (operations research)
·        for investigation and discovery (IRS, litigation, fraud detection, root cause analysis)
·        machine-to-machine communication (automated bidding systems, automated driving)
·        predictions (sales forecasts, growth and financial predictions, weather)
16.  Don't dump Excel. Embrace light analytics.
17.  Data + models + gut feelings + intuition is the perfect mix. Don't remove any of these ingredients in your decision process.
18.  Leverage the power of compound metrics: KPIs derived from database fields that have far better predictive power than the original database metrics. For instance, your database might include a single keyword field that does not discriminate between user query and search category (sometimes because data comes from various sources and is blended together). Detect the issue, and create a new metric called keyword type - or data source. Another example is IP address category, a fundamental metric that should be created and added to all digital analytics projects.
19.  When do you need true real-time processing? When fraud detection is critical, or when processing sensitive transactional data (credit card fraud detection, 911 calls). Other than that, delayed analytics (with a latency of a few seconds to 24 hours) is good enough.
20.  Make sure your sensitive data is well protected. Make sure your algorithms cannot be tampered with by criminal hackers or business hackers (who spy on your business and steal everything they can, legally or illegally, jeopardizing your algorithms, which translates into severe revenue loss). An example of business hacking can be found in section 3 in this article.
21.  Blend multiple models together to detect many types of patterns. Average these models. Here's a simple example of model blending.
22.  Ask the right questions before purchasing software.
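
To illustrate tip 6, here is a minimal Python sketch (not the solution the post links to, which is not reproduced here). It correlates a random target with features that are pure noise and reports the strongest correlation found. The function names, the sample size of 30 rows, and the feature counts are all assumptions chosen for illustration. The more noise features you scan, the more impressive the "best" one tends to look.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_spurious_correlation(n_features, n_rows=30, seed=0):
    """Largest |correlation| between a random target and pure-noise features."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    for _ in range(n_features):
        # Every feature is independent noise, so any correlation is spurious.
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        best = max(best, abs(pearson(feature, target)))
    return best


few = strongest_spurious_correlation(n_features=5)
many = strongest_spurious_correlation(n_features=2000)
print(f"best of 5 noise features:    {few:.2f}")
print(f"best of 2000 noise features: {many:.2f}")
```

Because both runs share the same seed, the larger scan includes the smaller one, so its strongest correlation can only be equal or higher, even though no feature carries any real signal.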
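
As a sketch of tip 12 (keeping statistical summaries instead of raw history), the Python class below maintains count, mean, variance, minimum and maximum one value at a time, so the raw values can be discarded once summarized. It uses Welford's online method, which is one possible choice and not necessarily what the author has in mind; the class name and the sample values are hypothetical.

```python
class RunningSummary:
    """Count, mean, variance, min and max, updated one value at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean
        self.minimum = float("inf")
        self.maximum = float("-inf")

    def add(self, value):
        # Welford's update: numerically stable, no need to keep old values.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def variance(self):
        """Sample variance (0.0 when fewer than two values were seen)."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


summary = RunningSummary()
for value in [12.0, 15.0, 11.0, 14.0]:  # made-up daily figures
    summary.add(value)
print(summary.count, summary.mean, summary.variance)
```

A summary like this costs a handful of numbers per metric per period, which is usually enough to keep trend lines going after the detailed records are archived or deleted.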
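
Tip 18 mentions IP address category as a derived metric. The post does not define the categories, so the labels below (loopback, multicast, private, public, invalid) are assumptions for illustration only; a production version would more likely separate things such as corporate proxies, mobile carriers or known bots. The sketch relies only on Python's standard `ipaddress` module.

```python
import ipaddress


def ip_category(address):
    """Map a raw IP string to a coarse, hypothetical category label."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return "invalid"
    # Loopback is tested first because loopback addresses also count as private.
    if ip.is_loopback:
        return "loopback"
    if ip.is_multicast:
        return "multicast"
    if ip.is_private:
        return "private"
    return "public"


for raw in ["10.0.0.1", "8.8.8.8", "127.0.0.1", "not-an-ip"]:
    print(raw, "->", ip_category(raw))
```

The derived label can then be stored as its own column next to the raw address and used for segmentation or filtering.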
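
For tip 21, a bare-bones sketch of model blending in Python: two deliberately simple models (a constant mean and a least-squares line) are fitted to the same data and their predictions are averaged. This is not the example the post links to; the function names and the toy data are hypothetical, and equal weights are an assumption.

```python
def fit_mean(xs, ys):
    """Model 1: always predict the average of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Model 2: ordinary least-squares straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def blend(models, weights=None):
    """Weighted average of the models' predictions (equal weights by default)."""
    if weights is None:
        weights = [1.0 / len(models)] * len(models)
    return lambda x: sum(w * m(x) for w, m in zip(weights, models))


xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]
blended = blend([fit_mean(xs, ys), fit_line(xs, ys)])
print(blended(5))
```

In practice the component models would be chosen because they capture different kinds of patterns, and the weights could be tuned on held-out data instead of being equal.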




12 comments:

Unknown said...

A strong data scientist is someone who can clearly and fluently translate their technical findings to non-technical teams. You also need to have a solid understanding of the industry you’re working in, including what problems your company is trying to solve. Be open-minded and think outside of the box.

Unknown said...

Hi everyone, this blog is very helpful for me. We are also a provider of email marketing services, B2B email marketing services, database services, B2B email mailing lists, and an email panel with the best data support worldwide.

kumar said...

I agree with most of the points in this article and it’s great without any doubt. I think you made some good points in Features also. I would like to suggest your blog in my friend’s circle. Here I find everything in detail. I hope I will see this type of post again in your blog.

Unknown said...

A great content and very much useful to the visitors. Looking for more updates in future.


SARA MARK said...

This is an informative post and it is very useful and knowledgeable. therefore, I would like to thank you for the efforts you have made in writing this article.

devidnayana said...



I am very happy to have read this blog post because it is well written and on a good topic. Thanks for sharing valuable information.


simbu said...

I love the blog. Great post. It is very true, people must learn how to learn before they can learn. lol i know it sounds funny but its very true. . .


gowsalya said...

Some us know all relating to the compelling medium you present powerful steps on this blog and therefore strongly encourage contribution from other ones on this subject while our own child is truly discovering a great deal. Have fun with the remaining portion of the year.

mounika said...

Nice post..


preethi minion said...

nice post!

shri said...

good blogs...


Ramesh Sampangi said...

Boost your Data Science career skills by joining the advanced Data Science Training in Hyderabad at AI Patasala.