Google Track

Saturday, January 9, 2016

Treehouse courses

Learning with Treehouse for only 30 minutes a day can teach you the skills needed to land the job that you’ve been dreaming about.

I can honestly say that besides technical knowledge, Treehouse also gave me a certain mindset, and while I generally am a very ambitious person, with each completed course I felt energized to better myself even more.

Is there any advice you’d like to share with new students who are aspiring developers?

I’ve been in this industry for only a little over two years, and I could already write tens of pages about what aspiring developers should and should not do, but here are a few tips:

If you’re offered a project you know nothing about, take it, you’ll learn after.
Be passionate about it. If your brain doesn’t get “turned on” by new concepts, libraries, programming languages, you should not be doing it.
Get ready to learn continuously. The web is like the Universe. Ever expanding. Consequently the same has to happen to your knowledge and skill-set.
Teach others and code forward (code forward is a concept I came up with, and it comes from “pay it forward”; it basically means doing a few projects for free once in a while for people who deserve it).
Accept failure as a necessary step in self-betterment.
Research before asking questions and know when to ask. Putting in hours or even days to find the solution will always be more rewarding in the long run than asking a question on Stack Overflow and waiting to be spoon-fed the answer. That being said, asking has its place, especially in a team-based environment or when deadlines are (as they often are) very tight.
Go to as many interviews as you can. If you tell a recruiter (agent) that you’d like to go to the interview even if just for the sake of the experience, they’ll respect you for that and will do their best to land you an interview. That being said, don’t rely on recruitment agencies alone; show some initiative and contact companies on your own too. It might just be the detail that gets you hired.
Finally, keep learning, especially when you feel discouraged.

Friday, November 14, 2014

22 tips for better data science

These tips are provided by Dr Granville, who brings 20 years of varied data-intensive experience working with successful start-ups, small companies across various industries, and eBay, Visa, Microsoft, GE and Wells Fargo.
1. Leverage external data sources: tweets about your company or your competitors, or data from your vendors (for instance, customizable newsletter eBlast statistics available via vendor dashboards, or via submitting a ticket).
2. Nuclear physicists, mechanical engineers, and bioinformatics experts can make great data scientists.
3. State your problem correctly, and use sound metrics to measure yield provided by data science initiatives.
4. Use the right KPIs (key metrics) and the right data from the beginning, in any project. Changes due to bad foundations are very costly. This requires careful analysis of your data to create useful databases.
5. Fast delivery is better than extreme accuracy. All data sets are dirty anyway. Find the perfect compromise between perfection and fast return.
6. With big data, strong signals (extremes) will usually be noise. Here's a solution.
7. Big data has less value than useful data.
8. Use big data from third-party vendors for competitive intelligence.
9. You can build cheap, great, scalable, robust tools pretty fast, without using old-fashioned statistical science. Think about model-free techniques.
10. Big data is easier and less costly than you think. Get the right tools! Here's how to get started.
11. Correlation is not causation. This article might help you with this issue.
12. You don't have to store all your data permanently. Use smart compression techniques, and keep statistical summaries only, for old data. Don't forget to adjust your metrics when your data changes, to keep consistency for trending purposes.
13. A lot can be done without databases, especially for big data.
14. Always include EDA and DOE (exploratory data analysis / design of experiments) early on in any data science project. Always create a data dictionary. And follow the traditional life cycle of any data science project.
15. Data can be used for many purposes:
    · quality assurance
    · to find actionable patterns (stock trading, fraud detection)
    · for resale to your business clients
    · to optimize decisions and processes (operations research)
    · for investigation and discovery (IRS, litigation, fraud detection, root cause analysis)
    · machine-to-machine communication (automated bidding systems, automated driving)
    · predictions (sales forecasts, growth and financial predictions, weather)
16. Don't dump Excel. Embrace light analytics.
17. Data + models + gut feelings + intuition is the perfect mix. Don't remove any of these ingredients in your decision process.
18. Leverage the power of compound metrics: KPIs derived from database fields that have far better predictive power than the original database metrics. For instance, your database might include a single keyword field but not discriminate between user query and search category (sometimes because data comes from various sources and is blended together). Detect the issue, and create a new metric called keyword type, or data source. Another example is IP address category, a fundamental metric that should be created and added to all digital analytics projects.
19. When do you need true real-time processing? When fraud detection is critical, or when processing sensitive transactional data (credit card fraud detection, 911 calls). Other than that, delayed analytics (with a latency of a few seconds to 24 hours) is good enough.
20. Make sure your sensitive data is well protected. Make sure your algorithms cannot be tampered with by criminal hackers or business hackers (spying on your business and stealing everything they can, legally or illegally, and jeopardizing your algorithms, which translates into severe revenue loss). An example of business hacking can be found in section 3 in this article.
21. Blend multiple models together to detect many types of patterns. Average these models. Here's a simple example of model blending.
22. Ask the right questions before purchasing software.
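
As an illustration of tip 21, here is a minimal sketch of model blending by simple averaging. It is not Dr Granville's own example; the two toy models (`mean_model`, `linear_model`) and the `blend` helper are hypothetical names chosen for this sketch, and a real project would blend richer models.

```python
from statistics import mean


def mean_model(train_y):
    """Baseline model: always predicts the mean of the training targets."""
    avg = mean(train_y)
    return lambda x: avg


def linear_model(train_x, train_y):
    """Least-squares straight line fitted to a single feature."""
    mx, my = mean(train_x), mean(train_y)
    covariance = sum((x - mx) * (y - my) for x, y in zip(train_x, train_y))
    variance = sum((x - mx) ** 2 for x in train_x)
    slope = covariance / variance
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def blend(models):
    """Blend models by averaging their predictions for the same input."""
    return lambda x: mean([model(x) for model in models])


train_x = [1, 2, 3, 4]
train_y = [2, 4, 6, 8]
blended = blend([mean_model(train_y), linear_model(train_x, train_y)])
print(blended(5))
```

Equal-weight averaging is the simplest choice; weighting each model by its validation accuracy is a common refinement.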

Wednesday, November 5, 2014

Top 10 Big Data Technologies Of Present Times

Wednesday, October 22, 2014: Big Data is a broad concept that comprises several trends and technology developments. Over the last few years, Big Data technologies have been getting due attention, and there have been several trends and innovations in this space. Here we'll discuss the top ten emerging Big Data technologies.

1. Column-oriented databases: 

Traditional row-oriented databases are excellent at online transaction processing, but their query performance falls short as data volumes grow. Column-oriented databases store data by column rather than by row, which allows much higher data compression and faster query times.
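
To make the row-versus-column idea concrete, here is a minimal in-memory sketch. The sample `rows`, the `to_columns` helper and the run-length encoding step are assumptions made for illustration; they do not reflect how any particular column store is implemented.

```python
from itertools import groupby

# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "country": "US", "amount": 20},
    {"order_id": 2, "country": "US", "amount": 35},
    {"order_id": 3, "country": "US", "amount": 15},
    {"order_id": 4, "country": "DE", "amount": 50},
    {"order_id": 5, "country": "DE", "amount": 10},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    return {name: [record[name] for record in records] for name in records[0]}


def run_length_encode(values):
    """Compress consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)

# An aggregate query touches only the one column it needs.
total_amount = sum(columns["amount"])

# Values within a column are similar, so they tend to compress well.
encoded_country = run_length_encode(columns["country"])

print(total_amount)
print(encoded_country)
```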

2. Streaming Big Data analytics: 

There are several projects in this space, including Storm, Spark, DataTorrent, Spring XD and SQLstream. Apache Storm is an open source distributed real-time computation system that simplifies processing streams of data in real time. Spark is a data processing platform that is compatible with Hadoop. DataTorrent is a real-time streaming platform that enables businesses to process data as it arrives. Spring XD supports streams for event-driven data, while SQLstream provides a distributed stream processing platform for streaming analytics, visualization and continuous integration of machine data.

3. Schema-less databases, or NoSQL databases: 

This database category includes key-value stores and document stores. These databases focus on the storage and retrieval of large volumes of unstructured, semi-structured or even structured data.
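
The following is a minimal sketch of what "schema-less" means in practice, using a plain Python dictionary as a stand-in for a document store. The `put` and `find` helpers and the sample documents are hypothetical and are not the API of any real NoSQL product.

```python
def put(store, key, document):
    """Save a document under a key; no schema is declared or enforced."""
    store[key] = document


def find(store, field, value):
    """Return the keys of all documents whose field equals the given value."""
    return [key for key, doc in store.items() if doc.get(field) == value]


store = {}

# Documents in the same store can carry different fields.
put(store, "user:1", {"name": "Ana", "city": "Lisbon"})
put(store, "user:2", {"name": "Ben", "city": "Lisbon", "tags": ["admin"]})
put(store, "sensor:7", {"reading": 21.5, "unit": "C"})

print(store["user:1"])                 # key-value style lookup
print(find(store, "city", "Lisbon"))   # document style query on a field
```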

4. SQL-in-Hadoop: 

This category includes Apache Hive, Shark, Apache Drill, Presto and Phoenix, among many others. Hive facilitates querying and managing large datasets in distributed storage. Shark is a data warehouse system that supports Hive's query language. Apache Drill is an Apache incubation project designed for scalability, and it's backed by MapR. Presto is an open source distributed SQL query engine, and Phoenix is an open source SQL query engine for Apache HBase.

5. MapReduce: 

It's a programming paradigm that allows job execution to scale massively across thousands of servers or clusters of servers. It consists of two tasks: the Map task converts an input dataset into a set of key/value pairs (tuples), and the Reduce task combines those tuples into a smaller set of tuples.
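
The two tasks can be sketched with the classic word-count example. This is a single-process illustration of the paradigm, not Hadoop code; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are chosen for this sketch, and a real framework would run the phases in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map task: emit a (word, 1) pair for every word in the document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce task: combine all values for one key into a single pair."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


print(word_count(["big data", "big clusters"]))
```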

6. Hadoop: 

Hadoop is an open source platform for handling Big Data that can work with multiple data sources. It has many applications, and it's largely used for constantly changing data, such as location-based data from weather or traffic sensors, web-based or social media data, or machine-to-machine transactional data.

7. PIG: 

PIG brings the Hadoop project closer to developers and business users. It provides a Perl-like language that allows query execution over data stored on a Hadoop cluster. PIG started as a project at Yahoo!, but now it's completely open source.

8. Big Data Lambda Architecture: 

Lambda Architecture is a hybrid platform that combines real-time data and pre-computed data to provide a near-real-time view of the data at all times. Frameworks that implement it include Summingbird by Twitter and Lambdoop.
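
A minimal sketch of the idea, assuming a page-view counting use case: a batch view is precomputed from historical events, a speed view is updated as new events arrive, and a query merges the two. All names here (`build_batch_view`, `update_speed_view`, `query`) are hypothetical and unrelated to the Summingbird or Lambdoop APIs.

```python
from collections import Counter


def build_batch_view(master_events):
    """Batch layer: precompute page-view counts from all historical events."""
    return Counter(event["page"] for event in master_events)


def update_speed_view(speed_view, event):
    """Speed layer: update counts incrementally as each new event arrives."""
    speed_view[event["page"]] += 1


def query(batch_view, speed_view, page):
    """Serving layer: merge the precomputed and real-time counts."""
    return batch_view[page] + speed_view[page]


history = [{"page": "home"}, {"page": "home"}, {"page": "about"}]
batch_view = build_batch_view(history)

speed_view = Counter()
update_speed_view(speed_view, {"page": "home"})  # arrived after the last batch run

print(query(batch_view, speed_view, "home"))
```

When the next batch run absorbs the recent events, the speed view is reset so that events are not counted twice.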


It almost copies Hadoop and it requires developer knowledge to operate. It's a platform which turns queries into Hadoop jobs with immediate effect and creates an abstraction layer to simplify the datasets in Hadoop.

10. SkyTree: 

It's a high-performance machine learning and data analytics platform that handles Big Data.

Courtesy: TechRepublic and InfoQ 

Sanchari Banerjee, EFYTIMES News Network 

Friday, June 20, 2014

Must read before attending any data science interview

Here are fundamental resources that you should check out before your next data science interview. Read these documents thoroughly to get prepared, impress your interviewer, boost your chances of being hired, and get a bigger paycheck if hired:
Resources to read 2-3 days before your job interview: