Choosing the proper machine learning method

The scikit-learn site publishes a cheat-sheet map for choosing the right estimator for a particular job. Around the edges of the map are the most common jobs: classification, clustering, regression and dimensionality reduction. From the starting point, the graph asks a few questions about the problem you want to solve. First, it suggests getting more data if you have fewer than 50 observations 🙂 For a classification problem, the suggested techniques are: Linear SVC, SGD Classifier or kernel approximation (for large datasets), Naive Bayes, KNeighbors Classifier, SVC and ensemble classifiers.
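The classification branch of the map can be sketched as a small Python function. This is only a rough illustration of the question-asking idea, not an official scikit-learn tool: the function name `suggest_classifiers`, the `is_text` flag and the 100,000-sample cutoff for "large datasets" are my assumptions based on how I read the map, and it returns plain method names rather than estimator objects.

```python
def suggest_classifiers(n_samples, is_text=False):
    """Return candidate classification methods, in the order to try them.

    Hypothetical helper: the thresholds and ordering are assumptions
    taken from a reading of the cheat sheet, not a scikit-learn API.
    """
    if n_samples < 50:
        # The map's first question: too few observations.
        return ["get more data"]
    if n_samples >= 100_000:
        # Assumed cutoff for "large datasets".
        return ["SGD Classifier", "kernel approximation"]
    if is_text:
        return ["Linear SVC", "Naive Bayes"]
    return ["Linear SVC", "KNeighbors Classifier", "SVC", "ensemble classifiers"]


if __name__ == "__main__":
    for n in (30, 5_000, 250_000):
        print(n, suggest_classifiers(n))
```

Each list is meant to be read left to right: try the first method, and move to the next one only if it does not work well enough.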

It would be great if somebody extended this map to more problems and frameworks (not only scikit-learn) and built a website that quickly suggests a suitable method by asking the user a few questions.

Also, check out a very similar (but much larger!) map on the dlib C++ machine learning library page: http://dlib.net/ml_guide.svg

Source: Jassim Moideen on “Big Data and Analytics” LinkedIn group

What are the best machine learning libraries for Python?

I recently found an oDesk job that matches my interests. The client says: “I would like that predictor to be written in Python only, and leverage only publicly-available libraries (mlpy, scipy,scikit etc.)”. Well, it would be a good idea to use more than one package and compare the resulting F-scores, so I googled the best-known machine learning packages for Python. Here they are:

MLPY and PyML seem to be the best-known, mainstream choices. Of the packages listed above, the Anaconda Python distribution seems to include only scikit-learn. On the other hand, if your task involves only NLP, the NLTK package may be enough.
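To make the "compare the F-scores" idea concrete, here is a minimal pure-Python sketch. The labels and the two prediction lists are made-up toy data, and `package_a` / `package_b` are placeholder names standing in for whatever libraries you actually try; in practice you would probably use `sklearn.metrics.f1_score` instead of the hand-written `f1_score_binary` helper.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """F-score (F1) for one positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: define the score as 0 to avoid dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy ground truth and predictions (illustrative data only).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
predictions = {
    "package_a": [1, 0, 1, 0, 0, 1, 1, 0],
    "package_b": [1, 1, 1, 1, 0, 1, 1, 0],
}

# Score every candidate on the same labels, then pick the highest F-score.
scores = {name: f1_score_binary(y_true, y_pred) for name, y_pred in predictions.items()}
best = max(scores, key=scores.get)

if __name__ == "__main__":
    for name, score in scores.items():
        print(f"{name}: F1 = {score:.2f}")
    print("best:", best)
```

The point is only that every candidate should be scored on the same held-out labels, so that the numbers are comparable across packages.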

Source: http://www.quora.com/What-are-the-best-open-source-machine-learning-libraries-written-in-Python
http://docs.continuum.io/anaconda/#packages-included-in-anaconda