I recently found an oDesk job that matches my interests. The client says: “I would like that predictor to be written in Python only, and leverage only publicly-available libraries (mlpy, scipy, scikit etc.)”. Well, it would be a good idea to use more than one package and compare the resulting F-scores, so I googled the best-known machine learning packages for Python. Here they are:
- MLPY (https://mlpy.fbk.eu/)
- PyML (http://pyml.sourceforge.net/)
- Milk (http://pypi.python.org/pypi/milk/)
- Shogun (http://www.fml.tuebingen.mpg.de/…): the code is in C++, but it has a Python wrapper.
- MDP (http://mdp-toolkit.sourceforge.n…): Python library for data mining
- PyBrain (http://pybrain.org/)
- Orange (http://www.ailab.si/orange/): Statistical computing and data mining
- PYMVPA (http://www.pymvpa.org/)
- scikit-learn (http://scikit-learn.org): Numpy / Scipy / Cython implementations for major algorithms + efficient C/C++ wrappers
- Monte (http://montepython.sourceforge.n…): software for gradient-based learning in Python
- Rpy2 (http://rpy.sourceforge.net/): Python wrapper for R
MLPY and PyML seem to be the best-known and most mainstream choices. Of the packages listed above, the Anaconda Python distribution seems to include only scikit-learn. On the other hand, if your task involves only NLP, the NLTK package may be enough.
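Comparing packages by F-score only needs each package's predictions on the same labelled test set. Below is a minimal pure-Python sketch of the binary F1 computation, F1 = 2·P·R / (P + R). The names `package_a` and `package_b` and the label lists are hypothetical placeholders for predictions you would obtain from whichever libraries you are comparing; scikit-learn users can get the same binary score from `sklearn.metrics.f1_score(y_true, y_pred)`.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Binary precision, recall and F1 for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical ground truth and predictions from two different packages.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
predictions = {
    "package_a": [1, 0, 1, 0, 0, 1, 1, 0],
    "package_b": [1, 1, 1, 1, 1, 0, 0, 0],
}

scores = {}
for name, y_pred in predictions.items():
    p, r, f = precision_recall_f1(y_true, y_pred)
    scores[name] = f
    print(f"{name}: P={p:.2f} R={r:.2f} F1={f:.3f}")

# Pick the package whose predictions give the highest F-score.
best = max(scores, key=scores.get)
print("best:", best)
```

Here `package_a` scores F1 = 0.75 and `package_b` about 0.667, so `package_a` would be preferred on this (made-up) data.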
A quick tutorial on disassembling Python bytecode:
You will need the dis module (http://docs.python.org/2/library/dis.html) and some patience 😉
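A minimal sketch of what the dis module gives you, assuming Python 3 (the linked page documents Python 2, where `dis.dis` exists but `dis.get_instructions` does not). The function `add` is just an illustrative target. Exact opcode names differ between Python versions (for example `BINARY_ADD` in older releases versus `BINARY_OP` in 3.11+), so treat the printed listing as version-specific.

```python
import dis


def add(a, b):
    return a + b


# Human-readable listing: offset, opcode name, argument and resolved value.
dis.dis(add)

# The same instructions as objects, convenient for programmatic inspection
# (dis.get_instructions is available from Python 3.4 onwards).
opnames = [instr.opname for instr in dis.get_instructions(add)]
print(opnames)

# The raw bytecode and its metadata live on the function's code object.
code = add.__code__
print(code.co_varnames)            # local variable names used by the bytecode
print(len(code.co_code), "bytes of bytecode")
```

Reading the `dis.dis` output next to the source, one line at a time, is usually the quickest way to see how a statement maps to stack operations.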
BeautifulSoup is a Python package that parses broken HTML, much as lxml does with the libxml2 parser. BeautifulSoup uses a different parsing approach: it is not a real HTML parser but uses regular expressions to dive through tag soup (this describes BeautifulSoup 3; BeautifulSoup 4 delegates the actual parsing to a pluggable parser such as html.parser, lxml or html5lib). It is therefore more forgiving in some cases and less good in others. It is not uncommon for lxml/libxml2 to parse and fix broken HTML better, but BeautifulSoup has superior support for encoding detection. Which parser works better depends very much on the input.
To prevent users from having to choose their parser library in advance, lxml can interface to the parsing capabilities of BeautifulSoup through the lxml.html.soupparser module. It provides three main functions: fromstring() and parse() to parse a string or file using BeautifulSoup into an lxml.html document, and convert_tree() to convert an existing BeautifulSoup tree into a list of top-level Elements.
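A minimal sketch of both approaches side by side, assuming the third-party packages `lxml` and `beautifulsoup4` are installed (`pip install lxml beautifulsoup4`). It parses the same deliberately broken snippet (the `BROKEN` string is made up for illustration) once with lxml's own libxml2-based parser and once through `lxml.html.soupparser.fromstring()`, then uses `convert_tree()` on a BeautifulSoup tree that already exists. Both results are lxml elements, so the rest of your code does not need to care which parser produced them.

```python
# Third-party dependencies: lxml and beautifulsoup4.
import lxml.html
from bs4 import BeautifulSoup
from lxml.html import soupparser

# Deliberately broken HTML: <b> is never closed, the second <p> neither.
BROKEN = "<p>Hello <b>world</p><p>second paragraph"

# 1. lxml's native parser (libxml2 under the hood).
lxml_root = lxml.html.fromstring(BROKEN)

# 2. The same input parsed by BeautifulSoup, returned as an lxml.html tree.
soup_root = soupparser.fromstring(BROKEN)

for name, root in [("libxml2", lxml_root), ("BeautifulSoup", soup_root)]:
    print(name, "->", lxml.html.tostring(root, encoding="unicode"))

# 3. An existing BeautifulSoup tree converted into a list of top-level Elements.
soup = BeautifulSoup("<p>a</p><p>b</p>", "html.parser")
elements = soupparser.convert_tree(soup)
print([el.tag for el in elements])
```

Printing the two serialized trees for your real input is a cheap way to decide which parser repairs it better; for files, `soupparser.parse()` plays the role of `fromstring()`.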