Here I list some packages and things I learned during my internship, just for reference and casual discussion.
Python
Time Scheduler
- schedule (https://pypi.python.org/pypi/schedule)
It neatly solved my problem of sending emails automatically at set times.
Auto Email Sending
- Google Cloud (https://cloud.google.com/)
DAG Structure
- Airflow (http://nerds.airbnb.com/airflow/)
Word Comparison
- FuzzyWuzzy (https://github.com/seatgeek/fuzzywuzzy)
- Difflib (https://docs.python.org/2/library/difflib.html)
difflib's SequenceMatcher provides three similarity methods, ratio(), quick_ratio() and real_quick_ratio(), which trade accuracy for speed.
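To make the three difflib methods concrete, here is a minimal sketch using only the standard library. The helper name `similarity_scores` and the sample strings are my own, chosen just for illustration.

```python
from difflib import SequenceMatcher


def similarity_scores(a: str, b: str) -> dict:
    """Return the three SequenceMatcher similarity estimates for two strings."""
    matcher = SequenceMatcher(None, a, b)
    return {
        # Exact similarity ratio; the most expensive of the three.
        "ratio": matcher.ratio(),
        # Upper bound on ratio(), cheaper to compute.
        "quick_ratio": matcher.quick_ratio(),
        # Looser upper bound based only on the two lengths; the cheapest.
        "real_quick_ratio": matcher.real_quick_ratio(),
    }


scores = similarity_scores("internship", "internships")
print(scores)
```

A common pattern is to check the cheap upper bounds first and only call ratio() for pairs that pass a threshold.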
Graphical Viewer
- snakeviz (https://jiffyclub.github.io/snakeviz/)
I like this. It visualizes cProfile output and is very powerful in Jupyter/IPython.
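snakeviz reads profile files written by the standard-library cProfile module, so a rough sketch of producing one looks like this. The function `slow_sum` and the file name `slow_sum.prof` are placeholders I made up for the example.

```python
import cProfile
import os
import pstats
import tempfile


def slow_sum(n: int) -> int:
    """A deliberately simple workload to give the profiler something to measure."""
    return sum(i * i for i in range(n))


def profile_to_file(path: str) -> None:
    """Profile slow_sum and dump the stats to `path` in cProfile's binary format."""
    profiler = cProfile.Profile()
    profiler.enable()
    slow_sum(100_000)
    profiler.disable()
    profiler.dump_stats(path)


# Hypothetical output location; any writable path works.
profile_path = os.path.join(tempfile.gettempdir(), "slow_sum.prof")
profile_to_file(profile_path)

# Quick text summary of the five most expensive calls by cumulative time.
pstats.Stats(profile_path).sort_stats("cumulative").print_stats(5)
```

The resulting file can then be opened from a shell with `snakeviz` followed by the file path. In a notebook, running `%load_ext snakeviz` and then `%snakeviz slow_sum(100000)` should give the same view inline.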
Multithreading and Parallel Computing
- threading (https://docs.python.org/3/library/threading.html)
- concurrent.futures (https://docs.python.org/3/library/concurrent.futures.html)
- multiprocessing ()
- multiprocessing.Pool: map() only hands back results after all tasks have finished, not while they are running
- multiprocessing.Process can use Pipe() or Queue() to let processes communicate with each other
- multiprocessing.pool.ThreadPool: anything the workers print shows up while the tasks are running
threading is the most typical choice, but concurrent.futures offers a nicer way to collect thread output through its Future objects.
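As a small illustration of why concurrent.futures is convenient for collecting thread output, here is a sketch with ThreadPoolExecutor. The worker function `word_length` and the input words are invented for the example; real work would usually be I/O-bound, such as network calls.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def word_length(word: str) -> int:
    """Stand-in for real work done in a thread."""
    return len(word)


def lengths_in_parallel(words):
    """Run word_length in a thread pool and collect each result by input."""
    results = {}
    with ThreadPoolExecutor(max_workers=4) as executor:
        # submit() returns a Future immediately; map each one back to its input.
        future_to_word = {executor.submit(word_length, w): w for w in words}
        # as_completed() yields futures as they finish, in completion order.
        for future in as_completed(future_to_word):
            # result() returns the value or re-raises the worker's exception.
            results[future_to_word[future]] = future.result()
    return results


lengths = lengths_in_parallel(["airflow", "schedule", "snakeviz"])
print(lengths)
```

With plain threading.Thread you would have to pass results back yourself, for example through a queue or a shared list, which is the bookkeeping the Future object takes care of.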
URL Parse
- urllib (https://docs.python.org/3/library/urllib.html)
- requests.get(url)
- http.client (https://docs.python.org/3/library/http.client.html)
In Python 3, urllib2 has been split into urllib.request and urllib.error, alongside the other urllib submodules. In my experience, the requests package is much faster than urllib.
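For the parsing side specifically, the Python 3 urllib package also includes urllib.parse. Below is a minimal offline sketch; the helper `describe_url` and the example.com address are placeholders of mine, and nothing is fetched over the network.

```python
from urllib.parse import parse_qs, urlparse


def describe_url(url: str) -> dict:
    """Split a URL into its main components."""
    parts = urlparse(url)
    return {
        "scheme": parts.scheme,
        "host": parts.netloc,
        "path": parts.path,
        # parse_qs maps each query key to a list of its values.
        "query": parse_qs(parts.query),
    }


info = describe_url("https://example.com/search?q=airflow&page=2")
print(info)
```

Actually downloading the page would then be a call such as urllib.request.urlopen(url) or requests.get(url).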
Financial Data API for Python
- pandas-datareader (https://pandas-datareader.readthedocs.io/en/latest/)
The coolest thing I have ever used.