Python Study Notes: Data Processing, 2019-05-02 – AJohn11 – Python Quantitative Investing

The SciPy ecosystem

Scientific computing in Python builds upon a small core of packages:

  • Python, a general-purpose programming language. It is interpreted and dynamically typed, well suited to interactive work and quick prototyping, and powerful enough for writing large applications.
  • NumPy, the fundamental package for numerical computation. It defines the numerical array and matrix types and basic operations on them.
  • The SciPy library, a collection of numerical algorithms and domain-specific toolboxes, including signal processing, optimization, statistics and much more.
  • Matplotlib, a mature and popular plotting package that provides publication-quality 2D plotting as well as rudimentary 3D plotting.
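
As a rough illustration of how the core packages divide the work, the sketch below uses a NumPy array for basic operations and one routine from the SciPy optimization toolbox. The matrix values and the function `f` are made up for this example and do not come from the original notes.

```python
import numpy as np
from scipy import optimize

# NumPy: an n-dimensional array type plus basic operations on it.
a = np.array([[1.0, 2.0], [3.0, 4.0]])   # illustrative 2x2 matrix
ones = np.array([1.0, 1.0])
row_sums = a @ ones                      # matrix-vector product
doubled = a * 2                          # element-wise multiplication

# SciPy: numerical algorithms built on top of NumPy arrays.
# Here: minimize the hypothetical scalar function f(x) = (x - 2)^2.
def f(x):
    return (x - 2.0) ** 2

result = optimize.minimize_scalar(f)

print(row_sums)
print(doubled)
print(result.x)  # should be close to 2
```

Matplotlib is left out of the sketch to keep it free of display dependencies; it would typically be used to plot arrays such as `row_sums`.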

On this base, the SciPy ecosystem includes general and specialised tools for data management and computation, productive experimentation, and high-performance computing. The lists below give an overview of some key packages; many more relevant packages exist.

Data and computation:

  • pandas, providing high-performance, easy-to-use data structures.
  • SymPy, for symbolic mathematics and computer algebra.
  • scikit-image, a collection of algorithms for image processing.
  • scikit-learn, a collection of algorithms and tools for machine learning.
  • h5py and PyTables can both access data stored in the HDF5 format.
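
To give a feel for the "easy-to-use data structures" that pandas provides, here is a minimal sketch using a `DataFrame`. The symbols and closing prices are invented purely for illustration.

```python
import pandas as pd

# A small table of hypothetical closing prices (made-up data).
prices = pd.DataFrame(
    {
        "symbol": ["AAA", "AAA", "BBB", "BBB"],
        "close": [10.0, 11.0, 20.0, 19.0],
    }
)

# Group rows by symbol and average the closing price within each group.
mean_close = prices.groupby("symbol")["close"].mean()

print(mean_close)
```

The result is a `Series` indexed by symbol, so `mean_close["AAA"]` looks up the average for one symbol.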

Productivity and high-performance computing:

  • IPython, a rich interactive interface, letting you quickly process data and test ideas.
  • The Jupyter notebook provides IPython functionality and more in your web browser, allowing you to document your computation in an easily reproducible form.
  • Cython extends Python syntax so that you can conveniently build C extensions, either to speed up critical code, or to integrate with C/C++ libraries.
  • Dask, Joblib, or IPyParallel, for distributed processing with a focus on numeric data.
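
Of the tools listed for parallel work, Joblib is probably the quickest to try. The sketch below assumes Joblib is installed and runs a hypothetical function `square` over a few inputs. It uses threads on a single machine, so it shows parallel rather than truly distributed execution.

```python
from joblib import Parallel, delayed

def square(x):
    # Hypothetical stand-in for an expensive numeric computation.
    return x * x

# Run the calls on two worker threads; results come back in input order.
squares = Parallel(n_jobs=2, prefer="threads")(
    delayed(square)(i) for i in range(5)
)

print(squares)
```

For CPU-bound pure-Python work, Joblib's default process-based backend is usually a better fit than threads; the thread backend is chosen here only to keep the sketch lightweight.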

Quality assurance:

  • nose, a framework for testing Python code, now being phased out in favor of pytest.
  • numpydoc, a standard and a library for documenting Scientific Python libraries.
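
The two quality-assurance tools can be seen together in a small sketch: a function documented with a numpydoc-style docstring, plus a test written the way pytest expects. The function `moving_average` is a hypothetical example and not part of any library mentioned above.

```python
def moving_average(values, window):
    """Compute the simple moving average of a sequence.

    Parameters
    ----------
    values : list of float
        Input samples.
    window : int
        Number of consecutive samples per average; must be positive.

    Returns
    -------
    list of float
        One average per full window, in input order.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def test_moving_average():
    # pytest collects functions named test_* and reports failing asserts.
    assert moving_average([1.0, 2.0, 3.0, 4.0], 2) == [1.5, 2.5, 3.5]
```

Saved in a file such as `test_example.py` (an assumed name), the test would be picked up by running `pytest` in the same directory.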


