Wayne's Github Page

A place to learn about statistics

Importing Packages

If you haven’t noticed yet, Python’s default functionality is quite minimal. For example, log(), which is available on most calculators, is not available by default. You need to import the function from a package.
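To see this for yourself, here is a small sketch that calls log() without importing anything first. It assumes a fresh session where nothing named log has been defined yet; the variable name error_message is only for illustration.

```python
# Calling log() before importing it fails because the name is not defined.
try:
    log(10)
except NameError as err:
    error_message = str(err)

print(error_message)
```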

Using built-in packages

Continuing the log() example, there are 2 ways to obtain access to log():
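As a sketch, assuming log() is taken from the standard-library math package, the two styles look like this (the variable names are only for illustration):

```python
# Method 1: import the whole package, then reach the function through it.
import math

whole_package_result = math.log(100, 10)  # logarithm of 100 in base 10

# Method 2: import only the function you need and call it directly.
from math import log

single_function_result = log(100, 10)

print(whole_package_result, single_function_result)
```

Both calls run the same function; only the way the name is brought into your code differs.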

The first method, importing the whole package (e.g. import math), is useful when you’re developing your code since you may not know all the functions you’ll need upfront. The second, importing a single function (e.g. from math import log), is common when you only need a specific function and the package is relatively large, e.g. sklearn.

Using custom packages

Functions from a custom (third-party) package are accessed the same way as functions from a built-in package. The only difference is how the package is obtained.

Built-in packages come with the Python installation. Custom packages must be installed separately, and that extra step can create problems when working across projects.

Two packages may each depend on a different version of the same third package, creating a “dependency conflict”. This can be very painful if not handled carefully. For example, installing a new package may force you to downgrade another package, which may cause your old code to crash.
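When you suspect a conflict, a useful first step is to check which versions are actually installed. The sketch below uses importlib.metadata from the standard library (Python 3.8 or later); the helper installed_version and the package names in the loop are illustrative assumptions, not part of any package discussed here.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Report the versions of a few packages (names chosen only for illustration).
for name in ["numpy", "pandas"]:
    print(name, installed_version(name))
```

Recording these versions alongside a project makes it easier to tell when an install has changed something your old code relied on.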

Familiarizing yourself with a package manager will pay off later. conda is popular with the scientific community and also provides virtual environments, which let each project keep its own set of dependencies.

pip is also quite popular, but it does not manage environments itself, so you would need to pair it with virtualenv (or Python’s built-in venv module). This combination is more programmer friendly.
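Environments are normally created from the command line, but the idea can be sketched in Python itself with the standard-library venv module. This is a minimal sketch, not a recommended workflow: the directory name demo_env is hypothetical, and pip is deliberately left out of the new environment (with_pip=False) to keep the example quick.

```python
import tempfile
import venv
from pathlib import Path

# Build the environment inside a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    env_dir = Path(tmp_dir) / "demo_env"  # hypothetical environment name
    venv.create(env_dir, with_pip=False)

    # Every environment created by venv gets a pyvenv.cfg file describing it.
    config_exists = (env_dir / "pyvenv.cfg").exists()

print(config_exists)
```

Each environment is just a directory with its own Python configuration and its own installed packages, which is what keeps one project’s dependencies from interfering with another’s.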

Most data science packages can be installed easily through Anaconda.

For those unfamiliar with the command line, the Anaconda Navigator may be a better starting point, but it isn’t as well supported because most data practitioners are reasonably comfortable with the command line.

Dominant packages in data science

Here’s a list of the popular packages used in data science that will be covered in later chapters.

When importing these packages, it’s common to give them a shorter alias. Below we alias numpy as np so that later calls are shorter to type.

# Import numpy under the conventional short alias np.
import numpy as np

# Call numpy functions through the alias.
demo_array = np.array([1, 2, 3])