Calling Built-in Functions in Python
Similar to calculators with basic transformations, Python has a suite of functions to help you work with data.
But unlike calculator functions that take in a single number (e.g. abs(-1)),
there can be multiple inputs and inputs can be lists or other more complicated
objects.
Functions with multiple arguments
Sometimes it’s useful to sort the data according to a feature before examining
the data so you can focus on the outliers. sorted() is such a function that
can help:
demo = [10, 9, 9, 1, 5, 8, 6, 7, 7, 10]
asc_demo = sorted(demo)
print(asc_demo)
dsc_demo = sorted(demo, reverse=True)
print(dsc_demo)
print(demo)
This example shows:
- Multiple inputs can be passed into functions separated by
, - Inputs can be different types (e.g. list and boolean).
- Inputs can have names like
reversewhere=again assigns the input to the particular argument. - The input list,
demo, itself was not changed. This may not be true for some functions that have “in-place” manipulations.
Passing arguments by position vs keywords
Continusing the example above, the first input demo was passed in as the first
input which is considered a positional argument. The second input True was
passed as a keyword argument because we referenced the keyword reverse.
Notice that keyword arguments cannot be used before position arguments since the intent is unclear: do you expect the keyword argument to count as one of the positions or not?
sorted(reverse=True, demo)
File "<ipython-input-18-9383389be95f>", line 1
sorted(reverse=True, demo)
^
SyntaxError: positional argument follows keyword argument
How did we know what the keywords were called? Look up the documentation
help(sorted)
# or in iPython
?sorted
Writing your own function
Imagine that we would like to standardize text data by lowercasing the words then split the text up by spaces.
def standardize_text(text):
return text.lower().split()
- We use the
defkeyword to let Python know we’re defining our own function - The name of the function is
standardize_text - The function takes in one input called
text, the name would suggest the input is a string - The body of the function comes after
: - The indentation of 4 spaces for the body of the function is necessary
- The
returnkeyword will pass the resulting output and terminate the function. All commands after thereturnfunction is run will be ignored. - We did not have to specify the data type of the output
You should always test the function with various types of input to see what could go wrong. It is good to have cases that will for sure fail!
standardize_text('hello world!')
standardize_text('#2022HereWeComeAgain')
standardize_text('George Washington')
standardize_text('STOP')
standardize_text('')
standardize_text(['i said x', 'you said y'])
Setting defaults in functions
You can give your function default values for its parameters by assigning the
parameter during its definition. For example, if we wanted a log function
that defaulted to base=10 instead of base=e:
from math import log
def my_log(x, base=10):
return log(x, base)
my_log(100)
my_log(100, 100)
- By assigning
base=10in the parameter specification, we default the value ofbaseto10. - We can still override the value of
baselike a normal parameter.
In-place functions
Python has a few in-place functions/methods that modify the object directly and usually returns nothing (this is a coding pattern in Python, not a result of in-place algorithms). Here’s an example of a common mistake
demo = [1, -1, 0]
sorted_demo = demo.sort()
print(demo) # prints [-1, 0, 1]
print(sorted_demo) # prints None
list.sort() sorts the list demo directly so demo is modified.
The method, however, does not return anything so sorted_demo will
be assigned None. The common mistake is to assume all functions will
return an object and see code crash because sorted_demo is None instead
of a list.
In-place operations are attractive because they do not require doubling the
memory when working with data. If demo and sorted_demo
both existed, then you now have two copies of the data (different order).
If these lists were much larger, this will quickly become a problem.
There unfortunately is no standard around naming for in-place functions and
you need to read the documentation. The popular package pandas often has
options to do in-place operations although its default is to return a new
copy of data.