๐Python notes
pip: upgrade all packages in a virtual env
pip list --outdated --format=freeze | grep -v '^\-e' | cut -d = -f 1 | xargs -n1 pip install -U
Modern Python (2021 edition)
I use Python heavily but still prefer tools and libraries from the 2.x days. Here are some newer tools/packages/learnings that I want to upgrade to. Some of the notes below are from reading Expert Python Programming, 3rd Edition.
black
- Auto-formats your files, like
go fmt
. Just run and forget. - I just selected this as the formatter in vscode. For manual usage, just pip install and run:
filegen/app โฏ black main.py
reformatted main.py
All done! โจ ๐ฐ โจ
1 file reformatted.
poetry
- Creates its own virtualenv so don't install it in one. Instead do this:
pip install --user poetry
Usage:
poetry new some-project # initializes folder structure for src, tests, toml file
Or if you already have a project and just want to initialize the .toml file:
poetry init
The pyproject.toml
structure conforms to PEP-518.
Then, activate the virtual env with:
poetry shell
For adding packages, the only way is with:
poetry add <package>
This will update the toml file as well. You can then lock dependencies with:
poetry lock
wait-for-it
Run a command when a port is open. Example:
wait-for-it --service 0.0.0.0:8000 -- curl localhost:8000
Debugging
Here's a 2 liner that drops you into a shell:
python3 -m pdb -c continue script.py
For setting breakpoints, do this:
breakpoint()
# or this:
import pdb; pdb.set_trace()
# or this if you want a better shell:
import ipdb; ipdb.set_trace()
Get help with help pdb
.
There is also something called development mode for some extra runtime checks, turn it on with one of these:
python -X dev app.py
# or
PYTHONDEVMODE=1 app.py
This gives better stack traces on receiving system signals or segfaults, deprecation warnings, unsafe GIL usage etc.
There is also something called development mode for some extra runtime checks, turn it on with one of these:
python -X dev app.py
# or
PYTHONDEVMODE=1 app.py
This gives better stack traces on receiving signals, deprecation warnings, unsafe GIL usage etc.
See python paths
This shows the location of the user's site-packages folder, the sys.path
contents, etc.
python3 -m site
Merging dictionaries
Newer python versions:
>>> {'a':1,'b':2} | {'b':20,'c':30}
{'a': 1, 'b': 20, 'c': 30}
Right most dict gets the value if there's a clash.
Alternative way, using collections.ChainMap
:
>>> import collections
>>> collections.ChainMap({'a':1,'b':2}, {'b':20,'c':30})
ChainMap({'a': 1, 'b': 2}, {'b': 20, 'c': 30})
>>>
In this case the left dict gets priority. ChainMap just uses the referenes so if the underlying dicts change, ChainMap knows about it.
Type Hints
Generic types: tuple
, list
, dict
, set
, frozenset
etc. For dicts you need to use
dict[KeyType, ValueType]
. There is also typing.Any
if your return value is unknown, for
instance.
map/filter/reduce
Map takes the function (in this case just squares your input), and applies it to the iterable (in
this case the range
).
list(map(lambda x: x**2, range(10)))
Filter takes an iterable and runs a function against each element one by one. e.g. to prevent even numbers:
list(filter(lambda x: x%2 == 0, range(10)))
Reduce is now in functools
. It reduces an iterable to a single value. i.e. it takes reduce(func,
[a, b, c, d])
and does this: func(func(func(a, b), c), d)
. E.g to sum numbers in a range:
from functools import reduce
reduce(lambda x, y: x + y, range(10))
Since these returns an iterator, you can call next()
to get the next result.
Concurrency
Best definition is from Leslie Lamport:
"Two events are concurrent if neither can causally affect the other."
So, concurrency is a propery of a program or problem or algorithm, while parallelism is one of the ways of achieving this.
multithreading
- Use
Queue
for communication - Use thread pools and throttling so you don't overload yourself (or the remote party)
- Don't print inside threads, send your output to a results queue and print in main() at the end
multiprocessing
The in-built multiprocessing
package has interesting tools for
sharing data:
Pipe
for simple socket-like 2 way communication. Has asend()
,recv()
etc.Queue
which is just like the standard queuesharedctypes
for C types in a dedicated pool of memory
You can replace multiprocessing
with multiprocessing.dummy
and you
write exactly the same code but with threads instead of processes in
the background.
async
This is co-operative so someone must relinquish execution control so that the next process can get a slice.1
You can feed a bunch of coroutines to asyncio.gather()
. It returns a
single future
object, which can then be run in the even loop.
Use await asyncio.sleep()
instead of time.sleep()
to yield control.
Use concurrent.futures
to take a synchronous library and run it
under the hood in a multithreaded manner but by using async
patterns. It has a ThreadPoolExecutor
and ProcessPoolExecutor
to
do so. Once you submit
a function to these, you will get back a
Future
object, and you can get the actual result in
Future.result()
. Instead of run_in_loop
you would do a
run_in_executor
.
Event-Driven Programming
- Callback-based: 1:1 binding between the object that emits the event and the handler that executes some logic based on that event. e.g. GUI toolkits.
- Subject-based: Many subscribers can
register
to a Subject, which will thennotify
all subscribers whenever any event is triggered. one-to-many. - Topic-based: Write to a topic, and forget about it. One or more handlers can subscribe to that topic. Completely decoupled.
Metaprogramming
Useful packages:
inflection
macropy3
falcon
hy
Class decorators return a class instead of a function object, example
@dataclass
and @runtime_checkable
.
Design patterns
Observer:
- Like a presence subscription. multiple obvservers get notified when a core object changes.
- Also see @property to create setters/getters for attributes. That way when an attribute changes, we can make a call to update() and let the observers know.
Strategy:
- Different implementations hidden beneath a single abstraction. e.g. user calls sort() but does not care which algorithm is used.
- Create different classes and implement
__call__
, so that when an object is created, that function is called automatically.
Template:
- Differing steps in separate classes, one controller.
Unicode, and Encode vs Decode
bytes.decode
: interpret a stream of bytes (from a file or the network) as unicode, with a passed charset like UTF-8 or latin-1.str.encode
: take a unicode string and convert it to bytes, to write to file or send over a network.- deep breath both byte strings and unicode have a method to convert to the other. byte.decode('utf-8') takes a byte stream and represents it as a UTF-8 string. utf.encode() takes a utf-8 string and represents it as a byte stream (to send out over a network, for example).
encode()
accepts an arg that allows you to represent as ascii, but replace out-of-range characters with ???? or something else.- Python2: str: stream of bytes, unicode: unicode.
- Python3: str: unicode, bytes: stream of bytes.
- Python2 implicitly tries to change bytes to unicode and vice versa to be 'helpful'. Python3 doesn't.
- A recommendation: bytes on the outside -> decode to utf-8 -> use unicode inside -> encode to bytes -> send out.
IOBuffer
andFileBuffer
are like dummy network/file interfaces (useful for tests?)- ISO-8859-1 aka Latin-1 is a single byte extension to ascii that supports a few extra symbols.
Coverage usage
Run this:
coverage run scriptname args
Results are stored in .coverage
. If the script has different args,
backup .coverage
, run again and then combine them:
coverage combine .coverage.old .coverage
For a quick summary (-m
specifies the missing lines):
coverage report -m
For a report (-d
specifies the dest folder):
coverage html -d coverage_html
Decorator example
def my_decorator(func):
def wrapper(*args, **kwargs):
# do pre-processing
ret = func(*args, **kwargs)
# do post-processing
return ret
return wrapper
i.e., return a function (wrapper) that calls func with its args, but do your pre/post stuff within wrapper.
Django notes
- Change your models (in models.py).
- Run
python manage.py makemigrations
to create migrations for those changes - Run
python manage.py migrate
to apply those changes to the database.
Magic number
The python interpreter's magic number is found in:
importlib.util.MAGIC_NUMBER
.pyc file format:
magic number | mtime | size | marshaled code object
Dunder methods
- String/bytes representation
__repr__, __str__, __format__, __bytes__
- Conversion to number
__abs__, __bool__, __complex__, __int__, __float__, __hash__, __index__
- Emulating collections
__len__, __getitem__, __setitem__, __delitem__, __contains__
- Iteration
__iter__, __reversed__, __next__
- Emulating callables
__call__
- Context management
__enter__, __exit__
- Instance creation and destruction
__new__, __init__, __del__
- Attribute management
__getattr__, __getattribute__, __setattr__, __delattr__, __dir__
- Attribute descriptors
__get__, __set__, __delete__
- Class services
__prepare__, __instancecheck__, __subclasscheck__
Pickling
- The last byte of a pickled file is a '.' Anything that follows it is ignored.
- This would be a way to embed a jpeg or something to the end of a pickle file..??
Staticmethod vs Classmethod
- staticmethod: can be called directly from the class, without instantiating an object.
- classmethod: exactly the same.
- BUT: when a staticmethod is called from an object, the 'self' parameter is never passed, the way it is for classmethods and normal methods.
Import files with hyphens
test_cmd = importlib.import_module('test-cmd', None)