Why does zip(*[iter(s)]*n) chunk s into chunks of length n in Python?

This makes possible an idiom for clustering a data series into n-length groups using zip(*[iter(s)]*n).

python.org zip docs

During this year’s Advent of Code, I’ve been learning Python, and that snippet above blew my tiny little Python mind. Let’s take a look at everything that’s going on here!

chunk = lambda s, n: zip(*[iter(s)] * n)
chunk(range(0, 9), 3) # returns iterator((0, 1, 2), (3, 4, 5), (6, 7, 8))
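As a quick sanity check before digging in, here is a small sketch of what that call actually hands back. The variable names `result` and `chunks` are just for illustration. The point is that you get a lazy zip object, so you need something like list() to see the tuples, and you can only walk through it once.

```python
chunk = lambda s, n: zip(*[iter(s)] * n)

result = chunk(range(0, 9), 3)
print(type(result).__name__)  # zip

chunks = list(result)         # pull every tuple out of the zip object
print(chunks)                 # [(0, 1, 2), (3, 4, 5), (6, 7, 8)]

print(list(result))           # [] because the zip object has been used up
```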
  • iter(s) turns s into an iterator. Each call to next(iterator) consumes one value, and once the values are gone the iterator stays empty.

    my_iterator = iter([1, 2, 3, 4])
    for x in my_iterator:
        print(x) # prints 1, 2, 3, 4
    for y in my_iterator:
        print(y) # does nothing! there's nothing left in my_iterator
    next(my_iterator) # raises StopIteration exception
    
  • multiplying a list by n creates one new list with the original entries repeated n times. It doesn’t copy the entries themselves, so when you do [iter(range(0, 9))] * n every slot in the list holds the same iterator.

    my_iterator = iter([1, 2, 3])
    my_iterator_list = [my_iterator] * 9
    
    next(my_iterator_list[0]) # 1
    next(my_iterator_list[1]) # 2
    next(my_iterator_list[8]) # 3
    next(my_iterator_list[5]) # raises StopIteration exception
    
  • fn(*[1, 2, 3]) lets you call a function with the entries in the list expanded as the parameters of the function.

    # behold! a useless example
    def takes_two_arguments (a, b):
        return a > b
    
    takes_two_arguments(1, 2) == takes_two_arguments(*[1, 2]) # True
    
  • zip grabs one value from each of its arguments, in order, and yields them together as a tuple. It stops as soon as any argument runs out.

    zip([1, 2, 3], [4, 5, 6], [7, 8, 9])
    # iterator<(1, 4, 7), (2, 5, 8), (3, 6, 9)>
    
  • when you ask zip to pull from the same iterator multiple times, you end up chunking your list! If you pass the same iterator to zip three times, zip calls next on that iterator three times to construct the first tuple that it yields back. It calls next on that same iterator three more times to construct the next tuple that it yields back.

    # chunk = lambda s, n: zip(*[iter(s)] * n)
    l = range(0, 10)
    iterable_l = iter(l)
    chunked_l = zip(iterable_l, iterable_l, iterable_l)
    # yields (0, 1, 2), (3, 4, 5), (6, 7, 8); the leftover 9 is dropped
    
  • The order of operations for fn(*[iter(s)] * n) is fn(*([iter(s)] * n)). The list multiplication happens first, and the argument spread is applied to the resulting list.

  • putting that all together, we end up with an “idiomatic” way to chunk a list:

    chunk = lambda s, n: zip(*[iter(s)]*n)
    chunk(range(0, 10), 3) # yields (0, 1, 2), (3, 4, 5), (6, 7, 8); the leftover 9 is dropped
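To check the pieces above against each other, here is a rough sketch that spells out by hand what zip is doing with the shared iterator. The name `chunk_by_hand` is made up for illustration and is not part of the standard library. It assumes n is at least 1, and it builds a list instead of being lazy like zip.

```python
def chunk_by_hand(s, n):
    """Spelled-out version of zip(*[iter(s)] * n), for illustration only."""
    it = iter(s)      # one iterator, shared by every "argument"
    slots = [it] * n  # n references to that same iterator, not n copies
    assert all(slot is it for slot in slots)

    chunks = []
    while True:
        group = []
        for slot in slots:
            try:
                # every next() call advances the one shared iterator
                group.append(next(slot))
            except StopIteration:
                # like zip, stop at the first exhausted argument and
                # throw away the partially built group
                return chunks
        chunks.append(tuple(group))


by_hand = chunk_by_hand(range(0, 10), 3)
idiom = list(zip(*[iter(range(0, 10))] * 3))
print(by_hand)  # [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
print(idiom)    # [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
```

Both versions drop the trailing 9, because the last group runs out of values before it reaches length 3.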
    

I think this is a coding idiom I’m going to stay away from for now. I’m not yet fluent enough with Python to be confident reading or writing a line like zip(*[iter(range(0, 100))] * 10), but I’m glad I can at least puzzle through it!