Arrays
Often we desire to perform the same operation on numerous values at the same time. This can be accomplished using the numpy module.
To use numpy, we first have to import it into our workspace:
import numpy as np
Why np? The imported name is shortened to np for better readability of code using NumPy. This is a widely adopted convention that makes your code more readable for everyone working on it.
Once imported, we can access its attributes via the . operator.
Array Creation
Use np.array to create arrays from lists/tuples.
1D Arrays
Let's create a numpy array of 5 numbers:
a = np.array([3,4,7,8,9])
a
array([3, 4, 7, 8, 9])
b = np.arange(1,6) # Could you tell what this does?
b
array([1, 2, 3, 4, 5])
NumPy operations are usually done on pairs of arrays on an element-by-element basis. In the simplest case, the two arrays must have exactly the same shape.
a.shape
(5,)
b.shape
(5,)
The shape shows the number of elements in each dimension. The arrays above have one dimension, so the shape is a tuple of length 1, with the number 5 representing the size of the array in that dimension.
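For instance, a quick illustrative check with a two-dimensional array (np.zeros is used here purely for illustration):
np.zeros((2, 3)).shape # 2 elements along the first dimension, 3 along the second
(2, 3)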
Doing math:
a + b
array([ 4, 6, 10, 12, 14])
a - b
array([2, 2, 4, 4, 4])
a * b
array([ 3, 8, 21, 32, 45])
a / b
array([3. , 2. , 2.33333333, 2. , 1.8 ])
a % b
array([0, 0, 1, 0, 4])
a ** b
array([ 3, 16, 343, 4096, 59049])
a // b
array([3, 2, 2, 2, 1])
What if we want to combine a scalar and an array? NumPy's broadcasting rules relax the same-shape constraint when the arrays' shapes meet certain conditions. The simplest broadcasting example occurs when an array and a scalar value are combined in an operation:
a + 3 # adds 3 to every element
array([ 6, 7, 10, 11, 12])
a * 4 # multiplies every element by 4
array([12, 16, 28, 32, 36])
Indexing
For 1D arrays, indexing works the same way as with lists and tuples.
a[0] # get the first element
3
a[0] = 10 # set the first element to 10
a[::-1] # reverse the array
array([ 9, 8, 7, 4, 10])
np.flip(a) # reverse the array
array([ 9, 8, 7, 4, 10])
a[:4] # from the first to the 4th element
array([10, 4, 7, 8])
2D Arrays:
a2 = np.array([[1,2],[3,4]])
a2
array([[1, 2],
[3, 4]])
a2.shape
(2, 2)
b2 = np.array([[5,6,7,8]])
b2
array([[5, 6, 7, 8]])
b2.shape
(1, 4)
c = np.array([[5],[6],[7],[8]])
c
array([[5],
[6],
[7],
[8]])
c.shape
(4, 1)
All of these are two dimensional arrays, but notice the difference in the shapes: a2 is a \(2 \times 2\) array, b2 is a \(1\times 4\) array and c is a \(4\times 1\) array.
Indexing
This is a bit different from what we learned previously. We use the usual \(X_{ij}\) notation, where \(i\) represents the row and \(j\) represents the column. This is used in conjunction with the extraction operator [].
a3 = np.array([[5,6,7,8],[9,10,11,12], [13,14,15,16]])
a3[0] # the first object. ie row 0
array([5, 6, 7, 8])
a3[0][0] # the first element of the first object
5
a3[0,0] # the first element. ie the object at row 0 and column 0 (same as above)
5
a3[1,:] # the entirety of row 1. ie row 1 for all columns
array([ 9, 10, 11, 12])
a3[:,0] # the entirety of column 0
array([ 5, 9, 13])
a3[1:,:2] # rows 1 to end for columns 0 and 1
array([[ 9, 10],
[13, 14]])
Note that for entire axes, we can use the ellipsis instead of the colon:
a3[..., 0]
array([ 5, 9, 13])
You can see the difference when using higher dimensions:
arr_dim3 = np.arange(18).reshape(2,3,3)
arr_dim3
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]]])
arr_dim3.shape
(2, 3, 3)
arr_dim3[:, 0] # first row for every array
array([[ 0, 1, 2],
[ 9, 10, 11]])
arr_dim3[..., 0] # first column for every array
array([[ 0, 3, 6],
[ 9, 12, 15]])
arr_dim3[:, :, 0]
array([[ 0, 3, 6],
[ 9, 12, 15]])
Broadcasting
a2
array([[1, 2],
[3, 4]])
a2 + a2
array([[2, 4],
[6, 8]])
a2 * a2 # elementwise
array([[ 1, 4],
[ 9, 16]])
Can we do math operations on arrays of different shapes?
a2
array([[1, 2],
[3, 4]])
b2
array([[5, 6, 7, 8]])
c
array([[5],
[6],
[7],
[8]])
a2 + b2
operands could not be broadcast together with shapes (2,2) (1,4)
a2 + c
operands could not be broadcast together with shapes (2,2) (4,1)
b2 + c
array([[10, 11, 12, 13],
[11, 12, 13, 14],
[12, 13, 14, 15],
[13, 14, 15, 16]])
Only b2 + c worked. Why? How was numpy able to do the above? By using what is known as broadcasting. The above is the simplest notion of broadcasting: stretching an array along a given dimension to match the size of another array in that dimension. Broadcasting only occurs when, within a particular dimension, one array is of length 1 while the other is of another size.
Notice how with broadcasting, we did not have to write a loop to do all the sums.
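If you want to check ahead of time whether two shapes will broadcast, newer versions of NumPy (1.20 and above) provide np.broadcast_shapes; a minimal sketch using the shapes from above:
np.broadcast_shapes((1, 4), (4, 1)) # shapes of b2 and c: compatible, the result is stretched to (4, 4)
(4, 4)
np.broadcast_shapes((2, 2), (1, 4)) # shapes of a2 and b2: raises an error, just like a2 + b2 did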
How can we use this to do math?
Suppose we have been given a 1D array a. A simple question is to find the distance from each point to all the others. In this case, we could use broadcasting. But how?
First, let's look at a loop solution:
a = [3,4,7,8,9]
a
[3, 4, 7, 8, 9]
We need a \(5\times 5\) array as the result, i.e.
np.array([[abs(i-j) for j in a] for i in a])
array([[0, 1, 4, 5, 6],
[1, 0, 3, 4, 5],
[4, 3, 0, 1, 2],
[5, 4, 1, 0, 1],
[6, 5, 2, 1, 0]])
How can we do the same using broadcasting?
All we have to do is ensure that, when subtracting \(a\) from itself, the second copy of \(a\) has a different shape; NumPy will then stretch the two to match.
arr = np.array(a)
arr.reshape(5,1)
array([[3],
[4],
[7],
[8],
[9]])
abs(arr - arr.reshape(5, 1))
array([[0, 1, 4, 5, 6],
[1, 0, 3, 4, 5],
[4, 3, 0, 1, 2],
[5, 4, 1, 0, 1],
[6, 5, 2, 1, 0]])
In the above, we used reshape(5,1) to turn the vector into a matrix. Here we specified that there should be 5 rows and 1 column. Sometimes we do not know the size beforehand and need it to be inferred while reshaping. Here are the other ways:
arr[:, None]
array([[3],
[4],
[7],
[8],
[9]])
abs(arr - arr[:, None])
array([[0, 1, 4, 5, 6],
[1, 0, 3, 4, 5],
[4, 3, 0, 1, 2],
[5, 4, 1, 0, 1],
[6, 5, 2, 1, 0]])
abs(arr - arr.reshape(-1,1)) # We use -1 to tell the computer to calculate for us
array([[0, 1, 4, 5, 6],
[1, 0, 3, 4, 5],
[4, 3, 0, 1, 2],
[5, 4, 1, 0, 1],
[6, 5, 2, 1, 0]])
abs(arr - arr[:, np.newaxis])
array([[0, 1, 4, 5, 6],
[1, 0, 3, 4, 5],
[4, 3, 0, 1, 2],
[5, 4, 1, 0, 1],
[6, 5, 2, 1, 0]])
Knowing how to rearrange arrays like this is very important.
More examples:
a3 = np.array([[5,6,7],[9,10,11], [13,14,15]])
Suppose we wanted to subtract 5, 9, 13 from the first row, then 5, 9, 13 from the second row and also from the 3rd row. How will we do this?
b = np.array([5,9,13])
Note that since the subtraction is row-wise, we only need the trailing dimension (the length of each row) to match. They already match: b is packed as one unit of 3 elements, and for a3, each row has 3 elements. We can therefore do the subtraction directly:
a3 - b
array([[ 0, -3, -6],
[ 4, 1, -2],
[ 8, 5, 2]])
What if we wanted to subtract 5 from the first row, 9 from the second row and 13 from the third row? We use broadcasting. We ensure the object to be subtracted is packed as a unit: we have to arrange b such that it has 3 rows and 1 element per row.
To do the task:
a3.shape
(3, 3)
We can then do:
b.shape = (3, 1) # similar to b.shape = 3, 1
b
array([[ 5],
[ 9],
[13]])
Now we can do the subtraction:
a3 - b
array([[0, 1, 2],
[0, 1, 2],
[0, 1, 2]])
Note that the method used above, assigning a tuple to the shape attribute, changes b completely in place. I now have a b that is shaped as (3, 1). If other operations depended on the previous version of b, they would fail if the dimensions did not align. Thus, instead of doing an in-place replacement, we should use one of the methods shown earlier.
First, let's revert b to how it was:
b = b.flatten() # or b.shape = (3,) or b = b.ravel()
b
array([ 5, 9, 13])
Now let's use any of the methods previously introduced.
a3 - b[:, None]
array([[0, 1, 2],
[0, 1, 2],
[0, 1, 2]])
Now what if we wanted to subtract 5 from all the rows and columns, then 9 from all of them, then 13 from all of them? This is a little more advanced:
a3 - b[:,None,None]
array([[[ 0, 1, 2],
[ 4, 5, 6],
[ 8, 9, 10]],
[[-4, -3, -2],
[ 0, 1, 2],
[ 4, 5, 6]],
[[-8, -7, -6],
[-4, -3, -2],
[ 0, 1, 2]]])
Anyway, don't worry about the last example. But notice how 5 was subtracted from the whole of a3, then 9, then 13. We get three matrices.
Broadcasting is useful as it alleviates the need for loops whenever the operations are independent.
Question
flowers = [[1,6],[3,7],[9,12],[4,13]]
people = [2,3,7,11]
flowers = np.array(flowers)
people = np.array(people)[:, None]
((flowers[:,0] <= people) & (people <= flowers[:,1])).sum(1)
array([1, 2, 2, 2])
Elementary Math functions
There are a lot of elementary functions. Some basic ones are:
Powers and roots:
np.sqrt([0,4,9])
array([0., 2., 3.])
np.cbrt([0,27])
array([0., 3.])
np.array([0,4,5])**2
array([ 0, 16, 25])
Exponential and Logarithm
np.e # the constant e
2.718281828459045
np.exp([0,1,2]) # the e^x function
array([1. , 2.71828183, 7.3890561 ])
np.exp2([0,1,2]) # 2^x
array([1., 2., 4.])
np.expm1([0,1,2]) # e^x - 1, with greater precision for small x
array([0. , 1.71828183, 6.3890561 ])
np.log([10, 20, 30]) # log base e
array([2.30258509, 2.99573227, 3.40119738])
np.log10([10,100,1000]) # log base 10
array([1., 2., 3.])
np.log1p([1,2,3]) # log base e of 1+x. high precision for small x
array([0.69314718, 1.09861229, 1.38629436])
np.log2([2,4,8,16]) # log base 2
array([1., 2., 3., 4.])
Trigonometric and hyperbolic functions
np.sin(np.radians(30)) # same as np.sin(30*np.pi/180). We are used to degrees
0.49999999999999994
np.degrees(np.arcsin(0.5)) # inverse of the sin function
30.000000000000004
np.cosh(2) # hyperbolic cosine
3.7621956910836314
(np.exp(2) + np.exp(-2))/2 # definition of hyperbolic cosine
3.7621956910836314
Other trigonometric functions could be found here and here.
The hyperbolic functions could be found here and their relations to the trigonometric functions could be found here
Methods
arr = np.array([[1, -1, 2, 3, 5], [4,-8,-3,0,5]])
Instance Methods
arr.min() # minimum for the whole array
-8
arr.min(0) # minimum across the rows (along each column)
array([ 1, -8, -3, 0, 5])
arr.min(1) # minimum across the columns (along each row)
array([-1, -8])
arr.argmin() # position where the minimum occurs (in the flattened array)
6
arr.argmin(0) # position where the minimum occurs across rows/along columns
array([0, 1, 1, 1, 0], dtype=int64)
arr.argmin(1) # same as arr.argmin(axis = 1). Likewise for the above
array([1, 1], dtype=int64)
arr.max() # arr.max((0, 1)) or arr.max(axis = (0, 1))
arr.max(0) # arr.max(axis = 0)
arr.max(1)
arr.sum()
arr.sum(0)
arr.sum(1)
arr.mean()
arr.mean(0)
arr.mean(1)
arr.std()
arr.std(0)
arr.var()
arr.var(0)
arr.cumsum()
arr.cumsum(0)
arr.cumsum(1)
arr.prod()
arr.prod(0)
arr.cumprod()
arr.sort() # CAUTION. DOES INPLACE ORDERING, AND CHANGES THE ORIGINAL ARRAY
arr.sort(0) # CAUTION. DOES INPLACE ORDERING, AND CHANGES THE ORIGINAL ARRAY
arr.argsort() # returns the indices that would sort the array
arr.argsort(0)
(arr >= 0).all(0)
(arr >= 0).any(0)
You can discover other instance methods by typing the instance name followed by a period and then pressing the Tab key.
Question:
Find the distance matrix for the array ar below, where the distance is defined as:
\[ d(x_i, x_j) = \sqrt{\sum_{l} (x_{il} - x_{jl})^2} \]
ar = np.arange(10).reshape(-1, 2)
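One possible broadcasting sketch for this exercise (not the only way to solve it): insert a new axis so the pairwise differences of the rows are formed, square them, sum over the coordinate axis, then take the square root.
np.sqrt(((ar[:, None, :] - ar[None, :, :])**2).sum(-1)) # pairwise distance matrix of shape (5, 5)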
Module array functions
The instance methods above mirror functions provided at the module level. Thus, for every instance method, there is an equivalent np. function.
np.sum(arr)
np.sum(arr, 1)
np.sum(arr, axis = 1)
np.min(arr)
In addition, the module provides extra functions that are not available as instance methods:
np.minimum(arr, -10) # minimum per element
array([[-10, -10, -10, -10, -10],
[-10, -10, -10, -10, -10]])
np.maximum(arr, 0) # maximum per element
array([[1, 0, 2, 3, 5],
[4, 0, 0, 0, 5]])
np.add(arr, arr) # seems redundant? But it's not. We will see
array([[ 2, -2, 4, 6, 10],
[ 8, -16, -6, 0, 10]])
The comparison above does not do justice to the provided functions. Let's take another example.
np.minimum([1,4,7,9],[0,5,8,3])
array([0, 4, 7, 3])
The module also provides equivalent functions for arrays that contain nan's:
np.nansum(arr)
np.nansum(arr, axis = 1)
np.nanmean(arr)
np.nanmax(arr)
np.nancumsum(arr) # etc
Other vector/array functions
arr
array([[ 1, -1, 2, 3, 5],
[ 4, -8, -3, 0, 5]])
np.where
Think of this in 2 ways. As a vectorized if-else ternary operator:
np.where(arr >= 4, 0, 5) # 0 if arr >= 4 else 5
array([[5, 5, 5, 5, 0], [0, 5, 5, 5, 0]])
np.where(arr >= 4, arr - 1, abs(arr))
array([[1, 1, 2, 3, 4], [3, 8, 3, 0, 4]])
Compare the following:
a = np.array([1,4,7,9])
b = np.array([0,5,8,3])
np.minimum(a, b)
array([0, 4, 7, 3])
np.where(a < b, a, b)
array([0, 4, 7, 3])
As a vectorized find function, i.e. it gives the indices where the condition is True:
np.where(arr >= 4)
(array([0, 1, 1], dtype=int64), array([4, 0, 4], dtype=int64))
np.select
A vectorized generic elif statement, i.e. nested np.where.
E.g. grading scale: A: >=90, B: >=80, C: >=70, D: >=60, E: >=50, F: <50
grade = np.array([97, 90, 72, 89, 50, 23])
np.where(grade>=90, "A", np.where(grade>=80, "B", np.where(grade>=70, "C", np.where(grade>=60, "D", np.where(grade>=50, "E","F")))))
array(['A', 'A', 'C', 'B', 'E', 'F'], dtype='<U1')
That was a long one. We simply use np.select:
conditions = [grade>=90, grade>=80, grade>=70, grade>=60, grade>=50, grade<50]
choices = 'A','B','C','D','E','F'
np.select(conditions, choices)
array(['A', 'A', 'C', 'B', 'E', 'F'], dtype='<U3')
Note that the else part, i.e. the very last condition, can be omitted and a default value passed to the np.select function, e.g.:
conditions = [grade>=90, grade>=80, grade>=70, grade>=60, grade>=50]
choices = 'ABCDE'
np.select(conditions, choices, 'F')
array(['A', 'A', 'C', 'B', 'E', 'F'], dtype='<U1')
np.in1d
Determines whether the elements of one array are in the other. Note that the function is specifically named 1d as it deals with 1d arrays:
np.in1d([1,3,4,6], [2,5,7,6,3])
array([False, True, False, True])
np.unique
Among the top 3 most useful functions for data science. Determines the unique values in an array, their positions, their counts, etc.
arr1 = [1,2,2,2,1,1,1,3,3,6,3,2,2,4,4,4]
np.unique(arr1)
array([1, 2, 3, 4, 6])
np.unique(arr1, return_index = True)
(array([1, 2, 3, 4, 6]), array([ 0, 1, 7, 13, 9], dtype=int64))
np.unique(arr1, return_inverse = True)
(array([1, 2, 3, 4, 6]), array([0, 1, 1, 1, 0, 0, 0, 2, 2, 4, 2, 1, 1, 3, 3, 3], dtype=int64))
np.unique(arr1, return_counts = True)
(array([1, 2, 3, 4, 6]), array([4, 5, 3, 3, 1], dtype=int64))
np.unique(arr1, return_index = True, return_inverse = True, return_counts = True)
(array([1, 2, 3, 4, 6]), array([ 0, 1, 7, 13, 9], dtype=int64), array([0, 1, 1, 1, 0, 0, 0, 2, 2, 4, 2, 1, 1, 3, 3, 3], dtype=int64), array([4, 5, 3, 3, 1], dtype=int64))
np.diff
Returns the differences of a given order. Note that the first difference is the difference between the next point and the current point.
arr = np.array([[1,3,5,6,9],[2,9,7,4,10]])
np.diff(arr)
array([[ 2, 2, 1, 3], [ 7, -2, -3, 6]])
np.diff(arr, axis = 0)
array([[ 1, 6, 2, -2, 1]])
np.diff(arr, n = 2)
array([[ 0, -1, 2], [-9, -1, 9]])
np.diff(arr, n = 2, axis = 0)
array([], shape=(0, 5), dtype=int32)
np.concatenate
Used to combine multiple arrays into one array.
a1 = np.arange(12).reshape(-1,3)
a2 = np.arange(13,25).reshape(-1,3)
a3 = np.array([1,2,3])
np.concatenate([a1, a2])
array([[ 0, 1, 2], [ 3, 4, 5], [ 6, 7, 8], [ 9, 10, 11], [13, 14, 15], [16, 17, 18], [19, 20, 21], [22, 23, 24]])
np.concatenate([a1, a2], axis = 1)
array([[ 0, 1, 2, 13, 14, 15], [ 3, 4, 5, 16, 17, 18], [ 6, 7, 8, 19, 20, 21], [ 9, 10, 11, 22, 23, 24]])
np.row_stack - Stacks the arrays row-wise
np.column_stack - Stacks the arrays column-wise
np.hstack - Stacks the arrays horizontally. Equal to np.column_stack IFF the arrays have more than one dimension (see the sketch below)
np.vstack - Stacks the arrays vertically. Equivalent to np.row_stack
Here is a link showcasing the differences between the 4 functions above.
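As a quick sketch of why np.hstack and np.column_stack agree only for arrays with more than one dimension, consider two 1D arrays (v and w are illustrative names):
v = np.array([1, 2, 3])
w = np.array([4, 5, 6])
np.hstack([v, w]) # 1D inputs are joined end to end
array([1, 2, 3, 4, 5, 6])
np.column_stack([v, w]) # 1D inputs become the columns of a 2D array
array([[1, 4],
[2, 5],
[3, 6]])
np.vstack([v, w]) # 1D inputs become the rows, same as np.row_stack
array([[1, 2, 3],
[4, 5, 6]])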
np.r_
np.r_[a1, a2]
array([[ 0, 1, 2], [ 3, 4, 5], [ 6, 7, 8], [ 9, 10, 11], [13, 14, 15], [16, 17, 18], [19, 20, 21], [22, 23, 24]])
np.r_[0, 1:10, a3]
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3])
np.c_
np.c_[a1, a2]
array([[ 0, 1, 2, 13, 14, 15], [ 3, 4, 5, 16, 17, 18], [ 6, 7, 8, 19, 20, 21], [ 9, 10, 11, 22, 23, 24]])
np.append
Only works with 2 input arrays.
np.append(a1, a2)
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24])
np.append(a1, a2, axis = 1)
array([[ 0, 1, 2, 13, 14, 15], [ 3, 4, 5, 16, 17, 18], [ 6, 7, 8, 19, 20, 21], [ 9, 10, 11, 22, 23, 24]])
Matrix Operations/Functions
The best way to work with arrays is to use matrix operations. These operations allow us to easily manipulate data. Some of the functions include:
T/transpose. Transposes your matrix/array. transpose is more generic, as it lets you pass the axes to be transposed (see the sketch after the examples below).
a1.T
array([[ 0, 3, 6, 9], [ 1, 4, 7, 10], [ 2, 5, 8, 11]])
a1.transpose()
array([[ 0, 3, 6, 9], [ 1, 4, 7, 10], [ 2, 5, 8, 11]])
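The transpose method also accepts the axis order explicitly, which matters once the array has more than two dimensions; a minimal sketch (t is just an illustrative name):
t = np.arange(6).reshape(1, 2, 3)
t.transpose(2, 0, 1).shape # old axis 2 becomes axis 0, axis 0 becomes axis 1, axis 1 becomes axis 2
(3, 1, 2)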
@
Used for matrix multiplication / the inner product for vectors.
a3 @ a3
14
a3.dot(a3)
14
a2 @ a3
array([ 86, 104, 122, 140])
Universal functions
Suppose you have 3 students who took different exams. You are asked to find the average grade of each student, e.g.
students = [1, 1, 2, 2, 1, 1, 3, 3, 2]
grades = [90, 70, 87, 84, 65, 87, 98, 99, 86]
Note that each grade corresponds to a student. To visualize, we have:
student 1: 90, 70, 65, 87
student 2: 87, 84, 86
student 3: 98, 99
How shall we go about this? So far, we know that numpy arrays only hold rectangular data, i.e. data where every row has the same length.
We could revert to for-loops or start thinking of other ways to solve it. Luckily, numpy provides a way out. Notice how, so far, we have only been passing the axis into these functions; there are other arguments that can be passed in.
We could create a matrix but also have indices to indicate those elements that we are interested in:
grades_mat = np.array([[90,70,65,87],
                       [87,84,86,np.nan],
                       [98,99,np.nan,np.nan]])
np.nanmean(grades_mat, 1) # average per student
array([78. , 85.66666667, 98.5 ])
grades_mat.mean(1, where = ~np.isnan(grades_mat)) # average per student
array([78. , 85.66666667, 98.5 ])
This method would require us to manually restructure the data, add the nan's, and then solve the problem. That would be tedious. Is there a way to use the data directly as given? Yes:
students = np.array(students)
grades = np.array(grades)
sort_index = np.argsort(students)
sorted_grades = grades[sort_index]
_, idx, counts = np.unique(students[sort_index], return_index = True, return_counts = True)
np.add.reduceat(sorted_grades, idx)/counts
array([78. , 85.66666667, 98.5 ])
The second method is quite intriguing. We did not have to manually structure our data in a certain way; we just had to use the reduceat method provided by the add universal function.
Note that if we want the minimum or maximum per student, we use the reduceat provided by the universal functions minimum and maximum. The rest of the code remains the same:
np.minimum.reduceat(sorted_grades, idx)
array([65, 84, 98])
np.maximum.reduceat(sorted_grades, idx)
array([90, 87, 99])
Extra:
Note that we could do the same using Python's standard library.
[sum(vec:=[grade for stud, grade in zip(students, grades) if stud == i])/len(vec) for i in set(students)]
[78.0, 85.66666666666667, 98.5]
Note how we used the inbuilt universal functions. We could also write our own function and vectorize it.
Ways to vectorize a function (see the sketch after this list):
np.vectorize
np.frompyfunc
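For example, np.vectorize wraps a plain Python function so that it can be called element-wise on arrays (clip_neg is a hypothetical helper; note this is a convenience, not a performance gain):
def clip_neg(x): # a plain Python function that works on scalars
    return x if x >= 0 else 0
v_clip_neg = np.vectorize(clip_neg)
v_clip_neg(np.array([3, -1, 5, -7]))
array([3, 0, 5, 0])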
Note that at times we just want to apply a function along a given axis or over some axes (see the sketch after this list). Use the functions:
np.apply_along_axis
np.apply_over_axes
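And a minimal sketch of np.apply_along_axis (spread is a hypothetical helper):
def spread(row): # range of a 1D slice
    return row.max() - row.min()
m = np.arange(6).reshape(2, 3)
np.apply_along_axis(spread, 1, m) # apply spread to each row
array([2, 2])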
Time won't allow me to talk about loading data into Python using numpy, dealing with character arrays, rolling windows/stride tricks using the np.lib.stride_tricks module, padding arrays using the np.lib.arraypad module, convolutions, etc.
There is still a lot to learn from this package; we haven't even scratched the surface. Once you grasp what is happening and can respond to problems, then you are ready for DATA SCIENCE. Are you ready?
Options:
Move directly to Data Science, ie scipy, sklearn, statsmodels. You will need to learn the np.linalg module.
Move to Data Analytics, ie pandas, json, sql, pyspark, siuba - Pandas is an extension of numpy for data frames, so no extra numpy knowledge is needed. Pandas provides easy ways to solve such problems.
Both of the above still require Data Visualization, using matplotlib and seaborn. This is something you can easily learn.
Question
https://stackoverflow.com/questions/77262300/how-do-i-filter-on-multiple-criteria-in-group-by
https://stackoverflow.com/questions/77260897/r-how-to-do-the-rowwise-mutate-operation
https://stackoverflow.com/questions/77262541/loop-combination-two-columns-sums-in-r
https://stackoverflow.com/questions/77262398/update-values-in-df-after-groupby-and-get-group