-
Notifications
You must be signed in to change notification settings - Fork 3
Expand file tree
/
Copy pathindex.html
More file actions
270 lines (190 loc) · 10.6 KB
/
index.html
File metadata and controls
270 lines (190 loc) · 10.6 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
<!doctype html>
<!doctype html>
<head>
<title>CS106A</title>
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta http-equiv="content-type" content="text/html; charset=UTF8">
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/css/bootstrap.min.css" integrity="sha384-Gn5384xqQ1aoWXA+058RXPxPg6fy4IWvTNh0E263XmFcJlSAwiGgFAW/dAiS6JXm" crossorigin="anonymous">
<link rel="stylesheet" type="text/css" href="../css/style.css">
<!-- Java Script -->
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script>
<script src="../plugins/moment.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.12.9/umd/popper.min.js" integrity="sha384-ApNbgh9B+Y1QKtv3Rn7W3mgPxhU9K/ScQsAP7hUibX39j7fakFPskvXusvfa0b4Q" crossorigin="anonymous"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0/js/bootstrap.min.js" integrity="sha384-JZR6Spejh4U02d8jOt6vLEHfe/JQGiRRSQQxSfFWpi1MquVdAyjUar5+76PVCmYl" crossorigin="anonymous"></script>
<!-- Python highlighting -->
<script src="../plugins/prism/prism.js"></script>
<link href="../plugins/prism/prism.css" rel="stylesheet" />
<!-- Probability Packages -->
<script src="../plugins/probability/gaussian.js"></script>
<script src="../plugins/color.js"></script>
<!-- font awesome -->
<link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.7.0/css/all.css" integrity="sha384-lZN37f5QGtY3VHgisS14W3ExzMWZxybE1SJSEsQp9S+oqd12jhcu+A56Ebc1zFSJ" crossorigin="anonymous">
<!-- SWAL -->
<script src="https://cdn.jsdelivr.net/npm/sweetalert2@9"></script>
<!-- Stanford -->
<link href='https://fonts.googleapis.com/css?family=Source+Sans+Pro:300,400,600,700' rel='stylesheet' type='text/css'>
<link href='https://fonts.googleapis.com/css?family=Source+Serif+Pro:400,600,700' rel='stylesheet' type='text/css'>
<!-- Math Jax -->
<script type="text/x-mathjax-config">
MathJax.Hub.Config({tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}});
</script>
<script src="../plugins/math.min.js"></script>
<script src='https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'></script>
</head>
<body>
<div class="container">
<script>
// arg is button followed by div which it does show/hide alternately
function showHide(button) {
if (button.nextElementSibling.style.display == "block") {
button.nextElementSibling.style.display = "none";
} else {
button.nextElementSibling.style.display = "block";
}
}
</script>
<h1>Python Dict</h1>
<p>The dict "dictionary" type is very important because it takes in disorganized data, organizes it,
and it is <b>fast</b>. The dict here is based on the "hash table" data structure.
You will learn how to build and analyze a hash table in CS106B, here we're happy to just use them. Many important algorithms leverage hash tables in some way since they can handle large data sets and remain fast.
'dict' is the name of the Python dict type, so we'll use just 'd' as a generic variable name.
<pre>
d = {} # Create empty dict
d['ca'] = 'California' # 1. Set key/value pairs into dict
d['ok'] = 'Oklahoma'
d['nj'] = 'New Jersey'
d['tx'] = 'Texas'
val = d['nj'] # 2. Retrieve value by key
val = d['xx'] # fails with KeyError
check = 'nj' in d # 3. in check -> True
</pre>
<p>1. <code>d['ca'] = 'California'</code> - with equal-sign: d[key] = xxx creates a key/value pair in the dict. If a pair was in there already, it is overwritten.
<p>2. <code>val = d['nj']</code> - referring to d[key] retrieves the value for that key, or is a KeyError. Code needs to check that the key is in before doing [ ] retrieval.
<p>3. <code>if 'nj' in d:</code> - use <b>in</b> to check if a key is in the dict or not. This sort of 'in' works for lists too, but for dicts it's much faster.
<p>Key point: dicts (hash tables) are <b>fast</b>. Even if the dict has 1 million key/value pairs in it, performing the get/set/in of single key is very fast. If the dict grows to hold 10 million key/value pairs, the speed is largely unchanged.
<p>Strategy: therefore if we are reading a file or a network taking in disorganized data, load the data into a dict choosing the key and value definitions we want to output. The data will become organized by that key, even though it was in random order as it was read.
<p>The type used as key should be "immutable", often a string or int (or tuple when we get to those).
<h2>Dict Counting</h2>
<p>Here is the canonical logic to load up a dict - in this example
the code counts the number of occurrences of each word, but many useful
dict operations will follow this basic loop/in/out pattern.
<pre>
def strs_counts(strs):
"""
Given a list of strings, return a 'counts' dict of
str->count key/value pairs
"""
counts = {}
for s in strs:
# fix up not-in case
if not s in counts:
counts[s] = 0
# invariant: at this line, s is in
counts[s] += 1
# alternate 'else' solution:
#if not s in counts:
# counts[s] = 1
#else:
# counts[s] += 1
return counts
</pre>
<h2>Getting Data out of Dict</h2>
<p>1. <code>len(d)</code> - as you would guess, the number of key/value pairs in the dict
<p>2. <code>d.get(<i>key</i>)</code> - retrieves the value for a key, but if the key is not there, returns None by default (vs. throwing an error like [ ]). A 2 parameter form <code>d.get(key, missing-value)</code> specifies what value to return if the key is missing. This forms an alternative to writing if/in logic to check if the key is in.
<p>3. <code>d.keys()</code> - returns an iterable of all the keys in dict (in a random order). Can loop over this to get each key. The keys alone, no values.
<p>4. <code>d.values()</code> - returns an iterable of all the values in dict (in a random order). Can loop over this to get each value.
<p>5. <code>d.items()</code> returns an iterable of the key,value pairs. This works with a particular sort of double-variable loop, see below. If you want to look at all of the key,value pairs, this is the most direct way.
<p>The most common pattern for looping over a dict, using sorted() function which returns a linear collection sorted into increasing order:
<pre>
def print_counts2(counts):
# sort the keys for nicer output
for key in sorted(counts.keys()):
print(key, counts[key])
</pre>
<p>The double-variable key,value loop (more detailed explanation in the tuples section below)
<pre>
def print_counts3(counts):
# key,value loop over .items()
# unsorted
for key,value in counts.items():
print(key, value)
</pre>
<!--
<a name=tuple>
<h2>Tuples</h2>
<p>The 'tuple' is a Python type which is like a small list. Tuples are like a list but written within parenthesis, here is a tuple of three strings: <code>('a', 'b', 'c')</code>
<p>Accessing a tuple works like a list -- len, [ ], in, slices, loops -- but the big difference is that a tuple is immutable. It is built once and then never changes. There is no append() function.
<pre>
>>> t = ('a', 'b', 'c')
>>> len(t)
3
>>> t[0]
'a'
>>> 'c' in t
True
>>> t[0:2]
('a', 'b')
>>> t[0:2]
('a', 'b')
>>> t[0] = 'hi' # no changing it!
Error:'tuple' object does not support item assignment
</pre>
<p>Tuples are useful where you have a fixed number of things to keep as a group -- e.g. an x,y coordinate pair stored in a len-2 tuple like (4, 10) keeps the x,y together as a unit. In contrast, a list is useful when you have an unlimited number of items you want to store together, such as the typical pattern of reading out of a file and using lst.append() to put all the items together in one list.
<p>Recall that dict-keys should be immutable. Therefore if you have some composite data that you want to use as a dict key, e.g. a string and an int, form them into a tuple ('meh', 16) and then use that as the key.
<h2>Tuples and Assignment</h2>
<p>One quirky but important use of tuples is assignment multiple variable at once, like this:
<pre>
>>> (x, y) = (12, 4)
>>> x
12
>>> y
4
</pre>
<p>The length of the left-hand-side and right-hand-side must be the same (CS terminology protip "lhs" and "rhs".)
<h2>Tuples and Multiple Value Return</h2>
<p>Communicating <b>in</b> to a function is a rich area: you can have any number of parameters, they each get a name. How do you communicate <b>out</b> of a function .. return 1 value. Returning 1 value covers 90% of the cases. But sometimes it really makes sense to return 2 or more values. The Pythonic way to do this is to return a tuple packing together the values. Like this
<pre>
def min2(a, b):
"""
Given 2 ints, returns (True, min_val) if at least
one is negative and min_val is the minimum value.
Otherwise returns (False, None)
"""
if a<0 or b<0:
return (True, min(a, b))
return (False, None)
</pre>
<p>There are 2 requirements:
<p>1. The function doc string must state the size and content-types of the returned tuple. There's no way the caller code can be written correctly without
this information.
<p>2. All paths should return a tuple of that same length, even if it is
(None, None), so that standard looking calling code of the form (x, y) = fn(..) works
without crashing should it hit the None case.
<p>Calling min2 looks like this, catching the len-2 tuple result into 2 variables.
<pre>
(negative, min_val) = min2(a, b)
if negative:
...
</pre>
-->
<h2>Tuples and Dicts</h2>
<p>One handy use of tuples is the dict.items() function, which returns the entire contents of the dict as an list of len-2 (key, value) tuples.
<pre>
>>> d = {'a':1, 'd':4, 'c':2, 'b':3}
>>> d
{'a': 1, 'd': 4, 'c': 2, 'b': 3}
>>> d.items()
dict_items([('a', 1), ('d', 4), ('c', 2), ('b', 3)]) # (key, value) tuples
>>> sorted(d.items()) # same tuples, sorted by key
[('a', 1), ('b', 3), ('c', 2), ('d', 4)]
</pre>
<p>Sorting of tuples goes left-to-right within each tuple -- first sorting by all the [0] values across the tuples, then by [1], and so on.
<p>Once all the data has been loaded into a dict, it's natural to have a process-all-the-data phase. This can be written as a loop over the above d.items() list. The loop syntax below takes one tuple off the list for each iteration, setting the two variable, key and value each time:
<pre>
# Example: for loop setting key/value for each iteration
for key,value in d.items():
# use key and value in here
</pre>
<p>This is a special version of the for loop, where there are multiple variables, and the number of variables matches the size of a tuples coming off the list. The above example, looping key,value over dict.items() is probably the most common use of this multiple-variable variant of the for loop.
</div>
</body>