<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<meta name="generator" content="pandoc" />
<title></title>
<style type="text/css">code{white-space: pre;}</style>
</head>
<body>
<h1 id="chatscript-pattern-redux">ChatScript Pattern Redux</h1>
<p>Copyright Bruce Wilcox, mailto:gowilcox@gmail.com www.brilligunderstanding.com<br />
<br />Revision 10/18/2020 cs10.7</p>
<p>Pattern matching information was introduced in the Beginner manual and expanded in the <a href="ChatScript-Advanced-User-Manual.html">Advanced User Manual</a>. Since pattern matching is of such importance, this concise manual lists everything about patterns in one place, including patterns not listed in the Advanced manual.</p>
<p>NOTE: despite the extraordinary range of weird matching abilities, almost all of my normal code is based on one of three patterns:</p>
<pre><code># rule 1
u: (![plastic] << bag trick >>)
# rule 2
u: (I * love * you)
# rule 3
t: What fruit do you like?
a: (~why)
a: (orange)
a: (apple)
a: (~vegetables)</code></pre>
<p>Rule 1 - searches for key words in any order. While there is a normal order to questions, e.g., <em>where do you live</em>, one can ask <em>you live where?</em> so handling arbitrary order is generally valuable. Just have all the keywords you need to detect a meaning and use <code>![...]</code> to get rid of interpretations you don't want.</p>
<p>Rule 2 - requires an order when both first person and second person pronouns are involved, since order will matter.</p>
<p>Rule 3 - uses simple keywords or concept sets in rejoinders, since the context of the gambit constrains the input so highly.</p>
<h2 id="if-patterns">IF Patterns</h2>
<p>Pattern matching can be done not just in a rule's pattern component but also in its output component, within an <code>if</code> statement, e.g.:</p>
<pre><code>if ( PATTERN _~number ) { print( _0) }</code></pre>
<p>That is, if the first word in the test condition is the word <code>PATTERN</code>, the rest is treated as a standard pattern of a rule (not using <code>AND</code> <code>OR</code> etc). You can capture data here or do anything a normal pattern does.</p>
<h2 id="pattern-position">Pattern Position</h2>
<p>A pattern consists of tokens. By default, any normal word in canonical form can match any form of the word, so <em>he</em> in a pattern can match <em>him</em>, <em>he</em>, <em>his</em>. A pattern aborts when a token fails to match unless allowed to not match.</p>
<p>The performance cost of a pattern generally is linear in the number of tokens processed. That means these two rules take the same time to match (other than the imperceptible time difference to read the longer token).</p>
<pre><code>u: (apple)
u: (~ten_thousand_names_of_fruits)</code></pre>
<p>The system tracks current position in the sentence as it matches.</p>
<p>The first token of a pattern is allowed to match anywhere in the sentence. After that normally tokens are matched against words in consecutive order in the input. If a pattern starts to match and then fails, the system is allowed to retry matching later in the sentence once. It does this by freeing up the first matching word/concept token and letting it rebind later.</p>
<p>Given this rule:</p>
<pre><code>u: ( I like apple )</code></pre>
<p>The input <em>Do you know that I like oranges and I like apples</em> would match as follows.</p>
<p>The first pattern token <code>I</code> would match the first <em>I</em> partway into the sentence (because it is allowed to match anywhere).</p>
<p>The next pattern token <code>like</code> is required to match the next input word in the sentence, which it does.</p>
<p>The third pattern token <code>apple</code> fails to match <em>oranges</em>. We just failed. But we have one retry left.</p>
<p>So <em>I</em> is sought deeper in the sentence and matched. <code>like</code> matches <em>like</em> and <code>apple</code> matches <em>apples</em>. So we match.</p>
<p>Had that not matched, no more retries exist so the failure sticks. There are tokens you can use that alter the rules/location around current position.</p>
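<p>One way to watch this retry behavior is to run the pattern through the <code>:testpattern</code> and <code>:trace</code> debugging commands covered in the Debugging section of this manual. The sketch below uses the sample input from above plus a second input that is an illustrative assumption: it has no second <em>I like</em>, so by the rules above the single retry should find no later <em>I</em> to rebind to and the pattern should fail.</p>
<pre><code>:trace pattern
:testpattern ( I like apple ) Do you know that I like oranges and I like apples
:testpattern ( I like apple ) Do you know that I like oranges and apples</code></pre>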
<h2 id="pattern-constituents">Pattern Constituents</h2>
<h3 id="type-of-sentence-s">Type of Sentence <code>s:</code> <code>?:</code></h3>
<p>A responder beginning with <code>s:</code> or <code>?:</code> is implicitly testing that the sentence is a statement or a question. This test is built in and happens even before the pattern is examined. All other rules are not immediately sensitive to the kind of sentence.</p>
<h3 id="existence---word-concept-var-sysvar-_0-0-var">Existence - word <code>~concept</code> <code>$var</code> <code>%sysvar</code> <code>_0</code> <code>@0</code> <code>^var</code> <code>?</code> <code>~</code></h3>
<p>Basic pattern matching is against words or concepts. Does this word or concept exist?</p>
<pre><code>u: ( this ~animal )</code></pre>
<p>matches <em>this dog</em> or <em>this dogs</em> but not <em>this is my dog</em></p>
<p>You can also ask if a user variable is defined just by naming it:</p>
<pre><code>u: ( $myvar help )</code></pre>
<p>this only matches if input has <em>help</em> and <code>$myvar</code> is not null.</p>
<h4 id="system-variables">System variables</h4>
<p>One would not ask whether system variables are defined (they almost always are) but would use them in a relation instead.</p>
<p>Similarly, <code>_0</code> by itself in a pattern means is it defined, that is, not null.</p>
<pre><code>u: ( _{apple orange} _0 )</code></pre>
<p>matches only if apple or orange got matched. And <code>@0</code> by itself means does this fact-set have any facts stored in it.</p>
<p>You can also reference an argument to a function call, and its value will be used to decide what to do.</p>
<h4 id="stand-alone">Stand-alone <code>?</code></h4>
<p>A stand-alone <code>?</code> means is this sentence a question.</p>
<h4 id="stand-alone-1">Stand-alone <code>!?</code></h4>
<p><code>!?</code> would test if it is not a question.</p>
<h4 id="stand-alone-2">Stand-alone <code>~</code></h4>
<p>A stand-alone <code>~</code> means the current topic is already on the pending topic list (was recently considered an active topic).</p>
<h3 id="grouping-pairs">Grouping Pairs <code>(</code> <code>)</code> <code>[</code> <code>]</code> <code>{</code> <code>}</code> <code><<</code> <code>>></code></h3>
<h4 id="parens-...">Parens <code>(</code> ... <code>)</code></h4>
<p>Parens mean the tokens within must be found "in sequence". The notation of a pattern starts with parens, but has the unusual property of allowing the match to occur anywhere within the sentence, not just at the start. Any nested parens do not have that property, and still require in sequence.</p>
<pre><code>u: ( this (is my) pattern)</code></pre>
<p>matches <em>this is my pattern</em> and not <em>this sometimes is my pattern</em></p>
<h4 id="brackets-...">Brackets <code>[</code> ... <code>]</code></h4>
<p>Brackets mean match one of contained tokens, in the order given. A bracket list tries all its members in sequence, stopping when it finds a match. For the input <em>I go home for Christmas</em> this will not match:</p>
<pre><code>u: ( [~noun ~verb] * home )</code></pre>
<p>because <code>~noun</code> will match to <em>home</em> and then <em>home</em> cannot be found later. On a retry, <code>~noun</code> will match to <em>Christmas</em>. Since <code>~noun</code> can match multiple times, <code>~verb</code> never gets tried.</p>
<p>You can composite things like:</p>
<pre><code>u: ( [ apple pear (favorite fruit) cherry ] )</code></pre>
<p>to match <em>I eat pear</em> and <em>my favorite fruit</em>, but this form is unlikely to be used in normal CS.</p>
<p>Note that <code>[ … ]</code> and <code>~concept</code> are similar but different in important ways. Matching <code>~concept</code> is faster than the corresponding list inside <code>[]</code> because naming the concept only requires one token. But it takes more memory to store the concept than it does to put the words inside the <code>[]</code>.</p>
<p>The other fundamental difference is in position. Words in <code>[]</code> are matched in the order given, possibly moving your position mark deep into the sentence.</p>
<p>Words in a concept are all matched simultaneously, so which one is found first in the sentence is what sets the position. For an input <em>I like beer but not wine</em></p>
<pre><code>u: ( I like * ~drinks )</code></pre>
<p>this would match beer if beer and wine are in that concept in any order.</p>
<pre><code>u: ( I like * [wine beer] )</code></pre>
<p>this would match wine even though it is farther in the sentence.</p>
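<p>To see the positional difference for yourself, you can memorize the match with an underscore and inspect the binding. The following is a minimal sketch, not a definitive test: the concept <code>~drinks</code> is declared here purely for illustration (with <em>wine</em> deliberately listed first), and the console commands are the debugging commands described in the Debugging section.</p>
<pre><code># in a script file: illustrative concept, members listed wine first on purpose
concept: ~drinks ( wine beer )</code></pre>
<pre><code>:trace pattern
:testpattern ( I like * _~drinks ) I like beer but not wine
:testpattern ( I like * _[wine beer] ) I like beer but not wine</code></pre>
<p>By the rules above, the first pattern should bind <code>_0</code> to <em>beer</em> (the earliest concept member in the sentence), while the second should bind it to <em>wine</em> (the first member listed in the brackets that can be found).</p>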
<h4 id="braces-...">Braces <code>{</code> ... <code>}</code></h4>
<p>Braces mean match one of the contained tokens if you can, but don't fail if you don't.</p>
<p>Using <code>{}</code> inside of angles is pointless (unless you put an underscore in front to memorize something) because it makes no difference to matching whether or not you had the <code>{}</code> content.</p>
<p>Braces do not align position within a sentence. They are normally used to assist in positional alignment by swallowing words.</p>
<pre><code>u: ( I go to {the} market )</code></pre>
<p>matches both <em>I go to the market</em> and <em>I go to market</em>.</p>
<p>If you use underscore before braces to memorize the answer found, then when no answer is found the match variable is set to <code>null</code> (no content) but it is set.</p>
<h4 id="angles-...">Angles <code><<</code> ... <code>>></code></h4>
<p>Angles mean match all of the contained tokens in any order.</p>
<p>Putting <code>*</code> in this kind of pattern is illegal because it has no meaning.</p>
<p>Position is not relevant anyway. Position is freely reset to the start following this sequence so if you had the pattern:</p>
<pre><code>u: ( I * like << really >> photos )</code></pre>
<p>and input <em>photos I really like</em> then it would match because it found <em>I * like</em>, then found <code>really</code> anywhere, and then reset the position freely back to the start and found <code>photos</code> anywhere in the sentence.</p>
<h3 id="wildcards-2-3--2-2b">Wildcards <code>*</code> <code>*~2</code> <code>*3</code> <code>*-2</code> <code>*~2b</code></h3>
<p>Wildcards allow you to relax the positional requirements for matching. The classic wildcard <code>*</code> allows you to have zero or more words between other tokens in a pattern.</p>
<pre><code>u: ( I * you )</code></pre>
<p>matches <em>I love chicken and hate you</em> as well as <em>I you they</em>.</p>
<p>You can limit the unlimited range by adding <code>~n</code> after it. So <code>*~1</code> means 0 or 1 words may intervene.</p>
<p><code>*~2</code> is what I commonly use to restrict a range. This allows a determiner and an adjective to fit before a noun, for example, but not allow a pattern to match weirdly.</p>
<pre><code>u: ( I like *~2 cat )</code></pre>
<p>matches <em>I like my cats</em> or <em>I like a yellow cat</em>.</p>
<p><code>*~2b</code> is similar to <code>*~2</code> except it tries to match bidirectionally. First it tries to match behind it, and if that fails, it tries forward (like <code>*~2</code>). You may not follow a bidirectional wildcard with either <code>{</code> or <code>(</code>.</p>
<p>You can also request a match of a specific number of words in succession using <code>*n</code>. <code>*1</code> means get the next word. If you are already positionally on the end of the sentence, this match fails.</p>
<p>If you aren't sure how many words are left, you could do something like this:</p>
<pre><code>u: ( apple _[*4 *3 *2 *1] )</code></pre>
<p>which will grab the next 4 or 3 or 2 or 1 words, depending on how many are available.</p>
<p>Generally done with an underscore in front to memorize the sequence.</p>
<p><code>*-2</code> is like <code>*2</code>, only it matches backwards instead of forwards. Valid thru <code>*-9</code>.</p>
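<p>As a small illustration of a fixed-length wildcard combined with memorization, the rule below is a sketch (the rule and its output text are made up for this example). It grabs exactly one word after <em>my name is</em> and echoes it back; the quote in front of <code>_0</code> in the output asks for the original form of the word rather than its canonical form.</p>
<pre><code># memorize exactly one word following "my name is"
u: ( my name is _*1 ) So '_0 is your name.</code></pre>
<p>Because <code>*1</code> fails when you are already at the end of the sentence, an input of just <em>my name is</em> should not match this rule.</p>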
<h3 id="negation-and">Negation <code>!</code> and <code>!!</code></h3>
<p><code>!x</code> means match only if x is not found anywhere in the sentence later than where we are:</p>
<pre><code>u: ( !not I love you )</code></pre>
<p>This pattern says the word <em>not</em> cannot occur anywhere in the sentence.</p>
<p><code>!!x</code> means match only if x is not the next word.</p>
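<p>The difference between the two forms shows up when the unwanted word is present but not adjacent. The two rules below are an illustrative sketch (the rules and outputs are invented for this example), meant to be compared one at a time rather than placed together in one topic.</p>
<pre><code># !not : fails if "not" occurs anywhere later in the sentence
u: ( I am !not *~2 happy ) Glad to hear it.

# !!not : fails only if "not" is the very next word
u: ( I am !!not *~2 happy ) Good for you.</code></pre>
<p>Going by the definitions above, <em>I am very happy</em> should match both rules and <em>I am not happy</em> should match neither, while <em>I am really not happy</em> should fail the first rule but pass the second, since <em>not</em> is present but is not the word immediately after <em>am</em>.</p>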
<h3 id="original-form">Original Form <code>'</code></h3>
<p>While CS normally matches both original and canonical forms of words when you give a pattern word in canonical form, you can require it only match the original form by quoting it.</p>
<pre><code>u: ( I * 'take it ) </code></pre>
<p>does not match <em>I am taking it</em></p>
<p>Likewise in a relation where you use a match variable, quoting it means use only its original value.</p>
<pre><code>u: ( _~fruits '_0==apple )</code></pre>
<p>matches <em>I like apple</em> but not <em>I like apples</em></p>
<h3 id="literal-next">Literal Next <code>\</code></h3>
<p>You can tell CS that a token should be considered a token, not a special form, by putting a <code>\</code> in front of it.</p>
<p>This applies to single characters like: <code>\[ \]</code> and it also applies to relational tokens like <code>\tom=*</code> which means do not treat this as a relational test, but instead as a token whose name is wildcarded.</p>
<p>Note that the <code>\</code> does not suppress detecting the <code>*</code> in a word and therefore allowing variant spelling.</p>
<h3 id="composite-words-my-composite-word">Composite Words "my composite word"</h3>
<p>There are sequences of words that have a specific meaning and are treated as a single word, e.g., <em>batting cage</em>.</p>
<p>In a dictionary these are often represented using an <code>_</code> instead of a space, e.g., <em>batting_cage</em>.</p>
<p>When CS tokenizes your input, it automatically converts your separated input words into ones with underscores in them when appropriate.</p>
<p>They are then no longer separate words, but instead a single composite word. This would normally mean that</p>
<pre><code>u: ( batting cage )</code></pre>
<p>would not match. But the script compiler does the same tokenization thing, so your actual internal pattern looks like:</p>
<pre><code>u: ( batting_cage )</code></pre>
<p>For clarity, it is recommended that when you know you are dealing with a composite word, you use the underscore notation.</p>
<p>Sequences of words can also be designated using double quotes.</p>
<pre><code>u: ( "batting cage" )</code></pre>
<p>CS converts a quoted string into the same underscore notation. The distinction between the two is generally one of documentation.</p>
<p>I use quoted strings for phrases to highlight the intention that they are a phrase. I also use them for multiple word proper names like <em>Eiffel Tower</em>.</p>
<p>It is particularly important to use the quoted notation when punctuation is embedded in the name like <em>John's Apple Pie</em> because knowing where to put underscores when punctuation is involved is tricky.</p>
<p>By using quotes, you tell the system to manage things appropriately (<code>John_'s_Apple_Pie</code>)</p>
<p>When using the quoted notation, the system will actually try to match original and canonical, just like with ordinary words. If all words in phrase are canonical, the system will match any form of each word.</p>
<p>If one is not canonical, it can only match the original form.</p>
<pre><code>u: ( "king of the jungle" )</code></pre>
<p>cannot match <em>kings of the jungle</em> because <em>the</em> in pattern is not canonical.</p>
<pre><code>u: ( "king of a jungle" )</code></pre>
<p>but the above rule can match <em>kings of the jungle</em> since all words in the quote are canonical.</p>
<h3 id="memorization-_">Memorization <code>_</code></h3>
<p>Placing an underscore means to memorize what was matched onto a match variable. Match variables are allocated in sequence in a pattern, starting with <code>_0</code> and increasing to <code>_1</code> etc for each memorized match.</p>
<p>The system memorizes the original word, the canonical word, and the position in the sentence of the match.</p>
<h3 id="relations">Relations <code>></code> <code><</code> <code>?</code> <code>==</code> <code>!=</code> <code><=</code> <code>>=</code></h3>
<p>You can test relationships by conjoining a token with a relationship operator and another token, with no spaces. E.g.,</p>
<pre><code>u: ( I am _~number > _0>18 ) You are of legal age.</code></pre>
<p>The relationship operators are:</p>
<table style="width:24%;">
<colgroup>
<col width="12%" />
<col width="11%" />
</colgroup>
<thead>
<tr class="header">
<th align="center">operator</th>
<th>meaning</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="center"><code>==</code></td>
<td>equal</td>
</tr>
<tr class="even">
<td align="center"><code>!=</code></td>
<td>not equal</td>
</tr>
<tr class="odd">
<td align="center"><code><</code></td>
<td>less than</td>
</tr>
<tr class="even">
<td align="center"><code><=</code></td>
<td>less than or equal to</td>
</tr>
<tr class="odd">
<td align="center"><code>></code></td>
<td>greater than</td>
</tr>
<tr class="even">
<td align="center"><code>>=</code></td>
<td>greater than or equal to</td>
</tr>
<tr class="odd">
<td align="center"><code>&</code></td>
<td>bit anded results in non-zero</td>
</tr>
<tr class="even">
<td align="center"><code>?</code></td>
<td>is member of 2nd arg concept or topic or JSON array. If no argument occurs after, means is value found in sentence</td>
</tr>
</tbody>
</table>
<p>Using a compare with two text strings (not numbers) will evaluate based on case-independent alpha sorting.</p>
<p>The <code>?</code> operator has two forms. <code>xxx?~yyy</code> will look for actual membership in the set whereas <code>_n?~yyy</code> will only see if the location of match detection of <code>_n</code> is the same as a corresponding match location for the concept. If the concept has not been marked, then obviously no match is found.</p>
<p>Note: You can put <code>!</code> before the tokens instead of using <code>!=</code> and <code>!?</code>. E.g.,</p>
<pre><code>u: ( _~noun !_0?~fruit ) # matches if the noun is not in the fruit concept</code></pre>
<h4 id="dynamic-matching">Dynamic matching</h4>
<p>The stand-alone <code>?</code> is used with variables for dynamic matching.</p>
<p>While you cannot do memorization in front of a comparison (because no positional data is gained) you can in front of the <code>?</code> operator since finding where in the sentence something is will return a position for memorization.</p>
<pre><code>u: ( $var? ) </code></pre>
<p>means is the value of <code>$var</code> found in the sentence anywhere</p>
<p>Note that when <code>$var</code> is a normal word, that is simple for CS to handle.</p>
<p>If <code>$var</code> is a phrase, then generally CS cannot match it. This is because for phrases, CS needs to know in advance that a phrase can be matched.</p>
<p>If you put <em>take a seat</em> as a keyword in a concept or topic or pattern, that phrase is stored in the dictionary and marked as a pattern phrase, meaning if the phrase is ever seen in a sentence, it should be noticed and marked so it can be matched in a pattern.</p>
<p>But if it is merely in a variable, then the dictionary is unaware of the phrase and so <code>$var?</code> will not work for it.</p>
<p>There is also a <code>?$var</code> form, which means see if the value of the variable is findable. The value can be either a word or a concept name.</p>
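<p>A typical use of <code>$var?</code> is to remember a single word from one volley and look for it again later. The following is a sketch under assumptions: the variable name <code>$favorite</code>, the gambit, and the output text are all invented for illustration, and the stored value is a single word so that the phrase limitation described above does not apply.</p>
<pre><code>t: What is your favorite fruit?
a: ( _~fruits ) $favorite = _0 I will remember that.

# later: first check the variable is defined, then look for its value in the sentence
u: ( $favorite $favorite? ) You brought up your favorite fruit again.</code></pre>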
<h3 id="assignment-in-a-pattern">Assignment in a pattern</h3>
<p>You can directly assign to any variable from any other value using <code>:=</code>. You can even do arithmetic in these assignments (<code>:+=</code>, <code>:-=</code>, <code>:*=</code>, <code>:/=</code>, <code>:&=</code>, and any of the other numeric assignment operators).</p>
<pre><code> $value = 5
( _some_test $value:=5 $value1:=_0 $value2:='_0 $value3:=%time )
( _some_test $value:+=5 $value1:-=_0 )</code></pre>
<p>If you want to do function call assignment, you can do this:</p>
<pre><code> $value:=^"^function(foo d)"</code></pre>
<p>The reason you have to use an active string here is that spaces normally break tokens apart, and a pattern token involving a function needs to have all of its arguments as part of the same token. Assigning from an active string works because the double quotes around it prevent the token from breaking apart.</p>
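<p>Put into a complete rule, an in-pattern assignment might look like the sketch below. The variable name <code>$age</code> and the output text are assumptions made for this example; the point is only that the value is captured while matching, so the output side needs no separate assignment statement.</p>
<pre><code># capture the number and store it in a user variable while matching
u: ( I am _~number year old $age:=_0 ) So $age is your age.</code></pre>
<p>Using the canonical word <code>year</code> in the pattern lets it match both <em>year</em> and <em>years</em> in the input.</p>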
<h3 id="escape">Escape <code>\</code></h3>
<p>If you want to match a reserved punctuation symbol like <code>[</code> or <code>(</code>, you must escape it by putting a backslash in front. This is commonly done in matching out-of-band information.</p>
<pre><code>u: ( < \[ * \] ) ^respond(~determine_oob)</code></pre>
<p>One also uses escape if you want to know if the sentence was punctuated with an exclamation.</p>
<pre><code>s: ( \! )</code></pre>
<p>means user did something like <em>I love you!</em>.</p>
<p>You may use either <code>?</code> or <code>\?</code> when asking if the sentence has a question in it. You would generally only do this in a rejoinder.</p>
<h2 id="concept-intersection-keywords">Concept intersection keywords</h2>
<p>If you join a word (or a concept) and one or more concepts, that represents the intersection of them, e.g., <code>(~animals~tasty)</code> will reference all animals considered tasty.</p>
<p>Note, you cannot use word~1 (meaning specification) or word~n (pos-tag specification) on your first word.</p>
<h3 id="function-call---xxx...">Function Call - <code>^xxx(...)</code></h3>
<p>You can call a function from within a pattern. If the function returns a failure code of any kind, the match fails. If the function is a predefined system function, you are allowed relation operators on the result as well.</p>
<pre><code>u: ( ^lastused(~ gambit)>5 )</code></pre>
<p>NOTE:<br />
User defined functions (<code>patternmacros</code>) do not allow relational operators after them.</p>
<p>Patternmacros do not generate answers. They are treated as in-line additional pattern tokens.</p>
<pre><code>Patternmacro: ^testuse(^value) _~noun _0==^value
u: ( ~noun ^testuse(apple)) # matches "I like pear and apple"</code></pre>
<p>A powerful use of function calling is to call <code>^respond(~topicname)</code> in a pattern. The topic can match something and set up a variable for further guidance. E.g.,</p>
<pre><code>u: ( ^respond(~finddelay) $$delay ) Wait for $$delay.</code></pre>
<p><code>~finddelay</code> can hunt for time referred to in seconds, minutes, hours, etc, or in words like next week or tomorrow or whatever complex matching you want to do.</p>
<h3 id="partially-spelled-words-ing-bottle-8bott">Partially Spelled words: <code>*ing</code> <code>bottle*</code> <code>8bott*</code></h3>
<p>You can request a match against a partial spelling of an original word (not its canonical alternative) in various ways.</p>
<p>If you use <code>*</code> somewhere after an alpha, it matches any number of characters.</p>
<pre><code>u: ( sag*us ) </code></pre>
<p>matches many misspellings of <em>sagittarius</em>.</p>
<p>If you use <code>*</code> followed by an alpha, you get anything as a prefix followed by what you request.</p>
<pre><code>u: ( *tha ) </code></pre>
<p>matches <em>Martha</em>.</p>
<p>If you put a number in front, it means the word must be exactly that many characters long, matching your pattern.</p>
<pre><code>u: ( 6sit* )</code></pre>
<p>matches <em>sitter</em>.</p>
<p>When using an <code>*</code> word, you can use <code>.</code> to indicate exactly one character of any value.</p>
<pre><code>u: ( sit*u.tion ) </code></pre>
<p>matches <em>situation</em>.</p>
<h3 id="altering-position-_0-_0--_0">Altering Position <code><</code> <code>></code> <code>@_0+</code> <code>@_0-</code> <code>@_0</code></h3>
<p>When you put <code><</code> in your pattern, it doesn't actually match anything. It means "reset position" to the start of the sentence.</p>
<pre><code>u: ( < I love )</code></pre>
<p>matches <em>I love</em> but not <em>do I love</em>.</p>
<p>When you put <code>></code> in your pattern, it does not alter your position, but it tries to confirm you are on the last word of the sentence.</p>
<pre><code>u: ( I * > )</code></pre>
<p>in this pattern <code>></code> is redundant, since <code>*</code> would match to the end of the sentence anyway.</p>
<p>You may also use <code>!></code> to ask that we NOT be at the end of the sentence.</p>
<p><code>@_1+</code> says to set the position to where the given match variable (<code>_1</code>) matched. Positional sequencing will continue normally increasing thereafter.</p>
<p>You can suffix the match variable with <code>-</code> instead, to tell CS to begin matching in reverse order in the sentence, i.e., matching backwards to the start of the sentence.</p>
<p>When you use <code>+</code>, the position starts at the end of the match. When you use <code>-</code>, the position starts at the start of the match.</p>
<pre><code>u: ( _home is @_0- pretty )</code></pre>
<p>matches <em>my pretty home is near here</em>.</p>
<p>Note when you use <code>-</code> for reverse matching, the behavior of <code><</code> and <code>></code> changes. <code>></code> sets a position and <code><</code> confirms it instead of the way it is for <code>+</code>.</p>
<p>When you omit either + or -, you create a matchable anchor like <code>@_0</code>. It represents what was found at that position, and during the pattern must also match at that location now.</p>
<pre><code>u: ( @_0 is @_1 )</code></pre>
<p>The above pattern says that the word <code>is</code> must be found precisely between the locations referenced by <code>@_0</code> and <code>@_1</code>.</p>
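<p>For the forward direction, here is a sketch of using <code>@_0+</code> to jump back to an earlier match after the pattern has already moved past it. The rule and its output are invented for illustration.</p>
<pre><code># memorize "cat", find "dog" somewhere later, then jump back to just after "cat"
u: ( _cat * dog @_0+ *~2 sleep ) Cats do love to nap.</code></pre>
<p>Following the rules above, the input <em>the cat always sleeps near the dog</em> should match: <code>cat</code> is memorized, <code>* dog</code> runs the position out to the end of the sentence, <code>@_0+</code> resets the position to the end of the <em>cat</em> match, and <code>*~2 sleep</code> then finds <em>sleeps</em> within two words of it.</p>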
<h3 id="retrying-scan-retry">Retrying scan <code>@retry</code></h3>
<p>Normally one matches a pattern, performs the output code, and if you want to restart the pattern to find the next occurrence of a match, you use <code>^retry(RULE)</code> or <code>^retry(TOPRULE)</code>. Well, if your pattern executes <code>@retry</code> as a token, it will retry on its own without needing to execute any output code. Useful in conjunction with <code>^testpattern</code>.</p>
<h2 id="debugging">Debugging</h2>
<h3 id="testpattern"><code>:testpattern</code></h3>
<p>The system inputs the sentence and tests the pattern you provide against it. It tells you whether it matched or failed.</p>
<pre><code>:testpattern ( it died ) Do you know if it died?</code></pre>
<p>Some patterns require variables to be set up certain ways. You can perform assignments prior to the sentence.</p>
<pre><code>:testpattern ($gender=male hit) $gender = male hit me</code></pre>
<p>Typically you might use <code>:testpattern</code> to see if a subset of your pattern that fails works, trying to identify what has gone wrong. You can also name an existing rule, rather than supply a pattern.</p>
<pre><code>:testpattern ~aliens.ufo do you believe in aliens?</code></pre>
<h3 id="prepare"><code>:prepare</code></h3>
<p>Since CS may revise your input for various reasons, to know why a pattern fails you may need to know what it actually saw.</p>
<p>Using <code>:prepare</code> will tell you what the final input words were, and what concepts got marked.</p>
<pre><code>:prepare This is a sentence.</code></pre>
<h3 id="verify"><code>:verify</code></h3>
<p>In general all of your responders and rejoinders should have a sample input comment above them.</p>
<pre><code>#! Do you believe in dogs?
?: ( << you believe dog >>) I do.</code></pre>
<p>This allows you to do</p>
<pre><code>:verify ~mytopic pattern</code></pre>
<p>and have the system test if your rule would match your input.</p>
<h3 id="trace"><code>:trace</code></h3>
<p>You can get a trace of various system functions.</p>
<pre><code>:trace pattern</code></pre>
<p>will show you pattern matching and match variable binding. Also useful if done before <code>:testpattern</code>.</p>
<h2 id="overrulingsupplementing-cs-matching">Overruling/Supplementing CS Matching</h2>
<p>Sometimes you want to supplement the marking of concepts done by adding your own marks. This is particularly useful handling idioms where no keyword exists. I set <code>$cs_prepass</code> to be a topic which looks for idioms.</p>
<pre><code>?: ( < what do you do > ) ^mark(~work)</code></pre>
<p>This will cause the work topic to react later as though one of its keywords was given.</p>
<p>Likewise sometimes you want to disable some marking. For example, <em>chocolate</em> is both a flavor and a color. To avoid going to the color topic incorrectly I might do this:</p>
<pre><code>u: ( << _chocolate [taste eat] >> ) ^unmark(~colorTopic _0)</code></pre>
<p>If the above rule detects <em>chocolate</em> in the apparent context of eating, it will unmark any reference to <code>~colorTopic</code> found at the location of the word <em>chocolate</em>.</p>
<h2 id="graduation-exercise">Graduation Exercise</h2>
<p>The pattern matching system of ChatScript has esoteric abilities. I was asked if I would implement an additional one that would look something like this:</p>
<pre><code><< green nose mucus >>~3</code></pre>
<p>which the requester wanted to mean: find all those words, in any order, with each word after the first within a gap range of 3 from the previous word.</p>
<p>So it could recognize: <em>green is the nose with my mucus</em> or <em>my nose puts forth green mucus</em> and not match <em>while green is my favorite color</em> or <em>I don't want to see it in my nose mucus</em>.</p>
<p>I replied it could probably already be done in CS as it stood, and a few minutes later had whipped up code to do that. Your advanced challenge, if you care to think about it and really warp your mind, is to think of a way to do it yourself. That will prove you really understand what can be done in pattern matching. The answer follows.</p>
<pre><code>patternmacro: ^nearbyword(^word)
[
(@_0+ *~3 _^word ^eval(_0 = _1))
(@_0- *~3 _^word ^eval(_0 = _1))
]</code></pre>
<p>The macro contains two choices, a sequence that looks forward from where you are and a sequence that looks backwards. Using a nested <code>()</code> the system will effectively treat that as a single match, which makes it a single token to be used in a <code>[]</code> choice.</p>
<p>Whichever <code>()</code> finds the next word, it memorizes where it is, then sets the current <code>_0</code> location to the new word and the choice ends.</p>
<p>While you can't do code execution directly in a pattern, and you can't call out to user-defined outputmacros, you can call out to system functions, and <code>^eval</code> lets you do any amount of normal code execution. So this allows us to assign the new match variable to the old. And assigning match variables means assigning all of their attributes, including original value, canonical value, and actual position data.</p>
<pre><code>topic: ~test()
u: ( _green ^nearbyword(nose) ^nearbyword(mucus)) You have a disgusting nose.</code></pre>
<p>The test pattern, therefore, finds the first word and sets the current <code>_0</code> location. Then it uses the macro to find the next word and change location, and then the next word.</p>
</body>
</html>