<style>
body {
max-width: 40em;
}
</style>
<p>
This page describes how to write rulesets for
<a href="https://eff.org/https-everywhere">HTTPS Everywhere</a>,
a browser extension that switches sites over from HTTP
to HTTPS automatically. HTTPS Everywhere comes with
<a href="http://www.eff.org/https-everywhere/atlas/">thousands</a>
of rulesets that tell HTTPS Everywhere which sites it should switch
to HTTPS and how. If there is a site that offers HTTPS and is not handled by
the extension, this guide will explain how to add that site.
</p>
<h4><a name="rulesets" href="#rulesets"
>Rulesets</a>
</h4>
<p>
A <tt>ruleset</tt> is an <a
href="http://www.xml.com/pub/a/98/10/guide0.html?page=2">XML</a>
file describing behavior for a site or group of sites. A ruleset contains
one or more <tt>rules</tt>. For example, here is
<a href="https://github.com/efforg/https-everywhere/blob/master/src/chrome/content/rules/RabbitMQ.xml"><tt>RabbitMQ.xml</tt></a>,
from the plugin distribution:
</p>
<pre>
<ruleset name="RabbitMQ">
<target host="rabbitmq.com" />
<target host="www.rabbitmq.com" />
<rule from="^http:"
to="https:" />
</ruleset>
</pre>
<p>
The <tt>target</tt> tag specifies which web sites the ruleset applies
to. The <tt>rule</tt> tag specifies how URLs on those web sites should be
rewritten. This rule says that any URLs on <tt>rabbitmq.com</tt> and
<tt>www.rabbitmq.com</tt> should be modified by replacing "http:" with
"https:".
</p>
<p>
When the browser loads a URL, HTTPS Everywhere takes the host
name (e.g. <tt>www.rabbitmq.com</tt>) and searches its ruleset database for
rulesets that match that host name.
</p>
<p>
HTTPS Everywhere then tries each rule in those rulesets against the full URL.
If the <a href="http://www.regular-expressions.info/quickstart.html"
>Regular Expression</a>, or regexp, in one of those rules matches, HTTPS
Everywhere <a href="#rules-and-regular-expressions">rewrites the URL</a>
according to the <tt>to</tt> attribute of the rule.
</p>
<h4><a name="wildcard-targets" href="#wildcard-targets"
>Wildcard Targets</a>
</h4>
<p>
To cover all of a domain's subdomains, you may want to specify
a wildcard target like <tt>*.twitter.com</tt>. Specifying
this type of left-side wildcard matches any host name with
<tt>.twitter.com</tt> as a suffix, e.g. <tt>www.twitter.com</tt>
or <tt>urls.api.twitter.com</tt>. You can also specify a
right-side wildcard like <tt>www.google.*</tt>. Right-side
wildcards, unlike left-side wildcards, apply only one level
deep. So if you want to cover all countries you'll generally
need to specify <tt>www.google.*</tt>, <tt>www.google.co.*</tt>,
and <tt>www.google.com.*</tt> to cover domains like
<tt>www.google.co.uk</tt> or <tt>www.google.com.au</tt>. You should
use wildcard targets only when you have rules that apply to the
entire wildcard space. If your rules only apply to specific hosts,
you should list each host as a separate target.
</p>
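<p>
As a rough illustration of the wildcard forms described above, the following
sketch uses the placeholder domain <tt>example.com</tt>, which is hypothetical
and not taken from the ruleset library. It assumes the site serves every
matching host over HTTPS, and it lists the bare domain separately on the
assumption that a left-side wildcard does not cover it:
</p>
<pre>
<ruleset name="Example (hypothetical)">
<!-- Bare domain, listed on its own -->
<target host="example.com" />
<!-- Left-side wildcard: any host name ending in .example.com -->
<target host="*.example.com" />
<!-- Right-side wildcards: each one covers a single level -->
<target host="www.example.*" />
<target host="www.example.co.*" />
<target host="www.example.com.*" />
<rule from="^http:"
to="https:" />
</ruleset>
</pre>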
<h4>
<a name="rules-and-regular-expressions" href="#rules-and-regular-expressions"
>Rules and Regular Expressions</a>
</h4>
<p>
The <tt>rule</tt> tags do the actual rewriting work. The <tt>from</tt> attribute of
each rule is a <a href="http://www.regular-expressions.info/quickstart.html"
>regular expression</a> matched against a full URL. You can use rules to rewrite
URLs in simple or complicated ways. Here's a simplified (and now obsolete) example
for Wikipedia:
</p>
<pre>
<ruleset name="Wikipedia">
<target host="*.wikipedia.org" />
<rule from="^http://(\w{2})\.wikipedia\.org/wiki/"
to="https://secure.wikimedia.org/wikipedia/$1/wiki/"/>
</ruleset>
</pre>
<p>
The <tt>to</tt> attribute replaces the text matched by the <tt>from</tt>
attribute. It can contain placeholders like <tt>$1</tt> that are replaced with
the text matched inside the parentheses.
</p>
<p>
This rule rewrites a URL like
<tt>http://fr.wikipedia.org/wiki/Chose</tt> to
<tt>https://secure.wikimedia.org/wikipedia/fr/wiki/Chose</tt>. Notice,
again, that the target is allowed to contain (just one) * as a wildcard
meaning "any".
</p>
<p>
Rules are applied in the order they are listed within each ruleset.
Order between rulesets is unspecified. Only the first rule or exclusion
matching a given URL is applied.
</p>
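<p>
Because only the first matching rule is applied, a more specific rule
generally needs to come before a general one. The sketch below uses the
hypothetical domain <tt>example.com</tt> and assumes the site prefers its
<tt>www</tt> host: a URL such as <tt>http://example.com/page</tt> would be
handled by the first rule, and the second rule would only be reached for URLs
the first one does not match.
</p>
<pre>
<ruleset name="Example (hypothetical)">
<target host="example.com" />
<target host="www.example.com" />
<!-- Tried first: moves the bare domain to the www host -->
<rule from="^http://example\.com/"
to="https://www.example.com/" />
<!-- Tried only if the rule above did not match -->
<rule from="^http:"
to="https:" />
</ruleset>
</pre>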
<p>
Rules are evaluated using <a
href="http://www.regular-expressions.info/javascript.html">JavaScript regular
expressions</a>, which are similar but not identical to <a
href="http://www.regular-expressions.info/pcre.html">Perl-style regular
expressions</a>.
Note that if your rules include ampersands (&), they need
to be appropriately XML-encoded: replace each occurrence of
<strong>&</strong> with <strong>&#x26;</strong>.
</p>
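<p>
For instance, a rule for a search URL whose query string contains an ampersand
might look like the sketch below. The host, path, and parameter names are
invented for illustration; the point is only that the ampersand is written in
its encoded form in both the <tt>from</tt> and <tt>to</tt> attributes.
</p>
<pre>
<!-- Hypothetical URL; the literal ampersand in the query string is encoded -->
<rule from="^http://www\.example\.com/search\?q=(\w+)&#x26;lang=en$"
to="https://www.example.com/search?q=$1&#x26;lang=en" />
</pre>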
<h4><a name="exclusions" href="#exclusions"
>Exclusions</a>
</h4>
<p>
An exclusion specifies a pattern, using a
regular expression, for URLs where the rule should <strong>not</strong> be
applied. The Stack Exchange rule contains an exclusion for the OpenID login
path, which breaks logins if it is rewritten:
</p>
<pre>
<exclusion pattern="^http://(?:\w+\.)?stack(?:exchange|overflow)\.com/users/authenticate/" />
</pre>
<p>
Exclusions are always evaluated before rules in a given ruleset. Matching any
exclusion means that a URL won't match any rules within the same ruleset.
However, if other rulesets match the same target hosts, the rules in those
rulesets will still be tried.
</p>
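<p>
To show where an exclusion sits relative to the other tags, here is a minimal
sketch of a complete ruleset. The domain <tt>example.com</tt> and the
<tt>/login/</tt> path are hypothetical, and the sketch assumes that path breaks
when it is rewritten to HTTPS:
</p>
<pre>
<ruleset name="Example (hypothetical)">
<target host="example.com" />
<target host="www.example.com" />
<!-- URLs under /login/ are left untouched by this ruleset -->
<exclusion pattern="^http://(?:www\.)?example\.com/login/" />
<rule from="^http:"
to="https:" />
</ruleset>
</pre>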
<h4><a name="style" href="#style"
>Style Guide</a>
</h4>
<p>
There are many different ways you can write a ruleset, or regular expression
within the ruleset. It's easier for everyone to understand the rulesets if they
follow similar practices. You should read and follow the
<a href="https://github.com/EFForg/https-everywhere/blob/master/ruleset-style.md">Ruleset
style guide</a>. Some of the guidelines in that document are intended to make <a
href="https://github.com/EFForg/https-everywhere/blob/master/ruleset-testing.md">Ruleset
testing</a> less cumbersome.
</p>
<h4><a name="secure-cookies" href="#secure-cookies"
>Secure Cookies</a>
</h4>
<p>
Many HTTPS websites fail to correctly set the <a
href="https://secure.wikimedia.org/wikipedia/en/wiki/HTTP_cookie#Secure_and_HttpOnly">secure
flag</a> on authentication and/or tracking cookies. HTTPS
Everywhere provides a facility for turning this flag on. For instance:
</p>
<pre><securecookie host="^market\.android\.com$" name=".+" /></pre>
<p>
The "host" parameter is a regexp specifying which domains
should have their cookies secured; the "name" parameter is a
regexp specifying which cookies should be secured. For a cookie to be secured,
it must be sent by a target host for that ruleset. It must also be sent over
HTTPS and match the name regexp. For cookies set by JavaScript in a web page,
the Firefox extension can't tell which host set the cookie and instead uses
the domain attribute of the cookie to check against target hosts. A cookie whose
domain attribute starts with a "." (the default, if not specified by
JavaScript) will be matched as if it were sent from a host
name made by stripping the leading dot.
</p>
<h4><a name="testing" href="#testing"
>Testing</a>
</h4>
<p>
We use an <a href="https://github.com/hiviah/https-everywhere-checker">automated
checker</a> to run some basic tests on all rulesets. This is described in more
detail in our <a
href="https://github.com/EFForg/https-everywhere/blob/master/ruleset-testing.md">Ruleset
Testing</a> document, but in short there are two requirements. First, your ruleset must have
enough test URLs to cover all the various types of URL covered by your rules.
Second, each of those test URLs must load, both before rewriting and after
rewriting. Every target host tag generates an implicit test URL unless it
contains a wildcard. You can add additional test URLs manually using the
<tt><test url="..."/></tt> tag. The test URLs you add this way should be
real pages loaded from the site, or real images, CSS, and JavaScript if you have rules that
specifically affect those resources.
</p>
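<p>
As a sketch of how test URLs fit into a ruleset, consider the hypothetical
hosts below. The non-wildcard target gets an implicit test URL, while the
wildcard target does not, so explicit test URLs are added for it. All host
names and paths here are placeholders and would need to be real, loadable
URLs in an actual ruleset.
</p>
<pre>
<ruleset name="Example (hypothetical)">
<!-- No wildcard: generates an implicit test URL -->
<target host="www.example.com" />
<!-- Wildcard: no implicit test URL, so tests are listed by hand -->
<target host="*.cdn.example.com" />
<test url="http://static.cdn.example.com/css/site.css" />
<test url="http://img.cdn.example.com/logo.png" />
<rule from="^http:"
to="https:" />
</ruleset>
</pre>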
<p>
You should also manually test your ruleset by
placing it in the <tt>HTTPSEverywhereUserRules/</tt> subdirectory in
<a href="http://kb.mozillazine.org/Profile_folder_-_Firefox">your
Firefox profile directory</a>, and then restarting Firefox. While
using the rule, check for messages in the Firefox Error Console
to see if there are any issues with the way the site supports
HTTPS.
</p>
<p>
If you've tested your rule and are sure it would
be of use to the world at large, submit it as a <a
href="https://help.github.com/articles/using-pull-requests/">pull
request</a> on our <a
href="https://github.com/EFForg/https-everywhere/">GitHub
repository</a> or send it to the rulesets mailing list at
<tt>https-everywhere-rules AT eff.org</tt>. Please be aware that this
is a public and publicly-archived mailing list.
</p>
<h4><a name="make-trivial" href="#make-trivial"
>make-trivial-rule</a>
</h4>
<p>
As an alternative to writing rules by hand, there are scripts you
can run from a Unix command line to automate the process of creating
a simple rule for a specified domain. These scripts are not included
with HTTPS Everywhere releases but are available in our development
repository and are described in <a href="https://www.eff.org/https-everywhere/development">our development
documentation</a>.
</p>
<h4><a name="default-off" href="#default-off"
>Disabling a ruleset by default</a>
</h4>
<p>
Sometimes rulesets are useful or interesting, but cause problems
that make them unsuitable for being enabled by default in everyone's
browsers. Typically when a ruleset has problems we will disable it by
default until someone has time to fix it. You can do this by adding
a <tt>default_off</tt> attribute to the ruleset element, with a value
explaining why the rule is off.
</p>
<pre>
<ruleset name="Amazon (buggy)" default_off="breaks site">
<target host="www.amazon.*" />
<target host="amazon.*" />
</ruleset>
</pre>
<p>
You can add more details, like a link to a bug report, in the comments for the
file.
</p>
<h4><a name="mixed-content" href="#mixed-content"
>Mixed Content Blocking (MCB)</a>
</h4>
<p>
Some rulesets may trigger active mixed content (i.e. scripts loaded over HTTP
instead of HTTPS). This type of mixed content is blocked in both <a
href="https://trac.torproject.org/projects/tor/ticket/6975">Chrome</a> and
Firefox, before HTTPS Everywhere has a chance to rewrite the URLs to an HTTPS
version. This generally breaks the site.
However, the Tor Browser doesn't block mixed content, in order to allow HTTPS
Everywhere to try to rewrite the URLs to an HTTPS version.
</p>
<p>
To enable a rule only on platforms that allow mixed content (currently
only the Tor Browser), you can add a <tt>platform="mixedcontent"</tt>
attribute to the ruleset element.
</p>
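<p>
A minimal sketch of such a ruleset, using the hypothetical host
<tt>www.example.com</tt> and assuming its pages load scripts over HTTP:
</p>
<pre>
<!-- Only enabled on platforms that allow mixed content -->
<ruleset name="Example (hypothetical)" platform="mixedcontent">
<target host="www.example.com" />
<rule from="^http:"
to="https:" />
</ruleset>
</pre>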
<h4><a name="downgrades" href="#downgrades"
>HTTPS->HTTP downgrade rules</a>
</h4>
<p>
By default, HTTPS Everywhere will refuse to allow rules that
would downgrade a URL from HTTPS to HTTP. Occasionally, a downgrade is necessary
because the extension rewrites a page to HTTPS, and that page contains relative
links to resources which do not exist on the HTTPS part of the site.
This is very rare, especially because these resources will typically be blocked
by <a href="#mixed-content">Mixed Content Blocking</a>. If a downgrade is necessary, you can
add a <tt>downgrade="1"</tt> attribute to the rule; marking downgrade rules
this way makes it easier to audit the ruleset library for them.
</p>
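<p>
For illustration only, a downgrade rule might look like the following sketch.
The host and the <tt>/legacy/</tt> path are hypothetical, and the sketch
assumes the resources under that path are not available over HTTPS:
</p>
<pre>
<!-- Hypothetical path assumed to exist only on the HTTP side of the site -->
<rule from="^https://www\.example\.com/legacy/"
to="http://www.example.com/legacy/" downgrade="1" />
</pre>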