<style>

body {

max-width: 40em;

}

</style>

<p>

This page describes how to write rulesets for

<a href="https://eff.org/https-everywhere">HTTPS Everywhere</a>,

a browser extension that switches sites over from HTTP

to HTTPS automatically. HTTPS Everywhere comes with

<a href="http://www.eff.org/https-everywhere/atlas/">thousands</a>

of rulesets that tell HTTPS Everywhere which sites it should switch

to HTTPS and how. If there is a site that offers HTTPS and is not handled by

the extension, this guide will explain how to add that site.

</p>

<h4><a name="rulesets" href="#rulesets"

>Rulesets</a>

</h4>

<p>

A <tt>ruleset</tt> is an <a

href="http://www.xml.com/pub/a/98/10/guide0.html?page=2">XML</a>

file describing behavior for a site or group of sites. A ruleset contains

one or more <tt>rules</tt>. For example, here is

<a href="https://github.com/efforg/https-everywhere/blob/master/src/chrome/content/rules/RabbitMQ.xml"><tt>RabbitMQ.xml</tt></a>,

from the plugin distribution:

</p>

<pre>
<ruleset name="RabbitMQ">
  <target host="rabbitmq.com" />
  <target host="www.rabbitmq.com" />

  <rule from="^http:"
          to="https:" />
</ruleset>
</pre>

<p>

The <tt>target</tt> tag specifies which web sites the ruleset applies

to. The <tt>rule</tt> tag specifies how URLs on those web sites should be

rewritten. This rule says that any URLs on <tt>rabbitmq.com</tt> and

<tt>www.rabbitmq.com</tt> should be modified by replacing "http:" with

"https:".

</p>

<p>

When the browser loads a URL, HTTPS Everywhere takes the host

name (e.g. <tt>www.rabbitmq.com</tt>) and searches its ruleset database for

rulesets that match that host name.

</p>

<p>

HTTPS Everywhere then tries each rule in those rulesets against the full URL.

If the <a href="http://www.regular-expressions.info/quickstart.html"

>Regular Expression</a>, or regexp, in one of those rules matches, HTTPS

Everywhere <a href="#rules-and-regular-expressions">rewrites the URL</a>

according to the <tt>to</tt> attribute of the rule.

</p>

<h4><a name="wildcard-targets" href="#wildcard-targets"

>Wildcard Targets</a>

</h4>

<p>

To cover all of a domain's subdomains, you may want to specify

a wildcard target like <tt>*.twitter.com</tt>. Specifying

this type of left-side wildcard matches any host name with

<tt>.twitter.com</tt> as a suffix, e.g. <tt>www.twitter.com</tt>

or <tt>urls.api.twitter.com</tt>. You can also specify a

right-side wildcard like <tt>www.google.*</tt>. Right-side

wildcards, unlike left-side wildcards, apply only one level

deep. So if you want to cover all countries you'll generally

need to specify <tt>www.google.*</tt>, <tt>www.google.co.*</tt>,

and <tt>www.google.com.*</tt> to cover domains like

<tt>www.google.co.uk</tt> or <tt>www.google.com.au</tt>. You should

use wildcard targets only when you have rules that apply to the

entire wildcard space. If your rules only apply to specific hosts,

you should list each host as a separate target.

</p>
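<p>
As a rough sketch of the two wildcard forms, the hypothetical ruleset below
uses <tt>example.com</tt> and related names as placeholders; they are
assumptions for illustration and not a ruleset shipped with the extension.
Note that a left-side wildcard does not cover the bare domain, so it is
listed as its own target.
</p>

<pre>
<!-- Hypothetical ruleset: all host names here are placeholders. -->
<ruleset name="Example (hypothetical)">
  <!-- The bare domain is not matched by *.example.com. -->
  <target host="example.com" />
  <!-- Left-side wildcard: any depth, e.g. a.example.com or a.b.example.com. -->
  <target host="*.example.com" />
  <!-- Right-side wildcards: one level each, so list every suffix shape. -->
  <target host="www.example.*" />
  <target host="www.example.co.*" />

  <rule from="^http:"
          to="https:" />
</ruleset>
</pre>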

<h4>

<a name="rules-and-regular-expressions" href="#rules-and-regular-expressions"

>Rules and Regular Expressions</a>

</h4>

<p>

The <tt>rule</tt> tags do the actual rewriting work. The <tt>from</tt> attribute of

each rule is a <a href="http://www.regular-expressions.info/quickstart.html"

>regular expression</a> matched against a full URL. You can use rules to rewrite

URLs in simple or complicated ways. Here's a simplified (and now obsolete) example

for Wikipedia:

</p>

<pre>
<ruleset name="Wikipedia">
  <target host="*.wikipedia.org" />

  <rule from="^http://(\w{2})\.wikipedia\.org/wiki/"
          to="https://secure.wikimedia.org/wikipedia/$1/wiki/"/>
</ruleset>
</pre>

<p>

The <tt>to</tt> attribute replaces the text matched by the <tt>from</tt>

attribute. It can contain placeholders like <tt>$1</tt> that are replaced with

the text matched inside the parentheses.

</p>

<p>

This rule rewrites a URL like

<tt>http://fr.wikipedia.org/wiki/Chose</tt> to

<tt>https://secure.wikimedia.org/wikipedia/fr/wiki/Chose</tt>. Notice,

again, that the target is allowed to contain (just one) * as a wildcard

meaning "any".

</p>

<p>

Rules are applied in the order they are listed within each ruleset.

Order between rulesets is unspecified. Only the first rule or exception

matching a given URL is applied.

</p>
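<p>
Because only the first matching rule is applied, more specific rules
generally need to come before broader ones. The following is a minimal
sketch under that assumption; the ruleset name and the <tt>example.com</tt>
hosts are hypothetical.
</p>

<pre>
<!-- Hypothetical ruleset illustrating rule order. -->
<ruleset name="Example (hypothetical)">
  <target host="example.com" />
  <target host="www.example.com" />

  <!-- Tried first: send the bare domain to the www host over HTTPS. -->
  <rule from="^http://example\.com/"
          to="https://www.example.com/" />

  <!-- Tried only if the rule above did not match. -->
  <rule from="^http:"
          to="https:" />
</ruleset>
</pre>

<p>
If the two rules were swapped, the broad rule would match every HTTP URL and
the more specific rule would never be applied.
</p>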

<p>

Rules are evaluated using <a

href="http://www.regular-expressions.info/javascript.html">Javascript regular

expressions</a>, which are similar but not identical to <a

href="http://www.regular-expressions.info/pcre.html">Perl-style regular

expressions.</a>

Note that if your rules include ampersands (&), they need

to be appropriately XML-encoded: replace each occurrence of

<strong>&</strong> with <strong>&#x26;</strong>.

</p>

<h4><a name="exclusions" href="#exclusions">Exclusions</a></h4>

<p>

An exclusion specifies a pattern, using a

regular expression, for URLs where the rule should <strong>not</strong> be

applied. The Stack Exchange rule contains an exclusion for the OpenID login

path, which breaks logins if it is rewritten:

</p>

<pre>

</pre>

<p>

Exclusions are always evaluated before rules in a given ruleset. Matching any

exclusion means that a URL won't match any rules within the same ruleset.

However, if other rulesets match the same target hosts, the rules in those

rulesets will still be tried.

</p>
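<p>
To make the interaction between exclusions and rules concrete, here is a
hypothetical sketch. The host <tt>example.com</tt> and the <tt>/login/</tt>
path are assumptions chosen for illustration, not taken from any shipped
ruleset.
</p>

<pre>
<!-- Hypothetical ruleset: host and path are placeholders. -->
<ruleset name="Example (hypothetical)">
  <target host="example.com" />

  <!-- Checked before any rule: URLs under /login/ are left untouched. -->
  <exclusion pattern="^http://example\.com/login/" />

  <!-- Applies to every other HTTP URL on the target host. -->
  <rule from="^http:"
          to="https:" />
</ruleset>
</pre>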

<h4><a name="style" href="#style"

>Style Guide</a>

</h4>

<p>

There are many different ways you can write a ruleset, or regular expression

within the ruleset. It's easier for everyone to understand the rulesets if they

follow similar practices. You should read and follow the

<a href="https://github.com/EFForg/https-everywhere/blob/master/ruleset-style.md">Ruleset

style guide</a>. Some of the guidelines in that document are intended to make <a

href="https://github.com/EFForg/https-everywhere/blob/master/ruleset-testing.md">Ruleset

testing</a> less cumbersome.

</p>

<h4><a name="secure-cookies" href="#secure-cookies"

>Secure Cookies</a>

</h4>

<p>

Many HTTPS websites fail to correctly set the <a

href="https://secure.wikimedia.org/wikipedia/en/wiki/HTTP_cookie#Secure_and_HttpOnly">secure

flag</a> on authentication and/or tracking cookies. HTTPS

Everywhere provides a facility for turning this flag on. For instance:

</p>

<p>

The "host" parameter is a regexp specifying which domains

should have their cookies secured; the "name" parameter is a

regexp specifying which cookies should be secured. For a cookie to be secured,

it must be sent by a target host for that ruleset. It must also be sent over

HTTPS and match the name regexp. For cookies set by Javascript in a web page,

the Firefox extension can't tell which host set the cookie and instead uses

the domain attribute of the cookie to check against target hosts. A cookie whose

domain attribute starts with a "." (the default, if not specified by

Javascript) will be matched as if it was sent from a host

name made by stripping the leading dot.

</p>
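<p>
As an illustrative sketch, a ruleset that secures every cookie on its target
hosts might look like the following. The ruleset name and the
<tt>example.com</tt> hosts are hypothetical.
</p>

<pre>
<!-- Hypothetical ruleset: secures all cookies sent by the target hosts. -->
<ruleset name="Example (hypothetical)">
  <target host="example.com" />
  <target host="www.example.com" />

  <!-- host: regexp for the cookie's host; name: regexp for the cookie name. -->
  <securecookie host="^(www\.)?example\.com$" name=".+" />

  <rule from="^http:"
          to="https:" />
</ruleset>
</pre>

<p>
Here <tt>name=".+"</tt> matches any non-empty cookie name; a narrower regexp
could be used to secure only specific cookies.
</p>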

<h4><a name="testing" href="#testing"

>Testing</a>

</h4>

<p>

We use an <a href="https://github.com/hiviah/https-everywhere-checker">automated

checker</a> to run some basic tests on all rulesets. This is described in more

detail in our <a

href="https://github.com/EFForg/https-everywhere/blob/master/ruleset-testing.md">Ruleset

Testing</a> document, but in short there are two requirements. First, your ruleset must have

enough test URLs to cover every type of URL that your rules affect.

Second, each of those test URLs must load both before and after

rewriting. Every target host tag generates an implicit test URL unless it

contains a wildcard. You can add additional test URLs manually using the

<tt><test url="..."/></tt> tag. The test URLs you add this way should be

real pages loaded from the site, or real images, CSS, and Javascript if you have rules that

specifically affect those resources.

</p>
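<p>
A sketch of how explicit test URLs can supplement the implicit ones is shown
below. All host names and paths are hypothetical placeholders; in a real
ruleset they would need to be URLs that actually load.
</p>

<pre>
<!-- Hypothetical ruleset: hosts and paths are placeholders. -->
<ruleset name="Example (hypothetical)">
  <!-- Generates an implicit test URL for example.com. -->
  <target host="example.com" />
  <!-- Wildcard target: no implicit test URL, so add explicit ones. -->
  <target host="*.example.com" />

  <test url="http://blog.example.com/" />
  <test url="http://static.example.com/css/site.css" />

  <rule from="^http:"
          to="https:" />
</ruleset>
</pre>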

<p>

You should also manually test your ruleset by

placing it in the <tt>HTTPSEverywhereUserRules/</tt> subdirectory in

<a href="http://kb.mozillazine.org/Profile_folder_-_Firefox">your

Firefox profile directory</a>, and then restarting Firefox. While

using the rule, check for messages in the Firefox Error Console

to see if there are any issues with the way the site supports

HTTPS.

</p>

<p>

If you've tested your rule and are sure it would

be of use to the world at large, submit it as a <a

href="https://help.github.com/articles/using-pull-requests/">pull

request</a> on our <a

href="https://github.com/EFForg/https-everywhere/">GitHub

repository</a> or send it to the rulesets mailing list at

<tt>https-everywhere-rules AT eff.org</tt>. Please be aware that this

is a public and publicly-archived mailing list.

</p>

<h4><a name="make-trivial" href="#make-trivial"

>make-trivial-rule</a>

</h4>

<p>

As an alternative to writing rules by hand, there are scripts you

can run from a Unix command line to automate the process of creating

a simple rule for a specified domain. These scripts are not included

with HTTPS Everywhere releases but are available in our development

repository and are described in <a href="https://www.eff.org/https-everywhere/development">our development

documentation</a>.

</p>

<h4><a name="default-off" href="#default-off"

>Disabling a ruleset by default</a>

</h4>

<p>

Sometimes rulesets are useful or interesting, but cause problems

that make them unsuitable for being enabled by default in everyone's

browsers. Typically when a ruleset has problems we will disable it by

default until someone has time to fix it. You can do this by adding

a <tt>default_off</tt> attribute to the ruleset element, with a value

explaining why the rule is off.

</p>

<pre>

</ruleset>

</pre>

<p>

You can add more details, like a link to a bug report, in the comments for the

file.

</p>
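<p>
For illustration only, a ruleset disabled by default could be written along
these lines. The name, host, and reason text are hypothetical.
</p>

<pre>
<!-- Hypothetical ruleset: disabled by default, with the reason as the value. -->
<ruleset name="Example (hypothetical)" default_off="breaks login">
  <target host="example.com" />

  <rule from="^http:"
          to="https:" />
</ruleset>
</pre>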

<h4><a name="mixed-content" href="#mixed-content"

>Mixed Content Blocking (MCB)</a>

</h4>

<p>

Some rulesets may trigger active mixed content (i.e. scripts loaded over HTTP

instead of HTTPS). This type of mixed content is blocked in both <a

href="https://trac.torproject.org/projects/tor/ticket/6975">Chrome</a> and

Firefox, before HTTPS Everywhere has a chance to rewrite the URLs to an HTTPS

version. This generally breaks the site.

However, the Tor Browser doesn't block mixed content, in order to allow HTTPS

Everywhere to try and rewrite the URLs to an HTTPS version.

</p>

<p>

To enable a rule only on platforms that allow mixed content (currently

only the Tor Browser), you can add a <tt>platform="mixedcontent"</tt>

attribute to the ruleset element.

</p>
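<p>
A minimal sketch of a ruleset restricted to such platforms, using a
hypothetical name and host:
</p>

<pre>
<!-- Hypothetical ruleset: only enabled where mixed content is not blocked. -->
<ruleset name="Example (hypothetical)" platform="mixedcontent">
  <target host="example.com" />

  <rule from="^http:"
          to="https:" />
</ruleset>
</pre>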

<h4><a name="downgrades" href="#downgrades"

>HTTPS->HTTP downgrade rules</a>

</h4>

<p>

By default, HTTPS Everywhere will refuse to allow rules that

would downgrade a URL from HTTPS to HTTP. Occasionally, this is necessary

because the extension rewrites a page to HTTPS, and that page contains relative

links to resources which do not exist on the HTTPS part of the site.

This is very rare, especially because these resources will typically be blocked

by <a href="#mixed-content">Mixed Content Blocking</a>. If it is necessary, you can

add a <tt>downgrade="1"</tt> attribute to the rule to make it easier to audit

the ruleset library for such rules.

</p>
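<p>
The sketch below shows one way a downgrade rule might sit alongside a normal
upgrade rule. The host and the <tt>/legacy/</tt> path are hypothetical, and
the example assumes that path is not available over HTTPS.
</p>

<pre>
<!-- Hypothetical ruleset: host and path are placeholders. -->
<ruleset name="Example (hypothetical)">
  <target host="example.com" />

  <!-- Leave HTTP requests for the legacy path alone. -->
  <exclusion pattern="^http://example\.com/legacy/" />

  <!-- Explicitly marked downgrade for resources assumed to lack HTTPS. -->
  <rule from="^https://example\.com/legacy/"
          to="http://example.com/legacy/" downgrade="1" />

  <rule from="^http:"
          to="https:" />
</ruleset>
</pre>

<p>
The exclusion only matches HTTP URLs, so it keeps the upgrade rule from
sending legacy URLs back to HTTPS while still letting the downgrade rule
apply to HTTPS URLs under the same path.
</p>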
