Slow Engineering

Photographic Evidence is Dead

Fake Image, Fake NEWS, Fake Trust.

-----BEGIN TEXT-----

We have now witnessed the death of almost 200 years of photographic (and other recorded) evidence. Images, videos, and audio recordings can now be easily faked or altered in ways that cannot be detected. Digital technology has made this happen. Analog media is continuous, so subtle modifications can be noticed. However, digital media has discrete bits that are not dependent on the bits around them.

It is time to relearn what was so obvious to our ancestors: the SOURCE is more important than the content. “Do you trust or believe the source?” This can be a personal choice, but we no longer have the convenience of “socially accepted” sources.

Sources

The mainstream media cannot be trusted without question; they are polluted with greed.
The government cannot be trusted without checks; they are polluted with keeping control.
Non-mainstream sources cannot be trusted without question; they are polluted with the desire for control.
People on social media sites cannot be trusted; many repost fake news to get the algorithm’s attention.
Individuals cannot be trusted until you have seen or heard their patterns of bias.

Trusted Sources = Reputation

The old patterns of trusted sources have to be rebuilt. Luckily, we have some technological advantages that didn’t exist 200 years ago: public/private key cryptography makes it possible to build “webs of trust” that can be wider than your own contacts. The bits of any digital file (text, images, videos, etc.) can be signed by an individual, so you can trust that those exact bits have not been tampered with since that person signed them. Also, the chain of trust to that individual can be followed to help determine whether that person can be trusted to have signed an untampered file. Yes, this can be complex, but it can be simplified if groups that want to build a reputation for being trustworthy allow audits and third-party witnesses, who can verify that the images (or other evidence) match what they have seen.

This is a new age: all sources will always be questionable, and all the old sources must build up their reputations from ground zero with a verifiable audit trail. For journalists, the audit trail of sources could be kept private but still be auditable. People smarter than me know how to do this. In this new age, putting your trust in an unverified source is foolish. “We are all being deceived.”

AI Fakes

Congress wants to pass laws requiring AI-generated content to be labeled. Okay, but content publishers must also attest that the content was NOT AI-generated, and state whether or not an original version was modified. If they have lied about the content, they should be identified publicly, and they could be charged with making libelous statements. If some content was modified, the original sources must be provided somewhere, with a signed audit trail. It does not matter if AI or humans created the content; what matters is whether the person or organization is telling the truth.

My message to big corporations and big media companies: if you are unwilling to provide auditable chains of trust for your content, we should assume your content is untrustworthy.

Who to trust?

Currently, it comes down to a personal choice of who you will believe. Questioning the sources should always be acceptable. It would be rude not to allow others to question a source or to see a verifiable “chain of trust.” But even then, the chain will still end with individuals or organizations; do you believe they can be “trusted”? Reputation will become a precious commodity, as it was in the past.

My touchstone for evaluating someone else’s level of trust is that if they do not question their sources, I will rank them lower than people who question even “reliable” sources.

Some Technology Help

These technologies can “help” build trust, but they can all be compromised. We should never again put unconditional trust in any medium.

For signing tools, see: “GnuPG” and “OpenPGP”
Building a “web of trust” is documented in many articles. Yes, there are problems, so let’s solve them.

PGP (Pretty Good Privacy) was initially released in 1991. Secure encryption and signing have been around for over 30 years! It is long past time for all web pages, documents, images, and even videos to be dated and signed with secure and auditable signatures.

Here are some newer techniques that could incorporate signing with chains of trust.

Fediverse – supporting decentralized applications
ActivityPub – a protocol for decentralized applications
Ghost – blogging and publishing platform
Mastodon – decentralized social network
Matrix.org – decentralized, encrypted messaging
PeerTube – decentralized video hosting

Companies and individuals must start building reputations that we can trust and verify. If we don’t, everything will be “fake,” and democracy will falter.

A Crude Example

This section is for engineers. Non-engineers can skip the rest of this article.

I’ll show a crude example using the GNU Privacy Guard tool (GnuPG, gpg), which implements OpenPGP encryption/decryption and signing with public/private keys. This is a “crude” implementation because it can only be used by the few people who know how to use CLI (Command Line Interface) tools. Wrapping this code with a GUI (Graphical User Interface) would be a much better implementation. Or even better, embed the keys and signing into a web platform so it is mostly hidden.

All of the code and sample files used in this article can be found in this GitHub repository: example/photographic-evidence-is-dead

gpg can sign a file.

If the file is a text file, the signature can be appended to the file.
If the file is binary, the signature can be output as a separate file (detached).
A separate file is also useful for text files because the signature is 14 lines of nonsense text.
The signature could be converted to a QR image.
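
For the detached case in the list above, the plain gpg calls might look like the following sketch. The function names sign_detached and verify_detached are made up for this illustration, and photo.jpg in the usage note is only a placeholder file name.

```shell
# Create an ASCII-armored detached signature next to the file.
# $1 = key id (usually an email address), $2 = file to sign
sign_detached() {
  gpg --default-key "$1" --armor --detach-sign --output "$2.sig" "$2"
}

# Check the file against its detached signature.
# $1 = the signed file; the signature is expected at "$1.sig"
verify_detached() {
  gpg --verify "$1.sig" "$1"
}
```

Usage: "sign_detached test@example.com photo.jpg" writes photo.jpg.sig, and "verify_detached photo.jpg" reports a good signature only if photo.jpg is bit-for-bit what was signed.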

Signing a text file

These gpg commands will sign and verify a signed text file.

Sign:
gpg --default-key test@example.com --clear-sign -o sample-1.txt.sig sample-1.txt

Verify:
gpg --verify sample-1.txt.sig

I wrote a script that makes it easier to sign and verify files with gpg. The script can be found here: gpg-sign.sh. The usage help can be found here: gpg-sign.sh.md

Before gpg can be used for signing, you need to create a key. Here is a quick way to create one: hit Enter to accept the defaults, then fill in your name and email when prompted. Usually you will use an email address as your key’s ID. (Note: gpg will create the ~/.gnupg directory for its files.)

gpg --full-generate-key

To demonstrate the script, we need a text file. Here is the example input file we will sign, sample-1.txt:

Gettysburg Address

Four score and seven years ago our fathers brought forth on this
continent, a new nation, conceived in Liberty, and dedicated to the
proposition that all men are created equal.

Source: https://en.wikipedia.org/wiki/Gettysburg_Address

Sign sample-1.txt with the gpg-sign.sh script using the “-c” option to create a signed “clear-text” file, with the signature appended to the file. When this is run you will be prompted for the test@example.com passphrase.

$ gpg-sign.sh -c -k test@example.com -f sample-1.txt

sample-1.txt.sig is the signed file. (sample-1.txt is not modified.)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Gettysburg Address

Four score and seven years ago our fathers brought forth on this
continent, a new nation, conceived in Liberty, and dedicated to the
proposition that all men are created equal.

Source: https://en.wikipedia.org/wiki/Gettysburg_Address
-----BEGIN PGP SIGNATURE-----

iQHFBAEBCgAvFiEEPuEu+RQEXLLLgaza4jlYZSPGd3MFAmdgisARHHRlc3RAZXhh
bXBsZS5jb20ACgkQ4jlYZSPGd3P6Swv9Ezs0uvkNiStJBs4QWWrv1p1y4icTmgOX
6u0c9H4750sfSll4SQ30I5j8xC9W28TyEHGKj9QOaTwK5kwOs903W3EmA7S8g1Bv
gi3V3LGXXTeAJfhTcDPnMmTLwrpkDSHaSGWH+etPtG2vM77X89s3D1oNFKeGrBER
2P8BfPQlK3hMQrFy4trwU8Jr94Pg8/B3/A3ex8xDCuo5HpDgMYymaM/JFn3CIju+
VQX3fnF++p9+rb9MXAqEEEDsOgxH1JxrDgLuPOMadVwLy+GvDDje+h54lMYiSf9e
Wg/ZsTPTXF5f8fOMfLpDCnbY2jhYZ7MOUPfwtz4Tz9MXhDqKWzc7rRmRNt+dWQMB
JVohMrQKcDE4lp5vTN/bDy6Imzi5HQPgWxvAUjWYNXHnpwjnb00h7sqYI2XdVv32
HD5a20Dkw7aOqMjoAGt1MDKNNELvZxqU+rrC374tkW7yB8zoZKKtVtO2wOk9vvBA
8FpDNz4bD8ApTNfhaAyKQuJT0LjJUgGl
=dHAM
-----END PGP SIGNATURE-----

Signing a web page

Problem: the signed content cannot be even one bit different. This works well for program files or for text files that are emailed, but this is a problem for web pages because many websites embed different tags in the page dynamically (usually with JavaScript), and the format can be changed at any time.
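
To see how sensitive this is, here is a small sketch that compares SHA-512 digests (the hash named in the signed message above) of two strings that differ by a single bit: '.' is 0x2E and ',' is 0x2C. It assumes the sha512sum tool from GNU coreutils is installed; digest is a helper name invented for this example.

```shell
# Print only the hex digest of the given string (no trailing newline is hashed).
digest() {
  printf '%s' "$1" | sha512sum | cut -d' ' -f1
}

a=$(digest 'all men are created equal.')
b=$(digest 'all men are created equal,')

# A one-bit change in the input gives an unrelated digest,
# so a signature made over the first text will not verify the second.
if [ "$a" != "$b" ]; then
  echo "digests differ"
fi
```

The same effect is why an extra space or a re-wrapped line in a web page is enough to make a signature check fail.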

Here is an HTML example, sample-2.html. (Note: spaces are put after ‘<’ to make the tags inactive on this page.)

< !DOCTYPE html>
< html xmlns="http://www.w3.org/1999/xhtml">
< head>
  < meta http-equiv="Content-Type"
        content="text/html; charset=utf-8" />
  < title>Gettysburg Address< /title>
< /head>
< body>
  < h1 id="gettysburg-address">Gettysburg Address< /h1>
  < p>---BEGIN TEXT---< /p>
  < p>Four score and seven years ago our fathers brought forth on this
  continent, a new nation, conceived in Liberty, and dedicated to the
  proposition that all men are created equal.< /p>
  < p>Source: < a href=
  "https://en.wikipedia.org/wiki/Gettysburg_Address">Gettysburg
  Address< /a>< /p>
  < p>---END TEXT ---< /p>
< /body>
< /html>

One solution is to normalize the files so that only text is signed.

Cut/paste a defined range of text.
Remove all tags and collapse all white space and line breaks to one space.
Sign the normalized file, with a detached signature file.

The routine that normalizes the text would be shared with the signature so that when a user cut/pastes the text from a web page, the normalized text would match the text that was signed.
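
The tag-stripping and whitespace steps above can be sketched with standard tools. This is only a rough stand-in for a real normalizer: normalize_html is a hypothetical name, the sketch does not select a BEGIN/END range, and it drops link URLs together with the tags, which a real normalizer may prefer to keep.

```shell
# Read HTML on stdin, write normalized text on stdout.
normalize_html() {
  # 1. Join all lines so tags split across lines are seen whole.
  # 2. Replace every tag with a space.
  # 3. Collapse runs of whitespace to one space and trim both ends.
  tr '\n' ' ' \
    | sed -e 's/<[^>]*>/ /g' \
          -e 's/[[:space:]][[:space:]]*/ /g' \
          -e 's/^ //' \
          -e 's/ $//'
}
```

Usage: "normalize_html < page.html > page.txt", where page.html is a placeholder name. Whoever verifies the signature has to run exactly the same routine, otherwise the normalized bits will not match.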

I wrote a script that removes all tags, collapses extra spaces (and newlines), and leaves HTML links in place. The script can be found here: just-words.pl

Normalize sample-2.html to sample-2.txt with just-words.pl

$ just-words.pl < sample-2.html > sample-2.txt

The resulting sample-2.txt file. (In the actual file, there are no line breaks.)

Gettysburg Address Four score and seven years ago our fathers
brought forth on this continent, a new nation, conceived in Liberty,
and dedicated to the proposition that all men are created
equal. Source: https://en.wikipedia.org/wiki/Gettysburg_Address
Gettysburg Address

Now, sample-2.txt can be signed, with the signature output in a separate file.

$ gpg-sign.sh -f sample-2.txt -s -k test@example.com

sample-2.txt.sig is created.

Now the contents of sample-2.txt.sig can be put at the end of the sample-2.html file, after the “---END TEXT ---” line. Or, to make it look better, a QR code image could be appended, where the QR code encodes the signature text. CyberChef (https://gchq.github.io/CyberChef/) can serve as a simple QR generator: just put all of the sample-2.txt.sig text in the Input field.

For example see sample-3.html

< !DOCTYPE html>
< html xmlns="http://www.w3.org/1999/xhtml">
< head>
  < meta http-equiv="Content-Type"
        content="text/html; charset=utf-8" />
  < title>Gettysburg Address< /title>
< /head>
< body>
  < h1 id="gettysburg-address">Gettysburg Address< /h1>
  < p>---BEGIN TEXT---< /p>
  < p>Four score and seven years ago our fathers brought forth on this
  continent, a new nation, conceived in Liberty, and dedicated to the
  proposition that all men are created equal.< /p>
  < p>Source: < a href=
  "https://en.wikipedia.org/wiki/Gettysburg_Address">Gettysburg
  Address< /a>< /p>
  < p>---END TEXT ---< /p>
  < p>Signature< /p>
  < img src="sample-2.html.sig.png"/>
< /body>
< /html>

My Keys

You can find my PGP keys at: my-pgp-keys or (Archive)

(Image by Rob Oo from NL on Wikimedia Commons)

-----END TEXT-----

Signature

To verify this article’s text:

Save the page as file.html
Run: just-words.pl <file.html >file.txt
Copy the PGP SIGNATURE text below to a file. For example: file.txt.sig
Import my turtle.engr.pub key to your gpg (one-time action)
Run: gpg-sign.sh -f file.txt

Passwordless ssh keys

This is a common junior-engineer mistake.

You need to execute a program or access files on another computer with an ssh command in a background script, but an ssh key is required. Good security requires the key to have a password. Redirection tricks can be used to input the password to the ssh program. But now you have exposed the password in clear text. That is just as bad as having no password. Screw it; just remove the password. Done.

Mistake! Now, anyone who gets a copy of your private key will have access to ALL the places where you use that key!

Most of the time, passwordless keys are created because engineers are lazy. To be more generous, passwords slow down their development “flow.” However, there is a solution that has minimal impact (once set up), and it is more secure than passwordless keys.

Use ssh-agent
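
A minimal sketch of that setup, assuming OpenSSH's ssh-agent and ssh-add are installed. The function name start_agent_with_key and the default key path ~/.ssh/id_ed25519 are placeholders chosen for this example.

```shell
# Start an agent if this shell is not attached to one, then load the key.
# $1 = path to the private key (optional)
start_agent_with_key() {
  key="${1:-$HOME/.ssh/id_ed25519}"
  if [ -z "${SSH_AUTH_SOCK:-}" ]; then
    # ssh-agent -s prints sh commands that export SSH_AUTH_SOCK and SSH_AGENT_PID.
    eval "$(ssh-agent -s)" >/dev/null
  fi
  # Prompts for the key's password once; the agent then keeps the
  # unlocked key in memory for later ssh commands.
  ssh-add "$key"
}
```

After running start_agent_with_key once in a login session, ssh commands started from that session can use the key without asking again. A background job (cron, for example) does not inherit the agent automatically; it needs SSH_AUTH_SOCK set to the running agent's socket.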

Release Engineering 101

How to make sure you can reproduce your codebase, and tool-chain, and not get burned by missing dependencies.

These are some guidelines that I’ve learned over the last 40 years in the software business. For the last 18 years of my career, I was primarily doing Release Engineering–taking code from developers and getting it into production.

1. Always document the actual source URL and any other information needed to access the source code or binaries. Include credentials or the location for the credentials (for example, see KeePass.)

Why? You’ll want to work on other projects and not be the information provider forever.

General Release Directory Structures

Preface

This document is to be used as a guideline for organizing and naming document folders and directory trees, which are shared by many people. This is a structure that I have evolved over 15 years while working at 6 different companies. I figure it would be on major version “8.0”, if it were a product. In the last two companies, I haven’t had to make any major structural changes. Of course, only about 10% of the top levels get used, but they are there if the categories start to get “too deep.”

A number of goals are embedded in this structure:

Avoid the 90-degree collision of mixed structures. For example, one structure could start with “projects,” with sub-folders for “functional” areas: marketing, engineering, etc. Another structure starts with functional areas, then repeats the projects under each area. Either structure could work, but using both will lead to confusion. The “project” structure is what most people start with, but it has a major flaw. Often documents in the project need to be shared with external partners, but functional groups will want to keep some documents private to their group. So the project files start to be copied to alternate structures, because opening up the top project directory to external partners reveals too much information. In other words, I’ve noticed that the sensitivity of information tends to be grouped more strongly by functional boundaries than by projects. Marketing will partner with an external group, but engineering wants to keep its code private. Engineering may outsource the work, and marketing will want the business plans kept secret. This is a problem because the whole point of a repository is to share information. So the top directories for this template are organized by functional area, not by project.

Avoid using different names for the same thing. For example, if you have configuration files, these can be collected in a directory named: etc, or config, but don’t use both names. Pick one and be consistent within the functional area or “category”. This document lists some recommended names for different categories. Before making up a name, look around the other project areas and notice the naming patterns used by other projects. Don’t assume that the oldest or even the newest names are the preferred names–other engineers could have just not cared enough to make things consistent with other code. If you abbreviate a name, then always use the abbreviation.

Clean-code refactoring tip: if you see an inconsistent naming pattern (either not self-consistent or not company style consistent), then fix it! Your IDE should make it easy to change names across multiple files. If not, learn to use find, grep, sed, perl, etc. A professional engineer should be able to make global changes across thousands of files with confidence in their tools.
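
As one possible shape for such a global change, here is a sketch built from grep, xargs, and perl. The function name rename_identifier and the names in the usage note are invented for this example; it assumes a grep that supports -r and --null (GNU and BSD grep both do).

```shell
# Replace whole-word occurrences of one name with another in every file
# under a directory.  $1 = old name, $2 = new name, $3 = directory
rename_identifier() {
  # grep lists only the files that contain the old name (NUL-separated, so
  # odd file names survive); perl then edits those files in place.
  # The names are passed through the environment to avoid shell quoting issues.
  grep -rl --null -- "$1" "$3" \
    | OLD="$1" NEW="$2" xargs -0 perl -pi -e 's/\b\Q$ENV{OLD}\E\b/$ENV{NEW}/g'
}
```

Usage: "rename_identifier cfg_dir config_dir src/" changes cfg_dir but leaves my_cfg_dir alone, because of the word-boundary match. Run it in a clean working copy so the version control diff shows exactly what changed.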

The top-level directories are “fixed”, and sibling directories, at the same level, categorize the same content. This makes it easy to write scripts for archiving and publishing. It also makes it easier to create and include paths in software code, because they are more “regular”. It is also easy to add new libraries, with little or no changes to the build scripts.

The structure of similar content is grouped together.

The “branch level” for software can be placed at almost any fixed level, but keep that level consistent. Do not branch at other levels. (Note: this part is not relevant for “git” repositories.) Notice the branches are on the same level as trunk. Creating a “branches/” directory level is unnecessary, because there should not be that many “active” branches. Inactive branches are deleted to keep the upper levels clean. Of course, the deleted branches are still available in the repository. With Subversion and other modern version management tools, there also isn’t much need for a “tags/” directory. If you want to keep track of the source that goes with particular versions, an easier way is to create a “Version” file that holds the version tag name, the repository path (branch), and the id for that version. Using this style of branching and tagging will keep the workspaces “clean” and focused on “active” code, while still having the flexibility to quickly get any desired branch and version. The best branch points are just under each functional area or just under each category area. When branching, it is not necessary to include all of the content. Only the content that will be changed or saved needs to be “copied” to the branch. The branch level is mainly determined by what parts a company normally includes in each release and by the dependencies of the parts. Of course, the branch level is not needed for the non-versioned areas, like a wiki tool or a shared directory.
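
The “Version” file mentioned above could be as simple as three key=value lines. The sketch below is one guess at a layout: the function name write_version_file and the field names tag, path, and id are hypothetical, not a format taken from any tool.

```shell
# Record which source goes with a release.
# $1 = version tag name, $2 = repository path (branch), $3 = revision id,
# $4 = output file (optional, defaults to ./Version)
write_version_file() {
  printf 'tag=%s\npath=%s\nid=%s\n' "$1" "$2" "$3" > "${4:-Version}"
}
```

Usage: "write_version_file rel-8.0 engineering/rel-8.0 r1234" (all three values are made-up examples). Checking this file in with each release gives a script enough information to fetch the matching branch and revision later.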

Directory Levels

This is the most important part. Each level has a “meaning.”

Simple Made Easy

Summary

Rich Hickey emphasizes simplicity’s virtues over easiness’, showing that while many choose easiness they may end up with complexity, and the better way is to choose simple first, then easy, if it is also simple.

This shows the classic problem with valuing writing code fast, with no regard to the 80% of the time that will be needed to maintain the code. Ignoring this leads to legacy code, periodic full rewrites of a codebase, and continually rewriting and debugging the same algorithms.

http://www.infoq.com/presentations/Simple-Made-Easy This is a good version of his talk.

Included here is a YouTube version of his talk. This one isn’t as good, because the slides are small. I also think the other version (above) was a better presentation.

Clean Code

I’ve been learning Google Apps Script, which is basically JavaScript. I’ve also been learning Test Driven Development. Sure, let’s learn three new things at the same time!

I was motivated to try TDD by this video:

In this video, he demonstrates a TDD Kata for creating a function that will find prime factors of numbers. I want to write code like that!

(All of the example code mentioned in this article can be found at: example/clean-code.)

My Google Drive was getting cluttered with files that were named by others, who had a (bad) habit of using spaces and lots of other special characters in the file names. (Usually, this is because people try to encode too much information into a file name. But that is another topic.) I would download, version, and process some files, then upload them. Spaces and special characters in the file names messed up my ability to write “simple” bash scripts. So how about a renaming tool to normalize the file names? That is, convert all the non-alphanumeric characters to ‘_’ (but also allow ‘.’ and ‘-’).
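
The renaming rule itself is small enough to sketch in shell. This is only an illustration of the rule, not the Apps Script tool described in this article; normalize_name is a name invented for the example.

```shell
# Print the given file name with every character other than
# letters, digits, '.', '_' and '-' replaced by '_'.
normalize_name() {
  # LC_ALL=C makes tr work byte by byte, so a multi-byte character
  # becomes one '_' per byte.
  printf '%s' "$1" | LC_ALL=C tr -c 'A-Za-z0-9._-' '_'
}
```

Usage: mv -- "$f" "$(normalize_name "$f")" renames one file. Two different original names can normalize to the same result, so a real tool would need to check for collisions before renaming.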

For the tests, I needed to create and recreate test folders and files in Google Drive. It was really tedious to do manually. So let’s automate it. I wrote a routine that would parse a nested array structure that represented the folders and files.

This was the object that I created. (This is the first version. I now have a version where you can optionally pass the array as a parameter.) See: test-refactor-before.js

The Great Software Stagnation

Regarding:
The Great Software Stagnation

My comment:

Open Source and “free” software can be really confusing. For example see Richard Stallman’s points at: Open source misses the point

I think the analogy to music and movies is getting closer to the real issue: copyrights. When and what can be copied? When $ gets involved, the time gets extended to a very long time. In the U.S.A., it is 70 years after the author’s death. What about copyrights owned by corporations? When do they “die”?

Software has another subtlety: the source code vs the compiled executable code. Here an automobile analogy is often used. Only selling or sharing the executable, but not the source code, is equivalent to selling a car with the hood locked so that a mechanic cannot repair it. If a mechanic were to completely copy the engine under the hood, then sure, they would be in violation of “copyright” or “patent” laws. But replacing broken components with the same or better components–that is OK. Repairing a defect in software should also be possible, without the permission of the manufacturer. Unfortunately, code is very easy to copy vs copying a physical device.

I think the solution to the “source code” issue is something related to the intent of copyright and patent laws: they exist to create a “temporary” (short-term) monopoly for the creators (not their heirs) so that they will be rewarded for releasing their works into the public domain. Otherwise, software or recipes for how things are made could be lost.

‘Not My Problem’: A Big Problem for DevOps Teams

Reposted from: ‘Not My Problem’: A Big Problem for DevOps Teams – DevOps.com

Not my problem (NMP) — (n)

1. a statement, or position, of apathy expressed by those who perceive they are external and unaffected by a negative predicament. While sometimes warranted, it is typically uttered by those who perceive themselves as powerless; can’t be bothered; are too lazy, or are selfish non-contributing leeches. See also “complete cop out.”

2. an attitude that will stymie attempts to implement DevOps in your organization and will thwart success

3. (archaic) Actually not your problem

While perhaps it’s become more frequent in our culture of relative indifference, some of our oldest stories bear witness to how timeless the phenomenon “not my problem” is. For example, in the biblical story of Cain murdering Abel, God asks Cain afterward, “Hey Cain, I can’t find Abel. Do you know where he is?” Cain responds famously, “Am I my brother’s keeper? (NMP)” It crosses country borders and language barriers, too: A transliteration of the same sentiment in Polish is, “Not my circus, not my monkeys.”


A comment left on Slashdot. – Development Chaos Theory

(Reblogged from A comment left on Slashdot. – Development Chaos Theory)

The problem is that our industry, unlike every other single industry except acting and modeling (and note neither are known for “intelligence”), worships at the altar of youth. I don’t know the number of people I’ve encountered who tell me that by being older, my experience is worthless since all the stuff I’ve learned has become obsolete.

This, despite the fact that the dominant operating systems used in most systems are based on an operating system that is nearly 50 years old, the “new” features being added to many “modern” languages are really concepts from languages that are between 50 and 60 years old or older, and most of the concepts we bandy about as cutting edge were developed from 20 to 50 years ago.

It also doesn’t help that the youth whose accomplishments we worship usually get concepts wrong. I don’t know the number of times I’ve seen someone claim code was refactored along some new-fangled “improvement” over an “outdated” design pattern who wrote objects that bear no resemblance to the pattern they claim to be following. (In the case above, the classes they used included “modules” and “models”, neither of which are part of the VIPER backronym.) And when I indicate that the “massive view controller” problem often represents a misunderstanding as to what constitutes a model and what constitutes a view, I’m told that I have no idea what I’m talking about–despite having more experience than the critic has been alive, and despite graduating from Caltech–meaning I’m probably not a complete idiot.


Last In – First Out: Ad-Hoc Verses Structured System Management

This is a repost. This is an excellent list of all the operation areas that need to be well managed. The theme is: manage with automation, templates, and monitoring, not with manual fiddling until it works. See below for the link to the full post.

Structured system management is a concept that covers the fundamentals of building, securing, deploying, monitoring, logging, alerting, and documenting networks, servers and applications. Structured system management implies that you have those fundamentals in place, you execute them consistently, and you know all cases where you are inconsistent. The converse of structured system management is what I call ad hoc system management, where every system has its own plan, undocumented and inconsistent, and you don’t know how inconsistent they are, because you’ve never looked.

You know you have structured system management when:
