#!/usr/bin/python3
"""
===========================================================================================
genhtml.py - static HTLM inserts 
Author and copyright: M. Lutz, 2015
License: Provided freely but with no warranties of any kind

VERSION 2, November 16, 2015: smarter dependency checking
  Don't regenerate an HTML file for a changed insert file, unless the HTML's template
  actually USES the insert file.  Also refactor as functions - at ~250 lines, top-level
  script code becomes too scattered to read (at ~1K lines, class structure is required).

SYNOPSIS
  Simple static HTML inserts script: given an HTML templates dir and an HTML
  inserts dir, generate final HTML files by applying text replacements, where
  replacement keys and text are derived from names and content of insert files.

  For static insert content, this script is an alternative to:
  - Mass file edits on every common item change (which can be painfully tedious).
  - Client-side includes via embedded JavaScript (which not all visitors may run).
  - Server-side includes via PHP (which makes pages unviewable without a server).
  This script does add a local admin step on every HTML content change, but there
  is no true HTML include; <object> is close, but is not supported on (some?) IE.

DEPENDENCY CHANGE DETECTION
  This script acts like a makefile; it automatically regenerates an HTML file if:
  (a) The HTML template file has been changed since its last generation, or
  (b) Any inserts-dir file that the HTML template file uses has been changed
      since the HTML file's last generation.

  In other words, the script generates expanded HTML files for all HTML templates
  that have no expansion yet, are newer than their expanded versions, or use any
  insert file that is newer than their expanded versions.  Conversely, an HTML
  file is not regenerated if neither the html template nor any used insert is
  newer than the expansion target.  The FORCEREGEN switch ignores dependencies.

COMMAND-LINE USAGE
  % gensite.py (or click)
       => copy all changed non-html source files to target (if any)
       => regen all html files whose template or inserts have changed since last gen
  % gensite.py [file.htm]+
       => same, but apply to just one or more specific files in source dir

USAGE PATTERNS
  1) Change html-template files in SOURCEDIR (only!),
     and/or insert-text files in INSERTSDIR.
  2) Run this script to regenerate all changed html files in TARGETDIR as needed.
  3) Upload newly-generated files (or all) from the TARGETDIR to your web server.

  There are two ways to structure your site's files:
  1) Keep both HTML templates and all other site files in SOURCEDIR.  In this mode,
     non-HTML changed files are copied to TARGETDIR when HTML files are regenerated.
  2) Use SOURCEDIR for HTML template files only, keep other web site files in TARGETDIR.
     This mode avoids copying other non-HTML files to TARGETDIR on HTML regenerations. 
  Either way, TARGETDIR is always a full mirror image of your web site, for uploads.

TEXT REPLACEMENTS
  - Replacement keys are all the 'XXX' in 'INSERTDIR/XXX.txt' filenames.
  - Replacement values are the contents of the 'INSERTDIR/XXX.txt' files.
  - Algorithm:
    For each changed .htm and .html HTML file in SOURCEDIR:
        For each 'XXX' in INSERTDIR/XXX.txt:
            Replace _all_ '$XXX$' in HTML file with the content of file XXX.txt
        Save result in TARGETDIR
    Other non-HTML file types (if any) are copied to TARGETDIR unchanged.

  To automate changing DATES in both HTML files and insert files, the script also
  replaces special '$_DATE*$' keys: e.g., '$_DATELONG$' => 'November 6, 2015'.

  To accommodate NESTED INSERTS, the script also replaces any '$XXX$' keys in loaded
  insert-files text first, before regenerating any HTML files.  This allows insert
  files to be built from other insert files, before being applied to HTML templates.
  For an example use case, see FOOTER-COMMON.txt and its clients in the Html-inserts
  folder; it's inserted into other footer include files with varying footer parts.
  Note and caveat: this step is only 1-level deep per file (it's not recursive!).

  Example replacements (each with possible nested inserts):
    Coded in <HEAD>
      $STYLE$  => INSERTDIR/STYLE.html   (a <style> or <link rel...>)
      $SCRIPT$ => INSERTDIR/SCRIPT.html  (analytics or other JS code block)
      $ICON$   => INSERTDIR/ICON.html    (site-specific icon link spec)
    Coded in <BODY>
      $FOOTER$ => INSERTDIR/FOOTER.html  (a standard sites links toolbar)
      $HEADER$ => INSERTDIR/HEADER.html  (a standard header block)
      $TOTOC$  => INSERTDIR/TOTOC.html   (a standard go-to-index button line)

  See also "__docs__/template-pattern.html: for a skeleton use case example
  file, and the "Html-templates" test folder for additional template examples.

GENERATION NOTES:
  - To force a regeneration of all HTML files, open and save any insert file
    used by every template (if any), or set the FORCEREGEN variable below.
  - By design, there is no dependency checking for non-file date inserts (else
    each date key client would be regenerated every day!); open and save date
    client files to force their regen with updated dates when appropriate.
  - Caveat: there is no dependency checking for nested inserts (a $XXX$ in an 
    insert file); open and save the nested insert's clients to force HTML regen.

MISCELLANEOUS NOTES
  - Skips replacement targets not present in HTML text (replace is a no-op).
  - Skips replacement targets having no INSERTDIR file (no replace is run).
  - Tries multiple Unicode encodings for HTML text: expand set as needed.
  - Assumes insert files are all in same Unicode encoding: change as needed.
  - Changed external CSS files must be uploaded, but do not require or trigger
    HTML regeneration here (unlike changed CSS <link> or inline CSS code inserts).
  - See "Programming Python, 4th Edition" for automated FTP site upload scripts.

TBDs:
  - Subdirectories are not directly supported, though they can be maintained
    as separately-generated and uploaded working folders. 
  - Automatic "<!-- -->" comment wrappers could be emitted, but they may be 
    invalid for shorter text inserts that are not one or more full lines.
  - Dependency checking for nested inserts might be handled by an initial
    inserts text scan searching for $XXX$ keys, and propagating the modtime
    of the nested insert's text to the client's.  Is this worth the effort?
  - It might be useful to parametize inserts in some fashion.  For instance,
    between '@' delimiters, allow a script name and arguments defining a
    command line whose stdout gives the insert text.  This is an order of
    magnitude more complex, though, and not warranted by any use case so far.
===========================================================================================
"""

import os, sys, shutil, time
trace = lambda *args: None     # set to print to see more output

# user settings 
INSERTDIR = 'Html-inserts'     # insert text, filename gives key: XXX.txt.
SOURCEDIR = 'Html-templates'   # load html templates (and others?) from here
TARGETDIR = 'Complete'         # save expanded html files (and others?) to here

CLEANTARGET = False            # True = empty all files in TARGETDIR first
FORCEREGEN  = False            # True = regenerate all HTML files, ignoring dependencies

# customize Unicode endings if needed
TemplateEncodings = ('ascii', 'utf8', 'latin1', 'utf16')   # try each, in turn
InsertsEncoding   = 'utf8'                                 # use for all inserts


def loadinserts():
    """
    load insert files/keys and modtimes
    """
    inserts, insmodtimes = {}, []
    for insertfilename in os.listdir(INSERTDIR):
        insertkey = '$' + insertfilename[:-4] + '$'          # key='$XXX$' from 'XXX.txt'
        try:
            path = os.path.join(INSERTDIR, insertfilename)   # load insert text for key
            file = open(path, encoding=InsertsEncoding)      # platform default or custom
            text = file.read()
            file.close()                                     # close for non-CPython                       
            inserts[insertkey] = text
            insertmodtime = os.path.getmtime(path)           # modtime for changes test   
            insmodtimes.append((insertkey, insertmodtime))   # add file-based inserts only
        except:
            inserts[insertkey] = ''  # empty if file error

    # add special non-file replacement keys (evolve me)
    inserts['$_DATELONG$']  = time.strftime('%B %d, %Y')     # November 06, 2015
    inserts['$_DATESHORT$'] = time.strftime('%b-%d-%Y')      # Nov-06-2015
    inserts['$_DATENUM$']   = time.strftime('%m/%d/%Y')      # 11/06/2015
    inserts['$_DATETIME$']  = time.asctime()                 # Fri Nov  6 10:44:58 2015
    inserts['$_DATEYEAR$']  = time.strftime('%Y')            # 2015
    return inserts, insmodtimes


def expandinserts(inserts):
    """
    globally replace any keys in loaded insert-file text
    """
    for key1 in inserts:                                     # for all insert texts
        text = inserts[key1] 
        for key2 in inserts:                                 # for all insert keys
            text = text.replace(key2, inserts[key2])         # no-op if no match
        inserts[key1] = text                                 # inserts changed in-place
        

def sourcenewer(pathfrom, pathto, allowance=2):
    """
    was pathfrom changed since pathto was generated?
    2 seconds granularity needed for FAT32: see mergeall;
    this and its follower assume both file paths exist;
    """
    fromtime = os.path.getmtime(pathfrom)
    totime   = os.path.getmtime(pathto)
    return fromtime > (totime + allowance)


def insertsnewer(textfrom, pathto, insmodtimes, allowance=2):
    """
    was any used insert file changed since pathto was generated?
    2 seconds granularity needed for FAT32: see mergeall;
    this could use any() and generators...but should it?
    """
    # for all insert files, check if newer and used
    totime = os.path.getmtime(pathto)
    for (inskey, instime) in insmodtimes:
        if instime > (totime + allowance) and inskey in textfrom:
            return True
    return False


def loadtemplate(pathfrom):
    """
    try to load an HTML template file, using various Unicode types,
    from simpler to more complex; return tuple of text + encoding;
    """
    for encoding in TemplateEncodings:
        try:
            file = open(pathfrom, mode='r', encoding=encoding)
            text = file.read()
            return (text, encoding)               # success: return now
        except:                                 
            trace(encoding, 'invalid, ', sys.exc_info()[0])                   
    return (None, None)                           # no encoding worked


def generatehtmls(filestoprocess, inserts, insmodtimes):
    """
    generate expanded HTML files for all HTML templates that have no
    expansion yet, are newer than their expanded versions, or use any
    insert file that is newer than their expanded versions;
    """
    global numcnv, numcpy, numskip, numfail
    
    for filename in filestoprocess:
        print('=>', filename, end=': ')
        pathfrom = os.path.join(SOURCEDIR, filename)
        pathto   = os.path.join(TARGETDIR, filename)
        
        if not os.path.isfile(pathfrom):
            # skip any subdirs, etc.
            print('non-file, skipped')
            numskip += 1

        elif not filename.endswith(('.htm', '.html')):
            #
            # non-html file: don't attempt regen, copy to target if changed;
            # used when entire site in templates dir, not just templates
            #
            if os.path.exists(pathto) and not sourcenewer(pathfrom, pathto):
                # source file unchanged, don't copy over
                print('unchanged, skipped')
                numskip += 1
                
            else:
                # copy in binary mode unchanged
                rawbytes = open(pathfrom, mode='rb').read()
                file = open(pathto, mode='wb')
                file.write(rawbytes)
                file.close()                        # close for non-CPython
                shutil.copystat(pathfrom, pathto)   # copy modtime over too
                print('COPIED unchanged')
                numcpy += 1

        else:
            #
            # html file: regen to target if html or used inserts changed;
            # whether templates dir is entire site, or templates only
            #
            (text, encoding) = loadtemplate(pathfrom)
            if text == None:
                print('**FAILED**')
                numfail += 1
               
            elif (os.path.exists(pathto)                           # target generated
                  and not (
                      sourcenewer(pathfrom, pathto) or             # source not changed
                      insertsnewer(text, pathto, insmodtimes) or   # no used insert changed
                      FORCEREGEN                                   # not forcing full regen
                 )):
                # neither html template nor any used insert newer than target
                print('unchanged, skipped')
                numskip += 1

            else:
                # globally replace keys in text and copy over;
                # no copystat(): insertsnewer() needs new modtime
                for key in inserts:                                # for all filename keys
                    text = text.replace(key, inserts[key])         # no-op if no match
                file = open(pathto, mode='w', encoding=encoding)   # encoding=source's
                file.write(text)
                file.close()                                       # close for non-CPython
                print('GENERATED, using', encoding)
                numcnv += 1


if __name__ == '__main__':
    numcnv = numcpy = numskip = numfail = 0    # globals all

    # empty target dir?
    if CLEANTARGET:
        for filename in os.listdir(TARGETDIR):
            os.remove(os.path.join(TARGETDIR, filename))
        print('--Target dir cleaned')

    # load/add inserts text
    inserts, insmodtimes = loadinserts()
    print('Will replace all:',
          *(key for key in sorted(inserts)), sep='\n\t', end='\n\n')

    # expand inserts replacements first
    expandinserts(inserts)

    # check run mode
    if len(sys.argv) == 1:
        filestoprocess = os.listdir(SOURCEDIR)    # all files in source dir
    else:
        filestoprocess = sys.argv[1:]             # or just filename(s) in args

    # expand and copy templates, copy others
    generatehtmls(filestoprocess, inserts, insmodtimes)

    # wrap up
    summary = '\nDone: %d converted, %d copied, %d skipped, %d failed.'
    print(summary % (numcnv, numcpy, numskip, numfail))
    if sys.platform.startswith('win'):
        input('Press enter to close.')   # retain shell if clicked
