Starfish is a system for Perl-based text-embedded programming and preprocessing. This is the Web documentation page. The main code of the system is in Starfish.pm. You can find more files in the public project directory. The module is available on CPAN as well.

# NAME

Text::Starfish.pm and starfish - A Perl-based System for Text-Embedded Programming and Preprocessing

# SYNOPSIS

starfish [ -o=outputfile ] [ -e=initialcode ] [ -replace ] [ -mode=mode ] file...

where files usually contain some Perl code, delimited by <? and !>. To produce output to be inserted into the file, use variable $O or function echo.

# DESCRIPTION

(The documentation is probably not up to date.)

Starfish is a system for Perl-based text-embedded programming and preprocessing, which relies on a unifying regular expression rewriting methodology. If you know Perl and PHP, you probably know the basic idea: embed Perl code inside the text, execute it in some way, and interleave the output with the text. Very similar projects exist and some of them are listed in "SEE ALSO". Starfish is, however, unique in several ways. One important difference between starfish and similar programs (e.g., PHP) is that the output does not necessarily replace the code, but follows the code by default. Starfish attempts to provide a universal text-embedded programming language, which can be used with different types of textual files.

There are two files in this package: a module (Starfish.pm) and a small script (starfish) that provides a command-line interface to the module. The options for the script are described in subsection "starfish_cmd list of file names and options".

The earlier name of this module was SLePerl (Something Like ePerl), but it was changed to starfish, which sounds better and is easier to type. One option was "oyster," but some people are thinking about using it for Perl beans, and there is a (yet another) Perl module for embedded Perl, Text::Oyster, so it was not used.

The idea behind the "starfish" name is: the Perl code is embedded into a text, so the text is equivalent to a shellfish containing pearls. A starfish comes by and eats the shellfish... Unlike a natural starfish, this starfish is interested in pearls and does not normally touch most of the surrounding meat.

# EXAMPLES

## A simple example

After running starfish on a file containing:

    <? $O= "Hello world!" !>

we get the following output:

    <? $O= "Hello world!" !>
    #+
    Hello world!
    #-

The output will not change after running the script several times. The same effect is achieved with:

    <? echo "Hello world!" !>

The function echo simply appends its parameters to the special variable $O.

Some parameters can be changed, and they vary according to style, which depends on file extension. Since the code is not stable, they are not documented, but here is a list of some of them (possibly incorrect):

- code prefix and suffix (e.g., <? !> )
- output prefix and suffix (e.g., \n#+\n \n#-\n )
- code preparation (e.g., s/\\n(?:#+|%+\/\/+)/\\n/g )

## HTML Examples

### Example 1

If we have an HTML file, e.g., 7.html with the following content:

    <HEAD>
    <BODY>
    <!--<? $O="This code should be replaced by this." !>-->
    </BODY>

then after running the command

    starfish -replace -o=7out.html 7.html

the file 7out.html will contain:

    <HEAD>
    <BODY>
    This code should be replaced by this.
    </BODY>

The same effect would be obtained with the following line:

    <!--<? echo "This code should be replaced by this." !>-->

### Output file permissions

The permissions of the output file will not be changed. But if it does not exist, then:

    starfish -replace -o=7out.html -mode=0644 7.html

makes sure it has all-readable permission.

### Example 2

Input file 21.html:

    <!--<? use CGI qw/:standard/;
           echo comment('AUTOMATICALLY GENERATED - DO NOT EDIT');
    !>-->
    <HTML><HEAD>
    <TITLE>Some title</TITLE>
    </HEAD>
    <BODY>
    <!--<? echo "Put this." !>-->
    </BODY>
    </HTML>

Output:

    <!-- AUTOMATICALLY GENERATED - DO NOT EDIT -->
    <HTML><HEAD>
    <TITLE>Some title</TITLE>
    </HEAD>
    <BODY>
    Put this.
    </BODY>
    </HTML>

## Example from a Makefile

    LIST=first second third\
      fourth fifth

    <? echo join "\n", getmakefilelist $Star->{INFILE}, 'LIST' !>
    #+
    first
    second
    third
    fourth
    fifth
    #-

Besides $O, $Star is another predefined variable: it refers to the Starfish object currently processing the text.

## TeX and LaTeX Examples

### Simple TeX or LaTeX Example

Generating text with a variable replacement:

    <?echo "
    When we split the probability reserved for unseen characters equally
    among the remaining $UnseenNum characters, we obtain the final estimated probabilities: "!>

### Example from a TeX file

    % <? $Star->Style('TeX') !>

    % For version 1 of a document
    % <? #$Star->addHook("\n%Begin1","\n%End1",'s/\n%+/\n/g');
    %    #$Star->addHook("\n%Begin2","\n%End2",'s/\n%*/\n%/g');
    %    #For version 2
    %    $Star->addHook("\n%Begin1","\n%End1",'s/\n%*/\n%/g');
    %    $Star->addHook("\n%Begin2","\n%End2",'s/\n%+/\n/g');
    % !>

    %Begin1
    %Document 1
    %End1

    %Begin2
    Document 2
    %End2
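The two substitution strings passed to addHook in this example can be tried in isolation. The following is a minimal sketch in plain Perl, not Starfish code: the variable names and the sample text are illustrative assumptions, and only the two regular expressions are taken from the example.

```perl
use strict;
use warnings;

# Illustrative text, as it might appear between %Begin2 and %End2.
my $block = "\nDocument 2\nSecond line";

# 's/\n%*/\n%/g' leaves exactly one % at the start of every line (comment out).
(my $commented = $block) =~ s/\n%*/\n%/g;

# 's/\n%+/\n/g' strips the leading % characters again (uncomment).
(my $uncommented = $commented) =~ s/\n%+/\n/g;

print "Commented:$commented\n";
print "Uncommented:$uncommented\n";
```

Both substitutions are idempotent, which fits the earlier remark that the output does not change when the script is run several times.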

## Example with Test/Release versions (Java)

Suppose you have a standalone Java file p.java, and you want to have two versions:

    p_t.java -- for complete code with all kinds of testing code, and
    p.java -- clean release version.

Solution:

Copy p.java to p_t.java and modify p_t.java to be like:

    /** Some Java file.  */

    //<? $O = defined($Release) ?
    // "public class p {\n" :
    // "public class p_t {\n";
    //!>//+
    public class p_t {
    //-

      public static void main(String[] args) {

    //<? $O = " ".(defined $Release ?
    //qq[System.out.println("Release version");] :
    //qq[System.out.println("Test version");]);
    //!>//+
     System.out.println("Test version");//-

      }
    }

In Makefile, add lines for updating p_t.java, and generating p.java (readonly, so that you do not modify it accidentally):

    p.java: p_t.java
            starfish -o=$@ -e='$$Release=1' -mode=0400 $<
    tmp.ind: p_t.java
            starfish $<
            touch tmp.ind

## Command-line Examples

The following are the reference examples. For further information, please look up the explanations of the command-line options and arguments.

    starfish -replace -o=paper.tex -mode=0400 paper.tex.sfish

In the above line, Starfish is used on top of a TeX/LaTeX file. The Starfish source (paper.tex.sfish) is kept separate from the generated .tex file to keep the source clean. However, a user in this situation may by mistake start editing the paper.tex file, so we set the output file mode to 0400 to prevent this accidental editing.

## Macros

Note: This is a quite old part of Starfish and needs a revision. Macros are a form of code folding (related terms: holophrasting, elision(?)), expressed in the Starfish framework.

Starfish includes a set of macro features (primitive, but in progress). There are two modes, hidden macros and not hidden, which are indicated using the variable $Star->{HideMacros}, e.g.:

    starfish -e='$Star->{HideMacros}=1' *.sfish
    starfish *.sfish

Macros are activated with:

    <? $Star->defineMacros() !>

In Java mode, a macro can be defined in this way:

    //m!define macro name
    ...
    //m!end

After //m!end, a newline is mandatory. After running Starfish, the definition will disappear from this place and will be appended as an auxdefine at the end of the file.

A macro can be defined and expanded in the same place in the following way:

    //m!defe macro name
    ...
    //m!end

A macro is expanded by:

    //m!expand macro name

When a macro is expanded, it looks like this:

    //m!expanded macro name
    ...
    //m!end

A macro is expanded even in hidden mode by:

    //m!fexpand macro name

and then it is expanded into:

    //m!fexpanded macro name
    ...
    //m!end

Hidden macros are put at the end of the file in this way:

    //auxdefine macro name
    ...
    //endauxdefine

An old macro definition can be overridden by:

    //m!newdefe macro name
    ...
    //m!end

# PREDEFINED VARIABLES AND FIELDS

## $O

After executing a snippet, the contents of this variable represent the snippet output.

## $Star

More precisely, it is $::Star. $Star is the Starfish object executing the current code snippet (this). There can be more than one such object active at a time, due to executing Starfish from a starfish snippet. The name is introduced into the main namespace, which might be a questionable decision.

## $Star->{INFILE}

Name of the current input file.

## $Star->{Loops}

Controls the number of iterations. The default value is 1, but we may want to repeat starfishing the text several times, or even until a fix-point is reached. For example, by setting the number of Loops to be at least 2, as in:

    $Star->{Loops} = 2 if $Star->{Loops}<2;

we require Starfish to process the input in at least two iterations.
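The idea of repeating a pass a minimum number of times, or until a fix-point is reached, can be sketched outside Starfish. In the plain-Perl sketch below, run_passes and the sample pass are hypothetical helpers introduced only for illustration; they are not part of Text::Starfish and do not show how the module implements Loops.

```perl
use strict;
use warnings;

# Hypothetical helper: apply $pass to $text at least $min times, then keep
# going until the text stops changing or $max passes have been made.
sub run_passes {
    my ($text, $pass, $min, $max) = @_;
    my $count = 0;
    while ($count < $max) {
        my $next = $pass->($text);
        $count++;
        my $stable = ($next eq $text);   # did this pass change anything?
        $text = $next;
        last if $stable && $count >= $min;
    }
    return ($text, $count);
}

# Illustrative pass: every "aa" is rewritten to "a", so one pass halves a run.
my $pass = sub { my $t = shift; $t =~ s/aa/a/g; return $t };

my ($result, $passes) = run_passes("aaaaaaaa", $pass, 2, 10);
print "$result after $passes passes\n";
```

The $max bound is a design choice of this sketch: it guarantees termination when a pass never settles.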

## $Star->{Out}

Output content of the current processing unit. For example, to use #-style line comments in the replace Starfish mode, one can make a final substitution in an HTML file:

    <!--<? $Star->{Out} =~ s/^#.*\n//mg; !>-->

It is important to keep in mind that the contents of this variable are the output processed so far, so any final output processing should be done in a snippet where no new output is produced.

## $Star->{OUTFILE}

If option -o=* is used, then this variable contains the name of the specified output file.

# METHODS

## Text::Starfish->new(options)

The method for creation of a new Starfish object. If we are already processing within a Starfish object, we may use a shorter variant $Star->new().

The options, given as arguments, are a list of strings, which may include the following:

-infile=* Specifies the name of the input file (field INFILE). The file will not be read.

-copyhooks Copies hooks from the Star object ($::Star). This option is also available in loadinclude, getinclude, and include, from which it is passed to new. It causes the new object to have similar properties as the current Star object. It could be generalized to include any specified object, or to use the prototype object that is given to the constructor, but there does not seem to be a need for this generalization. More precisely, -copyhooks copies the fields Style, CodePreparation, LineComment, and IgnoreOuter, and copies the array hook component by component.

## $o->add_tag($tag, $action)

Normally used by sfish_add_tag, which translates the call to $Star->add_tag($tag, $action). Examples:

    $Star->add_tag('slide', 'ignore');
    sub { $_="<a name=\"$_[2]\"><h3>$_[3]</h3></a>" }) !>-->

    line before
    .section:overview Document Overview
    line after

will produce the following output, in the replace mode:

    line before
    <a name="overview"><h3>Document Overview</h3></a>
    line after

## $o->addHook

This method is deprecated. It will be gradually replaced with add_hook, which is better defined since it includes hook type.

Adds a new hook. The method can take two or three parameters:

    ($prefix, $suffix, $evaluator)

or

    ($regex, $replacement)

In the case of three parameters ($prefix, $suffix, $evaluator), the parameter $prefix is the starting delimiter, $suffix is the ending delimiter, and $evaluator is the evaluator. The parameters $prefix and $suffix can either be strings, which are matched exactly, or regular expressions. An empty ending delimiter will match the end of input. The evaluator can be provided in the following ways:

- The special string 'default', in which case the default Starfish evaluator is used.
- The special strings 'ignore' and 'echo': 'ignore' ignores the hook and produces no echo; 'echo' simply echoes the contents between the delimiters.
- Other strings are interpreted as code, which is embedded in an evaluator by providing a local $_, $self which is the current Starfish object, $p (the prefix), and $s (the suffix). After executing the code, $p.$_.$s is returned, unless in the replace mode, in which $_ is returned.
- A code reference (sub {...}) is interpreted as code which is embedded in an evaluator. The local $_ provides the captured string. Three arguments are also provided to the code: $p (the prefix), $_, and $s (the suffix). The result is the value of $_.

For the format with two parameters, ($regex, $replacement), addHook currently understands the replacement 'comment' and a code reference (e.g., sub { ... }). The replacement 'comment' will repeat the token in the non-replace mode, and remove it in the replace mode; i.e., it is equivalent to no echo. The regular expression is matched in the multi-line mode, so ^ and $ can be used to match the beginning and ending of a line. (Caveat: Due to the way the scanner works, the beginning of a line starts after the end of the previously matched token.) Example:

    $Star->addHook(qr/^#.*\n/, 'comment');

## $o->ignore_outer()

Sets the mode for ignoring the outer text in the replace mode. The function sfish_ignore_outer does the same on the default object Star. If an argument is given, it is used to set the mode, so as a consequence the mode can be turned off by giving the argument ''.

## $o->last_update()

Or just last_update(), returns the date of the last update of the output.

## $o->process_files(@args)

Similar to the function starfish_cmd, but it expects already built Starfish object with properly set options. Actually, starfish_cmd calls this method after creating the object and returns the object.

## $o->rmHook($p,$s)

Removes a hook specified by the starting delimiter $p, and the ending delimiter $s.

# PREDEFINED FUNCTIONS

## include( filename and options ) -- starfish a file and echo

Reads and starfishes the file specified by file name, and echoes the contents. Similar to PHP include. Uses the getinclude function.

## getinclude( filename and options ) -- starfish a file and return

Reads and starfishes the file specified by file name, and returns the contents (see also include to echo the content implicitly). By default, the program will not break if the file does not exist. The option -noreplace will starfish the file in a non-replace mode. The default mode is replace, and that is usually the mode that is needed in includes (non-replace may lead to surprising behaviour). The option -require will cause the program to croak if the file does not exist. It is similar to the PHP function require. A special function named require is not used since require is a Perl reserved word. Another interesting option is -copyhooks, for using hooks and some other relevant properties from the Star object ($::Star). This option is eventually passed to new, so you can see the constructor new for more details. The code for getinclude is the following:

    sub getinclude($@) {
        my $sf = loadinclude(@_);
        $sf->digest();

## starfish_cmd list of file names and options

The function starfish_cmd is called by the script starfish with the @ARGV list as the list of arguments. The function can also be used from Perl code to "starfish" a file, e.g.,

    starfish_cmd('somefile.txt', '-o=outfile', '-replace');

The arguments of the function are provided in the same fashion as arguments on the command line. As a reminder, the command usage of the script starfish is:

starfish [ -o=outputfile ] [ -e=initialcode ] [ -replace ] [ -mode=mode ] file...

The options are described below:

-o=outputfile

specifies an output file. By default, the input file is used as the output file. If the specified output file is '-', then the output is produced to the standard output.

-e=initialcode

specifies the initial Perl code to be executed.

-replace

will cause the embedded code to be replaced with the output. WARNING: Normally used only with -o.

-mode=mode

specifies the mode for the output file. By default, the mode of the source file is used (the first one if more outputs are accumulated using -o). If an output file is specified, and the mode is specified, then starfish will set temporarily the u+w mode of the output file in order to write to that file, if needed.

Those were the options.

## appendfile filename, list

appends list elements to the file.

## echo string

appends string to the special variable $O.

## DATE AND TIME FUNCTIONS

### current_year

returns the current year in string format.

### file_modification_time

Returns modification time of this file (in format of Perl time).

### file_modification_date

Returns modification date of this file (in format: Month DD, YYYY).

## FILE FUNCTIONS

getfile file

grabs the content of the file into a string or a list.

getmakefilelist makefile, var

returns the list of words assigned to the variable var; e.g.,

    FILE_LIST=file1 file2 file3\
       file4

    <? echo join "\n", getmakefilelist $Star->{INFILE}, 'FILE_LIST' !>

Embedded variables are not handled.
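To make the behaviour concrete, here is a minimal plain-Perl sketch of the same idea: join backslash-continued lines, find the assignment, and split the value into words. The function makefile_list_from_text is a hypothetical stand-in that works on a string; it is not the Text::Starfish implementation of getmakefilelist, which takes a file name.

```perl
use strict;
use warnings;

# Hypothetical stand-in: takes Makefile text (not a file name) and a
# variable name, and returns the words assigned to that variable.
sub makefile_list_from_text {
    my ($text, $var) = @_;
    $text =~ s/\\\n/ /g;                 # join backslash-continued lines
    return () unless $text =~ /^\Q$var\E[ \t]*=[ \t]*(.*)$/m;
    return split ' ', $1;                # split the value on whitespace
}

my $makefile = "FILE_LIST=file1 file2 file3\\\n   file4\nOTHER=x\n";
my @files = makefile_list_from_text($makefile, 'FILE_LIST');
print join("\n", @files), "\n";
```

As in the description above, references such as $(OTHER) inside the value are not expanded by this sketch; they would be returned as literal words.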

putfile filename, list

opens file, writes the list elements to the file, and closes it. putfile filename "touches" the file.

The function takes one string argument. If it starts with 'file=' then the rest of the string is treated as a file name, whose contents replace the string in further processing. The string is translated into a list of records (hashes) and a reference to the list is returned. The records are separated by an empty line, and in each line an attribute and its value are separated by the first colon (:). A line can be continued using a backslash (\) at the end of the line, or by starting the next line with a space or tab. Ending a line with \ effectively removes the "\\\n" string at the end of the line, but the "\n[ \t]" combination is replaced with "\n". Comments, starting with the hash sign (#), are allowed between records. An example is:

    id:1
    name: J. Public
    phone: 000-111

    id:2
    etc.

If an attribute is repeated, it will be renamed to an attribute of the form att-1, att-2, etc.
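The record format can be illustrated with a small plain-Perl parser. This is a sketch under stated assumptions, not the Text::Starfish code: parse_records is a hypothetical name, the 'file=' form is not handled, leading whitespace in values is trimmed, and the second occurrence of an attribute att is stored as att-1 (one possible reading of the renaming rule).

```perl
use strict;
use warnings;

# Hypothetical parser for the record format described above.
sub parse_records {
    my ($text) = @_;
    $text =~ s/\\\n//g;                        # backslash-newline joins lines
    my (@records, %rec, $last);
    my $flush = sub {                          # close the current record
        push @records, +{ %rec } if %rec;
        %rec = ();
        undef $last;
    };
    for my $line (split /\n/, $text) {
        if ($line =~ /^\s*$/) {                # empty line ends a record
            $flush->();
        } elsif (!%rec && $line =~ /^#/) {     # comment between records
            next;
        } elsif (defined $last && $line =~ /^[ \t](.*)$/) {
            $rec{$last} .= "\n$1";             # indented line continues a value
        } elsif ($line =~ /^([^:]+):\s*(.*)$/) {
            my ($key, $value) = ($1, $2);
            if (exists $rec{$key}) {           # repeated attribute: att-1, att-2
                my $n = 1;
                $n++ while exists $rec{"$key-$n"};
                $key = "$key-$n";
            }
            $rec{$key} = $value;
            $last = $key;
        }
    }
    $flush->();
    return \@records;
}

my $input = <<'END';
# address book
id:1
name: J. Public
phone: 000-111
phone: 000-222

id:2
note: first line
 second line
END

my $records = parse_records($input);
print scalar(@$records), " records\n";
```

Lines without a colon are silently skipped in this sketch.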

Reads recursively (up the dir tree) configuration files starfish.conf.

# STYLES

There is a set of predefined styles for different input files: HTML (html), HTML templating style (.html.sfish), TeX (tex), Java (java), Makefile (makefile), PostScript (ps), Python (python), and Perl (perl).

## HTML Templating Style (.html.sfish)

This style is similar to the HTML style, but it is supposed to be run in the replace mode towards a target .html file, so it allows for more hooks. The character # (hash) at the beginning of a line denotes a comment.

## Makefile Style (makefile)

The main code hooks are <? and !>.

Interestingly, the makefile style has similar special requirements as Python. For example, in the following expansion:

    starfish: tmp
            starfish Makefile
            #<? if (-e "file.tex.sfish")
            #{ echo "\tstarfish -o=tmp/file.tex -replace file.tex.sfish" } !>
            #+
            starfish -o=tmp/file.tex -replace file.tex.sfish
            #-

it is convenient to have the embedded output indented in the same way as the embedded code.

# STYLE SPECIFIC PREDEFINED FUNCTIONS

## get_verbatim_file( filename )

Specific to LaTeX mode. Reads the textual file filename and returns a string ready for inclusion in a LaTeX document. It untabifies the file contents for proper representation of whitespace. The function code is basically:

    return "\\begin{verbatim}\n".
