Reg is a small program to manipulate small registers. A register is a sequence of
records stored in a file, i.e., the same as a table, or a relation, in a
relational database.
Registers are represented as Haskell values of type [[String]] in textual
form. The first element in this list determines the names of the
fields of the records in the file. All records should have the same number of fields.
Example:
[["Name","Extension"],
 ["Thomas Hallgren","5422"],
 ["Magnus Carlsson","1058"],
 ["Ana Bove","1020"]
]
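Since the register file is just the textual form of a [[String]] value, it can be read back with the standard Read instance for lists of strings. The following is a minimal sketch, not Reg's actual source; the names parseRegister and wellFormed are hypothetical, and treating an empty list as ill-formed is an assumption.

```haskell
import Text.Read (readMaybe)

-- A register: the first row holds the field names, the rest are records.
type Register = [[String]]

-- Hypothetical helper: parse the textual form using the standard Read
-- instance for lists of strings. Returns Nothing on malformed input.
parseRegister :: String -> Maybe Register
parseRegister = readMaybe

-- Hypothetical check: there is a header row and every record has
-- as many fields as the header. (Rejecting [] is an assumption.)
wellFormed :: Register -> Bool
wellFormed []              = False
wellFormed (header : recs) = all ((== length header) . length) recs

-- The example register from the text.
people :: Register
people =
  [ ["Name", "Extension"]
  , ["Thomas Hallgren", "5422"]
  , ["Magnus Carlsson", "1058"]
  , ["Ana Bove", "1020"]
  ]
```

With these definitions, wellFormed people holds, while a register such as [["A","B"],["x"]] is rejected because its record is shorter than the header.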
Synopsis
Reg location output_format op1 ... opn input_format
Operations are performed from right to left, just as function
applications and function compositions in Haskell. The
input_format is thus the first operation,
determining the format of the input, and
output_format is the last operation, which determines
the format of the output. Since all operations have a known number of
parameters, no parentheses or other delimiters are needed to separate
one operation from the next.
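The right-to-left order and the delimiter-free parsing can be sketched as follows. This is an illustration under assumptions, not Reg's implementation: the names Op, parseOps and runOps are hypothetical, and only two operations (reverse and grep) are modelled. Because each operation word consumes a fixed number of following arguments, the argument list can be split without parentheses, and composing the resulting functions with (.) makes the rightmost word run first.

```haskell
import Data.Char (toLower)
import Data.List (isInfixOf)

type Register = [[String]]
type Op = Register -> Register

-- Keep the header row, reverse the order of the records.
reverseOp :: Op
reverseOp (header : recs) = header : reverse recs
reverseOp []              = []

-- Keep the records that have s as a case-insensitive substring of some field.
grepOp :: String -> Op
grepOp s (header : recs) = header : filter (any match) recs
  where
    lower = map toLower
    match field = lower s `isInfixOf` lower field
grepOp _ [] = []

-- Hypothetical parser: each operation word takes a known number of
-- parameters, so the words can be consumed left to right unambiguously.
parseOps :: [String] -> [Op]
parseOps []                  = []
parseOps ("reverse" : rest)  = reverseOp : parseOps rest
parseOps ("grep" : s : rest) = grepOp s : parseOps rest
parseOps (s : rest)          = grepOp s : parseOps rest -- unknown word: treated as grep

-- Compose as f . g . h, so the rightmost operation is applied first.
runOps :: [Op] -> Op
runOps = foldr (.) id
```

For example, runOps (parseOps ["reverse", "grep", "magnus"]) first selects the matching records and then reverses them.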
Without any arguments, Reg could naturally pass the input to
the output unchanged, but since this would be rather pointless,
Reg outputs a usage message instead.
Register locations
Location | Meaning |
---|---|
(none) | If no location is given, input is taken from standard input and output is written to standard output. |
file path | Reg operates on the contents of the file at the given path. To prevent data loss, the result is written to a temporary file that is renamed to replace the file at path. Write permission is thus required in the directory where the register is located; permissions and ownership of the register file might change, and links might be broken. It is probably also wise to limit the use of input and output format conversions, to keep the output in the same format as the input. |
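The write-then-rename strategy described for file locations can be sketched like this. It is an assumption-laden illustration rather than Reg's code: the function name replaceFile is hypothetical, and it relies on openTempFile from System.IO and renameFile from System.Directory. Creating the temporary file in the register's own directory is what makes write permission on that directory necessary.

```haskell
import System.Directory (renameFile)
import System.FilePath (takeDirectory, takeFileName)
import System.IO (hClose, hPutStr, openTempFile)

-- Hypothetical helper: replace the file at path with new contents by
-- writing a temporary file in the same directory and renaming it over
-- the original. The original is only replaced once the write succeeded.
replaceFile :: FilePath -> String -> IO ()
replaceFile path contents = do
  -- The temporary file gets a fresh name derived from the original name.
  (tmp, h) <- openTempFile (takeDirectory path) (takeFileName path)
  hPutStr h contents
  hClose h
  -- The new file takes the place of the old one, which is why ownership,
  -- permissions and hard links of the original are not preserved.
  renameFile tmp path
```

Because the result is a newly created file, this also shows why the text warns about changed ownership and broken links.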
Input/output formats
Reg supports a number of bidirectional format conversions. If no input or output format is given, the register file format is used.
Input | Output | Meaning |
---|---|---|
from-show | show | A human readable textual format. See Examples below. |
from-csv | csv | Use the CSV format (comma-separated values, RFC 4180), where fields are separated by commas. Fields can be enclosed in double quotes, in which case they can contain commas. (No other features of the CSV format are supported at the moment.) The first line is assumed to contain the names of the fields. |
from-ssv | ssv | A variant of from-csv/csv where the values are separated by semicolons instead of commas. |
from-passwd | passwd | Use the UNIX password file format, that is, with one line per record and : separating the fields. For input, if the first line starts with # it is assumed to contain the names of the fields, separated by :. For output, a line starting with # followed by the names of the fields is always included. |
from-tabbed | tabbed | Use the tab-separated values format, i.e. one record per line with field values separated by tabs. For input, if the fields on the first line are all enclosed in square brackets, they are assumed to be the names of the fields. For output, the first line is always the field names in square brackets. |
from-tabbed0 | tabbed0 | A variant of from-tabbed/tabbed which doesn't require the field names to be enclosed in square brackets. The first line is always the field names, without square brackets. |
from-url | url | The format is one url-encoded-query per line. url-encoded-query is the format used by web browsers when submitting the contents of a form to a web server. |
from-json | json | The format is a JSON array containing a number of records. |
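To make the CSV subset described in the table concrete, here is a small sketch of a bidirectional conversion. It is not taken from Reg; the names splitCsvLine, parseCsv, showCsvLine and showCsv are hypothetical. It handles only what the table mentions: comma separators and double-quoted fields that may contain commas. Escaped quotes and line breaks inside fields are deliberately not handled.

```haskell
import Data.List (intercalate)

type Register = [[String]]

-- Split one CSV line into fields. A field that starts with a double
-- quote runs to the closing quote and may contain commas.
splitCsvLine :: String -> [String]
splitCsvLine s = case field s of
    (f, ',' : rest) -> f : splitCsvLine rest
    (f, _)          -> [f]
  where
    field ('"' : t) = let (f, rest) = break (== '"') t in (f, drop 1 rest)
    field t         = break (== ',') t

-- The first line holds the field names, so the result is a register.
parseCsv :: String -> Register
parseCsv = map splitCsvLine . lines

-- Quote only the fields that contain a comma.
showCsvLine :: [String] -> String
showCsvLine = intercalate "," . map quote
  where
    quote f
      | ',' `elem` f = "\"" ++ f ++ "\""
      | otherwise    = f

showCsv :: Register -> String
showCsv = unlines . map showCsvLine
```

For registers whose fields contain no double quotes or line breaks, parseCsv (showCsv r) gives back r, which is the sense in which the conversion is bidirectional.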
Input-only formats
In addition to the input/output formats above, Reg can also
read input in the following formats:
Format | Meaning |
---|---|
from-mbox | The input is assumed to be in the UNIX mailbox format, which is a sequence of mail messages where the beginning of each message is identified by a line starting with "From ". The resulting register will have the following fields: From, To, Date, Subject, Message-Id, Headers, FilePos and Body. The first five fields contain the values of the corresponding mail headers. The Headers field contains the values of the remaining headers, the FilePos field contains the position of the message in the input file and the Body field contains the body of the mail message. |
from-clf | The input is assumed to be in the Common Log Format, or Combined Log Format, used by some web servers. |
Output-only formats
In addition to the input/output formats above, Reg can also
produce output in the following formats:
Format | Meaning |
---|---|
html | Generate an HTML table. The fields are assumed to contain plain text, so characters that have special meaning in HTML are escaped. |
html0 | Generate an HTML table. The fields are assumed to contain HTML, and are output as is. |
fmt format | Format the output according to a formatting string (see below). |
Formatting strings
Formatting strings used with the fmt command
work in much the same way as the formatting strings used in the C functions
printf and strftime. Most characters stand for
themselves, except the % character, which starts a formatting
command.
Format | Meaning |
---|---|
%% | The percent character. |
%/ | The newline character. |
%0field; | The contents of the named record field as is. |
%"field; | The contents of the named record field wrapped in double quotes. Double quotes and newline characters are escaped as \" and \n, respectively. |
%'field; | The contents of the named record field wrapped in single quotes. Single quotes and newline characters are escaped as \' and \n, respectively. |
%#field; | The line count of the contents of the named record field. |
%field; | The contents of the named record field. Characters that have special meaning in HTML are escaped, that is, & is replaced by &amp; and < is replaced by &lt;. |
%{field-fmt} | This creates an HTML link by using the named field as the URL and the formatting string fmt as the link text. If the URL field is empty (or contains only blank space), the link text is output without turning it into a link. Links cannot be nested and fmt cannot contain }. |
%{field=fmt} | As above, but the empty string is substituted if the link field is empty. |
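The way a formatting string is interpreted can be sketched for a small subset of the commands: the percent character, the newline command, and a plain field reference with HTML escaping. This is a sketch under assumptions, not Reg's implementation; the names Record, format and formatRegister are hypothetical, and substituting the empty string for an unknown field name is an assumption.

```haskell
import Data.Maybe (fromMaybe)

type Register = [[String]]
type Record = [(String, String)] -- (field name, value) pairs

-- Escape the two characters the table mentions for plain field references.
escape :: Char -> String
escape '&' = "&amp;"
escape '<' = "&lt;"
escape c   = [c]

-- Interpret a formatting string for one record (subset only).
format :: String -> Record -> String
format ('%' : '%' : rest) r = '%' : format rest r
format ('%' : '/' : rest) r = '\n' : format rest r
format ('%' : rest) r =
  -- A field reference: the name runs up to the terminating semicolon.
  let (name, rest') = break (== ';') rest
      value = fromMaybe "" (lookup name r) -- assumption: unknown field is empty
  in concatMap escape value ++ format (drop 1 rest') r
format (c : rest) r = c : format rest r
format [] _ = []

-- Apply the formatting string to every record of a register.
formatRegister :: String -> Register -> String
formatRegister fmt (header : recs) = concatMap (format fmt . zip header) recs
formatRegister _ [] = ""
```

Applied to the register from the introduction with the formatting string %Name; %Extension;%/ this yields one line per record, each holding the name and the extension separated by a space.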
Operations
In the table below, fields denotes a comma-separated list of field names, for example Name,Phone.
Operation | Meaning |
---|---|
add url-encoded-query | Add a new record to the register. |
update where what | Update records matching the url-encoded-query where with the values given in url-encoded-query what. |
pick fields | Projection. Select the named fields from the records. The order of the fields is not changed. |
drop fields | Projection. Remove the named fields from the records. |
arrange fields | Rearrange the fields of the records. This can change the order of the fields, drop some fields, duplicate fields and introduce new fields. |
grep string | Selection. Select the records that have string as a substring of some field. The comparison is case insensitive. |
grep-in fields string | Selection. Select the records that have string as a substring of some of the mentioned fields. The comparison is case insensitive. |
urlgrep url-encoded-query | Selection. Select records with fields that contain substrings of the strings given in url-encoded-query. Fields not mentioned in the query can contain anything. The comparison is case insensitive. |
urlgrep-v url-encoded-query | Selection. Select records with fields that are not exactly matched by the url-encoded-query. Fields not mentioned in the query can contain anything. |
urlmatch url-encoded-query, urlmatch-v url-encoded-query | Selection. Select records with fields that match the url-encoded-query, which can contain ? and * wildcards. Fields not mentioned in the query can contain anything. urlmatch-v selects records that don't match. |
nub | Remove duplicate records. |
nubBy fields | Remove duplicate records. Use only the given fields to determine if two records are equal. |
sort | Sort the records. Fields are compared lexicographically from left to right. |
sortBy fields | Sort the records. The given fields are compared lexicographically in the order given. |
sortBy-n fields | Sort the records like sortBy, except the first field is compared numerically. |
reverse | Reverse the order of the records. |
lines field | If the contents of field is more than one line long, split the record into several records with one line per record. |
unlines field | The opposite of lines, that is, consecutive records which are equal except for the contents of field are combined. |
groupBy fields | Similar to unlines: consecutive records where the corresponding fields agree are combined. |
aggr fn fields | Apply aggregation function fn to the given fields. This is useful as a postprocessing step after groupBy. Supported aggregation functions: count, max, min, nub, product and sum. The numeric aggregation functions understand numbers that have a unit suffix, e.g. 53m². The aggregation is only applied if all numbers have the same unit. |
concat fields | Combine the given fields into one field by concatenating the contents of the given fields in each record. The name of the new field is the concatenation of fields. |
split field | Split field into several fields. The contents of the field are split on line breaks. The number of fields is thus determined by the maximum number of lines in the field. The names of the new fields are obtained by appending a number to field. |
string | If string is not one of the operations recognized by Reg, it is interpreted as grep string, that is, when you search, you can omit the word grep in most cases. |
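Several of the operations above map directly onto Haskell list functions. The sketch below shows how pick, grep-in, sortBy and nubBy could be expressed over the [[String]] representation. It is illustrative only and assumes a well-formed register; the function names (pick, grepIn, sortByFields, nubByFields, valuesOf) are hypothetical and not taken from Reg's source.

```haskell
import Data.Char (toLower)
import Data.Function (on)
import Data.List (isInfixOf, nubBy, sortOn)

type Register = [[String]]

-- Values of the given fields in a row, in the order the fields are listed.
valuesOf :: [String] -> [String] -> [String] -> [String]
valuesOf header fields row =
  [v | f <- fields, (name, v) <- zip header row, name == f]

-- Projection: keep the named fields; the header order is preserved.
pick :: [String] -> Register -> Register
pick fields reg@(header : _) = map keep reg
  where keep row = [v | (name, v) <- zip header row, name `elem` fields]
pick _ [] = []

-- Selection: s must be a case-insensitive substring of a mentioned field.
grepIn :: [String] -> String -> Register -> Register
grepIn fields s (header : recs) = header : filter match recs
  where
    lower = map toLower
    match row = any ((lower s `isInfixOf`) . lower) (valuesOf header fields row)
grepIn _ _ [] = []

-- Sort on the given fields, compared lexicographically in the order given.
sortByFields :: [String] -> Register -> Register
sortByFields fields (header : recs) = header : sortOn (valuesOf header fields) recs
sortByFields _ [] = []

-- Remove duplicates, comparing only the given fields.
nubByFields :: [String] -> Register -> Register
nubByFields fields (header : recs) =
  header : nubBy ((==) `on` valuesOf header fields) recs
nubByFields _ [] = []
```

The header row is always kept in place, so the result of each function is again a register and the functions can be composed freely.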
Examples
For the following examples, we assume that the file people
contains the register displayed in the introduction.
Command | Output |
---|---|
Reg show <people
| Name····· Thomas Hallgren Extension 5422 Name····· Magnus Carlsson Extension 1058 Name····· Ana Bove Extension 1020 |
Reg fmt '%Name; %Extension;%/' <people
| Thomas Hallgren 5422 Magnus Carlsson 1058 Ana Bove 1020 |
Reg show grep magnus <people
| Name····· Magnus Carlsson Extension 1058 |
Reg file people update 'Name=Thomas*' Extension=5555 | |
Reg show thomas <people | Name..... Thomas Hallgren |
Implementation
Reg is implemented in Haskell. The source is 555 lines
long (2001-05-20), of which 383 lines were written specifically for
Reg and 172 lines were reused from other programs and
libraries. It also uses functions defined in the Haskell prelude and
standard libraries, which are not counted here.
Past, present and future
Reg has evolved over time and could still be improved in various
ways.
- groupBy and aggr were added in September 2021.
- An operation for adding new records to a register was added on 2007-02-04.
- There should probably be an operation for adding new fields to a register (in an easier way than with arrange).
- Except that pick is slightly more efficient than arrange, there is no good reason to have two such similar operations.
- There should probably be operations that combine two or more registers in various ways, for example union, intersection and join. Currently, there are two separate programs, RegJoin and RegCat, for joining and concatenating registers, respectively.
- Conversion from alternate input formats was initially added on 2001-05-20, and the list of supported formats has been extended over time, but Reg could support more input formats.
See also
- Relational databases
- Relational algebra
- SQL
- Haskell
- Unix shell commands: grep, sort, cut.
- Haskell standard functions: nub, nubBy, sort, sortBy, groupBy, lines, filter.