ERL2DB(1)                                               ERL2DB(1)


NAME
       erl2db  -  ERL download format to DBase import format con-
       version program.



SYNOPSIS
       erl2db [-ahdDeirRsSvV] [-E editor-path]  [-o  output-file]
       [--] [file...]



DESCRIPTION
       erl2db converts an Electronic Reference Library (ERL)
       download file to a DBase ASCII import file. The download
       file should be created with WinSPIRS and contain all fields,
       each field on a single line (see subsection Example ERL
       datarecord). Abbreviated field identifiers must be used
       (TI:, AU: etc.). The output file contains all fields of one
       record on a single line, which makes it suitable for import
       into DBase.

       This section is divided into the following subsections:
       Initialization, Options, Processing, Example ERL datarecord,
       Example erl2db output record, Example erl2db log-messages,
       Example profile, Syntax of profile, Semantics of profile and
       Program exit status.


   Initialization
       When erl2db is run, it first scans the commandline
       parameters. It then looks for a profile in the current
       directory. If no profile is found there, erl2db looks for
       the system-wide profile in the directory where the program
       resides. If no profile is found at all, erl2db issues a
       warning message: without a profile that contains an output
       definition, no output records are generated. Finally,
       erl2db processes the specified datafile(s) and outputs the
       converted records and statistical information. When no
       files are specified, erl2db behaves as a filter program.
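
       The profile search order described above can be sketched as
       follows. This is only an illustration in Python, not the
       program's own code: the function name find_profile and its
       parameters are hypothetical, and the file name erl2db.pro is
       taken from the FILES section.

```python
import os

PROFILE_NAME = "erl2db.pro"  # profile file name as listed under FILES


def find_profile(current_dir, program_dir):
    """Return the path of the first profile found, or None.

    The current directory is tried first, then the directory in
    which the program resides (the system-wide profile).
    """
    for directory in (current_dir, program_dir):
        candidate = os.path.join(directory, PROFILE_NAME)
        if os.path.isfile(candidate):
            return candidate
    # No profile at all: the caller would issue the warning here.
    return None
```

       A result of None corresponds to the situation in which
       erl2db warns that no output records will be generated.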


   Options
       erl2db can be executed with the following options:

       -a      author,

       -h      overview of options,

       -D      print debug information on stderr,

       -d      print debug information on stdout,

       -e      edit address when it contains too many fields,

       -ee     force edit of addressfields,

       -i      select  files  specified  on  commandline interac-
               tively,

       -R      print number of record last processed (stderr),

       -r      print number of record last processed (stdout),

       -S      print statistic message (grand total, on stderr),

       -s      print statistic message (grand total, on stdout),

       -ss     print also statistic message for each record,

       -V      print informative messages (filename, on stderr),

       -v      print informative messages (filename, on stdout),

       -vv     print also number of record ([d]) for each  record
               processed,

       -vvv    print  also  ERL  fieldname  ([ll]) for each field
               processed,

       -vvvv   print also contents ([contents])  for  each  field
               processed,

       --      end option section,

       -E editor-path
               specify name or path of editor (implies -e),

       -o output-file
               specify name of outputfile.

       A  %s in the argument for option -E can be used to specify
       the position of the name of the file with the address that
       is to be edited.
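
       As an illustration of the %s placeholder, the sketch below
       builds an editor command line from the -E argument. The
       function name build_editor_command is hypothetical, and
       appending the file name when no %s is present is an
       assumption; the manual only states that %s marks the
       position of the file name.

```python
def build_editor_command(editor_spec, address_file):
    """Insert the address file name into the -E argument."""
    if "%s" in editor_spec:
        # %s marks where the file name goes.
        return editor_spec.replace("%s", address_file, 1)
    # Assumption: without %s the file name is simply appended.
    return editor_spec + " " + address_file


command = build_editor_command(r"c:\bin\e %s 2", "address.tmp")
```

       With the editor specification from the EXAMPLE section and a
       made-up file name address.tmp, the file name lands between
       the editor path and its trailing argument.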


   Processing
       erl2db processes one record at a time. First it scans the
       various fields and does field-specific input processing,
       such as case conversion and word substitution, as specified
       by the [Capitalize], [Title] etc. sections in the profile.
       Then it writes the output record as specified by the
       profile [Output] section and the record statistics as
       specified by the profile [Statistics] section.

       The most notable input processing steps are described
       below.

       Title
       The  titleline is split into separate words. The words are
       looked up in the dictionary filled with entries from the
       profile [Title] section and replaced when found. Finally,
       words not present in the dictionary are capitalized if so
       specified in the profile [Capitalize] section. Words are
       separated by the following characters: space, tab and
       ".,:;/'!?*()[]{}<>.
       When the titleline is too long according to the
       corresponding output definition, it is truncated and ends
       with an ellipsis (...).
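
       The title steps above can be sketched in Python as follows.
       This is an illustration under assumptions, not the
       program's implementation: the name convert_title is
       hypothetical, and whether the ellipsis counts towards the
       field width is a guess.

```python
import re

# Separators from the manual: space, tab and ".,:;/'!?*()[]{}<>
SEPARATORS = " \t\".,:;/'!?*()[]{}<>"
_SPLIT = re.compile("([" + re.escape(SEPARATORS) + "]+)")


def convert_title(title, dictionary, capitalize=True, width=250):
    """Apply dictionary lookup, capitalization and truncation."""
    converted = []
    for part in _SPLIT.split(title):
        if not part or _SPLIT.fullmatch(part):
            converted.append(part)              # separators stay as they are
        elif part in dictionary:
            converted.append(dictionary[part])  # entry from [Title]
        elif capitalize:
            converted.append(part.capitalize())
        else:
            converted.append(part)
    result = "".join(converted)
    if len(result) > width:
        # Assumption: the ellipsis fits inside the field width.
        result = result[: width - 3] + "..."
    return result


example = convert_title("INHIBITORS OF PHOTOSYSTEM II", {"II": "II"})
```

       The dictionary entry keeps II in upper case, as in the
       [Title] section of the example profile, while the other
       words are capitalized.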

       Authors
       The format of the author fields is changed from e.g.
       BOSCH-MK to Bosch,M.K. The names are capitalized as
       specified in the profile [Capitalize] section.
       The author names can be retrieved in an alternate output
       format with the Authors_lt [Output] definition as: MK
       Bosch.
       In some instances, -(Reprint-Author) is appended to the
       name of an author. This name can be retrieved with the
       ReprintAuthor [Output] definition as: MK Bosch. If there
       is no such indication, ReprintAuthor yields the name of
       the first author.
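
       A minimal Python sketch of the author conversion follows.
       The function names and the returned dictionary layout are
       hypothetical. The semicolon between authors is taken from
       the AU: line in subsection Example ERL datarecord, and
       splitting a name at its last dash is an assumption.

```python
REPRINT_MARK = "-(Reprint-Author)"


def parse_author(raw):
    """Split e.g. BOSCH-MK into surname, initials and reprint flag."""
    raw = raw.strip()
    is_reprint = raw.endswith(REPRINT_MARK)
    if is_reprint:
        raw = raw[: -len(REPRINT_MARK)]
    # Assumption: the initials follow the last dash in the name.
    surname, dash, initials = raw.rpartition("-")
    if not dash:
        surname, initials = raw, ""
    return surname, initials, is_reprint


def convert_authors(field, capitalize=True):
    """Convert an AU: field into the three output forms."""
    authors, short_forms, reprint = [], [], None
    for part in field.split(";"):
        if not part.strip():
            continue
        surname, initials, is_reprint = parse_author(part)
        if capitalize:
            surname = surname.capitalize()
        dotted = "".join(letter + "." for letter in initials)
        authors.append((surname, dotted))               # Authors form
        short = (initials + " " + surname).strip()
        short_forms.append(short)                       # Authors_lt form
        if is_reprint and reprint is None:
            reprint = short
    if reprint is None and short_forms:
        reprint = short_forms[0]   # no indication: use the first author
    return {"authors": authors, "authors_lt": short_forms,
            "reprint_author": reprint}
```

       For the AU: line of the example datarecord this yields the
       pair Zharmukhamedov, S.K. as first author and SK
       Zharmukhamedov as reprint author, in line with the example
       output record.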

       Address
       The address field is scanned up to the phrase (Reprint
       Address). The words are looked up in the dictionary filled
       with entries from the profile [Address] section and
       replaced when found. Finally, words not present in the
       dictionary that are longer than two characters are
       capitalized if so specified in the profile [Capitalize]
       section. Words are separated by, and do not contain, the
       following characters: space, comma and semicolon.
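
       The address steps can be sketched in Python as shown below.
       The name convert_address is hypothetical. Cutting the
       result into separate address fields at each comma is
       inferred from the example output record and is not stated
       explicitly in this manual.

```python
import re

_SPLIT = re.compile(r"([ ,;]+)")  # space, comma and semicolon


def convert_address(field, dictionary, capitalize=True):
    """Convert an AD: field into a list of address fields."""
    # Only the text before the phrase (Reprint Address) is used.
    text = field.split("(Reprint Address)", 1)[0].strip()
    converted = []
    for part in _SPLIT.split(text):
        if not part or _SPLIT.fullmatch(part):
            converted.append(part)
        elif part in dictionary:
            converted.append(dictionary[part])   # entry from [Address]
        elif capitalize and len(part) > 2:
            converted.append(part.capitalize())
        else:
            converted.append(part)               # short words are kept
    # Assumption: each comma starts a new address field.
    return [piece.strip() for piece in "".join(converted).split(",")]


fields = convert_address(
    "RUSSIAN ACAD SCI, INST SOIL SCI & PHOTOSYNTH, "
    "PUSHCHINO 142292 RUSSIA (Reprint Address)", {})
```

       Applied to the AD: line of the example datarecord, the
       sketch produces the three address fields that appear in the
       example output record.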

       Journal
       The journalname is obtained from the ERL SO: field. It is
       changed from e.g. BIOCHEMISTRY-MOSCOW to Biochemistry
       Moscow. The words are looked up in the dictionary filled
       with entries from the profile [Journal] section and
       replaced when found. Finally, words not present in the
       dictionary are capitalized if so specified in the profile
       [Capitalize] section. Words are separated by, and do not
       contain, the following characters: dash and dot.

       Keywords
       The keywords are obtained from the ERL KW:, KA: and KP:
       fields. The words are looked up in the dictionary filled
       with entries from the profile [Keywords] section and
       replaced when found. Finally, words not present in the
       dictionary are capitalized if so specified in the profile
       [Capitalize] section. Words are separated by, and do not
       contain, the following characters: space and semicolon.

       Abstract
       The  abstractline  is  split  into  words,  separated by a
       space. When the abstract is too long according to the cor-
       responding  output  definition,  it  is truncated and ends
       with an ellipsis (...).

       CC-Edition
       The format of the Current Contents edition is changed from
       e.g. CC-Life-Sciences to Life Sciences. The words of the
       Current Contents edition are capitalized if so specified
       in the profile [Capitalize] section. Words are separated
       by, and do not contain, the following characters: space and
       dash.


   Example ERL datarecord
       An example of an ERL datarecord is shown below.

       AN:  RX893-16 See Table of Contents
       TI:  LACK OF BINDING COMPETITION BETWEEN DIURON AND PERFLUOROISOPROPYLDINITROBENZENE DERIVATIVES, NOVEL...
       AU:  ZHARMUKHAMEDOV-SK; KLIMOV-VV; ALLAKHVERDIEV-SI
       AD:  RUSSIAN ACAD SCI, INST SOIL SCI & PHOTOSYNTH, PUSHCHINO 142292 RUSSIA (Reprint Address)
       SO:  BIOCHEMISTRY-MOSCOW.  JUN 1995; 60 (6) : 723-728.
       PT:  Article-Citation
       PY:  1995
       IS:  0006-2979
       LA:  ENGLISH
       KA:  PHOTOSYSTEM II; LIGHT INDUCED ELECTRON TRANSFER; ELECTRON TRANSFER INHIBITORS; DIURON; COMPETITIVE...
       KP:  THYLAKOID MEMBRANE; HERBICIDE BINDING; REACTION CENTERS; FLUORESCENCE; PLASTOQUINONE; CHLOROPLASTS;...
       AB:  Binding competition between Diuron and perfluoroisopropyldinitrobenzene (PFIPDNB)(3) derivatives, novel...
       JS:  BIOCHEMISTRY-AND-BIOPHYSICS
       CC:  CC-Life-Sciences
       RF:  32 REFS
       GA:  RX893
       UD:  9603



   Example erl2db output record
       Here is the erl2db output record for the ERL datarecord
       shown above, when erl2db is used with the profile as shown
       in subsection Example profile below. Note that all
       information is contained in one line; the record is wrapped
       here for display only.

       "Zharmukhamedov","S.K.","Klimov","V.V.","Allakhverdiev","S.I.","","","","","","",
       "PHOTOSYSTEM II","LIGHT INDUCED ELECTRON TRANSFE","ELECTRON TRANSFER INHIBITORS",
       "DIURON","COMPETITIVE BINDING","THYLAKOID MEMBRANE","HERBICIDE BINDING",
       "REACTION CENTERS","FLUORESCENCE","PLASTOQUINONE",
       "Lack Of Binding Competition Between Diuron And Perfluoroisopropyldinitrobenzene
       Derivatives, Novel Inhibitors Of Electron Transfer In Photosystem II",
       "Biochemistry Moscow","English","J","60","723","728",
       "SK Zharmukhamedov",
       "Russian Acad Sci","Inst Soil Sci & Photosynth","Pushchino 142292 Russia","","","","",
       "1995",
       "Binding competition between Diuron and perfluoroisopropyldinitrobenzene
       (PFIPDNB)(3) derivatives, novel inhibitors of photosystem II (PS II), was
       investigated by studying their effect on the electron transfer reactions in
       PS II. The inhibition of PS ","II reactions by the PFIPDNB derivatives was
       insensitive to Diuron at concentrations that exceed that of sdthe PFIPDNB
       derivatives by two orders of magnitude. The lack of the functional
       competition between these substances indicates that the binding sit","e for
       the PFIPDNB derivatives is different from that for Diuron, a known inhibitor
       of electron transfer in PS II.","","","","","","","","9603"
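
       In the record above, the abstract continues across
       consecutive quoted fields and the unused fields are left
       empty. A minimal Python sketch of such a layout is given
       below. The function names are hypothetical, the cut at
       fixed character positions is inferred from the example, and
       the handling of the ellipsis and of quote characters inside
       a field is not covered by this manual.

```python
def split_into_fields(text, width, count):
    """Cut text into count fields of at most width characters."""
    capacity = width * count
    if len(text) > capacity:
        # Assumption: the ellipsis replaces the last three positions.
        text = text[: capacity - 3] + "..."
    chunks = [text[i:i + width] for i in range(0, len(text), width)]
    chunks += [""] * (count - len(chunks))   # pad with empty fields
    return chunks


def format_record(fields):
    """Join fields as double-quoted, comma-separated values."""
    # No escaping of embedded quotes is attempted in this sketch.
    return ",".join('"' + field + '"' for field in fields)


line = format_record(split_into_fields("abcdefgh", 3, 4))
```

       With a width of 3 and 4 fields, the text abcdefgh is spread
       over three fields and followed by one empty field, all on a
       single line.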



   Example erl2db log-messages
       Here is an example of the  erl2db  log-messages,  obtained
       with option -vvv.


       [AN][TI][AU][AD][SO][PT][PY][IS][LA][KA][KP][AB][JS][CC][RF][GA][UD][2]
       Record #2:
       title complete
       abstract complete
       journal complete
       volume complete
       publication year complete
       begin page complete
       end page complete
       language complete
       3 authors, 0 truncated, 0 skipped
       15 keyword fields, 1 truncated, 5 skipped
       3 address fields, 0 truncated, 0 skipped, not edited

        ...

       Grand total of:
       1 file processed
       72 records processed
       0 titles truncated
       2 abstracts truncated
       0 journal names truncated
       0 volume fields truncated
       0 publication years truncated
       0 begin pages truncated
       0 end pages truncated
       1 language field truncated
       273 authors, 0 truncated, 3 skipped
       810 keyword fields, 26 truncated, 162 skipped
       305 address fields, 6 truncated, 0 skipped, 0 addresses edited



   Example profile
       The profile specifies which fields must undergo case
       conversion (be capitalized), the specific spelling and case
       of words in title, keyword, journal and address fields, the
       format of the output record and the statistical information
       that is to be printed (see also subsection Syntax of
       profile).

       Here is an example erl2db profile.

       #
       # bin/erl2db.pro - system wide profile for ERL to Dbase conversion program.
       #

       #
       # Capitalization of the following fields:
       #

       [Capitalize]

       Authors
       Title
       Journal
       Language
       Address

       #
       # Word spelling and capitalization, as used in title translation:
       #

       [Title]

       "II"  = "II"             # as in Photosystem II
       "EPR" = "EPR"            # abbreviation

       #
       # Word spelling and capitalization, as used in journal translation:
       #

       [Journal]

       "AND" = "and"
       "ET"  = "et"
       "THE" = "the"
       "OF"  = "of"

       #
       # Word spelling and capitalization, as used in address translation:
       #

       [Address]

       "POB" = "POB"            # as in POB 9504
       "USA" = "USA"            # as in NY 10032 USA

       #
       # output field format:
       #

       [Output]

       Authors         = "30, 10, 10" # nameWidth, initWidth, #
       Number          = " "         # authorArtn  = "5"
       Keywords        = "30, 10"    # width, #
       Title           = "250, 1"    # width, #
       Journal         = "100"       # width
       Language        = "7"         # width  ( 3? )
       String          = "J"         # default Journal
       Volume          = "8"         # width
       BeginPage       = "6"         # width
       EndPage         = "6"         # width
       ReprintAuthor   = "30"        # width
       Address         = "30, 7"     # width, #
       PublicationYear = "4"         # width
       Abstract        = "250, 10"   # width, #
       UpdateCode      = "4"         # width

       #
       # statistic fields to print:
       #

       [Statistics]

       File
       Record
       Title
       Abstract
       Journal
       Volume
       PublicationYear
       BeginPage
       EndPage
       Language
       Authors
       Keywords
       Address

       #
       # End of file
       #



   Syntax of profile
       The  syntax of the profile is shown below. The contents of
       the profile are divided into sections. The definitions  of
       these  sections  are  case-sensitive.  Comments are intro-
       duced by a # and extend to the end of the line.   Comments
       and whitespace are ignored.

       To simplify the syntax of the profile, a single set of
       reserved words is used for all sections, though not all
       reserved words are meaningful in each section (e.g.
       PublicationYear in section [Capitalize]). In these
       circumstances, the use of a reserved word is silently
       ignored. Use of an invalid word in a section, however, is
       signalled as an error.

       definition         ::= section*

       section            ::= capitalize-section
                            | title-section
                            | keywords-section
                            | journal-section
                            | address-section
                            | output-section
                            | statistics-section

       capitalize-section ::= '[Capitalize]' enumeration-entry*

       title-section      ::= '[Title]'      dictionary-entry*

       keywords-section   ::= '[Keywords]'   dictionary-entry*

       journal-section    ::= '[Journal]'    dictionary-entry*

       address-section    ::= '[Address]'    dictionary-entry*

       output-section     ::= '[Output]'     definition-entry*

       statistics-section ::= '[Statistics]' enumeration-entry*

       enumeration-entry  ::= reserved-word

       dictionary-entry   ::= string '=' string

       definition-entry   ::= reserved-word '=' string

       string             ::= '"' [printable]* '"'

       reserved-word      ::= 'Abstract'
                            | 'Address'
                            | 'Authors'
                            | 'Authors_lt'
                            | 'BeginPage'
                            | 'CC-Edition'
                            | 'EndPage'
                            | 'File'
                            | 'GenuineArticle'
                            | 'ISSN'
                            | 'Journal'
                            | 'JournalSubject'
                            | 'Keywords'
                            | 'Language'
                            | 'Number'
                            | 'PublicationType'
                            | 'PublicationYear'
                            | 'RecordType'
                            | 'Record'
                            | 'References'
                            | 'ReprintAuthor'
                            | 'String'
                            | 'Title'
                            | 'UpdateCode'
                            | 'Volume'
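
       The grammar above can be turned into a small parser. The
       Python sketch below is an illustration only: the names
       parse_profile and strip_comment are hypothetical, it
       assumes one entry per line as in the example profile
       (the grammar itself does not require this), and leaving a #
       inside a quoted string alone is an assumption.

```python
import re

# How the entries of each section are interpreted (from the grammar).
SECTION_KINDS = {
    "Capitalize": "enumeration", "Title": "dictionary",
    "Keywords": "dictionary", "Journal": "dictionary",
    "Address": "dictionary", "Output": "definition",
    "Statistics": "enumeration",
}
RESERVED_WORDS = {
    "Abstract", "Address", "Authors", "Authors_lt", "BeginPage",
    "CC-Edition", "EndPage", "File", "GenuineArticle", "ISSN",
    "Journal", "JournalSubject", "Keywords", "Language", "Number",
    "PublicationType", "PublicationYear", "RecordType", "Record",
    "References", "ReprintAuthor", "String", "Title", "UpdateCode",
    "Volume",
}
SECTION = re.compile(r'\[(\w+)\]')
DICTIONARY = re.compile(r'"([^"]*)"\s*=\s*"([^"]*)"')
DEFINITION = re.compile(r'([\w-]+)\s*=\s*"([^"]*)"')


def strip_comment(line):
    """Remove a # comment, but not a # inside a quoted string."""
    in_string = False
    for index, char in enumerate(line):
        if char == '"':
            in_string = not in_string
        elif char == "#" and not in_string:
            return line[:index]
    return line


def parse_profile(text):
    """Return {section name: list of entries} for a profile text."""
    profile, current = {}, None
    for number, raw in enumerate(text.splitlines(), start=1):
        line = strip_comment(raw).strip()
        if not line:
            continue
        match = SECTION.fullmatch(line)
        if match:
            current = match.group(1)
            if current not in SECTION_KINDS:
                raise ValueError("line %d: unknown section" % number)
            profile.setdefault(current, [])
            continue
        if current is None:
            raise ValueError("line %d: entry outside section" % number)
        kind = SECTION_KINDS[current]
        if kind == "dictionary":
            match = DICTIONARY.fullmatch(line)
            if not match:
                raise ValueError("line %d: bad dictionary entry" % number)
            profile[current].append(match.groups())
        elif kind == "definition":
            match = DEFINITION.fullmatch(line)
            if not match or match.group(1) not in RESERVED_WORDS:
                raise ValueError("line %d: bad definition" % number)
            profile[current].append(match.groups())
        else:
            if line not in RESERVED_WORDS:
                raise ValueError("line %d: invalid word" % number)
            profile[current].append(line)
    return profile
```

       Entries are kept in lists so that the order of the [Output]
       definitions is preserved. A word that is not in the
       reserved-word list raises an error, which mirrors the
       behaviour described for invalid words.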




   Semantics of profile
       The  strings at the left side of the assignment in dictio-
       nary entries should be single words in accordance with the
       word-splitting mechanism for that dictionary section.

       The output format definition strings in the [Output]
       section come in three versions:

       "fieldwidth, fieldwidth, number-of-fields"
                                               Authors (name,
                                               initials).

       "fieldwidth, number-of-fields"          Abstract, Address,
                                               Authors_lt, Keywords,
                                               Title.

       "fieldwidth"                            all other output
                                               fields.
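
       A minimal Python sketch of how such a definition string
       could be interpreted is given below. The function name
       parse_output_definition is hypothetical. Address is treated
       as a two-value definition because the example profile
       defines it as "30, 7"; Number and String are passed through
       unchanged because they hold static text rather than widths.

```python
THREE_VALUES = {"Authors"}   # name width, initials width, number of fields
TWO_VALUES = {"Abstract", "Address", "Authors_lt", "Keywords", "Title"}
STATIC = {"Number", "String"}  # literal text copied into the record


def parse_output_definition(word, value):
    """Return a tuple of integers, or the literal text for statics."""
    if word in STATIC:
        return value
    numbers = tuple(int(part) for part in value.split(","))
    if word in THREE_VALUES:
        expected = 3
    elif word in TWO_VALUES:
        expected = 2
    else:
        expected = 1
    if len(numbers) != expected:
        raise ValueError("%s expects %d value(s)" % (word, expected))
    return numbers


authors_format = parse_output_definition("Authors", "30, 10, 10")
```

       For the Authors definition of the example profile this
       gives a name width of 30, an initials width of 10 and 10
       fields.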

       The Number and String output definitions can  be  used  to
       insert  static numerical and string fields into the output
       record (see subsection Example profile).

       The File and Record reserved words are  normally  used  in
       the  [Statistics]  section only. However, they also can be
       used in the [Output] section to provide the  name  of  the
       file  being processed and the number of the record respec-
       tively.

       Meaningful and correct statistical information can only be
       gathered for fields that are included in the [Output]
       section, except for the fields File and Record. erl2db
       checks whether each field for which statistical information
       is requested is included in the [Output] section, so the
       [Output] section should precede the [Statistics] section.


   Program exit status
       When a file cannot be found, or the file cannot be properly
       processed, the program stops and issues an error message.
       The failure to process a file is reflected in the program's
       exit status (see DIAGNOSTICS below).



ENVIRONMENT
       COMSPEC         command interpreter used to run the editor
                       for address-editing.



FILES
       erl2db.pro      profile in current directory,

       bin\erl2db.pro  system-wide  profile, in same directory as
                       erl2db.exe.



DIAGNOSTICS
       erl2db can return the following exit values:

       0  success: program execution has been  successfully  com-
          pleted,

       1  commandline error: an invalid option is specified,

       2  processing error: a file could not be opened or closed,
          an error occurred while writing to an output file,

       3  interruption: the user interrupted the program,

       4  internal error:  an  unexpected  situation  in  program
          behaviour occurred.
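
       A calling script can act on these exit values. The Python
       sketch below is an illustration: the function names are
       hypothetical, it assumes that erl2db can be found on the
       search path, and it uses only the -o option documented
       above.

```python
import subprocess

# Exit values as listed in this section.
EXIT_MEANINGS = {
    0: "success",
    1: "commandline error",
    2: "processing error",
    3: "interruption",
    4: "internal error",
}


def describe_exit_status(code):
    """Translate an erl2db exit value into a short description."""
    return EXIT_MEANINGS.get(code, "unknown exit status %d" % code)


def run_erl2db(input_file, output_file, program="erl2db"):
    """Run erl2db on one file and report how it ended."""
    result = subprocess.run([program, "-o", output_file, input_file])
    return result.returncode, describe_exit_status(result.returncode)
```

       The mapping can be used without running the program, for
       instance to label exit values collected in a batch log.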



SEE ALSO
       mkdb(1), mkdbfix(1)



EXAMPLE
       erl2db -RssvvvE "c:\bin\e %s 2" -o erl2db.out erl2db.inp >
       erl2db.log



BUGS
       (to be determined.)



AUTHOR
       M.J. Moene







23 Aug 1996     Department of Biophysics, Huygens Laboratorium