PubDB Beta Release v0.2
by Daniel Ramsbrock, spikes51 at umiacs.umd.edu
Updated: September 1, 2005
NOTE: File uploads are now fully operational, up to 7 MB. If you need to upload larger
files, please let Daniel know.
Overview
PubDB is a web-based application designed to help LAMP and CLIP researchers manage their publications in one central place. It works best with existing BibTex files, which it can import automatically. However, it also lets you enter individual records, so you can add your publications even if you do not keep them in BibTex format yourself.
NOTE:
Keep in mind that this is designed to be a database of
publications by members of LAMP and CLIP, so you should not be
adding publications that were written by others but that you cite
often.
There are two main parts to the system: the public
search interface, and the password-protected
database access area.
Public Search Interface
The public search interface is located at http://lampsrv01.umiacs.umd.edu/pubs/search.php.
On the main screen, you will see some fairly self-explanatory options for building queries. Choose the field you would like to search on the left, type the name (or part of one) you are looking for in the box to the right, and, if you are searching multiple fields, choose the appropriate AND/OR option on the far left. You can pick the field independently for every row, so the same field can be searched more than once; this is how you run a multi-author search, for example. You can also choose the field by which the results are sorted and the sort order (the default is by year, in descending order).
All searches are substring matches, so in most cases you should enter a single word (or part of a word or name) unless you are looking for a specific phrase. The only exception is when you want to specify an author's first name or initial: type 'j doe' or 'john doe', and the search engine will automatically split the tokens and search for firstname = j (or john) and lastname = doe. Finally, all searches are case-insensitive, so capitalization does not matter.
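The matching rules above can be illustrated with a short Python sketch. It mimics the described behaviour (case-insensitive substring matching, with a two-token author query split into a first-name part and a last-name part); it is not PubDB's actual code, and the function names and the record layout (a dict with 'firstname' and 'lastname' keys) are assumptions.

```python
def matches(text: str, query: str) -> bool:
    """Case-insensitive substring match, as described for all PubDB searches."""
    return query.lower() in text.lower()


def author_matches(author: dict, query: str) -> bool:
    """Match an author query against a record with 'firstname' and 'lastname' keys.

    Two tokens such as 'j doe' are split into a first-name part and a
    last-name part, each matched as a substring; anything else is matched
    as a substring (or phrase) of the full name.
    """
    tokens = query.split()
    if len(tokens) == 2:
        first, last = tokens
        return matches(author["firstname"], first) and matches(author["lastname"], last)
    full_name = f"{author['firstname']} {author['lastname']}"
    return matches(full_name, query)


doe = {"firstname": "John", "lastname": "Doe"}
print(author_matches(doe, "j doe"))     # True: 'j' is in 'john', 'doe' is in 'doe'
print(author_matches(doe, "DOE"))       # True: case is ignored
print(author_matches(doe, "jane doe"))  # False: 'jane' is not in 'john'
```

Because every comparison is a substring test, a short token like 'j' matches any first name containing a 'j', which is why one-word queries are usually enough.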
Now for the less standard options. If you would like to save your results, you can do so in four formats: HTML, HTML without links to the abstract and full-text pages, BibTex, and plain text. Select the format you want and press 'Download'. Once the page loads, use File -> Save As… to save the results to your hard drive. If you are saving a BibTex or plain-text file, be sure to select 'Text file' as the type.
Another feature is the ability to save your query for later use. This is especially useful if you have your own website and want it to show a constantly updated list of, for example, your own publications. Once you have built the query you want, click the 'Go' button next to 'Save this query permanently', and you will get four unique links to your query (one for each of the four formats). Download http://lampsrv01.umiacs.umd.edu/pubs/integration.zip for samples of how these results can be integrated into your own website. Note that this option saves the query itself, not its results: if any publications matching the query are later added or modified, the changes are automatically reflected in the results. You can also use the 'restore' link provided right after you save the query to load it back into the search form, where you can modify it and save it again if desired.
Private Database Interface
To log in, go to http://lampsrv01.umiacs.umd.edu/pubs/admin/. You should have received the username and password in the same e-mail that contained the link to this page. If you do not have the password, contact Dave or Daniel.
To add individual records, use the 'New record' link on the main search page. The first thing you must do is decide what kind of entry you are making (the bibtype). Whenever you change the bibtype, any data you have already entered for that record is erased. Once you have picked the type, you can fill in the other fields. Fields in bold are required; the rest are optional. The 'journal' field has a drop-down list of the journals LAMP members most often publish in, so check that list first. If your journal is not there, select 'Other...' and type the full name of the journal (no abbreviations, please) into the text box below.
For those of you not familiar with BibTex, the 'bibkey' field is a short string, usually consisting of the author's last name, an important word from the title, and the year of publication. You can auto-generate it once you have entered those fields, or you can choose your own.
IMPORTANT: Do not use auto-generation when posting tech reports. Every TR must have a bibkey of the form LAMP_###, where ### is the three-digit LAMP-TR number.
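The bibkey conventions above can be sketched in Python as follows. How PubDB's auto-generate button picks the "important word" is not documented, so this sketch assumes the longest title word outside a small, hypothetical stop-word list; make_bibkey and make_tr_bibkey are illustrative names, not part of PubDB.

```python
import re

# Hypothetical stop-word list; PubDB's own choice of "important word" is not documented.
STOP_WORDS = {"a", "an", "the", "of", "for", "and", "in", "on", "to", "with", "using"}


def make_bibkey(last_name: str, title: str, year: int) -> str:
    """Build a key such as 'DoermannDocuments2004': last name + key title word + year."""
    words = [w for w in re.findall(r"[A-Za-z0-9]+", title) if w.lower() not in STOP_WORDS]
    # Assumption: the longest remaining title word is the "important" one.
    keyword = max(words, key=len) if words else ""
    surname = re.sub(r"[^A-Za-z]", "", last_name)  # drop spaces, '#', hyphens, etc.
    return f"{surname}{keyword[:1].upper()}{keyword[1:]}{year}"


def make_tr_bibkey(tr_number: int) -> str:
    """Tech reports must use LAMP_### with the three-digit LAMP-TR number."""
    if not 0 <= tr_number <= 999:
        raise ValueError("LAMP-TR numbers must fit in three digits")
    return f"LAMP_{tr_number:03d}"


print(make_bibkey("Doermann", "Robust OCR for degraded documents", 2004))  # DoermannDocuments2004
print(make_tr_bibkey(7))  # LAMP_007
```

The zero-padded format specifier (:03d) is what guarantees the three-digit form required for tech reports.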
The 'lampcat' field indicates which publication category the entry belongs to. Most entries will be 'Media' (tech reports are 'TR'). 'Lang' is currently not used, but feel free to put your publications in this category if you feel they belong there.
Authors: the interface for adding authors is powerful, but it needs some explanation. The form for a new publication has room for six authors, which should be enough in most cases; if you need more, you can add them after you have created the record. The number on the left is the author's 'rank', which determines the order in which authors are listed whenever the record is printed. The ranks do not have to be consecutive: 1, 10, 20 produces the same order as 1, 2, 3. The second field holds the name of the author as it will appear in the publication. The third field (a drop-down menu) lists all available authors alphabetically by last name. Each author has a primary name (unindented) and zero or more aliases (indented).
Here is what to do in each case:
Author exists and is formatted exactly as you want it to appear: just select it from the drop-down menu.
Author exists but is not formatted as you want it to appear: select the primary name or any of the existing aliases, then make the desired changes in the text field. When you click 'Save', the new variation will be added as another alias of the primary author.
Author does not exist: select '[New Author]' from the drop-down menu, then enter the full name (first name, middle initial(s), and last name) in the text field. The software assumes that the last space-separated word is the last name. If the last name contains a space, place a pound sign (#) between the first and last name, with no spaces around it (e.g. 'John#von Doe'), and the software will treat it as the separator.
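The two rules just described, rank ordering and last-name detection, fit in a few lines of Python. This is a sketch of the stated behaviour, not PubDB's code; split_name and order_authors are hypothetical helper names.

```python
def split_name(full_name: str) -> tuple[str, str]:
    """Split a name into (first part, last name) following the rules above.

    A '#' marks the boundary explicitly ('John#von Doe' -> last name 'von Doe');
    otherwise the last space-separated word is taken as the last name.
    """
    if "#" in full_name:
        first, last = full_name.split("#", 1)
        return first.strip(), last.strip()
    first, _, last = full_name.strip().rpartition(" ")
    return first, last


def order_authors(entries: list[tuple[int, str]]) -> list[str]:
    """Sort (rank, name) pairs by rank; only the relative order of the ranks matters."""
    return [name for _, name in sorted(entries, key=lambda entry: entry[0])]


print(split_name("John Q. Public"))  # ('John Q.', 'Public')
print(split_name("John#von Doe"))    # ('John', 'von Doe')
print(order_authors([(20, "C"), (1, "A"), (10, "B")]))  # ['A', 'B', 'C']
```

Without the '#', 'John von Doe' would be split as first part 'John von' and last name 'Doe', which is exactly the mistake the separator prevents.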
Attaching files: if you have any files (abstracts, full text, related presentations, etc.) that you would like to attach to your record, there are separate fields for PostScript, PDF, HTML, and PowerPoint files (psfile, pdffile, htmlfile, pptfile). For any other file type, use the 'otherfile' field. Use the 'Browse…' button next to each field to select the file you would like to upload. The file name does not matter, as long as the extension is the expected one (e.g. .pdf for PDF files). For 'otherfile', the extension does not matter.
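A quick local check of the extension rule might look like the sketch below. Only .pdf is confirmed by the text above; the .ps, .html, and .ppt extensions and the case-insensitive comparison are assumptions about what PubDB expects, and extension_ok is a hypothetical name.

```python
from pathlib import Path

# Assumed extension for each attachment field (only .pdf is confirmed above).
EXPECTED_EXTENSIONS = {
    "psfile": {".ps"},
    "pdffile": {".pdf"},
    "htmlfile": {".html"},
    "pptfile": {".ppt"},
}


def extension_ok(field: str, filename: str) -> bool:
    """Return True if `filename` has the extension expected for `field`.

    'otherfile' accepts any extension; the file's base name never matters.
    """
    if field == "otherfile":
        return True
    expected = EXPECTED_EXTENSIONS.get(field)
    if expected is None:
        raise ValueError(f"unknown attachment field: {field}")
    return Path(filename).suffix.lower() in expected


print(extension_ok("pdffile", "Final Draft.PDF"))  # True
print(extension_ok("pdffile", "paper.ps"))         # False
print(extension_ok("otherfile", "notes.doc"))      # True
```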
Batch uploading: if you already have a BibTex file listing all of your publications, PubDB can import it automatically. Go to http://lampsrv01.umiacs.umd.edu/pubs/uploadbib.php, or click 'Upload BibTex file' on the main search screen. Before you submit your file, please make sure that it is free of syntax errors, as the parser is somewhat stricter than the standard LaTeX parser. Authors must be listed as first name, middle initial, last name, and must be separated by ' and ' (note the surrounding spaces). Again, if an author's last name contains a space, use the # separator described above. The 'Category' option sets the 'lampcat' code assigned to each new record.
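Because the upload parser is strict, it can help to check author fields before submitting. The Python sketch below flags two likely problems under stated assumptions: names in 'Last, First' order (the text above asks for first name first), and a lowercase middle word such as 'von' without a '#' separator. These heuristics are illustrative only and are not the checks PubDB itself performs.

```python
def check_author_field(field: str) -> list[str]:
    """Return warnings for author names that may not match the expected format.

    Heuristics (assumptions, not PubDB's own checks):
    - a comma suggests 'Last, First' order instead of 'First M. Last';
    - a lowercase middle word (e.g. 'von') without '#' suggests a multi-word
      last name that would otherwise be cut down to its final word.
    """
    warnings = []
    for raw in field.split(" and "):  # authors are separated by ' and '
        name = " ".join(raw.split())  # collapse newlines and repeated spaces
        if not name:
            warnings.append("empty author name (doubled ' and '?)")
        elif "," in name:
            warnings.append(f"{name}: use 'First M. Last' order, not 'Last, First'")
        elif "#" not in name and any(w[0].islower() for w in name.split()[1:-1]):
            warnings.append(f"{name}: multi-word last name? mark it with '#'")
    return warnings


print(check_author_field("John Q. Public and Doe, Jane and Ludwig van Beethoven"))
```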
There are two ways to attach files to these records, both involving ZIP files. First, you can create a ZIP file in which each file is named after the bibkey of the record it belongs to (e.g. DoermannOCR2004.ps and DoermannOCR2004.pdf would both be attached to the record with the bibkey 'DoermannOCR2004'). This works for PS, PDF, HTML, and PPT files, but not for the 'otherfile' field.
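The naming scheme can be previewed locally with Python's zipfile module: the sketch below groups ZIP members named <bibkey>.<extension> by bibkey, much as the upload is described as doing. The extension-to-field mapping is an assumption, and attachments_by_bibkey is a hypothetical helper; files with other extensions are ignored, matching the note that this scheme does not cover 'otherfile'.

```python
import os
import tempfile
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath

# Assumed mapping from file extension to attachment field.
FIELD_FOR_EXTENSION = {".ps": "psfile", ".pdf": "pdffile", ".html": "htmlfile", ".ppt": "pptfile"}


def attachments_by_bibkey(zip_path: str) -> dict[str, dict[str, str]]:
    """Group ZIP members named <bibkey>.<ext> into {bibkey: {field: member name}}."""
    result: dict[str, dict[str, str]] = defaultdict(dict)
    with zipfile.ZipFile(zip_path) as zf:
        for member in zf.namelist():
            path = PurePosixPath(member)
            field = FIELD_FOR_EXTENSION.get(path.suffix.lower())
            if field is not None:  # other extensions cannot be matched by name
                result[path.stem][field] = member
    return dict(result)


# Demonstration with a throwaway archive in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "upload.zip")
    with zipfile.ZipFile(demo, "w") as zf:
        for name in ("DoermannOCR2004.ps", "DoermannOCR2004.pdf", "notes.doc"):
            zf.writestr(name, b"placeholder")
    found = attachments_by_bibkey(demo)

print(found)
```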
Alternatively, you can make extra entries for each record in the
BibTex file. For example:
psfile = { Full Text.ps }
htmlfile = { My Web Page.html }
otherfile = { Full Text.doc }
This will cause the program to look for the specified files inside the ZIP file and attach them to that record. The ZIP file should be in standard ZIP format (not gzip or RAR) and should contain no subdirectories. Also note that the maximum size for each uploaded file is 16 MB (16,777,216 bytes), so make sure both your BibTex file and your ZIP file are below that limit.
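Before uploading, you could run the requirements above as a local pre-flight check. This Python sketch tests the 16 MB (16 × 1024 × 1024 = 16,777,216 bytes) size limit, standard ZIP format, the absence of subdirectories, and archive integrity; check_upload is a hypothetical helper, and passing it does not guarantee that PubDB will accept the upload.

```python
import os
import tempfile
import zipfile

MAX_UPLOAD_BYTES = 16 * 1024 * 1024  # 16 MB = 16,777,216 bytes


def check_upload(bib_path: str, zip_path: str) -> list[str]:
    """Return a list of problems that would likely make a batch upload fail."""
    problems = []
    for path in (bib_path, zip_path):
        size = os.path.getsize(path)
        if size > MAX_UPLOAD_BYTES:
            problems.append(f"{path}: {size} bytes exceeds the {MAX_UPLOAD_BYTES}-byte limit")
    if not zipfile.is_zipfile(zip_path):
        problems.append(f"{zip_path}: not a standard ZIP archive (gzip and RAR are not accepted)")
        return problems
    with zipfile.ZipFile(zip_path) as zf:
        nested = [name for name in zf.namelist() if "/" in name]  # ZIP paths use '/'
        if nested:
            problems.append(f"{zip_path}: contains subdirectories: {', '.join(nested)}")
        corrupt = zf.testzip()  # name of the first member with a bad CRC, or None
        if corrupt is not None:
            problems.append(f"{zip_path}: corrupt member {corrupt}")
    return problems


# Demonstration with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    bib = os.path.join(tmp, "pubs.bib")
    archive = os.path.join(tmp, "files.zip")
    with open(bib, "w") as f:
        f.write("@misc{Doe2005, title = {Example}}\n")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("Full Text.ps", "placeholder")
        zf.writestr("slides/talk.ppt", "placeholder")
    problems = check_upload(bib, archive)

print(problems)  # one problem: 'slides/talk.ppt' sits in a subdirectory
```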
Once you submit, the program first checks the ZIP file for integrity. If it is corrupt, you will be told so and will have to resubmit it. Next, it parses your BibTex file and checks it for syntax errors; again, if there are any problems, it gives you a detailed error report showing the line on which each error occurred. It then standardizes the 'month' field to January, February, etc. (if the field contains a number or an abbreviation, it is converted to the full month name).
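The month standardization step can be approximated in Python as shown below. This is an assumed reimplementation, not PubDB's parser: it accepts numbers ('3', '03'), abbreviations with or without a trailing period ('mar', 'Sept.'), optional braces or quotes, and full names, and rejects anything else. The month names are spelled out explicitly rather than taken from the calendar module, whose names depend on the locale.

```python
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def normalize_month(value: str) -> str:
    """Convert '3', '03', 'mar', 'Sept.' or '{march}' to the full English month name.

    Raises ValueError for anything it cannot interpret.
    """
    v = value.strip().strip('{}"').rstrip(".").lower()
    if v.isdecimal() and 1 <= int(v) <= 12:
        return MONTHS[int(v) - 1]
    for name in MONTHS:
        # Require at least three letters so that 'ju' stays ambiguous.
        if len(v) >= 3 and name.lower().startswith(v):
            return name
    raise ValueError(f"unrecognised month: {value!r}")


for raw in ("3", "03", "Sept.", "{dec}", "may"):
    print(raw, "->", normalize_month(raw))
```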
Finally, it starts adding the records to the database. For each record, it first checks for a duplicate bibkey (skipping the record unless you checked the 'overwrite' option) and then for possible duplicate records with different bibkeys. To find these, it picks random words from the title, booktitle (if available), and journal (if available), and gives a higher similarity ranking to records in which two or more of those words match. It also compares the author lists (taking aliases into account) and the year of publication (plus or minus 2 years, to allow for publishing delays).
At the end, it gives you a detailed report listing each possible duplicate in bold, followed by one or more existing records that resemble it. By default, both the new record and the old one are kept, but you will want to go through them individually and decide what to do with each. Note that you cannot choose 'delete' for the new record and then 'replace' for one of the old ones, since there would be nothing to replace it with.
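The duplicate check can be approximated as follows. This Python sketch is loosely modelled on the description above but is not PubDB's algorithm: instead of sampling random words it uses every word of four or more letters, and the 0 to 4 scoring scheme is an assumption. It shows how word overlap, alias-aware author matching, and the plus-or-minus-two-year window can be combined into one score.

```python
import re


def _words(record: dict) -> set[str]:
    """Lowercased words of four or more letters from title, booktitle and journal."""
    text = " ".join(record.get(field, "") for field in ("title", "booktitle", "journal"))
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= 4}


def similarity(new: dict, old: dict, aliases: dict[str, str]) -> int:
    """Score (0-4) how much `new` resembles `old`; higher means more likely a duplicate.

    `aliases` maps each alias to its primary author name.
    """
    shared_words = len(_words(new) & _words(old))
    score = 2 if shared_words >= 2 else shared_words  # two or more shared words weigh most
    new_authors = {aliases.get(a, a) for a in new["authors"]}
    old_authors = {aliases.get(a, a) for a in old["authors"]}
    if new_authors & old_authors:
        score += 1
    if abs(new["year"] - old["year"]) <= 2:  # tolerate publishing delays
        score += 1
    return score


aliases = {"D. Doermann": "David Doermann"}
new = {"title": "Robust OCR of degraded document images", "year": 2004,
       "authors": ["D. Doermann"]}
old = {"title": "OCR for degraded document images", "booktitle": "Proceedings of ICDAR",
       "year": 2003, "authors": ["David Doermann"]}
print(similarity(new, old, aliases))  # 4
```

A report could then list, for each new record, the existing records scoring at or above some threshold (3, say); the threshold is a tuning choice, not something stated in this manual.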
Please note
that this is still a beta version, and the duplicate detection
algorithm might not be completely bug-free yet. Any bug reports and
suggestions for improvement are welcome at spikes51 [ at ] umiacs.umd.edu.