PubDB Beta Release v0.2
by Daniel Ramsbrock, spikes51 at umiacs.umd.edu

Updated: September 1, 2005

NOTE: File uploads are now fully operational, up to 7 MB. If you need to upload larger files, please let Daniel know.

Overview

PubDB is a web-based application designed to help LAMP and CLIP researchers manage their publications in one central place. It works best with existing BibTeX files and provides an automatic import option for them. However, it also lets you enter individual records by hand, so you can add your publications even if you do not store them in BibTeX format yourself.

NOTE: Keep in mind that this is a database of publications by members of LAMP and CLIP, so please do not add publications written by others, even ones that you cite often.

There are two main parts to the system: the public search interface, and the password-protected database access area.

Public Search Interface

The public search interface is located at
http://lampsrv01.umiacs.umd.edu/pubs/search.php. On the main screen, you will see some fairly self-explanatory options to build queries. Choose the field you would like to search from the menu on the left, type the text you are looking for (a word or name, or part of one) in the box to the right, and choose the appropriate AND/OR option on the far left if you are searching multiple fields. Note that you can change the field being searched on every row, which allows you to do things like multi-author searches. You can also choose on which field and in which order you would like your results sorted (by year in descending order is the default).

All searches are substring matches, so in most cases you should use a one-word query (or part of a word or name) unless you are looking for a specific phrase. The only exception is when you want to specify the first name or initial of an author, in which case you should type 'j doe' or 'john doe', and the search engine will automatically separate the tokens and search for firstname = j (or john) and lastname = doe. Finally, all searches are case-insensitive, so it does not matter whether you capitalize any part of your search arguments.
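The matching rules above can be sketched in a few lines of Python. This is only an illustration of the described behavior, not PubDB's actual code: the function names are hypothetical, and treating the first-name token as a substring (rather than a prefix) is an assumption based on the statement that all searches are substring matches.

```python
def parse_author_query(query):
    """Split an author query into (first, last) search terms, lowercased.

    One token is treated as a last-name fragment; two or more tokens are
    treated as a first name (or initial) followed by a last name.
    """
    tokens = query.strip().lower().split()
    if not tokens:
        return None, None
    if len(tokens) == 1:
        return None, tokens[0]
    return tokens[0], tokens[-1]


def author_matches(first_name, last_name, query):
    """Case-insensitive substring match of a query against one author."""
    first, last = parse_author_query(query)
    if last is None or last not in last_name.lower():
        return False
    # Assumption: the first-name token is also matched as a substring.
    return first is None or first in first_name.lower()
```

With these definitions, author_matches("John", "Doermann", "j doe") is true because 'doe' occurs in 'doermann' and 'j' occurs in 'john', regardless of capitalization.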

The search page also has some less common options. If you would like to save your results, you can do so in four formats: HTML, HTML without links to abstracts and full-text pages, BibTeX, and plain text. Just select the format you would like and then press 'Download.' Once the page loads, you can go to File -> Save As… to save the results to your hard drive. Be sure to select 'Text file' as the type if you are saving a BibTeX or plain text file.

Another feature is the ability to save your query for later. This is especially useful if you have your own website and would like to have a constantly updated list of, for example, your own publications there. When you have the query you would like to save, just click on the 'Go' button next to 'Save this query permanently,' and you will get four unique links to your query (one for each of the four formats). Download
http://lampsrv01.umiacs.umd.edu/pubs/integration.zip to see some samples of how these results can be integrated into your own website. Note that this option saves the actual query, not the results, so if any publications matching that query are modified or added, these changes will automatically be included in the results. You can also use the “restore” link provided right after you save the query to load it back into the search form, at which point you can modify it and save it again if desired.

Private Database Interface

To log in, go to
http://lampsrv01.umiacs.umd.edu/pubs/admin/. You should have received the login and password in the same e-mail where you got the link to this page. If you do not have the password, contact Dave or Daniel.

To add individual records, use the 'New record' link on the main search page. The first thing you must do is decide what kind of entry you will be making (bibtype). Any time you change this, any data you have already entered for that record will be erased. Once you have picked the type, you can enter information into the other fields. Bold fields are required and the rest are optional. For the 'journal' field, there is a pre-existing list of the common journals that LAMPers tend to publish in, so check the dropdown menu first. If you don't find your journal there, select 'Other...' and type the full name of the journal (no abbreviations, please) into the text box below.

For those of you not familiar with BibTeX, the 'bibkey' field is a short string, usually consisting of the author's last name, an important word from the title, and the year of publication. You can auto-generate this once you have entered those fields, or you can choose your own. IMPORTANT: Do not use the auto-generate feature when posting tech reports. All TRs must have a bibkey in the form of LAMP_###, where ### is the three-digit LAMP-TR number. The 'lampcat' field is used to indicate into which category of publications the entry should be placed. Most of them will be 'Media' (unless it's a tech report, in which case it will be 'TR'). 'Lang' is currently not used, but feel free to put your publications into this category if you feel they belong there.
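If you prefer to build bibkeys yourself before entering records, the convention above can be sketched as follows. This is a rough illustration and not the auto-generate feature itself: the stop-word list, the rule of taking the first title word that is not a stop word, and the example title are all assumptions made for this sketch.

```python
import re

# Hypothetical list of words considered too common to be "important".
STOPWORDS = {"a", "an", "the", "of", "on", "in", "for", "and", "to", "with"}


def make_bibkey(last_name, title, year):
    """Build a key such as DoermannOCR2004 from last name, title, and year."""
    words = re.findall(r"[A-Za-z0-9]+", title)
    # Assumption: the first non-stop-word stands in for the "important word".
    keyword = next((w for w in words if w.lower() not in STOPWORDS), "")
    return f"{last_name}{keyword}{year}"


def make_tr_bibkey(tr_number):
    """Build a tech report key of the form LAMP_###, zero-padded to 3 digits."""
    return f"LAMP_{tr_number:03d}"
```

For example, make_bibkey("Doermann", "OCR of Degraded Documents", 2004) gives 'DoermannOCR2004', and make_tr_bibkey(42) gives 'LAMP_042'.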

Authors: the interface for adding authors is powerful, but it needs some explanation. By default, you can add up to six authors when creating a new publication, which should be enough in most cases. If you need to add more, you can do so after you have created the record. The number to the left is the “rank” of the author, which determines the order in which authors will be listed whenever the record is printed out. These numbers do not have to be consecutive (e.g. 1, 10, 20 will produce the same order as 1, 2, 3). The second field holds the name of the author as it will appear in the publication.
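To see why gaps in the rank numbers do not matter, here is a tiny sketch (the function name and the (rank, name) pair layout are hypothetical, chosen only for illustration):

```python
def ordered_authors(entries):
    """Return author names sorted by rank; entries are (rank, name) pairs."""
    return [name for rank, name in sorted(entries, key=lambda entry: entry[0])]
```

Only the relative order of the ranks is used, so [(1, "A"), (10, "B"), (20, "C")] and [(1, "A"), (2, "B"), (3, "C")] produce the same author list.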

The third field (drop-down menu) has a list of all the available authors, listed alphabetically by last name. Each author has a primary name (unindented) and zero or more aliases (indented). Here is what you will need to do if:


Attaching files: if you have any files (abstracts, full text, related presentations, etc.) that you would like to attach to your record, you can do so as follows. For PostScript, PDF, HTML, and PowerPoint files, there are separate fields available (psfile, pdffile, htmlfile, pptfile). For any other file type, you can use the 'otherfile' field. Use the 'Browse…' button next to each field to select the file you would like to upload. It does not matter what the file is named, as long as the extension is the expected one (e.g. .pdf for PDF files). With 'otherfile,' the extension does not matter.

Batch uploading: If you already have a BibTeX file listing all of your publications, PubDB can ingest this file automatically. Go to http://lampsrv01.umiacs.umd.edu/pubs/uploadbib.php, or click on 'Upload BibTex file' on the main search screen. Before you submit your file, please make sure that it is free of syntax errors, as the parser is a little stricter than the standard LaTeX parser. Authors must be listed as first name, middle initial, last name, and must be separated by ' and ' (notice the surrounding spaces). Once again, if any author's last name contains a space, please use the # separator as described above. The 'Category' option is used to indicate which 'lampcat' code each new record will be assigned.

There are two ways to attach files to these records, both involving ZIP files. First, you can create a ZIP file where each file is named with the bibkey of the record to which it belongs (e.g. DoermannOCR2004.ps and DoermannOCR2004.pdf would both be attached to a record with the bibkey 'DoermannOCR2004'). This works for PS, PDF, HTML, and PPT files, but not for the 'otherfile' field. Alternatively, you can make extra entries for each record in the BibTeX file. For example:

psfile = { Full Text.ps }
htmlfile = { My Web Page.html }
otherfile = { Full Text.doc }

This will cause the program to look for the specified files inside the ZIP file and attach them to that record. Note that the file should be in standard ZIP format (not gzip or RAR, for example) and should contain no subdirectories. Also note that the maximum allowed size for each uploaded file is 16 MB (16,777,216 bytes), so make sure both your BibTeX and your ZIP file are below that limit.
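If you have many files to rename, a short script can build a ZIP file for the first method (files named after their bibkey). The following is a sketch under stated assumptions, not a tool shipped with PubDB: the function name and input layout are hypothetical, the size limit is the 16 MB figure quoted in this paragraph, and lowercasing the extension is an assumption about what the server expects.

```python
import os
import zipfile

MAX_UPLOAD_BYTES = 16 * 1024 * 1024  # 16,777,216 bytes, as quoted in the text


def build_flat_zip(zip_path, files_by_bibkey):
    """Write a standard ZIP with no subdirectories.

    files_by_bibkey maps a bibkey to a list of local file paths; each file
    is stored as <bibkey><extension>, e.g. DoermannOCR2004.pdf.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for bibkey, paths in files_by_bibkey.items():
            for path in paths:
                # Assumption: extensions are normalized to lowercase.
                extension = os.path.splitext(path)[1].lower()
                archive.write(path, arcname=f"{bibkey}{extension}")
    if os.path.getsize(zip_path) > MAX_UPLOAD_BYTES:
        raise ValueError("ZIP file exceeds the upload limit")
    return zip_path
```

Because each file is stored under an explicit arcname, the original file names and local folder structure do not end up in the archive.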

Once you submit, the program will first check the ZIP file for integrity. If it is corrupt, it will let you know and you will have to re-submit it. Next, it will parse your BibTeX file and check it for any syntax errors. Once again, if there are any problems, it will give you a detailed error report (letting you know on which line the error occurred). Next, it will automatically standardize the 'month' field to January, February, etc. (that is, if it contains a number or abbreviation, it will be converted to the proper long name).
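The month standardization step can be pictured roughly like this. It is a sketch of the described behavior only: the function name is hypothetical, and the rules for which abbreviations are accepted (three or more letters, optional trailing period) are assumptions.

```python
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def normalize_month(value):
    """Convert a month number or abbreviation to its long English name.

    Values that cannot be recognized are returned unchanged.
    """
    text = value.strip().rstrip(".").lower()
    if text.isdecimal():
        number = int(text)
        return MONTHS[number - 1] if 1 <= number <= 12 else value
    # Assumption: an abbreviation needs at least three letters.
    if len(text) >= 3:
        for name in MONTHS:
            if name.lower().startswith(text):
                return name
    return value
```

For example, '3', 'Mar', and 'mar.' all become 'March', while an unrecognized value such as '13' is left as it is.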

Finally, it will start adding the records to the database, checking first for duplicate bibkeys (and skipping those records unless you checked the 'overwrite' option) and then for possible duplicate records (with different bibkeys). It picks random words out of the title, booktitle (if available), and journal (if available), giving a higher similarity ranking to records where two or more of those words match. It also checks the list of authors (taking into account aliases) and the year of publication (plus or minus 2 years, due to publishing delays). It will give you a detailed report at the end, listing each record that could be a duplicate in bold, followed by one or more existing records that look like duplicates. By default, it will keep both the new record and the old one, but you will want to go through them individually and decide what to do about each. Note that you cannot choose 'delete' for the new record and then 'replace' for one of the old ones, since there would be nothing with which to replace it.
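To give a feel for the kind of check described above, here is a simplified sketch. It is not the actual PubDB algorithm: the record layout (dictionaries with 'title', 'booktitle', 'journal', 'year', and 'authors'), the sample size, and the scoring are assumptions, and author aliases are assumed to have been resolved to primary names beforehand.

```python
import random

TEXT_FIELDS = ("title", "booktitle", "journal")
STRONG_MATCH = 2  # two or more matching words rank a candidate higher


def field_words(record):
    """Collect the lowercase words from the title, booktitle, and journal."""
    words = set()
    for field in TEXT_FIELDS:
        words.update(record.get(field, "").lower().split())
    return words


def similarity(new, old, sample_size=4, rng=None):
    """Return the number of sampled words shared with an existing record.

    Returns 0 if the years differ by more than 2 or no author is shared.
    """
    rng = rng or random.Random()
    if abs(new["year"] - old["year"]) > 2:
        return 0
    if not set(new["authors"]) & set(old["authors"]):
        return 0
    candidates = sorted(field_words(new))
    sample = rng.sample(candidates, min(sample_size, len(candidates)))
    old_words = field_words(old)
    return sum(1 for word in sample if word in old_words)
```

A caller could then flag any existing record whose score is at least STRONG_MATCH as a likely duplicate. Because the words are sampled at random, borderline cases may score differently from run to run, which is one reason to review the report by hand.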

Please note that this is still a beta version, and the duplicate detection algorithm might not be completely bug-free yet. Any bug reports and suggestions for improvement are welcome at spikes51 [ at ] umiacs.umd.edu.