Apps  Contact  Seminars 

Archive for ‘algorithms’


January 21st, 2012

Politics and Algorithms rarely mix, but when they do, they involve food stamps and bin packing

by Amrinder

LITUS - Presidential Food Stamps

[This is one of rare posts, that will be cross-posted to the Guy Down the Street blog as well.]

Tags: ,


October 10th, 2011

Locating a Distribution Center

by Amrinder

Locating a Distribution Center is perhaps the most commercial application of the general facility location problem in computer science.  Consider a retailer which imports goods for selling in three cities, Indianapolis IN, Columbus OH, and Lexington KY.  Further, suppose that this retailer uses an ocean carrier, and the port of call is Norfolk.  In that case, where should the distribution center be located?  Here is a map view:

 

What is the right location for a DC?



August 19th, 2011

Review of Bloom Filters paper

by Amrinder

My review of the SPAA 11 paper “Understanding Bloom Filter Intersection for Lazy Address-Set Disambiguation” by Mark Jeffrey and Gregory Steffan has been published and is available here  (requires access to Computing Reviews).

Here is the list of other review activities that I am usually involved with.



July 14th, 2011

Compacting an Outlook PST file

by Amrinder

Migrating to a new computer recently, I observed that my Outlook PST file  (sitting in C:\Users\Amrinder\AppData\Local\Microsoft\Outlook folder) was about 4 GB.  While not necessarily a problem, the folder size of Personal Folders itself was only showing up to be about 2.4 GB.  (You can right click on Personal Folders -> Properties for Personal Folders -> Folder Size).

Outlook PST Folder Size

 

So, this begs the question, if Personal Folders is 2.4 GB, why is the PST file 4 GB?  The answer to that lies in the structure of the PST file.  Essentially, PST is a format that can be very easily loaded, very easily saved and very easily searched (gigabytes of information that we  load when we start up Outlook, and is searchable in about 5 seconds).  As the students and practitioners of data structures and algorithms will no doubt note, this requires that data files be indexed on many different columns, and then those index files stored, along with the data files.  Further, whenever index files are created, they contain pointers back to data files themselves, and then when the data files are changed (for example, an email deleted, etc), data file cannot be simply modified without modifying those index files as well.  So, that leads to two choices: every time you delete an email (or any element), you reprocess your index files.  This would simply lead to a very non-responsive program.  Other option is to let some of the changes go pending, and then run this maintenance once in a while, in which you shrink the data files, reclaim the deleted blocks and reindex the files.

Interestingly, you can run a small test for yourself.  First, note the size of Personal Folders and the size of the PST file. Then, send a 10 MB file as an attachment to someone, and note your Folder size and the size of the PST file.  Then, delete the email from Sent Items and check the folder size and the size of the PST file again.  You will likely observe that while the size of the Personal Folders went from x to x+10 to x, the PST file only went from y to y+10+delta, and didn’t really go down to y after you deleted the email.  So, read on.

This aspect of dirty records is not limited to Outlook only.  Relational database management systems (such as Oracle) essentially do the same – they let delete records run without index recomputations, and then recompute the indexes (and shrink the data files) in a periodic fashion.  In Oracle, you can trigger off this “data cleansing/recomputation” manually by using the analyze table queries.  In Outlook too, you can do this by: Personal Folders -> Properties for Personal Folders -> Advanced -> Compact Now.

Outlook PST Compact Now

 

[Now, if you continue your test, you may observe that the PST file does go back after the file deletion and data compaction.]

One thing that I did observe, that even after doing the data compaction (really, there is such a verb), the file did not reach the same size as the Personal Folders was suggesting, but after shutting down Outlook and restarting it, and running data compaction one more time, it got to about 20% of the folder size.



July 10th, 2011

CS 6212 (Summer 2011) – Final Grade Distribution

by Amrinder

Here is the final grade distribution for my CS 6212 (Design and Analysis of Algorithms) class at GWU.

 

 

Tags: ,


Switch to our mobile site