June 15, 2009

Mounting Samba shares from Firefox

Ever since I dumped Gnome for pure WMs and heavy file managers for their lighter counterparts, my only remaining problem has been Samba shares. I used to mount them with Nautilus, Gnome's native file manager, but unfortunately this doesn't work with Thunar or PCManFM.

A bit of googling reveals that many people asked the question:

How can I mount samba shares in thunar?
The solution usually boils down to one of these things:

  • mount the shares manually
  • mount the whole network at once
  • run a specialized application

None of these solutions really satisfied my requirements. I went with manual mounting for a while, but that is a tedious task, and mounting the whole network or running a specialized application just seems impractical. Furthermore, I wanted to be able to mount shares just by clicking on smb:// links in Firefox.

In the end, I wrote my own solution. See it in action.



It is a simple bash script which takes the URL and mounts all available shares. The current version of the smbmounter script can be found on GitHub.
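To give you an idea of what the script does, here is a rough sketch of the core logic. This is not the actual smbmounter script, and enumerating the shares with smbclient is just one way to do it:

#!/bin/bash
# rough sketch, not the real smbmounter: take an smb:// URL and mount every disk share
FM=thunar                  # file manager to open when done
MOUNT_ROOT=/media/samba    # assumed to be writable by your user

url="$1"                   # e.g. smb://192.168.1.10/
host="${url#smb://}"
host="${host%%/*}"

# -g prints grepable output in the form Disk|sharename|comment
smbclient -N -g -L "$host" 2>/dev/null | awk -F'|' '$1 == "Disk" { print $2 }' |
while read -r share; do
    dir="$MOUNT_ROOT/$host/$share"
    mkdir -p "$dir"
    sudo mount.cifs "//$host/$share" "$dir" -o guest   # -o guest assumes guest-accessible shares
done

"$FM" "$MOUNT_ROOT/$host" &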

Download the script and place it somewhere. Then edit it and change the variable FM to your preferred file manager (default is thunar) and variable MOUNT_ROOT to the root directory where all shares shall be placed (default is /media/samba).

mount.cifs needs to run as root, so to let the script run unattended (which is important if you want to use it from Firefox) you need to set up passwordless sudo for either mount.cifs or this script. This can be done in /etc/sudoers (edit it with visudo). Here is an example which allows this for all members of the group wheel:

%wheel        ALL=NOPASSWD: /sbin/mount.cifs

For extra paranoia, modify the script to use gksudo instead of sudo.
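Assuming the script calls sudo the way the sketch above does, the change is a one-liner; gksudo takes the whole command as a quoted string and pops up a graphical password prompt:

# before
sudo mount.cifs "//$host/$share" "$dir" -o guest
# after
gksudo "mount.cifs //$host/$share $dir -o guest"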

Setting it as a protocol handler in Firefox is a bit tricky. The following steps apply to Firefox 3.0.*; the Firefox 2 configuration is slightly different, and I never managed to set it up in Firefox 3.5b4 (but hey, it's a beta).

Open about:config and create a new boolean property

network.protocol-handler.external.smb = true

and a new string property

network.protocol-handler.app.smb

Leave this one empty -- for now. Now click an smb:// link; the format should be smb://hostname_or_ip/. A dialog will appear, asking which external application to launch. Select this script and check the remember option. Now go and mount.
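If you prefer editing files over clicking through dialogs, the same two preferences can also be put into user.js in your Firefox profile. I have not verified that pre-filling the string preference skips the dialog, so treat this as a starting point; the script path is just an example:

user_pref("network.protocol-handler.external.smb", true);
user_pref("network.protocol-handler.app.smb", "/home/you/bin/smbmounter");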


January 19, 2009

Enigma ported to Linux

Inspired by the wonderful Enigma desktop by Kaelri, featured at Lifehacker, I decided to port at least some of the beautiful design to my Linux desktop.

The result is this:

Compared to the original theme:

My desktop features:

  • Simple calendar — I even considered adding Google Calendar events using gcalcli, but the desktop was way too crowded
  • RSS news
  • Remember The Milk tasks
  • Weather Forecast
  • and system stats

Almost everything you see is drawn on the desktop using conky. There are, however, a few major drawbacks:

  • With conky you get either full transparency or no transparency at all. It's all or nothing.
  • Conky cannot draw pictures.

So how did I get the transparent sidebars and icons? Gotcha! They are part of the desktop background.
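For the curious, the transparency behaviour comes from conky's own_window settings. The snippet below shows the relevant part of an old-style conky config with purely illustrative values; the real configs in the attached archive are more elaborate:

# transparency-related part of a conky config (illustrative values)
own_window yes
own_window_type override
# pseudo-transparency: conky paints the root window (the wallpaper) as its background,
# which is why it is all or nothing and why the sidebars can live in the background image
own_window_transparent yes
double_buffer yes
use_xft yes
xftfont DejaVu Sans:size=9
update_interval 5
alignment top_right
gap_x 30
gap_y 60

TEXT
${time %A, %d %B %Y}
CPU ${cpu}%
RAM ${memperc}%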

I also decided to omit the analog clock — this is certainly doable with gdesklets or screenlets, but I almost fainted when I saw the dependencies. My desktop stays lightweight, no matter what. I tried the lightweight adesklets but the result was disappointing.

To recreate this desktop you need:

  • "Patched" background — I provide a GIMP layered image, you just need to replace the background layer for the image of your choice
  • Conky and conky configuration files
  • Weather fonts — fonts displaying weather conditions (included)
  • RTM command line tool for the RTM task list
  • I am using tint2 as the panel, but any transparent panel (such as gnome-panel) should do

Installing the RTM command line tool might be tricky; as the author notes, you need to install the RTMAgent Perl module first:

This package is split in two components:
  • RTMAgent.pm: a Perl module that implements the low-level API. It provides a UserAgent object which lets you call all of RTM's API methods as normal Perl methods. You will need to install it from CPAN: cpan install WebService::RTMAgent
  • rtm: a Perl script that uses RTMAgent to implement a command line interface. Just put it in your path.

It seems that there are two flavors of the cpan installation script; mine required the following syntax: cpan -fi WebService::RTMAgent, where -fi stands for force install, as one of the dependencies failed its unit tests. Your cpan script's syntax may vary, so check the corresponding manual first.
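To check that the module actually installed, you can try loading it from the command line before moving on:

perl -MWebService::RTMAgent -e 'print "WebService::RTMAgent is installed\n"'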

After installing the RTMAgent, run rtm --authorise, follow the generated link and allow RTM API access. This should be done only once, before the first run.

Don't forget to change your username and password in scripts/gmail.py and to set your location code in scripts/conkyForecast.template. See the instructions for the weather forecast config.

That should be all. Both the wallpaper and the conky configs are designed for a 1280x800 resolution, so you'll probably have to move things a bit for other resolutions. Unpack the attached archive to ~ (the archive contains hidden files, you have been warned), set the background, set up rtm, gmail and the weather forecast, and run ~/scripts/start_conky.
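In case you are curious what start_conky does before running it: a launcher like this is typically just a few lines of shell. The sketch below is only illustrative (the config file names are made up), not the actual file from the archive:

#!/bin/bash
# illustrative start_conky-style launcher, not the file shipped in the archive
killall conky 2>/dev/null    # stop any running instances
sleep 1
for rc in ~/.conky/calendar.rc ~/.conky/rss.rc ~/.conky/stats.rc; do
    conky -d -c "$rc"        # -d daemonizes, -c picks the config file
done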

DOWNLOAD the background and scripts.

Credits:

I am aware that the desktop still has some rough edges, but I am posting this in the spirit of

Release early, release often.


September 16, 2008

Introduction to Association Rules Mining

Association rules are part of every data miner's arsenal. Haven't heard of them? I am pretty sure you have. Association rules are a substantial part of every e-shop, every supermarket and every tool that aims to analyze data.


Does the picture look familiar? If you've ever bought something at Amazon, you might have noticed that they are kinda obsessed with showing you items related to your order. Where do they get this information? It is not stored statically in the database; instead, it is computed from the overall orders using association rule mining algorithms.

Do you think that the items in your favorite supermarket are organized randomly? No, they are organized in a way that maximizes the chance that the items are bought. Again, this is information that can be easily discovered using association rule mining algorithms.

From the previous paragraphs you might suspect that association rules express relations (associations) between items. More formally, an association rule is an implication of the form A -> B, where the left side, A, is called the premise and represents a condition which must be true for the right side, B (the conclusion), to hold. A rule A -> B can be interpreted as

If A happens, then B happens.
This is a very generic interpretation, because the true interpretation depends on the domain. I am now a supermarket employee and I got the following rule from the mining software:

Bread -> Milk
The rule can be translated as:

Customers who bought bread also bought milk
Now I magically transform into a website traffic analyst and see a rule

/news/obama.html -> /sport/tour-de-france.html
and I instantly know that

those who read news about Barack Obama also read news about le Tour, and not only that: I know that those who are interested in Barack Obama are interested in the Tour de France.
Woosh, a flash of light, and I am now a doctor, looking at the rule

vasculitis -> paraneoplastic syndrome
and I see that there is a serious chance that my vasculitis patients will suffer paraneoplastic syndrome.

The important thing is that association rules helped me discover hidden knowledge (that's why they call it data mining), but the more important thing is that I can act based on that knowledge. I can move the milk closer to the bread to sell more of them together and generate more income. I can recommend stuff to my e-shop visitors, I can treat my vasculitis patients and run some tests to detect paraneoplastic syndrome early and maybe save lives.

So what do you need to get started? You need data, of course, but not just any data: you need data in the form of transactions. These transactions have nothing to do with database transactions. Instead, a transaction is a logical group of somehow related items. You might have groups of market basket items, groups of links clicked during one web page visit, or groups of one patient's diseases. Such groups are then called transactions.

When I said that rule interpretation depends on the domain, it was only half of the truth. The other half is that the interpretation also depends on your transactions. The interpretation simply depends on what you are mining, and what you are mining is determined by how you define your transaction.

I'll now do some simple, manual association rule mining, using the classical market basket analysis example. We define our transaction as the contents of a basket.

Transaction Id   Items
1                bread, milk, butter, cocoa, cheese
2                bread, butter, milk, cheese
3                bread, butter, olives
4                milk, sugar, butter, cheese

We have four baskets, four customers and their data. Looking at the items, we see that transactions 1, 2 and 3 contain bread and butter. We have just found our very first rule.

bread -> butter
There are other rules in our data, for example the rule

milk -> cheese
found in transactions 1, 2 and 4. Although association rule mining may seem like a trivial task at first glance, imagine finding the rules in a dataset of billions of transactions.

The rules presented so far all have one big downside: there is no way to tell which rule is better, and it is impossible to compare them. To get past this limitation, we can attach several measures to each rule which represent its strength. They are commonly known as interestingness measures, because the strength of a rule is equal to its interestingness.

The two classical measures, introduced by R. Agrawal, an association rule pioneer, are called support and confidence. Support represents how often the rule applies: it is the percentage of all transactions in which the items from both sides of the rule were found together, i.e. support(A -> B) = (transactions containing A and B) / (all transactions).

Confidence expresses how reliable the rule is: it is the percentage of transactions containing the left side that also contain the right side, i.e. confidence(A -> B) = (transactions containing A and B) / (transactions containing A).

Id   Transaction                    bread + cheese   bread -> cheese   cheese -> bread
1    bread, cheese, honey, apples         O                 O                 O
2    milk, bread, cheese, pasta           O                 O                 O
3    milk, bread, apples                                    X
4    bread, milk                                            X
5    milk, pasta, cheese                                                      X
6    milk, bread, cheese                  O                 O                 O

Look at the table above; an O means the column's condition holds in that transaction, an X means the rule's premise is present but the conclusion is missing. Bread and cheese can be found in transactions 1, 2 and 6, and we have six transactions in total, so the support of both bread -> cheese and cheese -> bread is 3/6, or 50%. Now take the rule bread -> cheese. Our customers bought bread in transactions 1, 2, 3, 4 and 6, but bought cheese only in transactions 1, 2 and 6. So five customers bought bread, but only three of them also bought cheese, and the confidence of the rule is therefore 3/5, or 60%.
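If you want to double-check these numbers yourself, the whole computation is a couple of grep calls away. The sketch assumes the six transactions above are stored one per line in a file called transactions.txt (the file name is just an example):

total=$(wc -l < transactions.txt)                         # all transactions: 6
bread=$(grep -c 'bread' transactions.txt)                 # transactions with bread: 5
both=$(grep 'bread' transactions.txt | grep -c 'cheese')  # transactions with both: 3

echo "support(bread -> cheese)    = $both/$total"         # 3/6 = 50%
echo "confidence(bread -> cheese) = $both/$bread"         # 3/5 = 60%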



It should be pretty clear from the examples that both interestingness measures are important, because they quantify the rule and express its strength. But not only that: the interestingness measures are the key concept that actually enables the mining.

Association rule mining is formally defined as the process of finding all rules whose support and confidence are greater than user-provided values of minimum support and minimum confidence, further referred to as minsup and minconf. These two values prune the search space and make mining possible.

Take the last example. There is a rule milk -> apples [support 1/6, confidence 1/5], which can be found in only one transaction. Is this rule interesting? It isn't, and yet it is there. This is not a problem if we have six items in six transactions, but it would be a great problem had we thousands of items in billions of transactions. If you specify minsup=0.5 and minconf=0.8, you will effectively filter out all such uninteresting rules. If you specify the values too low, you will end up with tons of rules, because the items will be associated with each other in all possible ways. On the other hand, if you specify the values too high, you might not find a single rule. There is no universal advice as to what values you should set; the best way is to experiment.

What will we be talking about next time? In the next posts I will show some practical examples using the RapidMiner mining tool, explain the algorithm behind the mining, describe the problems this model has and explain why support and confidence are bad measures.


This is the first post in the Association Rule Mining series. Interested? Consider subscribing to my feed to catch up with updates.