"j" for switching directories - hacking "cd" with python

I've started to use python to create "glue" scripts to automate repetitive tasks. This is a new habit I've tried to get into, and it requires that I be mindful when I'm working to spot problem areas.

I was thinking about tasks that could benefit from a little automation and my bash history was instructive. Below are my 10 most commonly-used commands:

 19 curl
 20 fab
 28 ./runtests.py
 33 pip
 67 pacman
 75 ./manage.py
114 vim
248 cd
349 git
520 ls

If you're curious how yours looks, try the following command:

cat ~/.bash_history | sed "s|sudo ||g" | cut -d " " -f 1 | sort | uniq -c | sort -n
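
Since this post is about python glue scripts, the same tally can be sketched in python with collections.Counter. This is only a rough equivalent of the pipeline above (it skips blank lines, for instance), it is written for Python 3, and the name count_commands is made up for illustration. It assumes the history lives in ~/.bash_history with one command per line.

```python
import os
from collections import Counter


def count_commands(lines):
    """Tally the first word of each line after dropping every "sudo "."""
    counts = Counter()
    for line in lines:
        first = line.replace('sudo ', '').split(' ')[0].strip()
        if first:
            counts[first] += 1
    return counts


if __name__ == '__main__':
    # Assumption: bash keeps its history in ~/.bash_history
    path = os.path.expanduser('~/.bash_history')
    if os.path.exists(path):
        with open(path, errors='replace') as fh:
            counts = count_commands(fh)
        # Ten most common commands, least frequent first
        for command, count in reversed(counts.most_common(10)):
            print('%4d %s' % (count, command))
```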

I decided to see if I could improve my usage of "cd".

Improving the "cd" command

Everyone uses cd a lot, and I'm no exception. Because I use virtualenvs for my python projects, I'm often "cutting" through several layers of crap to get to what I actually want to edit.

The two annoyances I was trying to fix were:

  1. There are directories I use a lot, but making bash aliases for them is not maintainable. I should be able to get to them quickly.
  2. I have to keep a mental map of the directory tree to go from one nested directory to another -- e.g. cd ../../some-other-dir/foo/. Rather than backtracking, it would be nice to "jump".

The solution I came up with stores directories I use (the entire path), and then I can perform a search of that history using a partial path. In the example above, I'd just type j foo instead.

Finding the best match

The biggest challenge with this script was deciding how the search should work. The way I calculate the "best" match for the given input is to iterate through the history and count the number of dangling characters after the match.

So if I am searching for "scr" and iterating through my directory history, I would calculate the following scores:

Here is a little python function that calculates this "cruft":

def cruft(directory, search):
    pos = directory.rfind(search)
    # rfind will return -1 if no match
    if pos >= 0:
        return len(directory) - pos - len(search)
    return float('Inf')

In the example above, since ~/tmp/scrap/ has the least amount of extra junk, the program will select that as the best match.
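
To make the scoring concrete, here is a small sketch that scores a few directories against "scr". Apart from ~/tmp/scrap/, the directory names are made up for illustration, so treat the list as a stand-in for a real history rather than actual data.

```python
def cruft(directory, search):
    """Count the characters left over after the last match of `search`."""
    pos = directory.rfind(search)
    # rfind returns -1 if there is no match
    if pos >= 0:
        return len(directory) - pos - len(search)
    return float('inf')


# Hypothetical history entries, invented for this example
history = [
    '~/envs/scratch/src/project/',
    '~/code/scripts/deploy/',
    '~/tmp/scrap/',
    '~/documents/',
]

for directory in history:
    print('%-30s %s' % (directory, cruft(directory, 'scr')))

# The entry with the smallest score is the best match
best = min(history, key=lambda d: cruft(d, 'scr'))
```

Here ~/tmp/scrap/ scores 3 (the trailing "ap/"), while an entry that does not contain "scr" at all scores infinity and can never win.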

The other neat part is that the directories are stored in order from least-recently-used to most-recently-used. Because the search walks the list from oldest to newest and accepts a tie (note the "<=" below), the most recently used directory wins whenever two entries have the same score. Here is what the search function looks like:

def search_history(history, needle):
    nlen = len(needle)
    min_cruft = float('Inf')
    match = None
    for directory in history:
        pos = directory.rfind(needle)
        if pos >= 0:
            # how much extra cruft?
            extra = len(directory) - pos - nlen
            if extra <= min_cruft:
                min_cruft = extra
                match = directory
    return match or full_filename(needle)
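
The tie-breaking is easy to check with a tiny history. The sketch below is a trimmed copy of the function above that leaves out the full_filename fallback and returns None when nothing matches; the paths are hypothetical.

```python
def search_history(history, needle):
    """Return the entry with the least trailing cruft; later entries win ties."""
    min_cruft = float('inf')
    match = None
    for directory in history:
        pos = directory.rfind(needle)
        if pos >= 0:
            extra = len(directory) - pos - len(needle)
            # "<=" rather than "<": an equal score replaces the earlier match
            if extra <= min_cruft:
                min_cruft = extra
                match = directory
    return match


# Hypothetical history, oldest first; both "app" entries get the same score
history = ['/srv/old/app/', '/home/me/notes/', '/srv/new/app/']
print(search_history(history, 'app'))
```

Both /srv/old/app/ and /srv/new/app/ have one dangling character, so the entry later in the list (the more recently used one) is returned. With "<" instead of "<=", the older entry would win.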

Putting things together

The final trick is some short-circuit logic that will skip the search if the input exactly matches a directory -- this way I can "cd" into directories that are not in the history yet.

The rest of the code is basically plumbing to load and save the history file, and a helper function to generate a full path for a directory. The program itself simply outputs a "cd" statement, so you will need to add a function to your .bashrc to evaluate the output of the program.
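
The history bookkeeping boils down to two small operations: move a directory to the end of the list whenever it is used, and trim the oldest entries when the list gets too long. Here is a minimal in-memory sketch of that idea. The names touch and trim are made up for illustration, and it uses a plain list lookup where the real script keeps an index dictionary.

```python
HIST_SIZE = 1000  # same limit as the script


def touch(history, directory):
    """Move `directory` to the end of the list, adding it if it is new."""
    if directory in history:
        history.remove(directory)
    history.append(directory)
    return history


def trim(history, size=HIST_SIZE):
    """Keep only the newest `size` entries once the list is 40% over."""
    if len(history) > size * 1.4:
        return history[-size:]
    return history
```

Letting the list overshoot by 40% before trimming means the truncation only happens once in a while instead of on every save.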

All together, here it is:

#!/usr/bin/env python

"""
Usage: Instead of using "cd", use "j"

Add this to .bashrc

j () {
    $(jmp $@)
}
"""

import os
import sys

env_homedir = os.environ['HOME']
db_file = os.path.join(env_homedir, '.j.db')
HIST_SIZE = 1000

def full_filename(partial):
    return os.path.abspath(
        os.path.join(os.getcwd(), os.path.expanduser(partial))
    )

def search_history(history, needle):
    nlen = len(needle)
    min_cruft = float('Inf')
    match = None
    for directory in history:
        pos = directory.rfind(needle)
        if pos >= 0:
            # how much extra cruft?
            extra = len(directory) - pos - nlen
            if extra <= min_cruft:
                min_cruft = extra
                match = directory
    return match or full_filename(needle)

def read_history():
    hist_hash = {}
    if os.path.exists(db_file):
        with open(db_file) as db_file_fh:
            history = db_file_fh.read().split('\n')
            for i, item in enumerate(history):
                hist_hash[item] = i
    else:
        history = []
    return history, hist_hash

def write_history(history):
    with open(db_file, 'w') as db_file_fh:
        if len(history) > (HIST_SIZE * 1.4):
            history = history[-HIST_SIZE:]
        db_file_fh.write('\n'.join(history))

def save_match(history, hist_hash, match):
    idx = hist_hash.get(match)
    if idx is not None:
        history.pop(idx)
    history.append(match)
    write_history(history)

if __name__ == '__main__':
    if len(sys.argv) < 2:
        # Handle the case when you just type "j"
        print('cd %s' % env_homedir)
        sys.exit(0)

    history, hist_hash = read_history()
    needle = sys.argv[1]
    match = full_filename(needle)

    if os.path.isdir(match):
        # Handle the case when the input is a valid directory
        save_match(history, hist_hash, match)
    else:
        # Perform a search for a partial match
        match = search_history(history, needle)
        if os.path.isdir(match):
            save_match(history, hist_hash, match)
        else:
            # Nothing usable found; let "cd" report the error itself
            match = needle
    print('cd %s' % match)

Here is how I might use it:

~ $ j envs/charlesleifer/src/peewee/
peewee $ j ~/tmp/scrap/  # cd into ~/tmp/scrap/
scrap $ j pee  # cd back into "~/envs/.../peewee"
peewee $ j  # cd back to "~"

Reading more

I hope you found this post interesting. There are a couple of other projects that provide an even more sophisticated "smart cd". I recommend you check them out if you're interested:

If you have any suggestions or improvements, I'd be interested in hearing them. Feel free to leave a comment!

Comments (7)

Charles | apr 14 2013, at 02:02pm

Johannes -- Thank you for the excellent suggestions. Bash completion would be awesome and is something I will probably look into implementing. I also like the partial search...will have to think about how to implement that.

MrQuincle -- Thanks for your comment...I'm not too sure how I could improve on trusty old "ls", although I use it almost compulsively. You could always try setting PROMPT_COMMAND="ls" in your .bashrc, that way you'll never be without it.

MrQuincle | apr 13 2013, at 06:19pm

I am looking forward to your treatment of 'ls'. It's at the top of my list, which makes me think that I should actually display at least a few lines of the directory listing by default, for example on the right side of my console. It's just a reflex of course, so probably I would type something else mindlessly if I displayed this info already. :-)

No Cat | apr 13 2013, at 04:23pm

If you like the left-to-right information flow, you can do

<~/.bash_history sed "s|sudo ||g" | cut -d " " -f 1 | sort | uniq -c | sort -n

because redirections can appear at any position of a simple-command, not just at the end.

Johannes | apr 13 2013, at 03:56pm

Awesome! I will try to use this and see how it works. For the moment I'm using aliases.

Some ideas:

  1. Put your name/website in the comments, so a year from now, I know where to send my fix.
  2. Put it on github or bitbucket, so I can send a pull request if I have a fix.
  3. Add bash completion. Don't know how hard this is, but I'm sure it's non-trivial.
  4. Maybe do partial search so e.g. "memu" would match "~/media/music" (not sure what this matching is called). Not sure if it'll be helpful or confusing :)

Domingo Galdos | apr 13 2013, at 02:22pm

Andrew - the cat is not useless. It's stylistic :-)

In all seriousness, I often use cat in the way the author did, simply because I like the style of, within code, displaying the "flow of information" from left to right.

If we actually cared about the slight performance penalty of spawning off a new cat process, we'd be coding this stuff in C or some such language instead of bash!

sagotsky | apr 13 2013, at 11:29am

I made a similar thing in bash. Mine is called fd - find directory. It doesn't sort results, but it does have the advantage of tab completing paths on partial matches.

https://github.com/sagotsky/fd

Andrew | apr 13 2013, at 09:49am

Incidentally, this:

cat ~/.bash_history | sed "s|sudo ||g" | cut -d " " -f 1 | sort | uniq -c | sort -n

would earn a "useless cat" award [1]. Utilities like sed, awk, and grep take the file name as the final argument:

sed "s|sudo ||g" ~/.bash_history | cut -d " " -f 1 | sort | uniq -c | sort -n

[1] http://partmaps.org/era/unix/award.html

