
Python Basics: Using sets to compare keymaps

If you’ve ever visited JetBrains at a conference, you know that we always have handouts with keymaps for our products. You might also know that we have different keymaps for Windows/Linux and for Mac because of the different keyboard layouts. People sometimes grab the wrong one and find out too late that they can’t use it, so we’ve decided to unify the keymaps into a single keymap.

Unfortunately, a single keymap has less space than two separate keymaps, so we will need to select a subset of hotkeys that we want to keep. WebStorm recently did the same thing, so let’s compare our keymap with theirs. It’d be helpful to find out which keys they selected, and more importantly, which keys they left out. To do this, let’s write a quick script.

If you’d like to follow along with the script, you can always have a look at the code on GitHub.

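Python has built-in set operations, so we can let these do the hard work later. To use them, however, we first have to parse our data. Our source materials are two CSV files that look like this (the first excerpt uses the ' + ' separator of the PyCharm file, the second the '-' separator of the WebStorm file):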

Running,,,
Alt + Shift + F10,Select configuration and run,,
Alt + Shift + F9,Select configuration and debug,,
,,,,
Refactoring,,,,
ctrl-alt-shift-T,Refactor this,,,
shift-F6,Rename,,,

So let’s define an object that can represent a hotkey:

class Hotkey:
    def __init__(self, ctrl=False, alt=False, shift=False, key='', action='', section=''):
        self.ctrl = ctrl
        self.alt = alt
        self.shift = shift
        self.key = key
        self.action = action
        self.section = section

Using sets

In this script, we’re going to use sets to compare the keymaps. Sets are meant for collections of unique objects where the order of the objects doesn’t matter. Hotkeys are a great example of these: they need to be unique (can’t bind two actions to a single key combination) and the order doesn’t matter (it doesn’t matter if we define Ctrl+Shift+A first, or Ctrl+Alt+O).
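As a quick illustration of those two properties, here is a minimal sketch that uses plain strings as stand-ins for hotkeys (the strings are only placeholders, not how the script stores hotkeys):

```python
# Keystrokes written as plain strings, purely for illustration.
first = {"Ctrl+Shift+A", "Ctrl+Alt+O"}
second = {"Ctrl+Alt+O", "Ctrl+Shift+A"}

# The order of definition does not matter: both sets compare equal.
same_contents = first == second

# Adding a duplicate leaves the set unchanged.
first.add("Ctrl+Shift+A")
size_after_duplicate = len(first)

print(same_contents, size_after_duplicate)
```

Both sets compare equal, and the duplicate insertion does not grow the set.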

However, as a set should only contain unique objects, a set will check whether you already have an object that matches the current object upon insertion. We’d like to have hotkeys matched on the keystroke rather than the description (which may differ slightly), so we need to define the appropriate overrides (use Ctrl+O to see them all) on our Hotkey class:

  • __eq__ which returns whether or not objects are equal
  • __ne__ which returns whether or not objects are not equal (Python 3 derives this from __eq__ automatically, so it is only strictly required on Python 2)
  • __hash__ which should return a number that represents the object. This number must be the same for objects that are equal to each other, and should ideally be different for objects that are not equal to each other

For our class this will look like:

    def __eq__(self, other):
        if not isinstance(other, Hotkey):
            return False

        return self.ctrl == other.ctrl and \
               self.alt == other.alt and \
               self.shift == other.shift and \
               self.key == other.key

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        return hash((self.ctrl, self.alt, self.shift, self.key))

Our hash function here uses the recommended approach from the Python documentation: return a hash of a tuple of the objects that contribute to the equality check.
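To see what these overrides buy us, here is a small sketch with a condensed copy of the class (the section field is left out, and the two action texts are invented for the example). Two hotkeys with the same keystroke but slightly different descriptions collapse into a single set entry:

```python
class Hotkey:
    """Condensed copy of the article's class, for illustration only."""

    def __init__(self, ctrl=False, alt=False, shift=False, key='', action=''):
        self.ctrl = ctrl
        self.alt = alt
        self.shift = shift
        self.key = key
        self.action = action

    def __eq__(self, other):
        if not isinstance(other, Hotkey):
            return False
        return (self.ctrl, self.alt, self.shift, self.key) == \
               (other.ctrl, other.alt, other.shift, other.key)

    def __hash__(self):
        # Hash exactly the fields that take part in the equality check
        return hash((self.ctrl, self.alt, self.shift, self.key))


# Same keystroke, different (made-up) descriptions
rename_a = Hotkey(shift=True, key='f6', action='Rename')
rename_b = Hotkey(shift=True, key='f6', action='Rename symbol')

hotkeys = set()
hotkeys.add(rename_a)
hotkeys.add(rename_b)  # considered a duplicate, so the set is unchanged

print(len(hotkeys))
```

Note that the set keeps the object that was inserted first; the later, equal object is simply ignored.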

Parsing, first pass

Let’s start with a first, naive approach to parsing the files. We will expand this code later to deal with edge cases as they arise. To begin, let’s only parse the PyCharm file:

import csv

if __name__ == '__main__':
    # Define the PyCharm and WebStorm sets
    pycharm_hotkeys = set()
    webstorm_hotkeys = set()

    ## Parse the keymaps

    # The keymaps both start with an 'editing' section
    section = 'Editing'
    with open('pycharm-current.csv', 'r') as csvfile:
        reader = csv.reader(csvfile)
        for item in reader:
            # If we're at a section header, only the first element of the array will have contents
            # So let's assume that if the second element is empty, it's a section header
            if len(item[1]) == 0:
                section = item[0]
                continue

            # Define the key to add
            to_add = Hotkey()

            # Split the hotkey's elements
            keys = [key.lower() for key in item[0].split(" + ")]

            # Modifiers
            if 'ctrl' in keys:
                to_add.ctrl = True
            if 'alt' in keys:
                to_add.alt = True
            if 'shift' in keys:
                to_add.shift = True

            # Now the only key that remains is the last key (hopefully)
            to_add.key = keys[-1]

            to_add.action = item[1]

            pycharm_hotkeys.add(to_add)

Exploring Results with the Debugger

Let’s run the code, and then have a look at the objects we created using the PyCharm debugger. Let’s first put a breakpoint at the last line of the main block (pycharm_hotkeys.add(to_add)). Then, if we right-click anywhere in our code, we can select ‘Debug’:

[Screenshot: debug code]

And now in the debugger, we see this:

[Screenshot: debugger]

Wouldn’t it be nice to see something clearer than <__main__.Hotkey object at 0x0000015A02D377B8> here? Well, thankfully that is quite easy: we just have to add an override for the __repr__ method of our object.

    def __repr__(self):
        key_parts = []
        if self.ctrl:
            key_parts.append('Ctrl')
        if self.alt:
            key_parts.append('Alt')
        if self.shift:
            key_parts.append('Shift')
        key_parts.append(self.key.capitalize())
        return "<{}> {}".format(" + ".join(key_parts), self.action)

Now when we start to debug our code again, we get a much better result:

[Screenshot: debugger 2]

And we can immediately spot an issue: the key is incomplete. It is supposed to be "<Ctrl + Space> Basic code completion (the name of any class, method or variable)". What’s happening?

This CSV file was saved using Excel, and we’re seeing an encoding issue. If you look at the ‘item’ line in the debugger, you can see several strange characters before ‘Ctrl’. And if we look at the csvfile line, we can see that it’s attempting to decode the file as cp1252, even though the file is UTF-8 with a BOM. After a quick look in the Python docs, we find that we need to pass encoding='utf-8-sig' to the open function.
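The effect can be reproduced with a small, self-contained sketch. It writes a one-line CSV with a BOM to a temporary file (the file and its contents are made up for the example) and reads it back twice. To keep the result the same on every platform, the first read uses plain 'utf-8' rather than cp1252; the BOM then shows up as a single '\ufeff' character instead of several strange ones, but the cause is the same.

```python
import csv
import os
import tempfile

# Write a tiny CSV with a leading BOM, like a UTF-8 file exported from Excel
with tempfile.NamedTemporaryFile('wb', suffix='.csv', delete=False) as tmp:
    tmp.write('Ctrl + Space,Basic code completion\r\n'.encode('utf-8-sig'))
    path = tmp.name

try:
    # Plain UTF-8 keeps the BOM as part of the first cell
    with open(path, 'r', encoding='utf-8', newline='') as csvfile:
        plain_first_cell = next(csv.reader(csvfile))[0]

    # utf-8-sig strips the BOM while decoding
    with open(path, 'r', encoding='utf-8-sig', newline='') as csvfile:
        sig_first_cell = next(csv.reader(csvfile))[0]
finally:
    os.remove(path)

print(repr(plain_first_cell))
print(repr(sig_first_cell))
```

Only the second read yields a first cell that starts with 'Ctrl', which is what the modifier check in the parser relies on.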

Handling special cases

At this point when we look at the keymap, and the results we’re getting in the debugger, we can see that there are a couple of hotkeys which are going to be hard to parse:

Ctrl +X, Shift + Delete - Cut current line or selected block to clipboard

Alt + F7 / Ctrl + F7 - Find usages / Find usages in file

These are either alternative hotkeys for the same action (indicated with the comma) or two related hotkeys on a single line (with the slash). As there are many of these on the WebStorm keymap, we’ll need to deal with them. Let’s first refactor our Hotkey parse code into a function. To help PyCharm, let’s first extract all of our code in the main clause into a main function. To do this, select everything in the if __name__ == '__main__' clause, and use Ctrl + Alt + M to extract the method.

Then, let’s select the lines from # Define the key to add until to_add.action = item[1], and use Ctrl + Alt + M again. Let’s name this function ‘parse_hotkey’. Let’s also manually rework the arguments to be a little more readable: keystroke and action instead of just a list item:

def main():
    # Define the PyCharm and WebStorm sets
    pycharm_hotkeys = set()
    webstorm_hotkeys = set()
    ## Parse the keymaps
    # The keymaps both start with an 'editing' section
    section = 'Editing'
    with open('pycharm-current.csv', 'r', encoding='utf-8-sig') as csvfile:
        reader = csv.reader(csvfile)
        for item in reader:
            # If we're at a section header, only the first element of the array will have contents
            # So let's assume that if the second element is empty, it's a section header
            if len(item[1]) == 0:
                section = item[0]
                continue

            to_add = parse_hotkey(item[0], item[1])

            pycharm_hotkeys.add(to_add)

def parse_hotkey(keystroke, action):
    # Define the key to add
    to_add = Hotkey()
    # Split the hotkey's elements
    keys = [key.lower() for key in keystroke.split(" + ")]
    # Modifiers
    if 'ctrl' in keys:
        to_add.ctrl = True
    if 'alt' in keys:
        to_add.alt = True
    if 'shift' in keys:
        to_add.shift = True

    # Now the only key that remains is the last key (hopefully)
    to_add.key = keys[-1]
    to_add.action = action
    return to_add

if __name__ == '__main__':
    main()

Now we can change our parse_hotkey function to return a list of hotkeys rather than a single hotkey (return [to_add]), and change pycharm_hotkeys.add(to_add) to pycharm_hotkeys.update(to_add) so that every hotkey in the returned list is added to the set.
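The switch from add to update matters because the two methods treat a list differently. A minimal sketch, using strings as stand-ins for the Hotkey objects that parse_hotkey would return:

```python
hotkeys = set()

# Stand-in for the list that parse_hotkey now returns
parsed = ['ctrl+x', 'shift+delete']

# update() adds each element of the iterable individually
hotkeys.update(parsed)

# add() would try to insert the list itself, which fails
# because lists are unhashable
try:
    hotkeys.add(parsed)
    add_error = None
except TypeError as exc:
    add_error = exc

print(sorted(hotkeys))
print(type(add_error).__name__)
```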

At this point, we can add code to handle the special cases in the parse_hotkey function. For the sake of brevity (hah), I’ll omit the details of handling special cases, and of refactoring the CSV reading code to easily read both files. You can see the full code on GitHub if you’d like.
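For readers who want a starting point, here is one possible way to split the two special cases shown earlier. This is only a sketch under stated assumptions, not the code from the GitHub repository: split_entry is a hypothetical helper, and it assumes that alternatives are separated by ', ' and related hotkeys by ' / ', exactly as in the two example lines.

```python
def split_entry(keystroke, action):
    """Hypothetical helper: expand one CSV row into (keystroke, action) pairs."""
    if ' / ' in keystroke:
        # Two related hotkeys on one line, each with its own action
        strokes = [part.strip() for part in keystroke.split(' / ')]
        actions = [part.strip() for part in action.split(' / ')]
        # Fall back to the full action text if the counts do not line up
        if len(actions) != len(strokes):
            actions = [action] * len(strokes)
        return list(zip(strokes, actions))

    # Alternative keystrokes that all trigger the same action
    return [(part.strip(), action) for part in keystroke.split(', ')]


print(split_entry('Ctrl +X, Shift + Delete',
                  'Cut current line or selected block to clipboard'))
print(split_entry('Alt + F7 / Ctrl + F7',
                  'Find usages / Find usages in file'))
```

Each resulting pair could then be passed to parse_hotkey. Splitting on the separator with its surrounding spaces is a deliberate choice here, so that a hotkey whose final key is itself a comma or a slash is not cut in two; a real keymap may well need more rules than this.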

Set Operations

At this point, we have two sets: pycharm_hotkeys, and webstorm_hotkeys. Now we can use several methods to analyze these sets:

  • intersection: which elements are in both sets? In this case: which hotkeys are both on the PyCharm, and on the WebStorm keymap?
  • difference: which elements are in set A, but not in set B (one way around)
  • symmetric_difference: which elements are in set A, but not in set B; but also which elements are in set B, but not in set A (both ways)
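The three operations can be tried out on two small sets. The keystrokes below are invented for the example and are not taken from the real keymaps:

```python
# Illustrative data only
pycharm = {'ctrl+space', 'shift+f6', 'alt+enter'}
webstorm = {'ctrl+space', 'shift+f6', 'alt+f11'}

both = pycharm.intersection(webstorm)
only_pycharm = pycharm.difference(webstorm)
only_webstorm = webstorm.difference(pycharm)
either_not_both = pycharm.symmetric_difference(webstorm)

print(sorted(both))
print(sorted(only_pycharm))
print(sorted(only_webstorm))
print(sorted(either_not_both))
```

The symmetric difference is the union of the two one-way differences, so it tells you that a hotkey is missing somewhere, but not on which keymap.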

In our case, I’m interested in the differences both ways, but I also care about which way around each one is. symmetric_difference would lump both directions together, so I call difference twice instead. This leaves my main function looking like:

def main():
    # Read the hotkeys from the CSV files
    pycharm_hotkeys = parse_csv('pycharm-current.csv', ' + ')
    webstorm_hotkeys = parse_csv('webstorm-current.csv', '-')

    print('PyCharm: {} keys found'.format(len(pycharm_hotkeys)))
    print('WebStorm: {} keys found'.format(len(webstorm_hotkeys)))

    print('WebStorm, but not PyCharm')
    webstorm = webstorm_hotkeys.difference(pycharm_hotkeys)
    for key in webstorm:
        print("{}: {}".format(key.section, key))

    print('PyCharm, but not WebStorm')
    difference = pycharm_hotkeys.difference(webstorm_hotkeys)
    for key in difference:
        print("{}: {}".format(key.section, key))

When we run this code, we get the results we’re interested in:

PyCharm: 132 keys found
WebStorm: 81 keys found
WebStorm, but not PyCharm
Running and debugging: <Alt + F11> Run Gulp/Grunt/npm task
General: <Shift + Shift> Search everywhere
Editing: <Alt + Shift + Up> Move line up/down

...

If you run the code you may notice that the keys appear in a different order every time, which is a result of sets being unordered.
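If a stable order is preferable, for example to diff the output of two runs, the set can be sorted just before printing. A minimal sketch, using (section, keystroke) tuples as stand-ins for Hotkey objects:

```python
# Stand-ins for Hotkey objects: (section, keystroke) tuples
difference = {
    ('General', 'Shift + Shift'),
    ('Editing', 'Alt + Shift + Up'),
    ('Running and debugging', 'Alt + F11'),
}

# sorted() returns a list, so the order is the same on every run
ordered = sorted(difference)

for section, keystroke in ordered:
    print("{}: <{}>".format(section, keystroke))
```

With the real Hotkey objects, which define no ordering of their own, sorted would need a key function, for instance one that returns the section together with the repr of the hotkey.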

To learn more about sets in Python, check out the Python set documentation.
