Using Notepad++ macros to process small-scale data

Let's say you want to sort some data you found online, messily embedded in a webpage. Or you simply want to change the format of a CSV file quickly. For this kind of one-off task, I think writing a script takes more time than the method I'm going to describe in this post!

If you're a Vim or Emacs user, you're probably familiar with using macros for this type of stuff. However, if you're one of those people (just like me) who struggle to even exit Vim, then this method might come in handy.

I'm going to show this in action. Let's say we want to sort the first 10 movies in IMDb's top-rated list by their gross income. We're keeping the number small for demonstration purposes.

On the IMDb site, the gross value is listed with a dollar sign at the beginning and the letter M at the end. Our task is to strip both characters and pair the remaining value with the movie name in a clean format.

[Image: IMDb top-rated list]

We copy the text from the web page to Notepad++.

Now, we will record the macro that extracts the movie name and gross value from each entry. For that, we need to spot a pattern that tells us where the movie name and the gross value sit within each entry.

[Image: IMDb top-rated list in plain text]

In this case, I make the following observations:

  • An entry consists of 11 lines.
  • The movie's name is in the 2nd line of the entry, plain.
  • The gross value is in the 11th line of the entry, along with some other data. It's the last word of the line, and its first and last characters ($ and M) need to be removed.
  • Entry #9, Dil Bechara, does not have a gross value.
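
For readers who think in code, the pattern above can be written down as a small Python sketch. This is not what the macro literally runs, only a mirror of its steps: take the 2nd line, take the last word of the 11th line, and drop that word's first and last characters. The helper `fake_entry`, the filler lines, and the sample values are all made up for illustration and are not real IMDb data.

```python
ENTRY_LEN = 11  # lines per entry, as observed above


def extract(lines, entry_len=ENTRY_LEN):
    """Return (name, raw_value) pairs, mimicking the macro step by step."""
    pairs = []
    for start in range(0, len(lines) - entry_len + 1, entry_len):
        entry = lines[start:start + entry_len]
        name = entry[1].strip()            # 2nd line: the movie name
        last_word = entry[-1].split()[-1]  # 11th line: its last word
        pairs.append((name, last_word[1:-1]))  # drop first and last character
    return pairs


def fake_entry(name, last_line):
    """Hypothetical 11-line entry: rank, name, 8 filler lines, stats line."""
    return ["1.", name] + ["(filler)"] * 8 + [last_line]


# Made-up sample: the second entry has no gross value on its last line.
sample = (
    fake_entry("Movie A", "Votes: 1,000 | Gross: $12.50M")
    + fake_entry("Movie B", "Votes: 2,345")
)
print(extract(sample))
```

Note what happens to the entry without a gross value: the blind "last word, minus two characters" rule still returns something, just not anything meaningful.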

With these observations, we are ready to record our macro.

Notice that I am making use of the keys End and Home for navigating through the line (pressing twice for wrapped lines). Also, I am using the Ctrl + Left Arrow shortcut for skipping an entire word.

Now we run our macro 10 times! When the data is large, there's an option to run the macro until the end of the file.

Notice that we removed Entry #9, as it didn't have a gross value (the extracted value is part of the vote count, i.e. garbage). We need to be careful with this kind of thing when working with macros.
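
If you would rather not spot such entries by eye, a quick check can flag them. The sketch below assumes a gross value always looks like a dollar sign, digits with an optional decimal part, and a trailing M; that exact shape is my assumption, and the function name and sample tokens are made up.

```python
import re

# Assumed shape of a gross value: "$", digits, optional decimals, "M".
GROSS_PATTERN = re.compile(r"\$\d+(?:\.\d+)?M")


def looks_like_gross(token):
    """True if the whole token matches the assumed gross-value shape."""
    return GROSS_PATTERN.fullmatch(token) is not None


print(looks_like_gross("$12.50M"))  # a token with the expected shape
print(looks_like_gross("2,345"))    # a vote count, not a gross value
```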

Now, I want to import this beautiful data into Python. I'll turn it into a list of tuples, again using macros!

The rest is now just Python scripting: sort the tuples according to the second element and display them back.
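
A minimal sketch of that last step is below. The tuples are placeholders, not the real IMDb figures, and sorting from highest to lowest gross is my choice; drop `reverse=True` for ascending order.

```python
# Placeholder (name, gross in millions) tuples, not real IMDb data.
movies = [
    ("Movie A", 12.5),
    ("Movie B", 134.9),
    ("Movie C", 57.3),
]

# Sort by the second element of each tuple, highest gross first.
by_gross = sorted(movies, key=lambda movie: movie[1], reverse=True)

for name, gross in by_gross:
    print(f"{name}: ${gross}M")
```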

That's all! It might look like a slow process since I explained it in detail, but I believe it's an extremely quick and handy method once you get used to it.