lkubuntu

A listing of random software, tips, tweaks, hacks, and tutorials I made for Ubuntu

Misc python functions

As relinux 0.4 is my first real project in python, I would like to share some functions I made along the way that I found useful (in the hope that they might be useful to others).

You can do whatever you want with these functions, all I ask is that you give credit. If you really can’t, that’s fine, but if you can, that would be greatly appreciated :)

Generic UTF-8 function (works on python 2 and 3, still a WIP):

import sys

# Check if a string is ASCII or not
def is_ascii(s):
    for c in s:
        if ord(c) >= 128:
            return False
    return True

# Convert a string to UTF-8
def utf8(string):
    if not sys.version_info >= (3, 0):
        if isinstance(string, unicode):
            return string.encode("utf-8")
    if not isinstance(string, str):
        if sys.version_info >= (3, 0) and isinstance(string, bytes):
            string_ = string.decode("utf-8")
            string = string_
        else:
            string_ = str(string)
            string = string_
    if not is_ascii(string):
        if sys.version_info >= (3, 0):
            return string
        return string.decode("utf-8").encode("utf-8")
    return string
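
The character-by-character loop in is_ascii can also be replaced by letting the codec do the work. This is only a sketch, and it assumes Python 3, where str is always text; is_ascii_encode is a made-up name for this example and is not part of relinux.

```python
def is_ascii_encode(text):
    # Assumes text is a Python 3 str; encoding to ASCII fails as soon as
    # a character outside the ASCII range is found
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True

print(is_ascii_encode("test"))
print(is_ascii_encode("t\u00e9st"))
```

On Python 3.7 and later, the built-in str.isascii() method performs the same check.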

I have no idea why python doesn’t have this built-in, but anyways, here is a list flattener (which works on both python 2 and 3). Most of this code is based on http://stackoverflow.com/a/4676482/999400, but I made some minor style changes (as if anyone cares :P)

def flatten(list_):
    nested = True
    while nested:
        iter_ = False
        temp = []
        for element in list_:
            if isinstance(element, list):
                temp.extend(element)
                iter_ = True
            else:
                temp.append(element)
        nested = iter_
        list_ = temp[:]
    return list_
# Example:
# flatten(["test", ["test1", "test2", ["test", "test1"], "test3"], "test4"])   -> ["test", "test1", "test2", "test", "test1", "test3", "test4"]
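
For comparison, the same idea can be written as a recursive generator. This is a rough sketch rather than the code relinux uses; flatten_iter is a hypothetical name, and like the version above it only treats list objects as nested (tuples and other iterables are left alone).

```python
def flatten_iter(list_):
    # Walk the list once, descending into any element that is itself a list
    for element in list_:
        if isinstance(element, list):
            for inner in flatten_iter(element):
                yield inner
        else:
            yield element

nested = ["test", ["test1", "test2", ["test", "test1"], "test3"], "test4"]
print(list(flatten_iter(nested)))
```

Because it recurses once per nesting level, a very deeply nested list could hit python's recursion limit, which the loop-based version avoids.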

I feel that this function is too simple to even post, but for some reason (maybe lack of caffeine?) I couldn’t figure this out when I needed it. As you can see from the code, this function simply removes duplicate values from a list, keeping the first occurrence of each value in its original position.

def remDuplicates(arr):
    returnme = []
    for i in arr:
        if i not in returnme:
            returnme.append(i)
    return returnme
# Example:
# remDuplicates(["test", "test1", "test2", "test", "test2"])   -> ["test", "test1", "test2"]
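
On long lists the membership check above gets slow, because it scans the result list for every item. One possible variation keeps a set next to the result list. This is a sketch that assumes every item is hashable (so no nested lists or dicts), and rem_duplicates_hashable is just an illustrative name.

```python
def rem_duplicates_hashable(arr):
    seen = set()    # items already emitted, for fast membership tests
    result = []     # items in their original order
    for item in arr:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

print(rem_duplicates_hashable(["test", "test1", "test2", "test", "test2"]))
```

Unlike list(set(arr)), this keeps the order in which items first appear.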

While we are on the subject of nearly useless convenience functions, here is a function that generates an MD5 checksum from a filename (it technically just opens it and reads it through md5’s update function)

import hashlib
import os
...
def genMD5(file_, blocksize = 65536):
    if not os.path.isfile(file_):
        return
    m = hashlib.md5()
    # Open in binary mode so the raw bytes are hashed on both python 2 and 3
    with open(file_, "rb") as files:
        buffers = files.read(blocksize)
        while len(buffers) > 0:
            m.update(buffers)
            buffers = files.read(blocksize)
    return m.hexdigest()
# Example:
# genMD5("/path/to/file.txt")

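I find that using sys.hexversion is the easiest way to compare python versions, but it’s a bit harder when it comes to displaying it to the user, so this function turns it into a readable version string.

def parsePyHex(string1):
    # Note: this reads one hex digit per component, so it assumes the
    # major, minor and micro numbers are each below 10
    string = "%x" % string1
    count = 0
    result = ""
    for char in string:
        if count == 0 or count == 2 or count == 4:
            # Major, minor and micro digits, separated by dots
            result += char
            if count != 4:
                result += "."
        elif count == 5:
            # Release level: "f" means final, so nothing more is appended
            if char.lower() == "f":
                break
            else:
                result += char.lower()
        elif count == 6:
            # Serial number of a pre-release (e.g. the 1 in a1)
            result += char
        count += 1
    return result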

# Examples:
# parsePyHex(sys.hexversion)    -> "2.7.3" (under python 2.7.3 final)
# parsePyHex(sys.hexversion)    -> "3.0.0a1" (under python 3 alpha 1)
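
The same string can also be built by reading the fields of sys.hexversion directly with bit shifts, which keeps working when a component has two digits (for example a 3.10 release). This is a sketch and not what relinux uses; format_hexversion is a hypothetical name, and the field layout (8 bits each for major, minor and micro, then 4 bits for the release level and 4 for the serial) is the one documented for sys.hexversion.

```python
import sys

def format_hexversion(hexversion):
    major = (hexversion >> 24) & 0xFF
    minor = (hexversion >> 16) & 0xFF
    micro = (hexversion >> 8) & 0xFF
    level = (hexversion >> 4) & 0xF    # 0xA alpha, 0xB beta, 0xC candidate, 0xF final
    serial = hexversion & 0xF
    result = "%d.%d.%d" % (major, minor, micro)
    if level != 0xF:
        # Pre-releases get the level letter and serial appended, e.g. "a1"
        result += "%x%d" % (level, serial)
    return result

print(format_hexversion(sys.hexversion))
```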

After moving relinux to Qt, I tried to remove all references of Tkinter, but I then realized that some of my functions used Tkinter’s StringVar and IntVar. Since they are quite useful, I decided to make my own version of them that supports any python object.

class EventVar():
    def __init__(self, **kw):
        self.__value__ = None
        self.writenotify = []
        self.readnotify = []
        if "value" in kw:
            self.set(kw["value"])

    def set(self, newvalue):
        self.__value__ = newvalue
        for i in self.writenotify:
            i(newvalue)

    def get(self):
        for i in self.readnotify:
            i()
        return self.__value__

    def trace(self, rw, func):
        if rw.lower() == "r":
            self.readnotify.append(func)
        elif rw.lower() == "w":
            self.writenotify.append(func)
# Example:
# def myFunc(var):
#     print("From myFunc: " + str(var))
# v = EventVar(value=None)
# v.trace("w", myFunc)
# v.set("test")

Hope someone can find use of these!

14 responses to “Misc python functions”

  1. Thomas Kluyver September 30, 2012 at 9:59 am

    What are you trying to achieve with your utf8 function? It will produce str on Python 2 and 3, but Python 3’s str is unicode, so you would still have to handle them differently. It’s usually more useful to work with unicode on Python 2, and treat it the same as Python 3 str.

    • Anonymous Meerkat October 1, 2012 at 6:45 pm

      Yeah, that would be way easier, but this function is for when you need to use str’s.

      • Thomas Kluyver October 1, 2012 at 6:58 pm

        Here you go: https://gist.github.com/3813714

        Note that this, like your original, assumes that any bytestrings are utf-8 encoded. If not, it will throw a UnicodeDecodeError on Python 3, and produce incorrect output on Python 2.

  2. Alex September 30, 2012 at 11:54 am

    remDuplicates: list(set(arr)) :)

    • Anonymous Meerkat October 1, 2012 at 4:54 am

      Yes, but it doesn’t preserve order :(

  3. Pingback: Misc python functions | lkubuntu « towerysdev2

  4. yed podtrzitko (@yedpodtrzitko) September 30, 2012 at 4:59 pm

    Please learn some of python’s basic data structures and modules before you start reinventing the wheel. For example:

    def is_ascii2(s):
        try:
            str(s)
        except UnicodeEncodeError:
            return False
        else:
            return True

    ^ this is 5000x faster than your solution (see gist below)

    def rem_duplicates(arr):
        return set(arr)

    ^ this is 5x faster than your solution (see gist below)

    def version():
        import sys
        v = sys.version_info
        return v.major, v.minor, v.micro

    ^ no hexparsing needed

    For flattening see itertools.chain, I bet it is also faster.

    Some benchmarks mentioned above:

    • yed podtrzitko (@yedpodtrzitko) October 1, 2012 at 10:12 am

      sorry, correct ascii solution:

      def is_ascii2(s):
          try:
              s.decode('ascii')
          except UnicodeDecodeError:
              return False
          else:
              return True

      • Thomas Kluyver October 1, 2012 at 10:18 am

        That version won’t work with a (non-ascii) unicode string – on Python 2 it will throw UnicodeEncodeError, on Python 3 it will throw AttributeError, because str doesn’t have a decode method.

        The version with str(s) will always return true if passed a str.

      • yed podtrzitko (@yedpodtrzitko) October 1, 2012 at 12:29 pm

        @Thomas Kluyver – the point was ‘do not iterate character-by-character’, because it’s terribly SLOW. Even if my solution failed, you should at least agree there’s a better way to solve it (raising an Exception on de/encoding, using a regexp…) than iterating over it.

      • Thomas Kluyver October 1, 2012 at 12:41 pm

        Oh yes, I quite agree about that. But ‘works correctly’ beats ‘fast’. Here’s my attempt at achieving both: https://gist.github.com/3811420

    • Anonymous Meerkat October 2, 2012 at 3:15 am

      For 1, Didn’t know that it was that fast (I actually thought my method was faster). Thanks for letting me know!
      For 2, as I said to alex, it may be faster, but it doesn’t keep order (plus it doesn’t return a list, but I know that’s easily fixable by list(set(arr)))
      For 3, you are totally correct, and yours would be great for version_info parsing, but mine is for hexversion parsing.
      For 4, for some reason, I had trouble with generators… I don’t know why though (I’m still a beginning python programmer :/)…

      Anyways, thanks for commenting and letting me know about the benchmarks!

  5. Victor September 30, 2012 at 5:14 pm

    # Check if a string is ASCII or not
    def is_ascii(s):
        for c in s:
            if ord(c) >= 128:
                return False
        return True

    could be

    is_ascii = lambda s: all(ord(c) < 128 for c in s)

    • Anonymous Meerkat October 2, 2012 at 3:17 am

      Yes, that was the original function that I “cleaned up” (didn’t link to it as I forgot where it was). Reason why I cleaned it is that it’s a bit easier to read (at least IMHO), faster, and takes less memory (as it exits when the first non-ASCII character is encountered, and it doesn’t make a useless list object).
