Here I describe a simple way to extract information on variables, classes and functions from Python source code without actually compiling or interpreting it. Two very useful Python modules for this task are ast and tokenize.

The first goal I had was to get a list of all classes, functions and variables defined in a Python file, including useful information such as function and class length and the hierarchy of variables, functions and classes in the file. As it turns out, using ast this is really simple.

The first step consists in loading the source code and parsing it using ast:

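import ast

# filename is the path of the Python file to analyze
with open(filename,'r') as f:
    content = f.read()  # the with block closes the file automatically
p = ast.parse(content,filename,mode='exec')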
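
To get a feeling for what ast.parse returns, here is a minimal sketch that parses a small, made-up source string (the `source` text and the `top_level` name are assumptions for illustration, not part of the original code) and lists the type and line number of each top-level statement:

```python
import ast

# Hypothetical sample source; any Python text works here.
source = '''
import os

class Greeter:
    def greet(self, name):
        return "Hello " + name

counter = 0
'''

tree = ast.parse(source, filename="<example>", mode="exec")

# The root is an ast.Module whose body holds the top-level statements.
top_level = [(type(node).__name__, node.lineno) for node in tree.body]
print(top_level)
```

Only the top-level statements show up here; nested definitions such as the `greet` method live in the `body` of their enclosing node, which is why a visitor that walks the whole tree is needed.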


The next step is to define a custom NodeVisitor class, which we can use to walk down all branches of the ast tree generated by the ast.parse function. In my case, I wanted to extract information on functions, classes, variables and import statements, so I define the following node visitor:

class AnalysisNodeVisitor(ast.NodeVisitor):

    def __init__(self,rootNode = None):
        self._modules = []
        self._classes = []
        self._functions = []
        self._variables = []
        self._imports = []
        self._rootNode = rootNode
        self._parentNode = rootNode
        self._level = 0

    @property
    def rootNode(self):
        return self._rootNode

    @property
    def imports(self):
        return self._imports

    @property
    def functions(self):
        return self._functions

    @property
    def variables(self):
        return self._variables

    @property
    def classes(self):
        return self._classes

    def visit_Import(self,node):
        # One node per import statement, listing every imported name
        importNode = Node(attributes = {'type':'import','names':[alias.name for alias in node.names]},parent = self._parentNode)
        self._imports.append(importNode)
        ast.NodeVisitor.generic_visit(self, node)

    def visit_ImportFrom(self,node):
        # One node per "from ... import ..." statement
        importNode = Node(attributes = {'line_number':node.lineno,'type':'from_import','module':node.module,'names':[alias.name for alias in node.names]},parent = self._parentNode)
        self._imports.append(importNode)
        ast.NodeVisitor.generic_visit(self, node)

    def visit_Assign(self,node):
        for target in node.targets:
            self._add_target_to_variables(target)
        ast.NodeVisitor.generic_visit(self, node)

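    def visit_AugAssign(self,node):
        # The AST node for "x += 1" is AugAssign, so the method must be named visit_AugAssign
        self._add_target_to_variables(node.target)
        ast.NodeVisitor.generic_visit(self, node)

    def _add_target_to_variables(self,target):
        if hasattr(target,'value'):
            # Attribute and subscript targets (a.b, a[0]): descend to the base name
            self._add_target_to_variables(target.value)
        elif hasattr(target,'id'):
            # self._variables holds Node objects, so compare against their names
            knownNames = [v.attributes['name'] for v in self._variables]
            if target.id not in knownNames and target.id != "self":
                variableNode = Node(attributes = {'type':'variable','name':target.id},parent = self._parentNode)
                self._variables.append(variableNode)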

    def visit_FunctionDef(self,node):
        body = node.body
        functionNode = Node(attributes = {'type':'function','name':node.name,'start_line':body[0].lineno,'end_line':_get_last_line_number(body),'docstring':ast.get_docstring(node)},parent = self._parentNode)
        self._functions.append(functionNode)
        oldParent = self._parentNode
        self._parentNode = functionNode
        ast.NodeVisitor.generic_visit(self, node)
        self._parentNode = oldParent

    def visit_ClassDef(self,node):
        body = node.body
        classNode = Node(attributes = {'type':'class','name':node.name,'start_line':body[0].lineno,'end_line':_get_last_line_number(body),'docstring':ast.get_docstring(node)},parent = self._parentNode)
        self._classes.append(classNode)
        oldParent = self._parentNode
        self._parentNode = classNode
        ast.NodeVisitor.generic_visit(self, node)
        self._parentNode = oldParent
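
The class above relies on ast.NodeVisitor picking a method based on the type name of each node. To see that mechanism in isolation, here is a minimal sketch with a much smaller visitor (`CountingVisitor` and the sample source are hypothetical names made up for this example) that only counts definitions:

```python
import ast

class CountingVisitor(ast.NodeVisitor):
    """Hypothetical visitor that counts function and class definitions."""

    def __init__(self):
        self.counts = {"FunctionDef": 0, "ClassDef": 0}

    def visit_FunctionDef(self, node):
        self.counts["FunctionDef"] += 1
        # Without this call, definitions nested inside this one are skipped.
        self.generic_visit(node)

    def visit_ClassDef(self, node):
        self.counts["ClassDef"] += 1
        self.generic_visit(node)

source = '''
class A:
    def f(self):
        def inner():
            pass

def g():
    pass
'''

visitor = CountingVisitor()
visitor.visit(ast.parse(source))
print(visitor.counts)
```

Note that `async def` functions are a separate node type (`AsyncFunctionDef`), so a visitor that should cover them needs a `visit_AsyncFunctionDef` method as well.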


When a syntax tree is passed to the visitor, visit(node) is called for each node. The default implementation of visit() dispatches to a method named after the type of the node it encounters: for function and class definitions, it calls `visit_FunctionDef` and `visit_ClassDef`, respectively. Node types without a dedicated method are handled by generic_visit(), which simply visits the child nodes; this is also why each of my methods calls generic_visit() at the end. A complete list of all node types can be found in the abstract grammar section of the ast documentation page. So in my class, I just redefine `visit_Import`, `visit_ImportFrom`, `visit_Assign`, `visit_AugAssign`, `visit_FunctionDef` and `visit_ClassDef` to extract the required information on imports, classes, functions and variables, giving me their names, locations and lengths. Calculating the length of a class or function is a bit tricky, since the last statement of its body may itself contain nested blocks, so I wrote a little helper function to get the last line number associated with a class or function body:

def _get_last_line_number(nodes):
    last = nodes[-1]
    children = None
    # Check the blocks that can end a compound statement, latest first.
    # Empty lists (e.g. an "if" without "else") are skipped.
    for field in ('finalbody','orelse','handlers','body'):
        children = getattr(last,field,None)
        if children:
            break
    if children:
        return max(last.lineno,_get_last_line_number(children))
    else:
        return last.lineno
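
As an aside, on Python 3.8 and later the statement nodes produced by ast.parse carry an `end_lineno` attribute, which can replace a hand-written helper in many cases. The following is a minimal sketch under that assumption (the sample source and the `spans` name are made up for illustration); on older interpreters the second value would simply be `None`:

```python
import ast

source = '''
def f(x):
    if x:
        return 1
    else:
        return 2

class C:
    pass
'''

tree = ast.parse(source)

# Map each top-level definition to (first line, last line).
# end_lineno only exists on Python 3.8+, hence the getattr fallback.
spans = {
    node.name: (node.lineno, getattr(node, "end_lineno", None))
    for node in tree.body
    if isinstance(node, (ast.FunctionDef, ast.ClassDef))
}
print(spans)
```

Here `lineno` is the line of the `def` or `class` keyword, whereas the visitor above records the first line of the body as `start_line`, so the two conventions differ slightly.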


Now, a syntax tree can be analyzed by creating an `AnalysisNodeVisitor()` instance and calling its `visit(p)` method with the syntax tree as argument. The Node class that appears in the code is a simple class that stores the information on each node and contains a list of its child nodes in the code hierarchy:

class Node(object):

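    def __init__(self,attributes = None,parent = None):
        # Avoid a shared mutable default: each node gets its own dict
        if attributes is None:
            attributes = {}
        self.__dict__.update({'_children':[],'_parent':None,'_attributes':attributes})
        self.parent = parent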

    def __repr__(self):
        return self.__class__.__name__+"(attributes = "+str(self.attributes)+")"

    @property
    def attributes(self):
        return self._attributes

    @attributes.setter
    def attributes(self,attributes):
        self._attributes = attributes

    @property
    def parent(self):
        return self._parent

    @parent.setter
    def parent(self,parent):
        if self._parent is not None:
            self._parent.removeChild(self)
        self._parent = parent
        if self._parent is not None:
            self._parent.addChild(self)

    @property
    def children(self):
        return self._children

    def removeChild(self,child):
        # children is a property (not a method), and lists have remove(), not indexof()
        if child in self._children:
            self._children.remove(child)

    def addChild(self,child):
        if not child in self._children:
            self._children.append(child)

When the `AnalysisNodeVisitor` class is initialized with a root node, all nodes generated during the analysis are attached to this node, either directly or through their enclosing class or function, allowing us to reconstruct the code structure from the node hierarchy. In addition, all function, variable, class and import nodes are available through the `functions`, `variables`, `classes` and `imports` properties of the visitor.
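
The idea of tracking the current parent while descending can also be sketched in a self-contained way. The example below is a simplified stand-in, not the class from this post: `OutlineVisitor`, its `entries` list and the sample source are hypothetical names, and it records a nesting depth instead of building Node objects:

```python
import ast

class OutlineVisitor(ast.NodeVisitor):
    """Hypothetical, simplified stand-in for AnalysisNodeVisitor."""

    def __init__(self):
        self.entries = []  # (depth, kind, name) tuples in source order
        self._depth = 0

    def _enter(self, kind, node):
        self.entries.append((self._depth, kind, node.name))
        # Everything visited below this node is one level deeper,
        # mirroring the save/restore of the parent node in the full class.
        self._depth += 1
        self.generic_visit(node)
        self._depth -= 1

    def visit_ClassDef(self, node):
        self._enter("class", node)

    def visit_FunctionDef(self, node):
        self._enter("function", node)

source = '''
class Stack:
    def push(self, item):
        pass

def main():
    def helper():
        pass
'''

visitor = OutlineVisitor()
visitor.visit(ast.parse(source))
for depth, kind, name in visitor.entries:
    print("    " * depth + kind + " " + name)
```

Storing real parent and child links, as the Node class does, makes it possible to navigate the hierarchy in both directions afterwards; a plain depth counter is enough only for printing an outline.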