Saturday, March 29, 2014

Useful Links for Python programming

Python programming resources: I will try and keep this list updated.

Tuesday, March 25, 2014

GIS Project in Python from OSGeo4W - Part 2

Here are the working codes of the examples provided in Chapter 2 of of Python Geospatial Development Second Edition. I have tried to stick to the text book version of the programs and to the links provided in my previous post

Example 1 and Example 2

import os
from osgeo import ogr

daShapefile = r"tl_2012_us_state.shp"

driver = ogr.GetDriverByName('ESRI Shapefile')
dataSource = driver.Open(daShapefile, 0) # 0 means read-only. 1 means writeable.

# Check to see if shapefile is found.
if dataSource is None:
    print 'Could not open %s' % (daShapefile)
    print 'The driver is %s' % driver
else:
    print 'Opened %s and the driver used is %s' % ((daShapefile), driver)
    layer = dataSource.GetLayer()
    numLayers = dataSource.GetLayerCount()
    featureCount = layer.GetFeatureCount()
    print "Number of Layers in %s: %d and had %d features" % (os.path.basename(daShapefile),numLayers, featureCount)

    for featureNum in range(featureCount) :
        feature = layer.GetFeature(featureNum)
        featureName = feature.GetField("NAME")
        geometry = feature.GetGeometryRef()
        geometryName = geometry.GetGeometryName()
       
        print "Feature %d . %s with a %s in geometry data and has the following attributes: " % (featureNum, featureName, geometryName)

        attributes = feature.items()

        for key, value in attributes.items():
            print " %s = %s \t" %(key, value)

           
Example 3:
import osgeo
import os
from osgeo import ogr

daShapefile = r"tl_2012_us_state.shp"
driver = ogr.GetDriverByName('ESRI Shapefile')
shapefile = driver.Open(daShapefile, 0)

def analyzeGeometry(geometry, indent=0) :
    s = []
    s.append(" " * indent)
    s.append(geometry.GetGeometryName())
    if geometry.GetPointCount() > 0 :
        s.append(" with %d data points " %geometry.GetPointCount())
    if geometry.GetGeometryCount() > 0 :
        s.append(" containing:")

    print "".join(s)

    for i in range(geometry.GetGeometryCount()):
           
        analyzeGeometry(geometry.GetGeometryRef(i), indent+1)

layer = shapefile.GetLayer(0)
feature = layer.GetFeature(55)
geometry = feature.GetGeometryRef()

analyzeGeometry(geometry)


Example 4:
import osgeo
import os
from osgeo import ogr

daShapefile = r"tl_2012_us_state.shp" # this line assumes that the program and data are in the same directory

# In case the python program and shapefile are in two different directories  then :
# Please provide the full directory path of the shapefile to be analysed

driver = ogr.GetDriverByName('ESRI Shapefile')
shapefile = driver.Open(daShapefile, 0)

def findPoints(geometry, results) :
    for i in range(geometry.GetPointCount()):
        x,y,z = geometry.GetPoint(i)
        if results['north'] == None or results['north'] [1] < y:
            results['north'] = (x,y)
        if results['south'] == None or results['south'] [1] > y:
            results['south'] = (x,y)
    for i in range(geometry.GetGeometryCount()):
        findPoints(geometry.GetGeometryRef(i), results)

layer = shapefile.GetLayer(0)
feature = layer.GetFeature(55)
geometry = feature.GetGeometryRef()

results = { 'north' : None,
            'south' : None}

findPoints(geometry, results)

print "Northern most point is (%0.4f, %0.4f)" % results['north']
print "Southern most point is (%0.4f, %0.4f)" % results['south']


Example 5 and 6:
import osgeo
import os
from osgeo import ogr
import math
from decimal import *

daShapefile = r"tl_2012_us_state.shp" # this line assumes that the program and data are in the same directory

# In case the python program and shapefile are in two different directories  then :
# Please provide the full directory path of the shapefile to be analysed

driver = ogr.GetDriverByName('ESRI Shapefile')
shapefile = driver.Open(daShapefile, 0)

def findPoints(geometry, results) :
    for i in range(geometry.GetPointCount()):
        x,y,z = geometry.GetPoint(i)
        if results['north'] == None or results['north'] [1] < y:
            results['north'] = (x,y)
        if results['south'] == None or results['south'] [1] > y:
            results['south'] = (x,y)
    for i in range(geometry.GetGeometryCount()):
        findPoints(geometry.GetGeometryRef(i), results)

layer = shapefile.GetLayer(0)
feature = layer.GetFeature(55)
geometry = feature.GetGeometryRef()

results = { 'north' : None,
            'south' : None}

findPoints(geometry, results)

print "Northern most point is (%0.4f, %0.4f)" % results['north']
print "Southern most point is (%0.4f, %0.4f)" % results['south']

for i in range(len(results)):
    print "%d %0.4f" %(i, results['north'] [i])

# Text book exercise: Calculate the mid point and distance between the north and south extreme points
# print "%s" % type(results) # check the data type of results

latlong = results.values()
latlongrad = [[0.0] *2 for i in range(2)] # initialising and creating a 2 dimensional array of floats

for i in range(2):
    for j in range(2):
        latlongrad[i][j] = math.radians(latlong[i][j])

dLat = latlongrad[0][1]-latlongrad[1][1]
dLong = latlongrad[0][0]-latlongrad[1][0]

# haversine formula:

a = math.sin(dLat/2)**2 + math.cos(latlongrad[0][1]) * math.cos(latlongrad[1][1]) * math.sin(dLong/2)**2

c = 2 * math.atan2(math.sqrt(a), math.sqrt(1-a))

distance = 6371 * c

print " Great Circle distance between northern and southern most points is %0.0f kilometers" % distance

 

Sunday, March 23, 2014

GIS Project in Python from OSGeo4W - Part 1

Disclaimer: 

This post is an elaborate documentation with the entire context, reasoning and explanation of all the possible solutions that others may have tried as well. I decided to blog everything that followed. I am currently using two Windows 7 PC's to accomplish my goals. If you are only interested in the Solution, then you may directly go this section in the end. However, I encourage you to read through this entire post, as I have listed all useful references, in the process.

Input: GIS Data;  

Software and GIS Environment / Tools: QGIS, R, Python, PostgreSQL / PostGIS; 

Output: Vector / Raster Data.

Context: 

Trying to complete a project to meet my research goals, I decided to start small,and get familiar with Geo-programming. So I set a minimal objective of reading and writing a shapefile. After much research and the time constraints in learning a new programming language, I found that Python would fit the bill. There is ample content available on the web for geographic programming in Python. 
 
I got Python Geospatial Development - Second Edition 1 and it all appeared very easy from there on. I read through the first 4 chapters and decided to try my hand at working on those examples. I also have a copy of Learning QGIS 2.0 and the Python Programming Guide from Wiki (I read this simultaneously). But you only know that bookish content doesn't always work. 

Road Block / Problem I

As prescribed, I installed Python 2.6 on my system and downloaded GDAL and installed it too. It all went fine until I tried string and numeric examples from these manuals. The very first example given in Book 1 set me off track by throwing this error: " 'NoneType' object has no attribute GetLayerCount()". A few of them have already documented this error here on stackexchange. I followed both the responses of this thread. I found the first one to be painstakingly long with a large error stack to flip through that makes it virtually impossible to debug. The second response of this thread, on the other hand helped me throw an "IOError", which may possibly be due to corrupt data. I tried running the same code with 2009 through 2013 data. No use, and it all resulted in the same error. It is not logically possible for all the files to be corrupt, especially after I verified by opening them in QGIS 2.0.

I tried to "import shapefile" package and rework this example (See Picture 3 and 4). I initially had troubles importing the package. See Picture 2 No use, and I decided to restart! 

Picture 3
Picture 4
I tried shapefile = osgeo.ogr.Open() with another shapefile that I had downloaded from OSM (Open Street Maps) and it just worked fine. However, the GetLayerCount() wouldn't work yet.I tried again by uninstalling the standalone version of Python to avoid any further complexities, and resorted to the OSGeo4w environment which could apparently help me install all the packages required, easily using the "ez_setup" and "pip" utilities. QGIS 2.0 installs OSGeo4W by default (shown here:)

Picture 1: 
QGIS (Windows Start Menu)

 


Step 1: Start OSGeo4W      

# Note: OSGeo4W brings the user to the command prompt C:\> and stops at that. The content displayed after the prompt in this image is a result of user input.

Step 2: Type 'python' at the prompt    
# Note: The prompt displays version details of Python similar to any standalone Python installation or IDE. 
Picture 2: OSGeo4W on Windows 7 installed along with QGIS 2.0 (by default) 



While I tried the import repeatedly, it just worked fine at one instance. But this was of no use to my initial problem of "Object None Type". Yes, I was merely successful in installing the shapefile package.  


When this did not work either, I decided to read through GDAL documentation and get to the root of this problem. I looked up the GDAL documentation for this, because I doubted if this method / function is actually an option in python or is limited to C. But a few examples on nullege.com did confirm the use of this method. The GDAL API Tutorial only shows this as an option in C and does not provide any notes on this function for Python or even C++ for that matter.  

Learning of the day: GetLayerCount() is not a method of Class Layer, but is a method of the Class DataSource. Functions of the Class DataSource, require a handle (hDS) which in turn use a driver. So, there may not be a problem with the actual shapefile or the Layers, but with the Open method of the OGR Object itself. As there is an open method associated with the driver class also. Please Pardon my non-Python Vocabulary in mixing up methods, classes and functions, because I am just a fortnight old.

Hence, the solution lies in opening with the driver method. I also landed up making a small mind map to get to this, and I will post this one later. I looked up for examples where shapefiles were opened using this logic. Found one (PCJERKS), and I was more than glad to plug in the code.

While GDAL API Tutorial clearly confirms that the OGRDataSource may have more than one layers, other websites mentions that shapefile usually have one layer only. This one could be a separate discussion. And my findings only seemed validated. I now have the question of why the Author  of Python Geospatial Development - Second Edition 1 used a for loop in the first place to swift through the layers.
 
Solution:
  • Re-installed Python 2.7.5 and GDAL
  • Used the driver.Open() instead of ogr.open()
  • See Pictures 5 to look at the error while opening the 2009 file
  • See Picture 6 for the detailed code and success in opening the 2012 shapefile
Inference
  • The 2009 file is actually corrupted 
  • Shapefile mostly have one layer of point or line type or any geometric data type / feature.
Never imagine that programming instincts are good enough to take you through a project. Every project involves thorough reading and preparation, or you  may end up wasting time on trial and error methods.
Picture 5: Proof that 2009 shapefile is actually corrupted


Picture 6: The code used to access the TIGER data (shapefile) and proof that this file has one layer only






Wednesday, September 4, 2013

Managing Research Standards with Apps and Data availability

Research Management

Just as I steer through Research, a 3 step process necessarily,
1. Searching
2. Reading
3. Writing

I understand the challenges faced by fellow scholars at the very first step. The quality of research depends on the quality of information accesible to students and the analytics performed on existing research to decide their topic / area of research. Whitepapers, Projects, Lectures, Conferences, Seminars, Journals, Bibliographies, State-of-the-art presentations, etc., are part of the process and only enhance the knowledge, only vital requirement for quality research.

Should we not have access to all research papers, in a database, that connects topics and contents?

Researchers could validate the output, and testify the correctness of such automatically generated bibliography as part of their course work. In this way, Research Management is tackled well, as a social responsibility, avoiding horrrendous waste of effort in duplicate research, and irrelavant reading, that is occuring all across the globe.

I found Microsoft Academic Search (MAS) to be the best vehicle so far. Google Scholar is definitely competing, but not close enough to MAS. I just hope, I will be able to use the MAS API to import the long list of a 1900+ papers on to my desktop in an excel file, to be able to do a state of the art on Temporal Databases. It will be the best way for me to understand the area thoroughly, to test my hypothesis to be in -line with current research and industry requirements, or it to be obsolete / duplicate / tested and tried already! 

If I were asked to build an intelligent cralwer for research, I'd only say, I am far from it. Any researcher will quickly reckon with the need for the following -
  1. Crawler / Bot - To create a list of papers, relevant publications from web-sites and search engines
  2. PDF Management Software - Extract text from PDF files (abstract, Title, Author, citations, bibliography, etc., even if not part of metadata)
  3. Document Management / Content Management Software (if the files included, are all not pdfs)
  4. Database / DBMS - My SQL / XML / Excel or even a .dat file will suffice.
  5. Reporting / Analytical Software - R / SAS / Excel ? - To present all possible information
  6. Other Nice to have features - Mind Mapping software, BibTex Generator, PDF to word converter, Software for Image extraction from PDFs (they're usually part of a PDF Management software), Reference and Citation Manager, Notes Management software will be of great help.
  7. Last but not the least, a giant Database like the MAS, with all the pre-work done regularly :) .. I know that is asking for too much, at the same time, it would also render research too easy and invaluable to the namesakers, if it weren't tasking us and demand that we read, which is a process only to generate high class research.
Books to help - Reading Statistics and Research by Schuyler Huck taht was available at our college library - the best resource you can find.
  
Monday, June 11, 2012
 

Sunday, August 25, 2013

Resources for academic writing

The process of research involves perusing scores of publications, careful observation and presentation of results. While every phase is important, presenting research results and review writing could be a challenge, especially if you are new to academic writing. While I am sifting through this phase myself, I decided to document the process and efforts. Initially, I was unable to fix the key words as "academic writing", and searched for "Vocabulary for Research scholars", "words used in technical writing", "abstract nouns and verbs used in research publications", and finally, "words used in academic writing" !!

I could not pay complete attention to details, as I was primarily looking for a list of words and phrases. Nevertheless, I found the content interesting and leafed through complete articles, first few pages of many books. I also read through university resources that cover this topic, very carefully, and found the following resources, worth a second glance.

Reading Statistics and Research by Schuyler W. Huck is a book that I stumbled upon at my college library, during the initial few days of my course work. Ever since, this book has become an integral part of my research. I also have a copy of writing research papers that I found at a second hand book shop.

A list of resources that I found to be useful for academic writing and vocabulary are listed below. 
  • Web based resources
  1. Cornell University, offers valuable resources to students and research scholars across the globe.
  2. A Glossary of Research Terms can also be useful. 
  3. Academic Phrasebank by Manchester University has a practical solution. It offers a list of phrases categorized by the sections in which their use is appropriate (Example: There are phrases for describing methods, reporting results, etc.,)
  4. UNC College of Arts and science has published a collection of Handouts, of which, Word Choice is an essential read. 
  5. Unilearning offers a list of colloquial vocabulary and their formal alternatives 
  6. Linking Words And Phrases 
  • E-Books 
    • English for Writing Research Papers by Adrian Wallwork
  • Books
  1.  Science Research Writing for Non-native Speakers of English by Hilary Glasman-Deal
  2. Technical Writing By Alcantara R.D. et., al. is worth a read for the exercises included in it


    These are only a few listings and are a result of time bound resource identification, as I cannot afford too many hours to this activity. I took a screen shot of this post from a friend. 


    Happy Researching !!