Disclaimer:
This post is a detailed record of the full context, my reasoning, and every solution I tried, some of which others may have tried as well. I decided to blog everything that followed. I am currently using two Windows 7 PCs to accomplish my goals. If you are only interested in the Solution, you may go directly to that section at the end. However, I encourage you to read through the entire post, as I have listed all the useful references along the way.
Input: GIS Data
Software and GIS Environment / Tools: QGIS, R, Python, PostgreSQL / PostGIS
Output: Vector / Raster Data
Context:
To make progress on a project for my research goals, I decided to start small and get familiar with geo-programming. So I set a minimal objective: reading and writing a shapefile. After much research, and given the time constraints of learning a new programming language, I found that Python would fit the bill. There is ample content available on the web for geographic programming in Python.
I got Python Geospatial Development - Second Edition 1 and it all appeared very easy from there on. I read through the first 4 chapters and decided to try my hand at the examples. I also have a copy of Learning QGIS 2.0 and the Python Programming Guide from Wiki (I read this simultaneously). But, as you may know, book content doesn't always work as written.
Road Block / Problem I
As prescribed, I installed Python 2.6 on my system, then downloaded and installed GDAL. It all went fine until I tried string and numeric examples from these manuals. The very first example given in Book 1 threw me off track with this error: "'NoneType' object has no attribute 'GetLayerCount'". This error means that the Open call returned None instead of a data source, so there was nothing to call GetLayerCount() on. A few people have already documented this error here on stackexchange. I followed both responses in that thread. I found the first one painstakingly long, with a large error stack to flip through that makes it virtually impossible to debug. The second response, on the other hand, helped me raise an "IOError", which may possibly be due to corrupt data. I tried running the same code with the 2009 through 2013 data. No use: it all resulted in the same error. It is not plausible that all the files are corrupt, especially since I verified them by opening them in QGIS 2.0.
I tried importing the "shapefile" package with "import shapefile" and reworking this example (see Pictures 3 and 4). I initially had trouble importing the package (see Picture 2). No use, and I decided to restart!
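The 'NoneType' error is what Python reports when ogr.Open() quietly returns None and the next line calls a method on that result. Below is a minimal sketch of a guard that fails loudly at the point of opening instead. It assumes the GDAL Python bindings are importable as osgeo; the helper name open_datasource and its opener parameter are hypothetical, added only for illustration.

```python
try:
    from osgeo import ogr  # GDAL/OGR Python bindings
except ImportError:
    ogr = None  # bindings not installed; open_datasource reports this


def open_datasource(path, opener=None):
    """Open a vector file and raise instead of silently returning None.

    `opener` is a hypothetical hook: any callable that takes a path and
    returns a data source or None. It defaults to ogr.Open.
    """
    if opener is None:
        if ogr is None:
            raise ImportError("GDAL/OGR Python bindings are not installed")
        opener = ogr.Open
    datasource = opener(path)
    if datasource is None:
        # ogr.Open returns None when it cannot open the file; without this
        # check, the failure only surfaces later as an AttributeError.
        raise IOError("OGR could not open %r" % path)
    return datasource
```

With a guard like this, a bad path or an unreadable file would show up as an IOError naming the file, rather than as an AttributeError on the following line.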
I tried shapefile = osgeo.ogr.Open() with another shapefile that I had downloaded from OSM (OpenStreetMap) and it worked just fine. However, GetLayerCount() still wouldn't work. I tried again by uninstalling the standalone version of Python to avoid any further complexities, and resorted to the OSGeo4W environment, which could apparently help me install all the required packages easily using the "ez_setup" and "pip" utilities. QGIS 2.0 installs OSGeo4W by default (shown here):
Picture 1: QGIS (Windows Start Menu)
Step 1: Start OSGeo4W
# Note: OSGeo4W brings the user to the command prompt C:\> and stops at that. The content displayed after the prompt in this image is a result of user input.
Step 2: Type 'python' at the prompt
# Note: The prompt displays version details of Python similar to any standalone Python installation or IDE.
Picture 2: OSGeo4W on Windows 7 installed along with QGIS 2.0 (by default)
Although I tried the import repeatedly, it worked on only one occasion. But this did nothing for my initial problem of the "NoneType" object. Yes, I had merely succeeded in installing the shapefile package.
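When switching between a standalone Python and the OSGeo4W shell, it helps to confirm which interpreter is running and which packages it can actually see. The following is a rough sketch using only the standard library. It assumes Python 3.4 or later (importlib.util.find_spec does not exist in the Python 2 interpreters used in this post), and the function name available_packages is hypothetical.

```python
import importlib.util
import sys


def available_packages(names):
    """Map each top-level package name to True if this interpreter can import it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Shows which Python is running (standalone install or the OSGeo4W one).
    print(sys.executable)
    # "osgeo" is the GDAL bindings package; "shapefile" is the module imported above.
    print(available_packages(["osgeo", "shapefile"]))
```

Running this once in each environment would show whether an import failure comes from the package being absent or from using a different interpreter than expected.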
When this did not work either, I decided to read through the GDAL documentation and get to the root of the problem. I looked it up because I doubted whether this method / function is actually available in Python or is limited to C. A few examples on nullege.com did confirm the use of this method in Python. The GDAL API Tutorial only shows it as an option in C and does not provide any notes on this function for Python, or even C++ for that matter.
Learning of the day: GetLayerCount() is not a method of the class Layer, but a method of the class DataSource. Functions of the class DataSource require a handle (hDS), which in turn uses a driver. So the problem may not be with the actual shapefile or its layers, but with the Open method of the OGR object itself, since there is also an Open method associated with the Driver class. Please pardon my non-Python vocabulary in mixing up methods, classes and functions, because I am just a fortnight old at this.
Hence, the solution lies in opening the file with the driver's Open method. I also ended up making a small mind map to get to this, and I will post it later. I looked for examples where shapefiles were opened using this logic. I found one (PCJERKS), and I was more than glad to plug in the code.
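The driver-based approach can be sketched roughly as follows. This is not the exact code from the example I found, only an illustration of the idea. It assumes the GDAL Python bindings are installed; the function name open_with_driver and the ogr_module parameter are hypothetical, the latter added so the logic can be exercised without GDAL.

```python
def open_with_driver(path, driver_name="ESRI Shapefile", ogr_module=None):
    """Open a vector file through an explicitly named OGR driver."""
    if ogr_module is None:
        # Requires the GDAL Python bindings to be installed.
        from osgeo import ogr as ogr_module
    driver = ogr_module.GetDriverByName(driver_name)
    if driver is None:
        raise ValueError("OGR driver not available: %s" % driver_name)
    # Second argument: 0 opens read-only, 1 opens for update.
    datasource = driver.Open(path, 0)
    if datasource is None:
        raise IOError("Driver %r could not open %r" % (driver_name, path))
    return datasource
```

Naming the driver separates two possible failures: the driver being unavailable in this installation, and the driver being unable to read this particular file.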
While the GDAL API Tutorial clearly confirms that an OGRDataSource may have more than one layer, other websites mention that shapefiles usually have only one layer. This could be a separate discussion. And my findings only seemed to validate this. I now have the question of why the author of Python Geospatial Development - Second Edition 1 used a for loop in the first place to sift through the layers.
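One plausible reason for the loop is that it works unchanged for any data source, whether it holds one layer or many. A minimal sketch of such a loop, assuming an already opened OGR data source is passed in (the function name list_layers is hypothetical):

```python
def list_layers(datasource):
    """Return (layer name, feature count) for every layer in a data source."""
    layers = []
    # GetLayerCount() belongs to the data source, not to a layer.
    for index in range(datasource.GetLayerCount()):
        layer = datasource.GetLayer(index)
        layers.append((layer.GetName(), layer.GetFeatureCount()))
    return layers
```

For a single shapefile the returned list would be expected to contain exactly one entry, which is what the loop in the book reduces to.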
Solution:
- Re-installed Python 2.7.5 and GDAL
- Used driver.Open() instead of ogr.Open()
- See Picture 5 for the error while opening the 2009 file
- See Picture 6 for the detailed code and success in opening the 2012 shapefile
Inference:
- The 2009 file is actually corrupted
- Shapefiles mostly have one layer, of point, line, or any other geometry type.
Never imagine that programming instincts are good enough to take you through a project. Every project involves thorough reading and preparation, or you may end up wasting time on trial and error.
Picture 5: Proof that the 2009 shapefile is actually corrupted
Picture 6: The code used to access the TIGER data (shapefile) and proof that this file has one layer only