Posts Tagged ‘Django’

pyCOBOL is now available on Sourceforge

July 6, 2010

I have started releasing the COBOL data conversion tools to Sourceforge. Cobol2dms.py is not included in the package yet, but I plan to include it within the next few weeks. The project is located here.

Troubleshooting Django SQL & wildcards in QuerySet filters

July 1, 2010

This post uses the MySQL Sample database. The following items are just to setup a quick test environment for demonstration purposes.

Imports the database:
c:\> mysql -u root -p < c:\work\sampledatabase.sql
Set the DATABASE_NAME in Django settings.py
c:\> notepad settings.py DATABASE_NAME = 'classicmodels'
Automatically builds all the Django models from the classicmodels database
c:\> python manage.py inspectdb >> models.py
Loads the Python shell with the Django environment variables
c:\> python manage.py dbshell

Troubleshooting database queries in Django

connection.queries contains a list of the SQL statements run by Django, including auto-generated SQL from Django filters, etc.

>>> from myproject.myapp.models import *
>>> from django.db import connection
>>> from pprint import pprint
>>> pprint(list(Products.objects.values_list('productname').filter(productname__icontains='Harley')))
[(u'1969 Harley Davidson Ultimate Chopper',), 
(u'2003 Harley-Davidson Eagle Drag Bike',), 
(u'1936 Harley Davidson El Knucklehead',)]
>>> print connection.queries[-1]['sql']
SELECT `products`.`productName` FROM `products` WHERE `products`.`productName` LIKE %Harley% LIMIT 21

Issues with wildcards in Django filters

The Django docs state that it handles the wildcards for you, so you don’t need to worry about them. What it should say is that you don’t need to worry about them if you’re not allowing wildcards in your filters. The % and _ are automatically escaped by Django and will match the literal % and _ characters, they will not be treated like standard SQL wildcard characters. I haven’t found a way to un-escape them.

>>> Products.objects.values_list('productname').filter(productname__icontains='H%y')
[(u'1969 H%y',)]
>>> print connection.queries[-1]['sql']
SELECT `products`.`productName` FROM `products` WHERE `products`.`productName` LIKE %H\%y% LIMIT 21
>>>
c:\> python manage.py dbshell
mysql> use classicmodels;
mysql> SELECT `products`.`productName` FROM `products` WHERE `products`.`productName` LIKE '%h\%y%' LIMIT 21;
+-------------+
| productName | 
+-------------+ 
| 1969 H%y    |
+-------------+ 
1 row in set (0.00 sec) 

mysql> SELECT `products`.`productName` FROM `products` WHERE `products`.`productName` LIKE '%H%y%' LIMIT 21; 
+---------------------------------------+
| productName                           |
+---------------------------------------+
| 1969 Harley Davidson Ultimate Chopper |
| 2003 Harley-Davidson Eagle Drag Bike  |
| 1957 Chevy Pickup                     |
| 1998 Chrysler Plymouth Prowler        |
| 1936 Harley Davidson El Knucklehead   |
| 18th Century Vintage Horse Carriage   |
| 1948 Porsche Type 356 Roadster        |
| 1970 Chevy Chevelle SS 454            |
| 1966 Shelby Cobra 427 S/C             |
| 1928 British Royal Navy Airplane      |
| 18th century schooner                 |
| 1958 Chevy Corvette Limited Edition   |
| 2002 Chevy Corvette                   |
| 1992 Porsche Cayenne Turbo Silver     |
| 1936 Chrysler Airflow                 |
| 2002 Yamaha YZR M1                    |
| The Mayflower                         |
| HMS Bounty                            |
| The Queen Mary                        |
| 1969 H%y                              |
+---------------------------------------+
20 rows in set (0.00 sec)

mysql> SELECT `products`.`productName` FROM `products` WHERE `products`.`productName` LIKE '%Harl_y%' LIMIT 21; 
+---------------------------------------+
| productName                           |
+---------------------------------------+
| 1969 Harley Davidson Ultimate Chopper |
| 2003 Harley-Davidson Eagle Drag Bike  |
| 1936 Harley Davidson El Knucklehead   |
+---------------------------------------+
3 rows in set (0.00 sec)

>>> Products.objects.values_list('productname').filter(productname__icontains='Harl_y')
[]

If you want to allow % or _ wildcards in a search you’ll need to build a work-around using QuerySet methods like filter(regex__fieldname), extra(where=’???’), or raw(‘SQL statement’). Here is a sample of allowing % as a wildcard to search a description field using a QuerySet regex filter method. By default I try to use the contains filter option first since it appears to be faster than regex:

import re
from myproject.myapp.models import *

MIN_NUM_CHARS = 3
LEGAL_SEARCH_CHARS_RE = re.compile(r'[\w%, ]')

def search_desc(search_str):
	model, kwds = MyModel(), {}
	if len(search_str) >= MIN_NUM_CHARS:
		if '%' in search_str:
			search = '%{0}%'.format(search_str.strip('%'))
			search = ''.join([ i.replace('%', '.*') for i in search 
				if LEGAL_SEARCH_CHARS_RE.match(i) ])
			kwds.update({'desc__iregex': search})
		else:
			kwds.update({'desc__icontains': search})
		return model.objects.filter(**kwds)

A better method is to use the Django search filter

For example:

Entry.objects.filter(desc__search="+Django -jazz Python")

Django Memory Error – How-to work with large databases

June 26, 2010

If you’re working with resource intensive operations on large databases in Django, chances are you’ll encounter a ”Memory Error”.  I’ve seen this error numerous times over the last couple of weeks as part of a large data conversion project and wanted to share some the tips I learned.  Number one, DON’T PANIC!!!   It’s not Django’s fault, Django’s just doing some caching for you to speed things up.   This is a good thing, seriously…  you want Django optimized for performance by default since most resource intense operations are one-off tasks, like data imports, bulk updates, special ad-hoc reports, etc.  In these types of tasks all we need to do is slow Django down a bit and ease up on the memory, here’s how… 

  1. Use MyModel.objects.all().iterator() instead of MyModel.objects.all()
  2. If you don’t need all of the fields in a query use values_list
  3. Use the  Memory efficient Django Queryset Iterator, written by Thierry Schellenbach
  4. set DEBUG = False in your settings.py file.   If DEBUG is True, then Django saves a copy of every SQL statement it has executed. See Django FAQ: Why is Django leaking memory? (spoiler: Django’s not really leaking memory).

Ranked in the order of highest performance, but also in the order of most memory intensive. 

  1. MyModel.objects.all() – fastest, for standard queries
  2. MyModel.objects.all().iterator() – medium, for large queries
  3. queryset_iterator(MyModel.objects.all()) – slowest, for monster queries

One size doesn’t fit all, use the best method for your particular task.  By the way, the Django Snippet: “Memory Efficient Django Queryset Iterator” is solid. I’ve been using it a lot over the past couple of weeks and been validating the results along the way – works like a champ :-)

Windows quick & easy Django + MySql installation recipe

May 28, 2010

The following instructions are written for Windows Django development installations.  Have fun…

(more…)


Follow

Get every new post delivered to your Inbox.