Bug Fix: Multiple Citations from the Same Source

Today I fixed a bug in the URL class that prevented quote context from being displayed for sources cited more than once in a single post.

The cause was a poor data-structure choice: citations were stored in a single dictionary rather than in a list of dictionaries.
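To illustrate the failure mode, here's a minimal sketch that assumes the old dictionary was keyed by URL. The field names `url` and `quote` are hypothetical, not the project's actual schema. With a dictionary keyed by URL, a second quote from the same source overwrites the first. A list of dictionaries keeps both.

```python
# Hypothetical citation records: two quotes from the same source.
citations = [
    {"url": "https://example.com/article", "quote": "First quoted passage."},
    {"url": "https://example.com/article", "quote": "Second quoted passage."},
]

# Old approach (assumed): one dictionary keyed by URL.
# The second assignment overwrites the first, so its quote context is lost.
by_url = {}
for c in citations:
    by_url[c["url"]] = c["quote"]

# New approach: a list of dictionaries keeps every citation,
# even when several of them share the same URL.
citation_list = [dict(c) for c in citations]

print(len(by_url))         # 1
print(len(citation_list))  # 2
```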

An additional benefit of this change is that duplicate citations are detected, so each source is requested only once instead of being hit with multiple requests.

Here’s how duplicates are detected:

def citations_list_duplicates(self):
  '''Return the cited URLs that appear more than once in this post.'''
  from collections import Counter

  # Count how often each URL is cited.
  duplicate_counter = Counter(self.citation_urls())

  # Keep only the URLs cited more than once.
  return [cited_url for cited_url, count in duplicate_counter.items() if count > 1]
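The same `Counter` technique works outside the class. This standalone sketch takes a plain list of URLs (standing in for `self.citation_urls()`; the function name `find_duplicate_urls` is hypothetical) and returns those that appear more than once, in order of first appearance.

```python
from collections import Counter

def find_duplicate_urls(urls):
    """Return URLs that occur more than once, in order of first appearance."""
    counts = Counter(urls)  # maps each URL to its number of occurrences
    return [url for url, count in counts.items() if count > 1]

urls = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/a",
    "https://example.com/c",
    "https://example.com/b",
]
print(find_duplicate_urls(urls))  # ['https://example.com/a', 'https://example.com/b']
```

Because `Counter` is a `dict` subclass, it preserves insertion order (Python 3.7+), so the result lists duplicates in the order they were first cited.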

Here’s how duplicate URLs are pre-fetched and cached:

def citations(self):
  '''Pre-fetch and cache each URL that is cited in more than one quote.

  This prevents a source from being hit with multiple
  parallel requests for the same resource.
  '''
  for url in self.citations_list_duplicates():
    d = Document(url)  # Document is defined elsewhere in the project
    d.download_resource()  # request the resource and cache the result
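The `Document` class and its `download_resource()` method belong to the project and aren't shown here. As a rough sketch of the same idea, assuming an in-memory dictionary cache and a stand-in `fetch` function (both hypothetical), duplicate URLs are downloaded once up front, and every later lookup for that URL is served from the cache instead of issuing another request.

```python
from collections import Counter

fetch_count = Counter()  # records how many requests each URL received

def fetch(url):
    """Hypothetical stand-in for a real HTTP request."""
    fetch_count[url] += 1
    return f"<html>content of {url}</html>"

cache = {}  # hypothetical in-memory cache: URL -> response body

def get_resource(url):
    """Return the cached body for url, fetching it only on a cache miss."""
    if url not in cache:
        cache[url] = fetch(url)
    return cache[url]

def prefetch_duplicates(urls):
    """Warm the cache for every URL cited more than once."""
    for url, count in Counter(urls).items():
        if count > 1:
            get_resource(url)

cited = ["https://example.com/a", "https://example.com/a", "https://example.com/b"]
prefetch_duplicates(cited)

# Each quote can now look up its source; repeated URLs hit the cache.
pages = [get_resource(url) for url in cited]
print(dict(fetch_count))  # {'https://example.com/a': 1, 'https://example.com/b': 1}
```

Warming the cache before any parallel work starts is what keeps concurrent workers from requesting the same source at once. URLs cited only once never face that race, so they can still be fetched lazily.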

This was Issue #26 in the web-service project's GitHub repository.

  • Here’s a test post that quotes the same article several times.