Tag: django

Machine learning empowered blackkiwi

A blackkiwi which tells your mood.

Blackkiwi is a django powered website which uses machine learning to tell if you wrote happy or sad things on your latest Facebook status. And it tends to be wayyy positive. But we will get there…

The genesis of blackkiwi

Two main things came together in the genesis of blackkiwi.

The first was a curiosity about natural language processing and classification techniques. In particular, I wanted to write some classifiers to see how well they performed, and I also wanted to try and test something new.

But I needed some kind of application. This is usually a good trick in programming in general. If you build towards something, it is always easier to stay motivated and actually get it done, instead of giving up the hobby and ending up playing World of Tanks on the PlayStation :).

The second ingredient was to test the release process via GitLab, using automatic pushes via CI to a server. For the stack I wanted a classic dokku setup, which I’m very happy with because it brings the nice and easy heroku/gondor style of deployment to your own server.

Last but not least, I wanted to test natural language processing because I wanted to do something about my Facebook feed. Lately, maybe due to all the political happenings like Brexit, Trump and the migration crisis, I saw an increase in posts from people being extremely racist, hate-filled and extremely violent. This came together with total nonsense and antiscientific claims.

The classic way would be to try to have a conversation and explain that these positions are unacceptable and also dangerous for the whole community, but this usually ends up in a fight with the trolls and, TBH, I don’t think it is a winnable fight.

However, I thought it could be a good idea to get something going where you can take your Facebook statuses and see where you basically land. Are your statements close to those of, for example, a racist individual, or are you closer to intelligent and inspiring characters?

Of course this is quite complicated to build, but I decided that I had to start somewhere, so I settled on an application able to tell whether you were happy or sad on Facebook.

Blackkiwi: how does it work?

Conceptually there are three main parts:

  1. the kiwi goes to Facebook to get the user’s statuses after being authorized (it is a good kiwi)
  2. the kiwi works very hard to try to understand if you were happy or not, and it writes it down
  3. the kiwi then draws these moods on a plot, to show your mood as a time series.

It’s a pretty clever and hardworking kiwi, our own. I’m not sure what its name should be. Feel free to propose one in the comments, if you like.

The computation stack: the classifiers

Two problems needed to be solved here:

  1. we needed a way to connect to Facebook and get the statuses out in some form, so we could feed them to the classifiers
  2. we had to build, train and then load the classifiers

The first part of the job was quite a new adventure. I had never used the Facebook Graph API or created an app on that platform before, so there was a little bit of learning. After several experiments I settled on facebook-sdk, a nice piece of software which does most of the job.

For example, our collector class looks like this:

# -*- coding: utf-8 -*-
import argparse
import logging

import facebook
import requests

# create logger
logger = logging.getLogger(__name__)


class FBCollector(object):
    def __init__(self, access_token, user):
        self.graph = facebook.GraphAPI(access_token)
        self.profile = self.graph.get_object(user)
        logger.debug("Collector initialized")

    def collect_all_messages(self, required_length=50):
        """Collect the data from Facebook

        Returns a list of dictionaries.
        Each item is of the form:

        ```
        {'message': '<message text here>',
         'created_time': '2016-11-12T22:59:25+0000',
         'id': '10153812625140426_10153855125395426'}
        ```

        The `id` is a facebook `id` and it is always the same.

        :return: collected_data, a list of dictionaries with keys: `message`, `created_time` and `id`
        """
        logger.debug("Message collection start.")
        collected_data = []
        request = self.graph.get_connections(self.profile['id'], 'posts')

        # Keep paging until we have at least `required_length` posts.
        while len(collected_data) < required_length:
            try:
                data = request['data']
                collected_data.extend(data)
                # going next page
                logger.debug("Collected so far: {0} messages. Going to next page...".format(len(collected_data)))
                request = requests.get(request['paging']['next']).json()
            except KeyError:
                logger.debug("No more pages. Collection finished.")
                # When there are no more pages (['paging']['next']), break from the
                # loop and end the script.
                break

        return collected_data


if __name__ == "__main__":
    logger.setLevel(logging.DEBUG)
    # create console handler and set level to debug
    ch = logging.StreamHandler()
    ch.setLevel(logging.DEBUG)
    # create formatter
    formatter = logging.Formatter('%(asctime)s|%(name)s:%(lineno)d|%(levelname)s - %(message)s')
    # add formatter to ch
    ch.setFormatter(formatter)
    # add ch to logger
    logger.addHandler(ch)

    parser = argparse.ArgumentParser(description='Collect the latest posts of a Facebook user.')
    parser.add_argument('access_token', help='You need a temporary access token. Get one from https://developers.facebook.com/tools/explorer/')
    parser.add_argument('--user', help="user with public message you want to parse", default="BillGates")
    args = parser.parse_args()
    fb_collector = FBCollector(args.access_token, args.user)
    messages = fb_collector.collect_all_messages()
    logger.info("Collected corpus with {0} messages".format(len(messages)))

As you can see, you need a token to collect the messages. This token is obtained from the profile of the Facebook user, who lets you collect his/her statuses. Note that you need permissions to do this for real, and your app needs to be approved by Facebook; however, you can get the messages of a public user, like Bill Gates in the example, and get them out in a nicely organized list of dictionaries.

So we have a way to connect to Facebook and, given we have the right token ™, we can get the status updates out. We’ve got to classify them now…
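Before classification, the collected posts need a little cleaning. The sketch below is an illustration and not the blackkiwi code: `extract_statuses` is a hypothetical helper, and it assumes that some posts (for example shared links or photos) may come back without a `message` key.

```python
from datetime import datetime


def extract_statuses(posts):
    """Keep only the posts that carry text and parse their timestamps.

    `posts` is a list of dictionaries shaped like the collector output:
    `message` (optional), `created_time` and `id`.
    """
    statuses = []
    for post in posts:
        text = post.get("message")
        if not text:
            # Assumption: posts without text have no `message` key at all.
            continue
        created = datetime.strptime(post["created_time"], "%Y-%m-%dT%H:%M:%S%z")
        statuses.append({"id": post["id"], "created": created, "text": text})
    # Oldest first, which is handy for a time series later on.
    return sorted(statuses, key=lambda status: status["created"])
```

The result is a chronologically ordered list with one entry per textual status, ready to be handed to a classifier.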

May the 4th has passed

The classifiers bit is quite complex. First we need to find a corpus, then we need to create the classifiers and train them. Then we save them, so we can load them up later and use them.

We built the classifiers using the nice NLTK library, together with Scikit-Learn. All the classifiers perform pretty similarly, so I decided to go for a voted classifier, which decides whether the text is positive or negative by majority consensus. Instead of using pickle to save them, we are using dill, ’cause it plays well with classes.
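To make the majority vote concrete, here is a minimal sketch of such a voting wrapper. It is an assumption about how a class like this could look, not the actual blackkiwi implementation: it only relies on each wrapped classifier exposing a `classify(features)` method, which is the interface NLTK classifiers offer. `FixedClassifier` is a hypothetical stand-in used just for the demo.

```python
from collections import Counter


class VoteClassifier:
    """Combine several classifiers and answer with the majority label."""

    def __init__(self, *classifiers):
        self._classifiers = classifiers

    def _votes(self, features):
        # One vote per wrapped classifier.
        return Counter(c.classify(features) for c in self._classifiers)

    def classify(self, features):
        return self._votes(features).most_common(1)[0][0]

    def confidence(self, features):
        # Fraction of classifiers that agree with the winning label.
        winning_votes = self._votes(features).most_common(1)[0][1]
        return winning_votes / len(self._classifiers)


class FixedClassifier:
    """Hypothetical stand-in that always answers with the same label."""

    def __init__(self, label):
        self.label = label

    def classify(self, features):
        return self.label


voter = VoteClassifier(FixedClassifier("pos"), FixedClassifier("pos"), FixedClassifier("neg"))
label = voter.classify({"good": True})
confidence = voter.confidence({"good": True})
```

Using an odd number of classifiers, as with the seven used here, avoids ties when there are only two labels.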

Once they have been trained, we can load them up and use them. This is the loading function:

def load_classifier(self):
    """Load the saved classifiers and wrap them in a voting classifier."""

    def load(filename):
        # Use a context manager so every file gets closed after loading.
        with open(filename, "rb") as handle:
            return dill.load(handle)

    naive_bayes_classifier = load(self.naive_classifier_filename)
    MNB_classifier = load(self.multinomialNB_filename)
    BernoulliNB_classifier = load(self.bernoulli_filename)
    LogisticRegression_classifier = load(self.logistic_regression_filename)
    SGDClassifier_classifier = load(self.sgd_filename)
    LinearSVC_classifier = load(self.linear_svc_filename)
    NuSVC_classifier = load(self.nu_svc_filename)

    # Seven voters, so a two-label vote can never end in a tie.
    self.voted_classifier = VoteClassifier(naive_bayes_classifier,
                                           LinearSVC_classifier,
                                           SGDClassifier_classifier,
                                           MNB_classifier,
                                           BernoulliNB_classifier,
                                           LogisticRegression_classifier,
                                           NuSVC_classifier)
    self.word_features = load(self.word_features_filename)
    logger.info("Classifiers loaded and ready to use.")

and the analyzer API looks like this:

analyzer = Analyzer()
classified, confidence = analyzer.analyze_text("today is a good day! :)")
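The post does not show what happens inside `analyze_text`, so the following is only a guess at the general shape: turn the text into a bag-of-words feature dictionary based on the saved `word_features`, then ask the classifier for a label and a confidence. `find_features`, the standalone `analyze_text` function and `KeywordClassifier` are all hypothetical names for this sketch.

```python
import re


def find_features(text, word_features):
    """Map every known word to True/False depending on its presence in the text."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return {word: (word in words) for word in word_features}


def analyze_text(text, classifier, word_features):
    """Return (label, confidence) for a piece of text."""
    features = find_features(text, word_features)
    return classifier.classify(features), classifier.confidence(features)


class KeywordClassifier:
    """Hypothetical stand-in for a trained classifier."""

    def classify(self, features):
        return "pos" if features.get("good") else "neg"

    def confidence(self, features):
        return 1.0


classified, confidence = analyze_text(
    "today is a good day! :)", KeywordClassifier(), ["good", "bad", "day"]
)
```

In the real application the classifier and the word list would come from the loading function rather than from a stand-in.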

The computation stack: the web

Yep. Django. Always. 🙂

These are the installed apps in the blackkiwi project:

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    
    'django.contrib.sites',
    
    # our stuff
    'moody', # we are first so our templates get picked first instead of allauth
    'contact',
    
    'allauth',
    'allauth.account',
    'allauth.socialaccount',
    'allauth.socialaccount.providers.facebook',
    'bootstrapform',
    'pipeline',
    
]

All the integration with Facebook is happily handled by django-allauth, which works pretty well, and I suggest you take a look at it.

For example, in this case I wanted to override the templates already provided by django-allauth, so I have put our app moody before allauth. This way our own templates get found and picked up by the template loaders before the ones allauth provides.

So that way, once the user authorizes us, we can pick the right ™ token, collect his/her messages, and then score them with the classifiers.
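The override works because of how the app-directories template loader searches. The settings fragment below is a sketch of a typical configuration, not the project's actual settings file: with `APP_DIRS` enabled, Django looks in the `templates/` folder of each installed app in `INSTALLED_APPS` order and uses the first match. The file path in the comment is an example.

```python
# With APP_DIRS enabled, Django searches <app>/templates/ for every app in
# INSTALLED_APPS order and stops at the first match. A file such as
# moody/templates/account/login.html (example path) would therefore win over
# the template of the same name shipped by allauth, as long as 'moody' is
# listed before 'allauth'.
TEMPLATES = [
    {
        "BACKEND": "django.template.backends.django.DjangoTemplates",
        "DIRS": [],
        "APP_DIRS": True,
        "OPTIONS": {
            "context_processors": [
                "django.template.context_processors.request",
                "django.contrib.auth.context_processors.auth",
                "django.contrib.messages.context_processors.messages",
            ],
        },
    },
]
```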

Then we plot them on the site using D3.js, as you can see here.

The deploy is done using GitLab, with a testing/staging/production system, using the GitLab CI. But we leave this for another post, ’cause this one is way too long anyway.
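For the plot, the server has to hand the scored statuses to the browser in a form D3.js can consume. One possible shape is sketched below; the `pos`/`neg` label names, the signed `mood` value and the `to_timeseries` helper are assumptions made for the example, not details taken from the project.

```python
import json

# Assumption: the classifier answers with "pos" or "neg".
SCORE = {"pos": 1, "neg": -1}


def to_timeseries(scored_statuses):
    """Serialize scored statuses as a JSON list of {date, mood} points."""
    points = [
        {"date": s["created_time"], "mood": SCORE[s["label"]] * s["confidence"]}
        for s in scored_statuses
    ]
    # ISO timestamps in the same timezone sort chronologically as strings.
    points.sort(key=lambda point: point["date"])
    return json.dumps(points)
```

A Django view could return this string as the response body, and the D3 code would then map `date` to the x axis and `mood` to the y axis.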

Have fun!

 

Say hello to Loqu4cius

Loqu4cius is a lightweight blog engine based on Django (not this Django) that runs on Google App Engine and uses CloudSQL as its backend, which is, as Google puts it, MySQL in the cloud.

A bit of history

Google App Engine has the ability to run scalable apps. It has long been possible to use Django on it, given that Python was one of the two supported languages; however, the back end was Bigtable, which is not compatible with the classic RDBMS used by Django.

This made it impossible to use relationship-spanning lookups and more, so the only usable bits of Django were the templates and the URL router, but not the models…

Django-nonrel to the rescue.

A project called django-nonrel came to the rescue and created a compatibility layer between the NoSQL backend and the classic Django ORM. Most of the relationship-spanning lookups were working; however, some of the joins, like many-to-many, were not available.

Fast forward to our time

Fast forward to today: Google made a classic RDBMS available, so it is possible to use all the ORM goodies, including third-party Django apps that speed up development through reuse.

So now Google Cloud to the rescue.

To check it out, I came up with Loqu4cius.

It features a tag cloud that makes it 2.0, is based on Twitter Bootstrap, which I’ve styled with some colors and fonts (directly from Google Fonts), and has a search bar and the ability to enter rich text using ckeditor. The comments are integrated using Disqus, which is the way to go right now.

The code is on GitHub with a quick readme. For any questions, the comments are here :).

Some thoughts about the development

Google App Engine comes with some limitations, but with the possibility to add third-party libraries it is possible to reuse a lot of the Django apps already available. (Let’s agree on terminology: app –> a single application that does one thing, for example managing the tags; project –> the collection of all the apps and related files that run the entire site.)

My strategy is to create a virtualenv and then copy all the necessary modules into the lib folder. This gives me the ability to install a package with

pip install package_name

and all the dependencies very easily. After that it’s a matter of using the apps and making them work nicely.
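For the copied modules to be importable, the lib folder has to end up on the Python path when the app starts. The post does not say how this is wired up, so the snippet below is just one plain-Python way to do it; `add_lib_to_path` is a hypothetical helper and the folder name `lib` follows the text above.

```python
import os
import sys


def add_lib_to_path(project_dir, folder="lib"):
    """Put <project_dir>/<folder> at the front of sys.path, once."""
    lib_dir = os.path.join(project_dir, folder)
    if lib_dir not in sys.path:
        # Front of the list, so the vendored copies win over system ones.
        sys.path.insert(0, lib_dir)
    return lib_dir
```

Calling it early, for instance as `add_lib_to_path(os.path.dirname(os.path.abspath(__file__)))` from the project's entry module, makes the vendored packages importable like any other.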

CSS writing

I like to use less to write CSS, but I don’t want to compile the less files on the client, and I want to serve only CSS in production, so I use two helpers to get the job done.

First, I use a Python script that finds all the less files and compiles them into CSS by calling the lessc compiler.

However, I don’t want to call the script myself every time I write a new bit of a less file, so I use watchdog to call the script every time a less file gets saved.

It would be nice to have a tool that can launch both the development server and this script in one go, and it is actually doable. It’s called honcho and it accepts a classic Procfile.

For example, this is the Procfile.dev for loqu4cius:

web: ./serve.sh
less: watchmedo shell-command --patterns="*.less" --command='./scripts/build_less.py "${watch_src_path}" ' static/less/

Launching it with honcho -f Procfile.dev start starts the development server and recompiles and moves the files to the collectstatic folder as required, all in one go, so you can focus on just developing.
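The build script itself is not shown in the post. A rough sketch of what a script like scripts/build_less.py could do is below; the function names and the `static/css` output folder are assumptions, while the `lessc input.less output.css` invocation is the standard way to call the compiler.

```python
import os
import subprocess


def find_less_files(root):
    """Yield the path of every .less file below `root`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".less"):
                yield os.path.join(dirpath, name)


def css_path_for(less_path, less_root="static/less", css_root="static/css"):
    """Mirror the less folder layout under the CSS folder (assumed layout)."""
    relative = os.path.relpath(less_path, less_root)
    return os.path.join(css_root, os.path.splitext(relative)[0] + ".css")


def compile_less(less_path, css_path):
    """Compile one file by calling the lessc command-line compiler."""
    os.makedirs(os.path.dirname(css_path), exist_ok=True)
    subprocess.check_call(["lessc", less_path, css_path])


def build_all(less_root="static/less", css_root="static/css"):
    """Compile every less file found below `less_root`."""
    for less_path in find_less_files(less_root):
        compile_less(less_path, css_path_for(less_path, less_root, css_root))
```

Running `build_all()` from the script's entry point would rebuild everything on each save, which is simple and fast enough for a small site.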

Last but not least, I’ve created a quick release script, called release_site.py, which:

  • increases the app.yaml version of the site
  • performs the syncdb in production
  • uploads the site using appcfg.py
  • commits the modified app.yaml to the repo
  • tags the repo with the version number

so you can always know which commit refers to which version on Google App Engine.
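As an illustration of the first step, here is a small sketch of a version bump. It assumes that app.yaml carries a line of the form `version: <integer>`; the real release_site.py may handle the version differently, and `bump_version` is a hypothetical name.

```python
import re

# Assumption: app.yaml has a line such as "version: 3".
VERSION_LINE = re.compile(r"^version:[ \t]*(\d+)[ \t]*$", re.MULTILINE)


def bump_version(app_yaml_text):
    """Return (updated_text, new_version) with the version increased by one."""
    match = VERSION_LINE.search(app_yaml_text)
    if match is None:
        raise ValueError("no integer 'version:' line found")
    new_version = int(match.group(1)) + 1
    updated = VERSION_LINE.sub(
        "version: {0}".format(new_version), app_yaml_text, count=1
    )
    return updated, new_version
```

The returned number can then be reused for the commit message and for the tag, which is what ties a commit to a deployed version.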

It took me a bunch of days to figure out how to set up the environment for streamlined development, and I’m eager to know other solutions to the same problems!

An overview of Ruby on Rails from a Django user

So Ruby on Rails is worshipped as the best thing for developing web applications since sliced bread, so I’ve decided to take a look at it. Ruby on Rails uses a Model-View-Controller (MVC) pattern, whose goal is to disentangle the way you represent the data from the data itself. It may not seem to make much sense, but trust me, it does.

The MVC paradigm is also used in Django, a Python web framework which follows the same principle and which I’m familiar with, given that I picked it up to develop the SustainableSouk.

So the first thing I needed to do is to map my Python knowledge to Ruby, and my Django knowledge to Rails.

Difference in the languages

Ruby is a dynamically typed language, with an interpreter that takes care of garbage collection and so on. Both languages are strongly object-oriented and mostly work in the same way; however, there are some small differences worth pointing out: (1) the return value at the end of a method, (2) the difference between puts and print

"Talk is cheap. Show me the code." (Linus Torvalds)

The same class in Python and Ruby

As you can see, the languages are amazingly similar at first glance.
Ruby does not care about indentation: it opens each logical block with a keyword (class or def, for example) and closes it with the end keyword.

The other major difference, at this level, is the implicit return of a method. In Ruby every method returns the value of its last evaluated expression. And puts covers the same function as print in Python.
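A tiny example makes both points visible. The class is made up for this illustration (it is not the one from the original comparison), and the Ruby version is given in comments so the two can be read side by side.

```python
class Greeter:
    def __init__(self, name):
        self.name = name

    def greeting(self):
        # Python needs an explicit return; without it the method returns None.
        return "Hello, {0}!".format(self.name)


# Roughly equivalent Ruby, shown as comments:
#
#   class Greeter
#     def initialize(name)
#       @name = name
#     end
#
#     def greeting
#       "Hello, #{@name}!"   # value of the last expression is returned
#     end
#   end
#
#   puts Greeter.new("kiwi").greeting

print(Greeter("kiwi").greeting())
```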

The frameworks

Well, let’s move on to Rails; is that not what we are here for?
Rails ships with rake tasks and commands which make it very quick to set up the scaffolding of a project (rails new my_project). This is similar to the django-admin startproject command for djangonauts.

Rails is all about configuration, and somehow resembles Django a lot. You have a controller function (a view in Django) which gets called when a certain URL matches a certain pattern, defined in config/routes.rb (urls.py in Django). The main difference is that Rails gives you a lot of URLs for free using a RESTful scheme.
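To give an idea of what "for free" means: a single `resources :posts` line in the Rails routes file maps seven controller actions by default. The Python function below is only a way to list that default mapping for comparison with a hand-written urls.py; `restful_routes` is a made-up helper and not part of either framework.

```python
def restful_routes(resource):
    """List the (HTTP verb, path, action) triples Rails maps for a resource."""
    collection = "/" + resource
    member = collection + "/:id"
    return [
        ("GET", collection, "index"),
        ("GET", collection + "/new", "new"),
        ("POST", collection, "create"),
        ("GET", member, "show"),
        ("GET", member + "/edit", "edit"),
        ("PATCH", member, "update"),  # PUT is routed to update as well
        ("DELETE", member, "destroy"),
    ]


for verb, path, action in restful_routes("posts"):
    print("{0:<7}{1:<18}{2}".format(verb, path, action))
```

In Django each of these would normally be a separate entry in urls.py pointing to its own view.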

The best way to get used to a system is to create something, so I set up a repo to do just that.

The process is quite straightforward, especially if you already know HTML5, CSS, SCSS and JavaScript, which are assets you can reuse in the Rails app. I’ve deployed the demo app on Heroku here

Summary

Moving from Python to Ruby, and from Django to Rails, is not too big a jump, and I reckon that, at least at the beginning when you don’t go deep into other libraries, you can get up to speed pretty soon. In the end, it’s more or less the same way of thinking.

P.S.: If you are in Cambridge and interested in Django, HTML5 and so on, we have set up a Django Cambridge Meetup and our first get-together is going to be on the 17th of October. I hope to see you there.

Permaculture links

I’ve been looking at Permaculture design and all the things connected with it for a while.

Quickly, two very valuable links about Permaculture:

Especially the last link has a clear scheme of the garden.

It’s awesome

Scheme for the backyard permaculture food forest from deepgreen

Rendering of the food forest garden from deepgreen

On the other side, I’m getting my hands dirty with a bunch of friends, building a system to make local food easier to find/trade/swap. We are using Django as the web development framework and have some serious intentions about it.

If you are a django ninja and want to be involved (or just want to be involved) just send me an e-mail to mattions. Attach the gmail.com after it. Please use a sensible subject, so I know why you are writing to me (or link to this post.)

 

© 2017 Train of Thoughts
