Making a turn

I'm two weeks into the Zipfian Academy Data Science boot camp. I decided to "branch" this blog into this new one. Follow me there on my journey into data science.

Configuring my MacBook Retina (MacOSX 10.8.3)

I've got my new machine, and to reproduce the Java / Python development environment from my previous computer I had to spend some time tweaking things for Mountain Lion 10.8.3. Below are some links that helped me solve some problems:

1) Updating SVN to 1.7.9
2) Configuring Apache, PHP and MySQL
3) Getting MySQLdb to work. After installing it with easy_install MySQL-python I had to follow the solution described in the link below to make it work properly.

A pipeline for biomedical ontology maintenance and release

A pipeline for biomedical ontology maintenance and release: current status and future requirements

Carlo Torniai, Matthew Brush and Melissa Haendel

Download the PDF

Summary: This note describes a maintenance and release pipeline for ontologies that leverage a modular reuse of external ontologies. Here we describe our approach and identify a set of requirements for continued enhancement and integration of external tools.

INTRODUCTION

One hopes that well-designed ontologies created for a certain domain will be reused and extended by others. Despite the large effort to promote reuse of existing ontologies, in particular within the biomedical domain (Smith 2007), the tools and methodologies that facilitate multiple ontology integration and reuse are far from being mature. We have previously published on our ontology engineering approach (Torniai 2011) to build modular application ontologies leveraging the MIREOT principle (Courtot 2009). In this process we have recognized the advantages of maintaining synchronization with externally referenced ontologies, specifically to facilitate data currency and therefore interoperability. Here, we describe the approach we have developed for an ontology maintenance and release pipeline that takes into account these needs for ontology reuse and interoperability, and identify requirements for available tools to be integrated into this process.

METHODS

For the ontologies we have developed (eagle-i and CTSAconnect among others), we have used Protégé for editing and we have included references to external ontology entities using the MIREOT principle.
In this context, our approach to developing a maintenance and release pipeline was driven by three requirements:

• Facilitate editing, maintenance and release of the ontology even by non-technical people
• Reuse available tools
• Automate the maintenance and release process as much as possible

The requirements above translated into the following decisions:

• Use Ontofox (Xiang 2010) as an implementation of the MIREOT principle
• Use the OBO Ontology Release Tool (Oort) to manage releases
• Define a specific organization for the files containing external referenced entities

Rather than having all the entities referenced from external ontologies together in one single ‘external.owl’ file, as originally suggested in the MIREOT specification, we create a file for each external ontology source. These files are grouped under the same directory and named according to the following convention: [ontology-prefix]-imports.owl (Fig. 1A). For instance, the main ontology file for eagle-i (ero.owl) imports all these “imports” files as well as full ontologies such as the Basic Formal Ontology (BFO), the Relation Ontology (RO), and the Information Artifact Ontology (IAO) (Fig. 1B). The separation of each import file according to the source ontology was instrumental in implementing the workflow for updating MIREOTed classes and properties represented in Figure 1C. A Python script generates an Ontofox input file by parsing each imports file, pulling classes and their subClassOf information and defining custom preferences about how to map annotations on these classes to appropriate IAO annotation properties. The generated Ontofox input file is used by another Python script, which accesses Ontofox programmatically to return an updated owl file. The MIREOTed entities update pipeline is the first step of our release workflow: it generates all the updated files used by the Oort tool, which generates all the ontology release files.
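The first step described above (reading an imports file and pulling each class with its subClassOf parents) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' script: it assumes the imports file is serialized as RDF/XML, the sample IRIs and the helper names `imports_filename` and `extract_classes` are hypothetical, and it does not attempt to write the actual Ontofox input format.

```python
import xml.etree.ElementTree as ET

# Standard namespace IRIs for RDF, RDFS and OWL.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Hypothetical miniature imports file; the example.org IRIs are placeholders.
SAMPLE_IMPORTS = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/src/SRC_0000001">
    <rdfs:subClassOf rdf:resource="http://example.org/target/TGT_0000010"/>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/src/SRC_0000002">
    <rdfs:subClassOf rdf:resource="http://example.org/src/SRC_0000001"/>
  </owl:Class>
</rdf:RDF>
"""


def imports_filename(prefix):
    """Apply the [ontology-prefix]-imports.owl naming convention."""
    return f"{prefix}-imports.owl"


def extract_classes(owl_xml):
    """Map each named class IRI to the IRIs of its named superclasses."""
    root = ET.fromstring(owl_xml)
    classes = {}
    for cls in root.findall(f"{{{OWL}}}Class"):
        iri = cls.get(f"{{{RDF}}}about")
        if iri is None:
            continue  # skip anonymous classes
        parents = []
        for sub in cls.findall(f"{{{RDFS}}}subClassOf"):
            parent = sub.get(f"{{{RDF}}}resource")
            if parent is not None:  # ignore restrictions and other anonymous parents
                parents.append(parent)
        classes[iri] = parents
    return classes


if __name__ == "__main__":
    for iri, parents in extract_classes(SAMPLE_IMPORTS).items():
        print(iri, "->", parents)
```

A real imports file would also carry annotation properties, which this sketch ignores.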
Fig. 1. A) The base directory structure for the ero ontology with all the imports files. B) The chain of imports of the main ontology. C) Release pipeline: MIREOTed entities (classes and properties) update and usage of Oort.

DISCUSSION

While implementing this pipeline, we identified a set of feature requests for Ontofox (e.g. the need to provide within the output file a list of classes that are no longer in use in the referenced source ontology) and Oort (e.g. the need for referencing local ontology files through the catalog xml file generated by Protégé) that were promptly implemented by the respective developer teams.

We could have used a similar workflow while maintaining a single file for the MIREOTed entities. However, having a single file for each source ontology was advantageous because it is easier to manage and synchronize external entities with each ontology source and it facilitates parallel editing. Use of a single, large owl file to hold all the external entities posed an issue for editors because it was difficult to manage changes from the sources that are all in different locations and follow different release schedules. Moreover there are cases in which there is the need to apply MIREOT differently according to source ontology. For instance, each source uses a different set of annotation properties to record class metadata, and specification of mappings for non-IAO properties needs to reflect these differences. There are also cases where developers will want to specify different “levels” of MIREOT for different source ontologies, for example where class axioms need to be imported along with annotation properties.
We have found this flexibility important in our reuse of classes from the Ontology for Biomedical Investigations. Using separate import files for each source ontology has been essential for customizing the MIREOT principle for specific source ontologies.

An interesting consideration about our design pattern is that the minimal information about the imported entities (the content of the external.owl file defined in the original MIREOT specification) is now specified in the Ontofox input file generated in our pipeline. It would be extremely useful if Ontofox supported as input the actual owl import files, or at least an owl file with the following information: entity to import, entity superclass in the target ontology, annotations to specify the “level” of MIREOT desired (simple annotation, full closure, and so on).

While it is true that different ontologies have different development strategies and different teams (and skills) involved in the development, there are some common requirements for maintenance and release tools that we have identified across several ontology efforts. One such issue is that not all the ontologies are available in Ontofox or from a single SPARQL endpoint. For instance, we use the VIVO ontology and the Ontology for Clinical Research (OCRe), for which we need to maintain a separate updating process. Therefore, another important feature for maintenance and release tools would be the ability to configure different SPARQL endpoints (or other access means) for different ontology sources.

Another important issue is the need to have reporting about what has been updated and what has been changed during the update process of MIREOTed entities. Currently, we run our tests (comparing the original and the updated imports file) through a custom set of Java scripts, and enhancements to this end have been requested from the Oort developers.
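The kind of update report discussed above can be illustrated with a small comparison of two snapshots of an imports file. This is a hedged sketch in Python rather than the Java tests mentioned in the text; it assumes each snapshot has already been reduced to a mapping from class IRI to its superclass IRIs, and the function name `diff_imports` and the example IRIs are hypothetical.

```python
def diff_imports(original, updated):
    """Report classes added, removed, or given different superclasses.

    Both arguments map a class IRI to a list of superclass IRIs.
    """
    orig_iris = set(original)
    upd_iris = set(updated)
    return {
        "added": sorted(upd_iris - orig_iris),
        "removed": sorted(orig_iris - upd_iris),
        # Compare parents as sets so that ordering differences are ignored.
        "reparented": sorted(
            iri
            for iri in orig_iris & upd_iris
            if set(original[iri]) != set(updated[iri])
        ),
    }


if __name__ == "__main__":
    # Placeholder IRIs, shortened for readability.
    before = {"ex:A": ["ex:Root"], "ex:B": ["ex:A"], "ex:C": ["ex:A"]}
    after = {"ex:A": ["ex:Root"], "ex:B": ["ex:Root"], "ex:D": ["ex:A"]}
    for kind, iris in diff_imports(before, after).items():
        print(kind, iris)
```

A fuller report would also compare annotations, which are left out here to keep the example short.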
Finally, we believe that in addition to Oort and Ontofox, Protégé should play a key role in an integrated pipeline of ontology maintenance and release. It would be great if Protégé generated the proper file structure for imported entities and performed the updates (through a plug-in that could, for instance, programmatically access Ontofox). To this end, we are providing requirements to the Protégé team in a coordinated initiative together with other efforts (GO, the OBI consortium, VirtualFlyBrain and others).

CONCLUSION

The design pattern and the maintenance and release pipeline we have described have been adopted successfully for the Reagent Ontology (ReO) and for the new version of the Common Anatomy Reference Ontology (CARO). Longer term, such a pipeline should be integrated into existing tools, and we will continue to feed our requirements to the Oort, Ontofox and Protégé teams. If these tools provide a flexible level of customization, the biomedical ontology community will be able to have a more automated, standardized, reliable and widely usable maintenance and release pipeline for building modular ontologies using the MIREOT principle.

ACKNOWLEDGEMENTS

The authors wish to thank, in addition to the eagle-i consortium, the Ontofox and Oort developer teams - in particular Allen Xiang, He Yongqun, Chris Mungall, and Heiko Dietze - for their support and prompt responsiveness in addressing our requests.

Funding: This work was supported by the National Institutes of Health and the American Recovery & Reinvestment Act [grant number 1U24RR029825-01].

Best albums of 2012

It's that time of the year again... As usual, this is the list of the records I've listened to the most, not the most "cool" records that you never listen to. Enjoy! And happy 2013 everyone!

Best Album of 2012
1 Andrew Bird - Break it yourself
2 Grizzly Bear - Shields
3 The Lumineers - The Lumineers
4 Sufjan Stevens - Silver & Gold*
5 The Walkmen - Heaven

Best Live performance
0 - Sufjan Stevens - Surfjohn Stevens Christmas Sing-A-Long: Seasonal Affective Disorder Yuletide Disaster Pageant on Ice. It is a category of its own, not just a concert. It probably doesn't belong here. Just have a look at this to get an idea.
1 - Grizzly Bear
2 - Springsteen (live in Portland). Way better than the shows in LA and Chicago.
3 - Andrew Bird

For more live bits from 2012 check my channel.

Best album from the past: We are the Augustines - Rise Ye Sunken Ships
Worst Album of the year: Mumford and Sons - Babel

* I've been so obsessed with this lately that if I had had it before November it would have been the album I listened to the most...

Installing Fuxi on Mac OS X 10.7.3

I have been trying to install Fuxi on my Mac (running Mac OS X 10.7.3 with Python 2.7.2 as default). I tried unsuccessfully to run the simple script:

wget python2.5 easy_install-2.5 fuxi

But no luck. So I had to build everything from scratch. Below are the steps to get Fuxi to work. Note that everything should be bound to Python 2.5 (I've made a few changes from the script in this post).

* This script requires that ez_setup, mercurial (hg) and subversion (svn) are installed.
* Sources will be installed in the working directory.

sudo easy_install-2.5 pyparsing
svn export  # create local copy of layercake
cd layercake-python
sudo python2.5 build
sudo python2.5 install
cd ..
hg clone fuxi   # clone FuXi from repository using Mercurial
cd fuxi
sudo python2.5 build
sudo python2.5 install

Then of course I had to force my dev environment (EasyEclipse in my case) to use the 2.5 interpreter. Things are now working fine.

Lion and pycurl

Just for future reference: when upgrading to Lion (the following works for 10.7.2 and 10.7.3) make sure you do the following:

1) Re-install the Dev tools. Note that you will need to install the command line tools by going to Xcode > Preferences > Download.

2) Make sure you have the latest version of Python (a dmg installer is available for download).

3) If you are running 10.7.2: fix the compiler! Make sure you are linking to the proper gcc. A quick and dirty fix is below (see this post):

sudo ln -s /usr/bin/llvm-gcc-4.2 /usr/bin/gcc-4.2

The previous step isn't necessary anymore if you are using Xcode 4.2 and Lion 10.7.3, but you need to MANUALLY install the command line tools. To do that, open Xcode, go to Preferences and Download, and click Install on Command Line Tools.

4) Run the following (note I made sure to use the easy_install version in /usr/bin):

sudo env ARCHFLAGS="-arch x86_64" /usr/bin/easy_install setuptools pycurl==7.19.0

Best Albums of 2011

It took me longer than usual to write the traditional post about the best albums of the year. My memory fails me... it's a sign of age I guess :-) Anyway, it took me a while to pick numbers 3 to 5 and also to go through all the concerts I've seen to choose the best performances. Here they are...

Best Albums of 2011
1. The Roots - Radical Face (read my review)
2. Father, Son, Holy Ghost - Girls
3. Belong - The Pain of Being Pure at Heart
4. Lupercalia - Patrick Wolf
5. Last Night on Earth - Noah and the Whale
Honorable Mention - The People's Key - Bright Eyes

Best Album from the past
Astro Coast - Surfer Blood

Best Live performance
1) The Cure: Reflections
2) Bon Iver
3) Bright Eyes

I've seen many shows this year (at least on this I did better than last year; you can see some on my YouTube channel here).

Ben Cooper: A journey of runaways and returns

My review of The Roots by Radical Face has been published. Read it here. Italian version coming soon.

The Ides of March: the betrayal of a life without forgiveness

The movie - directed by George Clooney and based on Beau Willimon's Broadway play Farragut North - tells the story of Stephen Myers (Ryan Gosling), who works under Paul Zara (Philip Seymour Hoffman) as junior campaign manager for the Democratic presidential candidate, Governor Mike Morris (George Clooney).

Steve, while starting a relationship with a campaign intern, is offered the chance to switch sides and go to work for the rival campaign manager Tom Duffy (Paul Giamatti). This will trigger a chain of events that will change the lives of the characters forever.

You may think that this is yet another political movie with the usual clichés, but you'd be terribly wrong: politics is the magnifying lens through which a bigger drama than presidential elections is depicted.

In fact, what makes you uneasy and almost hurts you physically, while you are caught in the plot and the climax of the movie, isn't the terrible mistakes and betrayals, or the striking contrast between the ideal and the impossibility of living up to it, but rather the impossibility of forgiveness, the absolute absence of a merciful gaze.

Each character is profoundly alone with his secrets, errors, betrayals and the weight of them. There is no escape from this deadly condition where only T.S. Eliot's "usury, power and lust" seem to prevail. Even ideals or religion are empty words (the constitution as religion in the opening scene of the movie) or reverent formalism (the desperate speech of a "catholic" father facing the death of his child): they cannot touch or change life.

And the solution isn't telling the truth (so much so that the movie won't tell if Steve will reveal it); the way out is not to condemn or unveil mistakes and secrets, trying to make things right. This won't fill the void of what is missing nor make you happy: when Steve seems to have obtained - at an incredibly high price - what he wants, he stands sad among the crowd while listening to the speech that declares his success.

There is a need for something else that is missing. This absence and its terrible consequences are so well depicted that the movie is - though painful - absolutely worth watching.

This great movie is not about the dirty rules and cynicism of politics, it's about something bigger than that: it's about the despair of a life where forgiveness and mercy are impossible.

A life that often, with the complicity of the powers that be, we condemn ourselves to live, but that, fortunately, is not the only possible one.

Php, curl and SPARQL: turning shell scripts into (simple) dynamic web pages

We have SPARQL endpoints requiring http authentication and for my own needs I usually write shell scripts using curl to execute SPARQL queries and to save results in files to be handed to the team responsible for data QA.

I wanted to provide a simple tool for the data QA folks (who have no technical knowledge of RDF, OWL or SPARQL whatsoever) to allow them to select and run a set of precompiled SPARQL queries without my intervention.

Since most of my scripts use cURL, I decided to develop PHP pages using cURL to access the SPARQL service.

The process was pretty straightforward: I just had to enable curl in php.ini and experiment a little :-)

The only issue I encountered was related to the encoding of double quotes included in some SPARQL queries posted from an HTML form on my GoDaddy server. It took me a while to realize that this was the issue, since on my local Apache (running on Mac OS) everything worked fine. I solved it by handling magic quotes and using &quot; in the form field.

Below is the PHP code that gets the parameters sent from an HTML form (username, password, institution, query_value) to access the proper SPARQL service.

<?php
// Undo magic quotes so double quotes in SPARQL queries arrive unescaped
if (get_magic_quotes_gpc()) {
    function stripslashes_gpc(&$value) {
        $value = stripslashes($value);
    }
    array_walk_recursive($_GET, 'stripslashes_gpc');
    array_walk_recursive($_POST, 'stripslashes_gpc');
    array_walk_recursive($_COOKIE, 'stripslashes_gpc');
    array_walk_recursive($_REQUEST, 'stripslashes_gpc');
}

// extract data from the post
extract($_POST);

// get parameter values
$institution = $_GET['institution'];
$url = 'https://' . $institution . 'sparql.service.url';
$view_value = 'user';
$format_value = 'text/html';
$query_value = $_GET['query_value'];
$user = $_GET['user'];
$password = $_GET['password'];
$auth = $user . ':' . $password;

$fields = array(
    'view'   => urlencode($view_value),
    'query'  => urlencode($query_value),
    'format' => urlencode($format_value)
);

// url-ify the data for the POST
$fields_string = '';
foreach ($fields as $key => $value) {
    $fields_string .= $key . '=' . $value . '&';
}
$fields_string = rtrim($fields_string, '&');

// open connection
$ch = curl_init();

// set the url, HTTP credentials, POST method and POST data
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_USERPWD, $auth);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $fields_string);

// execute post (the response is written straight to the page)
$result = curl_exec($ch);

// close connection
curl_close($ch);
?>
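For comparison, the same request (a form-encoded POST carrying view, query and format fields, plus HTTP basic authentication) can be sketched in Python with only the standard library. This is an illustration under assumptions, not part of the tool described here: the function name `build_sparql_request`, the endpoint URL and the credentials are all hypothetical, and the sketch only builds the request without sending it.

```python
import base64
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def build_sparql_request(url, query, user, password,
                         view="user", result_format="text/html"):
    """Build (but do not send) a form-encoded POST with basic auth."""
    # urlencode percent-encodes double quotes, so no manual escaping is needed.
    body = urlencode(
        {"view": view, "query": query, "format": result_format}
    ).encode("ascii")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {token}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    # Placeholder endpoint and credentials, for illustration only.
    req = build_sparql_request(
        "https://example.org/sparql",
        'SELECT ?s WHERE { ?s ?p "x" }',
        "alice",
        "secret",
    )
    print(req.get_method(), req.full_url)
    print(parse_qs(req.data.decode("ascii"))["query"][0])
```

Passing the returned object to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without network access.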

A side note: I went through this awesome tutorial for enabling SPARQL queries from Drupal using ARC2 and a bunch of Drupal modules. I made it work, but that was way harder than the PHP scripting ;-) A few more notes about this sometime soon.
