News & Blog

🔬 Transform your computational chemistry!

Sorry for the long silence... moving my whole toolchain!

Lately, it has been quite silent here on my blog. I have been moving my whole workflow and toolchain to a new paradigm, so I put the production of new content on hold until I had a working implementation. This is now slowly coming together, but it is not quite there yet. For now I will add only minimal new information here, and later on I will transition all the current content to the new system. Here are a few brief snippets of information.

I was really fascinated to discover the full depth of possibilities offered by the Emacs text editor. In particular, I was inspired by the fantastic materials put online by David Wilson on his System Crafters website. My Emacs workflow also extends to the computational sciences, for instance using notebook-like literate programming (org-mode tangling, in Emacs jargon) for my daily research work. The talk (in French only, sorry) on Esthétique et Notebook (Emacs) by Nicolas Rougier is extremely inspiring in this respect.

Such a change in my day-to-day working method has ramifications for very diverse topics, such as managing dotfiles, taking notes (I am now shifting from Evernote towards Emacs org-roam), programming IDEs (doing Unity development within Emacs!), generating my content (website, blog, etc.) and many more. So far, I have done a first test of linking my original org-file blog entries to my website workflow, where I still use RapidWeaver (at least for the present website). Within RW, I mostly use Markdown or simple plain HTML. The org format in Emacs can be exported efficiently to either of those, Markdown or HTML. So this org -> HTML -> RW chain will do for now. In the longer run I want to achieve more automation.

Unfortunately, I did not manage to script RapidWeaver (e.g. to generate a new blog post from the Emacs export). I therefore experimented with another tool, at least for the blog part: LazyBlorg. It is powerful and customizable, and it ties in very well with org-mode. For now I am experimenting on my personal blog, which goes by the name B@amCode#. There I will report much more about my Emacs meanderings, tools and workflows to streamline tasks with my whole new toolchain. So if you are interested in these more technical bits, have a look there! On the present blog, I will concentrate on scientific research and related topics.

Taming the Wild West of Data Management: 4 Tips for Organizing Your Dataset List

Managing a list of datasets as a researcher can be a bit of a challenge, especially when it comes to citing them properly. Unlike a bibliography of publications, there are currently very few tools available to help with this task. In these early days of data management, it can be a bit complicated to keep track of all the datasets you have produced, want to use or are interested in.

One approach to organizing your datasets is to collect their DOI identifiers from publicly accessible repositories such as OpenAIRE, which offers a convenient CSV export, for instance. You can then use a tool such as the doiclient Python tool contributed by Jonathan Barnoud to retrieve the metadata for this list of DOIs. It uses the nice Crosscite citation formatter. From there, you can extract, for instance, a BibTeX bibliography of all your datasets. With such a bibliography you can then use tools such as pybtex to format the metadata into Markdown or HTML for inclusion on your website.
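As a minimal sketch of this workflow, the Python snippet below reads DOIs from a CSV export and prepares a request for BibTeX metadata through DOI content negotiation (an `Accept: application/x-bibtex` header sent to `https://doi.org/`). It does not use doiclient, whose interface is not shown here. The column name `DOI`, the function names and the `10.1234/...` identifiers are assumptions made for illustration, so adapt them to your actual export.

```python
import csv
import io
import urllib.parse
import urllib.request

DOI_RESOLVER = "https://doi.org/"  # resolver used for DOI content negotiation


def read_dois(csv_text, column="DOI"):
    """Return the unique, non-empty DOIs of a CSV export, in file order.

    The column name "DOI" is an assumption; check the header of your export.
    """
    dois = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        doi = (row.get(column) or "").strip()
        # Some exports store the full resolver URL instead of the bare DOI.
        if doi.lower().startswith(DOI_RESOLVER):
            doi = doi[len(DOI_RESOLVER):]
        if doi and doi not in dois:
            dois.append(doi)
    return dois


def bibtex_request(doi):
    """Build (without sending) a content-negotiation request for BibTeX."""
    url = DOI_RESOLVER + urllib.parse.quote(doi)
    return urllib.request.Request(url, headers={"Accept": "application/x-bibtex"})


def fetch_bibtex(doi, timeout=30):
    """Send the request and return the BibTeX text (needs network access)."""
    with urllib.request.urlopen(bibtex_request(doi), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Placeholder export with a duplicate, a full URL and an empty DOI cell.
sample_export = (
    "Title,DOI\n"
    "First dataset,10.1234/example.1\n"
    "Second dataset,https://doi.org/10.1234/example.2\n"
    "First dataset again,10.1234/example.1\n"
    "No identifier,\n"
)
sample_dois = read_dois(sample_export)
```

Calling `fetch_bibtex(doi)` for each entry and concatenating the results should give one BibTeX file for the whole list. Expect to add error handling, since not every DOI registration agency is guaranteed to serve BibTeX.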

One potential difficulty you may encounter is with figshare, a popular platform for sharing datasets. Many datasets on figshare do not have their own DOI, only the DOI of the publication they refer to. This can make it difficult to process and cite these datasets properly.

It would be great to have data management software with a catalog similar to the ones we have for publications, such as Zotero, but more specific to data. By that I mean, for instance, the ability to update an entry dynamically when a new version of a dataset appears, without duplicating the different versions of that dataset. Unfortunately, such a tool does not seem to exist yet, but it would certainly be a welcome addition to the data management landscape.

In the meantime, it is important to do your best to properly cite and organize your datasets. If you have any feedback or suggestions on how to improve this process, please don't hesitate to share it. Here is a link to the new datasets page on my website, where you can see the results of my efforts.

In summary, I recommend that you:

  1. collect and use DOI identifiers for your datasets whenever possible, plus, in some cases, a few alternative identifiers such as the figshare id

  2. automate the treatment of your DOIs with existing tools such as doiclient or Crosscite, which allow you, for example, to retrieve a BibTeX bibliography of your data

  3. use tools such as pybtex to manage the conversion from BibTeX to any desired format, including HTML and Markdown

  4. keep a lookout for a data reference management tool that would simplify and streamline these tasks
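To give an idea of what the formatting step in point 3 produces, here is a rough standard-library stand-in. It is not pybtex and not the method used for this site: it assumes the metadata was requested as CSL JSON (media type `application/vnd.citationstyles.csl+json`) instead of BibTeX, and the function names and the sample record are made up for illustration.

```python
def format_authors(authors):
    """Join CSL author objects as 'Family, Given; Family, Given'."""
    names = []
    for author in authors:
        if "family" in author:
            given = author.get("given")
            names.append(f"{author['family']}, {given}" if given else author["family"])
        elif "literal" in author:  # institutional or single-field names
            names.append(author["literal"])
    return "; ".join(names)


def issued_year(item):
    """Return the year from the CSL 'issued' date-parts, or None if absent."""
    date_parts = item.get("issued", {}).get("date-parts") or [[]]
    first = date_parts[0]
    return first[0] if first else None


def csl_to_markdown(item):
    """Render one CSL JSON record as a Markdown list item, skipping missing fields."""
    parts = []
    authors = format_authors(item.get("author", []))
    if authors:
        parts.append(authors)
    year = issued_year(item)
    if year:
        parts.append(f"({year})")
    if item.get("title"):
        parts.append(f"*{item['title']}*.")
    if item.get("publisher"):
        parts.append(f"{item['publisher']}.")
    if item.get("DOI"):
        doi = item["DOI"]
        parts.append(f"[doi:{doi}](https://doi.org/{doi})")
    return "- " + " ".join(parts)


# Made-up record, shaped like a CSL JSON response for a dataset.
example = {
    "author": [{"family": "Doe", "given": "Jane"}, {"literal": "Example Lab"}],
    "title": "Sample trajectories",
    "issued": {"date-parts": [[2023, 4, 1]]},
    "publisher": "Example Repository",
    "DOI": "10.1234/example.1",
}
print(csl_to_markdown(example))
```

A dedicated formatter such as pybtex handles many more entry types and citation styles; the point here is only that once the metadata is structured, generating Markdown or HTML for a datasets page is a small step.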


A new website arises!

I am happy to announce that I have started the new design for my website! I look forward to showing you the fresh look and improved usability that the new design will bring. Stay tuned for more updates as the new design becomes available.

The first thing I am implementing is the blog. I am experimenting with the options for creating a nice experience (or so I hope, at least). You can leave feedback here (using Disqus) if you have suggestions or comments.