Netatmo: converting CSV data into Line Protocol and importing into InfluxDB

Screenshot of influxdb data explorer

Today, I will continue the articles dedicated to the Netatmo personal weather station and teach you how to import the downloaded data in CSV format into an InfluxDB database. This tutorial assumes that you have downloaded the data as shown in the previous blog post and have the files in CSV format on your computer.

I created a virtual machine to store and analyze the data and installed Ubuntu 23.10 on it. Unfortunately, the latest version of Ubuntu does not work well in the virtual machine, likely due to graphics card driver issues. I configured the virtual machine to use 6 processor cores, 32 GB of RAM, and a 150 GB SSD. This configuration is more than sufficient for my needs.

I then installed InfluxDB version 2.7.6 following the instructions on the InfluxDb website. From the InfluxDB web interface, I created an organization named InfluxDB and a bucket named Netatmo.

I know that version 3.0 of InfluxDB has also been released, but it is unavailable for local installation, and I do not want to use their cloud solution just for a few small experiments. Therefore, I will stay on version 2.7.6 for a while and plan to update the code when the new InfluxDB 3.0 Edge version becomes available.

Next, I installed Anaconda, following the installation guides. For this tutorial, I am using conda 24.1.2. Following the Anaconda tutorials, I created an environment named influxdb and set it to activate automatically. The Python version I am using is 3.11.9.

In the past, I have had compatibility issues with old Python code that no longer works with current modules. It's a royal pain to make such a code work.

To make your life easier if time passes and the software changes too much, I will mention here the modules used by my code and their versions:

  • jupyterlab 4.0.11
  • matplotlib 3.8.4
  • numpy 1.26.4
  • pandas 2.2.2
  • pytz 2024.1
  • scikit-learn 1.4.2
  • scikit-learn-intelex 2023.1.1
  • scipy 1.13.1
  • influxdb-client 1.43.0

After fulfilling these prerequisites, we can convert the CSV files into the Line Protocol format used by InfluxDB.

Converting Netatmo Data to Line Protocol

To better understand the conversion process, I will list below the first ten lines from a CSV file downloaded from Netatmo:

"Living room";26.004168;44.452849;Outdoor;Outdoor
Timestamp;"Timezone : Europe/Bucharest";Temperature;Humidity
1717189436;"2024/06/01 00:03:56";24.1;56
1717189692;"2024/06/01 00:08:12";24.1;56
1717190000;"2024/06/01 00:13:20";24.1;57
1717190308;"2024/06/01 00:18:28";24;56
1717190615;"2024/06/01 00:23:35";24;56
1717190872;"2024/06/01 00:27:52";24;56
1717191179;"2024/06/01 00:32:59";24;56

The first two lines of the file contain metadata. Unfortunately, the "Name" field is populated with the name of the base station, in my case, "Living room," and not with the station's name as it appears in the Netatmo application and website. I want to have the sensor location in InfluxDB as a _measurement field, which is not found in the metadata, as shown in the Line Protocol code below:
Rosu,lat=44.452849,long=26.004168,ModuleType=Outdoor,Timezone=Europe/Bucharest Temperature=24.1,Humidity=56 1717189436
Rosu,lat=44.452849,long=26.004168,ModuleType=Outdoor,Timezone=Europe/Bucharest Temperature=24.1,Humidity=56 1717189692
Rosu,lat=44.452849,long=26.004168,ModuleType=Outdoor,Timezone=Europe/Bucharest Temperature=24.1,Humidity=57 1717190000
Rosu,lat=44.452849,long=26.004168,ModuleType=Outdoor,Timezone=Europe/Bucharest Temperature=24.0,Humidity=56 1717190308
Rosu,lat=44.452849,long=26.004168,ModuleType=Outdoor,Timezone=Europe/Bucharest Temperature=24.0,Humidity=56 1717190615
Rosu,lat=44.452849,long=26.004168,ModuleType=Outdoor,Timezone=Europe/Bucharest Temperature=24.0,Humidity=56 1717190872
Rosu,lat=44.452849,long=26.004168,ModuleType=Outdoor,Timezone=Europe/Bucharest Temperature=24.0,Humidity=56 1717191179
Rosu,lat=44.452849,long=26.004168,ModuleType=Outdoor,Timezone=Europe/Bucharest Temperature=24.0,Humidity=56 1717191538
Rosu,lat=44.452849,long=26.004168,ModuleType=Outdoor,Timezone=Europe/Bucharest Temperature=23.9,Humidity=56 1717191846
Rosu,lat=44.452849,long=26.004168,ModuleType=Outdoor,Timezone=Europe/Bucharest Temperature=23.9,Humidity=56 1717192153

In the above example, "Rosu" is the name of the village where I live. It is close to Bucharest, the capital of Romania, and is more of a commuter suburb than an idyllic village. For privacy reasons, I have modified the sensor's geographical location, placing it in the middle of a nearby forest.

A small note: the outdoor module only records temperature and humidity. Pressure is recorded only by the base station; if we ever need it, it must be imported separately. The data recorded by the base station, including pressure, will have a different timestamp than the data recorded by the outdoor module.

Considering all these aspects, my proposed code does the following:

1. Creates the variable `measurement = Rosu` for the measurement name.
2. Lists all files with the CSV extension in the current folder.
3. For each CSV file:
   - Checks if the first line of the file is `Name;Long;Lat;ModuleName;ModuleType`. We will process only files containing this text on the first line.
   - Extracts timezone, geographical coordinates, and a tag with the sensor name (e.g., Outdoor) from the metadata.
   - Reads each line of code and extracts data regarding temperature, humidity, and timestamp.
   - Takes the first timestamp from the CSV file, converts it to local time, and extracts the month and year. We will use the time and the `_measurement` variable to name the files in Line Protocol format.
   - Creates the destination file in the format `measurement_year_month.lp`.
   - Starting from the fourth line of the CSV file, the data is converted into Line Protocol format and written into the destination file.
   - Closes the files at the end.

Importing Line Protocol Files into InfluxDB

Now that we have the data converted into Line Protocol format, we must import them into InfluxDB. For this, I wrote a small shell script to handle the import:

Copy this code into the folder where the Line Protocol format data is located. Edit the authentication and InfluxDB configuration details, then make it executable and run it with the command ./lp2influxdb.sh.

This script will import all files with the .lp extension into InfluxDB. At the end, all data will be available in InfluxDB, and you can explore it using the InfluxDB web interface.

Importing a Single CSV File

To import a single CSV file, remove the part of the code that lists CSV files and manually specify the file name. The loop iterating through the files is no longer needed.

Below is the modified code:

The shell script that imports data into InfluxDB is also slightly modified to accept the LP file name as an argument:

Copy this code into a file in the location where the Line Protocol format data is stored, edit the authentication and InfluxDB configuration details, then make it executable and run it with the command `./lp2influxdb_single.sh file.lp`.

Now, we have the data into InfluxDb, we can move forward to analyze it, but this will be the subject of a future blog post.

Post a Comment