Data Format Guide

This document explains how to structure chip identification logs for the E-Waste Reverse Engineering Clinic database. Following this format ensures that data from different contributors can be merged and analyzed consistently.

CSV Structure

Save all chip logs as CSV files in the data/ directory, with this header as the first line:

board_name,chip_type,address_or_id,interface,condition,notes

File Naming Convention

Use this pattern: YYYYMMDD_board-description.csv

Example: 20250115_netgear-wnr2000.csv
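A file name that follows this pattern could be generated along these lines. This is a minimal sketch: the helper name make_log_filename and the slug rule (lowercase, with runs of other characters collapsed to a single hyphen) are assumptions for illustration, not something this guide or log_to_csv.py is known to define.

```python
import re
from datetime import date


def make_log_filename(board_description: str, log_date: date) -> str:
    """Build a name of the form YYYYMMDD_board-description.csv."""
    # Assumed slug rule: lowercase, non-alphanumeric runs become one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", board_description.lower()).strip("-")
    return f"{log_date:%Y%m%d}_{slug}.csv"


print(make_log_filename("Netgear WNR2000", date(2025, 1, 15)))
# prints: 20250115_netgear-wnr2000.csv
```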

Column Definitions

board_name (Required)

The make and model of the device or circuit board you are examining.

Format: Manufacturer Model-Number

Examples: Netgear WNR2000, TP-Link TL-WR841N, HP LaserJet 1020

If the manufacturer is unknown, use a descriptive name, for example Generic USB Hub.

chip_type (Required)

The function or category of the chip.

Common types used in this guide's examples: MCU, Memory, USB Controller, Ethernet PHY, Power Management

If you are not sure, use Unknown IC and describe what you can see in the notes field.

address_or_id (Required)

The identifying information for the chip. This varies depending on how you identified it.

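For I²C devices: the bus address in hexadecimal, for example 0x50

For SPI devices: the examples in this guide use the part number, for example W25Q32BV

For chips identified by markings: the part number printed on the package, for example AR9331

For USB devices: the vendor and product IDs, for example VID:05E3 PID:0608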

If you cannot determine any of these, write unknown and describe what you can see in the notes field.
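Because one column holds several kinds of identifier, analysis code usually has to work out which kind it is looking at. The sketch below infers the kind from the value's shape. The patterns are assumptions based only on the example rows in this guide (0x50, VID:05E3 PID:0608, AR9331), and classify_identifier is a hypothetical helper, not part of the repository.

```python
import re

# Assumed shapes, taken from this guide's example rows.
I2C_ADDRESS = re.compile(r"0x[0-9A-Fa-f]{2}")
USB_IDS = re.compile(r"VID:[0-9A-Fa-f]{4} PID:[0-9A-Fa-f]{4}")


def classify_identifier(value: str) -> str:
    """Guess what kind of identifier an address_or_id value holds."""
    value = value.strip()
    if not value:
        return "missing"
    if value.lower() == "unknown":
        return "unknown"
    if I2C_ADDRESS.fullmatch(value):
        return "i2c_address"
    if USB_IDS.fullmatch(value):
        return "usb_vid_pid"
    # Anything else is treated as a part number or package marking.
    return "marking"


for sample in ["0x50", "VID:05E3 PID:0608", "AR9331", "unknown"]:
    print(sample, "->", classify_identifier(sample))
```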

interface (Required)

The communication protocol or interface used by the chip.

Values used in this guide's examples: I2C, SPI, USB, Ethernet, Analog, Unknown

If the chip uses multiple interfaces, list the primary one or the one you used to identify it.

condition (Required)

The working state of the chip or board.

Standard Values: working, dead, unknown

Be honest if you did not test it. unknown is a valid and useful data point.

notes (Optional)

Any additional observations, context, or details that might be useful.

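Include details such as the package type and pin count, visible markings, the chip's location on the board, and its likely function.

Examples: 8-pin SOIC package marked 24C256; 16-pin SOIC with burn marks - no readable markings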

Example Entries

Simple I²C EEPROM

board_name,chip_type,address_or_id,interface,condition,notes
Netgear WNR2000,Memory,0x50,I2C,unknown,8-pin SOIC package marked 24C256

Microcontroller

board_name,chip_type,address_or_id,interface,condition,notes
TP-Link TL-WR841N,MCU,AR9331,SPI,working,Main processor - 400MHz MIPS - QFN package

USB Device

board_name,chip_type,address_or_id,interface,condition,notes
Generic USB Hub,USB Controller,VID:05E3 PID:0608,USB,working,Genesys Logic 4-port hub chip

Unknown Chip

board_name,chip_type,address_or_id,interface,condition,notes
HP Printer Board,Unknown IC,unknown,Unknown,dead,16-pin SOIC with burn marks - no readable markings

Complete Example File

Filename: 20250115_netgear-wnr2000.csv

board_name,chip_type,address_or_id,interface,condition,notes
Netgear WNR2000,MCU,AR9331,SPI,working,Main MIPS processor - 32MB RAM integrated
Netgear WNR2000,Memory,0x50,I2C,unknown,EEPROM 24C256 - 8-pin SOIC - likely stores MAC and config
Netgear WNR2000,Memory,W25Q32BV,SPI,working,32Mb Flash - 8-pin SOIC - stores firmware
Netgear WNR2000,Ethernet PHY,AR8035,Ethernet,working,Single-port Gigabit PHY - 32-pin QFN
Netgear WNR2000,Power Management,AMS1117,Analog,working,3.3V LDO regulator - SOT-223 package

Extended Fields

You can add custom fields for specific use cases, but keep the core fields consistent. Add new columns to the right of the standard ones.

Example with custom field:

board_name,chip_type,address_or_id,interface,condition,notes,salvage_priority
HP LaserJet 1020,Memory,0x50,I2C,working,256Kb EEPROM,high
HP LaserJet 1020,USB Controller,VID:03F0 PID:0517,USB,working,Standard HP interface,low
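A reader can tolerate extended fields by checking only that the header begins with the six core columns in order and treating whatever follows as extras. The sketch below assumes that rule. CORE_FIELDS and extra_columns are illustrative names, not part of the repository.

```python
import csv
import io

CORE_FIELDS = ["board_name", "chip_type", "address_or_id",
               "interface", "condition", "notes"]


def extra_columns(header):
    """Return the custom columns, or raise if the core columns are not first."""
    core = header[:len(CORE_FIELDS)]
    if core != CORE_FIELDS:
        raise ValueError(f"core columns missing or out of order: {core}")
    return header[len(CORE_FIELDS):]


sample = io.StringIO(
    "board_name,chip_type,address_or_id,interface,condition,notes,salvage_priority\n"
    "HP LaserJet 1020,Memory,0x50,I2C,working,256Kb EEPROM,high\n"
)
header = next(csv.reader(sample))
print(extra_columns(header))  # prints ['salvage_priority']
```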

Using the log_to_csv.py Script

The repository includes a helper script to create properly formatted logs.

python3 scripts/log_to_csv.py

The script will prompt you for:

  1. Board name
  2. Number of chips to log
  3. For each chip:
    • Chip type
    • Address or ID
    • Interface
    • Condition
    • Notes

It then writes a date-stamped CSV file to the data/ directory.

Script Example Session

Board name: Netgear WNR2000
How many chips to log? 3

Chip 1 of 3
Chip type: MCU
Address or ID: AR9331
Interface: SPI
Condition (working/dead/unknown): working
Notes: Main MIPS processor

Chip 2 of 3
Chip type: Memory
Address or ID: 0x50
Interface: I2C
Condition (working/dead/unknown): unknown
Notes: EEPROM 24C256

Chip 3 of 3
Chip type: Memory
Address or ID: W25Q32BV
Interface: SPI
Condition (working/dead/unknown): working
Notes: 32Mb Flash

Data saved to: data/20250115_netgear-wnr2000.csv

Data Quality Guidelines

Be Specific

Instead of “chip near the USB port”, write “USB controller chip - GL850G - 28-pin SSOP - 2cm from USB-A connector”.

Be Accurate

If you are not sure, say so. “Possibly an EEPROM based on I2C address 0x50” is better than claiming it is definitely a 24C256.

Be Consistent

Use the same terminology and format as existing logs. This makes the data more useful for analysis.

Be Complete

Fill in all required fields. A partial log with missing data is harder to work with than a complete log that says “unknown” in some fields.
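Completeness is easy to check mechanically. This sketch reports which required columns are blank in a row. REQUIRED mirrors the columns marked "(Required)" in this guide, and missing_required is a hypothetical helper name.

```python
REQUIRED = ("board_name", "chip_type", "address_or_id", "interface", "condition")


def missing_required(row: dict) -> list:
    """Return the names of required columns that are absent or blank."""
    return [name for name in REQUIRED if not (row.get(name) or "").strip()]


complete = {"board_name": "HP Printer Board", "chip_type": "Unknown IC",
            "address_or_id": "unknown", "interface": "Unknown",
            "condition": "dead", "notes": ""}
partial = dict(complete, condition="")

print(missing_required(complete))  # prints []
print(missing_required(partial))   # prints ['condition']
```

A row that says unknown passes, while a blank cell is reported.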

Common Mistakes

Using Quotes in CSV

Do not use quotes unless the field contains a comma, a double quote, or a line break:

Wrong:

"Netgear WNR2000","MCU","AR9331","SPI","working","Main processor"

Right:

Netgear WNR2000,MCU,AR9331,SPI,working,Main processor

Right (with comma in notes):

Netgear WNR2000,MCU,AR9331,SPI,working,"Main processor, 400MHz MIPS"
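If you generate rows from Python, the standard csv module already follows this rule: with its default setting (csv.QUOTE_MINIMAL), csv.writer quotes a field only when it has to. The sketch below writes the two "Right" rows above to an in-memory buffer.

```python
import csv
import io

buffer = io.StringIO()
# lineterminator="\n" is set only to keep the printed output tidy.
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Netgear WNR2000", "MCU", "AR9331", "SPI", "working",
                 "Main processor"])
writer.writerow(["Netgear WNR2000", "MCU", "AR9331", "SPI", "working",
                 "Main processor, 400MHz MIPS"])

print(buffer.getvalue(), end="")
# Netgear WNR2000,MCU,AR9331,SPI,working,Main processor
# Netgear WNR2000,MCU,AR9331,SPI,working,"Main processor, 400MHz MIPS"
```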

Inconsistent Terminology

Pick one term and stick with it. For example, the data rows in this guide always write I2C in the interface column, never I²C.
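When merging logs from several contributors, spelling variants can be folded into one canonical term before analysis. The alias table below is purely illustrative: the variants listed are assumptions about what contributors might type, not an official vocabulary for this database.

```python
# Hypothetical alias table: lowercase variant -> canonical spelling.
INTERFACE_ALIASES = {
    "i2c": "I2C",
    "i²c": "I2C",
    "iic": "I2C",
    "spi": "SPI",
    "usb": "USB",
}


def normalize_interface(value: str) -> str:
    """Map known spelling variants to one term; leave other values unchanged."""
    cleaned = value.strip()
    return INTERFACE_ALIASES.get(cleaned.lower(), cleaned)


print(normalize_interface("I²C"))       # prints I2C
print(normalize_interface(" spi "))     # prints SPI
print(normalize_interface("Ethernet"))  # prints Ethernet
```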

Missing Notes

The notes field is optional but incredibly valuable. Even a simple note like “8-pin package” helps the next person identify a similar chip faster.

Submitting Your Data

Once you have created a CSV file:

  1. Test that it loads without errors:
    python3 -c "import csv; list(csv.DictReader(open('data/your-file.csv')))"
    
  2. Add it to the repository:
    git add data/your-file.csv
    git commit -m "Add chip identification log for [board name]"
    git push
    
  3. Open a pull request on GitHub

See CONTRIBUTING.md for complete instructions on submitting data.
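The one-line load test in step 1 only catches files that cannot be parsed at all. csv.DictReader does not raise when a row has too many or too few fields: it stores extra values under the key None and fills missing ones with None. A slightly stricter check compares each row's field count with the header. The sketch below assumes that is the property you want to verify, and find_ragged_rows is a hypothetical helper, not part of the repository.

```python
import csv
import io


def find_ragged_rows(lines):
    """Return (line_number, field_count) for rows that do not match the header."""
    reader = csv.reader(lines)
    header = next(reader)
    problems = []
    for row in reader:
        # Skip completely blank lines; flag everything else that is off.
        if row and len(row) != len(header):
            problems.append((reader.line_num, len(row)))
    return problems


sample = io.StringIO(
    "board_name,chip_type,address_or_id,interface,condition,notes\n"
    "Netgear WNR2000,Memory,0x50,I2C,unknown,EEPROM 24C256\n"
    # The unquoted comma in the notes adds a seventh field to this row.
    "Netgear WNR2000,MCU,AR9331,SPI,working,Main processor, 400MHz MIPS\n"
)
print(find_ragged_rows(sample))  # prints [(3, 7)]
```

To check a real file, pass an open file object instead of the in-memory sample, for example open('data/your-file.csv', newline='').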

Analyzing the Data

The CSV format makes it easy to analyze findings with standard tools.

Count chips by type

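awk -F',' 'FNR > 1 { print $2 }' data/*.csv | sort | uniq -c

The FNR > 1 condition skips the header row of each file so that chip_type itself is not counted as a type.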

Find all I²C devices

grep ",I2C," data/*.csv

Load into Python

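import csv
import glob
from collections import Counter

# Read every log in data/ into one list of dicts keyed by column name
chips = []
for filename in sorted(glob.glob('data/*.csv')):
    # newline='' lets the csv module handle line endings itself
    with open(filename, newline='') as f:
        chips.extend(csv.DictReader(f))

# Count by chip type
print(Counter(chip['chip_type'] for chip in chips))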
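Once rows are loaded as dicts, you can filter on a column's value rather than on raw text, which avoids false matches inside the notes field. As a sketch, the snippet below groups boards by I²C address. It reads an in-memory sample built from rows in this guide so that it runs without the data/ directory.

```python
import csv
import io
from collections import defaultdict

sample = io.StringIO(
    "board_name,chip_type,address_or_id,interface,condition,notes\n"
    "Netgear WNR2000,Memory,0x50,I2C,unknown,EEPROM 24C256\n"
    "HP LaserJet 1020,Memory,0x50,I2C,working,256Kb EEPROM\n"
    "TP-Link TL-WR841N,MCU,AR9331,SPI,working,Main processor\n"
)

# Map each I2C address to the set of boards it was seen on.
boards_by_address = defaultdict(set)
for chip in csv.DictReader(sample):
    if chip["interface"] == "I2C":
        boards_by_address[chip["address_or_id"]].add(chip["board_name"])

for address, boards in sorted(boards_by_address.items()):
    print(address, sorted(boards))
# prints: 0x50 ['HP LaserJet 1020', 'Netgear WNR2000']
```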

Load into a spreadsheet

Open any CSV file in LibreOffice Calc, Excel, or Google Sheets. Use the comma as the delimiter.

Questions?

If you are not sure how to log something or have a case that does not fit this format, open an issue on GitHub or ask at a workshop. We can update this guide based on real-world usage.

The goal is useful data, not perfect data. Do your best and document what you find.
