Dataset — Industrial3D

Semantic Classes

12-Class MEP Taxonomy

Classes are organized by frequency tier. Tier 1 accounts for 77% of labeled points; Tier 3 spans 7 classes that together total only 3%.

Tier 1 — Head ≈ 77% of all points

Pipe / Pip

Cylindrical conduits for fluid transport across all diameters.

Rectangular Beam / RBeam

Rectangular structural steel support members and cable trays.

Duct / Dct

Rectangular air-handling conduits for HVAC and ventilation.

Tier 2 — Common ≈ 20% of all points

I-Beam / IBeam

I-shaped structural steel beams for primary structural support.

Tank / Tnk

Large cylindrical or rectangular water/chemical storage vessels.

Tier 3 — Tail ≈ 3% of all points · 7 classes · 215:1 imbalance vs. RBeam

Flange / Flg

Circular disc connectors joining pipe sections at bolted joints.

Valve / Val

Flow-control devices (gate, ball, butterfly) on pipelines.

Pump / Pmp

Mechanical devices for moving fluid through the system.

Strainer / Str

In-line filtration devices preventing debris from entering equipment.

Elbow / Elb

45° and 90° curved pipe connectors for directional changes.

Tee / Tee

T-shaped junctions splitting or combining pipe flows.

Reducer / Rdr

Concentric or eccentric fittings for pipe diameter transitions.

All 12 semantic classes illustrated with representative point cloud examples at 6 mm TLS resolution.

Statistics

Distribution & Scale

Long-tail class distribution with a 215:1 head-to-tail ratio — 3.5× more severe than S3DIS.

Long-tail distribution of 612.7M labeled points across 3 tiers. The 215:1 head:tail ratio drives the dual crisis.
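A head-to-tail ratio of this kind can be checked directly from a label array. The sketch below is a minimal illustration, assuming the ratio is defined as the point count of the most frequent class divided by that of the least frequent non-empty class (the page does not spell out the definition). The function names and the toy labels are hypothetical and are not Industrial3D data.

```python
import numpy as np

NUM_CLASSES = 12  # class IDs 0-11, as stated under Data Formats


def class_distribution(labels, num_classes=NUM_CLASSES):
    """Return per-class point counts and their fractions of the total."""
    counts = np.bincount(labels, minlength=num_classes)
    fractions = counts / counts.sum()
    return counts, fractions


def imbalance_ratio(counts):
    """Largest class count divided by the smallest non-empty class count.

    Assumption: this is how a head-to-tail ratio such as 215:1 is measured.
    """
    present = counts[counts > 0]
    return float(present.max() / present.min())


# Toy labels only (hypothetical, chosen to reproduce a 215:1 ratio).
toy_labels = np.array([4] * 430 + [6] * 300 + [7] * 2)
counts, fractions = class_distribution(toy_labels)
ratio = imbalance_ratio(counts)
```

The same per-class counts are a common starting point for class-weighted losses or resampling, although the page does not say which, if any, the benchmark uses.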

Dataset At a Glance

Total labeled points	612.7 M
Raw scan points	2.3+ B
TLS resolution	6 mm @ 10 m
Ranging accuracy	4 mm @ 10 m
Point attributes	XYZ + RGB + I
Annotation effort	754 person-hours
Class imbalance ratio	215 : 1
Scale vs. closest MEP	6.6× larger

Acquisition

Captured using a Leica BLK360 terrestrial laser scanner in operational water treatment facilities in Hong Kong. Each room was scanned from multiple stations (2–8 per room) to minimise occlusion. Raw scans were registered using Leica Cyclone, then merged and manually annotated in CloudCompare.

Scene Gallery

Representative Scenes

20 unique rooms across 13 areas. Four representative scenes are shown with ground-truth annotations.

Scene gallery: 4 representative rooms showing annotated point clouds from training and test areas.

TRAIN

Area 2 — Service Gallery

Largest scene at 79.6M points. Highest MEP density with dense pipe networks, cable trays, and structural steelwork.

TEST

Area 12 — SPH Pump Room

Primary test scene. Compact equipment room with pump arrays, valves, and dense pipe connections — representative of test-set difficulty.

TEST

Area 6-1 — 93m Psu Room

Moderate complexity with large pump sets and parallel pipe runs. Test set area with both head and tail class instances.

TRAIN

Area 3 — 93m Tank Area

Large-scale tank structure showing geometric diversity between tanks, pipe bundles, and structural steelwork at facility scale.

Additional scenes from the dataset showing the diversity of industrial environments across the 13 areas.

Data Splits

Train / Validation / Test Protocol

Area-based splits following S3DIS convention to prevent data leakage between rooms of the same facility.

Split	Areas	Points	% Total	Description
Train	1, 2, 3, 4, 5, 7, 8, 10, 11, 13 (10 areas)	527.8 M	86.1%	Primary training set. Includes the largest area (Area 2, 79.6M pts).
Val	9 (OSCG) (1 area)	15.1 M	2.5%	Validation area used for hyperparameter tuning.
Test	6 (93m Psu/VSPA-1) + 12 (SPH) (2 areas)	84.9 M	13.9%	Held-out test set. Evaluated via vote-based smooth testing (test_smooth = 0.95).
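The area-based protocol in the table can be written down as a small lookup. This is a sketch rather than the repository's actual config: the `SPLITS` dictionary and the `split_of` helper are hypothetical names, and it assumes that sub-areas such as 6-1 are keyed by their main area number.

```python
# Area IDs per split, copied from the table above.
SPLITS = {
    "train": (1, 2, 3, 4, 5, 7, 8, 10, 11, 13),
    "val": (9,),
    "test": (6, 12),
}


def split_of(area):
    """Return the split name for an area number (sub-areas use the main number)."""
    for name, areas in SPLITS.items():
        if area in areas:
            return name
    raise KeyError(f"unknown area: {area}")


# Every one of the 13 areas should belong to exactly one split.
all_areas = sorted(a for areas in SPLITS.values() for a in areas)
```

Assigning whole areas to a split, rather than sampling points or rooms at random, is what keeps scans of the same space from appearing on both sides of the train/test boundary.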

Evaluation Metric

Mean Intersection over Union (mIoU) across all 12 classes. Vote-based evaluation with test_smooth = 0.95 is the standard protocol. Per-class IoU is reported for all 12 classes to expose failure modes on tail classes (Reducer, Strainer, Pump, etc.).
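The metric and the voting step can be sketched in a few lines of NumPy. The page does not define test_smooth, so this sketch assumes it is the exponential-moving-average factor used for per-point probability voting in KPConv- and RandLA-Net-style test loops. All function names are hypothetical, and skipping classes that are absent from both prediction and ground truth when averaging is likewise an assumption.

```python
import numpy as np

NUM_CLASSES = 12
TEST_SMOOTH = 0.95  # value quoted on this page; its exact role is assumed here


def update_votes(votes, point_idx, probs, smooth=TEST_SMOOTH):
    """Blend new softmax outputs into the running per-point probabilities.

    votes: (N, C) accumulated probabilities; probs: (M, C) for points point_idx.
    """
    votes[point_idx] = smooth * votes[point_idx] + (1.0 - smooth) * probs
    return votes


def confusion_matrix(gt, pred, num_classes=NUM_CLASSES):
    """Rows are ground-truth classes, columns are predicted classes."""
    idx = gt * num_classes + pred
    flat = np.bincount(idx, minlength=num_classes ** 2)
    return flat.reshape(num_classes, num_classes)


def per_class_iou(conf):
    """IoU = TP / (TP + FP + FN) per class; NaN where a class never occurs."""
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp
    denom = tp + fp + fn
    iou = np.full(conf.shape[0], np.nan)
    valid = denom > 0
    iou[valid] = tp[valid] / denom[valid]
    return iou


def mean_iou(conf):
    """Mean of per-class IoU, ignoring classes with no points at all."""
    return float(np.nanmean(per_class_iou(conf)))


# Toy two-class example (not benchmark results).
gt = np.array([0, 0, 1, 1])
pred = np.array([0, 1, 1, 1])
toy_conf = confusion_matrix(gt, pred, num_classes=2)
toy_miou = mean_iou(toy_conf)
```

In a voting loop of this style, the final label for each point would be the argmax of its accumulated row in `votes` after several passes over the test scene. Reporting `per_class_iou` alongside the mean is what exposes weak tail classes that a single mIoU figure can hide.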

Download

Access the Dataset

Paper (arXiv)

Read the full paper including dataset construction protocol, annotation methodology, benchmark results, and analysis.

arXiv:2603.28660

GitHub Repository

Benchmark code, PyTorch implementations, training configs, and evaluation scripts. Pre-trained models released upon acceptance.

github.com/PointCloudYC/Industrial3D

Preview Materials

RGB + ground-truth videos and renders for 4 representative rooms. Available on Google Drive now.

Google Drive →

Full Dataset

20 room scenes, 612.7M labeled points in .txt + .ply format (XYZRGB + class labels). Released upon journal acceptance.

Coming Soon

Data Formats

The dataset will be provided in two formats:

.txt — space-delimited: X Y Z R G B label per point
.ply — binary PLY with vertex attributes: x y z red green blue label

Labels are integer class IDs (0–11) matching the order: Duct, Elbow, Flange, IBeam, Pipe, Pump, RBeam, Reducer, Strainer, Tank, Tee, Valve.
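Since the full dataset is not yet released, the following is only a sketch of how the .txt layout described above could be read. It assumes one point per line with exactly seven space-delimited columns, RGB stored as 0-255 integers, and no header row. `load_txt` and `CLASS_NAMES` are hypothetical names, and the two sample points are made up.

```python
import io

import numpy as np

# Class order as listed above; the index in this tuple is the label ID.
CLASS_NAMES = (
    "Duct", "Elbow", "Flange", "IBeam", "Pipe", "Pump",
    "RBeam", "Reducer", "Strainer", "Tank", "Tee", "Valve",
)


def load_txt(source):
    """Read 'X Y Z R G B label' rows from a path or file-like object."""
    data = np.loadtxt(source, ndmin=2)
    if data.shape[1] != 7:
        raise ValueError(f"expected 7 columns, got {data.shape[1]}")
    xyz = data[:, 0:3].astype(np.float64)
    rgb = data[:, 3:6].astype(np.uint8)  # assumes 0-255 colour values
    labels = data[:, 6].astype(np.int64)
    if labels.min() < 0 or labels.max() >= len(CLASS_NAMES):
        raise ValueError("label outside the 0-11 range")
    return xyz, rgb, labels


# Made-up sample in the documented column order.
sample = io.StringIO(
    "1.25 0.50 2.00 200 180 160 4\n"
    "1.30 0.52 2.01 90 90 95 11\n"
)
xyz, rgb, labels = load_txt(sample)
names = [CLASS_NAMES[i] for i in labels]
```

For scenes with tens of millions of points, parsing text this way will be slow; the binary .ply variant is likely the more practical choice for training pipelines.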
