Lesson 5.4: Communication Design Patterns

You now know how to build topics (continuous streams), services (request/response), and custom interfaces (type-safe communication). But how do you put them together into a real system?

In this lesson, you'll learn communication architecture patterns: how real robot systems coordinate multiple nodes, each with a specific job. The same patterns apply whether you're coordinating a rover's subsystems, routing sensor data in a warehouse robot, or orchestrating a humanoid's body control.

The key insight: Good architecture is invisible. You don't notice it until something breaks.

The Decision Framework: Topics vs Services (Revisited)

Let's formalize the decision you've been making intuitively:

START: Do we need to send data?
  └─ YES
       └─ Does the receiver need to respond/confirm?
            ├─ NO  →  Use TOPIC (fire and forget)
            └─ YES
                 └─ Does the reply come back quickly (no long wait, no progress updates)?
                      ├─ YES  →  Use SERVICE (request/response)
                      └─ NO   →  Use ACTION (long-running, with feedback; Module 2)
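The same tree can be written as a small function, which is handy when you want to record a design decision in code or in a review. This is a plain-Python sketch, and Pattern and choose_pattern are illustrative names, not part of any ROS 2 API. It reads the tree's last question as "is this a long-running task the caller can't afford to wait on?", which is the usual reason to reach for an action.

```python
from enum import Enum


class Pattern(Enum):
    TOPIC = "topic"
    SERVICE = "service"
    ACTION = "action"


def choose_pattern(needs_response: bool, long_running: bool) -> Pattern:
    """Walk the decision tree for one piece of communication.

    needs_response: must the receiver reply or confirm?
    long_running:   would the caller wait seconds or more, or does it
                    need progress feedback / cancellation?
    """
    if not needs_response:
        return Pattern.TOPIC      # fire and forget
    if not long_running:
        return Pattern.SERVICE    # quick request/response
    return Pattern.ACTION         # long task with feedback


if __name__ == "__main__":
    print(choose_pattern(needs_response=False, long_running=False))  # camera frames
    print(choose_pattern(needs_response=True, long_running=False))   # enable/disable
    print(choose_pattern(needs_response=True, long_running=True))    # drive 1 meter
```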

Real examples:

Scenario	Pattern	Why
Camera publishes images 30x/second	Topic	Continuous, no response needed
Battery publishes status every 2 sec	Topic	Continuous stream
Move robot forward 1 meter	Action (Module 2)	Takes seconds; needs a final result and progress feedback
Enable/disable robot	Service	Command with response, infrequent
Publish odometry (position updates)	Topic	Continuous, multiple subscribers
Get current battery level	Service	Query response, on-demand

Quick rules of thumb for robotics:

Sensor reading? → TOPIC
Discrete movement goal? → ACTION (SERVICE only if it completes right away)
Emergency stop? → TOPIC (dedicated and reliable; no request/response round trip)
Telemetry data? → TOPIC
Configuration change? → SERVICE
Motor speed updates? → TOPIC
Gripper open/close? → SERVICE

Architectural Pattern 1: Layered Communication (Hub and Spoke)

Many robots have a central coordinator that talks to specialized modules.

                     ┌──────────────────┐
                     │ Main Coordinator │
                     └────────┬─────────┘
                              │  ▲ status: modules publish topics, coordinator subscribes
                              │  ▼ commands: coordinator calls each module's services
            ┌───────────┬─────┴─────┬───────────┐
            │           │           │           │
        ┌───▼───┐   ┌───▼───┐   ┌───▼───┐   ┌───▼───┐
        │ Motor │   │  Arm  │   │ Head  │   │ Legs  │
        │ Ctrl  │   │ Ctrl  │   │ Ctrl  │   │ Ctrl  │
        └───────┘   └───────┘   └───────┘   └───────┘

Pattern:

Central node subscribes to status from all modules
Central node sends commands to specific modules via services
Modules continuously publish sensor data via topics
Emergency stops use a dedicated, high-priority topic

Advantage: Coordinator has complete picture. Modules are independent.

Disadvantage: The central node becomes a bottleneck for high-frequency updates, and a single point of failure.
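To make the two directions of data flow concrete, here is a framework-free Python sketch of the hub-and-spoke pattern. Plain method calls stand in for ROS 2 topics and services, and the names (Module, Coordinator, on_status, set_enabled) are hypothetical: modules push status to the coordinator (the topic direction), and the coordinator sends a command to one module and gets a reply back (the service direction).

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """A specialized controller (motor, arm, head, legs)."""
    name: str
    enabled: bool = True

    def status(self) -> dict:
        # What the module would publish on its status topic
        return {"module": self.name, "enabled": self.enabled}

    def set_enabled(self, enabled: bool) -> tuple[bool, str]:
        # Stand-in for a service handler: act, then confirm
        self.enabled = enabled
        return True, f"{self.name} {'enabled' if enabled else 'disabled'}"


@dataclass
class Coordinator:
    """Central node: keeps the latest status of every module."""
    modules: dict[str, Module]
    latest_status: dict[str, dict] = field(default_factory=dict)

    def on_status(self, msg: dict) -> None:
        # Subscription callback: one entry per module, newest wins
        self.latest_status[msg["module"]] = msg

    def command(self, name: str, enabled: bool) -> tuple[bool, str]:
        # "Service call" to exactly one module, with a reply
        if name not in self.modules:
            return False, f"unknown module {name}"
        return self.modules[name].set_enabled(enabled)


if __name__ == "__main__":
    mods = {n: Module(n) for n in ("motor", "arm", "head", "legs")}
    hub = Coordinator(mods)
    for m in mods.values():              # modules publish status
        hub.on_status(m.status())
    print(hub.command("arm", False))     # coordinator commands one module
    hub.on_status(mods["arm"].status())  # arm publishes its new state
    print(hub.latest_status["arm"])
```

Because every status update and every command passes through one object, the coordinator always has the full picture, which is exactly why it becomes the bottleneck as update rates grow.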

Architectural Pattern 2: Distributed Pub/Sub (Many-to-Many)

When you don't need a central coordinator:

┌─────────────┐     ┌───────────────┐     ┌───────────────┐
│   Sensors   │     │   Planning    │     │   Actuators   │
│             │     │               │     │               │
│ Publishes:  │     │ Subscribes:   │     │               │
│ - camera    │────▶│ - camera      │     │               │
│ - lidar     │────▶│ - lidar       │     │               │
│ - imu       │     │               │     │               │
│             │     │ Publishes:    │     │ Subscribes:   │
│             │     │ - motion_cmd  │────▶│ - motion_cmd  │
│             │     │ - gesture     │     │               │
│             │     │               │     │ Publishes:    │
│             │     │               │     │ - motor_state │
└─────────────┘     └───────────────┘     └───────────────┘

Pattern:

Each node publishes its output data
Each node subscribes to the data it needs
No central coordinator
Nodes are loosely coupled

Advantage: Scales well; nodes can join or leave at any time; no central bottleneck.

Disadvantage: Harder to debug, because the data flow is spread across many implicit connections.
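A minimal in-process bus shows why this pattern decouples nodes: publishers only know a topic name, never who is listening. This is a plain-Python sketch that assumes callbacks run synchronously; in ROS 2 the middleware does this matching for you across processes and machines, and TopicBus is a hypothetical name.

```python
from collections import defaultdict
from typing import Any, Callable


class TopicBus:
    """Many-to-many message routing keyed by topic name."""

    def __init__(self) -> None:
        self._subs: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self._subs[topic].append(callback)

    def publish(self, topic: str, msg: Any) -> int:
        # Deliver to every current subscriber; return how many got it
        callbacks = list(self._subs.get(topic, []))
        for cb in callbacks:
            cb(msg)
        return len(callbacks)


if __name__ == "__main__":
    bus = TopicBus()
    planner_inbox, logger_inbox = [], []

    # Planning and a late-joining logger both consume lidar data;
    # the publishing side never changes when the logger appears.
    bus.subscribe("lidar", planner_inbox.append)
    bus.publish("lidar", {"range_m": 2.4})
    bus.subscribe("lidar", logger_inbox.append)
    bus.publish("lidar", {"range_m": 2.1})

    print(planner_inbox)  # both readings
    print(logger_inbox)   # only the reading published after it joined
```

The same property that makes this easy to extend makes it hard to debug: nothing in the publisher tells you who consumes its data, which is why graph tools such as rqt_graph matter in this pattern.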

Architectural Pattern 3: Request/Response Clusters

Use services for discrete tasks:

┌──────────────────────────────────────┐
│          Robot Main Loop             │
│                                      │
│  1. Call /get_sensor_reading         │
│  2. Call /plan_motion                │
│  3. Call /execute_command            │
│  4. Call /report_status              │
└─────┬────────────┬────────────┬──────┘
      │            │            │
      │            │            │
   SERVICE      SERVICE      SERVICE
      │            │            │
 ┌────▼────┐  ┌────▼────┐  ┌────▼────┐
 │ Sensor  │  │ Planner │  │  Motor  │
 │ Server  │  │ Server  │  │ Server  │
 └─────────┘  └─────────┘  └─────────┘

Pattern:

Main loop orchestrates via service calls
Services execute discrete tasks
Synchronous, deterministic

Advantage: Clear control flow, easy to sequence operations.

Disadvantage: Slower, because each step waits for the previous response, so the loop's latency is the sum of all its calls.
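The main loop in this pattern is just a fixed sequence of blocking calls. The plain-Python sketch below uses a function as a stand-in for each service (the step names mirror the diagram; make_service, run_main_loop, and the latencies are made up for illustration) to show both properties listed above: the order is explicit and deterministic, and the loop's latency is the sum of every call.

```python
import time
from typing import Callable


def make_service(latency_s: float, result: object) -> Callable[[], object]:
    """Hypothetical stand-in for a service: waits, then replies."""
    def call() -> object:
        time.sleep(latency_s)
        return result
    return call


def run_main_loop(
    steps: list[tuple[str, Callable[[], object]]],
) -> tuple[list[str], float]:
    """Call each service in order; stop at the first failed reply (None)."""
    start = time.perf_counter()
    completed = []
    for name, service in steps:
        if service() is None:   # each call blocks until it replies
            break               # later steps never run
        completed.append(name)
    return completed, time.perf_counter() - start


if __name__ == "__main__":
    steps = [
        ("/get_sensor_reading", make_service(0.02, 25.3)),
        ("/plan_motion", make_service(0.05, "path")),
        ("/execute_command", make_service(0.03, True)),
        ("/report_status", make_service(0.01, "ok")),
    ]
    done, elapsed = run_main_loop(steps)
    print(done, f"{elapsed * 1000:.0f} ms")  # roughly 20 + 50 + 30 + 10 ms
```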

Real Example: Three-Node Communication

Let's build a complete system with three nodes:

Sensor Node: Publishes sensor readings and responds to queries
Control Node: Subscribes to sensor data and commands the motor
Motor Service Node: Executes motor speed commands and confirms them

Interface Package (Shared)

Create interfaces (my_robot_interfaces):

msg/SensorReading.msg:

builtin_interfaces/Time timestamp
string sensor_name
float64 value

srv/GetSensorData.srv:

string sensor_id
---
float64 latest_value
string status

srv/SetMotorSpeed.srv:

string motor_id
float64 speed_rpm
---
bool success
string message

Sensor Node

import rclpy
from rclpy.node import Node
from my_robot_interfaces.msg import SensorReading
from my_robot_interfaces.srv import GetSensorData

class SensorNode(Node):
    def __init__(self):
        super().__init__('sensor_node')

        # Publish continuous sensor data
        self.publisher_ = self.create_publisher(
            SensorReading, 'sensor_data', 10)

        # Respond to queries
        self.service = self.create_service(
            GetSensorData,
            'query_sensor',
            self.query_callback)

        # Simulate sensor data
        self.temperature = 25.0
        self.timer = self.create_timer(1.0, self.publish_sensor)

    def publish_sensor(self):
        """Publish sensor readings continuously."""
        msg = SensorReading()
        msg.timestamp = self.get_clock().now().to_msg()  # Current ROS time
        msg.sensor_name = 'temperature'
        self.temperature += 0.1  # Simulate gradual increase
        msg.value = self.temperature

        self.publisher_.publish(msg)
        self.get_logger().info(f'Published: temperature={msg.value:.1f}')

    def query_callback(self, request, response):
        """Respond to sensor queries."""
        self.get_logger().info(f'Query for sensor: {request.sensor_id}')

        if request.sensor_id == 'temperature':
            response.latest_value = self.temperature
            response.status = 'OK'
        else:
            response.latest_value = 0.0
            response.status = 'UNKNOWN_SENSOR'

        return response

def main(args=None):
    rclpy.init(args=args)
    node = SensorNode()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()

if __name__ == '__main__':
    main()

Control Node

import rclpy
from rclpy.node import Node
from my_robot_interfaces.msg import SensorReading
from my_robot_interfaces.srv import GetSensorData, SetMotorSpeed

class ControlNode(Node):
    def __init__(self):
        super().__init__('control_node')

        # Subscribe to continuous sensor stream
        self.subscription = self.create_subscription(
            SensorReading,
            'sensor_data',
            self.sensor_callback,
            10)

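        # Clients for on-demand requests. sensor_client is ready for
        # ad-hoc queries; this example's loop only uses motor_client.
        self.sensor_client = self.create_client(
            GetSensorData, 'query_sensor')
        self.motor_client = self.create_client(
            SetMotorSpeed, 'set_motor')

        # Latest value from the stream (updated by sensor_callback)
        self.latest_temp = 0.0

        # Control loop runs every 2 seconds
        self.timer = self.create_timer(2.0, self.control_loop)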

    def sensor_callback(self, msg):
        """Handle streaming sensor data."""
        self.latest_temp = msg.value
        self.get_logger().debug(
            f'Received {msg.sensor_name}: {msg.value:.1f}')

    def control_loop(self):
        """Main control logic every 2 seconds."""
        self.get_logger().info(
            f'Control loop: latest temp = {self.latest_temp:.1f}')

        # Decision: if temp high, slow motor
        if self.latest_temp > 30.0:
            self.command_motor('motor_0', 50.0)  # Reduce to 50 RPM
        else:
            self.command_motor('motor_0', 100.0)  # Normal speed

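    def command_motor(self, motor_id, speed_rpm):
        """Send motor command via service without blocking the loop."""
        # service_is_ready() returns immediately; wait_for_service() would
        # stall this timer callback (and the executor) while it waits
        if not self.motor_client.service_is_ready():
            self.get_logger().error('Motor service unavailable')
            return

        request = SetMotorSpeed.Request()
        request.motor_id = motor_id
        request.speed_rpm = speed_rpm

        future = self.motor_client.call_async(request)
        future.add_done_callback(self.motor_response_callback)

    def motor_response_callback(self, future):
        """Handle motor command response."""
        try:
            response = future.result()
        except Exception as e:
            self.get_logger().error(f'Motor call failed: {e}')
            return
        if response is None or not response.success:
            self.get_logger().warning('Motor command was not applied')
            return
        self.get_logger().info(f'Motor response: {response.message}')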

def main(args=None):
    rclpy.init(args=args)
    node = ControlNode()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()

if __name__ == '__main__':
    main()

Motor Service Node

import rclpy
from rclpy.node import Node
from my_robot_interfaces.srv import SetMotorSpeed

class MotorNode(Node):
    def __init__(self):
        super().__init__('motor_node')

        self.service = self.create_service(
            SetMotorSpeed,
            'set_motor',
            self.motor_callback)

        self.motor_speeds = {}
        self.get_logger().info('Motor service ready')

    def motor_callback(self, request, response):
        """Execute motor commands."""
        self.motor_speeds[request.motor_id] = request.speed_rpm

        self.get_logger().info(
            f'Set {request.motor_id} to {request.speed_rpm} RPM')

        response.success = True
        response.message = f'Motor set to {request.speed_rpm} RPM'

        return response

def main(args=None):
    rclpy.init(args=args)
    node = MotorNode()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()

if __name__ == '__main__':
    main()

Testing

Build the workspace with colcon build, then run source install/setup.bash in each new terminal.

Terminal 1: Sensor node

ros2 run my_first_package sensor_node

Terminal 2: Motor node

ros2 run my_first_package motor_node

Terminal 3: Control node

ros2 run my_first_package control_node

You'll see:

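Sensor node publishes a temperature once per second, starting at 25.0 and rising by 0.1 each time
Control node checks the latest value every 2 seconds and requests 100 RPM, switching to 50 RPM once the temperature exceeds 30.0 (after roughly 50 seconds)
Motor node applies each command and replies with a confirmation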
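ros2 run can only find sensor_node, motor_node, and control_node if the package registers them as executables. Assuming my_first_package is an ament_python package and each node lives in its own module (sensor_node.py, motor_node.py, and control_node.py are assumed file names, not given above), the entry_points argument passed to setup() in setup.py could look like this sketch. The package.xml would also need a <depend>my_robot_interfaces</depend> line so the generated message and service types are available.

```python
# Sketch of the entry_points argument for setup() in a hypothetical
# ament_python setup.py; module paths assume one file per node.
entry_points = {
    'console_scripts': [
        # executable name = package.module:function
        'sensor_node = my_first_package.sensor_node:main',
        'motor_node = my_first_package.motor_node:main',
        'control_node = my_first_package.control_node:main',
    ],
}
```

After editing setup.py, rebuild with colcon build and re-source install/setup.bash so the new executables are found.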

Debugging Communication Issues

Problem 1: "Service Not Available"

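# Symptom: wait_for_service() returns False, or a call_async() future never completes
# Cause: The service's node isn't running, or the service name doesn't match
# Fix: Check the node is launched and the name matches what the client uses
ros2 node list
ros2 service list
# Should see your service (e.g. /set_motor)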

Problem 2: "Message Type Mismatch"

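# Symptom (same node): publish() raises a TypeError
# Symptom (across nodes): the subscriber silently receives nothing
# Cause: Publisher and subscriber use different types for the same topic
msg = String()         # WRONG: /sensor_data carries SensorReading
msg = SensorReading()  # CORRECT

# Fix: Verify publisher and subscriber types match
ros2 topic info /sensor_data --verbose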

Problem 3: "Slow Response"

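# Symptom: Service takes 2+ seconds, and (with the default single-threaded
#          executor) the node's other callbacks stall too
# Cause: Callback does heavy computation on the executor's thread
def callback(self, request, response):
    response.value = expensive_calculation()  # Blocks!
    return response

# Fix: Do the expensive work ahead of time, off the executor thread,
# and have the callback return the latest cached result
def callback(self, request, response):
    response.value = self.cached_value  # Returns immediately
    return response
# For work that must run per request and takes seconds, use an action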
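One way to keep a service handler fast, sketched in plain Python under the assumption that a slightly stale value is acceptable: a background thread does the slow work and the handler only reads the latest result. CachedValue and expensive_calculation are hypothetical names. Inside a ROS 2 node the same split applies, as long as the slow work does not run on the executor thread that serves requests.

```python
import threading
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class CachedValue(Generic[T]):
    """Refresh an expensive value in a background thread.

    get() only reads the latest result, so a service handler that calls
    it returns immediately instead of waiting on the computation.
    """

    def __init__(self, compute: Callable[[], T], period_s: float) -> None:
        self._compute = compute
        self._period_s = period_s
        self._lock = threading.Lock()
        self._value: Optional[T] = None
        self._ready = threading.Event()   # set after the first result
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        while not self._stop.is_set():
            value = self._compute()        # slow work, off the handler's path
            with self._lock:
                self._value = value        # quick swap under the lock
            self._ready.set()
            self._stop.wait(self._period_s)

    def wait_ready(self, timeout_s: float) -> bool:
        return self._ready.wait(timeout_s)

    def get(self) -> Optional[T]:
        with self._lock:
            return self._value             # None until the first result

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()


def expensive_calculation() -> float:
    time.sleep(0.2)                        # hypothetical slow computation
    return 42.0


if __name__ == "__main__":
    cache = CachedValue(expensive_calculation, period_s=1.0)
    cache.wait_ready(timeout_s=2.0)
    start = time.perf_counter()
    value = cache.get()                    # what the handler would do
    print(value, f"read in {(time.perf_counter() - start) * 1000:.2f} ms")
    cache.stop()
```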

Problem 4: "Dead Nodes"

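# Symptom: One node crashes, and the rest of the system hangs
# Cause: Other nodes wait forever for a reply that never arrives
# Fix: Always use timeouts
if not cli.wait_for_service(timeout_sec=2.0):
    node.get_logger().error('Service unavailable')

# From main() (not inside a callback), bound the wait for the reply too
future = cli.call_async(request)
rclpy.spin_until_future_complete(node, future, timeout_sec=2.0)
if not future.done():
    node.get_logger().error('No reply within 2 s; using fallback')
else:
    try:
        response = future.result()
    except Exception:
        node.get_logger().error('Service failed')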
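The same rule, never wait forever and know what to do instead, can be shown without ROS 2. In this plain-Python sketch, a concurrent.futures thread pool stands in for the service client (call_with_fallback and slow_planner are hypothetical names): the caller waits at most timeout_s, then falls back to a safe default instead of hanging.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable, TypeVar

T = TypeVar("T")

# One shared worker pool for outgoing "service calls"
_pool = ThreadPoolExecutor(max_workers=4)


def call_with_fallback(fn: Callable[[], T], timeout_s: float, fallback: T) -> T:
    """Run fn in the background; give up after timeout_s and use fallback."""
    future = _pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        future.cancel()  # no effect if fn is already running; we just stop waiting
        return fallback
    except Exception:
        return fallback  # the call itself failed (e.g. the server raised)


def slow_planner() -> str:
    time.sleep(0.3)      # simulates a hung or overloaded node
    return "new_plan"


if __name__ == "__main__":
    print(call_with_fallback(lambda: "new_plan", 0.1, "stop"))  # fast reply
    print(call_with_fallback(slow_planner, 0.05, "stop"))       # times out
```

The fallback value is a design decision, not a detail: for a motion command, "stop" or "keep the last safe plan" is usually safer than retrying blindly.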

Visualization Tools

# See all nodes and connections
rqt_graph

# See all topics
ros2 topic list

# See all services
ros2 service list

# Monitor single topic
ros2 topic echo /topic_name

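# Measure a topic's publish rate
ros2 topic hz /topic_name

# Call a service by hand to test it in isolation
ros2 service call /set_motor my_robot_interfaces/srv/SetMotorSpeed "{motor_id: 'motor_0', speed_rpm: 75.0}"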

Design Antipatterns (What NOT to Do)

Antipattern 1: Service for Continuous Data

# BAD: Polling a service for temperature every 0.1 seconds
request = GetTemperature.Request()
self.client.call(request)  # Blocks until response
self.client.call(request)  # Blocks again, 100 ms later

Fix: Publish readings on a topic and let consumers subscribe. Reserve services for infrequent, on-demand requests.

Antipattern 2: Nested Services

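# BAD: Service handler calls another service synchronously
def callback(self, request, response):
    # Deadlocks under the default single-threaded executor: this callback
    # holds the only thread, so the other service's reply is never processed
    other_response = self.other_client.call(other_request)  # Dangerous!
    response.value = other_response.value
    return response

Fix: Use topics for data flow and services only for simple queries. If a handler truly must call another service, use call_async() together with a MultiThreadedExecutor and separate callback groups.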
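The deadlock is easy to reproduce without ROS 2. In this plain-Python analogy, a ThreadPoolExecutor with one worker plays the single-threaded executor: the outer handler occupies the only thread while waiting for a reply that needs that same thread. With two workers (roughly what a multi-threaded executor with separate callback groups gives you), the nested call completes. nested_call is a hypothetical name, and the timeout exists only so the demo terminates.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def nested_call(workers: int, timeout_s: float = 0.2) -> str:
    """Outer 'service handler' synchronously waits on an inner 'service'.

    Returns "ok" if the inner call completes, "deadlock" if it never gets
    a thread to run on within timeout_s.
    """
    pool = ThreadPoolExecutor(max_workers=workers)

    def inner_service() -> str:
        return "inner reply"

    def outer_handler() -> str:
        inner = pool.submit(inner_service)   # queued on the same pool
        try:
            return inner.result(timeout=timeout_s)
        except FutureTimeout:
            return "deadlock"                # the only worker is us

    try:
        result = pool.submit(outer_handler).result()
        return "ok" if result == "inner reply" else result
    finally:
        pool.shutdown(wait=True)


if __name__ == "__main__":
    print("1 worker: ", nested_call(workers=1))
    print("2 workers:", nested_call(workers=2))
```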

Antipattern 3: Synchronous Everything

# BAD: Main loop calls 3 services sequentially
self.client1.call(request1)  # Block 100 ms
self.client2.call(request2)  # Block 100 ms
self.client3.call(request3)  # Block 100 ms
# Total: 300 ms for each loop iteration

Fix: Send independent requests with call_async() and handle replies in callbacks, or restructure the data flow around topics.
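Why async helps: independent requests can be in flight at the same time, so the loop waits for the slowest call instead of the sum of all of them. This is a plain-Python sketch with asyncio in which fake_service is a hypothetical stand-in that sleeps for its latency; in rclpy the analogous move is issuing several call_async() requests before waiting on any of them.

```python
import asyncio
import time
from typing import Any, Callable, Coroutine


async def fake_service(name: str, latency_s: float) -> str:
    """Hypothetical service that replies after latency_s seconds."""
    await asyncio.sleep(latency_s)
    return f"{name} done"


async def sequentially() -> list[str]:
    # Each call starts only after the previous reply: about 3 x 100 ms
    return [await fake_service(f"service{i}", 0.1) for i in (1, 2, 3)]


async def concurrently() -> list[str]:
    # All three requests are in flight at once: about 100 ms total
    replies = await asyncio.gather(
        *(fake_service(f"service{i}", 0.1) for i in (1, 2, 3)))
    return list(replies)


def timed(run: Callable[[], Coroutine[Any, Any, list[str]]]) -> tuple[list[str], float]:
    start = time.perf_counter()
    replies = asyncio.run(run())
    return replies, time.perf_counter() - start


if __name__ == "__main__":
    for run in (sequentially, concurrently):
        replies, elapsed = timed(run)
        print(f"{run.__name__}: {elapsed * 1000:.0f} ms -> {replies}")
```

This only helps when the calls are independent; if step 2 needs step 1's result, the dependency forces them back into sequence.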

Antipattern 4: Hardcoded Topic Names

# BAD: Topic name buried in code
self.create_publisher(Type, '/robot/sensor/temp', 10)
# Hard to rename or reconfigure

# GOOD: Parameterized
self.declare_parameter('sensor_topic', '/robot/sensor/temp')
topic = self.get_parameter('sensor_topic').value
self.create_publisher(Type, topic, 10)

Key Principles for Good Architectures

Clear Ownership: Each topic has one clearly responsible publisher node, and each service has exactly one server
Loose Coupling: Nodes don't depend on internal implementation of others
Graceful Degradation: System handles missing nodes (timeouts, fallbacks)
Observable: Use ROS 2 tools to visualize and debug
Testable: Mock out nodes, test communication in isolation
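One way to make the "Testable" principle concrete: pull the control node's decision out of the ROS 2 callback and drive it with a fake client. This is a hypothetical refactoring sketch in plain Python; choose_speed, control_step, MotorClient, and FakeMotorClient are illustrative names, and the numbers mirror the Control Node example (above 30.0 gives 50 RPM, otherwise 100 RPM).

```python
from dataclasses import dataclass, field
from typing import Protocol

HOT_THRESHOLD = 30.0   # same threshold as ControlNode.control_loop
SLOW_RPM = 50.0
NORMAL_RPM = 100.0


def choose_speed(latest_temp: float) -> float:
    """Decision logic pulled out of the node so it can be tested alone."""
    return SLOW_RPM if latest_temp > HOT_THRESHOLD else NORMAL_RPM


class MotorClient(Protocol):
    def set_speed(self, motor_id: str, speed_rpm: float) -> bool: ...


@dataclass
class FakeMotorClient:
    """Test double standing in for the set_motor service client."""
    available: bool = True
    requests: list = field(default_factory=list)

    def set_speed(self, motor_id: str, speed_rpm: float) -> bool:
        if not self.available:
            return False
        self.requests.append((motor_id, speed_rpm))
        return True


def control_step(latest_temp: float, motor: MotorClient) -> bool:
    """One control-loop iteration; returns False if the motor was unreachable."""
    return motor.set_speed("motor_0", choose_speed(latest_temp))


if __name__ == "__main__":
    motor = FakeMotorClient()
    for temp in (25.0, 30.0, 30.1):
        control_step(temp, motor)
    print(motor.requests)
```

In the real node, a thin adapter would implement set_speed with call_async(); the decision logic itself never needs a running ROS 2 graph to be tested.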

Try With AI

You have a working multi-node system. Let's improve it architecturally.

Ask your AI:

"I have sensor, control, and motor nodes communicating via topics and services. As the robot gets more complex, I want to add a planning node, a safety monitor, and a logging service. Where should each one fit? What topics/services should be added? Draw a communication diagram and explain the design rationale."

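Expected outcome: The AI will likely suggest:

Safety monitor as a separate, high-priority listener
Planning node that subscribes to sensors and publishes motion plans
Logging component; a strong answer may point out that logs are continuous data, so a topic (or ROS 2's built-in /rosout) fits better than a service
Rationale for each choice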

Challenge the design:

"What if the planner gets slow? Should the control node wait for planning or proceed with old plan?"

Expected outcome: The AI should explain:

Async patterns so control doesn't block
Fallback behaviors
Timeout handling
Tradeoffs between freshness and responsiveness

Iterate together:

"Got it. Show me the code for a control loop that: (1) publishes status topic, (2) calls planning service async, (3) calls motor service only when safe, (4) has fallback if planner unavailable. Full implementation with error handling."

This demonstrates architecture design through collaboration.

Exercises

Extend the three-node system with a fourth "safety monitor" node that subscribes to sensor data and publishes warnings
Add a speed-limit parameter to the motor node and change it at runtime with ros2 param set, without restarting
Create a launch file that starts all the nodes together
Visualize communication with rqt_graph
Implement error handling so the control node keeps running if the motor service is unavailable

Reflection

Before the capstone (Lesson 6), think about:

When would you use a topic vs service in your design?
How do you debug a system with 5+ nodes?
What breaks most often in multi-node systems?

Next chapter (Chapter 6): Building Systems

You now know all the communication patterns. Next you'll learn how to launch multiple nodes, organize large projects, and build real robot systems that scale.
