# WPML Support Ticket: Gutenberg Block Translation Issues
## Environment
- WordPress Theme with custom Gutenberg blocks
- WPML + WPML Page Builders extension
- Custom blocks using standard WordPress `RichText.Content` components
- `wpml-config.xml` in theme root with `<gutenberg-blocks>` configuration
---
## Issue 1: Theme's wpml-config.xml not loaded for Gutenberg blocks
### Problem
The `wpml-gutenberg-config` option in `wp_options` remains empty even though a valid `wpml-config.xml` exists in the theme root directory. WPML reads other sections of the config (custom-fields, taxonomies) but ignores the `<gutenberg-blocks>` section.
### Expected behavior
WPML should parse the `<gutenberg-blocks>` section from the theme's `wpml-config.xml` and populate the `wpml-gutenberg-config` option.
### Actual behavior
The option stays empty, causing WPML to use fallback behavior (registering entire HTML blocks instead of extracting text via XPath selectors).
### Workaround implemented
We manually parse `wpml-config.xml` and write the config directly to the `wpml-gutenberg-config` option via PHP:
```php
$xml = simplexml_load_file(get_template_directory() . '/wpml-config.xml');
// Parse gutenberg-blocks and store in option
update_option('wpml-gutenberg-config', $parsedConfig);
```
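
A minimal sketch of what the parsing step might look like, assuming the block entries follow the `<gutenberg-block type="..." translate="1">` / `<xpath>` layout of `wpml-config.xml`. The function name `parse_gutenberg_blocks()`, the sample block type `acme/notice`, and the returned array shape are hypothetical; the exact structure WPML expects in the `wpml-gutenberg-config` option is not confirmed here.

```php
// Hypothetical helper: collect block types and XPath selectors from the
// <gutenberg-blocks> section of a wpml-config.xml string.
// The returned array shape is illustrative only, not WPML's storage format.
function parse_gutenberg_blocks(string $xmlSource): array
{
    libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings
    $xml = simplexml_load_string($xmlSource);
    if ($xml === false || !isset($xml->{'gutenberg-blocks'})) {
        return [];
    }

    $config = [];
    foreach ($xml->{'gutenberg-blocks'}->{'gutenberg-block'} as $block) {
        $type = (string) $block['type'];
        if ($type === '') {
            continue; // skip entries without a block type
        }

        $xpaths = [];
        foreach ($block->{'xpath'} as $xpath) {
            $xpaths[] = trim((string) $xpath);
        }

        $config[$type] = [
            'translate' => (string) $block['translate'] === '1',
            'xpath'     => $xpaths,
        ];
    }

    return $config;
}

// Sample input with a made-up block type.
$sample = '<wpml-config><gutenberg-blocks>'
    . '<gutenberg-block type="acme/notice" translate="1"><xpath>//p</xpath></gutenberg-block>'
    . '</gutenberg-blocks></wpml-config>';

print_r(parse_gutenberg_blocks($sample));
```

In the real workaround the XML would come from the theme's `wpml-config.xml`, and the result would be passed to `update_option()` only after checking that it is non-empty.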
---
## Issue 2: Non-breaking space (nbsp) encoding mismatch
### Problem
When WPML extracts strings from Gutenberg blocks, it adds trailing non-breaking spaces (`\xC2\xA0` in UTF-8). However, the actual post content may have:
- Regular spaces (0x20)
- `&nbsp;` HTML entity
- `&#160;` numeric entity
- No trailing space at all
This causes translation lookup to fail because the hash is calculated with the nbsp but the content doesn't contain it.
### Example
- WPML stores: `"Wichtig: Degus sind Nageprofis.\xC2\xA0"` (ends with nbsp)
- Post content has: `"Wichtig: Degus sind Nageprofis. "` (ends with regular space)
- Hash mismatch → Translation not applied
### Hex evidence
```
WPML string ends with: 6572207365696e2ec2a0 (sein. + nbsp)
Post content ends with: 6572207365696e2e20 (sein. + space)
```
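
For reference, a dump like the one above can be produced with PHP's `bin2hex()`. This is a small sketch; the helper name `tail_hex()` is hypothetical, and the two sample strings are shortened stand-ins for the stored string and the post content.

```php
// Hypothetical helper: hex-encode the last $bytes bytes of a string so that
// invisible differences (nbsp vs. regular space) become visible.
function tail_hex(string $value, int $bytes = 10): string
{
    return bin2hex(substr($value, -$bytes));
}

$stored  = "er sein.\xC2\xA0"; // ends with a UTF-8 nbsp (2 bytes)
$content = "er sein. ";        // ends with a regular space (1 byte)

echo tail_hex($stored, 10), PHP_EOL; // 6572207365696e2ec2a0
echo tail_hex($content, 9), PHP_EOL; // 6572207365696e2e20
```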
### Workaround implemented
We generate multiple encoding variants when searching for strings to replace:
```php
$variants[] = str_replace("\xC2\xA0", ' ', $string);      // nbsp → space
$variants[] = str_replace("\xC2\xA0", '&nbsp;', $string); // nbsp → entity
// Under the /u modifier, \x{00A0} matches the nbsp code point (bytes C2 A0)
$variants[] = preg_replace('/\x{00A0}+$/u', '', $string); // remove trailing nbsp
```
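
An alternative to generating variants is to normalize both sides to one canonical form before comparing. The sketch below assumes that mapping every nbsp representation to a regular space and trimming trailing spaces is acceptable for comparison purposes; `normalize_nbsp()` is a hypothetical helper, and `md5()` only stands in for whatever hash WPML uses internally.

```php
// Hypothetical helper: map every nbsp representation to a regular space and
// drop trailing spaces, so both sides of a comparison use one form.
function normalize_nbsp(string $value): string
{
    // Entity forms first (matched case-insensitively), then the raw UTF-8 bytes.
    $value = str_ireplace(['&nbsp;', '&#160;', '&#xA0;'], ' ', $value);
    $value = str_replace("\xC2\xA0", ' ', $value);

    return rtrim($value, ' ');
}

$stored  = "Wichtig: Degus sind Nageprofis.\xC2\xA0"; // as stored by WPML
$content = 'Wichtig: Degus sind Nageprofis. ';        // as found in post content

// md5() is an assumption used to illustrate the mismatch.
var_dump(md5($stored) === md5($content));                                 // bool(false)
var_dump(md5(normalize_nbsp($stored)) === md5(normalize_nbsp($content))); // bool(true)
```

The normalization is lossy, so it is only suitable for matching, not for rewriting the stored content.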
---
## Issue 3: Ampersand encoding mismatch
### Problem
Similar to nbsp, ampersands are handled inconsistently:
- WPML stores: `"Sauberkeit & Hygiene"` (decoded)
- Post content has: `"Sauberkeit &amp; Hygiene"` (HTML encoded)
For "VISUAL" strings (containing HTML tags), WPML skips `html_entity_decode()`, but the hash is still calculated with the decoded version.
### Code reference in WPML
```php
// wpml-page-builders/classes/Integrations/Gutenberg/strings-in-block/class-html.php
// Line ~200
if ($type !== 'VISUAL') {
    $string = html_entity_decode($string);
}
```
### Workaround implemented
```php
$variants[] = str_replace('&', '&amp;', $string); // decoded & → HTML entity
```
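
A plain `str_replace()` would double-encode a string that already contains entities (`&amp;` becomes `&amp;amp;`). The following sketch encodes only bare ampersands and leaves HTML tags alone, which matters for "VISUAL" strings; `encode_bare_ampersands()` is a hypothetical helper, not part of WPML.

```php
// Hypothetical helper: encode only bare ampersands, leaving existing named
// or numeric entities (e.g. &amp;, &nbsp;, &#160;) and HTML tags untouched.
function encode_bare_ampersands(string $value): string
{
    $encoded = preg_replace(
        '/&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#[xX][0-9A-Fa-f]+;)/',
        '&amp;',
        $value
    );

    // preg_replace() returns null on error; fall back to the input.
    return $encoded ?? $value;
}

echo encode_bare_ampersands('Sauberkeit & Hygiene'), PHP_EOL;     // Sauberkeit &amp; Hygiene
echo encode_bare_ampersands('Sauberkeit &amp; Hygiene'), PHP_EOL; // Sauberkeit &amp; Hygiene
```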
---
## Issue 4: Emoji handling inconsistency
### Problem
When posts contain emojis (e.g., a leading emoji followed by `Important text...`), WPML may extract the string with the emoji, but the translated post content might not have it (if the translation was created before emojis were added).
### Workaround implemented
```php
$variants[] = preg_replace('/^[\x{1F300}-\x{1F9FF}]+\s*/u', '', $string); // strip leading emoji
```
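
The range `1F300`-`1F9FF` does not cover older symbols such as U+26A0 (warning sign) or the variation selector U+FE0F that often follows them. The sketch below widens the character class; the extra ranges are an assumption about which emojis appear in the content, and `strip_leading_emoji()` is a hypothetical helper.

```php
// Hypothetical helper: strip leading emoji plus the whitespace after them.
// The ranges are a pragmatic approximation, not a complete emoji definition.
function strip_leading_emoji(string $value): string
{
    $pattern = '/^[\x{1F000}-\x{1FAFF}\x{2600}-\x{27BF}\x{FE0F}\x{200D}]+\s*/u';

    // preg_replace() returns null on error (e.g. invalid UTF-8); keep the input then.
    return preg_replace($pattern, '', $value) ?? $value;
}

echo strip_leading_emoji("\u{26A0}\u{FE0F} Important text"), PHP_EOL; // Important text
echo strip_leading_emoji("\u{1F4A1} Tip of the day"), PHP_EOL;        // Tip of the day
```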
---
## Summary of Workarounds
We implemented a `WpmlServiceProvider` class that:
1. Manually syncs Gutenberg config from `wpml-config.xml` to `wpml-gutenberg-config` option
2. Hooks into `wpml_found_strings_in_block` to normalize encoding during string registration
3. Provides a resync tool that tries multiple encoding variants when applying translations
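
The variant lookup used by the resync tool can be sketched roughly as follows. The function names `build_variants()` and `find_variant()` are hypothetical and stand in for the logic inside `WpmlServiceProvider`; the sketch omits the WordPress and WPML integration and only shows the matching step.

```php
// Hypothetical helper: produce the encoding variants of a stored string.
function build_variants(string $value): array
{
    $nbsp = "\xC2\xA0"; // UTF-8 non-breaking space

    $variants = [
        $value,                                              // as stored
        str_replace($nbsp, ' ', $value),                     // nbsp → space
        str_replace($nbsp, '&nbsp;', $value),                // nbsp → entity
        preg_replace('/\x{00A0}+$/u', '', $value) ?? $value, // no trailing nbsp
        str_replace('&', '&amp;', $value),                   // & → entity
    ];

    return array_values(array_unique($variants));
}

// Hypothetical helper: return the first variant that occurs in the content,
// or null if none matches.
function find_variant(string $value, string $content): ?string
{
    foreach (build_variants($value) as $variant) {
        if ($variant !== '' && strpos($content, $variant) !== false) {
            return $variant;
        }
    }

    return null;
}

var_dump(find_variant('Sauberkeit & Hygiene', '<p>Sauberkeit &amp; Hygiene</p>'));
// string(24) "Sauberkeit &amp; Hygiene"
```

Once a matching variant is found, the translation can be applied by replacing that exact variant in the content.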
### Files affected
- `app/Providers/WpmlServiceProvider.php` (all workarounds)
- Admin page: Tools → WPML Resync
---
## Requested Fix
Please ensure consistent encoding handling in WPML's Gutenberg string extraction and translation lookup:
1. **Load theme's `wpml-config.xml` gutenberg-blocks section** into `wpml-gutenberg-config` option
2. **Normalize nbsp characters** - either always store as UTF-8 `\xC2\xA0` or always as entity, but be consistent between extraction and lookup
3. **Handle trailing whitespace consistently** - don't add trailing nbsp during extraction if it's not in the original content
4. **Apply same encoding rules** for both VISUAL and NON-VISUAL strings
---
## Test Case
1. Create a custom Gutenberg block with `RichText.Content` component
2. Add text containing `&` character (e.g., "Sauberkeit & Hygiene")
3. Configure block in `wpml-config.xml` with XPath
4. Create translation
5. Observe: Translation exists in database but is not applied to content
---
## Related Files in This Project
| File | Description |
|------|-------------|
| `app/Providers/WpmlServiceProvider.php` | All workarounds implemented |
| `wpml-config.xml` | Gutenberg block configuration |
| `docs/WPML-ENCODING-FIX.md` | Technical documentation of the fix |