This thread is resolved. Here is a description of the problem and solution.

Problem:
If you're experiencing issues with internal links in translated texts pointing to the original content instead of the corresponding translated content in your WooCommerce store, and the 'Translate Link Targets' tool is not resolving the issue, this summary might help.
Solution:
Firstly, ensure you are using WP All Import Pro for importing. You can address this issue by creating a function to map and replace URLs during the import process. Here's a step-by-step approach:
1. Extract internal URLs from both the original and translated content using a unique identifier.
2. Map these URLs so that each original URL corresponds to a translated URL based on the data in your Excel sheet.
3. Replace the original URLs in the translated content with the correct translated URLs during the import process.
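A minimal sketch of the three steps above, written in Python for illustration only. It assumes the mapping between original and translated URLs is already available (for example from two columns of the import sheet). The function names, the `url_map` table and the example URLs are hypothetical and do not come from WPML or WP All Import.

```python
import re

# Captures the value of an href attribute inside an <a ...> tag,
# regardless of where href appears among the tag's attributes.
HREF_RE = re.compile(r'(<a\b[^>]*?\bhref=")([^"]+)(")', re.IGNORECASE)

def extract_urls(html):
    """Return every href value found in <a> tags, in document order."""
    return [m.group(2) for m in HREF_RE.finditer(html)]

def replace_urls(html, url_map):
    """Swap each href that has an entry in url_map; leave all others untouched."""
    def swap(match):
        url = match.group(2)
        return match.group(1) + url_map.get(url, url) + match.group(3)
    return HREF_RE.sub(swap, html)

# Hypothetical mapping: original-language URL -> translated URL.
url_map = {
    "https://da.example.com/kategori/a/": "https://no.example.com/kategori/a-no/",
}

translated = (
    '<p><a class="x" href="https://da.example.com/kategori/a/">A</a> '
    '<a href="https://other.example.org/">B</a></p>'
)
fixed = replace_urls(translated, url_map)
print(extract_urls(fixed))
```

Links without an entry in the mapping are left as they are, so external links and links that have no translation yet are not touched.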
For existing posts where links are already incorrect, you might consider a custom function that fixes translated internal links. It would scan each post for links, check whether each one is internal, and replace it with the correct translated URL if a translation exists.
Note: These solutions involve custom coding and might require adjustments specific to your setup. If these steps seem complex or if they do not resolve your issue, we recommend consulting with a professional from WPML Contractors.

Please note that this solution might be outdated or not applicable to your specific case. We highly recommend checking related known issues at https://wpml.org/known-issues/, verifying the version of the permanent fix, and confirming that you have installed the latest versions of themes and plugins. If the problem persists, please open a new support ticket.

This topic contains 12 replies, has 2 voices.

Last updated by Bruno Kos 1 month ago.

Assisted by: Bruno Kos.

September 21, 2024 at 7:17 am #16204465

piaP-6

Background of the issue:
I have a WooCommerce store with thousands of products. The products are imported regularly in different languages from an Excel sheet. The text is translated without taking the links in it into account, which means that the translated text links to the original content. I need help fixing that, so the customer can use the internal links in the texts and stay in the same language version.

Symptoms:
The translated text is linking to the original content instead of the corresponding translated content. Running 'Translate Link Targets' does not seem to make any difference.

Questions:
How can I ensure that internal links in the translated text point to the corresponding translated content?
Why is the 'Translate Link Targets' tool not fixing the issue?

September 24, 2024 at 5:46 am #16212381

Bruno Kos
Supporter

Languages: English (English ) German (Deutsch ) French (Français )

Timezone: Europe/Zagreb (GMT+01:00)

Hi,

Translate Link Targets would not work here because there are no translation packages for imported content: these are in effect manual translations and don't go through the translation editor.

Are you using WP All Import Pro for this? Because if so, perhaps something like this would work:

- Extract URLs: Extract internal URLs from both the original and translated content using a unique identifier.
- Map URLs: A function will map the original URLs to their corresponding translated URLs based on the Excel sheet data.
- Replace URLs: During the import, replace the original URLs in the translated content with the correct translated URLs.

hidden link could perhaps be used for this, but this would be a question for WP All Import.

September 24, 2024 at 8:48 am #16213158

piaP-6

Hi Bruno,

Thanks for your reply.

Yes, I am using WP All Import Pro to import the material. Maybe the linking in the Excel sheet should be based on Post ID / Term ID in the original language, and then inserting the correct links on import. That should work for future uploads.

However, we already have around 3,000 posts in each language that need to be fixed. Could Sticky Links help me with that instead?

Best Regards,

September 24, 2024 at 9:59 am #16213622

Bruno Kos
Supporter

You could scan the content with Sticky Links, but since the URLs are already wrong it may not help: it only converts the links that are already there and does not replace them with the correct ones.

For future imports you could try something like this:

function create_url_mapping_from_content($original_data, $translated_data) {
    $url_mapping = [];
    foreach ($original_data as $unique_id => $original_entry) {
        if (isset($translated_data[$unique_id])) {
            preg_match_all('/<a href="([^"]+)"/', $original_entry['content'], $original_matches);
            preg_match_all('/<a href="([^"]+)"/', $translated_data[$unique_id]['content'], $translated_matches);
            if (count($original_matches[1]) === count($translated_matches[1])) {
                foreach ($original_matches[1] as $index => $original_url) {
                    $translated_url = $translated_matches[1][$index];
                    $url_mapping[$original_url] = $translated_url;
                }
            }
        }
    }
    return $url_mapping;
}

function replace_urls_in_translated_content($content, $url_mapping) {
    foreach ($url_mapping as $original_url => $translated_url) {
        $content = str_replace($original_url, $translated_url, $content);
    }
    return $content;
}

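// Runs after WP All Import saves each post.
add_action('pmxi_saved_post', 'update_internal_urls_in_imported_content', 10, 3);
function update_internal_urls_in_imported_content($post_id, $xml, $is_update) {
    if (get_post_type($post_id) !== 'product') return;
    $content = get_post_field('post_content', $post_id);
    // get_original_data_from_excel() and get_translated_data_from_excel() are placeholders
    // that you need to implement; each should return [unique_id => ['content' => '...']].
    $original_data = get_original_data_from_excel();
    $translated_data = get_translated_data_from_excel();
    $url_mapping = create_url_mapping_from_content($original_data, $translated_data);
    $updated_content = replace_urls_in_translated_content($content, $url_mapping);
    // Only write to the database when something actually changed.
    if ($updated_content !== $content) {
        wp_update_post(['ID' => $post_id, 'post_content' => $updated_content]);
    }
}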

While for existing products, in the event that Sticky fails, you could try something like this:

function fix_translated_internal_links() {
    // Get all posts (this could be changed to specific post types, like 'product')
    $args = [
        'post_type' => 'product', // Change this if you want to target other post types
        'posts_per_page' => -1,    // Get all posts
        'post_status' => 'publish' // Only published posts
    ];

    $posts = get_posts($args);

    foreach ($posts as $post) {
        $content = $post->post_content;

        // Find all links using a regex to match <a href="..."> (internal links are filtered below)
        preg_match_all('/<a href="([^"]+)"/', $content, $matches);

        if (!empty($matches[1])) {
            foreach ($matches[1] as $original_url) {
                // Check if it's an internal link
                if (strpos($original_url, home_url()) !== false) {
                    // Get the post ID from the original URL
                    $post_id = url_to_postid($original_url);

                    if ($post_id) {
                        // Get the current post language
                        $current_language = apply_filters('wpml_element_language_code', null, ['element_id' => $post->ID, 'element_type' => 'post_' . $post->post_type]);

                        // Get the translated post ID for the correct language
                        $translated_post_id = apply_filters('wpml_object_id', $post_id, get_post_type($post_id), false, $current_language);

                        if ($translated_post_id) {
                            // Get the translated URL
                            $translated_url = get_permalink($translated_post_id);

                            // Replace the original URL with the translated URL in the content
                            $content = str_replace($original_url, $translated_url, $content);
                        }
                    }
                }
            }

            // If the content has been updated, update the post
            if ($content !== $post->post_content) {
                wp_update_post([
                    'ID' => $post->ID,
                    'post_content' => $content
                ]);
            }
        }
    }
}

Note that this is not tested and falls outside of the WPML scope, but you may want to connect with https://wpml.org/contractors/ for a permanent solution.

September 24, 2024 at 11:13 am #16214167

piaP-6

Hi Bruno,

Thanks for your reply. I will review and test the code before running it. But thanks a lot for a very nice start!

Regarding Sticky Links, it doesn't seem to fix the issue. It looks like it is not connecting links between languages, but just looking for the links in each language and converting them to another format. So for a link to a Danish category in a Norwegian translated post, the link is converted to a sticky link to the Danish category (Danish slug version) instead of the Norwegian one. I had hoped there was a check that the slug is in the same language as the post.

Can you confirm this behavior? And should I just stop using Sticky Links altogether?

September 24, 2024 at 12:34 pm #16214813

Bruno Kos
Supporter

"but just looking for the links in each language and converting them to another format."

That's right, that is how it works. It doesn't connect links between languages, so you can remove it.

September 24, 2024 at 1:22 pm #16215169

piaP-6

Hi Bruno,

Thanks for your reply. I have investigated further, and it actually looks like there is something in Sticky Links that does what I was hoping for. This product (hidden link) was uploaded in Norwegian, but at upload it contained a link to a Danish product category (hidden link). However, in the backend, the value of the href attribute is now "/?product_cat=stagelys" (the Danish slug), while on the front end the URL is shown with the Norwegian domain and slug (hidden link). Does that sound reasonable?

That is actually pretty much what I wanted. However, it does not work for all links. E.g. on the linked product, there is a link to a product search page which is Danish and not translated to the Norwegian version.

One solution could be some kind of web crawler with which I could verify that pages on the Norwegian domain only link to Norwegian URLs. Are you aware of such a system?

September 25, 2024 at 6:50 am #16218193

Bruno Kos
Supporter

I haven't tested or used any of these tools, so I can't comment on their functionality, but perhaps something like hidden link or hidden link would work.

September 30, 2024 at 7:44 am #16234667

piaP-6

Hi Bruno,

Thanks for your suggestions. I have considered Screaming Frog myself. However, I find the pricing too high for this usage.

I will look into Xenu's Link Sleuth - looks interesting. Also, I have tried making a script in Python (using Scrapy) that can crawl the website and identify incorrect language linking.

I will get back to you later this week (or at the beginning of next week) with my findings and the code for the script (if it works...) for others to use.

I hope you can keep the thread open that long.

Best Regards,
Peter

September 30, 2024 at 9:38 am #16235201

Bruno Kos
Supporter

Tickets are automatically closed after approximately 10 days, so you should be able to respond within that period. If needed, you can also open a new ticket at any time.

Feel free to share any code that might be helpful for our other clients. However, please note that we won’t be able to provide support for custom code, as it falls outside our support scope.

October 10, 2024 at 5:35 am #16272878

Bruno Kos
Supporter

As requested, I am reopening the ticket. Let me know if you need further help with this.

October 10, 2024 at 7:54 am #16273216

piaP-6

I just wanted to share my findings. Importing posts with URLs in different languages is complicated, and we have uploaded thousands of links that point to other language versions than intended.

Therefore, I have been scraping the whole website for links pointing to other domains using Python and Scrapy (hidden link). The script I used was this:

import scrapy
from urllib.parse import urlparse

class LinksSpider(scrapy.Spider):
    name = "links_spider"
    allowed_domains = ['no.example.com']
    start_urls = ['hidden link']

    # List of keywords to exclude
    excluded_keywords = ["add-to-cart=", "remove_item=", "removed_item="]

    def parse(self, response):
        base_domain = urlparse(response.url).netloc

        # Extract all links from the page
        for link in response.css('a'):
            href = link.css('::attr(href)').get()
            full_url = response.urljoin(href)

            # Skip URLs containing excluded keywords
            if any(keyword in full_url for keyword in self.excluded_keywords):
                continue  # Skip this URL and don't yield a request

            # Read the classes of the <a> tag's parent element
            parent_classes = link.xpath('parent::*').css('::attr(class)').get()

            # Links inside a WPML language switcher item (wpml-ls-item) are expected to point to other languages
            if (parent_classes and 'wpml-ls-item' in parent_classes):
                link_type = "external, OK"
            else:
                # Classify link as internal or external
                if urlparse(full_url).netloc == base_domain:
                    link_type = "internal"
                    # Follow internal links to further crawl
                    yield scrapy.Request(full_url, callback=self.parse)
                else:
                    link_type = "external"
            
            links_inprogress = len(self.crawler.engine.slot.inprogress)
            links_scheduler = len(self.crawler.engine.slot.scheduler)
            self.logger.info(f"Remaining links inprogress: {links_inprogress}")
            self.logger.info(f"Remaining links in scheduler: {links_scheduler}")
            
            # Yield each link as a separate result
            yield {
                'page_url': response.url,  # The page that was crawled
                'link_url': full_url,      # The link on that page
                'link_type': link_type     # Whether it's internal, external, or external with special class
            }
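
The spider's output can then be filtered for links that leave the crawled language version for one of the store's other language domains. The following is a minimal sketch, assuming one domain per language. The domain table and the function name are hypothetical; the rows are dicts with the `page_url`, `link_url` and `link_type` keys yielded by the spider above.

```python
from urllib.parse import urlparse

# Hypothetical table: one domain per language version of the store.
LANGUAGE_DOMAINS = {
    "no.example.com": "no",
    "da.example.com": "da",
}

def find_cross_language_links(rows, language_domains=LANGUAGE_DOMAINS):
    """Return rows whose link points to a different language domain than the page.

    Language switcher links (link_type "external, OK") are skipped, since they
    are meant to leave the current language.
    """
    suspicious = []
    for row in rows:
        if row["link_type"] == "external, OK":
            continue
        page_lang = language_domains.get(urlparse(row["page_url"]).netloc)
        link_lang = language_domains.get(urlparse(row["link_url"]).netloc)
        # Only flag links between two known language domains that differ.
        if page_lang and link_lang and page_lang != link_lang:
            suspicious.append(row)
    return suspicious

rows = [
    {"page_url": "https://no.example.com/p/1/", "link_url": "https://no.example.com/c/2/", "link_type": "internal"},
    {"page_url": "https://no.example.com/p/1/", "link_url": "https://da.example.com/c/2/", "link_type": "external"},
    {"page_url": "https://no.example.com/p/1/", "link_url": "https://da.example.com/", "link_type": "external, OK"},
    {"page_url": "https://no.example.com/p/1/", "link_url": "https://other.example.org/", "link_type": "external"},
]
for row in find_cross_language_links(rows):
    print(row["page_url"], "->", row["link_url"])
```

Links to domains that are not in the table are ignored, so genuinely external links do not show up in the result.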

Some links appeared in several products, so I made the following PHP script to search and replace in the post content and excerpt of products in a specific language:

<?php
// Standalone script: place it in the WordPress root directory (next to wp-load.php) and open it in the browser
require('wp-load.php');

// Query products
$args = array(
	'post_type' => 'product',
	'posts_per_page' => -1,
);

$products = get_posts($args);

$i = 0;

// Set the string you want to search and replace
$search_url = 'hidden link';
$replace_url = 'hidden link';

$search_string = 'href="' . $search_url . '"';
$replace_string = 'href="' . $replace_url . '"';

// Set the WPML language to search in (use WPML language code)
$wpml_language = 'no'; // Change this to the language you want to filter by (e.g., 'en', 'da', etc.)

// Safe mode: If true, no actual replacements will be made, only changes will be displayed
$safe_mode = true;

echo 'safe_mode = ';
var_dump($safe_mode);

echo "<table border='1'>
	<tr>
		<th>Product ID</th>
		<th>Product Title</th>
		<th>Original content</th>
		<th>Original short description</th>
		<th>New content</th>
		<th>New short description</th>
	</tr>";

// Process each product
foreach ($products as $product) {
	$post_id = $product->ID;
	$language_details = apply_filters('wpml_post_language_details', NULL, $post_id);
	
	if ($language_details['language_code'] !== $wpml_language){
		continue;
	}
	
    $original_content = $product->post_content;
	$original_short_description = $product->post_excerpt;

    // Check if the content or short description contains the search string
    if (str_contains($original_content, $search_string) || str_contains($original_short_description, $search_string)) {
		echo "<tr>";
			echo "<td>Product ID: " . $product->ID . "</td>";
			echo "<td>" . get_the_title($product->ID) . "</td>";

			// Display the original content and short description
			echo "<td>" . $original_content . "</td>";
			echo "<td>" . $original_short_description . "</td>";

			// Generate new content and short description with replacements
			$new_content = str_replace($search_string, $replace_string, $original_content);
			$new_short_description = str_replace($search_string, $replace_string, $original_short_description);
			
			// In safe mode, display the changes without applying them
			echo "<td>" . $new_content . "</td>";
			echo "<td>" . $new_short_description . "</td>";
		echo "</tr>";

        if ($safe_mode) {
			echo "<tr>
					<td>Nothing changed!</td>
					<td>Nothing changed!</td>
					<td>Nothing changed!</td>
					<td>Nothing changed!</td>
					<td>Nothing changed!</td>
					<td>Nothing changed!</td>
				</tr>";
        } else {
            // Apply the changes if safe mode is off
            wp_update_post(array(
                'ID'           => $product->ID,
                'post_content' => $new_content,
                'post_excerpt' => $new_short_description, // Update short description via post_excerpt
            ));
			
			echo "<tr>
					<td>Updated!</td>
					<td>Updated!</td>
					<td>Updated!</td>
					<td>Updated!</td>
					<td>Updated!</td>
					<td>Updated!</td>
				</tr>";
        }
    }
}

echo '</table>';

echo "Script completed.";

In that way, I could fix the thousands of incorrect links in maybe 15-20 hours (including development of the scripts).
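
One property of matching the full `href="..."` attribute, as the PHP script above does, is that a URL which is only the beginning of a longer URL is left alone. A small sketch in Python with hypothetical URLs shows the difference from replacing the bare URL:

```python
# Hypothetical URLs, used only to show the difference between the two strategies.
search_url = "https://da.example.com/shop/"
replace_url = "https://no.example.com/butikk/"

content = (
    '<a href="https://da.example.com/shop/">Shop</a> '
    '<a href="https://da.example.com/shop/lamper/">Lamps</a>'
)

# Replacing the bare URL also rewrites the longer URL that starts with it.
bare = content.replace(search_url, replace_url)

# Replacing the quoted attribute only touches the link that matches exactly.
exact = content.replace(f'href="{search_url}"', f'href="{replace_url}"')

print(bare)
print(exact)
```

With the bare URL, the second link would end up with a Norwegian domain and a Danish path, which may not exist.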

October 10, 2024 at 9:33 am #16273844

Bruno Kos
Supporter

Thank you for sharing this, I styled the answer a bit so that the code part is prominent.

October 10, 2024 at 12:13 pm #16274945

piaP-6

You should exclude line 52 in the code styling since that line is not code, and it is separating the two scripts.